
DekarLab

Blog about big data processing and data-driven investments


Tag: data lake

Follow the source: usage of CQRS pattern in data lake

16. April 2020 karden DATA PROCESSING

There is a pattern in microservices architecture called Command and Query Responsibility Segregation (CQRS). This pattern helps to design a multi-purpose data lake.
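
As a rough illustration of the idea (the class, event, and field names below are hypothetical, not taken from the post), the command side only appends immutable events, while the query side answers reads from a separately maintained view built from those events:

```python
# Toy CQRS sketch: writes and reads go through separate models.
from collections import defaultdict

class CommandSide:
    """Write path: an append-only event log, like the ingestion side of a data lake."""
    def __init__(self):
        self.events = []

    def handle_deposit(self, account, amount):
        self.events.append({"type": "deposit", "account": account, "amount": amount})

class QuerySide:
    """Read path: a materialized view rebuilt by projecting the event log."""
    def __init__(self):
        self.balances = defaultdict(float)

    def project(self, events):
        for e in events:
            if e["type"] == "deposit":
                self.balances[e["account"]] += e["amount"]

commands, queries = CommandSide(), QuerySide()
commands.handle_deposit("acc-1", 100.0)
queries.project(commands.events)
print(queries.balances["acc-1"])  # 100.0
```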


Meta data service and schema registry in data lake

13. June 2019 karden DATA PROCESSING

Maintaining data descriptions is a useful feature. Here are some ideas on how to implement it.


ORM (object-relational mapping) analog for data in data lake

13. June 2019 karden DATA PROCESSING

We start saving data in HDFS using the Avro format. In a previous post we discussed forward and backward compatibility of Avro schemas. How can we use this concept?
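
One way to picture such an "ORM analog" is a thin mapping layer from Avro-decoded records to typed objects. Here is a minimal sketch in Python (the User class and its fields are illustrative, not from the post), where a schema default covers fields that older records do not carry:

```python
# Hypothetical mapping layer from Avro-decoded dicts to Python objects.
from dataclasses import dataclass, fields

@dataclass
class User:
    id: int
    name: str
    email: str = ""  # mirrors a field added to the Avro schema with a default

def from_record(cls, record: dict):
    """Build an object from a decoded record, ignoring unknown fields."""
    known = {f.name for f in fields(cls)}
    return cls(**{k: v for k, v in record.items() if k in known})

# A record written with an older schema may lack 'email'; the default fills it in.
print(from_record(User, {"id": 1, "name": "Alice"}))
```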


Why contracts are important in data intensive applications with microservices

12. October 2018 karden DATA PROCESSING

The main purpose of using a microservices architecture is to increase development velocity and reduce system complexity.

Hybrid cloud architecture for data lake applications

22. May 2018 karden DATA PROCESSING

Big data technologies are very mature nowadays. Typically you use HDFS, or another distributed file system like S3, for storing data, Spark as a processing engine, and YARN as a resource manager. The next steps you would probably like to take are implementing CI/CD (continuous integration and delivery) and moving workloads to the cloud on demand.
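
For context, here is a minimal PySpark sketch of that stack (it assumes pyspark is installed, a YARN client configuration is available, and the spark-avro package is on the classpath; the paths and app name are made up for the example):

```python
# Minimal sketch: read Avro data from HDFS and run a Spark job on YARN.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("clean-zone-job")
         .master("yarn")  # let YARN manage the executors instead of running locally
         .getOrCreate())

df = spark.read.format("avro").load("hdfs:///data/clean/events")  # needs the spark-avro package
df.groupBy("event_type").count().show()
spark.stop()
```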

Book notes – Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems

23. October 2017 karden BOOKS, DATA PROCESSING

You can buy this book from amazon.de.

Why it is a bad idea to stream data back from HDFS into Kafka

2. October 2017 karden DATA PROCESSING

I think the idea of streaming data back from HDFS into some streaming component, like Kafka, comes from the concept of an Enterprise Service Bus (ESB). But after some thought, I have come to the conclusion that this concept is not useful in the Big Data world.

Schema evolution and backward and forward compatibility for data in data lakes

10. September 2017 karden DATA PROCESSING

We have discussed before the format for clean and derived data in data lakes. One of the popular formats for this purpose is the Avro format. Here we will talk about why it is needed and how to achieve backward and forward compatibility by designing Avro schemas.
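
As a small illustration (it assumes the fastavro package; the Event schema and the added field are made up for the example), data written with an old schema can still be read with an evolved schema as long as the new field carries a default:

```python
# Backward/forward compatible schema evolution: new field with a default.
import io
import fastavro

writer_schema = fastavro.parse_schema({
    "type": "record", "name": "Event",
    "fields": [{"name": "id", "type": "long"}],
})

reader_schema = fastavro.parse_schema({
    "type": "record", "name": "Event",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "source", "type": "string", "default": "unknown"},
    ],
})

buf = io.BytesIO()
fastavro.writer(buf, writer_schema, [{"id": 1}])  # written with the old schema
buf.seek(0)

# Read with the new schema: the missing field is filled from its default.
for record in fastavro.reader(buf, reader_schema=reader_schema):
    print(record)  # {'id': 1, 'source': 'unknown'}
```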

HBase is next step in your big data technology stack

10. September 2017 karden DATA PROCESSING


Raw, clean, and derived data in data lakes based on HDFS

3. June 2017 karden DATA PROCESSING

You may think that there is no need to structure data in HDFS and that you can systemize it later. But I think this is the wrong way. We should always keep in mind: there is no free lunch. Therefore it is better to make decisions at the beginning.
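
One possible convention, sketched in Python (the directory names are hypothetical, shown only to make a raw/clean/derived split concrete):

```python
# Illustrative path convention for the three zones of a data lake on HDFS.
from datetime import date

def zone_path(zone: str, dataset: str, day: date) -> str:
    """Build a partitioned HDFS path for the raw, clean, or derived zone."""
    assert zone in {"raw", "clean", "derived"}
    return f"/data/{zone}/{dataset}/year={day.year}/month={day.month:02d}/day={day.day:02d}"

print(zone_path("raw", "clickstream", date(2017, 6, 3)))
# /data/raw/clickstream/year=2017/month=06/day=03
```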


