design – Page 2

Book notes – Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems

23. October 2017 karden BOOKS, DATA PROCESSING

You can buy this book from amazon.de.

Why it is a bad idea to stream data back from HDFS into Kafka

2. October 2017 karden DATA PROCESSING

I think the idea of streaming data back from HDFS into a streaming component such as Kafka comes from the concept of the Enterprise Service Bus (ESB). But after some thought, I have come to the conclusion that this concept is not useful in the Big Data world.

Microservices vs service oriented architecture (SOA) and how containers change the rules of the game

19. September 2017 karden DATA PROCESSING

The microservices approach has recently gained popularity. Some time ago, the service-oriented architecture (SOA) approach was very popular. But what is the difference?

Schema evolution and backward and forward compatibility for data in data lakes

10. September 2017 karden DATA PROCESSING

We have previously discussed formats for clean and derived data in data lakes. One popular format for this purpose is Avro. Here we discuss why schema evolution is needed and how to achieve backward and forward compatibility when designing Avro schemas.
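To make the two kinds of compatibility concrete, here is a minimal sketch in plain Python of a simplified version of Avro's record-field resolution rules: a field known to the reader but missing from the writer's schema takes the reader's default, a field known only to the writer is ignored, and a reader-only field without a default is an error. The `User` schemas and the `resolve` function are hypothetical and written for illustration; this is not the API of the Avro library, and real Avro resolution also handles type promotion, aliases, and nested types.

```python
from typing import Any

# Hypothetical schemas: version 2 adds an optional "email" field with a default.
SCHEMA_V1 = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "name", "type": "string"},
    ],
}

SCHEMA_V2 = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "name", "type": "string"},
        # Avro requires a union default to match the first branch, here null.
        {"name": "email", "type": ["null", "string"], "default": None},
    ],
}


def resolve(record: dict, writer: dict, reader: dict) -> dict:
    """Project a record written with `writer` onto the `reader` schema,
    following a simplified subset of Avro's record resolution rules."""
    writer_names = {f["name"] for f in writer["fields"]}
    result: dict[str, Any] = {}
    for field in reader["fields"]:
        name = field["name"]
        if name in writer_names:
            result[name] = record[name]  # field present in both schemas
        elif "default" in field:
            result[name] = field["default"]  # reader-only field: use its default
        else:
            raise ValueError(
                f"reader field {name!r} is missing from the writer schema and has no default"
            )
    # Fields that exist only in the writer schema are never copied, i.e. ignored.
    return result
```

Reading an old record with `SCHEMA_V2` fills `email` with `None` (backward compatibility: new readers, old data), and reading a new record with `SCHEMA_V1` drops `email` (forward compatibility: old readers, new data). Adding the field *with* a default is what keeps both directions working.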

HBase is next step in your big data technology stack

10. September 2017 karden DATA PROCESSING

Raw, clean, and derived data in data lakes based on HDFS

3. June 2017 karden DATA PROCESSING

You may think that there is no need to structure data in HDFS, because you can systematize it later. But I think this is the wrong approach. We should always keep in mind that there is no free lunch. Therefore, it is better to make these decisions at the beginning.

Thoughts about schema-on-write and schema-on-read

2. June 2017 karden DATA PROCESSING

There are two approaches we can choose between when designing data storage: schema-on-read and schema-on-write.
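The difference can be sketched with two toy in-memory stores in Python. The schema, the class names, and the choice of JSON lines as the raw format are hypothetical and only for illustration: the schema-on-write store validates each record before accepting it, while the schema-on-read store accepts any raw line and applies the schema only when the data is read, so each reader decides how to treat malformed input (this one simply skips it).

```python
import json

# Hypothetical schema: field name -> expected Python type.
SCHEMA = {"id": int, "name": str}


def conforms(record: object) -> bool:
    """True if the record has exactly the schema's fields with the expected types."""
    return (
        isinstance(record, dict)
        and set(record) == set(SCHEMA)
        and all(isinstance(record[k], t) for k, t in SCHEMA.items())
    )


class SchemaOnWriteStore:
    """Validates every record at write time; invalid data never lands in storage."""

    def __init__(self) -> None:
        self.rows: list[dict] = []

    def write(self, record: dict) -> None:
        if not conforms(record):
            raise ValueError(f"record does not match schema: {record}")
        self.rows.append(record)

    def read(self) -> list[dict]:
        return list(self.rows)  # data is already structured


class SchemaOnReadStore:
    """Stores raw lines as-is; the schema is applied only when reading."""

    def __init__(self) -> None:
        self.raw_lines: list[str] = []

    def write(self, line: str) -> None:
        self.raw_lines.append(line)  # no validation at write time

    def read(self) -> list[dict]:
        rows = []
        for line in self.raw_lines:
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                continue  # this reader skips lines that are not valid JSON
            if conforms(record):
                rows.append(record)
        return rows
```

The trade-off is visible in where the cost lands: schema-on-write pays at ingestion and rejects bad data early, while schema-on-read ingests everything cheaply and moves the parsing and cleaning work to every reader.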

Short note about HDFS or why you need distributed file system

21. May 2017 karden DATA PROCESSING

Why do you need HDFS (Hadoop Distributed File System)? If the amount of data is small and the disk space on your computer is sufficient, you do not need a distributed file system. But if you want to process a large amount of data that cannot be stored on one computer, you need to think about a distributed file system.
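The core idea of a distributed file system like HDFS is to cut a large file into fixed-size blocks and store several replicas of each block on different machines. The Python sketch below computes how a file would be split and assigns replicas round-robin to a hypothetical list of node names. The 128 MiB block size and replication factor of 3 match the HDFS defaults in Hadoop 2.x and later; the round-robin placement is a deliberate simplification, since the real NameNode uses a rack-aware placement policy.

```python
# Defaults in Hadoop 2.x+ (dfs.blocksize and dfs.replication).
BLOCK_SIZE = 128 * 1024 * 1024  # 128 MiB
REPLICATION = 3


def place_blocks(
    file_size: int,
    nodes: list[str],
    block_size: int = BLOCK_SIZE,
    replication: int = REPLICATION,
) -> dict[int, list[str]]:
    """Split a file of `file_size` bytes into blocks and assign each block's
    replicas to distinct nodes using simple round-robin (not HDFS's real policy)."""
    if replication > len(nodes):
        raise ValueError("not enough nodes for the requested replication factor")
    num_blocks = -(-file_size // block_size)  # integer ceiling division
    placement: dict[int, list[str]] = {}
    for block in range(num_blocks):
        # Consecutive nodes starting at the block index are distinct
        # because replication <= len(nodes).
        placement[block] = [nodes[(block + r) % len(nodes)] for r in range(replication)]
    return placement
```

For example, a 1 TiB file is split into 2^40 / 2^27 = 8192 blocks of 128 MiB, and with three replicas it occupies about 3 TiB of raw disk space spread across the cluster, which no single ordinary machine could hold.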
