I think the idea of streaming data back from HDFS into a streaming component like Kafka comes from the concept of an Enterprise Service Bus (ESB). But after some thought, I have come to the conclusion that this concept is not useful in the Big Data world.
The microservices approach has recently gained popularity. Some time ago the service-oriented architecture (SOA) approach was very popular. But what is the difference?
We have discussed the format for clean and derived data in data lakes before. One of the popular formats for this purpose is Avro. Here we will talk about why it is needed and how to achieve backward and forward compatibility by designing Avro schemas carefully.
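The compatibility rules can be illustrated with a minimal sketch: a newer reader schema adds a field with a default, so records written with the old schema still resolve (backward compatibility), while an old reader simply ignores the new field (forward compatibility). The schemas and the `resolve` helper below are illustrative toys, not the Avro library API.

```python
# Two versions of an Avro-style record schema, written as Python dicts.
# v2 adds an optional "email" field with a default, which is what makes
# a v2 reader backward compatible with data written under v1.
SCHEMA_V1 = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "name", "type": "string"},
    ],
}

SCHEMA_V2 = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "name", "type": "string"},
        # New field: the default lets a v2 reader fill it in
        # when decoding records written with SCHEMA_V1.
        {"name": "email", "type": ["null", "string"], "default": None},
    ],
}

def resolve(record, reader_schema):
    """Toy version of Avro schema resolution: for every field in the
    reader schema, take the writer's value if present, else the default."""
    out = {}
    for field in reader_schema["fields"]:
        if field["name"] in record:
            out[field["name"]] = record[field["name"]]
        elif "default" in field:
            out[field["name"]] = field["default"]
        else:
            raise ValueError("no value and no default for %r" % field["name"])
    return out

old_record = {"id": 1, "name": "alice"}  # written with SCHEMA_V1
print(resolve(old_record, SCHEMA_V2))
# → {'id': 1, 'name': 'alice', 'email': None}
```

Reading a v2 record with `SCHEMA_V1` also works: the extra `email` field is simply ignored, which is the forward-compatibility direction.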
If you would like to turn on basic authentication for Mondrian cubes accessed from Excel, you need to implement the steps below.
In this post I will explain how to implement a Kylin dialect in Mondrian.
You may think that there is no need to structure data in HDFS and that you can systemize it later. But I think this is the wrong way. We should always keep in mind that there is no free lunch. Therefore it is better to make these decisions at the beginning.
There are two approaches we can choose between when designing the storage of the data: schema-on-read and schema-on-write.
Apache Kylin is a very powerful OLAP engine. It provides an ODBC driver to move data into Excel; however, this driver is not user friendly, because users have to write SQL queries themselves.