DekarLab

Blog about big data processing and data-driven investments

Category: DATA PROCESSING

Here we discuss solutions for data processing and distributed computing, such as HDFS, Spark, and Kubernetes.

Microservices vs service oriented architecture (SOA) and how containers change the rules of the game

19. September 2017 karden DATA PROCESSING

The microservices approach has recently been gaining popularity. Some time ago, the service-oriented architecture (SOA) approach was very popular. But what is the difference?

Schema evolution and backward and forward compatibility for data in data lakes

10. September 2017 karden DATA PROCESSING

We have previously discussed the format for clean and derived data in data lakes. One popular format for this purpose is Avro. Here we discuss why it is needed and how to achieve backward and forward compatibility when designing Avro schemas.

HBase is the next step in your big data technology stack

10. September 2017 karden DATA PROCESSING

Authentication and authorization for XMLA Connect and Mondrian

4. August 2017 karden DATA PROCESSING

If you would like to turn on basic authentication for Mondrian cubes accessed from Excel, you need to follow the steps below.

How to implement Kylin dialect for Mondrian

18. June 2017 karden DATA PROCESSING

In this post I will explain how to implement a Kylin dialect in Mondrian.

Improving performance by reading data with Hive for HDFS using subfolders (partitioning)

6. June 2017 karden DATA PROCESSING

In our previous article we discussed the root structure for HDFS. In this article we discuss the next level of the file structure, which helps to improve the speed of reading data.

Raw, clean, and derived data in data lakes based on HDFS

3. June 2017 karden DATA PROCESSING

You may think that there is no need to structure data in HDFS and that you can systematize it later. I think this is the wrong approach. We should always keep in mind that there is no free lunch. Therefore, it is better to make these decisions at the beginning.

Thoughts about schema-on-write and schema-on-read

2. June 2017 karden DATA PROCESSING

There are two approaches we can choose from when designing data storage: schema-on-read and schema-on-write.

How to integrate Apache Kylin OLAP In Excel (pivot) [XMLA Connect and Mondrian]

21. May 2017 karden DATA PROCESSING

Apache Kylin is a very powerful OLAP engine. It provides an ODBC driver for moving data into Excel; however, this driver is not user friendly, because users have to write SQL queries to use it.

Short note about HDFS or why you need a distributed file system

21. May 2017 karden DATA PROCESSING

Why do you need HDFS (Hadoop Distributed File System)? If the amount of data is small and fits on your computer, you do not need a distributed file system. But if you want to process an amount of data too large to store on one computer, you need to think about a distributed file system.
