DekarLab

Blog about big data processing and data-driven investments


Category: DATA PROCESSING

Here we discuss solutions for data processing and distributed computing, such as HDFS, Spark, and Kubernetes.

Authentication in Hadoop cluster: MIT Kerberos and Active Directory

23. May 2020 karden DATA PROCESSING

There are several ways to enable Kerberos in a Hadoop cluster.


Kerberos: overview

22. May 2020 karden DATA PROCESSING

The Kerberos authentication protocol is required to secure a Hadoop cluster: it is the only way to make the cluster secure.


Call Apache Spark from your microservice: idea about implementation

2. May 2020 karden DATA PROCESSING

While implementing your components as microservices, you may want to use Apache Spark for data retrieval. This post describes some ideas for how to do that.


Apache Zeppelin: behind spark interpreter

1. May 2020 karden DATA PROCESSING

Here is an overview of what is hidden behind the Spark interpreter in Apache Zeppelin.


Follow the source: usage of CQRS pattern in data lake

16. April 2020 karden DATA PROCESSING

Microservices architecture has a pattern called Command and Query Responsibility Segregation (CQRS). This pattern helps to design a multi-purpose data lake.


Book notes – Release It! Second Edition

13. February 2020 karden BOOKS, DATA PROCESSING

You can buy this book on amazon.com.


Authentication and authorization in Hadoop cluster

6. February 2020 karden DATA PROCESSING

Here we explain the concepts behind enabling security in a Hadoop cluster.


Meta data service and schema registry in data lake

13. June 2019 karden DATA PROCESSING

Maintaining data descriptions is a useful feature. Here are some ideas on how to implement it.


ORM (object-relational mapping) analog for data in data lake

13. June 2019 karden DATA PROCESSING

We start by saving data in HDFS in Avro format. In the previous post we discussed forward and backward compatibility of Avro schemas. How can this concept be used?


Running Apache Zeppelin in K8s cluster and integration with YARN cluster

23. December 2018 karden DATA PROCESSING

A useful way to implement a CI/CD pipeline is to package code as Docker images and run them in a K8s cluster. One very practical application for data analytics is the notebook-based tool Apache Zeppelin. Every business department requires its own Zeppelin configuration. Hence the idea of creating a Docker container for each department and running them all in a K8s cluster.


