DATA PROCESSING – DekarLab

Authentication in Hadoop cluster: MIT Kerberos and Active Directory

23. May 2020 karden DATA PROCESSING

There are several ways to enable Kerberos in a Hadoop cluster.


Kerberos: overview

22. May 2020 karden DATA PROCESSING

The Kerberos authentication protocol is required to secure a Hadoop cluster: it is the only strong authentication mode that Hadoop supports out of the box.


Call Apache Spark from your microservice: idea about implementation

2. May 2020 karden DATA PROCESSING

When implementing your components as microservices, you may want to use Apache Spark for data retrieval. This post describes some ideas for how to do that.


Apache Zeppelin: behind spark interpreter

1. May 2020 karden DATA PROCESSING

Here is an overview of what is hidden behind the Spark interpreter in Apache Zeppelin.


Follow the source: usage of CQRS pattern in data lake

16. April 2020 karden DATA PROCESSING

Microservice architectures use a pattern called Command and Query Responsibility Segregation (CQRS). This pattern helps in designing a multi-purpose data lake.


Book notes – Release It! Second Edition

13. February 2020 karden BOOKS, DATA PROCESSING

You can buy this book on amazon.com.


Authentication and authorization in Hadoop cluster

6. February 2020 karden DATA PROCESSING

Here we explain the concepts behind enabling security in a Hadoop cluster.


Meta data service and schema registry in data lake

13. June 2019 karden DATA PROCESSING

Maintaining a description of the data is a useful feature. Here are some ideas for how to implement it.


ORM (object-relational mapping) analog for data in data lake

13. June 2019 karden DATA PROCESSING

We start saving data in HDFS using the Avro format. A previous post discussed forward and backward compatibility of Avro schemas. How can this concept be used?


Running Apache Zeppelin in K8s cluster and integration with YARN cluster

23. December 2018 karden DATA PROCESSING

A useful way to implement a CI/CD pipeline is to package code as a Docker image and run it in a K8s cluster. One very practical application for data analytics is the notebook-based tool Apache Zeppelin. Every business department requires its own Zeppelin configuration. Hence the idea of creating a Docker container for each department and running it in the K8s cluster.
