There are different options for activating Kerberos in a Hadoop cluster.
Category: DATA PROCESSING
Here we discuss solutions for data processing and distributed computing, such as HDFS, Spark, and Kubernetes.
Kerberos: overview
The Kerberos authentication protocol is required to secure a Hadoop cluster: it is the only strong authentication mechanism Hadoop supports, and therefore the only way to make a Hadoop cluster secure.
Call Apache Spark from your microservice: idea about implementation
When implementing your components as microservices, you may decide to use Apache Spark for data retrieval. In this post I describe some ideas on how to do this.
Apache Zeppelin: behind spark interpreter
Here is an overview of what is hidden behind the Spark interpreter in Apache Zeppelin.
Follow the source: usage of CQRS pattern in data lake
Command and Query Responsibility Segregation (CQRS) is a pattern from microservices architecture. It helps to design a multi-purpose data lake.
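To illustrate why CQRS suits a multi-purpose data lake, here is a minimal sketch, assuming the command side is an append-only event log and each query side is a read model rebuilt independently from that log. All names (`Event`, `CommandSide`, `CurrentStateView`, `ChangeCountView`, `project`) are hypothetical and everything is in memory; in a real data lake the log would be persisted (for example in HDFS) and the views computed by a processing engine.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Event:
    """Hypothetical event: the command side only ever appends immutable facts."""
    entity_id: str
    kind: str  # "created", "updated" or "deleted"
    payload: dict


@dataclass
class CommandSide:
    """Write model: validates commands and appends events to the log."""
    log: List[Event] = field(default_factory=list)

    def handle(self, entity_id: str, kind: str, payload: dict) -> Event:
        if kind not in {"created", "updated", "deleted"}:
            raise ValueError(f"unknown command kind: {kind}")
        event = Event(entity_id, kind, dict(payload))
        self.log.append(event)
        return event


class CurrentStateView:
    """Read model 1: latest state of each entity."""

    def __init__(self) -> None:
        self.state: Dict[str, dict] = {}

    def apply(self, event: Event) -> None:
        if event.kind == "deleted":
            self.state.pop(event.entity_id, None)
        else:
            self.state.setdefault(event.entity_id, {}).update(event.payload)


class ChangeCountView:
    """Read model 2, built from the same log: number of changes per entity."""

    def __init__(self) -> None:
        self.counts: Dict[str, int] = {}

    def apply(self, event: Event) -> None:
        self.counts[event.entity_id] = self.counts.get(event.entity_id, 0) + 1


def project(log: List[Event], *views):
    """Replay the whole log into every read model."""
    for event in log:
        for view in views:
            view.apply(event)
    return views


# Usage: one write path, two independent query views.
cmd = CommandSide()
cmd.handle("c1", "created", {"name": "Alice", "city": "Berlin"})
cmd.handle("c1", "updated", {"city": "Munich"})
cmd.handle("c2", "created", {"name": "Bob"})
cmd.handle("c2", "deleted", {})
current, counts = project(cmd.log, CurrentStateView(), ChangeCountView())
print(current.state)  # {'c1': {'name': 'Alice', 'city': 'Munich'}}
print(counts.counts)  # {'c1': 2, 'c2': 2}
```

Because the log keeps every fact, a new consumer with different needs can be served by adding another view and replaying the log, without touching the write path.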
Book notes – Release It! Second Edition
Authentication and authorization in Hadoop cluster
Here we explain the concepts behind enabling security in a Hadoop cluster.
Meta data service and schema registry in data lake
Maintaining descriptions of the stored data is a useful feature. Here are some ideas on how to implement it.
ORM (object-relational mapping) analog for data in data lake
We have started saving data in HDFS in Avro format. In a previous post we discussed forward and backward compatibility of Avro schemas. How can this concept be put to use?
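As a reminder of what the two compatibility directions mean in practice, here is a simplified sketch of Avro's schema-resolution rules for flat records, written in plain Python rather than with an Avro library. The helper names (`can_read`, `resolve`, `backward_compatible`, `forward_compatible`) and the dict-based schema shape are hypothetical; the sketch only covers the core rules (a reader field missing from the writer must have a default, writer fields unknown to the reader are ignored) and deliberately skips record-name matching, aliases, unions, nested types and type promotion.

```python
from typing import Any, Dict

Schema = Dict[str, Any]  # {"name": ..., "fields": [{"name", "type", optional "default"}]}


def can_read(reader: Schema, writer: Schema) -> bool:
    """True if data written with `writer` can be read with `reader` (simplified)."""
    writer_fields = {f["name"]: f for f in writer["fields"]}
    for fld in reader["fields"]:
        w = writer_fields.get(fld["name"])
        if w is None:
            if "default" not in fld:
                return False  # reader expects a field the writer never wrote
        elif w["type"] != fld["type"]:
            return False  # type promotion is not modelled in this sketch
    return True


def resolve(record: dict, reader: Schema, writer: Schema) -> dict:
    """Project a record written with `writer` onto the `reader` schema."""
    if not can_read(reader, writer):
        raise ValueError("schemas are incompatible")
    written = {f["name"] for f in writer["fields"]}
    # Fields missing from the writer get the reader's default;
    # fields the reader does not know are dropped.
    return {
        f["name"]: record[f["name"]] if f["name"] in written else f["default"]
        for f in reader["fields"]
    }


def backward_compatible(new: Schema, old: Schema) -> bool:
    return can_read(new, old)  # new readers can read old data


def forward_compatible(new: Schema, old: Schema) -> bool:
    return can_read(old, new)  # old readers can read new data


# Usage: v2 adds a field with a default, so it is compatible in both directions.
v1 = {"name": "Customer", "fields": [{"name": "id", "type": "long"},
                                     {"name": "name", "type": "string"}]}
v2 = {"name": "Customer", "fields": v1["fields"] + [
    {"name": "city", "type": "string", "default": "unknown"}]}
print(resolve({"id": 1, "name": "Alice"}, reader=v2, writer=v1))
# {'id': 1, 'name': 'Alice', 'city': 'unknown'}
print(resolve({"id": 2, "name": "Bob", "city": "Berlin"}, reader=v1, writer=v2))
# {'id': 2, 'name': 'Bob'}
```

Adding a field without a default would break backward compatibility, because a reader using the new schema would have no value to fill in for old records.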
Running Apache Zeppelin in K8s cluster and integration with YARN cluster
A useful way to implement a CI/CD pipeline is to package code as a Docker image and run it in a K8s cluster. One very practical application for data analytics is the notebook-based tool Apache Zeppelin. Every business department requires its own Zeppelin configuration, so the idea is to create a Docker container for each department and run these containers in the K8s cluster.