Running Apache Zeppelin in a k8s cluster and integrating with a YARN cluster

A useful way to implement a CI/CD pipeline is to package code as a Docker image and run it in a k8s cluster. One very practical application for data analytics is the notebook-based tool Apache Zeppelin. Every business department requires its own Zeppelin configuration, hence the idea to build a Docker image per department and run them all in a k8s cluster.
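A minimal sketch of such a per-department image, assuming the official apache/zeppelin base image and a hypothetical department-specific config directory (the department name and file paths are illustrative, not from the original post):

```dockerfile
# Hypothetical per-department Zeppelin image.
# Base image and conf path follow the official apache/zeppelin image layout.
FROM apache/zeppelin:0.10.1

# Bake in the department's own configuration (names are examples)
COPY marketing/zeppelin-site.xml  /opt/zeppelin/conf/zeppelin-site.xml
COPY marketing/interpreter.json   /opt/zeppelin/conf/interpreter.json
```

Each department's image can then be deployed as its own k8s Deployment and Service, so configurations stay isolated while the build and rollout pipeline is shared.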

Read more

Hybrid cloud architecture for data lake applications

Big data technologies are very mature nowadays. Typically you use HDFS, or another distributed file system such as S3, for storing data, Spark as the processing engine, and YARN as the resource manager. The next steps you would probably like to take are implementing CI/CD (continuous integration and delivery) and moving workloads to the cloud on demand.
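A hybrid setup of this kind often comes down to a few Spark configuration properties: YARN as the resource manager and S3 (via the s3a connector) as the storage layer. A sketch of a `spark-defaults.conf` fragment, with placeholder credentials and an endpoint that would depend on your cloud provider:

```
# Run on the YARN cluster, read/write data in S3 via the s3a connector
spark.master                        yarn
spark.hadoop.fs.s3a.endpoint        s3.example-cloud.com
spark.hadoop.fs.s3a.access.key      YOUR_ACCESS_KEY
spark.hadoop.fs.s3a.secret.key      YOUR_SECRET_KEY
```

With storage decoupled from compute like this, the same job can be pointed at an on-premises YARN cluster or a cloud-hosted one without changing the application code.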
Read more


Remote submission of Spark jobs

Remote submission is a powerful feature of Apache Spark. Why is it needed? For example, you can experiment with different versions of Spark independently of what is installed in the cluster, or, if you have no direct access to the cluster machines, you can still start your Spark jobs remotely.
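In outline, remote submission means running `spark-submit` on a client machine that only has the cluster's Hadoop configuration files, not cluster access itself. A sketch, assuming a YARN cluster and an illustrative application jar (paths and class names are placeholders):

```
# Point the local client at the cluster's config files (copied from the cluster)
export HADOOP_CONF_DIR=/path/to/cluster-conf

# Submit from the client machine; the driver runs inside the cluster
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.MyApp \
  my-app.jar
```

Because the Spark distribution used by `spark-submit` lives on the client, you can keep several Spark versions side by side locally and pick one per job, independent of the cluster installation.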
Read more
