DekarLab – Page 2 – Blog about big data processing and data-driven investments

Running Apache Zeppelin in K8s cluster and integration with YARN cluster

23. December 2018 karden DATA PROCESSING

A useful way to implement a CI/CD pipeline is to package code as Docker images and run them in a K8s cluster. One very practical application for data analytics is the notebook-based tool Apache Zeppelin. Every business department requires its own Zeppelin configuration. Hence the idea of creating a Docker container for every department and running them in a K8s cluster.

Book notes – Microservice patterns: with examples in java

22. December 2018 karden BOOKS, DATA PROCESSING

You can buy this book from amazon.de.

Structural minimization of tracking error

15. December 2018 karden INDEX TRACKING (ETF), INVESTMENTS

The structural tracking error minimization (STEM) approach produces stable tracking portfolios out-of-sample in the crucial investment period. Full version:
Quantitative Finance

Two infrastructure layers for distributed systems

8. November 2018 karden DATA PROCESSING

It looks like the separation between the two infrastructure layers is increasing.

Why contracts are important in data intensive applications with microservices

12. October 2018 karden DATA PROCESSING

The main purpose of using a microservices architecture is to increase development velocity and reduce system complexity.

Protected: Detecting fraud in financial reporting with ML

10. October 2018 karden INVESTMENTS

Book notes – Kubernetes: Up and Running: Dive into the Future of Infrastructure

16. September 2018 karden BOOKS, DATA PROCESSING

You can buy this book from amazon.de.

Hybrid cloud architecture for data lake applications

22. May 2018 karden DATA PROCESSING

Big data technologies are very mature nowadays. Typically you use HDFS, or another distributed storage system such as S3, for storing data, Spark as the processing engine, and YARN as the resource manager. The next steps, which you would probably like to achieve, are to implement CI/CD (continuous integration and delivery) and to move workloads to the cloud on demand.

Remote submit of spark jobs

1. May 2018 karden DATA PROCESSING

Remote submit is a powerful feature of Apache Spark. Why is it needed? For example, you can experiment with different versions of Spark, independent of what is installed on the cluster. Or, if you have no direct access to the cluster, you can start your Spark jobs remotely.

Tuning spark parameters

27. December 2017 karden DATA PROCESSING

Tuning Spark parameters is not a trivial task. In this short post I will explain how to tune some of the most important ones.