Maintaining a data description is a useful feature. Here are some ideas on how to implement it.
We have started saving data in HDFS using the Avro format. In a previous post we discussed forward and backward compatibility of Avro schemas. How can this concept be used in practice?
A useful way to implement a CI/CD pipeline is to package code as a Docker image and run it in a k8s cluster. One very practical application for data analytics is the notebook-based tool Apache Zeppelin. Every business department requires its own Zeppelin configuration. Hence the idea of creating a Docker container for every department and running them in a k8s cluster.
The structural tracking error minimization (STEM) approach produces stable tracking portfolios out-of-sample in the crucial investment period. Full version:
It looks like the separation between the two infrastructure layers is increasing.
The main purpose of using a microservices architecture is to increase development velocity and reduce system complexity.
Big data technologies are now very mature. Typically you use HDFS, or another distributed file system such as S3, for storing data, Spark as the processing engine, and YARN as the resource manager. The next steps you will probably want to take are to implement CI/CD (continuous integration and delivery) and to move workloads to the cloud on demand.