Book notes – Kubernetes: Up and Running: Dive into the Future of Infrastructure

This book describes not only how Kubernetes works, but also the design principles and ideas behind it. The notes below are ideas from the book combined with my own understanding, so it is recommended to also read this nice book itself.

Containers should be immutable. That means you should change them only during the build process, not at runtime. Runtime changes are acceptable only for quickly recovering a running system, and even then they should afterwards be integrated back into the build scripts.

State handling is very important in Kubernetes. You declare the desired state of your system, send this state to Kubernetes, and Kubernetes applies it to the running system. For example, if you need 2 parallel containers instead of 1, you tell Kubernetes and it starts the additional container for you. It is best to keep the state in git, so that at any time you can see what state the system is in now and what state it was in before.
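As a sketch, such a desired state could be declared in a manifest like the following (the name and image are placeholders):

```yaml
# Desired state: 2 replicas of this pod; Kubernetes reconciles towards it.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                    # placeholder name
spec:
  replicas: 2                     # change 1 -> 2 here and re-apply
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: example/my-app:1.0 # placeholder image
```

Commit this file to git, then `kubectl apply -f my-app.yaml`; the git history then documents every state the system was in.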

Self-healing is a very attractive property. Kubernetes can perform automatic health checks on components, but clearly the components must be developed with this idea in mind. In the world of data applications this means components should be restartable without the need to clean up the data storage first.

Define microservices independently from each other. They should share data only through APIs. This helps to reduce dependencies and coordination between teams. In the world of data applications your microservice can have its own domain model and should not share it with other applications, as is done in data warehousing.

Teams in Kubernetes world: Application, Cluster, Kernel, Hardware.

Namespaces are very powerful in Kubernetes. For example, DEV/TEST/PROD environments can share the same hardware while being separated logically using dedicated namespaces. For data-intensive analytical applications this is a very nice feature, since copying data between environments and keeping it in sync is a very time-consuming task. Simply reusing the data in a logically separated way helps to reduce development time.

Layers in Docker images should be ordered from least frequently changed to most frequently changed. Be careful: files deleted in upper layers still occupy space in the image.
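A minimal Dockerfile sketch of this ordering (base image and packages are just examples):

```dockerfile
# Layers ordered from least to most frequently changed:
FROM alpine:3.19                # base image: changes rarely
RUN apk add --no-cache python3  # dependencies: change occasionally
COPY . /app                     # application code: changes often
# Note: deleting a file in a later layer does NOT reclaim space --
# the file still exists in the earlier layer that added it.
```

With this ordering, a code change only invalidates the last layer, so rebuilds and pushes stay fast.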

Secrets and images should never be mixed.

Use a registry for storing container images.

kubectl version
kubectl get componentstatuses
kubectl get nodes
kubectl describe nodes

Request vs limit for resources: the request is the minimum the container needs (it may get more), the limit is the maximum allowed. CPU over the limit is easy to reduce by throttling; for memory over the limit, the container will be killed.
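A sketch of how this looks inside a container spec (the numbers are arbitrary examples):

```yaml
# Container resource requests (guaranteed minimum) and limits (maximum).
resources:
  requests:
    cpu: "250m"       # scheduler reserves at least a quarter CPU
    memory: "128Mi"
  limits:
    cpu: "500m"       # CPU usage above this is throttled
    memory: "256Mi"   # exceeding this gets the container killed (OOM)
```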

From the network to services:
kubectl get daemonsets --namespace=kube-system kube-proxy

Names for services (kube-dns):
kubectl get deployments --namespace=kube-system kube-dns
kubectl get services --namespace=kube-system kube-dns

kubectl get deployments --namespace=kube-system kubernetes-dashboard
kubectl get services --namespace=kube-system kubernetes-dashboard

Cluster-IP vs Network IP

kubectl config set-context my-context --namespace=abc
kubectl config use-context my-context

As said before, Kubernetes checks the state of the system. The system consists of objects, therefore to see the current state or to change it the following commands can be used:
kubectl get [resource-name] [object-name] -o [json or yaml]
kubectl describe [resource-name] [object-name]
kubectl apply -f obj.yaml
kubectl edit [resource-name] [object-name]
kubectl delete -f obj.yaml

Labels/Annotations (add and remove)
kubectl label pods bar color=red
kubectl label pods bar color-

kubectl logs [pod-name] -c [container]

Execution of commands in container:
kubectl exec -it [pod-name] -- bash

kubectl cp [pod-name]:/path/to/remote /path/to/local

All containers in pod always run on the same machine.
“Will these containers work correctly if they land on different machines?” no => use same pod. yes => use multiple pods.
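A minimal sketch of a two-container pod (names and images are placeholders); containers in one pod share the network namespace, so they can talk over localhost:

```yaml
# Containers that must land on the same machine belong in one pod.
apiVersion: v1
kind: Pod
metadata:
  name: web-with-cache          # placeholder name
spec:
  containers:
  - name: web
    image: example/web:1.0      # placeholder; talks to the cache on localhost
  - name: cache
    image: redis:7              # shares the pod's network namespace
```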
A pod can be created imperatively:
kubectl run [pod-name] --image=[image-path]
kubectl delete deployments/[pod-name]
Better is to change the desired state and let Kubernetes do the rest for you:
kubectl apply -f [pod-name].yaml
kubectl get pods
kubectl describe pods [pod-name]
kubectl delete pods/[pod-name]
kubectl delete -f [pod-name].yaml

Access a pod from outside:
kubectl port-forward [pod-name] [local-port]:[pod-port]

Health Checks
liveness – checks app-specific logic; a failing container is restarted
readiness – checks whether the container is ready to serve user requests; a failing pod is removed from service load balancing
tcpSocket – checks whether a socket can be opened
exec – any custom check, run as a command inside the container
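A sketch of both probe types on a container (paths, port, and timings are placeholders):

```yaml
# Liveness and readiness probes on a container.
containers:
- name: app
  image: example/app:1.0    # placeholder image
  livenessProbe:            # failure -> container is restarted
    httpGet:
      path: /healthy
      port: 8080
    initialDelaySeconds: 5
    periodSeconds: 10
  readinessProbe:           # failure -> pod removed from service endpoints
    httpGet:
      path: /ready
      port: 8080
    periodSeconds: 5
  # Alternatives to httpGet: tcpSocket (port check) or exec (custom command).
```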

Resource Management
Requests vs Limits

Persisting Data with Volumes
When a container is deleted, its data is deleted too. Volumes are needed for persistent data.
Volumes are declared at the pod level (spec.volumes) and mounted into individual containers (volumeMounts).
The special volume type emptyDir is for caching and for a shared folder between the containers of a pod. It is deleted after the pod is deleted.
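A sketch of the pod-level declaration plus per-container mounts (names and images are placeholders):

```yaml
# emptyDir: scratch space shared by all containers in the pod;
# it disappears when the pod is deleted.
apiVersion: v1
kind: Pod
metadata:
  name: shared-scratch          # placeholder name
spec:
  volumes:                      # declared once at pod level...
  - name: scratch
    emptyDir: {}
  containers:
  - name: writer
    image: example/writer:1.0   # placeholder image
    volumeMounts:               # ...mounted per container
    - name: scratch
      mountPath: /data
  - name: reader
    image: example/reader:1.0   # placeholder image
    volumeMounts:
    - name: scratch
      mountPath: /data
      readOnly: true
```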

Labels => grouping objects in Kubernetes
Annotations => descriptions of objects for use by tools.
Labels can be used to separate different environments (dev/test/prod) and to mark different versions of an application.
kubectl get pods --show-labels
Pods can be selected by labels:
kubectl get pods --selector="[label]=[value]"
Different selectors are possible (equality-based and set-based).
Selectors can also be used in YAML.
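Both selector forms in YAML, e.g. inside a ReplicaSet or Deployment spec.selector (labels and values are placeholders):

```yaml
# Equality-based and set-based selectors.
selector:
  matchLabels:
    app: my-app               # app == my-app
  matchExpressions:
  - key: env
    operator: In              # env in (dev, test)
    values: ["dev", "test"]
```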

Annotations can be used for:
– tracking changes
– recording which tool updated an object
– build/release/image info
– tracking rollbacks for Deployment objects

Services and Service Discovery
Service object in Kubernetes is for service discovery.
kubectl run [deployment-name] --image=[image-path]
kubectl expose deployment [deployment-name]
kubectl port-forward [pod-name] [local-port]:[pod-port]

Cluster IP is a virtual IP.
svc means service.

kubectl get endpoints [service-name] --watch – see changes of the object over time.
NodePort – access cluster from outside.
LoadBalancer – cloud integration
Endpoints – for accessing services without cluster IP
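A sketch of a NodePort service (name, labels, and ports are placeholders):

```yaml
# Service reachable from outside the cluster via a port on every node.
apiVersion: v1
kind: Service
metadata:
  name: my-app                # placeholder name
spec:
  type: NodePort              # or LoadBalancer on a cloud provider
  selector:
    app: my-app               # picks pods by label
  ports:
  - port: 80                  # cluster IP port
    targetPort: 8080          # container port
    nodePort: 30080           # open on every node (range 30000-32767)
```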

The old way of service discovery used labels directly, which had to be kept in sync manually; the new way uses the Service object.

Request => cluster IP => iptables rules (kept up to date by kube-proxy) load-balance and redirect => endpoints => pods

ReplicaSets – a pod manager that keeps the needed number of pod replicas running:
– Redundancy
– Scale
– Sharding
ReplicaSets are the basis for self-healing.

Reconciliation loop => continuous comparison of desired state and current state. A ReplicaSet uses label selectors for this.
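A minimal ReplicaSet sketch (name and image are placeholders); the reconciliation loop counts pods matching the selector and creates or deletes pods until the count equals replicas:

```yaml
# ReplicaSet: keeps 3 pods matching the label selector running.
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: my-app-rs             # placeholder name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app             # pods are found by label, not by name
  template:                   # used to create missing pods
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: example/my-app:1.0   # placeholder image
```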

Action for misbehaving pods => put them in quarantine by changing their labels: the ReplicaSet selector no longer matches, so a replacement is started while the original pod stays around for debugging.
kubectl describe rs [replica set name]

HPA – horizontal pod autoscaling (creates more pods). A special pod, heapster, reports the current load.

Vertical scaling (increasing a pod's CPU/memory) is planned for implementation in Kubernetes.

Always use a ReplicaSet, even if only one pod is needed. This will help with scaling in the future.

A DaemonSet is used to run one instance of a specific pod on every node of the cluster. This is for agent-like workloads, for example log-collecting agents.

kubectl describe daemonset [ds-name]
It is also possible to label nodes, not only pods, and then restrict a DaemonSet to those nodes via a node selector.
DaemonSets support rolling updates.

Jobs are useful for batch tasks.
Patterns: one shot; parallel with a fixed number of completions (parallelism and completions); work queue: parallel jobs (producers and consumers).
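A sketch of the parallel fixed-completions pattern (name, image, and counts are placeholders):

```yaml
# Job: 10 successful completions, at most 3 pods running in parallel.
apiVersion: batch/v1
kind: Job
metadata:
  name: batch-task            # placeholder name
spec:
  completions: 10             # total successful runs required
  parallelism: 3              # concurrent pods
  template:
    spec:
      restartPolicy: OnFailure  # Jobs must not use the default Always
      containers:
      - name: worker
        image: example/worker:1.0   # placeholder image
```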

ConfigMaps and Secrets
ConfigMaps hold pod configuration for different environments; a ConfigMap and a Pod are combined at startup.
A ConfigMap can be consumed as mounted files, environment variables, or command-line arguments.
Secrets can be created in the cluster and are accessible through a secrets volume.
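A sketch of a ConfigMap consumed both as an environment variable and as a mounted file (names, keys, and image are placeholders):

```yaml
# ConfigMap plus a pod that consumes it in two ways.
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config            # placeholder name
data:
  log-level: debug
---
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  volumes:
  - name: config
    configMap:
      name: app-config
  containers:
  - name: app
    image: example/app:1.0    # placeholder image
    env:
    - name: LOG_LEVEL         # value taken from the ConfigMap
      valueFrom:
        configMapKeyRef:
          name: app-config
          key: log-level
    volumeMounts:
    - name: config
      mountPath: /etc/app     # keys appear as files, e.g. /etc/app/log-level
```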

Deployment manages ReplicaSets.
Strategies: Recreate and RollingUpdate
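A sketch of the RollingUpdate knobs inside a Deployment spec (values are examples):

```yaml
# RollingUpdate: replace pods gradually, avoiding downtime.
spec:
  strategy:
    type: RollingUpdate       # alternative: Recreate (kill all, then start new)
    rollingUpdate:
      maxUnavailable: 1       # at most 1 pod below the desired count
      maxSurge: 1             # at most 1 extra pod during the update
```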

kubectl rollout status deployment [name]
kubectl rollout pause deployment [name]
kubectl rollout resume deployment [name]
kubectl rollout history deployment [name]
kubectl rollout undo deployment [name]
Instead of rollout undo it is better to revert YAML files from version control system.

Dynamic Volume Provisioning: a PersistentVolumeClaim against a StorageClass creates the underlying volume automatically.

StatefulSets:
– each replica gets a persistent hostname name-i
– replicas are created in increasing order of i
– replicas are deleted in reverse order of i
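A StatefulSet sketch combining both ideas (name, image, and sizes are placeholders); each replica gets a stable hostname and, via volumeClaimTemplates, its own dynamically provisioned volume:

```yaml
# StatefulSet: stable hostnames db-0, db-1, db-2 and one volume per replica.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: db                    # placeholder; pods become db-0, db-1, db-2
spec:
  serviceName: db             # headless service providing the stable DNS names
  replicas: 3
  selector:
    matchLabels:
      app: db
  template:
    metadata:
      labels:
        app: db
    spec:
      containers:
      - name: db
        image: example/db:1.0 # placeholder image
        volumeMounts:
        - name: data
          mountPath: /var/lib/db
  volumeClaimTemplates:       # dynamic provisioning: one PVC per replica
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 1Gi
```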