Book notes – Release It! Second Edition

You can buy this book on amazon.com.

Release It! is about how to architect, design, and build software. These notes do not replace reading the book. I keep them so that I can come back to them later and use them as a reference in current projects.

Chapter 1 Living in Production

Part I — Create Stability

Chapter 2 Case Study: The Exception That Grounded an Airline

The cause of the problem: stmt.close() can throw an exception, and when it did, the connection was never closed.
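
A minimal sketch of the fix in Python rather than the book's Java: every resource gets its own guarded close, so one failing close() cannot skip the next one. FakeStatement, FakeConnection, and release are hypothetical names invented for this illustration.

```python
class FakeStatement:
    """Hypothetical stand-in for a statement whose close() fails."""

    def close(self):
        raise RuntimeError("statement close failed")


class FakeConnection:
    """Hypothetical stand-in for a pooled connection."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


def release(stmt, conn):
    """Close both resources; a failure in one close never skips the other."""
    errors = []
    for resource in (stmt, conn):
        try:
            resource.close()
        except Exception as exc:  # collect the error instead of aborting cleanup
            errors.append(exc)
    return errors
```

The caller can log the returned errors, but the connection goes back to the pool either way.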

Chapter 3 Stabilize Your System

Systems should also be tested over long runs without rebooting.

Think about how to stop crack propagation, for example by setting timeouts.

The more tightly coupled the architecture, the greater the chance that a coding error can propagate.

Fault => Errors => Failures

Chapter 4 Stability Anti-patterns

Tight coupling ruins stability.

There are two extremes for integration points: monolith and spiderweb.

A new architect focuses on components; an experienced one focuses on interconnections.

Oracle has a feature called dead connection detection.

Treat a response as data, and start parsing it only once you know you need it.

Stability patterns to make integration points safer: Circuit Breaker and Decoupling Middleware.

Chain reaction in horizontally scaled architectures: when one machine fails, its load is redistributed to the remaining machines, which makes them more likely to fail as well.

Keep as little in the user session as possible. For example, you can use weak references.
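
A rough sketch of the weak-reference idea using Python's weakref module. Cart, session_cache, and load_cart are hypothetical names. Note that in CPython an entry vanishes as soon as the last strong reference is gone, so the cache never keeps an object alive on its own.

```python
import gc
import weakref


class Cart:
    """Hypothetical large per-user object that can be rebuilt on demand."""

    def __init__(self, user_id):
        self.user_id = user_id
        self.items = []


# Values are held weakly: the cache alone does not keep a Cart in memory.
session_cache = weakref.WeakValueDictionary()


def load_cart(user_id):
    """Return the cached cart, or rebuild it if it has been collected."""
    cart = session_cache.get(user_id)
    if cart is None:
        cart = Cart(user_id)  # assumption: rebuilding is cheap enough
        session_cache[user_id] = cart
    return cart
```

Code that uses the cache must always be prepared for the "rebuild" branch, because the entry can be gone at any time.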

Move memory outside of the current process, for example by using Redis.

The speed of memory is an important factor: registers, cache, local memory, disk, remote memory, and so on.

Think about sockets: open and closed.

Pay attention to user behavioral patterns: valuable users, accidental users, unwanted users.

Pay attention to blocked threads and synchronized functions.

Avoid deadlocks using timeouts.
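
One possible way to apply this, sketched with Python's threading.Lock: every acquire has a timeout, so a thread gives up instead of waiting forever. The function name with_both_locks and the default timeout are assumptions made for this example.

```python
import threading


def with_both_locks(lock_a, lock_b, work, timeout=0.1):
    """Run work() while holding both locks; return False if either times out."""
    if not lock_a.acquire(timeout=timeout):
        return False
    try:
        if not lock_b.acquire(timeout=timeout):
            return False  # give up; lock_a is released in the outer finally
        try:
            work()
            return True
        finally:
            lock_b.release()
    finally:
        lock_a.release()
```

The caller decides what a False result means: retry later, report an error, or shed the request.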

Shared-nothing architecture is the ideal case of horizontal scaling. It is the most scalable architecture.

Prepare for expected load by using autoscaling.

Point-to-point communication can be dangerous. It can be avoided by grouping servers into farms with load balancers between them, or by using broadcast, publish/subscribe, or messaging patterns.

Shared common services can become a bottleneck.

To avoid a dogpile, spread the start of demand out randomly instead of starting everything at once.

If the system has an observer (for example, an automated service), the observer should distinguish between the true state of the system and the state it can currently see. This helps avoid crashes caused by automation acting on incomplete information.

Always specify the maximum number of rows to retrieve in a request.
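
A small sketch of random scheduling: each client gets its own random delay within a window instead of a shared start time. The function name and the window size are assumptions for illustration.

```python
import random


def jittered_start_delays(n_clients, window_seconds, rng=None):
    """Return one random start delay per client, spread over the window."""
    rng = rng or random.Random()
    return [rng.uniform(0, window_seconds) for _ in range(n_clients)]
```

With 100 clients and a 60-second window, the starts are spread across the minute rather than landing in the same instant.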
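
A minimal sketch using Python's built-in sqlite3 module. The orders table and the fetch_orders helper are hypothetical; the point is that the limit is part of every query rather than applied after all rows have been loaded.

```python
import sqlite3


def fetch_orders(conn, max_rows=100):
    """Fetch at most max_rows order ids; the database enforces the bound."""
    cur = conn.execute(
        "SELECT id FROM orders ORDER BY id LIMIT ?", (max_rows,)
    )
    return cur.fetchall()


def make_demo_db(row_count):
    """Build an in-memory table with row_count rows (demo data only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY)")
    conn.executemany(
        "INSERT INTO orders (id) VALUES (?)", [(i,) for i in range(row_count)]
    )
    return conn
```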

List of anti-patterns:

  • Integration Points
  • Chain Reactions
  • Cascading Failures
  • Users
  • Blocked Threads
  • Self-Denial Attacks
  • Scaling Effects
  • Unbalanced Capacities
  • Dogpile
  • Force Multiplier
  • Slow Responses
  • Unbounded Result Sets

Chapter 5 Stability Patterns

List of stability patterns:

  • Timeouts
  • Circuit Breaker
  • Bulkheads
  • Steady State
  • Fail Fast
  • Let It Crash
  • Handshaking
  • Test Harnesses
  • Decoupling Middleware
  • Shed Load
  • Create Back Pressure
  • Governor

Well-placed timeouts help with fault isolation.

A circuit breaker checks its state before each operation: when closed, it executes the operation; when open, it fails the call immediately without attempting it. A closed circuit breaker counts failed operations and switches to open once the count reaches a threshold.
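
A simplified sketch of a circuit breaker, not a production implementation. The class and parameter names are invented for this example. It also lets one trial call through after a cool-down (often called the half-open state), which these notes do not cover but which is a common part of the pattern.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is skipped because the breaker is open."""


class CircuitBreaker:
    def __init__(self, threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.threshold = threshold          # failures before opening
        self.reset_timeout = reset_timeout  # cool-down in seconds (assumed)
        self.clock = clock                  # injectable for testing
        self.failures = 0
        self.opened_at = None               # None means the breaker is closed

    def call(self, operation):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open, call skipped")
            # Cool-down elapsed: let a single trial call through.
        try:
            result = operation()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.threshold:
                self.opened_at = self.clock()  # trip, or re-trip after a failed trial
            raise
        self.failures = 0       # success closes the breaker and resets the count
        self.opened_at = None
        return result
```

Callers wrap each call to an integration point in breaker.call(...) and treat CircuitOpenError as an immediate failure.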

In a ship, bulkheads are partitions that, when sealed, divide the ship into separate, watertight compartments. You can partition your system in the same way.

Steady states: data purging, log files, in-memory caching.

Fail Fast: failing immediately is better than responding slowly.

Let It Crash (Akka): limited granularity, fast replacement, supervision, reintegration.

Handshaking lets a server reject incoming work when it is at full load.

Decoupling Middleware: irreversible decision.

Shed Load: refuse new requests and signal that the system is overloaded.

Create Back Pressure: block producers from adding new items to a full queue.

Governor: limits the speed of automated actions.
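
Both ideas can be sketched with a bounded queue.Queue from the Python standard library. The helper names are hypothetical. Shedding load rejects immediately when the queue is full; back pressure makes the producer wait, here with a timeout so it cannot block forever.

```python
import queue


def submit_or_shed(q, item):
    """Shed load: refuse the item at once if the queue is full."""
    try:
        q.put_nowait(item)
        return True
    except queue.Full:
        return False  # caller should report "overloaded" to the client


def submit_with_back_pressure(q, item, timeout):
    """Back pressure: the producer blocks until there is room or time runs out."""
    try:
        q.put(item, timeout=timeout)
        return True
    except queue.Full:
        return False
```

The queue must be created with a maxsize, for example queue.Queue(maxsize=100); an unbounded queue never pushes back.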
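
One way to picture a governor is a limiter that allows only a fixed number of actions per time window. This is a sketch under that assumption; the class name, parameters, and sliding-window approach are not taken from the book.

```python
import time


class Governor:
    """Allow at most max_actions within any per_seconds window."""

    def __init__(self, max_actions, per_seconds, clock=time.monotonic):
        self.max_actions = max_actions
        self.per_seconds = per_seconds
        self.clock = clock      # injectable for testing
        self.timestamps = []    # times of recently allowed actions

    def allow(self):
        now = self.clock()
        # Forget actions that have left the window.
        self.timestamps = [
            t for t in self.timestamps if now - t < self.per_seconds
        ]
        if len(self.timestamps) >= self.max_actions:
            return False        # too fast: the action must wait
        self.timestamps.append(now)
        return True
```

An automation tool would call allow() before each destructive step, such as shutting down an instance, and pause when it returns False.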

Part II — Design for Production

Chapter 6 Case Study: Phenomenal Cosmic Powers, Itty-Bitty Living Space

This case study illustrates the principle of designing for production.

Chapter 7 Foundations

Concerns and levels of responsibility:

  • Operations – Security, availability, capacity, status, communication
  • Control Plane – System monitoring, deployment, anomaly detection, features
  • Interconnect – Routing, load balancing, failover, traffic management
  • Instances – Services, processes, components, instance monitoring
  • Foundation – Hardware, VMs, IP addresses, physical network

One machine can have many physical network interfaces and, as a result, several different names.

Physical host resources are typically oversubscribed by the VMs running on them.

The system clock may be unstable while a VM migrates from one host to another.

Some notes on designing containerized applications: containers have no identity, startup and shutdown should be quick, networking should be externalized, and so on.

The 12-Factor App

  1. Codebase – Track one codebase in revision control. Deploy the same build to every environment.
  2. Dependencies – Explicitly declare and isolate dependencies.
  3. Config – Store config in the environment.
  4. Backing services – Treat backing services as attached resources.
  5. Build, release, run – Strictly separate build and run stages.
  6. Processes – Execute the app as one or more stateless processes.
  7. Port binding – Export services via port binding.
  8. Concurrency – Scale out via the process model.
  9. Disposability – Maximize robustness with fast startup and graceful shutdown.
  10. Dev/prod parity – Keep development, staging, and production as similar as possible.
  11. Logs – Treat logs as event streams.
  12. Admin processes – Run admin/management tasks as one-off processes.
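
As a small illustration of factor 3 (Config), the sketch below reads settings from environment variables. The variable names DATABASE_URL and PORT and their defaults are assumptions chosen for the example.

```python
import os


def load_config(env=None):
    """Build the app config from environment variables (or a dict in tests)."""
    env = os.environ if env is None else env
    return {
        # Hypothetical settings; the defaults are for local development only.
        "database_url": env.get("DATABASE_URL", "sqlite:///local.db"),
        "port": int(env.get("PORT", "8080")),
    }
```

The same build can then run in every environment, with only the environment variables changing.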

Chapter 8 Processes on Machines

Code, config, and connection.

Be careful about log messages. The build process should automatically set the log level to WARN to keep debug messages out of production.

Chapter 9 Interconnect

DNS for service discovery

Load Balancing: software load balancing (reverse proxy), hardware load balancing (F5), health checks, stickiness (repeated requests, stateful), partitioning request types (content-based routing)

Demand Control

Too busy, try later; residence time.

Network Routing

Discovering Services

Migratory Virtual IP Addresses

Chapter 10 Control Plane

Mechanical Advantage

Postmortem review

  • Explain what happened
  • Apologize
  • Commit to improvement

The goal of the platform team is to enable its customers.

Provisioning and Deployment Services – pull vs push

List of services that might be needed:

  • Log collection and search
  • Metrics collection and visualization
  • Deployment
  • Configuration service
  • Instance placement
  • Instance and system visualization
  • Scheduling
  • IP, overlay network, firewall, and route management
  • Autoscaler
  • Alerting and notification

Chapter 11 Security

Part III Deliver Your System

Chapter 12 Case Study: Waiting for Godot

Chapter 13 Design for Deployment

Key concerns: automation, orchestration, and zero-downtime deployment.

An ideal deployment tool compares the current state with the desired state and converges the former toward the latter.

Zero downtime; smaller, more frequent deployments.
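
A toy sketch of that comparison: both states are plain dictionaries mapping a service name to a version, and the function returns the actions needed to get from one to the other. The data shape and action names are assumptions for illustration, not how any particular tool works.

```python
def plan(current, desired):
    """Return the actions that turn the current state into the desired one."""
    actions = []
    for name in sorted(desired):
        if name not in current:
            actions.append(("deploy", name, desired[name]))
        elif current[name] != desired[name]:
            actions.append(("upgrade", name, desired[name]))
    for name in sorted(current):
        if name not in desired:
            actions.append(("remove", name, current[name]))
    return actions
```

When the two states already match, the plan is empty, so running the tool again is harmless.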

Chapter 14 Handling Versions

Compatible vs incompatible API changes.

Part IV Solve Systemic Problems

Chapter 15 Case Study: Trampled by Your Own Customers

Chapter 16 Adaptation

Platform team.

Trade off efficiency for flexibility.

Evolutionary Architecture: Microservices, Microkernel and plugins, Event-based.

Loose Clustering

Six modular operators: Splitting, Substituting, Augmenting, Excluding, Inversion, Porting.

Create options for the future.

Information Architecture

Messages, Events, and Commands

  • Event notification
  • Event-carried state transfer
  • Event sourcing
  • Command-query responsibility segregation (CQRS)

Treat messages as data rather than objects to support schema evolution.

There is no such thing as a natural data model. It’s important to make deliberate choices about when to use relational, document, graph, key-value, or temporal databases.
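
A short sketch of what this can look like: the consumer parses the message as plain JSON, reads only the fields it needs, ignores unknown ones, and supplies a default for a field that older producers do not send. The field names and the default currency are made up for this example.

```python
import json


def read_order_event(raw):
    """Extract only the fields this consumer needs from a JSON message."""
    msg = json.loads(raw)
    return {
        "order_id": msg["order_id"],              # required in every version
        "currency": msg.get("currency", "USD"),   # assumed default for old senders
    }
```

Because the message is never bound to a fixed class, a producer can add fields without breaking this consumer.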

Chapter 17 Chaos Engineering

  • Limit of capacity
  • Limit of safety
  • Limit of economy

Bibliography

William Kent. Data and Reality. 1st Books, Bloomington, IL, 1998.

Neal Ford, Rebecca Parsons, and Pat Kua. Building Evolutionary Architectures. O’Reilly & Associates, Inc., Sebastopol, CA, 2017.

and more.