Step-by-step process to identify failure modes in distributed systems

Example application architecture

Let’s begin, shall we?

Step 1:

Step 2:

Step 3:

Step 4:

Possible failures in the Kafka broker
Schema Registry failures
Control Center failures
Example of a possible failure mode of the architecture

Step 5:

Step 6:

Step 7:

Why categorise, you ask?

Good question. Categorisation allows you to consolidate steps to fix common failures into one document (runbook), which can be used by SRE or the support team.
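
To make the idea concrete, here is a minimal sketch of grouping failure modes by category so that each category maps to one runbook. The component names come from the architecture discussed here, but the specific failures, the category names, and the `group_by_category` helper are illustrative assumptions, not a prescribed taxonomy.

```python
from collections import defaultdict

# Hypothetical failure modes. Component names follow the article;
# the failures and categories are assumptions for illustration only.
FAILURE_MODES = [
    {"component": "Kafka broker", "failure": "broker process down", "category": "availability"},
    {"component": "Schema Registry", "failure": "registry unreachable", "category": "availability"},
    {"component": "Kafka broker", "failure": "disk full on log directory", "category": "capacity"},
    {"component": "Control Center", "failure": "UI not loading", "category": "availability"},
]


def group_by_category(failure_modes):
    """Group failure modes so each category can share a single runbook."""
    grouped = defaultdict(list)
    for mode in failure_modes:
        grouped[mode["category"]].append(f'{mode["component"]}: {mode["failure"]}')
    return dict(grouped)


runbook_index = group_by_category(FAILURE_MODES)

# Print one runbook heading per category with the failures it covers.
for category, entries in runbook_index.items():
    print(f"Runbook: {category}")
    for entry in entries:
        print(f"  - {entry}")
```

With a grouping like this, a fix that applies to every failure in a category is written once in that category's runbook instead of being repeated per component.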

Step 8:

Step 9:

Step 10:





Software engineer, big data application architect, and programming language enthusiast. A guy who likes technical discussions. Author on