Step-by-step process to identify failure modes in distributed systems.

Example application architecture

Let’s begin, shall we?

Step 1:

Step 2:

Step 3:

Step 4:

Possible failures in Kafka broker
Schema Registry Failure
Control Center Failures
Example of possible failure mode of the architecture

Step 5:

Step 6:

Step 7:

Why categorise, you ask?

Good question. Categorisation lets you consolidate the steps for fixing common failures into a single document (a runbook) that the SRE or support team can use.
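
As a rough illustration of that consolidation, the sketch below groups failure modes by category so that each category maps to one runbook. The component names come from the architecture above. The failure descriptions, the category labels, and the names `FailureMode` and `group_by_category` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    component: str    # e.g. "Kafka broker"
    description: str  # hypothetical failure, for illustration only
    category: str     # hypothetical category label


def group_by_category(modes):
    """Return {category: [FailureMode, ...]}, preserving input order."""
    grouped = defaultdict(list)
    for mode in modes:
        grouped[mode.category].append(mode)
    return dict(grouped)


# Hypothetical entries; replace them with the failure modes you identified.
FAILURE_MODES = [
    FailureMode("Kafka broker", "disk full", "storage"),
    FailureMode("Kafka broker", "broker unreachable", "network"),
    FailureMode("Schema Registry", "registry unreachable", "network"),
    FailureMode("Control Center", "disk full", "storage"),
]

if __name__ == "__main__":
    for category, modes in group_by_category(FAILURE_MODES).items():
        # One runbook per category covers every component that shares it.
        components = ", ".join(sorted({m.component for m in modes}))
        print(f"runbook '{category}': {components}")
```

Grouping by category rather than by component means a fix that applies to several components is written once, which is the point of the runbook.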

Step 8:

Step 9:

Step 10:



