
Kubernetes Failure Lab
Interactive lab for learning Kubernetes troubleshooting.
The Problem
Most engineers learn how to deploy to Kubernetes, but struggle when things go wrong in production.
The Solution
An interactive platform for practicing Kubernetes failure diagnosis on deliberately broken clusters.
Implementation Details
The best way to learn is to fix a broken system. The Kubernetes Failure Lab provides a set of "Broken Cluster" scenarios that you have to investigate and repair.
Scenario Engine
I built an automated agent that breaks a specific part of a Kubernetes cluster (for example, by misconfiguring the CoreDNS ConfigMap or deleting a CNI binary) and then challenges the user to find the root cause using standard kubectl and observability tools.
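To make the idea of a fault injector concrete, here is a minimal sketch of how a CoreDNS misconfiguration could be introduced. It is not the lab's actual agent. It assumes a kubeadm-style cluster where the ConfigMap is named `coredns` in `kube-system` and stores its config under the `Corefile` key. The names `break_corefile`, `inject_dns_failure`, and `BAD_UPSTREAM` are hypothetical and exist only for this illustration.

```python
import json
import subprocess
from typing import Callable, List

# A function that runs kubectl with the given arguments and returns stdout.
Runner = Callable[[List[str]], str]

# Hypothetical unreachable resolver, chosen only so upstream lookups fail.
BAD_UPSTREAM = "10.255.255.1"


def break_corefile(corefile: str, bad_upstream: str = BAD_UPSTREAM) -> str:
    """Rewrite every `forward .` directive to point at an unreachable upstream."""
    out = []
    for line in corefile.splitlines():
        tokens = line.split()
        if tokens[:2] == ["forward", "."]:
            indent = line[: len(line) - len(line.lstrip())]
            # Keep the opening brace if the directive has an option block.
            suffix = " {" if tokens[-1] == "{" else ""
            line = f"{indent}forward . {bad_upstream}{suffix}"
        out.append(line)
    trailing = "\n" if corefile.endswith("\n") else ""
    return "\n".join(out) + trailing


def kubectl(args: List[str]) -> str:
    """Default runner: shell out to the real kubectl binary."""
    result = subprocess.run(
        ["kubectl", *args], check=True, capture_output=True, text=True
    )
    return result.stdout


def inject_dns_failure(run: Runner = kubectl) -> None:
    """Read the CoreDNS ConfigMap, break its forwarder, and patch it back."""
    raw = run(["-n", "kube-system", "get", "configmap", "coredns", "-o", "json"])
    corefile = json.loads(raw)["data"]["Corefile"]
    patch = json.dumps({"data": {"Corefile": break_corefile(corefile)}})
    run(["-n", "kube-system", "patch", "configmap", "coredns",
         "--type", "merge", "-p", patch])
```

Calling `inject_dns_failure()` against a disposable cluster should leave in-cluster service names resolvable while external lookups time out, which is the kind of symptom a learner would then trace back to the ConfigMap. Passing the runner as a parameter keeps the injector testable without a live cluster.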
Real-world Simulations
Scenarios include "The Stealthy CPU Leak," "The Flaky Webhook," and "The OOMKill Mystery." All are based on real incidents I've encountered or studied.
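The scenarios' intended solutions are not described here. As one illustration of the kind of check a learner might script during an OOMKill-style investigation, the sketch below scans the JSON from `kubectl get pods -A -o json` for containers that Kubernetes reports as terminated with reason `OOMKilled`. The function name `find_oom_killed` is hypothetical.

```python
from typing import Any, Dict, List


def find_oom_killed(pod_list: Dict[str, Any]) -> List[str]:
    """Return 'namespace/pod/container' for containers terminated by the OOM killer."""
    hits: List[str] = []
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        statuses = pod.get("status", {}).get("containerStatuses", [])
        for status in statuses:
            # A container may be dead right now (state) or have been
            # restarted after an earlier kill (lastState).
            for key in ("state", "lastState"):
                terminated = status.get(key, {}).get("terminated") or {}
                if terminated.get("reason") == "OOMKilled":
                    hits.append(
                        f"{meta.get('namespace')}/{meta.get('name')}/{status.get('name')}"
                    )
                    break  # report each container at most once
    return hits
```

Given the parsed output of `kubectl get pods -A -o json`, the function returns one entry per affected container. A learner would still need to compare the container's memory limit against its actual usage to find the root cause.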