Feb 22, 2025

Liveness vs Readiness Probes Explained

A practical explanation of probe design, rollout safety, and common Kubernetes mistakes.

Kubernetes · SRE · Reliability

Kubernetes gives teams powerful deployment primitives, but it does not stop bad assumptions. One of the most common mistakes is treating liveness and readiness probes like a box to tick instead of a reliability control.

That mistake causes cascading restarts, failed rollouts, and services that look healthy from the outside while silently dropping user traffic.

The difference in plain English

Liveness probe: Is this process unhealthy enough that it should be restarted?

Readiness probe: Is this instance ready to receive real traffic right now?

If you blur those two questions, Kubernetes will make the wrong decision very quickly.
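The two questions map to two separate probe stanzas on the same container. The sketch below shows the split; the endpoint paths, port, and thresholds are illustrative assumptions, not fixed conventions:

```yaml
containers:
  - name: api
    image: example/api:1.0
    readinessProbe:        # gates traffic: failure removes the pod from Service endpoints
      httpGet:
        path: /readyz      # assumed endpoint name
        port: 8080
      periodSeconds: 5
      failureThreshold: 3
    livenessProbe:         # detects a stuck process: failure restarts the container
      httpGet:
        path: /livez       # assumed endpoint name
        port: 8080
      periodSeconds: 10
      failureThreshold: 6
```

Note the asymmetry: readiness can afford to trip quickly because the penalty is only lost traffic to one pod, while liveness fails slowly because the penalty is a restart.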

What readiness should check

A readiness probe should answer whether the application can serve a normal request safely.

Good readiness checks often validate:

  • application boot completed
  • DB connection pool is usable
  • required caches are warm enough
  • critical dependencies are reachable or degraded safely
  • migrations are not blocking startup

A readiness endpoint is not just an HTTP 200 from a running process. That tells you almost nothing: the process can accept a TCP connection long before it can serve a real request.
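One way to structure those checks is to aggregate them into a single verdict that the HTTP layer turns into a 200 or a 503. This is a minimal sketch, assuming a simple in-memory `state` dict; the check names and thresholds are illustrative, not a real framework:

```python
def check_boot_complete(state):
    # Readiness fails until startup work (config load, warmup) has finished.
    return state.get("boot_complete", False)

def check_db_pool(state):
    # Usable only if at least one pooled connection is available.
    return state.get("db_pool_available", 0) > 0

def check_cache_warm(state, min_ratio=0.8):
    # "Warm enough" here means a hypothetical fill ratio above a threshold.
    return state.get("cache_fill_ratio", 0.0) >= min_ratio

def readiness(state):
    """Return (ready, failed_check_names) for the /readyz handler to report."""
    checks = {
        "boot": check_boot_complete(state),
        "db_pool": check_db_pool(state),
        "cache": check_cache_warm(state),
    }
    failures = [name for name, ok in checks.items() if not ok]
    return (len(failures) == 0, failures)
```

Returning the list of failed checks alongside the boolean makes a flapping pod diagnosable from its probe responses instead of from guesswork.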

What liveness should check

Liveness is there to detect a stuck or dead process. It should be conservative.

It should not fail because:

  • one downstream service is temporarily slow
  • a database has a transient blip
  • startup is still in progress
  • traffic is high for a short period

If liveness is too aggressive, Kubernetes becomes your outage amplifier: it will repeatedly restart an app that would have recovered on its own, turning a transient blip into a crash loop.
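In practice, conservative liveness means generous timeouts and thresholds, plus a startup probe so a slow boot never trips the liveness check. The numbers below are illustrative assumptions to tune for your own service, not recommended values:

```yaml
startupProbe:              # holds off liveness until boot completes
  httpGet:
    path: /livez           # assumed endpoint name
    port: 8080
  periodSeconds: 5
  failureThreshold: 60     # allows up to ~5 minutes of startup before restarting
livenessProbe:
  httpGet:
    path: /livez
    port: 8080
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 6      # ~1 minute of consecutive failures before a restart
```

While the startup probe is failing, Kubernetes does not run the liveness probe at all, which is exactly the "startup is still in progress" case from the list above.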

Final thought

Readiness protects users. Liveness protects the platform. Startup probes protect initialization.

When teams understand those roles clearly, Kubernetes rollouts become far safer. When they do not, probes become self-inflicted outages with YAML attached.