Microservices Re-Explained · 108 Circuit Breakers, Retries, and Timeouts Are Not Optional

The mechanics of resource starvation, retry amplification, and why a slow downstream service is categorically more dangerous than a dead one.

May 27, 2026

An in-process method call that fails does so in nanoseconds — stack unwind, exception caught, thread freed. A network call that stalls holds an OS thread, an open socket file descriptor, a connection pool slot, and any database transaction attached to that request context — for the entire duration of the stall. There is no error signal. The socket is open. From the JVM’s perspective, work is in progress.

At 400 rps with a 5-second downstream stall, a thread pool of 200 saturates in under 3 seconds. Every subsequent request queues, then times out at the gateway, then surfaces as a 504 to the user — while the actual downstream service is still alive, just slow. Latency without bounds is operationally equivalent to a total outage, but harder to detect and slower to recover from.

Fault tolerance is not error handling. It is resource consumption bounding — a set of hard constraints that cap how much of your system’s capacity any single downstream failure can consume.

Timeouts — Bound the Socket, Not the Request

There are two distinct timeout phases engineers routinely conflate into a single configuration value.

Connect Timeout bounds TCP handshake and TLS negotiation — the time to establish the socket. Keep this short: 500ms–1s. A host that can’t complete a handshake in one second has a routing or capacity problem that won’t resolve itself in 30 more seconds.

Read (Request) Timeout bounds the time from first request byte sent to last response byte received. This is the timeout that prevents thread starvation. It must be set per-operation, not per-client, because GET /products/{id} and POST /reconcile have structurally different latency profiles.

Production Warning

Never guess timeout values. Derive them from observability data:

Instrument your HTTP clients. Emit latency histograms per endpoint.
Set Read Timeout = P99.9 latency × 1.5–2× operational buffer.
If your P99.9 is 900ms, a 30s timeout is not protection — it is a 30-second blast radius.

Deadline Propagation Across Call Chains

Static per-service timeouts break in multi-hop graphs. If Service A has a 2s budget and spends 1.6s before calling B, Service B has no awareness of the 400ms residual. It applies its own 2s timeout, continues doing work, and returns a response that Service A will never read — because A already timed out upstream.

Unprotected vs. Deadline-Propagated Call Chain


── WITHOUT DEADLINE PROPAGATION ──────────────────────────────────

 Client (budget: 2000ms)
   └─► [A] static timeout=2000ms
         └─► [B] static timeout=2000ms   ← B has no idea A is gone
               └─► [C] static timeout=2000ms
                     └─► [D]
                                           Ghost work: B, C, D continue
                                           for up to 6s after client timed out.

── WITH X-Request-Deadline PROPAGATION ───────────────────────────

 Client sets header: X-Request-Deadline: 

   └─► [A] reads header → remaining=2000ms
         spends 200ms → remaining=1800ms
         propagates: X-Request-Deadline: 
         sets outbound timeout = min(1800ms - reserve, maxAllowed)

         └─► [B] reads header → remaining=1600ms
               spends 300ms → remaining=1300ms
               propagates header downstream

               └─► [C] reads header → remaining=950ms
                     if remaining < MIN_VIABLE_BUDGET → abort immediately
                     no call to D, no wasted compute

Each service reads the absolute epoch deadline from the incoming header, computes remaining time, and uses min(remaining − reserve_ms, configured_max) as its outbound timeout. If remaining time is below a minimum viable threshold — typically 50ms — the service aborts immediately with 504. No downstream call is attempted.

Retries — The Math of Amplification

At stable-state, retries absorb transient blips. During systemic degradation — which is when you actually need them — retries are a force multiplier on the failing service’s inbound load.

The amplification factor: 1,000 rps baseline × (1 original + 3 retries) = 4,000 rps hitting a downstream that is already failing. Five upstream callers doing the same: up to 20,000 rps from 4,000 of legitimate traffic. This is not a theoretical edge case — it is the default behavior of any service with maxAttempts=4 and no backoff.

Exponential Backoff with Full Jitter

Plain exponential backoff spaces retries out — but synchronized callers that all timed out at T=0 will all retry at T=1s, T=2s, T=4s together. The herd is just periodic instead of immediate. Full jitter breaks synchronization by randomizing within each backoff window:

Full Jitter Formula

Sleep = random(0, min(MaxSleep, Base × Factor^Attempt))

Base=100ms · Factor=2 · MaxSleep=30,000ms
Attempt 1: random(0, 200ms) · Attempt 2: random(0, 400ms) · Attempt 3: random(0, 800ms)
500 callers compute random(0, 400ms) independently → uniform distribution, no spike.

Token-Bucket Retry Budget

Per-request retry limits don’t cap aggregate retry volume. The rule: retry only if recent retry attempts represent less than 10% of total request volume in the last 60 seconds. When the budget is exhausted, return a fast failure immediately — no retry attempt, no delay. A fast 503 to the caller is better than hammering a degraded downstream until it crashes completely.

Idempotency is a Prerequisite, Not a Detail

Never enable retries on a mutation endpoint without server-side deduplication. A timed-out POST /payments may have already executed. Retrying without an idempotency key creates a duplicate charge.
The client generates a stable UUID before the first attempt and carries it across all retries as Idempotency-Key.
The server checks this key against a short-lived store (Redis, DB) before executing. Ship the server-side guard first. Enable retries second.

The Complete Backend Interview Kit

System design deep dives, distributed systems internals, Java/Spring Boot, and the questions asked at Uber, Atlassian, Mastercard, and JPMC. 300+ pages, production-grade.

International Link → India Link →

Circuit Breakers — Trip on Rate, Not Just Errors

A circuit breaker in Open state fails requests in microseconds — no socket open, no thread parked, no connection pool slot consumed. It protects the caller’s resource pool as much as it protects the downstream service’s recovery window.

Configure on a sliding time-window error rate, not an absolute error count. An absolute count fires on a quiet service at 3 AM after two failures out of two calls. A percentage-based window on 30 seconds with a minimum call volume of 20 requests behaves correctly across traffic patterns. Trip at 50% error rate. Also trip at 80% slow-call rate — a service returning 200 OK in 6 seconds is failing your SLA even though it isn’t throwing errors.

Fallback Strategies: The Circuit Breaker is the Trigger, Not the Feature

A tripped breaker that throws CallNotPermittedException to the user is an incomplete implementation. The fallback is the engineering work:

Stale Cache Read: For idempotent reads, return the last known good response from Redis or a local cache on breaker open. Serve a stale product catalogue rather than a 500 homepage. Mark the response with a staleness flag if the consumer needs to know.

Java · Resilience4j — Cache Fallback on Open Circuit

Supplier<CatalogueResponse> decorated =
    CircuitBreaker.decorateSupplier(breaker,
        () -> catalogueClient.fetch(categoryId));

CatalogueResponse result = Try.ofSupplier(decorated)
    .recover(CallNotPermittedException.class, ex ->
        localCache.getIfPresent(categoryId)        // serve stale
            .orElse(CatalogueResponse.empty())     // or degrade gracefully
    ).get();

Transactional Outbox: For non-critical writes — notifications, audit events, analytics — write to a local outbox table inside the same transaction as your core operation. The outbox processor retries delivery out-of-band when the downstream recovers. The user’s action succeeds; the side effect is eventually consistent.

Java · Outbox on Circuit Open

@Transactional
public OrderConfirmation createOrder(OrderRequest req) {
    Order order = orderRepository.save(Order.from(req));

    try {
        CircuitBreaker.decorateRunnable(notificationBreaker,
            () -> notificationClient.send(order.toEvent())
        ).run();
    } catch (CallNotPermittedException e) {
        // Breaker open — defer, don't fail.
        outboxRepository.enqueue(OutboxEvent.pending(order));
    }

    return OrderConfirmation.of(order); // Always succeeds.
}

Engineering Principle

Design the fallback before configuring the breaker. If you cannot name the fallback strategy for a given circuit, the breaker is not production-ready — it will fast-fail your users at the exact moment the downstream is struggling.

Service Mesh vs. Application Layer

Istio and Linkerd handle network-level timeout and retry policies uniformly across services you may not control. They are the right home for mTLS, baseline connection pool limits, and generic 503-retry policies on idempotent routes. But they cannot see your application domain:

Use the mesh for: uniform network timeouts, mTLS, traffic shaping, and retry on safe HTTP verbs (GET, HEAD) where no idempotency risk exists.
Use the application layer for: fallback strategies (cache reads, outbox writes), idempotency-key-gated retries on mutations, token-bucket retry budgets, slow-call-based circuit tripping, and deadline header propagation logic.
Never mesh-only: Envoy retrying a POST /payments on 503 without an idempotency key is a billing disaster waiting for a degraded downstream to trigger it.

The Complete Backend Interview Kit

System design deep dives, distributed systems internals, Java/Spring Boot, and the questions asked at Uber, Atlassian, Mastercard, and JPMC. 300+ pages, production-grade.

International Link → India Link →

Production Checklist

Run this against every service before it handles production traffic:

Connect timeout and Read timeout configured separately on every HTTP/gRPC client.
Read timeout derived from P99.9 latency × 1.5–2×, not from intuition or documentation defaults.
Per-operation timeout overrides exist for endpoints with structurally different latency profiles.
X-Request-Deadline header read on ingress, propagated on every outbound call, with a minimum viable budget guard.
Retries use full jitter backoff — not fixed delay, not plain exponential.
Global retry budget capped at ≤10% of request volume over a 60-second window.
Every retried mutation endpoint has server-side idempotency key deduplication in place before retries were enabled.
Circuit breaker configured with sliding time-window percentage — not absolute count.
Slow-call threshold configured alongside error-rate threshold.
Every circuit breaker has a named fallback: stale cache, outbox, or explicit degraded response — never a raw exception to the user.
Mesh-level retries disabled on all non-idempotent routes.

These patterns are not defensive additions you bolt on after the first outage. They are the contract your service makes with every caller: I will consume a bounded, predictable share of shared infrastructure, regardless of what my dependencies do. Without that contract, you have not built a service. You have built a liability.

The Modern Backend publishes deep technical writing for senior engineers, staff engineers, and engineering leaders who are tired of advice that sounds good in theory and collapses under delivery pressure.

Buy me Cofee

The Modern Backend

Discussion about this post

Ready for more?