If you build on top of Claude, you have seen this:
```
HTTP/1.1 529
{"type":"error","error":{"type":"overloaded_error","message":"Overloaded"}}
```
The 529 status code is the most common load-related response from Anthropic’s API. It is not a server bug, it is not a misconfiguration on your side, and it is not the same as a 500 or a 503 — though many client libraries treat all three identically, which is one of the reasons retries can make outages worse rather than better.
This is a working playbook. It covers what 529 means, how to retry it correctly, when to stop retrying, and how to use the public Anthropic Statuspage feed to decide whether the burst you are seeing is local-to-you or global.
What 529 actually means
529 is a non-standard HTTP status code that a handful of services use to mean “site is overloaded.” Anthropic returns it on the API when upstream model-serving capacity is saturated and a request cannot be admitted. It is conceptually distinct from:
| Status | Meaning at Anthropic | Client behavior |
|---|---|---|
| 429 | Rate limit (you exceeded your org’s quota or burst window) | Back off and retry; consider a quota increase. |
| 500 | Server-side bug or unhandled error | Retry once with caution; log and report. |
| 503 | Service unavailable (often during deploys or front-end hiccups) | Retry with backoff. |
| 529 | Capacity overloaded | Retry with backoff and jitter — but with stricter caps. |
The key behavioral difference is that 529 indicates a stochastic, possibly short-lived congestion event rather than a quota you have crossed. You are not being throttled because your API key is doing something wrong. The next request from the same key, a few hundred milliseconds later, has a real chance of succeeding.
The implication is that retries are appropriate, but only with discipline.
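In code, the table above collapses to a small dispatch. A minimal sketch (the action names here are ours, not anything from the SDK):

```ts
// Maps the table above onto retry decisions; a sketch, not SDK behavior.
type Action =
  | "backoff_and_retry"      // 429: respect rate limits, consider quota increase
  | "retry_once_and_report"  // 500: server-side bug
  | "retry_with_backoff"     // 503: service unavailable
  | "retry_jittered_capped"  // 529: overloaded, stricter caps (see below)
  | "return_response";

function actionFor(status: number): Action {
  switch (status) {
    case 429: return "backoff_and_retry";
    case 500: return "retry_once_and_report";
    case 503: return "retry_with_backoff";
    case 529: return "retry_jittered_capped";
    default:  return "return_response";
  }
}
```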
When 529 happens
Three operational contexts produce 529 bursts:
- Model-tier saturation. A specific model (commonly the most recent flagship) is at capacity. Switching to an older model often clears the error immediately, because the saturation is per-model, not per-account.
- Region-wide congestion. Spikes that affect every customer in a region, often correlated with viral product launches or cross-vendor outages that send traffic from elsewhere to Claude.
- Account-shaped patterns. Less common, but a single customer suddenly issuing thousands of long-context requests can create localized backpressure that returns 529 to the same customer’s other requests.
Telling these apart matters because the right response is different in each case. In the first, fall back to a smaller model. In the second, back off and retry. In the third, look at your own request shape.
A correct retry policy
The wrong retry policy is the one almost every quickstart shows you: a `for i in range(5)` loop with `time.sleep(1)`. This is the textbook example of a retry storm amplifier. When upstream is overloaded, every client in the world doing that loop is now hammering the same overloaded endpoint with synchronized retries.
The correct policy has four properties:
1. Exponential backoff
Each successive retry waits longer than the last. A common scheme is `delay = base * 2^attempt`, with `base` somewhere between 200ms and 1s. With a 500ms base, after three failures you are waiting 4 seconds; after five, 16 seconds.
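Concretely, with a 500ms base the deterministic schedule looks like this:

```ts
// Deterministic exponential backoff, 500ms base, doubling per attempt.
const base = 500; // ms
const schedule = [0, 1, 2, 3, 4, 5].map((attempt) => base * 2 ** attempt);
// -> [500, 1000, 2000, 4000, 8000, 16000] ms
```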
2. Full jitter
Draw the delay uniformly at random from [0, computed_delay] rather than sleeping the deterministic delay. Synchronized retries are the failure mode you are avoiding. The AWS Architecture Blog math on this is well known: full jitter consistently beats exponential backoff without jitter in every congestion scenario, because it desynchronizes clients.
```ts
function nextDelayMs(attempt: number): number {
  const base = 500;   // ms
  const cap = 30_000; // 30s ceiling on the exponential term
  const exp = Math.min(cap, base * 2 ** attempt);
  return Math.random() * exp; // full jitter: uniform in [0, exp)
}
```
3. A hard cap on total elapsed time, not just attempt count
A loop that retries 7 times with exponential backoff can wait over a minute, which is probably longer than your end-user is willing to wait at a chat prompt. Cap the total wall-clock budget instead:
```ts
// callClaude() and giveUpResponse() are hypothetical helpers standing in for
// your request function and your degraded-response path.
const sleep = (ms: number) =>
  new Promise((resolve) => setTimeout(resolve, Math.max(0, ms)));

const deadline = Date.now() + 15_000; // 15s budget
let attempt = 0;
while (Date.now() < deadline) {
  const res = await callClaude();
  if (res.status !== 529 && res.status !== 503) return res;
  // Clamp the jittered delay so we never sleep past the deadline.
  await sleep(Math.min(nextDelayMs(attempt), deadline - Date.now()));
  attempt++;
}
return giveUpResponse();
```
4. A circuit breaker at the application layer
If 529 has been the dominant response code for the last 30 seconds, every additional request you send is making the situation marginally worse for everybody, including yourself. Open a circuit breaker — refuse to even attempt outbound requests for a short cooldown — and your service becomes part of the recovery instead of part of the spike. Most resilience libraries have one built in; if you do not have one, the simplest possible version is a counter.
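The simplest version of that counter, as a sketch (the 30-second window, 50% threshold, and 10-second cooldown are all illustrative numbers):

```ts
// Counter-based circuit breaker. All thresholds are illustrative.
class Breaker {
  private failures = 0;
  private total = 0;
  private openUntil = 0;
  private windowEnd = Date.now() + 30_000;

  // Call before each outbound request; false means "use the fallback now".
  allow(): boolean {
    return Date.now() >= this.openUntil;
  }

  // Call with the HTTP status of each completed request.
  record(status: number): void {
    const now = Date.now();
    if (now > this.windowEnd) {
      // Fresh 30s observation window.
      this.failures = 0;
      this.total = 0;
      this.windowEnd = now + 30_000;
    }
    this.total++;
    if (status === 529) this.failures++;
    // If 529 dominates the window, open for a 10s cooldown.
    if (this.total >= 10 && this.failures / this.total > 0.5) {
      this.openUntil = now + 10_000;
    }
  }
}
```

During the cooldown your service returns its fallback immediately instead of joining the retry storm; when the cooldown expires, traffic resumes and the window starts measuring again.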
When to fall back instead of retrying
Retrying makes sense when the most-likely cause is short-term congestion that will clear within seconds. Retrying does not make sense when the most-likely cause is sustained capacity exhaustion that will not clear within your latency budget.
Heuristics that suggest “fall back, do not keep retrying”:
- The same request has now received 529 three times in a row across more than five seconds of wall-clock time. Congestion that lasts five seconds is not a microspike.
- The official Statuspage indicator is currently `major` or `critical` for the API component. The system as a whole is experiencing a publicly acknowledged event; your retry is not going to be the one that succeeds.
- Your latency budget for this call is under 2 seconds (interactive UI) and you have already burned 1 second.
A good fallback is a smaller, cheaper model in the same family, or — if the use case allows — a degraded path that does not invoke the model at all (cached response, simpler heuristic, “results temporarily unavailable” message).
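A sketch of that chain, with placeholder model IDs and a hypothetical `callClaude` variant that accepts a model:

```ts
// Fallback chain sketch. Model IDs are placeholders, not real model names.
declare function callClaude(req: {
  model: string;
  prompt: string;
}): Promise<{ status: number; text: string }>;

async function callWithFallback(prompt: string) {
  // Try the saturated flagship first, then a smaller model in the family.
  for (const model of ["flagship-model", "smaller-model"]) {
    const res = await callClaude({ model, prompt });
    if (res.status !== 529) return res;
  }
  // Degraded path: no model call at all.
  return { status: 200, text: "Results temporarily unavailable." };
}
```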
Correlating 529 with the public uptime feed
The public Anthropic Statuspage exposes a `summary.json` endpoint that lists per-component status indicators. When the API component’s indicator is `none`, your 529 bursts are likely local — your retry policy is the right tool. When the indicator is `minor` or `major`, your 529s are part of a systemic event — your fallback policy is the right tool.
Polling that endpoint inside your application is fine; the public Statuspage CDN caches it for ~10 seconds and Atlassian explicitly states the feed is not rate-limited for public consumers. A pattern we like:
- Maintain an in-process cache (e.g., 30 seconds) of the last fetch of `summary.json`.
- On 529 from the API, check the cache. If `api.anthropic.com` is currently `major` or worse, skip retries entirely and trigger your fallback, as in the sketch after this list.
- Otherwise, run your normal retry-with-jitter loop.
This costs you one extra HTTP request per ~30 seconds during a burst, which is negligible, and it converts an opaque “the system seems slow” outage into a deterministic “the system is publicly degraded, fail fast” decision.
You can also subscribe to the RSS feed of incidents — pushing the question into your incident channel rather than polling — but polling is fine for most application-tier code.
Tracking your own 529 rate
Three metrics, easy to add to whatever you already have:
- `anthropic_request_status_total{status}` — counter, labeled by HTTP status. Watch the ratio of `529` to `200`.
- `anthropic_request_duration_seconds` — histogram, labeled by `status` so you can see whether 529 retries are eating your tail latency.
- `anthropic_circuit_breaker_state` — gauge, 0/1. Watch how often you trip.
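In a Node/TypeScript service, wiring these up with `prom-client` is a few lines (the usage comments assume the hypothetical `callClaude` and `breaker` from earlier):

```ts
import { Counter, Histogram, Gauge } from "prom-client";

const requestStatus = new Counter({
  name: "anthropic_request_status_total",
  help: "Anthropic API responses by HTTP status",
  labelNames: ["status"],
});

const requestDuration = new Histogram({
  name: "anthropic_request_duration_seconds",
  help: "Anthropic API request duration in seconds, including retries",
  labelNames: ["status"],
});

const breakerState = new Gauge({
  name: "anthropic_circuit_breaker_state",
  help: "1 when the circuit breaker is open, 0 when closed",
});

// In the call path:
//   const end = requestDuration.startTimer();
//   const res = await callClaude();
//   end({ status: String(res.status) });
//   requestStatus.inc({ status: String(res.status) });
//   breakerState.set(breaker.allow() ? 0 : 1);
```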
If your 529 rate ever crosses ~5% of total requests sustained over a five-minute window, your retry policy is adding load to an upstream that is already struggling. Time to look at the fallback path.
What 529 is not
A few common misreads worth flagging:
- 529 is not a quota error. Asking for a quota increase will not reduce it. The throttling that quota controls is 429, not 529.
- 529 is not a model error. It does not mean your prompt was malformed, your tools were misconfigured, or anything else at the prompt layer. Resending the same exact request after a backoff is a valid recovery.
- 529 is not always Claude’s fault. Some 529s on the api.anthropic.com surface are upstream infrastructure congestion (CDN edges, regional networking) that no amount of model capacity would fix.
What we do on this site
Internally, the dashboard you are reading polls the public Statuspage feed every two minutes and surfaces the API component’s current indicator at the top of the page. If you are seeing 529s right now and the API component is green here, your retry-with-jitter policy is doing the right thing. If the API component is amber or red, your fallback path is the right one. The whole point of an external status dashboard is to give you that decision in less than a second.
Most of the time, the answer is: retry with jitter, cap at 15 seconds wall-clock, and move on. The harder discipline is knowing when to stop retrying. That is the only thing that separates a code path that recovers from one that amplifies the outage.