Claude API error code reference — when to retry, when to fail fast, when to call humans

If you call the Anthropic API in production, you will eventually meet every HTTP status code in the table below. Each means something specific, and each calls for a different response on the client side. Treating them all the same is the most common mistake: it produces code that retries the wrong errors, gives up on the right ones, and makes outages worse.

This is a working reference. Pair it with the 529 Overloaded playbook for the deeper treatment of the most common load-related code.

The full table

| Code | Meaning at Anthropic | Retry? | Action |
| --- | --- | --- | --- |
| 200 | Success | n/a | Use the response. |
| 400 | Invalid request (malformed payload) | No | Fix the request, alert in your code. |
| 401 | Unauthorized (bad / missing API key) | No | Refresh credentials, alert ops. |
| 403 | Forbidden (key valid but lacks permission) | No | Check organization permissions. |
| 404 | Not found (typically wrong endpoint) | No | Fix the URL. |
| 413 | Payload too large | No | Reduce payload (truncate context). |
| 422 | Unprocessable entity (semantic validation) | No | Fix the request. |
| 429 | Rate limited (your quota exceeded) | Yes | Backoff with jitter; consider quota increase. |
| 500 | Server error | Cautiously | Retry once with delay; report if persistent. |
| 502 | Bad gateway | Yes | Backoff with jitter. |
| 503 | Service unavailable (deploys, edge issues) | Yes | Backoff with jitter. |
| 504 | Gateway timeout | Yes | Backoff, but consider request shape; long contexts can naturally hit this. |
| 529 | Overloaded | Yes, with discipline | See 529 playbook. |
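
One way to keep the table from drifting away from the code is to encode it as a lookup. The following is a minimal sketch; `Policy`, `POLICIES`, and `policy_for` are hypothetical names chosen for illustration, and the three retry labels ("no", "once", "backoff") are a simplification of the Retry? column.

```python
from typing import NamedTuple


class Policy(NamedTuple):
    retry: str   # "no", "once", or "backoff"
    action: str  # short reminder of the client-side action


# Hypothetical encoding of the table above, for non-2xx statuses only.
POLICIES = {
    400: Policy("no", "fix the request, alert in code"),
    401: Policy("no", "refresh credentials, alert ops"),
    403: Policy("no", "check organization permissions"),
    404: Policy("no", "fix the URL"),
    413: Policy("no", "reduce payload"),
    422: Policy("no", "fix the request"),
    429: Policy("backoff", "honor retry-after; consider quota increase"),
    500: Policy("once", "retry once with delay; report if persistent"),
    502: Policy("backoff", "backoff with jitter"),
    503: Policy("backoff", "backoff with jitter"),
    504: Policy("backoff", "backoff; check request shape"),
    529: Policy("backoff", "backoff with discipline"),
}

# Anything not listed is treated as unknown: log it, alert, and fail fast.
UNKNOWN = Policy("no", "log, alert, do not retry")


def policy_for(status: int) -> Policy:
    """Return the policy for a non-2xx status; unlisted codes fail fast."""
    return POLICIES.get(status, UNKNOWN)
```

Defaulting unlisted codes to "do not retry" is a deliberate choice in this sketch: an unexpected status is safer to surface than to retry blindly.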

The rest of this post walks through the codes that have non-obvious behavior.

4xx — your fault, do not retry

The 4xx range is the API telling you that the request itself is wrong. With the exception of 429, covered below, retrying without changing the request is pointless: it will fail the same way every time and only add cost and noise.

400 vs 422

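Both indicate that the request is wrong, but they flag different kinds of problem: as in the table above, 400 means the payload is malformed, while 422 means the payload parsed but failed semantic validation.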

Practically, the distinction does not change your action — both require a code fix. But it does help your alerting: 400 spikes usually mean a deploy regressed your request construction; 422 spikes often mean the API has changed its accepted shapes (rare, but it happens during major version transitions).

401 and 403 are different

Code that treats 401 and 403 identically (e.g., “refresh the API key”) will keep failing on 403 because the key is fine — the permission is not.

413 is sneakier than it looks

The Anthropic API accepts very long inputs, especially for the long-context models. But payload size is bounded — by message-count limits, by token-count limits, and by raw byte limits on the HTTP body.

A 413 response usually means you exceeded the byte limit, which can happen if you are passing very large image attachments inline rather than as references. The fix is to compress, downsample, or use the Files API for large attachments rather than embedding them in the message.
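
A client can catch oversized bodies before sending them. The sketch below measures the JSON-encoded size of a request and compares it against a limit. The 32 MB default is an assumption for illustration, not a value stated in this post, so check the current API documentation; `body_size_bytes` and `fits` are hypothetical helper names.

```python
import json

# Assumed ceiling for illustration only; confirm the real request-size
# limit in the current API documentation.
MAX_BODY_BYTES = 32 * 1024 * 1024


def body_size_bytes(payload: dict) -> int:
    """Approximate size of the JSON-encoded request body in bytes."""
    return len(json.dumps(payload).encode("utf-8"))


def fits(payload: dict, limit: int = MAX_BODY_BYTES) -> bool:
    """True if the encoded payload is within the byte limit."""
    return body_size_bytes(payload) <= limit
```

The result is an estimate, because your HTTP client may serialize the payload slightly differently. It is still enough to flag an inline base64 image that pushes the body far past the limit.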

429 has structure

Rate-limit responses include headers describing the limit you hit, among them retry-after.

Honor retry-after. Do not wait less than the value it specifies. Most clients implement this correctly; some quickstart-derived code does not, which produces a thundering herd of clients that wait only milliseconds before retrying after a 429.
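
A minimal sketch of honoring the header, assuming the response headers are available as a mapping with lowercase keys (many HTTP clients expose a case-insensitive mapping instead). The helper names are hypothetical. The header is parsed both as a number of seconds and as an HTTP date, since the HTTP specification allows either form.

```python
import email.utils
import time
from typing import Mapping, Optional


def retry_after_seconds(headers: Mapping[str, str],
                        now: Optional[float] = None) -> Optional[float]:
    """Parse retry-after as seconds or an HTTP date; None if absent or invalid."""
    raw = headers.get("retry-after")
    if raw is None:
        return None
    raw = raw.strip()
    try:
        return max(0.0, float(raw))
    except ValueError:
        pass  # not a number; try the HTTP-date form next
    try:
        when = email.utils.parsedate_to_datetime(raw)
    except (TypeError, ValueError):
        return None
    now = time.time() if now is None else now
    return max(0.0, when.timestamp() - now)


def wait_before_retry(headers: Mapping[str, str], backoff: float) -> float:
    """Never wait less than retry-after; otherwise use our own backoff."""
    server = retry_after_seconds(headers)
    return backoff if server is None else max(server, backoff)
```

Taking the maximum of the server's value and the local backoff means the client can wait longer than the server asked, but never shorter.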

429 is conceptually distinct from 529 — 429 says “you specifically have used your quota,” 529 says “the system is overloaded for everyone.” Asking for a quota increase fixes 429. It does not fix 529.

5xx — their fault, retry with discipline

The 5xx range is the API telling you it failed for reasons unrelated to the shape of your request. Most are retryable; the discipline is in how.

500 is the catch-all

A 500 from the Anthropic API is rare and usually short-lived. It typically means an unhandled exception inside the API server — not a routine load problem. Retry once with a small delay (a few seconds). If the second request also returns 500, the underlying issue is real and your retry is unlikely to help; report and fall back.
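
The retry-once rule is small enough to sketch directly. Here `send` is a hypothetical zero-argument callable that performs the request and returns a `(status, body)` pair; the two-second default delay is one reading of "a few seconds", not a recommended value.

```python
import time
from typing import Callable, Tuple


def call_with_single_retry(send: Callable[[], Tuple[int, object]],
                           delay_s: float = 2.0,
                           sleep: Callable[[float], None] = time.sleep):
    """Call send(); on a 500, wait delay_s and try exactly once more."""
    status, body = send()
    if status != 500:
        return status, body
    sleep(delay_s)
    # A second 500 is returned unchanged so the caller can report and fall back.
    return send()
```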

The most common 500 patterns we have seen in customer error logs:

502 and 503 are routing problems

Both should be retried with exponential backoff and full jitter, as described in the 529 playbook. The same retry policy, and the same code, can handle 502, 503, and 529.
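
For readers who have not seen it written out, full jitter means drawing the delay uniformly between zero and the exponential ceiling, rather than adding a small random offset to a fixed delay. A minimal sketch follows; the base and cap values are assumptions, not values from this post.

```python
import random


def full_jitter_delay(attempt: int, base_s: float = 0.5,
                      cap_s: float = 8.0) -> float:
    """Delay before retry number `attempt` (0-based).

    The ceiling doubles with each attempt up to cap_s, and the actual
    delay is drawn uniformly from [0, ceiling].
    """
    ceiling = min(cap_s, base_s * (2 ** attempt))
    return random.uniform(0.0, ceiling)
```

Because every client draws a different delay, retries from many clients spread out over the window instead of arriving together.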

504 has a request-shape signal

A 504 says the gateway timed out waiting for the backend. This can be a server-side problem, but it can also be a request-shape problem. If you are sending a very long context with many tool turns, the model can take genuinely long to produce a complete response, and the gateway has finite patience.

If your 504s correlate with your longest-context requests, the problem may be on your side: chunk the work, use streaming, or reduce the context length. If your 504s are uncorrelated with your request shape, treat them like 502/503.
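
A rough way to check for that correlation in your own logs is to compare the typical input size of requests that returned 504 against the rest. The sketch below assumes you log `(input_tokens, status)` pairs. The function name and the 2x ratio are illustrative assumptions, and a comparison of medians is a heuristic, not a statistical test.

```python
from statistics import median
from typing import Iterable, Tuple


def timeouts_skew_long(samples: Iterable[Tuple[int, int]],
                       ratio: float = 2.0) -> bool:
    """samples: (input_tokens, status) pairs from your request logs.

    True if the median 504 request is at least `ratio` times longer
    than the median request that did not return 504.
    """
    rows = list(samples)
    timed_out = [tokens for tokens, status in rows if status == 504]
    others = [tokens for tokens, status in rows if status != 504]
    if not timed_out or not others:
        return False  # not enough data to say anything
    return median(timed_out) >= ratio * median(others)
```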

529 deserves its own discipline

We have written about 529 at length elsewhere. The short version: retry with exponential backoff and full jitter, capped on wall-clock budget rather than attempt count, with a circuit breaker that opens during sustained 529 bursts.
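
Capping on wall-clock budget rather than attempt count can look like the following sketch. `send` is a hypothetical callable returning `(status, body)`, the budget and backoff parameters are assumptions, and the circuit breaker is omitted here to keep the loop readable.

```python
import random
import time
from typing import Callable, Tuple


def retry_within_budget(send: Callable[[], Tuple[int, object]],
                        budget_s: float = 15.0,
                        base_s: float = 0.5,
                        cap_s: float = 8.0,
                        clock: Callable[[], float] = time.monotonic,
                        sleep: Callable[[float], None] = time.sleep):
    """Retry send() on 529 until it succeeds or the time budget is spent."""
    deadline = clock() + budget_s
    attempt = 0
    while True:
        status, body = send()
        if status != 529:
            return status, body
        # Full jitter: uniform between zero and the exponential ceiling.
        delay = random.uniform(0.0, min(cap_s, base_s * 2 ** attempt))
        if clock() + delay > deadline:
            return status, body  # budget exhausted: surface the 529
        sleep(delay)
        attempt += 1
```

The loop never counts attempts. A fast-failing endpoint gets many quick retries, a slow one gets few, and the caller's total wait stays bounded either way.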

A unified retry policy

The simplest production-grade policy:

On status code:
  200       → use response
  400-422   → do not retry, alert in code
  429       → honor retry-after, then retry up to budget
  500       → retry once after a short delay; report if it persists
  502-504   → exponential backoff with full jitter, cap at 15s wall clock
  529       → same as 502-504, but check upstream status for early fail-fast
  other     → log, alert, do not retry

This covers the 80% case. Layer on a circuit breaker that opens if the rolling rate of 5xx exceeds a threshold (e.g., 30% of requests over the last 30 seconds), and you have a robust client.
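
A rolling-window circuit breaker of the kind described can be sketched in a few lines. `RollingBreaker` is a hypothetical class. The `min_samples` guard is an added assumption so that a single failure on a quiet client does not open the breaker, and there is no half-open state: the breaker closes by itself as old failures age out of the window.

```python
import time
from collections import deque
from typing import Callable


class RollingBreaker:
    """Open when the 5xx share over the last window_s seconds exceeds threshold."""

    def __init__(self, window_s: float = 30.0, threshold: float = 0.30,
                 min_samples: int = 10,
                 clock: Callable[[], float] = time.monotonic):
        self.window_s = window_s
        self.threshold = threshold
        self.min_samples = min_samples
        self.clock = clock
        self.events = deque()  # (timestamp, is_5xx) pairs, oldest first

    def record(self, status: int) -> None:
        """Record the status of a completed request."""
        self.events.append((self.clock(), 500 <= status <= 599))
        self._evict()

    def _evict(self) -> None:
        """Drop events that have fallen out of the rolling window."""
        cutoff = self.clock() - self.window_s
        while self.events and self.events[0][0] < cutoff:
            self.events.popleft()

    def is_open(self) -> bool:
        """True if callers should stop sending and use their fallback."""
        self._evict()
        if len(self.events) < self.min_samples:
            return False
        failures = sum(1 for _, bad in self.events if bad)
        return failures / len(self.events) > self.threshold
```

A caller would check `is_open()` before each request and call `record(status)` after each response.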

When to call humans

Some failures are not fixed by waiting. Page on-call when:

Do not page on-call for routine 529 or 503 bursts that resolve within a few minutes. Those are the system doing what it does. Your retry-with-jitter handles them; humans add nothing beyond the automated policy.

How this dashboard helps

Two specific decisions you can make faster with the dashboard open:

  1. Should I retry or fall back? Read the API component on the homepage. If green and you are seeing 5xx, the issue is local or very narrow — retry is fine. If amber or red, your retries are competing with thousands of other clients’ retries — switch to fallback.
  2. Is this a known issue? Read the active incident card or scroll the historical events feed. If your 5xx pattern matches an open incident, the right wait time is “until Anthropic posts Resolved.” If your 5xx pattern does not match anything, you may be looking at a local issue.

The dashboard is not a debugger. But it can answer the binary question — known incident vs. local issue — in under a second, and that decision changes the rest of your response.
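
The first decision above reduces to a small function. This sketch assumes you supply the API component's color yourself after reading the dashboard; the function name and the return labels are hypothetical, and an unrecognized color is treated conservatively as a reason to fall back.

```python
def retry_or_fallback(api_component: str, seeing_5xx: bool) -> str:
    """api_component: 'green', 'amber', or 'red', as read from the dashboard."""
    if not seeing_5xx:
        return "proceed"
    if api_component == "green":
        return "retry"     # issue is local or very narrow
    return "fallback"      # amber, red, or unknown: retries compete with everyone else's
```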

Beyond HTTP — the response body

A 200 status code does not mean the response is what you asked for. The response body can still contain:

These are not errors at the HTTP layer. They are API-shaped responses that your application logic needs to handle. The HTTP error code reference above gets you most of the way; the response-body inspection is the rest.

Treat HTTP errors as transport-layer signals and response-body fields as semantic-layer signals. They are independent, and a complete client handles both.
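
As an illustration of a semantic-layer check, the sketch below inspects a parsed 200 response from the Messages API. The `stop_reason` and `content` fields are part of that response shape; which conditions deserve handling in your application is an assumption of this example, and `body_signal` is a hypothetical name.

```python
from typing import Optional


def body_signal(message: dict) -> Optional[str]:
    """Describe anything in a 200 response that needs handling, or None."""
    stop_reason = message.get("stop_reason")
    if stop_reason == "max_tokens":
        return "output truncated at max_tokens"
    if stop_reason == "tool_use":
        return "model is requesting a tool call, not giving a final answer"
    if not message.get("content"):
        return "empty content"
    return None
```

A caller would run this after the HTTP status check passes and branch on the result, for example by raising the token limit, executing the requested tool, or logging the empty reply.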
