If you call the Anthropic API in production, you will eventually meet every HTTP status code in this table. Each means something specific, and each has a different right answer at the client side. Treating them all the same — the most common mistake — produces code that retries the wrong errors, gives up on the right ones, and makes outages worse.
This is a working reference. Pair it with the 529 Overloaded playbook for the deeper treatment of the most common load-related code.
The full table
| Code | Meaning at Anthropic | Retry? | Action |
|---|---|---|---|
| 200 | Success | n/a | Use the response. |
| 400 | Invalid request (malformed payload) | No | Fix the request, alert in your code. |
| 401 | Unauthorized (bad / missing API key) | No | Refresh credentials, alert ops. |
| 403 | Forbidden (key valid but lacks permission) | No | Check organization permissions. |
| 404 | Not found (typically wrong endpoint) | No | Fix the URL. |
| 413 | Payload too large | No | Reduce payload (truncate context). |
| 422 | Unprocessable entity (semantic validation) | No | Fix the request. |
| 429 | Rate limited (your quota exceeded) | Yes | Backoff with jitter; consider quota increase. |
| 500 | Server error | Cautiously | Retry once with delay; report if persistent. |
| 502 | Bad gateway | Yes | Backoff with jitter. |
| 503 | Service unavailable (deploys, edge issues) | Yes | Backoff with jitter. |
| 504 | Gateway timeout | Yes | Backoff, but consider request shape — long contexts can naturally hit this. |
| 529 | Overloaded | Yes, with discipline | See 529 playbook. |
The rest of this post walks through the codes that have non-obvious behavior.
4xx — your fault, do not retry
The 4xx range is the API telling you that the request itself is wrong. Retrying without changing the request is meaningless — it will fail the same way every time, just incurring more cost and noise.
400 vs 422
Both indicate “your request is wrong,” but they draw the line differently.
- 400 is the catch-all for syntactic problems: malformed JSON, missing required fields, fields with wrong types.
- 422 is the catch-all for semantic problems: the JSON is parseable and structurally valid, but the values are not allowed (e.g., an empty messages array, a tools array with a malformed tool definition, a model name that does not exist).
Practically, the distinction does not change your action — both require a code fix. But it does help your alerting: 400 spikes usually mean a deploy regressed your request construction; 422 spikes often mean the API has changed its accepted shapes (rare, but it happens during major version transitions).
401 and 403 are different
- 401 is “we do not know who you are.” Bad or missing API key. Treat as a credentials problem.
- 403 is “we know who you are and you are not allowed to do this.” Common cause: trying to use a model your organization does not have access to, or a feature gated behind a billing tier you do not have.
Code that treats 401 and 403 identically (e.g., “refresh the API key”) will keep failing on 403 because the key is fine — the permission is not.
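A minimal sketch of that distinction in client code — the function name and return values are illustrative, not part of any SDK:

```python
def classify_auth_error(status: int) -> str:
    """Map an auth-related status to the remediation it actually needs.

    401: the key itself is bad or missing -> rotate/refresh credentials.
    403: the key is fine but lacks permission -> fix org/model access.
         Refreshing the key on a 403 just loops forever, because the
         key was never the problem.
    """
    if status == 401:
        return "refresh_credentials"
    if status == 403:
        return "check_permissions"
    return "not_an_auth_error"
```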
413 is sneakier than it looks
The Anthropic API accepts very long inputs, especially for the long-context models. But payload size is bounded — by message-count limits, by token-count limits, and by raw byte limits on the HTTP body.
A 413 response usually means you exceeded the byte limit, which can happen if you are passing very large image attachments inline rather than as references. The fix is to compress, downsample, or use the Files API for large attachments rather than embedding them in the message.
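You can catch the byte-limit case before the API does by measuring the serialized body. A sketch — the actual limit is not published here, so it is passed in as a parameter rather than hard-coded:

```python
import json

def request_too_large(payload: dict, limit_bytes: int) -> bool:
    """Estimate the HTTP body size before sending.

    Inline base64 images inflate the JSON body far beyond the visible
    text, which is how requests hit 413 unexpectedly. Measuring the
    serialized payload catches that client-side.
    """
    body = json.dumps(payload).encode("utf-8")
    return len(body) > limit_bytes
```

If this returns True, compress or downsample the attachment, or move it out of the message body, before sending.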
429 has structure
Rate-limit responses include headers describing the limit you hit:
- `anthropic-ratelimit-requests-limit` — your per-minute request budget.
- `anthropic-ratelimit-requests-remaining` — how much is left in the current window.
- `anthropic-ratelimit-tokens-limit` and the corresponding `-remaining` — your token-per-minute budget.
- `retry-after` — seconds until the limit window resets.
Honor `retry-after`. Do not back off for less than the value it specifies. Most clients implement this correctly; some quickstart-derived code does not, which produces a thundering herd of clients retrying milliseconds after a 429.
429 is conceptually distinct from 529 — 429 says “you specifically have used your quota,” 529 says “the system is overloaded for everyone.” Asking for a quota increase fixes 429. It does not fix 529.
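A sketch of a 429-aware delay function, treating `retry-after` as authoritative and falling back to full jitter when it is absent (function name and defaults are illustrative):

```python
import random

def backoff_after_429(headers: dict, attempt: int,
                      base: float = 1.0, cap: float = 60.0) -> float:
    """Pick a wait time after a 429.

    retry-after, when present, is authoritative: never wait less than
    it. Without it, fall back to exponential backoff with full jitter
    so a fleet of clients does not retry in lockstep.
    """
    retry_after = headers.get("retry-after")
    if retry_after is not None:
        return float(retry_after)
    return random.uniform(0, min(cap, base * 2 ** attempt))
```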
5xx — their fault, retry with discipline
The 5xx range is the API telling you it failed for reasons unrelated to the shape of your request. Most are retryable; the discipline is in how.
500 is the catch-all
A 500 from the Anthropic API is rare and usually short-lived. It typically means an unhandled exception inside the API server — not a routine load problem. Retry once with a small delay (a few seconds). If the second request also returns 500, the underlying issue is real and your retry is unlikely to help; report and fall back.
The most common 500 patterns we have seen in customer error logs:
- Bursts during deploy windows, lasting under a minute.
- Sustained 500s on a specific endpoint while others work — usually indicates a partial-deploy issue and is fixed by Anthropic within minutes.
- A small floor of 500s during normal operation — typically less than 0.01% of requests, and not worth alerting on individually.
502 and 503 are routing problems
- 502 is “the gateway could not reach the backend.” Often during deploy rollovers.
- 503 is “the service is temporarily unavailable.” Could be planned (rare for the Anthropic API) or unplanned.
Both should be retried with exponential backoff and full jitter, as described in the 529 playbook. The same retry policy — and the same code — handles 502, 503, and 529.
504 has a request-shape signal
A 504 says the gateway timed out waiting for the backend. This can be a server-side problem, but it can also be a request-shape problem. If you are sending a very long context with many tool turns, the model can take genuinely long to produce a complete response, and the gateway has a finite patience.
If your 504s correlate with your longest-context requests, the problem may be on your side: chunk the work, use streaming, or reduce the context length. If your 504s are uncorrelated with your request shape, treat them like 502/503.
529 deserves its own discipline
We have written about 529 at length elsewhere. The short version: retry with exponential backoff and full jitter, capped on wall-clock budget rather than attempt count, with a circuit breaker that opens during sustained 529 bursts.
A unified retry policy
The simplest production-grade policy:
    On status code:
      200     → use response
      400-422 → do not retry, alert in code
      429     → honor retry-after, then retry up to budget
      500-504 → exponential backoff with full jitter, cap at 15s wall clock
      529     → same as 500-504, but check upstream status for early fail-fast
      other   → log, alert, do not retry
The 80% case is exactly this. Layer on a circuit breaker that opens if the rolling rate of 5xx exceeds a threshold (e.g., 30% of the last 30 seconds of requests), and you have a robust client.
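The policy above can be sketched as a single retry loop. This is a minimal illustration, not an SDK API: `send` stands in for your actual HTTP call and is assumed to return `(status, headers, body)`; the circuit breaker is omitted for brevity.

```python
import random
import time

RETRYABLE = {429, 500, 502, 503, 504, 529}

def call_with_retry(send, *, budget_s: float = 15.0, base: float = 0.5):
    """Run send() under the unified policy.

    Retries are capped by wall-clock budget, not attempt count. Any 4xx
    other than 429 fails immediately: retrying an invalid request
    cannot succeed. retry-after, when present, overrides the jittered
    backoff.
    """
    deadline = time.monotonic() + budget_s
    attempt = 0
    while True:
        status, headers, body = send()
        if status == 200:
            return body
        if status not in RETRYABLE:
            raise RuntimeError(f"non-retryable status {status}")
        retry_after = headers.get("retry-after")
        delay = (float(retry_after) if retry_after is not None
                 else random.uniform(0, base * 2 ** attempt))
        if time.monotonic() + delay > deadline:
            raise TimeoutError(f"retry budget exhausted at status {status}")
        time.sleep(delay)
        attempt += 1
```

Note the wall-clock cap: a slow sequence of long waits exhausts the budget just as surely as many fast attempts, which is the behavior you want under sustained overload.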
When to call humans
Some failures are not fixed by waiting. Page on-call when:
- Sustained 500-level errors for more than 5 minutes against a single endpoint.
- 401 spikes that suggest a credentials rotation issue on your side.
- Unexpected 429s — your traffic shape changed and you may need to ramp quota.
- Persistent 422 after a deploy — your request construction regressed.
Do not page on-call for routine 529 or 503 bursts that resolve within a few minutes. Those are the system doing what it does. Your retry-with-jitter handles them; humans add no value above the automated policy.
How this dashboard helps
Two specific decisions you can make faster with the dashboard open:
- Should I retry or fall back? Read the API component on the homepage. If green and you are seeing 5xx, the issue is local or very narrow — retry is fine. If amber or red, your retries are competing with thousands of other clients’ retries — switch to fallback.
- Is this a known issue? Read the active incident card or scroll the historical events feed. If your 5xx pattern matches an open incident, the right wait time is “until Anthropic posts Resolved.” If your 5xx pattern does not match anything, you may be looking at a local issue.
The dashboard is not a debugger. But it can answer the binary question — known incident vs. local issue — in under a second, and that decision changes the rest of your response.
Beyond HTTP — the response body
A 200 status code does not mean the response is what you asked for. The response body can still contain:
- A `stop_reason` of `max_tokens` — the response was truncated; you may need to retry with a larger budget.
- A `stop_reason` of `tool_use` when you did not expect tool use — your prompt may have triggered a tool call you need to handle.
- A content array that contains a refusal — the model declined to answer; not a failure of the API, a behavior of the model.
These are not errors at the HTTP layer. They are API-shaped responses that your application logic needs to handle. The HTTP error code reference above gets you most of the way; the response-body inspection is the rest.
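A sketch of that semantic-layer check, treating the response as a plain dict shaped like the cases above. The exact shape of a refusal is not specified here; this sketch assumes a content block with a `"refusal"` type, which you should adjust to match the actual response shape you observe.

```python
def check_body(resp: dict) -> str:
    """Classify a 200 response body; a 200 is not automatically 'done'."""
    reason = resp.get("stop_reason")
    if reason == "max_tokens":
        return "truncated"          # consider retrying with a larger budget
    if reason == "tool_use":
        return "needs_tool_result"  # run the tool, send the result back
    # Assumed shape: refusal surfaces as a content block type.
    if any(block.get("type") == "refusal" for block in resp.get("content", [])):
        return "refused"            # model behavior, not an API failure
    return "complete"
```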
Treat HTTP errors as transport-layer signals and response-body fields as semantic-layer signals. They are independent, and a complete client handles both.