Field notes on Claude — uptime, errors, latency, and the math behind the dashboard.
Working posts, written when something on the dashboard is interesting enough to need explaining at length. Topics rotate among methodology, playbooks, and observations from running a continuous global probe of claude.ai.
Two official surfaces from Anthropic carry different kinds of operational information. The Trust Center talks to compliance and procurement; the Statuspage talks to engineers in the middle of an incident. A guide to reading both.
The marketing page renders fine; the chat interface spins forever. Latency probes look green; the dashboard is bored. Why this gap exists, what causes it, and how to read it.
Uptime numbers feel small until you convert them to real time. A guided tour from 99% to 99.999%, with the budget each percentage gives you per month, per quarter, and per year.
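The conversion behind that post is a one-liner. A minimal sketch, assuming a 30-day month and a 365-day year (function and variable names are illustrative):

```python
# Hypothetical helper: convert an uptime percentage into an allowed
# downtime budget for a given period.

def downtime_budget(uptime_pct: float, period_hours: float) -> float:
    """Allowed downtime in minutes for a given uptime percentage."""
    return (1 - uptime_pct / 100) * period_hours * 60

MONTH = 30 * 24    # hours in a 30-day month
YEAR = 365 * 24    # hours in a 365-day year

for pct in (99.0, 99.9, 99.99, 99.999):
    print(f"{pct}% -> {downtime_budget(pct, MONTH):.1f} min/month, "
          f"{downtime_budget(pct, YEAR):.1f} min/year")
```

At 99%, the budget is over seven hours a month; at 99.999%, it is under half a minute.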
When Claude is having a bad hour, can you fall back to GPT or Gemini and keep your product running? The architectures that work, the ones that look like they work but do not, and the cost of resilience.
A working reference of every HTTP status code the Anthropic API returns, what each actually means at the model layer, and the right client-side reaction. Pairs with the 529 playbook.
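The core of such a reference is a retry decision. A sketch of the client-side mapping, under the assumption that 429/529 and transient 5xx are retryable while most 4xx codes are not; consult Anthropic's API documentation for the authoritative list:

```python
# Illustrative mapping from API status codes to a client reaction.
# The classifications here are assumptions for the sketch.

RETRYABLE = {429, 500, 502, 503, 529}   # overload or transient server error
FATAL = {400, 401, 403, 404, 413}       # fix the request; retrying won't help

def reaction(status: int) -> str:
    if 200 <= status < 300:
        return "ok"
    if status in RETRYABLE:
        return "retry-with-backoff"
    if status in FATAL:
        return "fail-fast"
    return "log-and-fail"               # unknown code: treat conservatively
```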
If your product depends on Claude and Claude is having a bad day, your users want to know within minutes — and they want to know it in plain language. Six templates, one for each stage of an incident, with the parts that matter.
An 80-character limit on the community report description was not a typewriter joke. It was a deliberate choice that shaped the moderation profile, the spam profile, and the readability of the sidebar.
When you sample three probe nodes per country, you have a choice: report the best, the median, the worst, or some average. Each tells you a different thing. The choice we made and why.
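The three options differ in one line each. A minimal sketch of the aggregation choice (not this site's actual pipeline):

```python
# Given three probe latencies for one country, each summary statistic
# answers a different question.
from statistics import median

def summarize(samples: list[float]) -> dict[str, float]:
    return {
        "best": min(samples),       # the network at its luckiest
        "median": median(samples),  # closest to a typical user's experience
        "worst": max(samples),      # the tail a retry will actually hit
    }
```

With samples of 120 ms, 300 ms, and 140 ms, "best" hides the outlier, "worst" is dominated by it, and the median reports 140.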
Statuspage incidents pass through five states, and each state's update is written for a different audience. A guide to reading them in order, what to expect from each, and which are worth the time.
Claude Code joined the public Statuspage as a separate component in mid-2025. A reading of how its uptime profile differs from the API and the chat surface, and what that tells you about CLI tool reliability.
The /rss.xml endpoint on this site is not a blog feed. It is the cheapest way to wire Claude incident updates into a Slack channel, a PagerDuty rule, or your own dashboard. The design choices that made it useful.
The thirty colored squares under each component encode more information than they appear to. A guide to reading them, the partial-day rule, and what each shade actually means.
Most of what a status dashboard does is record the absence of news. Here is what the data actually looks like when nothing is wrong, and why that baseline matters more than the incident days.
A status dashboard that is itself slow during outages is worse than no dashboard at all. The two-layer caching choice that keeps every page render under 100ms even when upstream Statuspage is being hammered.
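The key property is serving a stale copy when upstream fails rather than failing with it. A toy serve-stale cache illustrating the idea; names and structure are illustrative, not this site's actual implementation:

```python
# Serve-stale cache: return fresh data within the TTL, refresh when it
# expires, and fall back to the last good copy if the refresh fails.
import time

class ServeStaleCache:
    def __init__(self, fetch, ttl: float = 60.0):
        self.fetch, self.ttl = fetch, ttl
        self.value, self.fetched_at = None, 0.0

    def get(self):
        if self.value is not None and time.time() - self.fetched_at < self.ttl:
            return self.value           # fresh enough: serve from memory
        try:
            self.value = self.fetch()   # refresh from upstream
            self.fetched_at = time.time()
        except Exception:
            pass                        # upstream down: serve the stale copy
        return self.value
```

The effect: when upstream Statuspage is slow or erroring, readers see slightly old data instantly instead of sharing the outage.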
Anonymous, rate-limited, three-day rolling. The sidebar on the homepage looks like noise until you watch it during a real incident. A guide to reading the signal in user reports.
All three frontier-model providers publish a public status page. They communicate very different things. A comparison of what each tells you, what each leaves out, and which to trust during an outage.
The four impact levels Anthropic uses on its public Statuspage are tightly defined, but most users misread them. A guided tour, with examples, of what each level looks like in practice and which ones to actually worry about.
What it actually takes to reach claude.ai from Singapore, São Paulo, Frankfurt, or Mumbai — measured every five minutes for thirty days, with the regional anomalies that show up only when you sample widely enough.
What HTTP 529 actually means at the Anthropic API layer, when to retry vs. fail fast, the backoff math that minimizes your bill without making the spike worse, and how to correlate 529 bursts with the public uptime feed.
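The backoff math in question is usually some variant of exponential backoff with jitter. A sketch of the "full jitter" pattern; the base and cap values are illustrative, not recommendations from the post:

```python
# Full-jitter exponential backoff: the delay grows exponentially with the
# attempt number but is randomized over [0, ceiling], which spreads
# retries out instead of letting clients stampede in sync.
import random

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Delay in seconds before retry number `attempt` (0-based)."""
    return random.uniform(0, min(cap, base * 2 ** attempt))
```

Randomizing over the whole window, rather than adding a small jitter on top, is what prevents a burst of 529s from turning into a synchronized retry wave that prolongs the spike.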
A walkthrough of interval-merge downtime math, with a worked example from a real Claude incident, and a comparison with how AWS, GCP, and OpenAI report uptime in their marketing.
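The core of interval-merge downtime math: overlapping outage windows must be merged before summing, or concurrent incidents double-count the same minutes. A minimal sketch:

```python
# Merge overlapping (start, end) outage windows, then sum their lengths.
def merged_downtime(intervals: list[tuple[float, float]]) -> float:
    """Total downtime from (start, end) pairs, overlaps counted once."""
    total, cur_start, cur_end = 0.0, None, None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:  # gap: close the previous run
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:                                   # overlap: extend the run
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total
```

Two incidents spanning minutes 0–10 and 5–15 are 15 minutes of downtime, not 20; naive summing is one of the ways marketing uptime numbers drift from measured ones.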