Field notes on Claude — uptime, errors, latency, and the math behind the dashboard.
Working posts, written when something on the dashboard is interesting enough to need explaining at length. Topics rotate among methodology, playbooks, and observations from running a continuous global probe of claude.ai.
Two official surfaces from Anthropic carry different kinds of operational information. The Trust Center talks to compliance and procurement; the Statuspage talks to engineers in the middle of an incident. A guide to reading both.
The marketing page renders fine; the chat interface spins forever. Latency probes look green; the dashboard is bored. Why this gap exists, what causes it, and how to read it.
Uptime numbers feel small until you convert them to real time. A guided tour from 99% to 99.999%, with the budget each percentage gives you per month, per quarter, and per year.
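The conversion behind that post is a one-liner. A minimal sketch, assuming a 30-day month and a 365-day year (function and variable names are illustrative):

```python
# Hypothetical helper: convert an uptime percentage into an allowed
# downtime budget for a given period.

def downtime_budget(uptime_pct: float, period_hours: float) -> float:
    """Allowed downtime in minutes for a given uptime percentage."""
    return (1 - uptime_pct / 100) * period_hours * 60

MONTH = 30 * 24    # hours in a 30-day month
YEAR = 365 * 24    # hours in a 365-day year

for pct in (99.0, 99.9, 99.99, 99.999):
    print(f"{pct}% -> {downtime_budget(pct, MONTH):.1f} min/month, "
          f"{downtime_budget(pct, YEAR):.1f} min/year")
```

At 99%, the budget is over seven hours a month; at 99.999%, it is under half a minute.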
When Claude is having a bad hour, can you fall back to GPT or Gemini and keep your product running? The architectures that work, the ones that look like they work but do not, and the cost of resilience.
A working reference of every HTTP status code the Anthropic API returns, what each actually means at the model layer, and the right client-side reaction. Pairs with the 529 playbook.
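The core of such a reference is a retry decision. A sketch of the client-side mapping, under the assumption that 429/529 and transient 5xx are retryable while most 4xx codes are not; consult Anthropic's API documentation for the authoritative list:

```python
# Illustrative mapping from API status codes to a client reaction.
# The classifications here are assumptions for the sketch.

RETRYABLE = {429, 500, 502, 503, 529}   # overload or transient server error
FATAL = {400, 401, 403, 404, 413}       # fix the request; retrying won't help

def reaction(status: int) -> str:
    if 200 <= status < 300:
        return "ok"
    if status in RETRYABLE:
        return "retry-with-backoff"
    if status in FATAL:
        return "fail-fast"
    return "log-and-fail"               # unknown code: treat conservatively
```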
If your product depends on Claude and Claude is having a bad day, your users want to know within minutes — and they want to know it in plain language. Six templates, one for each stage of an incident, with the parts that matter.
An 80-character limit on the community report description was not a typewriter joke. It was a deliberate choice that shaped the moderation profile, the spam profile, and the readability of the sidebar.
When you sample three probe nodes per country, you have a choice: report the best, the median, the worst, or some average. Each tells you a different thing. The choice we made and why.
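The three options differ in one line each. A minimal sketch of the aggregation choice (not this site's actual pipeline):

```python
# Given three probe latencies for one country, each summary statistic
# answers a different question.
from statistics import median

def summarize(samples: list[float]) -> dict[str, float]:
    return {
        "best": min(samples),       # the network at its luckiest
        "median": median(samples),  # closest to a typical user's experience
        "worst": max(samples),      # the tail a retry will actually hit
    }
```

With samples of 120 ms, 300 ms, and 140 ms, "best" hides the outlier, "worst" is dominated by it, and the median reports 140.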
Statuspage incidents pass through five states, and each state's update is written for a different audience. A guide to reading them in order, what to expect from each, and which are worth the time.
Claude Code joined the public Statuspage as a separate component in mid-2025. A reading of how its uptime profile differs from the API and the chat surface, and what that tells you about CLI tool reliability.
The /rss.xml endpoint on this site is not a blog feed. It is the cheapest way to wire Claude incident updates into a Slack channel, a PagerDuty rule, or your own dashboard. The design choices that made it useful.
The thirty colored squares under each component encode more information than they appear to. A guide to reading them, the partial-day rule, and what each shade actually means.
Most of what a status dashboard does is record the absence of news. Here is what the data actually looks like when nothing is wrong, and why that baseline matters more than the incident days.
A status dashboard that is itself slow during outages is worse than no dashboard at all. The two-layer caching choice that keeps every page render under 100ms even when upstream Statuspage is being hammered.
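The key property is serving a stale copy when upstream fails rather than failing with it. A toy serve-stale cache illustrating the idea; names and structure are illustrative, not this site's actual implementation:

```python
# Serve-stale cache: return fresh data within the TTL, refresh when it
# expires, and fall back to the last good copy if the refresh fails.
import time

class ServeStaleCache:
    def __init__(self, fetch, ttl: float = 60.0):
        self.fetch, self.ttl = fetch, ttl
        self.value, self.fetched_at = None, 0.0

    def get(self):
        if self.value is not None and time.time() - self.fetched_at < self.ttl:
            return self.value           # fresh enough: serve from memory
        try:
            self.value = self.fetch()   # refresh from upstream
            self.fetched_at = time.time()
        except Exception:
            pass                        # upstream down: serve the stale copy
        return self.value
```

The effect: when upstream Statuspage is slow or erroring, readers see slightly old data instantly instead of sharing the outage.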
Anonymous, rate-limited, three-day rolling. The sidebar on the homepage looks like noise until you watch it during a real incident. A guide to reading the signal in user reports.
All three frontier-model providers publish a public status page. They communicate very different things. A comparison of what each tells you, what each leaves out, and which to trust during an outage.
The four impact levels Anthropic uses on its public Statuspage are tightly defined, but most users misread them. A guided tour, with examples, of what each level looks like in practice and which ones to actually worry about.
What it actually takes to reach claude.ai from Singapore, São Paulo, Frankfurt, or Mumbai — measured every five minutes for thirty days, with the regional anomalies that show up only when you sample widely enough.
What HTTP 529 actually means at the Anthropic API layer, when to retry vs. fail fast, the backoff math that minimizes your bill without making the spike worse, and how to correlate 529 bursts with the public uptime feed.
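The backoff math in question is usually some variant of exponential backoff with jitter. A sketch of the "full jitter" pattern; the base and cap values are illustrative, not recommendations from the post:

```python
# Full-jitter exponential backoff: the delay grows exponentially with the
# attempt number but is randomized over [0, ceiling], which spreads
# retries out instead of letting clients stampede in sync.
import random

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Delay in seconds before retry number `attempt` (0-based)."""
    return random.uniform(0, min(cap, base * 2 ** attempt))
```

Randomizing over the whole window, rather than adding a small jitter on top, is what prevents a burst of 529s from turning into a synchronized retry wave that prolongs the spike.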
A walkthrough of interval-merge downtime math, with a worked example from a real Claude incident, and a comparison with how AWS, GCP, and OpenAI report uptime in their marketing.
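The core of interval-merge downtime math: overlapping outage windows must be merged before summing, or concurrent incidents double-count the same minutes. A minimal sketch:

```python
# Merge overlapping (start, end) outage windows, then sum their lengths.
def merged_downtime(intervals: list[tuple[float, float]]) -> float:
    """Total downtime from (start, end) pairs, overlaps counted once."""
    total, cur_start, cur_end = 0.0, None, None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:  # gap: close the previous run
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:                                   # overlap: extend the run
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total
```

Two incidents spanning minutes 0–10 and 5–15 are 15 minutes of downtime, not 20; naive summing is one of the ways marketing uptime numbers drift from measured ones.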