
What to tell your users when Claude is down — communication templates that actually work

6 min read

incidents · communication · playbook · templates

If your product calls the Anthropic API in the user’s path, then when Anthropic has a bad hour, you have a bad hour. The users on your end do not see Anthropic’s status page. They see your product failing. The single highest-leverage thing you can do during an upstream outage is communicate quickly and honestly to your own users.

This post collects six communication templates, one for each stage of a typical incident, with notes on the parts that matter and the parts that backfire.

The principle

Three rules drive every template below:

  1. Acknowledge before you explain. Users who are currently broken want acknowledgment first, explanation second. A long technical paragraph that does not start with “yes, this is real” reads as evasion.
  2. Name the dependency. “We are experiencing issues” is worse than “our AI provider is experiencing issues.” Naming Anthropic explicitly is not blame-shifting; it tells the user what the failure mode is and lets them assess whether their workflow has a workaround.
  3. Give a what-to-do, not a what-to-feel. Telling users you are sorry is fine but optional; telling them what to do is required.

Template 1 — Within 5 minutes of detection (in-app banner)

We're seeing elevated errors from our AI provider (Anthropic).
Some [feature/action] requests may fail or take longer than usual.
We're tracking the situation. Status: claudestatus.com

What this template does well:

- Acknowledges the problem immediately and names the dependency, so users know the failure mode without digging.
- Sets expectations (“may fail or take longer”) instead of promising a fix time you do not control.
- Links an independent status source so users can verify the claim themselves.

What to avoid:

- “We are experiencing issues” with no named dependency and nothing to verify.
- An ETA for recovery; the timeline belongs to your upstream provider, not to you.
- Technical detail before acknowledgment; the banner’s job is “yes, this is real,” not an architecture lesson.
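Deciding when to show a banner like this at all usually comes down to a sustained error-rate threshold rather than a single failed request. A minimal sketch of a rolling-window trigger, assuming you record the outcome of each upstream call; the names (`ErrorWindow`, `should_show_banner`) and thresholds are illustrative, not a real API:

```python
# Illustrative banner trigger: show the Template 1 banner only when the
# upstream error rate stays elevated across a rolling window of calls.
from collections import deque
import time

class ErrorWindow:
    """Rolling window of (timestamp, ok) results for upstream API calls."""
    def __init__(self, window_seconds=300):
        self.window_seconds = window_seconds
        self.events = deque()

    def record(self, ok, now=None):
        now = time.time() if now is None else now
        self.events.append((now, ok))
        # Drop results that have aged out of the window.
        cutoff = now - self.window_seconds
        while self.events and self.events[0][0] < cutoff:
            self.events.popleft()

    def error_rate(self):
        if not self.events:
            return 0.0
        failures = sum(1 for _, ok in self.events if not ok)
        return failures / len(self.events)

def should_show_banner(window, threshold=0.05, min_samples=20):
    # Require enough traffic that the rate is meaningful, then compare
    # against the threshold (5% errors over the window by default).
    return len(window.events) >= min_samples and window.error_rate() >= threshold
```

The `min_samples` guard matters: at low traffic, one failed request can be a 100% error rate, and you do not want a banner flapping on and off.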

Template 2 — Within 15 minutes (status channel post / email)

Subject: Service degradation — AI features

What's happening:
Anthropic's API is reporting elevated error rates and longer response times.
This affects: [list of your specific features]

What we're doing:
- Monitoring the situation
- Applying retries with backoff to recover transient failures
- [If applicable: switching to a fallback model where possible]

What you can do:
- Wait — most requests will succeed on retry
- For urgent work, [specific workaround or fallback in your product]

We'll update this thread when the situation changes. Track upstream:
https://status.claude.com  |  https://claudestatus.com

The structure is borrowed from incident-response writing: what’s happening, what we’re doing, what you can do. Each section is short and concrete. The “what you can do” section is the part most templates skip and the part users find most useful.
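The “retries with backoff” and “fallback model” lines in the template can be sketched in a few lines. Everything here is illustrative: `call_model`, the model names, and `TransientUpstreamError` are placeholders for your own client wrapper, not a real Anthropic SDK API:

```python
# Sketch of retry-with-backoff plus a one-shot fallback model, as
# described in Template 2. All names are placeholders.
import random
import time

class TransientUpstreamError(Exception):
    """Stand-in for a 429/5xx-style error from the upstream API."""

def call_with_retries(call_model, prompt, model="primary-model",
                      fallback="fallback-model", max_attempts=3,
                      sleep=time.sleep):
    """Retry transient failures with exponential backoff and jitter;
    once the retry budget is spent, try the fallback model once."""
    for attempt in range(max_attempts):
        try:
            return call_model(model, prompt)
        except TransientUpstreamError:
            # Exponential backoff with jitter: ~1s, ~2s, ~4s.
            sleep((2 ** attempt) + random.random())
    # Retry budget exhausted: degrade gracefully to the fallback.
    return call_model(fallback, prompt)
```

Injecting `sleep` keeps the backoff testable; in production you would also cap total elapsed time so long-tail incidents do not eat user latency.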

Template 3 — During a long incident (1+ hours, periodic update)

Update [HH:MM UTC]: AI provider degradation continues.

Latest from Anthropic: [paste their most recent Statuspage update verbatim,
shortlink to the official incident].

Our metrics show [specific number — e.g., "API error rate at 12%, down from
peak of 38%"]. [Specific feature] is recovering; [other feature] is still
affected.

We are continuing to monitor. Next update at [time + 30/60 minutes,
whichever cadence you committed to].

Two things this gets right:

- It quotes Anthropic’s latest update verbatim, with a link, instead of paraphrasing, and pairs it with your own numbers so users can see how the upstream event translates into their experience of your product.
- It commits to a time for the next update.

The next-update commitment is binding. If you say “next update at 14:30 UTC” and 14:30 passes without an update, you have just lost trust. Pick a cadence you can keep — 30 minutes minimum during active events.

Template 4 — After resolution (within 30 minutes of upstream “Resolved”)

Resolved: AI features are operating normally.

Anthropic posted resolution at [HH:MM UTC]. Our metrics confirm full recovery
across [feature, feature, feature].

If you experienced specific failures during the incident, those requests
will retry automatically on next use; no action needed on your part.

Postmortem to follow once Anthropic publishes their detailed report.
We'll link to it here.

What this gets right:

- It waits for two signals, Anthropic’s “Resolved” and your own metrics, before declaring recovery.
- It tells users whether they need to act (here: no action needed), closing the loop the first banner opened.
- It promises the postmortem follow-up and says where it will appear.

Template 5 — Postmortem follow-up (1-7 days after)

Postmortem: AI provider incident on [date]

Anthropic published their full postmortem: [link]

Summary: [2-3 sentences in plain language about what happened and why,
sourced from Anthropic's writeup, not invented].

Impact on us: During the incident, [specific number] of requests failed
between [start] and [end]. [Specific feature] was unavailable for [duration].

What we're changing on our end:
- [Concrete improvement — e.g., "Adding fallback to a smaller model for
  this specific feature"]
- [Another concrete improvement — e.g., "Tightening our retry budget so
  long-tail incidents do not eat user latency"]

Thanks to everyone who reported issues during the event.

The postmortem follow-up is optional but highly leveraged. Most users will not read it. The ones who do are typically your highest-trust accounts — engineers, ops folks, decision-makers at your customer organizations. A clear postmortem signals operational maturity.

What to avoid in postmortems:

- Root-cause speculation beyond what Anthropic actually published; summarize their writeup, do not invent.
- Vendor-blaming; the “what we’re changing” section is about your resilience, not their reliability.
- Vague commitments (“we will improve monitoring”) with no concrete change attached.

Template 6 — When upstream is up but your users are still broken

The trickiest case. Anthropic reports green; your error rate is still elevated. Possible causes: cached errors at your edge, regional routing issues, your own retry queue still draining, something at a layer you have not investigated.

Most of our AI features have recovered. A small number of users may still
see slow responses or occasional errors as caches and queues drain. If
you are still seeing problems, please [link to support form / refresh
button / specific action]. Investigating the residual issues now.

This is not a great template — there is no good template for this case. The best you can do is acknowledge that some users are still broken even though the official upstream is green, and give them a way to surface that to you so you can investigate. The honesty here is more important than the polish.
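One way to detect this case automatically is to compare the upstream indicator against your own error rate. The sketch below assumes the upstream status page is an Atlassian Statuspage, which exposes a machine-readable indicator at `/api/v2/status.json`; the URL, function names, and thresholds are all illustrative:

```python
# Illustrative detector for "upstream green, users still broken".
import json
from urllib.request import urlopen

def fetch_upstream_indicator(url="https://status.claude.com/api/v2/status.json"):
    """Read the Statuspage indicator: "none", "minor", "major", "critical"."""
    with urlopen(url, timeout=5) as resp:
        return json.load(resp)["status"]["indicator"]

def classify(upstream_indicator, local_error_rate, threshold=0.02):
    """Decide which communication state applies, given both signals."""
    upstream_green = upstream_indicator == "none"
    local_broken = local_error_rate >= threshold
    if upstream_green and local_broken:
        return "template-6"       # upstream says resolved; you are not
    if not upstream_green:
        return "active-incident"  # Templates 1-3 apply
    return "healthy"
```

Splitting the fetch from the classification keeps the decision logic testable without network access, and makes it easy to feed in your own error-rate metric from wherever you already compute it.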

What not to send

A few patterns that consistently backfire:

- The anonymous outage: “we are experiencing issues” with nothing named and nothing users can verify.
- The premature all-clear: announcing recovery off Anthropic’s status change before your own metrics confirm it.
- The apology with no instructions: sympathy without a what-to-do.
- The definitive-update-only strategy: silence until you have a full explanation (see the cadence rule below).

The cadence rule

The cadence of updates matters more than the eloquence of any single update.

For incidents under 30 minutes: one acknowledgment + one resolution. Two messages.

For incidents 30 minutes to 2 hours: one acknowledgment + updates every 30 minutes + resolution. Average 4–6 messages.

For incidents over 2 hours: one acknowledgment + updates every 60 minutes + resolution. Long-tail postmortem follow-up.

Skipping cadence to “wait until we have something definitive” is the most common mistake. Users would rather hear “still investigating, no new information” every 30 minutes than hear nothing for 90 minutes followed by a long writeup. The cost of a no-news update is small. The cost of silence is real.
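The buckets above are simple enough to encode directly, which is worth doing if a bot posts your status updates. Boundaries follow this post; adjust them to whatever cadence you can actually keep:

```python
# The cadence rule as a lookup: given elapsed incident time, return how
# many minutes to wait before the next update (None = no periodic
# updates, just acknowledgment + resolution). Buckets follow this post.
def next_update_interval_minutes(elapsed_minutes):
    if elapsed_minutes < 30:
        return None   # under 30 min: two messages total
    if elapsed_minutes < 120:
        return 30     # 30 min - 2 h: update every 30 minutes
    return 60         # over 2 h: update every 60 minutes
```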

For the link your users follow during incidents: the official Anthropic status page (status.claude.com) is canonical; this dashboard (claudestatus.com) is supplementary. Linking both gives users two independent ways to verify what you have told them, which is the strongest possible signal that you are not making it up.
