
What to tell your users when Claude is down — communication templates that actually work

6 min read

incidents · communication · playbook · templates

If your product calls the Anthropic API in the user’s path, then when Anthropic has a bad hour, you have a bad hour. The users on your end do not see Anthropic’s status page. They see your product failing. The single highest-leverage thing you can do during an upstream outage is communicate quickly and honestly to your own users.

This post collects six communication templates, one for each stage of a typical incident, with notes on the parts that matter and the parts that backfire.

The principle

Three rules drive every template below:

  1. Acknowledge before you explain. Users who are currently broken want acknowledgment first, explanation second. A long technical paragraph that does not start with “yes, this is real” reads as evasion.
  2. Name the dependency. “We are experiencing issues” is worse than “our AI provider is experiencing issues.” Naming Anthropic explicitly is not blame-shifting; it tells the user what the failure mode is and lets them assess whether their workflow has a workaround.
  3. Give a what-to-do, not a what-to-feel. Telling users you are sorry is fine but optional; telling them what to do is required.

Template 1 — Within 5 minutes of detection (in-app banner)

We're seeing elevated errors from our AI provider (Anthropic).
Some [feature/action] requests may fail or take longer than usual.
We're tracking the situation. Status: claudestatus.com

What this template does well:

- Acknowledges the problem immediately and names the dependency, so users know the failure mode without digging.
- Sets expectations (“may fail or take longer”) instead of promising a fix time you do not control.
- Links an independent status source so users can verify the claim themselves.

What to avoid:

- “We are experiencing issues” with no named dependency and nothing to verify.
- An ETA for recovery; the timeline belongs to your upstream provider, not to you.
- Technical detail before acknowledgment; the banner’s job is “yes, this is real,” not an architecture lesson.
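Deciding when to show a banner like this at all usually comes down to a sustained error-rate threshold rather than a single failed request. A minimal sketch of a rolling-window trigger, assuming you record the outcome of each upstream call; the names (`ErrorWindow`, `should_show_banner`) and thresholds are illustrative, not a real API:

```python
# Illustrative banner trigger: show the Template 1 banner only when the
# upstream error rate stays elevated across a rolling window of calls.
from collections import deque
import time

class ErrorWindow:
    """Rolling window of (timestamp, ok) results for upstream API calls."""
    def __init__(self, window_seconds=300):
        self.window_seconds = window_seconds
        self.events = deque()

    def record(self, ok, now=None):
        now = time.time() if now is None else now
        self.events.append((now, ok))
        # Drop results that have aged out of the window.
        cutoff = now - self.window_seconds
        while self.events and self.events[0][0] < cutoff:
            self.events.popleft()

    def error_rate(self):
        if not self.events:
            return 0.0
        failures = sum(1 for _, ok in self.events if not ok)
        return failures / len(self.events)

def should_show_banner(window, threshold=0.05, min_samples=20):
    # Require enough traffic that the rate is meaningful, then compare
    # against the threshold (5% errors over the window by default).
    return len(window.events) >= min_samples and window.error_rate() >= threshold
```

The `min_samples` guard matters: at low traffic, one failed request can be a 100% error rate, and you do not want a banner flapping on and off.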

Template 2 — Within 15 minutes (status channel post / email)

Subject: Service degradation — AI features

What's happening:
Anthropic's API is reporting elevated error rates and longer response times.
This affects: [list of your specific features]

What we're doing:
- Monitoring the situation
- Applying retries with backoff to recover transient failures
- [If applicable: switching to a fallback model where possible]

What you can do:
- Wait — most requests will succeed on retry
- For urgent work, [specific workaround or fallback in your product]

We'll update this thread when the situation changes. Track upstream:
https://status.claude.com  |  https://claudestatus.com

The structure is borrowed from incident-response writing: what’s happening, what we’re doing, what you can do. Each section is short and concrete. The “what you can do” section is the part most templates skip and the part users find most useful.
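The “retries with backoff” and “fallback model” lines in the template can be sketched in a few lines. Everything here is illustrative: `call_model`, the model names, and `TransientUpstreamError` are placeholders for your own client wrapper, not a real Anthropic SDK API:

```python
# Sketch of retry-with-backoff plus a one-shot fallback model, as
# described in Template 2. All names are placeholders.
import random
import time

class TransientUpstreamError(Exception):
    """Stand-in for a 429/5xx-style error from the upstream API."""

def call_with_retries(call_model, prompt, model="primary-model",
                      fallback="fallback-model", max_attempts=3,
                      sleep=time.sleep):
    """Retry transient failures with exponential backoff and jitter;
    once the retry budget is spent, try the fallback model once."""
    for attempt in range(max_attempts):
        try:
            return call_model(model, prompt)
        except TransientUpstreamError:
            # Exponential backoff with jitter: ~1s, ~2s, ~4s.
            sleep((2 ** attempt) + random.random())
    # Retry budget exhausted: degrade gracefully to the fallback.
    return call_model(fallback, prompt)
```

Injecting `sleep` keeps the backoff testable; in production you would also cap total elapsed time so long-tail incidents do not eat user latency.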

Template 3 — During a long incident (1+ hours, periodic update)

Update [HH:MM UTC]: AI provider degradation continues.

Latest from Anthropic: [paste their most recent Statuspage update verbatim,
shortlink to the official incident].

Our metrics show [specific number — e.g., "API error rate at 12%, down from
peak of 38%"]. [Specific feature] is recovering; [other feature] is still
affected.

We are continuing to monitor. Next update at [time + 30/60 minutes,
whichever cadence you committed to].

Two things this gets right:

- It quotes Anthropic’s latest update verbatim, with a link, instead of paraphrasing, and pairs it with your own numbers so users can see how the upstream event translates into their experience of your product.
- It commits to a time for the next update.

The next-update commitment is binding. If you say “next update at 14:30 UTC” and 14:30 passes without an update, you have just lost trust. Pick a cadence you can keep — 30 minutes minimum during active events.

Template 4 — After resolution (within 30 minutes of upstream “Resolved”)

Resolved: AI features are operating normally.

Anthropic posted resolution at [HH:MM UTC]. Our metrics confirm full recovery
across [feature, feature, feature].

If you experienced specific failures during the incident, those requests
will retry automatically on next use; no action needed on your part.

Postmortem to follow once Anthropic publishes their detailed report.
We'll link to it here.

What this gets right:

- It waits for two signals, Anthropic’s “Resolved” and your own metrics, before declaring recovery.
- It tells users whether they need to act (here: no action needed), closing the loop the first banner opened.
- It promises the postmortem follow-up and says where it will appear.

Template 5 — Postmortem follow-up (1-7 days after)

Postmortem: AI provider incident on [date]

Anthropic published their full postmortem: [link]

Summary: [2-3 sentences in plain language about what happened and why,
sourced from Anthropic's writeup, not invented].

Impact on us: During the incident, [specific number] of requests failed
between [start] and [end]. [Specific feature] was unavailable for [duration].

What we're changing on our end:
- [Concrete improvement — e.g., "Adding fallback to a smaller model for
  this specific feature"]
- [Another concrete improvement — e.g., "Tightening our retry budget so
  long-tail incidents do not eat user latency"]

Thanks to everyone who reported issues during the event.

The postmortem follow-up is optional but highly leveraged. Most users will not read it. The ones who do are typically your highest-trust accounts — engineers, ops folks, decision-makers at your customer organizations. A clear postmortem signals operational maturity.

What to avoid in postmortems:

- Root-cause speculation beyond what Anthropic actually published; summarize their writeup, do not invent.
- Vendor-blaming; the “what we’re changing” section is about your resilience, not their reliability.
- Vague commitments (“we will improve monitoring”) with no concrete change attached.

Template 6 — When upstream is up but your users are still broken

The trickiest case. Anthropic reports green; your error rate is still elevated. Possible causes: cached errors at your edge, regional routing issues, your own retry queue still draining, something at a layer you have not investigated.

Most of our AI features have recovered. A small number of users may still
see slow responses or occasional errors as caches and queues drain. If
you are still seeing problems, please [link to support form / refresh
button / specific action]. Investigating the residual issues now.

This is not a great template — there is no good template for this case. The best you can do is acknowledge that some users are still broken even though the official upstream is green, and give them a way to surface that to you so you can investigate. The honesty here is more important than the polish.
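One way to detect this case automatically is to compare the upstream indicator against your own error rate. The sketch below assumes the upstream status page is an Atlassian Statuspage, which exposes a machine-readable indicator at `/api/v2/status.json`; the URL, function names, and thresholds are all illustrative:

```python
# Illustrative detector for "upstream green, users still broken".
import json
from urllib.request import urlopen

def fetch_upstream_indicator(url="https://status.claude.com/api/v2/status.json"):
    """Read the Statuspage indicator: "none", "minor", "major", "critical"."""
    with urlopen(url, timeout=5) as resp:
        return json.load(resp)["status"]["indicator"]

def classify(upstream_indicator, local_error_rate, threshold=0.02):
    """Decide which communication state applies, given both signals."""
    upstream_green = upstream_indicator == "none"
    local_broken = local_error_rate >= threshold
    if upstream_green and local_broken:
        return "template-6"       # upstream says resolved; you are not
    if not upstream_green:
        return "active-incident"  # Templates 1-3 apply
    return "healthy"
```

Splitting the fetch from the classification keeps the decision logic testable without network access, and makes it easy to feed in your own error-rate metric from wherever you already compute it.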

What not to send

A few patterns that consistently backfire:

- The anonymous outage: “we are experiencing issues” with nothing named and nothing users can verify.
- The premature all-clear: announcing recovery off Anthropic’s status change before your own metrics confirm it.
- The apology with no instructions: sympathy without a what-to-do.
- The definitive-update-only strategy: silence until you have a full explanation (see the cadence rule below).

The cadence rule

The cadence of updates matters more than the eloquence of any single update.

For incidents under 30 minutes: one acknowledgment + one resolution. Two messages.

For incidents 30 minutes to 2 hours: one acknowledgment + updates every 30 minutes + resolution. Average 4–6 messages.

For incidents over 2 hours: one acknowledgment + updates every 60 minutes + resolution. Long-tail postmortem follow-up.

Skipping cadence to “wait until we have something definitive” is the most common mistake. Users would rather hear “still investigating, no new information” every 30 minutes than hear nothing for 90 minutes followed by a long writeup. The cost of a no-news update is small. The cost of silence is real.
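The buckets above are simple enough to encode directly, which is worth doing if a bot posts your status updates. Boundaries follow this post; adjust them to whatever cadence you can actually keep:

```python
# The cadence rule as a lookup: given elapsed incident time, return how
# many minutes to wait before the next update (None = no periodic
# updates, just acknowledgment + resolution). Buckets follow this post.
def next_update_interval_minutes(elapsed_minutes):
    if elapsed_minutes < 30:
        return None   # under 30 min: two messages total
    if elapsed_minutes < 120:
        return 30     # 30 min - 2 h: update every 30 minutes
    return 60         # over 2 h: update every 60 minutes
```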

For the link your users follow during incidents: the official Anthropic status page (status.claude.com) is canonical; this dashboard (claudestatus.com) is supplementary. Linking both gives users two independent ways to verify what you have told them, which is the strongest possible signal that you are not making it up.
