The differences between uptime percentages look tiny. The gap between 99.5% and 99.9% reads like rounding error. Converted to time, it separates a service with 3.6 hours of downtime per month from one with 43 minutes, a real operational difference that changes how you architect on top of it.
This is the conversion table we wish more dashboards published, with notes on what each level actually buys you in practice.
The conversion
Assuming a 30-day month (43,200 minutes), a 90-day quarter, and a 365-day year:
| Uptime | Allowed downtime per month | Per quarter | Per year |
|---|---|---|---|
| 99% | 7.2 hours | 21.6 hours | 3.65 days |
| 99.5% | 3.6 hours | 10.8 hours | 1.83 days |
| 99.9% | 43 minutes | 2.16 hours | 8.76 hours |
| 99.95% | 21 minutes | 1.08 hours | 4.38 hours |
| 99.99% | 4.3 minutes | 13 minutes | 52.6 minutes |
| 99.999% | 26 seconds | 78 seconds | 5.26 minutes |
The shape that matters: each “9” you add roughly cuts the allowed downtime by 10×. The cost of building toward each additional “9” goes up much faster than 10×.
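The arithmetic is simple enough to keep in a scratch script. A minimal sketch in Python, assuming the same 30-day month and 365-day year as the table (the function name is ours, purely illustrative):

```python
# Convert an uptime percentage into allowed downtime for a given window.
# Assumes a 30-day month and a 365-day year, matching the table above.
def allowed_downtime_minutes(uptime_percent: float, window_days: float = 30) -> float:
    window_minutes = window_days * 24 * 60
    return window_minutes * (1 - uptime_percent / 100)

for uptime in (99, 99.5, 99.9, 99.95, 99.99, 99.999):
    per_month = allowed_downtime_minutes(uptime, 30)
    per_year = allowed_downtime_minutes(uptime, 365)
    print(f"{uptime}%: {per_month:.1f} min/month, {per_year / 60:.1f} h/year")
```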
Reading the table
A few practical translations:
99% — “we will have a bad day every month.” 7.2 hours of monthly downtime is roughly one bad workday per month. Quite a lot of consumer software, including many SaaS products in their first year, operates around this level. It is not catastrophic, but it is visible to users.
99.5% — “we will have a bad afternoon every month.” 3.6 hours per month is one extended outage during business hours, or a few shorter ones spread out. A reasonable target for early-stage products with limited operational maturity.
99.9% — “three nines.” 43 minutes per month. The conventional target for production B2B services. Achievable with careful engineering, on-call rotations, and deliberate operational practice. Most modern cloud services operate around this level.
99.95% — “three and a half nines.” 21 minutes per month. The level where most public uptime SLAs from major cloud vendors land, with credit clauses that kick in below it. Genuinely hard to hit; requires multi-region deployments, mature incident response, and careful change management.
99.99% — “four nines.” 4.3 minutes per month. The target for critical infrastructure — payment processors, financial trading systems, telecom. Requires global redundancy, regional failover, and significant investment in operational practice. Few organizations operate any service at this level for extended periods.
99.999% — “five nines.” 26 seconds per month. The aspiration for a small set of mission-critical systems where outages translate to lost lives or hundreds of millions in losses. Reaching this level reliably is a research problem, not an engineering project.
What this means for AI provider availability
Most public AI services, including Anthropic’s API, operate in the high 99.9% range over long windows, with occasional bad months that drop them lower. Looking at the 30-day uptime numbers we publish, recent months have ranged roughly from 99.7% to 99.95%, depending on the month and the specific component.
If you are building on top of an AI service whose track record is roughly 99.9%, the implication is simple: plan for 30–60 minutes of downtime per month. That is the realistic operational envelope. Architect your application either to degrade gracefully during those windows or to make peace with users seeing failures during them.
The math compounds across multiple dependencies. A service that depends on three independent 99.9% upstreams has an effective availability of 0.999³ = 99.7%, which converts to 2.16 hours per month rather than 43 minutes. Each additional dependency further reduces the achievable ceiling. This is one reason multi-provider redundancy can pay off — it converts an AND of vendor uptimes into an OR.
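A sketch of that composition in Python, assuming each dependency fails independently (in practice failures are often correlated, which makes the real numbers worse than this math suggests):

```python
import math

# Serial dependencies: everything must be up, so availabilities multiply (AND).
def serial_availability(*uptimes: float) -> float:
    return math.prod(uptimes)

# Redundant providers: only one must be up, so failure probabilities multiply (OR).
def redundant_availability(*uptimes: float) -> float:
    return 1 - math.prod(1 - u for u in uptimes)

print(serial_availability(0.999, 0.999, 0.999))   # ~0.9970, roughly 2.2 h/month
print(redundant_availability(0.999, 0.999))       # ~0.999999, if failures really are independent
```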
The cost curve, qualitatively
Going from 99% to 99.5% is mostly about not shipping obvious bugs. It costs a couple of engineers paying attention.
Going from 99.5% to 99.9% is about deploying carefully and having on-call. It costs a small operations function and a working incident-response practice.
Going from 99.9% to 99.95% is about multi-region failover, careful change management, and a culture of pre-mortem thinking. It costs significant engineering time on resilience features that do not directly produce user-visible value.
Going from 99.95% to 99.99% is about geographic redundancy, fault isolation, mature dependency management, and an organization-wide commitment to availability over feature velocity. The cost crosses into “this dominates engineering bandwidth” territory.
Going from 99.99% to 99.999% is mostly research. The number of organizations operating production systems at this level for sustained periods is small.
The cost curve is famously not linear. Each additional “9” is dramatically more expensive than the last. This is why most engineering teams aim for one specific tier and budget against it, rather than trying to push toward higher tiers indefinitely.
What error budgets do
Error budgets — the SRE practice popularized by Google — turn the uptime conversion table into a real operational practice. The premise is simple: if your target is 99.9% and the month so far has been clean, you have remaining budget for some downtime later in the month. If the month has already burned its budget, you should freeze risky changes for the rest of the month and focus on stability.
The error-budget framing makes the abstract uptime number actionable. Instead of “we should be more reliable,” teams can say “we have used 38 minutes of our 43-minute monthly budget; let’s hold the next deploy until next month.” The budget makes trade-offs explicit.
For consumers of an upstream provider, the same framing applies. If your application’s target is 99.9% and your upstream is 99.9%, your error budget is shared with the upstream — every minute of provider downtime that hits your users is a minute of your budget gone, regardless of whether it was your code’s fault. This is one reason architects sometimes push to inflate the dependency target (“we want to use a 99.99% upstream”) to preserve budget for their own internal failures.
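A minimal error-budget tracker in the same spirit; the 99.9% target and the 38 minutes already burned are the illustrative numbers from above, and provider downtime that reaches your users counts against the same budget:

```python
# Error budget for a monthly window: 99.9% over 30 days is ~43.2 minutes.
def monthly_error_budget_minutes(target_percent: float, window_days: float = 30) -> float:
    return window_days * 24 * 60 * (1 - target_percent / 100)

def remaining_budget_minutes(target_percent: float, downtime_so_far_min: float) -> float:
    return monthly_error_budget_minutes(target_percent) - downtime_so_far_min

remaining = remaining_budget_minutes(99.9, downtime_so_far_min=38)
print(f"{remaining:.1f} minutes of budget left this month")   # ~5.2 minutes
if remaining <= 0:
    print("Budget exhausted: hold risky deploys until the window resets")
```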
The honest version of marketing claims
When a company says “99.99% uptime SLA” in marketing materials, the relevant questions are:
- Over what window? Monthly is the typical convention; some claims are denominated annually, which is much weaker.
- What counts against the percentage? Many SLAs exclude scheduled maintenance, partial outages, and certain geographic regions. The fine print can be the difference between a 99.9% claim and an actual 99.5% experience.
- What is the credit if the SLA is missed? A typical SLA pays out service credits — usually a few percent of the monthly bill — when missed. The credit is not the user’s value of the missing time; it is a refund of the provider’s pricing.
We have written elsewhere about why our uptime number is lower than the marketing figure. The short version: we do not exclude partial outages or minor degradation. Marketing math typically does. Both are defensible; readers should know the difference exists.
The pragmatic recommendation
For most products building on top of AI providers, the right targets and assumptions are:
- Assume your upstream is 99.7–99.9%, not the marketing number. Plan for it.
- Aim your own product at one tier lower than your upstream, not equal. If your upstream is 99.9%, do not promise users 99.9% — they will hold you to it on bad months when both your code and the upstream had problems.
- Communicate honestly about the dependency. Users who know they are downstream of a third-party AI service understand the failure mode. Users who do not know will assume your engineering team is incompetent during outages that were not yours.
- Do not over-invest in additional “9s” beyond what users actually require. A consumer product probably does not need five-nines availability; a payment processor does. Match the tier to the product.
The uptime number is more useful as a budget than as a target. The budget tells you what the realistic envelope is, and the realistic envelope tells you how to architect.