Insights · April 21, 2026 · 8 min read · By Marcus Webb

Cloudflare Outages Explained: A Complete History and How to Stay Resilient

A complete breakdown of major Cloudflare outages — including the 2019 regex incident, the 2022 BGP routing failure, and the 2023 control-plane outage — with practical strategies for surviving the next one.

Why Cloudflare Outages Feel Like the Internet Is Down

Cloudflare sits in front of roughly 20% of all websites, handles DNS for a similarly large share, and provides DDoS protection, CDN, Zero Trust access, and Workers compute for hundreds of thousands of companies. When Cloudflare has a bad day, the effect is categorical: login flows break, API calls time out, marketing sites return 5xx errors, and corporate VPNs lose access to internal tools simultaneously.

This concentration is also what makes Cloudflare outages so instructive. Because Cloudflare is load-bearing for such a broad swath of the internet, each major incident reveals a new class of shared-fate risk that engineering teams need to plan for.

The Most Impactful Cloudflare Incidents

July 2019 — The regex incident. A regular expression deployed to Cloudflare's WAF caused CPU exhaustion across the edge fleet, taking a large portion of the Cloudflare network offline for 27 minutes. Sites served through Cloudflare returned 502 errors globally. Lesson: a seemingly harmless content change can take down a global network if it runs on a shared hot path.

June 2022 — BGP route leak. A misconfiguration during a network change caused Cloudflare to withdraw routes for 19 data centers, affecting customers in Europe, Asia, and the Americas for about 90 minutes. Lesson: network changes are the single highest-risk category of production work, even at hyperscale.

November 2023 — Dashboard and API control-plane outage. A Cloudflare data center serving control-plane services went offline; the customer-facing edge continued serving traffic, but the Dashboard, API, and Zero Trust policy changes were unavailable for roughly 2 days. Lesson: edge uptime is not control-plane uptime — your CDN can be serving traffic fine while you are unable to change any configuration.

November 2024 — Logs and analytics outage. A buffer overflow in the Logpush pipeline caused extended degradation for customers relying on real-time logs and analytics dashboards. Traffic was unaffected, but observability was blind for several hours. Lesson: degraded observability is an incident, even when the data plane is healthy.

How to Monitor Cloudflare Properly

The common mistake with Cloudflare monitoring is treating it as a single service. In practice, Cloudflare publishes dozens of independent component statuses: CDN, DNS, Workers, R2, D1, Access, Gateway, Tunnel, Magic Transit, Dashboard, API, and regional edge PoPs. A Cloudflare Dashboard outage does not necessarily mean your CDN is impacted, and vice versa.

PulsAPI tracks each Cloudflare component separately. This matters because the right response depends entirely on which component is degraded. If CDN is impacted, your customer-facing site is degrading — page on-call. If Dashboard is impacted but CDN is healthy, you don't need to page anyone, but you should hold off on planned configuration changes.
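The component-to-response mapping above can be sketched as a small triage function. The component names mirror Cloudflare's status page; the routing rules (which components page on-call versus freeze changes) are illustrative assumptions, not a description of PulsAPI internals.

```python
# Sketch of component-aware triage for Cloudflare status changes.
# Which components are "customer-facing" vs "control-plane" is an
# assumption for illustration; tune the sets to your own stack.

CUSTOMER_FACING = {"CDN", "DNS", "Workers", "Access", "Gateway"}
CONTROL_PLANE = {"Dashboard", "API"}

def triage(component: str, status: str) -> str:
    """Return the suggested response for a Cloudflare component status."""
    if status == "operational":
        return "no-action"
    if component in CUSTOMER_FACING:
        return "page-oncall"            # users are (or soon will be) affected
    if component in CONTROL_PLANE:
        return "freeze-config-changes"  # edge is fine; hold planned changes
    return "monitor"                    # ancillary components: watch and wait

print(triage("CDN", "degraded"))      # page-oncall
print(triage("Dashboard", "outage"))  # freeze-config-changes
```

The point of encoding this as data rather than tribal knowledge is that the 3 a.m. decision ("do we page?") is already made before the incident starts.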

Pair component-level monitoring with synthetic checks from your own edge: PulsAPI probes Cloudflare-fronted endpoints from multiple geographic regions, giving you first-hand data about whether your users are actually seeing errors — which almost always precedes an official Cloudflare acknowledgment by 10 to 30 minutes.
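A minimal sketch of the aggregation side of such synthetic checks: collect probe results from several vantage points and alert when the share of 5xx responses crosses a threshold. The region names and the 20% threshold are illustrative, and the actual HTTP probing is left out so the decision logic stands alone.

```python
# Multi-region synthetic check aggregation (sketch). Probing itself
# (HTTP requests from each region) is assumed to happen elsewhere;
# this only decides whether results indicate user-visible errors.

from dataclasses import dataclass

@dataclass
class ProbeResult:
    region: str
    status_code: int   # HTTP status observed from this vantage point
    latency_ms: float

def user_facing_errors(results: list[ProbeResult],
                       threshold: float = 0.2) -> bool:
    """True when the share of 5xx responses across regions exceeds threshold."""
    errors = sum(1 for r in results if r.status_code >= 500)
    return errors / len(results) > threshold

results = [
    ProbeResult("us-east", 200, 41.0),
    ProbeResult("eu-west", 502, 120.0),   # Cloudflare edge returning 502
    ProbeResult("ap-south", 502, 98.0),
]
print(user_facing_errors(results))  # True: 2 of 3 regions see 5xx
```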

Architectural Patterns for Cloudflare-Dependent Systems

The most resilient Cloudflare-dependent architectures share three characteristics. First, they maintain an alternative DNS and CDN path. NS1, AWS Route 53, or Fastly can be preconfigured as a warm failover; the switch is a DNS update, not a re-architecture. Many teams never use this path — but in the rare major incident, having it ready is the difference between 15 minutes of degradation and 6 hours.
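The "switch is a DNS update" pattern can be sketched as follows. Keeping TTLs low in steady state is what makes the cutover fast; the hostnames below are hypothetical placeholders, not a recommended configuration.

```python
# Warm DNS failover sketch: publish the primary (Cloudflare-fronted)
# target normally, and the preconfigured secondary CDN target during a
# major incident. Hostnames here are hypothetical examples.

LOW_TTL_SECONDS = 60  # keep TTLs low in steady state so failover propagates fast

def failover_record(name: str, primary: str, secondary: str,
                    primary_healthy: bool) -> dict:
    """Return the DNS record that should currently be published."""
    return {
        "name": name,
        "type": "CNAME",
        "content": primary if primary_healthy else secondary,
        "ttl": LOW_TTL_SECONDS,
    }

# During an incident, the same record points at the warm secondary path.
record = failover_record("www.example.com",
                         "example.cdn-primary.net",
                         "example.cdn-secondary.net",
                         primary_healthy=False)
print(record["content"])  # example.cdn-secondary.net
```

The hard part in practice is not this switch but everything around it: the secondary CDN must already hold valid TLS certificates and origin configuration, which is why the text calls it a warm failover rather than a cold one.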

Second, they do not depend on the Cloudflare Dashboard for production operations. All configuration changes go through Terraform or the Cloudflare API, version-controlled and reviewable. If the Dashboard is down for 2 days, your team can still operate.
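As a sketch of the API-driven path, here is how a DNS record update can be expressed as a plain HTTP request against Cloudflare's v4 API. The zone ID, record ID, and token are placeholders; the request is built but not sent, so the shape can live in version control and be reviewed like any other change.

```python
# Sketch: drive a Cloudflare configuration change through the v4 API
# rather than the Dashboard. ZONE_ID / RECORD_ID / API_TOKEN are
# placeholders you would supply from your own account.

import json

API_BASE = "https://api.cloudflare.com/client/v4"

def build_dns_update(zone_id: str, record_id: str, token: str,
                     content: str) -> tuple[str, dict, str]:
    """Build (url, headers, body) for a DNS record update request."""
    url = f"{API_BASE}/zones/{zone_id}/dns_records/{record_id}"
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    body = json.dumps({"content": content})
    return url, headers, body

url, headers, body = build_dns_update("ZONE_ID", "RECORD_ID", "API_TOKEN",
                                      "203.0.113.10")
print(url)
```

In a Terraform-based workflow the same change would be a reviewed diff to a `cloudflare_dns_record` resource; either way, the Dashboard stops being a single point of operational failure.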

Third, they treat Cloudflare Workers as application code subject to the same redundancy expectations as any other runtime. If a business-critical path runs in Workers, have a fallback for when Workers is degraded — even if that fallback is a graceful error page rather than a full secondary compute environment.
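The graceful-degradation idea can be sketched as a wrapper that serves a static fallback page whenever the Workers-backed path errors or is unreachable. The fetch function is injected so the fallback logic itself carries no Cloudflare dependency; the fallback page content and the choice to return 200 for it are illustrative design decisions, not the only option.

```python
# Sketch of a graceful fallback for a Workers-backed path: if the call
# fails or returns 5xx, serve a static degraded-mode page instead of
# surfacing the raw error to users.

from typing import Callable

FALLBACK_PAGE = "<html><body>This feature is briefly unavailable.</body></html>"

def fetch_with_fallback(fetch: Callable[[], tuple[int, str]]) -> tuple[int, str]:
    """Return the Workers response, or a graceful fallback on failure."""
    try:
        status, body = fetch()
    except Exception:
        return 200, FALLBACK_PAGE   # unreachable: degrade gracefully
    if status >= 500:
        return 200, FALLBACK_PAGE   # Workers degraded: degrade gracefully
    return status, body

# Healthy path passes through; a 503 from Workers becomes the fallback page.
print(fetch_with_fallback(lambda: (503, "worker error"))[1])
```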

About the Author

Marcus Webb, Head of Product

Marcus leads product at PulsAPI, where he focuses on making operational awareness effortless for engineering teams. Previously at Datadog and PagerDuty.

Start monitoring your stack

Aggregate real-time operational data from every service your stack depends on into a single dashboard. Free for up to 10 services.

Create Free Dashboard