If you have the sense that the internet has been down a lot lately, you're not imagining it. When one giant provider falls, dozens of the world's most visited sites go dark in an instant. Recent disruptions have rippled from Cloudflare to AWS to Microsoft Azure to Google Cloud, silencing, at least temporarily, everything from AI chat tools and music streaming to gaming and e-commerce. The pattern isn't random; it's structural.
A Few Choke Points Carry A Lot Of The Internet
Cloud and edge: Over the last decade, the web has coalesced around a few cloud and edge platforms. According to Synergy Research Group, AWS, Microsoft Azure, and Google Cloud together control well over half of the global cloud infrastructure market, with AWS in the low 30s percent, Azure in the mid-20s, and Google Cloud in the low teens. That concentration puts mission-critical workloads under very few roofs.

Then layer on content delivery and security. Cloudflare, Akamai, and Fastly act as the "front door" for an untold number of applications, providing DNS, caching, web application firewalls (WAFs), and DDoS mitigation. When that front door misroutes or closes, users can't reach what's behind it, even if the origin servers are perfectly healthy. The result looks like "the internet is down" but isn't; it's a single bottleneck failing.
Incidents Are Not More Common; The Blast Radius Is
By count alone, major providers are not failing more often. Cisco ThousandEyes, which monitors the health of the internet worldwide, reports that it hasn't observed a significant spike in outage frequency among cloud and backbone providers. What has grown is dependency: more sites, APIs, and mobile apps now sit behind those same few platforms, and that centralization magnifies the shockwave of each incident.
In other words: the number of fires might be constant, but we’ve constructed neighborhoods where one fire can devastate a block.
What Is Breaking Beneath the Hood of the Internet
- Bad configuration pushes: The easiest way to knock over a planet-scale network? An errant change that spreads everywhere in seconds. Providers have admitted past outages caused by routing or traffic engineering updates that went wrong at scale.
- BGP route leaks and hijacks: The Border Gateway Protocol is the internet's map, and it operates largely on trust. A misbehaving or malicious route announcement can send traffic into a void. Researchers at Kentik, APNIC, and RIPE NCC regularly track leaks that take entire regions offline. Safeguards like RPKI are slowly expanding, but deployment remains uneven.
- DNS and cert gotchas: When an authoritative DNS service fails or is rate-limited, domains effectively disappear. An expired TLS certificate or a broken OCSP stapling response can break logins and APIs, or just one mobile app, in ways that look like "everything is slow" until someone checks the cert chain.
- Overloaded microservices and third-party APIs: Modern apps are constellations of services. If the identity provider, payments gateway, or feature flag service times out, the entire product can grind to a halt. Rate limiters and circuit breakers exist to contain this, but not every team implements them rigorously.
DDoS Attacks Are Growing Larger and Less Expensive
The attack surface hasn't gone quiet either. Cloudflare, Google, and AWS have all documented record-setting HTTP/2 "Rapid Reset" attacks reaching hundreds of millions of requests per second, enough to saturate points of presence almost instantly. Botnets of hijacked IoT devices and abused cloud resources can assemble vast floods in minutes, overwhelming all but the strongest defenses.

Providers are getting better at deflecting these barrages, but defense-in-depth is not universal across the long tail of services. When a huge DDoS wave strikes, some downstream slowdown is almost inevitable; for most people, it simply feels like another outage.
Hidden Interdependencies Spread Failure
Today's internet is a spiderweb of SaaS dependencies: authentication platforms, analytics beacons, ad tech, media optimizers, real-time messaging, and observability agents. If any one of these common dependencies becomes unresponsive in a browser or mobile app, the page can freeze or the session may never start. That's why one provider's hiccup can interrupt seemingly unrelated banks, retailers, news sites, and games: they share the same building blocks behind the scenes.
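One common way to keep a slow shared dependency from freezing the whole request is to bound every third-party call with a hard deadline and a degraded fallback. A minimal sketch in Python (the function names here are illustrative, not a vendor API):

```python
import time
from concurrent.futures import ThreadPoolExecutor

_pool = ThreadPoolExecutor(max_workers=4)

def with_timeout(fn, timeout_s, fallback):
    """Run a dependency call with a hard deadline. On timeout or any
    error, return a degraded fallback instead of hanging the caller.
    (The worker thread is not cancelled; it finishes in the background.)"""
    future = _pool.submit(fn)
    try:
        return future.result(timeout=timeout_s)
    except Exception:  # timeout, or the dependency's own failure
        return fallback

def fetch_recommendations():
    time.sleep(2)  # stand-in for a slow third-party API
    return ["personalized items"]

# The page renders with generic content instead of freezing:
items = with_timeout(fetch_recommendations, timeout_s=0.5,
                     fallback=["popular items"])
```

The same discipline applies client-side: loading analytics and ad scripts asynchronously means a stalled beacon delays its own feature, not the page.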
Past incidents underline this fragility. In 2016, an attack on Dyn's DNS infrastructure knocked out the online services of numerous major sites. In 2021, an edge software bug at Fastly brought news outlets and social platforms to a standstill. The arithmetic was the same each time: concentration plus interdependence equals outsized impact.
What Would Actually Help Reduce Outage Impact
- Architect for graceful degradation: read-only modes, cached pages, and static content fallbacks when upstream fails. Force non-critical scripts to be async on the client.
- Splinter the front door: multi-CDN and multi-DNS with active-active health checks can keep a service available when any one provider has issues. Flexera's State of the Cloud surveys show multicloud adoption is widespread, but far fewer organizations actually run the same workload across providers simultaneously.
- Harden the routes: encourage BGP best practices with RPKI validation, IRR filtering, and MANRS-compliant routing. These mitigate the risk of a stray route leak wiping out an entire region.
- Control change velocity: staged rollouts with automatic rollback, feature flags, and strong pre-production validation remain the best medicine for bad pushes. Error budgets and SLOs keep teams from shipping faster than their resilience allows.
- Require transparency: detailed postmortems and consistent status telemetry from cloud, CDN, and SaaS providers help customers tune their failover logic. Regulators and industry groups are pushing for concentration-risk planning in critical sectors, acknowledging that the internet's reliability is now a matter of systemic defenses rather than hero stunts.
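The "control change velocity" point above is often implemented as a percentage rollout keyed on a stable hash, so a user's bucket never changes between ramp steps and a rollback is just lowering the percentage. A minimal sketch (function and flag names are illustrative):

```python
import hashlib

def in_rollout(user_id: str, flag: str, percent: int) -> bool:
    """Deterministically bucket a user into a staged rollout.
    Hashing flag+user_id gives a stable bucket in [0, 100), so raising
    `percent` only ever adds users and never reshuffles existing ones."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < percent

# A typical ramp: 1% -> 10% -> 50% -> 100%, checking error budgets
# between steps and dropping back to 0 on regression.
```

Because the bucket is derived from the hash rather than stored state, every server in a fleet makes the same decision for the same user with no coordination.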
The bottom line: The internet isn't breaking more frequently; it's breaking bigger. The more our lives flow through the same few pipes and front doors, the more a failure at any one critical piece hurts. Resilience comes from dispersing risk, engineering for failure, and treating concentration as a vulnerability to be managed rather than a convenience to be assumed.
