A widespread outage at Amazon Web Services cascaded across the internet, leaving many students and instructors facing delays in accessing Canvas, the popular learning management system. Instructure, the company behind Canvas, acknowledged continued problems affecting Canvas and Mastery Connect as its engineers worked with AWS to stabilise services.
What happened and who was affected by the outage
Users reported slow page loads, failed logins, and errors when submitting assignments, with complaints clustering around university campuses. Large public university systems, including the California State University system and Big Ten institutions, cited Canvas availability issues in IT status bulletins. Crowdsourced tracking sites also showed a sharp spike in Canvas outage reports during the AWS window.

Even as several AWS services began to recover, Canvas remained erratic for some users.
Such a staggered recovery is typical of cloud incidents: even after core infrastructure comes back, application layers that depend on several cloud services can stay degraded while caches rebuild, database connections normalise, and background jobs work through the backlog.
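To make the catch-up problem concrete, here is a back-of-the-envelope sketch in Python. The rates are hypothetical, not Instructure's: the point is that jobs queued during an outage drain only as fast as spare worker capacity allows, so full recovery can lag well behind the infrastructure fix.

```python
# Rough queueing arithmetic for post-outage catch-up (hypothetical numbers).
# During a 3-hour outage, background jobs (grade syncs, notifications,
# file scans) keep accumulating; afterwards they drain only as fast as
# spare capacity allows.

outage_hours = 3
arrival_rate = 10_000      # jobs/hour that kept arriving during the outage
service_rate = 12_000      # jobs/hour the workers can process when healthy

backlog = outage_hours * arrival_rate          # jobs queued during the outage
spare_capacity = service_rate - arrival_rate   # jobs/hour left for catch-up
drain_hours = backlog / spare_capacity

print(f"Backlog: {backlog} jobs; ~{drain_hours:.0f} hours to drain")
# => Backlog: 30000 jobs; ~15 hours to drain, long after AWS itself recovers
```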
How Canvas relies on AWS cloud infrastructure services
Instructure runs Canvas on AWS using a combination of compute, database, storage, and content delivery services. An outage that takes down an entire AWS region, or region-wide control-plane services, can affect every SaaS platform with dependencies in the blast radius. Even if the Canvas application servers themselves stay up, adjacent services (the object storage that holds user file uploads, the load balancing and DNS layers that let systems reach one another, the message queues that carry background work) can stall crucial workflows: authentication, file processing, event delivery.
The subtle part is the chaining of dependencies. A momentary blip in AWS identity or networking can break new session authentication without touching users working under established sessions. File-heavy operations tend to fail first, because they touch more services (storage, antivirus scanning, media processing, CDN invalidations). That’s why some students could load pages but not upload a PDF, and some instructors could grade but not repost updated files.
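A toy model shows the asymmetry. The service names below are illustrative, not Instructure's actual architecture; the point is simply that a workflow touching six dependencies has more ways to fail than one touching three.

```python
# Sketch of why file-heavy operations fail first: they fan out across more
# dependencies, so one degraded service stalls the whole workflow.
# Service names are illustrative, not Instructure's real architecture.

PAGE_VIEW_DEPS = ["app_server", "database", "cache"]
FILE_UPLOAD_DEPS = ["app_server", "database", "object_storage",
                    "antivirus_scan", "media_processing", "cdn_invalidation"]

DEGRADED = {"object_storage", "cdn_invalidation"}  # pretend these are impaired

def workflow_ok(dependencies: list) -> bool:
    """A workflow succeeds only if every dependency it touches is healthy."""
    return not DEGRADED.intersection(dependencies)

print("page view:", workflow_ok(PAGE_VIEW_DEPS))      # True  -- pages load
print("file upload:", workflow_ok(FILE_UPLOAD_DEPS))  # False -- uploads fail
```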
Why some users were able to log in and others weren’t
Access consistency was largely a function of session state and identity routing. Users with valid sessions cached in their browsers or mobile apps often skipped the most brittle step, the fresh authentication handshake; those starting new logins hit it head-on.
Institutional single sign-on also mattered. Many campuses front Canvas with identity providers such as Azure AD, Okta, or Shibboleth. Where a campus identity service stayed responsive and the Canvas endpoints it needed were reachable, SSO could still work; where Canvas dependencies or particular AWS regions were strained, SAML or OAuth redirects timed out. The result was a patchwork of experiences across campuses, even on the same university network.
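A simplified sketch of the two authentication paths makes the split visible. The field names and session lifetime here are illustrative, not Canvas internals: a cached session is validated locally, while a fresh login cannot complete without the identity round trip.

```python
# Why established sessions kept working while new logins failed: a cached
# session token skips the SSO redirect entirely; a fresh login must complete
# the SAML/OAuth round trip, the brittle step during the outage window.
# All names and values here are illustrative.

import time

SESSION_TTL = 8 * 3600           # seconds a cached session stays valid
IDENTITY_PROVIDER_UP = False     # the impaired dependency during the window

def authenticate(session):
    now = time.time()
    # Path 1: valid cached session, no identity-provider round trip needed.
    if session and now - session["issued_at"] < SESSION_TTL:
        return "ok (cached session)"
    # Path 2: fresh login, requires the SAML/OAuth redirect to succeed.
    if not IDENTITY_PROVIDER_UP:
        return "failed (SSO redirect timed out)"
    return "ok (new session issued)"

print(authenticate({"issued_at": time.time() - 3600}))  # ok (cached session)
print(authenticate(None))                # failed (SSO redirect timed out)
```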

The scale and stakes for education amid cloud outages
According to Instructure’s investor materials, “Canvas is firmly established in higher education and K–12 and reaches thousands of educational institutions and tens of millions of learners.” That concentration also means a single cloud event can interrupt coursework, quizzes, and grading across entire systems.
Some context: AWS holds roughly a one-third share of worldwide cloud infrastructure revenue, according to Synergy Research Group. When a provider of that scale wobbles, the blast radius takes in airlines, payments, media and, increasingly, classrooms. Uptime Institute studies show that a major outage commonly runs an organization into six-figure costs; David Bankoski of the Institute for Open Economic Networks at the University of Maryland frames the cost in education as missed tests, delayed instructor feedback and students scrambling for extensions.
What to watch from Instructure and AWS next
Look for Instructure to publish a post-incident review covering root cause, remediations and roadmap items such as cross-region redundancy, automated failover and more graceful degradation of file uploads and submissions. (AWS, for its part, typically publishes a Post-Event Summary on its Health Dashboard identifying the root cause and corrective actions.)
For both providers, the essential questions are whether the errors traced back to a single-region bottleneck (that is, whether Canvas services were constrained by a particular resource in one region), how dependencies were prioritised in failover scenarios, and what effect rate limits or throttling had during login spikes. The answers will inform campus IT leaders as they weigh redundancy for learning tools and authentication.
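On the throttling question specifically, the standard client-side mitigation is retry with exponential backoff and jitter, so a post-outage login stampede doesn’t re-saturate services that are still recovering. A minimal sketch, with attempt_login as a stand-in rather than any real Canvas or AWS call:

```python
# Retry with exponential backoff and full jitter: spread retries out over
# an exponentially growing window instead of hammering a recovering service.
# attempt_login is a placeholder, not a real Canvas or AWS API.

import random
import time

def attempt_login() -> bool:
    """Stand-in for a login request that may be throttled (HTTP 429/503)."""
    return random.random() < 0.3   # pretend 30% of attempts get through

def login_with_backoff(max_attempts: int = 5, base_delay: float = 0.5) -> bool:
    for attempt in range(max_attempts):
        if attempt_login():
            return True
        # Full jitter: sleep a random amount up to an exponentially growing cap.
        time.sleep(random.uniform(0, base_delay * 2 ** attempt))
    return False

print("logged in:", login_with_backoff())
```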
In the short run, faculty can minimise disruption by extending assignment deadlines, offering offline access to readings and rubrics, and repeating urgent in-platform announcements over email or text message. Students should keep local copies of important files and confirm their submissions once service is restored. None of these steps eliminates the risk, but together they shrink the window in which a cloud outage can interrupt learning.
Bottom line: Canvas did experience disruptions that lined up with the AWS outage, consistent with its cloud footprint and dependencies. As the platforms stabilise, the enduring work is resilience: spreading risk across regions, trimming single points of failure and making sure that when the cloud wobbles, classrooms don’t topple with it.