
October 20, 2025
Amazon DNS Outage Disrupts Internet Access for Millions | TechCrunch

Summary

The Amazon DNS outage was a significant internet disruption in which Amazon Web Services (AWS) experienced widespread service interruptions due to failures in its Domain Name System (DNS) infrastructure. Beginning in the early hours of October 20, 2025, and lasting several hours, the outage primarily affected AWS’s US-EAST-1 region in Northern Virginia, a critical hub that supports numerous cloud services globally. DNS is essential for translating human-readable domain names into IP addresses, enabling users to reach websites and online applications; the outage underscored the crucial role DNS plays in internet functionality.
The failure originated from DNS resolution problems affecting the DynamoDB API endpoint, a key database service within AWS, which triggered cascading disruptions across many high-profile websites and applications including Snapchat, Reddit, Disney+, Coinbase, and Amazon’s own Alexa and Ring services. The outage also affected platforms across multiple sectors, from social media and gaming to finance and government services, revealing the extent of global reliance on AWS cloud infrastructure. Importantly, cybersecurity experts confirmed that the incident was caused by an operational fault rather than any malicious attack.
AWS responded promptly by identifying the root cause and implementing mitigations, restoring most services within hours. The incident highlighted vulnerabilities inherent in cloud service dependencies and single-region failures, sparking discussions around the importance of redundancy, multi-cloud strategies, and resilient DNS architectures to prevent similar large-scale disruptions in the future. It also emphasized the need for robust disaster recovery planning and proactive infrastructure management within critical internet services.
Overall, the Amazon DNS outage served as a stark reminder of the interconnected nature of modern internet ecosystems and the cascading effects that can arise from failures in foundational cloud services, prompting industry-wide reassessments of best practices in DNS management and cloud infrastructure resilience.

Background

The Amazon DNS outage stemmed from an operational fault within Amazon Web Services (AWS) itself: DNS resolution failures in its infrastructure disrupted service for AWS and its customers beginning in the early hours of October 20, 2025. The Domain Name System (DNS) is a critical piece of internet infrastructure that translates human-readable domain names into numerical IP addresses, enabling browsers and other applications to locate and load websites. When DNS service is disrupted, users cannot reach online content because their devices are unable to resolve domain names to the correct IP addresses.
During the outage, Amazon experienced widespread service disruptions affecting several AWS services, including EC2, a popular virtual server platform used by many companies to host their online applications. Although there was speculation regarding the nature of the outage, cybersecurity experts confirmed there was no evidence of a denial-of-service attack or any form of malicious hacking causing the problem.
To mitigate the effects of the outage, Amazon recommended that users and companies flush their DNS caches — local stores of recent DNS query results — to help restore access more quickly. The incident also highlighted the appeal of cloud-based DNS solutions, which offer scalability and resilience advantages over traditional DNS deployments by absorbing traffic spikes and serving millions of queries per second. Nonetheless, as this outage showed, even such infrastructure is exposed to operational faults at critical providers, which can still cause substantial internet access disruptions.
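The exact flush command differs by operating system. As a minimal sketch — the commands below are the standard ones for each platform, but resolver setups vary (non-systemd Linux distributions, for example, use nscd or dnsmasq instead) — a small Python helper might dispatch on the platform:

```python
import platform
import subprocess

# Typical cache-flush commands per platform. These are assumptions about a
# default setup: Linux here assumes systemd-resolved; macOS may additionally
# require restarting mDNSResponder.
FLUSH_COMMANDS = {
    "Windows": ["ipconfig", "/flushdns"],
    "Darwin": ["dscacheutil", "-flushcache"],
    "Linux": ["resolvectl", "flush-caches"],
}

def flush_dns_command(system: str) -> list[str]:
    """Return the cache-flush command for the given platform name."""
    try:
        return FLUSH_COMMANDS[system]
    except KeyError:
        raise ValueError(f"no known flush command for {system!r}")

def flush_dns_cache() -> None:
    """Flush the local DNS cache for the current platform (needs privileges)."""
    subprocess.run(flush_dns_command(platform.system()), check=True)
```

Flushing only discards stale local answers; it helps once the upstream DNS service is answering correctly again.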

The Outage Event

On October 20, 2025, Amazon Web Services (AWS) experienced a significant outage centered in its US-EAST-1 region in Northern Virginia. The outage, beginning around 3 a.m. Eastern Time, left numerous popular websites, apps, and services inaccessible worldwide. The incident stemmed primarily from DNS resolution failures affecting the DynamoDB API endpoint, a critical component of AWS’s cloud infrastructure.
The outage’s impact was extensive, affecting a diverse range of platforms including social media and messaging apps like Snapchat, Signal, and Reddit; entertainment services such as Disney+, Hulu, and Fortnite; financial and trading applications including Coinbase, Robinhood, and Venmo; as well as Amazon’s own services like Alexa and Ring doorbells. Additionally, major corporations such as McDonald’s, Verizon, and Roblox reported service interruptions during the event. Users worldwide experienced difficulties accessing these services, highlighting the heavy reliance on AWS’s cloud ecosystem.
Amazon acknowledged the root cause as issues with DNS, the system responsible for translating web addresses into IP addresses, which disrupted the normal functioning of its DynamoDB database service and dependent features such as IAM updates and DynamoDB global tables. Despite initial concerns, data integrity was reportedly maintained, as the problem did not originate within the database itself.
The outage demonstrated the vulnerability of even the largest cloud environments to infrastructure-level failures. While some glitches resolved quickly, DNS-related problems prolonged downtime for many users. By the evening hours, AWS confirmed that the outage had been fully mitigated and that most services relying on the US-EAST-1 region were returning to normal. The event underscored the critical importance of DNS stability in maintaining global internet access and the cascading effects that cloud service disruptions can have across the digital ecosystem.
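The failure mode described above — a hostname that will not resolve even though the service behind it may be healthy — is distinguishable from a connection failure at the client. A minimal Python sketch using only the standard library (the DynamoDB hostname in the comment is the real regional endpoint, shown purely as an illustration):

```python
import socket

def resolve_endpoint(hostname: str, port: int = 443) -> list[str]:
    """Resolve a hostname to its IPv4 addresses; raises socket.gaierror on DNS failure."""
    infos = socket.getaddrinfo(hostname, port,
                               family=socket.AF_INET, type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})

def check_dns(hostname: str) -> str:
    """Classify a lookup as resolving or failing at the DNS layer."""
    try:
        addresses = resolve_endpoint(hostname)
    except socket.gaierror as exc:
        # getaddrinfo failing means the *name* could not be resolved -- the
        # failure mode reported for the DynamoDB endpoint -- as opposed to a
        # connection error, which only surfaces after resolution succeeds.
        return f"DNS failure: {exc}"
    return f"resolves to {addresses}"

# e.g. check_dns("dynamodb.us-east-1.amazonaws.com")
```

During an incident like this one, a probe of this kind helps operators tell "the endpoint's name will not resolve" apart from "the service is rejecting connections," which point to very different layers of the stack.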

Causes of the Outage

The Amazon DNS outage primarily stemmed from significant error rates encountered during requests to the DynamoDB API in the AWS US-EAST-1 region. Investigations revealed that the root cause was related to DNS resolution failures affecting the DynamoDB API endpoint, which played a critical role in the cascading service disruptions. DNS, or Domain Name System, is essential for translating human-readable domain names into numerical IP addresses, enabling internet routing and connectivity; thus, any malfunction in DNS services can severely impair web communication.
This outage highlighted the vulnerability of even the largest cloud infrastructures to failures within a single critical service component, in this case, DNS. The problems began with DNS resolution issues within AWS’s Northern Virginia data centers, disrupting multiple services including DynamoDB and EC2, and propagating widespread impact across numerous high-profile websites and applications. The outage was not linked to any cyber-attack, as confirmed by experts, but rather to an operational fault in AWS’s DNS handling.
Furthermore, the outage underscored the risks associated with reliance on a single DNS server or resolver without sufficient redundancy. Industry best practices recommend deploying redundant DNS servers across multiple regions with failover mechanisms to mitigate the risk of a single point of failure. Cloud-based solutions that host DNS services across various data centers enhance resilience by ensuring seamless query resolution despite localized outages or hardware failures. The AWS incident illustrated the consequences when such redundancy and failover are insufficient or compromised, leading to significant downtime affecting millions of users and multiple global services.
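The redundancy pattern recommended above can be sketched generically: try an ordered list of independent resolvers and fall through on failure. The resolver callables below are stand-ins — in practice each would query a different DNS provider or region:

```python
from typing import Callable, Sequence

def resolve_with_failover(hostname: str,
                          resolvers: Sequence[Callable[[str], list[str]]]) -> list[str]:
    """Try each resolver in order and return the first successful answer.

    Each resolver is any callable that maps a hostname to a list of addresses
    and raises on failure; only if every resolver fails does the last error
    propagate to the caller.
    """
    last_error: Exception | None = None
    for resolve in resolvers:
        try:
            return resolve(hostname)
        except Exception as exc:
            last_error = exc  # remember the failure and try the next resolver
    raise last_error if last_error else RuntimeError("no resolvers configured")
```

The design point is that the resolvers must be genuinely independent — hosted in different regions or with different providers — so that a single operational fault cannot take out every path at once.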

Impact

The Amazon DNS outage caused widespread disruption across numerous popular websites, applications, and online services, highlighting the critical dependency of global businesses on Amazon Web Services (AWS) infrastructure. Social media platforms such as Snapchat were significantly affected, experiencing service interruptions during the outage period. Additionally, cloud-based games including Roblox and Fortnite faced disruptions, while crypto exchange Coinbase reported that many users were unable to access their services.
The issue stemmed from a DNS resolution failure affecting Amazon’s DynamoDB, a core database service integral to AWS operations. This failure resulted in increased error rates and latency across more than 20 other dependent services, amplifying the scale of the outage. Graphic design tool Canva also reported significantly increased error rates impacting its functionality, attributing the problem to the underlying cloud provider. Other affected services included Discord, Feedly, Politico, Shopify, and League of Legends, illustrating the breadth of the outage across diverse sectors.
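When a dependency returns elevated error rates, as DynamoDB’s callers saw here, the standard client-side defense is retrying with capped exponential backoff plus jitter, so that retries do not themselves pile onto the struggling service. A minimal, library-agnostic sketch (AWS SDKs build similar retry behavior in; this only illustrates the idea):

```python
import random
import time

def call_with_backoff(operation, max_attempts=5, base_delay=0.1, max_delay=5.0):
    """Invoke `operation`, retrying transient failures with jittered backoff."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the final error
            # Delay doubles each attempt, capped at max_delay, with "full
            # jitter" so many clients retrying at once do not synchronize
            # into a thundering herd against the recovering service.
            delay = min(max_delay, base_delay * (2 ** attempt))
            time.sleep(random.uniform(0, delay))
```

Bounding the attempts matters as much as the jitter: during a prolonged outage, unbounded retries only add load without improving the odds of success.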
The outage not only disrupted direct user access but also impaired the status pages meant to provide real-time warnings and track ongoing issues; in at least one case, a status page was itself rendered unavailable. The incident drew attention to the strategic vulnerability posed by overreliance on a single cloud provider, underscoring calls for digital resilience through multi-cloud strategies, redundancy, and backup systems to mitigate future risks.
Experts have emphasized the importance of planning for such technical failures with the same rigor as cybersecurity threats, recommending that organizations adopt cloud architectures that minimize provider-specific risks and align with cybersecurity best practices. The incident serves as a stark reminder of the interconnected nature of modern internet infrastructure and the cascading effects that can arise from failures within foundational cloud services.

Response and Recovery

Following the widespread disruption caused by the Amazon Web Services (AWS) DNS outage in the US-EAST-1 (Northern Virginia) region, AWS engineers quickly identified the root cause as a DNS-related issue and initiated “initial mitigations” by 7:22 pm AEST. These early measures produced “significant signs of recovery” across most affected AWS services, including global platforms that rely on US-EAST-1. That region, whose Northern Virginia data centers support a large number of AWS services, bore the brunt of the outage.
AWS maintained communication with users through its Health Dashboard, providing ongoing updates about the recovery process. By 8:03 pm AEST, the company confirmed that the outage had been “fully mitigated” and that all services should be operating normally, although some requests might still experience throttling as work toward a complete resolution continued. Despite the general restoration of services, some platforms like Ring reported persistent service disruptions, highlighting the uneven pace of recovery among affected services.
Third-party companies also responded to the outage. For instance, the Hugging Face Hub announced that its services were back online following the resolution of the AWS disruption. Meanwhile, internet monitoring firms such as ThousandEyes attributed part of the outage’s impact to issues experienced by Akamai, which was simultaneously investigating and addressing its own related service disruptions.
The outage prompted considerable sympathy for the technical teams working overnight to restore service, with public comments acknowledging the challenges faced by engineers who were largely waiting for DNS propagation and system recovery. Overall, AWS’s rapid identification of the root cause and the swift deployment of mitigations were critical in restoring internet access for millions affected by the incident.

Investigations and Findings

At 2:01 AM PDT (9:01 UTC), Amazon’s technical team identified that the outage was potentially caused by issues with DNS resolution of the DynamoDB API endpoint, and immediately began multiple parallel efforts to expedite recovery. Amazon’s status updates listed numerous services as “impacted,” acknowledging the breadth of the disruption.
The outage had a significant impact internationally, notably affecting services in the UK such as Lloyds Banking Group’s applications and various government platforms, including HMRC. Amazon responded by directing inquiries to its Health Dashboard, which provided ongoing updates and confirmed that the underlying DNS problem had been fully mitigated by 3:35 AM PDT (11:35 AM UK time), with most AWS services returning to normal operations shortly thereafter.
Other industry players also weighed in during the incident. Akamai confirmed that it was actively investigating an issue of its own but refrained from explicitly linking it to the broader disruptions, while ThousandEyes, an internet monitoring firm acquired by Cisco, attributed part of the outage’s impact to Akamai’s problems. Whatever the overlap, there was no evidence that the outage was caused by denial-of-service attacks or other forms of malicious hacking.
The investigations highlighted that DNS, while seemingly simple, is a frequent target for cyberattacks due to its critical role in directing internet traffic. Vulnerabilities in DNS can lead to data breaches, malware infections, or extensive outages if not properly secured. Consequently, the event underscored the importance of implementing robust security measures and adopting multi-provider DNS strategies to minimize risks and improve resilience against provider-specific failures.

Lessons Learned and Future Prevention

The widespread Amazon DNS outage highlighted several critical lessons regarding the reliance on single cloud providers and the vulnerabilities inherent in the Domain Name System infrastructure. One major takeaway is the importance of building resilience into essential services by avoiding dependence on a single cloud region or vendor. Experts emphasize that redundancy, geographical distribution of resources, and rigorous testing of emergency scenarios should be standard practice rather than optional measures to prevent total service disruptions.
The outage demonstrated that a failure in the DNS, which is fundamental to translating human-readable domain names into IP addresses, can have cascading effects across a wide range of internet services, from major websites to even status pages designed to track outages. Amazon recommended that users flush their DNS caches to help restore connectivity more quickly, underscoring how temporary local data can affect the speed of recovery during such events.
Looking ahead, the adoption of cloud-based DNS solutions offers advantages such as unparalleled scalability and dynamic handling of traffic spikes, which are crucial for maintaining service stability. However, this scalability must be paired with robust security practices and configurations aligned with cybersecurity best practices to minimize risks from potential threats.
Furthermore, the incident serves as a reminder that while cloud providers occasionally experience outages, they generally offer greater resilience and security compared to in-house servers, provided systems are architected with fail-safes and diverse infrastructure. Organizations are encouraged to design their systems with multiple layers of redundancy and to regularly review and update their disaster recovery plans to mitigate the impact of future outages. This proactive approach can help ensure continuity of services even when faced with provider-specific or regional failures.
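One concrete form of the layered redundancy recommended above is health-checked endpoint selection: prefer the primary region, but fail over to a geographically separate standby when a health probe fails. A minimal sketch — the endpoint names and the health probe are hypothetical placeholders, not real services:

```python
from typing import Callable, Sequence

def pick_endpoint(endpoints: Sequence[str],
                  is_healthy: Callable[[str], bool]) -> str:
    """Return the first healthy endpoint, preferring earlier (primary) entries."""
    for endpoint in endpoints:
        if is_healthy(endpoint):
            return endpoint
    raise RuntimeError("no healthy endpoints available")

# Illustrative configuration: a primary and a standby in another region.
ENDPOINTS = [
    "api.us-east-1.example.com",   # primary (hypothetical)
    "api.eu-west-1.example.com",   # cross-region standby (hypothetical)
]
```

In production this logic usually lives in DNS-level failover or a load balancer rather than application code, but the principle is the same: the decision to fail over must not itself depend on the infrastructure that just failed.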


By Harper Eastwood | 11 Minute Read
