Why Facebook, Instagram and WhatsApp were down for 6 hours

Facebook, Instagram and WhatsApp were down for almost 6 hours. Here’s why.

Facebook, Instagram and WhatsApp face global outage

Around 11:40 AM Eastern time (about 9 PM IST), Facebook and Instagram users could not refresh their feeds, and no messages were loading on WhatsApp either.

Facebook, the tech giant that owns Facebook, Instagram and WhatsApp, was experiencing internal problems that led to a global outage of these platforms for about 6 hours.

Many users took to Twitter to report that they could not access these platforms. Facebook has nearly 2 billion active users, and many of them depend on these platforms for business and communication.

But what caused this outage?

Facebook has officially apologised and issued a brief statement on what caused the outage.

“Our engineering teams have learned that configuration changes on the backbone routers that coordinate network traffic between our data centers caused issues that interrupted this communication. This disruption to network traffic had a cascading effect on the way our data centers communicate, bringing our services to a halt.”

Facebook said in its report.

Many security experts and people on the internet, citing sources inside Facebook and traffic analysis, pointed out that the problem most likely had to do with DNS or BGP.

What is BGP?

The internet is not exactly easy or simple to explain. It’s a complex world, and the best way to describe BGP is with the example of an air traffic controller.

The internet is a complex network.

BGP, the Border Gateway Protocol, is the system responsible for getting your data where it needs to go as quickly as possible. Let’s say you (your data) want to reach Facebook. There are many paths your packet can take to reach its destination.

Since there are lots of different ISPs (Internet Service Providers), backbone routers and servers, your data can take many different routes. It’s BGP’s job to show your packet (the flight) the best available route to Facebook.

Let’s take an example. Say you live on island A (meaning your ISP is A) and Facebook is on island B. A and B aren’t connected directly.

But A is connected to island C, and B is connected to island D. Islands B, C and D are all connected to island X. So one route for your data is A -> C -> X -> B. Finding such a route is exactly what BGP does.
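The island example can be sketched as a small graph search. This is only an illustration, not how real routers work: the `LINKS` table and the `shortest_route` function are hypothetical names, and the link between C and X is assumed here because the route through X needs it.

```python
from collections import deque

# Hypothetical island links for the example. The C-X link is an assumption:
# without it, island A would have no way to reach island X.
LINKS = {
    "A": ["C"],
    "C": ["A", "X"],
    "X": ["C", "B", "D"],
    "D": ["B", "X"],
    "B": ["D", "X"],
}


def shortest_route(links, start, goal):
    """Breadth-first search: return the route with the fewest hops, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        island = path[-1]
        if island == goal:
            return path
        for neighbour in links.get(island, []):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None  # no chain of links connects start to goal


route = shortest_route(LINKS, "A", "B")
print(" -> ".join(route))  # A -> C -> X -> B
```

Counting hops is a deliberate simplification. Real BGP also weighs business policies and other attributes when it picks a route.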

Of course, this explanation is oversimplified and many other factors are involved. In the real world, these islands are called autonomous systems.

Because the internet is huge and constantly changing, no single network can map all of it. As a shortcut, each autonomous system shares its map with its neighbours and copies the maps of the networks (islands) it is connected to.
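The map sharing can be sketched as a loop in which every island copies routes from its direct neighbours until nothing changes. This is a toy model under stated assumptions: the `LINKS` table and the `propagate` function are hypothetical, the island links (including a C-X link) are assumed for illustration, and the shortest path always wins, which real BGP does not guarantee.

```python
# Hypothetical island (autonomous system) links, assumed for illustration.
LINKS = {
    "A": ["C"],
    "C": ["A", "X"],
    "X": ["C", "B", "D"],
    "D": ["B", "X"],
    "B": ["D", "X"],
}


def propagate(links, origin):
    """Spread a route to `origin` hop by hop, neighbour to neighbour.

    Returns {island: path from that island to origin}.
    """
    routes = {origin: [origin]}
    changed = True
    while changed:
        changed = False
        for island, neighbours in links.items():
            for neighbour in neighbours:
                learned = routes.get(neighbour)
                if learned is None or island in learned:
                    continue  # nothing to copy, or the path would loop back
                candidate = [island] + learned
                current = routes.get(island)
                if current is None or len(candidate) < len(current):
                    routes[island] = candidate  # keep the shorter path
                    changed = True
    return routes


routes = propagate(LINKS, "B")
print(routes["A"])  # ['A', 'C', 'X', 'B']
```

No island ever sees the whole map. Each one only learns what its neighbours tell it, which is why one bad announcement can spread widely.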

Just as you can end up at the wrong destination if your GPS doesn’t work properly, BGP can go wrong too. If the maps aren’t updated properly, your data will hit a dead end.
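A dead end can be sketched with a tiny routing table. The prefixes below are reserved documentation ranges, not Facebook’s real addresses, and the `table` contents and `lookup` function are hypothetical. Once a route is removed from the table, the lookup has nowhere to send the packet.

```python
import ipaddress


def lookup(table, address):
    """Return the next hop for the most specific matching prefix, or None."""
    ip = ipaddress.ip_address(address)
    matches = [net for net in table if ip in net]
    if not matches:
        return None  # no route known: the packet hits a dead end
    best = max(matches, key=lambda net: net.prefixlen)
    return table[best]


# Hypothetical routing table: prefix -> next hop (documentation prefixes).
table = {
    ipaddress.ip_network("192.0.2.0/24"): "island X",
    ipaddress.ip_network("198.51.100.0/24"): "island C",
}

print(lookup(table, "192.0.2.10"))  # island X

# Simulate the route being withdrawn from the map.
del table[ipaddress.ip_network("192.0.2.0/24")]
print(lookup(table, "192.0.2.10"))  # None
```

The servers behind the withdrawn prefix may still be running. The rest of the network simply no longer knows how to reach them.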

Facebook’s report on the issue doesn’t reveal much, but Cloudflare’s CTO reported that the outage might have been caused by bad BGP updates that took Facebook off the internet.

If you want to learn more about how this works, you can read this article by Cloudflare, which explains this specific Facebook incident in technical detail.

It was also reported that many employees were sent to data centres in an effort to fix the issues. Such problems aren’t easy to solve. Now that services are back, Facebook has apologised and blamed a “faulty configuration change” rather than any malicious hack.