
An engineer at a South African Internet service provider accidentally deleted important network settings while rushing out for a cigarette break, causing the largest Internet outage on the African continent at the time.
A few decades ago, one of South Africa’s leading Internet service providers faced an unprecedented crisis because of a seemingly minor mistake by an employee. An engineer named Paton worked there as a "backbone network engineer", a position that demanded great responsibility and attention to detail.
The company Paton worked for played a key role in keeping the Internet running not only in South Africa but also in neighboring countries. The provider’s DNS servers were authoritative for thousands of domains, including the national top-level domains of several African countries.
One day, Paton was tasked with updating the network blocks (a network block is a portion of IP address space allocated for use on a particular network or subnet) and distributing them via BGP (Border Gateway Protocol, the primary routing protocol between autonomous systems on the Internet, which lets different networks exchange routes) to partners and transit providers. This involved editing access control lists (ACLs): rules that define access to network resources for different users or groups of users, and that here regulated user and domain access to specific resources. Paton usually did this work thoroughly, but this time his colleagues called him out for a smoke break. The desire to join them made the engineer hurry.
When Paton returned from the break, the office was in chaos. The network operations center was flooded with calls from angry customers. What was then the largest Internet outage on the African continent had just occurred.
To make matters worse, an unknown person claiming to be a hacker contacted a local tech publication and said they were involved in the incident. The news spread quickly, creating additional problems for the company’s management.
An investigation revealed that there had been no security breach. In his haste, Paton had accidentally replaced all of the existing access control lists instead of simply adding the new network blocks. As a result, the complex system that routed Internet traffic for a large part of Sub-Saharan Africa stopped working.
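The difference between adding entries to a list and replacing the list outright can be illustrated with a short Python sketch. It is only an illustration: the address ranges are documentation-only prefixes, and the helper `removed_entries` is a hypothetical name, not anything from the provider's actual tooling, which the source does not describe.

```python
from ipaddress import ip_network

# Hypothetical prefix list, using documentation-only address ranges.
existing = {ip_network("192.0.2.0/24"), ip_network("198.51.100.0/24")}
new_blocks = {ip_network("203.0.113.0/24")}


def removed_entries(before, after):
    """Return entries present before the change but missing after it."""
    return before - after


# Intended change: add the new blocks to the existing list.
appended = existing | new_blocks

# The kind of mistake described in the article: the new blocks replace the whole list.
replaced = set(new_blocks)

print(sorted(removed_entries(existing, appended)))  # nothing was removed
print(sorted(removed_entries(existing, replaced)))  # both original entries are gone
```

A check of this kind, run before a change is applied, would flag any update that silently drops existing entries.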
After the incident, Paton not only restored the ACLs and updated the network blocks, but also developed the company’s first change management protocol: a set of rules and procedures governing how changes are made to IT systems, intended to prevent incidents and disruptions.
Source: TheRegister