
A rookie sysadmin in the 2000s took the Amazon website down for 3 hours with a single typo

Published by Kateryna Danshyna

A small typo in a backup system configuration file cost Amazon three hours of downtime. The financial losses could have run into the millions had the story happened today rather than 20 years ago.

This story was originally published by The Register as part of its retrospective column «Who, Me?», which collects memorable work stories from IT professionals.

This time the story is about a man named Ken, who 20 years ago got a job as a Linux system administrator at Amazon.com. By his own admission, he was «completely unqualified» for the job at the time, but his previous experience with Solaris helped him pass the interview. Ken quickly learned the basics of Linux, but once on staff he discovered that the Red Hat Enterprise Linux environment of the day was very different from Solaris. Despite his lack of knowledge, his boss assigned Ken to upgrade the workflow for backing up to tape drives.

«I spent months planning and testing because the configuration files changed with this update and we had to create new ones and release them with the update,» says Ken. «I created those files and ran all the necessary tests. Everything seemed to be in order, and the day came when we pushed the button.»

For the first few hours, everything seemed to be working as planned, so the sysadmin gave himself a big pat on the back and headed home. At around 7pm, Ken’s pager went «crazy» and within minutes, the sysadmin joined a conference call. Ken recalls that the meeting was attended by all of Amazon’s top executives at the time, including the company’s then-CEO Jeff Bezos. Everyone was interested in one single question — what happened to the site?

Ken and his colleagues started checking and found that the main database of the Amazon online store had stopped working, even though the huge cluster of computers behind it appeared healthy. Ken knew that the backup program he had created copied the database logs to tape and was then supposed to delete them from the servers. It turned out that the deletion step never completed because Ken had made a typo.

«There was no problem for the first few hours, but eventually the partition that stored the logs filled up and the database just gave up and started complaining that no one loved it anymore,» says Ken.
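The failure Ken describes, where a copy step succeeds while the cleanup step silently does nothing, can be illustrated with a small sketch. The article does not say what the typo was or which backup software was involved, so everything here is an assumption: the function names `archive_and_purge` and `purge_warnings`, the glob pattern standing in for the configuration entry, and the 80% disk usage threshold are all hypothetical.

```python
import shutil
from pathlib import Path


def archive_and_purge(log_dir: Path, archive_dir: Path, pattern: str) -> list[str]:
    """Copy matching logs to the archive, then delete the originals."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    purged = []
    for src in sorted(log_dir.glob(pattern)):
        dst = archive_dir / src.name
        shutil.copy2(src, dst)
        # Delete the original only once the copy has the same size.
        if dst.stat().st_size == src.stat().st_size:
            src.unlink()
            purged.append(src.name)
    return purged


def purge_warnings(log_dir: Path, purged: list[str], max_usage: float = 0.8) -> list[str]:
    """Flag the silent failure: nothing purged, or the partition filling up."""
    warnings = []
    leftover = [p.name for p in log_dir.iterdir() if p.is_file()]
    if not purged and leftover:
        warnings.append(f"nothing purged, {len(leftover)} file(s) left in {log_dir}")
    usage = shutil.disk_usage(log_dir)
    if usage.used / usage.total > max_usage:
        warnings.append(f"{log_dir} partition is over {max_usage:.0%} full")
    return warnings
```

With a mistyped pattern such as `*.lgo` in place of `*.log`, `archive_and_purge` raises no error and simply returns an empty list, which is why a check like `purge_warnings` after each run can surface the problem hours before the partition fills.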

After making sure that none of the log files were lost, Ken deleted them from the cluster and watched the database come back to life, and Amazon.com along with it. He corrected the typo in the configuration file and headed home for a restless night, thinking about finding a new job.

«The next morning I arrived at the office and saw my manager standing by my parking spot, which didn’t seem like a good sign,» Ken recalls. «I got out of the car and walked up to him. He was silent for about 15 seconds and looked at me intently. Then he smiled broadly and said: “Congratulations, you’ve lost your virginity.” We went into the office, where everyone made fun of me for a long time.»
