Phones started beeping, and mayhem ensued. The first thing we looked at was the database. Was some MyISAM table locked? Was there a hung log processor running? The database was busy, but it looked odd. The web servers were going nuts.
As we soon discovered, we (dealnews.com) had been mentioned in an article on Yahoo!. At 5PM Eastern, that article became the featured article on the Yahoo! front page, and it stayed there for an hour. We went from our already high Christmas traffic of about 80 req/s for pages and 200 req/s for images to 130 req/s for pages and 500 req/s for images. We survived with a little tinkering. We had been working on a proxy system, and this seemed as good a time as any to try it out. Thanks to the F5 BIG-IP load balancers, we could send all the traffic from Yahoo! to the proxy system. That allowed us to handle the traffic. Just after 6PM, Yahoo! changed the featured article and things returned to normal.
Until 9PM. It seems the earlier posting by Yahoo! must not have gone out to all their users, because at 9PM the connections came back with a vengeance. We started hitting bottleneck after bottleneck: we would raise one limit and another bottleneck would appear. The site was doing OK during this time, but some things, like images, were loading slowly. That was simply an underestimate on our part: our two image servers were set to a MaxClients of only 250, and their load was negligible. We raised that and images flowed freely once again. Next we realized that all our memcached daemons were maxed out on connections. So, again, we raised that limit and restarted them. That fixed it. But now that they were no longer waiting on memcached, the Apache/PHP servers hit their MaxClients. We checked the load and the servers were not stressed, so up those limits went. The proxy servers were not doing well using a pool of memcached servers, so we set them to use just one server each. This meant several copies of the same cache, but better access to the data for each server. After all that, we were handling the Yahoo! load.
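Why can a server with almost no load still be the bottleneck? A rough way to see it is Little's law: a pool of N concurrent slots (here, Apache MaxClients), each held for an average of W seconds per request, can complete at most N / W requests per second, no matter how idle the CPU is. The sketch below applies that to the image tier using the figures from this post (two servers at MaxClients 250, a 3000 req/s image peak). It is only an illustration: it ignores keep-alive and the fact that we raised the limit during the night, and the 0.5 s hold time in the last line is a hypothetical value, not a measurement.

```python
def max_throughput(slots: int, avg_seconds_per_request: float) -> float:
    """Little's law: `slots` concurrent workers, each holding a request for
    `avg_seconds_per_request` on average, complete at most this many req/s."""
    return slots / avg_seconds_per_request


def max_avg_time(slots: int, requests_per_second: float) -> float:
    """Inverse: the longest average hold time per request that still keeps up."""
    return slots / requests_per_second


# Two image servers at MaxClients 250 each (figures from the post).
image_slots = 2 * 250
# Image request peak from the post.
peak_image_rps = 3000

budget = max_avg_time(image_slots, peak_image_rps)
print(f"each image request must finish in ~{budget * 1000:.0f} ms on average")

# Hypothetical: if slow clients held each slot for 0.5 s, the same 500 slots
# could serve only this many requests per second.
print(f"ceiling at 0.5 s per request: {max_throughput(image_slots, 0.5):.0f} req/s")
```

Run as-is, it reports a budget of about 167 ms per image request and a ceiling of 1000 req/s at the hypothetical 0.5 s hold time. That gap is the situation described above: the CPUs sit idle while every slot is tied up waiting on clients.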
In the end, traffic peaked at 300 req/s for pages and 3000 req/s for images, and it lasted for over 2 hours. The funny thing is, we had been talking all week about how to increase our capacity before next Christmas. Given our content, this is our busy time: our traffic has doubled each December for the last 3 years. At one point during the Yahoo! rush, incoming traffic hit 10MB/s. A year and a half ago, that was the size of our whole pipe with our provider. Luckily, we increased that a while back.
The silver lining is that I got to watch this traffic first hand for over 2 solid hours. That will help us design our systems to handle this load, and then some, at all times in the future. In some ways it was a blessing.
Digg? Slashdot? They can bring traffic for sure; we have been on both several times. But wow, just being mentioned in the third paragraph of an article one click deep from the Yahoo! front page can bring you to your knees if you are not ready. Still, in this business, I would do it again tomorrow. Bring it on.
Update: Yahoo! put the article on their front page again on the 26th. Both our head sysadmin and I were off, and no phones went off. We handled 400 req/s for pages and 1500 req/s for images, and this lasted for 3 hours. Granted, some things were not working; for example, you could not change your default settings for the front page. But all in all, the site performed quite well.