So, the whole world changed at dealnews when Yahoo! linked to us. We quickly realized that our infrastructure was not scaling well, and we had to make a change.

The Problem

Even though we were using all sorts of cool techniques, the server architecture was really still just a bunch of web servers all serving the same content. On top of that, our systems at the time used a pull method: when a request came in, we checked memcache, and if the data was not there, it was fetched from our main MySQL server. That is fine until a cache item is missing or expires at the wrong moment, like when Yahoo! hit us. A cache item would expire, 60,000 users would hit a page, and every one of those requests would try to create the cache item at the same time.
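
To make that concrete, here is a simplified sketch of the pull pattern (not our actual code; the query, cache key and hostnames are made up):

    <?php
    // Simplified illustration of the pull-through cache pattern.
    $memcache = new Memcache();
    $memcache->connect('127.0.0.1', 11211);
    $db = mysql_connect('db-master', 'webuser', 'secret');
    mysql_select_db('content', $db);

    $key  = 'front_page_deals';          // made-up cache key
    $data = $memcache->get($key);
    if ($data === false) {
        // Cache miss. Under a burst, thousands of requests reach this point
        // at once and every one of them runs the same expensive query against
        // the master before any of them gets to repopulate the cache.
        $result = mysql_query(
            "SELECT id, title, price FROM deals WHERE active = 1 ORDER BY posted DESC",
            $db);
        $data = array();
        while ($row = mysql_fetch_assoc($result)) {
            $data[] = $row;
        }
        $memcache->set($key, $data, 0, 300); // good for 5 minutes, then it can happen again
    }
    ?>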

The Solution

I was tasked with two things: find a way to handle something like the Yahoo! burst, and find a way to store the data we need to generate our web pages that was highly available and would scale. For bursting, I wrote a proxy using Apache, mod_rewrite, PHP and memcached. I have reasons I did it this way that are not relevant to this post. Maybe more on that later.

For the data solution, I considered several things: MySQL replication, writing my own replicating memcached client, and other exotic ideas. One of the semi-exotic ideas for us was MySQL Cluster. We had not used it at all, and some things about it made us gun-shy. But we tested it and were very happy with the results.

Initial Test

With the help of Gentoo, getting a cluster up and running was really, really easy. In fact, it seemed too easy. We ran a cluster on some dev boxes at first and did some generic testing using the PHPTestSuite from the guys at MySQL Performance Blog. What we found was that while the cluster appeared slower at low concurrency, it scaled much better than InnoDB (our preferred storage engine) as the number of concurrent connections grew.

Application Testing

So, we moved to the next step: testing our application. We discovered early on that we would have to redesign it for the cluster. Our database was highly relational; almost no data could be put on the site without data from other tables, and we used a lot of joins. We learned (later) that joins in the cluster are not a good idea, and neither are sub-selects. So, we wrote some proof-of-concept scripts for our application. We were very happy: very few issues were found, and nothing anywhere near show-stopping.

Installation

We ordered our servers: six new Dell dual-core, dual-processor Opterons with a lot of memory. Two would become SQL nodes and the other four would be storage nodes. Our data set is not that large compared to a lot of companies, so we configured the cluster with 4 replicas. Our main goals are high availability and scalability. I could find nothing in my tests or in the manual that indicated this would be bad for scalability, and it should be great for HA.

We rewrote our application (basically, our public web site) to use the new cluster and its new table design. We hit our first snag when we tried to seed the data into the cluster: we got errors about the cluster's transaction logs not being big enough to handle the inserts. Through the manual, forum posts, the mailing list archives and some blogs, I was able to find the correct settings for our needs. I remembered thinking, back when I first installed the cluster, that it was too easy. I now realize that getting a cluster running is easy. Making it run well is a whole other story.
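
For anyone hitting the same wall: these settings live in config.ini on the management node. Below is a rough, illustrative sketch of the shape of a config like ours, not our literal file; hostnames and values are made up, and the usual suspects for insert errors like ours are the operation-record and redo-log settings, which you need to size for your own data and batch sizes:

    # config.ini (illustrative values only)
    [ndbd default]
    NoOfReplicas                = 4        # all four data nodes hold a full copy
    DataMemory                  = 4096M
    IndexMemory                 = 2048M
    MaxNoOfConcurrentOperations = 100000   # operation records for large transactions
    NoOfFragmentLogFiles        = 64       # size of the redo log

    [ndb_mgmd]
    HostName = mgm.example.com

    [ndbd]
    HostName = ndb1.example.com
    [ndbd]
    HostName = ndb2.example.com
    [ndbd]
    HostName = ndb3.example.com
    [ndbd]
    HostName = ndb4.example.com

    # Two SQL nodes
    [mysqld]
    [mysqld]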

The second snag was with joins. Our test bed for the cluster was not actually a cluster; we tested against a group of servers running InnoDB, and that was a mistake. Joins did not work at all with the cluster. We had to back up, rewrite some code and redo some tables. In the end, the new design is probably faster on either InnoDB or the cluster.
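
To give a flavor of the kind of rewrite involved (hypothetical tables, not our real schema), the general idea is to replace a join with simple indexed lookups and stitch the results together in PHP:

    <?php
    // Before: a join the cluster handled poorly (hypothetical schema):
    //   SELECT a.id, a.title, c.name
    //   FROM articles a JOIN categories c ON c.id = a.category_id
    //   WHERE a.published = 1;
    //
    // After: two simple queries, combined in application code.
    $articles = array();
    $result = mysql_query("SELECT id, title, category_id FROM articles WHERE published = 1");
    while ($row = mysql_fetch_assoc($result)) {
        $articles[] = $row;
    }

    $categories = array();
    $result = mysql_query("SELECT id, name FROM categories");
    while ($row = mysql_fetch_assoc($result)) {
        $categories[$row['id']] = $row['name'];
    }

    foreach ($articles as &$article) {
        $article['category_name'] = isset($categories[$article['category_id']])
            ? $categories[$article['category_id']] : null;
    }
    ?>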

Everyday Use

We started using the cluster for everyday use about a month ago. I guess five months is not bad to go from nothing to live in production. We have been slowly moving applications to it, taking care each time to monitor the cluster and make sure it's not throwing new errors. So far, so good. About 80% of our page views (our front page alone is 40% of page views) and about 50% of our end-user applications use the cluster now. We are doing caching at the proxy level for a lot of this, but when tested, the new architecture is much more reliable even without the caching proxies. Some things, like our forums, will never translate to the cluster. But they have their own dedicated systems already and are non-critical for our business; they could be shut down if there was a problem with them.

Administration

MySQL Cluster is a whole new animal. It's not like monitoring mysqld, Apache or the other software we already run. It took me a while to get the hang of rolling restarts, bringing nodes up and down after crashes, etc. We have had just one crashed node since we switched over to production use, and the cluster stayed up and kept serving content. We have written a Nagios monitor to keep track of the nodes' status. It uses ndb_mgm and reports any problems to us.
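
The check itself is nothing fancy. Here is a stripped-down sketch of the idea (not our production script; it assumes it runs where ndb_mgm can reach the management node, and the strings worth grepping for may vary by version):

    <?php
    // Rough sketch of a Nagios-style NDB node check.
    // Exit 0 = OK, 2 = CRITICAL, per the Nagios plugin convention.
    $output = shell_exec('ndb_mgm -e SHOW 2>&1');
    if ($output === null || strpos($output, 'Cluster Configuration') === false) {
        echo "CRITICAL: could not talk to the management node\n";
        exit(2);
    }
    if (strpos($output, 'not connected') !== false ||
        strpos($output, 'starting') !== false) {
        echo "CRITICAL: one or more nodes are down or restarting\n";
        exit(2);
    }
    echo "OK: all cluster nodes connected\n";
    exit(0);
    ?>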

Feedback

Now, as the title says, I have only been using MySQL Cluster for 5 months. If you are reading this and have more experience and are thinking "What a moron!", please tell me. We are still learning.

Update:

Ronald Bradford had some questions for me on his blog. I figured I would just answer them here.

You didn’t mention any specific sizes for data, I’d be interested to know, particularly growth and how you will manage that?

We currently have a DataMemory of 4GB and an IndexMemory of 2GB.  Based on the crude methods we have to monitor it, I think we are at about 40% capacity.  We are using MySQL Cluster purely as a data store for content on our web site, so we can trim the data store down significantly.  If it does not appear on the site, it's not in the cluster.
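
For anyone wondering how to check this on 5.0: one crude way is to ask the data nodes to log their memory usage and then read it back out of the cluster log on the management node (the log path below is just an example):

    # Ask every data node to write its data/index memory usage to the cluster log
    ndb_mgm -e "ALL DUMP 1000"
    grep -i "usage is" /var/lib/mysql-cluster/ndb_1_cluster.log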

You also didn’t mention anything about Disk? MySQL Cluster may be an in-memory database but it does a lot of disk work, and having appropriate disk is important. People overlook that.

Yes, we have U320 15k SCSI drives.  We do use RAID 1 on our servers, contrary to some opinions.  We see a lot of drive failures, about one every four months.  It sucks to lose a whole machine just because a $200 drive failed.

You didn’t mention anything about timings? Like how does backups for example compare now to previously.

Well, we don't currently back up the cluster data, as it is being copied from our main database already.  Maybe that is a mistake, I don't know.  But I can't come up with a reason to back up data that is just a copy of another database server.  Also, I have written a PHP class that does parallel writes to multiple servers using transactions.  Everything we write to the cluster also gets written to an "oh shit" MySQL server that uses InnoDB.  So, in the event we have a total cluster failure, our F5 BIG-IP load balancers will send MySQL traffic to the InnoDB server.
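
The class itself is conceptually simple. Here is a stripped-down sketch of the idea (using mysqli; hostnames and credentials are made up, and this is best-effort rather than a true two-phase commit): open a connection to each destination, run the write inside a transaction on every server, and only commit if all of them accepted it.

    <?php
    // Minimal sketch of writing the same statement to several servers and
    // committing only when all of them succeed.
    function parallel_write(array $hosts, $sql) {
        $connections = array();
        foreach ($hosts as $host) {
            $db = new mysqli($host, 'webuser', 'secret', 'content');
            if ($db->connect_errno) {
                return false;            // cannot reach a server: give up early
            }
            $db->autocommit(false);      // start an explicit transaction
            $connections[] = $db;
        }

        $ok = true;
        foreach ($connections as $db) {
            if (!$db->query($sql)) {
                $ok = false;
                break;
            }
        }

        // Commit everywhere only if every server took the write.
        foreach ($connections as $db) {
            $ok ? $db->commit() : $db->rollback();
            $db->close();
        }
        return $ok;
    }

    // e.g. parallel_write(array('cluster-sql1', 'innodb-standby'), $insert_sql);
    ?>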

You didn’t mention version? 5.1 (not GA) is significant improvement in memory utilization due to true varchar support, saving a lot of memory, but as I said not yet production software.

Yeah, I am drooling over 5.1.  But we are using current Gentoo stable, 5.0.38 I believe.  5.1 looks superior in many, many ways.  I can't wait to upgrade.