Replication is much better than cold backups

So, I wrote about the begining of our wild database issues. Since then, I have been fighting a cold, coaching little league football and trying to help out in getting our backup solutions working in top shape.  That does not leave much time for blogging.

Never again will we have ONLY a cold backup of anything.  We were moving nightly full database dumps and hourly backups of critical tables over to that box all day long.  Well, when the filesystem fails on both the primary database server and your cold backup server, you question everything.  A day after my marathon drive to fix the backup server and get it up and running, the backup mysql server died again with RAID errors.  I guess that was the problem all along.  In the end, we had to have a whole new RAID subsystem in our backup database server.  So, my coworker headed over to the data center to pull the all nighter to get the original, main database server up and running.  The filesystem was completely shot.  ReiserFS failed us miserably.  It is no longer to be used at dealnews.

Well, today at 6:12PM, the main database server stops responding again.  ARGH!!  Input/Ouput errors.  That means RAID based on last weeks experience.  We reboot it.  It reports memory or battery errors on the RAID card.  So, I call Dell.  Our warranty on these servers includes 4 hour, onsite service.  They are important.  While on the phone with Dell, I run the Dell diagnostic tool on the box.  During the diagnostic test, the box shuts down.  Luckily, the Dell service tech had heard enough.  He orders a whole new RAID subsystem for this one as well.

There is one cool thing about the PERC4 (aka, LSI Megaraid) RAID cards in these boxes.  They write the RAID configuration to the drives as well as on the card.  So, when a new blank RAID card is installed, it finds the RAID config on the drives and boots the box up.  Neato.  I am sure all the latest cards do it.  It was just nice to see it work.

So, box came up, but this time we had Innodb corruption.  XFS did a fine job in keeping the filesystem in tact.  So, we had to go from backups.  But, this time we had a live replicated database that we could just dump and restore.  We should have had it all along, but in the past (i.e. before widespread Innodb) we were gun shy about replication.  We had large MyISAM tables that would constantly get corrupted on the master or slave and would halt replication on a weekly basis.  It was just not worth the hassle.  But, we have used it for over a year now in our front end database servers with an all Innodb data set.  As of now, only two tables in our main database are not Innodb.  And I am trying to drop the need for a Full-Text index on those right now.

So, here is to hoping our database problems are behind us.  We have replaced almost everything in one except the chassis.  The other has had all internal parts but a motherboard.  Kudos to Dell's service.  The tech was done with the repair in under 4 hours.  Glad to have that service.  I recommend it to anyone that needs it.

Database nightmare

Its 12:30AM (00:30 for you Euros).  I am watching The Daily Show on Tivo.  All is well.  Then the phone beeps.

MySQL Main is critical

SSH?  no.

Digi console?  no.

About a week ago, we had a mysqldump file that was corrupt.  We cleaned it up.  My worst fears came to my mind.  We tried power cycling it.  It did not come back.

While my coworker was dealing with the facilily people, I worked on the backup server.  Had to ensure the last full backup was in place and apply the incremental data.  Suddenly, my SSH connection dies.  OMG.  THAT DUMB A** GUY POWER CYCLED THE WRONG BOX!!! --- FS corrupted.  Damn you ReiserFS!

By now, it's 4AM.  Tech took an hour to get to the rack.  It is 20 feet from his cubicle.  I get in the car.  I am two hours away from the facility.  No sleep.  It is still dark.  I play loud music.  I talk to myself.  I curse the guy that power cycled the wrong box. The sun comes up and it is easier to drive.

I sit here, waiting on the OS to finish installing so I can restore the backup and incremental data again.  Hours of content lost.  The content team is hand writing HTML that other developers are rsyncing around to the servers.

The good news?  All the work done in 2007 to separate our front end and backend worked.  The front end works fine (99%).  Just no new content.  Well, except for the hand done HTML.

Note to self: Get that main database replication working again.  ASAP.

Where Drizzle fits in for me

So, most of you have heard about Drizzle by now.  For those that have not, you can check out many, many blog posts or the Launchpad page.

The thread on Slashdot about Drizzle was quite negative.  Most misunderstand what Drizzle is about.  SQLite is not a good solution when you have 100 web servers.  Let me describe how it I would use it and maybe that will help some understand it.

When it comes to MySQL use, dealnews has two very different use cases.  The first is an enterprise storage system that involves content creation, reporting and data warehousing.  For that layer of our business, we are using more and more advanced features as they become available.  We use triggers and stored procedures.  We use complex data types for specific use cases.  All those features are a big gain.

The other way that we use MySQL is for serving content to our readers.  I have written about this before.  For this purpose, we avoid joins, don't use any advanced features.  We do use replication, indexes and intelligent queries.  We don't (as one slashdot reader claimed) do all of our processing in the code.  That would be stupid.  If you do that you are ignorant.  I will stop talking about that before this becomes a rant.  I do believe in letting MySQL do my work for me.

This is where Drizzle fits in.  To serve content, I don't need stored procedures, triggers, views or any of that other stuff.  The whole database that the front end web servers use is basically a view.  It is a denormalized, prepared version of the real data.  I store objects. But, I have to be able to sort and filter the data in a way that SQL allows me to do.  CouchDB sounds interesting.  Maybe one day it will be there.  It is sill in the optimization phase.

Now, some say that this is just MySQL 3.x all over again.  Well, you clearly have not been listening to the really smart people that are working on Drizzle.  They are doing more than just removing the 4.1 and 5.x features from MySQL.  They are removing things that don't make sense for this use case.  They are adding things that do make sense.  They are replacing parts of the code base where there is a better library or way of doing it.  At this point, they have no feature requirements to meet.  They have no deadlines.  They are making what they think the high volume web world and/or cloud computing needs.  They are making it plugable:  think Apache modules or PHP extensions.  So, if you need feature XYZ that was yanked out, you can add it back in (hopefully) via the internal API.  There is a lot more going on here than just removing "features".

So, I am cheering on the folks working on Drizzle.  I have joined their community and will provide what feedback I can from userland.  I am no C++ coder.  I can read it.  I can debug it.  But, writing it or doing heavy lifting is not in my skill set.  Hopefully I can contribute

Caching and TTL behavior

So, I am working on MemProxy some.  Mainly, I am trying to implement more of the Cache-Control header's many options.  The one that has me a bit perplexed s-maxage.  Particularly when combined with max-age.

s-maxage is the maximum time in seconds an item should remain in a shared cache.  So, if s-maxage is set by the application server, my proxy should keep it for that amount of time at the most.  Up until now, I have just been looking at max-age.  But, s-maxage is the proper one for a proxy to use if it is present.  I do not send the s-maxage through because this is a reverse proxy and, IMO, that is proper behavior for an application accelerating proxy.  However, I do send forward the max-age value that is set by the application servers.  If no max-age is set, I send a default as defined in the script.  Also, if no-cache or no-store is set, I send those and a max-age of 0.

My problem arises when max-age is less than s-maxage.  Up until now, I have sent a max-age back to the client that represents the time left for the cached item in my proxy's cache.  So, if the app server sent back max-age=300 and a request comes in and the cache is found and the cache was created 100 seconds ago, I send max-age-200 back to the client.  But, I was only using max-age before.  Now, in cases where s-maxage is longer than max-age, I would come up with negative numbers.  That is not cool.  The easiest solution would be to always send the original max-age back to the client.  But, that seems kind of lame.

So, my question is, if you are using an application (HTTP or otherwise) accelerator, what would you expect?  If you application set a max-age of 300 would you always expect the end client to receive a max-age of 300?  Or should it count down over time?  The only experience I have is a CDN.  If you watch CDN traffic, the max-age gets smaller and smaller over time until it hits 0.  I have not tried sending an s-maxage to my CDN.  I don't know what they would do with that.  Maybe that is a good test.

UPDATE: Writing this gave me an idea.  If the item will be in the proxy cache longer than the max-age ttl, send the full max-age ttl.  Otherwise, send the time left in the proxy cache.  Thoughts on that?

(thanks for being my teddy bear blogosphere)

Velocity Conference Roundup

As I said before, I was invited to be on a panel at Velocity Conference.  I was delighted to go.  I had never been to San Francisco.  I have been to Portland and Santa Clara several times.  The panel was great.  It was the Brian and photo sharing sites show.  Seriously, it was me (dealnews.com), John Allspaw of Flickr, Don MacAskill of SmugMug and Farhan Mashraqi of Fotolog.  Oh, there was also Shayan Zadeh of Zoosk, a social dating network and Michael Halligan, a consultant from BitPusher.  We all had similar ideas.  I told my Yahoo story.  I told everyone that they should denormalize (or optimize as Farhan prefered) their data to improve performance.  Others agreed.  I have written about my methods for denormalizing normalized data before.  (See pushed cache)  Fun was had by all.

I mentioned John Allspaw above.  He gave a talk on his own as well.  It was good.  The slides are on SlideShare.  He and I see eye to eye on a lot of things.  One thing he says in there that may shock a lot of people is to test using produciton.  I agree fully.  We could have never been sure our infastructure was ready last year without testing the production servers.

I also learned about Varnish at the conference. It is a super fast reverse proxy.  It uses the virtual memory systems of recent kernels to store its cache.  The OS worries about moving things from memory to disk based on usage.  The claim is that the OSes are better at this than any programmer could do (without copying them of course).  It is fast.  The developers are proud.  And by proud I mean cocky.  I have been playing with it.  As you know, I have my own little caching proxy solution.  Varnish is much faster, as I expected.  However, storing cache in memcached is very attractive to me.  Varnish can't do that.  It would likely slow it down a great deal.  MemProxy does do that.  Also, because MemProxy is written in PHP and my application layer is PHP, I can do things at the proxy layer to inspect the request and take action.  Works well for my use.  But, if you are using squid or mod_cache or something, you may want to give Varnish a look.

There was a good bit of information about the client side of performance.  There were folks from Microsoft there talking about IE8.  It looks like IE8 will catch up with the other browsers in a lot of ways.  Yahoo talked about image optimization.  Good stuff in there.  I use Fireworks and it does a pretty good job of making small images.  I am looking more into combining images and making image maps that use CSS.  We use a CDN, but fewer connections is better for users.

There was also a lot of great debate.  SANs rock!  SANs suck!  Rails Scales!  Rails Sucks!  The Cloud is awesome!  The Cloud is a lie!  (lots of cloud)

I had dinner both nights with guys from Six Apart.  Good conversations were had.  I don't know if I am a big vegan fan though.  I mean, the food was good, but it all kinda tasted the same.  Perhaps I ordered poorly.  At dinner on Tuesday I met a guy going to work for Twitter soon.  He is an engineer that hopefully will be another step toward getting them back to 100% again.  Lets keep our fingers crossed.

They did announce that the conference would be held again next year.  I am definitely going back.  Probably two of us from dealnews will go.  OSCON is fun.  MySQL conference is too.  But, more and more, capacity planning and scaling is what I do.  And this conference is all about those topics.

Did you know I am going to be at Velocity?

Well, neither did I until today. HA!

Velocity is a new O'Reilly conference dedicated to "Optimizing Web Performance and Scalability".  It starts next Monday.  Yesterday I was contacted by Adam Jacobs of HJK Solutions about taking part in a panel discussion about what happens when success comes suddenly to a web site.  I think he thought I was in the bay area.  Little did he know I am in Alabama.  But, amazingly, I was able to work it all out so I can be there.  I wish I had known about this conference ahead of time.  It sounds really awesome.  Performance has always been something I focus on.  I hope to share some and learn at the same time.

So, if you are going to be there, come see our panel.

P.S. Thanks to John Allspaw of Flickr for recommending me to Adam.

An Introduction to MySQL - Birmingham, AL

I am giving a talk titled "An Introduction to MySQL" here in Birmingham, AL on June 21, 2008 at 3PM.

I love living in Alabama.  I was born and raised in Huntsville.  However, Birmingham has always seemed a bit behind in technology compared to what I do for a living.  There is good reason.  The industry here is medical, banking, industrial and utilities.  I don't really want my doctors keeping my medical records in an alpha release of anything.  Same goes for my banking and utilities.  But, as this page shows, the companies here are catching up.  So, I am happy to present MySQL to as many people as I can in this town.  Hopefully I will help some folks that have not been exposed to MySQL or any open source for that matter.

The event is part of our local Linux user group's (BALU) planned events.

MemProxy 0.1

MemProxy 0.1 is out!  It has taken me a while, but I have finally gotten around to releasing the code that I credited with saving us during a Yahoo! mention.  It is a caching proxy "server" that uses memcached for storing the cache.  I put server in quotes because it is really just a PHP script that handles the caching and talking to the application servers.  Apache and other HTTP servers already do a good job talking HTTP to a vast myriad of clients.  I did not see any reason to reinvent the wheel.  Here are some of the features that make it different from anything I could find:

  • Uses memcached for storage

  • Serves cache headers to clients based on TTL of cached data

  • Uses custom headers to assemble multiple pieces of cache into one object

  • Minimal dependencies.  Only PHP and pecl/memcached needed.

  • Small code base.  It is just two files, one when settings are cached.

  • Application agnostic.  If the backend is hosted on an HTTP server this can cache it.


Some other things it does that you might expect:

  • Handles HTTP 1.1 requests to the backend

  • Allows TTLs set by the standard Cache-Control header

  • Appears transparent to the client.

  • Sends proper HTTP error codes relating to proxies/gateways

  • Allows pages to be refreshed or removed from cache

  • Allows a page to be viewed from the application server without caching it

  • more....


You can find the code on Google Code.  The code (or something like it rather) has been in use at dealnews for well over a year.  But, this is a new code base.  It had to be refactored for public consumption.  So, there may be bugs.