Velocity 2010 In Review

I just got back from Velocity for the third straight year. I have been to all three of them, which is kind of a neat little club to be in. The first one only had maybe 300 people. This year there were over 1,000 attendees. Registration was shut down because of the fire code limits for the rooms we were using. Most sessions were standing room only. It was awesome.

The people that talk at Velocity are really smart. I am always humbled by the likes of John Allspaw. He and I see eye to eye on a lot, but he is so much better at explaining to people and showing them how to make the ideas work. I wish I had his charisma when at the podium. I was lucky enough to write a chapter in a book for John this year. He and Velocity co-chairperson Jesse Robbins organized and authored a book titled Web Operations that debuted at the conference. I basically just told and expanded on my Yahoo story. John loves that story for some reason. I was happy to be a part of it. So many smart people in that book.

The IE9 technology preview dropped while we were there. There is HTML5, CSS3 and more in there. One area where Microsoft is actually ahead of the curve is a new DOM-level measurement feature. Basically they expose statistics via the DOM about the time it takes to do different things in the page. The other browser vendors in attendance (Google and Mozilla) vowed to support the same data. Another big advancement of IE9 is the heavy use of the GPU for rendering the pages. They have a real advantage here. They are the only browser vendor that is now locked to one operating system. IE9 will require Vista or higher. They can really max out the system for faster rendering.

As usual, some of the best content was in the hallways and the bar. We hung out with Theo Schlossnagle from OmniTI and talked about Reconnoiter. It is a kind of Cacti/Ganglia/Nagios all in one. I got to see the Six Apart guys again this year. That is becoming an annual thing. I shared our new Gearman-assisted proxy with them. They do some similar stuff for TypePad. More on that proxy later this year. I met a guy from CloudTest. It sounds like a really good use of on-demand cloud resources. I am gonna talk with them about some possible testing.

Membase also dropped while we were there. Most of the persistent key/value stores I have used have disappointed me or just been way too complex for our needs. We don't want a memcached replacement. It does its job damn well. I just need a place to store ad hoc data for various applications. Membase is promising because the guys that wrote it are core memcached contributors. There is a company behind it, so it is not as inviting as Drizzle. But the code is on GitHub, so it is more open than, say, MySQL. Time will tell.

If you have not been to Velocity, I encourage you to go next year. It is right for all types of people in the web business. Developers can learn about performance in new ways that will change the way they write code. Operations can learn techniques to make their work day much less painful. Everyone will learn how to help their business achieve its goals.

Replication is much better than cold backups

So, I wrote about the beginning of our wild database issues. Since then, I have been fighting a cold, coaching little league football and trying to help out in getting our backup solutions working in top shape.  That does not leave much time for blogging.

Never again will we have ONLY a cold backup of anything.  We were moving nightly full database dumps and hourly backups of critical tables over to that box all day long.  Well, when the filesystem fails on both the primary database server and your cold backup server, you question everything.  A day after my marathon drive to fix the backup server and get it up and running, the backup MySQL server died again with RAID errors.  I guess that was the problem all along.  In the end, we had to have a whole new RAID subsystem in our backup database server.  So, my coworker headed over to the data center to pull an all-nighter to get the original, main database server up and running.  The filesystem was completely shot.  ReiserFS failed us miserably.  It is no longer to be used at dealnews.

Well, today at 6:12PM, the main database server stops responding again.  ARGH!!  Input/Output errors.  That means RAID, based on last week's experience.  We reboot it.  It reports memory or battery errors on the RAID card.  So, I call Dell.  Our warranty on these servers includes 4 hour, onsite service.  They are important.  While on the phone with Dell, I run the Dell diagnostic tool on the box.  During the diagnostic test, the box shuts down.  Luckily, the Dell service tech had heard enough.  He orders a whole new RAID subsystem for this one as well.

There is one cool thing about the PERC4 (aka LSI MegaRAID) RAID cards in these boxes.  They write the RAID configuration to the drives as well as on the card.  So, when a new blank RAID card is installed, it finds the RAID config on the drives and boots the box up.  Neato.  I am sure all the latest cards do it.  It was just nice to see it work.

So, the box came up, but this time we had InnoDB corruption.  XFS did a fine job in keeping the filesystem intact.  So, we had to go from backups.  But, this time we had a live replicated database that we could just dump and restore.  We should have had it all along, but in the past (i.e. before widespread InnoDB) we were gun-shy about replication.  We had large MyISAM tables that would constantly get corrupted on the master or slave and would halt replication on a weekly basis.  It was just not worth the hassle.  But, we have used it for over a year now in our front end database servers with an all InnoDB data set.  As of now, only two tables in our main database are not InnoDB.  And I am trying to drop the need for a Full-Text index on those right now.
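A live replica is only useful as a fallback if replication is actually running, so it helps to check it regularly. The following is a minimal sketch, not the setup used here: it assumes the row returned by MySQL's SHOW SLAVE STATUS has already been fetched into a dict by whatever client library you use. The function name and the 60 second lag threshold are hypothetical.

```python
def replication_problems(status, max_lag_seconds=60):
    """Return a list of problems found in a SHOW SLAVE STATUS row.

    `status` is assumed to be a dict keyed by the column names MySQL
    reports. An empty list means the replica looks healthy.
    """
    problems = []
    # Both replication threads must be running for the replica to keep up.
    if status.get("Slave_IO_Running") != "Yes":
        problems.append("I/O thread is not running")
    if status.get("Slave_SQL_Running") != "Yes":
        problems.append("SQL thread is not running")
    # Seconds_Behind_Master is NULL (None) when lag cannot be determined.
    lag = status.get("Seconds_Behind_Master")
    if lag is None:
        problems.append("lag is unknown")
    elif int(lag) > max_lag_seconds:  # threshold is an arbitrary assumption
        problems.append(f"replica is {int(lag)}s behind the master")
    if status.get("Last_Error"):
        problems.append(f"last error: {status['Last_Error']}")
    return problems
```

A cron job or monitoring check could call this and alert whenever the returned list is not empty, which would catch the kind of halted replication described above before the replica is needed.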

So, here is to hoping our database problems are behind us.  We have replaced almost everything in one server except the chassis.  The other has had all of its internal parts replaced except the motherboard.  Kudos to Dell's service.  The tech was done with the repair in under 4 hours.  Glad to have that service.  I recommend it to anyone that needs it.

Velocity Conference Roundup

As I said before, I was invited to be on a panel at Velocity Conference.  I was delighted to go.  I had never been to San Francisco.  I have been to Portland and Santa Clara several times.  The panel was great.  It was the Brian and photo sharing sites show.  Seriously, it was me (dealnews.com), John Allspaw of Flickr, Don MacAskill of SmugMug and Farhan Mashraqi of Fotolog.  Oh, there was also Shayan Zadeh of Zoosk, a social dating network, and Michael Halligan, a consultant from BitPusher.  We all had similar ideas.  I told my Yahoo story.  I told everyone that they should denormalize (or optimize as Farhan preferred) their data to improve performance.  Others agreed.  I have written about my methods for denormalizing normalized data before.  (See pushed cache)  Fun was had by all.
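To illustrate the general idea of denormalizing for read performance, here is a small sketch. It is not the pushed cache method referenced above. The tables, field names and function names are all hypothetical, and plain dicts stand in for database tables and a cache.

```python
# Normalized source data: each fact is stored exactly once.
stores = {1: {"name": "Example Store"}}
deals = {100: {"title": "USB drive", "price": 9.99, "store_id": 1}}

# Denormalized read store: one ready-to-render record per deal.
deal_pages = {}


def publish_deal(deal_id):
    """Do the join once, at write time, and store the flattened result."""
    deal = deals[deal_id]
    store = stores[deal["store_id"]]
    deal_pages[deal_id] = {
        "title": deal["title"],
        "price": deal["price"],
        "store_name": store["name"],  # copied in, so reads need no join
    }


def read_deal(deal_id):
    """Serving a page is a single key lookup against the flattened data."""
    return deal_pages[deal_id]


publish_deal(100)
```

The trade-off is that the copied store name goes stale if the store is renamed, so anything that changes a store has to republish the deals that reference it. Reads get cheaper and writes get more expensive, which is usually the right trade for a read-heavy site.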

I mentioned John Allspaw above.  He gave a talk on his own as well.  It was good.  The slides are on SlideShare.  He and I see eye to eye on a lot of things.  One thing he says in there that may shock a lot of people is to test using production.  I agree fully.  We could never have been sure our infrastructure was ready last year without testing the production servers.

I also learned about Varnish at the conference. It is a super fast reverse proxy.  It uses the virtual memory systems of recent kernels to store its cache.  The OS worries about moving things from memory to disk based on usage.  The claim is that the OSes are better at this than any programmer could do (without copying them of course).  It is fast.  The developers are proud.  And by proud I mean cocky.  I have been playing with it.  As you know, I have my own little caching proxy solution.  Varnish is much faster, as I expected.  However, storing cache in memcached is very attractive to me.  Varnish can't do that.  Doing so would likely slow Varnish down a great deal.  MemProxy does do that.  Also, because MemProxy is written in PHP and my application layer is PHP, I can do things at the proxy layer to inspect the request and take action.  Works well for my use.  But, if you are using Squid or mod_cache or something, you may want to give Varnish a look.
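The two ideas in that comparison, keeping the cache in a memcached-style store and inspecting the request at the proxy layer, can be sketched roughly as follows. This is not MemProxy's code. It is written in Python rather than PHP, the DictCache class is an in-memory stand-in for a real memcached client, and the request fields and bypass rule are assumptions made for the example.

```python
import hashlib


class DictCache:
    """In-memory stand-in for a memcached client (get and set only)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


def cache_key(request):
    """Build a short, fixed-length key from the host and path."""
    raw = request["host"] + request["path"]
    return "page:" + hashlib.md5(raw.encode("utf-8")).hexdigest()


def handle_request(request, cache, fetch_from_backend):
    """Return (body, status), where status is HIT, MISS or BYPASS."""
    # Inspect the request: only anonymous GETs are safe to share (assumed rule).
    cacheable = request["method"] == "GET" and not request.get("logged_in", False)
    if not cacheable:
        return fetch_from_backend(request), "BYPASS"

    key = cache_key(request)
    body = cache.get(key)
    if body is not None:
        return body, "HIT"

    # Cache miss: ask the application, then store the result for next time.
    body = fetch_from_backend(request)
    cache.set(key, body)
    return body, "MISS"
```

A real deployment would also set an expiration time on each entry and would need some way to invalidate pages when the underlying data changes. Both are left out to keep the sketch short.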

There was a good bit of information about the client side of performance.  There were folks from Microsoft there talking about IE8.  It looks like IE8 will catch up with the other browsers in a lot of ways.  Yahoo talked about image optimization.  Good stuff in there.  I use Fireworks and it does a pretty good job of making small images.  I am looking more into combining images and making image maps that use CSS.  We use a CDN, but fewer connections is better for users.
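Combining images means stacking several small images into one sheet and using CSS background offsets to show the right piece, so the browser makes one request instead of many. Here is a minimal sketch of generating those CSS rules, assuming the images are stacked vertically in the order given. The function name, the class prefix and the sheet file name are all hypothetical, and building the sheet image itself is not shown.

```python
def sprite_css(images, sheet_url="sprites.png"):
    """Build one CSS rule per image in a vertically stacked sprite sheet.

    `images` is a list of (name, width, height) tuples in the order the
    images appear in the sheet, top to bottom.
    """
    rules = []
    y = 0  # distance in pixels from the top of the sheet to the current image
    for name, width, height in images:
        # Shift the sheet up by y pixels so this image lines up with the box.
        offset = f"-{y}px" if y else "0"
        rules.append(
            f".icon-{name} {{ background: url({sheet_url}) 0 {offset} no-repeat; "
            f"width: {width}px; height: {height}px; }}"
        )
        y += height
    return rules
```

Each element then gets a fixed size and a class such as icon-cart, and every icon on the page is served by the same single image request.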

There was also a lot of great debate.  SANs rock!  SANs suck!  Rails Scales!  Rails Sucks!  The Cloud is awesome!  The Cloud is a lie!  (lots of cloud)

I had dinner both nights with guys from Six Apart.  Good conversations were had.  I don't know if I am a big vegan fan though.  I mean, the food was good, but it all kinda tasted the same.  Perhaps I ordered poorly.  At dinner on Tuesday I met a guy going to work for Twitter soon.  He is an engineer that hopefully will be another step toward getting them back to 100% again.  Let's keep our fingers crossed.

They did announce that the conference would be held again next year.  I am definitely going back.  Probably two of us from dealnews will go.  OSCON is fun.  The MySQL conference is too.  But, more and more, capacity planning and scaling are what I do.  And this conference is all about those topics.