Talking about Gearman at Etsy Labs

I find myself flying to New York on Monday for some dealnews related business. Anytime I travel I try and find something fun to do at night. (Watching a movie by myself in Provo, Utah was kinda not that fun.) So, this week I asked on Twitter if anything was happening while I would be in town. Anything would do. A meetup of PHP/MySQL users or some design/css/js related stuff for example. Pretty much anything interesting. Well, later that day I received an IM from the brilliant John Allspaw, Senior VP of Technical Operations at Etsy. He wanted me to swing by the Etsy offices and say hi. Turns out it is only a block away from where I would be. Awesome! He also mentioned that he would like to have me come and speak at their offices some time. That would be neat too. I will have to plan better next time I am traveling up there.

Fast forward another day. I get an email from Kellan Elliott-McCrea, CTO of Etsy wanting to know if I would come to the Etsy offices and talk about Gearman. At first I thought "That is short notice, man. I don't know that I can pull that off." Then I remembered the last time I was asked to speak at an event on short notice based off a recommendation from John Allspaw.

It was in 2008 for some new conference called Velocity. That only turned out to be the best conference I have ever attended. I have been to Velocity every year since and this year took our whole team. In addition, I spoke again in 2009 at Velocity, wrote a chapter for John's book Web Operations that was released at Velocity in 2010 and was invited to take part in the Velocity Summit this year (2011) which helps kick off the planning for the actual conference. The moral of that story for me is: when John Allspaw wants you to take part in something, you do it.

In reality, it was not that tough a decision. Even without John's involvement, I love the chance to talk about geeky stuff. The Etsy and dealnews engineering teams are like two twins separated at birth. Every time we compare notes, we are doing the same stuff. For example, we have been trading Open Source code lately. They are using my GearmanManager and we just started using their statistics collection daemon, statsd. So, speaking to their people about what we do seem like a great opportunity to share and get input.

The event is open to the public. So, if you use Gearman, want to use Gearman, or just want to hear how we use Gearman at dealnews, come here me ramble on about how awesome it is Tuesday night in Dumbo at Etsy Labs. You can RSVP on the event page.

Best Practices for Gearman by Brian Moon
Etsy Labs
55 Washington St. Ste 712
NY 11222

Tuesday, August 09, 2011 from 7:00 PM - 10:00 PM (ET)

Velocity 2010 In Review

I just got back from Velocity for the third straight year. I have been to all three of them which is kind of a neat little club to be in. The first one only had maybe 300 people. This year there were over 1,000 attendees. Registration was shut down by the fire code for the rooms we were using. Most sessions had standing room only. It was awesome.

The people that talk at Velocity are really smart. I am always humbled by the likes of John Allspaw. He and I see eye to eye on a lot, but he is so much better at explaining to people and showing them how to make the ideas work. I wish I had his charisma when at the podium. I was lucky enough to write a chapter in a book for John this year. He and Velocity co-chairperson Jesse Robbins organized and authored a book titled Web Operations that debuted at the conference. I basically just told and expanded on my Yahoo story. John loves that story for some reason. I was happy to be a part of it. So many smart people in that book.

The IE9 technology preview dropped while we were there. HTML 5, CSS 3 and more in there. One feature where Microsoft is actually ahead of the curve is in a new DOM level measurement feature. Basically they expose statistics via the DOM about the time it takes to do different things in the page. The other browser vendors in attendance (Google and Mozilla) vowed to support the same data. Another big advancement of IE9 is the heavy use of the GPU for rendering the pages. They have a real advantage here. They are the only browser vendor that is now locked to one operating system. IE9 will require Vista or higher. They can really max out the system for faster rendering.

As usual some of the best content was in the hall ways and bar. We hung out with Theo Schlossnagle from OmniTI and talked about Reconnoiter. It is a kind of Cacti/Ganglia/Nagios all in one. I got to see the Six Apart guys again this year. That is becoming an annual thing. I shared our new Gearman assisted proxy with them. They do some similar stuff for TypePad. More on that proxy later this year. I met a guy from CloudTest. It sounds like a really good use of on-demand cloud resources. I am gonna talk with them about some possible testing.

Membase also dropped while we were there. Most of the persistent key/value stores I have used have disappointed me or just been way too complex for our needs. We don't want a memcached replacement. It does its job damn well. I just need a place to store adhoc data for various applications. Membase is promising because the guys that wrote it are core memcached contributors. There is a company behind it, so it is not as inviting as Drizzle. But, the code is on GitHub so it is more open than say MySQL. Time will tell.

If you have not been to Velocity I encourage you to go next year. It is right for all types of people in the web business. Developers can learn about performance in new ways that will change they way they write code. Operations can learn techniques to make their work day much less painful. Everyone will learn how to empower their business to achieve the goals of the business.

DevOps at dealnews.com

I was telling someone how we roll changes to production at dealnews and they seemed really amazed by it. I have never really thought it was that impressive. It just made sense. It has kind of happened organically here over the years. Anyhow, I thought I would share.

Version Control

So, to start with, everything is in SVN. PHP code, Apache configs, DNS and even the scripts we use to deploy code. That is huge. We even have a misc directory in SVN where we put any useful scripts we use on our laptops for managing our code base. Everyone can share that way. Everyone can see what changed when. We can roll things back, branch if we need to, etc. I don't know how anyone lives with out. We did way back when. It was bad. People were stepping on each other. It was a mess. We quickly decided it did not work.

For our PHP code, we have trunk and a production branch. There are also a couple of developers (me) that like to have their own branch because they break things for weeks at a time. But, everything goes into trunk from my branch before going into production. We have a PHP script that can merge from a developer branch into trunk with conflict resolution assistance built in. It is also capable of merging changes from trunk back into a branch. Once it is in trunk we use our staging environment to put it into production.

Staging/Testing

Everything has a staging point. For our PHP code, it is a set of test staging servers in our home office that have a checkout of the production branch. To roll code, the developer working on the project logs in via ssh to a staging server as a restricted user and uses a tool we created that is similar to the Python based svnmerge.py. Ours is written in PHP and tailored for our directory structure and roll out procedures. It also runs php -l on all .php and .html files as a last check for any errors. Once the merge is clean, the developer(s) use the staging servers just as they would our public web site. The database on the staging server is updated nightly from production. It is as close to a production view of our site as you can get without being on production. Assuming the application performs as expected, the developer uses the merge tool to commit the changes to the production branch. They then use the production staging servers to deploy.

Rolling to Production

For deploying code and hands on configuration changes into our production systems, we have a staging server in our primary data center. The developer (that is key IMO) logs in to the production staging servers, as a restricted user, and uses our Makefile to update the checkout and rsync the changes to the servers. Each different configuration environment has an accompanying nodes file that lists the servers that are to receive code from the checkout. This ensures that code is rolled to servers in the correct order. If an application server gets new markup before the supporting CSS or images are loaded onto the CDN source servers, you can get an ugly page. The Makefile is also capable of copying files to a single node. We will often do this for big changes. We can remove a node from service, check code out to it, and via VPN access that server directly to review how the changes worked.

For some services (cron, syslog, ssh, snmp and ntp) we use Puppet to manage configuration and to ensure the packages are installed. Puppet and Gentoo get along great. If someone mistakenly uninstalls cron, Puppet will put it back for us. (I don't know how that could happen, but ya never know). We hope to deploy more and more Puppet as we get comfortable with it.

Keeping Everyone in the Loop

Having everyone know what is going on is important. To do that, we start with Trac for ticketing. Secondly, we use OpenFire XMPP server throughout the company. The devops team has a channel that everyone is in all day. When someone rolls code to production, the scripts mentioned above that sync code out to the servers sends a message via an XMPP bot that we wrote using Ruby (Ruby has the best multi-user chat libraries for XMPP). It interfaces with Trac via HTTP and tells everyone what changesets were just rolled and who committed them. So, in 5 minutes if something breaks, we can go back and look at what just rolled.

In addition to bots telling us things, there is a cultural requirement. Often before a big roll out, we will discuss it in chat. That is the part than can not be scripted or programmed. You have to get your developers and operations talking to each other about things.

Final Thoughts

There are some subtle concepts in this post that may not be clear. One is that the code that is written on a development server is the exact same code that is used on a production server. It is not massaged in any way. Things like database server names, passwords, etc. are all kept in configuration files on each node. They are tailored for the data center that server lives in. Another I want to point out again is that the person that wrote the code is responsible all the way through to production. While at first this may make some developers nervous, it eventually gives them a sense of ownership. Of course, we don't hire someone off the street and give them that access.  But it is expected that all developers will have that responsibility eventually.

Developers should write code for production

Having development and staging environments that reflect production is a key component of DevOps.  An example for us is dealing with our CDN.

I can imagine in some dysfunctional, fragmented company, a developer works on a web application and sticks all the images in the local directory with his scripts. Then some operations/deployment guy has to first move the images where they need to be and then change all the code that references those images.  If he is lucky, he has a script that does it for him. This is a needless exercise. If you have a development environment that looks and acts like production, this is all handled for you.

Here is an example of how it works for us. We use a CDN for all images, javascript, CSS and more. Those files come from a set of domains: s1.dlnws.com - s5.dlnws.com. So, our dev environments have similar domains. somedev.s5.dlnws.com points to a virtual server. We then use mod_substitute in Apache to rewrite those URLs on the dev machine. Each developer and staging instances will have an Apache configuration such as:
Substitute "s|http://s1.dlnws.com|http://somedev.s1.dev.dlnws.com|in"
Substitute "s|http://s2.dlnws.com|http://somedev.s2.dev.dlnws.com|in"
Substitute "s|http://s3.dlnws.com|http://somedev.s3.dev.dlnws.com|in"
Substitute "s|http://s4.dlnws.com|http://somedev.s4.dev.dlnws.com|in"
Substitute "s|http://s5.dlnws.com|http://somedev.s5.dev.dlnws.com|in"
So our developers put the production URLs for images into our code. When they test on the development environment, they get URLs that point to their instance, not production. No fussing with images after the fact.

In addition to this, we use mod_proxy to emulate our production load balancers. Special request routing happens in production. We need to see that when developing so we don't deploy code that does not work in that circumstance. If the load balancers send all requests coming in to /somedir to a different set of servers, we have mod_proxy do the same thing to a different VirutalHost in our Apache configuration. It is not always perfect, but it gets us as close to production as we can get without buying very expensive load balancers for our development environments.

Of course, we did not come to this overnight. It took us years to get to this point. Maybe it won't take you that long. Keep in mind when creating your development environments to make them work like production. It is neat to be able to write code on your laptop. I did it for years. But, at some point before you send out code for production, the developer should run it on a production like environment. Then deploying should be much easier.

DevOps is the only way it has ever made sense

DevOps is the label being given to the way we have always done things. This is not the first time this has happened. As it says on my About Me page,

Brian Moon has been working with the LAMP platform since before it was called LAMP.

At some point, not sure when, someone came up with LAMP. I started working on what is now considered LAMP in 1996. I have seen lots of acronyms come and some go. We started using "Remote Scripting" after hearing Terry Chay talk about it at OSCON. The next OSCON, AJAX was all the rage. Technically, we never used AJAX. The X stands for XML. We didn't use XML. What made sense for us was to send back javascript arrays and objects that the javascript interpreter could deal with easily. We wrote a PHP function called to_javascript that converted a PHP array into a javascript array. Sound familiar? Yeah, two years later, JSON was all the rage.

We also have seen the same thing with how we run our development process.  We always considered our team to be an agile development team. That is agile with little a. Nowadays, "Agile" with the big A is usually all about how you develop software and not about actually delivering the software. So, I am always perplexed when people ask me if we use "Agile" development. Are they talking little a or big A?

Today I came across the term DevOps on twitter (there is no Wikipedia page yet). We have always had an integrated development and operations team. I could be writing code in the morning and configuring servers in the afternoon. Developers all have some level of responsibility over managing their development environment. They updated their Apache configurations from SVN and make changes as needed for their applications. The development environments are simulated as close as possible to production. Developers roll code to the production servers. It is their responsibility to make sure it works on production. They also roll it when it is ready rather than letting it sit around for days. This means if there is an unforeseen issue, the code is fresh on their minds and the problem can quickly be solved. We have done things this way since 1998. We are not the only ones. The great guys at Flickr gave a great talk last year at Velocity about their DevOps environment. People were amazed at how their teams worked together.

One of the huge benefits of being a DevOps team is that we can utilize the full stack in our application. If we can use the load balancers, Apache or our proxy servers to do something that offloads our application servers, we plan for that in the development cycle. It is a forethought instead of an afterthought. I see lots of PHP developers that do everything in code. Their web servers and hardware are just there to run their code. Don't waste those resources. They can do lots of things for you.

One cool thing about this is that I now have a label to use when people ask us about our team. I can now say we are an agile DevOps team. They can then go look that up and see what it means. Maybe it will lead to less explanation of how we work too. And if we are lucky, maybe we can find people to hire that have been in a similar environment.

So, I welcome all the new people into the "DevOps movement". Adopt the ideas and avoid any rules that books, blogs, "experts" may want to come up with. The first time I see someone list themselves as a DevOps management specialist, I will die a little on the inside. It is not a set of rules, it is a way of thinking, even a way of life. If the process prevents you from doing something, you are using it wrong, IMO.