Questions for Git users

So, we have been using Subversion for a long time. We are very comfortable with it. We know how it works. Even when things go sideways, we have people here who understand it and can set things straight again. But we do find some things lacking. I had an SVN conflict today where the diff showed that I was replacing nothing with something. That was a conflict. Sigh. Also, it is getting slower and slower as time passes. Committing or updating my code base can take long enough that I start checking email and forget I was committing something. So, we are thinking about a switch. I use GitHub for some of the open source software I have released, and I like GitHub a lot. I find the Git command line tool a little confusing (commit -a seems silly, for example), but I figure I can get over that. I am going to end up wrapping it in our merge and deploy tools anyway. But I have some questions for people who use Git on a daily basis. I have searched around, and for these questions I find either several competing answers or none at all.

How big is your repository?

We have 1,610 directories and 10,215 files in our code library. How does Git handle a repository of that size?

What are you using for a Git server?

From what I can tell, there is no server component to Git. I think I understand that you can set up a shared Git user and grant access with ssh keys? Or there are some third-party projects that act as a Git server. We currently use Apache+SVN, which lets us A) use our LDAP server and B) control access to parts of the repository using Apache configuration. It looks like we would lose LDAP integration for sure, but maybe something like Gitolite would let us do some auth/ACL stuff. Maybe we could dump our LDAP data on a schedule to something it can read.
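One way to handle that scheduled LDAP dump would be to regenerate Gitolite's configuration from it. This is a minimal sketch, assuming the LDAP groups have already been exported into a plain mapping of group name to user IDs; the group names, repo name, and permissions below are hypothetical. It emits Gitolite-style `@group = user ...` definitions and `repo` blocks with `RW+`/`R` rules. It covers repository-level access only, not the per-directory rules we get from Apache today, and each user's ssh public key would still need to sit in Gitolite's keydir under a matching name (e.g. `alice.pub`).

```python
"""Sketch: turn a (hypothetical) LDAP group export into gitolite.conf text."""

from typing import Dict, List, Tuple


def render_gitolite_conf(
    groups: Dict[str, List[str]],
    repos: Dict[str, List[Tuple[str, str]]],
) -> str:
    """Render group definitions followed by per-repo access rules."""
    lines: List[str] = []
    # Group definitions, "@group = user1 user2 ...", sorted for stable output
    for group in sorted(groups):
        members = " ".join(sorted(groups[group]))
        lines.append(f"@{group} = {members}")
    lines.append("")
    # One block per repository: "repo name" followed by indented rules
    for repo in sorted(repos):
        lines.append(f"repo {repo}")
        for perm, who in repos[repo]:
            lines.append(f"    {perm} = {who}")
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical data standing in for the nightly LDAP export
    ldap_groups = {"devs": ["alice", "bob"], "ops": ["carol"]}
    repo_rules = {"codebase": [("RW+", "@devs"), ("R", "@ops")]}
    print(render_gitolite_conf(ldap_groups, repo_rules))
```

Run from cron after the LDAP export, the output would replace the groups section of gitolite.conf, so group membership stays managed in LDAP.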

What is your commit, push, deploy procedure?

This is really aimed at the guys doing continuous deployment of a web application with Git. Our current methodology is that each developer has a branch. They commit to their branch. When a set of commits is ready for release, they merge it into trunk. In a staging environment, the changeset in trunk is merged into the production branch. That branch is then rolled to the servers. We like having the three layers. For large changes, it lets us review everything in one place (trunk). That is, there can be 10 commits in a developer's branch, but when all the changes are merged into trunk (using one SVN merge, of course), you get one nice, neat commit in trunk that can easily be evaluated. It also lets developers collaborate easily by committing changes to trunk that others may need. Other developers can just merge back from trunk to get the changes. From what I have read or heard people say, this seems to fly in the face of how Git works, or at least what Git is good at. Also, I think that with the way Git works, 10 commits in a branch would merge as 10 commits in trunk. So we would lose the unification of a lot of little changes into one change. Some of our developers use their branch to commit work in progress that is not at a finished point, so by the time they are done, we will have 10+ commits for one task. Any input here?
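On the "10 commits become 10 commits" worry: Git does have `git merge --squash`, which stages the combined result of a branch's commits without committing, so you can record them as one commit on the target branch. The sketch below only builds the command sequence (it does not run Git); the branch names and message are hypothetical, and this is one possible workflow rather than how Git shops necessarily do it.

```python
"""Sketch: the Git commands for folding a developer branch into one trunk commit."""

from typing import List


def squash_merge_commands(branch: str, target: str, message: str) -> List[List[str]]:
    """Return argv lists that squash `branch` into a single commit on `target`."""
    return [
        # Switch to the branch that should receive the single combined commit
        ["git", "checkout", target],
        # Stage the net result of every commit on `branch`, without committing
        ["git", "merge", "--squash", branch],
        # Record all of that work as one commit on `target`
        ["git", "commit", "-m", message],
    ]


if __name__ == "__main__":
    for argv in squash_merge_commands("dev-branch", "trunk", "Task 123: merged from dev-branch"):
        # Inside a working copy, each could be run with subprocess.run(argv, check=True)
        print(" ".join(argv))
```

One caveat: a squash merge does not record the branch as merged, so if the same branch is merged again later, Git does not know those commits are already in trunk and the second merge can produce conflicts.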

What are you using for visualization?

We use Trac. We love and hate Trac. It's good enough. I know Redmine supports Git natively, and I know there is a Trac Hack for making Trac support it. Anything else? Any comments on Redmine?

Any tips for importing from Subversion?

I tried importing our Subversion repos into Git using svn2git. It got hung up on some circular branch reference or something. The error message was confusing. So, I guess if we do migrate, we will be starting with little to no history. Perhaps we just import our current production branch and start from there. Any tips?

DevOps at dealnews.com

I was telling someone how we roll changes to production at dealnews and they seemed really amazed by it. I have never really thought it was that impressive. It just made sense. It has kind of happened organically here over the years. Anyhow, I thought I would share.

Version Control

So, to start with, everything is in SVN: PHP code, Apache configs, DNS, and even the scripts we use to deploy code. That is huge. We even have a misc directory in SVN where we put any useful scripts we use on our laptops for managing our code base, so everyone can share them. Everyone can see what changed when. We can roll things back, branch if we need to, etc. I don't know how anyone lives without it. We did, way back when. It was bad. People were stepping on each other. It was a mess. We quickly decided it did not work.

For our PHP code, we have trunk and a production branch. There are also a couple of developers (me included) who like to have their own branch because they break things for weeks at a time. But everything goes into trunk from my branch before going into production. We have a PHP script that can merge from a developer branch into trunk with conflict resolution assistance built in. It can also merge changes from trunk back into a branch. Once the code is in trunk, we use our staging environment to put it into production.

Staging/Testing

Everything has a staging point. For our PHP code, it is a set of test staging servers in our home office that have a checkout of the production branch. To roll code, the developer working on the project logs in via ssh to a staging server as a restricted user and uses a tool we created that is similar to the Python-based svnmerge.py. Ours is written in PHP and tailored to our directory structure and roll-out procedures. As a last check for errors, it also runs php -l on all .php and .html files. Once the merge is clean, the developer(s) use the staging servers just as they would our public web site. The database on the staging servers is updated nightly from production, so it is as close to a production view of our site as you can get without being on production. Assuming the application performs as expected, the developer uses the merge tool to commit the changes to the production branch. They then use the production staging servers to deploy.
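The last-check lint step can be sketched as: walk the checkout, collect every `.php` and `.html` file, and run `php -l` on each, treating a non-zero exit status as a syntax error. Our real tool is PHP and is not shown here; this Python sketch is only an illustration, `find_lint_targets` and `lint_tree` are hypothetical names, and it assumes the `php` CLI is on the PATH whenever `lint_tree` is actually called.

```python
"""Sketch: run `php -l` over every .php and .html file before a merge is committed."""

import os
import subprocess
from typing import List

LINT_EXTENSIONS = (".php", ".html")


def find_lint_targets(root: str) -> List[str]:
    """Return sorted paths of all files under `root` that should be linted."""
    targets = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune Subversion metadata directories so their copies are not linted
        dirnames[:] = [d for d in dirnames if d != ".svn"]
        for name in filenames:
            if name.endswith(LINT_EXTENSIONS):
                targets.append(os.path.join(dirpath, name))
    return sorted(targets)


def lint_tree(root: str) -> List[str]:
    """Run `php -l` on each target and return the files that failed the check."""
    failures = []
    for path in find_lint_targets(root):
        result = subprocess.run(["php", "-l", path], capture_output=True, text=True)
        if result.returncode != 0:
            failures.append(path)
    return failures
```

Calling `lint_tree("/path/to/checkout")` returns the files whose syntax check failed; an empty list means the merge can go ahead.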

Rolling to Production

For deploying code and hands-on configuration changes to our production systems, we have a staging server in our primary data center. The developer (that is key, IMO) logs in to the production staging servers as a restricted user and uses our Makefile to update the checkout and rsync the changes to the servers. Each configuration environment has an accompanying nodes file that lists the servers that are to receive code from the checkout. This ensures that code is rolled to servers in the correct order: if an application server gets new markup before the supporting CSS or images are loaded onto the CDN source servers, you can get an ugly page. The Makefile can also copy files to a single node. We will often do this for big changes: we remove a node from service, check code out to it, and access that server directly over VPN to review how the changes worked.
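The ordered roll-out driven by a nodes file can be sketched as follows. Our real implementation is a Makefile; this Python sketch assumes a hypothetical nodes-file format (one hostname per line, `#` comments, listed in roll order with CDN source servers first) and only builds the `rsync` command lines. The hostnames, paths, and the choice of `-a --delete` are illustrative assumptions.

```python
"""Sketch: ordered roll-out driven by a nodes file (hypothetical format)."""

from typing import List, Optional


def parse_nodes(text: str) -> List[str]:
    """Return hostnames in file order, ignoring blank lines and # comments."""
    nodes = []
    for raw in text.splitlines():
        host = raw.split("#", 1)[0].strip()
        if host:
            nodes.append(host)
    return nodes


def rsync_commands(nodes: List[str], src: str, dest: str,
                   only: Optional[str] = None) -> List[List[str]]:
    """Build one rsync command per node, preserving the nodes-file order."""
    if only is not None:
        # Push to a single node, e.g. one pulled out of service for review
        if only not in nodes:
            raise ValueError(f"{only} is not listed in the nodes file")
        nodes = [only]
    # Trailing slash on src: copy the checkout's contents, not the directory itself
    source = src.rstrip("/") + "/"
    return [["rsync", "-a", "--delete", source, f"{node}:{dest}"] for node in nodes]


if __name__ == "__main__":
    example = """
# CDN source servers first, so CSS and images land before the markup
cdn-src1
cdn-src2
# then the application servers
app1
app2
"""
    for argv in rsync_commands(parse_nodes(example), "/srv/checkout", "/var/www"):
        print(" ".join(argv))
```

Running the commands one at a time with `subprocess.run(argv, check=True)` would stop the roll at the first failed node, so later servers never get markup whose assets did not land.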

For some services (cron, syslog, ssh, snmp, and ntp), we use Puppet to manage configuration and to ensure the packages are installed. Puppet and Gentoo get along great. If someone mistakenly uninstalls cron, Puppet will put it back for us. (I don't know how that could happen, but ya never know.) We hope to use Puppet for more and more as we get comfortable with it.

Keeping Everyone in the Loop

Having everyone know what is going on is important. To do that, we start with Trac for ticketing. Second, we use an OpenFire XMPP server throughout the company. The devops team has a channel that everyone is in all day. When someone rolls code to production, the scripts mentioned above that sync code out to the servers send a message via an XMPP bot we wrote in Ruby (Ruby has the best multi-user chat libraries for XMPP). The bot interfaces with Trac via HTTP and tells everyone which changesets were just rolled and who committed them. So if something breaks five minutes later, we can go back and look at what just rolled.
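The announcement itself can be sketched as a small formatting function. Our bot is Ruby and gets the changeset data from Trac over HTTP; this Python sketch assumes that data has already been fetched as a hypothetical list of (revision, author) pairs, uses a hypothetical Trac base URL, and leaves out the XMPP sending entirely. It links each revision to Trac's `/changeset/<rev>` page.

```python
"""Sketch: build the "what just rolled" chat message from changeset data."""

from typing import List, Tuple


def roll_message(deployer: str, env: str,
                 changesets: List[Tuple[int, str]], trac_base: str) -> str:
    """Format one chat message listing each rolled revision and its author."""
    if not changesets:
        return f"{deployer} rolled {env}: no new changesets"
    lines = [f"{deployer} rolled {env}:"]
    base = trac_base.rstrip("/")
    # Oldest revision first, each with a link to its Trac changeset page
    for rev, author in sorted(changesets):
        lines.append(f"  r{rev} by {author}: {base}/changeset/{rev}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(roll_message("alice", "production",
                       [(1201, "bob"), (1200, "carol")],
                       "https://trac.example.com/"))
```

Putting the committer next to each revision is what makes the "what just rolled" question answerable at a glance when something breaks.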

In addition to bots telling us things, there is a cultural requirement. Often before a big roll-out, we will discuss it in chat. That is the part that cannot be scripted or programmed. You have to get your developers and operations people talking to each other.

Final Thoughts

There are some subtle concepts in this post that may not be clear. One is that the code written on a development server is exactly the same code that runs on a production server. It is not massaged in any way. Things like database server names, passwords, etc. are all kept in configuration files on each node, tailored to the data center that node lives in. Another, which I want to point out again, is that the person who wrote the code is responsible for it all the way through to production. While at first this may make some developers nervous, it eventually gives them a sense of ownership. Of course, we don't hire someone off the street and give them that access. But it is expected that all developers will have that responsibility eventually.
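The "same code everywhere, per-node configuration" idea can be sketched as: the application reads database hosts and credentials from a file that lives on each node, outside the checkout that gets rsynced. This minimal Python sketch uses the standard-library `configparser`; the file path, section name, and key names are hypothetical.

```python
"""Sketch: identical code on every node; environment details come from a local file."""

import configparser
from typing import Dict, Optional

# Hypothetical path: lives on each node, outside the checkout that gets rsynced
NODE_CONFIG_PATH = "/etc/app/node.ini"


def load_db_settings(text: Optional[str] = None,
                     path: str = NODE_CONFIG_PATH) -> Dict[str, str]:
    """Read the [database] section; its contents differ per data center."""
    parser = configparser.ConfigParser()
    if text is not None:
        parser.read_string(text)
    else:
        with open(path) as fh:
            parser.read_file(fh)
    db = parser["database"]
    return {"host": db["host"], "user": db["user"], "password": db["password"]}
```

Because nothing environment-specific lives in the code, the checkout that passed staging is byte-for-byte what runs in production; only the node's local file changes.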