Using socket_connect with a timeout

TL;DR

I was having trouble with socket connections timing out reliably. Sometimes, my timeout would be reached. Other times, the connect would fail after three to six seconds. I finally figured out it had to do with trying to connect to a routable, non-localhost address. This function is what I finally ended up with that reliably connects to a working server, fails quickly for a server that has an address/port that is not reachable and will reach the timeout for routable addresses that are not up.

I have put a version of my final function into a Gist on Github. I hope someone finds it useful.

Full Story

So, it seems that when you try and connect to an IP that is routable on the network, but not answering, the TCP stack has some built in timeouts that are not obvious. This differs from trying to connect to an IP address that is up, but not listening on a given port. We took a Gearman server down for maintenance and I noticed our warning logs were showing a 3 to 7 second delay between the attempt to queue jobs and the warning log. The timeout we had set was only 100ms. So, this seemed odd.

After a lot of messing around, a coworker pointed out that in production, the failures were happening for an IP that was routable on the network, but that had no host listening on the IP. I had been using localhost and some foreign port for my "failed" server. After using an IP that was local to our LAN but had no host listening on the IP, I was able to recreate it on a dev server. I figured out that if you set the send and receive timeouts really low before calling connect, you can loop while calling connect. You check the error state and timeout. As long as the error is an acceptable one and the timeout is not reached, keep trying until it connects. It works like a charm.

I found several similar examples to this on the web. However, none of them mixed all these techniques.

You can simply set the send and receive timeouts to your actual timeout and it will return quicker. However, the timeouts apply to the packets. And there are retry rules in place. So, I found that a 100ms timeout for each send and receive would wind up taking 500ms or so to actually fail. This was not what I wanted. I wanted more control. So, I set a 100 microsecond timeout during connect. This makes socket_connect return quickly. As long as the socket error is 115 (in progress) or 114 (already trying), we keep calling it. Unless of course our timeout is reached. Then we fail.

It works really well. Should help for doing server maintenance on our Gearman servers.

Lock Wait Timeout Errors or Leave Your Data on the Server

If you use MySQL with InnoDB (most everyone) then you will likely see this error at some point. There is some confusion sometimes about what this means. Let me try and explain it.

Let's say we have a connection called A to the database. Connection A tries to update a row. But, it receives a lock wait timeout error. That does not mean that connection A did anything wrong. It means that another connection, call it B, is also updating a row that connection A wants to update. But, connection B has an open transaction that has not been committed yet. So, MySQL won't let you update that row from connection A. Make sense?

The first mistake people may make is looking at the code that throws the error to find a solution. It is hardly ever the code that throws the error that is the problem. In our case, it was code that was doing a simple insert into a table. I had a look at our processing logs around the time that the errors were thrown and I found a job that was running during that time. I then looked for code in that job that updates the table that was locked. This was where the problem lied.

So, why does this happen? Well, there can be very legitimate reasons. There can also be very careless reasons. The genesis of this blog post was some code that appeared to be legitimate at first, but upon further inspection was careless. This is basically what the code did.

  1. Start Transaction on database1
  2. Clear out some old data from the table
  3. Select a bunch of data from database2.table
  4. Loop in PHP, updating each row in its own query to update one column
  5. Select a bunch of data from database2.other_table
  6. Loop in PHP, updating each row in its own query to update another column
  7. Commit database1

This code ran in about 20 minutes on the data set we had. It kept a transaction open the whole time. It appeared legit at first because you can't join the data as there are sums and counts going on that have a one to many relationship which would cause some duplication of the sums and counts. It also looks legit because you are having to pull data from one database into another. However, there is a solution for this. We need to stop pulling all this data into PHP land and let it stay on the server where it lives. So, I changed it to this.

  1. Create temp table on database2 to hold mydata
  2. Select data from database2.table into my temp table
  3. Select data from database2.other_table into my temp table
  4. Move my temp table using extended inserts via PHP from database2 to database1
  5. Start Transaction on database1
  6. Clear out some old data from the table
  7. Do a multi-table bulk update of my real table using the temp table
  8. Commit database1

This runs in 3 minutes and only requires a 90 second transaction lock. Our lock wait timeout on this server is 50 seconds though. However, we have a 3 time retry rule for any lock wait timeout in our DB code. So, this should allow for our current workload to be processed without any data loss.

So, why did this help so much? We are not moving data from MySQL to PHP over and over. This applies to any language, not just PHP. The extended inserts for moving the temp table from one db to another really help. That is the fastest part of the whole thing. It moves about 2 million records from one to the other in about 1.5 seconds.

So, if you see a lock wait timeout, don't think you should sleep longer between retries. And don't dissect the code that is throwing the error. You have to dig in and find what else is running when it happens. Good luck.

Bonus: If you have memory issues in your application code, these techniques can help with those too.

Stop comparing stuff you don't understand

I normally don't do this. When I see someone write a blog post I don't agree with, I often just dismiss it and go on. But, this particular one caught my attention. It was titled PHP vs Node.js: Yet Another Versus. The summary was:

Node.js = PHP + Apache + Memcached + Gearman - overhead

What the f**k? Are you kidding me? Clearly this person has NEVER used memcached or Gearman in a production environment that had any actual load.

Back in the day, when URLs and filesystems had a 1:1 mapping, it made perfect sense to have a web server separate from the language it is running. But, nowadays, any PHP app with attractive URLs running behind the Apache web server is going to need a .htaccess file, which tells the server a regular expression to check before serving up a file. Sound complex and awkward with unnecessary overhead? That’s because it is.

Node has a web server built in. Some people call this a bad thing, I call those people crazy. Having the server built in means that you don’t have the awkward .htaccess config thing going on. Every request is understood to go through the same process, without having to hunt through the filesystem and figure out which script to run.
He believes that PHP inside Apache requires a .htaccess file. Welcome to 1999. I have not used a .htaccess file since then. Anyone that cares at all about scaling Apache would disable .htaccess files. And as for running regexs, how does he propose you decide your code path in the controller of his Node.js code? Something somewhere has to decide what code is going to answer a given request. mod_rewrite is wire speed fast and compiled in C. Javascript nor PHP code could ever beat that.

The official website is quite ugly and outdated.
Really? You choose your tools based on that? I don't know what to say.

Since a PHP process starts, does some boilerplate work, performs the taks the user actually wants, and then dies, data is not persistent in memory. You can keep this data persistent using third party tools like Memcache or traditional database, but then there is the overhead of communicating with those external processes.
He clearly has no understanding of how memcached is supposed to be used. You don't put things in Memcached so you can use it on the next request on this server. You put things in memcached so you can use it in any request on any server in your server pool. If you just have one web server, you can write Perl CGI scripts. Performance and up time is not important to you. If you want to share things across requests in PHP, APC and XCache fill the need very well.

The number one bottleneck with web apps is not the time it takes to calculate CPU hungry operations, but rather network I/O. If you need to respond to a client request after making a database call and sending an email, you can perform the two actions and respond when both are complete.
This sums up the mythical magic of Node.js. People think just because your code is not "running" that somehow the server is not doing anything. The process does not get to go do other shit. No, that would be multi-threaded. And Node.js is not multi-threaded. It is single threaded. That means the process can only be doing one thing at a time. If you are waiting on a DB call, you are waiting. I don't care what world you think you live in. You are waiting on that DB call. How your code is structured is irrelevant to how computers actually work. The event driven nature that is Node.js is much like OOP. You are abstracting yourself from how computers really work. The further you get from that, the less you will be able to control the computer.
Node.js is a very new, unstable, untested platform. If you are going to be building a large corporate scale app with a long lifetime, Node.js is not a good solution. The Node API’s are changing almost daily, and large/longterm apps will need to be rewritten often.
So, if I plan on making a living, don't use Node.js. Got it. We finally agree on something.
Being so new, it doesn’t have a lot of baggage leftover from days of old. Having a server built in, the stack is a lot simpler, there are less points of failure, and there is more control over what you can do with HTTP responses (ever try overwriting the name of the web server using PHP?).
No, why would you? It is not the web server. Apache or nginx would be your web server.
  • Are you building some sort of daemon? Use Node.
  • Are you making a content website? Use PHP.
  • Do you want to share data between visitors? Use Node.
  • Are you a beginner looking to make a quick website? Use PHP.
  • Are you going to run a bunch of code in parallel? Use Node.
  • Are you writing software for clients to run on shared hosts? Use PHP.
  • Do you want push events from the server to the client using websockets? Use Node.
  • Does your team already know PHP? Use PHP.
  • Does your team already know frontend JavaScript? Node would be easier to learn.
  • Are you building a command line script? Both work.
Yes, of course you would not build a daemon in PHP. Do you plan to share that same data across servers? He already told us Node is not multi-threaded, so how can it run code in parallel? Websockets have a ton of their own pain to deal with that is not even related to Node vs. PHP. Things like proxies. If I was building a command line tool and wanted to use Javascript, I would just use V8.

Listen, I write code in PHP and JavaScript all day. I also use some Ruby, Lua and even dabble in C. I am not a language snob. Use what works for you. I do however take exception when people write about things they clearly have no idea about. He claims to have written a lot of PHP. He clearly has never deployed a lot of PHP in environments that matter. If you are building small sites that don't have a lot of traffic, you can use anything. If you are building massive sites that have to scale, any technology is going to require a full understanding of what it takes to scale it out. I leave you with this wisdom that I am reminded of by his blog post.

PHP Coding Standards

Update: Matthew Weier O'Phinney, one of the core members of the group, has cleared up the naming history in the comments.

During the /dev/hell podcast at Tek12, someone asked the guys their opinion about PSR. I did not know what PSR was by that name. A quick search lead me to the Google Group named PHP Standards Working Group. I had vaguely remembered a consortium of frameworks, libraries and applications that were organizing to attempt to make their projects cooperate better. But, this did not sound like the same project. Another search and I found the PHP Framework Interoperability Group on Github. A bit more searching led me to a post where apparently the PHP FIG changed their name at some point citing people not knowing what FIG meant. But, this is not a history post. The group had done some work on setting a standard for auto loaders in PHP. This is a very good thing and much needed. That is a real thing that impacts real developers.

The person asking the question had asked about PSR1 and PSR2. These are the first two standards proposals in the group and they deal with coding standards. There were mixed feelings in the room about the proposals. I asked (being me, probably with very little tact) why in 2012 were a group of really smart people still discussing coding standards such as tabs vs. spaces. Because this is what immediately came to mind for me.


Source: http://xkcd.com/927/

There are already coding standards for PHP and any other language out there. Why does anyone need to make a new one? For Phorum we chose the PEAR standard (ok, with 2 minor modifications). On top of that, every one of the projects in this group already have coding standards. Why not just pick one of those? Are 10 projects that currently have their own standards going to actually all change to something else? I highly doubt it. My guess is that, at best, they will all end up with a modified version of the groups standards.

This reminds me a lot of Open Source licenses. There are tons of these things. And in the end, most (GPL has its issues I know) of the open source licenses represent the same idea. I suppose you could say that most of all of them fall into GPL like or BSD like. Anyhow, I quit worrying about having my own license years ago. I now just use a BSD style license that you can generate with several online BSD license generators.

When I voiced my concern about what is, in my opinion, a waste of very smart people's time, my good friend Cal Evans (He has bled in my car. So, I think he is my friend. And I hope he feels the same.) said that I was misunderstanding the point of the group. It was a group of projects that were collaborating to try and use similar standards and practices to make the PHP OSS community better. And that is exactly what I thought PHP FIG was. However, the group name is now "PHP Standards Working Group". That reminds me of the W3C HTML Working Group. And in my mind that means a group that is deciding the future of a technology. In addition the proposal being discussed is titled "PSR-1, a standard coding convention for PHP". If you pair that with the name of the group, it sounds very authoritative. And I don't think that is by accident. If I was heading up such an effort, I would hope that every PHP developer on the planet would follow it too. If you saw Terry Chay's keynote at the PHP Community Conference last year, he talked about frameworks and platforms. He pointed out that the reason people like Facebook were sharing their data center technology was in hopes that people would start using it and it would become common. Thus meaning the equipment they are custom building would be cheaper and people they hire would already be familiar with it. But, if the point of the group is *only* cooperation between lage OSS PHP projects, I wish they would pick a name that is more indicative of that. As it stands, when I landed on the page, my immediate assumption was that this groups intention was to dictate to the rest of the PHP world how to write their PHP code.

In the end, cooperation is good. And if these guys want to cooperate I say more power to them. I just hope they get into really good things soon. Like, can we talk about a maximum number of files, functions or classes used for any one single page execution? *That* would be valuable to the PHP community. I can deal with funny formatting. I can't deal with poorly performing code that his dragged down in abstracton and extension. Or how about things like *never* running queries inside loops that are reading results from another query. That would be a great thing to make examples of and show people the best practices. Tabs vs. spaces? That should have been solved 10 years ago. When in doubt, PHP code should do what the PHP core does. This is PHP we are talking about. Would it not make sense to have the people who write PHP writing code that is somewhat similar in style to those that make PHP? C and PHP syntax are very, very similar. So, why don't we all just refer to the PHP CODING_STANDARDS file when in doubt and not even worry with the little stuff that does not affect performance?

So, as of a few minutes ago, I have joined the group. If for no other reason, just to see what is discussed. Perhaps I should follow the advice I give people when they ask for features in my projects and do something about my issues and worries.

Living in the Prove It Culture

Engineering cultures differ from shop to shop. I have been in the same culture for 13 years so I am not an expert on what all the different types are. Before that I was living in Dilbert world. The culture there was really weird. The ideas were never yours. It was always some need some way off person had. A DBA, a UI "expert" and some product manager would dictate what code you wrote. Creativity was stifled and met with resistance.

I then moved to the early (1998) days of the web. It was a start up environment. In the beginning there were just two of us writing code. So, we thought everything we did was awesome. Then we added some more guys. Lucky for us we mostly hired well. The good hires where type A personalities that had skills we didn't have. They challenged us and we challenged them. On top of that, we had a CEO who had been a computer hacker in his teens. So, he had just enough knowledge to challenge us as well. Over the years we kept hiring more and more people. We always asked in the interview if the person could take criticism and if they felt comfortable defending their ideas. We decided to always have a white board session. We would ask them questions and have them work it out on a white board or talk it out with us in a group setting. The point of this was not to see if they always knew the answer. The point was to see how they worked in that setting. Looking back, the hires that did not work out also did not excel in that phase of the interview. The ones that have worked out always questioned our methods in the interview. They did not belittle our methods or dismiss them. They just asked questions. They would ask if we had tried this or that. Even if we could quickly explain why our method was right for us, they still questioned it. They challenged us.

When dealing with people outside the engineering team, we subconsciously applied these same tactics. The philosophy came to be that if you came to us with an idea, you had to throw it up on the proverbial wall. We would then try to knock it down. If it stuck, it was probably a good idea. Some people could handle this and some could not. The ones that could not handle that did not always get their ideas pushed through. It may not mean they were bad ideas. And that is maybe the down side of this culture. But, it has worked pretty well for us.

We apply this to technology too. My first experience on Linux was with RedHat. The mail agent I cut my teeth on was qmail. I used djbdns. When Daniel Beckham, our now director of operations, came on, he had used sendmail and bind. He immediately challenged qmail. I went through some of the reasons I prefered it. He took more shots. In the end, he agreed that qmail was better than sendmail. However, his first DNS setup for us was bind. It took a few more years of bind hell for him to come around to djbdns.

When RedHat splintered into RedHat Enterprise and Fedora, we tried out Fedora on one server. We found it to be horribly unstable. It got the axe. We looked around for other distros. We found a not very well known distro that was known as the ricer distro of the Linux world called Gentoo. We installed it on one server to see what it was all about. I don't remember now whose idea it was. Probably not mine. We eventually found it to be the perfect distro for us. It let us compile our core tools like Apache, PHP and MySQL while at the same time using a package system. We never trusted RPMs for those things on RedHat. Sure, bringing a server online took longer but it was so worth it. Eventually we bought in and it is now the only distro in use here.

We have done this over and over and over. From the fact that we all use Macs now thanks to Daniel and his willingness to try it out at our CEO's prodding to things like memcached, Gearman, etc. We even keep evaluating the tools we already have. When we decided to write our own proxy we discounted everything we knew and evaluated all the options. In the end, Apache was known and good at handling web requests and PHP could do all we needed in a timely, sane manner. But, we looked at and tested everything we could think of. Apache/PHP had to prove itself again.

Now, you might think that a culture of skepticism like this would lead to new employees having a hard time getting any traction. Quite the opposite. Because we hire people that fit the culture, they can have a near immediate impact. We have a problem I want solved and a developer that has been here less than a year suggested that Hadoop may be a solution, but was not sure we would use it. I recently sent this in an email to the whole team in response to that.
The only thing that is *never* on the table is using a Windows server. If you can get me unique visitors for an arbitrary date range in milliseconds and it require Hadoop, go for it.
You see, we don't currently use Hadoop here. But, if that is what it takes to solve my problem and you can prove it and it will work, we will use it.

Recently we had a newish team member suggest we use a SAN for our development servers to use as a data store. Specifically he suggested we could use it to house our MySQL data for our development servers. We told him he was insane. SANs are magical boxes of pain. He kept pushing. He got Dell to come in and give us a test unit. Turns out it is amazing. We can have a hot copy of our production database on our dev slices in about 3 minutes. A full, complete copy of our production database in 3 minutes. Do you know how amazing that is? Had we not had the culture we do and had not hired the right person that was smart enough to pull it off and confident enough to fight for the solution, we would not have that. He has been here less than a year and has had a huge impact to our productivity. There is talk of using this in production too. I am still in the "prove it" mode on this. We will see.
I know you will ask how our dev db works, here you go:
1. Replicate production over VPN to home office
2. Write MySQL data on SAN
3. Stop replication, flush tables, snapshot FS
4. Copy snapshot to a new location
5. On second dev server, umount SAN, mount new snapshot
6. Restart MySQL all around
7. Talk in dev chat how bad ass that is

We had a similar thing happen with our phone system. We had hired a web developer that previously worked for a company that created custom Asterisk solutions. When our propietary PBX died, he stepped up and proved that Asterisk would work for us. Not a job for a web developer. But he was confident he could make it work. It now supports 3 offices and several home bound users world wide. He also had only been here a short time when that happened.

Perhaps it sounds like a contradiction. It may sound like we just hop on any bandwagon technology out there. But no. We still use MySQL. We are still on 5.0 in fact. It works. We are evaluating Percona 5.5 now. We tried MySQL 5.1. We found no advantage and the Gentoo package maintainer found it to be buggy. So, we did not switch. We still use Apache. It works. Damn well. We do use Apache with the worker MPM with PHP which is supposedly bad. But, it works great for us. But, we had to prove it would work. We ran a single node with worker for months before trusting it. Gearman was begrudgingly accepted. The idea of daemonized PHP code was not a comforting one. But once you write a worker and use it, you feel like a god. And then you see the power. Next thing you know, it is a core, mission critical part of your infrastructure. That is how it is with us now. In fact, Gearman has went from untrusted to the go to tech. When someone proposes a solution that does not involve Gearman, someone will ask if part of the problem can be solved using Gearman and not whatever idea they have. There is then a discussion about why it is or is not a good fit. Likewise, if you want to a build a daemon to listen on a port and answer requests, the question is "Why can't you just use Apache and a web service?" And it is a valid question. If you can solve your problem with a web service on already proven tech, why build something new?

This culture is not new. We are not unique. But, in a world of "brogramming" where "engineers" rage on code that is awesome before it is even working and people are said to be "killing it" all the time, I am glad I live in a world where I have to prove myself everyday. I am the most senior engineer on the team. And even still I get shot down. I often pitch an idea in dev chat and someone will shoot it down or point out an obvious flaw. Anyone, and I mean anyone, on the team can question my code, ideas or decisions and I will listen to them and consider their opinion. Heck, people outside the team can question me too. And regularly do. And that is fine. I don't mind the questions. I once wrote here that I like to be made to feel dumb. It is how I get smarter. I have met people that thought they were smarter than everyone else. They were annoying. I have interviewed them. It is hard to even get through those interviews.

Is it for everyone? Probably not. It works for us. And it has gotten us this far. You can't get comfortable though. If you do foster this type of culture, there is a risk of getting comfortable. If you start thinking you have solved all the hard problems, you will look up one day and realize that you are suffering. Keep pushing forward and questioning your past decisions. But before you adopt the latest and greatest new idea, prove that the decisions your team makes are the right ones at every step. Sometimes that will take a five minute discussion and sometimes it will take a month of testing. And other times, everyone in the room will look at something and think "Wow that is so obvious how did we not see it?" When it works, it is an awesome world to live in.

Errors when adding/subtracing dates using seconds

This just came up today again for me. I have said it before, but even I get lazy and forget. When doing math with dates such as adding days it is really quick to think this works:
<?php

$date = "2011-11-01";

// add 15 days

$new_date = date("Y-m-d", strtotime($date) + (86400 * 15));

// $new_date should be 2011-11-16 right?

echo $new_date;

?>
This yields `2011-11-15`. The problem with this is that it assume that there are only 86400 seconds in every day. There are in fact not. On days when the clocks change for daylight savings time, there are either 1 hour more than that or 1 hour less than that. In addition, there are also leap seconds put into our time system to keep us in line with the sun. There is one this year, 2012, on June 30th in fact. Since they don't happen with the regularity that daylight savings time does, it may be easy to forget those. Luckily, for this problem, the solution is the same. You have two choices. And the solution you choose depends on the particular problem you have. For the simple problem above, you can simply let strtotime take care of it for you.
$new_date = date("Y-m-d", strtotime($date." +15 days"));
strtotime() is the most awesome date/time related function in all of computer programming. I have written about it before. It handles all those nasty weird seconds issues. But, if you are not solving a problem this simple or you are reading this and need help with another language, there is another solution. You do all your date math at noon. Simply only run code that does date math during lunch time. No, just joking. That would be silly. What I mean is to adjust all your date/time variables to represent the time at 12:00 hours.
$new_date = date("Y-m-d", strtotime($date." 12:00:00") + 86400 * 15);
Now that you are doing date math at noon, you will be safe for most any date range you are doing math on. Daylight savings time always gives and hour and takes an hour every year, so those cancel each other out. It would take a whole lot of leap seconds to cause the offset from the start date to the end date to shift enough to make this technique no longer work, but it is technically feasible. So, if you are adding hundreds of years to dates, this won't work for you. There, disclaimer added.

UPDATE: I should point out that the examples in this post only apply to US time zones that recognized Daylight Savings Time and the rules for DST as they apply after the changes in 2007.

Check for a TTY or interactive terminal in PHP

Many UNIX tools do different things if they are connected to an interactive terminal, also called a TTY. This can be handy for lots of reasons. I had a use case today that prompted me to find out how to do it in PHP.

Here is the situation. We log errors to the PHP error log. We then have processes that monitor that error log and alert us about any uncaught exceptions or fatal errors very quickly so we can address issues. We also monitor non-fatal errors and alert on those on a less frequent schedule. However, this can be annoying if a user is running some code on a terminal that is generating errors. Let's say I am trying to find out why some file import did not happen. Running the job that is supposed to do it may yield an error. Maybe it was a file permission issue or something. There are other people watching the alerts. What they don't know is that I am running the code and looking at these errors in real time. So, they may start digging into the issue when I am the one causing it and can see it happening already. So, I thought it would be nice, if in my error handler, I could not send errors to the error log that are being sent to an interactive terminal. A few quick searches for "php check for tty" did not find anything. In the end, a coworker cracked open a book to see how it was done in C. That got me on the right path to finding two PHP functions: posix_isatty and posix_ttyname. These seem to do the trick. They take a file descriptor like STDOUT and will tell you if that is an interactive terminal and what the tty name is if it is one.

It took me a few tries to get the full effect I wanted in PHP. The thing I always forget is that fatal errors don't use my error handler. I understand why. The engine is in an unknown state when that happens, so it can't keep running code. In the end I added this code to my auto prepend file which is where my error handler, auto loader and other start up stuff is defined for all PHP code we run, both CLI and Apache. Since PHP will send errors to STDERR if it is defined, that is what we want to check for. If it is defined and it is a TTY, we just disable error logging. I should note that I already check the log_errors ini setting in my error handler before I even call the error_log function.
if(defined("STDERR") && posix_isatty(STDERR)){
    ini_set("log_errors", false);
    ini_set("error_log", null);
}
Hopefully anyone else searching for "php check for tty" or "php interactive terminal" will find this blog post and it will help them out.

Talking about Gearman at Etsy Labs

I find myself flying to New York on Monday for some dealnews related business. Anytime I travel I try and find something fun to do at night. (Watching a movie by myself in Provo, Utah was kinda not that fun.) So, this week I asked on Twitter if anything was happening while I would be in town. Anything would do. A meetup of PHP/MySQL users or some design/css/js related stuff for example. Pretty much anything interesting. Well, later that day I received an IM from the brilliant John Allspaw, Senior VP of Technical Operations at Etsy. He wanted me to swing by the Etsy offices and say hi. Turns out it is only a block away from where I would be. Awesome! He also mentioned that he would like to have me come and speak at their offices some time. That would be neat too. I will have to plan better next time I am traveling up there.

Fast forward another day. I get an email from Kellan Elliott-McCrea, CTO of Etsy wanting to know if I would come to the Etsy offices and talk about Gearman. At first I thought "That is short notice, man. I don't know that I can pull that off." Then I remembered the last time I was asked to speak at an event on short notice based off a recommendation from John Allspaw.

It was in 2008 for some new conference called Velocity. That only turned out to be the best conference I have ever attended. I have been to Velocity every year since and this year took our whole team. In addition, I spoke again in 2009 at Velocity, wrote a chapter for John's book Web Operations that was released at Velocity in 2010 and was invited to take part in the Velocity Summit this year (2011) which helps kick off the planning for the actual conference. The moral of that story for me is: when John Allspaw wants you to take part in something, you do it.

In reality, it was not that tough a decision. Even without John's involvement, I love the chance to talk about geeky stuff. The Etsy and dealnews engineering teams are like two twins separated at birth. Every time we compare notes, we are doing the same stuff. For example, we have been trading Open Source code lately. They are using my GearmanManager and we just started using their statistics collection daemon, statsd. So, speaking to their people about what we do seem like a great opportunity to share and get input.

The event is open to the public. So, if you use Gearman, want to use Gearman, or just want to hear how we use Gearman at dealnews, come here me ramble on about how awesome it is Tuesday night in Dumbo at Etsy Labs. You can RSVP on the event page.

Best Practices for Gearman by Brian Moon
Etsy Labs
55 Washington St. Ste 712
NY 11222

Tuesday, August 09, 2011 from 7:00 PM - 10:00 PM (ET)

PHP Frameworks

Last week I spoke at and attended the first ever PHP Community Conference. It was very good. It was also very different from my normal conference. I usually go for very technical stuff. I don't often stop and smell the roses of the community or history of my chosen (by me or it I am not sure sometimes) profession. There was a lot of community at this one.

One thing that seemed to be a hot topic at the conference was frameworks. CakeDC, the money behind CakePHP was the platinum sponsor. I chatted with Larry Masters the owner of CakeDC for a bit while walking one night. Great guy. Joël Perras gave a tutorial about Lithum. I attended most of this one. He did very well. Joël was frank and honest about the benefits and problems with using frameworks, including having to deal with more than one at a time. There was also a tutorial about Zend Framework patterns by Matthew Weier O'Phinney. I missed this one. On the second day, things were different. Rasmus Lerdorf warned about the bloat in most of the PHP frameworks and expressed hope that they would do better in the newer versions. I received several questions about frameworks in my session. I also spoke out about them a bit. Terry Chay wrapped up the day with his closing keynote and touched on them again. More on that later. I want to kind of summarize what I said (or meant to say).

PHP is a framework

In my session, I talked about the history of Phorum. One of the things I covered was the early days of PHP. Back in the 90s, before PHP, most dynamic web work was done in C or Perl. At that time, in those worlds, you had to do all the HTTP work yourself. If you wanted a content type of text/html, you had to set it, in code, on every single response. Errors in CGI scripts would often result in Apache internal error pages and made debugging very hard. All HTML work had to be done by writing to output. There was no embedding code with HTML even as a templating language. PHP changed all that. You had a default content type of text/html. You had automatic handling of request variables. Cookies were easily ingested and output. You could template your HTML with script instead of having to write everything out via print or stdout. It was amazing. Who could ask for more?

Frameworks as a tool

Well, apparently a world that I honestly don't work in could ask for more. There are three major segments of web developers these days. There are developers that work for a company that has a web site, but its business is not the web site. Maybe it is a network hardware company or some other industry where their business merits having a staff to run their site, but it is not their core business. Then there are developers like myself that work for a company where the web site is the business. Everything about the business goes through the web. We have the public web site and the internal web site our team uses. It is everything. The last type are those developers that are constantly building new sites or updating existing sites for clients. I will be honest, this is not a segment I have considered much in the past when writing code or blog posts. But, I met more of those people at this conference than any of the other two types. They seem to be the ones that are motivated and interested. Or at least, because PHP and the web are their business, they sent their people to the conference.

You see, I have spoken out about frameworks. Not very publicly, but those that know me have heard me comment about them. I have never really seen the point. Why start with something generic that will most likely not fit your ultimate need when you need to scale or expand beyond its abilities? Well, for thousands of web sites, that are likely being built by agencies, that time never occurs. Most likely, before that happens, the site will be redesigned and completely replaced. So, if you spend every day building a new site, why do all that groundwork every time?

In addition, why have to deal with every different client's needs? I often say that Apache is my controller. I don't like to use PHP as my controller. But, if I was deploying a site every week to a different stack, I can't rely on Apache with mod_rewrite or whatever things I rely on in my job today. So, you need to have full control in the application. What database will the client this week use? I don't care, the framework abstracts that for me. These are all very good reasons to use a framework.

Framework Trade-Off

There are some trade-offs though. The biggest one I see is the myriad of choices. Several of the pro-framework people even mentioned that there are a lot out there. And it seems that someone is making a new one everyday. With all these choices, it is likely that some of the benefit you get from a framework could be lost. If a client already has a site based on CakePHP and your agency uses Lithium what do you do? Say no to the work or have to deal with the differences? Some of them are big enough to be a real issue. Some are so small, you may not notice them until it's too late. That is a tough place to be.

The other issue is performance. Frameworks are notoriously inefficient. It has just been their nature. The more you abstract away from the core, the less efficient you are. This is even true with PHP. Terry Chay pointed out that PHP is less efficient than Java or C in his keynote. But, you gain power with PHP in way of quicker development cycles. Frameworks have that same benefit. But, have not solved this issue any better than PHP has over C. They abstract away the low level (for PHP at least) stuff that is going on. And that means loss of efficiency. This can be solved or at least worked on, however, and I hope it is.

Frameworks as a Commodity

So, this gets me back to something Terry Chay said. He talked about the motivation of companies to open source their technology. He used Facebook's Open Compute Project as an example. He pointed out that a major reason Facebook would open up this information would be in hopes that others would do the same in their data centers. If that happened, it would be easier for Facebook to move to a new data center because it was already mostly setup the way they like it.

Transitioning this same thought frameworks, the commoditization here, that I see, is in the interest of developers. If the framework you support becomes the de facto standard, then all those developers working in agencies using it are now ready to come to work for you. Plus, if you are the company behind it, there are opportunities for books, conferences, training, support, and all the other peripherals that come from the commercial/open source interaction. Need proof of that? Look no further than the "PHP Company", Zend. They could have committed developers to PEAR, but instead created Zend Framework. I see job listings very often for Zend Framework experience. Originally Zend tried to monetize the server with their optimizers and Zend Server. They had moderate success. The community came up with APC and XCache that sort of stole their thunder. I feel they have had much better success with Zend Framework in terms of market penetration. The money is with the people that write the code, not run the servers.

Frameworks are EVERYWHERE

I will close with something else that Terry Chay said. This was kind of an aha! moment for me. Terry pointed out that frameworks are everywhere. Wordpress, Drupal and even my project, Phorum, are frameworks. You can build a successful site using just those applications. It is not just the new breed code libraries that can be viewed as frameworks. In fact, Phorum's very own Maurice Makaay is building his new web site using only Phorum 5.3 (current development branch). Phorum offers easier database interaction, output handling, templating, a pluggable module system and even authentication and group based permissions. Wow, I have always kind of shunned this idea. In fact, when Maurice first showed me his site, I kind of grimaced. Why would you want to do that? You know why? Because the main thing that drives his site is Phorum. His users come to the site for Phorum. So, why would he want to install Phorum, invest in making it all it can be and then have to start from scratch for all the other parts of the site that are not part of the message board. Duh, I kind of feel stupid for never looking at things from this perspective before. Feeling dumb is ok. I get smarter when I feel dumb. New ideas make me a better developer. And I hope that is what comes out of this experience for me. You never know, I may throw my name in this hat and see how Phorum's groundwork could be useful outside of the application itself.