Lock Wait Timeout Errors or Leave Your Data on the Server

Wed, Jun 27, 2012 01:12 AM
If you use MySQL with InnoDB (most everyone) then you will likely see this error at some point. There is some confusion sometimes about what this means. Let me try and explain it.

Let's say we have a connection called A to the database. Connection A tries to update a row. But, it receives a lock wait timeout error. That does not mean that connection A did anything wrong. It means that another connection, call it B, is also updating a row that connection A wants to update. But, connection B has an open transaction that has not been committed yet. So, MySQL won't let you update that row from connection A. Make sense?

The first mistake people may make is looking at the code that throws the error to find a solution. It is hardly ever the code that throws the error that is the problem. In our case, it was code that was doing a simple insert into a table. I had a look at our processing logs around the time that the errors were thrown and I found a job that was running during that time. I then looked for code in that job that updates the table that was locked. This was where the problem lay.

So, why does this happen? Well, there can be very legitimate reasons. There can also be very careless reasons. The genesis of this blog post was some code that appeared to be legitimate at first, but upon further inspection was careless. This is basically what the code did.

  1. Start Transaction on database1
  2. Clear out some old data from the table
  3. Select a bunch of data from database2.table
  4. Loop in PHP, updating each row in its own query to update one column
  5. Select a bunch of data from database2.other_table
  6. Loop in PHP, updating each row in its own query to update another column
  7. Commit database1

This code ran in about 20 minutes on the data set we had. It kept a transaction open the whole time. It appeared legit at first because you can't join the data as there are sums and counts going on that have a one to many relationship which would cause some duplication of the sums and counts. It also looks legit because you are having to pull data from one database into another. However, there is a solution for this. We need to stop pulling all this data into PHP land and let it stay on the server where it lives. So, I changed it to this.

  1. Create temp table on database2 to hold mydata
  2. Select data from database2.table into my temp table
  3. Select data from database2.other_table into my temp table
  4. Move my temp table using extended inserts via PHP from database2 to database1
  5. Start Transaction on database1
  6. Clear out some old data from the table
  7. Do a multi-table bulk update of my real table using the temp table
  8. Commit database1

This runs in 3 minutes and only requires a 90 second transaction lock. Our lock wait timeout on this server is 50 seconds though. However, we have a 3 time retry rule for any lock wait timeout in our DB code. So, this should allow for our current workload to be processed without any data loss.
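The retry rule mentioned above could look something like the following. This is only a minimal sketch, not our actual DB code: the function names retry_on_lock_wait and is_lock_wait_timeout are made up for illustration, and it assumes a PDO-style exception that carries the driver error code in errorInfo[1]. MySQL reports a lock wait timeout as error 1205. Note that it does not sleep between attempts, since the wait itself already happened inside MySQL.

```php
<?php
// MySQL error 1205 is ER_LOCK_WAIT_TIMEOUT:
// "Lock wait timeout exceeded; try restarting transaction".
const MYSQL_LOCK_WAIT_TIMEOUT = 1205;

/**
 * Hypothetical helper: decide whether an exception is a lock wait timeout.
 * Assumes a PDOException-style public errorInfo array: [SQLSTATE, code, message].
 */
function is_lock_wait_timeout(Throwable $e): bool
{
    $info = property_exists($e, 'errorInfo') ? $e->errorInfo : null;

    return is_array($info) && (int) ($info[1] ?? 0) === MYSQL_LOCK_WAIT_TIMEOUT;
}

/**
 * Hypothetical helper: run $operation, retrying only on lock wait timeouts.
 * Any other error, or a timeout on the last attempt, is rethrown.
 */
function retry_on_lock_wait(callable $operation, int $maxAttempts = 3)
{
    for ($attempt = 1; ; $attempt++) {
        try {
            return $operation($attempt);
        } catch (Throwable $e) {
            if ($attempt >= $maxAttempts || !is_lock_wait_timeout($e)) {
                throw $e;
            }
            // Fall through and try again; the blocking transaction may be done by now.
        }
    }
}
```

With a real connection, the callable would wrap the statement that got the error, for example a closure that calls $pdo->exec() with the insert. The retries only buy time. They do not fix the long transaction that is holding the lock.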

So, why did this help so much? We are not moving data from MySQL to PHP over and over. This applies to any language, not just PHP. The extended inserts for moving the temp table from one db to another really help. That is the fastest part of the whole thing. It moves about 2 million records from one to the other in about 1.5 seconds.
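For anyone who has not used extended inserts, the idea is to send many rows in one INSERT statement instead of one statement per row. Here is a rough sketch of building them in PHP. The function name build_extended_inserts, the table tmp_stats and its columns are all hypothetical, and the quoting callable is an assumption: with a real PDO connection you would pass [$pdo, 'quote'] rather than the stand-in used in the demo.

```php
<?php
/**
 * Hypothetical helper: build multi-row ("extended") INSERT statements,
 * one statement per batch of rows.
 *
 * $quote is any callable that turns a PHP value into a safely quoted
 * SQL literal, e.g. [$pdo, 'quote'] for a PDO connection.
 */
function build_extended_inserts(
    string $table,
    array $columns,
    array $rows,
    callable $quote,
    int $batchSize = 1000
): array {
    $statements = [];
    $prefix = "INSERT INTO `$table` (`" . implode('`, `', $columns) . "`) VALUES ";

    foreach (array_chunk($rows, $batchSize) as $batch) {
        $tuples = [];
        foreach ($batch as $row) {
            $values = [];
            foreach ($columns as $column) {
                $value = $row[$column] ?? null;
                $values[] = $value === null ? 'NULL' : $quote($value);
            }
            $tuples[] = '(' . implode(', ', $values) . ')';
        }
        $statements[] = $prefix . implode(', ', $tuples);
    }

    return $statements;
}

// Demo only: a stand-in quoter so the sketch runs without a database.
// It is NOT safe for real data; use the connection's own quoting instead.
$demoQuote = function ($value) {
    return is_int($value) ? (string) $value : "'" . addslashes((string) $value) . "'";
};

$rows = [
    ['id' => 1, 'total' => 10],
    ['id' => 2, 'total' => 25],
    ['id' => 3, 'total' => 7],
];

// A batch size of 2 turns three rows into two statements.
foreach (build_extended_inserts('tmp_stats', ['id', 'total'], $rows, $demoQuote, 2) as $sql) {
    echo $sql, "\n";
}
```

The batch size is a trade off. Bigger batches mean fewer round trips, but each statement still has to fit within the server's max_allowed_packet setting.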

So, if you see a lock wait timeout, don't think you should sleep longer between retries. And don't dissect the code that is throwing the error. You have to dig in and find what else is running when it happens. Good luck.

Bonus: If you have memory issues in your application code, these techniques can help with those too.

Stop comparing stuff you don't understand

Mon, Jun 25, 2012 11:09 PM
I normally don't do this. When I see someone write a blog post I don't agree with, I often just dismiss it and go on. But, this particular one caught my attention. It was titled PHP vs Node.js: Yet Another Versus. The summary was:

Node.js = PHP + Apache + Memcached + Gearman - overhead

What the f**k? Are you kidding me? Clearly this person has NEVER used memcached or Gearman in a production environment that had any actual load.

Back in the day, when URLs and filesystems had a 1:1 mapping, it made perfect sense to have a web server separate from the language it is running. But, nowadays, any PHP app with attractive URLs running behind the Apache web server is going to need a .htaccess file, which tells the server a regular expression to check before serving up a file. Sound complex and awkward with unnecessary overhead? That’s because it is.

Node has a web server built in. Some people call this a bad thing, I call those people crazy. Having the server built in means that you don’t have the awkward .htaccess config thing going on. Every request is understood to go through the same process, without having to hunt through the filesystem and figure out which script to run.
He believes that PHP inside Apache requires a .htaccess file. Welcome to 1999. I have not used a .htaccess file since then. Anyone that cares at all about scaling Apache would disable .htaccess files. And as for running regexes, how does he propose you decide your code path in the controller of his Node.js code? Something somewhere has to decide what code is going to answer a given request. mod_rewrite is wire speed fast and compiled in C. Neither JavaScript nor PHP code could ever beat that.

The official website is quite ugly and outdated.
Really? You choose your tools based on that? I don't know what to say.

Since a PHP process starts, does some boilerplate work, performs the taks the user actually wants, and then dies, data is not persistent in memory. You can keep this data persistent using third party tools like Memcache or traditional database, but then there is the overhead of communicating with those external processes.
He clearly has no understanding of how memcached is supposed to be used. You don't put things in memcached so you can use them on the next request on this server. You put things in memcached so you can use them in any request on any server in your server pool. If you just have one web server, you can write Perl CGI scripts. Performance and uptime are not important to you. If you want to share things across requests in PHP, APC and XCache fill the need very well.

The number one bottleneck with web apps is not the time it takes to calculate CPU hungry operations, but rather network I/O. If you need to respond to a client request after making a database call and sending an email, you can perform the two actions and respond when both are complete.
This sums up the mythical magic of Node.js. People think just because your code is not "running" that somehow the server is not doing anything. The process does not get to go do other shit. No, that would be multi-threaded. And Node.js is not multi-threaded. It is single threaded. That means the process can only be doing one thing at a time. If you are waiting on a DB call, you are waiting. I don't care what world you think you live in. You are waiting on that DB call. How your code is structured is irrelevant to how computers actually work. The event driven nature that is Node.js is much like OOP. You are abstracting yourself from how computers really work. The further you get from that, the less you will be able to control the computer.
Node.js is a very new, unstable, untested platform. If you are going to be building a large corporate scale app with a long lifetime, Node.js is not a good solution. The Node API’s are changing almost daily, and large/longterm apps will need to be rewritten often.
So, if I plan on making a living, don't use Node.js. Got it. We finally agree on something.
Being so new, it doesn’t have a lot of baggage leftover from days of old. Having a server built in, the stack is a lot simpler, there are less points of failure, and there is more control over what you can do with HTTP responses (ever try overwriting the name of the web server using PHP?).
No, why would you? It is not the web server. Apache or nginx would be your web server.
  • Are you building some sort of daemon? Use Node.
  • Are you making a content website? Use PHP.
  • Do you want to share data between visitors? Use Node.
  • Are you a beginner looking to make a quick website? Use PHP.
  • Are you going to run a bunch of code in parallel? Use Node.
  • Are you writing software for clients to run on shared hosts? Use PHP.
  • Do you want push events from the server to the client using websockets? Use Node.
  • Does your team already know PHP? Use PHP.
  • Does your team already know frontend JavaScript? Node would be easier to learn.
  • Are you building a command line script? Both work.
Yes, of course you would not build a daemon in PHP. Do you plan to share that same data across servers? He already told us Node is not multi-threaded, so how can it run code in parallel? Websockets have a ton of their own pain to deal with that is not even related to Node vs. PHP. Things like proxies. If I was building a command line tool and wanted to use Javascript, I would just use V8.

Listen, I write code in PHP and JavaScript all day. I also use some Ruby, Lua and even dabble in C. I am not a language snob. Use what works for you. I do however take exception when people write about things they clearly have no idea about. He claims to have written a lot of PHP. He clearly has never deployed a lot of PHP in environments that matter. If you are building small sites that don't have a lot of traffic, you can use anything. If you are building massive sites that have to scale, any technology is going to require a full understanding of what it takes to scale it out. I leave you with this wisdom that I am reminded of by his blog post.

PHP Coding Standards

Fri, May 25, 2012 07:27 PM
Update: Matthew Weier O'Phinney, one of the core members of the group, has cleared up the naming history in the comments.

During the /dev/hell podcast at Tek12, someone asked the guys their opinion about PSR. I did not know what PSR was by that name. A quick search led me to the Google Group named PHP Standards Working Group. I had vaguely remembered a consortium of frameworks, libraries and applications that were organizing to attempt to make their projects cooperate better. But, this did not sound like the same project. Another search and I found the PHP Framework Interoperability Group on Github. A bit more searching led me to a post where apparently the PHP FIG changed their name at some point citing people not knowing what FIG meant. But, this is not a history post. The group had done some work on setting a standard for auto loaders in PHP. This is a very good thing and much needed. That is a real thing that impacts real developers.

The person asking the question had asked about PSR1 and PSR2. These are the first two standards proposals in the group and they deal with coding standards. There were mixed feelings in the room about the proposals. I asked (being me, probably with very little tact) why in 2012 were a group of really smart people still discussing coding standards such as tabs vs. spaces. Because this is what immediately came to mind for me.


Source: http://xkcd.com/927/

There are already coding standards for PHP and any other language out there. Why does anyone need to make a new one? For Phorum we chose the PEAR standard (ok, with 2 minor modifications). On top of that, every one of the projects in this group already has coding standards. Why not just pick one of those? Are 10 projects that currently have their own standards going to actually all change to something else? I highly doubt it. My guess is that, at best, they will all end up with a modified version of the group's standards.

This reminds me a lot of Open Source licenses. There are tons of these things. And in the end, most (GPL has its issues I know) of the open source licenses represent the same idea. I suppose you could say that most of all of them fall into GPL like or BSD like. Anyhow, I quit worrying about having my own license years ago. I now just use a BSD style license that you can generate with several online BSD license generators.

When I voiced my concern about what is, in my opinion, a waste of very smart people's time, my good friend Cal Evans (He has bled in my car. So, I think he is my friend. And I hope he feels the same.) said that I was misunderstanding the point of the group. It was a group of projects that were collaborating to try and use similar standards and practices to make the PHP OSS community better. And that is exactly what I thought PHP FIG was. However, the group name is now "PHP Standards Working Group". That reminds me of the W3C HTML Working Group. And in my mind that means a group that is deciding the future of a technology. In addition the proposal being discussed is titled "PSR-1, a standard coding convention for PHP". If you pair that with the name of the group, it sounds very authoritative. And I don't think that is by accident. If I was heading up such an effort, I would hope that every PHP developer on the planet would follow it too. If you saw Terry Chay's keynote at the PHP Community Conference last year, he talked about frameworks and platforms. He pointed out that the reason people like Facebook were sharing their data center technology was in hopes that people would start using it and it would become common. Thus meaning the equipment they are custom building would be cheaper and people they hire would already be familiar with it. But, if the point of the group is *only* cooperation between large OSS PHP projects, I wish they would pick a name that is more indicative of that. As it stands, when I landed on the page, my immediate assumption was that this group's intention was to dictate to the rest of the PHP world how to write their PHP code.

In the end, cooperation is good. And if these guys want to cooperate I say more power to them. I just hope they get into really good things soon. Like, can we talk about a maximum number of files, functions or classes used for any one single page execution? *That* would be valuable to the PHP community. I can deal with funny formatting. I can't deal with poorly performing code that is dragged down in abstraction and extension. Or how about things like *never* running queries inside loops that are reading results from another query. That would be a great thing to make examples of and show people the best practices. Tabs vs. spaces? That should have been solved 10 years ago. When in doubt, PHP code should do what the PHP core does. This is PHP we are talking about. Would it not make sense to have the people who write PHP writing code that is somewhat similar in style to those that make PHP? C and PHP syntax are very, very similar. So, why don't we all just refer to the PHP CODING_STANDARDS file when in doubt and not even worry with the little stuff that does not affect performance?

So, as of a few minutes ago, I have joined the group. If for no other reason, just to see what is discussed. Perhaps I should follow the advice I give people when they ask for features in my projects and do something about my issues and worries.

Living in the Prove It Culture

Tue, Mar 6, 2012 10:45 PM
Engineering cultures differ from shop to shop. I have been in the same culture for 13 years so I am not an expert on what all the different types are. Before that I was living in Dilbert world. The culture there was really weird. The ideas were never yours. It was always some need that some far off person had. A DBA, a UI "expert" and some product manager would dictate what code you wrote. Creativity was stifled and met with resistance.

I then moved to the early (1998) days of the web. It was a start up environment. In the beginning there were just two of us writing code. So, we thought everything we did was awesome. Then we added some more guys. Lucky for us we mostly hired well. The good hires were type A personalities that had skills we didn't have. They challenged us and we challenged them. On top of that, we had a CEO who had been a computer hacker in his teens. So, he had just enough knowledge to challenge us as well. Over the years we kept hiring more and more people. We always asked in the interview if the person could take criticism and if they felt comfortable defending their ideas. We decided to always have a white board session. We would ask them questions and have them work it out on a white board or talk it out with us in a group setting. The point of this was not to see if they always knew the answer. The point was to see how they worked in that setting. Looking back, the hires that did not work out also did not excel in that phase of the interview. The ones that have worked out always questioned our methods in the interview. They did not belittle our methods or dismiss them. They just asked questions. They would ask if we had tried this or that. Even if we could quickly explain why our method was right for us, they still questioned it. They challenged us.

When dealing with people outside the engineering team, we subconsciously applied these same tactics. The philosophy came to be that if you came to us with an idea, you had to throw it up on the proverbial wall. We would then try to knock it down. If it stuck, it was probably a good idea. Some people could handle this and some could not. The ones that could not handle that did not always get their ideas pushed through. It may not mean they were bad ideas. And that is maybe the down side of this culture. But, it has worked pretty well for us.

We apply this to technology too. My first experience on Linux was with RedHat. The mail agent I cut my teeth on was qmail. I used djbdns. When Daniel Beckham, our now director of operations, came on, he had used sendmail and bind. He immediately challenged qmail. I went through some of the reasons I preferred it. He took more shots. In the end, he agreed that qmail was better than sendmail. However, his first DNS setup for us was bind. It took a few more years of bind hell for him to come around to djbdns.

When RedHat splintered into RedHat Enterprise and Fedora, we tried out Fedora on one server. We found it to be horribly unstable. It got the axe. We looked around for other distros. We found a not very well known distro that was known as the ricer distro of the Linux world called Gentoo. We installed it on one server to see what it was all about. I don't remember now whose idea it was. Probably not mine. We eventually found it to be the perfect distro for us. It let us compile our core tools like Apache, PHP and MySQL while at the same time using a package system. We never trusted RPMs for those things on RedHat. Sure, bringing a server online took longer but it was so worth it. Eventually we bought in and it is now the only distro in use here.

We have done this over and over and over. From the fact that we all use Macs now thanks to Daniel and his willingness to try it out at our CEO's prodding to things like memcached, Gearman, etc. We even keep evaluating the tools we already have. When we decided to write our own proxy we discounted everything we knew and evaluated all the options. In the end, Apache was known and good at handling web requests and PHP could do all we needed in a timely, sane manner. But, we looked at and tested everything we could think of. Apache/PHP had to prove itself again.

Now, you might think that a culture of skepticism like this would lead to new employees having a hard time getting any traction. Quite the opposite. Because we hire people that fit the culture, they can have a near immediate impact. We have a problem I want solved and a developer that has been here less than a year suggested that Hadoop may be a solution, but was not sure we would use it. I recently sent this in an email to the whole team in response to that.
The only thing that is *never* on the table is using a Windows server. If you can get me unique visitors for an arbitrary date range in milliseconds and it requires Hadoop, go for it.
You see, we don't currently use Hadoop here. But, if that is what it takes to solve my problem and you can prove it and it will work, we will use it.

Recently we had a newish team member suggest we use a SAN for our development servers to use as a data store. Specifically he suggested we could use it to house our MySQL data for our development servers. We told him he was insane. SANs are magical boxes of pain. He kept pushing. He got Dell to come in and give us a test unit. Turns out it is amazing. We can have a hot copy of our production database on our dev slices in about 3 minutes. A full, complete copy of our production database in 3 minutes. Do you know how amazing that is? Had we not had the culture we do and had we not hired the right person that was smart enough to pull it off and confident enough to fight for the solution, we would not have that. He has been here less than a year and has had a huge impact on our productivity. There is talk of using this in production too. I am still in the "prove it" mode on this. We will see.
I know you will ask how our dev db works, here you go:
1. Replicate production over VPN to home office
2. Write MySQL data on SAN
3. Stop replication, flush tables, snapshot FS
4. Copy snapshot to a new location
5. On second dev server, umount SAN, mount new snapshot
6. Restart MySQL all around
7. Talk in dev chat how bad ass that is

We had a similar thing happen with our phone system. We had hired a web developer that previously worked for a company that created custom Asterisk solutions. When our proprietary PBX died, he stepped up and proved that Asterisk would work for us. Not a job for a web developer. But he was confident he could make it work. It now supports 3 offices and several home bound users world wide. He also had only been here a short time when that happened.

Perhaps it sounds like a contradiction. It may sound like we just hop on any bandwagon technology out there. But no. We still use MySQL. We are still on 5.0 in fact. It works. We are evaluating Percona 5.5 now. We tried MySQL 5.1. We found no advantage and the Gentoo package maintainer found it to be buggy. So, we did not switch. We still use Apache. It works. Damn well. We do use Apache with the worker MPM with PHP which is supposedly bad. But, it works great for us. But, we had to prove it would work. We ran a single node with worker for months before trusting it. Gearman was begrudgingly accepted. The idea of daemonized PHP code was not a comforting one. But once you write a worker and use it, you feel like a god. And then you see the power. Next thing you know, it is a core, mission critical part of your infrastructure. That is how it is with us now. In fact, Gearman has gone from untrusted to the go to tech. When someone proposes a solution that does not involve Gearman, someone will ask if part of the problem can be solved using Gearman and not whatever idea they have. There is then a discussion about why it is or is not a good fit. Likewise, if you want to build a daemon to listen on a port and answer requests, the question is "Why can't you just use Apache and a web service?" And it is a valid question. If you can solve your problem with a web service on already proven tech, why build something new?

This culture is not new. We are not unique. But, in a world of "brogramming" where "engineers" rage on code that is awesome before it is even working and people are said to be "killing it" all the time, I am glad I live in a world where I have to prove myself every day. I am the most senior engineer on the team. And even still I get shot down. I often pitch an idea in dev chat and someone will shoot it down or point out an obvious flaw. Anyone, and I mean anyone, on the team can question my code, ideas or decisions and I will listen to them and consider their opinion. Heck, people outside the team can question me too. And regularly do. And that is fine. I don't mind the questions. I once wrote here that I like to be made to feel dumb. It is how I get smarter. I have met people that thought they were smarter than everyone else. They were annoying. I have interviewed them. It is hard to even get through those interviews.

Is it for everyone? Probably not. It works for us. And it has gotten us this far. You can't get comfortable though. If you do foster this type of culture, there is a risk of getting comfortable. If you start thinking you have solved all the hard problems, you will look up one day and realize that you are suffering. Keep pushing forward and questioning your past decisions. But before you adopt the latest and greatest new idea, prove that the decisions your team makes are the right ones at every step. Sometimes that will take a five minute discussion and sometimes it will take a month of testing. And other times, everyone in the room will look at something and think "Wow that is so obvious how did we not see it?" When it works, it is an awesome world to live in.

/dev/hell Podcast Episode #5

Fri, Feb 3, 2012 12:12 PM
I was privileged to be invited to be a part of the /dev/hell podcast this week. Thanks to Chris and Ed for having me on. Check it out. And subscribe to their podcast.

Errors when adding/subtracting dates using seconds

Mon, Jan 16, 2012 03:55 PM
This just came up today again for me. I have said it before, but even I get lazy and forget. When doing math with dates such as adding days it is really quick to think this works:
<?php

$date = "2011-11-01";
// add 15 days
$new_date = date("Y-m-d", strtotime($date) + (86400 * 15));
// $new_date should be 2011-11-16 right?
echo $new_date;

?>
This yields `2011-11-15`. The problem with this is that it assumes that there are exactly 86400 seconds in every day. There are in fact not. On days when the clocks change for daylight savings time, there is either 1 hour more than that or 1 hour less than that. In addition, there are also leap seconds put into our time system to keep us in line with the sun. There is one this year, 2012, on June 30th in fact. Since they don't happen with the regularity that daylight savings time does, it may be easy to forget those. Luckily, for this problem, the solution is the same. You have two choices. And the solution you choose depends on the particular problem you have. For the simple problem above, you can simply let strtotime take care of it for you.
$new_date = date("Y-m-d", strtotime($date." +15 days"));
strtotime() is the most awesome date/time related function in all of computer programming. I have written about it before. It handles all those nasty weird seconds issues. But, if you are not solving a problem this simple or you are reading this and need help with another language, there is another solution. You do all your date math at noon. Simply only run code that does date math during lunch time. No, just joking. That would be silly. What I mean is to adjust all your date/time variables to represent the time at 12:00 hours.
$new_date = date("Y-m-d", strtotime($date." 12:00:00") + 86400 * 15);
Now that you are doing date math at noon, you will be safe for most any date range you are doing math on. Daylight savings time always gives an hour and takes an hour every year, so those cancel each other out. It would take a whole lot of leap seconds to cause the offset from the start date to the end date to shift enough to make this technique no longer work, but it is technically feasible. So, if you are adding hundreds of years to dates, this won't work for you. There, disclaimer added.

UPDATE: I should point out that the examples in this post only apply to US time zones that recognize Daylight Savings Time and the rules for DST as they apply after the changes in 2007.
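To see the three approaches side by side, here is a small sketch that pins the time zone so the result does not depend on the server it runs on. The zone America/Chicago is my assumption for the demo (any US zone that follows the post-2007 DST rules should behave the same way), and the three function names are made up for illustration.

```php
<?php
// Assumption: a US time zone that observes DST. The naive result below
// depends on it, since DST ended on 2011-11-06 in this zone.
date_default_timezone_set('America/Chicago');

// Naive version: treats every day as exactly 86400 seconds.
function add_days_naive(string $date, int $days): string
{
    return date('Y-m-d', strtotime($date) + 86400 * $days);
}

// Let strtotime() do calendar math with a relative format ("+15 days").
function add_days_relative(string $date, int $days): string
{
    return date('Y-m-d', strtotime(sprintf('%s %+d days', $date, $days)));
}

// Do the seconds math from noon, so a one hour shift cannot cross midnight.
function add_days_at_noon(string $date, int $days): string
{
    return date('Y-m-d', strtotime($date . ' 12:00:00') + 86400 * $days);
}

$start = '2011-11-01';
echo 'naive:    ', add_days_naive($start, 15), "\n";    // 2011-11-15 (one day short)
echo 'relative: ', add_days_relative($start, 15), "\n"; // 2011-11-16
echo 'at noon:  ', add_days_at_noon($start, 15), "\n";  // 2011-11-16
```

The naive version lands at 23:00 on the 15th because the day DST ended was 25 hours long. The other two are not bothered by the extra hour.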

Check for a TTY or interactive terminal in PHP

Thu, Sep 1, 2011 11:42 PM
Many UNIX tools do different things if they are connected to an interactive terminal, also called a TTY. This can be handy for lots of reasons. I had a use case today that prompted me to find out how to do it in PHP.

Here is the situation. We log errors to the PHP error log. We then have processes that monitor that error log and alert us about any uncaught exceptions or fatal errors very quickly so we can address issues. We also monitor non-fatal errors and alert on those on a less frequent schedule. However, this can be annoying if a user is running some code on a terminal that is generating errors. Let's say I am trying to find out why some file import did not happen. Running the job that is supposed to do it may yield an error. Maybe it was a file permission issue or something. There are other people watching the alerts. What they don't know is that I am running the code and looking at these errors in real time. So, they may start digging into the issue when I am the one causing it and can see it happening already. So, I thought it would be nice if my error handler could skip the error log when the errors are already being sent to an interactive terminal. A few quick searches for "php check for tty" did not find anything. In the end, a coworker cracked open a book to see how it was done in C. That got me on the right path to finding two PHP functions: posix_isatty and posix_ttyname. These seem to do the trick. They take a file descriptor like STDOUT and will tell you if that is an interactive terminal and what the tty name is if it is one.

It took me a few tries to get the full effect I wanted in PHP. The thing I always forget is that fatal errors don't use my error handler. I understand why. The engine is in an unknown state when that happens, so it can't keep running code. In the end I added this code to my auto prepend file which is where my error handler, auto loader and other start up stuff is defined for all PHP code we run, both CLI and Apache. Since PHP will send errors to STDERR if it is defined, that is what we want to check for. If it is defined and it is a TTY, we just disable error logging. I should note that I already check the log_errors ini setting in my error handler before I even call the error_log function.
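// STDERR is only defined for CLI runs; if it is an interactive terminal,
// the person running the code already sees the errors, so skip the log.
if(defined("STDERR") && posix_isatty(STDERR)){
    ini_set("log_errors", false);
    ini_set("error_log", null);
}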
Hopefully anyone else searching for "php check for tty" or "php interactive terminal" will find this blog post and it will help them out.
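Since the snippet above only uses posix_isatty, here is a small sketch that uses both functions together. The helper name describe_stream is made up for illustration, and the function_exists check is my addition for builds where the posix extension is not loaded.

```php
<?php
/**
 * Hypothetical helper: report whether a stream is an interactive terminal
 * and, if it is, the name of that terminal.
 */
function describe_stream($stream): array
{
    // The posix extension is not available on every build (Windows, for
    // example), so treat a missing function the same as "not a terminal".
    if (!function_exists('posix_isatty') || !posix_isatty($stream)) {
        return ['tty' => false, 'name' => null];
    }

    // posix_ttyname() returns false on failure.
    $name = posix_ttyname($stream);

    return ['tty' => true, 'name' => $name === false ? null : $name];
}

// STDERR is only defined when running under the CLI.
if (defined('STDERR')) {
    $info = describe_stream(STDERR);
    echo $info['tty']
        ? "STDERR is a terminal: {$info['name']}\n"
        : "STDERR is not a terminal\n";
}
```

Running it directly in a shell and then again with stderr redirected to a file (2>/tmp/err.log, for example) should show the two cases.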

Talking about Gearman at Etsy Labs

Fri, Aug 5, 2011 08:11 AM
I find myself flying to New York on Monday for some dealnews related business. Anytime I travel I try and find something fun to do at night. (Watching a movie by myself in Provo, Utah was kinda not that fun.) So, this week I asked on Twitter if anything was happening while I would be in town. Anything would do. A meetup of PHP/MySQL users or some design/css/js related stuff for example. Pretty much anything interesting. Well, later that day I received an IM from the brilliant John Allspaw, Senior VP of Technical Operations at Etsy. He wanted me to swing by the Etsy offices and say hi. Turns out it is only a block away from where I would be. Awesome! He also mentioned that he would like to have me come and speak at their offices some time. That would be neat too. I will have to plan better next time I am traveling up there.

Fast forward another day. I get an email from Kellan Elliott-McCrea, CTO of Etsy wanting to know if I would come to the Etsy offices and talk about Gearman. At first I thought "That is short notice, man. I don't know that I can pull that off." Then I remembered the last time I was asked to speak at an event on short notice based off a recommendation from John Allspaw.

It was in 2008 for some new conference called Velocity. That only turned out to be the best conference I have ever attended. I have been to Velocity every year since and this year took our whole team. In addition, I spoke again in 2009 at Velocity, wrote a chapter for John's book Web Operations that was released at Velocity in 2010 and was invited to take part in the Velocity Summit this year (2011) which helps kick off the planning for the actual conference. The moral of that story for me is: when John Allspaw wants you to take part in something, you do it.

In reality, it was not that tough a decision. Even without John's involvement, I love the chance to talk about geeky stuff. The Etsy and dealnews engineering teams are like two twins separated at birth. Every time we compare notes, we are doing the same stuff. For example, we have been trading Open Source code lately. They are using my GearmanManager and we just started using their statistics collection daemon, statsd. So, speaking to their people about what we do seems like a great opportunity to share and get input.

The event is open to the public. So, if you use Gearman, want to use Gearman, or just want to hear how we use Gearman at dealnews, come hear me ramble on about how awesome it is Tuesday night in Dumbo at Etsy Labs. You can RSVP on the event page.

Best Practices for Gearman by Brian Moon
Etsy Labs
55 Washington St. Ste 712
NY 11222

Tuesday, August 09, 2011 from 7:00 PM - 10:00 PM (ET)

PHP Frameworks

Mon, Apr 25, 2011 10:00 AM
Last week I spoke at and attended the first ever PHP Community Conference. It was very good. It was also very different from my normal conference. I usually go for very technical stuff. I don't often stop and smell the roses of the community or history of my chosen (by me or it I am not sure sometimes) profession. There was a lot of community at this one.

One thing that seemed to be a hot topic at the conference was frameworks. CakeDC, the money behind CakePHP, was the platinum sponsor. I chatted with Larry Masters, the owner of CakeDC, for a bit while walking one night. Great guy. Joël Perras gave a tutorial about Lithium. I attended most of this one. He did very well. Joël was frank and honest about the benefits and problems with using frameworks, including having to deal with more than one at a time. There was also a tutorial about Zend Framework patterns by Matthew Weier O'Phinney. I missed this one. On the second day, things were different. Rasmus Lerdorf warned about the bloat in most of the PHP frameworks and expressed hope that they would do better in the newer versions. I received several questions about frameworks in my session. I also spoke out about them a bit. Terry Chay wrapped up the day with his closing keynote and touched on them again. More on that later. I want to kind of summarize what I said (or meant to say).

PHP is a framework

In my session, I talked about the history of Phorum. One of the things I covered was the early days of PHP. Back in the 90s, before PHP, most dynamic web work was done in C or Perl. At that time, in those worlds, you had to do all the HTTP work yourself. If you wanted a content type of text/html, you had to set it, in code, on every single response. Errors in CGI scripts would often result in Apache internal error pages and made debugging very hard. All HTML work had to be done by writing to output. There was no embedding code with HTML even as a templating language. PHP changed all that. You had a default content type of text/html. You had automatic handling of request variables. Cookies were easily ingested and output. You could template your HTML with script instead of having to write everything out via print or stdout. It was amazing. Who could ask for more?

Frameworks as a tool

Well, apparently a world that I honestly don't work in could ask for more. There are three major segments of web developers these days. There are developers that work for a company that has a web site, but its business is not the web site. Maybe it is a network hardware company or some other industry where their business merits having a staff to run their site, but it is not their core business. Then there are developers like myself that work for a company where the web site is the business. Everything about the business goes through the web. We have the public web site and the internal web site our team uses. It is everything. The last type is those developers that are constantly building new sites or updating existing sites for clients. I will be honest, this is not a segment I have considered much in the past when writing code or blog posts. But, I met more of those people at this conference than either of the other two types. They seem to be the ones that are motivated and interested. Or at least, because PHP and the web are their business, they sent their people to the conference.

You see, I have spoken out about frameworks. Not very publicly, but those that know me have heard me comment about them. I have never really seen the point. Why start with something generic that will most likely not fit your ultimate need when you need to scale or expand beyond its abilities? Well, for thousands of web sites, that are likely being built by agencies, that time never occurs. Most likely, before that happens, the site will be redesigned and completely replaced. So, if you spend every day building a new site, why do all that groundwork every time?

In addition, why have to deal with every different client's needs? I often say that Apache is my controller. I don't like to use PHP as my controller. But, if I were deploying a site every week to a different stack, I couldn't rely on Apache with mod_rewrite or whatever things I rely on in my job today. So, you need to have full control in the application. What database will the client this week use? I don't care, the framework abstracts that for me. These are all very good reasons to use a framework.

Framework Trade-Off

There are some trade-offs though. The biggest one I see is the myriad of choices. Several of the pro-framework people even mentioned that there are a lot out there. And it seems that someone is making a new one every day. With all these choices, it is likely that some of the benefit you get from a framework could be lost. If a client already has a site based on CakePHP and your agency uses Lithium, what do you do? Say no to the work or have to deal with the differences? Some of them are big enough to be a real issue. Some are so small, you may not notice them until it's too late. That is a tough place to be.

The other issue is performance. Frameworks are notoriously inefficient. It has just been their nature. The more you abstract away from the core, the less efficient you are. This is even true with PHP. Terry Chay pointed out in his keynote that PHP is less efficient than Java or C. But you gain power with PHP by way of quicker development cycles. Frameworks have that same benefit, but they have not solved the efficiency issue any better than PHP has over C. They abstract away the low level (for PHP at least) stuff that is going on. And that means loss of efficiency. This can be solved or at least worked on, however, and I hope it is.

Frameworks as a Commodity

So, this gets me back to something Terry Chay said. He talked about the motivation of companies to open source their technology. He used Facebook's Open Compute Project as an example. He pointed out that a major reason Facebook would open up this information would be in hopes that others would do the same in their data centers. If that happened, it would be easier for Facebook to move to a new data center because it was already mostly set up the way they like it.

Transitioning this same thought to frameworks, the commoditization here, that I see, is in the interest of developers. If the framework you support becomes the de facto standard, then all those developers working in agencies using it are now ready to come to work for you. Plus, if you are the company behind it, there are opportunities for books, conferences, training, support, and all the other peripherals that come from the commercial/open source interaction. Need proof of that? Look no further than the "PHP Company", Zend. They could have committed developers to PEAR, but instead created Zend Framework. I see job listings very often for Zend Framework experience. Originally Zend tried to monetize the server with their optimizers and Zend Server. They had moderate success. The community came up with APC and XCache that sort of stole their thunder. I feel they have had much better success with Zend Framework in terms of market penetration. The money is with the people that write the code, not run the servers.

Frameworks are EVERYWHERE

I will close with something else that Terry Chay said. This was kind of an aha! moment for me. Terry pointed out that frameworks are everywhere. WordPress, Drupal and even my project, Phorum, are frameworks. You can build a successful site using just those applications. It is not just the new breed of code libraries that can be viewed as frameworks. In fact, Phorum's very own Maurice Makaay is building his new web site using only Phorum 5.3 (current development branch). Phorum offers easier database interaction, output handling, templating, a pluggable module system and even authentication and group based permissions. Wow, I have always kind of shunned this idea. In fact, when Maurice first showed me his site, I kind of grimaced. Why would you want to do that? You know why? Because the main thing that drives his site is Phorum. His users come to the site for Phorum. So, why would he want to install Phorum, invest in making it all it can be and then have to start from scratch for all the other parts of the site that are not part of the message board? Duh, I kind of feel stupid for never looking at things from this perspective before. Feeling dumb is ok. I get smarter when I feel dumb. New ideas make me a better developer. And I hope that is what comes out of this experience for me. You never know, I may throw my name in this hat and see how Phorum's groundwork could be useful outside of the application itself.

What is next for message board software?

Thu, Feb 24, 2011 07:00 AM
When I was hired at dealnews.com in 1998, my primary focus was to get our message board (Phorum) up to speed. I had written the first version as a side project for the site. Message boards were a lot simpler back then. Matt's WWWBoard was the gold standard of the time. And really, the functionality has been only evolutionary since. We added attachments to Phorum in 2003 or something. That was a major new feature. In Phorum 5 we added a module system that was awesome. But, that was just about the admin and not the user. From the user's perspective, message boards have not changed much since 1997. I saw this tweet from Amy Hoy and it got me to thinking about how message boards work. Here is the typical user experience:
  1. Go to the message board
  2. See a list of categories
  3. Drill down to the category they want to read
  4. Scroll through a list of messages that are in reverse chronological order by original post date or most recent post date
  5. Click a message and read it.
  6. Go to #3, repeat
Every message board software package pretty much works like that and has for over 10 years. And it kind of sucks. What a user would probably rather experience is:
  1. Go to the message board
  2. The most interesting things (to this user) are listed right there on the page. No drill down needed.
  3. Click one and read it.
  4. Go to #2, repeat.
Sounds easy? That #2 is easy to type but very hard to accomplish. I think it is conceivably doable if you are running a site that has all the data. Stackoverflow comes close. When you land on the site, they default the page to the "interesting" posts. However, they are not always interesting to me. They are making general assumptions about their audience. For example, right now, the first one is tagged "delphi". I couldn't care less about that language and any posts about it. It's a good try, but misses by oh so far. This is not a Stackoverflow hate post. They are doing a good job. So, what do I do when I land there? I ignore the front page and click Tags (#2 in the first list), then pick a tag I want to read about (#3 in the first list). Lo and behold, the page I get is "newest". So, I end up doing exactly what is in the first list I mentioned. They do offer other sort options. But, they chose newest as the default. And from years of watching user behavior, 80% - 90% of people go with the good ol' default. This kind of brings me to another point though about the types of message boards there are.

Stackoverflow is a classic example of a help message board. People come there and ask a question. Other people come along and answer the question. Then more people come along and vote on whether the answers (and questions) are any good. This is one really nice feature that I think will have to become a core feature in any message board of the future. The signal to noise ratio can get so out of whack, you need human input to help decide what is good and what is noise. I think the core of the application has to rely on that if we are ever going to achieve the desired experience.

The second type of message board is a conversational system. It is almost like a delayed chat room. People come to a message board and post about their cat or ask who watched a TV show, that kind of thing. This has a completely different dynamic to it than the help message board. You can't really vote if a post is good or bad. The obvious exception is spam, which of course needs to be recognized and dealt with.

So, how do you know what content is desirable for the user that is entering the site right now? This concept has already been laid out for us: the social graph. You have to give users a way to associate with other users. If Bob really likes Tom's posts, he is probably more interested in reading Tom's post from 30 minutes ago than one from some new guy that just joined the site and posted 1 minute ago. The challenge here is getting people to interconnect...but not too much. Everyone has that aunt on Facebook that follows you, your roommate and anybody else she can. She would follow your dog if he had a Facebook account. So, those people would still get a crappy experience if the whole system relied on the social graph. The other side is the people that will never "follow", "like" or whatever you call it another person. Their experience would be lacking as well. One key ingredient here is that you need to own this data. You can't just throw like buttons and Facebook connect on your message board and think you can leverage that data. That data is for Facebook, not you. I think the help message boards could benefit from the social graph as well.
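
To make the Bob and Tom example concrete, here is a rough sketch in PHP. This is not how Phorum or any other package works. The function name, the array keys and the one hour "follow bonus" are all assumptions made up for illustration. The idea is just that following an author buys that author's posts some extra freshness.

```php
<?php
// Hypothetical sketch: rank posts for one reader by blending recency with
// whether the reader follows the author. Names and weights are invented.

/**
 * @param array $posts     list of ['id' => int, 'author' => string, 'posted_at' => int]
 * @param array $following author names the reader follows
 * @param int   $now       current Unix timestamp
 * @return array the posts, most interesting first, each with a 'score' key added
 */
function rank_posts_for_reader(array $posts, array $following, int $now): array
{
    $followed = array_flip($following); // author => index, for quick isset() lookups
    $follow_bonus = 3600;               // assumption: following is worth one hour of freshness

    $scored = [];
    foreach ($posts as $post) {
        $score = -($now - $post['posted_at']); // newer posts score higher
        if (isset($followed[$post['author']])) {
            $score += $follow_bonus;
        }
        $scored[] = ['score' => $score] + $post;
    }

    // Sort by score, highest first.
    usort($scored, function ($a, $b) {
        return $b['score'] <=> $a['score'];
    });

    return $scored;
}

// Bob follows Tom. Tom posted 30 minutes ago, the new guy 1 minute ago.
$now = 1000000;
$posts = [
    ['id' => 1, 'author' => 'newguy', 'posted_at' => $now - 60],
    ['id' => 2, 'author' => 'tom',    'posted_at' => $now - 1800],
];
$ranked = rank_posts_for_reader($posts, ['tom'], $now);
echo $ranked[0]['author'], "\n";
```

Note what happens for the aunt that follows everybody: every post gets the same bonus, so the ordering falls back to plain newest-first. The same is true for the person that follows nobody. A flat bonus like this does nothing for either of them.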

Another aspect of what is most important to a user is discussions they are involved in. That could mean ones they started, ones they have replied to or simply ones they have read. Which of those aspects is more important than the others? Clearly if you started a discussion and someone has replied, that is going to interest you. If you posted a reply, you may be done with the topic or you may be waiting on a response. It would take some serious natural language algorithms to decide which is the case. For things you have read, I think you have to consider how many times the user has read the discussion. If every time it is updated they read it, they probably will want to read it again the next time it is updated. If they have only read it once, maybe they are not as interested.
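
A very rough way to turn those signals into a number might look like the sketch below. The weights (10, 5 and up to 5) are pure guesses, and the array keys are hypothetical. It sidesteps the natural language problem entirely and just counts how often the user came back to read a discussion after it was updated.

```php
<?php
// Hypothetical sketch: score how much a reader probably cares about a thread.
// Weights are guesses, not anything measured.

/**
 * @param array $thread ['started_by_reader' => bool, 'reader_replied' => bool,
 *                       'updates' => int, 'reads' => int]
 */
function involvement_score(array $thread): int
{
    $score = 0;

    if ($thread['started_by_reader']) {
        $score += 10; // you almost always care about replies to your own thread
    }
    if ($thread['reader_replied']) {
        $score += 5;  // could be done with it, could be waiting on an answer
    }

    // If they read it every time it was updated, they likely want the next update too.
    if ($thread['updates'] > 0) {
        $read_ratio = min(1, $thread['reads'] / $thread['updates']);
        $score += (int) round(5 * $read_ratio);
    }

    return $score;
}

$started   = ['started_by_reader' => true,  'reader_replied' => false, 'updates' => 4, 'reads' => 1];
$lurker    = ['started_by_reader' => false, 'reader_replied' => false, 'updates' => 8, 'reads' => 8];
$read_once = ['started_by_reader' => false, 'reader_replied' => false, 'updates' => 8, 'reads' => 1];

foreach ([$started, $lurker, $read_once] as $thread) {
    echo involvement_score($thread), "\n";
}
```

With these made up weights, a thread you started outranks one you only lurk in, and a thread you read on every update outranks one you read once. Whether those are the right weights is exactly the hard part.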

The last aspect of message boards is grouping things. This is the part I actually struggle with the most. The easy first answer is tagging. Don't force the user down a single trail, let them tag posts instead of only posting them in one neat contained area. That gets you halfway there. Let's use Stackoverflow (I really do like the site) as an example again. The first thing I do is go to Tags and click on PHP. I like helping people with PHP problems. So, is that really any different from categorization? Sure, there could be someone out there that really likes helping with Javascript. And if the same post was tagged with both tags then their coverage of potential help is larger. But, some of the time those tags are wrong when they tag it with more than one tag. The problem they need help with is either PHP or Javascript, most likely not both. They just don't know what they are doing. For example, there is this post on Stackoverflow. The user tagged it PHP and database-design. There is no PHP in the question. I am guessing he is using PHP for the app. But, it really never comes up and he is only talking about database design. So, who did the PHP tag help there? I don't think it helped him. And it only wasted my time. Having written all that, a free-for-all approach where there is no filtering sucks too. ARGH! It just all sucks. That brings us back to what Amy said in a way. Perhaps moderated tagging is an answer. I have not seen a way on Stackoverflow to untag a post. That would let people correct others. I am gonna write that down. If you work at Stackoverflow and are reading this, you can use that idea. Just put a comment in the code about how brilliant I am or something that aliens will find one day.

So, I am done. I know exactly what to do, right? I just have to make code that does everything I put in the previous paragraphs. Man, I wish it were that easy. When you want to write a distributed application to do it, the task is even more daunting. If I controlled the data and the servers and the code, I could do crazy things that would make great conference talks. But, it kind of falls apart when I want to give this code to a 60-year-old retired guy that is starting a hobby site for watching hummingbirds on a crappy GoDaddy account. Yeah, he is not installing Sphinx or HandlerSocket or Gearman. Those are all things I would want to use to solve this problem in a scalable fashion. At that point you have two choices. Aim for the small time or the big time. If you aim for the small time, you may get lots of installs, but, you will be hamstrung. If you aim for the big time, you may be the only guy that ever uses the code. That is a tough decision.

What have I missed? I know I missed something. Are there other types of message boards? I can definitely see some sub-types. Perhaps a board where ideas instead of help messages are posted. Or maybe the conversations are more show-off based, as in a user posting pictures or videos for comment. Is there already something out there doing this and I have just missed it? Let me know what I have missed please.