Code Organization Dilemma

Wed, Nov 18, 2009 08:00 AM
So, we have been building up our code library at dealnews for 9 years. It was started at the end of PHP3 and the beginning of PHP4. So, we did not have autoloading, static functions, and all that jazz. Classes had lots of overhead in early PHP4 so we started down a pure procedural road in 2000. And for a long time, it was very maintainable. We had 2 or 3 developers for most of this time. We now have 5 or 6 depending on whether we have contractors. There are starting to be too many files and too many functions. We find ourselves adding new files when some new function is created instead of adding it to an existing file because we don't want to have huge files with 100 functions in them. File names and function names are getting longer and more ambiguous. For example, we have a file called url_functions.php. It contains functions to generate URLs for different types of pages on the site, functions to fetch URLs from the web and functions to parse URLs from an article. Those probably don't all belong in one file. But, they got nickel-and-dimed in there over time. So, now, we are inclined to not add anything to that file and make new files for new semi-URL related functions. Ugh.

It is time to start thinking about a reorganization. There are 1,900+ functions in 400+ files in our code library. This is just our library. This does not include the code that actually builds a page and generates output. It does not include our cron jobs or system administration scripts. Yeah, that is a lot. So, where do we go from here? Some things are easy to do. For example, we have a file called string.php. Almost all the functions in that file can easily be moved to a String class with static functions that can be accessed via an autoloader.
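As a rough sketch of that first step, here is what a static string class behind an autoloader might look like. Everything here is an assumption for illustration: the class name StringUtil (later PHP versions reserve "string" as a class name), the truncate() function, and the one-class-per-file layout in a temporary directory standing in for the real library.

```php
<?php
// Hypothetical layout: one class per file in a library directory.
// A temp directory stands in for the real code library here.
$libDir = sys_get_temp_dir() . '/lib_sketch_' . getmypid();
if (!is_dir($libDir)) {
    mkdir($libDir);
}

// Stand-in for a function that used to live in string.php.
file_put_contents($libDir . '/StringUtil.php', <<<'PHP'
<?php
class StringUtil
{
    // Shorten $text to at most $max bytes, ending with $suffix when cut.
    public static function truncate($text, $max, $suffix = '...')
    {
        if (strlen($text) <= $max) {
            return $text;
        }
        return substr($text, 0, $max - strlen($suffix)) . $suffix;
    }
}
PHP
);

// Map a class name to a file the first time the class is used.
spl_autoload_register(function ($class) use ($libDir) {
    $file = $libDir . '/' . $class . '.php';
    if (is_file($file)) {
        require $file;
    }
});

// No require at the call site; the autoloader finds the file.
echo StringUtil::truncate('Code Organization Dilemma', 12), "\n";
```

The point is that the calling code no longer needs to know which file a function lives in, only which class it belongs to.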

Then we have the various ways we deal with the articles on the web site. I have written about our front end vs. back end system before. What this means for our code base is that we have two ways to deal with an article. One is in our highly relational backend system. The other is in our optimized front end database servers. So, one Article object won't really do. We already have an Article object that serves as an ORM interface for the backend. To access the front end data, we currently have a library of functions (fetch_article for a single, fetch_articles for a set, etc.) but it does not fit with an autoloading environment. It also is not related to the object (the article) and is associated with where the data is stored. New developers don't grok the server infrastructure, so the code organization may not make sense to them. We have about 10 different objects that need both a back end and front end interface.

On the other hand, I really don't want to end up with a class named FrontEndArticle and BackEndArticle. Much less do we want to have stuff like BackEnd_Article where the file is actually in BackEnd/Article.php somewhere. The verbosity becomes overwhelming and hard to read, IMO.

So, what are others doing with huge code bases? I see lots of projects with 100 or so functions/methods in 20-30 files.  Frameworks have it easy because they don't have a CEO that wants something on this one page to be different than it is on every other page where that data is used. We have to deal with those types of hacks in an elegant way that can be maintained.

Forums are crap. Can we get some help?

Mon, Oct 12, 2009 10:01 AM
Amy Hoy has written a blog post about why forums are crap. And she is right. Forum software does not always do a good job of helping people communicate. I have worked with Amy. She did a great analysis of dealnews.com that led to our new design. So, she is not to be ignored.

However, as a software developer (Phorum), I see a lot of problems and no answers.  And it is not all on the software.  Web site owners use forums to solve problems that they really, really suck at.  Ideally, every web site would be very unique for their audience.  They would use a custom solution that fits a number of patterns that best solves their problem.  However, most web site owners don't want to take the time to do such things.  They want a one stop, drop in solution. See the monolith that is vBulletin, scary.

And what if a forum is the best solution? Well, software developers, in general, are not good designers. They don't think like normal people. And they don't see their applications as a whole, but as pieces that do jobs. The forum software market has been run by software developers for over 10 years. Most of the products are still copies of what UBB was 13 years ago. And software (like Phorum) that has tried to be different is shunned by the online communities of the world because it doesn't work/look/feel like every other forum software on the planet.

So, as software developers, what are we to do? We want to make great software. We want to help our users help their users. But, what we have been doing for 10+ years has only been adequate. As the leader of an open source forum software project, I am open to any and all ideas.

Memcached: What is it and what does it do?

Wed, Sep 30, 2009 11:49 AM
I spoke at CodeWorks in Atlanta, GA this week.  I totally dropped the ball promoting it on my blog.  It was a neat venue.  Rather than a large conference they are doing a traveling show.  Seven cities in 14 days.  Many of the presenters are working in every city.  Crazy.  I was just in Atlanta.  It is close to home and easy for me to get to.

I spoke about memcached.  I tried to dig a bit deeper into how memcached works.  On the mailing list we get a lot of new people that make assumptions about memcached.  Most talks I have seen focus on why caching is good, how to use memcached, the performance gain.  I kind of assumed everyone knew that stuff already.  I guess you could say I gave a talk that was the real FAQs of the project.

Here are the slides.  Derick Rethans took video of the talk.  When he gets that online I will add it to this post.

Wordcraft 0.10 available

Mon, Aug 10, 2009 12:12 AM
The latest package of Wordcraft, the PHP/MySQL based blog software that runs this site, is available for download from Google Code.  Just some minor bug fixes and cosmetic stuff.  It's getting a little use in the wild.  That is always fun to see.

Forking PHP!

Thu, Jul 23, 2009 01:17 PM
We use PHP everywhere in our stack. For us, it makes sense because we have hired a great staff of PHP developers. So, we leverage that talent by using PHP everywhere we can.

One place where people seem to stumble with PHP is with long running PHP processes or parallel processing. The pcntl extension gives you the ability to fork PHP processes and run lots of children like many other unix daemons might. We use this for various things. Most notably, we use it to run Gearman worker processes. While at the O'Reilly Open Source Convention in 2009, we were asked about how we pulled this off. So, we are releasing the two scripts that handle the forking and some instructions on how we use them.

This is not a detailed post about long running PHP scripts.  Maybe I can get to the dos and don'ts of that another time.  But, these are the scripts we use to manage long running processes.  They work great for us on Linux.  They will not run on Windows at all.  We also never had any trouble running them on Mac OS X.

The first script, prefork.php, is for forking a given function from a given file and running n children that will execute that function. There can be a startup function that is run before any forking begins and a shutdown function to run when all the children have died.
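To give a feel for the pattern, here is a minimal sketch of forking n children around a worker function with optional startup and shutdown callbacks. It is not the released prefork.php; the function name run_children() and its arguments are made up for this example. It assumes the pcntl extension, the command line SAPI, and a Unix-like OS.

```php
<?php
// Sketch only, NOT the released prefork.php.
// run_children() and its parameters are hypothetical names.
function run_children($count, $worker, $startup = null, $shutdown = null)
{
    if ($startup !== null) {
        call_user_func($startup);          // runs once, before any fork
    }

    $pids = array();
    for ($i = 0; $i < $count; $i++) {
        $pid = pcntl_fork();
        if ($pid === -1) {
            fwrite(STDERR, "fork failed\n");
            break;
        }
        if ($pid === 0) {
            // Child: do the work, then exit so it never re-enters the loop.
            call_user_func($worker, $i);
            exit(0);
        }
        $pids[$pid] = $i;                  // parent remembers each child
    }

    // Parent: block until every child has died.
    $exitCodes = array();
    while (count($pids) > 0) {
        $pid = pcntl_wait($status);
        if ($pid <= 0) {
            break;
        }
        if (isset($pids[$pid])) {
            $exitCodes[$pids[$pid]] = pcntl_wifexited($status)
                ? pcntl_wexitstatus($status)
                : -1;
            unset($pids[$pid]);
        }
    }

    if ($shutdown !== null) {
        call_user_func($shutdown);         // runs once, after all children exit
    }
    return $exitCodes;
}

// Three children, each pretending to work for a moment.
$codes = run_children(3, function ($n) {
    usleep(1000 * ($n + 1));
});
echo count($codes), " children finished\n";
```

A real daemon would also restart children that die and handle signals; this sketch leaves both out.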

The second script, prefork_class.php, uses a class with defined methods instead of relying on the command line for function names. This script has the added benefit of having functions that can be run just before each fork and after each fork. This allows the parent process to farm work out to each child by changing the variables that will be present when the child starts up. This is the script we use for managing our Gearman workers. We have a class that controls how many workers are started and what functions they provide. I may release a generic class that does that soon. Right now it is tied to our code library structure pretty tightly.
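The before-fork hook is what lets the parent hand each child different work: whatever the parent sets just before pcntl_fork() is copied into the child. A small sketch of that idea follows. The class QueueForker, its method names, and fork_all() are all hypothetical and do not reflect the interface prefork_class.php expects. It assumes the pcntl extension on the command line.

```php
<?php
// Hypothetical class; method names are illustrative only.
class QueueForker
{
    public $queue = null;       // set by the parent, inherited by the next child
    private $queues;

    public function __construct(array $queues)
    {
        $this->queues = $queues;
    }

    public function childCount()
    {
        return count($this->queues);
    }

    // Parent side, just before each fork: choose the child's work.
    public function beforeFork($n)
    {
        $this->queue = $this->queues[$n];
    }

    // Parent side, just after each fork: bookkeeping would go here.
    public function afterFork($n, $pid)
    {
    }

    // Child side: the return value becomes the exit code.
    // A real worker would loop on jobs for $this->queue instead.
    public function run()
    {
        return strlen($this->queue);
    }
}

function fork_all($job)
{
    $pids = array();
    for ($n = 0; $n < $job->childCount(); $n++) {
        $job->beforeFork($n);
        $pid = pcntl_fork();
        if ($pid === -1) {
            throw new RuntimeException('fork failed');
        }
        if ($pid === 0) {
            exit($job->run());             // child sees the value set above
        }
        $job->afterFork($n, $pid);
        $pids[$pid] = $n;
    }

    $results = array();
    foreach ($pids as $pid => $n) {
        pcntl_waitpid($pid, $status);
        $results[$n] = pcntl_wexitstatus($status);
    }
    return $results;
}

$results = fork_all(new QueueForker(array('email', 'resize')));
```

Each child exits with the length of the queue name it was given, which is only a cheap way to show that the two children really did start with different state.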

We have also included two examples. They are simple, but do work to show you how the scripts work.

You can download the code from the dealnews.com developers' page.

UPDATE: I have released a Gearman Worker Manager on Github.

Building PECL/memcache on Mac OS X

Sun, May 31, 2009 10:23 PM
My coworker Rob ran into an issue building the PECL/memcache extension on his Mac.  He did find the solution however.  You can read and leave comments on his blog.

The rise of the GLAMMP stack

Fri, May 22, 2009 11:34 AM
First there was LAMP.  But are you using GLAMMP?  You have probably not heard of it because we just coined the term while chatting at work.  You know LAMP (Linux, Apache, MySQL and PHP or Perl and sometimes Python). So, what are the extra letters for?

The G is for Gearman - Gearman is a system to farm out work to other machines, dispatching function calls to machines that are better suited to do work, to do work in parallel, to load balance lots of function calls, or to call functions between languages.

The extra M is for Memcached - memcached is a high-performance, distributed memory object caching system, generic in nature, but intended for use in speeding up dynamic web applications by alleviating database load.

More and more these days, you can't run a web site on just LAMP.  You need these extra tools (or ones like them) to do all the cool things you want to do.  What other tools do we need to work into the acronym?  PostgreSQL replaces MySQL in lots of stacks to form LAPP.  I guess Drizzle may replace MySQL in some stacks soon.  For us, it will likely be added to the stack.  Will that make it GLAMMPD?  We need more vowels!  If you are starting the next must use tool for running web sites on open source software, please use a vowel for the first letter.

Is there a program for finding uses of register_globals?

Fri, May 15, 2009 12:38 PM
register_globals is going away in PHP6.  That is fine with me.  Super globals are cool and I have taken to using filter_input_array these days anyhow.  However, our code base is now 10+ years old at dealnews.  Most of the forward facing code was completely rewritten in the last couple of years due to architecture changes.  Many new projects had register_globals turned off via php_admin_flag in Apache.  So, that area is not that big of a problem.  However, our internal admin areas have not all been rewritten because, well frankly, they still work.  Yeah, stuff written for PHP4 in 2000 is still working.  KISS helps a lot with that.  But, this code, somewhere in there, may still be relying on register_globals.  Now, we could go line by line and try and fix it.  But, it seems like a program could be written to do this job.  I mean, I use jEdit and it can highlight unset vars using the PHPParserPlugin just fine.  I bet Zend IDE can do the same.  Has anyone written such a tool for the command line?  There will be false positives I know.  Things like passing a variable by reference to a function would look like a use before set.  But, I can deal with those if I don't have to go line by line through tons of old code.  What would the rules look like for such an animal?  This would be a great project to get off the ground before PHP6 hits.  Ideally you could provide a list of variables for it to ignore.  We have some globals we set up in prepends and includes.
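For what it is worth, a very naive first pass at the rules can be built on PHP's tokenizer. The sketch below flags any variable that is read before a plain "=" assignment has been seen for it. The function name find_unset_variable_uses() is invented for this example, and the heuristic is deliberately crude: it ignores function scope, parameters, foreach, list(), global and by-reference arguments, so expect plenty of false positives and some misses.

```php
<?php
// Naive heuristic, not a real static analyzer.
// find_unset_variable_uses() is a hypothetical name.
function find_unset_variable_uses($source, array $ignore = array())
{
    // Variables that never count as "unset": superglobals, $this,
    // and whatever the caller says is set up in prepends and includes.
    $known = array_flip(array_merge(
        array('$this', '$GLOBALS', '$_GET', '$_POST', '$_COOKIE', '$_SERVER',
              '$_FILES', '$_ENV', '$_REQUEST', '$_SESSION'),
        $ignore
    ));

    $tokens = token_get_all($source);
    $suspects = array();

    foreach ($tokens as $i => $token) {
        if (!is_array($token) || $token[0] !== T_VARIABLE) {
            continue;
        }
        $name = $token[1];

        // Skip whitespace to see what follows the variable.
        $j = $i + 1;
        while (isset($tokens[$j]) && is_array($tokens[$j])
            && $tokens[$j][0] === T_WHITESPACE) {
            $j++;
        }
        $next = isset($tokens[$j]) ? $tokens[$j] : null;

        if ($next === '=') {
            $known[$name] = true;          // plain assignment: now "set"
            continue;
        }
        if (!isset($known[$name])) {
            $suspects[] = array('name' => $name, 'line' => $token[2]);
        }
    }
    return $suspects;
}

$code = '<?php
$limit = 10;
if ($page > 1) {
    echo $limit * $page;
}
echo $site_name;
';

// $site_name is on the ignore list, as a prepend global would be.
$hits = find_unset_variable_uses($code, array('$site_name'));
foreach ($hits as $hit) {
    echo $hit['name'], ' on line ', $hit['line'], "\n";
}
```

Run over a directory from the command line, output like this would at least narrow down which files deserve a line-by-line read.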

Scaling for the Expected and Unexpected - Speaking at Velocity

Thu, May 7, 2009 11:57 AM
Last year I was surprised to be going to Velocity.  Read the post, it was an adventure.  But, I really like the conference.  It is the perfect conference for me.  While a good majority of my work is done coding PHP/MySQL apps, I tend to focus on architecture, frameworks, performance and that kind of stuff.  So, a web performance and operations conference is just perfect.

Last year, I was on a panel with some great guys.  I was able to share just a bit about my experience dealing with the instant success of a web site.  This year, my proposal was accepted to talk more about dealing with success of a web site.  The talk will be focused on my experience at dealnews.com and from working with power users for Phorum.  Here is the summary:

Lots of people talk about scaling and performance. But, are they preparing for all the things that could happen? There are multiple problems and there is not one solution to solve them all.

Everything is running fine and BAM! – your site is linked from the front page of Yahoo! What do you do? How can you handle that sudden rush of traffic? Requests per second are running 5x normal levels. Servers have CPU spikes. Daemons are hitting the maximums. You are running out of bandwidth. How could you have been prepared for this? What are the tools and techniques for this type of sudden rush?

Or, let's say you have just come out of a meeting where everyone discovered that your site is growing in traffic 70% – 80% year over year. That means that 1 million page views this month will be nearly 3 million this time in 2 years. How can you plan for that? You don’t want to redesign the whole architecture every 2 years. What methods could be used to deal with this constant long term growth?

While there is no magic bullet for either of these scenarios, there are techniques used by many sites out there to help you get through these situations. This session will cover some of these techniques and talk about their pros and cons.
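The growth figure in the summary is plain compounding, and it is easy to check or to rerun with your own numbers. The helper name project_page_views() is made up for this example, and it assumes the growth rate stays constant from year to year.

```php
<?php
// Hypothetical helper: compound a monthly page view count forward,
// assuming a constant year-over-year growth rate.
function project_page_views($current, $yearlyGrowth, $years)
{
    return (int) round($current * pow(1 + $yearlyGrowth, $years));
}

// 1 million page views a month, two years out, at 70% and 80% growth.
$low  = project_page_views(1000000, 0.70, 2);   // 1.7 * 1.7 = 2.89x
$high = project_page_views(1000000, 0.80, 2);   // 1.8 * 1.8 = 3.24x

echo number_format($low), ' to ', number_format($high), "\n";
```

So the "nearly 3 million" in the summary sits at the low end of the range; at 80% growth it is closer to 3.25 million.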

I must admit, this is the first time since 2000 that I am a little intimidated to speak at a conference.  The people that present and attend Velocity are so awesome.  I just hope I don't disappoint.

Net::Gearman and PHP 5.2.9

Tue, Apr 21, 2009 12:48 PM
I just discovered an incompatibility between Net Gearman and PHP 5.2.9+.  json_decode was changed in 5.2.9 to return NULL on invalid JSON strings.  Previously, the bare string had been returned if it was not valid JSON.  This was nice in a way as you could pass a scalar string to json_decode and not worry about it.  But, in reality, it would make debugging a nightmare for JSON.
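One way to get the old, forgiving behavior back on top of the new json_decode is to fall back to the raw string when decoding fails. This is only a sketch of the idea, not necessarily the change that went into my fork, and decode_payload() is a name made up for this example.

```php
<?php
// Hypothetical helper: accept either JSON or a bare scalar string.
function decode_payload($raw)
{
    $decoded = json_decode($raw, true);

    // NULL means "invalid JSON" unless the input really was the JSON null.
    if ($decoded === null && strtolower(trim($raw)) !== 'null') {
        return $raw;                       // hand back the bare string
    }
    return $decoded;
}

var_dump(decode_payload('{"job":"resize","id":42}'));
var_dump(decode_payload('just a plain string'));
```

The cost is the one mentioned above: a malformed JSON payload quietly comes back as a string instead of failing loudly, so this belongs only where bare strings are actually expected.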

I have updated my github fork and requested a pull into the main branch.  Once that is done a new PEAR release can be done.