The rise of the GLAMMP stack

First there was LAMP.  But are you using GLAMMP?  You have probably not heard of it because we just coined the term while chatting at work.  You know LAMP (Linux, Apache, MySQL and PHP or Perl and sometimes Python). So, what are the extra letters for?

The G is for Gearman - Gearman is a system to farm out work to other machines, dispatching function calls to machines that are better suited to do work, to do work in parallel, to load balance lots of function calls, or to call functions between languages.

The extra M is for Memcached - memcached is a high-performance, distributed memory object caching system, generic in nature, but intended for use in speeding up dynamic web applications by alleviating database load.

More and more these days, you can't run a web site on just LAMP.  You need these extra tools (or ones like them) to do all the cool things you want to do.  What other tools do we need to work into the acronym?  PostgreSQL replaces MySQL in lots of stacks to form LAPP.  I guess Drizzle may replace MySQL in some stacks soon.  For us, it will likely be added to the stack.  Will that make it GLAMMPD?  We need more vowels!  If you are starting the next must-use tool for running web sites on open source software, please use a vowel for the first letter.

Is there a program for finding uses of register_globals?

register_globals is going away in PHP6.  That is fine with me.  Super globals are cool and I have taken to using filter_input_array these days anyhow.  However, our code base is now 10+ years old at dealnews.  Most of the forward facing code was completely rewritten in the last couple of years due to architecture changes.  Many new projects had register_globals turned off via php_admin_flag in Apache.  So, that area is not that big of a problem.  However, our internal admin areas have not all been rewritten because, well frankly, they still work.  Yeah, stuff written for PHP4 in 2000 is still working.  KISS helps a lot with that.  But, this code, somewhere in there, may still be relying on register_globals.

Now, we could go line by line and try and fix it.  But, it seems like a program could be written to do this job.  I mean, I use jEdit and it can highlight unset vars using the PHPParserPlugin just fine.  I bet Zend IDE can do the same.  Has anyone written such a tool for the command line?  There will be false positives, I know.  Things like passing a variable by reference to a function would look like a use before set.  But, I can deal with those if I don't have to go line by line through tons of old code.  What would the rules look like for such an animal?  This would be a great project to get off the ground before PHP6 hits.  Ideally you could provide a list of variables for it to ignore.  We have some globals we set up in prepends and includes.
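To make the question about rules a bit more concrete, here is a rough sketch of one possible starting point, built on PHP's tokenizer.  The function name find_unset_vars and its single rule (a variable counts as "set" only once it has appeared directly before a plain =) are my own assumptions for illustration, not an existing tool.  It only looks at file-level scope and will throw plenty of false positives: array element assignments, foreach, list(), references and function parameters are all ignored.

```php
<?php
// Hypothetical helper (not an existing tool): report variables that are
// read before any plain assignment in a chunk of PHP source.
// Returns an array of variable name => line of first suspicious use.
function find_unset_vars($source, array $ignore = array())
{
    // Superglobals and $this never come from register_globals.
    $always_ok = array('$_GET', '$_POST', '$_COOKIE', '$_SERVER', '$_FILES',
                       '$_ENV', '$_REQUEST', '$_SESSION', '$GLOBALS', '$this');
    $known    = array_flip(array_merge($always_ok, $ignore));
    $suspects = array();
    $tokens   = token_get_all($source);
    $count    = count($tokens);

    for ($i = 0; $i < $count; $i++) {
        if (!is_array($tokens[$i]) || $tokens[$i][0] !== T_VARIABLE) {
            continue;
        }
        $name = $tokens[$i][1];
        $line = $tokens[$i][2];

        // Find the next token that is not whitespace.
        $j = $i + 1;
        while ($j < $count && is_array($tokens[$j]) && $tokens[$j][0] === T_WHITESPACE) {
            $j++;
        }
        $next = ($j < $count) ? $tokens[$j] : null;

        if ($next === '=') {
            // Plain assignment: treat the variable as set from here on.
            $known[$name] = true;
        } elseif (!isset($known[$name]) && !isset($suspects[$name])) {
            // First use of a variable we have never seen assigned.
            $suspects[$name] = $line;
        }
    }

    return $suspects;
}
```

The second argument is the ignore list mentioned above, so globals set up in prepends and includes can be passed in as array('$name', ...) and will not be reported.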

Scaling for the Expected and Unexpected - Speaking at Velocity

Last year I was surprised to be going to Velocity.  Read the post, it was an adventure.  But, I really like the conference.  It is the perfect conference for me.  While a good majority of my work is done coding PHP/MySQL apps, I tend to focus on architecture, frameworks, performance and that kind of stuff.  So, a web performance and operations conference is just perfect.

Last year, I was on a panel with some great guys.  I was able to share just a bit about my experience dealing with the instant success of a web site.  This year, my proposal was accepted to talk more about dealing with success of a web site.  The talk will be focused on my experience at dealnews.com and from working with power users for Phorum.  Here is the summary:

Lots of people talk about scaling and performance. But, are they preparing for all the things that could happen? There are multiple problems and there is not one solution to solve them all.

Everything is running fine and BAM! – your site is linked from the front page of Yahoo! What do you do? How can you handle that sudden rush of traffic? Requests per second are running 5x normal levels. Servers have CPU spikes. Daemons are hitting the maximums. You are running out of bandwidth. How could you have been prepared for this? What are the tools and techniques for this type of sudden rush?

Or, let's say you have just come out of a meeting where everyone discovered that your site is growing in traffic 70% – 80% year over year. That means that 1 million page views this month will be nearly 3 million this time in 2 years. How can you plan for that? You don’t want to redesign the whole architecture every 2 years. What methods could be used to deal with this constant long term growth?

While there is no magic bullet for either of these scenarios, there are techniques used by many sites out there to help you get through these situations. This session will cover some of these techniques and talk about their pros and cons.

I must admit, this is the first time since 2000 that I am a little intimidated to speak at a conference.  The people that present and attend Velocity are so awesome.  I just hope I don't disappoint.

Net::Gearman and PHP 5.2.9

I just discovered an incompatibility between Net Gearman and PHP 5.2.9+.  json_decode was changed in 5.2.9 to return NULL on invalid JSON strings.  Previously, the bare string had been returned if it was not valid JSON.  This was nice in a way as you could pass a scalar string to json_decode and not worry about it.  But, in reality, it would make debugging a nightmare for JSON.
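For anyone who needs the old tolerant behavior in their own code, something along these lines may work.  This is only a sketch of the idea and not the actual change in my fork; the function name decode_or_raw is made up for the example.

```php
<?php
// Hypothetical helper, not the actual Net Gearman patch.
// Decode JSON, but hand back the original string when it is not valid
// JSON, which is roughly what json_decode did before 5.2.9.
function decode_or_raw($payload)
{
    $decoded = json_decode($payload, true);

    // json_decode returns NULL both for invalid input and for the literal
    // "null", so check for the literal before falling back.
    if ($decoded === null && trim($payload) !== 'null') {
        return $payload;
    }

    return $decoded;
}
```

So decode_or_raw('{"a":1}') gives back an array while decode_or_raw('hello') gives back the bare string.  The debugging concern still applies, since a malformed payload silently comes through as a string.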

I have updated my github fork and requested a pull into the main branch.  Once that is done a new PEAR release can be done.

The death of die()

I am calling it.  The death of the PHP die function.  Now, I have no actual authority to do so.  My PHP CVS karma does not extend that far.  And I doubt it will actually get removed despite it being nothing more than an alias for exit now.

No, what I would like to call a death to is the usage of die such as:

$conn = mysql_connect($server, $user, $pass) or die("Could not connect to MySQL, but needed to tell the whole world");

I don't know who thought that particular usage was good, but they need to .... no, that is harsh.  I just really wish they had never done that.

So, what should you use?  Well, there are a couple of options depending on what context you are working in and whether or not the failure is actually catastrophic.

Exceptions

If you are using OOP in your PHP code, Exceptions are the logical choice for dealing with errors.  I have mixed feelings about them.  But, it has more to do with the catching of exceptions than the throwing of them.  If you are going to live in a world of exceptions, please catch them and provide useful error messages.  The PHP world is not too bad about that, but I have read too many Java error logs full of huge, verbose exception dumps in my life already.  Please don't follow that technique in PHP.
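As a rough illustration of what I mean by catching them and being useful about it, here is a small sketch.  The functions load_deal and render_deal and their messages are invented for this example.

```php
<?php
// Hypothetical example: function names and messages are made up.
function load_deal($id)
{
    if (!is_int($id) || $id < 1) {
        throw new InvalidArgumentException(
            "Deal id must be a positive integer, got " . var_export($id, true)
        );
    }

    // A real lookup would go here.
    return array('id' => $id, 'title' => 'Example deal');
}

function render_deal($id)
{
    try {
        $deal = load_deal($id);
        return "Deal: " . $deal['title'];
    } catch (InvalidArgumentException $e) {
        // One short, useful line for the log instead of a full stack dump.
        error_log(get_class($e) . ": " . $e->getMessage());

        // And something sensible for the person looking at the page.
        return "Sorry, that deal could not be found.";
    }
}
```

The log gets one line that says what went wrong and the visitor gets a message that does not leak any internals.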

trigger_error

The function trigger_error is quite handy.  It allows you, a common PHP coder, to create errors just like the core system.  So, the error messages are familiar to anyone that is used to seeing PHP errors.  So, if your system is configured to log errors and not display them, errors from trigger_error will be treated the same as built in errors.

Also, errors thrown with trigger_error are caught by a custom error handler just like built in errors.  They can be logged, printed, whatever you want from that error handler, just like normal PHP errors.  There are even several levels of errors you can raise like notices, warnings, errors, and even deprecated.  Again, just like the built in PHP errors.
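Here is a minimal sketch of that.  The handler name and the messages are made up, and it collects messages in a global array only so the result is easy to inspect; a real handler would more likely call error_log() or write to your own logging system.

```php
<?php
// Collected messages, kept in a global purely for illustration.
$GLOBALS['collected'] = array();

// Hypothetical handler: labels the error level and stores the message.
function sketch_error_handler($errno, $errstr, $errfile, $errline)
{
    $labels = array(
        E_USER_NOTICE  => 'Notice',
        E_USER_WARNING => 'Warning',
    );
    $label = isset($labels[$errno]) ? $labels[$errno] : 'Error';

    $GLOBALS['collected'][] = "$label: $errstr";

    // Returning true tells PHP the error has been handled.
    return true;
}

set_error_handler('sketch_error_handler');

// These go through the handler exactly like built in notices and warnings.
trigger_error("Feed cache is empty, rebuilding", E_USER_NOTICE);
trigger_error("Feed took too long to build", E_USER_WARNING);
```

After this runs, the array holds "Notice: Feed cache is empty, rebuilding" and "Warning: Feed took too long to build".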

FATAL Errors

trigger_error is also the most suitable way, IMO, to end a script immediately.

$conn = mysql_connect($server, $user, $pass);
if(!$conn) {
    trigger_error("Could not connect to MySQL database.", E_USER_ERROR);
}

Now that will not be told to the whole world if you have display_errors set to Off as you should in any production environment.

Wordcraft 0.9.1 available

There are several key changes in Wordcraft 0.9.1. The two big things are:
  • Tokens on post forms in the admin to help ward off CSRF attacks.  
  • Database schema updates automated.
The first comes as a result of us doing the same work on Phorum recently.  I realized I needed the same protection in Wordcraft.  The second was done out of necessity as I changed the datetime fields in the database schema into int fields.  Not sure why I ever made them datetime fields.  Unix timestamps are much easier to work with.  It saves many strtotime() calls and will make eventual time zone settings much easier to implement.
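For those who have not seen form tokens before, the general idea looks something like the sketch below.  This is not the Wordcraft or Phorum code.  The function names are invented, it assumes a PHP new enough to have random_bytes() and hash_equals(), and it takes the session as an array argument so it can be run on its own (in a real app that would be $_SESSION).

```php
<?php
// Hypothetical sketch of form tokens, not the actual Wordcraft code.

// Create a token, remember it in the session, and return it so it can be
// written into a hidden field on the form.
function csrf_issue_token(array &$session)
{
    $session['csrf_token'] = bin2hex(random_bytes(16));
    return $session['csrf_token'];
}

// Check the token that came back with the POST against the stored one.
function csrf_check_token(array $session, $submitted)
{
    return isset($session['csrf_token'])
        && is_string($submitted)
        && hash_equals($session['csrf_token'], $submitted);
}
```

A post that arrives without the matching token did not come from a form the admin actually loaded, so it gets rejected.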

In addition to those two big ones, there were some notable small ones:
  • HTML 4.01 validation fixes
  • Ensuring UTF-8 on all encoding function calls
  • Protection against hitting the back button when writing a post (most annoying on Macs as the back button and the beginning of line keystroke are the same).
And there were a few other bug fixes.

I will of course need many more testers and users before I can ever declare this software as stable.  If you need a simple blog, give it a try.

About Wordcraft
Wordcraft aims to be a simple, lightweight blogging application.  Wordcraft is written exclusively for PHP 5+ and MySQL 5.0+ using only the PHP mysqli extension, UTF-8, and HTML 4.01 to achieve that simpleness.

mod_substitute is cool. But, be careful with mod_proxy

For our development servers, we have always used output buffering to replace the URLs (dealnews.com) with the URL for that development environment.  Where we run into problems is with CSS and JavaScript.  If those files contain URLs for images (CSS) or AJAX (JS), the URLs would not get replaced.  Our solution has been to parse those files as PHP (on the dev boxes only) and have some output buffering replace the URLs in those files.  That has caused various problems over the years and even some confusion for new developers.  So, I got to looking for a different solution.  Enter mod_substitute for Apache 2.2.

"mod_substitute provides a mechanism to perform both regular expression and fixed string substitutions on response bodies." - Apache Documentation

Cool!  I put in the URL mappings and VOILA!  All was right in the world.

Fast forward a day.  Another developer is testing some new code and finds that his XML is getting munged.  At first we blamed libxml because we had just been through an ordeal with a bad combination of a libxml compile option and PHP a while back.  Maybe we missed that box when we fixed it.  We recompiled everything on the dev box but there was no change.  So I started to think what was recently different with the dev boxes.  So, I turn off mod_substitute.  Dang, that fixed it.  I looked at my substitution strings and everything looked fine.  After cursing and being depressed that such a cool tool was not working, I took a break to let it settle in my mind.

I came back to the computer and decided to try a virgin Apache 2.2 build.  I downloaded the source from the web site instead of building from Gentoo's Portage.  Sure enough, a simple test worked fine.  No munging.  So, I loaded up the dev box Apache configuration into the newly compiled Apache.  Sure enough, munged XML.  ARGH!!

Up until this point, I had configured the substitutions globally and not in a particular virtual host.  So, I moved it all into one virtual host configuration.  Still broken.

A little more background on our config.  We use mod_proxy to emulate some features that we get in production with our F5 BIG-IP load balancers.  So, all requests to a dev box hit a mod_proxy virtual host and are then directed to the appropriate virtual host via a proxied request. 

So, I got the idea to hit the virtual host directly on its port and skip mod_proxy.  Dang, what do you know.  It worked fine.  So, something about the output of the backend request and mod_proxy was not playing nice.  So, hmm.  I got the idea to move the mod_substitute directives into the mod_proxy virtual hosts configuration.  Tested and working fine.  So, basically, this ensures that the substitution filtering is done only after the proxy and all other requests have been processed.  I am no Apache developer, so I have not dug any deeper.  I have a working solution and maybe this blog post will reach someone that can explain it.  As for mod_substitute, here is the way my config looks.

In the VirtualHost that is our global proxy, I have this:

FilterDeclare DN_REPLACE_URLS
FilterProvider DN_REPLACE_URLS SUBSTITUTE resp=Content-Type $text/
FilterProvider DN_REPLACE_URLS SUBSTITUTE resp=Content-Type $/xml
FilterProvider DN_REPLACE_URLS SUBSTITUTE resp=Content-Type $/json
FilterProvider DN_REPLACE_URLS SUBSTITUTE resp=Content-Type $/javascript
FilterChain DN_REPLACE_URLS


Elsewhere, in a file that is local to each dev host, I keep the actual mappings for that particular host:

Substitute "s|http://dealnews.com|http://somedevbox.dealnews.com|in"
Substitute "s|http://dealmac.com|http://somedevbox.dealmac.com|in"
# etc....


I am trying to think of other really cool uses for this.  Any ideas?

Best practices for escaping HTML

I am working on Wordcraft, trying to get the last annoying HTML validation errors worked out.  Things like ampersands in URLs.  In doing so, I am asking myself where the escaping should take place. In the case of Wordcraft, there are several parts to it.
  1. The code that pulls data from the database.  Obviously not the right place.
  2. The code that formats data like dates and such.  It also organizes data from several data sources into one nice tidy array.  Hmm, maybe.
  3. The parts of the code that set up the output data for the templates.
  4. The templates themselves.
Now, I am sure 1 is not the place.  And I really would not want 4 to be the place.  That would make for some ugly templating.  Plus, the templates, IMO, should assume the data is ready to be output.  So, that leaves the code that does the formatting and the code that does the data setup.

Of those two, I guess the place to do this job is in the data setup.  Wordcraft has a $WCDATA array that is available in the scope of the templates.  I suppose anything that goes into that array should be escaped as appropriate.
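Something like the following is what I have in mind for that setup step.  It is just a sketch: the function name escape_for_html and the sample data are made up, and only the $WCDATA name comes from Wordcraft.

```php
<?php
// Hypothetical helper: escape every string in a (possibly nested) array
// so the templates can echo values without thinking about it.
function escape_for_html($value)
{
    if (is_array($value)) {
        return array_map('escape_for_html', $value);
    }

    if (is_string($value)) {
        return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
    }

    // Integers, booleans and so on pass through untouched.
    return $value;
}

// Made up sample data standing in for a formatted post.
$post = array(
    'title'    => 'Tips & "tricks"',
    'url'      => 'http://example.com/?a=1&b=2',
    'comments' => 3,
);

$WCDATA = array('post' => escape_for_html($post));
```

With this, the ampersand in the URL comes out as &amp;amp; and the validator stops complaining.  The catch is that anything that is supposed to contain HTML, like the post body itself, would have to skip this step.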

I largely wrote this blog post as a teddy bear exercise.  But, I am curious.  Where and when do you escape your data for use in HTML documents?

The history of PHP eating newlines after the closing tag

Have you ever noticed that PHP eats the newlines after a closing PHP tag?  Not sure what I mean?  There is lots on Google about it.  Here is an example.

Hello there!
<?php

// this is just a dumb PHP block

?>
How are you?

becomes:

Hello there!
How are you?

I was talking about this with a coworker tonight.  He is trying to generate some XML and, like me and Chris Shiflett, is anal about his output.  You see, what happens in modern use of PHP as a template language is something like this:

<?php

$subelement = range(1, 10);

?>
<somexml>
    <element>
        <?php foreach($subelement as $e) { ?>
            <subelement><?php echo $e; ?></subelement>
        <?php } ?>
    </element>
</somexml>

That code will output this mess:

<somexml>
    <element>
                    <subelement>1</subelement>
                    <subelement>2</subelement>
                    <subelement>3</subelement>
                    <subelement>4</subelement>
                    <subelement>5</subelement>
                    <subelement>6</subelement>
                    <subelement>7</subelement>
                    <subelement>8</subelement>
                    <subelement>9</subelement>
                    <subelement>10</subelement>
            </element>
</somexml>

So, why does PHP do this?  Well, you have to go back 11 years.  PHP 3 was emerging.  I was just starting to use it for Phorum at the time.  There were two reasons.

The first was that you would want the newline after the first closing tag to be removed as it would remove the existence of the PHP block completely.  At the time, people were shunned for writing PHP as a tag looking language.  ColdFusion was new then too and the PHP community liked to point and laugh at it.

The second case (and this is probably a more legitimate one) was that many editors (some still do this for some insane reason) force every friggin file to end in a newline.  We did not have output buffering in those days.  It was the stone age, man.  So, to get around the "Headers already sent" errors, Zeev decided to make the PHP ending tag be "?> with an optional newline".  It was a heated debate on the PHP Internals (then php-dev) list.  So much so that I remembered it and dug it up on MARC.

Heck, now I want to add to it.  I would like it please if PHP could remove any leading, non-newline whitespace before an open tag.  That would solve this problem.  Yeah, more magic!  Nothing like it.

To me, the worst alternative to all this is the lack of a closing tag in a file.  My OCD just can't deal with that.  Please, baby seals cry when you don't use a closing tag.

Wordcraft 0.8 available

I am pleased to announce the release of Wordcraft 0.8.  I have managed to release about once a month since November.  I also have actually gotten some feedback and tickets posted.  Thanks to those that have tried it out.

I have decided to go back to YUI's Editor.  I tried TinyMCE in the last release.  But, using it full time I found it messed with my HTML too much for my liking.  When I would switch to raw HTML mode and add something like a <code> tag, it would be lost when saving the data back into the WYSIWYG editor.

I also converted the admin HTML to HTML 4.01 Transitional.  I never use XHTML anymore these days.  So, I was writing invalid XHTML inadvertently.

I worked on the session handling some more in this release.  Users should stay logged in to the admin better now.

I put comment blocks in all the files and documented every function.  This should help anyone wanting to dig in and help out.

I fixed several bugs reported by users (or maybe just testers, not sure).  Thanks for that and keep the feedback coming.