The rise of the GLAMMP stack
The G is for Gearman - Gearman is a system to farm out work to other machines, dispatching function calls to machines that are better suited to do work, to do work in parallel, to load balance lots of function calls, or to call functions between languages.
The extra M is for Memcached - memcached is a high-performance, distributed memory object caching system, generic in nature, but intended for use in speeding up dynamic web applications by alleviating database load.
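As a rough illustration of how memcached takes load off the database, here is a cache-aside sketch. It is written against any object whose `get()`/`set()` behave like the PECL Memcached extension's (`get()` returns false on a miss, `set()` takes a TTL in seconds). The function name `cached_query()` and the 300-second default TTL are assumptions for illustration.

```php
<?php
// Cache-aside lookup: try the cache first, fall back to the database on a
// miss, then store the result so the next request skips the database.
// $cache is assumed to behave like a Memcached object: get() returns false
// on a miss, and set($key, $value, $ttl) stores a value for $ttl seconds.
function cached_query(object $cache, string $key, callable $load_from_db, int $ttl = 300)
{
    $value = $cache->get($key);
    if ($value !== false) {
        return $value; // cache hit: no database work at all
    }

    $value = $load_from_db(); // cache miss: run the expensive query once
    $cache->set($key, $value, $ttl);
    return $value;
}
```

With the real extension you would pass something like `$m = new Memcached(); $m->addServer('127.0.0.1', 11211);`. One limitation of this sketch is that a query result of literally `false` is never cached, because `false` doubles as the miss marker.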
More and more these days, you can't run a web site on just LAMP. You need these extra tools (or ones like them) to do all the cool things you want to do. What other tools do we need to work into the acronym? PostgreSQL replaces MySQL in lots of stacks to form LAPP. I guess Drizzle may replace MySQL in some stacks soon. For us, it will likely be added to the stack. Will that make it GLAMMPD? We need more vowels! If you are starting the next must-use tool for running web sites on open source software, please use a vowel for the first letter.
Is there a program for finding uses of register_globals?
Scaling for the Expected and Unexpected - Speaking at Velocity
Last year, I was on a panel with some great guys. I was able to share just a bit about my experience dealing with the instant success of a web site. This year, my proposal was accepted to talk more about dealing with the success of a web site. The talk will be focused on my experience at dealnews.com and from working with power users for Phorum. Here is the summary:
Lots of people talk about scaling and performance. But, are they preparing for all the things that could happen? There are multiple problems and there is not one solution to solve them all.
Everything is running fine and BAM! – your site is linked from the front page of Yahoo! What do you do? How can you handle that sudden rush of traffic? Requests per second are running 5x normal levels. Servers have CPU spikes. Daemons are hitting the maximums. You are running out of bandwidth. How could you have been prepared for this? What are the tools and techniques for this type of sudden rush?
Or, let's say you have just come out of a meeting where everyone discovered that your site is growing in traffic 70% – 80% year over year. That means that 1 million page views this month will be roughly 3 million this time in 2 years. How can you plan for that? You don't want to redesign the whole architecture every 2 years. What methods could be used to deal with this constant long-term growth?
While there is no magic bullet for either of these scenarios, there are techniques used by many sites out there to help you get through these situations. This session will cover some of these techniques and talk about their pros and cons.
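The growth figure in the summary is plain compound growth: traffic after n years is the current level times (1 + rate)^n. A small PHP helper makes it easy to plug in other rates. The name `project_page_views()` is made up for this example.

```php
<?php
// Compound growth: each year's traffic is the previous year's times (1 + $rate).
function project_page_views(float $current, float $rate, int $years): float
{
    return $current * (1 + $rate) ** $years;
}

// 1 million page views a month today, growing 70% or 80% a year, two years out.
foreach ([70, 80] as $percent) {
    $projected = project_page_views(1000000, $percent / 100, 2);
    printf("%d%% growth: %s page views\n", $percent, number_format($projected));
}
```

That works out to about 2.89 million at 70% and about 3.24 million at 80%, which is why "roughly 3 million" covers the whole range.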
I must admit, this is the first time since 2000 that I am a little intimidated to speak at a conference. The people that present and attend Velocity are so awesome. I just hope I don't disappoint.
Net::Gearman and PHP 5.2.9
I have updated my GitHub fork and sent a pull request to the main branch. Once that is merged, a new PEAR release can be made.
The death of die()
What I would like to see die is the usage of die() like this:
$conn = mysql_connect($server, $user, $pass) or die("Could not connect to MySQL, but needed to tell the whole world");
I don't know who thought that particular usage was good, but they need to... no, that is harsh. I just really wish they had never done that.
So, what should you use? Well, there are a couple of options depending on what context you are working in and whether or not the failure is actually catastrophic.
Exceptions
If you are using OOP in your PHP code, exceptions are the logical choice for dealing with errors. I have mixed feelings about them, but that has more to do with the catching of exceptions than the throwing of them. If you are going to live in a world of exceptions, please catch them and provide useful error messages. The PHP world is not too bad about that, but I have already read too many Java error logs full of huge, verbose exception dumps in my life. Please don't follow that technique in PHP.
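Here is a small sketch of the "catch them and provide useful error messages" part. To keep it runnable without a database, a callable stands in for a real connect call such as `mysqli_connect()`. The names `connect_or_throw()` and `render_page()` are hypothetical.

```php
<?php
// Throw a specific exception with a clear message when the connection fails.
// $connect is a stand-in for a real connect call that returns false on failure.
function connect_or_throw(callable $connect)
{
    $conn = $connect();
    if ($conn === false) {
        throw new RuntimeException('Could not connect to the database.');
    }
    return $conn;
}

function render_page(callable $connect): string
{
    try {
        connect_or_throw($connect);
        return 'Page content goes here.';
    } catch (RuntimeException $e) {
        // One concise log line for the developers, not a full stack dump...
        error_log(__FUNCTION__ . ': ' . $e->getMessage());
        // ...and a short, useful message for the visitor.
        return 'Sorry, this page is temporarily unavailable.';
    }
}
```

The exception is caught at the point where something useful can be done with it. The visitor gets one plain sentence, and the log gets one line that says where and what.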
trigger_error
The function trigger_error is quite handy. It allows you, a common PHP coder, to create errors just like the core system does. The error messages look familiar to anyone who is used to seeing PHP errors, and if your system is configured to log errors and not display them, errors from trigger_error will be treated the same as built-in errors.
Errors raised with trigger_error are also caught by a custom error handler, just like built-in errors. From that handler they can be logged, printed, or handled however you want, just like normal PHP errors. There are even several levels of errors you can raise: notices, warnings, errors, and even deprecated. Again, just like the built-in PHP errors.
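A minimal sketch of a custom handler that receives both trigger_error errors and a built-in warning. To make the behavior easy to see, the handler just collects the messages in an array. In practice you would more likely log them. `$collected` and the label strings are illustrative.

```php
<?php
$collected = [];

// The handler receives user-raised errors exactly like built-in ones.
// Returning true tells PHP the error is handled, so the standard
// display/log handling is skipped.
set_error_handler(function (int $errno, string $errstr, string $errfile, int $errline) use (&$collected): bool {
    $labels = [
        E_USER_NOTICE     => 'notice',
        E_USER_WARNING    => 'warning',
        E_USER_DEPRECATED => 'deprecated',
        E_WARNING         => 'warning (built-in)',
    ];
    $collected[] = ($labels[$errno] ?? "level $errno") . ': ' . $errstr;
    return true;
});

trigger_error('Cache miss rate is high', E_USER_NOTICE);
trigger_error('Falling back to a replica database', E_USER_WARNING);
trigger_error('old_function() is deprecated', E_USER_DEPRECATED);

// A built-in warning (opening a file that does not exist) goes to the same handler.
$fh = fopen('/nonexistent/dir/file.txt', 'r');

restore_error_handler();
print_r($collected);
```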
FATAL Errors
trigger_error is also the most suitable way, IMO, to end a script immediately.
$conn = mysql_connect($server, $user, $pass);
if(!$conn) {
    trigger_error("Could not connect to MySQL database.", E_USER_ERROR);
}
Now that message will not be shown to the whole world if you have display_errors set to Off, as you should in any production environment.
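This sketch shows the behavior without editing php.ini. It applies production-style settings at runtime with ini_set() and then raises an error. The log path under the system temp directory is an assumption for the example, and a real server would normally set these values in php.ini. E_USER_WARNING is used instead of E_USER_ERROR only so the script keeps running long enough to check the log.

```php
<?php
// Production-style error settings: log everything, display nothing.
$log = sys_get_temp_dir() . '/php-example-errors.log'; // example path only
if (is_file($log)) {
    unlink($log); // start with an empty log for this demo
}
error_reporting(E_ALL);
ini_set('display_errors', '0');
ini_set('log_errors', '1');
ini_set('error_log', $log);

// A warning instead of E_USER_ERROR so the script continues after it.
trigger_error('Could not connect to MySQL database.', E_USER_WARNING);

// Nothing was printed. The message went to the log file instead.
$logged = file_get_contents($log);
```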
Wordcraft 0.9.1 available
The two big changes in this release are:
- Tokens on post forms in the admin to help ward off CSRF attacks.
- Database schema updates automated.
In addition to those two big ones, there were some notable small ones:
- HTML 4.01 validation fixes
- Ensuring UTF-8 on all encoding function calls
- Protection against hitting the back button when writing a post (most annoying on Macs, as the back button and the beginning-of-line keystroke are the same).
I will of course need many more testers and users before I can ever declare this software stable. If you need a simple blog, give it a try.
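For anyone curious about the CSRF tokens in the first item, this is the general shape of the technique. It is not Wordcraft's actual code. The idea is to generate a random token per session, embed it as a hidden field in each admin form, and reject any POST whose token does not match. The function names are hypothetical, and the session is passed in as an array so the sketch runs without `session_start()`.

```php
<?php
// Return the session's CSRF token, creating a random one on first use.
function csrf_token(array &$session): string
{
    if (empty($session['csrf_token'])) {
        $session['csrf_token'] = bin2hex(random_bytes(32));
    }
    return $session['csrf_token'];
}

// Hidden field (HTML 4.01 style) to put in every admin form.
function csrf_field(array &$session): string
{
    $token = htmlspecialchars(csrf_token($session), ENT_QUOTES, 'UTF-8');
    return '<input type="hidden" name="csrf_token" value="' . $token . '">';
}

// Check the submitted token with a timing-safe comparison.
function csrf_valid(array $session, array $post): bool
{
    return isset($session['csrf_token'], $post['csrf_token'])
        && is_string($post['csrf_token'])
        && hash_equals($session['csrf_token'], $post['csrf_token']);
}
```

In a real request, the arrays would be `$_SESSION` and `$_POST`. Note that `random_bytes()` needs PHP 7 and `hash_equals()` needs PHP 5.6, so this sketch is newer than the PHP 5 baseline Wordcraft targets.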
About Wordcraft
Wordcraft aims to be a simple, lightweight blogging application. Wordcraft is written exclusively for PHP 5+ and MySQL 5.0+ using only the PHP mysqli extension, UTF-8, and HTML 4.01 to achieve that simplicity.
mod_substitute is cool. But, be careful with mod_proxy
mod_substitute provides a mechanism to perform both regular expression and fixed string substitutions on response bodies. - Apache Documentation
Cool! I put in the URL mappings and VOILA! All was right in the world.
Fast forward a day. Another developer is testing some new code and finds that his XML is getting munged. At first we blamed libxml, because a while back we had been through an ordeal with a bad combination of a libxml compile option and PHP. Maybe we had missed that box when we fixed it. We recompiled everything on the dev box, but nothing changed. So I started to think about what had recently changed on the dev boxes, and I turned off mod_substitute. Dang, that fixed it. I looked at my substitution strings and everything looked fine. After cursing and being depressed that such a cool tool was not working, I took a break to let it settle in my mind.
I came back to the computer and decided to try a virgin Apache 2.2 build. I downloaded the source from the web site instead of building from Gentoo's Portage. Sure enough, a simple test worked fine. No munging. So, I loaded up the dev box Apache configuration into the newly compiled Apache. Sure enough, munged XML. ARGH!!
Up until this point, I had configured the substitutions globally and not in a particular virtual host. So, I moved it all into one virtual host configuration. Still broken.
A little more background on our config. We use mod_proxy to emulate some features that we get in production with our F5 BIG-IP load balancers. So, all requests to a dev box hit a mod_proxy virtual host and are then directed to the appropriate virtual host via a proxied request.
So, I got the idea to hit the virtual host directly on its port and skip mod_proxy. Dang, what do you know, it worked fine. Something about the output of the backend request and mod_proxy was not playing nice. Then I got the idea to move the mod_substitute directives into the mod_proxy virtual host's configuration. Tested, and working fine. Basically, this ensures that the substitution filtering is done only after the proxy has done its work and everything else has been processed. I am no Apache developer, so I have not dug any deeper. I have a working solution, and maybe this blog post will reach someone who can explain it. As for mod_substitute, here is the way my config looks.
In the VirtualHost that is our global proxy, I have this:
FilterDeclare DN_REPLACE_URLS
FilterProvider DN_REPLACE_URLS SUBSTITUTE resp=Content-Type $text/
FilterProvider DN_REPLACE_URLS SUBSTITUTE resp=Content-Type $/xml
FilterProvider DN_REPLACE_URLS SUBSTITUTE resp=Content-Type $/json
FilterProvider DN_REPLACE_URLS SUBSTITUTE resp=Content-Type $/javascript
FilterChain DN_REPLACE_URLS
Elsewhere, in a file that is local to each dev host, I keep the actual mappings for that particular host:
Substitute "s|http://dealnews.com|http://somedevbox.dealnews.com|in"
Substitute "s|http://dealmac.com|http://somedevbox.dealmac.com|in"
# etc...
I am trying to think of other really cool uses for this. Any ideas?
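To make the filter's logic concrete, here is a rough PHP analogue. It does not reflect how Apache implements the filter. It only rewrites bodies whose Content-Type contains one of the strings the FilterProvider lines match on, and it applies each mapping as a case-insensitive, fixed-string replacement, which is what the `i` and `n` flags ask for. The function names and the `$mappings` array are illustrative.

```php
<?php
// Content-Type fragments taken from the FilterProvider "$..." (contains) matches.
const SUBSTITUTE_TYPES = ['text/', '/xml', '/json', '/javascript'];

// Media types are case-insensitive, so this sketch compares that way.
function should_substitute(string $content_type): bool
{
    foreach (SUBSTITUTE_TYPES as $fragment) {
        if (stripos($content_type, $fragment) !== false) {
            return true;
        }
    }
    return false;
}

// Fixed-string, case-insensitive replacements, like Substitute's "in" flags.
function apply_mappings(string $body, array $mappings): string
{
    return str_ireplace(array_keys($mappings), array_values($mappings), $body);
}

$mappings = [
    'http://dealnews.com' => 'http://somedevbox.dealnews.com',
    'http://dealmac.com'  => 'http://somedevbox.dealmac.com',
];
```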
Best practices for escaping HTML
In Wordcraft, there are four places where the escaping could be done:
- The code that pulls data from the database. Obviously not the right place.
- The code that formats data like dates and such. It also organizes data from several data sources into one nice tidy array. Hmm, maybe.
- The parts of the code that set up the output data for the templates.
- The templates themselves.
Of those, I guess the place to do this job is the data setup. Wordcraft has a $WCDATA array that is available in the scope of the templates. I suppose anything that goes into that array should be escaped as appropriate.
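A minimal sketch of escaping at the data-setup step. It assumes that every string headed for the template array should become HTML-safe, so it walks the array recursively and runs `htmlspecialchars()` with ENT_QUOTES and UTF-8 on each string. `escape_for_html()` is a hypothetical helper, and `$WCDATA` is assigned here only to show where it would plug in.

```php
<?php
// Recursively escape every string value in a (possibly nested) array for HTML.
// Non-string values such as ints and bools are passed through unchanged.
function escape_for_html($value)
{
    if (is_array($value)) {
        return array_map('escape_for_html', $value); // keeps the array keys
    }
    if (is_string($value)) {
        return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
    }
    return $value;
}

$post = [
    'title'    => 'Tom & Jerry\'s <b>big</b> day',
    'comments' => 3,
    'tags'     => ['php', '<script>'],
];

// Escape once, right before the data is handed to the templates.
$WCDATA = ['post' => escape_for_html($post)];
```

The trade-off is that a template which needs the raw value (for example, to build a URL or a JSON blob) now only sees the HTML-escaped copy, so some data may need to be kept raw alongside it.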
I largely wrote this blog post as a teddy bear exercise. But, I am curious. Where and when do you escape your data for use in HTML documents?
The history of PHP eating newlines after the closing tag
PHP removes a newline that directly follows a closing tag, so this:
Hello there!
<?php
// this is just a dummy PHP block
?>
How are you?
becomes:
Hello there!
How are you?
I was talking about this with a coworker tonight. He is trying to generate some XML and, like me and Chris Shiflett, is anal about his output. You see, what happens in modern use of PHP as a template language is something like this:
<?php
$subelement = range(1, 10);
?>
<somexml>
    <element>
        <?php
        foreach($subelement as $e) { ?>
            <subelement><?php echo $e; ?></subelement>
        <?php } ?>
    </element>
</somexml>
That code will output this mess:
<somexml>
    <element>
                    <subelement>1</subelement>
                    <subelement>2</subelement>
                    <subelement>3</subelement>
                    <subelement>4</subelement>
                    <subelement>5</subelement>
                    <subelement>6</subelement>
                    <subelement>7</subelement>
                    <subelement>8</subelement>
                    <subelement>9</subelement>
                    <subelement>10</subelement>
            </element>
</somexml>
So, why does PHP do this? Well, you have to go back 11 years. PHP 3 was emerging. I was just starting to use it for Phorum at the time. There were two reasons.
The first was that you would want the newline after a closing tag to be removed, because that removes any trace of the PHP block from the output. At the time, people were shunned for writing PHP as a tag-looking language. ColdFusion was new then too, and the PHP community liked to point and laugh at it.
The second case (and this is probably the more legitimate one) was that many editors (some still do this, for some insane reason) force every friggin file to end in a newline. In an included file, that newline after the final ?> was sent as output, and once output is sent, the headers go with it. We did not have output buffering in those days. It was the stone age, man. So, to get around the "Headers already sent" errors, Zeev decided to make the PHP ending tag be "?> with an optional newline". It was a heated debate on the PHP Internals (then php-dev) list, so much so that I remembered it and dug it up on MARC.
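A quick way to watch the "?> with an optional newline" rule at work is to include a few tiny sources and capture what they print. `render_source()` is a hypothetical helper that writes the source to a temp file, so it goes through the normal parser exactly like an included template would.

```php
<?php
// Include PHP source from a temp file and return whatever it outputs.
function render_source(string $source): string
{
    $file = tempnam(sys_get_temp_dir(), 'tpl');
    file_put_contents($file, $source);
    ob_start();
    include $file;
    $output = ob_get_clean();
    unlink($file);
    return $output;
}

// The newline right after ?> is eaten, so the PHP block leaves no trace.
$greeting = render_source("Hello there!\n<?php\n// dummy PHP block\n?>\nHow are you?\n");

// A library file ending in "?>\n" (one editor-added newline) sends nothing...
$one_newline = render_source("<?php\n\$x = 1;\n?>\n");

// ...but only one newline is eaten, so a blank line after ?> still leaks output.
$two_newlines = render_source("<?php\n\$x = 1;\n?>\n\n");

var_dump($greeting, $one_newline, $two_newlines);
```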
Heck, now I want to add to that behavior. I would like it, please, if PHP could remove any leading, non-newline whitespace before an open tag. That would solve this problem. Yeah, more magic! Nothing like it.
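PHP does not actually do this. However, the proposed rule can be emulated in userland to see what output it would produce: strip the spaces and tabs at the start of a line that sit directly before `<?php`, then run the template as usual. The regex, the helper names, and the indented three-element version of the XML template above are all assumptions for illustration.

```php
<?php
// Emulate the proposed rule: drop leading spaces/tabs right before an open tag.
function strip_indent_before_open_tags(string $source): string
{
    return preg_replace('/^[ \t]+(?=<\?php)/m', '', $source);
}

// Include PHP source from a temp file and return whatever it outputs.
function render_source(string $source): string
{
    $file = tempnam(sys_get_temp_dir(), 'tpl');
    file_put_contents($file, $source);
    ob_start();
    include $file;
    $output = ob_get_clean();
    unlink($file);
    return $output;
}

$template = <<<'TPL'
<?php
$subelement = range(1, 3);
?>
<somexml>
    <element>
        <?php
        foreach($subelement as $e) { ?>
            <subelement><?php echo $e; ?></subelement>
        <?php } ?>
    </element>
</somexml>

TPL;

$messy = render_source($template);                                // today's behavior
$clean = render_source(strip_indent_before_open_tags($template)); // proposed rule
echo $clean;
```

With the indentation before the tags gone, each subelement comes out at its template indentation and `</element>` lines up again. Only the indentation that sits in front of a `<?php` tag is touched.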
To me, the worst alternative to all this is the lack of a closing tag in a file. My OCD just can't deal with that. Please, baby seals cry when you don't use a closing tag.
