PHP command line progress bar

Wed, Mar 10, 2010 11:33 AM
Was just looking through some code and came across this function I wrote some time ago. If you do a lot of your processing scripts in PHP like we do, you probably need to know what is going on sometimes. So, I made a progress bar for use on the cli. I thought I would share it.  Here is a video of it in action. And the code can be found here.

Developers should write code for production

Wed, Mar 3, 2010 11:20 AM
Having development and staging environments that reflect production is a key component of DevOps.  An example for us is dealing with our CDN.

I can imagine in some dysfunctional, fragmented company, a developer works on a web application and sticks all the images in the local directory with his scripts. Then some operations/deployment guy has to first move the images where they need to be and then change all the code that references those images.  If he is lucky, he has a script that does it for him. This is a needless exercise. If you have a development environment that looks and acts like production, this is all handled for you.

Here is an example of how it works for us. We use a CDN for all images, javascript, CSS and more. Those files come from a set of domains: s1.dlnws.com - s5.dlnws.com. So, our dev environments have similar domains. somedev.s5.dlnws.com points to a virtual server. We then use mod_substitute in Apache to rewrite those URLs on the dev machine. Each developer and staging instances will have an Apache configuration such as:
Substitute "s|http://s1.dlnws.com|http://somedev.s1.dev.dlnws.com|in"
Substitute "s|http://s2.dlnws.com|http://somedev.s2.dev.dlnws.com|in"
Substitute "s|http://s3.dlnws.com|http://somedev.s3.dev.dlnws.com|in"
Substitute "s|http://s4.dlnws.com|http://somedev.s4.dev.dlnws.com|in"
Substitute "s|http://s5.dlnws.com|http://somedev.s5.dev.dlnws.com|in"
So our developers put the production URLs for images into our code. When they test on the development environment, they get URLs that point to their instance, not production. No fussing with images after the fact.

In addition to this, we use mod_proxy to emulate our production load balancers. Special request routing happens in production. We need to see that when developing so we don't deploy code that does not work in that circumstance. If the load balancers send all requests coming in to /somedir to a different set of servers, we have mod_proxy do the same thing to a different VirutalHost in our Apache configuration. It is not always perfect, but it gets us as close to production as we can get without buying very expensive load balancers for our development environments.

Of course, we did not come to this overnight. It took us years to get to this point. Maybe it won't take you that long. Keep in mind when creating your development environments to make them work like production. It is neat to be able to write code on your laptop. I did it for years. But, at some point before you send out code for production, the developer should run it on a production like environment. Then deploying should be much easier.

DevOps is the only way it has ever made sense

Mon, Feb 22, 2010 08:24 PM
DevOps is the label being given to the way we have always done things. This is not the first time this has happened. As it says on my About Me page,

Brian Moon has been working with the LAMP platform since before it was called LAMP.

At some point, not sure when, someone came up with LAMP. I started working on what is now considered LAMP in 1996. I have seen lots of acronyms come and some go. We started using "Remote Scripting" after hearing Terry Chay talk about it at OSCON. The next OSCON, AJAX was all the rage. Technically, we never used AJAX. The X stands for XML. We didn't use XML. What made sense for us was to send back javascript arrays and objects that the javascript interpreter could deal with easily. We wrote a PHP function called to_javascript that converted a PHP array into a javascript array. Sound familiar? Yeah, two years later, JSON was all the rage.

We also have seen the same thing with how we run our development process.  We always considered our team to be an agile development team. That is agile with little a. Nowadays, "Agile" with the big A is usually all about how you develop software and not about actually delivering the software. So, I am always perplexed when people ask me if we use "Agile" development. Are they talking little a or big A?

Today I came across the term DevOps on twitter (there is no Wikipedia page yet). We have always had an integrated development and operations team. I could be writing code in the morning and configuring servers in the afternoon. Developers all have some level of responsibility over managing their development environment. They updated their Apache configurations from SVN and make changes as needed for their applications. The development environments are simulated as close as possible to production. Developers roll code to the production servers. It is their responsibility to make sure it works on production. They also roll it when it is ready rather than letting it sit around for days. This means if there is an unforeseen issue, the code is fresh on their minds and the problem can quickly be solved. We have done things this way since 1998. We are not the only ones. The great guys at Flickr gave a great talk last year at Velocity about their DevOps environment. People were amazed at how their teams worked together.

One of the huge benefits of being a DevOps team is that we can utilize the full stack in our application. If we can use the load balancers, Apache or our proxy servers to do something that offloads our application servers, we plan for that in the development cycle. It is a forethought instead of an afterthought. I see lots of PHP developers that do everything in code. Their web servers and hardware are just there to run their code. Don't waste those resources. They can do lots of things for you.

One cool thing about this is that I now have a label to use when people ask us about our team. I can now say we are an agile DevOps team. They can then go look that up and see what it means. Maybe it will lead to less explanation of how we work too. And if we are lucky, maybe we can find people to hire that have been in a similar environment.

So, I welcome all the new people into the "DevOps movement". Adopt the ideas and avoid any rules that books, blogs, "experts" may want to come up with. The first time I see someone list themselves as a DevOps management specialist, I will die a little on the inside. It is not a set of rules, it is a way of thinking, even a way of life. If the process prevents you from doing something, you are using it wrong, IMO.

State of the Browsers and ad blocking

Mon, Feb 1, 2010 11:25 PM
In my last post about CSS layout and ads, a commenter brought up that the dealnews.com web site did not handle extensions like Ad Block very gracefully. To which I responded that I don't care. To which he responded with download counts. Well, the reason I don't care is that ad impressions when compared to page views on dealnews.com are within 2% of each other. So, at best, less than 2% of users are blocking ads. In reality, that is going to include some DNS failures, network issues, or something else. I would bet our logo graphic has about the same difference. The reality is that normal people don't block ads. In my opinion, if you make your money by working on the web, you shouldn't either. I should add that this site's (my geeky blog) ad views was about 16% lower than the recorded page views. So, geeks block ads more I guess. But, geeks have dominated the web for a long time.

This got me thinking that I had not look at the browser stats very much lately. dealnews has a very odd graph on browser statistics. We do not follow the industry averages. Our audience is dominantly tech savy (that does not mean geeks). Our users don't just use the stuff that is installed on the computer when they get it. This kind of proves my point about ad blocking even more. We have non-moron users and they still don't block ads.



Browser   % of Visits
Internet Explorer 42.34%
Firefox 36.94%
Safari 9.55%
Chrome 8.34%
Mozilla 1.46%
Opera 0.68%
Netscape 0.41%
Avant 0.08%
Camino 0.06%
IE Mobile 0.02%

As you can see, Firefox is very prevalent on our site. We generally test in IE7/8, Firefox 3, Safari and Chrome. I will occasionally test a major change in Opera. Typically, well formed HTML and CSS works fine in Opera so everything is all good.

As for operating systems, Windows still dominates, but we have more Macs than the average site I would guess.



OS   % of Visits
Windows 82.95%
Macintosh 11.27%
iPhone 3.80%
Linux 1.19%
Android 0.17%

Interesting that iPhone beats out Linux. That is just another sign to me that Linux is still not a real choice for real people. Be that a product issue from OEMs or user choice. That is debatable. It is notable that most of our company uses Macs. I don't think we make up a speck of that traffic though. If we did, our home state of Alabama would be our most dominant. It isn't. We are very typical in that regard, California is number one. We only have one employee there.

Tables for layout

Fri, Jan 29, 2010 10:01 AM
I just read A case for table-based design and was thrilled to know I am not the only one that drowns in div soup from time to time.  I do not for the most part use tables for layout, but there are some cases where I just can't make a set of divs do my bidding. The classic example is having a two column layout where the left column LOADS FIRST and is elastic and the right column is a fixed size.  The "loads first" is important in a world where the rendering time of pages has become important.  Ideally with any page, the most important content would render first for the user. In my case, this fixed column is an ad. As a web developer, I don't care when the ad loads. The ads are a necessary evil in my page layout. I must ensure that they load in an acceptable time frame, but certainly not the first thing on the page. The specific layout I am talking about is that of the top of dealnews.com.  It has a fixed size 300x250 ad on the right of the page and the left side is elastic. I fiddled with divs for hours to get that to act the way I wanted it to act. We use the grid CSS from OOCSS.org. Wonderful piece of CSS that is. But, even with that in hand, I could not get the elements to behave, in all browsers, the way I could with a simple two column table where the left column's width is set to 100% and the right column contains a div of width 300 pixels. It was so easy to pull that off. Maybe CSS3 is going to solve this problem? I don't know. If you have the magic CSS that can do what this page does, let me know.

ob_start and HTTP headers

Fri, Jan 29, 2010 10:00 AM
I was helping someone in IRC deal with some "headers already sent" issues and told them to use ob_start. Very diligently, the person went looking for why that was the right answer. He did not find a good explination. I looked around and I did not either. So, here is why this happens and why ob_start can fix it.

How HTTP works

HTTP is the communication protocol that happens between your web server and the user's browser.  Without too much detail, this is broken into two pieces of data: headers and the body.  The body is the HTML you send. But, before the body is sent, the HTTP headers are sent. Here is an example of an HTTP request response including headers:
HTTP/1.1 200 OK
Date: Fri, 29 Jan 2010 15:30:34 GMT
Server: Apache
X-Powered-By: PHP/5.2.12-pl0-gentoo
Set-Cookie: WCSESSID=xxxxxxxxxxxxxxxxxxxxxxxxxxxx; expires=Sun, 28-Feb-2010 15:30:34 GMT; path=/
Content-Encoding: gzip
Vary: Accept-Encoding
Keep-Alive: timeout=15, max=99
Connection: Keep-Alive
Transfer-Encoding: chunked
Content-Type: text/html; charset=UTF-8

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<title>Ramblings of a web guy</title>
.
.
So, all those lines before the HTML starts have to come first. HTTP headers are where things like cookies and redirection occur. When a PHP script starts to send HTML out to the browser, the headers are stopped and the body begins. When your code tries to set a cookie after this has started, you get the "headers already sent" error message.

How ob_start works

So, how does ob_start help? The ob in ob_start stands for output buffering. ob_start will buffer the output (HTML) until the page is completely done. Once the page is completely done, the headers are sent and then the output is sent. This means any calls to setcookie or the header function will not cause an error and will be sent to the browser properly. You do need to call ob_start before any output occurs. If you start output, it is too late.

The down side

The down side of doing this is that the output is buffered and sent all at once. That means that the time between the user request and the time the first byte gets back to the user is longer than it has to be. However, in modern PHP application design, this is often already the case. An MVC framework for example would do all the data gathering before any presentation is done. So, your application may not have any issue with this.

Another down side is that you (or someone) could get lazy and start throwing setcookie calls in any old place. This should be avoided. It is simply not good programming design. In a perfect world, we would not need output buffering to solve this problem for us.

Using ini files for PHP application settings

Tue, Jan 19, 2010 05:24 PM
At dealnews we have three tiers of servers. First is our development servers, then staging and finally production. The complexity of the environment increases at each level. On a development server, everything runs on the localhost: mysql, memcached, etc. At the staging level, there is a dedicated MySQL server. In production, it gets quite wild with redundant services and two data centers.

One of the challenges of this is where and how to store the connection information for all these services. We have done several things in the past. The most common thing is to store this information in a PHP file. It may be per server or there could be one big file like:

<?php

if(DEV){
    $server = "localhost";
} else {
    $server = "10.1.1.25";
}

?>


This gets messy quickly. Option two is to deploy a single file that has the settings in a PHP array. And that is a good option. But, we have taken that one step further using some PHP ini trickeration. We use ini files that are loaded at PHP's startup and therefore the information is kept in PHP's memory at all times.

When compiling PHP, you can specify the --with-config-file-scan-dir to tell PHP to look in that directory for additional ini files. Any it finds will be parsed when PHP starts up. Some distros (Gentoo I know) use this for enabling/disabling PHP extensions via configuration. For our uses we put our custom configuration files in this directory. FWIW, you could just put the above settings into php.ini, but that is quite messy, IMO.

To get to this information, you can't use ini_get() as you might think.  No, you have to use get_cfg_var() instead. get_cfg_var returns you the setting, in php.ini or any other .ini file when PHP was started. ini_get will only return values that are registered by an extension or the PHP core. Likewise, you can't use ini_set on these variables. Also, get_cfg_var will always reflect the initial value from the ini file and not anything changed with ini_set.

So, lets look at an example.

; db.ini
[myconfig]
myconfig.db.mydb.db     = mydb
myconfig.db.mydb.user   = user
myconfig.db.mydb.pass   = pass
myconfig.db.mydb.server = host


This is our ini file. the group in the braces is just for looks. It has no impact on our usage. Because this is parsed along with the rest of our php.ini, it needs a unique namespace within the ini scope. That is what myconfig is for. We could have used a DSN style here, but it would have required more parsing in our PHP code.

<?php

/**
 * Creates a MySQLi instance using the settings from ini files
 *
 * @author     Brian Moon <brianm@dealnews.com>
 * @copyright  1997-Present dealnews.com, Inc.
 *
 */

class MyDB {

    /**
     * Namespace for my settings in the ini file
     */
    const INI_NAMESPACE = "dealnews";

    /**
     * Creates a MySQLi instance using the settings from ini files
     *
     * @param   string  $group  The group of settings to load.
     * @return  object
     *
     */
    public static function init($group) {

        static $dbs = array();

        if(!is_string($group)) {
            throw new Exception("Invalid group requested");
        }

        if(empty($dbs["group"])){

            $prefix = MyDB::INI_NAMESPACE.".db.$group";

            $db   = get_cfg_var("$prefix.db");
            $host = get_cfg_var("$prefix.server");
            $user = get_cfg_var("$prefix.user");
            $pass = get_cfg_var("$prefix.pass");

            $port = get_cfg_var("$prefix.port");
            if(empty($port)){
                $port = null;
            }

            $sock = get_cfg_var("$prefix.socket");
            if(empty($sock)){
                $sock = null;
            }

            $dbs[$group] = new MySQLi($host, $user, $pass, $db, $port, $sock);

            if(!$dbs[$group] || $dbs[$group]->connect_errno){
                throw new Exception("Invalid MySQL parameters for $group");
            }
        }

        return $dbs[$group];

    }

}

?>


We can now call DB::init("myconfig") and get a mysqli object that is connected to the database we want. No file IO was needed to load these settings except when the PHP process started initially.  They are truly constant and will not change while this process is running.

Once this was working, we created separate ini files for our different datacenters. That is now simply configuration information just like routing or networking configuration. No more worrying in code about where we are.

We extended this to all our services like memcached, gearman or whatever. We keep all our configuration in one file rather than having lots of them. It just makes administration easier. For us it is not an issue as each location has a unique setting, but every server in that location will have the same configuration.

Here is a more real example of how we set up our files.

[myconfig.db]
myconfig.db.db1.db         = db1
myconfig.db.db1.server     = db1hostname
myconfig.db.db1.user       = db1username
myconfig.db.db1.pass       = db1password

myconfig.db.db2.db         = db2
myconfig.db.db2.server     = db2hostname
myconfig.db.db2.user       = db2username
myconfig.db.db2.pass       = db2password

[myconfig.memcache]
myconfig.memcache.app.servers    = 10.1.20.1,10.1.20.2,10.1.20.3
myconfig.memcache.proxy.servers  = 10.1.20.4,10.1.20.5,10.1.20.6

[myconfig.gearman]
myconfig.gearman.workload1.servers = 10.1.20.20
myconfig.gearman.workload2.servers = 10.1.20.21

Autoloading for fun and profit

Fri, Jan 8, 2010 09:44 AM
So, as I stated in a previous post, the code base here at dealnews is going through some changes.  I have been working heavily on those changes since then.  Testing and benchmarking to see what works best. One of those changes is the heavy use of autoloading

During the holidays, I always like to read the PHP Advent.  One of the posts this year was by Marcel Esser titled You Don’t Need All That. It was a great post and echos many things I have said about PHP and web development. In his post, Marcel benchmarks the difference between using an autoloader and using straight require statements. I was not surprised by the result. The autoloading overhead and class overhead is well known to me. It is one thing that kept me from using any heavy OOP (we banned classes on our user facing pages for a long time) in PHP 4 and PHP 5.0. It has gotten a lot better however. Class overhead is very, very small now. Especially when using it as a namespacing for static functions. However, there is one word of caution I would like to add to his statements. When you use require/include statements instead of autoloading, you end up with a file like this:
<?php

require "DB.php";
require "Article.php";


function myfunc1() {
    DB::somemethod();
}

function myfunc2(){
    Article::somemethod();
}

?>
That file needs to require two files, but each one is only needed by one function in the file.  This is the dilemma we have found ourselves in.  We have a file that is filled with functions for building, taking apart, repairing, fetching, or anything else you can think to do with a URL. So, at the top of that file are 13 require statements. This all happened organically over time. But, now we are in a situation where we load lots of files that may not even be needed to begin with. By moving to an autloading system, we will only include the code we need. This saves cycles and file IO.

Again, Marcel's post was dead on. Our front end is written with multiple entry points.  We use auto_prepend files to do any common work that all requests need (sessions, loading the application settings, etc).  The front page of dealnews.com is about 600 lines of PHP that does the job as quickly as possible. But, we are moving it to use autoloaded code because its require list has grown larger than I would like and some of those requires are not always "required" to generate the page.

One point Marcel did make was about how modern MVC systems are part of the problem. We don't have a highly structured traditional MVC layout of our code. We use a very wide, flat directory structure rather than deep nesting of classes and objects. We also don't do a lot of overloading. Maybe 1% of our classes extend another and never more than one deep. All that makes for much quicker loading objects vs. some of the packaged frameworks like Zend Framework.

So, as Marcel warned, be aware of both your use of autoloading and require statements. Both can be bad when used the wrong way.

Bing ads are such bull crap

Mon, Dec 21, 2009 12:03 PM
Have you see these Bing ads? The insinuate that web search gives you a lot of crap. But, that Bing can cure that problem. Really? One simple question Bing:

How big is the sun?

Google

Google tells me the mass right off the bat. Neat. The following links are all very relevant too.



Yahoo

Yahoo gives me an interesting alternate query. The links are all relevant too.



Bing

Bing starts well by suggesting some queries.



But, it soon falls apart IMO.



Big Red Sun? A garden design company? That is the most relevant? Really? The third result is way off too. Their related searches clearly seem to indicate that Bing understands what I am looking for, but their results fail badly. Get the log out of your eye Bing.

Code Organization Dilemma

Wed, Nov 18, 2009 08:00 AM
So, we have been building up our code library at dealnews for 9 years. It was started at the end of PHP3 and the beginning of PHP4. So, we did not have autoloading, static functions, and all that jazz. Classes had lots of overhead in early PHP4 so we started down a pure procedural road in 2000. And for a long time, it was very maintainable. We had 2 or 3 developers for most of this time. We now have 5 or 6 depending on whether we have contractors. There are starting to be too many files and too many functions. We find ourselves adding new files when some new function is created instead of adding it to an existing file because we don't want to have huge files with 100 functions in them. File names and function names are getting longer and more ambiguous. For example, we have a file called url_functions.php. It contains functions to generate URLs for different types of pages on the site, functions to fetch URLs from the web and functions to parse URLs from an article. Those probably don't all belong in one file. But, they got nickle and dimed in there over time. So, now, we are inclined to not add anything to that file and make new files for new semi-URL related functions. Ugh.

It is time to start thinkinb about a reorganization. There are 1,900+ functions in 400+ files in our code library. This is just our library. This does not include the code that actually builds a page and generates output. It does not include our cron jobs or system administration scripts. Yeah, that is a lot. So, where do we go from here? Some things are easy to do. For example, we have a file called string.php. Most all the functions in that file can easily be moved a String class with static functions that can be accessed via an autoloader.

Then we have the various ways we deal with the articles on the web site. I have written about our front end vs. back end system before. What this means for our code base is that we have two ways to deal with an article. One is in our highly relational backend system. The other is in our optimized front end database servers. So, one Article object won't really do. We already have an Article object that serves as an ORM interface for the backend. To access the front end data, we currently have a library of functions (fetch_article for a single, fetch_articles for a set, etc.) but it does not fit with an autoloading environment. It also is not related to the object (the article) and is associated with where the data is stored. New developers don't grok the server infrastructure, so the code organization may not make sense to them. We have about 10 different objects that need both a back end and front end interface.

On the other hand, I really don't want to end up with a class named FrontEndArticle and BackEndArticle. Much less do we want to have stuff like BackEnd_Article where the file is actually in BackEnd/Article.php somewhere. The verbosity becomes overwhelming and hard to read, IMO.

So, what are others doing with huge code bases? I see lots of projects with 100 or so functions/methods in 20-30 files.  Frameworks have it easy because they don't have a CEO that wants something on this one page to be different than it is on every other page where that data is used. We have to deal with those types of hacks in an elegant way that can be maintained.