PHP generated code tricks

Fri, Jun 18, 2010 01:56 PM
Something that is great about PHP is that you can write code that generates more PHP code to be used later. Now, I am not saying this is a best practice. I am sure it violates some rule in some book somewhere. But sometimes you need to be a rule breaker.

A simple example is taking a database of configuration information and dumping it to an array. We do this for each publication we operate. We have a publication table. It contains the name, base URL and other settings that are specific to that publication. But why query the database for something that only changes once in a blue moon? We could cache it, but that would still require an on-demand database hit. The easy solution is to just dump the data to a PHP array and put it on disk.
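<?php

// Load every publication row, keyed by publication_id.
$sql = "select * from publications";
$res = $mysqli->query($sql);

$pubs = array();
while($row = $res->fetch_assoc()){
    $pubs[$row["publication_id"]] = $row;
}

// Escape backslashes and single quotes so the serialized data survives
// being embedded in a single-quoted PHP string.
$pubs_ser = addcslashes(serialize($pubs), "'\\");

$php_code = "<?php global \$PUBLICATIONS; \$PUBLICATIONS = unserialize('$pubs_ser'); ?>";

file_put_contents("/some/path/publications.php", $php_code);

?>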
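One thing worth checking with generated code like this is that the data comes back unchanged after being embedded in a single-quoted string. Here is a minimal sketch of that round trip. The sample row and the temporary file are assumptions for illustration, not part of our real setup, and it uses addcslashes() to escape both backslashes and single quotes.

```php
<?php
// Hypothetical sample data standing in for rows from the publications table.
$pubs = array(
    1 => array("publication_id" => 1, "name" => "Bob's Deals", "path" => "C:\\pubs\\"),
);

// Escape backslashes and single quotes so the literal survives PHP's
// single-quoted string parsing unchanged.
$pubs_ser = addcslashes(serialize($pubs), "'\\");

$php_code = "<?php global \$PUBLICATIONS; \$PUBLICATIONS = unserialize('$pubs_ser'); ?>";

// Write to a temporary file (an assumption for this sketch) and load it back.
$file = tempnam(sys_get_temp_dir(), "pubs");
file_put_contents($file, $php_code);

include $file;
unlink($file);

// The included file should have rebuilt exactly the array we started with.
$round_trip_ok = ($PUBLICATIONS === $pubs);
echo $round_trip_ok ? "round trip ok\n" : "round trip failed\n";
```

Escaping only the single quote is not enough when a value contains a backslash, because PHP also collapses a doubled backslash inside single-quoted strings, and that changes the string length unserialize() expects.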
Now you can include the publications.php file and have a global variable named $PUBLICATIONS that holds the publication settings. But how do we load a single publication without knowing numeric IDs? Well, you could make some constants.
<?php

$sql = "select * from publications";
$res = $mysqli->query($sql);

$pubs = array();
$constants = array();
while($row = $res->fetch_assoc()){
    $pubs[$row["publication_id"]] = $row;
    // Constant name for this publication, e.g. DEALNEWS.
    $constants[$row["publication_id"]] = strtoupper($row["name"]);
}

// Escape backslashes and single quotes for the single-quoted literal below.
$pubs_ser = addcslashes(serialize($pubs), "'\\");

$php_code = "<?php\n";
$php_code.= "global \$PUBLICATIONS;\n";
$php_code.= "\$PUBLICATIONS = unserialize('$pubs_ser');\n";

// One define() per publication so callers can use names instead of ids.
foreach($constants as $id=>$const){
    $php_code.= "define('$const', $id);\n";
}

$php_code.= "?>";

file_put_contents("/some/path/publications.php", $php_code);

?>
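The code above assumes every publication name is already usable as a constant name once it is uppercased. If a name can contain spaces or punctuation, something like the following could clean it up first. This is only a sketch: pub_constant_name() and the PUB_ prefix are hypothetical names made up for this example, and it does not guard against two names collapsing to the same constant.

```php
<?php
// Hypothetical helper: turn a publication name into a valid constant name.
function pub_constant_name($name) {
    // Uppercase, then collapse every run of other characters into "_".
    $const = preg_replace('/[^A-Z0-9]+/', '_', strtoupper($name));
    $const = trim($const, '_');
    // Constant names cannot be empty or start with a digit, so add a prefix.
    if ($const === '' || ctype_digit($const[0])) {
        $const = 'PUB_' . $const;
    }
    return $const;
}

echo pub_constant_name("dealnews"), "\n";       // DEALNEWS
echo pub_constant_name("Deal News (UK)"), "\n"; // DEAL_NEWS_UK
echo pub_constant_name("1 Day Sales"), "\n";    // PUB_1_DAY_SALES
```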

So, now, we have constants. We can do stuff like:
<?php

//load a publication
require_once "publications.php";

echo $PUBLICATIONS[DEALNEWS]["name"];

?>
But, how about autoloading? It would be nice if I could just autoload the constants.
<?php

$sql = "select * from publications";
$res = $mysqli->query($sql);

$pubs = array();
$constants = array();
while($row = $res->fetch_assoc()){
    $pubs[$row["publication_id"]] = $row;
    $constants[$row["publication_id"]] = strtoupper($row["name"]);
}

// Escape backslashes and single quotes for the single-quoted literal below.
$pubs_ser = addcslashes(serialize($pubs), "'\\");

$php_code = "<?php\n";
$php_code.= "class PUB_DATA {\n";

// One class constant per publication, e.g. PUB_DATA::DEALNEWS.
foreach($constants as $id=>$const){
    $php_code.= "    const $const = $id;\n";
}

// Protected (not private) so the extending class can read it.
$php_code.= "    protected \$pubs_ser = '$pubs_ser';\n";
$php_code.= "}\n";
$php_code.= "?>";

file_put_contents("/some/path/pub_data.php", $php_code);

?>
Then we create a class in our autoloading directory that extends that generated class.
<?php

require_once "pub_data.php";

class Publication extends PUB_DATA {

    private $pub;

    public function __construct($pub_id) {
        // $pubs_ser is the protected property from the generated base class.
        $pubs = unserialize($this->pubs_ser);
        $this->pub = $pubs[$pub_id];
    }

    public function __get($var) {
        if(isset($this->pub[$var])){
            return $this->pub[$var];
        } else {
            throw new Exception("Unknown publication property: $var");
        }
    }

}

?>
Great, now we can do things like:
$pub = new Publication(Publication::DEALNEWS);

echo $pub->name;
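For anyone who has not set up autoloading before, here is a rough sketch of how the pieces could fit together. Everything about the layout is an assumption made for the example: the temporary directory, the lowercase file naming, and the tiny stand-in versions of the two classes. Your autoloader probably maps class names to files differently.

```php
<?php
// Sketch only: the directory layout and file naming below are assumptions.
$dir = sys_get_temp_dir() . "/autoload_demo_" . getmypid();
if (!is_dir($dir)) {
    mkdir($dir);
}

// Stand-in for the generated base class.
$pubs_ser = addcslashes(serialize(array(1 => array("name" => "dealnews"))), "'\\");
file_put_contents("$dir/pub_data.php",
    "<?php\nclass PUB_DATA {\n    const DEALNEWS = 1;\n    protected \$pubs_ser = '$pubs_ser';\n}\n");

// Stand-in for the hand-written class that extends it.
file_put_contents("$dir/publication.php", <<<'PHP'
<?php
class Publication extends PUB_DATA {
    private $pub;
    public function __construct($pub_id) {
        $pubs = unserialize($this->pubs_ser);
        $this->pub = $pubs[$pub_id];
    }
    public function __get($var) {
        return $this->pub[$var];
    }
}
PHP
);

// Map a class name to a lowercase file name in $dir and load it if present.
spl_autoload_register(function ($class) use ($dir) {
    $file = $dir . "/" . strtolower($class) . ".php";
    if (is_file($file)) {
        require_once $file;
    }
});

// Neither file has been included by hand; the autoloader pulls in both.
$pub = new Publication(Publication::DEALNEWS);
$pub_name = $pub->name;
echo $pub_name, "\n"; // dealnews
```

With an autoloader like this in place, the explicit require_once in the Publication class becomes optional, because PHP asks the autoloader for PUB_DATA when it sees the extends clause.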
The only problem that remains is getting the generated code to all your servers. We use rsync. It works quite well. You may have a different solution for your team. Back when we ran our own in-house ad server, we did all the ad work this way. None of the ad calls ever hit the database to get ads. We stored stats on disk in logs and processed them on a schedule. It was a very solid solution.

One more benefit of using generated files on disk is that they can be cached by an opcode cache such as APC or XCache. This means you don't have to actually hit disk for them all the time.
17 comments

Mohsen Heshmati Says:

Hi,

Good topic to put your hands on. I do the same thing except I do it with memcache. When you have a distributed memcache installation they will synchronize over the whole network.

I just need to know if your version of writing the information to disk is faster than that of using memcache or hitting the database. Some might have a good and stable database installation and might not need to make sacrifice on the hit but still need to know if it makes a difference reading from disk than using the database.

Can you please let us know of your experience in this regard?

Brian Moon Says:

Sure, memcached is going to be faster. But, memcached is volatile. We use memcached as well. Just not for this. The hit here is no more than any other PHP include.

Mats Lindh Says:

We actually use the same trick in one of our codebases (in fact with publications too.. weird). The difference is that we simply use var_export() after building the array and dumping that to the configuration file. We also use return ..() so that we can introduce the value properly in the current scope instead of relying on global magic stuff :-)

With APC this is a no-brainer, where you're able to have your configuration data parsed and ready to go in memory, without hitting the network or the disk.

David Zuelke Says:

Yes. Use var_export() and a return statement. Easier to use from the calling code (just a $cfg = include(...);), and unlike your solution, APC can cache it.
I would also try to avoid this constant hackery. Not a new trick, by the way; a few frameworks have done this for ages to compile config files.

Brian Moon Says:

@Mats and @David, there is a problem using var_export. Have you seen what your memory usage is like for those files? It's bad. See http://brian.moonspot.net/2007/02/28/big-arrays-in-php/.

As for new, I never claimed it was new. I started doing this in 1999. When did you start?

EllisGL Says:

If you had a large data set I would say use JSON instead of serialize. From a quick look, there's talk about igbinary and BSON (MongoDB uses BSON). I haven't seen a BSON benchmark and I haven't seen memory usage benchmarks, just time-based ones.

Sven Says:

Your last example won't work because PUB_DATA::pubs_ser is private and will not be inherited.

Brian Moon Says:

@Sven thanks, fixed. That is what happens when you create code for a blog post without running all of it.

Brian Moon Says:

Yes, JSON is another option. It is faster on large variables. However, like var_dump, you will lose references if you use them. Serialize keeps those intact.

Arnold Daniels Says:

Kind of nice, but why would you save the data in the database in the first place? You can also just serialize it to JSON, YAML, XML or whatever and load it from there.

I do have to say kudos for generating the base class. That is nice :).

Brian Moon Says:

@Arnold We use the data in other places with SQL joins and such internally. That makes our main database the official source of that information.

Chris Henry Says:

Code generation is a totally underrated tool, especially in situations where configuration for other software needs to be built out in a different language. For example, I use a bash script to build out the database values in a sphinx.conf file based on values in my PHP app. It's taken a lot of the pain out of versioning and maintaining my sandboxes. I also recommend it if you need to do lots of heavy lifting in apache.conf files. http://stackoverflow.com/questions/3181252/3181652
