Using GearmanManager to Run PHP Gearman Workers

Gearman is a job management system that has burst onto the scene in the last couple of years. It allows a developer to hand work off to other servers that may be more powerful or more capable in other ways. In addition, you can have multiple jobs done asynchronously. It is a wonderful system, but it is non-trivial to get it all set up and working.

There are three parts to Gearman. The first is the manager (Gearman is an anagram of manager, btw). It is a daemon that does little more than broker work between clients and workers. Its simplicity is beautiful. The next part is the client. This is also quite simple to deal with. You basically just create a connection to a Gearman daemon and ask for work to be done. It is not much harder than calling a function or using a class. The last part is the worker. This is the code that actually does all the work. While writing the meat of your worker is no harder than it would be if you did it inline, managing and running workers is a bit of work, especially for developers who work mostly in a web-based environment.

There are lots of simple examples of how to create a Gearman client and worker. The PHP documentation includes one. Rasmus has written one. But once everyone gets past the "Hello World!" examples of running Gearman, they start trying to figure out how to daemonize the workers. If you go by the examples, a worker would run as an independent process. Each process would do just one job at a time. So if you want to have lots of these, you have to have lots of independent processes running. Managing this is a bit of a chore. You could do all the work yourself and start them all up by hand. You could use something like Supervisor to manage all your processes. But you would still be left doing a lot of the heavy lifting of talking to the Gearman daemon in your PHP code. That code would be duplicated over and over in each worker you write. Not fun. So, to combat this, I wrote GearmanManager.
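To make the duplication concrete, here is a minimal sketch of a standalone worker written directly against the PECL GearmanWorker class. The function names, the job name reverse_string, and the host and port passed in are assumptions chosen for illustration; this is not code from GearmanManager.

```php
<?php
// The only job-specific part: reverse the string sent by the client.
// "reverse_string_job" is a hypothetical name chosen for this sketch.
function reverse_string_job($job) {
    return strrev($job->workload());
}

// Everything below is boilerplate that each standalone worker script
// would have to repeat. $worker is expected to be a GearmanWorker.
function run_standalone_worker($worker, $host, $port) {
    $worker->addServer($host, $port);
    $worker->addFunction('reverse_string', 'reverse_string_job');

    // One process, one job at a time, until work() fails.
    while ($worker->work()) {
        if ($worker->returnCode() !== GEARMAN_SUCCESS) {
            error_log('Worker stopped: ' . $worker->error());
            break;
        }
    }
}
```

With the gearman extension loaded and a daemon listening, each process would call something like run_standalone_worker(new GearmanWorker(), '127.0.0.1', 4730). Every additional job type needs its own copy of that setup and loop, plus something to keep the processes alive.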

GearmanManager handles all the process management, Gearman daemon communication and logging for you, and just lets you write the code that you need to write to get your work done. It does come with some trade-offs, but they are very small.

To use GearmanManager, you have to conform your code to how GearmanManager expects it to be architected. For each job type, there is a single function or class that lives in a single file in a specified directory. This was adopted from the PEAR::Gearman library originally written by Joe Stump for Digg. I found it helps keep my Gearman jobs organized. A worker file may look something like this.

$ cat fetch_url.php
<?php

function fetch_url($job, &$log) {

    // The workload is the URL the client asked us to fetch
    $workload = $job->workload();

    // filter_var() returns false when the workload is not a valid URL
    $url = filter_var($workload, FILTER_VALIDATE_URL);

    $result = false;

    if ($url) {

        $result = file_get_contents($url);

        if ($result) {
            $log[] = "Success";
        } else {
            $log[] = "Failed to fetch URL";
            $result = false;
        }

    } else {

        $log[] = "Invalid URL passed in";

    }

    return $result;

}

?>
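Because a worker in this style is a plain function that takes a job object and a log array, it can be exercised without a Gearman daemon. The sketch below assumes a hypothetical FakeJob class and run_worker_locally() helper, neither of which is part of GearmanManager, and uses a small reverse_string worker with the same ($job, &$log) signature so the example is self-contained.

```php
<?php
// Hypothetical stand-in for the job object: it only implements the
// workload() method that the worker functions in this article call.
class FakeJob {
    private $workload;

    public function __construct($workload) {
        $this->workload = $workload;
    }

    public function workload() {
        return $this->workload;
    }
}

// Call a worker function directly and return its result and log lines.
function run_worker_locally(callable $worker, $workload) {
    $log = array();
    $result = $worker(new FakeJob($workload), $log);
    return array('result' => $result, 'log' => $log);
}

// A small worker with the same ($job, &$log) signature as fetch_url.
function reverse_string($job, &$log) {
    $workload = $job->workload();
    $log[] = "Reversing " . strlen($workload) . " bytes";
    return strrev($workload);
}

$out = run_worker_locally('reverse_string', 'Hello');
echo $out['result'], "\n";              // olleH
echo implode("\n", $out['log']), "\n";  // Reversing 5 bytes
```

The same helper could be pointed at fetch_url after including fetch_url.php. A workload such as "not a url" fails FILTER_VALIDATE_URL, so the function returns false and logs "Invalid URL passed in" without touching the network.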

So, once GearmanManager is all set up, a worker file like fetch_url.php is all the code you have to write. Of course, you have to get it running. A simple command line for GearmanManager would look like this.

$ ./pecl-manager.php -c /path/to/config/file -w /path/to/worker/directory -l /path/to/log/file -h hostname:port

You will notice the script name here is pecl-manager.php. There are two Gearman libraries for PHP. One is PECL::Gearman and the other is PEAR::Gearman, which I mentioned before. GearmanManager supports both libraries. There is a script for each that takes care of the unique needs of each library. There are a few differences in how each works, but they are minor and mostly concern how the workers are written for PEAR::Gearman. If you already use PEAR::Gearman, you are familiar with how to write worker code for it. Just keep writing them the way you already do.

Nearly every configuration option on the command line is also configurable via the config file. Additionally, you can control how many of each type of worker are available for work. Here is an example config.

; Example advanced ini config
;
; The result of this config file will be 21 total workers
; 10 will do all jobs except Sum
; 5 will do Sum only because of the dedicated_count in the [Sum] section
; 5 will do fetch_url only because of count in the [fetch_url] section
; 1 will do only reverse_string because of dedicated_count in the main section

[GearmanManager]

; workers can be found in this dir
; separate multiple dirs with a comma
; multiple dirs only supported by pecl workers
worker_dir=./pecl-workers,./pecl-worker-classes

; All workers in worker_dir will be loaded
include=*

; 10 workers will do all jobs
count=10

; Each job will have minimum 1 worker
; that does only that job
dedicated_count=1

; Workers will only live for 1 hour
max_worker_lifetime=3600

; Reload workers as new code is available
auto_update=1

[reverse_string]
; We are guaranteed 3 workers that can do job reverse_string
count = 3

[Sum]
; There will be a minimum 5 workers that do only the Sum job
; and all those workers will be dedicated to the Sum job
dedicated_count=5
dedicated_only=1

[fetch_url]
; There will be a minimum of 15 workers that can do the fetch_url job:
; the 10 general workers plus 5 that do only fetch_url
count=15

As you can see in that config, you can have some processes that do multiple jobs and others that are dedicated to doing only certain jobs. Depending on your workload and your application, you may want to mix and match these options. For example, you may have some jobs that are run very rarely. You could group all of those jobs into a few processes so that you have more resources for the mission-critical jobs.

Another feature of GearmanManager is that it can monitor your worker code directory and restart the workers if and when the code changes. However, it will not monitor other code that your workers may include. Nor does it currently monitor the directory for new worker files.
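One common way to detect this kind of change is to compare file modification times between two scans of the worker directory. The sketch below illustrates that idea under stated assumptions; it is not GearmanManager's actual implementation, and the function names snapshot_worker_dir() and worker_code_changed() are hypothetical.

```php
<?php
// Map each .php file directly inside $dir to its last-modified time.
function snapshot_worker_dir($dir) {
    clearstatcache(); // filemtime() results are cached per request
    $files = glob(rtrim($dir, '/') . '/*.php') ?: array();
    $snapshot = array();
    foreach ($files as $file) {
        $snapshot[$file] = filemtime($file);
    }
    return $snapshot;
}

// True when a file present in the first snapshot now has a newer mtime.
// Files that only appear in the second snapshot are ignored on purpose,
// mirroring the limitation described above.
function worker_code_changed(array $before, array $after) {
    foreach ($before as $file => $mtime) {
        if (isset($after[$file]) && $after[$file] > $mtime) {
            return true;
        }
    }
    return false;
}
```

A supervising process could take a snapshot at startup, take another every few seconds, and restart its workers whenever worker_code_changed() returns true. Files pulled in by the workers with include or require live outside the scanned list, which is why edits to them go unnoticed in a scheme like this.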

If you have jobs that occasionally consume a lot of memory and are concerned about them causing memory issues, you can configure the workers to shut down after a given amount of time or after they have performed a certain number of jobs. These are global options that apply to all workers, not only to certain job types.
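The retirement rule itself is simple to express. Here is a minimal sketch, assuming a hypothetical should_retire() helper that a worker loop would call after each job; the parameter names are illustrative and are not GearmanManager option names.

```php
<?php
// Decide whether a worker process should exit so that a fresh one can
// replace it. A limit of 0 disables that particular check.
function should_retire($startedAt, $jobsDone, $maxLifetime, $maxJobs, $now) {
    if ($maxLifetime > 0 && ($now - $startedAt) >= $maxLifetime) {
        return true; // lived long enough
    }
    if ($maxJobs > 0 && $jobsDone >= $maxJobs) {
        return true; // handled enough jobs
    }
    return false;
}

$started = 1000;
var_dump(should_retire($started, 10, 3600, 0, $started + 3599)); // bool(false)
var_dump(should_retire($started, 10, 3600, 0, $started + 3600)); // bool(true)
var_dump(should_retire($started, 50, 0, 50, $started + 5));      // bool(true)
```

Exiting between jobs rather than in the middle of one means any memory the process accumulated is returned to the system without interrupting work in progress.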

There are other features in GearmanManager. You can check it out on GitHub at https://github.com/brianlmoon/GearmanManager. It is licensed under a BSD license. As always with OSS, pull requests are welcome. If you have any questions, feel free to ask them on the Gearman mailing list.