Monitoring PHP Errors

Mon, Nov 8, 2010 08:00 AM
PHP errors are just part of the language. Some internal functions throw warnings or notices that seem unavoidable. A good example is parse_url. The point of parse_url is to take apart a URL and tell me the parts. Until recently, the only way to validate a URL was a regex. You can now use filter_var with the FILTER_VALIDATE_URL filter. But, in the past, I would use parse_url to validate a URL. It worked because the function returns false if the value is not a URL. The catch is that if you give parse_url something that is not a URL, it also throws a PHP Warning. The result is I would use the evil @ to suppress the warnings from parse_url. Long story short, you get errors on PHP systems. And you can't just ignore them.
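To illustrate, a quick sketch (the variable name is made up; this is not lifted from our code):

// The old way: parse_url() returns false for seriously malformed URLs,
// but it also emits an E_WARNING, hence the evil @ to keep the logs quiet.
$parts = @parse_url($maybe_url);
if (false === $parts) {
    // not a URL
}

// The newer way: filter_var() validates without emitting any warning.
if (false === filter_var($maybe_url, FILTER_VALIDATE_URL)) {
    // not a URL
}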

In the past, we just logged them and had the log emailed to us daily. On some irregular schedule we would look through them and fix things to make the code more solid. But when you start having lots of traffic, one notice on one line of code can generate 1,000 error messages in a short time. So, we had to find a better way to deal with them.

Step 1: Managing the aggregation

The first thing we did was modify our error handler to write human readable logs to the defined PHP error log and to write a serialized (json_encode, actually) version of each error to a second file. This second file makes parsing the error data super easy. Now we can aggregate the logs on a schedule, parse them and send reports as needed. We monitor fatal errors more aggressively than non-fatal errors. We also get daily summaries. The one gotcha is that PHP fatal errors do not get sent to the error handler. So, we still have to parse the human readable file for fatal errors. A next step may be to aggregate the logs to a central server via syslog or something. Not sure where I want to go with that yet.
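For illustration, a stripped-down version of what such a handler could look like (the file path and field names are invented for the example, not our actual code):

// Log a human readable line to the normal PHP error log and a
// machine readable JSON line to a second file.
function dual_log_error_handler($errno, $errstr, $errfile, $errline)
{
    error_log("PHP error [$errno]: $errstr in $errfile on line $errline");

    $entry = json_encode(array(
        'time'    => time(),
        'type'    => $errno,
        'message' => $errstr,
        'file'    => $errfile,
        'line'    => $errline,
    ));
    file_put_contents('/var/log/php_errors.json', $entry . "\n", FILE_APPEND);

    return true; // we logged it ourselves, so skip PHP's internal handler
}
set_error_handler('dual_log_error_handler');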

Step 2: Monitoring

Even with the logging, I had no visibility into how often we were logging errors. I mean, I could grep the logs, do some scripting to calculate dates, etc. Bleh, lots of work. We recently started using Circonus for all sorts of metric tracking. So, why not PHP errors? Circonus has a data format that lets us send it any arbitrary metrics we want to track. It will then graph them for us. So, in my error handler, I just increment a counter in memcached every time I log an error. When Circonus polls us for data, I grab that value out of memcached and hand it over. I can then have really cool graphs that show me my error rate. In addition, I can set up alerts. If we start seeing more than 1 error per second, we get an email. If we start seeing more than 5 per second, we get paged. You get the idea.
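Roughly like this (the key name and server details are placeholders showing the shape of the idea, not our exact code):

// In the error handler: bump a counter in memcached.
$mc = new Memcached();
$mc->addServer('127.0.0.1', 11211);
if (false === $mc->increment('php_error_count')) {
    // increment() fails if the key does not exist yet
    $mc->set('php_error_count', 1);
}

// In the page Circonus polls: hand the current count back.
echo (int) $mc->get('php_error_count');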

The Twitterverse:

I polled my friends on Twitter and two different services were mentioned. I don't know that I would actually send my errors to a service, but it sounds like an interesting idea. One was Hoptoad. It appears to be specifically for application error handling. The other was Loggly, which appears to be a more generic log service. Both are very interesting concepts. The control freak in me would have issues with them. But, that is a personal problem.

Most others either manually reviewed things or had some similar notification system in place. Anyone else have a brilliant solution to monitoring PHP errors? Or any application level errors for that matter?

P.S. For what it is worth, our PHP error rate is about 0.004 per second. This includes some notices. That is way too high for me. I want to approach absolute zero. Let's see if all these new tools can help us get there.
15 comments

troelskn Says:

Exceptional is another option - www.getexceptional.com/


till Says:

Just tell the inner control freak to let go. Loggly is pretty awesome -- e.g. just do "graph PHP" in their webshell and you see how many notices, warnings, etc. you hit over the last 24 hours.

A lot of people I know push all errors into Hoptoad or getexceptional, but personally, I don't see any real value from what they offer. We've tried it too, but decided not to continue with it.

E.g. getexceptional seemed "dead" up until a week ago when they sent a newsletter and as for Hoptoad -- about a year ago or so their interface was rather clunky and it also required you to attend to each error by clicking "solved" or similar.

The problem with that is that IMHO a notice is not always important enough to be fixed right away. Sometimes you can't do much about it anyway and just need the data for debugging later.

At the rate we threw them in there at times, we got something like 1,000, and their grouping didn't always work (which led to too many events to attend to), and we also hit their server's limits a bunch of times.

All of a sudden your error page is slow, or you need to debug the error handling and add crazy asynchronous handling to scale it?

Anyway -- don't mean to put anyone down, but so far I'm a big fan of loggly because it helped me out already.

Generally, I found that (ab)using syslog seems to be the best.

After all, all you have to do is set error_log = syslog in your php.ini and off you go.
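That is, something like this in php.ini (assuming error logging is turned on at all):

log_errors = On
error_log = syslog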

Depending on how many servers you have, it's probably wise to centralize the logs too. And aside from PHP, you can collect all kinds of other logfiles with it without writing crazy wrappers to parse data.

The API is simple and already available.

So whenever we notice a hiccup in our performance monitoring, we just go into loggly and graph data for the same interval. So far I collect haproxy, nginx, php(-fpm)-related and PHP error logs.


Ilia Alshanetsky Says:

There is a simple solution that I found worked for me. I added an error handler that converted ALL (yes, even notices and E_STRICT) errors to fatal errors and generated an e-mail whenever an error occurred, to speed up resolution. Within days the code had no more errors, and the current rate of errors is something like 2 per month.
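A minimal sketch of that idea (the address is a placeholder, and mail() is just one way to send the notification):

// Treat every error, even an E_NOTICE, as fatal: mail it, then stop.
function strict_error_handler($errno, $errstr, $errfile, $errline)
{
    $msg = "PHP error [$errno]: $errstr in $errfile on line $errline";
    mail('dev@example.com', 'PHP error', $msg); // placeholder address
    exit(1);
}
error_reporting(E_ALL | E_STRICT);
set_error_handler('strict_error_handler');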


Edmar Says:

Our load processing is pretty light compared to "web scale", so the solution is rather simple too:

All error monitoring is done by reading syslog from a pipe. A shell script does that continuously, and if/when it detects something, it fires a PHP script that reads more entries for up to 60 seconds, summarizes a subject line, then formats and sends the email.

This way we get at most 1440 emails/server/day, which for us is acceptable, perhaps because we haven't fallen into that situation yet. We have had this in place since 2008.


Dado Says:

Fatal errors don't get to the error handler, but they do kick in the shutdown handler! So, you can do something like

// Runs at the end of every request; error_get_last() still works here,
// even after a fatal error has killed the request.
function shutdownHandler()
{
    $error = error_get_last();
    if (null !== $error) {
        // hand off to your existing custom error handler
        errorHandler($error['type'], $error['message'], $error['file'], $error['line'], debug_backtrace(false));
    }
}
register_shutdown_function('shutdownHandler');


Markus Wolff Says:

You could also have a look at http://www.graylog2.org/, which aggregates log messages in a MongoDB database and has a nice web interface to access error statistics.


Jason Dixon Says:

@till - Loggly is awesome for searching through your log data. But you still need something to keep an eye on the data and let you know when something is amiss. For what it's worth, we're adding support in Circonus for pulling Search queries out of Loggly. This will allow you to easily trend and alert on any of your log data.


Caio Says:

I agree with Ilia Alshanetsky; I really don't see what the fuss about monitoring errors is. If you have any at all, including, as mentioned, notices, deprecated warnings or whatever (thus, E_ALL), then you have something to work on, and the BEST time to fix it is the FIRST time it happens. So I just set the error level to E_ALL, log them all to me (I don't abort or anything, but I do get them mailed to me), and fix them right away. Fatal or parse errors that won't get handled by the error handler are dumped into my err_log, which I check daily.

As with Ilia Alshanetsky, my error rate is near zero.

IMHO, if you need to "organize" your log errors, then you need to start fixing them instead of losing time "organizing" them and putting them into fancy graphics ...


Brian Moon Says:

@Caio What kind of scale are you dealing with? How do you deal with transient errors? i.e. errors from connect functions in PHP that are not due to application error, but due to infrastructure issues? Do you get spammed when a daemon is down?


Artur Ejsmont Says:

I, on the other hand, have a similar approach to Brian's. Logs are saved with all the additional data like stack traces etc. and then indexed in the db with custom categorization rules.

If you have two dozen developers working non-stop on tough deadlines, with code accumulating for 5 years, there is no way you can just fix all the warnings and notices : ) Even though we try to gain some ground every now and then, it's still a losing battle with everything else being a priority :)

The idea of pulling application and apache error log size into cacti is also a good one in my opinion.

cheers

Art


Sergey Says:

This is not a very good idea, because:

1. set_error_handler(...) does not work with fatal errors
2. When a fatal error happens, all Memcache & DB connections are already closed, so you can't log fatal errors to them.

I recommend using http://code.google.com/p/lagger/ - IMHO this is the best solution for PHP to log errors/exceptions/debugging.
