Wed, Jan 12, 2011 01:38 PM
I have made a small change to vlualogger to flush the file handle after every write. I discovered that Lua was not flushing writes while I was tailing the log file to do some debugging. This fixes that issue.
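The change amounts to one extra call after each write so that tools like tail -f see the data immediately. Roughly (logfile is an illustrative name, not necessarily what the script uses):
logfile:write(line, "\n")
logfile:flush()  -- without this, Lua buffers the output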
Also see my original post about vlualogger.
Sat, Mar 13, 2010 02:41 AM
By default, most distributions use logrotate to rotate Apache logs. Or worse, they don't rotate them at all. I find the idea of a cron job restarting my web server every night to be very disturbing. So, years ago, we started using cronolog. Cronolog separates logs using a date/time picture. So, you get nice logs per day.
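If you have not seen it, you use cronolog by piping your Apache log to it with a pattern in the file name. Something like this (the path is just an example) gets you one access log per day:
CustomLog "|/usr/sbin/cronolog /var/log/apache/%Y/%m/%d/access.log" combined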
But, what if you are running 5 or 6 virtual hosts on the server? Do you really want all those logs in one file? You might. But, I don't. So, we ended up running a cronolog command per virtual host. At one time, this was 10 cronolog processes. They are tiny, using only about 500k of resident memory each when running. But still, it seemed like a waste. Enter vlogger. Vlogger could take a virtual host name in its file name picture. And it would create the directories if they did not exist. So, now, we could have logs separated by virtual host and date. All was good.
But, vlogger has not been updated in a while. It started spitting out errors right into my access logs, and I could not find a solution. The incoming log data had not changed, so my best guess is that some Perl library it depends on changed and broke it. So, here I am again with cronolog.
I decided I could just write one myself. So, I started thinking about the problem. It needs to be small. PHP would be a stupid choice; one PHP process would use more memory than 10 cronolog processes. I decided on Lua.
"Lua is a powerful, fast, lightweight, embeddable scripting language." It is also usable as a shell scripting language, which is what I needed. So, I got to hacking and came up with a script that does the job quite well. When running, it uses about 800k of resident memory. You can download the script here on my site.
vlualogger - 3.7k
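For the curious, the heart of it is just a loop over stdin. This is a simplified sketch rather than the real script; it assumes the virtual host name (%v in your LogFormat) is the first field of each line, and the log root path is only an example:
-- simplified sketch, not the actual script
local logroot = "/var/log/apache"            -- example path
local handles = {}                           -- open log files, keyed by path

for line in io.lines() do                    -- Apache pipes each log line to stdin
    -- assumes %v (the virtual host) is the first field of the line
    local vhost, rest = line:match("^(%S+)%s+(.*)$")
    if vhost then
        local dir  = logroot .. "/" .. vhost .. "/" .. os.date("%Y/%m/%d")
        local path = dir .. "/access.log"
        local f = handles[path]
        if not f then
            os.execute("mkdir -p " .. dir)   -- create the directory if needed
            f = io.open(path, "a")
            handles[path] = f
        end
        if f then
            f:write(rest, "\n")
            f:flush()                        -- flush so tailing the log works
        end
    end
end
That is the general idea: Apache pipes every line in, and the script fans them out by host and date.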
Sun, May 31, 2009 10:23 PM
My coworker Rob ran into an issue building the PECL/memcache extension on his Mac. He did find the solution, however. You can read and leave comments on his blog.
Fri, May 22, 2009 11:34 AM
First there was LAMP.
But are you using GLAMMP? You have probably not heard of it
because we just coined the term while chatting at work. You
know LAMP (Linux, Apache, MySQL and PHP or Perl and sometimes
Python). So, what are the extra letters for?
The G is for Gearman - Gearman is a system to farm out work
to other machines, dispatching function calls to machines that are
better suited to do work, to do work in parallel, to load balance
lots of function calls, or to call functions between languages.
The extra M is for Memcached - memcached is a
high-performance, distributed memory object caching system, generic
in nature, but intended for use in speeding up dynamic web
applications by alleviating database load.
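To make that concrete, here is roughly what those two look like from PHP, using the pecl/memcache and pecl/gearman extensions. The hosts, keys, and job names are made up for the example, and fetch_deals_from_database() is a stand-in for whatever your database layer is:
<?php
// cache-aside with memcached: check the cache before hitting the database
$cache = new Memcache();
$cache->connect('127.0.0.1', 11211);

$deals = $cache->get('front_page_deals');
if ($deals === false) {
    $deals = fetch_deals_from_database();            // made-up helper
    $cache->set('front_page_deals', $deals, 0, 300); // cache for 5 minutes
}

// farm slow work out to Gearman instead of doing it during the request
$client = new GearmanClient();
$client->addServer('127.0.0.1', 4730);
$client->doBackground('resize_images', serialize($deals));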
More and more these days, you can't run a web site on just
LAMP. You need these extra tools (or ones like them) to do
all the cool things you want to do. What other tools do we
need to work into the acronym? PostgreSQL replaces MySQL in lots
of stacks to form LAPP. I guess Drizzle may replace MySQL in some stacks
soon. For us, it will likely be added to the
stack. Will that make it GLAMMPD? We need more
vowels! If you are starting the next must-use tool for
running web sites on open source software, please use a vowel for
the first letter.
Tue, Apr 7, 2009 08:03 PM
For our development servers, we have always used output buffering
to replace the URLs (dealnews.com) with the URL for that
development environment. Where we run into problems is with
CSS and JavaScript. If those files contain URLs for images (CSS) or AJAX calls (JS), the URLs would not get replaced. Our
solution has been to parse those files as PHP (on the dev boxes
only) and have some output buffering replace the URLs in those
files. That has caused various problems over the years and
even some confusion for new developers. So, I got to looking
for a different solution. Enter mod_substitute
for Apache 2.2.
mod_substitute "provides a mechanism to perform both regular expression and fixed string substitutions on response bodies." - Apache Documentation
Cool! I put in the URL mappings and voilà! All was right in the world.
Fast forward a day. Another developer is testing some new
code and finds that his XML is getting munged. At first we
blamed libxml because we had just been through an ordeal with a bad
combination of a libxml compile option and PHP a while back.
Maybe we missed that box when we fixed it. We recompiled
everything on the dev box, but there was no change. So I started to think about what had changed recently on the dev boxes. I turned off mod_substitute. Dang, that fixed it. I looked at my substitution strings and everything looked
fine. After cursing and being depressed that such a cool tool
was not working, I took a break to let it settle in my mind.
I came back to the computer and decided to try a virgin Apache 2.2
build. I downloaded the source from the web site instead of
building from Gentoo's Portage. Sure enough, a simple test
worked fine. No munging. So, I loaded up the dev box
Apache configuration into the newly compiled Apache. Sure
enough, munged XML. ARGH!!
Up until this point, I had configured the substitutions globally
and not in a particular virtual host. So, I moved it all into
one virtual host configuration. Still broken.
A little more background on our config. We use mod_proxy to
emulate some features that we get in production with our F5 BIG-IP
load balancers. So, all requests to a dev box hit a mod_proxy
virtual host and are then directed to the appropriate virtual host
via a proxied request.
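In rough terms, the setup looks something like this. The ports and host names here are made up for illustration, not our actual configuration (it also assumes a matching Listen directive for the backend port):
# the front virtual host, standing in for the BIG-IP
<VirtualHost *:80>
    ServerName somedevbox.dealnews.com
    ProxyPreserveHost On
    ProxyPass        / http://127.0.0.1:8080/
    ProxyPassReverse / http://127.0.0.1:8080/
</VirtualHost>

# the backend virtual host, listening on its own port
<VirtualHost *:8080>
    ServerName somedevbox.dealnews.com
    DocumentRoot /var/www/dealnews
</VirtualHost>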
So, I got the idea to hit the virtual host directly on its port and
skip mod_proxy. Dang, what do you know. It worked
fine. So, something about the output of the backend request
and mod_proxy was not playing nice. So, hmm. I got the idea to move the mod_substitute directives into the mod_proxy virtual host's configuration. Tested and working fine.
So, basically, this ensures that the substitution filtering is done
only after the proxy and all other requests have been
processed. I am no Apache developer, so I have not dug any
deeper. I have a working solution and maybe this blog post
will reach someone that can explain it. As for
mod_substitute, here is the way my config looks.
In the VirtualHost that is our global proxy, I have this:
FilterDeclare DN_REPLACE_URLS
FilterProvider DN_REPLACE_URLS SUBSTITUTE resp=Content-Type $text/
FilterProvider DN_REPLACE_URLS SUBSTITUTE resp=Content-Type $/xml
FilterProvider DN_REPLACE_URLS SUBSTITUTE resp=Content-Type $/json
FilterProvider DN_REPLACE_URLS SUBSTITUTE resp=Content-Type $/javascript
FilterChain DN_REPLACE_URLS
Elsewhere, in a file that is local to each dev host, I keep the
actual mappings for that particular host:
Substitute "s|http://dealnews.com|http://somedevbox.dealnews.com|in"
Substitute "s|http://dealmac.com|http://somedevbox.dealmac.com|in"
# etc....
I am trying to think of other really cool uses for this. Any
ideas?
Wed, Oct 3, 2007 09:53 PM
This has been covered before, but I was just setting up a new ForceType rule on our servers and thought I would mention it for the fun of it. You see lots of stuff about using mod_rewrite to make friendly or SEO-friendly URLs. But, if you are using PHP (and I guess other Apache modules) you can do it without mod_rewrite. We have been doing this for a while at dealnews, even before SEO was an issue.
Setting up Apache
From the docs, the ForceType directive "forces all matching files to be served as the content type given by media type." Here is an example configuration:
<Location /deals>
ForceType application/x-httpd-php
</Location>
Now any URL like http://dealnews.com/deals/Cubicle-Warfare/186443.html will attempt to run a file called deals that is in your document root.
Making the script
First, save a file called deals without the .php extension. Modern editors will look for the <?php tag at the top of the file and highlight it correctly. Normally you take input to your PHP scripts with the $_SERVER["QUERY_STRING"] or the $_GET variables. But, in this case, those are not filled by the URL above. They will still be filled if there is a query string, but the path part is not included. We need to use $_SERVER["PATH_INFO"]. In the case above, $_SERVER["PATH_INFO"] will be filled with /Cubicle-Warfare/186443.html. So, you will have to parse the data yourself. In my case, all I need is the numeric ID toward the end.
$id = (int)basename($_SERVER["PATH_INFO"]);
Now I have an id that I can use to query a database or whatever to get my content.
Avoid "duplicate content"
The bad part of my use case is that any URL that starts with /deals/ and ends in 186443.html will work. So, now we have duplicate content on our site. You may have a more exact URL pattern and not have this issue. But, to work around this in my case, we should verify that the $_SERVER["PATH_INFO"] is the proper data for the content requested. This code will vary depending on your URLs. In my code, I generate the URL for the content and see if it matches. Kind of a reverse lookup on the URI. If it does not match, I issue a 301 redirect to the proper location.
header("HTTP/1.1 301 Moved Permanently");
header("Location: $new_url");
exit();
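Put together, the check looks something like this. get_deal_url() is a made-up helper here, standing in for however you build the canonical URL for a piece of content:
// build the canonical URL for this id and compare it to what was requested
$real_url = get_deal_url($id);   // e.g. "/deals/Cubicle-Warfare/186443.html"

// compare only the path so a query string does not trigger a redirect
if (parse_url($_SERVER["REQUEST_URI"], PHP_URL_PATH) !== $real_url) {
    // wrong slug or extra junk in the URL, send the client to the right place
    header("HTTP/1.1 301 Moved Permanently");
    header("Location: http://dealnews.com".$real_url);
    exit();
}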
Returning 404
Now, you have to be careful to always return meaningful data when using this technique. Search engines won't like you if you return status 200 for every possible random URL that falls under /deals. I know that Yahoo! will put random things on your URLs to see if you are doing the right thing. So, if you get your id and decide this is not a valid URL, you can return a 404. In my case, I have a 404 file in my document root. So, I just send the proper headers and include my regular 404 page.
header('HTTP/1.1 404 Not Found');
header('Status: 404 Not Found');
include $_SERVER["DOCUMENT_ROOT"]."/404.html";
exit();