mod_substitute is cool. But, be careful with mod_proxy

Tue, Apr 7, 2009 09:03 PM
For our development servers, we have always used output buffering to replace the production URLs (dealnews.com) with the URL for that development environment. Where we run into problems is with CSS and JavaScript. If those files contain URLs for images (CSS) or AJAX (JS), the URLs would not get replaced. Our solution has been to parse those files as PHP (on the dev boxes only) and have some output buffering replace the URLs in those files. That has caused various problems over the years and even some confusion for new developers. So, I got to looking for a different solution. Enter mod_substitute for Apache 2.2.
mod_substitute provides a mechanism to perform both regular expression and fixed string substitutions on response bodies. - Apache Documentation
Cool! I put in the URL mappings and voilà! All was right in the world.

Fast forward a day. Another developer is testing some new code and finds that his XML is getting munged. At first we blamed libxml, because a while back we had been through an ordeal with a bad combination of a libxml compile option and PHP. Maybe we missed that box when we fixed it. We recompiled everything on the dev box, but there was no change. So I started to think about what had recently changed on the dev boxes, and I turned off mod_substitute. Dang, that fixed it. I looked at my substitution strings and everything looked fine. After cursing and being depressed that such a cool tool was not working, I took a break to let it settle in my mind.

I came back to the computer and decided to try a virgin Apache 2.2 build.  I downloaded the source from the web site instead of building from Gentoo's Portage.  Sure enough, a simple test worked fine.  No munging.  So, I loaded up the dev box Apache configuration into the newly compiled Apache.  Sure enough, munged XML.  ARGH!!

Up until this point, I had configured the substitutions globally and not in a particular virtual host.  So, I moved it all into one virtual host configuration.  Still broken.

A little more background on our config.  We use mod_proxy to emulate some features that we get in production with our F5 BIG-IP load balancers.  So, all requests to a dev box hit a mod_proxy virtual host and are then directed to the appropriate virtual host via a proxied request. 
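
For anyone who has not seen this kind of setup, a front-end proxy virtual host might look roughly like the sketch below. The port number and the ProxyPreserveHost setting are assumptions for illustration, not the actual dealnews configuration, and the sketch assumes mod_proxy and mod_proxy_http are loaded.

```apache
# Hypothetical front-end virtual host: every request to the dev box lands here first.
<VirtualHost *:80>
    ServerName somedevbox.dealnews.com

    # Pass the original Host header through to the backend (assumed setting).
    ProxyPreserveHost On

    # Forward everything to the real virtual host listening on its own port.
    # Port 8081 is a made-up example.
    ProxyPass        / http://127.0.0.1:8081/
    ProxyPassReverse / http://127.0.0.1:8081/
</VirtualHost>
```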

So, I got the idea to hit the virtual host directly on its port and skip mod_proxy. Dang, what do you know. It worked fine. So, something about the output of the backend request and mod_proxy was not playing nice. I then got the idea to move the mod_substitute directives into the mod_proxy virtual host's configuration. Tested and working fine. Basically, this ensures that the substitution filtering is done only after the proxied request and everything behind it has been processed. I am no Apache developer, so I have not dug any deeper. I have a working solution, and maybe this blog post will reach someone who can explain it. As for mod_substitute, here is the way my config looks.

In the VirtualHost that is our global proxy, I have this:

FilterDeclare DN_REPLACE_URLS
FilterProvider DN_REPLACE_URLS SUBSTITUTE resp=Content-Type $text/
FilterProvider DN_REPLACE_URLS SUBSTITUTE resp=Content-Type $/xml
FilterProvider DN_REPLACE_URLS SUBSTITUTE resp=Content-Type $/json
FilterProvider DN_REPLACE_URLS SUBSTITUTE resp=Content-Type $/javascript
FilterChain DN_REPLACE_URLS
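
The FilterProvider lines above use the Apache 2.2 form, where a leading $ means a substring match against the response header. In Apache 2.4, mod_filter takes an expression instead, so a roughly equivalent declaration might look like the sketch below. It is untested against the setup described here and folds the four substring checks into one regular expression, so treat it as a starting point only.

```apache
# Apache 2.4 sketch: one expression covering the same four Content-Type substrings.
FilterDeclare DN_REPLACE_URLS
FilterProvider DN_REPLACE_URLS SUBSTITUTE "%{CONTENT_TYPE} =~ m#text/|/xml|/json|/javascript#"
FilterChain DN_REPLACE_URLS
```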


Elsewhere, in a file that is local to each dev host, I keep the actual mappings for that particular host:

Substitute "s|http://dealnews.com|http://somedevbox.dealnews.com|in"
Substitute "s|http://dealmac.com|http://somedevbox.dealmac.com|in"
# etc....
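
On the flags: i makes the match case-insensitive and n treats the pattern as a fixed string rather than a regular expression, which is why the dots in the host names need no escaping. One way to pull a per-host mapping file into the proxy virtual host is an Include directive. The file path and the port below are hypothetical examples, not the real layout.

```apache
<VirtualHost *:80>
    ServerName somedevbox.dealnews.com

    # ... proxy and filter directives for this host go here ...

    # Hypothetical path to the per-host file holding the Substitute lines.
    Include /etc/apache2/dev/substitutions.conf
</VirtualHost>
```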


I am trying to think of other really cool uses for this.  Any ideas?
8 comments

Olly Says:

The only clean approach to this problem is, imho, to set a constant/config variable (however your stuff works) to the correct host, based on the environment you're in (probably via an Apache/server-wide environment var?).
This way, everything just works, without the use of any uncontrollable regexes.

But there might be a good reason why you've got the host/domain hardcoded?

Brian Moon Says:

Yeah, we host our CSS and JS on a CDN. So, the host serving the content (dealnews.com) is not the same as the CDN for CSS and JS (content.dealnews.com). And that is even different than our images host (images.dealnews.com) that is also on the CDN, but on its own domain for some very complicated reasons dealing with the CDN.

We do have settings files within the PHP code that set the proper domain for use in PHP scripts. The real problem was how to handle a CSS file hosted on content.dealnews.com that references an image on images.dealnews.com, and then how to make that URL somedevserver.images.dealnews.com when being served on a dev server.

mod_substitute is working fine now. Just had to move it all into the very front end request and not have it working behind mod_proxy.

Stuart Herbert Says:

Hi Brian,

Just wondering ... instead of having to rewrite the URLs on the dev environment, have you tried simply overriding your DNS by putting entries for dealnews.com et al in your local hosts file, and having those entries point at the right IP addresses for your dev boxes?

Best regards,
Stu

Brian Moon Says:

@Stuart: Well, there are 5 dev instances. And different people in the company need to access all/some of them. So, DNS does not really help.

Josh Says:

Has anyone had success with using this format to handle separate Substitutions for different content types in the same Location?

Example:
<Location /testServer>
FilterDeclare DN_REPLACE_URLS
FilterProvider DN_REPLACE_URLS SUBSTITUTE resp=Content-Type $text/
FilterChain DN_REPLACE_URLS
Substitute s/www.test.com/www.testServer.com/i

FilterDeclare DN_REPLACE_URLS2
FilterProvider DN_REPLACE_URLS2 SUBSTITUTE resp=Content-Type $/javascript
FilterChain DN_REPLACE_URLS2
Substitute s/www.testJava.com/www.testServerJava.com/i
</Location>

It seems to me that both substitutions get performed on both content-types rather than the first one for text/html and the other for javascript.

Tin Says:

Hi Brian,

Your post really helped me with getting FilterProvider and Substitute working for my own project.

I used it to fix a problem with a vendor application that hard-codes its URLs, while the company wants to have multiple virtual hosts to segregate different parts of the vendor application.

Kevin Wilson Says:

@Josh
If your Content-Type is text/javascript, both filters will match and fire.

Kayvan Says:

I have posted a blog entry on how to do substitution using mod_substitute, which can be found here:
http://www.objectmasters.com/Blog/Post/How-to-Inject-html-in-outgoing-responses-using-Apache-Substitute-module
