MemProxy 0.1 is out! It has taken me a while, but I have finally gotten around to releasing the code that I credited with saving us during a Yahoo! front page mention. It is a caching proxy "server" that uses memcached for storing the cache. I put server in quotes because it is really just a PHP script that handles the caching and talks to the application servers. Apache and other HTTP servers already do a good job of talking HTTP to a myriad of clients. I did not see any reason to reinvent the wheel. Here are some of the features that make it different from anything I could find:
Uses memcached for storage
Serves cache headers to clients based on TTL of cached data
Uses custom headers to assemble multiple pieces of cache into one object
Minimal dependencies. Only PHP and pecl/memcached needed.
Small code base. It is just two files (only one once settings are cached).
Application agnostic. If the backend is hosted on an HTTP server this can cache it.
Some other things it does that you might expect:
Handles HTTP 1.1 requests to the backend
Allows TTLs set by the standard Cache-Control header
Appears transparent to the client.
Sends proper HTTP error codes relating to proxies/gateways
Allows pages to be refreshed or removed from cache
Allows a page to be viewed from the application server without caching it
You can find the code on Google Code. The code (or rather, something like it) has been in use at dealnews for well over a year. But this is a new code base; it had to be refactored for public consumption, so there may be bugs.
SimpleXML is neat. Some people don't think it is so simple. Boy, they should try the old DOM-XML stuff.
Anyhow, one annoying thing about SimpleXML has to do with caching. When using web services, we often cache the content we get back. We were caching the data in memcached, which serializes the variable. When the variable was unserialized, there were references in it to SimpleXML nodes that we had not taken care of, and we would get an error about a SimpleXML node not existing. Basically, a tag with content in it (say, <foo>bar</foo>) is a string. But an empty tag (say, <foo/>) is an empty SimpleXML object. That is a little annoying, but I don't feel like digging into the C code and figuring out why. So, we just work around it. We made a recursive function to do the dirty work for us.
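The original function is not shown here, but a minimal sketch of such a recursive helper might look like this (the name `xml2array` and its exact behavior are my assumptions, not the original dealnews code):

```php
<?php
// Hypothetical sketch of the recursive cleanup helper described above.
// It walks a SimpleXML tree and returns plain strings and arrays, so
// the result can be safely serialized (e.g. into memcached).
function xml2array($data)
{
    if ($data instanceof SimpleXMLElement) {
        if (count($data->children()) == 0) {
            // Leaf node: an empty tag becomes "" and a text node
            // becomes its string value.
            return (string) $data;
        }
        // Cast to an array of child nodes and recurse into them.
        $data = (array) $data;
    }
    if (is_array($data)) {
        foreach ($data as $key => $value) {
            $data[$key] = xml2array($value);
        }
    }
    return $data;
}
```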
That will turn whatever you pass it into an array or empty string if it is empty.
But, while I was hacking around tonight, I came up with another idea. Check out this hackery:
$data = json_decode(json_encode($data));
Yeah! A one-liner. That converts all the SimpleXML elements into stdClass objects. Scalar values are left intact (though note that associative arrays will also come back as stdClass objects after the round trip).
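For illustration (the element names here are made up), the trick in action:

```php
<?php
// Demonstration of the one-liner above. The JSON round trip turns
// every SimpleXMLElement into a stdClass object, which serializes
// and unserializes cleanly.
$xml  = simplexml_load_string('<deal><title>Widget</title><note/></deal>');
$data = json_decode(json_encode($xml));

// $data is now plain stdClass all the way down, safe to cache:
$cached = unserialize(serialize($data));
```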
Ok, so this is where someone in the comments can tell me about the magic SimpleXML method or magic OOP function I have missed to take care of all this. Go ahead, please make my code faster. I dare you.
I have always had issues with PHP sessions. Admittedly, a lot of my issues are now moot. When sessions were first implemented, they had lots of problems. Then the $_SESSION variable came along and it was better. Then memcached came along and you could store sessions there. That was better still. But, after all this time, there is one issue that still bugs me.
When you start a session, if the user has no cookie, they get a new session id and a cookie. You can configure that cookie to last for n seconds via php.ini or session_set_cookie_params(). But, and this is a HUGE but for me, that cookie will expire n seconds after it was set, no matter what. Let me explain further. For my needs, the cookie should expire n seconds after the last activity. So, each page load where sessions are used should reset the cookie's expiration. That way, if a user leaves the site, they have n seconds to come back and still be logged in.
Consider an application that sets the cookie expiration to 5 minutes. The person clicks around on the site, gets a phone call that lasts 8 minutes and then gets back to using the site. Their session has expired!!!! How annoying is that? The only sites I know that do that are banks. They have good reason. I understand that.
My preference would be either an ini value that tells PHP to keep the session cookie alive as long as the user is using the site, or access to the internal function php_session_send_cookie(), the C function that sends the cookie to the user's browser. Hmm, perhaps a patch is in my future.
In the short term, this is what I do:
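Something along these lines (the function name is mine; this is a sketch of the workaround, not the exact production code):

```php
<?php
// Re-send the session cookie with a fresh expiration, so the TTL is
// measured from the last request instead of from when the session
// began. Call this on every page load, right after session_start().
function refresh_session_cookie()
{
    $params = session_get_cookie_params();
    setcookie(
        session_name(),
        session_id(),
        time() + $params['lifetime'], // fresh TTL from "now"
        $params['path'],
        $params['domain'],
        $params['secure']
    );
}
```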
That will set the session cookie with a fresh TTL.
Ok, going to dig into some C code now and see if I can make a patch for this.
Well, it has been almost a month. I know I am late to the blogosphere on my thoughts. Just been busy.
Again this year, the Phorum team was invited to be a part of the DotOrg Pavilion. What is that? Basically, they give expo floor space to open source projects. It is cool. We had a great location this year, right next to the area where food and drinks were served during the breaks. We had lots of traffic and met some of our power users. IMVU.com is handling 1.5 million messages per month in their Phorum install. They did have to customize it to fit into their sharding scheme, but that is expected. A guy (didn't catch his name) from Innobase came by and told us that they just launched InnoDB support forums on their site using Phorum. Cool. So now both MySQL and Innobase use Phorum. I am humbled by what that says about Phorum.
Speaking of our booth, we were right next to the phpMyAdmin guys. Wow, that product has come a long way. I was checking out the visual database designer they have now. It was neat. I also met the Gentoo MySQL package maintainer. He was in the phpMyAdmin booth.
I was interviewed by WebDevRadio, as I already posted. I was also asked to do a short Q&A with the Sun Headlines video team. They used one part of my clip. I won't link to that; if you find it, good for you. I need to do more interviews or something. I did not look comfortable at all.
There were lots of companies with "open" in their name or slogan. I guess this is expected pandering.
I attended part of the InnoDB talk given by Mark Callaghan of Google. It appears that Google is serious about improving InnoDB on large machines. That is, IMO, good news for anyone that likes InnoDB. If I counted right, they have more than five people for whom at least part of their job is improving InnoDB.
I gave my two talks. The first had low attendance, but the feedback was nice. It was just after the snack break in the expo hall and I was in the farthest room from the expo hall. That is what I keep telling myself. =) The second was better attended and the feedback seemed good there. I was told by Maurice (Phorum Developer) that I talked too fast and at times sounded like Mr. Mackey from South Park by repeating the word bad a lot. I will have to work on that in the future. I want to do more speaking.
On the topic of my second talk, there seemed to be a lot of "This is how we scaled our site" talks. I for one found them all interesting. Everyone solves the problem differently.
Next year I am thinking about getting more specific with my talk submissions. Some ideas include: PHP, MySQL and Large Data Sets, When is it ok to denormalize your data?, Using memcached (not so much about how it works), Index Creation (tools, tips, etc.).
In closing, I want to give a big thanks to Jay Pipes and Lenz Grimmer from MySQL. Despite Jay's luggage being lost he was still a big help with some registration issues among other things. Both of them helped out the Phorum team a great deal this year. Thanks guys.
While I was at the MySQL Conference, I sat down with Michael Kimsal of WebDevRadio and recapped the two talks that I gave at the conference. I have uploaded the slides so you can follow along if you want.
515 Sparkman Drive
Huntsville, AL 35816
Brian Moon of dealnews.com will be discussing best practices for writing database-backed web applications. Many developers teach themselves SQL and programming on the web; others come from a background in enterprise desktop applications. No matter what your background, there are common mistakes made when deploying web applications that use a database.
Also, at this event, we will be giving away two copies of NuSphere's PhpED. Plus, everyone who attends can purchase any NuSphere product at 50% off.
In the last 10 years, dealnews.com has grown from a single shared hosting account to an entire rack of equipment. Luckily, we started using PHP and MySQL very early in the company's history.
From the early days of growing a forum to surviving Slashdotting, Digging and even a Yahoo! front page mention, we have had to adapt both our hardware and software many times to keep up with the growth.
I will discuss the traps, bottlenecks, and even some big wins we have encountered along the way using PHP and MySQL, from small-scale setups to replication and even some MySQL Cluster. We have done many interesting things to give our readers (and our content team) a good experience when using our web site.
MySQL hacks and tricks to make Phorum fast
Phorum is the message board software used by MySQL. One reason they chose Phorum was because of its speed. We have to use some tricks and fancy SQL to make this happen. Things we will talk about in this session include:
Putting temporary tables to good use.
Why PHP and MySQL can be a bad mix with large data sets.
What mysqlnd will bring to the table in the future of PHP and MySQL.
How Phorum uses full text indexing and some fancy SQL to make our search engine fast.
Forcing MySQL to use indexes to ensure proper query performance.
Well, first, what is an MPM? It stands for Multi-Processing Module. It is the process model that Apache uses for its child processes. Each request that comes in is handed to a child. Apache 1 used only one model, the prefork model, which uses one process per Apache child. The most commonly used threaded MPM is the worker MPM. In this MPM, you have several processes that each run multiple threads. This is the one I will be talking about. You can read more about Apache MPMs at the Apache web site.
Huge memory savings
With Apache prefork, or even FastCGI, each Apache/PHP process allocates its own memory. Most healthy sites I have worked on use about 15MB of memory per Apache process. Code that has problems will use even more than this; I have seen some use as much as 50MB of RAM. But, let's stick with healthy. So, a server with 1GB of RAM can realistically run only about 50 Apache processes (or 50 PHP children for FastCGI) at 15MB of RAM each. That is 750MB total, which leaves only about 250MB for the OS and other applications. Now, if you are Yahoo! or someone else with lots of money and lots of equipment, you can just keep adding hardware. But, most of us can't do that.
As I wrote above, the worker MPM uses child processes and threads. If you configure it to use 10 child processes, each with 10 threads, you have 100 total threads, or clients, to answer requests. The good news is that because 10 threads live in one process, they can reuse memory allocated by other threads in the same process. At dealnews, our application servers use 25 threads per child. In our experience, each child process uses about 35MB of RAM. That works out to about 1.4MB per thread, roughly 10% of the per-client usage of a prefork server.
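For reference, a worker configuration in the spirit of those numbers might look like this (the values are illustrative, not our production settings):

```apache
# Illustrative worker MPM sizing: 10 child processes with 25 threads
# each gives 250 clients, at roughly 35MB of RAM per child process.
<IfModule mpm_worker_module>
    ServerLimit          10
    ThreadsPerChild      25
    MaxClients          250
    StartServers          4
    MinSpareThreads      25
    MaxSpareThreads      75
</IfModule>
```

Note that MaxClients must equal ServerLimit times ThreadsPerChild for the math to work out.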
Some say that you will run out of CPU way before RAM. That was not what we experienced before switching to worker. Machines with 2GB of RAM were running out of memory before we hit CPU as a bottleneck due to having just 100 Apache clients running. Now, with worker, I am happy to say that we don't have that problem.
Building PHP for best success with Worker
This is an important part. You have to be careful about which PHP extensions you load when using worker; not all of them are thread-safe. I don't have a list of extensions that will and won't work. We stick with the ones we need to do our core job; mainly, most pages use the mysql and memcached extensions. I would not do any fancy stuff in a worker-based server. Keep a prefork server around for that. Or better yet, do the funky, memory-sucking stuff in a cron job and push the data somewhere your web servers can get to it.
Other benefits like static content
Another big issue you hear about with Apache and PHP is running a second server to serve static content and save resources. Worker lets you do this without running two servers. Having a prefork Apache/PHP process with 15MB of RAM allocated serve a 10k JPEG image or a CSS file is a waste of resources. With worker, as I wrote above, the memory savings negate this issue. And, from my benchmarks (someone prove me wrong), Apache 2 can keep up with the lighttpds and litespeeds of the world in requests per second for this type of content. This was actually the first place we used the worker MPM. It may still be a good idea to run dedicated Apache daemons just for static content if you get lots of requests for it; that will keep your static content requests from overrunning your dynamic content requests.
Some issues we have seen
Ok, it is not without problems (but, neither was prefork). There are some unknown (meaning undiagnosed by us) things that will occasionally cause CPU spikes on the servers running worker. For example, we took two memcached nodes offline and the servers that were connected to them spiked their CPU. We restarted Apache and all was fine. It was odd. We had another issue where a bug in my PHP code that was calling fsockopen() without a valid host name and a long timeout would cause a CPU spike and would not seem to let go. So, it does seem that bad PHP code makes the server more sensitive. So, your mileage may vary.
As with any new technology, you need to test a lot before you jump in with both feet. Anyone else have experience with worker and want to share?
One last tip
We have adopted a technique that Rasmus Lerdorf mentioned. We decide how many MaxClients a server can handle and configure that number to always be running; that is, we set the min and max settings in the Apache configuration to the same values. Of course, we are running service-specific servers. If you only have one or two servers and they run Apache and MySQL and mail and DNS and... etc., you probably don't want to do that. But then again, you need to make sure MaxClients will not kill your RAM/CPU either. I see lots of servers that, if MaxClients were actually reached, would need 20GB of RAM, and the machine only has 2GB. So, check those settings. If you can, configure Apache to start more processes up front (all of them, if you can) rather than a few, and make sure you won't blow out your RAM.
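As a sketch, locking the child count in a worker configuration looks something like this (values illustrative; the point is that min and max match, so Apache never forks under load):

```apache
# Start every child up front and keep it. With all threads counted as
# "spare", Apache never kills or forks children as traffic changes.
<IfModule mpm_worker_module>
    StartServers          8
    ServerLimit           8
    ThreadsPerChild      25
    MaxClients          200
    MinSpareThreads     200
    MaxSpareThreads     200
</IfModule>
```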
Anything goes. We can talk about memcached, Tugela, basic file caching... whatever.
More and more web sites are finding that they need to use caching to increase their performance. Some of us have solved some of these problems; others who are new to these techniques have a lot of questions. This BoF is an opportunity for web developers to share their ideas on caching.
Specifically, the dealnews.com dev team can talk about the 3 main types of caching we use and where each is applicable.