Wed, Nov 18, 2009 08:00 AM
So, we have been building up our code library at
dealnews for 9 years. It was started at
the end of PHP3 and the beginning of PHP4. So, we did not have
autoloading, static functions, and all that jazz. Classes had lots
of overhead in early PHP4 so we started down a pure procedural road
in 2000. And for a long time, it was very maintainable. We had 2 or
3 developers for most of this time. We now have 5 or 6 depending on
whether we have contractors. There are starting to be too many
files and too many functions. We find ourselves adding new files
when some new function is created instead of adding it to an
existing file because we don't want to have huge files with 100
functions in them. File names and function names are getting longer
and more ambiguous. For example, we have a file called
url_functions.php. It contains functions to generate URLs for
different types of pages on the site, functions to fetch URLs from
the web and functions to parse URLs from an article. Those probably
don't all belong in one file. But, they got nickel-and-dimed in
there over time. So, now, we are inclined to not add anything to
that file and make new files for new semi-URL related functions.
Ugh.
It is time to start thinking about a reorganization. There are
1,900+ functions in 400+ files in our code library. This is just
our library. This does not include the code that actually builds a
page and generates output. It does not include our cron jobs or
system administration scripts. Yeah, that is a lot. So, where do we
go from here? Some things are easy to do. For example, we have a
file called string.php. Almost all the functions in that file can
easily be moved to a String class with static functions that can be
accessed via an autoloader.
Then we have the various ways we deal with the articles on the web
site. I have written about our front end vs. back end system before. What this means
for our code base is that we have two ways to deal with an article.
One is in our highly relational backend system. The other is in our
optimized front end database servers. So, one Article object won't
really do. We already have an Article object that serves as an ORM
interface for the backend. To access the front end data, we
currently have a library of functions (fetch_article for a single,
fetch_articles for a set, etc.) but it does not fit with an
autoloading environment. It also is not related to the object (the
article) and is associated with where the data is stored. New
developers don't grok the server infrastructure, so the code
organization may not make sense to them. We have about 10 different
objects that need both a back end and front end interface.
On the other hand, I really don't want to end up with a class named
FrontEndArticle and BackEndArticle. Much less do we want to have
stuff like BackEnd_Article where the file is actually in
BackEnd/Article.php somewhere. The verbosity becomes overwhelming
and hard to read, IMO.
So, what are others doing with huge code bases? I see lots of
projects with 100 or so functions/methods in 20-30 files.
Frameworks have it easy because they don't have a CEO that wants
something on this one page to be different than it is on every
other page where that data is used. We have to deal with those
types of hacks in an elegant way that can be maintained.
Mon, Oct 12, 2009 10:01 AM
Amy Hoy has written a blog post about why forums are crap.
And she is right. Forum software does not always do a good job of
helping people communicate. I have worked with Amy. She did a great
analysis of dealnews.com that led to our new design. So, she is not
to be ignored.
However, as a software developer (Phorum), I see a lot of problems and
no answers. And it is not all on the software. Web site
owners use forums to solve problems that they really, really suck
at. Ideally, every web site would be very unique for their
audience. They would use a custom solution that fits a number
of patterns that best solves their problem. However, most web
site owners don't want to take the time to do such things.
They want a one-stop, drop-in solution. See the monolith that is
vBulletin. Scary.
And what if a forum is the best solution? Well, software
developers, in general, are not good designers. They don't think
like normal people. And they don't see their applications as a
whole, but as pieces that do jobs. The forum software market has
been run by software developers for over 10 years. Most of them
are still copies of what UBB was 13 years
ago. And software (like Phorum) that has tried to be different is
shunned by the online communities of the world because they don't
work/look/feel like every other forum software on the planet.
So, as software developers, what are we to do? We want to make
great software. We want to help our users help their users. But,
what we have been doing for 10+ years has only been adequate. As
the leader of an open source forum software project, I am open to
any and all ideas.
Wed, Sep 30, 2009 11:49 AM
I spoke at CodeWorks in Atlanta, GA this week. I totally dropped the ball promoting
it on my blog. It was a neat venue. Rather than a large
conference they are doing a traveling show. Seven cities in
14 days. Many of the presenters are working in every
city. Crazy. I was just in Atlanta. It is close
to home and easy for me to get to.
I spoke about memcached. I tried to dig a bit deeper into how
memcached works. On the mailing
list we get a lot of new people that make assumptions about
memcached. Most talks I have seen focus on why caching is
good, how to use memcached, the performance gain. I kind of
assumed everyone knew that stuff already. I guess you could
say I gave a talk that was the real FAQs of the project.
Here are the slides.
Derick Rethans took video of the
talk. When he gets that online I will add it to this
post.
Mon, Aug 10, 2009 12:12 AM
The latest package of Wordcraft, the PHP/MySQL based blog software
that runs this site, is available for download from Google Code.
Just some minor bug fixes and cosmetic stuff. It's getting a little
use in the wild. That is always fun to see.
Thu, Jul 23, 2009 01:17 PM
We use PHP everywhere in our stack. For us, it makes sense because
we have hired a great staff of PHP developers. So, we leverage that
talent by using PHP everywhere we can.
One place where people seem to stumble with PHP is with long
running PHP processes or parallel processing. The pcntl extension gives you the
ability to fork PHP processes and run lots of children like
many other unix daemons might. We use this for various things.
Most notably, we use it to run Gearman worker processes. While at
the O'Reilly Open Source Convention in 2009, we were asked about
how we pulled this off. So, we are releasing the two scripts
that handle the forking and some instructions on how we use
them.
This is not a detailed post about long running PHP
scripts. Maybe I can get to the dos and don'ts of that
another time. But, these are the scripts we use to manage
long running processes. They work great for us on
Linux. They will not run on Windows at all. We also
never had any trouble running them on Mac OS X.
The first script, prefork.php, is for forking a given function
from a given file and running n children that will
execute that function. There can be a startup function that is
run before any forking begins and a shutdown function to run
when all the children have died.
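To make the idea concrete, here is a rough sketch of the same prefork pattern written in Python rather than PHP. The actual prefork.php is not reproduced here, and the function name `prefork` and its parameters are hypothetical. Like the PHP scripts, which depend on pcntl, this version assumes a POSIX system because it relies on `os.fork`.

```python
import os
import sys


def prefork(worker, children, startup=None, shutdown=None):
    """Run worker(index) in `children` forked processes (hypothetical helper).

    startup() runs once in the parent before any fork; shutdown() runs once
    in the parent after every child has exited. Returns {pid: exit_code}.
    """
    if startup is not None:
        startup()

    pids = []
    for index in range(children):
        sys.stdout.flush()  # avoid duplicating buffered output in the child
        pid = os.fork()
        if pid == 0:
            # Child: run the worker and exit; never return into the caller.
            status = 1  # an exception in the worker yields exit code 1
            try:
                worker(index)
                status = 0
            finally:
                sys.stdout.flush()
                os._exit(status)
        pids.append(pid)

    # Parent: reap every child and record how it exited.
    exit_codes = {}
    for pid in pids:
        _, status = os.waitpid(pid, 0)
        exit_codes[pid] = os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1

    if shutdown is not None:
        shutdown()
    return exit_codes


if __name__ == "__main__":
    codes = prefork(lambda i: print(f"child {i} running in pid {os.getpid()}"), 3,
                    startup=lambda: print("starting"),
                    shutdown=lambda: print("all children done"))
    print(codes)
```

The parent calls `startup` before the first fork and calls `shutdown` only after it has reaped every child, which matches the lifecycle described above.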
The second script, prefork_class.php, uses a class with defined
methods instead of relying on the command line for function
names. This script has the added benefit of having functions
that can be run just before each fork and after each fork. This
allows the parent process to farm work out to each child by
changing the variables that will be present when the child
starts up. This is the script we use for managing our Gearman
workers. We have a class that controls how many workers are
started and what functions they provide. I may release a
generic class that does that soon. Right now it is tied to our
code library structure pretty tightly.
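Here is a similar hypothetical Python sketch of the class-based variant. The parent sets an attribute in `before_fork`. Because fork copies the parent's memory, the child starts with that value, and this is how the parent can farm out work. `PreforkManager`, `JobFarm` and their hook names are made up for illustration. They are not the methods prefork_class.php actually defines.

```python
import os
import sys


class PreforkManager:
    """Hypothetical base class; subclasses override the hook methods."""

    def __init__(self, children):
        self.children = children

    def startup(self):
        """Runs once in the parent before any fork."""

    def before_fork(self, index):
        """Runs in the parent just before fork number `index`."""

    def after_fork(self, index, pid):
        """Runs in the parent just after fork number `index`."""

    def child_main(self):
        """Runs in the child; an int return value becomes the exit code."""
        raise NotImplementedError

    def shutdown(self):
        """Runs once in the parent after every child has exited."""

    def run(self):
        self.startup()
        pids = []
        for index in range(self.children):
            self.before_fork(index)
            sys.stdout.flush()
            pid = os.fork()
            if pid == 0:
                status = 1  # stays 1 if child_main raises
                try:
                    status = int(self.child_main() or 0)
                finally:
                    sys.stdout.flush()
                    os._exit(status)
            self.after_fork(index, pid)
            pids.append(pid)
        codes = []
        for pid in pids:
            _, status = os.waitpid(pid, 0)
            codes.append(os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1)
        self.shutdown()
        return codes


class JobFarm(PreforkManager):
    """Example: the parent hands each child a job by setting an attribute."""

    def __init__(self, jobs):
        super().__init__(children=len(jobs))
        self.jobs = jobs
        self.current_job = None
        self.started = []

    def before_fork(self, index):
        self.current_job = self.jobs[index]  # copied into the child by fork

    def after_fork(self, index, pid):
        self.started.append(pid)

    def child_main(self):
        return self.current_job  # the exit code shows the child saw its job


if __name__ == "__main__":
    print(JobFarm([3, 5, 7]).run())
```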
We have also included two examples. They are simple, but they do
show how the scripts work.
You can download the code from the dealnews.com developers' page.
UPDATE: I have released a Gearman Worker Manager on GitHub.
Sun, May 31, 2009 10:23 PM
My coworker Rob ran into an issue building the PECL/memcache
extension on his Mac. He did find the solution, however. You can
read and leave comments on his blog.
Fri, May 22, 2009 11:34 AM
First there was LAMP.
But are you using GLAMMP? You have probably not heard of it
because we just coined the term while chatting at work. You
know LAMP (Linux, Apache, MySQL and PHP or Perl and sometimes
Python). So, what are the extra letters for?
The G is for Gearman: Gearman is a system to farm out work
to other machines, dispatching function calls to machines that are
better suited to do work, to do work in parallel, to load balance
lots of function calls, or to call functions between languages.
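The core idea is that clients ask for work by function name and a server dispatches it to whichever worker registered that name. A toy in-process version can illustrate this. Everything here (`ToyJobServer`, `register`, `submit`) is hypothetical and simply uses a thread pool. It is not the Gearman protocol or the API of any Gearman client library.

```python
from concurrent.futures import Future, ThreadPoolExecutor


class ToyJobServer:
    """In-process stand-in for a job server (not Gearman's real API)."""

    def __init__(self, workers=4):
        self._functions = {}
        self._pool = ThreadPoolExecutor(max_workers=workers)

    def register(self, name, fn):
        # A "worker" advertises that it can perform the function `name`.
        self._functions[name] = fn

    def submit(self, name, payload) -> Future:
        # A "client" asks for work by name without knowing who runs it.
        if name not in self._functions:
            raise KeyError(f"no worker registered for {name!r}")
        return self._pool.submit(self._functions[name], payload)

    def close(self):
        self._pool.shutdown(wait=True)


if __name__ == "__main__":
    server = ToyJobServer()
    server.register("reverse", lambda s: s[::-1])
    jobs = [server.submit("reverse", word) for word in ["abc", "gearman"]]
    print([job.result() for job in jobs])
    server.close()
```

The jobs run in parallel on the pool. The caller only blocks when it asks for a result, which loosely mirrors a client submitting background work.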
The extra M is for Memcached: memcached is a
high-performance, distributed memory object caching system, generic
in nature, but intended for use in speeding up dynamic web
applications by alleviating database load.
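The usual way memcached takes load off the database is the cache-aside pattern. You check the cache first, and only on a miss do you run the query and store the result with an expiration time. Below is a minimal sketch that assumes a plain in-process dict as a stand-in for a memcached server. `TTLCache` and `cached_fetch` are hypothetical names, not a memcached client API.

```python
import time


class TTLCache:
    """In-process stand-in for a memcached server: key -> (value, expiry)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._items = {}

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None  # miss
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._items[key]  # expired entries behave like misses
            return None
        return value

    def set(self, key, value, ttl):
        self._items[key] = (value, self._clock() + ttl)


def cached_fetch(cache, key, loader, ttl=60):
    """Cache-aside read: serve from cache, else call loader() and store it.

    Note: a loader result of None is never cached in this simple version.
    """
    value = cache.get(key)
    if value is None:
        value = loader()  # e.g. the expensive database query
        cache.set(key, value, ttl)
    return value
```

Only the first request, and the first request after expiry, reaches the loader. Every request in between is served from memory.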
More and more these days, you can't run a web site on just
LAMP. You need these extra tools (or ones like them) to do
all the cool things you want to do. What other tools do we
need to work into the acronym?
PostgreSQL replaces MySQL in lots of stacks to form LAPP. I guess
Drizzle may replace MySQL in some stacks soon. For us, it will likely
be added to the stack. Will that make it GLAMMPD? We need more
vowels! If you are starting the next must-use tool for
running web sites on open source software, please use a vowel for
the first letter.
Fri, May 15, 2009 12:38 PM
register_globals is going away in PHP6. That is fine with
me. Superglobals are cool and I have taken to using
filter_input_array these
days anyhow. However, our code base is now 10+ years old at
dealnews. Most of the forward facing code was completely
rewritten in the last couple of years due to architecture
changes. Many new projects had register_globals turned off
via php_admin_flag in Apache. So, that area is not that big
of a problem. However, our internal admin areas have not all
been rewritten because, well, frankly, they still work. Yeah,
stuff written for PHP4 in 2000 is still working. KISS helps a
lot with that. But, this code, somewhere in there, may still
be relying on register_globals. Now, we could go line by line
and try and fix it. But, it seems like a program could be
written to do this job. I mean, I use jEdit and it can
highlight unset vars using the PHPParserPlugin just fine. I
bet Zend IDE can do the same. Has anyone written such a tool
for the command line? There will be false positives, I
know. Things like passing a variable by reference to a
function would look like a use before set. But, I can deal
with those if I don't have to go line by line through tons of old
code. What would the rules look like for such an
animal? This would be a great project to get off the ground
before PHP6 hits. Ideally you could provide a list of
variables for it to ignore. We have some globals we set up in
prepends and includes.
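Here is one possible starting point for the rules, sketched in Python. The scanner walks the file in order. It treats `$name = ...`, `global $name;` and foreach loop variables as definitions, and reports any other `$name` that is read before it has been defined. This is a hypothetical and deliberately naive scanner: it is regex-based, uses one scope per file, and knows nothing about strings or comments. It will therefore produce exactly the kind of false positives described above. The `ignore` parameter is the list of globals set up in prepends and includes.

```python
import re

# Variables PHP always provides; never reported.
ALWAYS_DEFINED = {"this", "GLOBALS", "_GET", "_POST", "_COOKIE", "_REQUEST",
                  "_SERVER", "_ENV", "_FILES", "_SESSION"}

VAR = re.compile(r"\$([A-Za-z_]\w*)")
# Rule 1: "$name = ..." or "$name[...] = ..." defines $name (not ==, ===, =>).
ASSIGN = re.compile(r"\$([A-Za-z_]\w*)\s*(?:\[[^\]]*\]\s*)*=(?![=>])")
# Rule 2: "global $a, $b;" brings names into scope.
GLOBAL = re.compile(r"\bglobal\s+([^;]+);")
# Rule 3: "foreach (... as $k => $v)" defines the loop variables.
FOREACH = re.compile(r"\bas\s+&?\$(\w+)(?:\s*=>\s*&?\$(\w+))?")


def find_unset_uses(source, ignore=()):
    """Return [(line_number, name)] for variables read before being defined.

    Naive on purpose: one scope per file, no awareness of functions,
    strings, comments, references, extract() or included files.
    """
    defined = set(ALWAYS_DEFINED) | set(ignore)
    reports = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        skip = set()  # positions of "$name" tokens that define rather than read
        for m in ASSIGN.finditer(line):
            skip.add(m.start())
        for m in GLOBAL.finditer(line):
            for v in VAR.finditer(m.group(1)):
                defined.add(v.group(1))
                skip.add(m.start(1) + v.start())
        for m in FOREACH.finditer(line):
            for group in (1, 2):
                if m.group(group):
                    defined.add(m.group(group))
                    skip.add(m.start(group) - 1)  # index of the "$"
        for m in VAR.finditer(line):
            name = m.group(1)
            if m.start() not in skip and name not in defined:
                reports.append((lineno, name))
                defined.add(name)  # report each name only once
        # An assignment only takes effect after its right-hand side is read.
        for m in ASSIGN.finditer(line):
            defined.add(m.group(1))
    return reports
```

Running this over an old admin script would print a short list of candidate register_globals variables to review by hand, instead of requiring a read of every line.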
Thu, May 7, 2009 11:57 AM
Last year I was surprised to be going to Velocity. Read the post;
it was an adventure. But, I
really like the conference. It is the perfect conference
for me. While a good majority of my work is done coding
PHP/MySQL apps, I tend to focus on architecture, frameworks,
performance and that kind of stuff. So, a web performance and
operations conference is just perfect.
Last year, I was on a panel with some great guys. I was able to
share just a bit about my
experience dealing with the instant success of a web site.
This year, my proposal was accepted to talk more about dealing with
the success of a web site. The talk will be focused on my experience
at dealnews.com and from working with power users for Phorum. Here is
the summary:
Lots of people talk about scaling and performance. But,
are they preparing for all the things that could
happen? There are multiple problems and there is not
one solution to solve them all.
Everything is running fine and BAM! – your site is linked from the front
page of Yahoo! What do you do? How can you handle that
sudden rush of traffic? Requests per second are running
5x normal levels. Servers have CPU spikes. Daemons are hitting the
maximums. You are running out of bandwidth. How could
you have been prepared for this? What are the tools and
techniques for this type of sudden rush?
Or, let's say you have just come out of a meeting where
everyone discovered that your site is growing in
traffic 70% – 80% year over year. That means that 1
million page views this month will be roughly 3 million
this time in 2 years. How can you plan for that? You
don’t want to redesign the whole architecture every 2
years. What methods could be used to deal with this
constant long term growth?
While there is no magic bullet for either of these
scenarios, there are techniques used by many sites out
there to help you get through these situations. This
session will cover some of these techniques and talk
about their pros and cons.
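The growth figures in the summary are compound growth: traffic after n years is current × (1 + g)^n. A quick Python check with hypothetical helper names shows the range. At 70% a year, 1 million monthly page views becomes about 2.9 million in two years, and at 80% about 3.2 million. Either way, traffic doubles roughly every 14 to 16 months.

```python
import math


def projected_traffic(current, annual_growth, years):
    """Compound growth: current * (1 + annual_growth) ** years."""
    return current * (1 + annual_growth) ** years


def months_to_double(annual_growth):
    """Months until traffic doubles at a constant annual growth rate."""
    return 12 * math.log(2) / math.log(1 + annual_growth)


if __name__ == "__main__":
    for growth in (0.70, 0.80):
        two_years = projected_traffic(1_000_000, growth, 2)
        print(f"{growth:.0%}: {two_years:,.0f} page views in 2 years, "
              f"doubling every {months_to_double(growth):.1f} months")
```

The doubling time is what makes a full redesign every couple of years tempting: capacity planned for today's peak is half of what is needed well before two years are up.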
I must admit, this is the first time since 2000 that I am a
little intimidated to speak at a conference. The
people that present and attend Velocity are so
awesome. I just hope I don't disappoint.
Tue, Apr 21, 2009 12:48 PM
I just discovered an incompatibility between Net Gearman and PHP
5.2.9+. json_decode was changed in 5.2.9 to
return NULL on invalid JSON strings. Previously, the bare
string had been returned if it was not valid JSON. This was
nice in a way as you could pass a scalar string to json_decode and
not worry about it. But, in reality, it would make debugging
a nightmare for JSON.
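The difference between the two behaviors can be sketched in Python, where `json.loads` raises an exception on invalid input instead of returning NULL. `lenient_decode` mimics the pre-5.2.9 fallback described above and shows why it hurt debugging: truncated JSON comes back as an ordinary string. A sentinel is used for failure because valid JSON `null` decodes to `None`, much as PHP's json_decode('null') returns NULL. Both function names are hypothetical, and this is not the Net Gearman fix.

```python
import json

INVALID = object()  # sentinel: distinct from None, which JSON "null" decodes to


def strict_decode(text):
    """Return the decoded value, or INVALID if text is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return INVALID


def lenient_decode(text):
    """Mimic the old fallback: hand back the raw string when it is not JSON."""
    value = strict_decode(text)
    return text if value is INVALID else value


if __name__ == "__main__":
    print(lenient_decode('{"job": "resize"}'))  # decoded dict
    print(lenient_decode("plain text"))         # raw string passes through
    print(lenient_decode('{"job": "resi'))      # truncated JSON also "succeeds"
```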
I have updated my GitHub fork and requested a pull into the main
branch. Once that is merged, a new PEAR release can be made.