Code Organization Dilemma
Wed, Nov 18, 2009 08:00 AM
So, we have been building up our code library at dealnews for 9 years. It was started at
the end of PHP3 and the beginning of PHP4. So, we did not have
autoloading, static functions, and all that jazz. Classes had lots
of overhead in early PHP4 so we started down a pure procedural road
in 2000. And for a long time, it was very maintainable. We had 2 or
3 developers for most of this time. We now have 5 or 6 depending on
whether we have contractors. We are starting to have too many
files and too many functions. We find ourselves adding new files
when some new function is created instead of adding it to an
existing file because we don't want to have huge files with 100
functions in them. File names and function names are getting longer
and more ambiguous. For example, we have a file called
url_functions.php. It contains functions to generate URLs for
different types of pages on the site, functions to fetch URLs from
the web and functions to parse URLs from an article. Those probably
don't all belong in one file. But they got nickel-and-dimed in
there over time. So, now, we are inclined to not add anything to
that file and make new files for new semi-URL related functions.
Ugh.
It is time to start thinking about a reorganization. There are 1,900+ functions in 400+ files in our code library. This is just our library. This does not include the code that actually builds a page and generates output. It does not include our cron jobs or system administration scripts. Yeah, that is a lot. So, where do we go from here? Some things are easy to do. For example, we have a file called string.php. Most of the functions in that file can easily be moved into a String class with static functions that can be accessed via an autoloader.
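To make the string.php idea concrete, here is a minimal sketch of a static class loaded through spl_autoload_register. Everything named in it is an assumption for illustration: the class is called StringUtil rather than String (PHP 7 and later reserve "string" as a class name), truncate() is a made-up stand-in for whatever string.php really holds, and a temp directory plays the part of the library directory so the sketch runs on its own.

```php
<?php
// Hypothetical layout: one class per file in a library directory.
// A temp directory stands in for it so this sketch is self-contained.
$libDir = sys_get_temp_dir() . '/lib_' . uniqid();
mkdir($libDir);

// Stand-in for a function from string.php after moving it into a class.
// StringUtil and truncate() are invented names, not our real code.
file_put_contents($libDir . '/StringUtil.php', <<<'PHP'
<?php
class StringUtil
{
    // Cut $text to at most $max characters, ending in "..." if shortened.
    public static function truncate($text, $max)
    {
        if (strlen($text) <= $max) {
            return $text;
        }
        return substr($text, 0, $max - 3) . '...';
    }
}
PHP
);

// Load a class file on first use instead of requiring every library file.
spl_autoload_register(function ($class) use ($libDir) {
    $file = $libDir . '/' . $class . '.php';
    if (is_file($file)) {
        require $file;
    }
});

// No require statement here: the autoloader finds StringUtil.php.
echo StringUtil::truncate('Code Organization Dilemma', 12), "\n"; // Code Orga...
```

The appeal is that callers never have to know which file a function lives in. The class name is the only thing they need.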
Then we have the various ways we deal with the articles on the web site. I have written about our front end vs. back end system before. What this means for our code base is that we have two ways to deal with an article. One is in our highly relational back end system. The other is in our optimized front end database servers. So, one Article object won't really do. We already have an Article object that serves as an ORM interface for the back end. To access the front end data, we currently have a library of functions (fetch_article for a single article, fetch_articles for a set, etc.) but it does not fit with an autoloading environment. It is also organized around where the data is stored rather than around the object (the article). New developers don't grok the server infrastructure, so the code organization may not make sense to them. We have about 10 different objects that need both a back end and front end interface.
On the other hand, I really don't want to end up with classes named FrontEndArticle and BackEndArticle. Much less do we want to have stuff like BackEnd_Article where the file is actually in BackEnd/Article.php somewhere. The verbosity becomes overwhelming and hard to read, IMO.
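For anyone who has not run into it, the BackEnd_Article naming follows the PEAR-style convention where underscores in a class name map to directory separators. A rough sketch of that mapping, with a hypothetical helper name (class_to_path is not from any library):

```php
<?php
// Turn an underscore-delimited class name into a relative file path,
// the way PEAR-style autoloaders do. class_to_path() is a made-up name.
function class_to_path($class)
{
    return str_replace('_', '/', $class) . '.php';
}

echo class_to_path('BackEnd_Article'), "\n";  // BackEnd/Article.php
echo class_to_path('FrontEnd_Article'), "\n"; // FrontEnd/Article.php
```

It makes the autoloader trivial, but every call site pays for it with the longer class name.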
So, what are others doing with huge code bases? I see lots of projects with 100 or so functions/methods in 20-30 files. Frameworks have it easy because they don't have a CEO that wants something on this one page to be different than it is on every other page where that data is used. We have to deal with those types of hacks in an elegant way that can be maintained.
Casey Says:
Not sure if this will help as I haven't actually used this on any truly large code-bases, but perhaps use directories to help group similar functionality together? Using your URL-related functions, you could have a /lib/URL directory and then break out similar functions into their own files, such as url_parser.php, url_generator.php, etc.