Code Organization Dilemma

Wed, Nov 18, 2009 08:00 AM
So, we have been building up our code library at dealnews for 9 years. It was started at the end of PHP3 and the beginning of PHP4. So, we did not have autoloading, static functions, and all that jazz. Classes had lots of overhead in early PHP4 so we started down a pure procedural road in 2000. And for a long time, it was very maintainable. We had 2 or 3 developers for most of this time. We now have 5 or 6 depending on whether we have contractors.

There are starting to be too many files and too many functions. We find ourselves adding a new file when some new function is created instead of adding it to an existing file because we don't want to have huge files with 100 functions in them. File names and function names are getting longer and more ambiguous. For example, we have a file called url_functions.php. It contains functions to generate URLs for different types of pages on the site, functions to fetch URLs from the web and functions to parse URLs from an article. Those probably don't all belong in one file. But, they got nickel-and-dimed in there over time. So, now, we are inclined to not add anything to that file and make new files for new semi-URL related functions. Ugh.

It is time to start thinking about a reorganization. There are 1,900+ functions in 400+ files in our code library. This is just our library. This does not include the code that actually builds a page and generates output. It does not include our cron jobs or system administration scripts. Yeah, that is a lot. So, where do we go from here? Some things are easy to do. For example, we have a file called string.php. Almost all of the functions in that file can easily be moved to a String class with static functions that can be accessed via an autoloader.
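To make that concrete, here is a minimal sketch of the idea, not our actual code. The lib/ location, the library_class_path() helper and the truncate() method are all made up for illustration. The class is called StringUtil rather than String because PHP 7 and later reserve "string" as a class name.

```php
<?php
// Hypothetical helper: map a class name to its file, one class per file.
function library_class_path($class, $baseDir)
{
    return rtrim($baseDir, '/') . '/' . $class . '.php';
}

// Load library classes on first use instead of require-ing each file by hand.
spl_autoload_register(function ($class) {
    $path = library_class_path($class, __DIR__ . '/lib'); // assumed library location
    if (is_file($path)) {
        require $path;
    }
});

// What lib/StringUtil.php might contain. It is defined inline here only so the
// sketch runs as a single file; in that case the autoloader is never consulted.
class StringUtil
{
    // Stand-in for a loose function that used to live in string.php.
    public static function truncate($text, $max, $suffix = '...')
    {
        if (strlen($text) <= $max) {
            return $text;
        }
        $keep = max(0, $max - strlen($suffix));
        return substr($text, 0, $keep) . $suffix;
    }
}

echo StringUtil::truncate('Code Organization Dilemma', 12), "\n"; // prints "Code Orga..."
```

The calling code never includes a file. It just uses StringUtil::truncate() and the autoloader finds the class, which is what makes moving functions out of string.php a mostly mechanical job.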

Then we have the various ways we deal with the articles on the web site. I have written about our front end vs. back end system before. What this means for our code base is that we have two ways to deal with an article. One is in our highly relational backend system. The other is in our optimized front end database servers. So, one Article object won't really do. We already have an Article object that serves as an ORM interface for the backend. To access the front end data, we currently have a library of functions (fetch_article for a single article, fetch_articles for a set, etc.) but it does not fit with an autoloading environment. It is also organized around where the data is stored rather than around the object (the article) itself. New developers don't grok the server infrastructure, so the code organization may not make sense to them. We have about 10 different objects that need both a back end and front end interface.

On the other hand, I really don't want to end up with classes named FrontEndArticle and BackEndArticle. Much less do we want to have stuff like BackEnd_Article where the file is actually in BackEnd/Article.php somewhere. The verbosity becomes overwhelming and hard to read, IMO.

So, what are others doing with huge code bases? I see lots of projects with 100 or so functions/methods in 20-30 files. Frameworks have it easy because they don't have a CEO that wants something on this one page to be different than it is on every other page where that data is used. We have to deal with those types of hacks in an elegant way that can be maintained.
16 comments
Casey Says:

Not sure if this will help as I haven't actually used this on any truly large code-bases, but perhaps use directories to help group similar functionality together? Using your URL-related functions, you could have a /lib/URL directory and then break out similar functions into their own files, such as url_parser.php, url_generator.php, etc.

Chris Hartjes Says:

"Frameworks have it easy because they don't have a CEO that wants something on this one page to be different than it is on every other page where that data is used"

I call shenanigans on that. All frameworks do really is separate business logic from display logic. If you have to add some conditional statements into a template to meet business requirements, I cannot think of any PHP-based framework that could not handle that.

The problem is not the CEO, the problem is the incredibly brittle application that has been allowed to grow to this point. I speak from experience, having been in your position.

DGF Says:

"On the other hand, I really don't want to end up with a class named FrontEndArticle and BackEndArticle. Much less do we want to have stuff like BackEnd_Article where the file is actually in BackEnd/Article.php somewhere. The verbosity becomes overwhelming and hard to read, IMO."

You've just written off one of the most appropriate ways to organise a large codebase. Take a look at PEAR, ZF, Solar or any other major PHP project with a similar directory/namespace structure. If you don't like the underscores, consider 5.3's native namespace support.

You might also want to consider refactoring some of your code. The 'article' example would be best suited by abstraction and/or a decorator pattern.

Greg Beaver Says:

Hi Brian,

Although this is a huge hassle for you, what a great opportunity to design something dynamic, and an intellectual challenge on top of that! I love that kind of thing :).

The question I would ask regarding backend/frontend is whether the backend and the frontend have any commonality of how the data is represented. In other words, do you expect $blah->thing() to work for both a frontend and a backend $blah? Do you perform common tasks on both the backend and frontend? If the answer is no, then your code has no business abstracting them into the same containing class. If there is anything in common, then you can investigate lots of ways of representing them, whether having an interface that abstracts the common functionality to an abstract base class or even a class that can contain the backend/frontend-specific functionality with the delegator pattern.

If there is one thing I can say from experience it is that you would be best served by taking the necessary time to brainstorm in a more right-brained way until the right structure jumps out at you. It could eliminate thousands of lines of redundant code as well as making it far easier to read. Much of that time involves wandering around the hallway (or some other place you or the team doesn't usually work), visualizing the whole system and imagining the ways that it flows, and looking for non-obvious patterns and interactions.

Most design problems come from the set of assumptions behind them, and I find even being aware of what these assumptions are can be difficult in the normal task-oriented programming paradigm, let alone changing them.

Chuck Burgess Says:

The backend/frontend thing just screams "facade + adapters" to me, and that falls in line with Greg's questions about whether or not the two are similar enough at an API level to become one abstraction.
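As a rough sketch of what that could look like (every class name here is made up for illustration, and in-memory arrays stand in for the two databases):

```php
<?php
// Common contract that both storage adapters agree on (hypothetical).
interface ArticleStore
{
    /** @return array|null The article as a plain array, or null if not found. */
    public function fetch($id);
}

// Adapter for a relational store: articles and their tags live in separate tables.
class RelationalArticleStore implements ArticleStore
{
    private $articles;
    private $tags;

    public function __construct(array $articles, array $tags)
    {
        $this->articles = $articles;
        $this->tags = $tags;
    }

    public function fetch($id)
    {
        if (!isset($this->articles[$id])) {
            return null;
        }
        $tags = isset($this->tags[$id]) ? $this->tags[$id] : array();
        return array('id' => $id, 'title' => $this->articles[$id]['title'], 'tags' => $tags);
    }
}

// Adapter for a flattened store: one row per article, related data packed into a text column.
class FlatArticleStore implements ArticleStore
{
    private $rows;

    public function __construct(array $rows)
    {
        $this->rows = $rows;
    }

    public function fetch($id)
    {
        if (!isset($this->rows[$id])) {
            return null;
        }
        $row = $this->rows[$id];
        return array('id' => $id, 'title' => $row['title'], 'tags' => json_decode($row['tags'], true));
    }
}

// Facade: the only class that calling code needs to know about.
class Article
{
    private $store;

    public function __construct(ArticleStore $store)
    {
        $this->store = $store;
    }

    public function fetch($id)
    {
        return $this->store->fetch($id);
    }

    // Fetch several articles, silently skipping ids that do not exist.
    public function fetchMany(array $ids)
    {
        $found = array();
        foreach ($ids as $id) {
            $article = $this->store->fetch($id);
            if ($article !== null) {
                $found[] = $article;
            }
        }
        return $found;
    }
}

$backend = new Article(new RelationalArticleStore(
    array(7 => array('title' => 'Laptop deal')),
    array(7 => array('computers', 'laptops'))
));
$frontend = new Article(new FlatArticleStore(
    array(7 => array('title' => 'Laptop deal', 'tags' => '["computers","laptops"]'))
));

var_dump($backend->fetch(7) === $frontend->fetch(7)); // bool(true)
```

The verbose, storage-specific names stay hidden behind the adapters, so page code only ever sees Article. Whether that is worth it depends on how much of the API the two sides really share.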

Les Says:

> There are 1,900+ functions in 400+ files in our code library.

Sorry, but that isn't a code library, it's a -beep- mess. I can appreciate that this mess dates back nearly a decade, but please don't describe it as anything like a library; a library is something that is organised and well kept.

My thoughts are that you just flush the whole lot away and begin again from scratch... after all you have PHP5 now and soon, PHP6. I would start from PHP6 simply because you have the benefits of those namespaces.

If, on the other hand, you remain with what you have, you only have yourselves to blame :)

Richard Lynch Says:

I would suggest that the number of places that hook into the BackEndArticle should be rather small.

In fact, it probably should be mostly just the FrontEndArticle, and possibly some cron job or something...

So perhaps Article, and ORMArticle would be suitable class names.

I also have to wonder if anything unique/special really belongs in ORMArticle that isn't just in a base ORM class...

Perhaps the functionality on your BackEndArticle doesn't belong there at all: Either it's a database trigger / constraint, or it's a FrontEndArticle functionality.

PS
If you can handle a telecommuter, I'm looking for a job and could help out :-)

Brian Moon Says:

Thanks Richard, we are trying to shore up ops right now though.

As for the ORM, there is a good bit of business logic in our backend article ORM layer. So, it is more than just base ORM. An article involves 8 tables in our main database. In the front end, that data is flattened into one table with some serialized TEXT fields holding the relational data for speed. So, the two interactions are very different to the developer using them.

But, the Article problem is only a small part of our huge codebase issues.

Greg, we are doing just that. This process will likely take 6 months in all I expect.

Makario Says:

I bet most of that codebase is just crap. Dump it all out and start over again from scratch!

Jeff Dickey Says:

> There are 1,900+ functions in 400+ files in our code library.

Yep, you've got three options:
1. burn it onto DVD, wipe the hard drives, then shred the DVD and start over;
2. refactor and evolve your way to a survivable code base; or
3. wade on in and make enhancements and fixes on an ad-hoc basis.

Having been through about five of those situations with as many different clients, I'm of the firm opinion that when you reach the amount of code you have, that's had as long as yours has had to age beyond understanding.... those three possibilities are in order of increasing risk and difficulty. Maybe the physical-media aspect of #1 is a bit over the top, but it would make clear to all and sundry that the ONLY available direction is FORWARD.

Brian Moon Says:

@jeff, the key word in your comment is "clients". You did it and left. Having to live with the decision makes it all more complicated. I would love to do #1, but that would mean 6 months of stagnant production from my team. Which is not going to happen.

Adam Says:

I think you're asking the wrong question. It's not about code organization, but architecture design. I'm guessing that you wish to go the Object Oriented way, so do an OO analysis of the current app, design it from scratch and after that try to relate the existing code to objects. If a function fits, great; if it needs some refactoring, do it. Write new code to fill in the missing functionality.

The bigger problem is with functions that have no obvious scope, i.e. global helpers. You can leave those as they are for some time and, after the main refactoring (described above), find a scope for them.

fqqdk Says:

contributing to the overall trolliness of the internet with just one sentence:
opensource it, and have all the funemployed losers beat it into shape :D

Les Says:

Missed this bit...

> something on this one page to be different than it is on every other page where that data
> is used...

Never heard of the Composite? Your fragmented web page is therefore more organised and maintainable - absolutely no hacking.

I use Factory to construct the separate parts of any given page, and it's beautiful. I can even add/remove [new] parts once the factories are done, which gives greater flexibility.

Got to love object oriented methodologies.
