Saturday, March 20, 2004


It has become fashionable in some circles these days (notably, eXtreme Programming and Agile Development) to say "code the absolute minimum for the problem (you know you have)."

Problem is, because these same folks often forego a lot of early design work (which they refer to as the Paralysis by Overanalysis antipattern), they often don't know what problem they are trying to solve. Or not completely, or not accurately or something. The solution they have for this is to "refactor often." Well, duh. This reminds me of the Monty Python "ant counter" sketch: "What do you feed them on?" "Nothing." "Then what do they live on?" "They don't, they die." "They die?" "Well, of course if you don't feed them."

I am forced to consider Pressman's research here. For those who may forget, Pressman looked at the cost to fix defects at different points in the application lifecycle, and recorded exponential cost growth as one moves into later phases of a waterfall process. If I remember the numbers correctly: with the scale set at 1 unit of cost for a problem found during initial coding, problems found in design (before coding) cost about .75, problems found in unit test perhaps 1.25, in integration 2 or so. Etc.

How is it that skipping the initial design will reduce costs? It won't, because refactoring costs. In fact, for me the only result that seems likely is that for any project too complex to keep all in one's head at once, eXtreme programming will increase costs by shifting defect correction to the right on Pressman's curve.

An overlapping group of folks who recommend XP also suggest that <some complex technology> is a sledgehammer and shouldn't be used for small problems.

But what is a small problem, especially if one doesn't know the requirements (well) and hasn't done (much, or any) design? Moreover, how many applications we deploy today will never be extended? And if your application will be extended, just how will you go about this?

An example that is used to illustrate a 'flyweight' problem that shouldn't be solved with a complex solution is a logging facility. Instead there are a number of solutions out there that take a variety of approaches: one such is to write a log class that takes a couple of parameters (severity and message, for example) and writes them to a flat file or through JDBC to a database. Cool. Simple problem, simple solution.
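That "simple solution" can be sketched in a few lines. The class and file format here are hypothetical, just to make the shape concrete:

```python
# A minimal sketch of the "simple" logger described above:
# two parameters, one line appended to a flat file per event.
import datetime

class SimpleLog:
    def __init__(self, path):
        self.path = path

    def write(self, severity, message):
        # Append one tab-separated line: timestamp, severity, message.
        with open(self.path, "a") as f:
            stamp = datetime.datetime.now().isoformat()
            f.write(f"{stamp}\t{severity}\t{message}\n")

log = SimpleLog("app.log")
log.write("WARN", "disk almost full")
```

Simple indeed; note, though, that the two-field signature and the flat file are now baked into every call site.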

So roll forward, your application has been deployed with its simple logging solution. Now someone else builds another application that wants to talk to yours, and wants to add entries to the same log, preferably in-order.  There are a few ways to accomplish this. One is to have the different applications each open the file whenever they want to add something, which assumes the filesystem is local to both applications. Or, if you used JDBC, that the database server is accessible, etc. Another solution is to add some sort of remote protocol (perhaps a web service interface) to the first application, to allow the second to send a log event.  That's rather a bunch of code to add to that first application.  Of course, now that there is a second application, you're going to have to think about what information to log - do you want to capture the application name as well, perhaps? So you find yourself now writing 3 fields instead of the 2 you wrote previously, and updating all your old code to call the new log method signature. And perhaps converting your old log files.
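The widened record and the conversion step might look like this; the function names and the "legacy" default are hypothetical:

```python
# The new three-field signature: every call site now passes an
# application name as well.
def write_log(path, application, severity, message):
    with open(path, "a") as f:
        f.write(f"{application}\t{severity}\t{message}\n")

# And the one-shot converter for the old two-field files, tagging
# pre-existing entries with a default application name.
def convert_old_log(old_path, new_path, default_app="legacy"):
    with open(old_path) as old, open(new_path, "a") as new:
        for line in old:
            severity, message = line.rstrip("\n").split("\t", 1)
            new.write(f"{default_app}\t{severity}\t{message}\n")

write_log("combined.log", "webapp", "WARN", "low disk")
```

Neither piece is hard, but both are pure migration cost: code written only because the original signature was minimal.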

More time goes by, and you've noticed that there is a recurring event that shows up in your log that requires special processing whenever it occurs. Perhaps a cracker is targeting your server and you want to know when it is happening, so whenever the log gets this event you want an email sent to your cellphone.  If you've been using a log file, you could at this point write another program that watches the tail of that file, parses the text, and, if it matches a pattern, composes and sends an email.  This is an expensive solution because it is a separate process, and it is parsing information that was probably composed out of separate fields previously. And again, it assumes the file is local and available.
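The core of that watcher is re-parsing lines back into the fields the logger already had. A sketch, with the alert pattern and the alert hook as hypothetical stand-ins (a real version would sit in a loop tailing the file and hand matches to something like smtplib):

```python
# Re-parse each flat-file line back into fields and match a pattern.
import re

ALERT_PATTERN = re.compile(r"failed login", re.IGNORECASE)

def scan_line(line, alert):
    # Undo the formatting the logger did: split the line back into
    # application, severity, and message.
    app, severity, message = line.rstrip("\n").split("\t", 2)
    if ALERT_PATTERN.search(message):
        alert(f"[{app}] {severity}: {message}")

alerts = []
scan_line("webapp\tERROR\tFailed login for root from 10.0.0.5\n",
          alerts.append)
```

The structure was thrown away on write and must be rebuilt on read, which is exactly the waste the flat-file choice bought you.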

The point of all this is that even something as simple as a log facility may wind up having very different requirements than those it started with. In fact, even the assumption that the log only gets written, and that if it is processed at all it happens later, in batch, probably by a human, is suspect. And with every change in requirements came some refactoring, or rewriting, and a whole bunch of new code.

Returning to our ant counter, "I don't understand." "You let them die, then you buy new ones."

As if! Like we can toss old applications when requirements change! Those old COBOL applications are still hanging around, 30 years later!

By contrast, choosing a better architected solution from the beginning increases some up-front work, but can reduce the overall complexity as time goes by.  My own favorite solution for logging is publishing messages to a publish/subscribe message bus.  The messages retain structure (that is, they stay as a set of attribute/value pairs), and they can be reused by new applications (for example, our monitoring function that sends an email) sometimes with less new code and without any change to the originator or any other participant in the log. And scalability is very good as the number of processes filing log entries grows, without those doing the logging being blocked by others' locks on the log file or database.
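To make the shape concrete, here is a toy in-process stand-in for that bus (a real deployment would use a message broker, but the relationships are the same): messages stay structured, and subscribers filter on them without the publisher knowing or caring. All the names are hypothetical:

```python
# Toy publish/subscribe bus: structured attribute/value messages,
# subscribers registered with a filter predicate and a handler.
class LogBus:
    def __init__(self):
        self.subscribers = []

    def subscribe(self, predicate, handler):
        self.subscribers.append((predicate, handler))

    def publish(self, **attrs):
        # Deliver the structured message to every matching subscriber.
        for predicate, handler in self.subscribers:
            if predicate(attrs):
                handler(attrs)

bus = LogBus()
archive = []
pages = []

# The archiver keeps every message; the monitor only wants break-ins.
bus.subscribe(lambda m: True, archive.append)
bus.subscribe(lambda m: m.get("event") == "intrusion",
              lambda m: pages.append(f"page admin: {m['message']}"))

bus.publish(app="webapp", severity="INFO", message="started")
bus.publish(app="webapp", severity="ERROR",
            event="intrusion", message="cracker at the gates")
```

Note that adding the email monitor meant adding one subscriber; the publisher, and every other subscriber, were untouched.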

To my mind, this is true agility - easier extensibility and stable functions (small changes in inputs mean small changes in outputs).

Oh, and once you have developed the more extensible logging structure, you can more easily reuse it for your next application - all the new app's logging code has to do is format its attribute/value pairs into a message, perhaps add a new message channel to the messaging system, and add some filter on the message bus to listen for those new messages and save the contents. Not hard, and distribution and scaling came for free on this second use.

The real problem, it seems to me, is that XP goes out of its way to downplay foresight and planning. In a world in which applications frequently, if not invariably, are extended to do more than originally intended, solving for the minimum is frequently not a good choice.

XP may have been a good solution to the bubble economy and 'internet time', but I think 'internet time' has turned out to be the wrong problem to solve.  Adding real value seems to be coming back into fashion.

[Comments from my previous blog]

1. a reader left...
Sunday, 21 March 2004 2:47 pm
Where XP excels is when the solution is not clear. This happens frequently (because problems that have clear solutions very often have already been solved -- your logging as a case in point).

Just as you suggest, there are massive benefits to be gained from correctly designing up front. This is the lure. But no matter how well you model the current requirements, if either you or your users don't understand the problem completely, your model will be wrong. The more up-front design you did, the more that costs.

So, where the problem is understood, use patterns, designs, libraries etc. that have probably already been made to solve it. Where the problem (or solution) requires a few iterations to solve, build only what you need right now.

