January 30, 2015

It's True: TDD Isn't The Only Game In Town. So What *Are* You Doing Instead?

The artificially induced clickbait debate "Is TDD dead?" continues at developer events and in podcasts and blog posts and commemorative murals across the nations, and the same perfectly valid point gets raised every time: TDD isn't the only game in town.

They're absolutely right. Before the late 1990's, when the discipline now called "Test-driven Development" was beginning to gain traction at conferences and on Teh Internets, some teams were still somehow managing to create reliable, maintainable software and doing it economically.

If they weren't doing TDD, then what were they doing?

The simplest alternative to TDD would be to write the tests after we've written the implementation. But hey, it's pretty much the same volume of tests we're writing. And, for sure, many TDD practitioners go on to write more tests after they've TDD'd a design, to get better assurance.

And when we watch teams who write the tests afterwards, we tend to find that the smart ones don't write them all at once. They iteratively flesh out the implementation, and write the tests for it, one or two scenarios (test cases) at a time. Does that sound at all familiar?

Some of us were using what they call "Formal Methods" (often confused with heavyweight methods like SSADM and the Unified Process, which aren't really the same thing.)

Formal Methods is the application of rigorous mathematical techniques to the design, development and testing of our software. The most common approach was formal specification: teams would write a precise, mathematical, testable specification for their code, write code specifically to satisfy that specification, and then follow up with tests created from that specification to check that the code actually worked as required.

We had a range of formal specification languages, with exotic names like Z (and Object Z), VDM, OCL, CSP, RSVP, MMRPG and very probably NASA or some such.

Some of them looked and worked like maths. Z, for example, was founded on formal logic and set theory, and used many of the same symbols (since all programming is set theoretic.)

Programmers without maths or computer science backgrounds found mathematical notations a bit tricky, so people invented formal specification languages that looked and worked much more like the programming languages we were familiar with (e.g., the Object Constraint Language, which lets us write precise rules that apply to UML models.)

Contrary to what you may have heard, the (few) teams using formal specification back in the 1990's were not necessarily doing Big Design Up-Front, and were not necessarily using specialist tools either.

Much of the formal specification that happened was scribbled on whiteboards, adorning simple design models to make key behaviours unambiguous. From that, teams might have written unit tests (that's how I learned to do it) for a particular feature, and those tests pretty much became the living specification. Labyrinthine Z or OCL specifications were not necessarily being kept and maintained.

It wasn't, therefore, such a giant leap for teams like the ones I worked on to say "Hey, let's just write the tests and get them passing", and from there to "Hey, let's just write a test, and get that passing".

But it's absolutely true that formal specification is still a thing some teams do - you'll find most of them these days alive and well in the Model-driven Development community. They do create complete specifications, and all the code is generated from those, so the specification is the code - which means, yes, they are programmers, just in new languages.

Watch Model-driven Developers work, and you'll see teams - well, the smarter ones - gradually fleshing out executable models one scenario at a time. Sound familiar?

So there's a bunch of folk out there who don't do TDD, but - by jingo! - it sure does look a lot like TDD!

Other developers used to embed their specifications inside the code, in the form of assertions, and then write test suites (or drive tests in some other way) that would execute the code to see if any of the assertions failed.

So their tests had no assertions. It was sort of like unit testing, but turned inside out. Imagine doing TDD, and then refactoring a group of similar tests into a single test with a general assertion (e.g., instead of assert(balance == 100) and assert(balance == 200), it might be assert(balance == oldBalance + creditAmount)).

Now go one step further, and move that assertion out of the test and into the code being tested (at the end of that code, because it's a post-condition). So you're left with the original test cases to drive the code, but all the questions are being asked in the code itself.

Most programming languages these days include a built-in assertion mechanism that allows us to do this. Many have build flags that allow us to turn assertion checking on or off (on if testing, off if deploying to live.)
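To illustrate, here's a minimal sketch in Java, with invented names (Java's built-in assert is switched on at runtime with the -ea flag):

    public class Account {

        private int balance;

        public void credit(int creditAmount) {
            int oldBalance = balance;

            balance += creditAmount;

            // the post-condition lives in the implementation, not in the tests;
            // checking is enabled when running with: java -ea
            assert balance == oldBalance + creditAmount;
        }

        public int getBalance() {
            return balance;
        }
    }

The tests that drive this code just call credit() with various inputs; the question-asking happens inside Account itself.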

When you watch teams working this way, they don't write all the assertions (and all the test automation code) at once. They tend to write just enough to implement a feature, or a single use case scenario, and flesh out the code (and the assertions in it) scenario by scenario. Sound familiar?

Of course, some teams don't use test automation at all. Some teams rely on inspections, for example. And inspections are a very powerful way to debug our code - more effective than any other technique we know of today.

But they hit a bit of a snag as development progresses, namely that inspecting all of the code that could be broken after a change, over and over again for every single change, is enormously time-consuming. And so, while it's great for discovering the test cases we missed, as a regression testing approach, it sucks ass Gangnam Style.

But, and let's be clear about this, these are the techniques that are - strictly speaking - not TDD, and that can (even if only initially, as in the case of inspections) produce reliable, maintainable software. If you're not doing these, then you're doing something very like these.

Unless, of course... you're not producing reliable, maintainable code. Or the code you are producing is so very, very simple that these techniques just aren't necessary. Or if the code you're creating simply doesn't matter and is going to be thrown away.

I've been a software developer for approximately 700 million years (give or take), so I know from my wide and varied experience that code that doesn't matter, code that's only for the very short-term, and code that isn't complicated, are very much the exceptions.

Code that gets used tends to stick around far longer than we planned. Even simple code usually turns out to be complicated enough to be broken. And if it doesn't matter, then why in hell are we doing it? Writing software is very, very expensive. If it's not worth doing well, then it's very probably - almost certainly - not worth doing.

So what is the choice that teams are alluding to when they say "TDD isn't the only game in town"? Do they mean they're using Formal Methods? Or perhaps using assertions in their code? Or do they rely on rigorous inspections to make sure they get it right the first time around?

Or are they, perhaps, doing none of these things, and the choice they're alluding to is the choice to create software that's not good enough and won't last?

I suspect I know the answer. But feel free to disagree.







January 26, 2015

Intensive Test-driven Development, London, Saturday March 14th - Insanely Good Value!

Just a quick note to mention that the Codemanship 1-day Intensive TDD training workshop is back, and at the new and insanely low price of just £20.

You read that right; £20 for a 1-day TDD training course (a proper one!)

Places are going fast. Visit our EventBrite page to book your place now




January 21, 2015

My Solution To The Dev Skills Crisis: Much Smaller Teams

Putting my Iconoclast hat on temporarily, I just wanted to share a thought that I've harboured almost my entire career: why aren't very small teams (1-2 developers) the default model in our industry?

I think back to products I've used that were written and maintained by a single person, like the guy who writes the guitar amp and cabinet simulator Recabinet, or my brother, who wrote a 100,000-line Xbox game by himself in a year, as well as doing all the sound, music and graphic design for it.

I've seen teams of 4-6 developers achieve less with more time, and teams of 10-20 and more achieve a lot less in the same timeframe.

We can even measure it somewhat objectively: my Team Dojo, for example, when run as a one day exercise seems to be do-able for an individual but almost impossible for a team. I can do it in about 4 hours alone, but I've watched teams of very technically strong developers fail to get even half-way in 6 hours.

People may well counter: "Ah, but what about very large software products, with millions of lines of code?" But when we look closer, large software products tend to be interconnected networks of smaller software products presenting a unified user interface.

The trick to a team completing the Team Dojo, for example, is to break the problem down at the start and do a high-level design where interfaces and contracts between key functional components are agreed and then people go off and get their bit to fulfil its contracts.

Hence, we don't need to know how the spellcheck in our word processor works; we just need to know what the inputs and expected outputs will be. We could sketch it out on paper (e.g., with CRC cards), or we could sketch it out in code with high-level interfaces, using mock objects to defer the implementation design.
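For instance, a sketch of that kind of contract in Java (the interface and names are invented for illustration), with a cheap stand-in implementation so everyone else can get on with their bit:

    import java.util.Collections;
    import java.util.List;

    // the agreed contract: what goes in, what comes out - no implementation detail
    public interface SpellChecker {
        List<String> misspelledWords(String documentText);
    }

    // a stand-in (or use a mocking framework) so other components can be built
    // and tested against the contract before the real spellchecker exists
    class StubSpellChecker implements SpellChecker {
        @Override
        public List<String> misspelledWords(String documentText) {
            return Collections.emptyList(); // pretend everything is spelled correctly
        }
    }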

There'll still be much need for collaboration, though. It's especially important to integrate your code frequently in these situations, because there's many a slip 'twixt cup and microservice.

As with multithreading (see previous blog post), we can aim to limit the "touch points" in component-based/service-oriented/microservice architectures so that - as much as possible - each component is self-contained, presents a simple interface and can be treated as a black box by everyone who isn't working on its implementation.

Here's the thing, though: what we tend to find with teams who are trying to be all hifalutin and service-oriented and enterprisey-wisey is that, in reality, what they're working on is a small application that would probably be finished quicker and better by 1-2 developers (1 on her own, or 2 pair programming).

You only get an economy of scale with hiding details behind clean interfaces when the detail is sufficiently complex that it makes sense to have people working on it in parallel.

Do you remember from school biology class (or physics, if you covered this under thermodynamics) the lesson about why small mammals lose heat faster than large mammals?

It's all about the surface area-to-volume ratio: a teeny tiny mouse presents a large surface area proportional to the volume of its little body, so more of its insides are close to the surface and therefore it loses heat through its skin faster than, say, an elephant, who has a massive internal volume proportional to its surface area, and so most of its insides are away from the surface.

It may be stretching the metaphor to breaking point, but think of interfaces as the surface of a component, and the code behind the interfaces as the internal volume. When a component is teeny-tiny, like a wee mouse, the overhead in management, communication, testing and all that jazz in splitting off developers to try to work on it in parallel makes it counterproductive to do that. Not enough of the internals are hidden to justify it. And so much development effort is lost through that interface as "heat" (wasted energy).

Conversely, if designed right, a much larger component can still hide all the detail behind relatively simple interfaces. The "black box-iness" of such components is much higher, in so much as the overhead for the team in terms of communication and management isn't much larger than for the teeny-tiny component, but you get a lot more bang for your buck hidden behind the interfaces (e.g., a clever spelling and grammar checker vs. a component that formats dates).

And this, I think, is why trying to parallelise development on the majority of projects (average size of business code base is ~100,000 lines of code) is on a hiding to nowhere. Sure, if you're creating an OS, with a kernel, and a graphics subsystem, and a networking subsystem, etc etc, it makes sense to a point. But when we look at OS architectures, like Linux for example, we see networks of "black-boxy", weakly-interacting components hidden behind simple interfaces, each of which does rather a lot.

For probably 9 out of 10 projects I've come into contact with, it would in practice have been quicker and cheaper to put 1 or 2 strong developers on it.

And this is my solution to the software development skills crisis.



January 16, 2015

Can Restrictive Coding Standards Make Us More Productive?

So, I had this long chat with a client team yesterday about coding standards, and I learned that they had - several years previously - instituted a small set of very rigorously enforced standards. We discussed the effects of this, especially on their ability to get things done.

They've built a quality gate for their product that runs as part of their integration builds (though you can save yourself time by running it on your desktop before you try to commit).

To give you an example of the sort of thing this quality gate catches, and their policies for dealing with it, try this one for size...

Necessarily, parts of their code are multi-threaded and acting on shared data. Before we begin, let's just remind ourselves that this is something that all programming languages have to deal with. Functional programming uses sleight of hand to make it look like multiple threads aren't acting on shared data, but they do this typically by pushing mutable state out into some transactional mechanism where our persistent data is managed. And hence, at some point, we must deal with the potential consequences of concurrency.

So, where was I? Oh yeah...

Anyway, they find that unavoidably there must be some multithreading in their code. Their standard, however, is to do as little multithreading as possible.

So their quality gate catches commits that introduce new threads. Simple as that: check in code that spawns new threads, and the gate slams shut and traps it in "commit purgatory" to be judged by a higher power.

The "higher power", in this case, is the team. Multithreading has to be justified to an impromptu panel of your peers. If they can think of a way to live without it, it gets rejected and you have to redo your code without introducing new threads.

Why go to all this trouble?

Well, we've all seen what unnecessary complexity does to our code. It makes it harder to understand, harder to change, and more likely to contain errors. Multithreading adds arguably the most pernicious kind of complexity, creating an explosion in the possible ways our code can go wrong.

So every new thread, when it's acting on shared data, piles on the risk, and piling on risk piles on cost. Multithreading comes at a high price.

The team I spoke to yesterday recognised this high price - referring to multithreaded logic as "premium code" (because you have to pay a premium to make it work reliably enough) - and took steps to limit its use to the absolute bare minimum.

In the past, I've encouraged a system of tariffs for introducing "premium code" into our software. For example, fill a small jam jar with tokens of some kind labelled "1 Thread", and every time you think you need to write code that spawns a new thread, you have to go and get a token from the jar. This is a simple way to strictly limit the amount of multithreaded code by limiting the number of available tokens.

This can also serve to remind developers that introducing multithreaded code is a big deal, and they shouldn't do it lightly or thoughtlessly.

Of course, if you hit your limit and you're faced with a problem where multithreading is unavoidable, that can force you to look at how existing multithreaded code could be removed. Maybe there's a better way of doing what we want that we didn't think of at the time.

In the case of multithreading - and remember that this is just an example (e.g., we could do something similar whenever someone wants to introduce a new dependency on an external library or framework) - it can also help enormously to know where in our code the multithreaded bits are. These are weak points in our code - likely to break - and we should make sure we have extra-strong scaffolding (for example, better unit tests) around those parts of the code to compensate.

But the real thrust of our discussion was about the impact rigorously enforced coding standards can have on productivity. Standards are rules, and rules are constraints.

A constraint limits the number of allowable solutions to a problem. We might instinctively assume that limiting the solution space will have a detrimental effect on the time and effort required to solve a problem (for the same reason it's easier to throw a 7 with two dice than a 2 or a 12 - more ways to achieve the same goal).

But the team didn't find this to be the case. If anything, to a point, they find the reverse is true: the more choices we have, the longer it seems to take us.

So we ruminated on that for a while, because it's an interesting problem. Here's what I think causes it:

PROGRAMMERS ARE PARALYSED BY TOO MANY CHOICES

There, I've said it.

Consider this: you are planning a romantic night in with your partner. You will cook a romantic dinner, with a bottle of wine and then settle down in front of the TV to watch a movie.

Scenario #1: You go to the shopping mall to buy ingredients for the meal, the wine and a DVD

Scenario #2: You make do with what's in your fridge, wine rack and DVD collection right now

I've done both, and I tend to find that I spend a lot of time browsing and umm-ing and ah-ing when faced with shelf after shelf of choices when I visit the mall. Too much choice overpowers my feeble mind, and I'm caught like a rabbit in the headlights.

Open up the fridge, though, and there might be ingredients for maybe 3-4 different dishes in it right now. And I have 2 bottles of wine in the rack, both Pinot Noir. (Okay, I do, however, have a truly massive DVD collection, which is why when I decide to have a quiet night in watching a movie, the first half hour can be spent choosing which movie.)

Now, not all dishes are born equal, and not all wines are good, and not all movies are enjoyable.

But I generally don't buy food I don't like, or wine I won't drink (and that leaves a very wide field, admittedly), or movies I don't want to watch. So, although my choices are limited at home, they're limited to stuff that I like.

In a way, I have pre-chosen so that, when I'm hungry, or thirsty or in need of mindless entertainment, the limited options on offer are at least limited to stuff that will address those needs adequately.

The trick seems to be to allow just enough choice so that most things are possible. We restrict our solution space to the things we know we often need, just like I restrict my DVD collection to movies I know I'll want to watch again. I'll still want to buy more DVDs, but I don't go to the shop every time I want to watch a movie.

This is the effort we put into establishing standards in the first place (and it's an ongoing process, just like shopping.)

Over the long-term, at our leisure, we limit ourselves to avoid being overwhelmed by choice in the heat of battle, when decisions may need to be made quickly. But we limit ourselves to a good selection of choices - or should I say, a selection of good choices - that will work in most situations. Just as Einstein limited his wardrobe so he could get on with inventing gravity or whatever it was that he did.

Harking back to my crappy DVD library analogy - and I know this is something friends do, too, from what they tell me - I will not watch a particular movie I own on DVD for years, and then it'll be shown on TV, and I will sit there and watch it all the way through, adverts and all, and enjoy it.

This might also have the effect of compartmentalising trying out new solutions (new programming languages, new frameworks, new build tools and so on) - what we might call "R&D" (though all programming is R&D, really) - from solving problems using the selection of available solutions we land upon. This could be a double-edged sword. Making "trying out new stuff" a more explicit activity could have the undesired effect of creating an option for non-technical managers that wasn't there before. Like refactoring, it's probably wise to make this none of their business.

And, sure, from time to time we'll find ourselves facing a problem for which our toolkit is not prepared. And then we'll have to improvise like we always do. Then we can switch into R&D mode. In Extreme Programming, we call these "spikes".

But I can't help feeling that we waste far too much time getting bogged down in choices when we have a perfectly good solution we could apply straight away.

Oftentimes, we just need to pick a solution and then get on with it.

I look forward to your outrage.


January 7, 2015

Do Matchers Really Make Test Code Easier To Understand?

Just time for a quick Thought For The Day about using matchers in unit test assertions.

Have a look at this simple example:
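Something along these lines, using Hamcrest matchers (the domain and names here are purely illustrative):

    import static org.hamcrest.MatcherAssert.assertThat;
    import static org.hamcrest.Matchers.hasItem;
    import static org.hamcrest.Matchers.hasSize;
    import org.junit.Test;

    public class VideoLibraryTest {

        @Test
        public void donatedTitlesAreAddedToTheLibrary() {
            VideoLibrary library = new VideoLibrary();

            library.donate("Jaws");

            assertThat(library.getTitles(), hasSize(1));
            assertThat(library.getTitles(), hasItem("Jaws"));
        }
    }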



And now again without matchers, the old-fashioned way:
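In the same illustrative vein:

    import static org.junit.Assert.assertEquals;
    import static org.junit.Assert.assertTrue;
    import org.junit.Test;

    public class VideoLibraryTest {

        @Test
        public void donatedTitlesAreAddedToTheLibrary() {
            VideoLibrary library = new VideoLibrary();

            library.donate("Jaws");

            assertEquals(1, library.getTitles().size());
            assertTrue(library.getTitles().contains("Jaws"));
        }
    }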



My first question is this: is the version using matchers easier to read than the traditional version? Really?

My second question is: should unit test assertions ever get complex enough to benefit from matchers? (The old "tests should only have one reason to fail" principle.)

And finally, when test code starts to get "smart", does that perhaps indicate sometimes that we've discovered some useful model code and should refactor to move it to the most appropriate class in our implementation?

In which case, does using matchers help or hinder in that respect? I'm not a big fan of having my implementation depend on Hamcrest libraries, especially if the tests do, too.

If I refactored the traditional version like so:
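Along these lines, with the collection-poking pushed onto the (hypothetical) VideoLibrary class itself:

    import static org.junit.Assert.assertEquals;
    import static org.junit.Assert.assertTrue;
    import org.junit.Test;

    public class VideoLibraryTest {

        @Test
        public void donatedTitlesAreAddedToTheLibrary() {
            VideoLibrary library = new VideoLibrary();

            library.donate("Jaws");

            assertEquals(1, library.titleCount());
            assertTrue(library.contains("Jaws"));
        }
    }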



...then is that more readable than the version using matchers? To my eyes it is. And I tend to find that generally. Not always, but often, I have a harder time wrapping my head around assertions written using matchers than traditional assertions.

When assertions get complicated, I can't help feeling that's my test code trying to tell me something. Matchers make it more difficult for me to respond to those clues.

I'm wondering, even, if maybe I could try paring test code all the way back to only using a simple Boolean assertion, and letting it all "hang out" to see what logic emerges beyond a simple TRUE or FALSE.

What do you think? Do matchers really make test code more readable? Have we tested that assumption? How would we test that assumption? And what impact do matchers have on the "refactorability" of our test code?

There. I've said it. "Burn the unbeliever!" etc.

UPDATE:

So, thoughts are pouring in via the Social Mediums and That Email That They Used To Have. Most defences of matchers centre around the more informative test failure reporting. I can see how that might help, but in practice haven't always found that "more information = better".

@c089 suggested I look at Groovy's Power Assert, which has the purity of simple Boolean assertions with the diagnostic advantages of using matchers. I shall ask Santa for a Java version and a .NET version of this, because it looks jolly useful.

The jury's out on whether assertions using matchers really are easier to read. Not many have defended them on those grounds so far. I would like to put it to the test. (Puts thinking cap on...)



January 6, 2015

Example Random Parameterised Unit Tests With JCheck

Pairing with a client developer yesterday led to some code that you might find interesting.

My pairing partner was new to parameterised testing, and wanted to see how one could refactor and triangulate our way from traditional unit tests with a single input and expected output to a parameterised test with multiple hardcoded inputs and expected outputs, and then on to something that could give us exhaustive coverage with as little code as we could.

Sticking to a very familiar problem, we went through the FizzBuzz kata and eventually ended up with this.
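I won't vouch for the exact JCheck annotations from memory, so here's a rough sketch of the shape of it in plain JUnit, with java.util.Random standing in for JCheck's input generators: constrained random inputs, one test per FizzBuzz rule (the FizzBuzzer class is assumed).

    import java.util.Random;
    import java.util.function.IntPredicate;
    import org.junit.Test;
    import static org.junit.Assert.assertEquals;

    public class FizzBuzzTest {

        private final Random random = new Random();
        private final FizzBuzzer fizzBuzzer = new FizzBuzzer();

        @Test
        public void multiplesOfThreeButNotFiveAreReplacedWithFizz() {
            int input = anyNumberWhere(i -> i % 3 == 0 && i % 5 != 0);
            assertEquals("Fizz", fizzBuzzer.fizzBuzz(input));
        }

        @Test
        public void multiplesOfFiveButNotThreeAreReplacedWithBuzz() {
            int input = anyNumberWhere(i -> i % 5 == 0 && i % 3 != 0);
            assertEquals("Buzz", fizzBuzzer.fizzBuzz(input));
        }

        @Test
        public void multiplesOfBothThreeAndFiveAreReplacedWithFizzBuzz() {
            int input = anyNumberWhere(i -> i % 15 == 0);
            assertEquals("FizzBuzz", fizzBuzzer.fizzBuzz(input));
        }

        @Test
        public void allOtherNumbersAreLeftAlone() {
            int input = anyNumberWhere(i -> i % 3 != 0 && i % 5 != 0);
            assertEquals(String.valueOf(input), fizzBuzzer.fizzBuzz(input));
        }

        // stand-in for a property-based generator: pick numbers in 1..100
        // until one satisfies the constraint (the test's pre-condition)
        private int anyNumberWhere(IntPredicate constraint) {
            int candidate;
            do {
                candidate = random.nextInt(100) + 1;
            } while (!constraint.test(candidate));
            return candidate;
        }
    }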



You'll have to excuse my basic knowledge of JCheck, but I think this test fixture will cover pretty much every possible value from 1 to 100.

The first thing to note is that we've generalised our tests - not directly in the assertions, but rather in the constraints on the generated test input values. This is peculiar to this kind of problem, probably. But we would certainly expect these constraints to correspond to the pre-condition for each test.

Note, too, the duplication we chose to leave in the test code. We could refactor this into an even more general test method, but when we tried the code started to become obfuscated, with a very long test method name describing all of the FizzBuzz rules.

So we rolled back to what I think is a pretty readable pattern for test code: one test per rule.

One more thing I'd like to point out before I'm done: if we look at the actual implementation, we see that the algorithm used is different to the logic we used in the test code when we generalised it.
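The implementation looked something like this familiar shape (illustrative rather than the actual code from the pairing session):

    public class FizzBuzzer {

        public String fizzBuzz(int number) {
            // builds the answer up by concatenation - a different route to the
            // same results than the modulo constraints used in the test code
            String result = "";
            if (number % 3 == 0) {
                result += "Fizz";
            }
            if (number % 5 == 0) {
                result += "Buzz";
            }
            return result.isEmpty() ? String.valueOf(number) : result;
        }
    }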



It's pretty unavoidable in randomising test inputs that our test code has to use algorithms to calculate expected results, unless we want to construct tables of every possible permutation.

The danger is that we use the same algorithms as the implementation. I get a much warmer, fuzzier feeling knowing that both our tests and our implementation arrived at matching results by different routes. (And, of course, the ultimate crime is to use the implementation itself to calculate the expected results - that way lies madness!)

MASSIVE PLUG: If this sort of fun and larks interest you, you might want to check out my new Advanced Unit Testing course.




January 4, 2015

Real-world Examples of Polymorphic Testing

A couple of people have asked me to clarify what I wrote in my brain-dump that touched on polymorphic testing on Jan 1st, requesting a couple of real-world examples.

Just to quickly recap, polymorphic testing is writing test code against abstractions and dependency-injecting the object under test so that we can run the same tests against different implementations. It's effectively a tool for testing Liskov Substitution.

I can think of two that I know have had a big impact, both of which are discussed in my new Advanced Unit Testing course.

Firstly, there's device drivers. If you're developing hardware to work with, say, Windows, then you may need to write custom device drivers to enable the software running on the PC to interface with the hardware.

Now, device drivers can do all sorts of things, from graphics and displays, to handling different kinds of inputs (mouse, tracker pad, pen etc), to acting as an audio interface for recording or monitoring sound, to talking to printers, and so on.

But all device drivers - regardless of what they're for - have to behave like good device drivers of the community (to paraphrase something my university landlord used to say to us), and play nice with the operating system and all the other device drivers.

And there are subgroups of device driver specialisation, too - audio, display, disk i/o, USB, Wi-Fi, etc etc.

Windows talks to these drivers through well-defined interfaces, and these abstractions have abstract rules that all device drivers must obey.

To certify your driver and hardware for use with Windows, they've made a suite of automated tests available as part of the Windows Hardware Certification Kit (HCK) (it used to be called the "Windows Logo" kit).

There are general tests that apply to all kinds of device driver, and specialised tests targeted at those subgroups like audio and video.

You will, one hopes, have your own tests that are specific to what your device does (e.g., testStartingOrgasmatronDisablesIncomingCalls() ). But shipping a suite of tests aimed at abstract device drivers helps ensure that, whatever it does, it does it without crashing your computer.

The second example is one I'm very familiar with as a user: Virtual Studio Technology (VST) plug-ins

A VST plug-in is a software library that can be inserted into the digital signal chain of an audio track for use in the creation of audio and music projects. Digitally, it mimics the way recording studios insert an analog audio processing device into an analog signal chain (e.g., adding reverb to a guitar sound by routing the guitar signal through a reverb pedal).

VST plug-ins can do a wide variety of tasks, ranging from the simple reverb unit I mentioned, to simulating entire instruments (e.g., a digital simulation of a Minimoog synthesizer, or a sampled drum machine).

But, kind of like in real recording studios, it only becomes possible to route VST plug-ins together to do useful stuff when they present a standard interface through which the audio signals can flow. In the real world, many audio processing devices like reverbs and delays and guitar amps and synthesizers standardize on 1/4" audio jacks to allow us to route the signal from one device to another.

In software, VST plug-ins must present the standard interface that the recording software hosting instances of them expects. And, just as with device drivers, there are rules that apply to this abstraction, regardless of what the plug-in specifically does.

When VST plug-ins don't function correctly as plug-ins, the host software is likely to crash. Recording software can be notoriously unreliable.

Some VST plug-in developers have published unit tests suites to help other developers write more reliable plug-ins, like this one.

In both of these cases, it has served developer communities and their customers greatly to provide suites of tests that can be run against their abstractions.

Have a look around. What other examples of polymorphic testing can you find?


January 3, 2015

The Software We Create Must Meaningfully Handle All The Inputs It Allows

Another quick brain-dump of ideas from the new Advanced Unit Testing training workshop...

A fundamental principle of software reliability is that the software we create must meaningfully handle all the inputs we allow.

Case in point, the sign-up page of a social network I worked on many moons ago. The boss wanted us to collect the user's email address - quite naturally, since the membership functions wouldn't work without one.

It's "just a simple text field" was the thinking. So imagine her surprise when we identified a bunch of different test cases relating to that one "simple text field".

If it's left empty, for example - which it could be - then that's not a valid email address. If it doesn't follow the syntax of a valid email address - and there are several rules to obey here - then, well, it's not a valid email address. If the email address doesn't exist (perhaps because they typed it in wrong), then - valid syntax or not - it's of no use to us. If the email address does exist, but it's not the user's email address, then ditto. And finally, of course, we might already have someone signed up with that email address.

The boss threw her hands up in the air and proclaimed "you're overcomplicating it!"

But, no, we weren't. We were just identifying the consequences of that "simple text box" on the sign-up page.

Had we left it as "enter what you like, we won't check", then our user database would have been riddled with invalid or blank email addresses, and a whole bunch of functionality we had planned simply wouldn't have worked.

Attempting to write software for a social network where some users don't have email addresses would make the whole thing more complicated. Better to simplify core application logic by restricting user input to that which we can meaningfully handle.

From a unit testing perspective, our main concern here is pre-conditions. If we have functions that only work under certain circumstances - e.g., only if the user has a valid email address - then we must design our code in such a way that we can either guarantee the pre-condition is always true when that function is invoked, or check for - and meaningfully handle - the possibility that it isn't.

Either way, the behaviour must be complete - that is to say, all allowable inputs are accounted for in our design.

I favour the typically simpler approach of restricting inputs over the more demanding approach of handling a wider set of possible inputs.

In this approach, we need to be especially aware of the boundaries to our software, across which inputs flow.
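A minimal sketch of what that looks like in Java (the validation rules and names are invented for illustration): the boundary rejects anything the core logic can't meaningfully handle, so everything behind it can assume a usable email address.

    public class SignUp {

        private final MemberDirectory members;

        public SignUp(MemberDirectory members) {
            this.members = members;
        }

        public void register(String emailAddress) {
            // guard the boundary: restrict inputs to what we can meaningfully handle
            if (emailAddress == null || !emailAddress.matches("[^@\\s]+@[^@\\s]+\\.[^@\\s]+")) {
                throw new IllegalArgumentException("A valid email address is required");
            }
            if (members.isRegistered(emailAddress)) {
                throw new IllegalArgumentException("That email address is already signed up");
            }
            members.add(emailAddress); // core logic can now rely on its pre-condition
        }
    }

    // assumed collaborator, for illustration
    interface MemberDirectory {
        boolean isRegistered(String emailAddress);
        void add(String emailAddress);
    }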






January 2, 2015

The Unimportant Code That Turned Out To Be Critical

Just time for another sneaky peek at one of the topics covered in the new Advanced Unit Testing course from Codemanship...

Since it's around that time of year when we traditionally tell ghost stories - at least, we did in the days before the Eastenders Xmas omnibus - here's a short scary story about The Unimportant Code That Turned Out To Be Critical.

It was a few years ago now; an IT organisation serving a large business with offices all around the world. Some bright spark had written a really useful library of actuarial functions on a quiet day. There was no project, no budget, no strategy, no requirements, no plan. Everyone was out at the Christmas party and he just thought "Hey, I know what I can do". And he went and done it.

The library was used on the project he was working on, and they saw that it was good. So it got reused on other projects. A lot.

Before you could say "Enterprise Architecture", these 1,000 lines of very useful code were propping up millions of lines of important business code.

But his library was just a little something he knocked up one afternoon. It had no automated tests. It had no build script. It was not being maintained. It was done with scant regard for how readable the code was, or how simple it was, or how easy it might be to extend or replace it.

From the perspective of developer standards, it had slipped under everyone's radar.

And now there it was, a little kernel of code they didn't give a second thought to that was a foundation stone for almost all of their systems.

Any builder will tell you about the importance of strong foundations.

Much talk there is of dependencies in programming, said Master Yoda. But little or no talk about the importance of reliability in the things we depend on. That, too, has slipped under our radars as a profession.

Here's the thing: when a chunk of code - a method, a class, a component, a system - is heavily depended upon, we need to take extra care that it is proportionately dependable.

If we visualise the call stacks in our software - at any scale - as chains of dependent modules collaborating to achieve some goal, then that chain is only as strong as its weakest link. And if we include one link in many chains, that link needs to be especially strong, because it becomes a single point of failure that, when it does fail, breaks many, many chains.

And so it is that a single bug in the actuarial library broke many, many use cases in multiple systems when it finally jumped out and said "boo!", and the whole house of cards came tumbling down late one Saturday night.

That's why it pays to be aware of the real dependencies in our software, so we can more accurately target our efforts at making them dependable enough.

If you want to get a bit more in-depth about it, read my Dependable Dependencies Principle paper.

And, remember, don't have nightmares.




January 1, 2015

Polymorphic Testing & A Pattern For Backwards Compatibility Checking

Good morning, and Alka Seltzer!

To kick off the new year, I'm thinking about advanced unit testing tools and techniques, which I've been devoting the last couple of months to developing a training workshop on.

In particular, this morning I'm thinking about applications of polymorphic testing, which a client of mine has been experimenting with to help them better ensure backwards compatibility on new releases of their libraries.

I'm very lucky to have been introduced to this early in my career, and having seen many, many teams stung by backwards compatibility gaffes in the libraries they use - and extrapolating the cost of millions of developers having to change their code to make it work again - it surprises me it's not more common.

Polymorphic testing very simply means writing tests that bind to abstractions - essentially, dependency injection where the dependency being injected is the object under test.

There are a number of reasons we might want to do this; e.g. testing that subclasses satisfy the contracts of their super-types (Liskov Substitution), testing that implementations satisfy the contracts implied by their interfaces (e.g., 3rd party device drivers), and so on.

Among the applications of polymorphic testing, though, backwards compatibility is usually overlooked. Indeed, let's face it: backwards compatibility is usually overlooked anyway.

Implementing a polymorphic test using something like JUnit or NUnit is pretty straightforward.
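Roughly this shape, in Java (the Animal behaviour here is invented purely for illustration): an abstract test class with a factory method that the concrete test subclasses override.

    import org.junit.Test;
    import static org.junit.Assert.assertTrue;

    // assumed contract, for illustration
    interface Animal {
        void eat();
        int getEnergy();
    }

    public abstract class AnimalContractTest {

        // each concrete test subclass injects the implementation under test
        protected abstract Animal createAnimal();

        @Test
        public void eatingIncreasesEnergy() {
            Animal animal = createAnimal();
            int energyBefore = animal.getEnergy();

            animal.eat();

            assertTrue(animal.getEnergy() > energyBefore);
        }
    }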



In this example, I've written an abstract test class that allows me to create concrete test subclasses that insert the correct implementation of Animal into the test, e.g., for a Tiger:
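Perhaps (assuming a Tiger class that implements Animal):

    public class TigerTest extends AnimalContractTest {

        @Override
        protected Animal createAnimal() {
            return new Tiger(); // assumed implementation of Animal
        }
    }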



Or for a Snake:
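And likewise (assuming a Snake implementation):

    public class SnakeTest extends AnimalContractTest {

        @Override
        protected Animal createAnimal() {
            return new Snake(); // assumed implementation of Animal
        }
    }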



Now, a very useful thing framework developers can do - which of course, almost none of them do - is to ship these kinds of polymorphic tests with their interfaces, so when we implement them in our own code (e.g., a JEE interface) we can test them against the base contracts that the framework expects. So, whatever your implementation specifically does, we can assure ourselves that it correctly plays the role of a "thing that implements that interface".

OS developers could have saved themselves a lot of heartache if they'd done this for device drivers decades ago. But that's a different rant for another day...

It ought to be standard practice to write polymorphic tests for abstract types under these sorts of circumstances (i.e., when you're expecting other developers to extend or implement them).

Another thing that ought to be standard practice is to write polymorphic tests - tests where the objects being tested are dependency-injected into the test code - for APIs.

Here there's a bit more that goes on under the hood, but the basic thinking is that it should be possible to build and run one version of the tests against a later version of the implementation code.

There are a number of different ways of doing this, and a short blog post can't really cover the fine detail, but here's a simple pattern to get you started.

In your build script - you have a build script, right? - have an optional target for getting the previous version of the tests from your VCS and building those against the latest version of the implementation code.

Mind truly boggled? It's a bit of sleight of hand, really. In your build script, you check out and merge two versions of the code - the latest version of the source code, and the previous release version of the tests (the test code that corresponds to the last release of your software.)

It's for exactly this reason that I tell people to keep their source and test code cleanly separate. Just in case you were wondering.

So, in your backwards compatibility testing build, you check out the latest from SRC, and the last release version from TEST and merge it all into a single project to be built and tested.

If your old test code won't compile against the new source code, then your new code has fallen at the first hurdle.

If it does compile, then you should be able to run the unit tests - but only the unit tests written against the public API. Ongoing refactoring and a preference for using internal and private classes wherever possible should mean that the internals of your source code may have changed for the better. This is fine. Just as long as, to old client code from the outside, no change is visible.

And for that reason, I add an addendum to my "Keep your source and test code separate" mantra; namely, keep API-level tests and internal unit tests separate so we can easily tell them apart and run them separately in a build for just these purposes.

Now, the astute among you will ponder "why do these backwards compatibility tests need to be polymorphic if we're just compiling them against new versions of the code?" Good thinking, Batman!

Hark back to our OO design principles: the Open-Closed Principle (OCP) encourages us to extend our software from one release to the next not by modifying classes, but by extending them. This is generally a good idea, because it assures binary compatibility between releases. But syntactic coupling is only one of our concerns when it comes to backwards compatibility. We must also concern ourselves with semantic coupling, which brings us back to Liskov Substitution. These new subclasses must satisfy the contracts of their super-classes.

And it is the case that well-designed libraries tend to mostly expose abstractions so that, from one version to the next, client code shouldn't need to be recompiled and re-tested at all.

So why not rely on the basic kind of polymorphic testing illustrated above? It's all a question of living in the real world. In the real world, we don't always stick doggedly to the principles of Open-Closed and Liskov Substitution. We might wish to, of course. We might try. But generally we don't.

So, this is as much an aide memoire as anything else. The pattern is that all tests for public types should be polymorphic, and separated from the other unit tests, including implementations of those polymorphic tests, which - as they rely on internal details - really belong with whatever version of the code is current.

I encourage you to make this distinction clear, both in the style of unit test and in the organisation of the test code. It should be possible to build and test these polymorphic API tests against not just the code that was current when they were first written (including test implementations), but against any subsequent versions of the code, and in particular release versions.

Think of it as an extension of Liskov Substitution: an instance of any class can be substituted with an instance of any later versions of that class.

So, in some cases our API-level tests will be testing new implementations of types from previous versions. And in some cases, they will be testing modifications of previous versions. And, from their external perspective, it should be impossible to tell which is which.

Polymorphic testing gifts us the "flex points" to allow that to happen.


So there you have it; polymorphic testing a bit of build script chicanery can allow us to run old tests on new code, just as long as those tests bind strictly to public types.