September 24, 2018

Learn TDD with Codemanship

Why I Throw Away (Most Of) My Customer Tests

There was a period about a decade ago, when BDD frameworks were all new and shiny, when some dev teams experimented with relying entirely on their customer tests. This predictably led to some very slow-running test suites, and an upside-down test pyramid.

It's very important to build a majority of fast-running automated tests to maintain the pace of development. Upside-down test pyramids become a severe bottleneck, slowing down the "metabolism" of delivery.

But it is good to work from precise, executable specifications, too. So I still recommend teams work with their customers to build a shared understanding of what is to be delivered using tools like Cucumber and Fitnesse.

What happens to these customer tests after the software's delivered, though? We've invested time and effort in agreeing them and then automating them. So we should keep them, right?

Well, not necessarily. Builders invest a lot of time and effort into erecting scaffolding, but after the house is built, the scaffolding comes down.

The process of test-driving an internal design with fast-running unit tests - by which I mean tests that ask one question and don't involve external dependencies - tends to leave us with the vast majority of our logic tested at that level. That's the base of our testing pyramid, and as it should be.
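To make the distinction concrete, a "fast-running unit test" in this sense might look something like this minimal Python sketch (the function and test names are invented for illustration):

```python
# Hypothetical example: a unit test that asks one question and touches
# no external dependencies (no database, no network, no file system).
def total_with_discount(subtotal, discount_rate):
    # Pure logic - the kind that ends up at the base of the test pyramid.
    return round(subtotal * (1 - discount_rate), 2)

def test_ten_percent_discount_is_applied():
    # One question, one assertion.
    assert total_with_discount(100.00, 0.10) == 90.00
```

Thousands of tests like this can run in seconds, which is what keeps the base of the pyramid fast.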

So I now have customer tests and unit tests asking the same questions. One of them is surplus to requirements for regression testing, and it makes most sense to retain the fastest tests and discard the slowest.

I keep a cherry-picked handful of customer tests just to check that everything's wired together right in my internal design - maybe a few dozen key happy paths. The rest get archived and quite possibly never run again, or certainly not on a frequent basis. They aren't maintained, because those features or changes have been delivered. Move on.

August 26, 2018

Learn TDD with Codemanship

Yes, Developers Should Learn Ethics. But That's Only Half The Picture.

Given the negative impact that some technology start-ups have had on society, and how prominent that sentiment is in the news these days, it's no surprise that more and more people are suggesting that the people who create this technology develop their sense of humanity and ethics.

I do not deny that many of us in software could use a crash course in things like ethics, philosophy, law and history. Ethics in our industry is a hot potato at the moment.

But I do not believe that it should all be on us. When I look at the people in leadership positions - in governments, in key institutions, and in the boardrooms - who are driving the decisions that are creating the wars, the environmental catastrophes, the growing inequality, and the injustice and oppression that we see daily in the media - it strikes me that the problem isn't that the world is run by scientists or engineers. Society isn't ruled by evidence and logic.

As well as STEM graduates needing a better-developed sense of ethics, I think the world would also be improved if the rest of the population had more effective bullshit detectors. Taking Brexit as a classic example, voters were bombarded with campaign messages that were demonstrably false, and promises that were provably impossible to deliver. Leave won by appealing to voters' feelings about immigration, about globalisation and about Britain's place in the EU. Had more voters checked the facts, I have no doubt the vote would have swung the other way.

Sure, this post-truth world we seem to be living in now was aided and abetted by new technology, and the people who created that technology should have said "No". But, as far as I can tell, it never even occurred to them to ask those kinds of questions.

But let's be honest, it wasn't online social media advertising that gifted a marginal victory to the British far-right and installed a demagogue in the White House, any more than WWII was the fault of the printing presses that churned out copy after copy of Mein Kampf. Somebody made a business decision to let those social media campaigns run and take the advertisers' money.

Rightly IMHO, it's turned a spotlight on social media that was long overdue. I'm not arguing that technology doesn't require ethics. Quite the reverse.

What I'm saying, I guess, is that a better understanding of the humanities among scientists and engineers is only half the picture. If we think the world's problems will be solved because a coder said "I'm not going to track that cookie, it's unethical" to their bosses, we're going to be terribly disappointed.

August 6, 2018

Learn TDD with Codemanship

Agile Baggage

In the late 1940s, a genuine mystery gripped the world as it rebuilt after WWII. Thousands of eye witnesses - including pilots, police officers, astronomers, and other credible observers - reported seeing flying objects that had performance characteristics far beyond any known natural or artificial phenomenon.

These "flying saucers" - as they became popularly known - were the subject of intense study by military agencies in the US, the UK and many other countries. Very quickly, the extraterrestrial hypothesis - that these objects were spacecraft from another world - caught the public's imagination, and "flying saucer" became synonymous with Little Green Men.

In an attempt to outrun that pop culture baggage, serious studies of these objects adopted the less sensational term "Unidentified Flying Object". But that, too, soon became shorthand for "alien spacecraft". These days, you can't be taken seriously if you study UFOs, because it lumps you in with some very fanciful notions, and some - how shall we say? - rather colourful characters. Scientists don't study UFOs any more. It's not good for the career.

These days, scientific studies of strange lights in the sky - like the Ministry of Defence's Project Condign - use the term Unidentified Aerial Phenomena (UAP) in an attempt to outrun the cultural baggage of "UFOs".

The fact remains, incontrovertibly, that every year thousands of witnesses see things in the sky that conform to no known physical phenomena, and we're no closer to understanding what it is they're seeing after 70 years of study. The most recent scientific studies, in the last 3 decades, all conclude that a portion of reported "UAPs" are genuine unknowns, that they are of real defence significance, and worthy of further scientific study. But well-funded studies never seem to materialise, because of the connotation that UFOs = Little Green Men.

The well has been poisoned by people who claim to know the truth about what these objects are, and they'll happily reveal all in their latest book or DVD - just £19.95 from all good stores (buy today and get a free Alien Grey lunch box!) If these people would just 'fess up that, in reality, they don't know what they are, either - or, certainly, they can't prove their theories - the scientific community could get back to trying to find out, like they attempted to in the late 1940s and early 1950s.

Agile Software Development ("agile" for short) is also now dragging a great weight of cultural baggage behind it, much of it generated by a legion of people also out to make a fast buck by claiming to know the "truth" about what makes businesses successful with technology.

Say "agile" today, and most people think you're talking about Scrum (and its scaled variations). The landscape is very different to 2001, when the term was coined at a ski resort in Utah. Today, there are about 20,000 agile coaches in the UK alone. Two thirds of them come from non-technical backgrounds. Like the laypeople who became "UFO researchers", many agile coaches apply a veneer of pseudoscience to what is - in essence - a technical persuit.

The result is an appearance of agility that often lacks the underlying technical discipline to make it work. Things like unit tests, continuous integration, design principles, refactoring: they're every bit as important as user stories and stand-up meetings and burndown charts.

Many of us saw it coming years ago. Call it "frAgile", "Cargo Cult agile", or "WAgile" (Waterfall-Agile) - it was on the cards as soon as we realised Agile Software Development was being hijacked by management consultants.

Post-agilism was an early response: an attempt to get back to "doing what works". Software Craftsmanship was a more defined reaction, reaffirming the need for technical discipline if we're to be genuinely responsive to change. But these, too, accrued their baggage. Software craft today is more of a cult of personality, dominated by a handful of the most vocal proponents of what has become quite a narrow interpretation of the technical disciplines of writing software. Post-agilism devolved into a pseudo-philosophical talking shop, never quite getting down to the practical detail. Their wells, too, have been poisoned.

But teams are still delivering software, and some teams are more successfully delivering software than others. Just as with UFOs, beneath the hype, there's a real phenomenon to be understood. It ain't Scrum and it ain't Lean and it certainly ain't SAFe. But there's undeniably something that's worthy of further study. Agile has real underlying insights to offer - not necessarily the ones written on the Manifesto website, though.

But, to outrun the cultural baggage, what shall we call it now?

August 3, 2018

Learn TDD with Codemanship

Keyhole APIs - Good for Microservices, But Not for Unit Testing

I've been thinking a lot lately about what I call keyhole APIs.

A keyhole API is the simplest API possible, that presents the smallest "surface area" to clients for its complete use. This means there's a single function exposed, which has the smallest number of primitive input parameters - ideally one - and a single, simple output.
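As a sketch - in Python, with invented names, so treat it as the idea rather than a definitive design - a keyhole API for something like the Mars Rover kata might expose a single function taking one instruction string and returning one string:

```python
# Hypothetical keyhole API: one public function, one primitive input,
# one simple output. The entire implementation hides behind it.
def execute(instructions: str) -> str:
    """Drive a rover from (0, 0) facing North; report 'x y heading'."""
    x, y, facing = 0, 0, "N"
    left = {"N": "W", "W": "S", "S": "E", "E": "N"}
    right = {v: k for k, v in left.items()}
    moves = {"N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0)}
    for c in instructions:
        if c == "L":
            facing = left[facing]
        elif c == "R":
            facing = right[facing]
        elif c == "M":
            dx, dy = moves[facing]
            x, y = x + dx, y + dy
    return f"{x} {y} {facing}"
```

A client calls, say, `execute("MMRM")` and gets back `"1 2 E"` - it knows nothing about how position and heading are represented internally.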

To illustrate, I had a crack at TDD-ing a solution to the Mars Rover kata, writing tests that only called a single method on a single public class to manipulate the rover and query the results.

You can read the code on my Github account.

This produces test code that's very loosely coupled to the rover implementation. I could have written test code that invokes multiple methods on multiple implementation classes. This would have made it easier to debug, for sure, because tests would pinpoint the source of errors more closely.

If we're writing microservices, keyhole APIs are - I believe - essential. We have to hide as much of the implementation as possible. Clients need to be as loosely coupled to the microservices they use as possible, including microservices that use other microservices.

I encourage developers to create these keyhole APIs for their components and services more and more these days. Even if they're not going to go down the microservice route, it's helpful to partition our code into components that could be turned into microservices easily, should the need arise.

Having said all that, I don't recommend unit testing entirely through such an API. I draw a distinction there: unit tests are an internal thing, a sort of grey-box testing. Especially important is the ability to isolate units under test from their external dependencies - e.g., by using mocks or stubs - and this requires the test code to know a little about those dependencies. I deliberately avoided that in my Mars Rover tests, and so ended up with a design where dependencies weren't easily swappable in this way.

So, in summary: keyhole APIs can be a good thing for our architectures, but keyhole developer tests... not so much.

July 27, 2018

Learn TDD with Codemanship

For Load-Bearing Code, Unleash The Power of Third-Generation Testing

As software "eats the world", and people rely more and more on the code we write, there's a strong case for making that code more reliable.

In popular products and services, code may get executed millions or even billions of times a day. In the face of such traffic, the much vaunted "5 nines" reliability (99.999%) just doesn't cut the mustard. Our current mainstream testing practices are arguably not up to the job where our load-bearing code's concerned.

And, yes, when I say "current mainstream practices", I'm including TDD in that. I may test-drive, say, a graph search algorithm in a dozen or so test cases, but put that code in a SatNav system and ship it in 1 million cars, and suddenly a dozen tests doesn't fill me with confidence.

Whenever I raise this issue, most developers push back. "None of our code is that critical", they argue. I would suggest that's true of most of their code. But even in pretty run-of-the-mill applications, there's usually a small percentage of code that really needs to not fail. For that code, we should consider going further with our tests.

The first generation of software testing involved running the program and seeing what happens when we enter certain inputs or click certain buttons. We found this to be time-consuming. It created severe bottlenecks in our dev processes. Code needs to be re-tested every time we change it, and manual testing just takes far too long.

So we learned to write code to test our code. The second generation of software testing automated test execution, and removed the bottlenecks. This, for the majority of teams, is the state of the art.

But there are always the test cases we didn't think of. Current practice today is to perform ongoing exploratory testing, to seek out the inputs, paths, user journeys and combinations our test suites miss. This is done manually by test professionals. When they find a failing test we didn't think of, we add it to our automated suite.

But, being manual, it's slow and expensive and doesn't achieve the kind of coverage needed to go beyond the Five 9's.

Which brings me to the Third Generation of Software Testing: writing code to generate the test cases themselves. By automating exploratory testing, teams are able to achieve mind-boggling levels of coverage relatively cheaply.

To illustrate, here's a parameterised unit test I wrote when test-driving an algorithm to calculate square roots:
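(The original listing was a Java/JUnit parameterised test. A comparable sketch in Python - standing in `math.sqrt` for the algorithm under test, so the names are illustrative - might look like this:)

```python
import math

def test_square_roots():
    # Five hand-picked cases, analogous to a parameterised JUnit test.
    cases = [(0.0, 0.0), (1.0, 1.0), (4.0, 2.0), (9.0, 3.0), (16.0, 4.0)]
    for x, expected in cases:
        assert abs(math.sqrt(x) - expected) < 1e-9
```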

Imagine this is going to be integrated into a flight control system. Those five tests don't give me a warm fuzzy feeling about stepping on any plane using this code.

Now, I feel I need to draw attention to this: unit test fixtures are just classes and unit tests are just methods. They can be reused. We can compose new fixtures and new tests out of them.

So I can write a new parameterised test that, for example, generates a large number of random inputs - all unique - using a library called JCheck (a Java port of the Haskell QuickCheck library).

Don't worry too much about how this works. The important thing to note is that JCheck generates 1,000 unique random inputs. So, with a few extra lines of code we're jumped from 5 test cases to 1,000 test cases.

And with a single extra character, we can leap up a further order of magnitude by simply adding a zero to the number of cases. Or two zeros for 100x more coverage. Or three, or four. Whatever we need. This illustrates the potential power of this kind of technique: we can cover massive state spaces with relatively little extra code.

(And, for those of you thinking "Yeah, but I bet it takes hours to run" - when I ran this for 1 million test cases, it took just over 10 seconds.)

The eagle-eyed among you will have noticed that I didn't reuse the exact same MathsTest fixture listed above. When test inputs are being generated, we don't have 1,000,000 expected results. We have to generalise our assertions. I adapted the original test into a property-based test, asserting a general property that every correct square root has to have.
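The original used JCheck in Java; the same property-based idea can be sketched in plain Python, with no library at all (names and tolerances are illustrative):

```python
import random

def test_square_root_property(cases=1000):
    # JCheck-style: generate random inputs and assert a general property
    # every correct square root must satisfy: root * root is (almost) the input.
    rng = random.Random(42)  # seeded so failures are reproducible
    for _ in range(cases):
        x = rng.uniform(0.0, 1_000_000.0)
        root = x ** 0.5  # stand-in for the algorithm under test
        assert abs(root * root - x) <= x * 1e-9
```

Cranking `cases` up to a million is a one-character change, just as in the JCheck version.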

Our property-based test can be reused in other ways. This test, for example, generates a range of inputs from 1 to 10 at increments of 0.01.

Again, adding coverage is cheap. Maybe we want to test from 1 to 10000 at increments of 0.001? Easy as peas.
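The range-sweeping variant can be sketched in Python too (again illustrative, standing in `x ** 0.5` for the algorithm under test):

```python
def test_square_roots_over_range():
    # Sweep inputs from 1 to 10 at increments of 0.01 - 901 cases -
    # asserting the same general square-root property for each.
    for i in range(100, 1001):
        x = i / 100.0
        root = x ** 0.5
        assert abs(root * root - x) <= 1e-9
```

Widening to 1-10000 at increments of 0.001 is just a change to the loop bounds.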

(Yes, these tests take quite a while to run - but that's down to the way JUnit handles parameterised tests, and could be optimised.)

Let's consider a different example. Imagine we have a design with a selection of UI's (Web, Android, iOS, Windows), a selection of local languages (English, French, Chinese, Spanish, Italian, German), and a selection of output formats (Excel, HTML, XML, JSON) and we want to test that every possible combination of UI, language and output works.

There are 96 possible combinations. We could write 96 tests. Or we could generate all the possible combinations with a relatively straightforward bit of code like the Combiner I knocked up in a few hours for larks.

If we added another language (e.g., Polish), we'd go from 96 combinations to 112. It's hopefully easy to see how much easier it could be to evolve the design when the test cases are generated in this way, without dropping below 100% coverage. And, yes, we could take things even further and use reflection to generate the input arrays, so our tests always keep pace with the design without having to change the test code at all. There are many, many possibilities for this kind of testing.
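A few lines of Python using the standard library's `itertools.product` show the gist of what a Combiner like that does (the UI, language and format lists are taken from the example above):

```python
from itertools import product

uis = ["Web", "Android", "iOS", "Windows"]
languages = ["English", "French", "Chinese", "Spanish", "Italian", "German"]
formats = ["Excel", "HTML", "XML", "JSON"]

# Every possible (UI, language, format) triple: 4 * 6 * 4 = 96.
combinations = list(product(uis, languages, formats))
print(len(combinations))

# Add Polish and the generated suite grows to 4 * 7 * 4 = 112 - no test
# code changes needed, and coverage of combinations stays at 100%.
languages.append("Polish")
print(len(list(product(uis, languages, formats))))
```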

To repeat, I'm not suggesting we'd do this for all our code - just for the code that really has to work.

Food for thought?

July 5, 2018

Learn TDD with Codemanship

The Grand Follies of Software Development

Just time for a few thoughts on software development's grand follies - things many teams chase that tend to make things worse.

Scale - on and on and on we go about scaling up or scaling out our software systems to handle millions of users and tens of thousands of requests every second. By optimising our architectures to work on Facebook scale, or Netflix scale, we potentially waste a lot of time and money and opportunities to get a product out there by doing something much simpler. The bottom line is that almost all software will never need to work on that scale, just like almost every person will never need a place to moor their $120 million yacht. If you're ever lucky enough to have that problem, good for you! Facebook and the others solved their scaling problems when they needed to, and they had the resources to do it because of their enormous scale.

Likewise the trend for scaling up software development itself. Organisations that set out to build large products - millions or tens of millions of lines of code - are going about it fundamentally arse-backwards. If you look at big software products today, they typically started out as small software products. Sure, MS Word today is over 10M LOC, but Word 1.0 was tens of thousands of lines of code. That original small team created something useful that became very popular, and it grew incrementally over time. Nature handles complexity very well, where design is concerned. It doesn't arrive at something like the human brain in a single step. Like Facebook and their scaling problems, Microsoft crossed that bridge when they got to it, by which time they had the money to crack it. And it takes a lot of money to create a new version of Word. There's no economy of scale, and at the scale they do it now, very little latitude for genuine innovation. Microsoft's big experiments these days are relatively small, like they always had to be. Focus on solving the problems you have now.

That can be underpinned by a belief that some software systems are irreducibly complex - that a Word processor would be unusable without the hundreds of features of MS Word. Big complex software, in reality, starts as small simple software and grows. Unless, of course, we set out to reproduce software that has become big and complex. Which is fine, if that's your business model. But you're going to need a tonne of cash, and there are no guarantees yours will fare better in the market. So it's one heck of a gamble. Typically, such efforts are funded by businesses (or governments) with enormous resources, and they usually fail spectacularly. Occasionally we hear about them, but a keenness to manage their brand means most get swept under the carpet - which might explain why organisations continue to attempt them.

Reuse - oh, this was a big deal in the 90s and early noughties. I came across project after project attempting to build reusable components and services that the rest of the organisation could stitch together to create working business solutions. Such efforts suffered from spectacular levels of speculative generality, trying to solve ALL THE PROBLEMS and satisfy such a wide range of use cases that the resulting complexity simply ran away from them. We eventually - well, some of us, anyway - learned that it's better to start by building something useful. Reuse happens organically and opportunistically. The best libraries and frameworks are discovered lurking in the duplication inside and across code bases.

"Waste" - certain fashionable management practices focus on reducing or eliminating waste from the software development process. Which is fine if we're talking about building every developer their own office complex, but potentially damaging f we're talking abut eliminating the "waste" of failed experiments. That can stifle innovation and lead - ironically - to the much greater waste of missed opportunities. Software's a gamble. You're gonna burn a lot of pancakes. Get used to it, and embrace throwing those burned pancakes away.

Predictability - alongside the management trend for "scaling up" the process of innovation comes the desire to eliminate the risks from it. This, too, is an oxymoron: innovation is inherently risky. The bigger the innovation, the greater the risk. But it's always been hard to get funding for risky ventures. Which is why we tend to find that the ideas that end up being greenlit by businesses are typically not very innovative. This is because we're still placing big bets at the crap table of software development, and losing is not an option. Instead of trying to reduce or eliminate risk, businesses should be reducing the size of their bets and placing more of them - a lot more. This is intimately tied to our mad desire to do everything at "enterprise scale". It's much easier to innovate with lots of small, independent teams trying lots of small-scale experiments and rapidly iterating their ideas. Iterating is the key to this process. So much of management theory in software development is about trying to get it right first time, even today. It's actually much easier and quicker and cheaper to get it progressively less wrong. And, yes, like natural evolution, there will be dead ends. The trick is to avoid falling to the Sunk Cost fallacy of having invested so much time and money in that dead end that you feel compelled to persist.

"Quick'n'dirty" - I shouldn't need to elaborate on this. It's one of the few facts we can rely on in software development. In the vast majority of cases, development teams would deliver sooner if they took more care. and yet, still, we fall for it. Start-ups especially have this mindset ("move fast and break things"). Noted that over time, the most successful tech start-ups tend to abandon this mentality. And, yes, I am suggesting that this way of thinking is a sign of a dev organisation's immaturity. There. I've said it.

July 2, 2018

Learn TDD with Codemanship

Level 4 Agile Maturity

I recently bought new carpets for my home, and the process of getting a quote was very interesting. First, I booked an appointment online for someone to come round and measure up. This appointment took about an hour, and much of that time was spent entering measurements into a software application that created a 2D model of the rooms.

Then I visited a local-ish store - this was a big national chain - and discussed choices and options and prices. This took about an hour and a half, most of which was spent with the sales adviser reading the measurements off a print-out of the original data set and typing them into a sales application to generate a quote.

There were only 3 sales people on the shop floor, and it struck me that all this time spent re-entering data that someone had already entered into a software application was time not spent serving customers. How many sales, I wondered, might be lost because there were no sales people free to serve? We discussed this, and the sales advisor agreed that this system very probably cost sales: and lots of them. (Only the previous week I had visited the local, local shop for this chain, and walked out because nobody was free to serve me.)

With more time and research, we might have been able to put a rough figure on potential sales lost during this data re-entering activity for the entire chain (400 stores).

As a software developer, this problem struck me immediately. It had never really occurred to the sales advisor before, he told me. We probably all have stories like this. I can think of many times during my 25-year career where I've noticed a problem that a piece of software might be able to solve. We tend to have that problem-solving mindset. We just can't help ourselves.

And this all reminded me of a revelation I had maybe 16 years ago, working on a dev team who had temporarily lost its project manager and requirements analyst, and had nobody telling us what to build. So we went to the business and asked "How can we help?"

It turned out there was a major, major problem that was IT-related, and we learned that the IT department had steadfastly ignored their pleas to try and solve it for years. So we said "Okay, we'll have a crack at it."

We had many meetings with key business stakeholders, which led to us identifying roughly what the problem was and creating a Balanced Scorecard of business goals that we'd work directly towards.

We shadowed end users who worked in the processes that we needed to improve to see what they did and think about how IT could make it easier. Then we iteratively and incrementally reworked existing IT systems specifically to achieve those improvements.

For several months, it worked like a dream. Our business customers were very happy with the progress we were making. They'd never had a relationship with an IT team like this before. It was a revelation to them and to us.

But IT management did not like it. Not one bit. We weren't following a plan. They wanted to bring us back to heel, to get project management in place to tell us what to do, and to get back to the original plan of REPLACING ALL THE THINGS.

But for 4 shiny happy months I experienced a different kind of software development. Like Malcolm McDowell in Star Trek Generations, I experienced the bliss of the Nexus and would now do pretty much anything to get back there.

So, ever since, I've encouraged dev teams to take charge of their destinies in this way. To me, it's a higher level of requirements maturity. We progress from:

1. Executing a plan, to
2. Building a product, to
3. Solving real problems people bring to us, to
4. Going out there and pro-actively seeking problems we could solve

We evolve from being told "do this" to being told "build this" to being told "solve this" to eventually not being told at all. We start as passive executors of plans and builders of features to being active engaged stakeholders in the business, instigating the work we do in response to business needs and opportunities that we find or create.

For me, this is the partnership that so many dev teams aspire to, but can never reach because management won't let them. Just like, ultimately, they wouldn't let us in that particular situation.

But I remain convinced it's the next step in the evolution of software development: one up from Agile. It is inevitable*.

*...that we will pretend to do it for certifications while the project office continues to be the monkey on our backs

June 29, 2018

Learn TDD with Codemanship

.NET Code Analysis using NDepend

It's been a while since I used it in anger, but I've been having fun this week reacquainting myself with NDepend, the .NET code analysis tool.

Those of us who are interested in automating code reviews for Continuous Inspection have a range of options for .NET - ranging from tools built on the .NET Cecil decompiler - e.g., FxCop - to compiler-integrated tools on the Roslyn platform.

Out of all of them, I find NDepend to be by far the most mature. Its code model is much more expressive and intuitive (oh, the hours I've spent trying to map IL op codes on to source code!), and it integrates out of the box with a range of popular build and reporting tools like VSTS, TeamCity, Excel and SonarQube. And in general, I find I'm up and running with a suite of usable quality gates much, much faster.

Under the covers, I believe we're still in Cecil/IL territory, but all the hard work's been done for us.

Creating analysis projects in NDepend is pretty straightforward. You can either select a set of .NET assemblies to be analysed, or a Visual Studio project or solution. It's very backwards-compatible, working with solutions as far back as VS 2005 (which, for training purposes, I still use occasionally).

I let it have a crack at the files for the Codemanship refactoring workshop, which are deliberately riddled with tasty code smells. My goal was to see how easy it would be to use NDepend to automatically detect the smells.

It found all the solution's assemblies, and crunched through them - building a code model and generating a report - in about a minute. When it's done, it opens a dashboard view which summarises the results of the analysis.

There's a lot going on in NDepend's UI, and this would be a very long blog post if I explored it all. But my goal was to use NDepend to detect the code smells in these projects, so I've focused on the features I'd use to do that.

First of all, out of the box with the code rules that come with NDepend, it has not detected any of the smells I'm interested in. This is typical of any code analysis tool: the rules are not your rules. They're someone else's interpretation of code quality. FxCop's developers, for example, evidently have a way higher tolerance for complexity than I do.

The value in these tools is not in what they do out of the box, but in what you can make them do with a bit of thought. And for .NET, NDepend excels at this.

In the dialog at the bottom of the NDepend window, we can explore the code rules that it comes with and see how they've been implemented using NDepend's code model and some LINQ.

I'm interested in methods with too many parameters, so I clicked on that rule to bring up its implementation.

I happen to think that 5 parameters is too many, so could easily change the threshold where this rule is triggered in the LINQ. When I did, the results list immediately updated, showing the methods in my solution that have too many parameters.
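(Rules like this are written in CQLinq, a LINQ dialect over NDepend's code model. From memory, the tweaked rule looks roughly like the following - treat this as an approximate sketch rather than the exact rule text:)

```csharp
// CQLinq sketch (approximate syntax): flag methods with more than 5 parameters.
warnif count > 0
from m in JustMyCode.Methods
where m.NbParameters > 5
select new { m, m.NbParameters }
```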

This matches my expectation, and the instant feedback is very useful when creating custom quality gates - really speeds up the learning process.

To view the offending code, I just had to double click on that method in the results list, and NDepend opened it in Visual Studio. (You can use NDepend from within Visual Studio, too, if you want a more seamless experience.)

The interactive and integrated nature of NDepend makes it a useful tool to have in code reviews. I've always found going through the code inspecting source files by eye looking for issues hard work and really rather time-consuming. Being able to search for them interactively like this can help a lot.

Of course, we don't just want to look for code smells in code reviews - that's closing the stable door after the horse has bolted a lot of the time. It's quite fashionable now for dev teams to include code reviews as part of their check-in process - the dreaded Pull Request. It makes sense, as a last line of defence, to try to prevent issues being checked into the code repository. What I'm seeing more and more, though, is that pull requests can become a bottleneck for the team. Like any manual testing, it slows us down and hampers Continuous Delivery.

The command-line version of NDepend can easily be integrated into your build pipeline, allowing for some pretty comprehensive code reviews that can be performed automatically (and therefore quickly, alleviating the bottleneck).

I decided to turn this code rule into a quality gate that could be used in a build, and set a policy that it should fail the build if more than 5 examples of long parameter lists are found.
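Expressed as an NDepend quality gate, that policy might look something like this - a sketch, so the exact gate syntax may differ slightly from what you see in the tool:

```
// <QualityGate Name="Long Parameter Lists" Unit="methods" />
failif count > 5 methods
from m in JustMyCode.Methods
where m.NbParameters > 4   // 5 or more parameters
select m
```

The `failif` clause is what turns a warning into a build-breaking condition when the gate runs in the pipeline.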

So, up and running with a simple quality gate in no time. But what about more complex code smells, like message chains and feature envy? In the next blog post I'll go deeper into NDepend's Code Query Language and explore the kinds of queries we can create with more thought.

June 27, 2018

Learn TDD with Codemanship

Team Craft

We're a funny old lot, software developers.

90% of us are working on legacy code 90% of the time, and yet I can only think of one book about working with legacy code that's been published in the last 20 years.

We spend between 50% and 80% of our time reading code, and yet I can only think of a couple of books ever published about writing code that's easier to understand.

We have a problem with our priorities, it would seem. And maybe none more so than in the tiny amount of focus we place on how we work together as teams to get shit done.

Our ability to work together, to communicate, to coordinate, to build shared understanding and reach shared decisions and to make stuff happen - I call it Team Craft - rarely gets an airing in books, training courses and conferences.

In my TDD workshop, we play a little game called Evil FizzBuzz. If you've applied for a developer job in recent years, you may well have been asked to do the FizzBuzz coding exercise. It's a trivial problem - output a list of integers from 1 to 100, replace any that are divisible by 3 with "Fizz", any that are divisible by 5 with "Buzz", and any that are divisible by 3 and 5 with "FizzBuzz". Simple as peas.
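For reference, the whole problem fits in a few lines of code. A minimal sketch in Python:

```python
def fizzbuzz(n):
    # Check 15 first: numbers divisible by both 3 and 5.
    if n % 15 == 0:
        return "FizzBuzz"
    if n % 3 == 0:
        return "Fizz"
    if n % 5 == 0:
        return "Buzz"
    return str(n)

# The complete comma-delimited sequence from 1 to 100.
sequence = ", ".join(fizzbuzz(i) for i in range(1, 101))
print(sequence)
```

That triviality is the point: the coding is the easy part of what comes next.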

I made it "evil" by splitting the rules up and requiring that individual pairs only work on code for their rule. (e.g., they can only work on generating a sequence from 1..100, or only on replacing numbers with Fizz, or Buzz etc).

They must coordinate their efforts to produce a single unified solution that passes my customer acceptance test - a complete comma-delimited sequence of the required length, with the numbers, the Fizzes, the Buzzes and FizzBuzzes in the right place. This is an exercise - superficially - in Continuous Integration. But, it turns out, it exercises far more than that.

An average developer can complete FizzBuzz in less than 30 minutes. An average team can't complete it in under an hour. No, seriously. 9 out of 10 teams who attempt it don't complete it. Go figure!

Watching teams attempt Evil FizzBuzz is fascinating. The first observation I've made - from dozens of teams who've tried it - is that the individual technical skills of the developers on the team appear to have little bearing on how they'll fare.

FizzBuzz is easy. It doesn't require strong Code Fu. And yet, somehow, it defeats 90% of teams. There must be something else at play here; some other skillset outside of coding and unit testing and refactoring and Git and wotnot that determines how a team will perform.

Over the years since it was introduced, I've developed an instinct for which teams will crack it. I can usually tell within the first 10 minutes if they're going to complete Evil FizzBuzz within the hour, just by looking at the way they interact.

Here are the most typical kinds of rocks I've seen teams' ships dashed on trying to complete Evil FizzBuzz.

1. Indecision - 45 minutes in and the team is still debating options. Should we do it in Java or JavaScript? Jenkins or TeamCity? Which unit testing framework? Making affirmative decisions as a group is a hard skill. But it can be learned. There are various models for group decision making - from a show of hands to time-boxed A/B experiments to flipping a coin. I maintain that the essence of agility is that ability to make effective decisions quickly and cheaply and move on.

2. Priorities - the team spends 30 minutes discussing the design, and then someone starts to think about setting up the GitHub repository and a CI server.

3. Forgetting They're In a Team - I see this one a lot. For example, someone sets up a repository, then forgets to invite the rest of the team to contribute to it. Or - and this is my favourite - someone writes their code in a totally different set of project files, only realising too late that their bit isn't included in the end product. To coordinate efforts in such a small solution space, developers need to be hyper-aware of what the rest of the team are doing.

4. Trying To Win The Argument Instead Of The Game - as with 1-3, this is also very common on development teams. We get bogged down in trying to "win" the debate about what language we should use or whether we should use the Chain of Responsibility design pattern or go for tabs or spaces, and completely lose sight of what we're setting out to achieve in the first place. The more technically strong the individuals on the team, the more pronounced this effect seems to be. Teams of very senior developers or software architects tend to crash and burn more frequently than teams of average developers. We've kind of made this rod for our own backs, as a profession. Career advancement tends to rely more on winning arguments than achieving business goals. Sadly, life's like that. Just look at the people who end up in boardrooms or in government: prepared for leadership in the debating societies of our top schools and colleges. Organisations where that isn't part of the culture tend to do much better at Evil FizzBuzz.

5. All Talk, No Code, No Pictures - the more successful teams get around a whiteboard and visualise what they're going to do. They build a better shared understanding, sooner. The teams who stand around in a circle talking about it invariably end up with every pair walking away with a different understanding, leading to the inevitable car crash at the end. It's especially important for each pair to understand how their part fits in with the whole. The teams that do best tend to agree quickly on how the parts will interact. I've known this for years: the key to scaling up development is figuring out the contracts early. Use of stubs and mocks can help turn this into an explicit executable understanding. Also, plugging their laptops into the projector and demonstrating what they intend is always an option - but one that few teams take up. To date, no team has figured out that Mob Programming is allowed by the rules of the exercise, but a couple of teams came close in their use of the available technology in the room.

6. Focus On Plans, Not Goals - It all seems to be on track; with 5 minutes to go the team are merging their respective parts, only to discover at the very last minute that they haven't solved the problem I set them. Because they weren't setting out to. They came up with a plan, and focused on executing that plan. The teams that crack it tend to revisit the goals continually throughout the exercise. Does this work? Does this work? Does this work? Equally, teams who get 30 minutes in and don't realise they've used 50% of their time show a lack of focus on getting the job done. I announce the time throughout, to try and make them aware. But I suspect often - when they've got their heads down coding and are buried in the plan - they don't hear me. The teams who set themselves milestones - e.g. by 20 minutes we should have a GitHub repository with everyone contributing and a CI server showing a green build so we can start pushing - tend to do especially well.
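The point in (5) about using stubs and mocks to make the agreed contracts executable can be sketched like this - the interface and names here are hypothetical, just one way the pairs might agree to plug their rules together:

```python
from unittest.mock import Mock

# Hypothetical contract agreed up front by the pairs: each rule
# exposes apply(number, text) and returns the (possibly replaced) text.
def render(numbers, rules):
    items = []
    for n in numbers:
        text = str(n)
        for rule in rules:
            text = rule.apply(n, text)
        items.append(text)
    return ", ".join(items)

# The pair building the renderer can test their part against a stub
# of the Fizz pair's rule before the real implementation exists.
fizz_rule = Mock()
fizz_rule.apply.side_effect = lambda n, text: "Fizz" if n % 3 == 0 else text

print(render(range(1, 7), [fizz_rule]))  # prints: 1, 2, Fizz, 4, 5, Fizz
```

With the contract pinned down like this, each pair can work in parallel and still be confident their piece will snap into the whole.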

From long experience on real teams, I've observed relationships between these elements of Team Craft. Teams that lack clear objectives tend to consume themselves with internal debate and "pissing contests". It also tends to make prioritising nigh-on impossible. Tabs vs spaces matters a lot more when you think you have infinite time to debate it. Lack of visualisation of what we're going to do - or attempt to do - tends to lead to less awareness of the team, and less effective coordination. And all of these factors combined tend to lead to an inability to make shared decisions when they're needed.

But before you conclude from this that the individual technical skills don't matter, I need to tell you about the final rule of Evil FizzBuzz: once the build goes green for the first time, it must not go red again. Breaking the build means disqualification. (Hey, it's an exercise in Continuous Integration...)

A few teams get dashed on those rocks, and the lesson from that is that technical discipline does matter. How we work together as teams is crucial, but potentially all for nought if we don't take good care of the fundamentals.

June 21, 2018

Learn TDD with Codemanship

Adopting TDD - The Codemanship Roadmap

I've been doing Test-Driven Development for 20 years, and helping dev teams to do it for almost as long. Over that time I've seen thousands of developers and hundreds of teams try to adopt this crucial enabling practice. So I've built a pretty clear picture of what works and what doesn't when you're adopting TDD.

TDD has a steep learning curve. It fundamentally changes the way you approach code, putting the "what" before the "how" and making us work backwards from the question. The most experienced developers, with years of test-after development behind them, find it especially difficult to rewire those deeply ingrained habits. It's like learning to write with your other hand.

I've seen teams charge at the edifice of this learning curve, trying to test-drive everything from Day #1. That rarely works. Productivity nosedives, and TDD gets jettisoned at the next urgent deadline.

The way to climb this mountain is to ascend via a much shallower route, with a more gentle and realistic gradient. You will most probably not be test-driving all your code in the first week. Or the first month. Typically, I find it takes 4-6 months for teams to get the hang of TDD, with regular practice.

So, I have a recommended Codemanship Route To TDD which has worked for many individuals and teams over the last decade.

Week #1: For teams, an orientation in TDD is a really good idea. It kickstarts the process, and gets everyone talking about TDD in practical detail. My 3-day TDD workshop is designed specifically with this in mind. It shortcuts a lot of conversations, clears up a bunch of misconceptions, and puts a rocket under the team's ambitions to succeed with TDD.

Week #2-#6: Find a couple of hours a week, or 20 minutes a day, to do simple TDD "katas", and focus on the basic Red-Green-Refactor cycle, doing as many micro-iterations as you can to reinforce the habits.
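A single Red-Green-Refactor micro-iteration from a kata might look like this - the leap year kata here is just an illustrative choice:

```python
import unittest

# Red: write one failing test that asks a single question.
class LeapYearTest(unittest.TestCase):
    def test_year_divisible_by_4_is_a_leap_year(self):
        self.assertTrue(is_leap(2024))

    def test_year_not_divisible_by_4_is_not_a_leap_year(self):
        self.assertFalse(is_leap(2023))

# Green: write the simplest code that passes the test.
def is_leap(year):
    return year % 4 == 0

# Refactor: with the tests green, tidy names and remove duplication,
# re-running the tests after each small change. Then pick the next
# question (e.g. century years) and go around the cycle again.
```

The discipline being drilled is the rhythm itself, not the problem: many tiny passes around the loop, never more than one failing test at a time.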

Week #7-#11: Progress onto TDD-ing real code for 1 day a week. This could be production code you're working on, or a side project. The goal for that day is to focus on doing it right. The other 4 days of the week, you can focus on getting stuff done. So, overall, your productivity may only dip a little each week. As you gain confidence, widen this "doing it right" time.

Week #12-#16: By this time, you should find TDD more comfortable, and you no longer struggle to remember what you're supposed to do and when. Your mind is freed up to focus on solving the problem, and TDD is becoming your default way of working. You'll be no less productive TDD-ing than you were before (maybe even more productive), and the code you produce will be more reliable and easier to change.

The Team Dojo: Some teams are keen to put their new TDD skills to the test. An exercise I've seen work well for this is my Team Dojo. It's a sufficiently challenging problem, and really works on those individual skills as well as collaborative skills. Afterwards, you can have a retrospective on how the team did, examining their progress (customer tests passed), code quality and the discipline that was applied to it. Even in the most experienced teams, the dojo will reveal gaps that need addressing.

Graduation: TDD is hard. Learning to test-drive code involves all sorts of dev skills, and teams that succeed tell me they feel a real sense of achievement. It can be good to celebrate that achievement. Whether it's a party, or a little ceremony or presentation, when organisations celebrate the achievement with their dev teams, it shows real commitment to them and to their craft.

Of course, you don't have to do it my way. What's important is that you start slow and burn your pancakes away from the spotlight of real projects with real deadlines. Give yourself the space and the safety to get it wrong, and over time you'll get it less and less wrong.

If you want to talk about adopting TDD on your team, drop me a line.