April 25, 2017
20 Dev Metrics - 8. Test AssuranceNumber 8 in my Twitter series 20 Dev Metrics is a very important one, when it comes to minimising the cost of changing code - Test Assurance.
If there's one established fact about software development, it's that the sooner we catch bugs the cheaper they are to fix. When we're changing code, the sooner we find out if we broke it, the better. This can have such a profound effect on the cost of changing software that Michael Feathers, in his book 'Working Effectively with Legacy Code' defines 'legacy code' as code for which we have no automated tests.
The previous metric, Changes per Test Run give us a feel for how frequently we're running our tests - an often-forgotten factor in test assurance - but it can't tell us how effective our tests would be at catching bugs.
We want an answer to the question: "If this code was broken, would we know?" Would one or more tests fail, alerting us to the problem. The probability that our tests will catch bugs is known as "test assurance".
Opinions differ on how best to measure test assurance. Probably because it's easier to measure, a lot of teams track code coverage of tests.
For sure, code that isn't covered by tests isn't being tested at all. If only half your code's covered, then the probability of bugs being caught in the half that isn't is definitely zero.
But just because code is executed in a test, that doesn't necessarily mean it's being tested. Arguably a more meaningful measure of assurance can be calculated by doing mutation testing - deliberately introducing errors into lines of code and then seeing if any tests fail.
Mutation testing's a bit like committing burglaries to measure how effective the local police are at detecting crimes. Be careful with automated mutation testing tools, though; they can throw up false positives (e.g., when a convergent iterative loop just takes a bit longer to fund the right answer after changing its seed value). But most of these tools are configurable, so you can learn what mutations work and what mutations tend to throw up false positives and adapt your configuration.
April 24, 2017
20 Dev Metrics - 7. Changes per Test RunDay seven in my Twitter series 20 Dev Metrics, and we turn our attention to frequency of testing. The economics of software development are very clear that the sooner we catch problems, the cheaper they're fixed. Therefore, the sooner after we make a change to the software, the sooner we should test it.
A good metric would be "Mean Time to Re-testing", but this is a difficult one to collect (you'd essentially need to write custom plug-ins for your IDE and test runner and be constantly monitoring).
A decent compromise is Changes per Test Run, which gives us a rough idea of how much testing we do in a given period vs. how much code changes. A team attempting to do Waterfall delivery would have a very high ratio because they leave testing until very late and do it manually maybe just several times. A team that runs automated tests overnight would have a lower ratio. And a team doing TDD would have the lowest ratio.
To calculate this metric, we need to take the measure for code churn we used in a previous dev metric (tools are available for Git, Subversion, CVS etc to collect this), and tally each time we execute regression tests - be it manual testing or automated. If it's automated, and using an open source testing tool, we can usually adapt the test runner to keep score. A customer JUnit runner, for example, could ping a metrics data server on the network asynchronously, which just keeps a running total for that day/week/month.
That passes no comment at all, of course, on how effective those tests are at catching bugs, which we'll leave for another day.
April 23, 2017
The Win-Win-Win of Clean CodeA conversation I had with a development team last week has inspired me to write a short post about the Win-Win-Win that Clean Code can offer us.
Code that is easier to understand, made of simpler parts, low in duplication and supported by good, fast-running automated tests tends to be easier to change and cheaper to evolve.
Code that is easier to understand, made of simpler parts, low in duplication and supported by good, fast-running automated tests also tends to be more reliable.
And code that is easier to understand, made of simpler parts, low in duplication and supported by good, fast-running automated tests - it turns out - tends to require less effort to get working.
It's a tried-and-tested marketing tagline for many products in software development - better, faster, cheaper. But in the case of Clean Code, it's actually true.
It's politically expedient to talk in terms of "trade-offs" when discussing code quality. But, in reality, show me the team who made their code too good. With very few niche exceptions - e.g., safety-critical code - teams discover that when they take more care over code quality, they don't pay a penalty for it in terms of productivity.
Unless, of course, they're new to the practices and techniques that improve code quality, like unit testing, TDD, refactoring, and all that lovely stuff. Then they have to sacrifice some productivity to the learning curve.
Good. I'm glad we had this little chat.
20 Dev Metrics - 6. Complexity (Likelihood of Failure)The sixth in my Twitter series 20 Dev Metrics has proven, over the decades, to be a reasonably good predictor of which code is most likely to have bugs - Complexity.
Code, like all machines, has a greater probability of going wrong when there are more things in it that could be wrong - more ways of being wrong. It really is as straightforward as that.
There are different ways of measuring code complexity, and they all have their merits. Size is an obvious one. Two lines of code tends to have twice as many ways of being wrong. It's not a linear relationship, though. Four lines of code isn't twice as likely as two lines of code to be buggy. It could be twice as likely again, depending on the extent to which lines of code interact with each other. To illustrate with a metaphor; four people are much more than twice as likely as two people to have a disagreement. The likelihood of failure grows exponentially with code size.
Another popular measure is cyclomatic complexity. This tells us how many unique paths through a body of code exist, and therefore how many tests it might take to cover every path.
Less popular, but still useful, is Halstead Volume, which was used back in the CMMi days as a predictor of the maintenance cost of code. It's a bit more sophisticated than measuring lines of code, calculating the size of 'vocabulary' that the code uses, so a line of code that employs more variables - and therefore more ways of being wrong - would have a greater Halstead Volume.
All of these metrics are easily calculated, and many tools are available to collect them. They make most sense at the method level, since that's the smallest unit of potential test automation. But complexity aggregated at the class level can indicate classes that do too much and need splitting up.
April 22, 2017
20 Dev Metrics - 5. Impact of FailureThe fifth in my Twitter series 20 Dev Metrics builds on a previous metric, Feature Usage, to estimate the potential impact on end users when a piece of code fails.
Impact of Failure can be calculated by determining the critical path through the code (the call stack, basically) in a usage scenario and then assigning the value of Feature Usage to each method (or block of code, if you want to get down to that level of detail).
We do this for each usage scenario or use case, and add Feature Usage to methods every time they're in the critical path for the current scenario.
So, if a method is invoked in many high-traffic scenarios, our score for Impact of Failure will be very high.
What you'll get at the end is a kind of "heat map" of your code, showing which parts of your code could do the most damage if they broke. This can help you to target testing and other quality assurance activities at the highest-value code more effectively.
This is, of course, a non-trivial metric to collect. You'll need a way to record what methods get invoked when, say, you run your customer tests. And each customer test will need an associated value for Feature Usage. Ideally, this would be a feature of customer testing tools. But for now, you'll have some DIY tooling to do, using whatever instrumentation and/or meta-programming facilities your language has available. You could also use a code coverage reporting tool, generating reports one customer test at a time to see which code was executed in that scenario.
In the next metric, we'll look at another factor in code risk that we can use to help us really pinpoint QA efforts.
April 21, 2017
20 Dev Metrics - 4. Feature UsageThe next few metrics in my 20 Dev Metrics Twitter series are going to help us understanding which parts of our code present the greatest risk for failure.
One metric that I'm always amazed to learn teams aren't collecting is Feature Usage. For a variety of reasons, it's useful to know which features of your software are used all the time and which features are used rarely (if ever).
When it comes to the quality of our code, feature usage is important because it guides our efforts towards things that matter more. It's one component of risk of failure that we need to know in order to effectively gauge how reliable any line of code (or method, or module, or component/service) needs to be. (We'll look at other components soon.)
Keeping logs of which features are being used is relatively straightforward; for a web application, your web server's logs might reveal that information (if the action being performed is revealed in the URL somehow). If you can't get them for free like that, though, adding usage logging needs some thought.
What you definitely don't want to do is add logging code willy-nilly all across your code. That way lies madness. Look for ways to encapsulate usage logging in a single part of the code. For example, for a desktop application, using the Command pattern offers an opportunity to start with a Command base class or template class that logs an action, then invokes the helper method that does the actual work. Logs can be stored in memory during the user's session and occasionally sent as a small batch to a shared data store if performance is a problem.
For a web application or web service, the Front Controller pattern can be used to implement logging before any work is done. (e.g., for an ASP.NET application, a custom logging HttpModule.) Whatever the architecture, there's usually a way.
And feature usage is good to know for other reasons, of course. Marketing and product management would be interested, for a start. We're all guessing at what will have value. As soon as we deliver working software that people are using, it's time to test our theories.
April 20, 2017
Still Time to Grab Your TDD 2.0 TicketsJust a quick reminder about my upcoming Codemanship TDD training workshop in London on May 10-13. It's quite possibly the most hands-on TDD training out there, and great value at half the price of competing TDD courses.
20 Dev Metrics - 3. Dev Effort BreakdownThe third in my Twitter series 20 Dev Metrics is Development Effort Breakdown.
I've come across many, many organisations who have software and systems that, with each release, require more and more time fixing bugs instead of adding or changing features. Things can get so bad that some products end up with almost all the available time being spent on fixes, and users having to wait months or years for requested changes - if they ever come at all.
This can get very expensive, too. A company who made a call centre management product found themselves needing multiple teams maintaining multiple versions of the product that were in use, all devoting most of their time just to bug fixes.
I'm suggesting this metric instead of the classic "defects per thousand lines of code", because what matters most about bugs - after the trouble they cause end users - is how they soak up developer time, leaving less time to add value to the product.
To track this metric, teams need to record roughly how much time they've spent completing work (with the usual caveat about definitions of "done"). If you're using story cards, a simple system is for the developers to take a moment to record how many person-days/hours were spent on it on the card itself. Your project admin can then tot up the numbers in a spreadsheet.
Remember to include bugs reported in testing as well as production. If someone reported it, and someone had to fix it, then it counts.
April 19, 2017
20 Dev Metrics - 2. Cost of Changing CodeThe second in my Twitter series 20 Dev Metrics is the Cost of Changing Code. This is a brute force metric - anything involving measuring lines of code tends to be - so handle with care. The temptation will be for teams to game this, and it's very easily gamed. As with all metrics, cost of changing code should be used as a diagnostic tool, not as weaponised as a target or as a stick to beat dev teams with.
NB: My advice is to collect these metrics under the radar, and use them to provide hindsight when establishing a link between the way you write code and the results the business gets.
The aim is to establish the trend for cost of change, so we can link it to our first metric, Lead Time to Delivery. There are two things we need to track in order to calculate this simple metric.
1. How much code is changing (lines added, modified and deleted) - often called "code churn". There are tools available for various VCS solutions that can do this for you.
2. How much that change cost (in developer time or wages). Your project admin should have this information available in some form. If there's one thing teams are measuring, it's usually cost.
Divide developer cost by code churn, and - hey presto - you have the cost of changing a line of code.
All teams experience an increase in the cost of changing code, but better teams tend to experience a flatter gradient. The worst teams have a "hockey stick" trend, where change becomes prohibitively expensive alarmingly soon. e.g., one software company, after 8 years, had a cost of change 40x higher than at the start. At the same time, they tend to see lead times growing exponentially, as delivery gets slower and slower.
I dare your boss not to care about these first two metrics!
April 18, 2017
20 Dev Metrics - 1. Lead Time to DeliveryOn the Codemanship Twitter account, I've started posting a series of daily 'memes' called 20 Metrics for Dev Teams.
These are based on the health check some clients ask me to do to establish where teams are when I first engage with them as a coach. My focus, as a business, is on sustaining the pace of innovation, so these metrics are designed to build a picture of exactly that.
The first metric is, in my experience, the most important in that regard; how long does it take after a customer asks the team for a change to the software before it's delivered in a usable state? This is often referred to as the "lead time" to software delivery.
Software development's a learning process, and in any learning process, the time lag before we get useful feedback has a big impact on how fast we learn. While others may focus on the value delivered in each drop, the reality is that "value" is very subjective, and impossible to predict i advance. When the customer says "This feature has 10x the value of that feature to my business", it's just an educated guess. We still have to put working software in front of real end users to find out what really has value. Hence, the sooner we can get that feedback, the more we can iterate towards higher value.
This is why Lead Time to Delivery is my number one metric for dev teams.
To measure it, you need to know two things:
1. When did the customer request the change?
2. When did the customer have working software available to use (i.e., tested and deployed) that incorporated the completed change?
Many teams have repositories of completed user stories (e.g., in Jira - yes, I know!) from which this data can be mined historically. Bu the picture can be confusing, as too many teams have woolly definitions of "done". My advice is to start recording these dates explicitly, and to firm up your definition of "done" to mean specifically "successfully in production and in use".
As with weather and climate, what we should be most interested in is the trend of lead time as development progresses.
Sustaining innovation means keep lead times short. The majority of dev teams struggle to achieve this, and as the software grows and the unpaid technical debt piles up, lead times tend to grow. To some extent this is inevitable - entropy gets us all in the end - but our goal is to keep the trend as flat as we can so we can keep delivering and our customer can keep learning their way to success.