Friday, March 28, 2008

What kind of misleading metrics have you observed?

I asked this question on LinkedIn and got a lot of interesting responses.

Here is the link; I will summarize the responses below:

http://www.linkedin.com/answers/technology/software-development/TCH_SFT/198122-155544

1) 100,000 test cases executed, 2,500 failed. When you review the 100,000 test cases you realize that there are only about 40 good test cases; the rest are roughly 2,500 copies of each of those same cases with minor variations.

2) 100 bugs were raised in this release. When you review the 100 bugs you find that 20 are pilot errors and are INVALID, 30 are duplicates, 15 are nitpicks that development decides are WONTFIX, and only 35 are real bugs, of which 20 that were originally marked as show-stoppers got downgraded to trivial and pushed out to a far, far-off release.

3) Lines of code.

4) A more subtle answer is all the measurements that confuse effort with results. Managers who don't understand the work tend to reward the appearance of work. Myself, I like to see a programmer or tester who is well organized and plans well, so s/he can complete a full day's work in eight hours and go home and have a life. But too many managers reward those people who come in early and stay late, while appearing to be busy all day (at least when the manager is watching). Programmers and testers who are not well organized themselves cannot do the kind of precision work that's required in software.

5) As Samuel Clemens once opined... "There are lies, damn lies and then there are statistics!" It reminds me of the prank we pulled in high school. We circulated a petition at school asking for the total ban of Oxydihydride! It was THE most dangerous element known to man and was responsible for killing millions of people (more than the plague) as well as causing untold property damage. We stopped after we got a couple of hundred signatures. Of course you know it better as water. People will always spin data to prove their theories. Some do it maliciously with an intent to defraud and some are naive about the whole process.

6) There are 100 bugs and we are fixing them at a rate of 25 a week, so we'll be ready to release in 4 weeks. This ignores the incoming rate, which could also be 25 a week.
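A back-of-the-envelope sketch (using only the hypothetical numbers from that answer) makes the gap between the naive and the realistic estimate obvious:

```python
# Hypothetical figures from answer 6, not real project data.
open_bugs = 100      # bugs currently open
fix_rate = 25        # bugs fixed per week
incoming_rate = 25   # new bugs reported per week (the part the naive estimate ignores)

# Naive projection: pretend no new bugs arrive.
naive_weeks = open_bugs / fix_rate
print(f"Naive estimate: ready in {naive_weeks:.0f} weeks")

# Projection that accounts for bugs arriving while you fix the old ones.
net_burn_rate = fix_rate - incoming_rate
if net_burn_rate <= 0:
    print("At these rates the open bug count never reaches zero.")
else:
    print(f"Realistic estimate: ready in {open_bugs / net_burn_rate:.0f} weeks")
```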

7) Metrics are most useful when used close to their source, to adjust process. Up the chain, they become goals rather than measurements, and people will then naturally game the system -- thus the cliche "paid by lines of code". For example, if you are measuring closure time on bugs, you will have many little, focused bugs, with quick, possibly less reliable fixes. If you are measuring bug counts, you will have fewer bugs, but longer closure times because each bug is overloaded, or added to, to lower the count. Neither improves the process.

8) The ones that have been controversial, and to me appear to have less value, are some time-based ones like "time to fix" and "time to regress". These can be part of tracking and motivation, but they get muddled fast when other priorities are mixed in.

9) I like the standard ones used by call center managers.

Year 1 - reduced time per client
Year 2 - reduced total number of calls to the call center
Year 3 - reduced time per client

What was actually happening in year 2 was that they were understaffed, so they had long hold times, and customers with easy questions either gave up or figured things out themselves while waiting. In years 1 and 3, the staff are handling the easier issues, which naturally resolve more quickly. If you ever compare annual reviews from an IT department you will see these trends. I don't even think they are being deceitful, because the turnover is too high. The person there in years 1 and 2 is replaced by year 3 by someone who thinks he can do it better, when in fact he is just striving to maximize whichever metrics upper management picks in a given year.

10) Metrics can be useful but also quite dangerous. I have seen examples where performance bonuses are attached to metrics, and teams quickly shift to a myopic focus on attaining the bonus without really improving the product or process the metric and bonus were originally intended to encourage.

I think that when the focus shifts away from people, their face-to-face interactions, and frequent reviews of the product, and toward metrics and dashboards, the spirit of what the team is trying to accomplish may be diminished.

11) "X" Projects managed; all of them successfully. In most cases the truth is a variation of: "project somehow completed", after client accepted to categorize & prioritize defects, agreed to leave some of them out, and the final delivery after several schedule delays. Of course he will continue to pay for it since the remaining defects will be covered under the "software change control/management" clauses - at extra cost.

12) I've been seeing this next one on the rise recently, under the guise of "continuous improvement": the number of new defects / change requests declining over time. Closer examination reveals that if the initial quality is low enough, you can show continuous improvement for a long, really long time.

13) People game the system IF they know they will be judged based on those metrics. I agree with Watts S. Humphrey, who wrote that metrics are immensely helpful but only if they are not used "against" the workers, otherwise they will simply game them, writing more LOC than necessary, putting several bugs into one, etc. If I look at the question with this in mind, then I would say every metric can be misleading.

14) One of my favorite misleading metrics is to look at a point in time as opposed to a trend. "We only found one bug today!" But the testers were all at a workshop and when they return tomorrow, they'll find 50 more to make up for lost time. "We finished another feature!" But how long did it take and how many people? Without looking at several pieces of data, and without looking at trends, a single-dimension metric is quite misleading.
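A minimal sketch of that point, with made-up daily counts, shows how different a single day's reading and the trend can look:

```python
# Hypothetical daily bug counts; day 5 is the day the testers were at a workshop.
daily_new_bugs = [48, 52, 55, 47, 1, 53, 50]

single_day = daily_new_bugs[4]                      # "We only found one bug today!"
trend = sum(daily_new_bugs) / len(daily_new_bugs)   # what the week actually looks like

print(f"Point-in-time reading (day 5): {single_day} bug found")
print(f"Seven-day average: {trend:.1f} bugs per day")
```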

15) Let me follow up Jerry Weinberg's comments about measurements that confuse effort with results with a few specific examples:

i) Time reporting. Last month, 1,266 person-hours were dedicated to the project. If it's safe enough for people to reveal what's really happening, you will hear that people reported whatever management wanted to hear. I often wonder what percentage of people fill out accurate time reports. I suspect it's a very low number.

ii) How many cars are in the parking lot early in the morning or late at night. It's a quick measurement that any manager can make, and they have heard from management experts that all the fastest-growing companies have this characteristic. Many managers conclude that the more cars there are in the parking lot, the faster the business will grow. They would be better off surveying the local junk yard.

iii) Percent of project complete. I've heard that a project is 90% complete dozens of times in my career. I admit that until I learned its uselessness, I reported projects that way. Why is it useless? The remaining 10% takes longer to complete than the first 90%, but that's not the inference people draw from the completion percentage. Others have commented that people game measurements. I agree; that certainly happens. But why does it happen? Because people are using measurements as evidence to support their story about a project. Once the measurements are used as evidence, the gaming begins, and the more the gaming, the more useless the measurement.

16) A dramatic reduction of violence in Iraq compared to a year ago and yet some of the highest rates since the invasion....

17) The economy is strong, compared to the bailout of Bear Stearns.

18) One of my current favorites is code coverage of unit tests. When a certain level of coverage is mandated, tests will be written that exercise the code but don't verify that it works. I've seen test suites that had no assertions at all; if the code didn't blow up, the test passed. Anytime a metric is used as a goal instead of an indicator, it is likely to be misleading. BTW, I second the recommendation for Jerry's book, QSM vol. 2. I'm re-reading it at the moment.
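A minimal sketch of that coverage-gaming pattern (the function and both tests are hypothetical): the two tests produce identical coverage, but only one of them can ever fail.

```python
def apply_discount(price, percent):
    # Deliberately buggy for the example: adds the discount instead of subtracting it.
    return price + price * percent / 100

def test_exercises_code_only():
    # Drives the code through every line but never checks the result.
    # Passes as long as nothing blows up -- and boosts the coverage number.
    apply_discount(100, 10)

def test_actually_verifies():
    # Same coverage, but this assertion catches the bug.
    assert apply_discount(100, 10) == 90
```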

19) I suggest that the misleading metric is always the metric without its own history. In other words, if you compare the result of a metric with the corresponding result from a previous measurement, the approach is sound, because the percentage of error is likely the same in both cases. Otherwise the result is like what you described in your two examples.

20) Metrics should never be an end goal, because their interpretation is subjective, metrics can always be gamed, and attempting to maximize one metric will negatively impact others.

How much time do your developers spend fixing bugs, versus implementing new features? This is an interesting metric to look at, but what does it tell you, really?

Suppose developers spend no time fixing bugs. What does that mean? It could mean developers don't like fixing bugs, so they work on new features instead. Or it could mean that QA isn't finding bugs, because they're too busy playing foosball. Heck, maybe it means there is no QA department and that customers have no way to contact the company.

How many bugs do your customers submit per release? Again, that's an interesting metric to look at, but its interpretation is not so easy.

Suppose customers submitted 10 times more bugs this release than the release before. Does that mean this release is 10 times buggier? It could be. But maybe your user base increased by 10 times, or maybe you made it 10 times easier for customers to submit bugs by giving them a direct interface to your bug tracker.
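As a rough illustration (the release figures below are invented), normalizing the raw count by something like the size of the active user base can tell a very different story:

```python
# Invented release figures: ten times as many bug reports, ten times as many users.
releases = {
    "v1.0": {"bugs_reported": 40,  "active_users": 1_000},
    "v2.0": {"bugs_reported": 400, "active_users": 10_000},  # "10 times buggier"?
}

for name, r in releases.items():
    per_thousand_users = r["bugs_reported"] / (r["active_users"] / 1_000)
    print(f"{name}: {r['bugs_reported']} bugs raw, "
          f"{per_thousand_users:.0f} bugs per 1,000 active users")
```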

You can reduce bugs to zero by killing the QA department and not providing contact information to your customers. Presto, zero bugs overnight! How will that help your company? It won't.

Finding more bugs can mean QA is doing a better job. It could mean your customer base is expanding. It could mean you're making it easier for customers to submit bugs. All these things are *good* for the company.

Instead of focusing on metrics, it's better to focus on the efficiency of processes. If QA is finding bugs, QA needs to work with developers to find out how these bugs were injected into the code, and how developers can prevent this from happening again. That's a process improvement -- you don't need metrics for that.

Similarly, if customers are reporting bugs, maybe designers can work with QA to show them how customers are using the product, so they can cover more usage scenarios; and maybe developers can work with QA to provide them better tools for creating and automating tests. This, too, is a process improvement, requiring no metrics.

Metrics do have some value, especially when looking at trends, but the vast majority of the time spent chasing metrics would be better spent improving the end-to-end process of delivering value to customers.
