I found this cartoon about testing:
http://www.eviltester.com/index.php/2008/03/23/evil-tester-explains-software-testing-strategy-27-find-the-big-bug-first/
The author has a list of book reviews here:
http://www.compendiumdev.co.uk/blog/
I found the link through an essay on Testing Reflections lamenting the low quality of testers and how few have read any books on testing or have a favorite blog:
http://www.testingreflections.com/node/view/6795
and a funny response to it is here:
http://www.eviltester.com/index.php/2008/03/28/we-dont-need-no-stinkin-passion/
Sunday, March 30, 2008
Friday, March 28, 2008
What are the patterns in requirements and testing, and how do they relate to the well-known patterns in design?
I posted this question on LinkedIn at: http://www.linkedin.com/answers/technology/software-development/TCH_SFT/195456-155544
There were a few responses. A summary of the question and the responses follows:
Question: When you look at a product and its requirements, do you get a feeling of deja vu, as if you have seen similar requirements before in another product? Can you quickly identify deficiencies in the current set of requirements because you have been through this experience more than a few times?
Have you looked at a test plan or a set of test cases and said to yourself: this looks very much like another project, this is how we tested it then, and we learned a few lessons the hard way; if I were going to test something like this again, I would try this set of things?
Have you looked at a DB schema, the flow of HTML forms, the requirements and the test plan, and noticed a trend: the UML diagrams, the DB schema, the form flow and the test plan all look like they could be generated from just one of them? That is, given a good flow chart, the DB schema and the forms could be generated; or given a set of screen shots and how they interact, a flow chart and a DB schema could be generated?
Examples I gave:
I have written many test tools, and usually they run a command, compare the result with an expected pattern, and determine whether the test passes or fails. What I have noticed is that the same framework can be used for many products, because they all do the same thing, i.e. run a command and compare the result. There are a few types of comparisons:
a) Static results -- a query returns an answer from a DB, and the answer is always the same.
b) Load-balanced results -- for example, a web page might show one ad 25% of the time and another ad 75% of the time. The long-run average might be in that ratio, but the first 25,000 requests might all return ad 1 while the next 75,000 return ad 2. To really test this you have to run the query a whole lot of times.
c) Time-sensitive results -- a sale might exist only on Memorial Day, trains might run twice as frequently on weekdays as on weekends, phone rates might be cheaper after 7 pm and on weekends. The results depend on when you run the test, or the test framework has to set the time on the application server before running the test.
d) State-based results -- if you look up the price of a flight from Chicago to Denver, it might be $150. But as more and more seats fill up, the next time you run the query the price could be $200, then $250, and so on.
The above are all examples of what I call business-logic testing, and this type of testing seems to repeat over and over again. I have found similar patterns in performance, stability and scalability testing, where you could build one framework and use it against many products. A minimal sketch of such a run-and-compare framework follows below.
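To make the comparison types above concrete, here is a minimal sketch, in Python, of what such a run-and-compare framework might look like. It is not the actual tool described above; the command strings, expected patterns, tolerances and clock handling are all placeholder assumptions.

    # Sketch of a reusable "run a command, compare the result" test
    # framework covering the four comparison types listed above.
    import re
    import subprocess
    from collections import Counter

    def run(cmd):
        """Run a shell command and return its stdout as text."""
        return subprocess.run(cmd, shell=True, capture_output=True, text=True).stdout

    def check_static(cmd, expected_pattern):
        """(a) Static results: a single run must match a fixed pattern."""
        return re.search(expected_pattern, run(cmd)) is not None

    def check_load_balanced(cmd, expected_ratio, runs=1000, tolerance=0.05):
        """(b) Load-balanced results: repeat the query many times and
        compare the observed distribution against the expected ratio."""
        counts = Counter(run(cmd).strip() for _ in range(runs))
        return all(abs(counts[value] / runs - share) <= tolerance
                   for value, share in expected_ratio.items())

    def check_time_sensitive(cmd, expected_for_time, now):
        """(c) Time-sensitive results: the expected pattern depends on the
        (injected) clock; a real framework might instead set the time on
        the application server before running the command."""
        return re.search(expected_for_time(now), run(cmd)) is not None

    def check_state_based(cmd, is_acceptable):
        """(d) State-based results: the exact value drifts with state
        (e.g. seats already sold), so validate with a predicate or a
        range instead of a fixed answer."""
        return is_acceptable(run(cmd))

A business-logic test for a new product then reduces to supplying the command, an expected pattern or distribution, and, for the state-based cases, an acceptance predicate.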
Responses I got:
4. Response by Paul Oldfield:
One thing that I have observed is that I come across the same 'mini-domain' models repeatedly. At about the same time I was making this observation, Martin Fowler published his book "Analysis Patterns", which covered a set of domain models he had commonly found to be re-usable. Of course, in any business domain different organisations have their own ways of working; they use this common domain model in similar but not identical ways.
Patterns of use of a system can also be re-usable - acknowledged for 'data-rich' systems by the "CRUD" acronym. We tend to use a set of operations that are basically Create, Read, Update, Delete and Query / Report. 'Behaviour-rich' systems tend to disguise these CRUD operations, but any interaction with such a system can usually be broken down into CRUD operations and the timely application of a few business rules.
So yes, in requirements we do tend to see similar things happen repeatedly, and we already acknowledge the existence of some of these patterns. I'm not familiar with testing in such depth, but I gather there will be similar patterns of re-use; after all, the tests will be based on requirements that fall into patterns, or on design that utilises patterns.
Auto-generation of code and schemas by translation of models has already happened. It is quite common for control-rich systems and is becoming more common for behaviour-rich systems. For data-rich systems it is more common to use 4GL languages. Some workers are developing domain-specific languages that show some promise of uniting the 4GL and model transformation approaches. I'm not sure whether this answers the question you were asking, but hopefully it gives you a few leads into areas that you might find of interest.
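Paul's point about CRUD recurring in data-rich systems is easy to make concrete. The toy sketch below is my own illustration, not part of the response: a single generic round-trip check that could be pointed at any create/read/update/delete interface (the InMemoryStore is an invented stand-in for a real system under test).

    # A generic CRUD round-trip check, reusable against anything that
    # exposes create/read/update/delete. InMemoryStore is a toy stand-in.
    class InMemoryStore:
        def __init__(self):
            self._rows, self._next_id = {}, 1

        def create(self, data):
            row_id, self._next_id = self._next_id, self._next_id + 1
            self._rows[row_id] = dict(data)
            return row_id

        def read(self, row_id):
            return self._rows.get(row_id)

        def update(self, row_id, data):
            self._rows[row_id].update(data)

        def delete(self, row_id):
            del self._rows[row_id]

    def crud_round_trip(store, initial, changed):
        """Create a record, read it back, update it, then delete it."""
        row_id = store.create(initial)
        assert store.read(row_id) == initial
        store.update(row_id, changed)
        assert store.read(row_id) == {**initial, **changed}
        store.delete(row_id)
        assert store.read(row_id) is None

    crud_round_trip(InMemoryStore(), {"city": "Chicago"}, {"city": "Denver"})

The same crud_round_trip could, in principle, be run against a database table, a REST resource or an HTML form flow by swapping in a different store adapter.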
5. Response by Michael Kramlich:
One pattern I've seen is the Resume Stuffing pattern, where someone writes a requirements doc in order to make themselves look good, to add a bullet point to their resume, or as ammunition to help themselves get promoted. I've also seen the Crack Smoking pattern, where the author was clearly smoking crack when he wrote it. And I've seen the More Boilerplate Than Content pattern; the name should give you a good idea of what it's about. If not, have your secretary call my secretary, let's do lunch sometime, next week's not good however, etc. I'm joking but I'm also serious. I think a lot of folks are going to give super-serious answers, I salute them for it, and there's probably a lot of merit to what they're going to say. It's all good. Well, much of it. Okay, some of it. I kid!
Okay, to be serious-serious. Probably the most important thing about a requirements document, if one is going to exist (and it doesn't always need to exist, trust me), is that the content is expressed at the right level of specificity for your organization, for the developing entity, and for your situation. Put too much work into it and it gets thrown away, because it often becomes a dead document as soon as the rubber hits the road -- er, I mean, the code hits the CPU. The actual content of a requirements document will depend on the type of project, and the type of specifications or diagrams you use will vary as well.
If you really care about identifying and compiling patterns, and really taking advantage of them (if they exist), then I would recommend trying to invent a model specification format: one that is both human readable and machine readable, and from which all your deliverable artifacts are generated, either directly or indirectly. Try to allow bi-directional changes, and preserve metadata like comments. And never produce a dead document, ever.
Patterns are only useful to think about if they recur. If they recur, that suggests an opportunity to automate, and software is all about automating. If the pattern (of requirements document and/or application behavior) is unique to your business, then write that tool for handling and generating deliverables from your requirements document model in-house and enjoy a competitive advantage. If the pattern is NOT unique to your business, if it's general enough that it would recur in other businesses, then look to make that tool a sellable product, and possibly spin it off into its own company as well. The real danger in thinking too much about Patterns of Requirements Documents is that you turn into a sort of stamp collector or butterfly collector for this stuff, rather than approaching the patterns as opportunities to build a tool or add a feature to an existing tool. The latter is more productive, more useful, and potentially more profitable as well.
Oh, one more thing: you mentioned that you have written test tools in the past to assert various expected qualities about the application in question. This sounds like another opportunity to bake those assertions into a formalized requirements document -- not expressed in English/Hindi/whatever, but in a DSL you create for this purpose, from which you generate all deliverables and/or configure the execution of all related and relevant processes, like, for example, the execution of your tests. All of the patterns-to-tests you mentioned (performance, stability, etc.) are opportunities to "meta-automate" in this fashion.
Also, one of the highest and most important goals/rules/ideals of programming is, in my opinion, the DRY principle: Don't Repeat Yourself. Any attempt to study or improve upon requirements documents would benefit from looking for opportunities to increase adherence to the DRY principle. If something must be specified somewhere, anywhere, then specify it once, and only once, in a well-defined place. Everywhere else just refers to it, or is derived/generated from it.
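Michael's suggestion of a single human- and machine-readable specification, combined with the DRY principle, might look something like the sketch below. The spec format and the ad-rotation numbers are invented for illustration; this is not a real DSL, just the shape of the idea: state the requirement once and derive the test from it.

    # A made-up, machine-readable requirement; the 25/75 ratio lives in
    # exactly one place and is never restated in the tests.
    SPEC = {
        "requirement": "Ad rotation",
        "ads": {"ad1": 0.25, "ad2": 0.75},
        "sample_size": 1000,
        "tolerance": 0.05,
    }

    def generate_test(spec):
        """Derive a distribution check from the spec instead of copying
        the expected ratio into the test plan and the test code."""
        def test(observed_counts):
            total = sum(observed_counts.values())
            for ad, share in spec["ads"].items():
                assert abs(observed_counts.get(ad, 0) / total - share) <= spec["tolerance"], ad
        return test

    check_rotation = generate_test(SPEC)
    check_rotation({"ad1": 260, "ad2": 740})   # within tolerance of the spec

Change the ratio in SPEC and every derived artifact -- here just the test, but in principle documentation and monitoring too -- follows automatically.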
What kinds of misleading metrics have you observed?
I asked this question on LinkedIn and got a lot of interesting responses.
Here is the link, and I will summarize the contents below:
http://www.linkedin.com/answers/technology/software-development/TCH_SFT/198122-155544
1) 100,000 test cases executed, 2,500 failed. When you review the 100,000 test cases, you realize that there are only 40 good test cases; the rest are roughly 2,500 near-copies of each of those, with only minor variations.
2) 100 bugs were raised in this release. When you review the 100 bugs you find that 20 are pilot errors marked INVALID, 30 are duplicates, 15 are nitpicks that development decides to WONTFIX, and only 35 are real bugs, of which 20 that were originally marked as show stoppers get downgraded to trivial and pushed out to a far-off release.
3) Lines of code.
4) A more subtle answer is all the measurements that confuse effort with results. Managers who don't understand the work tend to reward the appearance of work. Myself, I like to see a programmer or tester who is well organized and plans well, so s/he can complete a full day's work in eight hours and go home and have a life. But too many managers reward the people who come in early and stay late while appearing to be busy all day (at least when the manager is watching). Programmers and testers who are not well organized cannot do the kind of precision work that software requires.
5) As Samuel Clemens once opined, "There are lies, damn lies and then there are statistics!" It reminds me of the prank we pulled in high school. We circulated a petition at school asking for a total ban of Oxydihydride! It was THE most dangerous element known to man and was responsible for killing millions of people (more than the plague) as well as causing untold property damage. We stopped after we got a couple of hundred signatures. Of course, you know it better as water. People will always spin data to prove their theories. Some do it maliciously, with intent to defraud, and some are naive about the whole process.
6) "There are 100 bugs and we are fixing them at a rate of 25 a week, so we'll be ready to release in 4 weeks." This ignores the incoming rate, which could also be 25 a week.
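The arithmetic behind (6) is worth spelling out. A toy calculation, using the numbers from the example:

    # Naive projection vs. reality when the incoming rate matches the fix rate.
    open_bugs, fix_rate, incoming_rate = 100, 25, 25
    naive, actual = open_bugs, open_bugs
    for week in range(1, 5):
        naive = max(naive - fix_rate, 0)            # assumes no new bugs arrive
        actual = actual - fix_rate + incoming_rate  # counts the incoming rate
        print(f"week {week}: projected {naive} open, actually {actual} open")

The projection reaches zero in week 4, while the real backlog never moves.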
7) Metrics are most useful when used close to their source, to adjust process. Up the chain, they become goals rather than measurements, and people will then naturally game the system -- thus the cliche "paid by lines of code". For example, if you are measuring closure time on bugs, you will have many little, focused bugs, with quick, possibly less reliable fixes. If you are measuring bug counts, you will have fewer bugs, but longer closure times because each bug is overloaded, or added to, to lower the count. Neither improves the process.
8) The ones that have been controversial, and to me appear to have less value, are some time-based ones like "time to fix" and "time to regress". These can be part of tracking and motivation, but they quickly become confused when other priorities are mixed in.
9) I like the standard ones used by call center managers.
year 1 - Reduced time/client
year 2 - Reduced Total # of calls to call center
year 3 - Reduced time/client
What actually happened in year 2 was that they were understaffed, so they had long hold times, so customers with easy questions either gave up or figured it out themselves. In years 1 and 3 they were handling the easier issues, which naturally resolve more quickly. If you ever compare annual reviews from an IT department you will see these trends. I don't even think they are being deceitful, because the turnover is too high: the person there in years 1 and 2 is replaced by year 3 with a guy who thinks he can do it better, when in fact he is just striving to maximize whichever metrics upper management picks in a given year.
10) Metrics can be useful but also quite dangerous. I have seen examples where performance bonuses are attached to metrics, and teams quickly shift to a myopic focus on attaining the bonus without really improving the product or process that the metric and bonus were originally intended to encourage.
I think that when the focus shifts away from people, their face-to-face interactions, and frequent reviews of the product, toward metrics and dashboards, the spirit of what the team is trying to accomplish may be diminished.
11) "X" Projects managed; all of them successfully. In most cases the truth is a variation of: "project somehow completed", after client accepted to categorize & prioritize defects, agreed to leave some of them out, and the final delivery after several schedule delays. Of course he will continue to pay for it since the remaining defects will be covered under the "software change control/management" clauses - at extra cost.
12). I've been seeing this next one on rise recently - under the guise of "continuous improvement": Number of new defects / change requests declining with time. Closer examination reveals that if the initial quality is low enough, you can show continuous improvement for a long, really long time.
13) People game the system IF they know they will be judged based on those metrics. I agree with Watts S. Humphrey, who wrote that metrics are immensely helpful but only if they are not used "against" the workers, otherwise they will simply game them, writing more LOC than necessary, putting several bugs into one, etc. If I look at the question with this in mind, then I would say every metric can be misleading.
14) One of my favorite misleading metrics is to look at a point in time as opposed to a trend. "We only found one bug today!" But the testers were all at a workshop and when they return tomorrow, they'll find 50 more to make up for lost time. "We finished another feature!" But how long did it take and how many people? Without looking at several pieces of data, and without looking at trends, a single-dimension metric is quite misleading.
15) Let me follow up Jerry Weinberg's comments about measurements that confuse effort with results with a few specific examples:
i) Time reporting: "Last month, 1,266 person-hours were dedicated to the project." If it's safe enough for people to reveal what's really happening, you will hear that they reported whatever management wanted to hear. I often wonder what percentage of people fill out accurate time reports; I suspect it's a very low number.
ii) How many cars are in the parking lot early in the morning or late at night: it's a quick measurement that any manager can make, and they have heard from management experts that all the fastest-growing companies have this characteristic. Many managers conclude that the more cars in the parking lot, the faster the business will grow. They would be better off surveying the local junkyard.
iii) Percent of project complete: I've heard that a project is 90% complete dozens of times in my career. I admit that until I learned its uselessness, I reported projects that way. Why is it useless? The remaining 10% takes longer to complete than the first 90%, but that is not the inference people draw from the completion percentage. Others have commented that people game measurements. I agree; that certainly happens. But why does it happen? Because people use measurements as evidence to support their story about a project. Once the measurements are used as evidence, the gaming begins, and the more the gaming, the more useless the measurement.
16) A dramatic reduction in violence in Iraq compared to a year ago, and yet some of the highest rates since the invasion...
17) "The economy is strong", as compared with bailing out Bear Stearns.
18) One of my current favorites is code coverage of unit tests. When a certain level of coverage is mandated, tests will be written that exercise the code but don't verify that it works. I've seen test suites that had no assertions at all: if the code didn't blow up, it passed. Any time a metric is used as a goal instead of an indicator, it is likely to be misleading. BTW, I second the recommendation for Jerry's book, QSM vol. 2; I'm re-reading it at the moment.
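A small, invented illustration of the coverage-without-assertions problem in (18): both tests below execute every line of discount() and look identical to a coverage tool, but only the second one can ever fail.

    def discount(price, is_member):
        # Deliberate bug: members should get 10% off, not a 10% surcharge.
        return price * 1.1 if is_member else price

    def test_gamed():
        # Exercises the code (full coverage) but asserts nothing.
        discount(100, True)
        discount(100, False)

    def test_real():
        # Actually verifies the behaviour.
        assert discount(100, True) == 90
        assert discount(100, False) == 100

    test_gamed()          # "passes" and boosts the coverage number
    try:
        test_real()
    except AssertionError:
        print("test_real catches the bug that coverage alone hides")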
19) I suggest that a misleading metric is always a metric without its own history. In other words, if you compare the current result of a metric with its previous results, the approach is sound, because the percentage of error is surely the same in both cases. Otherwise the result is like what you described in your two examples.
20) Metrics should never be an end goal, because their interpretation is subjective, metrics can always be gamed, and attempting to maximize one metric will negatively impact others.
How much time do your developers spend fixing bugs, versus implementing new features? This is an interesting metric to look at, but what does it tell you, really?
Suppose developers spend no time fixing bugs. What does that mean? It could mean developers don't like fixing bugs, so they work on new features instead. Or it could mean that QA isn't finding bugs, because they're too busy playing foosball. Heck, maybe it means there is no QA department and that customers have no way to contact the company.
How many bugs do your customers submit per release? Again, that's an interesting metric to look at, but its interpretation is not so easy.
Suppose customers submitted 10 times more bugs this release than the release before. Does that mean this release is 10 times buggier? It could be. But maybe your user base increased by 10 times, or maybe you made it 10 times easier for customers to submit bugs by giving them a direct interface to your bug tracker.
You can reduce bugs to zero by killing the QA department and not providing contact information to your customers. Presto, zero bugs overnight! How will that help your company? It won't.
Finding more bugs can mean QA is doing a better job. It could mean your customer base is expanding. It could mean you're making it easier for customers to submit bugs. All these things are *good* for the company.
Instead of focusing on metrics, it's better to focus on the efficiency of processes. If QA is finding bugs, QA needs to work with developers to find out how these bugs were injected into the code, and how developers can prevent this from happening again. That's a process improvement -- you don't need metrics for that.
Similarly, if customers are reporting bugs, maybe designers can work with QA to show them how customers are using the product, so they can cover more usage scenarios; and maybe developers can work with QA to provide them better tools for creating and automating tests. This, too, is a process improvement, requiring no metrics.
Metrics do have some value, especially when looking at trends, but the vast majority of the time spent chasing metrics would be better spent improving the end-to-end process of delivering value to customers.
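To put toy numbers on the "ten times more bugs" example in response 20: normalizing the raw count before comparing releases can tell a very different story. The figures below are invented.

    # Raw bug counts vs. bugs per 1,000 active users across two releases.
    releases = {
        "release 1": {"bugs": 40,  "active_users": 5_000},
        "release 2": {"bugs": 400, "active_users": 50_000},   # "10x buggier"?
    }
    for name, data in releases.items():
        per_1k = data["bugs"] / data["active_users"] * 1000
        print(f"{name}: {data['bugs']} bugs raw, {per_1k:.0f} per 1,000 users")

Both releases come out at 8 bugs per 1,000 users: the tenfold jump in raw bug count tracks the tenfold jump in users, not a drop in quality.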
Suggested books for a QA book reading club
I posted a request for book suggestions on LinkedIn and got a few replies.
The posting is at: http://www.linkedin.com/answers/technology/software-development/TCH_SFT/196363-155544
The recommended books so far (not in any order) are:
1. Books by James Bach, listed at: http://www.satisfice.com/bibliography.shtml
2. Designing and Deploying Software Processes by F. Alan Goodman (Auerbach Publications)
3. Software Sizing, Estimation, and Risk Management by Daniel D. Galorath & Donald Reifer (Auerbach Publications)
4. CMMI Distilled: A Practical Introduction to Integrated Process Improvement by Dennis M. Ahern, Aaron Clouse & Richard Turner (Addison-Wesley)
5. Implementing the IEEE Software Engineering Standards by Michael Schmidt (Sams Publishing)
6. Metrics and Models in Software Quality Engineering by Stephen H. Kan (Addison-Wesley)
7. Software Metrics: Best Practices for Successful IT Management by Paul Goodman (Rothstein Associates)
8. Software Measurement and Estimation by Linda M. Laird & M. Carol Brennan (Wiley)
9. Achieving Software Quality through Teamwork by Isabel Evans (Artech House)
10. Professional Pen Testing for Web Applications by Andres Andreu (Wrox Press)
11. Unit Test Frameworks (O'Reilly)
12. Critical Testing Processes by Rex Black
13. Global Quality by Richard Tabor Greene
14. Software Testing in the Real World by Ed Kit
15. Introduction to Quality Control by Kaoru Ishikawa
16. Software Testing Foundations by Andreas Spillner, Tilo Linz & Hans Schaefer
17. Testing Computer Software, 2nd Edition, by Cem Kaner
18. Lessons Learned in Software Testing by Cem Kaner, James Bach & Bret Pettichord
19. I Know It When I See It: A Modern Fable About Quality by John Guaspari