It is not about writing tests, it's about writing stories
Wednesday, September 02, 2009
by Miško Hevery
A piece of software is like a car. Let's say you would like to test a car which you are in the process of designing. Would you test it by driving it around and making modifications to it, or would you prove your design by testing each component separately? I think that testing all of the corner cases by driving the car around is very difficult. Yes, if the car drives you know that a lot of things must work (engine, transmission, electronics, etc.), but if it does not work you have no idea where to look. There are also some things which you will have a very hard time reproducing in this end-to-end test. For example, it will be very hard for you to see if the car will be able to start in the extreme cold of the North Pole, or if the engine will overheat going full throttle up a sand dune in the Sahara. I propose we take the engine out and simulate the load on it in a laboratory.
We call driving the car around an end-to-end test and testing the engine in isolation a unit test. With unit tests it is much easier to simulate failures and corner cases in a much more controlled environment. We need both kinds of tests, but I feel that most developers can only imagine the end-to-end tests.
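As a minimal sketch of what "taking the engine out" can look like in code, the example below injects a fake thermometer into an engine so that the extreme-cold case can be reproduced on a desk. The `Engine` class, the `FakeThermometer` test double, and the -20 °C threshold are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class FakeThermometer:
    """Test double standing in for a real ambient-temperature sensor."""
    celsius: float

    def read_celsius(self) -> float:
        return self.celsius


class Engine:
    """Hypothetical engine that preheats itself before a cold start."""

    COLD_START_THRESHOLD_C = -20.0  # assumed value, for illustration only

    def __init__(self, thermometer):
        self.thermometer = thermometer
        self.preheated = False
        self.running = False

    def start(self) -> None:
        # Preheat only when the injected sensor reports extreme cold.
        if self.thermometer.read_celsius() < self.COLD_START_THRESHOLD_C:
            self.preheated = True
        self.running = True


def test_engine_starts_in_extreme_cold():
    # The "North Pole" is one constructor argument away.
    engine = Engine(FakeThermometer(celsius=-50.0))
    engine.start()
    assert engine.running
    assert engine.preheated


def test_engine_skips_preheat_in_mild_weather():
    engine = Engine(FakeThermometer(celsius=15.0))
    engine.start()
    assert engine.running
    assert not engine.preheated
```

Because the temperature arrives through a dependency rather than from the real world, the corner case costs nothing to set up, which is hard to achieve in an end-to-end drive.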
But let's see how we could use the tests to design a transmission. First, a little terminology change: let's not call them tests, but instead call them stories. They are stories because they tell you about your design. My first story is that:
- the transmission should allow the output shaft to be locked, to move in the same direction (D) as the input shaft, to move in the opposite direction (R), or to move independently (N)
Given such a story I could easily create a test which would prove that the story is true for any design submitted to me. What I would most likely get is a transmission which has only a single gear in each direction. So let's write another story:
- the transmission should allow the ratio between input and output shaft to be [-1, 0, 1, 2, 3, 4]
Again I can write a test for such a transmission, but I have not specified how the forward gear should be chosen, so such a transmission would most likely be permanently stuck in 1st gear. That would limit my speed and also over-rev the engine.
- the transmission should start in 1st and then switch to a higher gear before the engine reaches maximum revolutions.
This is better, but my transmission would most likely rev the engine to maximum before it switched, and once it had switched to a higher gear and I slowed down, it would not down-shift.
- the transmission should down-shift whenever the engine speed falls below 1000 RPM
OK, now it is starting to drive like a car, but the limits for shifting are still 1000-6000 RPM, which is not a very fuel-efficient way to drive your car.
- the transmission should up-shift whenever the estimated fuel consumption at a higher gear ratio is better than at the current one.
So now our engine will not rev any more, but it will be a lazy car, since once the transmission is in the fuel-efficient mode it will not want to down-shift.
- the transmission should down-shift whenever the gas pedal is depressed more than 50% and the RPM is lower than the engine's peak output RPM.
I am not a transmission designer, but I think this is a decent start.
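A few of the stories above can be sketched as executable tests. Everything in the sketch is an assumption made for illustration: `ToyTransmission` is a throwaway stand-in for a real design, the P/R/N/D selector and the 5500 RPM up-shift point are invented, and only the 1000 and 6000 RPM limits come from the text. Each test is named after the outcome it describes and says nothing about how the shifting is implemented.

```python
MIN_RPM = 1000      # down-shift limit from the stories above
MAX_RPM = 6000      # maximum revolutions from the stories above
UPSHIFT_RPM = 5500  # assumed up-shift point, safely below MAX_RPM


class ToyTransmission:
    """Throwaway stand-in for a real design, so the stories can run."""

    def __init__(self):
        self.mode = "N"
        self.gear = 1

    def select(self, mode):
        if mode not in ("P", "R", "N", "D"):
            raise ValueError(f"unknown mode: {mode}")
        self.mode = mode
        self.gear = 1

    def output_direction(self, input_direction):
        """Return +1 or -1 when driven, 0 when locked, None when free."""
        if self.mode == "D":
            return input_direction
        if self.mode == "R":
            return -input_direction
        if self.mode == "P":
            return 0
        return None  # N: output shaft moves independently

    def update(self, engine_rpm):
        """Shift at most one gear up or down for the given engine speed."""
        if self.mode != "D":
            return
        if engine_rpm >= UPSHIFT_RPM and self.gear < 4:
            self.gear += 1
        elif engine_rpm < MIN_RPM and self.gear > 1:
            self.gear -= 1


def test_output_shaft_follows_the_selected_mode():
    t = ToyTransmission()
    t.select("D")
    assert t.output_direction(+1) == +1
    t.select("R")
    assert t.output_direction(+1) == -1
    t.select("P")
    assert t.output_direction(+1) == 0
    t.select("N")
    assert t.output_direction(+1) is None


def test_upshifts_before_engine_reaches_maximum_revolutions():
    t = ToyTransmission()
    t.select("D")
    assert t.gear == 1
    t.update(engine_rpm=MAX_RPM - 1)
    assert t.gear == 2


def test_downshifts_when_rpm_falls_below_1000():
    t = ToyTransmission()
    t.select("D")
    t.update(engine_rpm=UPSHIFT_RPM)  # climb into 2nd gear first
    t.update(engine_rpm=MIN_RPM - 1)
    assert t.gear == 1
```

Any other design that honours the same stories, for example one with different internal shift logic, would pass these tests unchanged.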
Notice how I focused on the end result of the transmission rather than on testing specific internals of it. The transmission designer would have a lot of leeway in choosing how it worked internally. Once we had something and had tested it in the real world, we could augment this list of stories with additional stories as we discovered additional properties which we would like the transmission to possess.
If we decided to change the internal design of the transmission for whatever reason, we would have these stories as guides to make sure that we did not forget anything. The stories represent assumptions which need to be true at all times. Over the lifetime of the component we can collect hundreds of stories, which represent an equal number of assumptions built into the system.
Now imagine that a new designer comes on board and makes a design change which he believes will improve the responsiveness of the transmission. He can do so because the existing stories are not restrictive in how the transmission works, only in what the outcome should be. The stories save the designer from breaking an existing assumption which was already designed into the transmission.
Now let's contrast this with how we would test the transmission if it were already built.
- test to make sure all of the gears work
- test to make sure that the engine is not allowed to over-rev
It is hard now to think about what other tests to write, since we are not using the tests to drive the design. Now, let's say that someone insists that we get 100% coverage. We open the transmission up and see all kinds of logic and rules, and we don't know why they are there, since we were not part of the design. So we write a test:
- at 3000 RPM input shaft, apply 100% throttle and assert that the transmission goes to 2nd gear.
Tests like that are not very useful when you want to change the design, since you are likely to break them. Without understanding why the test was checking that specific condition, it is hard to know whether anything is really broken when the test goes red. That is because the test does not tell a story any more; it only asserts the current design. It is likely that such a test will be in the way when you try to make design changes. The point I am trying to make is that there is a huge difference between writing tests before and after. When we write tests before, we are:
- creating a story which forces a particular design decision.
- building a collection of assumptions which need to be true at all times.
When we write tests after the fact, we:
- miss a lot of the reasons why things are done in a particular way, even if we have 100% coverage.
- often write brittle tests, because they are tied to the particulars of the current implementation.
- end up with tests that are just snapshots: they don't tell the story of why the component does something, only that it does.
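The contrast between a snapshot and a story can be sketched with two tests against the same toy code. The `ToyTransmission` class and the 4500 RPM peak-output figure are assumptions for illustration only. The first test mirrors the "3000 RPM, 100% throttle, 2nd gear" example; the second expresses the kick-down story in terms of the rule itself.

```python
PEAK_OUTPUT_RPM = 4500  # assumed figure for this sketch


class ToyTransmission:
    """Hypothetical transmission implementing only the kick-down rule."""

    def __init__(self, gear=1):
        self.gear = gear

    def step(self, rpm, throttle):
        """Apply the kick-down rule once; throttle ranges from 0.0 to 1.0."""
        if throttle > 0.5 and rpm < PEAK_OUTPUT_RPM and self.gear > 1:
            self.gear -= 1


def test_snapshot_3000_rpm_full_throttle():
    # Written after the fact: pins one observed data point, with no "why".
    t = ToyTransmission(gear=3)
    t.step(rpm=3000, throttle=1.0)
    assert t.gear == 2


def test_story_kicks_down_when_driver_demands_power_below_peak():
    # Written as a story: pedal past 50% and RPM below peak output
    # must always produce a down-shift.
    for rpm in range(1000, PEAK_OUTPUT_RPM, 500):
        for throttle in (0.6, 0.8, 1.0):
            t = ToyTransmission(gear=3)
            t.step(rpm=rpm, throttle=throttle)
            assert t.gear == 2, (rpm, throttle)
```

Both tests pass today. If a designer later retunes `PEAK_OUTPUT_RPM` to 2800, the story test still passes, while the snapshot test goes red without saying whether anything that matters was broken.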
For this reason there is a huge difference in quality between writing assumptions as stories before (which forces a design to emerge) and writing tests after (which only takes a snapshot of a given design).
For me, an end-to-end test easily falls into testing M*N combinations of unit tests, where the ROI may be negative ...
I like the stretch in thought, but it leaves me with open questions and some contradictory thoughts...
I think the crux of why I feel ambivalent is the definition of a test that is used at the bottom.
Is a test really a series of assumptions that must be true at all times?
If a test is defined to be something that passes, then the redux is: a test does nothing but return a pass value in all cases.
So, I think I got it (please correct me if I am wrong).
Are you saying that if something is given in complete form, then a test is simply a way of describing the behaviors that exist - a mapping so to speak?
Well, if that is the case I agree: it doesn't say much about the quality or fitness of the thing, it doesn't even call out purpose.
But I don't see that as a test. I look at a test as an experiment with an outcome only.
Let's look at this with our science hat on: the scientific method.
Software is intended to meet some set of expectations for its behavior.
So, if it is a car - it should act like one. The stories are a dialog about the behavior of the car, what makes it a car.
Tests are then the experiments that demonstrate whether it is a car or not.
In order to be a car, certain requirements must be met - it must go 25 mph (geez - I got that from iCarly), it must have four wheels, it must have a place for a person to sit - etc.
Now of course a car can explode when it rounds a corner at 40mph and still be a car prior to the explosion ....
So let's qualify "test" a bit more. There are validation tests that confirm the story.
But now we do need another term....
There need to be tests to challenge the completeness of the story itself, tests to address assumptions the storyteller has made.
So we have to think... maybe the story of the car really isn't the car, it's just one of many stories that maps to our tests of the car... we do know we have a car of unknown safety and reliability.
we do not yet have the car - the one we were expecting to have.
What do we do if we want to ensure we have a BMW? Well, there are a number of things I expect from a BMW; you probably expect some of the same, some different.
If we don't have a BMW to compare to, we are going to have to explore the behaviors of the car.
Why don't we take it on the Autobahn and drive as fast as we can?
Then, when it breaks, determine why.
I don't think it's possible to know you have a BMW until you set your expectations, and use the test failures and subsequent work on the car to make it what you think a BMW should be.
Sorry for the ramble - please feel free to correct me where I have strayed.
Nice article, but I don't see why writing tests after the product has already been built has to restrict the test cases in the way you described.
If you take a step back and consider what the transmission is supposed to do, you could come up with the same set of tests as in your first example; I think it's a question of what test approach to take rather than what stage of development the product is at.
Of course, if you find faults by testing after the car has already been built, it will be a lot more expensive to adjust the design!
I have sometimes found it hard to follow TDD by always writing the test first, but when I stick to it the results come. I feel better when I'm sure I'm not writing unnecessary code or testing private methods.
A related article
Richard Feynman, the Challenger Disaster, and Software Engineering
http://duartes.org/gustavo/blog/post/richard-feynman-challenger-disaster-software-engineering
Misko,
Very good post. My most recent blog post has a similar theme. I had a different take on what software testers can learn from manufacturing though.
My blog post is here: http://hexawise.wordpress.com/2009/08/25/what-else-can-software-development-and-testing-learn-from-manufacturing-dont-forget-design-of-experiments/
There are more similarities than most people realize between manufacturing and software testing (and more lessons to be learned from proven manufacturing practices than most software testers realize). In manufacturing scenarios (such as making transmissions and engines), before a final design is arrived at, prototypes are constructed. There is a science to creating as few prototypes as possible and learning as much as possible from each prototype. The field is known as Design of Experiments and firms like Toyota and Ford have followed DoE principles for decades. The analogy in software development during this phase is Google’s web site optimizer. It allows developers of web apps to experiment with different layouts and learn (with as few “prototypes” as possible) which works best.
In this prototype building phase, the engineers involved probably have hypotheses from their past experiences about what will be a pretty good starting point (e.g., engine size of X, transmission should shift when the RPM's fall below Y, the transmission should down-shift whenever the gas pedal is depressed more than Z% and a further condition applies, etc.) Several prototypes would be built that vary X, Y, and Z so that X+10% and Y-10% and Z might be tested together as well as X with Y+10% and Z-10%, etc. In that way, the interaction between the different variables would be identified as well to capture the near-optimal solution in as little time and investment as possible. An indication of how important Design of Experiments is to Toyota is that one of their senior engineers was quoted in the book 'Statistics for Experimenters' as saying "An engineer who does not understand Design of Experiments is not an engineer."
After the Design of Experiments prototyping phase is completed, the finished product should be tested. The step described above is not typically considered "Testing" by QA/Testing organizations, but the next lesson learned from manufacturing is:
Design of Experiments methods (borrowed from decades of successful implementations in manufacturing) can also be used in QA / software testing. With a goal of finding as many defects in as few test cases as possible, Design of Experiments-based test case identification methods (such as pair-wise, orthogonal array-based, and n-wise approaches) have been proven to find more than twice as many defects per tester hour as standard, manual methods of test case identification in a wide range of software testing situations.
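As a rough illustration of pair-wise test case identification, the sketch below greedily picks combinations of X, Y, and Z levels until every pair of values appears together in at least one case. The `pairwise_cases` function and the `LEVELS` table are hypothetical, and the simple greedy search stands in for the orthogonal-array tools used in practice; it is not guaranteed to find the smallest possible set.

```python
from itertools import combinations, product

# Three levels per variable, following the X / Y / Z example in this comment.
LEVELS = {
    "X": ["X-10%", "X", "X+10%"],
    "Y": ["Y-10%", "Y", "Y+10%"],
    "Z": ["Z-10%", "Z", "Z+10%"],
}


def pairwise_cases(parameters):
    """Greedily choose cases until every pair of values is covered once."""
    names = list(parameters)

    def pairs_of(case):
        # All (variable, value) pairs that this one case exercises together.
        return {((a, case[a]), (b, case[b])) for a, b in combinations(names, 2)}

    candidates = [dict(zip(names, values))
                  for values in product(*parameters.values())]
    uncovered = set()
    for case in candidates:
        uncovered |= pairs_of(case)

    chosen = []
    while uncovered:
        # Take the candidate that covers the most still-uncovered pairs.
        best = max(candidates, key=lambda c: len(pairs_of(c) & uncovered))
        chosen.append(best)
        uncovered -= pairs_of(best)
    return chosen


if __name__ == "__main__":
    cases = pairwise_cases(LEVELS)
    print(f"{len(cases)} cases instead of 27")
    for case in cases:
        print(case)
```

With three variables at three levels each, exhaustive testing needs 27 cases, while covering every pair needs at least 9; the greedy search lands somewhere between those two figures.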
Empirical evidence from 10 projects showing that average defects found per tester hour more than doubled can be found in an article I link to in my blog post. Please see the link to the IEEE article I recently co-wrote with three PhD's who share my strong belief that these efficient and effective software testing methods are not nearly as widespread among software testers as they would be if more people tried them out and saw for themselves the benefits that these proven approaches consistently deliver.
- Justin
Just as a general observation I see tests in the test-driven development model acting as the requirements in traditional software development models with the exception that tests are generally more detailed and provide more granularity. Although such tests set some proper expectations with the end-user, I doubt any developer will be able to nail down all test cases that must be passed on the first round (unless the dev is also an SME in the software's domain). To address this, test-driven development must be executed in a cyclic fashion allowing new tests/stories/requirements to be added and old code to be tested against those new tests/stories/requirements.
... You could call it "questions" instead of "stories" since (as you have already pointed out) your stories are about the what and not about the how.