TotT: Literate Testing With Matchers

By Zhanyong G. Mock Wan in Google Kirkland

Alright, it sounds like a good idea to verify that matchmakers can read and write. How does this concern us programmers, though?
Actually, we are talking about a way of writing tests here – a way that makes both the test code and its output read like English (hence “literate”). The key to this technique is matchers, which are predicates that know how to describe themselves. For example, in Google C++ Mocking Framework, ContainsRegex("Ahcho+!") is a matcher that matches any string that has the regular expression "Ahcho+!" in it. Therefore, it matches "Ahchoo!" and "Ahchoooo! Sorry.", but not "Aha!".
What's this to do with test readability, anyway? It turns out that matchers, whose names are usually verb phrases, lend themselves easily to an assertion style that resembles natural languages. Namely, the assertion

EXPECT_THAT(value, matcher);


succeeds if value matches matcher. For example,

#include <gmock/gmock.h>
using ::testing::Contains;
...
EXPECT_THAT(GetUserList(), Contains(admin_id));


verifies that the result of GetUserList() contains the administrator.

Now, pretend the punctuation isn't there in the last C++ statement and read it. See what I mean?

Better yet, when an EXPECT_THAT assertion fails, it will print an informative message that includes the expression being validated, its value, and the property we expect it to have – thanks to a matcher's ability to describe itself in human-friendly language. Therefore, not only is the test code readable, the test output it generates is readable too. For instance, the above example might produce:

Value of: GetUserList()
Expected: contains "yoko"
  Actual: { "john", "paul", "george", "ringo" }


This message contains relevant information for diagnosing the problem, often without having to use a debugger.
To get the same effect without using a matcher, you'd have to write something like:

std::vector<std::string> users = GetUserList();
EXPECT_TRUE(VectorContains(users, admin_id))
    << " GetUserList() returns " << users
    << " and admin_id is " << admin_id;


which is harder to write and less clear than the one-liner we saw earlier.

Google C++ Mocking Framework (http://code.google.com/p/googlemock/) provides dozens of matchers for validating many kinds of values: numbers, strings, STL containers, structs, etc. They all produce friendly and informative messages. See http://code.google.com/p/googlemock/wiki/CheatSheet to learn more. If you cannot find one that matches (pun intended) your need, you can either combine existing matchers or define your own from scratch. Both are quite easy to do. We'll show you how in another episode. Stay tuned!


Exploratory Testing ... in print and at STAR


By James A. Whittaker

Well, my book survived a recession-haunted publishing house and my own change of employer, and is now available in print. Even the subtitle changed, as the techniques, which guided manual testing at Microsoft, were reapplied by Google engineers as a way to design test automation.


I'll be at STAR next week in Anaheim to talk about exploratory testing, the subject of the book. Accompanying me will be Rajat Dewan of Google who used the 'FedEx Tour' to reduce a test set from hundreds of manual test cases to exactly 9 automated ones. I hope you'll join us.



Checked exceptions I love you, but you have to go


Once upon a time, Java created an experiment called checked exceptions: you have to either declare exceptions or catch them. Since then, no other language (that I know of) has copied the idea, yet somehow Java developers are in love with checked exceptions. Here, I am going to "try" to convince you that checked exceptions, even though they look like a good idea at first glance, are actually not a good idea at all:

Empirical Evidence

Let's start with an observation of your code base. Look through your code and tell me what percentage of catch blocks merely rethrow or print an error. My guess is that it is in the high 90s. I would go as far as to say that 98% of catch blocks are meaningless, since they just print an error or rethrow the exception, which will later be printed as an error. The reason is very simple. Most exceptions, such as FileNotFoundException, IOException, and so on, are a sign that we as developers have missed a corner case. The exceptions are a way of informing us that we have messed up. So if we did not have checked exceptions, the exception would be thrown, the main method would print it, and we would be done with it (optionally, if we are a server, we would catch all exceptions in main and log them).

Checked exceptions force me to write catch blocks which are meaningless: more code, harder to read, and a higher chance that I will mess up the rethrow logic and eat the exception.

Lost in Noise

Now let's look at the 2-5% of catch blocks which do not simply rethrow, where real, interesting logic happens. Those interesting bits of useful and important information are lost in the noise, since my eye has been trained to skim over catch blocks. I would much rather have code where a catch indicates "pay attention! something interesting is happening here!" rather than "it is just a rethrow." Now, if we did not have checked exceptions, you would write your code without catch blocks, test your code (you do test, right?), realize that under some circumstances an exception is thrown, and deal with it by writing the catch block. In such a case, forgetting to write a catch block is no different from forgetting to write the else block of an if statement. We don't have checked ifs and yet no one misses them, so why do we need to tell developers that FileNotFoundException can happen? What if the developer knows for a fact that it cannot happen, since he has just placed the file there, so that such an exception would mean that the filesystem has just disappeared (and your application is in no place to handle that)?

Checked exceptions make me skim the catch blocks, since most are just rethrows, making it likely that I will miss something important.

Unreachable Code

I love to write tests first and implement as a consequence of tests. In such a situation you should always have 100% coverage since you are only writing what the tests are asking for. But you don't! It is less than 100% because checked exceptions force you to write catch blocks which are impossible to execute. Check this code out:
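
import java.io.ByteArrayOutputStream;
import java.io.IOException;

class Bytes {
  static String bytesToString(byte[] bytes) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    try {
      out.write(bytes);
      out.close();
      return out.toString();
    } catch (IOException e) {
      // This can never happen!
      // Should I rethrow? Eat it? Print Error?
      throw new RuntimeException(e); // the compiler insists on something here
    }
  }
}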

ByteArrayOutputStream will never throw IOException! You can look through its implementation and see that it is true! So why are you making me catch a phantom exception which can never happen and which I can not write a test for? As a result I cannot claim 100% coverage because of things outside my control.

Checked exceptions create dead code which will never execute.

Closures Don't Like You

Java does not have closures, but it does have the visitor pattern. Let me explain with a concrete example. I was creating a custom class loader and needed to override the load() method on MyClassLoader, which throws ClassNotFoundException under some circumstances. I use the ASM library, which allows me to inspect Java bytecode. ASM works as a visitor pattern: I write visitors, and as ASM parses the bytecode it calls specific methods on my visitor implementation. One of my visitors, as it is examining the bytecode, decides that things are not right and needs to throw the ClassNotFoundException which the class loader contract says it should throw. But now we have a problem. What we have on the stack is MyClassLoader -> ASMLibrary -> MyVisitor. MyVisitor wants to throw an exception which MyClassLoader expects, but it cannot, since ClassNotFoundException is checked and ASMLibrary does not declare it (nor should it). So I have to throw RuntimeClassNotFoundException from MyVisitor, which can pass through ASMLibrary and which MyClassLoader can then catch and rethrow as ClassNotFoundException.

Checked exceptions get in the way of functional programming.
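
The tunneling trick described above can be sketched roughly as follows. This is only an illustration: BytecodeVisitor and BytecodeParser are hypothetical stand-ins for the ASM library (they are not ASM's real API), and MyClassLoaderSketch is a plain class rather than a real ClassLoader subclass.

```java
// Hypothetical stand-in for the third-party library (ASM in the post). It knows
// nothing about ClassNotFoundException and declares no checked exceptions.
interface BytecodeVisitor {
    void visit(String instruction);
}

final class BytecodeParser {
    static void parse(String[] instructions, BytecodeVisitor visitor) {
        for (String instruction : instructions) {
            visitor.visit(instruction);
        }
    }
}

// Unchecked carrier used to tunnel the failure through the library.
final class RuntimeClassNotFoundException extends RuntimeException {
    RuntimeClassNotFoundException(String message) {
        super(message);
    }
}

final class MyClassLoaderSketch {
    // Mirrors the class loader contract: callers see the checked exception.
    static void load(final String name, String[] instructions) throws ClassNotFoundException {
        try {
            BytecodeParser.parse(instructions, new BytecodeVisitor() {
                @Override
                public void visit(String instruction) {
                    if (instruction.startsWith("BAD")) {
                        // ClassNotFoundException cannot be thrown here:
                        // visit() does not declare it.
                        throw new RuntimeClassNotFoundException(name + ": " + instruction);
                    }
                }
            });
        } catch (RuntimeClassNotFoundException e) {
            // Back on our side of the library: restore the checked exception.
            throw new ClassNotFoundException(e.getMessage(), e);
        }
    }
}

class TunnelDemo {
    public static void main(String[] args) {
        try {
            MyClassLoaderSketch.load("Foo", new String[] {"LOAD", "BAD opcode"});
        } catch (ClassNotFoundException e) {
            System.out.println("caught: " + e.getMessage()); // caught: Foo: BAD opcode
        }
    }
}
```

Note how much ceremony exists only to carry one exception across a layer that should not care about it.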

Lost Fidelity

Suppose the java.sql package were implemented with useful exceptions such as SqlDuplicateKeyException, SqlForeignKeyViolationException, and so on (we can wish), and suppose these exceptions are checked (as SQLException is). We say that the SQL package has high exception fidelity, since each exception corresponds to a very specific problem. Now let's say we have the same setup as before, where there is some other layer between us and the SQL package. That layer can either redeclare all of the exceptions or, more likely, throw its own. Let's look at an example. Hibernate is an object-relational mapper, which means it converts your SQL rows into Java objects. So on the stack you have MyApplication -> Hibernate -> SQL. Here Hibernate is trying hard to hide the fact that you are talking to SQL, so it throws HibernateExceptions instead of SQLExceptions. And here lies the problem. Your code knows that there is SQL under Hibernate, so it could have handled SqlDuplicateKeyException in some useful way, such as showing an error to the user, but Hibernate was forced to catch the exception and rethrow it as a generic HibernateException. We have gone from a high-fidelity SqlDuplicateKeyException to a low-fidelity HibernateException. And so MyApplication cannot do anything. Now, Hibernate could have thrown HibernateDuplicateKeyException, but that means that Hibernate now has the same exception hierarchy as SQL, and we are duplicating effort and repeating ourselves.

Rethrowing checked exceptions causes you to lose fidelity and hence makes it less likely that you could do something useful with the exception later on.
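
Here is a small sketch of what that loss looks like in code. Every name in it is hypothetical: DuplicateKeyException plays the role of the wished-for SqlDuplicateKeyException, and MappingLayer plays the role of Hibernate. It is not how Hibernate or java.sql actually behave.

```java
// Hypothetical high-fidelity checked exception from the storage layer.
class DuplicateKeyException extends Exception {
    DuplicateKeyException(String key) {
        super("duplicate key: " + key);
    }
}

// Hypothetical generic exception of the middle layer.
class MappingLayerException extends RuntimeException {
    MappingLayerException(Throwable cause) {
        super(cause);
    }
}

final class Database {
    static void insert(String key) throws DuplicateKeyException {
        if (key.equals("alice")) { // pretend "alice" is already stored
            throw new DuplicateKeyException(key);
        }
    }
}

final class MappingLayer {
    // Hides the storage layer, so the specific type vanishes from the signature.
    static void save(String key) {
        try {
            Database.insert(key);
        } catch (DuplicateKeyException e) {
            throw new MappingLayerException(e);
        }
    }
}

final class Application {
    static String register(String user) {
        try {
            MappingLayer.save(user);
            return "ok";
        } catch (MappingLayerException e) {
            // Only the generic type is visible here. Getting the detail back
            // means digging through the cause chain by hand.
            return e.getCause() instanceof DuplicateKeyException
                    ? "name already taken"
                    : "unknown error";
        }
    }
}

class FidelityDemo {
    public static void main(String[] args) {
        System.out.println(Application.register("alice")); // name already taken
        System.out.println(Application.register("bob"));   // ok
    }
}
```

The application can still recover the detail here only because the sketch keeps the cause attached and the application knows which layer sits underneath; nothing in the signature of save() tells it so.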

You can't do Anything Anyway

In most cases, when an exception is thrown there is no recovery. We show a generic error to the user and log the exception so that we can file a bug and make sure that the exception will not happen again. Since 90+% of exceptions are bugs in our code and all we do is log them, why are we forced to rethrow them over and over again?

It is rare that anything useful can be done when a checked exception happens; in most cases we die with an error! Therefore, I want that to be the default behavior of my code, with no additional typing.

How I deal with the code

Here is my strategy for dealing with checked exceptions in Java:

  • Always catch all checked exceptions at source and rethrow them as LogRuntimeException.

    • LogRuntimeException is my unchecked runtime exception which says "I don't care, just log it."

    • Here I have lost Exception fidelity.



  • None of my methods declare any exceptions.

  • As I discover that I need to deal with a specific exception I go back to the source where LogRuntimeException was thrown and I change it to <Specific>RuntimeException (This is rarer than you think)

    • I am restoring the exception fidelity only where needed.



  • Net effect is that when you come across a try-catch clause you better pay attention as interesting things are happening there.

    • Very few try-catch clauses, so the code is much easier to read.

    • Very close to 100% test coverage as there is no dead code in my catch blocks.
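
A minimal sketch of the strategy above. The post names LogRuntimeException but does not show it, so the class body here is an assumption, and TextFiles.read is a hypothetical helper chosen only because reading a file is a familiar source of IOException.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// "I don't care, just log it": unchecked, so nobody has to declare it.
class LogRuntimeException extends RuntimeException {
    LogRuntimeException(Throwable cause) {
        super(cause);
    }
}

final class TextFiles {
    // No throws clause: the checked exception is converted at its source.
    static String read(String path) {
        try {
            return new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new LogRuntimeException(e);
        }
    }
}

class StrategyDemo {
    public static void main(String[] args) {
        try {
            System.out.println(TextFiles.read("no-such-file-for-this-sketch.txt"));
        } catch (LogRuntimeException e) {
            // The single top-level place that logs.
            System.err.println("logged: " + e.getCause());
        }
    }
}
```

If a caller later turns out to need the missing-file case specifically, the throw site inside read() is the one place that changes to a more specific runtime exception; every method signature in between stays as it is.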




The Plague of Entropy

By James Whittaker


Mathematically, entropy is a measure of uncertainty. If there are, say, five events, then maximum entropy occurs when those five events are equally likely, and minimum entropy when one of those events is certain and the other four are impossible.

The more uncertain events you have to consider, the higher measured entropy climbs. People often think of entropy as a measure of randomness: the more (uncertain) events one must consider, the more random the outcome becomes.
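
To put numbers on the five-event example, here is a small sketch. It assumes that "entropy" here means Shannon entropy measured in bits, which the post does not state explicitly; the class and method names are made up for the illustration.

```java
// Shannon entropy in bits: H = -sum(p_i * log2(p_i)), with 0 * log2(0) taken as 0.
final class Entropy {
    static double bits(double... probabilities) {
        double h = 0.0;
        for (double p : probabilities) {
            if (p > 0.0) {
                h -= p * (Math.log(p) / Math.log(2.0));
            }
        }
        return h;
    }
}

class EntropyDemo {
    public static void main(String[] args) {
        // Five equally likely events: maximum entropy, log2(5), about 2.32 bits.
        System.out.println(Entropy.bits(0.2, 0.2, 0.2, 0.2, 0.2));
        // One certain event, four impossible ones: minimum entropy, 0 bits.
        System.out.println(Entropy.bits(1.0, 0.0, 0.0, 0.0, 0.0));
    }
}
```

Adding more equally likely events raises the maximum further (log2 of the number of events), which is the sense in which more uncertain events mean more entropy.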

Testers introduce entropy into development by adding to the number of things a developer has to do. When developers are writing code, entropy is low. When we submit bugs, we increase entropy. Bugs divert their attention from coding. They must now progress in parallel on creating and fixing features. More bugs means more parallel tasks and raises entropy. This entropy is one reason that bugs foster more bugs ... the entropic principle ensures it. Entropy creates more entropy! Finally there is math to show what is intuitively appealing: that prevention beats a cure.

However, there is nothing we can do to completely prevent the plague of entropy other than create developers who never err. Since this is unlikely any time soon, we must recognize how and when we are introducing entropy and do what we can to manage it. The more we can do during development the better. Helping out in code reviews and educating our developers about test plans, user scenarios and execution environments, so they can code against them, will reduce the number of bugs we have to report. Smoking out bugs early, submitting them in batches and making sure we submit only high-quality bugs by triaging them ourselves will keep developers' minds on development. Writing good bug reports and quickly regressing fixes will keep their attention where it needs to be. In effect, it maximizes the certainty of the 'development event' and minimizes the number and impact of bugs. Entropy thus tends toward its minimum.

We can't banish this plague but if we can recognize the introduction of entropy into development and understand its inevitable effect on code quality, we can keep it at bay.


It is not about writing tests, it's about writing stories


I would like to make an analogy between building software and building a car. I know it is an imperfect one, as one is about design and the other is about manufacturing, but indulge me; the lessons are very similar.

A piece of software is like a car. Let's say you would like to test a car which you are in the process of designing. Would you test it by driving it around and making modifications to it, or would you prove your design by testing each component separately? I think that testing all of the corner cases by driving the car around is very difficult. Yes, if the car drives you know that a lot of things must work (engine, transmission, electronics, etc.), but if it does not work you have no idea where to look. Moreover, there are some things which you will have a very hard time reproducing in this end-to-end test. For example, it will be very hard to see whether the car can start in the extreme cold of the North Pole, or whether the engine will overheat going full throttle up a sand dune in the Sahara. I propose we take the engine out and simulate the load on it in a laboratory.

We call driving the car around an end-to-end test and testing the engine in isolation a unit test. With unit tests it is much easier to simulate failures and corner cases in a much more controlled environment. We need both kinds of tests, but I feel that most developers can only imagine the end-to-end tests.

But let's see how we could use tests to design a transmission. First, a little terminology change: let's not call them tests, but stories. They are stories because that is what they tell you about your design. My first story is that:

  • the transmission should allow the output shaft to be locked, move in the same direction (D) as the input shaft, move in the opposite direction (R), or move independently (N)


Given such a story, I could easily create a test which would prove that the story is true for any design submitted to me. What I would most likely get is a transmission with only a single gear in each direction. So let's write another story:

  • the transmission should allow the ratio between input and output shaft to be [-1, 0, 1, 2, 3, 4]


Again, I can write a test for such a transmission, but I have not specified how the forward gear should be chosen, so such a transmission would most likely be permanently stuck in 1st gear, which would limit my speed and over-rev the engine.

  • the transmission should start in 1st and then switch to a higher gear before the engine reaches maximum revolutions.


This is better, but my transmission would most likely rev the engine to maximum before switching, and once it had switched to a higher gear and I slowed down, it would not down-shift.

  • the transmission should down-shift whenever the engine speed falls below 1000 RPM


OK, now it is starting to drive like a car, but the limits for shifting are still really 1000-6000 RPM, which is not a very fuel-efficient way to drive your car.

  • the transmission should up-shift whenever the estimated fuel consumption at a higher gear ratio is better than at the current one.


So now our engine will no longer rev so high, but it will be a lazy car, since once the transmission is in the fuel-efficient mode it will not want to down-shift.

  • the transmission should down-shift whenever the gas pedal is depressed more than 50% and the RPM is lower than the engine's peak output RPM.


I am not a transmission designer, but I think this is a decent start.

Notice how I focused on the end result of the transmission rather than on testing its specific internals. The transmission designer would have a lot of leeway in choosing how it works internally. Once we have something and have tested it in the real world, we can augment this list of stories with additional stories as we discover additional properties which we would like the transmission to possess.

If we decided to change the internal design of the transmission for whatever reason, we would have these stories as guides to make sure that we did not forget about anything. The stories represent assumptions which need to be true at all times. Over the lifetime of the component we can collect hundreds of stories, which represent an equal number of assumptions built into the system.

Now imagine that a new designer comes on board and makes a design change which he believes will improve the responsiveness of the transmission. He can do so because the existing stories are not restrictive in how, only in what the outcome should be. The stories save the designer from breaking an existing assumption which was already designed into the transmission.

Now let's contrast this with how we would test the transmission if it were already built.
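
Two of the stories above might look like this when written down as code. This is only a sketch under assumptions: the Transmission interface, the Mode enum and SimpleTransmission are all invented for the illustration, and the 6000 RPM up-shift point is taken loosely from the 1000-6000 RPM range mentioned earlier.

```java
// Hypothetical API: none of these names come from a real library.
enum Mode { PARK, REVERSE, NEUTRAL, DRIVE }

interface Transmission {
    void select(Mode mode);
    void update(int engineRpm); // called as the engine speed changes
    int gear();                 // -1 reverse, 0 park/neutral, 1..4 forward
}

// A deliberately naive design, just enough to satisfy the two stories below.
final class SimpleTransmission implements Transmission {
    private static final int MAX_GEAR = 4;
    private static final int UPSHIFT_RPM = 6000;   // assumed, below maximum revs
    private static final int DOWNSHIFT_RPM = 1000; // threshold from the story
    private int gear = 0;

    public void select(Mode mode) {
        gear = mode == Mode.DRIVE ? 1 : mode == Mode.REVERSE ? -1 : 0;
    }

    public void update(int engineRpm) {
        if (gear < 1) {
            return; // only forward gears shift automatically
        }
        if (engineRpm >= UPSHIFT_RPM && gear < MAX_GEAR) {
            gear++;
        } else if (engineRpm < DOWNSHIFT_RPM && gear > 1) {
            gear--;
        }
    }

    public int gear() {
        return gear;
    }
}

// Each story talks only to the interface, so it holds for any design.
final class TransmissionStories {
    static boolean startsInFirstAndUpshiftsBeforeMaxRevs(Transmission t) {
        t.select(Mode.DRIVE);
        boolean startedInFirst = t.gear() == 1;
        t.update(6000);
        return startedInFirst && t.gear() == 2;
    }

    static boolean downshiftsWhenRpmFallsBelow1000(Transmission t) {
        t.select(Mode.DRIVE);
        t.update(6000); // now in 2nd
        t.update(900);
        return t.gear() == 1;
    }
}

class StoriesDemo {
    public static void main(String[] args) {
        System.out.println(TransmissionStories
                .startsInFirstAndUpshiftsBeforeMaxRevs(new SimpleTransmission())); // true
        System.out.println(TransmissionStories
                .downshiftsWhenRpmFallsBelow1000(new SimpleTransmission()));       // true
    }
}
```

Because the stories never look inside SimpleTransmission, a completely different implementation could be swapped in and judged by the same stories.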

  • test to make sure all of the gears work

  • test to make sure that the engine is not allowed to over-rev


It is hard now to think about what other tests to write, since we are not using the tests to drive the design. Now, let's say that someone insists that we get 100% coverage. We open the transmission up and see all kinds of logic and rules, and we don't know why they are there, since we were not part of the design, so we write a test:

  • at 3000 RPM input shaft, apply 100% throttle and assert that the transmission goes to 2nd gear.


Tests like that are not very useful when you want to change the design, since you are likely to break the test. Without fully understanding why the test was testing those specific conditions, it is hard to know whether anything was broken when the test is red. That is because the test does not tell a story any more; it only asserts the current design. It is likely that such a test will be in the way when you try to make design changes. The point I am trying to make is that there is a huge difference between writing tests before or after. When we write tests before, we are:

  • creating a story which is forcing a particular design decision.

  • building a collection of assumptions which need to be true at all times.


When we write tests after the fact, we:

  • miss a lot of the reasons why things are done in a particular way, even if we have 100% coverage

  • write tests that are often brittle because they are tied to particulars of the current implementation

  • get tests that are just snapshots and don't tell a story of why the component does something, only that it does.


For this reason there are huge differences in quality when writing assumptions as stories before (which force design to emerge) or writing tests after which take a snapshot of a given design.


The 7th Plague and Beyond

By James Whittaker


Sorry I haven't followed up on this, let the excuse parade begin: A) My new book just came out and I have spent a lot of time corresponding with readers. B) I have taken on leadership of some new projects including the testing of Chrome and Chrome OS (yes you will hear more about these projects right here in the future). C) I've gotten just short of 100 emails suggesting the 7th plague and that takes time to sort through.

This is clearly one plague-ridden industry (and, no, I am not talking about my book!)

I've thrown out many of them that deal with a specific organization or person who just doesn't take testing seriously enough. Things like the Plague of Apathy (suggested exactly 17 times!) just don't fit. This isn't an industry plague, it's a personal/group plague. If you don't care about quality, please do us all a favor and get out of the software business. Go screw someone else's industry up; we have enough organic problems of our own to deal with. I also didn't put down the Plague of the Deluded Developer (suggested by various names 22 times) because it dealt with developers that, as a Googler, I no longer have to deal with ... those who think they never write bugs. Our developers know better, and if I find out exactly where they purchased that clue I will forward the link.

Here are some of the best. As many of them have multiple suggesters, I have credited the persons who were either first or gave the most thoughtful analysis. Feel free, if you are one of these people, to give further details or clarifications in the comments of this post, as I am sure these summaries do not do them justice.

The Plague of Metrics (Nicole Klein, Curtis Pettit plus 18 others): Metrics change behavior and once a tester knows how the measurement works, they test to make themselves look good or say what they want it to say ignoring other more important factors. The metric becomes the goal instead of measuring progress. The distaste for metrics in many of these emails was palpable!

The Plague of Semantics (Chris LeMesurier plus 3 others): We misuse and overuse terms and people like to assign their own meaning to certain terms. It means that designs and specs are often misunderstood or misinterpreted. This was also called the plague of assumptions by other contributors.

The Plague of Infinity (Jarod Salamond, Radislav Vasilev and 14 others): The testing problem is so huge it's overwhelming. We spend so much time trying to justify our coverage and explain what we are and are not testing that it takes away from our focus on testing. Every time we take a look at the testing problem we see new risks and new things that need our attention. It randomizes us and stalls our progress. This was also called the plague of endlessness and exhaustion.

The Plague of Miscommunication (Scott White and 2 others): The language of creation (development) and the language of destruction (testing) are different. Testers write a bug report and the devs don't understand it and cycles have to be spent explaining and reexplaining. A related plague is the lack of communication that causes testers to redo work and tread over the same paths as unit tests, integration tests and even the tests that other testers on the team are performing. This was also called the plague of language (meaning lack of a common one).

The Plague of Rigidness (Roussi Roussev, Steven Woody, Michele Smith and 5 others): Sticking to the plan/process/procedure no matter what. Test strategy cannot be bottled in such a manner yet process heavy teams often ignore creativity for the sake of process. We stick with the same stale testing ideas product after product, release after release. This was also called the plague of complacency. Roussi suggested a novel twist calling this the success plague where complacency is brought about through success of the product. How can we be wrong when our software was so successful in the market?

And I have my own 7th Plague that I'll save for the next post. Unless anyone would like to write it for me? It's called the Plague of Entropy. A free book to the person who nails it.





