Friday, October 26, 2012

Why Are There So Many C++ Testing Frameworks?

By Zhanyong Wan - Software Engineer

These days, it seems that everyone is rolling their own C++ testing framework, if they haven't done so already. Wikipedia has a partial list of such frameworks. This is interesting because many OOP languages have only one or two major frameworks. For example, most Java people seem happy with either JUnit or TestNG. Are C++ programmers the do-it-yourself kind?

When we started working on Google Test (Google’s C++ testing framework), and especially after we open-sourced it, people began asking us why we were doing it. The short answer is that we couldn’t find an existing C++ testing framework that satisfied all our needs. This doesn't mean that these frameworks were all poorly designed or implemented. Rather, many of them had great ideas and tricks that we learned from. However, Google had a huge number of C++ projects that got compiled on various operating systems (Linux, Windows, Mac OS X, and later Android, among others) with different compilers and all kinds of compiler flags, and we needed a framework that worked well in all these environments and could handle many different types and sizes of projects.

Unlike Java, which has the famous slogan "Write once, run anywhere," C++ code is written in a much more diverse set of environments. Due to the complexity of the language and the need to perform low-level tasks, compatibility between different C++ compilers, and even between different versions of the same compiler, is poor. There is a C++ standard, but it's not well supported by compiler vendors. For many tasks you have to rely on unportable extensions or platform-specific functionality. This makes it hard to write a reasonably complex system that can be built with many different compilers and works on many platforms.

To make things more complicated, most C++ compilers allow you to turn off some standard language features in return for better performance. Don't like using exceptions? You can turn them off. Think dynamic_cast is bad? You can disable Run-Time Type Identification, the feature behind dynamic_cast and run-time access to type information. If you do any of these, however, code using those features will fail to compile. Many testing frameworks rely on exceptions, so they are automatically out of the question for us, since we turn off exceptions in many projects. (In case you are curious, Google Test doesn’t require exceptions or run-time type identification; when these language features are turned on, Google Test will try to take advantage of them and provide you with more utilities, like the exception assertions.)
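As a concrete illustration, with GCC or Clang (other compilers use different switches) disabling these language features takes just a compiler flag each:

```shell
# Build a translation unit with exceptions and RTTI disabled.
# Any code that throws, catches, or uses dynamic_cast / typeid
# will now fail to compile in this translation unit.
g++ -fno-exceptions -fno-rtti -c my_code.cc -o my_code.o
```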

Why not just write a portable framework, then? Indeed, that's a top design goal for Google Test, and authors of some other frameworks have tried this too. However, it comes with a cost. Cross-platform C++ development requires much more effort: you need to test your code with different operating systems, different compilers, different versions of them, and different compiler flags (combine these factors and the task soon gets daunting); some platforms may not let you do certain things, and you have to find a workaround and guard the code with conditional compilation; different versions of compilers have different bugs, and you may have to revise your code to bypass them all; etc. In the end, it's hard unless you are happy with a bare-bones system.

So, I think a major reason that we have many C++ testing frameworks is that C++ is different in different environments, making it hard to write portable C++ code. John's framework may not suit Bill's environment, even if it solves John's problems perfectly.

Another reason is that some limitations of C++ make it impossible to implement certain features really well, and different people have chosen different ways to work around those limitations. One notable example is that C++ is a statically-typed language and doesn't support reflection. Most Java testing frameworks use reflection to automatically discover the tests you've written, so that you don't have to register them one by one. This is a good thing, as manually registering tests is tedious and you can easily write a test and forget to register it. Since C++ has no reflection, we have to do it differently. Unfortunately there is no single best option. Some frameworks require you to register tests by hand, some use scripts to parse your source code to discover tests, and some use macros to automate the registration. We prefer the last approach and think it works for most people, but some disagree. Also, there are different ways to devise the macros and they involve different trade-offs, so the result is not clear cut.

Let’s see some actual code to understand how Google Test solves the test registration problem. The simplest way to add a test is to use the TEST macro (what else would we name it?):

TEST(Subject, HasCertainProperty) {
  … testing code goes here …
}

This defines a test method whose purpose is to verify that the given subject has the given property. The macro automatically registers the test with Google Test such that it will be run when the test program (which may contain many such TEST definitions) is executed.

Here’s a more concrete example that verifies a Factorial() function works as expected for positive arguments:

TEST(FactorialTest, HandlesPositiveInput) {
  EXPECT_EQ(1, Factorial(1));
  EXPECT_EQ(2, Factorial(2));
  EXPECT_EQ(6, Factorial(3));
  EXPECT_EQ(40320, Factorial(8));
}

Finally, many C++ testing framework authors neglected extensibility and were happy just providing canned solutions, so we ended up with many frameworks, each satisfying a different niche but none general enough. A versatile framework must have a good extension story. Let's face it: you cannot be all things to all people, no matter what. Instead of bloating the framework with rarely used features, we should provide good out-of-the-box solutions for maybe 95% of the use cases, and leave the rest to extensions. If I can easily extend a framework to solve a particular problem of mine, I will feel less motivated to write my own thing. Unfortunately, many framework authors don't seem to see the importance of extensibility. I think that mindset contributed to the plethora of frameworks we see today. In Google Test, we try to make it easy to expand your testing vocabulary by defining custom assertions that generate informative error messages. For instance, here’s a naive way to verify that an int value is in a given range:

bool IsInRange(int value, int low, int high) {
  return low <= value && value <= high;
}
  ...
  EXPECT_TRUE(IsInRange(SomeFunction(), low, high));

The problem is that when the assertion fails, you only know that the value returned by SomeFunction() is not in range [low, high], but you have no idea what that return value and the range actually are -- this makes debugging the test failure harder.

You could provide a custom message to make the failure more descriptive:

  EXPECT_TRUE(IsInRange(SomeFunction(), low, high))
      << "SomeFunction() = " << SomeFunction() 
      << ", not in range ["
      << low << ", " << high << "]";

Except that this is incorrect as SomeFunction() may return a different answer each time.  You can fix that by introducing an intermediate variable to hold the function’s result:

  int result = SomeFunction();
  EXPECT_TRUE(IsInRange(result, low, high))
      << "result (return value of SomeFunction()) = " << result
      << ", not in range [" << low << ", " << high << "]";

However this is tedious and obscures what you are really trying to do.  It’s not a good pattern when you need to do the “is in range” check repeatedly. What we need here is a way to abstract this pattern into a reusable construct.

Google Test lets you define a test predicate like this:

AssertionResult IsInRange(int value, int low, int high) {
  if (value < low)
    return AssertionFailure()
        << value << " < lower bound " << low;
  else if (value > high)
    return AssertionFailure()
        << value << " > upper bound " << high;
  else
    return AssertionSuccess()
        << value << " is in range [" 
        << low << ", " << high << "]";
}

Then the statement EXPECT_TRUE(IsInRange(SomeFunction(), low, high)) may print (assuming that SomeFunction() returns 13):

   Value of: IsInRange(SomeFunction(), low, high)
     Actual: false (13 < lower bound 20)
   Expected: true

The same IsInRange() definition also lets you use it in an EXPECT_FALSE context, e.g. EXPECT_FALSE(IsInRange(AnotherFunction(), low, high)) could print:

   Value of: IsInRange(AnotherFunction(), low, high)
     Actual: true (25 is in range [20, 60])
   Expected: false

This way, you can build a library of test predicates for your problem domain, and benefit from clear, declarative test code and descriptive failure messages.

In the same vein, Google Mock (our C++ mocking framework) allows you to easily define matchers that can be used exactly the same way as built-in matchers. Also, we have included an event listener API in Google Test for people to write plug-ins. We hope that people will use these features to extend Google Test/Mock for their own needs and contribute back extensions that might be generally useful.

Perhaps one day we will solve the C++ testing framework fragmentation problem, after all. :-)
