TotT: Avoiding friend Twister in C++

(Resuming our Testing on the Toilet posts...)

In a previous episode, we extracted methods to simplify testing in Python. But if these extracted methods make the most sense as private class members, how can you write your production code so it doesn't depend on your test code? In Python this is easy, but in C++, testing private members requires more friend contortions than a game of Twister®.


// my_package/dashboard.h
class Dashboard {
 private:
  scoped_ptr<Database> database_;  // instantiated in constructor

  // Declaration of functions GetResults(), GetResultsFromCache(),
  // GetResultsFromDatabase(), CountPassFail()

  friend class DashboardTest;  // one friend declaration per test fixture
};

You can apply the Extract Class and Extract Interface refactorings to create a new helper class containing the implementation. Forward-declare the new interface in the .h file of the original class, and have the original class hold a pointer to the interface. (This is similar to the Pimpl idiom.) You can separate the public API from the implementation details by putting their headers in different subdirectories (my_package/public/ and my_package/ in this example):


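// my_package/public/dashboard.h
class ResultsLog;  // extracted helper interface (forward declaration only)

class Dashboard {
 public:
  // Takes ownership of |results|.
  explicit Dashboard(ResultsLog* results) : results_(results) { }
  // Defined in dashboard.cc, where ResultsLog is a complete type;
  // scoped_ptr cannot delete an incomplete type.
  ~Dashboard();

 private:
  scoped_ptr<ResultsLog> results_;
};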

// my_package/results_log.h
class ResultsLog {
 public:
  virtual ~ResultsLog() { }  // implementations are deleted via ResultsLog*

  // Pure virtual declarations of GetResults(), GetResultsFromCache(),
  // GetResultsFromDatabase(), CountPassFail()
};

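// my_package/live_results_log.h
class LiveResultsLog : public ResultsLog {
 public:
  // Takes ownership of |database|, as Dashboard did before the refactoring.
  explicit LiveResultsLog(Database* database)
      : database_(database) { }
  // Overrides of the ResultsLog functions, implemented using database_
 private:
  scoped_ptr<Database> database_;
};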

Now you can test LiveResultsLog without resorting to friend declarations. This also enables you to inject a MockResultsLog instance when testing the Dashboard class. The functionality is still private to the original class, and the use of a helper class results in smaller classes with better-defined responsibilities.
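A minimal sketch of what the injection buys you in a Dashboard test. It assumes a hypothetical CountPassFail() that returns a hypothetical PassFail struct, and a hypothetical Dashboard::Summary() method; the post names the functions but not their signatures. It uses std::unique_ptr in place of scoped_ptr so the snippet compiles on its own, and a hand-written FakeResultsLog in place of a generated MockResultsLog:

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical result type; the real CountPassFail() signature is not
// given in the post.
struct PassFail {
  int passed;
  int failed;
};

// Extracted helper interface (trimmed to one function for the sketch).
class ResultsLog {
 public:
  virtual ~ResultsLog() = default;
  virtual PassFail CountPassFail() = 0;
};

class Dashboard {
 public:
  // Takes ownership of |results|, as in the refactored header.
  explicit Dashboard(ResultsLog* results) : results_(results) {}

  // Hypothetical public method that relies on the helper.
  std::string Summary() {
    PassFail counts = results_->CountPassFail();
    return std::to_string(counts.passed) + " passed, " +
           std::to_string(counts.failed) + " failed";
  }

 private:
  std::unique_ptr<ResultsLog> results_;
};

// Test double: returns canned counts instead of querying a database,
// so Dashboard can be tested without any friend declarations.
class FakeResultsLog : public ResultsLog {
 public:
  FakeResultsLog(int passed, int failed) : counts_{passed, failed} {}
  PassFail CountPassFail() override { return counts_; }

 private:
  PassFail counts_;
};
```

In a test, `Dashboard dashboard(new FakeResultsLog(7, 2));` yields a Summary() of "7 passed, 2 failed" without touching a database.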

Remember to download this episode of Testing on the Toilet and post it in your office.

Follow up: Call for posts...

Looks like there is strong support for having the community post on this blog. Great! Please send submissions to testengteam at gmail.com, and prefix the subject line with "blogme:" to aid spam filtering. See the 3 basic rules in the previous post. Looking forward to opening up to your great ideas.

Call for posts...

Posted by Patrick Copeland, Test Engineering Director

I’d like to offer the readers a chance to post their ideas on this blog.

To keep this simple I only have 3 rules:

  1. no commercial postings or links to commercial sites,
  2. posts need to be interesting and practical ideas relating to testing, and
  3. posts need to be proofread and reviewed on your end before submitting.

I’ll read each potential post and decide within 1-2 weeks whether to publish it. Remember that this is a popular blog, the content is public, and your name will be on the posts. Make sure you don’t reveal proprietary information. If you are interested, please post a comment to this entry. If enough people are interested, I’ll post later with an email address for the submissions.

Automating tests vs. test-automation

Posted by Markus Clermont, Test Engineering Manager, Zurich

In the last couple of years the practice of testing has undergone more than superficial changes. We have turned our art into engineering, introduced process models, come up with best practices, and developed tools to support our daily work and make each test engineer more productive. Some tools target test execution. They aim to automate the repetitive steps that a tester would take to exercise functions through the user interface of a system in order to verify its functionality. I am sure you have all seen tools like Selenium, WebDriver, Eggplant or other proprietary solutions, and that you have learned to love them.

On the downside, we observe problems when we employ these tools:

  • Scripting your manual tests this way takes far longer than just executing them manually.
  • The UI is one of the least stable interfaces of any system, so we can only start automating quite late in the development phase.
  • Maintenance of the tests takes a significant amount of time.
  • Execution is slow, and sometimes cumbersome.
  • Tests become flaky.
  • Tests break for the wrong reasons.
Of course, we can argue that none of these problems is particularly bad, and the advantages of automation still outweigh the cost. This might well be true. We learned to accept some of these problems as 'the price of automation', whereas others are met by some common-sense workarounds:
  • It takes long to automate a test: well, let's automate only tests that are important and will be executed again and again in regression testing.
  • Execution might be slow, but it is still faster than manual testing.
  • Tests cannot break for the wrong reason: when they break, we have found a bug.
In the rest of this post I'd like to summarize some experiences I had when I tried to overcome these problems, not by working around them, but by eliminating their causes.

Most of these problems are rooted in the fact that we are just automating manual tests. By doing so we are not taking into account whether the added computational power, access to different interfaces, and faster execution speed should make us change the way we test systems.

Considering that a system exposes different interfaces to its environment (e.g., the user interface, an interface between front-end and back-end, an interface to a data store, and interfaces to other systems), it is obvious that we need to look at each and every interface and test it. Moreover, we should not only take each interface into account but also avoid testing the same functionality in too many different places.

Let me introduce the example of a store-administration system which allows you to add items to the store, see the current inventory, and remove items. One straightforward manual test case for adding an item would be to go to the 'Add' dialogue, enter a new item with quantity 1, and then go to the 'Display' dialogue to check that it is there. To automate this test case, you would instrument exactly these steps through the user interface.

Probably most of the problems I listed above will apply. One way to avoid them in the first place would have been to figure out what this system looks like inside.
  • Is there a database? If so, the verification should probably not be performed against the UI but against the database.
  • Do we need to interface with a supplier? If so, how should this interaction look?
  • Is the same functionality available via an API? If so, it should be tested through the API, and the UI tests should only check that the UI interacts with the API correctly.
This will probably yield a higher number of tests, some of them being much 'smaller' in their resource requirements and executing far faster than the full end-to-end tests. Applying these simple questions will allow us to:
  • write many more tests through the API, e.g., to cover many boundary conditions,
  • execute multiple threads of tests on the same machine, giving us a chance to spot race-conditions,
  • start earlier with testing the system, as we can test each interface when it becomes 'quasi-stable',
  • make maintenance of tests and debugging easier, as the tests break closer to the source of the problem,
  • require fewer machine resources, and still execute in reasonable time.
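To illustrate the first point, here is a minimal sketch of table-driven boundary tests run against an API instead of the 'Add' and 'Display' dialogues. The InventoryStore class, its AddItem()/Quantity() methods and its validation rules are hypothetical stand-ins for the store-administration system; the post does not describe its real interface.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical API of the store-administration example.
class InventoryStore {
 public:
  // Adds |quantity| units of |item|; rejects empty names and
  // non-positive quantities (assumed rules).
  bool AddItem(const std::string& item, int quantity) {
    if (item.empty() || quantity <= 0) return false;
    stock_[item] += quantity;
    return true;
  }

  // Returns the current quantity of |item| (0 if unknown).
  int Quantity(const std::string& item) const {
    auto it = stock_.find(item);
    return it == stock_.end() ? 0 : it->second;
  }

 private:
  std::map<std::string, int> stock_;
};

// One boundary case: the input and the expected observable outcome.
struct AddCase {
  std::string item;
  int quantity;
  bool expect_accepted;
  int expect_quantity;  // stock after the call
};

// Runs each case against a fresh store and verifies the result through
// the API. Returns the number of failing cases.
int CountFailingCases(const std::vector<AddCase>& cases) {
  int failures = 0;
  for (const AddCase& c : cases) {
    InventoryStore store;
    bool accepted = store.AddItem(c.item, c.quantity);
    if (accepted != c.expect_accepted ||
        store.Quantity(c.item) != c.expect_quantity) {
      ++failures;
    }
  }
  return failures;
}
```

Each case runs in microseconds, so adding dozens of boundary cases costs far less than one extra end-to-end UI test.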
I am not advocating the total absence of UI tests here. The user interface is just another interface, and so it deserves attention too. However, I do think that we currently focus most of our testing efforts on the UI. The common attitude, that the UI deserves the most attention because it is what the user sees, is flawed. Even a perfect UI will not satisfy a user if the underlying functionality is corrupt.

Neither should we abandon our end-to-end tests. They are valuable, and no system can be considered tested without them. Again, the question we need to ask ourselves is what the right ratio is between full end-to-end tests and smaller integration tests.

Unfortunately, there is no free lunch. In order to change the style of test-automation we will also need to change our approach to testing. Successful test-automation needs to:
  • start early in the development cycle,
  • take the internal structure of the system into account,
  • have a feedback loop to developers to influence the system-design.
Some of these points require quite a change in the way we approach testing. They are only achievable if we work as a single team with our developers. It is crucial that information flows absolutely freely between the different roles in this team.

In previous projects we were able to achieve this by
  • removing any spatial separation between the test engineers and the development engineers. Sitting on the next desk is probably the best way to promote information exchange,
  • using the same tools and methods as the developers,
  • getting involved in daily stand-ups and design discussions.
This helps not only in getting involved really early (there are projects where test development starts at the same time as development), but it is also a great way to give continuous feedback. Some of the items in the list call for very development-oriented test engineers, as it is easier for them to be recognized as peers by the development teams.

To summarize, I have found that a successful automation project needs:
  • to take the internal details and exposed interface of the system under test into account,
  • to have many fast tests for each interface (including the UI),
  • to verify the functionality at the lowest possible level,
  • to have a set of end-to-end tests,
  • to start at the same time as development,
  • to overcome traditional boundaries between development and testing (spatial, organizational and process boundaries), and
  • to use the same tools as the development team.

Overview of Infrastructure Testing

Posted by Marc Kaplan, Test Engineering Lead

At Google, we have infrastructure that is shared between many projects. This creates a situation where we have many dependencies, not only in terms of build requirements but also in terms of test requirements. We've found that we actually need two approaches to deal with these requirements, depending on whether we are looking to run larger system tests or smaller unittests, both of which ultimately need to be executed to improve quality.

For unittests, we are typically interested only in the module or function under test at the time, and we don't care as much about downstream dependencies, except insofar as they relate to the module under test. So we typically write test mocks that stand in for the downstream components we aren't interested in actually running, and that simulate their behaviors and failure modes. Of course, this can only be done after understanding how the downstream module works and interfaces with our module.

As an example of mocking out a downstream component: in Bigtable, we want to simulate the failure of Chubby, our external lock service, so we write a Chubby test mock that simulates the various ways that Chubby can interact with Bigtable. We then use this mock in the Bigtable unittests so that they a) run faster, b) have fewer external dependencies, and c) let us simulate various failure and retry conditions in the Chubby-related Bigtable code.
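A minimal sketch of this kind of mock, under stated assumptions: LockService, TryAcquire(), FlakyLockServiceMock and AcquireWithRetry() are all hypothetical names, and the real Chubby and Bigtable interfaces are not shown in the post. The mock fails a configurable number of calls before succeeding, which makes the retry path deterministic and lets the test run without any network:

```cpp
#include <cassert>
#include <string>

// Hypothetical interface to an external lock service.
class LockService {
 public:
  virtual ~LockService() = default;
  virtual bool TryAcquire(const std::string& path) = 0;
};

// Test mock: fails the first |failures| calls, then succeeds, and
// records how many calls the code under test made.
class FlakyLockServiceMock : public LockService {
 public:
  explicit FlakyLockServiceMock(int failures)
      : remaining_failures_(failures) {}

  bool TryAcquire(const std::string& /*path*/) override {
    ++calls_;
    if (remaining_failures_ > 0) {
      --remaining_failures_;
      return false;  // simulated lock service outage
    }
    return true;
  }

  int calls() const { return calls_; }

 private:
  int remaining_failures_;
  int calls_ = 0;
};

// Hypothetical code under test: retries acquisition up to
// |max_attempts| times before giving up.
bool AcquireWithRetry(LockService& locks, const std::string& path,
                      int max_attempts) {
  for (int attempt = 0; attempt < max_attempts; ++attempt) {
    if (locks.TryAcquire(path)) return true;
  }
  return false;
}
```

With two simulated failures and three attempts, acquisition succeeds on the third call; with five failures it gives up after exactly three calls.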

There are also cases where we want to simulate components that are actually upstream of the component under test. In these cases we write what is called a test driver. This is very similar to a mock, except that instead of being called by our module (downstream), it calls our module (upstream). For example, if a Bigtable component has some Mapreduce-specific handling, we might write a test driver to simulate these Mapreduce-specific interfaces, so that we don't have to run the full Mapreduce framework inside our unittest framework. The benefits are the same as those of test mocks. In fact, in many cases it may be desirable to use both drivers and mocks, or even several of each.
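A minimal sketch of a test driver, assuming a hypothetical RecordHandler interface that an upstream framework would call once per record and then once at the end. None of these names come from Mapreduce or Bigtable; the point is the direction of the calls: the driver calls the module, rather than the module calling a mock.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical upstream-facing interface: the framework calls Process()
// once per input record, then Finish().
class RecordHandler {
 public:
  virtual ~RecordHandler() = default;
  virtual void Process(const std::string& key, const std::string& value) = 0;
  virtual void Finish() = 0;
};

// Hypothetical module under test: counts non-empty values.
class NonEmptyValueCounter : public RecordHandler {
 public:
  void Process(const std::string& /*key*/, const std::string& value) override {
    if (!value.empty()) ++count_;
  }
  void Finish() override { finished_ = true; }

  int count() const { return count_; }
  bool finished() const { return finished_; }

 private:
  int count_ = 0;
  bool finished_ = false;
};

// Test driver: plays the role of the upstream framework by calling the
// module directly, so the real framework never has to run.
void DriveRecords(
    RecordHandler& handler,
    const std::vector<std::pair<std::string, std::string>>& records) {
  for (const auto& record : records) {
    handler.Process(record.first, record.second);
  }
  handler.Finish();
}
```

Because the driver owns the call sequence, a test can also feed malformed, empty, or out-of-order input that would be hard to provoke through the real framework.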

In system tests, where we're more interested in true system behavior and timings, or in other cases where we can't write drivers or mocks, we might turn to fault injection. Typically, this involves either completely failing certain components sporadically during system tests, or injecting particular faults via a fault injection layer that we write. Looking back at Bigtable: since Bigtable uses GFS, when we run system tests we inject faults into GFS by sporadically failing actual masters and chunkservers and watching how Bigtable reacts under load. This verifies that new versions of Bigtable will work when we deploy them, given the frequent rate of hardware failures. Another approach we're currently working on is simulating GFS behavior via a fault injection library, which will reduce the need for private GFS cells and result in better use of resources.
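The "fault injection layer" approach can be sketched as a wrapper that forwards calls to the real dependency but fails a configurable fraction of them. ChunkStore, InMemoryChunkStore and FaultInjectingChunkStore are hypothetical names standing in for a storage client; this is not the GFS API or Google's fault injection library. A fixed seed keeps the injected failures reproducible between runs:

```cpp
#include <cassert>
#include <random>
#include <string>

// Hypothetical storage interface standing in for a file system client.
class ChunkStore {
 public:
  virtual ~ChunkStore() = default;
  virtual bool Read(const std::string& chunk, std::string* data) = 0;
};

// In-memory stand-in for the real store.
class InMemoryChunkStore : public ChunkStore {
 public:
  bool Read(const std::string& chunk, std::string* data) override {
    *data = "contents of " + chunk;
    return true;
  }
};

// Fault injection layer: forwards to the wrapped store, but fails each
// call with probability |failure_rate| (0.0 to 1.0).
class FaultInjectingChunkStore : public ChunkStore {
 public:
  FaultInjectingChunkStore(ChunkStore* wrapped, double failure_rate,
                           unsigned seed)
      : wrapped_(wrapped), fail_(failure_rate), rng_(seed) {}

  bool Read(const std::string& chunk, std::string* data) override {
    if (fail_(rng_)) {
      ++injected_failures_;
      return false;  // simulated server failure; |data| is untouched
    }
    return wrapped_->Read(chunk, data);
  }

  int injected_failures() const { return injected_failures_; }

 private:
  ChunkStore* wrapped_;  // not owned
  std::bernoulli_distribution fail_;
  std::mt19937 rng_;
  int injected_failures_ = 0;
};
```

The component under test is handed the wrapper instead of the real client and never knows the difference, which is what lets the same system test run with and without faults.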

Overall, the use of Test Drivers, Test Mocks, and Fault Injection allows developers and test engineers at Google to test components more accurately, quickly, and above all helps improve quality.

Testing Google Mashup Editor Class

Posted by Patrick Copeland, Test Engineering Director

I wanted to let you know about a partnership between Google Test Engineering and the University of California, Irvine. We've teamed up with Professor Hadar Ziv to sponsor a course that focuses on preparing students for industry (code.google.com and several other companies are also participating). Naturally, our project focuses on testing. George Pirocanac is heading up this work and recently went down to Irvine to talk about how the students will test our mash-up editor. Here's the basic project outline if you are curious.

Class Project Plan: Testing Google's Mash-up Editor

Overall Class Goal: To understand the basic software functional testing concepts through the experience of a case study of testing the Google Mash-up Editor, and to provide meaningful feedback to Google about the effectiveness and usability of the tool.

Phase I - Gaining Domain expertise and Exploratory Testing
(four months)
Goals: Be able to explain what a mash-up is and why it is becoming important in today's internet. Be able to code a simple mash-up using a JavaScript API. Be able to code that same mash-up using Google Mash-up Editor tags. Be able to outline the basic features of the Google Mash-up Editor. Be able to identify the essential elements of a functional test plan. Create a functional test plan outline for the Google Mash-up Editor.

Phase II - Test Plan Execution over time
(Keeping in step with development) (three months)
Goals: Be able to identify the major challenges in executing a test plan during the life of a software project. Be able to identify testing technologies for dealing with these challenges. Be able to identify the effectiveness of a testing approach. Execute the test plan and provide feedback to Google.

Phase III - Usability & Competing Technologies Survey
(two months)
Goals: Be able to identify the essential elements of a usability study. Apply the topic of usability to programming. Compare and contrast the GME with three other industry mash-up editors.

Performance Testing

Posted by Goranka Bjedov, Senior Test Engineer

This post is my best shot at explaining what I do, why I do it, and why I think it is the right thing to do. Performance testing is a category of testing that seems to evoke strong feelings in people: feelings of fear (Oh, my God, I have no idea what to do because performance testing is so hard!), feelings of inadequacy (We bought this tool that does every aspect of performance testing, we paid so much for it, and we are not getting anything done!), feelings of confusion (So, what the heck am I supposed to be doing again?), and I don't think this is necessary.

Think of performance testing as another tool in your testing arsenal - something you will use when you need to. It explores several system qualities, which can be summarized as:

  • Speed - does the system respond quickly enough
  • Capacity - is the infrastructure sized adequately
  • Scalability - can the system grow to handle future volumes
  • Stability - does the system behave correctly under load

So, I do performance testing of a service when risk analysis indicates that failing in any of the above categories would be more costly to the company than performing the tests. (Which, if your name is Google and you care about your brand, happens with any service you launch.) Note that I am talking about services - I work almost exclusively with servers and spend no time worrying about client-side rendering/processing issues. While those are becoming increasingly important, and have always been more complex than my work, I consider them part of functional testing, and they are designed, created and executed by functional testing teams.

Another interesting thing about performance testing is that you will never be 100% "right" or 100% "done." Accept it, deal with it, and move on. Any system in existence today depends on thousands of different parameters, and if I spent the time analyzing each one of them, understanding the relationships between each two or each three, graphing their impact curves, and trying to non-dimensionalize them, I would still be testing my first service two years later. The thought of doing anything less filled me with horror (They cannot seriously expect me to provide meaningful performance results in less than a year, can they?), but I have since learned that I can provide at least 90% of the meaningful information to my customers by applying only 10% of my total effort and time. And 90% is more than enough for the vast majority of problems.

So, here is what I really do - I create benchmarks. If I am lucky and have fantastic information about the current usage patterns of a particular product (which I usually do), I make sure this benchmark covers most of the operations that are top resource hogs (either per single use or cumulatively). I run this benchmark with different loads (numbers of virtual users) against a loosely controlled system (it would be nice to have 100 machines all to myself for every service we have, which I could use once a day or once a week, but that would be expensive and unrealistic) and investigate its behavior. Which transactions are taking the most time? Which transactions seem to get progressively worse with increasing load? Which transactions seem unstable (I cannot explain their behavior)? I call this exploratory performance testing, and I repeat my tests until I am convinced I am observing real system behavior. While I am doing this, I make sure I don't bias myself by reading the code. If I have questions, I ask the programmers, but I know they are biased, and I avoid getting biased myself!
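The core loop of such a benchmark (run the same transaction with an increasing number of virtual users and look at how latency behaves) can be sketched in a few lines. This is an illustration under assumptions, not the author's tooling (the post's own tool of choice is JMeter): RunAtLoad(), LoadResult and Percentile() are hypothetical, each virtual user is a thread, and the transaction is any thread-safe callable.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Latencies observed for one transaction at one load level.
struct LoadResult {
  int virtual_users;
  std::vector<double> latencies_ms;  // sorted ascending
};

// Nearest-rank percentile, p in (0, 100]. |sorted| must be non-empty.
double Percentile(const std::vector<double>& sorted, double p) {
  std::size_t rank =
      static_cast<std::size_t>(std::ceil(p / 100.0 * sorted.size()));
  return sorted[std::max<std::size_t>(rank, 1) - 1];
}

// Runs |transaction| |iterations| times from each of |virtual_users|
// concurrent threads and records the wall-clock latency of every call.
LoadResult RunAtLoad(const std::function<void()>& transaction,
                     int virtual_users, int iterations) {
  LoadResult result{virtual_users, {}};
  std::mutex mu;  // guards result.latencies_ms
  std::vector<std::thread> users;
  for (int u = 0; u < virtual_users; ++u) {
    users.emplace_back([&] {
      for (int i = 0; i < iterations; ++i) {
        auto start = std::chrono::steady_clock::now();
        transaction();
        std::chrono::duration<double, std::milli> elapsed =
            std::chrono::steady_clock::now() - start;
        std::lock_guard<std::mutex> lock(mu);
        result.latencies_ms.push_back(elapsed.count());
      }
    });
  }
  for (std::thread& t : users) t.join();
  std::sort(result.latencies_ms.begin(), result.latencies_ms.end());
  return result;
}
```

Calling RunAtLoad() for, say, 1, 10 and 100 users and plotting the median and 99th percentile against load gives exactly the "gets progressively worse with load" picture described above.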

Once I have my graphs (think interesting transaction latencies and throughput vs. load here), I meet with the development team and discuss the findings. Usually, there are one or two things they know about and have been working on, and a few more they were unaware of. Sometimes, they look over my benchmark and suggest changes (could you make the ratio 80:20, and not 50:50?). After this meeting, we create our final benchmark, I modify the performance testing scripts, and from then on this benchmark runs as often as possible, but hopefully at least once a night. And here is the biggest value of this effort: if there is a code change that has impacted performance in an unacceptable way, you will find out about it the next day. Not a week or a month later. (How many of us remember what we did in the last month? So, why expect our developers to do so?)
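The nightly check itself can be as simple as comparing tonight's numbers with an agreed baseline. A minimal sketch, assuming per-transaction latencies in milliseconds and a relative tolerance chosen by the team (both hypothetical; the post does not specify how "unacceptable" is defined):

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Flags transactions whose latency in tonight's run exceeds the agreed
// baseline by more than |tolerance| (0.10 means 10%). Transactions that
// are missing from tonight's run are flagged too, since a silently
// skipped transaction could hide a regression.
std::vector<std::string> FindRegressions(
    const std::map<std::string, double>& baseline_ms,
    const std::map<std::string, double>& tonight_ms, double tolerance) {
  std::vector<std::string> regressed;
  for (const auto& entry : baseline_ms) {
    auto it = tonight_ms.find(entry.first);
    if (it == tonight_ms.end() ||
        it->second > entry.second * (1.0 + tolerance)) {
      regressed.push_back(entry.first);
    }
  }
  return regressed;  // in alphabetical order of transaction name
}
```

For a baseline of search 100 ms and checkout 200 ms, a run of search 105 ms and checkout 260 ms with a 10% tolerance flags only checkout.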

Here is why I think this is the right thing to do: I have seen a lot of bad code developed as a result of premature performance optimization, before the team even thought they had a problem! Please don't do that. Develop your service in a clean, maintainable and extensible manner. Let me test it, and keep regression testing it. If we find we have a problem in a particular area, we can then address that problem easily, because our code is not obfuscated with performance optimizations that improved code paths that execute once a month by 5%.

I can usually do this in two to four weeks, depending on the complexity of the project. Occasionally, we find an issue that cannot be explained or understood with performance tests. At that point, we look under the hood. This is where performance profiling and performance modeling come in, and both of those are considerably more complex than performance testing. Both are great tools, but they should be used only when the easy tool fails.

Tools, tools, tools... So, what do we use? I gave a presentation on exactly this topic at the Google Test Automation Conference in London. I use open source tools, and I discuss the reasons why in the presentation. In general, even if you have decided to go one of the other two routes (vendor tools or developing your own), check out what is available. You may find that you get a lot of information about your service by using JMeter and spending some time playing around with it. Sure, you can also spend $500K and get similar information, or you can spend two years developing "the next best performance testing tool ever," but before you are certain free is not good enough, why would you want to?

Final word: monitor your services during performance tests. If you do not have service-related monitoring developed and set up for use during live operations, you do not need performance testing. If the risk of your service failing is not important enough that you would want to know about it *before* it happens, then you should not be wasting time or money on performance testing. I am incredibly lucky in this area - Google infrastructure is developed by a bunch of people who, if they had a meeting where the topic was "How to make Goranka's life easy?", could not have done better. I love them - they make my job trivial. At a minimum, I monitor CPU, memory and I/O usage. I cannot see a case where you would want to do less, but you may want to do a lot more on occasion.

Post Release: Closing the loop

Posted by Michael Bachman, Test Engineering Manager

A testing organization's job is not done with the release of a product. The software development cycle does not end with the release; it extends into post-release diagnostics and evaluation. Learning from post-release metrics such as product performance, defects, and behavior in production (or in the field) provides valuable input into how to adjust future testing and development techniques. Measuring product defect trends and performance, and analyzing those results, can identify holes in test coverage, prevent bugs, plug gaps in the release cycle or product life cycle, and determine whether the pre-release test environment was adequately representative of key customer scenarios.

Here are a few metrics that can help jump-start this effort:

Pre- versus post-production defect ratio: This metric is the number of defects found before production divided by the total number of defects in the product (including post-release issues). It lets a team measure how many defects are being caught before release. This effort supplements the age-old, valuable practice of partnering with Product Support to measure the incident/defect rate. The goal is not, and should not be, to indulge in the blame game of "a defect found after release is test's fault," but to find ways to make the product release cycle better. The thing to focus on is identifying the causes of each issue: gaps in the release cycle, communication mishaps between product, development, and testing, inadequate test environments, or the overall testability of the product. There may not be a perfect metric, but obvious candidates are time to resolution (how long it takes to react to a broken issue), cost to the customer, or cost to customer support. The main point is to agree on a cross-organizational metric, track it, do the root cause analysis, and make the time to change.
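The ratio is simple arithmetic, but writing it down removes any ambiguity about the denominator. For example, 45 defects caught before release and 5 reported afterwards give 45 / (45 + 5) = 0.9, i.e., 90% of known defects were caught before production. A minimal sketch (the function name is illustrative):

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Pre- versus post-production defect ratio:
//   defects found before production / all defects (pre + post release).
// Returns NaN when no defects were recorded at all, since the ratio is
// undefined in that case.
double PreReleaseDefectRatio(int found_before_release,
                             int found_after_release) {
  int total = found_before_release + found_after_release;
  if (total == 0) return std::numeric_limits<double>::quiet_NaN();
  return static_cast<double>(found_before_release) / total;
}
```

Note that the ratio for a release keeps changing as post-release defects trickle in, so it is most useful when compared across releases at the same age.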

Breakdown of defects by component or functional area: In conjunction with monitoring which defects are found in production, categorizing them by functional area and component provides the information needed to highlight trends and, more specifically, problematic areas. When a problematic component is identified, the test team can address gaps in test coverage, unit test coverage, product usability, or the life cycle managing that functional area. Trending defects by component over time has further advantages: it gives a better sense of the quality of particular components as they age, shows how effective any changes introduced in engineering practices have been at improving quality, and measures whether newly introduced functionality has had a destabilizing effect on the system. This metric helps product teams make informed decisions about the product. Potential outcomes include resource allocation changes, feature deprecation, or feature redesign.
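A minimal sketch of the breakdown: count production defects per component and per release, so that each component's trend across releases can be read off directly. The Defect record and its fields are hypothetical; a real bug tracker export would carry more information.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Minimal production defect record (hypothetical fields).
struct Defect {
  std::string component;
  std::string release;
};

// component -> (release -> number of production defects)
using DefectTrend = std::map<std::string, std::map<std::string, int>>;

// Groups defects by component, then by release, counting each group.
DefectTrend BreakDownByComponent(const std::vector<Defect>& defects) {
  DefectTrend trend;
  for (const Defect& d : defects) {
    ++trend[d.component][d.release];
  }
  return trend;
}
```

A component whose count grows release over release, while others stay flat, is the kind of problematic area the paragraph above describes.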

Performance measurements (CPU usage, memory consumption, disk load, database load, latency, etc.): Without going into the various load and performance (L&P) measurements one can monitor within a product (that could be a whole separate article in itself :) ), product teams should ensure they have mechanisms in place to collect the key, product-relevant metrics. To gauge the effectiveness of the test environments, these measurements need to be obtained both in the test labs and in production. Identifying mismatches between the two allows test organizations to correct any topology issues early, before any subsequent (or similar) releases.
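One way to surface such mismatches is to compare the same named metrics from the lab and from production and flag those that differ by more than some factor. A minimal sketch, assuming both environments report positive values under shared metric names; the function name and the factor are hypothetical choices, not an established threshold:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Flags production metrics whose value differs from the test-lab value
// by more than a factor of |max_ratio| in either direction, a hint that
// the lab topology or load does not represent production. Production
// metrics with no usable (positive) lab value are flagged as well.
std::vector<std::string> FindEnvironmentMismatches(
    const std::map<std::string, double>& lab,
    const std::map<std::string, double>& production, double max_ratio) {
  std::vector<std::string> mismatches;
  for (const auto& entry : production) {
    auto lab_it = lab.find(entry.first);
    if (lab_it == lab.end() || lab_it->second <= 0.0) {
      mismatches.push_back(entry.first);  // never measured in the lab
      continue;
    }
    double ratio = entry.second / lab_it->second;
    if (ratio > max_ratio || ratio < 1.0 / max_ratio) {
      mismatches.push_back(entry.first);
    }
  }
  return mismatches;  // in alphabetical order of metric name
}
```

A metric that shows up only in production is often the more interesting finding: it means the lab is not even measuring something that matters to users.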

Furthermore, analyzing these measurements can expose multiple reasons why the product behaved differently in production than in the test labs. Some examples of such issues (like those we have found at Google) are: different machine hardware, load mismatch on the system, localized tests not measuring round-trip latency, and the number of concurrent users hitting the product simultaneously (your testing team compared to the potential millions of users in production). The most important point here is to measure and monitor the performance of the product, as well as to determine the adequacy of the test environment. When a performance issue is found in production, ask yourself, "Could we have caught that in the test environment?"

Are users or your monitoring systems finding the defects? Reliable monitoring and debugging systems, logs, and notifications are key to reacting immediately to large defects in production, and potentially to finding a defect before your users do. Some of the best practices of engineering teams (and those followed at Google) are real-time notifications of exceptions, load, and performance, pager alerts when systems are unavailable, and robust logs to help developers and testers debug the system state before and after a crash. Many open source and commercial solutions exist for these pieces, so you do not necessarily need to build them in-house. The bulk of the effort in setting up reliable monitoring typically lies with development, but it is key for test teams to help identify the need and to ensure the monitoring is also used in their test environments. This not only allows the testing organization to test the functionality of the monitoring tools, but also alerts testers to defects that might have slipped through and are not directly visible on a front end.

So, what does this get you? A solid picture of the trend in the product's quality and performance over time. Measuring lets testing tweak its coverage and environments, as well as analyze how the team works together. Reacting to the findings helps open communication channels between development, production, and testing, and lets them join together to debug and reproduce defects and eventually reduce them. Every part of the larger team can watch defect trends, help prioritize resources and features, and increase unit test, system test, and performance test coverage. Getting to a robust, real-time defect, performance, and monitoring framework takes effort from all teams, but in the end everyone can reap the benefit, especially, and most importantly, your users.
