Monday, October 08, 2007

Performance Testing

Posted by Goranka Bjedov, Senior Test Engineer

This post is my best shot at explaining what I do, why I do it, and why I think it is the right thing to do. Performance testing is a category of testing that seems to evoke strong feelings in people: feelings of fear (Oh, my God, I have no idea what to do because performance testing is so hard!), feelings of inadequacy (We bought this tool that does every aspect of performance testing, we paid so much for it, and we are not getting anything done!), feelings of confusion (So, what the heck am I supposed to be doing again?), and I don't think this is necessary.

Think of performance testing as another tool in your testing arsenal - something you will do when you need to. It explores several system qualities, which can be simplified to:

  • Speed - does the system respond quickly enough
  • Capacity - is the infrastructure sized adequately
  • Scalability - can the system grow to handle future volumes
  • Stability - does the system behave correctly under load

So, I do performance testing of a service when risk analysis indicates that failing in any of the above categories would be more costly to the company than performing the tests. (Which, if your name is Google and you care about your brand, happens with any service you launch.) Note that I am talking about services - I work almost exclusively with servers and spend no time worrying about client-side rendering/processing issues. While those are becoming increasingly important, and have always been more complex than my work, I consider those to be a part of functionality tests, and they are designed, created and executed by functional testing teams.

Another interesting thing about performance testing is that you will never be able to be 100% "right" or 100% "done." Accept it, deal with it, and move on. Any system in existence today will depend on thousands of different parameters, and if I spent the time analyzing each one of them, understanding the relationships between each two or each three, graphing their impact curves, trying to non-dimensionalize them, I would still be testing my first service two years later. The thought of doing anything less filled me with horror (They cannot seriously expect me to provide meaningful performance results in less than a year, can they?) but I have since learned that I can provide at least 90% of meaningful information to my customers by applying only 10% of my total effort and time. And, 90% is more than enough for the vast majority of problems.

So, here is what I really do - I create benchmarks. If I am lucky and have fantastic information about current usage patterns of a particular product (which I usually do), I will make sure this benchmark covers most operations that are top resource hogs (either per single use or cumulative). I'll run this benchmark with different loads (number of virtual users) against a loosely controlled system (it would be nice to have 100 machines all to myself for every service we have, which I can use once a day or once a week, but that would be expensive and unrealistic) and investigate its behavior. Which transactions are taking the most time? Which transactions seem to get progressively worse with increasing load? Which transactions seem unstable (I cannot explain their behavior)? I call this exploratory performance testing, and I'll repeat my tests until I am convinced I am observing real system behavior. While I am doing this, I make sure I am not getting biased by investigating the code. If I have questions, I ask programmers, but I know they are biased, and I will avoid getting biased myself!
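
A load sweep of the kind described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the tooling used at Google: `fake_transaction`, `run_load_step`, and `sweep` are hypothetical names, and the transaction is a stand-in `time.sleep` rather than a real request to a service.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def fake_transaction():
    """Stand-in for one request to the service under test (assumption)."""
    time.sleep(0.001)


def timed_call(transaction):
    """Run one transaction and return its latency in seconds."""
    start = time.perf_counter()
    transaction()
    return time.perf_counter() - start


def run_load_step(transaction, virtual_users, requests_per_user):
    """Issue requests with at most `virtual_users` of them in flight at once."""
    total = virtual_users * requests_per_user
    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=virtual_users) as pool:
        latencies = list(pool.map(lambda _: timed_call(transaction), range(total)))
    elapsed = time.perf_counter() - started
    latencies.sort()
    return {
        "virtual_users": virtual_users,
        "requests": total,
        "throughput_per_s": total / elapsed,
        "median_s": statistics.median(latencies),
        "p95_s": latencies[int(0.95 * (total - 1))],
    }


def sweep(transaction, user_levels, requests_per_user=20):
    """Repeat the same benchmark at each load level, lowest first."""
    return [run_load_step(transaction, n, requests_per_user) for n in user_levels]


if __name__ == "__main__":
    for row in sweep(fake_transaction, [1, 5, 10]):
        print(row)
```

Plotting `median_s`, `p95_s`, and `throughput_per_s` against `virtual_users` gives the latency and throughput vs. load curves that the questions above are asked of.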

Once I have my graphs (think, interesting transaction latencies and throughput vs. load here) I meet with the development team and discuss the findings. Usually, there are one or two things they know and have been working on, and a few more they were unaware of. Sometimes, they look over my benchmark and suggest changes (could you make the ratio 80:20, and not 50:50?). After this meeting, we create our final benchmark, I modify the performance testing scripts, and now this benchmark will run as often as possible, but hopefully at least once a night. And, here is the biggest value of this effort: if there is a code change that has impacted performance in an unacceptable way, you will find out about it the next day. Not a week or a month later. (How many of us remember what we did in the last month? So, why expect our developers to do so?)
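
The nightly comparison can be as simple as checking each transaction's latency against the previous run. The sketch below is a hypothetical example, not the actual scripts: `find_regressions`, the 10% tolerance, and the sample numbers are all assumptions made for illustration.

```python
def find_regressions(baseline, latest, tolerance=0.10):
    """Return transactions whose latency grew by more than `tolerance`.

    `baseline` and `latest` map transaction name -> latency in milliseconds.
    The result maps each regressed transaction to its fractional increase.
    """
    regressions = {}
    for name, old in baseline.items():
        new = latest.get(name)
        if new is None or old <= 0:
            continue  # nothing meaningful to compare against
        change = (new - old) / old
        if change > tolerance:
            regressions[name] = round(change, 3)
    return regressions


if __name__ == "__main__":
    # Made-up numbers: only "login" slows down by more than 10%.
    last_night = {"search": 120.0, "login": 45.0, "upload": 300.0}
    tonight = {"search": 125.0, "login": 60.0, "upload": 290.0}
    print(find_regressions(last_night, tonight))  # {'login': 0.333}
```

In practice the tolerance has to be wider than the run-to-run noise of a loosely controlled system, otherwise the check will raise false alarms.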

Here is why I think this is the right thing to do: I have seen more bad code developed as a result of premature performance optimizations - before the team even thought they had a problem! Please don't do that. Develop your service in a clean, maintainable and extensible manner. Let me test it, and keep regression testing it. If we find we have a problem in a particular area, we can then address that problem easily - because our code is not obfuscated with performance optimizations that have improved code paths that execute once a month by 5%.

I can usually do this in two - four weeks depending on the complexity of the project. Occasionally, we will find an issue that cannot be explained or understood with performance tests. At that point in time, we look under the hood. This is where performance profiling and performance modeling come in. And, both of those are considerably more complex than performance testing. Both are great tools, but they should be used only when the easy tool fails.

Tools, tools, tools... So, what do we use? I gave a presentation at Google Test Automation Conference in London on exactly this topic. I use open source tools. I discuss the reasons why in the presentation. In general, even if you have decided to go one of the other two routes (vendor tools or develop your own) check out what is available. You may find out that you will get a lot of information about your service using JMeter and spending some time playing around with it. Sure, you can also spend $500K and get similar information or you can spend two years developing "the next best performance testing tool ever," but before you are certain free is not good enough, why would you want to?

Final word: monitor your services during performance tests. If you do not have service related monitoring developed and set up to be used during live operations, you do not need performance testing. If the risks of your service failing are not important enough that you would want to know about it *before* it happens, then you should not be wasting time or money on performance testing. I am incredibly lucky in this area - Google infrastructure is developed by a bunch of people who, if they had a meeting where the topic would be "How to make Goranka's life easy?", could not have done better. I love them - they make my job trivial. At a minimum, I monitor CPU, memory and I/O usage. I cannot see a case when you would want to do less, but you may want to do a lot more on occasion.
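
For readers without such infrastructure, the idea of sampling resource usage while a test runs can be sketched as below. This is a minimal illustration, not a description of Google's monitoring: `Sampler` and `default_probe` are hypothetical names, and the default probe only reports the CPU time of the test process itself. A real setup would plug in a probe that reads CPU, memory and I/O figures from the service under test.

```python
import os
import threading
import time


def default_probe():
    """Placeholder probe (assumption): CPU time used by this process so far."""
    t = os.times()
    return {"cpu_user_s": t.user, "cpu_system_s": t.system}


class Sampler:
    """Collect probe readings at a fixed interval while a test runs."""

    def __init__(self, probe=default_probe, interval_s=0.05):
        self.probe = probe
        self.interval_s = interval_s
        self.samples = []  # list of (timestamp, reading) tuples
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        # Take a reading, then wait for the interval or the stop signal.
        while not self._stop.is_set():
            self.samples.append((time.time(), self.probe()))
            self._stop.wait(self.interval_s)

    def __enter__(self):
        self._thread.start()
        return self

    def __exit__(self, *exc):
        self._stop.set()
        self._thread.join()
        return False


if __name__ == "__main__":
    with Sampler() as monitor:
        time.sleep(0.2)  # the load test would run here
    print(len(monitor.samples), "samples collected")
```

Keeping the timestamps alongside each reading makes it possible to line the samples up with the latency graphs from the same run.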

23 comments:

  1. Bravo Goranka!!

    One of these days you, Rob Sabourin, and I need to do a joint piece about exploratory performance testing! (Don't worry, Rob and I have been working on this piece since the end of day 2 at WOPR1).

    The only thing I want to point out to the general public is that the two - four weeks you mention is a testament to the fact that you are starting with very performance aware/concerned developers/admins, your team reacts quickly to performance issues that you do detect, and that you are quite good at what you do.

    I'm not saying that this is not achievable by other teams - it absolutely is! But depending on where a team is starting from, it may take having the performance tester on board from day 1, working side by side with the developers/admins to help them become more performance aware, etc. for a while to get to the two - four week time frame.

    The notion that buying a tool, sending someone to three days of vendor training and then expecting them to conduct a single test cycle with the tool that generates useful results (that the team has time to react to) during the final two weeks of a project is just as bogus as it has ever been. Or, as I'm often quoted...

    Only performance testing at the conclusion of system or functional testing is like ordering a diagnostic blood test after the patient is dead.

    Again, fabulous insights. I'll reference it often.

    Cheers,
    Scott
    --

    Scott Barber
    President & Chief Technologist,
    PerfTestPlus, Inc.
    Executive Director,
    Association for Software Testing

    "If you can see it in your mind...
    you will find it in your life."

  2. So what would you recommend as an entry-level performance profiling tool for Java applications? "Entry-level", because I'd like to get the students in my undergraduate software engineering course to tune their programs, and this will be the first time most of them have had to worry about performance.
    - Greg (gvwilson@cs.toronto.edu)

  3. Thanks Scott - and all good points. Because I work on infrastructure that is really well suited for performance testing, and I have extremely interested development teams, I can turn in projects in 2 - 4 weeks. But, not every project would be like that. Would love to work with you and Rob... One question - does the marsupial get writing credits as well? :)

    Greg, I would suggest JProf. It has been a while since I last taught a class, but I would not be against giving different groups different tools and asking them to write reports - what worked well, what could be better, etc. This may end up being the best lesson they get - we do this all the time in "real life."

  4. Great post.

    If applications are bridges, performance testing is finding out whether or not the bridge will collapse when people use it... before people use it.

    On the comment about open source tools:
    I too have used open source tools (pretty much exclusively). I would agree that they are quite powerful; my only complaint is that the vast majority of the tools I've been using have bugs that I tend to stumble upon, which costs me a fair amount of time rechecking my work due to concerns about the validity of my data. Enterprise level tools offer at least someone to yell at when that happens :).

    It's awesome that you have usage data to work with and a rock solid infrastructure. I would say for me (and probably most companies that are looking into performance testing) this is usually not the case, and identifying usage patterns can be a fair amount of work in itself, and an infrastructure that isn't as stable requires more testing and more specific testing. In particular, on unstable systems I think the level of granularity needed to make comparative judgements between codebases simply isn't there, and thus benchmarking isn't as revealing as one would hope.

    In lieu of benchmark comparisons, I've found a lot of success by emulating user behavior at peak levels of load for extended periods of time and monitoring system performance looking for degradation during the course of the test. Another test that has been extremely effective for me has been targeted tests that exercise specific services at projected levels and then combining it with a peak load test.

    On your comment about functional testing:
    I would agree that the rendering of the page is in the domain of functional testing, but that being said I hate making assumptions that it just worked! I have not yet, but am considering trying to have functional tests run at the same time as the peak usage load described above to validate that remaining portion (and it provides even greater coverage of system behavior under load)

    The biggest problems that I'm having involve identifying failure. I'm curious how you approach the identification of "failure". A simple thing, but what I've been running into a lot are errors that I wasn't looking for. For example, during 100,000 data submissions, 12 of them happened to be corrupted, and this was unexpected, so it was only by chance that I noticed some corrupted data while running through the logs.

    I would preach that log analysis is an absolutely critical facet of performance testing, but what sort of things do you do to define a "failure" in a system aside from scanning logs / monitoring behavior? As simple as it is to say "the system broke", for me there doesn't seem to be a very good science for identifying "broken", and as simple as it may sound, I'm curious how you know when something is broken.

    An excellent post Goranka, more please!

    James Chang
    QA Analyst
    Parature Inc
    http://www.parature.com/

  5. Performance testing does not stop at measuring response times. Though that is of the most importance, there are other kinds of tests we should bring in. One of them is called destructive testing.
    It's a unique kind of performance testing where we analyze how the software application behaves when one or more of its back end applications either slow down or become non-responsive. The expectation is that the front end application should handle the back end slowdown. The front end application should not become non-responsive during that state. Instead, it should fail the requests fast, rather than queuing them at the server level and causing a shutdown or restart of the application server. Present-day online services have complex architectures that talk to "n" number of services to fetch data. So the dependency on other applications becomes vital, and we expect not a moment of unavailability from the dependent systems. In the worst case of unavailability, the front end should be able to handle it.
    This is just a small note on destructive testing; there is a lot more to discuss.

  6. Hi, Thanks for the article and I watched your video. It is really helpful for a beginner like me.

    My question is: if I were to do benchmarking for web servers that are running under virtualization, do I apply the same rules as if I were running one server on one machine? In a virtualized environment, let's say we run 5 VMs, each hosting one web server.

  7. Nice post. It's nice to see performance guidelines that actually work in practice. I've been reading a guide called 'Performance Testing Guidance for Web Applications' (it's published by Microsoft - I know I know :) ). It's quite verbose but really informative, and your post sums up some of their points and much more. The video from GTAC is great. Please continue posting happily.

  8. This is an excellent article exploring the nuances of performance testing.

    We would just like to mention that if performance evaluation and testing is carried out as a continuous process, as opposed to a one-time activity, it not only helps improve the performance of systems much more effectively but also reduces time to market.

    Cheers
    Deepak
    PECATS

  9. It is a useful post. A couple of questions about user patterns and load behaviour. How do meetings with the development team help in revising the load pattern and user behaviour? I agree that the development and architecture teams can give more input on performance issues and their resolution. But load pattern information is mostly with the end users and the business group. Am I wrong?
    Another interesting topic (to me) is end user experience for web applications, such as the rendering of web pages. I have personally received requirements to test the performance (rendering time) of a web page in different browsers. I feel it should be part of performance testing. Can anyone share their experiences on this?

    Raj
    http://performancetestingfun.googlepages.com

  10. Nice post. Many people regard load testing as something you rarely have to do - they think maybe once or twice a year is enough. It is nice to see people writing about it as a natural part of the development & testing cycle.

    We have just launched a new online load testing service - http://loadimpact.com - and we have an interesting use-case article there that might interest some people: Iterative performance tuning with automated load testing, a case study

    Regards,

    /Ragnar, Load Impact

  11. I like the point about not investigating the code. It is really tempting to look into the code, especially if you have been a developer in a prior life. A lot of the time there is also a tendency to guess the root cause instead of taking methodical steps. Performance engineering is an art and science of narrowing down to problem areas by observing the system behavior and taking accurate measurements. “One accurate measurement is worth a thousand expert opinions.”
    - Adm. Grace Murray Hopper (December 9, 1906 – January 1, 1992)

    my blog is at www.esustain.com

  12. What I feel is that the most important part of performance testing is gathering the application usage / usage pattern / workload mix and translating that into a test scenario design. But that is also the most difficult part. So do you have any process to gather that info?

  13. josesum – For load test specifications I find the best approach is to go as high as you can (Project Manager, IT Manager, IT Director etc.) and ask what the business wants the application to do; basically, get back to requirements.
    For example, a car insurance company might expect to sell 1000 policies a day, and typically they'll do their business between 6pm and 11pm. From that you can work out that the application needs to process 200 policies in an hour. That's key - the most important action to simulate in your load tests. Other transactions, such as policy maintenance, would also be important to script: even if they are not heavily used, the load they induce may have a disproportionate effect on the application.

    It’s an art to weigh up the riskiest transactions for the business (the transactions that have a high performance requirement) and those that may have the highest usage.

    You’ll then need to use a spreadsheet to work out the user think times you’ll need to get your desired transaction rate (200/hour in this case) with the number of users you plan to simulate.
    You can then increase the number of users to increase the transaction request rate to simulate particularly busy periods.
    Hope that helps.
    Adam Brown
    http://www.quotium.com
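
The spreadsheet calculation in the comment above can also be sketched in code. This is an illustration only, and it assumes a closed-loop model that the comment does not spell out: each simulated user runs one transaction, waits for the think time, then repeats, so the overall rate is users / (response time + think time). The function name, the user count of 50, and the 2-second response time are made up for the example.

```python
def think_time_seconds(users, target_per_hour, response_time_s=0.0):
    """Think time each user needs so that `users` together hit the target rate.

    Closed-loop assumption: rate = users / (response_time + think_time),
    so think_time = users / rate - response_time.
    """
    cycle_s = users * 3600.0 / target_per_hour
    think_s = cycle_s - response_time_s
    if think_s < 0:
        raise ValueError("too few users to reach the target rate")
    return think_s


if __name__ == "__main__":
    # 1000 policies sold between 6pm and 11pm -> 200 per hour.
    target = 1000 / 5
    print(think_time_seconds(users=50, target_per_hour=target, response_time_s=2.0))
    # prints 898.0
```

With the think time held fixed, raising `users` raises the request rate proportionally, which is the "particularly busy periods" case mentioned in the comment.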

  14. I am happy to see so many enthusiastic posts on Performance testing. I have been in Performance Engineering business for ~12 years. Note, I refer to it as Performance Engineering since I believe Testing is only a part of what we do as a part of overall process. I love what I do since it has given me exposure not only to various infrastructures but also to different technologies, etc. I believe that in order to provide a true value to end user/stakeholder, application/infrastructure has to be thoroughly analyzed from end to end perspective. After all, outcome and value of performance engineering is just as good as collected requirements.

  15. I believe Testing represents a significant endeavor which still has to be rethought in terms of framing minds and forging skillsets in order to optimize and leverage it.
    Performance Testing is definitely at the forefront of application gauging as it pertains to revenue, customer satisfaction and adoption rate overall. But the challenges of testing efforts are increasingly obvious and daunting as complexity and sophistication augment in applications. Keeping in mind that from a functional perspective, 100% coverage is practically unattainable in while testing, we can however devise predictable boundaries and from there provide for major flexibility and resource adaptability or reconfigurability as to optimize performance to its highest.

    Souma Badombena Wanta
    www.livelypulse.com

  16. This is a very nice post, though I am new to this field. I have recently joined an org as a performance engineer. I have some idea about things, but after reading this article I have a question:

    Whatever perf testing is, if you want to put it in fewer words, is it just using LoadRunner or whatever other tool you are using, and that's it? Is that true?

  17. Nice article which touches on most of the performance testing fundamentals. You are lucky Goranka to work in an environment where performance gets profile - in my experience this is not typically the case. Several posters have mentioned gathering usage stats & this is a perennial: so often the business really does not know how it uses its systems. I find the best approach to this is to extract data from logfiles (often Apache weblogs) & reverse-engineer the workload then get the business to approve & sign-off on it.

    I have been doing performance testing for >10 years & still enjoy it. I have done the highly-controlled baselining & profiling you mention. It is a very broad field of testing, often poorly understood & can be demanding, but it provides exposure to many different technologies & tools. And getting a result - given all the difficulties involved in putting complex tests together - is very rewarding and satisfying.

    cheers
    Steve

  18. Great post. I am new to performance testing and am looking forward to more blogs such as these...

  19. Hi
    I am testing a client-server application (.exe) which is developed in C#. It deals with number of and loading the images. Is it possible to do a performance test on an .exe?
    I have VSTS performance tool available to test.

  20. Great inputs!!!

    I wanted to point out that service/server monitoring is one area that needs to get a bit more importance in performance testing. Maybe one of the reasons it is not given much importance is that it is considered to be the system admin's or infrastructure engineer's job. I have personally seen that most application teams just look to simulate so many users to load the system and expect client-side statistics like response time. Most of them - application teams or performance test engineers - don't seem to know the importance of monitoring and the value we can get by doing it. And they want to look at the server statistics after the fact, only when a performance problem has been identified. By being proactive in the monitoring strategy, a lot of time in after-the-fact testing can be saved.

    I would try to simplify server/service performance monitoring as

    -Hardware performance statistics and its impact - CPU, Memory, IO.

    -Server software performance statistics - Web server http connections, App server thread pool, connection pool, JVM Heap etc...

    -Application code performance - Method CPU time, Memory allocated by objects, ...

    As Bravo rightly pointed in the blog, Hardware statistics needs to be monitored bare minimum..

    Hope it helps..

    thanks
    Raj
    PerformanceTestingFun

  21. I'm also going for JMeter but, as mentioned in the post, it takes some time to learn & operate.

    So, searching for a possible 'wrapper' or online service that can help me save time, I came across this Google Groups Post http://goo.gl/uLgva

    I am now testing BlazeMeter.com (so far, the test results are quite impressive, I am now considering a larger test). Any inputs on or experience with this?

  22. I am trying to write scripts in JavaScript but i need to know about its conversion to C so that the scripts would run in Vugen 11 ? Can anyone throw some light on this subject matter?

  23. Excellent work!

    I just finished a script emulating 100 VU, got the results.
    But I don't have any benchmark to compare my results to. Nor do I have a similar existing web application to base performance metrics on.
    What should be done in such a case? How do I arrive at acceptable performance metrics?

    Karam
    QA
    TMS

