Posted by Goranka Bjedov, Senior Test Engineer
This post is my best shot at explaining what I do, why I do it, and why I think it is the right thing to do. Performance testing is a category of testing that seems to evoke strong feelings in people: feelings of fear (Oh, my God, I have no idea what to do because performance testing is so hard!), feelings of inadequacy (We bought this tool that does every aspect of performance testing, we paid so much for it, and we are not getting anything done!), and feelings of confusion (So, what the heck am I supposed to be doing again?). I don't think any of this is necessary.
Think of performance testing as another tool in your testing arsenal - something you will do when you need to. It explores several system qualities, which can be simplified to:
- Speed - does the system respond quickly enough
- Capacity - is the infrastructure sized adequately
- Scalability - can the system grow to handle future volumes
- Stability - does the system behave correctly under load
So, I do performance testing of a service when risk analysis indicates that failing in any of the above categories would be more costly to the company than performing the tests. (Which, if your name is Google and you care about your brand, happens with any service you launch.) Note that I am talking about services - I work almost exclusively with servers and spend no time worrying about client-side rendering/processing issues. While those are becoming increasingly important, and have always been more complex than my work, I consider those to be a part of functionality tests, and they are designed, created and executed by functional testing teams.
Another interesting thing about performance testing is that you will never be able to be 100% "right" or 100% "done". Accept it, deal with it, and move on. Any system in existence today will depend on thousands of different parameters, and if I spent the time analyzing each one of them, understanding the relationships between each two or each three, graphing their impact curves, trying to non-dimensionalize them, I would still be testing my first service two years later. The thought of doing anything less filled me with horror (They cannot seriously expect me to provide meaningful performance results in less than a year, can they?) but I have since learned that I can provide at least 90% of the meaningful information to my customers by applying only 10% of my total effort and time. And, 90% is more than enough for the vast majority of problems.
So, here is what I really do - I create benchmarks. If I am lucky and have fantastic information about current usage patterns of a particular product (which I usually do), I will make sure this benchmark covers most operations that are top resource hogs (either per single use or cumulative). I'll run this benchmark with different loads (number of virtual users) against a loosely controlled system (it would be nice to have 100 machines all to myself for every service we have, which I can use once a day or once a week, but that would be expensive and unrealistic) and investigate its behavior. Which transactions are taking the most time? Which transactions seem to get progressively worse with increasing load? Which transactions seem unstable (I cannot explain their behavior)? I call this exploratory performance testing, and I'll repeat my tests until I am convinced I am observing real system behavior. While I am doing this, I make sure I am not getting biased by investigating the code. If I have questions, I ask programmers, but I know they are biased, and I will avoid getting biased myself!
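As a rough illustration of running one benchmark at several load levels, here is a minimal Python sketch. It is not the tooling used at Google; every name in it (`sweep_loads`, `virtual_user`, `fake_transaction`) is hypothetical, threads stand in for virtual users, and the "transaction" is a short sleep where a real request to the service under test would go.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def timed_call(transaction):
    """Run one transaction and return its latency in seconds."""
    start = time.perf_counter()
    transaction()
    return time.perf_counter() - start


def virtual_user(transaction, requests_per_user):
    """One virtual user issuing requests back to back."""
    return [timed_call(transaction) for _ in range(requests_per_user)]


def sweep_loads(transaction, loads, requests_per_user=20):
    """Run the same benchmark at each load level and summarize the results."""
    results = {}
    for users in loads:
        wall_start = time.perf_counter()
        # One thread per virtual user; all users run concurrently.
        with ThreadPoolExecutor(max_workers=users) as pool:
            futures = [
                pool.submit(virtual_user, transaction, requests_per_user)
                for _ in range(users)
            ]
            latencies = [lat for f in futures for lat in f.result()]
        wall = time.perf_counter() - wall_start
        results[users] = {
            "requests": len(latencies),
            "median_s": statistics.median(latencies),
            "max_s": max(latencies),
            "throughput_rps": len(latencies) / wall,
        }
    return results


def fake_transaction():
    """Hypothetical stand-in for a real request to the service under test."""
    time.sleep(0.001)


if __name__ == "__main__":
    for users, summary in sweep_loads(fake_transaction, loads=[1, 5, 10]).items():
        print(users, summary)
```

Plotting `median_s` and `throughput_rps` against the number of users gives the kind of latency and throughput vs. load curves that show which transactions degrade as load grows. Repeating the sweep several times helps separate real behavior from noise on a loosely controlled system.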
Once I have my graphs (think, interesting transaction latencies and throughput vs. load here) I meet with the development team and discuss the findings. Usually, there are one or two things they know and have been working on, and a few more they were unaware of. Sometimes, they look over my benchmark and suggest changes (could you make the ratio 80:20, and not 50:50?). After this meeting, we create our final benchmark, I modify the performance testing scripts, and now this benchmark will run as often as possible, but hopefully at least once a night. And, here is the biggest value of this effort: if there is a code change that has impacted performance in an unacceptable way, you will find out about it the next day. Not a week or a month later (How many of us remember what we did in the last month? So, why expect our developers to do so?)
Here is why I think this is the right thing to do: I have seen more bad code developed as a result of premature performance optimizations - before the team even thought they had a problem! Please don't do that. Develop your service in a clean, maintainable and extensible manner. Let me test it, and keep regression testing it. If we find we have a problem in a particular area, we can then address that problem easily - because our code is not obfuscated with performance optimizations that have improved code paths that execute once a month by 5%.
I can usually do this in two to four weeks depending on the complexity of the project. Occasionally, we will find an issue that cannot be explained or understood with performance tests. At that point in time, we look under the hood. This is where performance profiling and performance modeling come in. And, both of those are considerably more complex than performance testing. Both are great tools, but they should be used only when the easy tool fails.
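The nightly comparison can be sketched in a few lines of Python. This is an assumption-laden illustration, not the actual scripts: the function name `find_regressions`, the 10% tolerance, and all transaction names and latency numbers are made up for the example.

```python
def find_regressions(baseline, tonight, tolerance=0.10):
    """Return transactions whose latency grew by more than `tolerance`.

    `baseline` and `tonight` map transaction names to latencies in seconds.
    Baseline latencies are assumed to be positive. The returned dict maps
    each regressed transaction to its fractional increase.
    """
    regressions = {}
    for name, base_latency in baseline.items():
        current = tonight.get(name)
        if current is None:
            continue  # transaction was not measured tonight
        change = (current - base_latency) / base_latency
        if change > tolerance:
            regressions[name] = change
    return regressions


# Hypothetical numbers: latencies in seconds at a fixed load level.
baseline = {"search": 0.120, "login": 0.080, "upload": 0.450}
tonight = {"search": 0.125, "login": 0.110, "upload": 0.440}

for name, change in find_regressions(baseline, tonight).items():
    print(f"{name}: {change:+.0%} vs. baseline")
```

With these made-up numbers only `login` is flagged, which narrows the search to whatever was checked in since the previous run. On a noisy system the tolerance may need to be looser, or the comparison made against an average of several recent runs.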
Tools, tools, tools... So, what do we use? I gave a presentation at Google Test Automation Conference in
Final word: monitor your services during performance tests. If you do not have service-related monitoring developed and set up to be used during live operations, you do not need performance testing. If the risks of your service failing are not important enough that you would want to know about it *before* it happens, then you should not be wasting time or money on performance testing. I am incredibly lucky in this area - Google infrastructure is developed by a bunch of people who, if they had a meeting where the topic would be "How to make Goranka's life easy?", could not have done better. I love them - they make my job trivial. At a minimum, I monitor CPU, memory and I/O usage. I cannot see a case where you would want to do less, but you may want to do a lot more on occasion.
7 comments:
Bravo Goranka!!
One of these days you, Rob Sabourin, and I need to do a joint piece about exploratory performance testing! (Don't worry, Rob and I have been working on this piece since the end of day 2 at WOPR1).
The only thing I want to point out to the general public is that the two - four weeks you mention is a testament to the fact that you are starting with very performance aware/concerned developers/admins, your team reacts quickly to performance issues that you do detect, and that you are quite good at what you do.
I'm not saying that this is not achievable by other teams - it absolutely is! But depending on where a team is starting from, it may take having the performance tester on board from day 1, working side by side with the developers/admins to help them become more performance aware, etc. for a while to get to the two - four week time frame.
The notion that buying a tool, sending someone to three days of vendor training and then expecting them to conduct a single test cycle with the tool that generates useful results (that the team has time to react to) during the final two weeks of a project is just as bogus as it has ever been. Or, as I'm often quoted...
Only performance testing at the conclusion of system or functional testing is like ordering a diagnostic blood test after the patient is dead.
Again, fabulous insights. I'll reference it often.
Cheers,
Scott
--
Scott Barber
President & Chief Technologist,
PerfTestPlus, Inc.
Executive Director,
Association for Software Testing
"If you can see it in your mind...
you will find it in your life."
So what would you recommend as an entry-level performance profiling tool for Java applications? "Entry-level", because I'd like to get the students in my undergraduate software engineering course to tune their programs, and this will be the first time most of them have had to worry about performance.
- Greg (gvwilson@cs.toronto.edu)
Thanks Scott - and all good points. Because I work on infrastructure that is really well suited for performance testing, and I have extremely interested development teams, I can turn in projects in 2 - 4 weeks. But, not every project would be like that. Would love to work with you and Rob... One question - does the marsupial get writing credits as well? :)
Greg, I would suggest JProf. It has been a while since I last taught a class, but I would not be against giving different groups different tools and asking them to write reports - what worked well, what could be better, etc. This may end up being the best lesson they get - we do this all the time in "real life."
Great post.
If applications are bridges, performance testing is finding out whether or not the bridge will collapse when people use it... before people use it.
On the comment about open source tools:
I too have used open source tools (pretty much exclusively). I would agree that they are quite powerful. My only complaint is that the vast majority of the tools I've been using have bugs that I tend to stumble upon, which costs me a fair amount of time rechecking my work due to concerns about the validity of my data. Enterprise-level tools offer at least someone to yell at when that happens :).
It's awesome that you have usage data to work with and a rock solid infrastructure. I would say for me (and probably most companies that are looking into performance testing) this is usually not the case, and identifying usage patterns can be a fair amount of work in itself, and an infrastructure that isn't as stable requires more testing and more specific testing. In particular, on unstable systems I think the level of granularity needed to make comparative judgements between codebases simply isn't there, and thus benchmarking isn't as revealing as one would hope.
In lieu of benchmark comparisons, I've found a lot of success by emulating user behavior at peak levels of load for extended periods of time and monitoring system performance, looking for degradation during the course of the test. Another approach that has been extremely effective for me has been running targeted tests that exercise specific services at projected levels and then combining them with a peak load test.
On your comment about functional testing:
I would agree that the rendering of the page is in the domain of functional testing, but that being said I hate making assumptions that it just worked! I have not yet, but am considering trying to have functional tests run at the same time as the peak usage load described above to validate that remaining portion (and it provides even greater coverage of system behavior under load).
The biggest problems that I'm having involve identifying failure. I'm curious how you approach the identification of "failure". A simple thing, but what I've been running into a lot are errors that I wasn't looking for. For example, during 100,000 data submissions, 12 of them happened to be corrupted, and this was unexpected, so it was only by chance that I noticed the corrupted data while running through the logs.
I would preach that log analysis is an absolutely critical facet of performance testing, but what sort of things do you do to define a "failure" in a system aside from scanning logs / monitoring behavior? As simple as it is to say "the system broke", for me there doesn't seem to be a very good science for identifying "broken", and as simple as it may sound, I'm curious how you know when something is broken.
An excellent post Goranka, more please!
James Chang
QA Analyst
Parature Inc
http://www.parature.com/
Performance testing does not stop at measuring response times. Though that is of the most importance, there are other kinds of tests we should bring in. One of them is, by definition, called destructive testing.
It's a unique kind of performance testing where we analyze how the software application behaves when one or more of its back-end applications either slow down or become non-responsive. The expectation is that the front-end application should handle the back-end slowdown. The front-end application should not become non-responsive in that state. Instead, it should fail requests fast, rather than queuing them at the server level, which results in a shutdown or restart of the application server. Present-day online services have complex architectures that talk to "n" number of services to fetch data. So the dependency on other applications becomes vital, and we expect not even a moment of unavailability from the dependent systems. In the worst case of unavailability, the front end should be able to handle it.
This is just a small note on destructive testing, and there is a lot more to discuss.
Hi, Thanks for the article and I watched your video. It is really helpful for a beginner like me.
My question is: if I were to do benchmarking for web servers that are running under virtualization, do I apply the same rules as if I were running one server on one machine? In a virtualized environment, let's say we run 5 VMs, each hosting one web server.
Nice post. It's nice to see performance guidelines that actually work in practice. I've been reading a guide called 'Performance Testing Guidance for Web Applications' (it's published by Microsoft - I know, I know :) ). It's quite verbose but really informative, and your post sums up some of their points and much more. The video from GTAC is great. Please continue posting happily.