GUI Testing: Don't Sleep Without Synchronization
Tuesday, October 28, 2008
So you're working on TheFinalApp - the ultimate end-user application, with lots of good features and a really neat GUI. You have a team that's keen on testing and a level of unit test coverage that others only dream of. The star of the show is your suite of automatic GUI end-to-end tests — your team doesn't have to manually test every release candidate.
Life would be good if only the GUI tests weren't so flaky. Every now and again, your test case clicks a menu item too early, while the menu is still opening. Or it double-clicks to open a tree node, tries to verify the open too early, then retries, which closes the node (oops). You have tried adding sleep statements, which has helped somewhat, but has also slowed down your tests.
Why all this pain? Because GUIs are not designed to synchronize with other computer programs. They are designed to synchronize with human beings, who are not like computers:
- Humans act much more slowly. Well-honed GUI test robots drive GUIs at near theoretical maximum speed.
- Humans are much better at observing the GUI, and they react intelligently to what they see.
- Humans extract more meaningful information from a GUI.
In contrast to testing a server, where you usually find enough methods or messages in the server API to synchronize the testing with the server, a GUI application usually lacks these means of synchronization. As a result, a running automated GUI test often consists of one long sequence of race conditions between the automated test and the application under test.
GUI test synchronization boils down to the question: Is the app under test finished with what it's doing? "What it's doing" may be small, like displaying a combo box, or big, like a business transaction. Whatever "it" is, the test must be able to tell whether "it" is finished. Maybe you want to test something while "it" is underway, like verify that the browser icon is rotating while a page is loading. Maybe you want to deliberately click the "Submit" button again in the middle of a transaction to verify that nothing bad happens. But usually, you want to wait until "it" is done.
How to find out whether "it" is done? Ask! Let your test case ask your GUI app. In other words: provide one or several test hooks suitable for your synchronization needs.
The questions to ask depend on the type, platform, and architecture of your application. Here are three questions that worked for me when dealing with a single-threaded Win32 MFC database app:
The first is a question for the OS. The Win32 API provides a function to wait while a process has pending input events:
DWORD WaitForInputIdle(HANDLE hProcess, DWORD dwMilliseconds)
Choosing the shortest possible timeout (dwMilliseconds = 1) effectively turns this from a wait-for to a check-if function, so you can explicitly control the waiting loop; for example, to combine several different check functions. Reasoning: If the GUI app has pending input, it's surely not ready for new input.
The second question is: Is the GUI app's message queue empty? I did this with a test hook, in this case a WM_USER message; it could perhaps also be done by calling PeekMessage() in the GUI app's process context via CreateRemoteThread(). Reasoning: If the GUI app still has messages in its queue, it's not yet ready for new input.
The third is more like sending a probe than a question, but again using a test hook. The test framework resets a certain flag in the GUI app (synchronously) and then (asynchronously) posts a WM_USER message into the app's message queue that, upon being processed, sets this flag. Now the test framework checks periodically (and synchronously again) to see whether the flag has been set. Once it has, you know the posted message has been processed. Reasoning: When the posted message (the probe) has been processed, then surely messages and events sent earlier to the GUI app have been processed. Of course, for multi-threaded applications this might be more complex.
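The probe technique can be sketched without Win32 at all. The following Python example is only an illustration under stated assumptions: FakeGuiApp is a hypothetical stand-in for a single-threaded GUI app, a queue.Queue plays the role of its message queue, and the "probe" string plays the role of the WM_USER test-hook message. None of these names come from the original project.

```python
import queue
import threading
import time


class FakeGuiApp:
    """Hypothetical stand-in for a single-threaded app with a FIFO message queue."""

    PROBE = "probe"  # plays the role of the WM_USER test-hook message
    QUIT = "quit"

    def __init__(self) -> None:
        self._messages: queue.Queue = queue.Queue()
        self._probe_seen = threading.Event()
        self.handled: list = []
        self._thread = threading.Thread(target=self._message_loop, daemon=True)
        self._thread.start()

    def _message_loop(self) -> None:
        while True:
            msg = self._messages.get()
            if msg == self.QUIT:
                break
            if msg == self.PROBE:
                self._probe_seen.set()  # processing the probe sets the flag
                continue
            time.sleep(0.02)            # simulate the work of handling msg
            self.handled.append(msg)

    def post(self, msg: str) -> None:
        """Asynchronous: enqueue the message and return immediately."""
        self._messages.put(msg)

    # Test hooks used by the test framework.
    def reset_probe_flag(self) -> None:
        self._probe_seen.clear()

    def probe_flag_is_set(self) -> bool:
        return self._probe_seen.is_set()

    def close(self) -> None:
        self.post(self.QUIT)
        self._thread.join()


def wait_for_probe(app: FakeGuiApp, timeout: float = 5.0,
                   poll_interval: float = 0.005) -> bool:
    app.reset_probe_flag()               # 1. reset the flag (synchronously)
    app.post(FakeGuiApp.PROBE)           # 2. post the probe (asynchronously)
    deadline = time.monotonic() + timeout
    while not app.probe_flag_is_set():   # 3. poll the flag
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
    return True


app = FakeGuiApp()
for name in ("open menu", "click item", "expand node"):
    app.post(name)

synced = wait_for_probe(app)
handled_at_sync = list(app.handled)
print(synced, handled_at_sync)
app.close()
```

Since the queue is first-in, first-out and a single thread drains it, the flag can only be set after the three earlier messages have been handled. As the article notes, that guarantee weakens once other threads can enqueue work on their own.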
These three synchronization techniques resulted in fast and stable test execution, without any test flakiness due to timing issues. All without sleeps, except in the synchronization loop.
Applying this idea to different platforms requires finding the right questions to ask and the right way to ask them. I'd be interested to hear if someone has done something similar, e.g. for an Ajax application. A query into the server to check if any XML responses are pending, perhaps?
Isn't this style of coding automated GUI tests too grey or even white? As far as I understand, checking for the availability of GUI elements via the code itself prevents the tests from recognizing any delays or lags in the user interface.
In general, could you provide any pointers to what you would consider the best approach to automated GUI tests regarding the level of reliance on code internals? As a simple example, merely to illustrate the question, should one use the actual Strings to check for menu items or constants in the code?
"I'd be interested to hear if someone has done something similar, e.g. for an Ajax application"
Selenium has the waitForCondition() function. You pass in some javascript that will evaluate to true when the part of the page you're interested in is loaded.
Jemmy, a Java Swing GUI testing library, has a similar method to check if there are pending events in the event queue.
See http://jemmy.netbeans.org/javadoc/org/netbeans/jemmy/QueueTool.html#waitEmpty(long)
Tyburn, a Swing testing harness, also works in a similar way.
It adds an observer to the Swing event queue and then listens for one of its own messages.
Once it receives that, it knows that all previous messages have been processed.
See: http://code.google.com/p/tyburn/
Like eliot, the first thing I thought of was Selenium, which enters a loop checking whether the desired condition is true and, if it isn't, waits a second before re-checking...
The approach of synchronising via the GUI event queue only works if the system under test has no other inputs but the GUI. However, if the system reacts to other events (network messages, for example), then using the event message queue won't work. You don't know how long it will take the system to process the event that will, eventually, cause paint/expose or other GUI events to appear on the client's queue.
nice post there, you touched upon the internals of how sync works...
In case of automated tools like QTP and Robot we generally don't have to worry about these aspects since they are taken care of by the tool itself.
Relying solely on the GUI queue is problematic when there are other threads. Say a GUI operation has ended, but a non-GUI thread is still running (calculating something). After some time, the non-GUI thread invokes a GUI operation (e.g. invokeLater() in Swing).
For a human, it will probably work, since we can understand that we are supposed to wait even if the GUI is not blocked (see excellent post http://googletesting.blogspot.com/2008/10/gui-testing-dont-sleep-without.html)
But a testing machine will break.
In our company, we have implemented a mechanism that does wait for the GUI thread queue to be empty, but every once in a while the application violates this assumption, and we had to add some waitFor() synchronizers.
The previous web address was cut.
Here it is in small pieces:
http://
googletesting.blogspot.com/
2008/
10/
gui-testing-dont-sleep-without.html
(it is worth reading)