
February 13, 2012 | Software Consultancy

48 hours – The Pursuit of Proper Performance

Recently we were approached by a client to do some performance testing of the web application they had written. The budget allowed two days for this task. Ok. No problem. Yes, we can. Naturally I had a question or two though…

WRITTEN BY

Erich Eichinger

Lead Consultant


What EXACTLY Do You Want to Test?

First of all: What do you mean by “testing performance”? Unfortunately it wasn’t that easy to get hold of anyone, but finally I at least got some information.

They were using Java Server Faces for their UI. JSF is known for being very generous in using session state.

This StackOverflow thread [http://stackoverflow.com/questions/628570/jsf-tuning] describes a quite common effect of using JSF in your application.

So our client wanted to know how many concurrent sessions their system could handle. Good. At least something to get started with.

We are supposed to stress and capacity test the web application (terms used as defined in [http://perftestingguide.codeplex.com/]), answering the questions:

  1. Can the system handle a particular steady number of users within the required response times and available memory constraints over a longer period of time?
  2. How many concurrent users can the system handle in total before becoming unresponsive?

We couldn’t get any more information without spending most of the scheduled time chasing the client, hence we had to start with what we already knew.

Since we are familiar with some existing test frameworks such as JMeter, Grinder 3 and CLIF (see http://www.opensourcetesting.org/performance.php for more), we took a closer look at those first.

While all these tools are great, they require creating or recording HTTP request scenarios in their own scripting languages.

Our website under test uses a fair bit of Ajax, so the prospect of creating realistic scripts of all the interactions between browser and web server didn’t seem too appealing given our limited schedule. Let’s look for a plan B.

Build or Buy?

While helping to improve the quality of the application (and the development process in general) we had already written a few acceptance tests using WebDriver.

Being short of time, the idea was simple: Use one of the existing testing libraries and reuse some of the existing acceptance test scenarios.

Unfortunately the tools we tried didn’t make it easy to execute custom Java code for simulating users. I’m not saying it is impossible, but at least there was no obvious solution that matched what we were looking for. The question then was again “make or buy”.

Do we recreate some usage scenarios in one of the tools’ scripting languages and try to plug our code into one of them, or shall we write a basic tool ourselves that fits our needs?

There’s a risk in either approach and it is quite impossible to tell which takes longer, not to mention the risk of choosing an approach that turns out entirely impractical in the end (anyone fancy estimating that?).

We took a step back and thought about our needs:

  1. Identify the different user profiles that use the website. There might be different types of users, some just browsing, others actively submitting data, etc.
  2. Some bit of logic that executes a particular usage scenario.
  3. A load generator that orchestrates the execution of the above scenarios in a defined manner. To create realistic load we need to find out how our users interact with the website.

For various interaction scenarios we already had implementations at hand from our acceptance test suite.

The fact that we used a separate test page model (more on this in a later post) for interacting with the website made it very easy and attractive to reuse these implementations.

We could tick off point 2 from our list, so let’s look at points 1 and 3.
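To give an idea of what this reuse looks like, here is a minimal sketch of a scenario wrapping an existing page object. All names, the URL and the element locator are hypothetical illustrations for this post, not our actual test code:

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;

// hypothetical page object, of the kind our acceptance test suite already provides
class SearchPage {
    private final WebDriver driver;

    SearchPage(WebDriver driver) { this.driver = driver; }

    void open() {
        driver.get("http://localhost:8080/"); // assumed URL of the site under test
    }

    void searchFor(String term) {
        driver.findElement(By.name("q")).sendKeys(term); // assumed name of the search field
        driver.findElement(By.name("q")).submit();
    }
}

// hypothetical scenario interface: one implementation per usage profile
interface UsageScenario {
    void run(WebDriver driver) throws Exception;
}

// the scenario itself is just a thin wrapper around the existing page object
class SiteSearchScenario implements UsageScenario {
    public void run(WebDriver driver) {
        SearchPage page = new SearchPage(driver);
        page.open();
        page.searchFor("performance");
    }
}

The important point is that the scenario knows nothing about load generation; it only drives the page object, so the very same code can back both an acceptance test and a simulated user.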

Identifying Usage Profiles

There are several ways to get more information about usage profiles.

In the past we have used various techniques such as recording requests on the production system and performing some cluster analysis, using Google Analytics to track user activities on the website, or simply coming up with an educated guess based on business estimates and expectations.

Unfortunately none of these techniques were applicable. There was no information from the production system available, no Google tracking tags on the site and we also didn’t get much meaningful response regarding business expectations. Sigh.

We did what developers do: made reasonable assumptions and settled on 70% of users just browsing for information, 25% performing site searches and 5% actually filling out and submitting forms.
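To illustrate, such a weighted mix can be encoded with cumulative weights and a random draw. The ProfileMix class below is invented for this post, not part of our published code:

import java.util.NavigableMap;
import java.util.Random;
import java.util.TreeMap;

// hypothetical helper: picks a usage profile according to a weighted mix
class ProfileMix {
    private final NavigableMap<Double, String> profiles = new TreeMap<Double, String>();
    private final Random random = new Random();
    private double total = 0;

    void add(String profile, double weight) {
        total += weight;
        profiles.put(total, profile); // store the cumulative weight as key
    }

    String pick() {
        // the first profile whose cumulative weight exceeds the random draw
        return profiles.higherEntry(random.nextDouble() * total).getValue();
    }

    public static void main(String[] args) {
        ProfileMix mix = new ProfileMix();
        mix.add("browse", 70);
        mix.add("search", 25);
        mix.add("submit", 5);
        System.out.println(mix.pick()); // prints "browse" roughly 70% of the time
    }
}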

Creating Load

For our acceptance tests we had already implemented the logic for these usage scenarios. Alas, as mentioned above, it didn’t easily plug into the existing load testing tools. Record or code?

Unsure how much effort it might be to get tools like JMeter and Grinder to do what we wanted, we decided to bite the bullet and implement our own little load generator logic.

We took a quick look at ContiPerf [http://databene.org/contiperf] which looked very promising. Unfortunately, for our capacity testing goal we needed exactly what the site mentions under “Roadmap: Supporting ramp-up and random pause time between executions”.

Another dead end. So we implemented our own solution and published it on GitHub [https://github.com/eeichinger/performance-test-demo-basic/zipball/perfblogpost-1].

It turned out only a little more complex than originally anticipated, but in the end worked out surprisingly well.
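The repository above is the place to look for the details; purely as an illustration of the core idea, a linear ramp-up with random think times could be sketched like this (all names and timings here are hypothetical, not the actual implementation):

import java.util.Random;
import java.util.concurrent.atomic.AtomicInteger;

// hypothetical sketch of the load generator core: linear ramp-up of simulated
// users, each running a number of sessions with a random think time in between
class LoadGenerator {
    private final AtomicInteger errors = new AtomicInteger();

    void run(int users, int usersMax, int usersIncr, final int sessions)
            throws InterruptedException {
        int started = 0;
        for (int level = users; level <= usersMax; level += usersIncr) {
            while (started < level) { // start users up to the current load level
                new Thread(new Runnable() {
                    public void run() { simulateUser(sessions); }
                }).start();
                started++;
            }
            Thread.sleep(10000); // hold each load level for a while before ramping up
        }
    }

    private void simulateUser(int sessions) {
        Random random = new Random();
        for (int s = 0; s < sessions; s++) {
            try {
                // execute a scenario picked from the profile mix here, e.g.
                // new SiteSearchScenario().run(new HtmlUnitDriver());
                Thread.sleep(500 + random.nextInt(2000)); // random pause between sessions
            } catch (Exception e) {
                errors.incrementAndGet(); // feeds the error-rate statistics
            }
        }
    }
}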

Putting It Together

Given the approach above we are now able to easily perform stress and capacity tests of our web application using a defined mix of usage profiles.

The resulting little framework supports:

  • creating steady load for a certain period
  • defining a mix of different usage profiles to be executed
  • ramping up the number of users (linearly)
  • reusing already implemented acceptance test scenarios
  • gathering statistics about error rates

To run a stress test generating a steady load you would run:

java PerformanceTestRunner
       -users ${performance.test.users}
       -sessions ${performance.test.sessions}

To run a capacity test, continuously increasing the number of users, run:

java PerformanceTestRunner
       -users ${performance.test.users}
       -sessions ${performance.test.sessions}
       -usersMax ${performance.test.usersMax}
       -usersIncr ${performance.test.usersIncr}

 

Although certainly not perfect, with our rather pragmatic solution we are already able to answer the questions:

  • Can the system handle x users over a longer period of time without failures?
  • How many users can the system handle before breaking down?

Are We There Yet?

In short: No. To make sense of a test run we need to gather metrics for the components involved (web server, database).

This requires setting up some monitoring infrastructure. An easy way to monitor memory usage of a JVM is JConsole.
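For a local JVM, JConsole can attach directly. To watch a remote server JVM during a test run you would start the server with remote JMX enabled and point JConsole at it; the port and server jar below are placeholders, and disabling authentication like this is only acceptable on a trusted test network:

java -Dcom.sun.management.jmxremote.port=9010 \
     -Dcom.sun.management.jmxremote.authenticate=false \
     -Dcom.sun.management.jmxremote.ssl=false \
     -jar your-server.jar

jconsole <hostname>:9010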

Unfortunately JConsole only gets you so far. If you want more detailed information about, say, your web server and database behaviour during a test run, you need to look at more powerful tools like OpenNMS or Hyperic.

Also the numbers you get from such tests don’t make any sense at all if they are gathered from an infrastructure that doesn’t even remotely come close to the production environment.

Ideally you can create an exact copy of your target environment. Of course in larger environments you’ll very likely need to make some compromises for technical or political reasons.

From a technical perspective we found that – although it seemed very appealing – the use of WebDriver for performance testing is problematic.

Neither browsers nor HtmlUnit are fast enough to allow a single client to produce significant load on a web server. This requires distributing the test clients onto several machines to produce enough stress on the target server.

Retrospective

If you are in an environment that doesn’t have some spare LoadRunner licences available, it seems even in 2012 one still has to do a lot of manual groundwork for setting up a performance testing solution.

None of the available open source tools allowed us to confidently get started and concentrate on our goal out of the box. This uncertainty led us to implement our own framework, something that we usually avoid.

Why should we care about performance? As usual the answer is simple: because bad performance poses a risk and may cost money (or even lives).

Performance is a *business* requirement and needs to be captured and treated with the same care as functional requirements.

Although our client at least thought about performance testing *before* going to production, the requirements were still very vague. We needed to work out some of the requirements ourselves – this definitely shouldn’t be the case.

Our client certainly is not alone – from our experience performance is often still treated as a second (or less) class citizen. Until there is a problem in production…

 
