Archived PushToTest site

PushToTest TestMaker 6 Methodology

The PushToTest Methodology


This section contains the following sections:

Working with TestMaker

User Goal Oriented Testing (UGOT)

Method for Black Box and White Box (Profiling) Tests

Applying the Method to SOA and Web Services

Planning: Background and Goals

Definitions: for Use Case and Test Scenario

Additional Use Cases Considered but Not Implemented

Defining the Test Scenario

Identify the Test Environment (Hardware and Software)

Using the XSTest-Pattern for Performance Tests

Calibration Testing

Scalability Index

Understanding TPS

Calibration What-If Chart

Working with TestMaker

PushToTest developed its scalability and performance test methodology to identify and quantify the benefits of business optimization. It is modeled after methods in place at organizations such as General Motors, BEA, Lockheed Martin, Sun Microsystems and the European Union. The methodology makes apparent the tradeoffs a software developer makes when choosing coding techniques, code libraries and APIs. Despite the need for scalability and performance testing, many organizations do not incorporate it into their SOA application practice.

Identifying Service Performance Metrics

The business value of running scalability and performance tests becomes clear after a business formalizes a test method that includes the following:

  1. Choose the right set of test cases. For instance, the test of a multiple-interface, high-volume service will differ from the test of a service that handles periodic requests with huge message sizes. The test needs to address the end-user goals in using the service and deliver actionable knowledge.
  2. Run the tests accurately. Understanding the scalability and performance of a service requires dozens to hundreds of test case runs. Ad-hoc recording of test results is unsatisfactory. Test automation tools are plentiful and often free.
  3. Draw the right conclusions when analyzing the results. Understanding the scalability and performance of a service requires understanding how the throughput, measured as Transactions Per Second (TPS) at the service consumer, changes with increased message size and complexity and with increased concurrent requests.

All of this requires much more than an ad-hoc approach to reach useful and actionable knowledge. An understanding of the scalability and performance of SOA in multiple environments and configurations is necessary. Identifying use cases, test cases and test scenarios, and then analyzing the resulting data, provides that understanding for the SOA application. The PushToTest methodology is available in a set of developer scalability and performance kits. These kits are available either as free downloads under an open-source license or as a commercially licensed product.

User Goal Oriented Testing (UGOT)

The User Goal Oriented Testing (UGOT) method contrasts user goals with what a service (or application) actually delivers. The idea for UGOT testing stems from Alan Cooper countering software developers by saying "If you design for every possible user, no individual user will have their goals met when they use your software application!" Coverage tests are typically pointless, as users typically take a path through the functions in a service, one after another, like chain links. The agile development community approaches this problem by recommending a test-first strategy, which is to write a unit test of a class before writing the class itself. At build time the compiler environment compiles the object code for the class and then runs the unit test against the compiled code. The unit test completes successfully when it supplies example data and validates the response. If the class returns an invalid response, the unit test throws an exception that the build and deploy environment handles.

Unit testing and agile development methods help but are not a complete solution to UGOT techniques. For instance, test first is usually only carried out on a unit level. SOA deploys applications as a collection of services, so testing individual units misses most of the big problems that occur during SOA integration and deployment. UGOT modeled tests check a service as an individual user would – by picking one feature after the next in a chain of service requests, which is a more global and inclusive testing mechanism.

UGOT is ideal for understanding performance and scalability. Internet applications must deliver real value to those who build and operate services. Understanding what is being tested is the heart of the issue, as can be seen in the following chart, which shows the building blocks of Java development tools for building applications.

Possible architecture components for test

The components build on each other in three tiers:

  • At the bottom are the fundamental components for SOAP bindings, XML parsing, JAVA inter-application messaging services (JMS) and clustering.
  • Building on these are service bus components for services to inter-operate at a message level.
  • The top tier provides inter-operability at the application level.

Given these building blocks and tiers, testing should be done across the tiers, where key performance bottlenecks in SOAP bindings and JMS services may happen. Each of the components in the above chart impacts the scalability and performance of an application implementation. Additionally, the testing goal becomes clearer when reviewing the definition of service architecture shown in the following illustration.

SOA as a consumer, service and broker architecture

Which should be tested and when should it be tested? Performance tests normally check at least two of these roles. For instance, one test may check a consumer and a service while another checks a consumer and a broker. The Understanding the Tests table lists options for deciding what part of the system would benefit most from scalability and performance analysis.

Understanding the Tests

| Test Name | Test Benefit | Test Type | Parameters Related to the Scalability Index |
|---|---|---|---|
| Service Interface | Decrease time for service request responses to lower network bandwidth and server hardware costs. | | Message size and concurrent request level. |
| XML Parsing | Decrease time for routing service messages to lower network bandwidth and server hardware costs. | | Schema complexity (depth and element count), document size and concurrent request level. |
| Data Persistence | Decrease time for storing and retrieving messages to lower network bandwidth, server hardware and disk costs. | | Schema complexity (depth and element count), document size, concurrent request level. |
| Data Transformation | Decrease time for transforming a message into a given XML schema to lower network bandwidth and server hardware costs. | | Source and destination schema complexity (depth and element count), request and output document size, concurrent request level. |
| Data Aggregation and Federation | Decrease time for responding to service requests requiring up-stream data to reduce network bandwidth and server hardware costs. | | Schema complexity (depth and element count) for up-stream services, data persistence quantities, message time-to-live (TTL) values. |
| Data Mitigation | Reduce time when a service is unavailable at peak usage to improve service availability and user satisfaction. | | Schema complexity (depth and element count) for each request, document size, concurrent request level. |

Understanding the Tests includes the test goals encountered most often; however, with the fast pace of Internet application service building tools, this list is not exhaustive. Test Name and Test Benefit are self-explanatory, but Test Type and Parameters Related to the Scalability Index need an explanation:

  • The test type is either stateful or stateless, with each requiring a different strategy and meeting different goals.
    • Stateless Testing is for services responding to each request independently from all other requests. It checks the impact of concurrent requests and message payload size on a service.
    • Stateful Testing is similar to stateless testing, but the service provides data persistence, workflow transaction processing, message queuing, sessions and / or data indexing.
  • The parameters related to the Scalability Index define the test inputs that produce the Scalability Index, the usage pattern showing service performance expectations when in production.

The PushToTest method requires a series of steps to be followed, as is detailed in The PushToTest Method.

The PushToTest Method

|  |  |  |
|---|---|---|
|  | Answer the question: How will this test benefit the organization? | Write Test Plan document. |
|  | Identify use cases, test cases and scenario, and test environment (hardware, software and network) | Add the use cases, test cases and test scenario to the Test Plan. |
| Calibration Test | Calibrate the test cases to the test environment. | Identify the use cases driving the test environment to its maximum throughput (as measured in TPS from the client). |
|  | Modify the service and / or test environment to optimize for the best performance based on what was learned in the Calibration Test. | Amend the Test Plan to add the changes for optimization. |
| Full Test | Run the TestScenario | Successful run of TestScenario. |
| Results Analysis | Identify test result metrics and trends against the TestScenario goals. | Present results and achieve adoption by management. |

Here is a brief explanation of the terminology used in The PushToTest Method:

  • Use Case – describes the functionality of a test. For instance, a test comparing XML parsing techniques includes two use cases: the first uses the Xerces DOM parser and the second uses the JAXB XML binding compiler.
  • Test Case – describes the inputs to an individual test. For instance, for a test comparing service throughput at low and high levels for message payloads, define 2 test cases: the first makes requests to the service using an XML document of 500 bytes and the second sends XML documents of 10,000 bytes.
  • Test Scenario – the aggregate of all test and use cases to run a complete test.

There is, however, an important distinction between this SOA test method and typical software testing.

Method for Black Box and White Box (Profiling) Tests

Testing Internet applications for scalability and performance is different from testing software applications and code. SOA testing is focused on understanding how a service responds to increasing levels of concurrent requests, message sizes and response handling techniques. The nature of Internet applications testing is black-box testing; it doesn't matter what happens inside the box.

Code profilers have their place in testing software and software developers often rely on them to learn the location of performance problems. However, black-box testing often yields more actionable knowledge, with these recommendations:

  1. Create a baseline performance metric (a Scalability Index) using black-box performance tests showing Transactions Per Second (TPS) results, measured at the service consumer, with a variety of message sizes, message schema complexities and concurrent request levels.
  2. Compare performance and scalability between multiple servers, consumers or brokers by identifying each server's Performance Index and normalizing the test parameters to avoid reporting false slow-performance results. The test is run properly once the test lab has been calibrated in this Calibration step.
  3. Determine the Performance Index of the service under test and use white-box testing techniques to profile the most time-expensive object operations involved in handling requests. Optimize the software based on the profile.
  4. Continue optimizing the service by repeating steps 2 and 3.
  5. Run the Performance Index and analyze the results.

This is the basis for understanding the PushToTest method of testing services for scalability and performance.

Applying the Method to Internet Applications and Web Services

The PushToTest method surfaces scalability, performance and developer productivity differences between Internet application services built with the JAVA application server and database tools and the same services built with native XML technology. Often we view scalability from two perspectives:

  1. SOAP binding acceleration. Implements a JAVA objects and XQuery approach. The use cases contrast performance and developer productivity based on the typical developer choices of XML parsing techniques (XML binding compiler, Streaming XML parser and DOM approaches for JAVA and XQuery parsing).
  2. Mid-tier caching for service acceleration. Implements a use case with native XML databases (XML DB) and relational databases (RDBMS) to contrast database performance across a variety of XML message sizes and database operations (insert, update, delete and query).

Planning: Background and Goals

Software architects and developers choose XML parsing techniques, service libraries, encoding techniques and protocols when building services using SOA techniques. Each choice has an impact on the scalability and performance of the finished service. In this example we have 3 goals:

  1. Explain the changing landscape of APIs, libraries, encoding techniques and protocols to software architects and developers. The current generation of technology choices changes approximately every 6 to 9 months. For instance, JAXB 1.0 is replaced by JAXB 2.0 and WebLogic Server 8.1 is replaced by WebLogic Server 9.
  2. Identify and use real-world test use case scenarios showing software architects and developers how to choose technology based on their service goals.
  3. Deliver code compatible with the current techniques for building functional and scalability tests (black-box, unit, agile test-first). Many vendor tutorials are not compatible with what was used to build PushToTest. While these packages could be modified with a lot of work, a better resolution was to develop kits with open-source, incorporating public information accessible to anyone.

By pursuing these techniques we build a reusable method for evaluating Internet application performance and system scalability, plus the results feed basic business needs of cost / benefit and feature / function analysis including:

  • Reduced hardware costs and reduced per-CPU licensed software by running more efficient services built with more efficient tools.
  • Increased efficiency with lower network and processor bandwidth.
  • Reduced time solving interoperability problems with more effective tools.

The kit arms business managers and software developers with the evidence needed to recommend and adopt new solutions internally, and get their projects funded.

Scalability and Performance Kit Contents

|  |  |
|---|---|
| Source Code | Complete source code for each use case and test scenario including Ant build scripts to build the kit in a preferred environment. |
| Developers Journal | A Developer's Journal describing in detail: detailed use cases and test scenarios; design decisions and trade-offs; XML and JAVA binding implementation stories; client-side software calling the implemented services; server-side software implementing the services; use case scenario specific findings; installation and performance tuning. |
| Pre-built JARs (start using immediately) | Pre-built JAR and WAR files for immediate use in any environment. |
| TestMaker and TestScenario Scripts | Scripts to stage a scalability and performance test of each use case and the test scenario. |

Definitions: for Use Case and Test Scenario

Use the PushToTest methodology to measure the performance and scalability of SOAP bindings created and deployed using J2EE-based tools and using XQuery with a native XML database. Performance testing compares several methods of receiving a SOAP-based Web Service request and responding to it. Scalability testing looks at the operation of a service as the number of concurrent requests increases. Performance and scalability tests measure throughput as TPS at the service consumer.

The use cases and test scenarios contrast the TPS differences between the two most popular approaches to parse XML in a request, based on the following experiences:

  1. A standard service test method for Web Services has not emerged. For instance, the SPECjAppServer test implements a 4-tier Web browser based application where a browser connects to Web, application and database servers in series. Internet applications, on the other hand, are truly a multi-tier architecture where each tier can make multiple SOAP requests to multiple services and data sources at any time. SPECjAppServer and similar 4-tier tests do not provide the reliable Internet application information needed by capacity planners and software architects.
  2. Software architects and developers specialize by service type. For instance, one developer works with complicated XML schemas in order processing services while another concentrates on building content management and publication services in portals.
  3. The tools, technologies and libraries available for software architects change rapidly.

Responding to these issues, the kit uses use cases common to many SOA environments. These use cases highlight different aspects of SOA creation and present different challenges to the software development tools examined.

  1. Compiled XML binding using BOD schemas. In this scenario, codenamed TV Dinner, a developer needs to code a part ordering service. The service uses Software Technology in Automotive Retailing (STAR) Business Object Document (BOD) schemas.
     On the consumer side, the test code instantiates a previously serialized Get Purchase Order (GPO) request document and adds a predetermined number of part elements to the ordered part. On the service side, the service examines only specific elements within the GPO instead of looking through the entire document.
     The developer's code addresses compartments by their namespace so they add / put only the changing parts of the purchase order. The other compartments (company name, shipping information, etc.) do not change from one GPO request to another. To accomplish this, the TV Dinner scenario uses JAXB-created bindings allowing access to the individual compartments. This XML to object binding framework is used so only the required objects are instantiated.
     The TV Dinner scenario is so named because in a TV dinner the entire dinner is delivered at once while the food is in compartments.
  2. Streaming XML (StAX) Parser. In this scenario, codenamed Sushi Boats, a developer builds a portal receiving a "blog" style news-stream. Each request includes a set of elements containing blog entries. The test code scenario parameters determine the number of blog entries included in each request. The developer needs to take action on the entries of interest and ignore the others. The test code for the Sushi Boats scenario features the JSR 173 Streaming XML (StAX) parser.
     The Sushi Boats scenario is named from observations at a Japanese Sushi Bar where the food passes by in a stream and the diner selects the food they take from a selection of boats.
  3. DOM Approach. In this scenario, codenamed Buffet, a developer writes an order validation service that receives order requests and must read all the elements in a request to determine its response. The test code scenario parameters determine the number of elements inserted into each request. The test code for the Buffet scenario uses Xerces DOM APIs.
     The Buffet scenario is named from the experience of eating at a buffet restaurant and feeling compelled to visit all the stations.

In addition to these use cases, the kit contrasts database performance differences between native XML databases and relational databases when storing XML data containing complex schemas and multiple message sizes in the mid-tier. The kit implements these use cases using both JAVA and XQuery tools.

Defining the Test Scenario

The Test Scenario is the aggregate of all use and test cases. For instance, the kit implements several use cases showing different approaches to XML parsing (DOM, XQuery, StAX and the Binding Compiler). Running these four use cases with two message sizes would result in eight test cases in the test scenario. The Test Scenario table lists the four use cases along with 2 technology choices, 3 message payload sizes and 4 concurrent request levels.

The Test Scenario

| Use Case | Technology Choice | Request Payload Size | Concurrent Requests |
|---|---|---|---|
| JAVA XML Binding Compiler | XQuery Engine | 5,000 bytes | |
| JAVA Streaming XML Parser | JAVA Application Server | 100,000 bytes | |
| | | 500,000 bytes | |
| XQuery Streaming XML Parser | | | |

The test scenario is the aggregate of all the test cases. For instance, one test case uses the XML Binding Compiler running on an XML database at 100,000 bytes and 50 concurrent requests. This test scenario requires 96 test cases to run, given these parameters.

If each test case has a 5-minute warm-up period, takes 5 minutes to run and has a 5-minute cool-down period, the test scenario requires 1,440 minutes (24 hours) to run. Seeing how run time can increase significantly as the number of use cases increases, use caution when adding use cases to a test scenario; add the use cases if the resulting actionable knowledge is necessary.
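
The arithmetic behind the 96 test cases and the 24-hour run time can be checked with a short sketch. The use case, technology and payload values come from the text above; the four concurrent request levels are placeholders chosen for illustration, because the values did not survive in the table, and the function names are hypothetical.

```python
from itertools import product

# Parameter values named in the text and in "The Test Scenario" table.
USE_CASES = ["DOM", "XQuery", "StAX", "Binding Compiler"]
TECHNOLOGY_CHOICES = ["XQuery Engine", "JAVA Application Server"]
PAYLOAD_SIZES_BYTES = [5_000, 100_000, 500_000]
# Assumption: placeholder levels; only the count (four) comes from the text.
CONCURRENT_REQUEST_LEVELS = [50, 100, 150, 200]

# Per-test-case timing from the text, in minutes.
WARM_UP_MIN = 5
RUN_MIN = 5
COOL_DOWN_MIN = 5


def build_test_cases():
    """Return every combination of the four test parameters."""
    return list(product(USE_CASES, TECHNOLOGY_CHOICES,
                        PAYLOAD_SIZES_BYTES, CONCURRENT_REQUEST_LEVELS))


def total_run_minutes(test_case_count):
    """Total minutes when the test cases run one after another."""
    return test_case_count * (WARM_UP_MIN + RUN_MIN + COOL_DOWN_MIN)


test_cases = build_test_cases()
minutes = total_run_minutes(len(test_cases))
print(len(test_cases), "test cases,", minutes, "minutes,", minutes / 60, "hours")
```

Adding one more value to any parameter list multiplies the number of test cases, which is why the run time grows so quickly.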

Identify the Test Environment (Hardware and Software)

The last part of test definition concerns the test environment itself. The goal is to follow commonly used, well known and published best-practices. Here are the choices for the test environment:

  • Server Hardware - the fastest, least expensive server hardware with the best performance. For example, in 2007, this would be a rack mounted IBM 4-CPU Intel Xeon 2.8 GHz model 873 server with 4 Gbytes of memory, dual Gigabit Ethernet adaptors and running Windows 2003 Server - Service Pack 2.
  • Server Load-Generating Consumers - a white-box no-name rack mounted 2-CPU Intel Xeon 2.1 GHz with 2 Gbytes of memory, dual Gigabit Ethernet adaptors and running Windows 2003 Server - Service Pack 2. These systems are referred to as TestNodes.
  • Client-Side Load-Generating Test-Automation Framework - TestMaker is a free, open-source software package supported by a large user community. The kit comes as a set of TestMaker files, but it should be possible to implement the test scenario using any commercial or open-source performance tools.
  • JAVA Virtual Machine - the BEA JRockit JVM 1.5 operates the server-side test software, because past performance testing on JRockit showed speed and stability. Additionally, many datacenters run their JAVA application servers on JRockit.
  • JAVA Specific Optimizations - the consumer and server systems' memory was expanded to 4 Gigabytes (the PC's full capacity), and the -Xms and -Xmx memory settings were set to the same large number.
  • Execution Setup - logging, debugging and monitoring components are disabled to reduce overhead and, after each scenario, all servers are restarted to clean up resource allocations.

At this point, the test is well defined to begin coding the necessary use cases. The definition phase has been completed, so building the test environment and installing the test are the next steps. Once this has been done, the server’s maximum capacity can be tested.

Using the XSTest-Pattern for Performance Tests

When testing Internet applications for scalability and performance, the sheer number of test cases in the test scenario makes it necessary to use test automation. One approach to test automation is a pattern called XSTest, which is a feature of TestMaker. XSTest takes a test sequence, such as the scenario in The Test Scenario, as input, stages each test case in sequence and records the transaction results in an XML-based log file. The XSTest implementation in TestMaker then tallies the results from the log file into a TPS report. XSTest Sequence Diagram Showing How a Test Scenario is Run illustrates the XSTest pattern as a UML sequence diagram:

XSTest Sequence Diagram Showing How a Test Scenario is Run
XSTest sequence diagram

One of the key advantages to the XSTest pattern is its use of jUnit TestCase objects. These are familiar to most developers and also easily learned. The kit implements the tests as TestCase objects for use in the load test and for reuse as functional tests.
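
The stage, log and tally flow described for XSTest can be sketched in a few lines. This is an illustration of the idea only, not TestMaker's implementation: the XML element and attribute names, the function names and the stand-in `fake_run` callable are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def run_sequence(test_sequence, run_test_case):
    """Stage each test case in order and return an XML log as a string.

    test_sequence: list of (name, duration_seconds) tuples.
    run_test_case: hypothetical callable (name, duration_seconds) that
                   returns the number of completed transactions.
    """
    log = ET.Element("testlog")
    for name, duration_seconds in test_sequence:
        completed = run_test_case(name, duration_seconds)
        ET.SubElement(log, "testcase", name=name,
                      seconds=str(duration_seconds),
                      completed=str(completed))
    return ET.tostring(log, encoding="unicode")


def tally_tps(xml_log):
    """Read the XML log and return {test case name: TPS}."""
    report = {}
    for case in ET.fromstring(xml_log).iter("testcase"):
        seconds = float(case.get("seconds"))
        report[case.get("name")] = int(case.get("completed")) / seconds
    return report


# Stand-in for a real load run: pretends every test case completes
# 10 transactions for each second of run time.
def fake_run(name, duration_seconds):
    return 10 * duration_seconds


sequence = [("stax-5000-bytes", 300), ("dom-5000-bytes", 300)]
print(tally_tps(run_sequence(sequence, fake_run)))
```

Keeping the log step separate from the tally step means the same log can be re-analyzed later without rerunning the load test.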

Calibration Testing

When defining the test scenario, a set of assumptions is built into the Test Plan. For instance, the test scenario in The Test Scenario assumes a test case can achieve satisfactory throughput (in TPS) with a request payload size of 500,000 bytes and 200 concurrent requests. A Calibration Test identifies a service agent's optimum throughput, measured in TPS at the consumer, against the given testing hardware and software. For instance, consider the data in Payload Size, Concurrent Agents and Transactions-Per-Second Results.

Payload Size, Concurrent Agents and Transactions-Per-Second Results

| Payload Size (Bytes) | Concurrent Agents | Transactions Per Second (TPS) |
|---|---|---|

Payload Size, Concurrent Agents and Transactions-Per-Second Results lists the two input values for the test: the message size sent to the service and the number of concurrent test agents. Using these values, XSTest operates a test case by instantiating one thread for each concurrent request. Each thread dynamically generates data of the defined payload size and sends it to the service as a request. The thread then receives the server's response, validates it, handles any exceptions and logs the response as a completed transaction. The thread repeats the same steps until the test case period is finished. A Bar Chart of the Results Showing the Maximum Throughput Values displays the same results as a bar chart that clearly shows the maximum throughput values.
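
The thread-per-concurrent-request loop described above can be sketched as follows. This is a minimal illustration rather than the XSTest code: `run_test_case`, `send_request` and `echo_service` are hypothetical names, and the stub service only simulates a short processing delay.

```python
import threading
import time


def run_test_case(send_request, payload_size, concurrent_requests, period_seconds):
    """Drive send_request from one thread per concurrent request.

    send_request is a hypothetical callable that takes the payload bytes
    and returns True when the response validates.
    Returns (completed_transactions, tps).
    """
    completed = 0
    lock = threading.Lock()
    deadline = time.monotonic() + period_seconds

    def agent():
        nonlocal completed
        while time.monotonic() < deadline:
            payload = b"x" * payload_size  # generate the payload on the fly
            try:
                ok = send_request(payload)
            except Exception:
                ok = False  # a failed request is not a completed transaction
            if ok:
                with lock:
                    completed += 1

    threads = [threading.Thread(target=agent) for _ in range(concurrent_requests)]
    started = time.monotonic()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    elapsed = time.monotonic() - started
    return completed, completed / elapsed


# Stub standing in for a real service call (assumption for the example).
def echo_service(payload):
    time.sleep(0.01)
    return len(payload) > 0


completed, tps = run_test_case(echo_service, payload_size=1000,
                               concurrent_requests=4, period_seconds=0.2)
print(completed, "transactions completed")
```

TPS is computed at the consumer, from the transactions the threads actually completed, which matches where the methodology measures throughput.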

A Bar Chart of the Results Showing the Maximum Throughput Values
Throughput as payload increases

Payload Size, Concurrent Agents and Transactions-Per-Second Results and Scalability Index provide information about the service under test, including:

  • As payload increases, TPS reduces proportionately. The test is not saturating or underutilizing the server, network or consumer. If TPS had increased, the testing level was not high enough; if it had stayed flat or dropped sharply, the testing level was not low enough.
  • The reduction in TPS is not proportional to the increase in request size. When the network and consumer are not at high enough activity levels, the service has a poorly performing request processor. One reason this could happen is if the message parsing system is not allocating resources (memory, network socket connections or message queues) of the correct size for the demands of the test.
  • TPS takes a significantly larger reduction for test cases above 3000 bytes of payload. In this case, a code profiler can be used to check whether the service is experiencing a buffer overflow or an undersized object list.

While A Bar Chart of the Results Showing the Maximum Throughput Values identifies a few things about the service, additional information is needed to reach a conclusion. The test parameter values needed are listed in Parameter Values Required to Calibrate a Test.

Parameter Values Required to Calibrate a Test

|  |  |
|---|---|
| Request Payload Size | Service request message-body size (in bytes) |
| Response Payload Size | Service response message-body size (in bytes) |
| Concurrent Requests | Total number of concurrent requests |
| Transactions Per Second (TPS) | Ratio of total completed responses to execution time (in seconds) |
| Network Utilization | % network bandwidth (measured from server) |
| Server CPU Utilization | % server processor bandwidth |
| Consumer CPU Utilization | % consumer / client processor bandwidth |
| Average Transaction Time | Average service response time (measured by consumer / client) |
| Minimum Transaction Time | Minimum service response time (measured by consumer / client) |
| Maximum Transaction Time | Maximum service response time (measured by consumer / client) |
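
The consumer-side values in this table follow directly from the recorded response times. The sketch below shows one way to derive them; the function name and the sample numbers are hypothetical.

```python
def summarize(response_times_seconds, execution_seconds):
    """Derive TPS and transaction-time statistics for one test case.

    response_times_seconds: response time of each completed transaction,
                            as measured by the consumer.
    execution_seconds: total execution time of the test case.
    """
    completed = len(response_times_seconds)
    return {
        # Ratio of total completed responses to execution time.
        "tps": completed / execution_seconds,
        "average_transaction_time": sum(response_times_seconds) / completed,
        "minimum_transaction_time": min(response_times_seconds),
        "maximum_transaction_time": max(response_times_seconds),
    }


# Illustrative numbers only: three transactions in a 3-second run.
print(summarize([0.25, 0.5, 0.75], execution_seconds=3.0))
```

Network and CPU utilization are not covered here because they have to be sampled on the machines themselves rather than computed from the transaction log.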

In a stateless system, a service allocates for each request its own memory, CPU bandwidth, network bandwidth and other resources needed to generate a response. A stateless calibration test therefore identifies resource bottlenecks. Network and CPU Utilization shows the test scenario results, including network utilization and server and consumer CPU utilization values.

Network and CPU Utilization

|  |  | Transactions Per Second - TPS | Network Utilization | Server CPU Utilization | Consumer CPU Utilization |
|---|---|---|---|---|---|

The results in Network and CPU Utilization give some idea of what is going on during the test scenario:

  1. The test is server-bound, preventing greater throughput (TPS). When payload sizes are less than 4000 bytes, server CPU utilization is high but not saturated. At 4000 bytes and greater, the CPU is saturated.
     Stateless tests require resources to handle the concurrent request load. Take away a resource (CPU bandwidth or free memory) to operate on larger payloads and response times increase, lowering overall TPS.
  2. The scale of the drop indicates there is a significant problem in the server. The payload size increases by a factor of 5, from 1000 to 5000 bytes, but TPS values decrease by a factor of 14, from 10.376 to 0.731. In a stateless test, the drop in TPS should be roughly proportional to the increase in input size.
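
The comparison in the second point is simple arithmetic on the figures quoted above. The function name is hypothetical.

```python
def scaling_factors(payload_small, payload_large, tps_small, tps_large):
    """Return (payload growth factor, TPS reduction factor)."""
    payload_factor = payload_large / payload_small
    tps_factor = tps_small / tps_large
    return payload_factor, tps_factor


# Figures from the text: 1000 to 5000 bytes, 10.376 to 0.731 TPS.
payload_factor, tps_factor = scaling_factors(1000, 5000, 10.376, 0.731)
print(f"payload grew {payload_factor:.1f}x, TPS fell {tps_factor:.1f}x")
```

A TPS reduction close to the payload growth would point to proportional behavior; here the reduction is nearly three times larger, which is what flags the server-side problem.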

Since this is a stateless test, each request should be served from an independent group of resources (threads, memory, etc.). Watching CPU and memory utilization levels is an appropriate way to identify scalability and performance thresholds.

However, this is not the case for stateful services such as database and workflow applications. Stateful services use data caches and server queues, and typically carry session manager overhead. These items impact service CPU and memory utilization levels independent of the consumer request load.

For a scalability and performance test running in a defined software and hardware environment, a calibration test helps determine the appropriate service concurrent request levels and message payload sizes. The results show a Scalability Index for the service.

Scalability Index

A Scalability Index is a function of service performance (in TPS) as concurrent request levels and payload sizes change in a test scenario as shown in Scalability Index.

Scalability Index
Calibration testing

There are three distinct parts to this Scalability Index:

  • View A: Testing at the 10, 20 and 30 concurrent request (CR) levels shows the TPS level rising with them. Imagine the conclusion if only the CR levels in View A were tested: the assumption would be that the service scales to handle any increase in CR levels. However, at 40 CRs the TPS value begins to decrease.
  • View C: When testing at the higher levels (70, 80 and 90 CRs), the TPS value does not change noticeably. Imagine the conclusion if only the CR levels in View C were tested: the assumption would be that the service cannot handle an increase in CR levels. System managers typically buy overly equipped server hardware to contend with situations where the server receives too many requests to handle efficiently. The managers could save a lot of money by buying multiple smaller (less expensive) systems and using load balancing to split large loads into multiple smaller loads handled by the multiple servers.
  • View B: Calibration tests seek TPS values for the 40, 50 and 60 CR levels. In this range the service responds to a moderate number of CRs at a TPS rate acceptable to the organization hosting the service.

View B shows the optimum, where the service is neither underdriven (too few requests) nor overdriven (too many requests) within its range of CR levels. If View B is the result of the Calibration Test, it defines the range of CR levels to use in the Full Test of the test scenario. At this point, additional scenarios that use alternative APIs and products can be run with the CR levels from View B to compare the TPS results.
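
One way to separate a measured series into the three views is sketched below. The classification rule, the 5% tolerance, the function name and the TPS numbers are all assumptions made for illustration; they are not values from the PushToTest kit.

```python
def classify_views(tps_by_cr, flat_tolerance=0.05):
    """Label each concurrent request (CR) level as view "A", "B" or "C".

    A: TPS is still rising compared with the previous level.
    B: TPS has passed its peak and is changing noticeably.
    C: TPS has passed its peak and changes by less than flat_tolerance
       (relative to the previous level).
    """
    views = {}
    past_peak = False
    previous_tps = None
    for cr in sorted(tps_by_cr):
        tps = tps_by_cr[cr]
        if previous_tps is None or (not past_peak and tps >= previous_tps):
            views[cr] = "A"
        else:
            past_peak = True
            change = abs(tps - previous_tps) / previous_tps
            views[cr] = "C" if change < flat_tolerance else "B"
        previous_tps = tps
    return views


# Hypothetical measurements shaped like the chart described above.
sample = {10: 20.0, 20: 38.0, 30: 55.0, 40: 50.0, 50: 44.0,
          60: 38.0, 70: 37.0, 80: 37.0, 90: 36.5}
views = classify_views(sample)
print([cr for cr, view in views.items() if view == "B"])
```

With this sample the levels labeled "B" are 40, 50 and 60, the range a Full Test would then reuse.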

In this example, the CPU, network and memory utilization values are used to determine where performance and scalability bottlenecks occur. As a note of reference, CPU and memory bandwidth are helpful in stateless tests. CPU and memory bandwidth are usually meaningless for stateful tests.

Understanding TPS

Transactions Per Second (TPS) is simple to measure, but the results may be counterintuitive. TestMaker displays a system’s Scalability Index as a chart like the one in Test Results of a System’s Throughput at 4 Concurrent Requests Levels.

Test Results of a System’s Throughput at 4 Concurrent Requests Levels

System throughput is measured as the number of transactions a system handles as more requests are received. A perfect information system responds to each request in the same time regardless of the number of requests; it increases its transactions per second to maintain a constant response time. Charting a perfect system's scalability shows the TPS rate increasing in equal proportion to the number of received requests; it is a linear relationship.

As throughput increases

For instance, if a perfect system handles 100 concurrent requests in 10 seconds with a 2 second response time, then maintaining the 2 second response time, it should handle 200 concurrent requests in the same 10 second period. Scalability Index for a Perfect System shows the Scalability Index of a perfect system - a system with linear scalability.

Scalability Index for a Perfect System

At each concurrent request level, the system handles requests at a measured rate in transactions per second, which reflects the system's response time. As concurrent request levels increase, a perfect system responds to each request just as quickly, so the number of transactions increases in equal proportion. TPS keeps going up and to the right in equal proportion to the number of requests.

Linear scalability

For instance, a system at 100 concurrent requests that handles 1000 requests in a 10 second period has a handling rate of 100 transactions per second. The same system at 200 concurrent requests handles 2000 requests in the same 10 second period, increasing the handling rate to 200 transactions per second. That is perfect scalability, the Holy Grail of performance testing: receiving more requests does not slow down the overall system response time. However, a typical system hitting a bottleneck has the performance shown in A Service Exhibiting a Performance Bottleneck.
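The arithmetic in the linear scalability example can be sketched as follows. The helper names are hypothetical, and the scaling efficiency ratio is an illustrative measure added here, not a metric the methodology defines.

```python
def tps(completed_transactions, period_seconds):
    """Handling rate: transactions completed per second of test time."""
    return completed_transactions / period_seconds


def ideal_tps(base_crs, base_tps, crs):
    """TPS a perfectly (linearly) scaling system would reach at `crs`."""
    return base_tps * crs / base_crs


def scaling_efficiency(base_crs, base_tps, crs, measured_tps):
    """Measured TPS as a fraction of the linear ideal (1.0 = perfect)."""
    return measured_tps / ideal_tps(base_crs, base_tps, crs)


# Figures from the example: 1000 requests in 10 s at 100 CRs,
# then 2000 requests in 10 s at 200 CRs.
rate_100 = tps(1000, 10)
rate_200 = tps(2000, 10)

print(rate_100, rate_200)
print(ideal_tps(100, rate_100, 200))
print(scaling_efficiency(100, rate_100, 200, rate_200))
```

An efficiency of 1.0 at every CRs level is the perfect case; a value that falls as CRs levels rise points to the kind of bottleneck discussed next.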

A Service Exhibiting a Performance Bottleneck

As the system receives a larger number of concurrent requests it slows down when responding to each request. If the load increased past 400 concurrent requests, the system would eventually reach zero transactions per second. Many systems checked for scalability have this problem. The Scalability Index helps system managers plan system capacity to achieve the desired throughput needed to keep user efficiency constant, while helping developers understand how their design and coding decisions impact performance. However, the situation depicted in A Service Exhibiting a Scalability Problem is more typical of what is observed.

Bottlenecked performance

A Service Exhibiting a Scalability Problem

Hitting a performance bottleneck

As shown in the first three columns, the system is capable of handling an increased level of concurrent requests. However, the fourth column shows the system hits an upper limit in handling transactions, typically resulting from a database indexing problem, a full data cache or a saturated network connection.

Results from this simple TPS method can be confusing, as shown by the scalability test results in Parameter Values Required to Calibrate a Test.

Parameter Values Required to Calibrate a Test

Concurrent Requests | Transactions Per Second (TPS) | Completed Transactions | Average Response Time (milliseconds)

The results in Parameter Values Required to Calibrate a Test show that TPS increases only slightly even though the test makes 2.5 (25 / 10) times more concurrent requests. With linear scaling, the TPS value would have increased 2.5 times to 0.775 (0.31 TPS at 10 concurrent requests times 2.5). Free-running threads generate the concurrent requests with no sleep time between requests, as illustrated in Throughput (TPS) Decrease as Service Response Time Increase as Measured from the Consumer. Their job is to keep making requests to the server during the test period. However, the average response time with 25 users is 5.34 times longer (65,344 milliseconds at 25 users divided by 12,234 milliseconds at 10 users). Each thread waits longer for every response, so it completes and logs fewer transactions during the test period, which holds the TPS value down.

Throughput (TPS) Decrease as Service Response Time Increase as Measured from the Consumer
Concurrently running transactions
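The comparison of the 10 and 25 concurrent request results can be checked with a few lines of arithmetic. This is only a worked example using the figures quoted in the text; the variable names are illustrative.

```python
# Figures quoted in the text for the two test runs.
base_crs, base_tps, base_response_ms = 10, 0.31, 12_234
test_crs, test_response_ms = 25, 65_344

# How much more load the second run applies.
load_factor = test_crs / base_crs

# TPS a linearly scaling service would have reached at 25 CRs.
expected_tps = base_tps * load_factor

# How much longer each request takes at 25 CRs.
response_ratio = test_response_ms / base_response_ms

print(f"load factor:    {load_factor:.1f}x")
print(f"expected TPS:   {expected_tps:.3f}")
print(f"response ratio: {response_ratio:.2f}x")
```

The response time grows faster than the load does, which is why the measured TPS falls well short of the linear expectation.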

When a test increases the number of concurrent users, one of three things can happen:

  1. The server takes less time (on average) to respond than at lower CRs levels. In this condition each CR finishes sooner, logs a response (a transaction) and makes its next request much sooner. TPS increases from lower CRs levels.
  2. The server takes the same time (on average) to respond as at lower CRs levels. In this condition each CR takes the same amount of time but there are more CRs running concurrently. TPS goes up from the lower CRs levels proportionately to the CRs’ increase.
  3. The server takes more time (on average) to respond than at the lower CRs levels. In this condition each CR finishes later resulting in fewer opportunities for the server to handle more requests. TPS drops proportionately to the increased response time.
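
The three outcomes above can be illustrated with an idealized model. It assumes free-running threads that issue the next request the moment a response arrives, so each thread completes one transaction per response time and TPS is roughly the CRs level divided by the average response time. The function name and the sample response times are hypothetical, and real measurements will deviate from this simplification.

```python
def closed_loop_tps(crs, avg_response_seconds):
    """Idealized TPS for free-running threads with no sleep time.

    Each of the `crs` threads completes one transaction every
    `avg_response_seconds`, so throughput is crs / response time.
    """
    return crs / avg_response_seconds


# Hypothetical baseline: 10 CRs with a 2 second average response time.
baseline = closed_loop_tps(10, 2.0)

# Doubling to 20 CRs under each of the three outcomes.
faster = closed_loop_tps(20, 1.0)  # 1. server responds in less time
same = closed_loop_tps(20, 2.0)    # 2. server responds in the same time
slower = closed_loop_tps(20, 8.0)  # 3. server responds in more time

print(baseline, faster, same, slower)
```

In this model TPS falls below the baseline only when the response time grows by a larger factor than the CRs level; a smaller slowdown still raises TPS, just by less than the proportional amount.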
Calibration What-If Chart

PushToTest TestMaker cannot explore everything that might happen during a Calibration test. However, the Calibration What-If Chart lists a few significant issues to be aware of.

Calibration What-If Chart

Test Experience | Likely Problem | What to Do Next
Increase CRs with a decrease in TPS | Check the average response time. | Run a test case comparing response times as CRs increase. Identify the least acceptable response time and work back from there.
Increase CRs with an increase in TPS | Test is not calibrated for high enough CRs and payload sizes. | Run calibration test to determine optimal Scalability Index and set correct CRs and payload-size levels.
Increase CRs with little change in TPS | CRs levels are set too high. Check CPU utilization if doing a stateless test. | Run another test with the CRs level reduced by 50%.
Server CPU at 95% utilization and increasing CRs levels increases TPS | This is probably a stateful system test. | Run a test case to determine the service-under-test Scalability Index.
Consumer CPU at 95% utilization and increasing CRs levels decreases the TPS | CRs levels are too high for the number of load generating consumers. | Add more load generating consumers.
Consumer CPU utilization at 15%, Server CPU utilization at 30% and increasing CRs levels barely changes TPS | Network is probably saturated. | Check network bandwidth utilization. Add network adaptors to server or consider a faster network.
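
The chart can also be read as a set of rules. The sketch below is one hypothetical encoding, not part of TestMaker: the function name, the 5% band treated as "little change", and the use of the chart's 95%, 30% and 15% CPU figures as cut-offs are all assumptions made for illustration.

```python
def what_if(tps_change, server_cpu=None, consumer_cpu=None,
            flat=0.05, busy=95):
    """Suggest a likely problem after raising the CRs level.

    tps_change: relative TPS change (0.10 means TPS rose by 10%).
    server_cpu, consumer_cpu: CPU utilization in percent, if known.
    flat, busy: assumed thresholds for "little change" and "saturated".
    """
    # CPU-specific rows are checked first because they are more specific.
    if consumer_cpu is not None and consumer_cpu >= busy and tps_change < -flat:
        return "CRs levels too high for the load generating consumers"
    if server_cpu is not None and server_cpu >= busy and tps_change > flat:
        return "probably a stateful system test"
    if abs(tps_change) <= flat:
        low_cpu = (server_cpu is not None and consumer_cpu is not None
                   and server_cpu <= 30 and consumer_cpu <= 15)
        if low_cpu:
            return "network is probably saturated"
        return "CRs levels are set too high"
    if tps_change < 0:
        return "check the average response time"
    return "not calibrated for high enough CRs and payload sizes"


print(what_if(-0.20))                                # TPS fell
print(what_if(0.30))                                 # TPS rose
print(what_if(0.01))                                 # TPS barely moved
print(what_if(0.00, server_cpu=30, consumer_cpu=15)) # idle CPUs, flat TPS
```

Each returned string corresponds to a Likely Problem entry in the chart; the matching What to Do Next entry gives the follow-up test.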

Calibration Testing shows a server's possible Transactions Per Second (TPS) levels given its equipment, software and network configuration. The next step in the PushToTest method is to run the real test at the calibrated test levels.

Additional documentation, product downloads and updates are available from PushToTest. While the PushToTest TestMaker software is distributed under an open-source license, the documentation remains (c) 2011 PushToTest. All rights reserved. PushToTest is a trademark of the PushToTest company.