Resolving a web application performance problem is as much an art as it is a technical challenge. Although performance problems can have many causes, the outcome is always the same: unscalable, slow-to-respond application software. The best way to resolve such problems is an all-out systems approach in which the application software, the network, and the underlying computing hardware are all considered and evaluated. In this article we describe examples of performance problems and several approaches to solving them.
A performance problem could be caused by any number of things: a poorly designed architecture, an underpowered CPU, limited network bandwidth, or a combination of several factors. For example, a higher-than-expected load can easily overwhelm a system's resources. However, a higher volume is not always required to uncover performance problems. Poorly designed software that does not handle resource allocation and contention properly can easily cause deadlocks that eventually lead to serious performance problems, even at a normal load.
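One common way to handle contention for shared resources without deadlocking is to acquire locks in a single, global order. The sketch below illustrates that technique in Python; the `Account` class and `transfer` function are hypothetical examples, not code from the projects described here. Because every thread requests the two locks in the same order (by `account_id`), two transfers running in opposite directions cannot each hold one lock while waiting for the other.

```python
import threading


class Account:
    """Hypothetical shared resource guarded by its own lock."""

    def __init__(self, account_id: int, balance: int) -> None:
        self.account_id = account_id
        self.balance = balance
        self.lock = threading.Lock()


def transfer(src: Account, dst: Account, amount: int) -> None:
    """Move funds between accounts using a fixed lock-acquisition order."""
    if src is dst:
        # Locking the same non-reentrant lock twice would deadlock; nothing to do.
        return
    # Always lock the account with the lower id first, no matter which
    # direction the transfer goes. This global ordering prevents the
    # circular wait that causes a deadlock.
    first, second = sorted((src, dst), key=lambda acct: acct.account_id)
    with first.lock:
        with second.lock:
            src.balance -= amount
            dst.balance += amount


if __name__ == "__main__":
    a, b = Account(1, 1000), Account(2, 1000)
    # Half the threads transfer a -> b and half b -> a, the classic deadlock setup.
    threads = [threading.Thread(target=transfer, args=(a, b, 1)) for _ in range(100)]
    threads += [threading.Thread(target=transfer, args=(b, a, 1)) for _ in range(100)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(a.balance + b.balance)  # 2000: money is conserved and no thread hangs
```

If the locks were instead taken in argument order (`src` first, then `dst`), the opposing transfers could block each other indefinitely under load, which is exactly the kind of problem that may stay hidden at low volume.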
Regardless of what causes a performance problem in a web-based application, the first step in resolving it is to create a performance plan document, even if it is a short one. When you put this document together, identify and involve all the domain experts relevant to the web-based application at hand.
Performance troubleshooting of a web-based system, much like mind-body healing, requires a holistic approach. A web-based system is much more than an application server, a few thousand lines of code, a database, and a firewall. It is more than the sum of its parts: it is an interconnected whole. Therefore, the most effective way to solve a performance problem is to take a total systems approach, in which every part of the application domain is examined from the perspective of performance.
Why is it important, as a manager or a lead engineer, to consider every aspect of an application? The foundation of any web system includes software, hardware, and the network. A short leg on any one of these three can cause performance problems. A system's performance depends as much on a well-tuned, well-configured network as it does on fast computing hardware or a well-designed software architecture. Making your SQL statements more efficient will not solve a performance problem caused by a lack of network bandwidth. Faster computers will not solve a performance problem caused by a poorly configured routing table. Replacing your current network with faster fiber optics will not solve a performance problem caused by a lack of memory or an underpowered CPU.
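A simple way to avoid fixing the wrong leg is to measure where request time actually goes before changing anything. The Python sketch below assumes you already collect a per-request breakdown of time spent in the network, in application CPU, and in the database (the `TierTiming` record and the three-tier split are assumptions for illustration); it reports which tier dominates and by how much.

```python
from dataclasses import dataclass


@dataclass
class TierTiming:
    """Time one request spent in each layer, in milliseconds (hypothetical breakdown)."""
    network_ms: float
    app_cpu_ms: float
    database_ms: float


def dominant_tier(samples: list[TierTiming]) -> tuple[str, float]:
    """Return the tier with the largest share of total time, and that share (0..1)."""
    totals = {
        "network": sum(s.network_ms for s in samples),
        "application CPU": sum(s.app_cpu_ms for s in samples),
        "database": sum(s.database_ms for s in samples),
    }
    grand_total = sum(totals.values())
    if grand_total == 0:
        raise ValueError("no time recorded in any tier")
    tier = max(totals, key=totals.get)
    return tier, totals[tier] / grand_total


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from a real system.
    samples = [TierTiming(120, 30, 50), TierTiming(140, 25, 35)]
    tier, share = dominant_tier(samples)
    print(f"{tier}: {share:.0%} of request time")  # network: 65% of request time
```

In a case like this, tuning SQL statements could at best recover the database's share of the time; the breakdown points at the network first.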
For example, on one project we ran into a performance problem during deployment. After an initial investigation, the team decided to add more CPUs and memory to the server machines. Performance improved at first, but the system then slowly degraded and we were back where we started. After further review and testing, we determined that the problem was a combination of an unscalable architecture and a poorly configured network routing table. The good news was that we eventually solved the problem. The bad news was that we had spent over $10,000 on hardware that had not solved it, along with a good amount of time chasing the wrong solution. Sometimes only a combination of software design, hardware, and network changes will solve a performance problem.
In another project we discovered that the underlying architecture of the system we were developing was not scalable: too few users were able to log on to the system. The project was using MQSeries as the input queue for incoming requests, and our initial investigation showed that the request input module was using only a single input queue. In our first attempt, we tried to solve the problem by changing the software design to a multi-threaded, multi-process architecture that used a number of input queues. To our surprise, however, we discovered during a production run that because many more users were now able to log in, the system itself ran out of key resources such as CPU and memory. It was as if we had fitted a pickup truck with a more powerful engine so that it could haul a heavier payload, only to find that it now needed a larger frame, a sturdier chassis, and wider tires. Once our architecture could handle larger volumes, it needed a faster machine to carry the larger load. Consequently, the move to a multi-threaded, multi-process design had to be combined with faster hardware before the performance problem was solved.
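The shift from a single input queue to several queues served in parallel can be sketched as follows. This is a minimal in-process illustration, assuming Python's standard `queue.Queue` as a stand-in for the message-queue middleware; it is not the MQSeries API, and `handle_request` and `MultiQueueDispatcher` are hypothetical names. Each queue gets its own worker thread, and each queue is bounded (`max_depth`) so that a surge of logins makes producers wait instead of letting memory grow without limit, which is one way to keep added concurrency from exhausting the machine.

```python
import itertools
import queue
import threading


def handle_request(request: str) -> str:
    # Hypothetical request handler; real work (database calls, etc.) goes here.
    return request.upper()


class MultiQueueDispatcher:
    """Fan incoming requests out over several bounded queues, one worker each."""

    def __init__(self, num_queues: int, max_depth: int) -> None:
        # maxsize bounds each queue: put() blocks when a queue is full,
        # applying backpressure instead of consuming unbounded memory.
        self.queues = [queue.Queue(maxsize=max_depth) for _ in range(num_queues)]
        self._next_queue = itertools.cycle(range(num_queues))
        self._submit_lock = threading.Lock()
        self._results_lock = threading.Lock()
        self.results: list[str] = []
        self.workers = [
            threading.Thread(target=self._serve, args=(q,), daemon=True)
            for q in self.queues
        ]
        for worker in self.workers:
            worker.start()

    def submit(self, request: str) -> None:
        # Round-robin placement spreads requests evenly across the queues.
        with self._submit_lock:
            index = next(self._next_queue)
        self.queues[index].put(request)

    def _serve(self, q: queue.Queue) -> None:
        while True:
            request = q.get()
            if request is None:  # sentinel: no more work for this queue
                return
            result = handle_request(request)
            with self._results_lock:
                self.results.append(result)

    def shutdown(self) -> None:
        # A sentinel queued behind all pending requests lets each worker
        # finish its backlog before exiting.
        for q in self.queues:
            q.put(None)
        for worker in self.workers:
            worker.join()


if __name__ == "__main__":
    dispatcher = MultiQueueDispatcher(num_queues=4, max_depth=10)
    for i in range(100):
        dispatcher.submit(f"login-{i}")
    dispatcher.shutdown()
    print(len(dispatcher.results))  # 100
```

The number of queues and the queue depth are the tuning knobs here; as the project above showed, raising them only helps if the CPU and memory behind the workers can keep up.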
In an ideal world, you would catch and resolve all performance problems by load testing prior to deployment. In the real world, however, the test and production environments are not always identical. Even when they are, a load test is only an approximation of the real load the application will face in production. Therefore, performance bottlenecks and slowdowns often show up right after a new project is deployed. The key is being able to resolve such problems efficiently and without too much guesswork, thereby avoiding lost deployment time and potential lost business revenue. That is why it is important to plan ahead for performance problems.
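Even an approximate load test is more useful when it reports latency percentiles rather than a single average, because slowdowns usually appear first in the tail. The Python sketch below drives a request function from a pool of threads that simulate concurrent users and summarizes the latencies; `fake_handler` is a hypothetical placeholder for a call to the system under test, and a real test would issue actual HTTP requests against a production-like environment.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def fake_handler() -> None:
    # Hypothetical stand-in for one request to the system under test.
    time.sleep(0.01)


def timed_call(handler) -> float:
    """Run one request and return its latency in seconds."""
    start = time.perf_counter()
    handler()
    return time.perf_counter() - start


def run_load_test(handler, concurrent_users: int, total_requests: int) -> dict:
    """Issue total_requests calls using concurrent_users threads; summarize latency."""
    with ThreadPoolExecutor(max_workers=concurrent_users) as pool:
        latencies = list(pool.map(lambda _: timed_call(handler), range(total_requests)))
    # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile.
    cuts = statistics.quantiles(latencies, n=100)
    return {
        "requests": len(latencies),
        "mean_s": statistics.mean(latencies),
        "p95_s": cuts[94],
        "max_s": max(latencies),
    }


if __name__ == "__main__":
    print(run_load_test(fake_handler, concurrent_users=20, total_requests=200))
```

Rerunning the same test with increasing `concurrent_users` and watching where the 95th percentile starts to climb gives a rough picture of the load at which the system stops scaling.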
If you are starting a new project, schedule design reviews throughout development that focus on potential deadlock scenarios. While coding the application, consider inserting key timing and trace information that may come in handy later when debugging performance issues. In addition, assemble and involve a team of domain experts who will be key to resolving any future performance problem.
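Timing and trace information is cheapest to add while the code is being written. As one possible shape for it, the Python sketch below wraps an operation in a context manager that logs its elapsed time through the standard `logging` module, at DEBUG level normally and at WARNING level once it exceeds a threshold. The `timed` helper, the `"perf"` logger name, and the 0.5-second default threshold are assumptions for illustration; pick names and limits that match your own service.

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("perf")


@contextmanager
def timed(operation: str, warn_after_s: float = 0.5):
    """Log how long the enclosed block takes; escalate to WARNING when slow."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Runs even if the block raises, so failed calls are timed too.
        elapsed = time.perf_counter() - start
        level = logging.WARNING if elapsed >= warn_after_s else logging.DEBUG
        logger.log(level, "%s took %.3f s", operation, elapsed)


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    with timed("load_user_profile"):
        time.sleep(0.05)  # stand-in for the real work
```

Leaving such instrumentation in place at DEBUG level costs little in normal operation, and when a slowdown appears in production the WARNING entries point directly at the operations that crossed their limits.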
Should a performance problem surface, make sure you have a plan. As mentioned previously, the plan should be an overall performance resolution strategy in which the effects of software design changes are evaluated in conjunction with the system's network and the underlying computing hardware. Although sophisticated tools and monitoring agents can be a great help when you face a performance problem, the central element in resolving a web-based performance problem is a sharp-minded project manager or an experienced lead systems engineer who has a good knowledge of the system's domain and is armed with the plan.
In the rest of this article we will outline a series of steps that you may use to help you solve a crash or lockup problem.
Performance is a system-wide issue that spans the application software, the network, and the underlying computing hardware. A solid performance strategy and a planned effort coordinated with the operations staff, QA, development, and network domain experts are key to resolving system performance issues and creating an optimized, well-tuned web application. Since software changes are often the easiest to make, a good approach is to start with changes to the code or the software architecture and only then proceed to larger, costlier changes to the network or the computing hardware.
Jeffrey Blake is also a developer and consultant who helps clients solve performance problems of web-based applications.