Sudoku may be the latest paper-and-pencil craze, but application-performance management is the puzzle driving IT managers nuts. Application architectures are complicated, and data center virtualization adds complexity to the quagmire of performance management. If your bottom line depends on application speed and user satisfaction, application performance is critical—and duct-taping overlapping tools together is not an adequate strategy. APM vendors want to deliver predictive application performance based on new, holistic approaches that could offer better, faster and more reliable monitoring, but how can your organization be sure of a good fit?
Unlike many first-generation products that depend on a manual, labor-intensive approach, products like Hewlett-Packard’s Mercury Application Mapping strive to discover and map relationships between apps and underlying infrastructure automatically. Although few other APM vendors offer this mapping feature, products from Digital Fuel Technologies, Integrien, Oblicore and OpTier are working to fill this gap or can act as an overall consolidation point for APM data from a variety of vendors. Once a problem has been identified, holistic APM should take corrective actions to resolve performance issues or integrate with other tools that can, such as those from Opalis Software, Opsware/iConclude or RealOps. Corrective action may include allocating additional network bandwidth, adding processing capability on the server or even rolling back configuration changes. Without corrective-action capability, organizations must be content with fast problem identification and notification. That is a big step forward in many cases, but it is not ideal.
Holistic Architecture
Holistic design generally implies a setup that functions in harmony with the surrounding environment, all the disparate pieces fitting gracefully into the bigger picture, but in APM, this technological nirvana has yet to be fully realized. The APM architecture includes several components installed at the network and application layers. Although complex to implement, agents are critical, especially on application servers and supporting system components. Without information from the agents, it’s difficult to pinpoint a problem’s cause—one of the most common sources of application-performance puzzles.
Additionally, agents installed on application tiers, the OS, hardware, database components and even client workstations will detect problems that are affecting the app, such as memory usage, CPU and network activity. Many IT organizations may forgo the client agent that tracks performance glitches resulting from the user workstation, but where customer satisfaction counts, this is critical. Likewise, organizations may need to install agents on numerous VMs, depending on their needs. Since pricing for these tools is based on a per-deployment model, the architecture can have an impact on purchasing decisions. Synthetic transaction monitoring detects performance snags during off-peak hours and finds problems users may experience but not report, while network probes capture actual user data so you can monitor and baseline the end-user experience and detect problems during an application slowdown.
Even if organizations collect all this data, without a central engine to provide correlations and analysis, IT managers will quickly be overwhelmed and unable to isolate and resolve problems. In a holistic APM architecture, data must be collected by a midtier server, then forwarded to a correlation engine for root-cause analysis. Many organizations are then looking to problem-automation software to take corrective action once the correlation engine has detected a problem. A holistic APM architecture (see graph ‘Holistic APM Architecture’) takes all aspects of performance into consideration and helps dig beneath the surface diagnostic to find the real cause of a problem. As more converged apps are widely deployed, the network demand and the number of potential performance problems will only increase. Gaps in the architecture will create holes in your understanding of a problem, and ultimately increase the MTTR (mean time to repair), frustrations and cost.
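The correlation step described above can be illustrated with a deliberately simple sketch. It assumes a hypothetical `Event` record and a time-window rule: any agent or probe event that occurred shortly before a user-facing symptom is treated as a candidate cause. Commercial correlation engines use far richer models, so this is only a stand-in for the idea, not any vendor's algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical record forwarded by a midtier server."""
    timestamp: float  # seconds since an arbitrary epoch
    source: str       # e.g. "db-agent" or "network-probe" (illustrative names)
    message: str


def correlate(symptom: Event, events: list, window_s: float = 60.0) -> list:
    """Return events seen up to window_s seconds before the symptom.

    Candidates are ordered with the event closest in time to the symptom first.
    """
    candidates = [
        e for e in events
        if 0 <= symptom.timestamp - e.timestamp <= window_s
    ]
    return sorted(candidates, key=lambda e: symptom.timestamp - e.timestamp)


if __name__ == "__main__":
    slowdown = Event(1000.0, "synthetic-probe", "checkout slow")
    feed = [
        Event(990.0, "db-agent", "lock wait high"),
        Event(900.0, "network-probe", "link utilization high"),
        Event(1005.0, "app-agent", "thread pool exhausted"),
    ]
    for event in correlate(slowdown, feed):
        print(event.source, event.message)
```

Widening `window_s` pulls in older events at the cost of more noise, which is the same trade-off an operator faces when sifting uncorrelated alerts by hand.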
Active, Passive and Beyond
Most vendors use the term APM to address any type of application-performance management, defining products using basic active/passive categories. While these categories may be over-simplified in most cases, active and passive tools still provide the backbone of holistic approaches designed to provide as much data from different perspectives as possible in order to triangulate and target the actual performance problem. Although this can result in duplicate monitoring and redundancy with existing management, a holistic approach can save IT time and reduce headaches when trying to identify and troubleshoot the issue.
Synthetic transaction products avoid the need to deploy a specific agent to detect a performance problem. Instead they mimic the actions of real users on your system, and may place additional load on the app being monitored. Users often need to work with the applications team to build appropriate synthetic transactions. Also, certain applications may require some modification—after all, you don’t actually want to ship the product a simulated customer orders from your e-commerce site.
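A minimal sketch of the synthetic-transaction idea follows. It assumes each scripted user action is a plain Python callable, and it uses a hypothetical `synthetic` flag on orders to show the kind of application change mentioned above, so that simulated purchases are never shipped. The function and field names are illustrative and do not come from any product.

```python
import time
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class ProbeResult:
    name: str
    ok: bool
    elapsed_s: float
    error: str = ""


def run_synthetic_transaction(
    name: str,
    steps: Sequence[Callable[[], None]],
    max_elapsed_s: float = 5.0,
) -> ProbeResult:
    """Run scripted steps in order and time the whole transaction."""
    start = time.perf_counter()
    try:
        for step in steps:
            step()
    except Exception as exc:  # any failed step fails the transaction
        return ProbeResult(name, False, time.perf_counter() - start, str(exc))
    elapsed = time.perf_counter() - start
    if elapsed > max_elapsed_s:
        return ProbeResult(name, False, elapsed, "response time over threshold")
    return ProbeResult(name, True, elapsed)


def place_order(items: list, synthetic: bool = False) -> dict:
    """Hypothetical order record; the flag marks simulated customers."""
    return {"items": list(items), "synthetic": synthetic}


def should_ship(order: dict) -> bool:
    """Fulfillment skips orders created by the monitoring script."""
    return not order.get("synthetic", False)


if __name__ == "__main__":
    orders = []
    result = run_synthetic_transaction(
        "checkout",
        [lambda: orders.append(place_order(["book"], synthetic=True))],
    )
    print(result.ok, should_ship(orders[0]))
```

In a real deployment the steps would drive the application through its user interface or API, which is where the extra load on the monitored app comes from.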
Passive monitoring tools track network application traffic and avoid any additional load on the apps, or they deploy agents on clients, application servers or hardware. Some tools track and measure end-user response time without an agent. As TCP application packets travel through the network, passive monitors track network round-trip time, server response time, data transfer time and other key metrics. Although this method is less intrusive, its viability is determined by your network architecture. In a distributed environment, you may need many passive appliances to track all application data.
Synthetic-transaction-monitoring products simulate an end user, perform scripted, macro-like transactions against an application and report on the results. This may identify performance problems from a user perspective, but without agent technology it’s difficult to isolate the actual cause of the performance problem on the application or the hardware and OS infrastructure. Symantec offers Indepth, Inform and Insight, a suite of products it calls i3, which gives organizations several APM options that combine agents and synthetic transactions. Indepth agents provide deep application metrics for J2EE and .Net apps as well as other common enterprise apps. Indepth also provides information on the internals of the application that may be causing a performance problem. Inform adds alerting, trending and performance reporting, while Insight aggregates response information across app tiers, applying algorithms to correlate activity to individual transactions.
Symantec introduced Insight Inquire this January; this product adds a synthetic transaction monitoring capability to track availability and performance of critical Web apps. Insight Inquire injects synthetic transactions into an application’s transaction stream to monitor availability and performance. Multiple instances are available at no additional charge, so they can be installed at different geographic locations. Application performance is the primary focus: Symantec doesn’t provide visibility into network performance, and it needs server agents to determine what’s wrong with an application.
HP’s Mercury End User Management proactively monitors Web site and app availability in real time, from an end-user perspective. It simulates end-user processes against apps for common Web and enterprise apps from PeopleSoft, Oracle, SAP and others. Mercury Real User Monitor complements synthetic monitors for environments where there are a large number of users distributed across multiple locations. It tracks individual user information to the specific application that handles the user request, allowing IT managers to focus on discrete users or periods in time to catch problems. With the breadth of product offerings from HP and its licensing model, many IT pros might find it confusing to choose the right products and components to meet their particular situation. BMC’s Performance Manager/Transaction Manager, IBM’s Tivoli Composite Application Manager, NetIQ’s AppManager and Quest Software’s Foglight products are also strong in this area.
Network Probe Monitoring
In complex enterprises, one of the keys to understanding and resolving application-performance puzzles is correlating application response time with other application and network activity. NetScout’s nGenius monitors test response time of key business apps, providing a broad context for analyzing problems. It tests application traffic against service-level delivery and measurements for troubleshooting problems with end-users. NetScout’s approach provides a context for application-response time that includes traffic volume, utilization, error conditions, alarms, hosts, conversations and packet captures. However, once a performance problem is detected, nGenius doesn’t offer server agents that can indicate what component of the application may be causing the glitch. NetQoS’ SuperAgent also tracks and measures end-user response time—without desktop or server agents. It monitors all TCP application packets as they travel through the network, providing a way to measure round-trip time, server-response time, data-transfer time and other metrics. SuperAgent breaks response time into its basic components: application, network and server latency. NetQoS continually measures and analyzes performance for all transactions, compares the response time against the baselines, and alerts IT when performance deteriorates. As with NetScout’s product, you’ll need another piece, such as collection agents from Quest Software, Symantec or Wily Technology, to grab information within an app after a performance problem is detected.
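The decomposition and baseline comparison described above can be sketched in a few lines. This is not NetQoS' or NetScout's actual method; it assumes response time is simply the sum of the three measured parts named in the text, and it uses a hypothetical rule that flags a measurement more than `k` standard deviations above the historical mean.

```python
from statistics import mean, stdev


def total_response_time(network_rtt: float, server_time: float,
                        transfer_time: float) -> float:
    """Assumed model: end-user response time is the sum of its parts."""
    return network_rtt + server_time + transfer_time


def dominant_component(network_rtt: float, server_time: float,
                       transfer_time: float) -> str:
    """Name the part that contributes most to the response time."""
    parts = {
        "network": network_rtt,
        "server": server_time,
        "transfer": transfer_time,
    }
    return max(parts, key=parts.get)


def exceeds_baseline(history: list, current: float, k: float = 3.0) -> bool:
    """Flag a measurement more than k standard deviations above the mean.

    history needs at least two samples for stdev to be defined.
    """
    return current > mean(history) + k * stdev(history)


if __name__ == "__main__":
    history_ms = [100, 102, 98, 101, 99]
    sample_ms = total_response_time(20, 150, 30)
    if exceeds_baseline(history_ms, sample_ms):
        print("alert: slowest part is", dominant_component(20, 150, 30))
```

Knowing which part dominates tells IT where to look first, but as the text notes, a probe alone cannot say what inside the server or application is responsible.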
Merging Monitors & Probes
Although some organizations find it difficult to deploy agents on all applications, many are using passive, network probe technology to monitor all application traffic on their networks, while using agent technology to provide a deeper level of monitoring of critical systems. Wily, purchased by CA in March 2006, offers application agents and network traffic analyzers to collect the detailed information required to diagnose performance issues. Wily Introscope agents collect performance data from various components inside Web applications, then report these metrics to the Collector Enterprise Manager. It acts as the repository of performance metrics and receives data from one or more Introscope agents, letting users collect data centrally from many applications, application servers and supporting systems.
With that information, the Collector Enterprise Manager processes performance data and makes it available to users for production monitoring, triage and diagnosis. Introscope’s approach uses byte-code instrumentation of the J2EE applications as the agents are loaded into the Java classes. In March, Wily announced a synthetic transaction product that will complement its Customer Experience Manager appliance, which monitors all actual transactions. This appliance resides at the switch level, connected to a SPAN port. Wily doesn’t focus on the underlying network-performance management, nor does it offer an easy way to integrate and correlate network- or application-performance data from other systems into its product.
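The agent-to-collector pattern described for Introscope can be pictured with a small generic sketch. The `Collector` class, its methods and the agent and metric names below are all hypothetical and are not Wily's API; the sketch only shows several agents reporting to one central repository that can then be queried across systems.

```python
from collections import defaultdict
from statistics import mean


class Collector:
    """Hypothetical central repository for metrics reported by agents."""

    def __init__(self) -> None:
        # (agent name, metric name) -> list of reported values
        self._samples = defaultdict(list)

    def report(self, agent: str, metric: str, value: float) -> None:
        """Called by an agent each time it has a new measurement."""
        self._samples[(agent, metric)].append(value)

    def average(self, metric: str) -> dict:
        """Average value of one metric, keyed by agent."""
        return {
            agent: mean(values)
            for (agent, name), values in self._samples.items()
            if name == metric
        }

    def slowest(self, metric: str):
        """Agent with the highest average for the metric, or None."""
        averages = self.average(metric)
        return max(averages, key=averages.get) if averages else None


if __name__ == "__main__":
    collector = Collector()
    collector.report("app-server-1", "response_ms", 120)
    collector.report("app-server-1", "response_ms", 80)
    collector.report("db-1", "response_ms", 400)
    print(collector.slowest("response_ms"))
```

Keeping the samples in one place is what makes cross-system triage possible; each agent on its own sees only its own component.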
Quest also provides tools that combine server-agent technology with network-probe analysis. Foglight Experience Monitor uses network traffic monitoring and minimizes the impact on the network infrastructure and applications. Experience Monitor tracks each individual user’s interaction with the applications, and aggregates the data to a central reporting platform. Also included in the Foglight product line are a number of ‘Cartridges’ that focus on specific active monitoring solutions. Foglight Cartridges monitor a variety of applications including Java, .Net, Oracle, SAP and PeopleSoft, collecting diagnostic information on poorly performing transactions. Foglight can discover application problems and, through a host of other products, offer some help in this area.
Putting the Pieces Together
The ideal, holistic APM faces a few realistic limitations. The ability to correct performance problems may be limited by external factors beyond APM capability. If the problem lies in the application, for example, and wasn’t detected in testing or quality assurance, it may be outside the scope of APM. Performance problems might be the result of a change such as a new security patch, in which case configuration-management vendors come into play to roll back those changes and pinpoint the problem. Critical for the success of the next generation of application management will be the ability to automatically correlate network, system and application problems and take corrective action to resolve them.