An esteemed consultant friend of mine once commented: "in capacity management, it is the step changes in capacity that are the most difficult to plan for". In agile release practice, such step changes are becoming more frequent. As each new release lands, historical quality-of-service metrics lose relevance, making capacity planning harder.
To respond to this change, an agile capacity management practice is called for: one that is lightweight, largely automated, and relevant to both deployed software and software not yet released. Indeed, the process must support every stage of the DevOps performance cycle, from infrastructure sizing, through unit and load testing, to operational capacity management. In shared environments, such as cloud infrastructures, it is easy to become lost in the "big data" of application or infrastructure performance.
When executing a DevOps strategy, however, it is critical to embed performance and capacity management as a core principle, structuring the big data so that it becomes relevant and actionable. Here are 5 top tips for success:
1. A well-defined capacity management information system (CMIS) is fundamental
The foundation of your capacity management capability is data, so building a strong foundation with a capacity management information system is crucial. Its purpose is to capture all the metrics relevant to a predictive process: one that provides insight into the current environment to help drive future decision-making. Context is crucial, so configuration information must also be captured, including virtual and physical machine specifications along with service configuration data. It is also advisable to design the system to accommodate business context, such as costs, workloads or revenues. Automating data collection is critical when designing an agile process, and the system should be scalable, so that it can deliver quick wins first and then grow to cover all the platforms in your application infrastructure. This system should not replace or duplicate existing monitoring, since it will not be used for real-time purposes. Also note that it is easy to over-engineer this system for its purpose; that is another reason to adopt a scalable design that can grow to accommodate carefully selected metrics.
CMIS takes data from real-time monitors
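To make the "capture, don't duplicate" idea concrete, here is a minimal Python sketch of how a CMIS might reduce raw samples from a real-time monitor to hourly peak-utilisation records tagged with configuration context. The `HostConfig` and `Sample` types, their fields, and the choice of hourly peaks are hypothetical assumptions for illustration, not part of any particular product.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime

# Hypothetical record types; field names are illustrative assumptions.
@dataclass(frozen=True)
class HostConfig:
    host: str
    platform: str      # e.g. "virtual" or "physical"
    vcpus: int

@dataclass(frozen=True)
class Sample:
    host: str
    timestamp: datetime
    cpu_util: float    # 0.0 to 1.0, as reported by a real-time monitor

def hourly_peaks(samples, configs):
    """Reduce raw monitor samples to one peak-utilisation record per host-hour,
    tagged with the host's configuration for later capacity modelling."""
    peaks = defaultdict(float)
    for s in samples:
        hour = s.timestamp.replace(minute=0, second=0, microsecond=0)
        key = (s.host, hour)
        peaks[key] = max(peaks[key], s.cpu_util)
    return [
        {"host": host, "hour": hour, "peak_cpu": peak,
         "platform": configs[host].platform, "vcpus": configs[host].vcpus}
        for (host, hour), peak in sorted(peaks.items())
    ]

# Usage with made-up samples for one host.
configs = {"app01": HostConfig("app01", "virtual", 4)}
samples = [
    Sample("app01", datetime(2024, 1, 1, 9, 5), 0.42),
    Sample("app01", datetime(2024, 1, 1, 9, 35), 0.71),
    Sample("app01", datetime(2024, 1, 1, 10, 10), 0.30),
]
rows = hourly_peaks(samples, configs)
```

Keeping only a summarised record per host-hour is one way to stop the CMIS from growing into a second monitoring system; which summaries to keep (peaks, percentiles, averages) is a design choice to make per metric.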
2. Acquire a knowledge base around platform capacity
A knowledge base is crucial when comparing platform capabilities. Whether you are looking at a legacy AIX server or a modern HP blade, you must know how those platforms compare in both performance and capacity. The knowledge base must be well maintained and reliable, so that you have accurate insight into the latest models on the market as well as the older models that may be deployed in your data centres. For smaller organisations, building your own knowledge base may be a viable option; however, beware of architectural nuances that affect platform scalability (such as logical threading or hypervisor overheads). For this reason, it is practical to acquire a commercially maintained knowledge base, and to avoid benchmarks provided by the platform vendors. Avoid using MHz as a benchmark: clock speed is a highly inaccurate measure of capacity, because the work done per clock cycle varies widely between processor architectures. Early in the design stage for new applications, this knowledge base will become a powerful ally, especially when correlated against current usage patterns in your environment.
Quantify capacity of different platforms
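A knowledge base is what makes cross-platform comparisons like this possible. The sketch below, assuming each platform can be summarised by a single relative capacity rating, translates utilisation measured on one platform into the expected utilisation of the same workload on another. The platform names, the rating values and the flat `hypervisor_overhead` factor are made-up placeholders; real figures would come from a maintained knowledge base.

```python
# Hypothetical relative capacity ratings; real values would come from a
# maintained knowledge base, not from this illustration.
CAPACITY_RATINGS = {
    "legacy_aix_server": 100.0,
    "modern_blade": 250.0,
}

def projected_utilisation(current_util, source, target,
                          ratings=CAPACITY_RATINGS, hypervisor_overhead=0.0):
    """Translate utilisation measured on `source` into the expected utilisation
    of the same workload on `target`.

    hypervisor_overhead is an assumed fractional loss of usable capacity on
    the target (e.g. 0.1 for 10%)."""
    if not 0.0 <= hypervisor_overhead < 1.0:
        raise ValueError("hypervisor_overhead must be in [0, 1)")
    work = current_util * ratings[source]               # work in rating units
    usable = ratings[target] * (1.0 - hypervisor_overhead)
    return work / usable
```

For example, a workload running at 60% on a platform rated 100 would be expected to run at about 24% on a platform rated 250, or 30% if a fifth of the target's capacity is lost to hypervisor overhead. A single rating per platform is a simplification; real workloads scale differently depending on threading and memory architecture.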
3. Load testing is for validation only
For agile releases, incremental change makes end-to-end test environments expensive to provision and assemble, and end-to-end tests time-consuming to execute. However, load testing remains a critical part of the performance and capacity DevOps cycle. Modern testing practice has "shifted left" the testing phase, using service virtualization and release automation; the resulting component-level performance profiles provide a powerful datapoint in our DevOps process. By assimilating these early-stage performance datapoints into our DevOps thinking, we can gain early insight into the effect of change. For this to be effective, a predictive modelling function of some sort is required, in which the performance profile can be scaled to production volumes and "swapped in" to the production model. Such a capability has been described in the past as a "virtual test lab". For smaller organisations, this could be done with an Excel spreadsheet, although factoring in scalability and the infrastructure knowledge base will be a challenge.
DevOps and performance testing
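One simple way to realise the "scale and swap in" step is the Utilization Law, U = X × D, where X is throughput and D is the service demand per transaction. This technique is chosen here for illustration; the text does not prescribe a particular model. The sketch assumes all components share a single CPU pool and that per-transaction CPU demands were measured in component-level tests; the component names, volumes and demand figures are hypothetical.

```python
def pool_utilisation(throughput_tps, demands_cpu_s, cores):
    """Utilization Law, U = X * D, with D the summed per-transaction CPU demand
    of every component sharing the pool, spread across the available cores."""
    total_demand = sum(demands_cpu_s.values())
    return throughput_tps * total_demand / cores

def swap_in(model, component, tested_demand_cpu_s):
    """Return a copy of the production model with one component's demand
    replaced by the value measured for the new release."""
    if component not in model:
        raise KeyError(f"unknown component: {component}")
    updated = dict(model)
    updated[component] = tested_demand_cpu_s
    return updated

# Hypothetical production model: CPU seconds per business transaction.
production = {"web": 0.02, "orders": 0.05, "search": 0.03}
peak_tps, cores = 120.0, 16

before = pool_utilisation(peak_tps, production, cores)
after = pool_utilisation(peak_tps, swap_in(production, "orders", 0.07), cores)
print(f"peak utilisation: {before:.0%} now, {after:.0%} after release")
```

Here the new release raises the per-transaction demand of the hypothetical `orders` component from 0.05 to 0.07 CPU seconds, moving the pool from 75% to 90% at the same peak volume: the kind of early warning that can be raised before any end-to-end load test is run.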
4. Prudently apply predictive analytics
Predictive Analytics at work
To be relevant, predictive analytics need to account for change in your environment; predictive analytics applied only to operational environments are no longer enough. In a DevOps process, change is driven by releases, so investing in a modelling capability that lets you simulate application scalability and the impact of each new release is crucial. Ask yourself "how detailed do I need to be?" to drive a top-down, incremental path to the results you need. Although it is easy and tempting to profile performance in detail, doing so can be very time-consuming. Predictive analytics are fundamentally there to support decisions on provisioning the right amount of capacity to meet demand; using them to predict code-level or application-level bottlenecks can be time-consuming and problematic. Investment in a well-rounded application and infrastructure monitoring capability for alerting and diagnostics remains as important as ever.
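The point that change arrives by release, not only through organic growth, can be illustrated by combining a historical trend with a step change at the release date. The Python sketch below uses an ordinary least-squares linear trend, chosen purely for simplicity; `release_delta` stands for the modelled utilisation increase of the release and, like the threshold and the sample history, is an assumed input.

```python
def linear_fit(ys):
    """Ordinary least-squares fit y = a + b*t for t = 0..n-1."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two observations")
    t_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(ys))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    b = sxy / sxx
    return y_mean - b * t_mean, b

def weeks_until_breach(history, release_week, release_delta, threshold,
                       horizon=52):
    """Project weekly peak utilisation forward from the historical trend, add a
    step change from the release week onwards, and return the first future
    week (1 = the week after the history ends) whose forecast exceeds the
    threshold, or None if no breach occurs within the horizon."""
    a, b = linear_fit(history)
    n = len(history)
    for k in range(1, horizon + 1):
        forecast = a + b * (n - 1 + k)
        if k >= release_week:
            forecast += release_delta
        if forecast > threshold:
            return k
    return None
```

With a history of 50%, 51%, 52% and 53% and a threshold of 80.5%, the trend alone breaches in week 28; adding a 15-point step at week 4 brings the breach forward to week 13. That gap is exactly the step change that analytics based only on operational history would miss.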
5. Pause, and make sure to measure the value
Because capacity management is a supporting DevOps process, it can be easy to overlook the importance of planning ahead for performance and capacity. Combining its outputs with business context, such as costs, throughputs or revenues, will highlight the value of what you are doing. One example is to add your infrastructure cost model to your capacity analytics, adding transparency into the cost of capacity. By combining these costs with utilization patterns, you can easily show a cost-efficiency metric that can drive further optimization. The capacity management DevOps process is there to increase your agility by reducing the time spent on redundant testing, to provide greater predictability into the outcomes of new releases, to improve cost-efficiency in expensive production environments, and to give executives the planning support they need to align with other IT or business change projects.
Showing cost-efficiency of infrastructure used
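A cost-efficiency metric of the kind described in this tip can be as simple as the share of infrastructure spend that backs capacity actually used. The sketch below assumes a monthly cost and an average utilisation per host are available from the cost model and capacity analytics; the `HostCost` type and the sample figures are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class HostCost:
    host: str
    monthly_cost: float   # from the infrastructure cost model (currency units)
    avg_util: float       # average utilisation over the month, 0.0 to 1.0

def cost_efficiency(hosts):
    """Return (efficiency, idle_cost): the share of spend backing used
    capacity, and the spend attributable to unused capacity."""
    total = sum(h.monthly_cost for h in hosts)
    if total <= 0:
        raise ValueError("total cost must be positive")
    used = sum(h.monthly_cost * h.avg_util for h in hosts)
    return used / total, total - used

# Hypothetical monthly figures for two hosts.
hosts = [HostCost("db01", 2000.0, 0.6), HostCost("app01", 500.0, 0.2)]
efficiency, idle_cost = cost_efficiency(hosts)
```

For these two example hosts, 1,300 of 2,500 in monthly spend backs used capacity, an efficiency of 52%, leaving 1,200 attributable to idle capacity. Average utilisation is used here for simplicity; because capacity must be sized for peaks, some of that "idle" spend is necessary headroom rather than waste.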