Thursday 12 December 2013

IT's Day of Reckoning Draws Near

Bob Colwell, Intel's former chief architect, recently delivered a keynote speech proclaiming that Moore's Law will be dead within a decade.  Of course, every technological revolution must eventually come to an end - and we've certainly noted the stabilization of processor clock speeds over recent years, in conjunction with an increasing density of cores per chip.

Moore's Law has been so dominant over the years that it has influenced every major hardware investment and every strategic data center decision.  Over the last 40 years we have seen a consistent increase in processing capacity, reflected both in rising processor speeds and in the growing density of transistors per chip.  In recent years, whilst processor clock speed has reached a plateau, the increasing density of cores per chip has continued to raise capacity (though not per-core performance) markedly.

The ramifications of Moore's Law were felt acutely by IT operations, in two ways.

  1. It was often better for CIOs to defer a sizable procurement by six or twelve months, to get more processing power for their money.
  2. Conversely, the argument had a second edge: that it was not worthwhile carrying out any Capacity Management, because hardware was cheap - and getting cheaper all the time.

So, let us speculate what happens to IT operations when Moore's Law no longer holds:

  1. IT hardware does not get cheaper over time.  Indeed, we can speculate that costs may increase due to the costs of energy, logistics and so on.  Advances will continue to be made in capability and performance, though not at the marked rate noted above.
  2. The rate of hardware refresh slows, as the energy and space savings available from next-generation kit diminish.  Hardware will stay in support longer, and the costs of support will increase.
  3. Converged architectures will gain more traction as the flexibility and increased intra-unit communication rates drive performance and efficiency.
  4. You can't buy your way out of poor Capacity Management in the future.  Therefore the function of sizing, managing and forecasting capacity becomes more strategic.


Since capacity management equates very closely to cost management, we can also speculate that these two functions will continue to evolve together.  This ties in neatly, though perhaps coincidentally, with the maturing of the cloud model into a truly dichotomous entity - one in which a consumer and a provider hold two differing views of the same infrastructure.  As the cloud model matures in this way, it becomes easier to compare the market for alternative providers on the basis of cost and quality.

Those organisations with a well-established Capacity Management function are well placed to navigate effectively as these twin forces play out over the next few years, provided they:

  1. Understand that their primary function is to manage the risk margin in business services, ensuring sufficient headroom is aligned to current and future demands
  2. Provide true insight into the marketplace in terms of the alternative cost / quality options (whether hardware or cloudsourced)
  3. Develop effective interfaces within the enterprise to allow them to proactively address the impacts of forthcoming IT projects and business initiatives.

So - the day of reckoning draws near - and IT operations will adapt, as it always does.  Life will go on - but perhaps with a little bit more careful capacity planning....

Tuesday 3 December 2013

The dichotomy of Capacity Management in a private cloud

The Pushmi-Pullyu - an analogy for the dichotomy of private cloud

The Fable

The two great heads of IT sat and stared at each other across a meeting room table.  It was late in the day, and thankfully their services had all been restored.  Now was the time for recriminations.  The CIO had been called into firefighting meetings with the board all day.  They knew he was going to be royally pissed off, but who was going to get the blame?

The beginning

The story began when service performance nose-dived.  It was always a busy period, the lead-up to Christmas, but this season had been marked by some hugely successful promotional campaigns, and their online services had been humming with traffic.  Nobody quite knew what caused it, but suddenly alarms started sounding.  Throughput slowed to a trickle - and response times rocketed through the roof.  Downtime.  At first the infrastructure team, plunged into a triage and diagnostics scenario, did what they always did.  Whilst some were busy pointing fingers, they formed a fire-fighting team, and quickly diagnosed the issue - they'd hit a capacity limit at a critical tier.  As quickly as they could, they provisioned some additional infrastructure and slowly brought the systems back online.

The background

But why had all this happened?  Some months ago, on the advice of some highly-paid consultants, the CIO had restructured IT into a private cloud model.  The infrastructure team provided a service to the applications team, using state-of-the-art automation systems.  Each and every employee was soon enamoured with this new way of working, using ultra-modern interfaces to request and provision new capacity whenever they needed it.  Crucially, the capacity management function was disbanded - it simply seemed irrelevant when you could provision capacity in a few moments.

The inquisition

But as the heads of IT discussed the situation, it seemed there were some crucial gaps they had overlooked.  The VP of Applications confessed that there was very little work being done in profiling service demand, or in collaborating with the application owners to forecast future demands.  He lacked the basic information needed to determine service headroom - and, crucially, was unable to act proactively to provision the right amount of capacity.  In an honest exchange, the VP of Infrastructure also admitted to failings in managing commitment levels of the virtual estate, and in sizing the physical infrastructure needed to keep on top of demand.  In disbanding the Capacity Management function, they realized that they had fumbled - and in fact needed those skills in both of their teams.

The Conclusion

The ability to act proactively on infrastructure requirements distinguishes successful IT organisations from the crowd.  What these heads of IT had realised is that the private cloud model enhances the need for Capacity Management, rather than diminishing it.  The dichotomy of Capacity Management in the private cloud is that the function belongs to both sides of the table - to the provider, and to the consumer.  Working independently, each side can improve demand forecasts and reduce the risk of performance issues.  Working collaboratively, the two combine in a partnership that offers the most effective way of sizing capacity requirements to optimize both cost and service headroom.


Take-aways


  1. As a consumer, ensure you are continually well-informed on service demand and capacity profiles.  Use these profiles to work with your application owners in forecasting different 'what if' scenarios.  Use the results to identify which are the most important metrics, and prepare a plan of action when certain thresholds are reached.
  2. As a provider, ensure you are continually tracking your infrastructure commitment levels and capacity levels (a minimal sketch of such tracking follows this list).  Use the best sizing tools you can find to identify the right-size infrastructure to provision for future scalability.
  3. Have your capacity management teams work collaboratively to form an effective partnership that ensures cost-efficient infrastructure delivery and most effective headroom management.
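
To make take-away 2 a little more concrete, here is a minimal sketch in Python of how commitment levels and remaining headroom might be tracked across a virtual estate.  The cluster names, figures and policy limits are entirely hypothetical - substitute data from your own tooling.

```python
# Minimal sketch: track commitment levels and headroom for a virtual estate.
# All names and figures are hypothetical; substitute data from your own tooling.

from dataclasses import dataclass

@dataclass
class Cluster:
    name: str
    physical_cores: int
    physical_ram_gb: int
    allocated_vcpus: int      # sum of vCPUs assigned to VMs
    allocated_ram_gb: int     # sum of RAM assigned to VMs

def report(clusters, vcpu_limit=4.0, ram_limit=1.5):
    """Flag clusters whose commitment ratios exceed the chosen policy limits."""
    for c in clusters:
        vcpu_ratio = c.allocated_vcpus / c.physical_cores
        ram_ratio = c.allocated_ram_gb / c.physical_ram_gb
        headroom_vcpus = vcpu_limit * c.physical_cores - c.allocated_vcpus
        flag = "REVIEW" if vcpu_ratio > vcpu_limit or ram_ratio > ram_limit else "ok"
        print(f"{c.name}: vCPU commit {vcpu_ratio:.1f}:1, "
              f"RAM commit {ram_ratio:.2f}:1, "
              f"vCPU headroom {headroom_vcpus:.0f} [{flag}]")

report([
    Cluster("cluster-a", physical_cores=128, physical_ram_gb=1024,
            allocated_vcpus=420, allocated_ram_gb=1200),
    Cluster("cluster-b", physical_cores=64, physical_ram_gb=512,
            allocated_vcpus=300, allocated_ram_gb=900),
])
```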

Will you wait for your own downtime before acting?

Tuesday 24 September 2013

Dear CFO - 4 factors to evaluate before you invest in new IT

written in response to this great article on WSJ Blogs: http://t.co/lbyNeeMlxm

Dear CFO,

  before you decide on your strategy for investing in IT assets, whether on-premise or in the public cloud, there are some important considerations I'd like you to weigh.

  Clearly, your decision ultimately will be based on:
  • The fit with your longer-term business plans
  • The measurable benefit to your business
  • The investment needed

  Investments in IT capacity enable your organisation to transact a certain volume of business.  As in other areas of your business, investments in capacity should be made to alleviate bottlenecks and increase the ability to transact business.  However, in IT there are certain complexities to factor in:
  • comparing and contrasting capacity options has descended into a "dark art", with many stakeholders and an overriding aversion to risk
  • measuring capacity usage has become a specialized platform function, leading to difficulties in getting an end-to-end perspective of how much business can be transacted
  • an increasingly agile enterprise is causing rapid fluctuations in capacity requirements, again with an aversion to risk

  Long ago, a management function was created to address these problems for the mainframe platform - Capacity Management.  That function can be leveraged again now, to allow you to plan ahead effectively for your long-term IT future.  When evaluating that function within your IT department, you should ensure it:
  1. endeavours to provide a complete picture of capacity usage across all silos
  2. provides visibility of service headroom, potential bottlenecks and excess capacity
  3. couples together with your financial management controls, providing governance over capacity allocation
  4. gives insight into future business scenarios, allowing investments to be rebalanced against the needs to transact business
In summary - if capacity management can be harnessed to better manage the costs of IT capacity, greater focus can be placed on transformational activities that add value in other ways.

Tuesday 16 July 2013

Accelerate Innovation - by increasing efficiency

I'm always highly impressed by the number of innovations coming from IT teams to tackle unique business challenges.  Some organisations leverage the power of Business Intelligence to improve marketing and operational analytics.  Other stories highlight the importance of connecting people and improving collaboration.  The more I read about IT innovators, the prouder I feel of our industry in general: full of amazing problem solvers, equipped with remarkable technology to help them along the way.

But the life of an IT executive is not all about innovation.  That's the headline-grabbing stuff.  If you'll pardon the duck analogy, there is an awful lot of paddling going on under the water to keep the lights on.  And when you look at the data - with up to 70% of the IT budget being spent on operations - the struggle becomes immediately apparent.

But what if there were a way of liberating some of that 70%?  After all, most datacenters are operating at less than 20% capacity at peak.  One UK customer told me that their Windows estate was running at around 7% of capacity - during peak hours.  Even assuming a high-availability scenario, where utilization should be kept below 50%, there is still a significant amount of overspend on excess capacity: possibly as much as double what is actually needed.

Were we to apply some of our innovation capability to this problem, it would become clear that we as an industry are struggling with a conflict of interest around sizing.  Every risk-averse bone in our bodies is shouting out "more!" - the last thing we want to be associated with is a non-responsive service, crushed beneath the weight of demand.  Whenever we meet with a vendor, an outsourcer, a new market - the answer is always "more!".

Entropy never decreases, and the voices for "more!" are louder and more consistent than ever.  And so we face a challenge.  "More!" is expensive.  Looking at Koomey's article on TCO, we can see that capital expense (software and hardware) accounts for about a third of the cost of buying and running an asset.  In this model, a $10,000 server would have a TCO of around $60,000 over a three-year period.

A simple financial calculation shows that running datacenters at low utilization may be wasting a huge amount of our IT budgets - on power, on facilities, on routine administration tasks.  For every 100 servers running at 10% utilization, that could equate to over $1M in potential savings every year.
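
To make that arithmetic concrete, here is a minimal back-of-the-envelope sketch.  The $60,000 three-year TCO figure is the one quoted above; the 50% consolidation ceiling and the idea of retiring the freed-up servers are illustrative assumptions, not measured results.

```python
# Back-of-the-envelope sketch of potential savings from consolidating
# under-utilized servers. The 3-year TCO figure comes from the article;
# the consolidation assumptions are illustrative only.

TCO_PER_SERVER = 60_000          # USD over 3 years, as quoted above
ANNUAL_COST = TCO_PER_SERVER / 3 # ~20,000 USD per server per year

servers = 100
utilization = 0.10               # 10% average utilization
target_utilization = 0.50        # conservative HA-style ceiling

# Servers actually needed if the load were consolidated up to the target ceiling.
servers_needed = max(1, round(servers * utilization / target_utilization))
servers_retired = servers - servers_needed

annual_savings = servers_retired * ANNUAL_COST
print(f"Retire {servers_retired} of {servers} servers -> "
      f"~${annual_savings:,.0f} potential savings per year")
```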

This has always been the great argument for virtualization.  Consolidate these Windows servers, without introducing compatibility risks or administration costs.  The great wave of virtualization has swept over us, leaving many organisations with highly virtualized landscapes and - seemingly - little further room for optimization.

Except that this assumption is wrong.  Native sizing tools for virtualized landscapes rely on crude algorithms built on a flawed premise - that every MHz is the same.  It turns out that we are still oversizing our virtualized landscapes - by as much as 2x.
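
As a purely illustrative sketch of why "every MHz is the same" leads to oversizing, the snippet below compares a raw-MHz calculation with one normalized by a per-model throughput weighting.  The weighting factor here is invented; in practice it would come from a benchmark-based model library.

```python
# Illustrative only: raw-MHz sizing vs benchmark-normalized sizing.
# The weighting factors below are invented; real ones would come from a
# benchmark-based model library for each processor make and model.

workload_mhz_demand = 40_000   # demand measured on the old estate, in "old" MHz

new_host = {"cores": 16, "mhz_per_core": 2_500, "weight": 1.8}  # newer core does ~1.8x work per MHz

def hosts_needed(host, demand_mhz, baseline_weight=1.0, ceiling=0.7):
    # Effective capacity scales with the per-MHz weighting relative to the baseline platform.
    effective_mhz = host["cores"] * host["mhz_per_core"] * (host["weight"] / baseline_weight)
    return demand_mhz / (effective_mhz * ceiling)

raw = workload_mhz_demand / (new_host["cores"] * new_host["mhz_per_core"] * 0.7)
normalized = hosts_needed(new_host, workload_mhz_demand)
print(f"raw-MHz sizing: {raw:.2f} hosts; normalized sizing: {normalized:.2f} hosts")
```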

The point of sizing efficiently is not an arbitrary one.  Every server that is not provisioned may be saving the enterprise $60,000 or more.  Re-investing these savings into creative projects will only help drive successful innovators, and generate more and more CIO headlines.

Saturday 27 April 2013

Capacity Planning the #Devops way

The notion of #Devops serves to accelerate time to market through greater cohesion in the release management life cycle. 

So-called 'service virtualisation', such as the offerings from IBM and CA LISA, enables a modular testing practice by learning the typical behaviour patterns of defined systems.  The effect is a more tightly focused testing process that reduces the dependency on external (inert) services.

Release Automation, such as in the newly acquired Nolio solution, allows the testing process to be further streamlined by providing cohesion through the multi-stage process.  The benefits are felt most strongly where complex dependencies and configurations add significant overhead to setup and teardown for QA.

Agile methods need agile release management processes, and this is the whole point of #Devops.  However, the risk in this agile thinking lies in end-to-end performance.

The missing link here is provided by pre-release Capacity Planning (such as provided by CA Performance Optimizer), a virtual integration lab that brings together the performance and capacity footprints of each component in the production service.  While some of those components may be changing, and are therefore measured through the release management process, others are not - and are measured in production.  Creating and assimilating component performance models allows the impact of each sprint on IT operations to be seen before release.
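
As a simplified illustration of the idea (not the actual mechanics of CA Performance Optimizer), the sketch below combines per-component performance footprints into a service-level view and estimates the capacity impact of a release that increases one component's CPU cost per transaction.  All figures are hypothetical.

```python
# Simplified illustration of a pre-release capacity check: combine the CPU
# footprint of each component (measured in test or production) and estimate
# utilization at the forecast peak. Figures are hypothetical.

components = {
    # cpu_ms per transaction, and cpu_ms per second available to the tier
    "web": {"cpu_ms_per_txn": 12.0, "capacity_ms_per_s": 16_000},
    "app": {"cpu_ms_per_txn": 35.0, "capacity_ms_per_s": 32_000},
    "db":  {"cpu_ms_per_txn": 20.0, "capacity_ms_per_s": 24_000},
}

def peak_utilization(components, peak_txn_per_s):
    return {name: peak_txn_per_s * c["cpu_ms_per_txn"] / c["capacity_ms_per_s"]
            for name, c in components.items()}

peak = 500  # forecast peak transactions per second

print("before release:", {k: f"{v:.0%}" for k, v in peak_utilization(components, peak).items()})

# The new release adds ~30% CPU cost per transaction at the app tier.
components["app"]["cpu_ms_per_txn"] *= 1.3
print("after release: ", {k: f"{v:.0%}" for k, v in peak_utilization(components, peak).items()})
```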

Capacity Planning is a true #Devops process.  Only by adapting the capacity plan to take account of the step changes introduced by each software release can the risks of poor scalability and damaging impact be properly guarded against.

Monday 25 March 2013

Take a Capacity Healthcheck


Heart, lungs, liver, kidneys: we all recognize the importance of looking after our vital organs - and regularly take medical advice whenever we have concerns.  Yet it is not the fitness of any individual organ that matters most - it is the fitness of the weakest.

But what about our IT enterprise?  It's not uncommon - even in 2013 - to come across siloed thinking that stifles the health of the organisation.  Purchasing decisions are made within the silos, leading to distorted allocations of capacity based on political, rather than engineering, needs.  Further, because of these silos, entropy increases as financial accountability struggles to permeate the organisation.  Provisioning decisions are made on a risk-averse basis, without insight into how business demands translate into capacity requirements.

And now it's time to change.

The financial crisis has caused a significant change in emphasis in most major corporations.  IDC estimate that over 50% of major corporations are actively planning investments in better capacity management functions.  Cloud-sourcing is on the increase, helping to defer cost and to refocus investment on the core business.  Capacity has become a commodity and, in an open economy, is becoming subject to the same commercial forces that balance cost and quality in any marketplace.


But how can enterprises leverage their purchasing power, when they don't know how much commodity capacity they really need?  How can they right-size investments and defer costs without incurring risk to their top-line revenues?

Actually, this is a question that enterprises have addressed in many of their other lines of operation.  Full financial accountability has ensured the right-sizing of many other enterprise assets; whether that is employees, hot-desks, freight containers or manufacturing capacity.  Successful companies have figured out that costs must be aligned proportionately with revenues.  The only thing that makes IT different is the complexity, and the lack of insight.

So where to start in right-sizing IT?

The answer is perhaps startlingly clear - you should begin where you always begin, with your requirements in mind: measure capacity consumption across all silos.  The trick then is to bring in a method of normalization, a model library that provides weighting factors according to the make and configuration of your estate.  This same method can then be applied to plan a migration, transposing configurations easily to determine the optimum sizing on alternative real-estate.
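
A minimal sketch of that normalization-and-transposition step might look like the following; the server models, ratings and headroom ceiling are hypothetical stand-ins for a real model library and sizing policy.

```python
# Hypothetical sketch: normalize measured consumption across a mixed estate
# and transpose it onto a candidate target platform. The "rating" values
# stand in for weighting factors from a real model library.

estate = [
    # (server model, rating of one whole server, measured average utilization)
    ("OldBox-2000", 40.0, 0.12),
    ("OldBox-2000", 40.0, 0.09),
    ("MidBox-3000", 75.0, 0.25),
    ("MidBox-3000", 75.0, 0.18),
]

# Total consumption expressed in normalized capacity units.
consumed = sum(rating * util for _, rating, util in estate)

target_rating = 160.0   # rating of one unit of the proposed target platform
target_ceiling = 0.6    # keep 40% headroom on the target

units_needed = consumed / (target_rating * target_ceiling)
print(f"normalized consumption: {consumed:.1f} units; "
      f"target servers needed: {units_needed:.2f} (round up)")
```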

So what's stopping you?  Find a friendly capacity management service provider and ask them about their IT Capacity Healthcheck.  If they know what they're doing, they'll get their scheduler out straight away.  And if they don't?  Well - drop me a tweet or an email and I'll point you in the right direction.

Thursday 24 January 2013

Consumer/Provider : the twin forces in Capacity Management

Those schooled in traditional IT capacity management have long recognised the cause and effect of observed system behaviour.  Few have managed to bridge the gap in quantifying the correlation, however, and for good reason.  Straying too deep into this territory can leave you struggling with data overload, and no way of mapping volumetric and utilization data together.  The age of the CMDB and automated discovery and mapping has changed the landscape in this regard.  Finally, using configuration mapping to correlate volumetric data against utilization data can be done reliably, consistently and accurately, since all feeds are automated.

Correlating service throughput against observed utilization provides intelligence to optimise design, streamline performance, and predict and optimize application scalability.  But in a consumer/provider scenario there are two contexts to consider.  Presenting the customer with data about your underlying infrastructure utilisation lays bare the margins and risk levels of your operating model.  Equally, the customer's main concern is ensuring that their service levels are not jeopardised, and that they are not burdened with excessive costs for underutilized environments.
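
To make the correlation concrete, here is a minimal sketch of the classic fit - utilization modelled as a baseline plus a per-transaction cost, estimated by least squares.  The sample data is entirely hypothetical.

```python
# Minimal sketch: fit utilization = baseline + cost_per_txn * throughput
# using ordinary least squares. The sample data is entirely hypothetical.

import statistics  # statistics.linear_regression requires Python 3.10+

throughput = [120, 250, 400, 520, 640, 800]        # transactions per second
cpu_util   = [0.18, 0.27, 0.38, 0.47, 0.55, 0.68]  # fraction of CPU busy

slope, intercept = statistics.linear_regression(throughput, cpu_util)
print(f"baseline ~{intercept:.2f}, cost per txn/s ~{slope:.5f}")

# Predicted utilization at a forecast peak of 1,000 txn/s:
print(f"predicted utilization at 1000 txn/s: {intercept + slope * 1000:.0%}")
```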

Despite the advantages commonly sought in quantifying the capacity of the physical environment, it is the capacity of the contractual environment that is crucial to the customer.  In a cloud context, the provider must diligently ensure the reliability of their operating model - this is crucial to brand equity.  But the customer's primary concern will be managing the flexibility of their service-based contract, and ensuring that risks are properly balanced against costs.

Monday 14 January 2013

To Transform IT - Revisit The Basics

Big Data and the world of business analytics have much in common with Capacity and Demand Management as we know it.  Aiming to derive competitive advantage by acting on timely business intelligence, business operations analytics requires number crunching on a huge scale.  In the highly competitive world of e-business, the imperative for business agility reaches its peak.  Aligning appropriate investment with prevailing demand becomes a critical business decision - and that decision matters no less to the world of capacity management.  In other words, business agility depends on Big Data to make sure there are enough sales reps selling hot products, to make sure there is enough of the right sort of product on the shelves, and to commit the right amount of marketing to the products or services delivering the highest profit.  The connection with the IT cloud here is clear - aligning IT resources to demand is equally critical to the agile business.  Indeed, such agility is one of the main drivers behind cloud computing.  By transforming IT delivery into a service model, one gains the ability to quickly and easily ramp investment up when warranted by demand - or to ramp it down.

Nice idea.  But does this happen widely in the field?  Evidence indicates that the transformation to the cloud model has hit a glass ceiling in the majority of organisations.  With service catalogues, virtual adaptable infrastructures and increasingly automated processes in place, IT organisations have the basic ingredients that enable some of the tasks associated with cloud service provision.  However, the vision of agile IT-on-demand has been held back by slow adoption of a business-integrated view more aligned with the balance sheet.  IT resources, like many business resources, come at a cost.  Not just a cost to purchase, but a cost to provision, a cost to operate, a cost to maintain and a cost to support.  Factoring cost of ownership into resource provisioning requests, and aligning investment appropriately according to demand, are the two pre-requisites for the agile business.

These pre-requisites translate themselves into two management capabilities that are widely missing in IT operations today.  Adding these capabilities to IT management functions will not only provide insight and control over efficient use of IT resources, but will also provide consumer-friendly insight to support optimal alignment of resources.

Firstly, by gaining control of operational costs, IT cloud operations can start to drive truly efficient investment.  A simple cost / utilization correlation is an excellent starting point for determining efficiency.  For the most accurate approach, this analysis should be carried out by the Capacity Management team, to factor in variables like the different sorts of utilization (virtual, physical, logical - all of which are environment-related), and to run what-if scenario analysis to determine the possibilities for optimization.
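
As an illustration of that simple cost / utilization correlation, here is a hypothetical sketch that ranks environments by cost per utilized capacity unit - one possible way to surface the least efficient candidates first.  The environments and figures are invented.

```python
# Hypothetical sketch: rank environments by cost per utilized capacity unit.
# "Cost" here is the fully loaded annual run cost; figures are invented.

environments = [
    # (name, annual cost USD, capacity units, average utilization)
    ("erp-prod",    480_000, 400, 0.55),
    ("web-farm",    220_000, 300, 0.15),
    ("batch-grid",  150_000, 250, 0.70),
    ("legacy-apps", 310_000, 200, 0.08),
]

def cost_per_utilized_unit(cost, capacity, utilization):
    return cost / (capacity * utilization)

# Least efficient environments (highest cost per utilized unit) first.
for name, cost, cap, util in sorted(
        environments,
        key=lambda e: cost_per_utilized_unit(e[1], e[2], e[3]),
        reverse=True):
    print(f"{name:12s} ${cost_per_utilized_unit(cost, cap, util):,.0f} "
          f"per utilized unit (utilization {util:.0%})")
```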

Secondly, by assessing usage patterns quickly and dynamically, and providing short- and mid-term trends to the consumer.  The aim here is to ensure the right amount of headroom is maintained in the environment.  A service-aligned view of allocated capacity is essential, so that service headroom can be correctly calculated according to the weakest link in the chain.  The insight needed is whether the service headroom will be sufficient to meet demand, according to trends, forecasts and other business analytics.  Regression and correlation between workload and resource usage is another function classically described in Capacity Management.
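
A minimal sketch of that weakest-link calculation, with hypothetical tiers and growth figures: the service's headroom horizon is set by the first component expected to breach its utilization threshold.

```python
# Hypothetical sketch: service headroom is set by the weakest link.
# For each tier, estimate months until its utilization threshold is breached
# under a simple linear growth assumption; the service gets the minimum.

tiers = [
    # (tier, current utilization, threshold, growth in utilization points/month)
    ("web", 0.45, 0.70, 0.02),
    ("app", 0.60, 0.75, 0.03),
    ("db",  0.55, 0.80, 0.01),
]

def months_to_threshold(current, threshold, growth_per_month):
    return float("inf") if growth_per_month <= 0 else (threshold - current) / growth_per_month

horizons = {name: months_to_threshold(c, t, g) for name, c, t, g in tiers}
weakest = min(horizons, key=horizons.get)
print(horizons)
print(f"service headroom limited by '{weakest}': ~{horizons[weakest]:.0f} months")
```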

So - what are we saying here?  That Capacity Management is the missing link between cloud operations and an agile enterprise?  No - not quite.  Capacity Management as it is currently executed and understood is not fit for this purpose.  However, connecting Capacity Management with both the demand cycle - notably from a service rather than an infrastructure point of view - and with Financial Management has the power to disrupt the enterprise cloud and transform it into a true partner to the agile business.