Why does infrastructure operations still suck?
Last Friday, I met with someone who leads a 300-person team responsible for running the networking and computing infrastructure in 50 data centers around the globe. I asked him what he thought of his OSS stack, the set of tools his team uses to stay on top of what’s going on in their infrastructure.
He hates it.
As I want to keep this blog post PG-rated, I’ll refrain from using his adjectives, but I can tell you he’s not happy with it. It’s a hodgepodge of open source and commercial tools that has required a lot of customization and a variety of extensions written over the years. At the end of the day, though, it only gives him up/down monitoring and no ability to proactively avoid the next outage. Over 70% of outages occur due to human error and misconfigurations, and the tools available to him are incapable of identifying even one percent of those.
It is amazing to me how little the Infrastructure Operations market has changed with regard to getting visibility into your infrastructure, an activity commonly referred to as monitoring. It is still the same SNMP monitoring tools, still flooding admins with data and alerts, still staying on the defensive and being reactive, still waking up every morning to a new surprise. The person I met on Friday, a veteran of the industry, actually believes that Infrastructure Monitoring has been going backwards in recent years. The infrastructure is growing in complexity while the monitoring tools aren’t changing their approach to providing true visibility, so they are becoming even less useful at their job.
Just in the networking space, Wikipedia lists 65 different tools. I have seen most of these tools in use by mid-size and very large enterprises. They are usually very similar: they come with some basic monitoring functionality and allow some customization and extensibility. Even after a team invests dozens of man-years in setting up the system, all it can do is tell them whether their network is up or down. How useful is that?
When we started working on our product, this baffled us. Over 60 tools? That’s unheard of in tech. In each market, there are usually 5-10 competitors, with one or two being dominant. Look at workstations (Microsoft), servers (Unix-like systems), networking gear (Cisco), load balancing (F5), CRM (Salesforce), server virtualization (VMware) and other markets. A market with 60 tools generally means either that it is huge (tens to hundreds of billions, capable of supporting so many competitors) or simply that none of the solutions has delivered, so new ones keep showing up. Many customers we speak with have several different tools with overlapping capabilities, none of them delivering.
This needs to end. A superior technology and product must surface and provide the solution customers have been awaiting for nearly two decades. A few years ago, we at indeni decided to contend for this title, and we welcome anyone else attempting to do the same.
We have been successful at bringing proactivity to network and security teams around the globe that use Cisco, Check Point, F5 and Palo Alto Networks in their environments.
Over the next few months, we will be rolling out our strategy for delivering true proactivity to every single piece of infrastructure deployed in large enterprises. I will be detailing our plans and actions, and the rationale behind them, over a series of posts.
Stay tuned: 2016 will be a turning point in this market!