Last Friday, I met with an individual that leads a 300-person team, responsible for running the networking and computing infrastructure in 50 data centers around the globe. I asked him what he thought of his OSS stack – the set of tools his team uses to stay on top of what’s going on in their infrastructure.
He hates it.
As I want to keep this blog post PG-rated, I’ll refrain from using his adjectives, but I can tell you he’s not happy with it. It’s a clobber of open source and commercial tools. The tools required a lot of customization and a variety of extensions written over the years. At the end of the day, though, it only gives him up/down monitoring and no ability to proactively avoid the next outage. Over 70% of outages occur due to human error and misconfigurations and the tools available to him are incapable of identifying even one percent of that.