
Long Outages Prevented By Prediction and Simple-to-Understand Fixes

Old-school monitoring provides network administrators with simple metrics and indications that a problem exists at a particular device or location in a network infrastructure. But without insight into the precise problems and their causes, difficult-to-diagnose issues can quickly turn into costly outages, inaccessibility, or downtime.

By contrast, proactive infrastructure monitoring solutions tap into device APIs to collect statistics and read configurations, arriving at deeper insights than simple metrics make possible. These tools also draw on troubleshooting fixes crowdsourced from industry experts, validating the best of them and turning them into easy-to-follow directions that administrators can use to correct the issue.

In other words, unlike conventional monitoring tools, which give IT pros limited and siloed information, modern proactive monitoring solutions can consume large amounts of data from across the enterprise’s infrastructure. That data can be turned into accurate projections about when a problem is likely to occur, giving administrators clear insight into the best course of action to correct it, all in simple-to-understand human language.

Here’s a look at the various infrastructure components that can cause headaches for your team if they’re not carefully monitored and properly managed.

Switches

Networks all start with switches. One easy way to understand what a switch does is to compare its role in a network with that of a router: switches create a network; routers connect networks.

Many mysterious network problems involve switches. One of the most common causes is a change made to a switch’s configuration without saving it. A problem might later cause the switch to crash, resulting in a core dump and a reboot. Whenever a switch is power cycled, any unsaved configuration changes are lost, leaving the switch out of sync, and problems are likely to ensue.
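A minimal sketch of how a monitoring tool might catch unsaved changes before a reboot exposes them: compare the running configuration with the saved startup configuration and report any difference. It assumes both configurations have already been retrieved as text (on Cisco IOS, for example, via `show running-config` and `show startup-config`); the function name `unsaved_changes` and the sample configs are hypothetical.

```python
import difflib


def unsaved_changes(running_config: str, startup_config: str) -> list[str]:
    """Return unified-diff lines showing how the running config differs from the saved one.

    An empty list means everything in memory has been saved and would survive a reboot.
    """
    diff = difflib.unified_diff(
        startup_config.splitlines(),
        running_config.splitlines(),
        fromfile="startup-config",
        tofile="running-config",
        lineterm="",
    )
    return list(diff)


if __name__ == "__main__":
    # Hypothetical configs: the access VLAN was changed but never saved.
    startup = "hostname sw1\ninterface Gi0/1\n switchport access vlan 10\n"
    running = "hostname sw1\ninterface Gi0/1\n switchport access vlan 20\n"
    changes = unsaved_changes(running, startup)
    if changes:
        print("WARNING: running config has unsaved changes:")
        print("\n".join(changes))
```

Lines prefixed with `+` exist only in memory and would be lost if the switch rebooted now.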

Routers

Routers connect disparate networks. Advanced infrastructure health tools can scour the routing paths calculated by different protocols for potential interruptions or vulnerabilities. Advanced monitoring can even determine when vendor-recommended configurations may be the unlikely (or at least unforeseen) cause of problems for many users. While many router-related issues are common across vendors, some problems at the router level are brand-specific.

For example, Cisco routers enable a feature called Proxy ARP by default on their interfaces. A router with Proxy ARP enabled answers ARP requests for addresses on other networks that it knows how to reach, replying with its own MAC address. Clients that try to communicate with devices outside the local network therefore send their traffic to the router, which then forwards it.

Despite its potential benefits, the feature is also fraught with risk. Because any device can be reached simply by sending an ARP request, the amount of ARP traffic on your network may increase. Proxy ARP also makes ARP spoofing harder to detect, since an attacker can easily hide behind the MAC address of the router or switch.
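One way to see why Proxy ARP complicates spoofing detection is to group an ARP table by MAC address. A MAC that answers for many IP addresses is a classic sign of spoofing, but with Proxy ARP enabled the router’s own MAC legitimately shows up for many addresses too. This sketch assumes the ARP table has already been collected as (IP, MAC) pairs; the function name and sample entries are hypothetical.

```python
from collections import defaultdict


def macs_claiming_many_ips(arp_table, limit=1):
    """Return MACs mapped to more than `limit` IP addresses, with those IPs sorted.

    arp_table: iterable of (ip, mac) string pairs. MACs are compared case-insensitively.
    """
    owners = defaultdict(set)
    for ip, mac in arp_table:
        owners[mac.lower()].add(ip)
    return {mac: sorted(ips) for mac, ips in owners.items() if len(ips) > limit}


if __name__ == "__main__":
    table = [
        ("10.0.0.5", "aa:bb:cc:00:00:01"),
        ("10.0.1.7", "AA:BB:CC:00:00:01"),  # same MAC, different IP
        ("10.0.0.9", "aa:bb:cc:00:00:02"),
    ]
    for mac, ips in macs_claiming_many_ips(table).items():
        # Could be a spoofer, or simply a router doing Proxy ARP: the check alone can't tell.
        print(mac, "answers for", ips)
```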

Conventional monitoring tools won’t uncover this Proxy ARP risk until it’s too late. Proactive solutions, however, can scour your entire environment to identify potentially problematic or vulnerable routers and recommend disabling the feature or otherwise reconfiguring the devices, through a vetted, crowdsourced script with a step-by-step guide for applying it.
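A minimal sketch of the kind of audit such a script might perform on Cisco IOS-style configuration text: flag every routed interface (one with an `ip address`) that lacks `no ip proxy-arp`, since Proxy ARP is on unless explicitly disabled. The parser is simplified and assumes interface settings are indented beneath their `interface` line; the function names and sample config are hypothetical, and flagged interfaces are candidates for review, because some network designs still rely on Proxy ARP.

```python
def parse_interface_blocks(config: str) -> dict[str, list[str]]:
    """Map each 'interface <name>' line to the indented settings beneath it."""
    blocks: dict[str, list[str]] = {}
    current = None
    for line in config.splitlines():
        if line.startswith("interface "):
            current = line.split(None, 1)[1].strip()
            blocks[current] = []
        elif current is not None and line.startswith(" "):
            blocks[current].append(line.strip())
        else:
            # Any other non-indented line (such as "!") ends the interface block.
            current = None
    return blocks


def interfaces_with_proxy_arp(config: str) -> list[str]:
    """Return routed interfaces on which Proxy ARP has not been disabled."""
    flagged = []
    for name, settings in parse_interface_blocks(config).items():
        routed = any(s.startswith("ip address ") for s in settings)
        if routed and "no ip proxy-arp" not in settings:
            flagged.append(name)
    return flagged


if __name__ == "__main__":
    sample = "\n".join([
        "interface GigabitEthernet0/0",
        " ip address 10.0.0.1 255.255.255.0",
        "!",
        "interface GigabitEthernet0/1",
        " ip address 10.0.1.1 255.255.255.0",
        " no ip proxy-arp",
        "!",
        "interface GigabitEthernet0/2",
        " no ip address",
    ])
    for name in interfaces_with_proxy_arp(sample):
        # Suggested remediation, to be reviewed before applying.
        print(f"interface {name}\n no ip proxy-arp")
```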

Load Balancers

A load balancer takes a request for a resource (such as a server) and directs it to one of the available systems according to a load-balancing policy. These policies can be based on simple round-robin rotation or on which server has the lowest system load, with the goal of ensuring network and application stability as well as optimized overall performance.
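To make the two policies concrete, here is a minimal sketch of both. It assumes each backend exposes a count of active connections as its measure of load (real load balancers may weigh CPU, response time, or other metrics), and the class names are illustrative rather than taken from any product.

```python
import itertools
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    active_connections: int = 0  # stand-in for "system load" in this sketch
    healthy: bool = True


class RoundRobinPolicy:
    """Hand out healthy backends in a fixed rotation."""

    def __init__(self, backends):
        self.backends = backends
        self._cycle = itertools.cycle(backends)

    def pick(self) -> Backend:
        # Try each backend at most once per call, skipping unhealthy ones.
        for _ in range(len(self.backends)):
            backend = next(self._cycle)
            if backend.healthy:
                return backend
        raise RuntimeError("no healthy backends")


class LeastLoadPolicy:
    """Pick the healthy backend currently serving the fewest connections."""

    def __init__(self, backends):
        self.backends = backends

    def pick(self) -> Backend:
        healthy = [b for b in self.backends if b.healthy]
        if not healthy:
            raise RuntimeError("no healthy backends")
        return min(healthy, key=lambda b: b.active_connections)


if __name__ == "__main__":
    pool = [Backend("web1", 5), Backend("web2", 2), Backend("web3", 7)]
    rr = RoundRobinPolicy(pool)
    print([rr.pick().name for _ in range(4)])  # ['web1', 'web2', 'web3', 'web1']
    print(LeastLoadPolicy(pool).pick().name)   # web2
```

Round-robin is cheap and predictable; least-load adapts when some requests are much heavier than others.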

Load balancing requires checking which systems are available and spreading user requests intelligently across multiple servers. It also ensures that requests from a user who already has a session go to the same server (otherwise the session’s state, and the work in progress, may be lost).
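A minimal sketch of the availability check and session stickiness described above, assuming a hypothetical `StickyBalancer` that assigns new sessions round-robin and keeps existing sessions on their backend while it stays healthy. The `healthy` set is a stand-in for real health checks.

```python
import itertools


class StickyBalancer:
    """Round-robin for new sessions; existing sessions stay on their backend."""

    def __init__(self, backends: list[str]):
        self.backends = backends
        self.healthy: set[str] = set(backends)  # updated by (hypothetical) health checks
        self.sessions: dict[str, str] = {}      # session id -> assigned backend
        self._rotation = itertools.cycle(backends)

    def mark_down(self, backend: str) -> None:
        self.healthy.discard(backend)

    def route(self, session_id: str) -> str:
        assigned = self.sessions.get(session_id)
        if assigned in self.healthy:
            return assigned  # affinity: keep the session where its state lives
        # New session, or its backend failed: pick the next healthy backend.
        for _ in range(len(self.backends)):
            candidate = next(self._rotation)
            if candidate in self.healthy:
                self.sessions[session_id] = candidate
                return candidate
        raise RuntimeError("no healthy backends available")


if __name__ == "__main__":
    lb = StickyBalancer(["web1", "web2"])
    print(lb.route("alice"), lb.route("bob"), lb.route("alice"))  # web1 web2 web1
    lb.mark_down("web1")
    print(lb.route("alice"))  # web2: reassigned, so any state kept on web1 is lost
```

When a backend goes down, its sessions are reassigned and any state held only on that server is lost; clustering is one way to avoid that.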

Load-balanced systems may also be clustered, allowing even sessions in progress to continue in the event of a hardware failure. However, one difficult-to-diagnose problem involving a load balancer might show up as an email failure. For example, despite having backup and failover email servers, it could take hours of investigation to discover that the secondary cluster member was not configured for the mail VLAN.
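The email example comes down to cluster members whose configurations have drifted apart. A simple consistency check compares the VLANs configured on each member and reports any VLAN a member is missing. This sketch assumes the per-member VLAN lists have already been collected; the member names and VLAN numbers (with 30 standing in for the mail VLAN) are hypothetical.

```python
def missing_vlans(members: dict[str, set[int]]) -> dict[str, set[int]]:
    """For each cluster member, return VLANs configured on a peer but not on that member."""
    all_vlans: set[int] = set().union(*members.values())
    gaps = {}
    for name, vlans in members.items():
        missing = all_vlans - vlans
        if missing:
            gaps[name] = missing
    return gaps


if __name__ == "__main__":
    cluster = {
        "lb-primary": {10, 20, 30},  # 30 = hypothetical mail VLAN
        "lb-secondary": {10, 20},
    }
    for member, vlans in missing_vlans(cluster).items():
        print(f"{member} is missing VLAN(s) {sorted(vlans)}; failover may break traffic on them")
```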

Firewalls

Firewalls run on a machine, real or virtual. A number of issues, including memory leaks, can cause a core dump and reset. Other difficult-to-diagnose problems might result from any of a myriad of firewall misconfigurations that can degrade performance or create security vulnerabilities.
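Memory leaks are a case where prediction helps: usage climbs steadily long before the firewall crashes. The sketch below fits a least-squares line to memory samples and projects how many hours remain until usage reaches capacity. It is a deliberately simple linear model chosen for illustration, not the method any particular product uses; the function name and sample numbers are made up.

```python
from typing import Optional


def hours_until_exhaustion(samples: list[tuple[float, float]], capacity_mb: float) -> Optional[float]:
    """Fit a line to (hour, used_mb) samples and project when usage reaches capacity.

    Returns None when memory is flat or shrinking (no leak-like trend).
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    cov = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    if var == 0:
        raise ValueError("samples need distinct timestamps")
    slope = cov / var  # growth rate in MB per hour
    if slope <= 0:
        return None
    intercept = mean_m - slope * mean_t
    hour_full = (capacity_mb - intercept) / slope  # where the fitted line hits capacity
    last_hour = max(t for t, _ in samples)
    return max(0.0, hour_full - last_hour)


if __name__ == "__main__":
    # Hypothetical hourly readings: usage grows 10 MB per hour.
    readings = [(0, 1000), (1, 1010), (2, 1020), (3, 1030)]
    print(hours_until_exhaustion(readings, capacity_mb=2000))  # 97.0
```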

Servers

Beyond the kind of problem discussed under load balancers, servers may be involved in a number of other hidden issues. Often, those challenges are related in some manner to certificates. For example, a server’s attempts to authenticate a device asking for its resources may fail if the Certificate Authority is not included in the VLAN configuration.

Network Storage

Storage shared across the network and used by databases can fail for a number of reasons, creating huge and costly outages that take your team offline for prolonged periods of time. Relational databases (DBs), for example, reserve memory for each cursor opened by an ongoing database operation. Occasionally, too many cursors are opened and left open, draining memory and causing the DB to time out.
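Two pieces of the cursor problem can be sketched with Python’s built-in `sqlite3` module: application code that always closes its cursors, and a threshold check on how many cursors are open relative to a configured limit (such as Oracle’s `OPEN_CURSORS` parameter). How the open-cursor count is collected varies by database and is left out; the `hosts` table, the function names, and the 80% warning ratio are hypothetical.

```python
import sqlite3
from contextlib import closing


def count_hosts(conn: sqlite3.Connection, status: str) -> int:
    """Run one query and always release its cursor, even if the query fails."""
    with closing(conn.cursor()) as cur:
        cur.execute("SELECT COUNT(*) FROM hosts WHERE status = ?", (status,))
        return cur.fetchone()[0]


def cursor_alert(open_cursors: int, max_cursors: int, warn_ratio: float = 0.8) -> str:
    """Classify open-cursor usage against a configured limit."""
    usage = open_cursors / max_cursors
    if usage >= 1.0:
        return "critical"
    if usage >= warn_ratio:
        return "warning"
    return "ok"


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE hosts (name TEXT, status TEXT)")
    conn.executemany(
        "INSERT INTO hosts VALUES (?, ?)",
        [("db1", "up"), ("db2", "up"), ("db3", "down")],
    )
    print(count_hosts(conn, "up"))  # 2
    print(cursor_alert(850, 1000))  # warning
    conn.close()
```

Warning before the limit is reached gives administrators time to find the code path that leaks cursors.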

End-To-End Proactive Network Monitoring

Today’s IT infrastructures are complex, sprawling entities that require time and attention to maximize performance and optimize health. Conventional monitoring tools are limited in scope and capability, usually revealing issues in the environment after they’ve become problems in need of an immediate and reactive response.

In contrast, the best modern monitoring solutions look at all network components in combination and simplify, or even automate, resolution of the many mysterious network problems and outages that occur, even if such a problem is the result of something as simple as a configuration change or the introduction of a brand-new network device feature.

From routers and switches to load balancing and storage optimization, proactive monitoring systems continuously check configurations and device state to detect issues such as clock drift or the presence of core dumps, while offering simple remediation steps before firewall issues create a security exposure and result in downtime.
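Clock drift can be measured with the same four-timestamp exchange NTP uses: the monitoring host records when it sends a request and when it receives the reply, and the device reports when it received the request and when it sent the reply. The offset formula assumes network delay is roughly the same in both directions. The function names and the 0.5-second alert threshold are hypothetical choices for this sketch.

```python
def clock_offset(t1: float, t2: float, t3: float, t4: float) -> float:
    """Offset of the device clock relative to the local reference clock (NTP formula).

    t1: request sent (local clock)    t2: request received (device clock)
    t3: reply sent (device clock)     t4: reply received (local clock)
    """
    return ((t2 - t1) + (t3 - t4)) / 2


def round_trip_delay(t1: float, t2: float, t3: float, t4: float) -> float:
    """Network round-trip time, excluding the device's processing time."""
    return (t4 - t1) - (t3 - t2)


def has_clock_drift(offset_s: float, threshold_s: float = 0.5) -> bool:
    """Flag the device when its clock is off by more than the threshold either way."""
    return abs(offset_s) > threshold_s


if __name__ == "__main__":
    # Hypothetical timestamps in seconds: the device clock runs 2 s ahead.
    t1, t2, t3, t4 = 100.0, 102.25, 102.5, 100.75
    offset = clock_offset(t1, t2, t3, t4)
    print(offset, round_trip_delay(t1, t2, t3, t4), has_clock_drift(offset))  # 2.0 0.5 True
```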

By analyzing hundreds of device statistics in real time and combining them with insights and expertise from thousands of the world’s leading IT professionals, Indeni proactive monitoring solutions help find errors before they become problems, reduce enterprise downtime, and free administrators to focus more heavily on higher-value activities and strategies.

Join Indeni Crowd, a community for next-generation IT professionals