Why Aren’t Our Networks Staying Up?

Does anyone besides me remember the phone system? You could be just about anywhere in the world at any time and you could pick up a phone, call someone, and your call would go right through. The Plain Old Telephone System (POTS) just worked. Now nothing is ever perfect and the POTS wasn’t perfect either, but it worked 99.999% of the time, which meant that it was down for only about 5 minutes per year. Clearly, despite the importance of information technology, the networks that we’re designing and building today don’t work anywhere near this reliably. Why not?
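For anyone who wants to check the "about 5 minutes per year" figure, here is a minimal sketch of the arithmetic in Python. The function name and the choice of a 365.25-day average year are my own assumptions for illustration, not anything the phone company published.

```python
# Assumption: an "average" year of 365.25 days, to account for leap years.
MINUTES_PER_YEAR = 365.25 * 24 * 60


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the expected minutes of downtime per year for a given availability."""
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return unavailable_fraction * MINUTES_PER_YEAR


# Compare a few common availability targets, ending with "five nines".
for pct in (99.0, 99.9, 99.99, 99.999):
    minutes = downtime_minutes_per_year(pct)
    print(f"{pct}% available -> {minutes:.1f} minutes down per year")
```

The last line of output is the five-nines case, which comes out to a little over 5 minutes per year. Dropping just one nine, to 99.99%, allows roughly ten times as much downtime.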

Network Outages Seem To Be A Part Of Life

Back in the day, when the phone network was “the network”, an outage was a big deal. It got stories written about it in papers and people talked about it on TV. The reason that it was such a big deal was that it didn’t happen very often. Things have certainly changed. In 2015 alone the NYSE halted trading because of a technical glitch and United Airlines had to ground all of their flights because of problems with a program that scheduled pilots.

I think that there are a few things that have probably led us to where we find ourselves today. First off, as any person with the CIO job can tell you, we have a lot more networks that we are using to run things. In any given company there are the networks that deal with creating the products and services that the company sells and then there are the networks that are used to actually run the company. Just to make things a little bit more difficult, each of these networks is now more complex. They have more boxes and software and other components that make them up.

One of the other reasons that network outages seem to occur more frequently these days is simply because CIOs don’t staff their IT departments to deal with outages like the phone company used to. For regulatory reasons, the phone company was under the gun when they had a network outage. They needed to fix it fast. This meant that they hired and trained an army of skilled technicians who would spring into action any time there was a network outage. People in the CIO position don’t do this today and so our outages tend to last much longer.

It’s All About Managing Change

As a CIO you would prefer that your corporate networks not experience any downtime. However, of course, this will never be possible. What we need to do is to take some time and try to get to the root cause of just exactly why we and our peers are seeing so many high profile network outages.

I sorta hate to say this, but the answer to this question is actually pretty obvious. The reason that so many of us CIOs have been experiencing network outages is because of the high rate of change that is occurring within our networks. Just when we get our network stable and configured the way that it has to be in order to work with and for our company, along comes yet another change. The change can be either hardware or software but because it changes our network into a partially upgraded beast for a while, bad things can easily happen.

As CIOs we can’t always prevent outages from happening. However, what we can do is to take steps to minimize the probability that they will occur. What we need to do within our IT departments is to make sure that we are using solid, well-documented, and automated processes as much as possible in order to test, build, upgrade, and configure our networks. It’s only by doing this that we’ll start to drive some of the human error out of these processes and reduce the possibility of having yet another network outage.

What All Of This Means For You

I believe that most of us can remember a time when there were things that just seemed to always work. Now we find ourselves living in an age where a computer failure seems to take down the NYSE every month or so, Internet providers experience massive outages, etc. Why do today’s modern networks seem to work so much worse than the phone network of yesterday?

It turns out that there are a lot of different reasons that are all contributing to our current lack of network reliability. Many firms prefer to invest in other things until they experience an infrastructure problem. In the past firms maintained armies of technicians to fix issues; today’s lean organizations can take much longer to clear errors. Today’s networks are more complex, carry more data, and change more rapidly than ever before. The result is that we’ll keep seeing more network outages.

As CIOs we need to understand the situation that we find ourselves in. Our IT departments have created some wonderfully functional networks that, because of the great deal of change that is always going on in IT, may at times experience outages. What we need to do is to take the time to develop contingency plans that determine what action we’ll take when, not if, our networks go down. Being ready for bad things is what being a CIO is all about.