The trouble with our big baby, network complexity
While not being able to browse Facebook might be bad for your morning mood, the cost of network outages affecting enterprise networks or critical services like hospitals is not easy to contemplate.
Network has become key to our lives
Recent outages at Amazon Web Services (AWS) show how any sort of outage in key infrastructure can affect millions of people. The network has become key to our lives.
Large networks typically experience a number of outages each month. Digging further to understand where, why and how, it turns out most outages are the result of human error, or human “factors” – misconfiguration, poor build, incorrect patches, whatever. Despite pretty good approaches to designing hardware and service redundancy, things fail, mainly because someone missed something somewhere.
Complexity makes things fragile
The problem is not that our engineers are poor, but that the scale of our networks, datacenters and internet infrastructure is so vast and diverse it’s hard to even explain. Networks have become massively complex, and though increasing connectedness offers resilience, complexity tends to make things fragile.
Ensuring a network stays up is extremely hard. Service assurance might be the most important function in a large operator.
Compounding network complexity, the rise and scale of cloud computing and the increase in virtualisation mean architecting control through traditional means is not really possible. And with the number of devices coming in the IoT wave, we only expect the network to get bigger.
Unfortunately, network management, provisioning and assurance tooling are not keeping up with this complexity. In real-life operating environments, the way we build and support networks falls short of a perfect deployment.
Emerging techniques
What do we do? When you think about how to keep House of Cards streaming at 1080p for millions of people, some emerging techniques and approaches show promise. One is software-defined networking (SDN): the ability to manage network behaviour programmatically and dynamically through software.
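To make "programmatic control" a little more concrete, here is a minimal sketch in Python. It is not a real SDN controller or any vendor's API: the ToyController class, its methods and the five-link topology are all invented for illustration. The point is only that the path traffic takes is computed by software, and recomputed when the software is told a link has gone down.

```python
from collections import deque


class ToyController:
    """Hypothetical controller: holds the topology and computes paths in software."""

    def __init__(self, links):
        # Each link is stored as an unordered pair of node names.
        self.links = {frozenset(link) for link in links}

    def neighbours(self, node):
        # Yield the node at the other end of every link touching `node`.
        for link in self.links:
            if node in link:
                (other,) = link - {node}
                yield other

    def path(self, src, dst):
        # Breadth-first search: returns a fewest-hops path, or None if unreachable.
        previous = {src: None}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            if node == dst:
                hops = []
                while node is not None:
                    hops.append(node)
                    node = previous[node]
                return hops[::-1]
            for nxt in sorted(self.neighbours(node)):
                if nxt not in previous:
                    previous[nxt] = node
                    queue.append(nxt)
        return None

    def link_down(self, a, b):
        # React to a failure by removing the link from the controller's view.
        self.links.discard(frozenset((a, b)))


if __name__ == "__main__":
    # Invented topology: a short route a-b-d and a longer backup a-c-e-d.
    controller = ToyController(
        [("a", "b"), ("b", "d"), ("a", "c"), ("c", "e"), ("e", "d")]
    )
    print(controller.path("a", "d"))
    controller.link_down("b", "d")
    print(controller.path("a", "d"))
```

In this toy network, traffic from a to d takes the short route through b until that link is reported down, then shifts to the backup through c and e without anyone touching a device by hand.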
We increasingly use mathematical or algorithmic verification of configuration templates and data flows in models of the network to test its function. We model the network in software to predict weak points.
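As a rough illustration of what modelling the network to predict weak points can mean, the Python sketch below treats the network as a graph and asks a simple question: is there any single device whose loss would cut the rest of the network in two? The topology and node names are made up, and the brute-force check (remove each node in turn and test connectivity) is chosen for clarity, not because it is what production verification tools do.

```python
from collections import deque

# Hypothetical topology: node names and links are invented for illustration.
LINKS = [
    ("core1", "core2"),
    ("core1", "agg1"),
    ("core2", "agg1"),
    ("agg1", "edge1"),
    ("agg1", "edge2"),
]


def build_graph(links):
    # Adjacency sets for an undirected graph.
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def is_connected(graph, removed=None):
    # Breadth-first search over every node except `removed`.
    nodes = [n for n in graph if n != removed]
    if not nodes:
        return True
    seen = {nodes[0]}
    queue = deque([nodes[0]])
    while queue:
        current = queue.popleft()
        for neighbour in graph[current]:
            if neighbour != removed and neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return len(seen) == len(nodes)


def single_points_of_failure(graph):
    # A node is a weak point if removing it disconnects the remaining nodes.
    return sorted(n for n in graph if not is_connected(graph, removed=n))


if __name__ == "__main__":
    print(single_points_of_failure(build_graph(LINKS)))
```

On this made-up topology the check flags agg1: the two core nodes back each other up, but both edge nodes hang off a single aggregation device. Finding that in a model is a lot cheaper than finding it during an outage.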
The common philosophies tend to include an increasing reliance on software to manage the network – to programmatically control where we previously architected – and the reliance on algorithms and mathematical techniques to inform what the nature of that control and instruction should look like.
A big shift
It’s a big shift in how we think and how we build – possibly the biggest since the rise of vendor behemoths like Cisco in the late 1990s fueled the first internet boom.
As the network grows in importance, so do its scale and complexity and our reliance on it, and the risks of failure or breach become increasingly consequential. Through software and algorithms, we will hopefully enable the next big boom.