Vodafone has taken huge numbers of people offline, but this time it wasn’t deliberate. According to a post on their forums (intermittently available, I guess due to load), they experienced a break-in at a data centre which caused some damage and has knocked out large numbers of users – Twitter and blogs are suggesting it’s mostly west of London.
I had always assumed, watching the Egypt troubles, that you’d need to shut down several locations to cause real damage, as everything would be run at least in pairs hosted in diverse locations. No matter how careful you are with technical measures, there’s a limit to how much you can protect a single location against fire, flood or someone doing something nasty with a truck full of fertiliser.
It’s perhaps ironic that at the LINX meeting last week there was a discussion on how well an “internet kill switch” would work in the UK. It was noted that the Egyptian one wasn’t 100% effective, as people still managed to get traffic out, and that even if you took whole exchange points in the UK offline, we’ve shown we can route round the problems quite effectively. We may be reasonably confident that it’s hard to take the Internet down, but perhaps it’s rather easier to shut down the mobile and fixed-line communication networks than I’d realised.
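To make the “route round the problems” point concrete, here’s a minimal sketch in Python – the topology is entirely made up for illustration – of why a meshed network survives losing an exchange point: remove a node from the graph and see which sites can still reach each other.

```python
from collections import deque

# Hypothetical topology: most sites peer in two or more places,
# but "amsterdam" is single-homed via "london".
links = {
    "london": {"manchester", "bristol", "amsterdam"},
    "manchester": {"london", "edinburgh"},
    "bristol": {"london", "edinburgh"},
    "edinburgh": {"manchester", "bristol"},
    "amsterdam": {"london"},
}

def reachable(graph, start, avoid=frozenset()):
    """Breadth-first search from start, treating nodes in avoid as failed."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for peer in graph[node]:
            if peer not in seen and peer not in avoid:
                seen.add(peer)
                queue.append(peer)
    return seen

# Knock out the london site: the multi-homed sites still see each other...
print(sorted(reachable(links, "manchester", avoid={"london"})))
# ['bristol', 'edinburgh', 'manchester'] – single-homed amsterdam is gone.
```

The multi-homed sites route round the failure; anything hanging off a single location drops with it, which is presumably what has happened to the affected Vodafone users.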
Situations like this can be nasty – one hopes that Vodafone has enough spare capacity in their network to simply reroute traffic elsewhere, depending on what was stolen/damaged. Some of it may be quite specialist/expensive kit, and replacements could take days or weeks to source from abroad. At the very least, if the data centres have redundancy within them, they can “borrow” half the equipment from another location and relocate it to the affected data centre! (Prior to 2001, we used to talk about how to ensure redundancy in the event of hypothetical aircraft hitting data centres as a “worst case” scenario. We don’t use that example any more.)
Of course, as well as the embarrassing technical questions, there are a few embarrassing questions for the security staff too. How did intruders get that far into such a critical data centre and manage to do damage before they were stopped? Were they even stopped, or did they get away with their booty?
Multi-routing requires a level of equipment and pathway redundancy that is simply too “inefficient” for profitable/parasitical multinationals. A one-day-per-annum outage is a lot cheaper than kit sitting idle 364 days a year.
And if you think that phones and internet are bad, don’t look at the power utilities.
I know of very few medium-to-large internet service providers that don’t have some level of redundancy. (Including ourselves!) Multiple fibres between locations are a minimum requirement, as a fibre fault could take you offline for days – and fibre faults are far from uncommon.
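For what it’s worth, here’s a rough sketch of the sort of planning check that implies – site names and links are hypothetical – which simulates every single fibre cut and confirms all sites can still reach one another. (A real check would model capacity and shared ducts as well as bare connectivity.)

```python
# Hypothetical fibre runs between four sites.
fibres = [("bracknell", "newbury"), ("newbury", "reading"),
          ("reading", "bracknell"), ("reading", "slough")]

def survives_any_single_cut(links):
    """Return (True, None) if no single fibre cut partitions the network,
    otherwise (False, offending_link)."""
    nodes = {site for link in links for site in link}
    for cut in links:
        remaining = [link for link in links if link != cut]
        # Flood-fill from an arbitrary site over the surviving fibres.
        seen = {next(iter(nodes))}
        changed = True
        while changed:
            changed = False
            for a, b in remaining:
                if a in seen and b not in seen:
                    seen.add(b)
                    changed = True
                elif b in seen and a not in seen:
                    seen.add(a)
                    changed = True
        if seen != nodes:
            return False, cut  # this fibre is a single point of failure
    return True, None

# slough hangs off a single fibre, so the check flags it.
print(survives_any_single_cut(fibres))  # (False, ('reading', 'slough'))
```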