Massive outage shows the vulnerability of the global tech ecosystem

The way in which a minor and routine software update paralysed IT systems globally on Friday has highlighted the vulnerability of a global economy increasingly reliant on a complex web of interconnected systems managed by a small set of dominant companies.

It has also shown that some of these companies have become complacent and somewhat slapdash in their processes, even as their customers have become completely dependent on them for the stability of their own systems and businesses.

A Woolworths checkout in Sydney shows a dead screen during the CrowdStrike outage. Credit: Dion Georgopolous

The global outage, sparked by CrowdStrike’s bungled software update, caused chaos. Air travel was disrupted, hospital systems were frozen, payments systems, banks and other financial intermediaries were hit, as were retailers, media and logistics companies.

With individual computers needing to be manually rebooted and the offending files deleted by someone with administrative privileges, cleaning up the mess CrowdStrike has generated will take time, considerable effort, and expense.

Then, there will be post-mortems within government, the cybersecurity community and individual businesses about the way in which a buggy piece of software was released and created so much havoc and how similar episodes might be prevented or responded to in future.

CrowdStrike, ironically, sells cybersecurity products to protect its customers from cyberattacks by hackers. Its previously highly regarded software identifies and neutralises them by using a blend of traditional approaches and, increasingly, artificial intelligence. It is second only to Microsoft in the global market for enterprise security software, with 29,000 customers and a market share of about 18 per cent.

It is telling that the global outage only affected IT equipment running Windows, the world’s dominant operating system. Apple’s products were unaffected.

That’s because Apple runs a closed, or “walled garden” system, denying software developers access to the core of its technology. It’s also far more focused on individual products than on enterprise-wide systems.

Microsoft operates an “open” operating system, allowing developers access to the core or “kernel” of its system under a competition policy agreement it reached with the European Commission in 2009 that gives security software providers the same level of access to Windows as Microsoft itself has.

That, and Windows’ dominance, may explain why Microsoft has been subjected to a series of cyber hacks in recent years. These hacks forced Microsoft to promise to overhaul its system’s security. Microsoft has said it will use artificial intelligence and automation to make its software more secure.

Melbourne Airport passengers affected by the global outage. Credit: Getty Images

Part of Microsoft’s challenge is the complexity of its business, which offers its products (including its market-leading cybersecurity products) via the cloud to companies with their own servers and via patches for legacy systems.

That, and the fact that computers had to be online to receive the faulty update, explains why different businesses were affected differently, and why even individual computers and other pieces of technology within those businesses responded differently.

What happened on Friday wasn’t, thankfully, a cyberattack but a mistake made by a developer with privileged access to the heart of Microsoft’s operating system, a level of access Microsoft might normally reconsider, although the legal implications – and CrowdStrike’s need for that level of access to protect its customers and its own anti-virus software – might complicate any effort to reduce that particular vulnerability.

CrowdStrike, which has grown rapidly and aggressively, might also need to examine its own processes and do significantly more stress-testing of the updates it sends routinely to its customers. Enterprise customers might need to think more deeply about whether writing increasingly large cheques to effectively outsource the protection of their own networks is sufficient.

In the interconnected web of systems and software on which the modern global economy relies, with its supply chains, just-in-time processes and real-time payments infrastructure, the stability and security of the relatively new digital architecture is taken for granted, until it isn’t.

Usually, as we’ve seen here with the Medibank and Optus cyber hacks, it is criminal activity that exposes the flaws in that architecture. The CrowdStrike episode is chilling because it highlights how a single flawed software update from a trusted source – one of a multitude that occur routinely – can cause large parts of the global system to fail.

The global dominance of the Windows operating system and the dominance of the three major cloud providers – Microsoft, Amazon and Google’s parent, Alphabet – mean that any mistake they make or distribute will have global ramifications.

Competition regulators may need to examine that dominance and the risks to competition and security it represents.

It might also be that companies need to consider reducing their reliance on single providers and investing more in backup systems so that they can continue to operate if the “Blue Screens of Death” ever reappear within their networks. Perhaps some thought will need to be given to old-school fallbacks that don’t involve IT systems.

The pandemic caused companies to rethink and redesign their physical supply chains, re-shoring or “near-shoring” critical elements. CrowdStrike’s software bug might, indeed should, force a similar re-evaluation of corporate and government systems’ vulnerabilities.

Artificial intelligence is seen as a potential aid to cybersecurity, improving systems’ ability to identify and respond immediately to cyber threats, even as some of those involved in developing AI products warn that it could represent a threat to humankind.

Friday’s global outage is a reminder of how dependent the world has become on increasingly complex and increasingly interconnected technologies, with data flowing through quite concentrated choke points including, increasingly, the cloud and AI providers.

Those represent potential points of global failure, whether generated by sloppy coding or something more malicious. AI might help strengthen the protections against such failures but could just as easily add new vulnerabilities.

The global technology ecosystem is so large and complex and vulnerable to human error or unlawful intent that it is inconceivable that it could ever be made completely secure.

It is, however, incumbent on the big tech companies on which the system rests to make it as safe and resilient as is practicable and to prioritise that objective over speed to market and profit. If they can’t, it is inevitable that governments will intervene to regulate their operations more closely.

CrowdStrike is now likely to be hit by a deluge of lawsuits and the loss of significant chunks of its customer base. Microsoft was already under siege from customers and governments for the previous breaches of its security. There are obvious commercial rationales for Microsoft, Amazon and Google, and the host of developers who work with them, to do whatever they can to avoid a repeat of what happened on Friday.
