Avoidable Software Failures Cost Trillions


Talking to Robert N. Charette can be quite depressing. Charette, who has written about software outages for this magazine for 20 years, is a renowned risk analyst and systems expert who, over his 50-year career, has seen more than his share of delusional thinking among IT professionals, government officials and business executives, before, during and after massive software outages.
In “Why Software Fails,” a seminal 2005 IEEE Spectrum article documenting the causes of large-scale software failures, Charette noted: “The greatest tragedy is that software failures are mostly predictable and preventable. Unfortunately, most organizations do not view failure prevention as a pressing issue, even though this view risks harming the organization and perhaps even destroying it. Understanding why this attitude persists is not just an academic exercise; it has enormous implications for business and society.”
Two decades and several trillion wasted dollars later, he finds that people are making the same mistakes. They claim that their project is unique and that the lessons of the past therefore do not apply. They underestimate the complexity. Managers start out with unrealistic budgets and deadlines. Testing is inadequate or skipped entirely. Supplier promises that are too good to be true are taken at face value. Newer development approaches such as DevOps or AI copilots are implemented without the training or organizational change needed to get the most out of them.
Worse yet, the enormous impacts of these missteps on end users are not fully considered. For example, when the Canadian government’s Phoenix paycheck system initially failed, developers glossed over the prolonged financial and emotional distress inflicted on tens of thousands of employees receiving erroneous paychecks; the problems persist today, nine years later. Perhaps that’s because, as Charette told me recently, IT project managers have no professional licensing requirements and are rarely, if ever, held legally responsible for software debacles.
Although medical devices may seem far removed from giant IT projects, they have a few things in common. As Special Projects Editor Stephen Cass found in this month’s The Data, the U.S. Food and Drug Administration recalls an average of 20 medical devices per month due to software problems.
Much like IT projects, medical devices face fundamental challenges posed by software complexity, which means that testing, while rigorous and regulated in the medical field, cannot cover every scenario or every line of code. The main difference between failed medical devices and failed IT projects is the legal liability that attaches to the former.
“When you create software for medical devices, you have to meet a lot more standards and you worry a lot more about the consequences of failure,” observes Charette. “Because when these things don’t work, there’s tort law, which means manufacturers are saddled with liability. It’s a lot harder to bring a suit and win when it’s an electronic payroll system.”
Whether a software outage is hyperlocal, such as when a medical device fails inside your body, or spans an entire region, such as when an airline’s ticketing system fails, organizations must investigate the root causes and apply those lessons to the next device or IT project if they hope to prevent history from repeating itself.
“Software is as important as electricity,” says Charette. “We would never accept a power outage every two days, but we certainly have no problem accepting an AWS outage or telcos or banks shutting down.” He heaves a deep sigh worthy of A.A. Milne’s Eeyore. “People shrug their shoulders.”




