key: cord-0779125-ox169jl8 authors: Clark-Ginsberg, Aaron; DeSmet, David; Rueda, Ismael A.; Hagen, Ryan; Hayduk, Brian title: Disaster risk creation and cascading disasters within large technological systems: COVID-19 and the 2021 Texas blackouts date: 2021-10-12 journal: Journal of Contingencies and Crisis Management DOI: 10.1111/1468-5973.12378 sha: 081ed5ee3a7b7f8326228d6376aa3c4abfa24af5 doc_id: 779125 cord_uid: ox169jl8
Given the right organisational attributes and sets of incentives, power grids, water systems and other large technological systems can function reliably, even as high-reliability networks. However, high reliability remains 'unlikely, demanding and at risk', as organisational sociologist Todd La Porte stated 25 years ago. What is much more common is risk creation (the creation or exacerbation of hazard, increase in exposure and propagation of vulnerability) that can interact and cascade across these systems when realized as a disaster. Here we describe the 2021 Texas blackouts during the COVID-19 pandemic through this lens of disaster risk creation and cascading disaster, showing how risk emerges and propagates across large technological systems. Given their ubiquity and criticality, we argue that more research is desperately needed to understand how to support high-reliability networks and that more efforts should be made to invest in their resilience.
development of expanding dependencies on uninterrupted service provision, but unreliable enough to create disaster when their service provision periodically fails. Reliability can mean different things in the context of large technological systems (Berthod et al., 2015, 2017); here we define it as the ability of a system to maintain a continuity of critical service provision, with a window of acceptable downtime, in the face of ongoing shocks and stresses.
For the Texas power grid, factors such as problematic maintenance practices, underinvestment in adequate winterisation, a lack of connections to the two other main interconnects of the U.S. power grid and challenges in managing the grid during the pandemic appear to have laid the roots of risk and contributed to the blackout. The blackout was realized as a disaster because millions of Texans depended on the grid to provide uninterrupted power for heating. The effects of such dependencies were magnified by social factors such as wealth inequalities that degrade the resilience capabilities of the poor, the aged and those with pre-existing health conditions. Here, we look at the intersecting disasters of the Texas blackout and the COVID-19 pandemic, their origins and their impacts, through the lens of large technological systems to argue for a renewed focus on the high-reliability management of networked systems as a disaster risk reduction strategy.
High-reliability networks (HRNs) are the heterogeneous, interorganisational networks that can function reliably in situations of extreme stress (Berthod et al., 2017). HRNs are conceptual 'cousins' of the more established high-reliability organisations (HROs), organisations able to conduct complex, critical and often high-risk operations safely and continuously under conditions of extreme duress. In HRNs, unlike in HROs, reliability is not a property of a single set of organisations; it is contingent on the emergent outcomes of decisions made within and among the multiple organisations that structure the network: a reliable system made from unreliable parts (Berthod et al., 2017). Twenty-five years ago, La Porte (1996) described HROs as 'unlikely, demanding and at risk'.
Achieving high reliability requires, along with substantial capital investments, a cultural commitment within organisations to a preoccupation with detecting, avoiding and resolving failures; a reluctance to simplify interpretations of events and processes; 'heedful interrelating' that cultivates a sensitivity to operations; a capacity to adapt to unanticipated challenges, or to be resilient; and a deference to local expertise (Weick et al., 1999). While there is some research emerging on how high-reliability organisational features may translate to the production and maintenance of high reliability in networked systems or fields (Berthod et al., 2017, 2021; Roe & Schulman, 2016), there is not the same body of research and scientific focus on HRNs as there is on HROs. Given their ubiquity and criticality, we argue that more research is desperately needed to understand how to support HRNs and more efforts are needed to invest in their resilience. In what follows, we describe how the Texas blackouts demonstrate a lack of high reliability by focusing on (1) the organisational factors that led to the creation of the blackout and (2) how the blackout cascaded to impact other critical systems and resulted in crisis. In the conclusion, we argue that, given our dependence on large technological systems, significant research and effort must be brought to bear on imparting features of HRNs to critical infrastructure systems.
The ERCOT grid, which covers most of the state of Texas, is a heterogeneous, multiorganisational network comprising 147 different generation, transmission and distribution companies that collectively provide energy to their customers. ERCOT is responsible for ensuring power is dispatched to these customers, while the Public Utility
Several factors appear to have contributed to the failure to manage in the face of this weather event.
While some policymakers blame high reliance on renewable energy, arguing that a large part of the outage was wind turbines icing over, ERCOT needed only 3% of total electricity generation to come from wind turbines to maintain power (O'Shea, 2021). At the blackout's peak, total generation deficiency was 20%-45% over the course of the day, considerably above 3%. Rather than overdependence on renewables, a major contributing factor appears to be the assumption of 'normal' operating conditions rather than the 'abnormal' conditions experienced in February. Weather was one of these abnormal conditions. In late February, ERCOT and power providers expected to be powering down gas plants for routine repairs, since the demand for heating decreases as the temperature warms. Consequently, several coal and gas-fired power plants equivalent to about 20% of the total daily generation needs of Texas were taken offline for routine maintenance. ERCOT did try to respond to the emerging weather pattern by issuing a directive in the week preceding the cold weather calling for these plants to be put back online, but the call was a suggestion, not a mandate (O'Shea, 2021).
COVID-19 was another abnormal condition, potentially compromising maintenance schedules and staffing. In response to potential challenges to business continuity related to COVID-19, several electric grid policy bodies identified a series of interventions, like temperature checks for mutual assistance and the operation of control rooms, designed to reduce virus spread (Wailes et al., 2021). Field work essential to the maintenance and monitoring of the grid was partially to wholly paused, and significant consideration had to be given to resuming operations as normally as possible while maintaining occupancy constraints.
Workplace dynamics may also have suffered, as office dynamics were interrupted by social distancing and employees faced the additional burden of verification of contaminated workplaces (Wailes et al., 2021).
Failures to recognize and remediate short-term abnormalities were coupled with insufficient investment in infrastructure reliability over the long term. One significant underinvestment appears to be in weatherization against winter storms. The ERCOT energy grid was not appropriately winterized. Without winterization, the natural gas supply chain was vulnerable to extremely cold conditions, and both turbines and coal plants iced over and were incapacitated. Additionally, critical redundancies and alternatives were not incorporated into the grid. For instance, a common practice across the Eastern U.S. is the use of 'dual-fuel generators', generators built with the ability to switch to distillate or fuel oil if a fuel source were compromised due to snowstorms and other forms of extreme weather (Hibbard et al., 2017). Such a practice was not implemented in ERCOT's jurisdiction.
Underlying these issues is the unique structure of ERCOT's energy markets, which appears to have disincentivized many common reliability practices. For instance, the lack of dual-fuel generators appears related to ERCOT's energy market structure since, unlike in the Eastern U.S., where regulations mandate including dual-fuel generators, Texas generation companies can decide whether to include dual-fuel generators based on factors such as potential market returns. Furthermore, in ERCOT, generators receive direct compensation only for the energy that they sell, not for maintaining extra reserve capacity that might be used in times of crisis and facilitate reliability. Energy market structures also influenced the lead-up to the cold snap.
As the weather changed, energy speculators drove the price of energy up tremendously, increasing gas prices more than 20-fold, from $7/mmBTU to $150/mmBTU; this speculative 'disaster capitalism' (Klein, 2007) made gas extremely difficult to procure when producers needed it most (O'Shea, 2021).
Another key feature, the separation of ERCOT from the two other major grids, compounded the creation of disaster risk. Separation has two ramifications. First, as an 'island unto itself', the ERCOT grid cannot rely on the energy that a broader multistate network of generation and transmission companies might be able to provide. While such disconnect means ERCOT avoids larger cascading blackouts that might occur across multiple states, the larger network provides redundancy that makes large failures less likely than localized (for example, state-level) blackouts. Second, by not crossing state lines and staying independent of the rest of the United States, the ERCOT grid remains outside the jurisdiction of the Federal Energy Regulatory Commission (FERC), the federal regulatory body responsible for regulating reliability across states. Established in response to major blackouts in the 1960s, with its standards mandatory since 2003, FERC's regulations are a powerful force in ensuring reliability (Clark-Ginsberg & Slayton, 2018). Escaping FERC means escaping the costs associated with complying with these regulations, but also forgoing the reliability benefits that such compliance can entail.
The fallout from the blackout compromised other critical systems. As in other disasters (Mitsova et al., 2018), the well-off were able to escape the worst impacts of the blackout, demonstrating their resilience in the process. For instance, Texas Senator Ted Cruz briefly flew himself and his family to Cancun before the senator's departure was reported in the press and he returned home.
But for the less-fortunate residents of Texas, the event resulted in stories of unimaginable tragedy, what can only be described as a catastrophic, cascading disaster.
First are the potential financial impacts of the blackouts on families and households. Many Texans were charged exorbitant amounts for electricity and heat, in some cases tens of thousands of dollars (Del Rio et al., 2021), which was itself a result of ERCOT's market design. To the extent that these bills are not struck down in court or eliminated by ERCOT, they will inflict significant and lasting financial damage on consumers in the state, which will further degrade their resilience against future disasters.
Second are the physical and health impacts of the blackout. Left without power and exposed to extreme cold, electricity consumers throughout the state attempted to turn on electric heaters and found that they did not have power. In the subfreezing temperatures, Texans were fighting to survive without any proper source of electric heating, let alone natural gas service. Many turned to makeshift heating sources to fight the cold. Some who failed to secure these sources died from hypothermia, such as an 11-year-old who died while trying to keep his 3-year-old brother warm (Madani, 2021). Many of the makeshift sources that others found were risky in their own right. Fire was one hazard; in Sugar Land, Texas, a family of four died in a house fire after using their fireplace to try and stay warm (Bellware, 2021; Treisman, 2021).
The blackout also led to a breakdown in other critical lifelines, including water and hospital services. The water system depends on the electric grid to function, including to treat and pump potable drinking water and to process wastewater. The blackout compromised this system and left 7.9 million Texans without access to clean water for up to a week (Reuters, 2021).
Without power, many hospitals were also not able to maintain essential functions and were forced to shut down completely or operate at extremely limited capacity.
These breakdowns created conditions shaping another disaster, the COVID-19 pandemic. Shelters were set up to house families without power and provide them with emergency food, water and heat (Menchaca, 2021). Others without power stayed with friends and family. The storm's impacts also significantly set back vaccination efforts across the state, delaying a total of 400,000 vaccine doses (Texas Health and Human Services, 2021). As necessary as these choices were, the methods placed friends and families together (potential nexuses for the spread of the disease) and prevented needed vaccine delivery.
HRNs are networks that can function reliably under conditions of extreme stress. The Texas electric grid is an example of a networked structure that failed under stress, resulting in catastrophe for many families and individuals. Instead of being able to manage and quickly react to prevent a hazardous event from spinning into a disaster, key systems broke down and collapsed. A disaster emerged that cascaded across other systems, visiting the greatest harm on those with the fewest resources.
The Texas blackout is one of a string of recent failures of networked structures that includes the COVID-19 pandemic. COVID-19 evolved into a pandemic precisely because the highly networked nature of global travel, infrastructure and social and technological systems allowed rapid propagation of a virus, with the system designed to work relatively well for steady-state or normal operating environments, not an unusual event. The costs of this networked failure are even more severe, by orders of magnitude, than those of the Texas blackout, with upwards of 220 million cases, 4.6 million deaths and billions in economic damages at the time of writing.
If other large-scale catastrophes, such as Hurricane Katrina, the Deepwater Horizon oil spill and the 2010 Haiti earthquake, are any guide, the impacts of these two major networked disasters, the Texas blackouts and the ongoing COVID-19 pandemic, will continue to unfold over not just months or even years, but decades and longer. Communities upended by disaster might be irreversibly changed for the worse, economies degraded and the physical and mental trauma of the disaster passed down to the next generations. The breakdown of the Texas grid that we describe above is the beginning of the disaster, not its end.
But while the instances above are examples of failures in reliability, many networks that we depend on may also be operating reliably as HRNs. Since there are so many interconnected infrastructures, and since infrastructures are so ubiquitous, we can expect on sheer numbers alone that a number of HRNs are already operating reliably and preventing cascades.
It is clear that investment in mitigation to build networked reliability for these large systems is needed. As former Deputy Administrator for Resilience for the US Federal Emergency Management Agency, Daniel Kaniewski, states, 'every type of infrastructure has vulnerabilities. What we really need to do is invest in infrastructure and… make sure we're taking appropriate preparedness' (Yahoo Finance, 2021). What is less clear is exactly what that investment should look like. In part, we need better understanding of how HRNs function. To be sure, we have some basic understanding of HRNs; for instance, we know that the people and organisational processes that manage these infrastructures play critical roles in ensuring reliability, and should be supported as part of infrastructure investment efforts focused on facilitating reliability (Roe & Schulman, 2016). But core questions remain unanswered.
The Texas case indicates the critical role of broader social processes in shaping the cascading impacts of the disaster and managing networked failure; people turn to their own 'reliability networks', rely on their friends and family for support, adapt by using different forms of heat, or use their economic and political resources to relocate. These dimensions are well articulated elsewhere (Oliver-Smith et al., 2016; Wisner et al., 2004), but have not yet been incorporated into the study of HRNs.
If the goal is to develop a stream of research that facilitates the development of HRNs, we must focus not only on how HRNs function but also on how the adoption and perpetuation of HRN principles and practices can be incentivized. In Texas, the deregulated market-based energy system does not currently provide the right set of incentives for the grid to function as an HRN. It is easy to see how reverting to some form of regulation could be a tempting response to these market failures, and indeed many examples can be found documenting how private pursuits of profit can disincentivize and undermine societal risk management (Dunn-Cavelty & Suter, 2009; Ellis, 2020; Perrow, 2015). But in complex and interconnected systems, regulatory approaches can themselves create perverse incentives, which, in extreme cases, can undermine the very attempts at reducing risk they seek to instill (Clark-Ginsberg & Slayton, 2018). Such discrepancies indicate that better specification of the conditions and scenarios for structuring HRNs is necessary. Developing highly reliable networks therefore requires, among other things, better understanding of how practices and incentives are to be crafted to make reliability out of unreliable parts.
Meeting the challenge will be no small feat. The large size of these networks, their massive complexity and the multitude of stakeholders involved in their management mean that HRNs cannot be 'solved' by the work of a single discipline or sector.
Instead, collaboration is needed for understanding these networks, both between disciplines in the form of interdisciplinary work involving social scientists and technologists, and between scientists and key stakeholders. Policymakers and communities should be involved as well, as they both depend on and shape the function of these systems. The research agenda on HRNs should be guided by knowledge gained through research on existing HROs and new forms of high-reliability management, if only to examine near-misses in highly reliable systems and to understand and learn from the conditions under which small failures occur and how they are prevented from lapsing into catastrophe. Because complex systems in states of failure often behave in ways incomparable to systems in stable operating states, there is only so much that can be learned by studying cases of failure and recovery. Future collaborations should be directed at understanding the conditions under which HRNs successfully operate today.
The Texas winter storm and power outages killed hundreds more people than the state says
Three generations of Texans were trying to stay warm in the blackout. Then a deadly fire erupted
From high-reliability organizations to high-reliability networks: The dynamics of network governance in the face of emergency
Managing resource transposition in the face of extreme events: Fieldwork at two public networks in Germany and the US
Some characteristics of high-reliability networks
Regulating risks within complex sociotechnical systems: Evidence from critical infrastructure cybersecurity standards
His lights stayed on during Texas' storm. Now he owes $16,752. The New York Times
Texas was minutes away from monthslong power outages, officials say. The Texas Tribune
Public-private partnerships are no silver bullet: An expanded governance model for critical infrastructure protection
Letters, power lines, and other dangerous things: The politics of infrastructure security
The butterfly defect: How globalization creates systemic risks, and what to do about it
Technology as a geological phenomenon: Implications for human well-being
Electricity markets, reliability and the evolving U.S. power system
The shock doctrine: The rise of disaster capitalism
High reliability organizations: Unlikely, demanding and at risk
Mother of 11-year-old Texas boy who died during power outage sues ERCOT
The development of large technical systems
Warming centers in Texas: How to find them, get help and help others. The Texas Tribune
Socioeconomic vulnerability and electric power restoration timelines in Florida: the case of Hurricane Irma
Breaking down the Texas winter blackouts: what went wrong? Wood Mackenzie
Forensic investigations of disasters (FORIN): A conceptual framework and guide to research
A tale of two freezes: How the Texas power grid stayed on in the 1989 cold snap
Cracks in the 'Regulatory State'
Over 7.9 million Texans still facing disrupted water supplies
Identifying, understanding, and analyzing critical infrastructure interdependencies
Reliability and risk: The challenge of managing interconnected infrastructures
Texas allocated nearly 600,000 first doses of COVID-19 vaccine for next week
A Disaster Within A Disaster
Assessing and mitigating the novel coronavirus (COVID-19) a resource guide
Organizing for high reliability: Processes of collective mindfulness
At risk: Natural hazards, people's vulnerability and disasters
How businesses can financially prepare for future natural disasters
The authors would also like to thank Gary Cecchine and Jay Balagna for their support, as well as RAND's Homeland Security Operational Analysis Center and the RAND Gulf States Policy Institute.
Clark-Ginsberg, DeSmet and Rueda were funded by the John and Carol Cazier Initiative for Energy and Environmental Sustainability. There is no data available to share.