Threats Happen; Consequences Don’t Have to

In May 2021, newspapers across the country ran pictures of long lines at gas stations and stories of fuel shortages up and down the East Coast. The cause was a ransomware attack on Colonial Pipeline, but the attack did not hit the company’s operational technology (OT); it hit its billing systems. The company nonetheless shut down its entire OT because it was unsure how far the malware had spread. For six days, one of the biggest fuel suppliers to millions of people in the United States simply stopped functioning.

Imagine instead that all similarly situated operators had a robust plan in place to respond to an attack: a plan that assumed a breach would occur and rehearsed a quick recovery with minimal disruption to critical functions. A company focused on resilience would not only reduce its vulnerabilities to cyber threats but also ensure it could continue essential operations if an attack succeeded in disrupting computer-enabled systems. The headlines would have been very different, or more likely nonexistent, and the disruptive consequences would have been avoided.

Most entities today, including the federal government, look at risk through the lens of threats and vulnerabilities. When a new threat is identified, the public and private sectors marshal their collective efforts to shore up defenses, creating an ever-growing maze of protective measures. Committed public servants evaluate a slate of potential threats to critical infrastructure, from physical attacks like shootings to cyberattacks like ransomware. Owners and operators of critical infrastructure apply for grants to build a higher wall or purchase anti-malware software, try to keep up with a never-ending stream of patches to close identified vulnerabilities, and then turn to the next round of threats. Current funding supports this approach, but it assumes analysts can see around every corner, and it is ultimately brittle.

Unfortunately, this approach cannot keep up with the increasing pace and sophistication of threats. It also ignores the tremendous technical debt that the government in particular has accumulated by piling patches onto outdated “legacy” systems. A strategy premised solely on chasing discrete threats and plugging vulnerabilities can never fully assure the provision of the nation’s essential functions. It will only dissipate collective energy as defenders attempt to hold every inch of ground with increasingly dispersed and exhausted public and private resources.

Just as risks have evolved, cybersecurity approaches must evolve to match, starting with the adoption of a consequence-based approach to risk management. Entities should continue building effective defenses against recognized threats, but they should also develop resilience within the infrastructure most vital to the nation’s security and prosperity. Such an approach remains vigilant in defending critical infrastructure but assumes that breaches will occur. U.S. infrastructure needs to be capable of absorbing these inevitable shocks and of sustaining or quickly restoring operations to avoid cascading effects.

For national critical functions, resilience means that even if a threat actor suddenly exploits a vulnerability, the consequences to operations are minimal and the system is set up for quick recovery. It means rehearsing for the crash: stress-testing recovery procedures, drilling downtime procedures, and minimizing the time to recovery. Reducing the consequences of a cyberattack not only benefits those who depend on the functions that would otherwise be disrupted; it also denies the adversary the impact it hoped to achieve. A bad actor can be deterred by denying benefits, not just by imposing costs.

To get to that resilient future, the U.S. government should make two shifts. First, it should gradually move resources away from reactive emergency recovery grants; any emergency funds left unused at the end of the year should support ongoing work to identify cyber-dependent national critical functions and to fund continuity planning that will mitigate consequences from a range of threats. Second, it should think about those functions as products of intertwined systems, rather than focusing on specific assets stovepiped in particular entities. For example, in 2020, an estimated 18,000 customers installed a compromised update to the SolarWinds Orion platform, a tool that lets an IT department monitor the health of all its networks on a single pane of glass. Only a subset of those customers were actually exploited, but victims ranged across industry and government, including Mandiant, Microsoft, the Department of the Treasury, and the Department of Defense. This attack, known as Sunburst, demonstrated how a vulnerability in one widely trusted piece of software can have wide-ranging effects. Similarly, in 2015, the full scale of the U.S. Office of Personnel Management breach was not understood until responders realized that much of the sensitive data about individuals holding security clearances was actually stored in a database managed by the Department of the Interior, and that private contractors compiling that data had also been breached. The private sector is not immune to these linkages: electricity provision depends on water, and vice versa. These interdependencies need to be fully understood and incorporated into government-wide continuity planning.

For the first shift, planners should begin their risk assessment by identifying the consequences that must be avoided: those with the greatest potential impact on the business or on mission-essential functions. Working backward, planners can then assess the likelihood of a successful incident producing those consequences, looking first at vulnerabilities that could be exploited and then scanning the horizon for threats that could exploit them. Risks can then be prioritized primarily by consequence, with that ranking adjusted only to the extent that planners are confident in their likelihood estimates. The key to successful continuity planning is involving the entire enterprise in brainstorming ways to mitigate those consequences. Often the solution will be found in the analog past: hand cranks, mechanical dials, and paper. In the Colonial Pipeline case, the total shutdown of computer-run OT systems disrupted fuel supplies along the East Coast, affecting everything from passenger cars to air travel. Having an analog way to run those operations, even at reduced capacity, would have considerably mitigated the consequences of the attack. Figuring that out in the heat of an attack is nearly impossible; resources should be devoted ahead of time to understanding the full consequences of a cyber incident and implementing measures to reduce the impact when prevention fails.

For the second shift, federal, state, local, tribal, and territorial governments should evaluate the entire system supporting a national critical function, rather than focusing on individual physical assets. For example, providing safe, clean water to a community, a function regulated primarily at the local level, requires a pumping station and healthy pipes. But what are its power requirements? Which operations depend entirely on internet-connected industrial control systems? What public transportation systems bring essential workers to the facility? What chemicals does the system need to function properly? Where do they come from, and are those ports, railyards, or trucks single points of failure? Each of these assets is probably controlled by a separate entity. Although a national critical function emerges from many entities each managing their own assets, adversaries see it as a set of linked steps that together form a system, one that can be disrupted. Government entities seeking to help protect such a function need to understand its upstream and downstream dependencies, not just the particular facility providing the water.

The federal government already incorporates resilience into its cyber strategies, as evidenced by the focus on zero-trust architecture, which assumes an adversary will gain access to a network and works to limit the impact on the system. The National Cybersecurity Strategy devotes a pillar to building resilience now and for the future, and the Infrastructure Investment and Jobs Act also includes provisions to build resilience. Still, resilience is not supported at the level necessary to ensure that national critical functions are sustained.

The Department of Homeland Security (DHS) is about to begin its quadrennial review process, a perfect time to reevaluate standard approaches to planning for resilience and recovery. Many resilience measures can be implemented relatively quickly and without great expense; others, such as building redundancy, will take time and money. DHS’s review should include a phased plan to shift resources from today’s heavy emphasis on threat- and vulnerability-based prevention toward consequence-based resilience. Phasing matters: pulling significant resources away from threat discovery and vulnerability management immediately would leave a potentially disastrous gap until resilience measures are in place.

Congress can help drive continuity planning for national critical functions by funding it. For example, Congress could appropriate one dollar for consequence identification and prevention for every dollar appropriated for emergency response grants. Recipients of emergency grants could also be asked to submit a resilience plan for minimizing the consequences of future incidents, and unspent emergency funds could automatically roll over into a resilience fund. Government entities such as the National Risk Management Center at the Cybersecurity and Infrastructure Security Agency (CISA) should be adequately funded to work with the private sector on mapping national critical functions, helping to prioritize resources and mitigate risk. Indeed, the administration should further empower CISA in its statutory role as the national coordinator for critical infrastructure security and resilience by defining that role more clearly in the updated Presidential Policy Directive 21 (PPD-21), which details how the United States protects and secures critical infrastructure. Critically, all levels of government should develop a common framework for consequence management, so that federal, state, local, tribal, and territorial governments understand what such an approach entails.

The federal government should also think across borders. Canada and Mexico are obvious partners given shared resources, but allies such as the United Kingdom are also grappling with building resilient systems. The war in Ukraine has demanded resilience in Europe, from Kyiv recovering repeatedly from concerted Russian cyberattacks to Europe-wide communications systems such as Viasat’s satellite network coming under attack. European countries also discovered how resilient they could be when Russia used fuel deliveries as leverage. The United States has much to learn from their recent painful experiences.

For critical infrastructure, no entity will ever be able to stop all threats. But a resilient system that recovers from a crisis quickly, with little to no consequences, is within reach if the approach to risk is reframed. Everyone wants to live in a world without disruptive headlines.

Emily Harding is the deputy director and senior fellow with the International Security Program at the Center for Strategic and International Studies (CSIS) in Washington, D.C. Suzanne Spaulding is the senior adviser for homeland security and director of the Defending Democratic Institutions project at CSIS.