The question I raise in this blog is: “Can we reduce risk by reducing consequence severity?” Dale Peterson touched upon this topic in his recent blog.
It is a topic I have struggled with for several years. Initially I thought “Yes, this is the most effective way to reduce risk”, but now, many risk assessments (processing thousands of loss scenarios) later, I have come to the conclusion that it is rarely possible.
If we visualize a risk matrix with likelihood on the horizontal axis and consequence severity on the vertical axis, we can in theory reduce risk either by reducing the likelihood or by reducing the consequence severity. But is this really possible in today’s petrochemical, refining, or oil and gas industry? Let’s investigate.
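To make the two levers concrete, here is a minimal Python sketch of such a matrix. The 5x5 scale, the category names, the multiplicative score and the band thresholds are all hypothetical choices made for illustration. They are not taken from any particular standard or company risk procedure.

```python
# Minimal risk-matrix sketch. ASSUMPTIONS (hypothetical, for illustration only):
# a 5x5 matrix, a multiplicative score, and made-up band thresholds.
LIKELIHOOD = ["rare", "unlikely", "possible", "likely", "almost certain"]
SEVERITY = ["negligible", "minor", "moderate", "major", "catastrophic"]


def risk_score(likelihood: int, severity: int) -> int:
    """Return a score from 1 to 25 for 0-based likelihood/severity indices."""
    if not 0 <= likelihood < len(LIKELIHOOD):
        raise ValueError(f"likelihood index out of range: {likelihood}")
    if not 0 <= severity < len(SEVERITY):
        raise ValueError(f"severity index out of range: {severity}")
    return (likelihood + 1) * (severity + 1)


def risk_level(score: int) -> str:
    """Map a score to a band; the thresholds 15 and 6 are hypothetical."""
    if score >= 15:
        return "high"
    if score >= 6:
        return "medium"
    return "low"


def assess(likelihood: int, severity: int) -> tuple[int, str]:
    """Score a single loss scenario and return (score, band)."""
    score = risk_score(likelihood, severity)
    return score, risk_level(score)


# Baseline: "possible" likelihood, "catastrophic" severity.
print(assess(2, 4))  # (15, 'high')
# Moving one step down either axis lowers the risk band.
print(assess(1, 4))  # (10, 'medium'): likelihood reduction
print(assess(2, 3))  # (12, 'medium'): consequence severity reduction
```

On paper either axis works equally well. The practical question is whether the severity axis can actually be moved for a real process installation.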
If we want to reduce security risk by reducing consequence severity, we need to reduce the loss that can occur when the production facility fails due to a cyber attack. I translate this as: we need to create an inherently safer design.
This is not a new topic. Trevor Kletz and Paul Amyotte addressed it in their book “Process Plants: A Handbook for Inherently Safer Design” (2nd edition, 2010). That book builds on even earlier work from the mid-eighties (“Cheaper, Safer Plants, or Wealth and Safety at Work: Notes on Inherently Safer and Simpler Plants”, Trevor Kletz, 1984). This shows that the drive to make plants inherently safer is a long-standing objective and a very mature discipline. So the question becomes: are there specific improvements to the installation that were not considered necessary from a process safety point of view, but should be made from an OT security point of view?
Let’s look at the options we have. If we want to reduce the risk posed by the cyber threat, we can approach this in several ways:
- Improve the technical security of the automation systems, all the usual stuff we’ve written a lot of books and blogs about – Likelihood reduction;
- Improve automation design, use less vulnerable communication protocols, use more cyber-resilient automation equipment – Likelihood reduction;
- Improve process design so that the threat actor has fewer options to cause damage. For example, do we need to connect all functions to a common network so we can operate them centrally, or is it possible to isolate some critical functions, making an orchestrated attack more difficult? – Likelihood / consequence reduction;
- Reduce the plant’s inventory of hazardous materials, so that if something goes wrong the damage is limited. This is called intensification or minimization – Consequence reduction;
- An alternative to intensification is attenuation: using a hazardous material under the least hazardous conditions. For example, storing liquefied ammonia as a refrigerated liquid at atmospheric pressure instead of under pressure at ambient temperature – Consequence reduction;
- The final option is substitution: selecting safer materials. For example, replacing a flammable refrigerant with a non-flammable one – Consequence reduction.
So, in theory, there are four options that reduce consequence severity. Over the past 30 years the industry has invested heavily in making plants safer. There are certainly still unsafe plants in the world, partly because of regional differences and partly because of a lack of regulation. But in the technologically advanced countries these inherently unsafe plants have almost entirely disappeared.
This is also an area where I, as an OT security risk analyst, have no influence. If I suggested in a cyber risk report that storing the ammonia as a refrigerated liquid would be better for security risk reduction, they would smile and ask me to mind my own business. And rightly so. These are business considerations, and the cyber threat is most likely far less dangerous than the daily safety threat.
Therefore the remaining option to reduce consequence severity seems to be improving the process design. But can we really find improvements here? To determine this, we have to look at where the biggest risk lies and what causes it.
Examples of process safety scenarios with the potential for severe damage involve pumps (loss of cooling, loss of flow), compressors, turbines, industrial furnaces / boilers (typically ignition scenarios), reactors (runaway reactions), tanks (overfilling), and the flare system. How does this damage occur? Typically by stopping equipment, opening or closing valves or bypasses, manipulating alarms, measurements or positioners, overfilling, causing loss of cooling, manipulating manifolds, etc.
It is a long list of possibilities, but they are primarily addressed by protecting the automation functions, which makes them a likelihood control. The process equipment that a cyber attack could affect is there for a reason. I have never encountered a situation where we identified a dangerous security hazard and concluded that the process design should be modified to fix it. In some cases the decision is taken not to connect specific process equipment to the common network, but this is basically also a likelihood control.
Another option is to implement what is called an Overrule Safety Control (OSC). This is a layer of safety instrumentation that cannot be turned off or overruled by anything or anybody. An uninterruptible emergency shutdown is triggered automatically when process conditions enter a highly accident-prone, life-safety-critical state. Examples are the presence of hydrogen in a submerged nuclear reactor containment, where the enclosure is mechanically opened to flood the containment with water, or the presence of methane on an oil drilling rig. However, this is typically a mechanical or fully isolated mechanism. As soon as it has an electronic or programmable component and is network connected, it can be hacked. So I also consider this solution a yes/no connection decision.
I don’t exclude the possibility that there are situations where we can manage consequence severity. However, apart from these yes/no connection questions, I haven’t encountered them in the past 10 years of analyzing OT cyber risk in the petrochemical, refining, and oil & gas industries. The issues we identified and addressed were always automation system issues, not process installation issues.
Therefore I think that consequence severity reduction, though the most effective option if it were possible, is not going to give us the solution. So we end up focusing on improving automation design and technical security, and on managing the exposure of the cyber vulnerabilities in these systems. Dale’s suggested alternative strategy does not seem feasible.
To summarize: in my opinion, focusing on managing consequence severity does not offer an effective new strategy for reducing cyber risk.
There is no relationship between my opinions and references to publications in this blog and the views of my employer in whatever capacity. This blog is written based on my personal opinion and knowledge built up over 43 years of work in this industry. Approximately half of that time I spent engineering these automation systems. The other half I spent implementing their networks and securing them, and, since 2012, conducting cyber security risk assessments for process installations.
Author: Sinclair Koelemij
OTcybersecurity web site