In my previous blog I briefly touched upon the subject of Consequence, the difference between Consequence and impact, and their relationship with functionality. This blog focuses specifically on what Consequence means for OT cyber security risk.
I mentioned that a cyber attack on an OT system consists, at its core, of the following elements:
A threat actor carrying out a threat action by exploiting a vulnerability resulting in a Consequence that causes an impact in the production process.
In reality this is of course a simplification. The threat actor normally has to carry out multiple threat actions to make progress toward the objective of the attack, as explained by Lockheed Martin’s cyber kill chain concept. And of course the defense team will normally not behave like a sitting duck: it implements countermeasures to shield the potential vulnerabilities in the automation system and builds barriers and safeguards to limit the severity of specific Consequences or, where possible, to remove the potential Consequence altogether. Apart from these preventive controls, the defense team establishes situational awareness in an attempt to catch the threat actor team as early as possible in the kill chain.
Sometimes it looks like there are simple recipes for both the threat actors and defenders of an industrial control system (ICS):
- The threat actor team uses the existing functionality for its attack (something we call hacking); the defense team counters this by reducing the functionality available to the attacker (hardening);
- The threat actor team adds new functionality to the existing system (we call this malware); the defense team white lists all executable code (application white listing);
- The threat actor team adds new functionality to the existing system using new components (as in a supply chain attack); the defense team asks for the supply chain to be white listed (e.g. the widely discussed US presidential order);
- Because a wise defense team doesn’t trust the perfection of its countermeasures and respects the threat actor team’s ingenuity and persistence, the defense team adds some situational awareness to the defense by sniffing around in event logs and traffic streams, attempting to catch the threat actors red-handed.
Unfortunately there is no such thing as perfection in cyber security, nor anything like zero security risk. New vulnerabilities pop up every day, and countermeasures aren’t perfect and can even become an instrument for the attack (e.g. the Flame attack in 2012, which created a man-in-the-middle between the Windows Update system distributing security patches and the systems of a bank in the Gulf Region). New tactics, techniques and procedures (TTP) are continuously developed and new attack scenarios designed; the threat landscape is continuously changing.
To become more pro-active the OT cyber security defense teams adopted risk as the driver, cyber security risk being equivalent to the formula Cyber Security Risk = Threat x Vulnerability x Consequence.
Threats and vulnerabilities are addressed by implementing the traditional countermeasures. This is a world of continuous change and vulnerability research: the world of network security.
In this world Consequence is often the forgotten variable to manage. It is an element far less dynamic than the world of threats and vulnerabilities, but it offers a more effective and reliable risk reduction than the Threat x Vulnerability part of the formula. So this time the blog is written to honor this essential element of OT cyber security risk, our heroine of the day: Consequence.
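To make the multiplicative nature of the formula concrete, here is a minimal Python sketch. The 0-to-1 scoring scale and the function name `cyber_security_risk` are illustrative assumptions of mine, not part of any standard. The only point is that lowering one factor lowers the product, and that removing the Consequence altogether takes the risk to zero regardless of threat and vulnerability.

```python
def cyber_security_risk(threat: float, vulnerability: float, consequence: float) -> float:
    """Risk = Threat x Vulnerability x Consequence.

    Each factor is scored on a hypothetical 0..1 scale (an assumption for this sketch).
    """
    factors = {"threat": threat, "vulnerability": vulnerability, "consequence": consequence}
    for name, value in factors.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return threat * vulnerability * consequence


# Same threat and vulnerability in all three cases; only Consequence is managed.
baseline = cyber_security_risk(0.5, 0.5, 1.0)             # no safeguards
with_safeguard = cyber_security_risk(0.5, 0.5, 0.5)       # severity halved by a barrier
consequence_removed = cyber_security_risk(0.5, 0.5, 0.0)  # Consequence designed out
```

In this toy example the safeguard halves the risk without touching the threat or vulnerability scores, which is the argument for giving Consequence more attention.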
Frequent readers of my blog know that I write long intros and tend to follow them up sketching the historic context of my subject. Same recipe in this blog.
When the industry was confronted with the cyber threat, after more and more asset owners changed over from automation systems based on proprietary technology to systems built on open technology, a gap in skills became apparent.
The service departments of the vendor organizations and the maintenance departments of the asset owners were not prepared for dealing with the new threat of cyber attacks. In the world of proprietary systems they were protected by the outside world’s lack of knowledge of the internal workings of these systems; there was less connectivity between systems, and fewer functions were available. And perhaps also because in those days it was far clearer who the adversaries were than it is today.
As a consequence, asset owners and vendors were looking for IT-trained engineers to expand their capabilities, and IT service providers attempted to extend their business scope. IT engineers had been growing their expertise since the mid-nineties; they had benefited from a learning curve that improved their techniques for preventing hacking and infection with malicious code.
So initially the focus in OT security was on network security: the hardening (attack surface reduction / increasing resilience) of the open platforms and network equipment. However, essential security settings in the functionality of the ICS, the applications running on these platforms, were not addressed or didn’t exist yet.
Most of the time the engineers were not even aware these settings existed. These security settings were set during the installation, sometimes as default settings, and were never considered again as long as there were no system changes. Additionally, the security engineers with a pure IT background had never configured these systems and were not familiar with their functionality. The assumption was that if the Microsoft and network platforms were secure, the system was secure.
OT cyber security became organized around network security. I compare it with the hypothalamus of the brain: the basic functions, sex and a steady heart rhythm, are implemented, but the wonders of the cerebral cortex are not yet activated. (This is my personal opinion, certainly not my employer’s 🙂, for those who mix up the personal opinion of a technical security guy with the vision of the employer he works for.) The security risk aspects of the automation system and process equipment were ignored: too difficult, and not important because so far never required. This changed after the first attack on cyber physical systems, the Stuxnet attack in 2010.
Network security focuses on the Threat x Vulnerability part of the risk formula; the Consequence part is always linked to the functional aspects of the system. Functional aspects are within the domain of the risk world.
Consequence requires us to look at an ICS as a set of functions, where the cyber attack causes a malicious functional deviation (the Consequence) which ultimately results in an impact on the physical part of the system. See my previous blog for an overview picture of some impact criteria.
What makes OT security different from IT security is this functional part. An IT security guy doesn’t look that much at functionality; IT security is primarily data driven. An OT security guy might have some focus on data (some data is actually confidential), but his main worry is the automation function: the sequence and timing of production activities. OT cyber security needs to consider the Consequence of a functional deviation, and needs to think about barriers / safeguards, or whatever name is used, that reduce the overall Consequence severity in order to lower risk.
That is why Sarah’s statement in last week’s post, “Don’t look at the system but look at the function”, is so relevant when looking at risk. For analyzing OT cyber security risk we need to activate the cerebral cortex part of our OT security brain. So much for the historical context of the heroine of this blog. Let’s discuss Consequence and the functions of an ICS in more technical detail.
For me, from an OT security perspective, there are three loss possibilities:
- Loss of Required Performance (LRP) – Defined as “The functions do not meet operational / design intent while in service.” Examples are program logic that has changed, ranges that were modified, valve travel rates that were modified, calibrations that are off, etc.
- Loss of Ability (LoA) – Defined as “The function stopped providing its intended operation.” Examples are loss of view, loss of control, loss of ability to communicate, loss of functional safety, etc.
- Loss of Confidentiality (LoC) – Defined as “Information or data in the system was exposed that should have been kept confidential.” Examples are loss of intellectual property, loss of access credential confidentiality, loss of privacy data, loss of production data.
Each of these three categories has subcategories to further refine and categorize Consequence (16 in total). Consequence is linked to functions, but functions in an OT environment are generally not provided by a single component or even by a single system.
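As a small illustration, the three loss categories and the examples listed above can be captured in a lookup table. This is only a sketch: the names `LossCategory`, `EXAMPLES`, `classify` and `examples_for` are my own, and the 16 subcategories are deliberately left out because they are not enumerated in this blog.

```python
from enum import Enum
from typing import Dict, List


class LossCategory(Enum):
    LRP = "Loss of Required Performance"
    LOA = "Loss of Ability"
    LOC = "Loss of Confidentiality"


# Example Consequences taken from the list above, mapped to their main category.
EXAMPLES: Dict[str, LossCategory] = {
    "program logic changed": LossCategory.LRP,
    "ranges modified": LossCategory.LRP,
    "valve travel rates modified": LossCategory.LRP,
    "calibrations off": LossCategory.LRP,
    "loss of view": LossCategory.LOA,
    "loss of control": LossCategory.LOA,
    "loss of ability to communicate": LossCategory.LOA,
    "loss of functional safety": LossCategory.LOA,
    "loss of intellectual property": LossCategory.LOC,
    "loss of access credential confidentiality": LossCategory.LOC,
    "loss of privacy data": LossCategory.LOC,
    "loss of production data": LossCategory.LOC,
}


def classify(example: str) -> LossCategory:
    """Return the main loss category for a known example (KeyError if unknown)."""
    return EXAMPLES[example.lower()]


def examples_for(category: LossCategory) -> List[str]:
    """Return all known examples for a category, sorted alphabetically."""
    return sorted(name for name, cat in EXAMPLES.items() if cat is category)
```

For instance, `classify("Loss of view")` returns `LossCategory.LOA`.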
If we consider the process safety function discussed in the TRISIS blog, we see that at minimum this function depends on a field sensor function/asset, an actuator function/asset, and a logic solver function/asset, together called the safety instrumented function (SIF). Additionally there is the HMI engineering function/asset, sometimes a separate HMI operations function/asset, and the channels between these assets.
Still it is a single system with different functions, such as a safety instrumented function (SIF), an alerting function (SIF alerts), a data exchange function, an engineering function, maintenance functions, operation functions, and some more functions at a detailed level such as diagnostics and sequence of event reporting.
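The idea that one system offers several functions, each depending on several assets, can be sketched as a simple data structure. The class and function names are hypothetical, and the asset assignments for the engineering and operations functions are my own illustrative assumption; only the sensor / logic solver / actuator dependency of the SIF comes from the text above.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class AutomationFunction:
    name: str
    assets: FrozenSet[str]  # assets the function depends on


# A single safety system, viewed as a set of functions (illustrative only).
SAFETY_SYSTEM: List[AutomationFunction] = [
    AutomationFunction(
        "safety instrumented function",
        frozenset({"field sensor", "logic solver", "actuator"}),
    ),
    AutomationFunction(
        "engineering function",
        frozenset({"HMI engineering station", "logic solver"}),  # assumed mapping
    ),
    AutomationFunction(
        "operations function",
        frozenset({"HMI operations station", "logic solver"}),  # assumed mapping
    ),
]


def functions_depending_on(asset: str, functions: List[AutomationFunction]) -> List[str]:
    """List the functions that would be affected if this asset were compromised."""
    return [f.name for f in functions if asset in f.assets]
```

Asking which functions depend on the logic solver returns all three, while the field sensor only affects the safety instrumented function. That is why the analysis has to start from the function and not from the box.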
Each of these functions could be targeted by the threat actor as part of the plan, some more likely than others. All these functions are very well documented. For risk we need to conduct what is called Consequence analysis, which determines failure modes. A proper understanding of which functional deviations are of interest to a threat actor requires not only the ability to recognize a possible Consequence but also the ability to distinguish how functions can fail. These failure modes are basically the 16 subcategories of the three main categories LRP, LoA and LoC.
The threat actor team will consider what functional deviations are required for the attack and how to achieve them; the defense team should consider what is possible and how to prevent it. If the defense team of the Natanz uranium enrichment plant had conducted Consequence analysis, they would have recognized the potential for mechanical stress on the bearings of the centrifuges as the result of functional deviations caused by the centrifuges’ variable frequency drives. Vibration monitoring would have picked up the change in vibration pattern caused by the small repetitive changes in rotation speed, and would almost certainly have raised an alert that something was wrong earlier than actually happened. The damage (Consequence severity / impact) would have been smaller.
If we jump back to the TRISIS attack, we can say that the threat actor’s planned functional deviation (Consequence) could have been the Loss of Ability to execute the safety instrumented function, leaving the automation function vulnerable if an unsafe situation were to occur.
Another scenario described in the TRISIS blog is the Loss of Required Performance, where the logic function is altered in a way that could cause a flow surge when the process shutdown logic for the compressor is triggered after the change to the logic has been made.
A third Consequence discussed was the Loss of Confidentiality that occurred when the program logic was captured in preparation of the attack.
Every ICS has many functions. I extended the diagram in the PRM blog with some typical functions implemented in a new greenfield plant.
Quite a collection of automation functions, and this is just for, say, a refinery or chemical plant. I leave it to the reader to look up the acronyms on the Internet, where you can also appreciate how many solutions and differences there are. The functions for the power grid, a pipeline, a pulp and paper mill, an oil field, or an offshore platform differ significantly. Today’s automation world is a little more diverse than SCADA or DCS. I hope the reader realizes that an ICS has many functions, different packages, different vendors, and many interactions and data exchanges. And this is just the automation side of it; for security risk analysis we also need to include the security and system management functions, because these induce risk as well. Some institutes call this ICS SCADA; I need to stay friendly, but it really is a bad term. The BPCS (Basic Process Control System, to give one acronym away for free) can be a DCS or a SCADA, but an ICS today is a whole lot more than DCS or SCADA. In the last decade functionality exploded, and I guess the promise of IIoT will grow the number of functions even further.
Each of the functions in the diagram has a fixed set of deviations (Consequences) that are of interest to the threat actor. These are combined with the threat actions that can cause the deviation (together the cyber security hazard), and then with the countermeasures and the safeguards / barriers that protect the vulnerabilities and limit the Consequence severity, so that cyber security risk can be estimated. The result is a repository of bow-ties for the various functions.
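A bow-tie repository of this kind could be sketched as follows. This is a minimal illustration under my own assumptions: the `BowTie` fields, the `build_repository` helper, and every entry in the example data (threat actions, countermeasures, barriers) are hypothetical and not taken from an actual repository.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class BowTie:
    function: str                     # the automation function under attack
    deviation: str                    # the Consequence at the centre of the bow-tie
    threat_actions: Tuple[str, ...]   # actions that can cause the deviation
    countermeasures: Tuple[str, ...]  # protect the vulnerabilities
    barriers: Tuple[str, ...]         # safeguards limiting Consequence severity


def build_repository(bow_ties: List[BowTie]) -> Dict[str, List[BowTie]]:
    """Group bow-ties by the function they apply to."""
    repository: Dict[str, List[BowTie]] = {}
    for bow_tie in bow_ties:
        repository.setdefault(bow_tie.function, []).append(bow_tie)
    return repository


# Illustrative entries only, loosely inspired by the TRISIS scenarios.
repository = build_repository([
    BowTie("safety instrumented function", "Loss of Ability",
           ("download modified logic",), ("application white listing",),
           ("independent mechanical protection",)),
    BowTie("safety instrumented function", "Loss of Required Performance",
           ("alter shutdown logic",), ("program mode key switch",), ()),
    BowTie("engineering function", "Loss of Confidentiality",
           ("capture program logic",), ("hardening",), ()),
])
```

Grouping by function means that the question "what can go wrong with this function?" becomes a simple lookup, for example `repository["safety instrumented function"]`.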
A cyber security HAZOP (risk assessment) is not a workshop in which a risk analyst threat models the ICS together with the asset owner. Given the many cyber security hazards there are, that would become a very superficial risk analysis with little value: costly in time and meager in results. A cyber security HAZOP is the confrontation between a repository of cyber security hazards and the installed ICS functionality, countermeasures, safeguards and barriers.
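That confrontation can be pictured as a matching exercise. The sketch below is a deliberately simplified assumption of how such a match might work: the `Hazard` structure, the `confront` function and all example data are hypothetical. A real assessment applies far more rules than "is at least one mitigation installed".

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class Hazard:
    function: str                # function the hazard applies to
    deviation: str               # the Consequence
    mitigations: FrozenSet[str]  # countermeasures, safeguards or barriers that address it


def confront(hazards: List[Hazard], installed: Dict[str, FrozenSet[str]]) -> List[Hazard]:
    """Return hazards that apply to an installed function but have no mitigation installed."""
    open_hazards = []
    for hazard in hazards:
        if hazard.function not in installed:
            continue  # the function is not present in this plant
        if not (hazard.mitigations & installed[hazard.function]):
            open_hazards.append(hazard)
    return open_hazards


# Hypothetical repository and plant configuration.
hazard_repository = [
    Hazard("safety instrumented function", "Loss of Ability",
           frozenset({"application white listing", "program mode key switch"})),
    Hazard("engineering function", "Loss of Confidentiality",
           frozenset({"hardening"})),
    Hazard("advanced process control", "Loss of Required Performance",
           frozenset({"range limits"})),
]
installed_plant = {
    "safety instrumented function": frozenset({"program mode key switch"}),
    "engineering function": frozenset(),
}
unmitigated = confront(hazard_repository, installed_plant)
```

Here only the engineering function hazard remains open: the safety function has one of its mitigations installed, and the advanced process control function is not part of this hypothetical plant.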
Consequence plays a key role, because Consequence severity is asset owner specific and depends on the criticality analysis of the systems / functions. Consequence severity can never be higher than the criticality of the system that provides the function. All of this is a process called Consequence analysis: a series of rules and structured checks that allows us to estimate cyber security risk (I am not discussing the likelihood part in this blog) and to link Consequence to mission impact.
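The rule that Consequence severity is capped by system criticality is easy to express in code. The 1-to-5 ordinal scale and the function name `effective_severity` are assumptions for this sketch; an asset owner would use its own scales.

```python
def effective_severity(estimated_severity: int, system_criticality: int) -> int:
    """Cap the estimated Consequence severity at the criticality of the system.

    Both values use a hypothetical ordinal scale from 1 (low) to 5 (high).
    """
    for name, value in (("estimated_severity", estimated_severity),
                        ("system_criticality", system_criticality)):
        if not 1 <= value <= 5:
            raise ValueError(f"{name} must be between 1 and 5, got {value}")
    return min(estimated_severity, system_criticality)
```

So a deviation estimated at severity 5 on a system with criticality 3 is recorded as severity 3, while a severity 2 deviation on a criticality 4 system stays at 2.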
Hence Consequence with a capital C.
Maybe a bit much for a mid-week blog, but I have seen far too many risk assessments where Consequence was not even considered, or was expressed in terms that are meaningless for mitigation, such as Loss of View and Loss of Control.
Since Consequence is such an essential element in automation, we can’t ignore it.
Understanding the hazards is attempting to understand the devious mind of the threat actor team. Only then can we hope to become pro-active.
There is no relationship between my opinions and publications in this blog and the views of my employer in whatever capacity.