This blog is about the in April published new ISA 99 standard, where risk and securing process control and automation systems meet. My blog is an evaluation of part ISA 62443 3-2: Security risk assessment for system design – April 2020).
I write the blog because I struggle with several of the methods applied by the standard and apparent lack of input from the field. The standard seems to ignore practices successfully used in major projects today and replaces these by in my opinion incomplete and inferior alternatives. Practices that for example are used for process safety risk analysis.
Since I executed in recent years several very large risk assessments for greenfield refineries, chemical plants and national critical industry, my technical heart just can’t ignore this. So I wanted to address these issues, because I believe the end result of this standard does not meet the quality level appropriate for an ISA 62443 standard. Very hard conclusion I know, but I will explain in the blog why this is my opinion.
When criticizing the work of others, I don’t want to do this without offering alternatives, so also in this case where I think the standard should be improved I will give alternatives. Alternatives that worked fine in the field and where applied by asset owners with a great reputation in cyber security of their plants. If I am wrong, hopefully people can show me where I am wrong and convince me the ISA 62443 3-2 can be applied.
I have been struggling with the text of this blog for 3 weeks now, I found my response too negative and not appreciating all the work done by people in the task group primarily in their spare time. So let me, before continuing with my evaluation, express my appreciation for making the standard. It is important to have a standard on this subject and it is a lot of work to make it. Having a standard document is always a good step forward, even if just for target practicing. Never the less I also have to judge the content, therefore my evaluation.
It is a lengthy blog this time, even longer than usual. But not yet a book, for a book it lacks the details on how, it has a rather quick stop and not a happy end. Perhaps one day when I am retired and free of projects I will create such a book, this time just a lengthy blog to read. People that conduct or are interested in risk assessments for OT cyber security should read it, for all others it might be too much to digest. Just a warning.
Because I can imagine that not everyone reads or has read the ISA 62443-3-2, I provide a high level overview of the various steps. I can’t copy the text 1 on 1 from the standard, that would violate ISA’s copy right. So I have to do it using my own words, which are in general a little less formal than the standard’s text. Since making standards requires a lot of word smithing, I hope I don’t deviate too much from the intentions of the original text. For the discussion I focus on those areas in the standard that surprised me most.
The standard uses 7 steps between the start of a risk assessment project to the end.
- First step is to identify what the standard calls the “system under consideration”. So basically making an inventory of the systems and its components that require protection;
- Second step in the proposed process is an initial cyber security risk assessment. This step I will evaluate in more detail because I believe the approach here is wrong and I question if it really is a risk assessment;
- Third step is to partition the system into zones and conduits. High level I agree with this part, though I do have some minor remarks;
- Fourth step asks the question if the initial risk found in the second step exceeds tolerable risk? Because of my issues with step 2, I also have issues with the content of this step;
- Fifth, if the risk exceeds tolerable risk a detailed risk assessment is required. I have a number of questions here, not so much the principle itself but the detail;
- In the sixth step the results are documented in the form of security requirements, plus adding some assumptions and constraints. Such a result I would call a security design / security plan. The standard doesn’t provide much detail here, so only a few comments from my side;
- And finally in the seventh step there is an asset owner approval. Too logical for debate.
High level these 7 steps represent a logical approach, it are the details within these steps that inspire my blog.
Let me first focus on some theoretical points considering risk, starting with the fundamental tasks that we need to do before being able to estimate risk and than check the standard how it approaches these tasks.
In order to perform a cyber security risk analysis, it is necessary to accomplish the following fundamental tasks:
- Identify the assets in need of protection;
- Identify the kind of risks (or threats) that may affect the assets;
- Determine the risk criteria for both the risk levels as well as the actions to perform when a risk level is reached;
- Determine the probability of the identified risk occurring;
- And determine the impact or effect on the plant if a given loss occurs.
Let’s evaluate the first steps of the standard against these fundamental principles.
Which assets are in need of protection for an ICS/IACS?
The ISA 62443-3-2 standard document doesn’t define the term asset, so I have to read the IEC 62443-1-1 to understand what according to IEC/ISA 62443 an asset is. According to the IEC 62443-1-1 standard the asset is defined as – in less formal words as used by the standard – an asset is a “physical or logical object” owned by a plant. The standard is here less flexible than my words because an asset is not necessarily owned by a plant, but to keep things simple I take the most common case where the plant owns the assets. The core of the definition is, an asset is a physical or a logical object. So translated into technical language, an asset is either equipment or a software function.
IEC 62443-1-1 specifies an additional requirement for an asset, it needs to have either a perceived or actual value. Both equipment and a software function have an actual value. But there is in my opinion a third asset with a perceived value that needs protection. This is the channel, the communication protocol / data flow used to exchange data between the software functions. Not exactly a tangible object but still I think I can call at minimum an asset if I consider it a data flow using a specific protocol.
Protection of a channel is important, cyber security risk is for a large part linked to exposure of a vulnerability. Vulnerable channels, for example a data flow using the Modbus TCP / IP protocol (this protocol has several vulnerabilities that can be exploited), induce risk. So for me the system under consideration is:
- The physical equipment (e.g. computer equipment, network equipment, and embedded devices);
- The functions (software, for example SIS (Safety Instrumented System) and BPCS (Basic process Control System). But many more in today’s systems);
- And the channels (the data flows, e.g. Modbus TCP/IP or a vendor proprietary protocol).
The ISA 62443-3-2 standard document doesn’t mention these three as explicitly as I do, but the text is also not in conflict with my definition of assets in need of protection. So no discussion here from my side. Before we can assess OT cyber security risk we need to have a good overview of these “assets”, the scope of the risk assessment.
The second step is an initial risk assessment, here I start to have issues. The standard asks us to identify the “worst case unmitigated cyber security risk” that could result from a cyber attack. It suggests to express this in terms of impact to health, safety, production loss, product quality, etc. This is what I call mission risk or business risk. The four primary factors that induce this risk are:
- Process operations;
- Process safety;
- Asset integrity;
- Cyber security;
Without process safety many accidents can occur, but even a plant in a perfect “safe” state can impact product quality and production loss. So we need to include process operations as well. Failing assets can also lead to production loss, or even impact to health. Asset integrity is the discipline that evaluates the maintenance schedules and required maintenance activities. Potential failures made by this discipline also add to the business / mission risk. And then we have cyber security. Cyber security can alter or halt the functionality of the various automation functions, and can alter data integrity, and even compromise the confidentiality of a plant’s intellectual property. All above are elements that can result in a loss and so induce mission risk. But cyber security can do more than this, it can make implemented safety integrity functions (SIF) for process safety ineffective, it can misuse the SIF logic to cause loss. Additionally cyber security can also cause excessive wear of process equipment not accounted for by asset integrity. Cyber security is an element that influences all.
For determining initial risk, the approach taken by the standard is to use the process hazard analysis (PHA) results as input for identifying the worst case impacts. And additional to this the standard requests us to consider information from the sector, governments, and other sources to get information on the threats.
I have several issues here. The first, can we have risk without likelihood? The PHA can define a likelihood in the PHA sheet, but this likelihood has no relationship with the likelihood of facing a cyber attack. So using it would would not result in cyber security risk.
Another point is that if we consider the impact, the PHA would offer us the worst case impact based upon the various deviations analyzed from a process safety perspective. But this is not necessarily the worse case impact from a business perspective.
See the definition for process safety: “The objective of process safety management is to ensure that potential hazards are identified and mitigation measures are in place to prevent unwanted release of energy or hazardous chemicals into locations that could expose employees and others to serious harm.”
The PHA would not necessarily analyze the impact of product quality for the company. Pharmaceutical business and food and beverage business can face major impact if its products would have a quality issue, while their production process can be relatively safe compared to a chemical plant.
If we would take the PHA as input, will it address the “worst case impact”? PHA will certainly be complete for the analysis of the process deviations (the structured approach of the PHA process enforces this) and complete for the influence of these deviations on process safety, so should also be complete for the consequences too if all process deviations are examined.
But the PHA might not be complete when it comes to the possible causes, because PHA generally doesn’t consider malicious intent. So there might be additional causes that are not considered as possible by the PHA.
Causes are very important because these result from functional deviations as result of the cyber attack. It is the link between the functional deviation and its physical effect on the production process that causes the process deviations and its consequence / impact.
If it is not guaranteed that the PHA covers all potential causes, the reverse path from “worst case impact” in the PHA to the “cyber security functional deviation” that causes a process deviation is interrupted. This interruption doesn’t only prevent us from getting a likelihood value, but also prevents us from identifying all mitigation options. Walking the event path in the reverse direction is not going to bring us the likelihood of the process deviation happening as result of a cyber attack, so we don’t have risk.
If I follow the methodology of analyzing risk using event paths, I have the following event path:
A threat actor executes a threat action to exploit a vulnerability resulting in some desired consequence, this consequence is a functional deviation causing a process deviation with some process consequence.
The process consequence has a specific impact for the business / mission, such as production loss, potential casualties, a legal violation, etc. This impact (expressed as loss) creates business risk.
The picture shows that the risk path between a cyber security breach resulting in mission impact requires several steps. The standard jumps right to the end (business impact) not considering intermediate consequences, by doing this the standard ignores options to reduce risk. But more on this later.
So as an example of an event path: A nation sponsored organization (threat actor) gains unauthorized access into an ICS (threat action) by capturing the access credentials of an employee without two factor authentication (vulnerability), modifying the range of a tank level instrument (functional deviation), causing a too high level in the tank (process deviation) with as potential consequence overfilling the tank (process consequence) resulting in a loss of containment of a potentially flammable fluid causing toxic vapors requiring an evacuation resulting in potentially multiple casualties and an approximate $500.000 production and cleaning cost loss (business impact).
The likelihood that this event path, scenario, happens as result of a cyber attack is determined by the likelihood of the cyber security failure. The PHA might have considered a similar process deviation (High level in the tank) during the analysis, perhaps with as cause mentioned a failed level transmitter, and reached the same process consequence as potential loss of containment. All very simplified, don’t worry we do apply additional controls.
But the likelihood assigned to this process safety hazard would have been derived from a LOPA (Layers Of Protection Analysis) assignment based on an initiating frequency assignment and an IPL (Independent Protection Layer) reduction factor and MTBF factors of the instrument to estimate a SIL required for the safeguard. There is no relationship between the likelihood of the process deviation in the PHA and the cyber security likelihood.
Important is to realize that not all risk estimates are necessarily linked to a loss. Ideally this is so, because the loss justifies the investment in mitigation. But in many cases we can use what are called risk priority numbers, a risk score based on likelihood and severity for ranking purposes. A risk priority number is often enough for us to decide on mitigation and prioritizing mitigation. For design we only need business risk if the investment needs to be justified. The risk priority numbers already show the most important risk from a technical perspective.
However my conclusion is that the standard is not approaching the initial risk assessment by estimating cyber security risk, it seems to focus on two other aspects:
- What is the worst impact (Expressed as a loss);
- and what does the threat landscape look like.
Neither of them providing a risk value. I fully understand the limitation, because so early in a project there is not enough information available to estimate risk at a level of detail showed above.
So in my opinion an alternative approach is required, and such alternatives exist. Better they have been widely applied in the industry in recent years, but for some reason are ignored by the task group and replaced by something questionable in my opinion.
So how did projects resolve the lack of information and still get important information for understanding the business impact and threats to protect against? Following activities were performed in projects:
- Create a threat profile by conducting a threat assessment;
- Conduct a criticality assessment / business impact analysis.
What is a threat profile? The objective of a threat profile is to:
- Identify the threat actors to be considered;
- Identify and prioritize cyber security risk;
- Align information security risk and OT security risk strategies within the company.
Identifying and prioritizing cyber security risk is done by studying the threat landscape based upon various reliable sources, such as information from a local CERT, MITRE, FIRST, ISF, and various commercial sources. These organizations show the developments in the threat landscape explaining the activities of threat actors and what they do.
Threat actors are identified and their relevance for the company can be rated based on criteria such as: intent, origin, history, motivation, capability, focus. Based on these criteria a threat strength and likelihood can be estimated. Various methods have been developed, for example the method developed by ISF is frequently used.
Based on the information on threat actors and their methods and their focus, a threat heat map is created. So the asset owner can decide on what the priorities are. This is an essential input for anyone responsible for the cyber security of a plant, but also important information for a cyber security risk assessment. Threat actors considered as relevant, play a major role in the required risk reduction for an appropriate protection level. If we don’t consider the threat actor we will quickly over-spend or under-spend on risk mitigation.
In the ISA 62443-3-2 context, the target security level (SL-T) is used as the link toward the threat actor. Because ISA security levels link to motivation, capability, and resources of the threat actor, a security level also identifies a threat actor. The idea is to identify a target security level for each security zone and use this to define the technical controls for the assets in the zone, specified as capability security levels (SL-C) in the IEC 62443-3-3. The IEC 62443-3-3 provides the security requirements that the security zones and conduits must meet to offer the required level of protection.
The ISA 62443-3-2 document does not explain how this SL-T is defined. But if the path is to have an SL-T per security zone, then we need to estimate zone risk. Zone risk is something very different from asset risk or threat based risk. The ANSSI standard provides a method to determine zone risk and uses the result to determine a security class (A,B, C) something similar as the ISA security levels.
I would expect a standard document on security risk assessment to explain the zone risk process, but the standard doesn’t. Because the methodology actually seems to continue with an asset based risk approach I assume the idea is that the asset with the highest risk level in a zone determines the overall zone risk level. This is a valid approach but more time consuming than the ANSSI approach.
For a standard / compliance based security strategy a zone risk approach would have been sufficient. The use of SL-T and SL-C and reference to IEC 62443-3-3 seems to suggest a standard’s / compliance based security approach.
How about the other method in use, the criticality assessment. What does it offer? The criticality assessment helps to establish the importance of each functional unit and business process as it relates to the production process. Illustrating which functions need to be recovered, how fast do they need to be recovered and what their overall importance is from both a business perspective as well as from a cyber security perspective. Which are two totally different perspectives. An instrument asset management system might have a low importance from a business point of view, but because of its connectivity to all field instrumentation it can be considered an important system to protect from a cyber security perspective.
So executing a criticality assessment as second step provides the criticality of the functions (importance and impact do correlate), and it provides us with information on the recovery timing.
OT cyber security risk should not exclusively focus on the identification of the risk related to the threats and how to mitigate it. Residual business risk is not only influenced by preventative and detective controls but also by the recovery potential and speed, because the speed of recovery determines business continuity and lack of business continuity can be translated to loss.
ISA 62443-3-2 doesn’t address the recovery aspect in any way. As the risk assessment is used to define the security requirements, we can not ignore recovery requirements. NIST CSF does acknowledge recovery as an important security aspect, I don’t understand why the ISA 62443-3-2 task group ignored recovery requirements as part of their design risk assessment.
It is important to know the recovery point objective (RPO), recovery time objective (RTO) and maximum tolerable downtime (MTD) for the ICS / IACS. Designing recovery for a potential cyber threat, for example a ransomware infection, differs quite a lot from data recovery from an equipment failure.
In a plant with its upstream and down stream dependencies and storage limitations, these parameters are important information for a security design. See below diagram for the steps in a typical cyber incident. Depending on their detailed procedures, these steps can be time consuming.
The criticality assessment investigates Loss of Ability to Perform over a time span, for example how do we judge criticality after 4 hours, 8 hours, a day, a week, etc… This allows us to consider the dependencies of upstream and downstream processes, and storage capacity. But apart from looking at loss of ability to perform, loss of required performance and loss on confidentiality are also evaluated in a cyber security related criticality assessment. Specifically the loss of required performance links to the “worst impact” looked for by the ISA 62443-3-2.
We also need to consider recovery point objectives (RPO) and understand how the plant wants to recover, are we going to use the most recent controller checkpoint or are there requirements for a controller checkpoint that brings the control loops into an known start-up state.
We need to understand the requirements for the recovery time objective (RTO), this differs for cyber security compared to a “regular” failure. For cyber security we need to include the time taken by the incident response tasks, such as containment and eradication of the potential malware or intruder.
It makes a big difference in time if our recovery strategy opts for restoring a back-up to new hard drives, or if we decide to format the hard drives before we restore (and what type of reformatting would be required). Other choices to consider are back-up restore over the network from one or more central locations, or restore directly at computer level using USB drives. All of this and more has time and cost impact that the security design needs to take into account.
The role of the criticality assessment is much bigger than discussed here, but it is an essential exercise for the security design and as such for the risk assessment that also requires us to assess consequence severity. Consequence severity analysis is actually just an extension of the criticality analysis.
Another important aspect of criticality analysis is that it includes ICS / IACS functions not being related to any of the hazards identified by the PHA. For our security design we need to include all IACS systems, not limit our selves to systems at Purdue reference model level 0, 1 and 2.
A modern IACS has many functions, all ignored by the ISA 62443-3-2 standard if we start at the PHA. IACS includes not only the systems at the Purdue level 0, 1 and 2, (systems typically responsible for the PHA related hazards) but also the systems at level 3. Systems that can still be of vital importance for the plant. The methodology of the standard seems to ignore these.
Another role for the criticality assessment is directly related to the risk assessment. An important question in a large environment is do we assess risk per sub-system or for the whole system. Doing it for the whole system (IACS) with all its sub-systems (BPCS, SIS, MMS, …) and a large variation of different consequences and cross relations, or do we assess risk per function and consolidate these results for comparison in for example a risk assessment matrix.
The standard seems to go for the first approach, which in my opinion (based upon risk assessments for large installations) is a far too difficult and complex exercise. If we keep the analysis very general (and as a result superficial, often at an informal gut-feeling-risk level) it is probably possible, but the results often become very subjective and we miss out on the benefits that other methods offer.
As the base for a security design that is resilient to targeted attacks and its subsequent risk based security management (this would require a risk register) of the complete ICS/IACS, I believe it is an impossible task. All results of a risk assessment need to meet the sensitivity test, and the results need to be discriminitive enough to have value. This requires that methods applied, prevent any subjective inputs that steer the results into a specific direction.
If we select a risk assessment approach per function to allow for more detail, we have to take the difference in criticality of the function into account when comparing risk of different functions. This is of minor importance when we compare risk results from SIS (Safety Instrumented System) or BPCS (Basic Process Control System), because both are of vital importance in a criticality analysis. But we will already see differences in results when we would compare BPCS with MMS (Machine Monitoring System), IAMS (Instrument Asset Management System) or DAHS (Data Acquisition Historian System).
As I already mentioned in my previous blog, risk assessment methodologies have evolved and the method suggested by ISA 62443-3-2 seems to drive toward an approach that can’t handle the challenges of today’s IACS. Certainly not for the targeted attacks where threat actors with “IACS specific skills” ( SL 3, SL 4). To properly execute a risk assessment in an OT environment requires knowledge on the different methods for assessing risk, their strong and weak points, and above all a clear objective for the selected method.
So to conclude my assessment of the second step of the ISA 62443-3-2 process, “Initial risk assessment”, I feel the task group missed too much in this step. The result offers too little, is incomplete and actually providing very little useful information for the next steps, among which the “security zone partitioning”.
Now lets look at the zone partitioning step, to see how the standard and my field experience aligns here.
First of all what is the importance of security zone partitioning with regard to security risk assessment?
- If we want to use zone risk we need to know the boundaries of the security zone;
- For asset risk and threat based risk we need to know the exposure of the asset / channel and connectivity between zones over conduits is an important factor.
So it is an important step in a security risk assessment, even for none security standard based strategies. Let’s start with the overview of what project steps the standard specifies:
- Establish zones and conduits;
- Separate business and ICS/IACS assets;
- Separate safety related assets;
- Separate devices that are temporarily connected;
- Separate wireless devices;
- Separate devices connected via external networks.
Looks like a logical list to consider when creating security zones, but certainly not a complete list. The list seems to be driven by exposure, which is good. But there are other sources of exposure. We can have 24×7 manned and unmanned locations, important for yes / no a session lock of operator stations, so a security characteristic. We have to consider the strength of the zone boundary, is it a physical perimeter (e.g. network cable connected to the port of a firewall), a logical perimeter (e.g. a virtual LAN), or do we have a software defined perimeter (e.g. a hypervisor that separates virtual machines on virtual network segments).
The standard, and as far as I am aware none of the other ISA standards, consider virtualization. This is a surprise for me because virtualization seems to be core for the majority of greenfield projects, and even as part of brown field upgrade projects virtualization is a frequent choice today. Considering that the standard is issued April 2020, and ISA has a policy to refresh standards every 5 years, this is a major gap because software zone perimeters for security zones is an important topic.
Considering that virtualization is a technology used in many new installations, and considering the changes that technology like APL and IIoT are bringing us. Not addressing these technologies in a risk based design document that discusses zone partitioning makes the standard almost obsolete in the year it is issued.
Perhaps a very hard verdict, but a verdict based on my personal experience where 4 out of 5 large greenfield projects were based on virtualized systems a subject that should have been covered. Ignoring virtualization for zone partitioning is a major gap in 2020 and the years to come.
There is a whole new “world” today of virtual machine hosts, virtual machines, hardware clusters, and software defined perimeters. And this has nothing to do with the world of private clouds or Internet based clouds, virtualization is proven technology for at least 5+ years now, conquering ICS/IACS space with increasing speed. It should not have been missed.
The 62443-3-2 standard is not very specific with regard to the other subjects in this step, so little reason for me to criticize the remainder of the text. If I have to add a point, then I might say that the paragraph on separating safety and non-safety related assets is very thin. I would have expected a bit more in April 2020, plenty of unanswered design questions for this topic.
The next step the standard asks us to do is “risk comparison“, we have to compare initial risk (step 2) with residual risk, the risk after the zone partitioning step. There is the “little” issue that neither in the initial risk assessment step, nor in the zone partitioning step we estimate risk. Nor do we establish anywhere risk criteria, important information when we want to compare risk.
Zone partitioning does change the risk, we influence exposure through connectivity, but there is no risk estimated in the partitioning step. Perhaps the idea is to take the asset with the highest risk in the zone and use this for the SL-T / SL-C and identify missing security requirements, but this is not specified and would be an iterative process because also asset risk depends among others on exposure from connectivity.
I had expected something in the partitioning task that would estimate / identify zone risk and assign the zone a target security level using some transformation matrix from risk to security level. The standard doesn’t explain this process, it doesn’t seem to be an activity of the zone partitioning task, it doesn’t refer to another standard document for solving it. Which is an omission in my opinion.
But ok no problem, if we can’t accept the residual risk we are going to do the next step, the detailed risk assessment and keep iterating till we accept the residual risk. So perhaps I must read this more as a first step in an evaluation loop. Doesn’t take away that if I decide first time I am happy with the residual risk, I will have nothing else than what the initial risk assessment produced, and this was not cyber security risk.
The fifth step is the detailed risk assessment. Let’s see what we need to do:
- Identify cyber security threats;
- Identify vulnerabilities;
- Determine consequences and impact;
- Determine unmitigated likelihood;
- Determine unmitigated cyber security risk;
- Determine security level target to link to the IEC 62443-3-3 standard.
If the unmitigated risk exceeds tolerable risk we need to continue with mitigation. Following steps are defined:
- We need to identify and evaluate existing countermeasures;
- We are asked to reevaluate the likelihood based on these existing countermeasures;
- Determine the new residual risk;
- Compare residual risk against tolerable risk and reiterate the cycle if the residual risk is still too high;
- If all residual risk is below the tolerable risk, we document the results in the risk assessment report.
To do all of the above we need to have risk criteria, these are not mentioned in the standard neither is mentioned what criteria there are. The standard seems to adopt the three risk levels often used for process safety: acceptable risk, tolerable risk, and unacceptable risk.
Unfortunately this is too simple for cyber security risk. When process safety risk is unacceptable, an accepted policy is too stop and fix it before continuing. For tolerable risk there would be a plan in place on how to improve it when possible.
Cyber security doesn’t work that way, I have seen many high cyber security risks, but only in a few cases noted that plant managers were willing to accept a loss (production stop) to fix it. An example where loss because of security risk was accepted was when Aramco and 2 weeks later Qatar gas were attacked by the Shamoon malware. At that time the regional governments instructed the plants to disconnect the ICS/IACS from their corporate network, this induces extra cost. We have seen similar decisions caused by ransomware infections, plants pro-actively stopping production to prevent an infection propagating.
Cyber security culture differs from process safety culture, this translates into risk criteria and the action-ability of risk. The less risk levels, the more critical the decision. As result, plants seem to have a preference for 5 or 6 risk levels. This allows them a bit more flexibility.
Tolerable risk is the “area” between risk appetite and risk tolerance. Risk appetite being the level we can continue production without actively pursuing further risk reduction, and risk tolerance being the limit above which we require immediate action. Risk criteria are important, in the annex ISA 62443-3-2 provides some examples in risk matrices. A proper discussion on risk criteria is missing. Likelihood levels, severity levels, impact levels, importance levels all need to be defined. It is important that if we say this is a high risk, all involved understand what this means. Also actions need to be defined for a risk level, risk needs to result into some action if it exceeds a level.
The standard limits itself to business risk, impact expressed as a loss. Business risk is great to justify investment but does very little for identifying the most important mitigation opportunities because it is a worst case risk. It is like saying if I am walking in a thunderstorm I can die from being struck by lightning, so I no longer need to analyze the risk using a zebra to cross the road.
Another point of criticism I have is why only risk reduction based on likelihood reduction is considered. The standard seems to ignore addressing opportunities on the impact side for taking away consequences? Reduction on the consequence side has proved to be far more effective. In the methodology chosen by the standard this is not possible because the only impact / consequence recognized is the ultimate business impact. The consequences leading to this impact, the functional deviation in IACS functions, are not considered.
The three step process described in above block diagram of risk and shown in more detail in my previous blog as the “event path” doesn’t exist for the ISA adopted method. Though it are the countermeasures and safeguards we use to reduce the cyber security risk.
For example there are many plants that do not allow the use of the Modbus TCP/IP protocol for switching critical process functions using PLCs that depend on Modbus communication. If such an action is required they hardwire the connection to prevent being vulnerable for various Modbus TCP/IP message injection and modification attacks. This is a safeguard taking a way a very critical consequence such as an unauthorized start or stop of a motor, compressor or other by injecting a Modbus message.
Another point we are asked to do is to determine the security level target (SL-T). Well assuming there is some transformation matrix converting risk into security level (not in the ISA 62443-3-2) we can do this, but how?
It is a repeating issue, the standard doesn’t explain if I need to use zone risk like ANSSI does, or determine zone risk as the asset with the maximum risk in a zone. And when I have risk what would be the SL-T?
Once we have an SL-T we can compare this with the security level capability (SL-C) to get the security requirements from the IEC 62443-3-3. ISA 62443-3-2 seems to be restrict to the standard based security strategy. A good point to start but not a solution for critical infrastructure.
Another surprise I have is the focus on the unmitigated risk, why not include the existing countermeasures immediately? Why are we addressing risk mitigation exclusively by addressing likelihood. Where do we include the assets to protect (equipment, function, channel)? I think I know the answer, the risk methodology adopted doesn’t support it. The task group either didn’t investigate the various risk estimation methodologies available or worked for some reason toward applying a methodology that is not capable of estimating risk for multiple countermeasures / safeguards. Which is strange choice because in process safety this methodology is successfully applied through LOPA.
So how to summarize this pile of criticism? First of all I want to say that this is the standard document I criticize the most, maybe because it is the subject that comes closest to my work. There are always small points where opinions can deviate, but in this case I have the feeling the standard doesn’t offer what it should bring, it seems to struggle with the very concept of risk analysis. Which is amazing because of the progress made by both science and asset owners in recent years.
Because risk is such an essential concept for cyber security I am disappointed in the result, despite all the editing of this blog and the many versions that were deleted I think the blog still shows this disappointment.
It is my feeling that the task group didn’t investigate the available risk methodologies sufficiently, didn’t study the subject of risk analysis, and they didn’t seem to compare results of the different methods. They aimed for one method and wrote the standard around it. That is a pity because it will take another 5 years before we see an update and 5 years in cyber security are an eternity.
There is no relationship between my opinions and references to publications in this blog and the views of my employer in whatever capacity. This blog is written based on my personal opinion and knowledge build up over 42 years of work in this industry. Approximately half of the time working in engineering these automation systems, and half of the time implementing their networks and securing them.
Author: Sinclair Koelemij
OTcybersecurity web site