The Ukraine crisis will almost certainly raise the cyber security risk for the rest of Europe. The sanctions imposed on Russia demand increased awareness and a stronger defense effort for securing the OT systems. These sanctions will hurt and will undoubtedly become an incentive for organized revenge from very capable threat actors. What could be more effective for them than cyberattacks from a safe distance?
I think all energy-related installations, such as port terminals, pipelines, gas distribution, and possibly power, will have to raise their level of alertness. Until now, most attacks have focused on the IT systems, but that does not mean that IT systems are the only targets and the OT systems are safe. Attacking the OT systems can cause a much longer downtime than a ransomware attack or wiping disk drives would, so such an attack might be seen as a strong warning signal.
Therefore it is important to bolster our defenses. Obviously we don’t have much time, so our response should be short term; structural improvements just take too much time. So what can be done?
Let’s create a list of possible actions that we could take today if we want to brace ourselves against potential cyber attacks:
Review all OT servers / desktops that have a connection with an external network. External including the corporate network and partner networks. We should make sure that these servers have the latest security patches installed. Let’s at minimum remove the well known vulnerabilities.
Review the firewalls and make certain they run the latest software version.
Be careful which side you manage the firewall from; managing from the outside is like putting your front door key under the mat.
Review all remote connections to service providers. Such connections should be free from:
Open inbound connections. An inbound channel can often be exploited; more secure remote access solutions poll the outside world for remote access requests, which avoids any open inbound connections.
Automatic approvals on access requests. Make sure that every request is validated prior to approval, for example using a voice line.
Modify your access credentials for the remote access systems; they might have been compromised in the past. Use strong passwords of sufficient length (10+) and character variation. Better is of course to combine this with two-factor authentication, but if you don’t have this today it would take too much time to add it. That would be a mid-term improvement; this list is about easy steps to do now.
Review the accounts that have access, remove stale accounts not in use.
Apply the least privilege principle. Wars make the insider threat more likely; enforcing the least privilege principle will raise the hurdle.
Ensure you have session time-outs implemented, to prevent sessions from remaining open when they are not actively used.
Review the remote server connections. If inbound open ports are required, make sure the firewall restricts access as much as possible, using at minimum IP address filters and TCP port filters. Better still (if you have a next generation firewall in place) is to add further restrictions, such as restricting the access to a specific account.
Review your antivirus to have the latest signature files, the same for your IPS vaccine files.
Make certain you have adequate and up-to-date back-ups available. Did you ever test to restore a back-up?
You should have multiple back-ups, at minimum 3. It is advised to store the back-ups on at least 2 different media; don’t have all back-ups accessible online.
Make sure they can be restored on new hardware if you are running legacy systems.
Make sure you have a back-up or configuration sheet for every asset.
Hardening your servers and desktops is also important, but if you never did this it might take some time to find out which services can be disabled and which services are essential for the server / desktop applications. So this is probably a mid-term activity, but reducing the attack surface is always a good idea.
Have your incident response plan ready at hand, and communicated throughout the organization. Ready at hand, meaning not on the organizational network. Have hardcopies available. Be sure to have updated contact lists and plan to have communications using non-organizational networks and resources. (Added by Xander van der Voort)
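The back-up rules in the list above (at least 3 copies, at least 2 different media, not every copy online) are simple enough to check mechanically against an inventory. Below is a minimal sketch in Python. The inventory format, the field names, and the example server are my own assumptions for illustration; they do not come from any specific back-up product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    """One back-up copy of an asset (hypothetical inventory record)."""
    asset: str
    medium: str   # e.g. "disk", "tape", "removable-drive"
    online: bool  # True if the copy is reachable over the network


def check_backups(copies, min_copies=3, min_media=2):
    """Return a list of findings; an empty list means the rules are met."""
    findings = []
    if len(copies) < min_copies:
        findings.append(f"only {len(copies)} copies, need at least {min_copies}")
    media = {c.medium for c in copies}
    if len(media) < min_media:
        findings.append(f"only {len(media)} media types, need at least {min_media}")
    if copies and all(c.online for c in copies):
        findings.append("all copies are online; keep at least one offline")
    return findings


# Made-up example: a historian server with two online disk copies.
historian = [
    BackupCopy("historian-01", "disk", online=True),
    BackupCopy("historian-01", "disk", online=True),
]
weak_findings = check_backups(historian)

# Adding one offline tape copy satisfies all three rules.
historian_fixed = historian + [BackupCopy("historian-01", "tape", online=False)]
ok_findings = check_backups(historian_fixed)

print(weak_findings)
print(ok_findings)
```

A check like this says nothing about whether a restore actually works, so it complements a restore test rather than replacing it.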
I don’t know if I missed some low-hanging fruit; if so, please respond to the blog so I can make the list more complete. This list should mention the easy things to do, just configuration changes or some basic maintenance. Something that can be done today if we can find the time.
Of course, our cyber worries are of a totally different order than what the people in Ukraine are now experiencing for their personal survival and their survival as an independent nation. However, the OT cyber community in Europe must also take responsibility and map out where our OT installations can be improved / repaired in a short time, to reduce risk.
Cyber wars have no borders, so we should be prepared.
And of course I shouldn’t forget my focus on OT risk. A proper risk assessment would give you insight into what threat actions (at TTP level) you can expect, and for which of these you already have controls in place. In situations like the one we are in now, this would be a great help to quickly review the security posture and perhaps adjust our risk appetite a bit to further tighten our controls.
However, if you haven’t done a risk assessment at this level of detail by today, it isn’t feasible to do this in the short term, so it is not in the list. All I could do was go over the hundreds of bow-ties describing the attack scenarios and try to identify some quick wins that raise the hurdle a bit. I might have missed some, but I hope that the community corrects me so I can add them to the list. A good list of actions to bolster our defenses is of practical use for everyone.
I am not the guy that is easily scared by just another log4j story, but now I think we have to raise our awareness and be ready to face some serious challenges on our path. So carefully review where the threat actor might find weaknesses in your defense and start fixing them.
There is no relationship between my opinions and references to publications in this blog and the views of my employer in whatever capacity. This blog is written based on my personal opinion and knowledge built up over 43 years of work in this industry. Approximately half of that time was spent engineering these automation systems, and half implementing their networks and securing them, and conducting cyber security risk assessments for process installations since 2012.
Whenever I think about an ICS perimeter, I see the picture of a well head with its many access controls and monitoring functions. In this mid-week blog, I have chosen an easily digestible subject. No risk and certainly no transformers this time, but something I call the “Classic ICS perimeter”.
What is classic about this perimeter? The classic element is that I don’t discuss all those solutions that contribute to the “de-perimeterization” – if this is a proper English word – of the Industrial Control Systems (ICS) when we implement wireless sensor networks, virtualization, or IIoT solutions.
There are many different ICS perimeter architectures. Some architectures just exist to split management responsibilities and some architectures add, or attempt to add, additional security. I will discuss in this blog that DCS and SCADA are really two different systems, with different architectures, but also different security requirements.
When I started to make the drawings for this blog, I quickly ended up with 16 different architectures, some bad, some good. All of them exist in practice, though my memory might have failed me here and there. The focus in the blog is on the perimeter between the office domain (OD) and the process automation network (PAN). I will briefly detail the PAN into its level 1, level 2, and level 3 segments, see also my blog on the Purdue Reference Model. Internal perimeters between level 2 and level 3 are not discussed in this blog because of the differences that exist between different vendor solutions for level 1 and level 2 or interfacing with legacy systems.
Vendors often differ in their level 2 / level 1 architecture, in the implementation rules that must be followed to meet vendor requirements, and in the network equipment used. Covering these differences would almost require a second blog. So this time the focus is a bit more on IT technology than my normal focus on OT cyber security: more the data driven IT security world than the data plus automation action driven OT cyber security world.
Maybe the first question is, why do we have a perimeter? Can’t we just air-gap the ICS?
We generally have a perimeter because data needs to be exchanged between the planning processes in the office domain and the operations management / product management functions in the PAN (see also my PRM blog) and sometimes engineers want remote access into the network (see the remote access blog). When the tanker leaves Kuwait, the composition data of the crude is available and the asset owner will start its planning process. Which refinery can best process the crude, what is the sulfur level in the crude, and many more questions. Ultimately, when the crude arrives, is stored in tanks in the terminal, and is forwarded to the refinery to produce the end product, this data is required to set some of the parameters in the automation system. Additionally, production data is required by the management and marketing departments, custody metering systems produce records on how much crude has been imported, and environmental monitoring systems collect information from the stacks and water surface to report that no environmental regulations are violated.
Only for very critical systems, such as nuclear power, have I seen fully isolated systems. Not only are the control systems isolated, but the functional safety systems also remain isolated. Though also in this world more functions become digital, and more functions are interfaced with the environment.
More common is the use of data diodes as perimeter device in cases where one way traffic (from PAN to OD) suffices. And also in this world we see compromises by allowing periodic reversal of the data flow direction to update antivirus systems and security patches. But by far, most systems have a perimeter based on a firewall connection between the OD and the PAN, the topic of this blog.
I start the discussion with three simple architecture examples.
Architecture 1, a direct connection between the OD and the PAN.
If the connection is exclusively an outbound connection, this can be a secure solution for less critical sites. However, any form of defense in depth is missing here: if the firewall gets compromised, the attacker gets full access to the PAN. A firewall / IPS combination would be preferred. Still, some asset owners allow this architecture to pass outbound history information in the direction of the office domain.
Architecture 2, adding a demilitarized zone (DMZ).
A DMZ is added to allow the inspection of the data before the data continues on its path to functions in the PAN. But if we do this we need to actually inspect this data. Just forwarding it using the same protocols adds a little security (hiding the internal network addresses of the PAN), but only if we use a different range and not just continue with private IP address ranges like the 10.10 or 192.168 range.
Alternatively the PAN can make data available for the office domain users by offering an interface to this data in the DMZ. For example a web service providing access to process data. But we had better make sure it is a read-only function, not allowing write actions to the control functions.
The theoretically ideal DMZ only allows traffic inbound to the DMZ itself. For example a function in the PAN sends data to a DMZ function, and a user or server in the OD collects this data from the DMZ. Or in the reverse direction. Unfortunately not all solutions offer this possibility; in those cases inbound data from the OD needs to continue toward a PAN function. In this situation we should make certain that we use different protocols for the traffic coming into the DMZ and the traffic going from the DMZ to the PAN function. (Disjoint protocols)
The reason for the disjoint protocol rule is to prevent a vulnerable service from being used by a network worm to jump from the OD into the PAN. Typical protocols where this can happen are RDP (Microsoft terminal server), SMB (e.g. file shares or print servers), RPC (e.g. RPC DCOM used by classic OPC), and HTTPS (used for data exchange).
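The disjoint protocol rule lends itself to a quick check on an exported rule set: collect the protocols allowed from the OD into the DMZ, collect the protocols allowed from the DMZ into the PAN, and see whether the two sets overlap. Below is a minimal sketch in Python. The rule format and the example rules are hypothetical; a real firewall export will look different and would need a small parser in front of this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """A permitted traffic flow between two zones (hypothetical rule format)."""
    src_zone: str
    dst_zone: str
    protocol: str


def shared_protocols(flows, outer="OD", dmz="DMZ", inner="PAN"):
    """Protocols allowed both from outer zone to DMZ and from DMZ to inner zone."""
    into_dmz = {f.protocol for f in flows
                if f.src_zone == outer and f.dst_zone == dmz}
    into_pan = {f.protocol for f in flows
                if f.src_zone == dmz and f.dst_zone == inner}
    return into_dmz & into_pan


# Made-up rule set for illustration only.
rules = [
    Flow("OD", "DMZ", "HTTPS"),
    Flow("OD", "DMZ", "RDP"),
    Flow("DMZ", "PAN", "RDP"),     # same protocol on both legs: breaks the rule
    Flow("DMZ", "PAN", "OPC UA"),
]

violations = shared_protocols(rules)
print(sorted(violations))  # ['RDP']
```

An empty result means no protocol is allowed on both legs, so a worm that depends on one vulnerable service cannot use that same service to continue from the DMZ into the PAN.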
If the use of disjoint protocols is not available, an IPS function in the firewall becomes very important. The IPS function can intercept the network worm attempting to propagate through the firewall by inspecting the traffic for exploit patterns.
Another important consideration in architectures like this is how to handle the time synchronization of the functions in the DMZ. The time server protocol (e.g. NTP) can be used for amplification attacks in an attempt to create a denial of service of a specific function. An amplification attack happens when a small message triggers a large response message. If we combine this with spoofing the source address of the sender, we can use this for attacking a function within the PAN and potentially overloading it. To protect against this, some firewalls offer a local time server function. In this case the firewall synchronizes with the time server in the PAN and offers a separate service in the DMZ for time synchronization. So there is no inbound (DMZ to PAN) NTP traffic required, preventing the denial of service amplification attack from the DMZ toward a PAN function.
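The arithmetic behind an amplification attack is simple: the amplification factor is the response size divided by the request size, and the load arriving at the spoofed victim is the attacker's send rate multiplied by that factor. Below is a small worked example in Python. The message sizes and the attacker rate are made-up numbers chosen to keep the arithmetic easy; they are not measurements of NTP or any other protocol.

```python
def amplification_factor(request_bytes, response_bytes):
    """How many response bytes are triggered per request byte."""
    return response_bytes / request_bytes


def victim_load_bps(attacker_bps, factor):
    """Traffic rate arriving at the spoofed victim address."""
    return attacker_bps * factor


# Illustrative numbers only: a 50-byte request that triggers a 5000-byte response.
factor = amplification_factor(request_bytes=50, response_bytes=5000)

# An attacker sending 1 Mbit/s of spoofed requests.
load = victim_load_bps(attacker_bps=1_000_000, factor=factor)

print(factor)  # 100.0
print(load)    # 100000000.0
```

With these assumed numbers a modest 1 Mbit/s stream of requests turns into 100 Mbit/s at the victim, which shows why removing the DMZ to PAN time synchronization path is worth the effort.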
Architecture 3, adding an additional firewall.
Adding an additional firewall prevents an attacker who has compromised the outer firewall from having direct access into the PAN. With two firewalls, breaching the first firewall gives access to the DMZ, but a second firewall is there to stop / delay the attacker. This moment needs to be used to detect the first breach by monitoring the traffic and functions in the DMZ for irregular behavior.
This delay / stop works best if the second firewall is of a different vendor / model. If both would be from the same vendor, using the same operating software, the exploit causing the breach on the first firewall would most likely also work for the second. Having two different vendors delays the attacker more and increases the chance of detecting the attacker trying to breach the second firewall. DMZs create a very strong defense if properly implemented. If this is not possible we should look for compensating controls, but never forget that defense in depth is a key security principle. It is in general not wise to rely on just one security control to stop the attacker. And please don’t think that the PRM segmentation is defense in depth enough; there are very important operations management functions at level 3 that are authorized to perform critical actions in the production management systems at level 2 and level 1. Know your ICS functions, understand what they do and how they can fail. This is an essential element in OT cyber security; it is not just about controlling network traffic.
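The two-firewall argument can be made concrete with a small reachability model: treat zones as nodes, firewalls as the links between them, and assume the worst case that a compromised firewall passes everything. Below is a minimal sketch in Python. The topologies are my simplified reading of architectures 2 and 3 (the diagrams are not reproduced here), and the firewall names FW1 and FW2 are placeholders.

```python
from collections import deque


def reachable_zones(links, compromised, start="OD"):
    """Zones reachable from `start` by crossing only compromised firewalls.

    links: iterable of (firewall, zone_a, zone_b) tuples.
    Worst-case assumption: a compromised firewall passes all traffic.
    """
    neighbours = {}
    for firewall, a, b in links:
        if firewall in compromised:
            neighbours.setdefault(a, set()).add(b)
            neighbours.setdefault(b, set()).add(a)
    seen = {start}
    queue = deque([start])
    while queue:
        zone = queue.popleft()
        for nxt in neighbours.get(zone, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Architecture 2 (assumed): one firewall with legs to the OD, the DMZ and the PAN.
arch2 = [("FW1", "OD", "DMZ"), ("FW1", "DMZ", "PAN"), ("FW1", "OD", "PAN")]
# Architecture 3 (assumed): outer firewall OD-DMZ, inner firewall DMZ-PAN.
arch3 = [("FW1", "OD", "DMZ"), ("FW2", "DMZ", "PAN")]

arch2_reach = reachable_zones(arch2, compromised={"FW1"})
arch3_reach = reachable_zones(arch3, compromised={"FW1"})
# Same vendor and software on both firewalls: one exploit may open both.
arch3_same_vendor = reachable_zones(arch3, compromised={"FW1", "FW2"})

print(sorted(arch2_reach))        # ['DMZ', 'OD', 'PAN']
print(sorted(arch3_reach))        # ['DMZ', 'OD']
print(sorted(arch3_same_vendor))  # ['DMZ', 'OD', 'PAN']
```

The model is deliberately crude, but it shows the point of the text: the second firewall only adds a barrier as long as the exploit used on the first one does not also work on the second.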
A variation on architecture 2 and 3 is shown in the next diagram. Here we see for architecture 4 and architecture 5 two firewalls in series (Orange and red). This architecture is generally chosen if there are two organizations responsible for the perimeter. For example the IT department and the plant maintenance department, each protecting their realm. Though the access rules for the inbound traffic are the same for both firewalls in architecture 4 and 5, this architecture can offer a little bit more resilience than architecture 2 / 3 because of the diversity added if we use two different firewalls.
Architecture 6 adds a second internal boundary between level 3 functions (operations management) and level 2 / level 1 functions (production management).
For the architectures 1 to 5 this might have been implemented with a router between level 3 and level 2 in combination with access control lists. Firewalls can offer more functionality, especially Next Generation Firewalls (NGFW – strange marketing invention that seems to hold for all new firewall generations to come) offer the possibility to limit access based on user and specific applications or proxies allowing for a more granular control over the access into the real-time environment of the production management functions.
Sometimes plants require Internet access for perhaps specialized health monitoring systems of the turbine or generator, or maybe remote access to a support organization. Preferably this is done by creating a secure path through the office domain toward the Internet, but it happens that the IT department doesn’t allow for this or there is no office domain to connect to. In those cases asset owners are looking for solutions using an Internet provider, or a 4G solution.
Architecture 7 shows the less secure option to do this if the DMZ also hosts other functions. In that case architecture 8, with a separate DMZ for the Internet connection, is preferred because the remote connectivity is kept separate from any function in DMZ 1. This allows for more restricted access filters on the path toward the PAN and reduces the risk for the functions in DMZ 1. The potential risk with architecture 7 is that the firewall that connects to the Internet is breached and the attacker gets access to the functions in the DMZ, potentially breaching these and as a next step gaining access to the PAN. We should never directly expose the firewall with the OD perimeter to the Internet. Also here two different firewalls improve security, preferably only allowing end to end protected and encrypted connections.
The final 4 DCS architectures I discuss briefly serve more as examples of an alternative approach.
Architecture 9 is very similar to architecture 6 without the DMZ. The MES layer (Manufacturing Execution Systems) hosts the operation management systems. This type of architecture is often seen in organizations where the operation management systems are managed by the IT department.
This type of architecture also occurs when there are different disciplines “owning” the system responsibility, for example a team for the mechanical maintenance “owning” a vibration monitoring function, another team “owning” a power management function and the motor control center functions, and maybe a 3rd group “owning” the laboratory management functions.
In this case there are multiple systems, each with its own connection to the corporate network. In general splitting up the responsibility for security often creates inconsistencies in architecture and security operations and as such a higher residual risk for the investment made. Sometimes putting all your eggs into one basket is better, when we focus our attention on this single basket.
Architecture 10 is the same as architecture 9 but now with a DMZ allowing additional inspections. Architecture 11 is an architecture frequently used in larger sites with multiple plants connected to a common level 3 network. There is a single access domain that controls all data exchange with the office domain, hosts various management functions such as back-up, management of the antivirus systems, security patch management, and remote access.
There are some common operations management functions at L3 and each plant has its own production management system connected through a firewall.
Architecture 12 is similar but the firewall is replaced by a router filtering access to the L2/L1 equipment. In smaller systems this can offer similar security, but as discussed a firewall offers more functions to control access.
An important pitfall in many of these architectures is the communication using classic OPC. Due to the publications of Eric Byres almost 15 years ago, and the development of the OPC firewall, there is a focus on the classic OPC protocol not being firewall friendly. This is because of the wide dynamic port range required by RPC DCOM for the return traffic. Often the more important security issue is the read / write authorizations of the OPC server.
Several OPC server solutions enable read / write authorizations server wide, which also exposes control tags and their parameters that are not needed for the automation function being implemented.
For example, a vibration management system sometimes has write access to the control function to signal a shutdown of the compressor because of excessive vibrations, and that same write access also permits this system to access other process tags and their parameters. Filtering the traffic between the two systems in that case doesn’t provide much extra security if we have no control over the content of the traffic.
Implementation of OPC security gateway functionality offers more protection in those cases, by limiting which process tags can be browsed by the OPC client, which process tag / parameter can be written to and which can be read from.
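The idea behind such a gateway can be sketched as a per-client allow-list at tag level: anything not explicitly listed cannot be browsed, read, or written. The Python model below is an illustration of the principle only, not the configuration format or API of any actual OPC gateway product, and the tag names are invented.

```python
from dataclasses import dataclass, field


@dataclass
class TagPolicy:
    """Per-client allow-lists, in the spirit of an OPC security gateway (hypothetical model)."""
    readable: set = field(default_factory=set)
    writable: set = field(default_factory=set)

    def can_browse(self, tag):
        # Only tags the client may read or write are visible at all.
        return tag in self.readable or tag in self.writable

    def can_read(self, tag):
        return tag in self.readable

    def can_write(self, tag):
        return tag in self.writable


# Made-up policy for a vibration monitoring system: it may read the compressor
# speed and write one trip request, and nothing else.
vibration_monitor = TagPolicy(
    readable={"K101.SPEED"},
    writable={"K101.TRIP_REQUEST"},
)

allowed = vibration_monitor.can_write("K101.TRIP_REQUEST")
blocked = vibration_monitor.can_write("FIC201.SP")
hidden = vibration_monitor.can_browse("FIC201.SP")

print(allowed, blocked, hidden)  # True False False
```

Compared with server-wide read / write authorization, the default here is deny: a new tag is invisible to the client until somebody deliberately adds it to the policy.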
Other improvements are related to OPC UA where solutions exist that support reverse connect, so the firewall doesn’t require inbound traffic if communication is required that crosses the perimeter.
So far a high-level overview of some common ICS architectures when a DCS is used for the BPCS (Basic Process Control System) function. The variation in SCADA architectures is smaller; I discuss the 4 most common ones.
The first SCADAs were developed to centralize the control of multiple remote substations. In those days we had local control (generally with PLCs and RTUs; IEDs came later) in the substation and needed a central supervisory function to oversee the substations.
A SCADA architecture generally has a firewall connecting it with the corporate network and a firewall connecting it with the remote substations. This can be a few substations, but also hundreds of locations in the case of pipelines where block valves are controlled to segment the pipe in case of damaged pipelines. Architecture 13 is an example of such an architecture. The key characteristic here is that the substations have no firewalls. Architecture 13 might be applied if the WAN is a private network dedicated to connecting the substations / remote locations.
Substation architecture varies very much depending on the industry. A substation in the power grid has a different architecture than a compressor substation for a pipeline, or a block valve station segmenting the pipeline, or a clustering of offshore platforms.
Architecture 14 is an architecture where we have a fall back control center. If one control center fails, the fall back center can take over from the primary control center. Primary is perhaps a wrong word here, because some asset owners periodically swap between the two centers to verify their operation.
The two control centers require synchronization; this is done by the direct connection between the two. It depends very much on the distance between the two centers how synchronization takes place. Fall back control centers exist on different continents, many thousands of miles apart.
Not shown in the diagram but often implemented is a redundant WAN. If the primary connection fails the secondary connection takes over. Sometimes a 4G network is used, sometimes an alternative WAN provider.
Also here diversity is an important control. Implementing a fall back WAN using a different technology can offer more protection and thus a lower risk.
Architecture 15 is similar to architecture 13, with the difference of the firewall at the substations, which is needed when the WAN connections are not private connections. The complexity here is the substation firewalls in combination with the redundancy of the WAN connections. Architecture 16 adds the fall back control center.
Blogs have to end, though there is much to explain. In some situations we need to connect both the BPCS function and the SIS function to a remote location. This creates new opportunities for an attacker if not correctly implemented.
A good OT cyber security engineer needs to think bad, consider which options an attacker has, what the attack objective can be. To understand this it is important to understand the functions, because it is these functions and their configuration that can be misused in the plans of the threat actor. Think in functions, consider impact on data and impact on the automation actions. Network security is an important element, but by just looking at network security we will not secure an industrial control system.
There is no relationship between my opinions and publications in this blog and the views of my employer in whatever capacity. This blog is written based on my personal opinion and knowledge built up over 42 years of work in this industry. Approximately half of that time was spent engineering these automation systems, and half implementing their networks and securing them.