Tag: OT cyber security

Are Power Transformers hackable?

There is a revived attention for supply chain attacks after the seize of a Chinese transformer in the port of Houston. While on its way to a US power transmission company – Western Area Power Administration (WAPA) – the 226 ton transformer was rerouted to Sandia National Laboratories in Albuquerque New Mexico for inspection on possible malicious implants.

The sudden inspection happens more or less at the same time that the US government issued a presidential directive aiming for white listing vendors allowed to supply solutions for the US power grid, and excluding others to do so. So my curiosity is raised and additionally triggered by the Wall Street Journal claim that transformers do not contain software-based control systems and are passive devices. Is this really true in 2020? So the question is, are power transformers “hackable” or must we see the inspection exclusively as a step in increasing trade restrictions.


Before looking into potential cyber security hazards related to the transformer, let’s first look at some history of supply chain “attacks” relevant for industrial control systems (ICS). I focus here on supply chain attacks using hardware products because in the area of software products, Trojan horses are quite common.

Many supply chain attacks in the industry are based on having purchased counterfeit products. Frequently resulting in dangerous situations, but generally driven by economic motives and not so much by a malicious intent to damage the production installation. Some examples of counterfeits are:

  • Transmitters – We have seen counterfeit transmitters that didn’t qualify for the intrinsic safety transmitter zone qualifications specified by the genuine product sheet. And as such creating a dangerous potential for explosions in a plant when these products would be installed in zone 1 and zone 0 areas with a potential for the presence of explosive gases.
  • Valves – We have seen counterfeit valves, where mechanical specifications didn’t meet the spec sheet of the genuine product. This might lead to the rupture of the valve resulting in a loss of containment with potential dangerous consequences.
  • Network equipment – On the electronic front we have seen counterfeit Cisco network equipment that could be used to create a potential backdoor in the network.

However it seems that the “attack” here is more an exploit of the asset owner’s vulnerability for low prices (even if they sound ridiculously low), in combination with highly motivated companies trying to earn some fast money, than an intentional and targeted attack on the asset integrity of an installation.

That companies selling these products are often found in Asia, with China as the absolute leader according to reports, is probably caused by a different view / attitude toward patents, standards and intellectual property in a fast growing region and additionally China’s economic size. Not necessarily a plot against an arrogant Western world enemy.

The most spectacular example of such an incident is where counterfeit Cisco equipment ended up in the military networks of the US. But as far as I know, it was also in this case never shown that the equipment’s functionality was maliciously altered. Main problem was a higher failure rate caused by low manufacturing standards, potentially impacting the networks reliability. Never the less also here a security incident because of the potential for malicious functionality.

Also proven malicious incidents have occurred, for instance in China, where computer equipment was sold with already pre-installed malware. Malware not detectable by antivirus engines. So the option to attack industrial control systems through the supply chain certainly exist, but as far as I am aware never succeeded.

But there is always the potential that functionality is maliciously altered, so we need to see above incidents as security breaches and consider them to be a serious cyber security hazard we need to address. Additionally power transformers are quite different from the hardware discussed above, so a supply chain attack on US power grid using power transformers is a different analysis. If it would happen and was detected it would mean end of business for the supplier, so stakes are high and chances that it happens are low. Let’s look now at the case of the power transformer.


For many people, a transformer might not look like an intelligent device. But in today’s world everything in the OT world becomes smart (not excluding the possibility we ourselves might be the exception), so we also have smart power transformers today. Partially surfing on the waves of the smart hype, but also adding new functions that can be targeted.

Of course I have no information on the specifications of the WAPA transformer, but it is a new transformer so probably making use of today’s technology. Since seizing a transformer is not a small thing, transformers used in the power transmission world are designed to carry 345 kilo volts or more and can weigh as much as 410 tons (820.000 lb in the non-metric world), there must be a good reason to do so.

One of the reasons is of course that it is very critical and expensive equipment (can be $5.000.000+) and is built specifically for the asset owner. If it would fail and be damaged, replacement would take a long time. So this equipment must not only be secure, but also be very reliable. So worth an inspection from different viewpoints.

What would be the possibilities for an attacker to use such a huge lump of metal for executing a devious attack on a power grid. Is it really possible, are there scenarios to do so?

Since there are many different types of power transformers, I need to make a choice and decided to focus on what are called conservator transformers, these transformers have some special features and require some active control to operate. Looking at OT security from a risk perspective, I am more interested in if a feasible attack scenario exists – are there exposed vulnerabilities to attack, what would be the threat action – then in a specific technical vulnerability in the equipment or software that make it happen today. To get a picture of what a modern power transformer looks like, the following demo you can play with (demo).

Look for instance at the Settings tab and select the tap position table from where we can control or at minimum monitor the onload tap changes (OLTC). Tap changers select variable turn ratios to either increase or decrease the turn ratio to regulate the output voltage for variations on the input side. Another interesting selection you find when selecting the Home icon, leading you directly to the Buchholz safety relay. Also look at the interface protocol Goose, I would say it all looks very smart.

I hope everyone realizes from this little web demo, that what is frequently called a big lump of dumb metal might actually be very smart and containing a lot more than a view sensors to measure temperature and level as the Wall Street Journal suggests. Like I said I don’t know WAPA’s specification, so maybe they really ordered a big lump of dumb metal but typically when buying new equipment companies look ahead and adopt the new technologies available.

Let’s look in a bit more detail to the components of the conservator power transformer, being a safety function the Buchholz relay is always a good point to start if we want to break something. The relay is trying to prevent something bad from happening, what is this and how does this relay counter this, can we interfere?

A power transformer is filled with insulating oil to insulate and serve as a coolant between the windings. The Buchholz relay connects between the overhead conservator (a tank with insulating oil) and the main oil tank of the transformer body. If a transformer fails, or is overloaded this causes extra heat, heated insulating oil forms gas and the trapped gas presses the insulating oil level further down (driving it into the conservator tank passing the Buchholz relay function) so reducing the insulation between the windings. The lower level could cause an arc, speeding up the process and causing more gas pressure, pressing the insulating oil even more away and exposing the windings.

It is the Buchholz relay’s task to detect this and operate a circuit breaker to isolate the transformer before the fault causes additional damage. If the relay wouldn’t do its task quick enough the transformer windings might be damaged causing a long outage for repair. In principal Buchholz relays, as I know them, are mechanical devices working with float switches to initiate an alarm and the action. So I assume there is not much to tamper with from a cyber point of view.

How about the tap changer? This looks more promising, specifically an on load tap changer (OLTC). There are various interesting scenarios here, can we make step changes that impact the grid? When two or more power transformers work in parallel, can we create out-of-step situations between the different phases by causing differences in operation time?

An essential requirement for all methods of tap changing under load is that circuit continuity must be maintained throughout the tap stepping operation. So we need a make-before-break principle of operation, which causes at least momentary, that a connection is made simultaneously to two adjacent taps on the transformer. This results in a circulating current between these two taps. To limit this current, an impedance in the form of either resistance or inductive reactance is required. If not limited the circulating current would be a short-circuit between taps. Thus time also plays a role. Voltage change between taps is a design characteristic of the transformer, this is normally small approximately 1.25% of the nominal voltage. So if we want to do something bad, we need to make a bigger step than expected. The range seems to be somewhere between +2% and -16% in 18 steps, so quite a jump is possible if we can increase the step size.

To make it a bit more complex, a transformer can be designed with two tap changers one for in phase changes and one for out of phase changes, this also might provide us with some options to cause trouble.

So plenty of ingredients seem to be available, we need to do things in a certain sequence, we need to do it within a certain time, and we need to limit the change to prevent voltage disturbances. Step changers use a motor drive, and motor drives are controlled by motor drive units, so it looks like we have an OT function. Again a bit larger attack surface than a few sensors and a lump of metal would provide us. And then of course we saw Goose in the demo, a protocol with issues, and we have the IEDs that control all this and provide protection, a wider scope to investigate and secure but not part of the power transformer.

Is this all going to happen? I don’t think so, the Chinese company making the transformers is a business, and a very big one. If they would be caught tampering with the power transformers than that is bad for business. Can they intentionally leave some vulnerabilities in the system, theoretically yes but since multiple parties (the delivery contains also non-Chinese parts) are involved it is not likely to happen. But I have seen enough food for a more detailed analysis and inspection to find it very acceptable that also power transformers are assessed for their OT security posture when used in critical infrastructure.

So on the question are power transformers hackable, my vote would be yes. On the question will Sandia find any malicious tampering, my vote would be no. Good to run an inspection but bad to create so much fuss around it.


There is no relationship between my opinions and publications in this blog and the views of my employer in whatever capacity.


Sinclair Koelemij

The Purdue Reference Model outdated or up-to-date?


Is the Purdue Reference Model (PRM) outmoded? If I listen to the Industrial Internet of Things (IIoT) community, I would almost believe so. IIoT requires connectivity to the raw data of the field devices and our present architectures don’t offer this in an easy and secure way. So let’s look in a bit more detail to the PRM and the IIoT requirements, to see if they really conflict or can coexist side by side?


I start the discussion with the Purdue Reference Model, developed in the late 80’s. Developed in a time that big data was anything above 256 Mb and the Internet was still in its early years, a network primarily used by and for university students. PRM’s main objective was creating a hierarchical model for manufacturing automation, what was called computer integrated manufacturing (CIM) in those days. If we see the model as it was initially published we note a few things:

  • The model has nothing to do with a network, or security. There are no firewalls, there is no DMZ.
  • There is an operator console at level 1. Maybe a surprise for people who started to work in automation in the last 20 years, but in those days quite normal.
  • Level 4 has been split in a level 4A and a level 4B, to segment functions that directly interact with the automation layers and functions requiring no direct interface.
  • There is no level 0, the Process layer is what it is a bunch of pipes, pumps, vessels and some clever stuff.

Almost a decade later we had ANSI / ISA-95 that took the work of the Purdue university and extended it with some new ideas and created an international standard that became very influential for ICS design. It was the ANSI / ISA-95 standard that named the Process level, level 0. But in those days level 0 was still the physical production equipment. The ISA-95 standard says following on level 2, level 1, and level 0 : ” Level 0 indicates the process, usually the manufacturing or production process. Level 1 indicates manual sensing, sensors, and actuators used to monitor and manipulate the process. Level 2 indicates the control activities, either manual or automated, that keeps the process stable or under control.” So level 0 is the physical production equipment, and level 1 includes the sensors and actuators. It was many years later that people started using level 0 for the field equipment and their networks, all new developments with the rise of Foundation Fieldbus, Profibus, the HART protocol, but never part of the standard. It was probably the ISA 99 standard that introduced a new layer between level 4 and level 3, the DMZ. It was the vendor community that started giving it a level name, level 3.5. But level 3.5 has never been a functional level, it was an interface between two trust zones for adding security. Though often the way it was implemented contributed little to security, but it was a nice feeling to say we have a demilitarized zone between the corporate network and the control network. So far the history of the Purdue Reference Model and ISA-95 and their contribution to the levels. Now lets have a look at how a typical industrial control system (ICS) looks without immediately using names for levels.


From a functional viewpoint we can split a traditional ICS architecture in 5 main parts:

  • Production management system, which task it is to control the automation functions. Typical domain of the DCS and SCADA systems. But also compressor control (CCS), power management systems (PMS) and the motor control center (MCC) reside here. basically everything that takes care of the direct control of the production process. And of course all these systems interact with each other using a wide range of protocols, many of them with security short comings;
  • Operations management system, which task it is to optimize the production process, but also to monitor and diagnose the process equipment (for example machine monitoring functions such as vibration monitoring), and various management functions such as asset management, accounting and reconciliation functions to monitor the mass balance of process units, and emission monitoring systems to meet environmental regulations;
  • Information systems is the third category, these systems collect raw data and transform it into information to be used for historical trends or to feed other systems with information. The objective here is to transform the raw data into information and ultimately information into knowledge. The data samples stored are normally one minute snapshots, hourly averages, shift averages, daily averages, etc. An other example of an information system is custody metering system (CMS) for measuring product transfer for custody purposes.
  • The last domain of the ICS is the Application domain, for example for advanced process control (APC) applications, traditionally performing advisory functions. But overtime the complexity of running an production processes grew, response to changes in the process became more important so advisory functions were taking over the task of the process operator immediately changing the setpoints using setpoint control or controlling the valve with direct digital control functions. There are plants today that if APC would fail, the production is no longer profitable or can’t reach the required product quality levels.
  • Finally there is the Business domain, generally not part of the ICS. in the business domain we mange for example the feed stock and products to produce. It is the decision domain.
Functional model of a chemical plant or refinery

The production management systems are shown in the light blue color, the operations management, information management, and application systems in the light purple color. The model seems to comply with the models of the ANSI / ISA-95 standard. However this model is far from perfect, because an operations management function as vibration monitoring, or corrosion monitoring also have sensors that connect to the physical system. And an asset management system requires information from the field equipment. If we consider a metering system, also part of the information function, it is actually a combination of a physical system and sensors.


And then of course we have all the issues around vendor support obligations. Asset owners expect systems with a very high level of reliability, vendors have to offer this level of reliability in a challenging real-time environment by testing and validating their systems and changes rigorously. Than there is nothing worse than suddenly having to integrate with the unknown, a 3rd party solution. As a result we see systems that have a level 2 function connected to level 3, in principle exposing a critical system more and making the functionality rely on more components so reducing reliability.


So many conflicts in the model, still it has been used for over 20 years, and it still helps us to guide us in securing systems today. If we take the same model and add some basic access security controls we get the following:

At high level this represents the architecture in use today, in smaller systems the firewall between level 2 and 3 might miss, some asset owners might prefer a DMZ with two firewalls (from different vendors to enforce diversity), the level 4A hosts those functions that interface with the control network and generally different policies are enforced for these systems than required at level 4B. For example asset owners might enforce the use of desktop clients, might limit Internet access for these clients, might restrict email to internal email only, etc. All to reduce the chance on compromising the ICS security.


Despite that the reference model is not supporting all of the new functions we have in plants today – for example I didn’t discuss wireless, virtualization, and system integration topics at level 2 – the reference model still helps structure the ICS. Is it impossible to add the IIoT function to this model? Let’s have a look at what IIoT requires?


In an IIoT environment we have a three level architecture:

  • The edge tier – The edge tier is for our discussion the most important, this is where the interface is created with the sensors and actuators. Sometimes existing sensors / actuators, but also new sensors and actuators.
  • The platform tier – The platform tier processes and transforms the data and controls the direction of the data flow. So from that point also of importance for the security of the ICS.
  • The enterprise tier – This tier hosts the application and business logic that needs to provide the value. It is external to the ICS, so as long as we have a secure end to end connection between the platform tier and the enterprise tier ICS security needs to trust that the systems and data in this tier are properly secured.

The question is can we reconcile these two architectures? The answer seems to be what is called the gateway-mediated edge. This is a device that aggregates and controls the data flows from all the sensors and actuators providing a single point of connection to the enterprise tier. This type of device also performs the necessary translation for communicating with the different protocols used in ICS. The benefits of such a gateway device is the scalability of the solution together with controlling the entry and exit of the data flows from a single controllable point. In this model some of the data processing is kept local, close to the edge, allowing controlled interfaces with different data sources such as Foundation Field Bus, Profibus, and HART enabled field equipment. To implement such an interface device doesn’t require to change the Purdue reference model in my opinion, it can make use of the same functional architecture. Additionally new developments such as the APL (Advanced Physical Layer) technology, the technology that will remove the I/O layer as we know it today, will fully align with this concept.


So I am far from convinced that we need a new model, also today the model doesn’t reflect all the data flows required by today’s functions. The very clear boundaries we used to have, have disappeared with the introduction of new technologies such as virtualization and wireless sensor networks. Centralized process computer systems of the 70s, have disappeared becoming decentralized over the last 30 years and are now moving back into centralized data centers where hardware and software have lose ties. Today these data centers are primarily onsite, but over time confidence will grow and larger chunks of the system will move to the cloud.

Despite all these technology changes, the hierarchical structure of the functions hasn’t change that much. It is the physical location of the functions that changes, undoubtedly demanding a different approach to cyber security. It are the cyber security standards of today that are dated on the day they are released. The PRM was never about security or networking, it is a hierarchical functional model for the relationship between functions which is as relevant today as it was 30 years ago.

We have a world of functional layers, and between these layers data is exchanged, so we have boundaries to protect. As long as we have bi-directional control over the data flows between the functional layers and keep control over where the data resides, we protect the overall function. If there is something we need to change it are the security standards, not the Purdue reference model. But we need to be careful with the security of these gateways as the recent OSI PI security risk has learned us.


There is no relationship between my opinions and publications in this blog and the views of my employer in whatever capacity.


Sinclair Koelemij

TRISIS revisited


For this blog I like to go back in time, to 2017 the year of the TRISIS attack against a chemical plant in Saudi Arabia. I don’t want to discuss the method of the attack (this has been done excellently in several reports) but want to focus on the potential consequences of the attack because I noticed that the actual threat is underestimated by many.

The subject of this blog was triggered after reading Joe Weiss’ blog on the US presidential executive order and noticing that some claims were made in the blog that are incorrect in my opinion. After checking what the Dragos report on TRISIS wrote on the subject, and noticing a similar underestimation of the dangers I decided to write this blog. Let’s start with summing up some statements made in Joe’s blog and the Dragos report that I like to challenge.


I start with quoting the part of Joe’s blog that starts with the sentence: “However, there has been little written about the DCS that would also have to be compromised. Compromising the process sensors feeding the SIS and BPCS could have led to the same goal without the complexity of compromising both the BPCS and SIS controller logic and the operator displays.” The color high lights are mine to emphasize the part I like to discuss.


The sentence seems to suggest (“also have to be compromised”) that the attacker would ultimately also have to attack the BPCS to be effective in an attempt to cause physical damage to the plant. For just tripping the plant by activating a shutdown action the attacker would not need to invest in the complexity of the TRISIS attack. Once gaining access to the control system at the level the attackers did, tens of easier to realize attack scenarios were available if only a shutdown was intended. The assumption that the attacker needs the BPCS and SIS together to cause physical damage is not correct, the SIS can cause physical damage to the plant all by it self. I will explain this later with a for safety engineers well known example of an emergency shutdown of a compressor.


Next I like to quote some conclusions in the (excellent) Dragos report on TRISIS. It starts at page 18 with:

Could This Attack Lead to Loss of Life?

Yes. BUT, not easily nor likely directly. Just because a safety system’s security is compromised does not mean it’s safety function is. A system can still fail-safe, and it has performed its function. However, TRISIS has the capability to change the logic on the final control element and thus could reasonably be leveraged to change set points that would be required for keeping the process in a safe condition. TRISIS would likely not directly lead to an unsafe condition but through its modifying of a system could deny the intended safety functionality when it is needed. Dragos has no intelligence to support any such event occurred in the victim environment to compromise safety when it was needed.


The conclusion that the attack could not likely lead to the loss of life, is in my opinion not a correct conclusion and shows the same underestimation as made by Joe. As far as I am aware the part of the modified logic has never been published (hopefully someone did analyze) so the scenario I am going to sketch is just guessing a potential objective. It is what is called a cyber security hazard, it could have occurred under the right conditions for many production systems including the one in Saudi Arabia. So let’s start with explaining how shutdown mechanisms in combination with safety instrumented systems (SIS) work, and why some of the cyber security hazards related to SIS can actually lead to significant physical damage and potential loss of life.


A SIS has different functions like I explained in my earlier blogs. A little bit simplified summary, there is a preventative protection layer the Emergency Shutdown System (ESD) and there is a mitigative layer, e.g. the Fire & Gas system detecting fire or gas release and activating actions to extinguish fires and to alert for toxic gases. For our discussion I focus on the ESD function, but interesting scenarios also exist for F&G.

The purpose of the ESD system is to monitor process safety parameters and initiate a shutdown of the process system and/or the utilities if these parameters deviate from normal conditions. A shutdown function is a sequence of actions, opening valves, closing valves, stopping pumps and compressors, routing gases to the flare, etc. These actions need to be done in a certain sequence and within a certain time window, if someone has access to this logic and modifies the logic this can have very serious consequences. I almost would say, it always has very serious consequences because the plant contains a huge amount of energy (pressure, temperatures, rotational speed, product flow) that needs to be brought to a safe (de-energized) state in a very short amount of time, releasing incredible powers. If an attacker is capable of tampering with this shutdown process serious accidents will occur.


Let’s discuss this scenario in more detail in the context of a centrifugal compressor, most plants have multiple so always an interesting target for the “OT certified” threat actor. Centrifugal compressors increase the kinetic energy of for example a gas into a pressure so a gas flow through pipelines is created either to transfer a product through the various stages of the production process or perhaps to create energy for opening / closing pneumatic driven valves.

Transient operations, for example the start-up and shutdown of process equipment, always have dangers that need to be addressed. An emergency shutdown because there occurred in the plant a condition that demanded the SIS to transfer the plant to a safe state, is such a transient operation. But in this case unplanned and in principle fully automated, no process operator to guard the process and correct where needed. The human factor is not considered a very reliable factor in functional safety and is often just too slow. SIS on the other hand is reliable, the redundancy and the continuous diagnostic checks all warrant a very low failure on demand probability for SIL 2 and SIL 3 installations. They are designed to perform when needed, no excuses allowed. But this is only so if the program logic is not tampered with, the sequence of actions must be performed as designed and is systematically tested after each change.

Compressors are controlled by compressor control systems (CCS), one of the many sub-systems in an ICS. The main task of a compressor control system is anti surge control. The surge phenomenon in a centrifugal compressor is a complete breakdown and reversal of the flow through the compressor. A surge causes physical damage to the compressor and pipelines because of the huge forces released if a surge occurs. Depending on the gas this can also lead to explosions and loss of containment.

An anti surge controller of the CCS continuously calculates the surge limit (which is dynamic) and controls the compressor operation to stay away from this point of danger. This all works fine during normal operation, however when an emergency shutdown occurs the basic anti surge control provided by the CCS has shown to be insufficient to prevent a surge. In order to improve the response and prevent a surge, the process engineer has two design options called a hot bypass method or a cold bypass method recycling the gas to allow for a more gradual shutdown. The hot bypass is mostly used because of its closeness to the compressor which results into a more direct response. Such a hot bypass method requires to open some valves to feed the gas back to the compressor, this action is implemented as a task of the ESD function. The quantity of gas that can be recycled has a limit, so it is not just opening the bypass valve to 100% but opening it with the right amount. Errors in this process or a too slow reaction would easily result into a surge, damaging the compressor, potentially rupturing connected pipes, causing loss of containment, perhaps resulting in fire and explosions, and potentially resulting in casualties and a long production stop with high repair cost.


All of this is under control of the logic solver application part of the SIS. If the TRISIS attacker’s would have succeeded into loading altered application logic, they would have been capable of causing physical damage to the production installation, damage that could have caused loss of life.

So my conclusion differs a bit, an attack on a SIS can lead to physical damage when the logic is altered, which can result in loss of life. A few changes in the logic and the initiation of the shutdown action would have been enough to accomplish this.


This is just one example of a cyber security hazard in a plant, multiple examples exist showing that the SIS by itself can cause serious incidents. But this blog is not supposed to be a training for certified OT cyber terrorist so I keep it with this for safety engineers well known example.

Proper cyber security requires proper cyber security hazard identification and hazard risk analysis. This has too little focus and is sometimes executed at a level of detail insufficient to identify the real risks in a plant.

I don’t want to criticize the work of others, but do want to emphasize that OT security is a specialism not a variation on IT security. ISA published a book “Security PHA Review” written by Edwar Marsazal and Jim MgClone which addresses the subject of secure safety systems in a for me far too simplified manner by basically focusing on an analysis / review of the process safety hazop sheet to identify cyber related hazards.

The process safety hazop doesn’t contain the level of detail required for a proper analysis, neither does the process safety hazop process assume malicious intent. One valve may fail, but multiple valves at the same time in a specific sequence is very unlikely and not considered. While these options are fully open to the threat actor with a plan.

Proper risk analysis starts with identifying the systems and sub-systems of an ICS, than identifying cyber security hazards in these systems, identifying which functional deviations can result from these hazards, and than translate how these functional deviations can impact the production system. That is much more than a review of a process hazop sheet on “hackable” safeguards and causes. That type of security risk analysis requires OT security specialists with a detailed knowledge on how these systems work, what their functional tasks are, and the imagination and dose of badness to manipulate these systems in a way that is beneficial for an attacker .

Sinclair Koelemij