By Lesley Carhart, Daniel Michaud-Soucy, and Reid Wightman

Industrial control system security can be a daunting topic to learn about. We recently took to Twitter to ask the information security and IT communities what their burning questions about the topic were. For this blog, we selected and summarized 10 intriguing and popular questions. We’ve provided multiple perspectives on each one, based on our own experiences and our daily work at Dragos.

Today’s questions were answered by Lesley and Daniel (Principal Threat Analysts), and Reid (Senior Vulnerability Researcher). If you have additional questions about ICS security we didn’t answer in today’s blog, feel free to tweet us at @DragosInc!

1. What are the key differences between IT and OT security operations skill sets? (Thanks @netbroom)

Reid: Primarily, protocols and objectives. On the protocol side, most control systems use old protocols which lack basic security features. A lot of those issues become moot, however, when we consider objectives. For IT systems, the typical attacker objective is data. For OT systems, data is less interesting; physical impact is the goal. So learning how (and especially IF) physical impacts are possible via a control system compromise is a key question. Answering that question often requires putting on an engineer’s hat.

Lesley: Two other noteworthy distinctions are the prevalence of legacy equipment and a lack of host-based security tools. With regard to the former, even familiar Windows systems must frequently remain in operation far longer in ICS environments than their IT counterparts. Maintenance windows may be infrequent. Vendor software may only support one specific patch level, or the device’s operating system may be embedded. This means it’s common to come across hosts which are unpatched and/or out of support and which absolutely cannot be upgraded in the near term. Detection and external mitigations become key.

Secondly, in IT security, we typically have the luxury of modern host security tools like Endpoint Detection and Response (EDR). In most ICS environments, passive monitoring policies, legacy systems, and extremely lightweight embedded devices make this difficult at best. Security team members have to be able to function without many of the ubiquitous time-saving tools they’ve come to rely on over the last decade.

Daniel: Working within the bounds of operations and the mission of the business. The OT folks are the ultimate rulers of what happens in their environment and will understandably prioritize process uptime and reliability, sometimes sacrificing certain IT-style functions in doing so. Often, we’ve had success in demonstrating the operational upside of a security tool or process first, then the security benefits: for example, identifying PLC misconfigurations rather than actually finding bad guys. Overall, understanding that the process is paramount and how you can help operators do their jobs is a big one.

2. Why have I been told not to scan ICS environments during assessments, are they really that fragile, and is there a way to do routine assessments safely? (Thanks @jurph, @calderpwn, @Jontler, @tab2space, @Jay3141592653)

Reid: Modern control systems are pretty resilient against basic scanning. You still shouldn’t scan a system without knowing what is on the network, though: old PLCs, old computers, and ancient software are around, sometimes. If you want to scan online systems, two methods work well: 1) use authenticated scanning for PC systems, and 2) look to systems with redundancy or failover, and scan the backup system. If a system is both very old and has no redundancy/hot spare, it probably isn’t smart to scan it — but that itself is an assessment finding: it represents a single point of failure for the network that is both required for operations and fragile with no recovery path.
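Reid's rules of thumb can be written down as a small planning helper. This is only a sketch in Python: the `Asset` fields, the `scan_plan` function, and the order in which the rules are checked are assumptions made for illustration, not a Dragos tool or procedure, and the output is a starting point for a conversation with operations rather than permission to scan.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Asset:
    name: str
    is_pc: bool                    # general-purpose host rather than an embedded device
    is_legacy: bool                # very old hardware or software
    backup: Optional[str] = None   # name of a redundant or failover unit, if one exists


def scan_plan(asset: Asset) -> str:
    """Suggest an action for one asset (hypothetical rule ordering)."""
    # Redundancy first: scanning the standby unit leaves the live one untouched.
    if asset.backup is not None:
        return f"scan backup unit {asset.backup}"
    # Old and no hot spare: too fragile to scan, and that is itself a finding.
    if asset.is_legacy:
        return "do not scan; report as fragile single point of failure"
    # PC systems without a spare: prefer authenticated scanning.
    if asset.is_pc:
        return "authenticated scan"
    # Anything else: decide together with operations.
    return "review with operations before scanning"


if __name__ == "__main__":
    inventory = [
        Asset("hmi-01", is_pc=True, is_legacy=False),
        Asset("plc-a", is_pc=False, is_legacy=False, backup="plc-b"),
        Asset("old-plc", is_pc=False, is_legacy=True),
    ]
    for asset in inventory:
        print(asset.name, "->", scan_plan(asset))
```

The point of the legacy branch is that "do not scan" still produces output for the assessment report instead of silently dropping the device from scope.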

Daniel: In addition, keep in mind the goal of your scan. Is it to identify all ports and services? Is it to look for misconfigured applications? Tune your scan to the specific goal, and look for specific things rather than casting a wide net. Understand what the scanner does on the network: what type of traffic it puts out, and what potential load it can place on a device with a weak network stack. Wireshark is very useful for this. If your goal is asset identification, it may be beneficial to use an ARP sweep approach, which puts very little strain on the network infrastructure. Also, keep clear lines of communication with operations: they should know what is going on, when it starts, when it should end, from what IP address, potential risks, how to reach you, etc.

Lesley: Open communication is key. There will be lots of scenarios where you are absolutely forbidden from scanning an operational ICS network. That doesn’t mean you can’t actively assess the systems at all. Discuss options like scanning engineering lab environments, or conducting scans while systems are offline for unrelated maintenance. Having clear and open dialogue with system operators prevents nasty surprises and involves them in the security process, instead of security being something that happens unpleasantly to them.

3. Should my security team personnel be learning and performing both IT and OT security? (Thanks @MrHenryBosch)

Lesley: In an ideal world, a security team with substantial ICS assets would have dedicated specialists in industrial security. In reality, this is a fairly rare skill set, so it’s more realistic to keep ICS security subject matter experts on retainer while cross-training IT security personnel in ICS systems operation as much as possible. It’s equally important to build good relationships between your ICS system engineers / operators and the security team. Understand that their objectives and concerns are not necessarily yours and be as diplomatic as possible in discussing security measures.

Daniel: I had a good discussion just last night about this with a friend who is responsible for the support of transmission operations at a large utility company. His team supports the network, the server hardware, the operating systems, as well as the EMS application itself. There are tasks that don’t require strong OT knowledge, like configuring network switches and updating Windows servers. However, the closer you get to the EMS application and the process itself, the more important it is to understand how the application interfaces with the devices, the process, and the real-world physical components.

4. What ICS threats and specific verticals keep you up at night, and why? (Thanks @fladamd)

Lesley: There are a lot to choose from, but I pick transportation logistics. Our modern society is dependent on food, fuel, medicine, and other critical resources transparently getting across the planet just-in-time and in the correct quantities. Most metropolitan areas are not close to self-sufficient, and air, rail, truck, and sea transportation have become highly automated. The Maersk NotPetya incident was a great example of the cascading chain reaction that a failure in just one logistic system can cause. There are plenty of non-cyber examples of how dire the impacts of fuel and food shortages can become.

Daniel: This is somewhat of a common answer, but we take the water industry for granted. Water treatment plants are some of the most out-of-date and under-funded systems we’ve seen from a cybersecurity perspective. Thankfully, the pride in delivering good, clean drinking water keeps the systems running as well as they are. I hope to be able to help secure more and more of these types of industrial processes by providing low-cost/effort solutions for asset owners to take on as a “better than nothing” step.

I also worry about any threat that, if acted upon, would result in a huge knee-jerk reaction from a regulation and compliance perspective. This would effectively strangle the industry and potentially, in the long run, cause more headaches and pain than the initial process impacts.

5. I want to learn more about ICS security. How can I do this, particularly in a way that isn’t cost-prohibitive? (Thanks @r4v5, @julesjblanco, @neurovagrant, @FrankMcG, @D4rthDelirium)

Daniel: Let’s keep it free: YouTube! Both S4 and CS3STHLM have large playlists of recorded talks from previous iterations of their conferences. There are a ton of topics covered from hardware hacking to geopolitics to overviews of recent attacks against ICS infrastructure. Grab some popcorn, pick a particular area of interest and enjoy the show.

If you work for an asset owner on the IT side of the house and you’re interested in ICS, take a box of donuts to the control room. Interface with the operators, and ask them about their daily routine, what issues they face, and what causes friction for them. You’ll certainly learn a lot about the physical process, but you’ll also start to build a relationship with them, which is invaluable in a lot of ways.

Another way is to download PLC simulation software and play around with it. Read up on ladder logic, design it in the software, see how it works. Play with the virtual I/O. Try to break things. 🙂
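If you want a feel for ladder logic before installing anything, one classic rung can be mimicked in a few lines of Python. This is a rough sketch, not a PLC simulator: the function names are made up for illustration, and `stop` is modeled simply as "the stop button is pressed", whereas real stop circuits are often wired normally closed.

```python
def motor_rung(start: bool, stop: bool, motor: bool) -> bool:
    """Evaluate one scan of a start/stop seal-in rung.

    Ladder equivalent:
        --[ START ]--+--[/ STOP ]--( MOTOR )
        --[ MOTOR ]--+
    The MOTOR contact in parallel with START "seals in" the output,
    so the motor keeps running after START is released.
    """
    return (start or motor) and not stop


def run_scans(inputs):
    """Feed a sequence of (start, stop) pairs through the rung, one pair per scan."""
    motor = False
    history = []
    for start, stop in inputs:
        motor = motor_rung(start, stop, motor)
        history.append(motor)
    return history


if __name__ == "__main__":
    # Press start, release it, press stop, release stop.
    print(run_scans([(True, False), (False, False), (False, True), (False, False)]))
    # prints [True, True, False, False]
```

The loop stands in for the PLC scan cycle: read inputs, evaluate the rung, write the output, repeat. Trying to "break things" here could be as simple as pressing start and stop in the same scan and seeing which one wins.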

Reid: Some more resources that can help you get started for cheap: The Raspberry Pi and “PiFace Digital IO” adapter can be pretty nice. There are free ladder logic programming tools for this setup, including the CoDeSys and ProConOs ladder logic runtimes. Basically, your Raspberry Pi will be a very inexpensive PLC, so you can learn how PLC programming works. You can also play with free HMI software: there are node-red (JavaScript) modules for writing very quick and dirty Modbus HMIs. There are also some free HMIs that speak Modbus and other protocols like ENIP; Integraxor and VTScada are two popular ones for learning (although I don’t see either of them much in production). With these you can build yourself a little controller with an engineering workstation and HMI, the core components of any operation, and at least see what the protocols look like.

If you have a little more money, Rockwell’s Micrologix 1100 and Schneider Electric’s Modicon M221 PLCs are great. Both are pretty cheap on eBay, and both can be programmed with free software from the vendor. Note that you might find other models of PLC on eBay and other surplus sites for cheap, too, but you have to do your research on how they are programmed. A lot of times you’ll find PLCs require special software for programming them, and that software might cost a few thousand dollars. So, find PLCs that can be programmed with free software, or at least demo software, before spending money on the PLC itself.

6. What does ICS security monitoring actually entail? What traffic and devices may be monitored, and how is this typically done? (Thanks @whereIsTheSpai, @Sentry_23)

Lesley: We frequently discuss ICS environments in terms of the Purdue Model. While this model is by no means perfect, it is a good way to understand how these systems (and consequently their monitoring) are laid out. At the lowest levels of the model, there are devices which send simple digital and analog electrical signals such as sensors and controllers. As the level increases upwards to distributed management and then operator systems, the devices involved use more complex protocols and IT technologies are more prevalent. A single ICS network can use several protocols, and security events can hypothetically occur in any.

Monitoring and security at higher levels looks fairly familiar–lots of Windows systems are in use, plus TCP/IP-encapsulated protocols. So, each “level” requires substantially different security approaches and technologies. Almost all of this monitoring has to be done passively, and use of host monitoring outside of basic logging typically only occurs at higher levels.

Daniel: A good strategy to get started is understanding what data sources are available. What devices generate logs and other valuable information in my environment? Human-machine interface (HMI) Windows event logs, firewall logs, process data historian alarms, and Syslog data from remote terminal units (RTUs) are good examples of data sources to look for. Next, understand how long the data is stored. Historian data often goes back several months, if not years. It’s also possible that our RTU Syslog data is simply streamed and not captured in a centralized location for any length of time. The last step, in the context of detection and response, is understanding what questions the data we collect can answer. Does the data I collect allow me to confirm whether a third-party vendor accessed my environment a month ago? Can I correlate this activity with configuration changes on my RTU? Was there a service ticket open for this change? This strategy can help identify gaps in visibility and prepare the organization for any events as well as proactive engagements.
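The three steps Daniel describes (list the sources, note the retention, test them against questions) map naturally onto a small gap check. The sketch below is in Python and everything in it is hypothetical: the class names, the source names, and the retention figures are placeholders to be replaced with what your own environment actually collects.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class DataSource:
    name: str
    retention_days: int   # 0 means streamed only, never stored centrally


@dataclass
class Question:
    text: str
    sources: List[str]    # data sources needed to answer the question
    lookback_days: int    # how far back the question reaches


def visibility_gaps(sources: List[DataSource],
                    questions: List[Question]) -> List[Tuple[str, str]]:
    """Return (question, reason) pairs for questions the collected data cannot answer."""
    by_name = {s.name: s for s in sources}
    gaps = []
    for q in questions:
        for needed in q.sources:
            src = by_name.get(needed)
            if src is None:
                gaps.append((q.text, f"{needed}: not collected"))
            elif src.retention_days < q.lookback_days:
                gaps.append((q.text,
                             f"{needed}: kept {src.retention_days} days, need {q.lookback_days}"))
    return gaps


if __name__ == "__main__":
    sources = [DataSource("firewall", 90), DataSource("rtu_syslog", 0)]
    questions = [
        Question("Did a vendor access the environment a month ago?", ["firewall"], 30),
        Question("Was there an RTU config change, with a ticket?", ["rtu_syslog", "tickets"], 30),
    ]
    for text, reason in visibility_gaps(sources, questions):
        print(f"GAP: {text} ({reason})")
```

An empty result does not prove good visibility; it only means the questions you thought to write down are covered by the sources you listed.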

7. What’s the deal with air gapping? Is it effective, is it in use, and how are adversaries bypassing it? (Thanks @R3dT0p, @mattdkerr)

Reid: In practice, air gaps only exist on secure government networks and nuclear systems. Control systems networks always have some sort of external connection, whether it is emergency remote access for engineers, or production data that is moved out to a company’s financial team. Intrusions in the ICS space have relied upon the remote access that engineers use.

Lesley: “Air-gapping” is a constantly misused term, especially in the era of switch VLANs. In most commercial environments we see some degree of DMZ between the ICS network and the corporate network and internet, but the hardening of this really varies. Sometimes there is a plethora of firewall exceptions and remote access methods permitted through. It’s easy to hate on anything that breaks segmentation as a security person, but operators sometimes truly have a legitimate need to access systems remotely. It’s always a measured trade-off between risk and security. Smart adversaries frequently abuse these authorized breaks in air-gapping and segmentation, so there’s a serious need to monitor and control them.

Then there are the environments with next-to-no segmentation at all, and those are a big problem.

Daniel: Myth!

8. How do I learn how an industrial system in my environment functions when I don’t have any useful documentation? (Thanks @DaveSec3)

Lesley: Complicated problem. Some vendors post documentation freely available on the internet, but others restrict it to a customer portal. Yet further systems are legacy and no longer supported. A bit of creative Google searching for manuals may or may not help. I’d say that foremost, building good relationships with senior ICS operators and engineers is really crucial. Shadow them if you can, and try to fully understand the high level processes being performed. Ask questions like, “what would your worst day ever look like,” and “what system conditions could cause it?”

Daniel: Start with understanding the industrial process. Whether oil refining, electric transmission, pharmaceutical manufacturing or water treatment, I think that understanding the steps taken in that mission is important. It’ll give you an idea of the types of systems, devices, sensors, and actuators that are in use. Once you get a grasp on that, think about the systems that read data from and control those systems. For example, if you want to measure the pressure in a pipeline, you can have someone read a pressure sensor along the pipe. What if your pipeline stretches from Mexico to Canada? Well, that approach won’t work, so you need a system of systems to gather the information and report it. Now, how do these systems communicate? Industrial protocols like Modbus or DNP3. Along the way, the building blocks that make up your industrial environment should come together and start to make sense. The engineers and operators are incredibly knowledgeable folks who can help you understand the industrial process they run on a day-to-day basis (and maybe even give you a tour!).
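To make "industrial protocols like Modbus" a little less abstract, here is roughly what a Modbus/TCP request to read a couple of registers (say, that pipeline pressure value) looks like on the wire. This Python sketch only builds the bytes and sends nothing; the function name is made up for illustration, and the register address and unit ID are placeholders, since which register holds which measurement depends entirely on the device.

```python
import struct


def read_holding_registers_request(transaction_id: int, unit_id: int,
                                   start_address: int, quantity: int) -> bytes:
    """Build a Modbus/TCP "Read Holding Registers" (function code 0x03) request."""
    function_code = 0x03
    # PDU: function code, starting register address, number of registers.
    pdu = struct.pack(">BHH", function_code, start_address, quantity)
    # MBAP header: transaction ID, protocol ID (0 for Modbus),
    # length of everything that follows (unit ID + PDU), unit ID.
    mbap = struct.pack(">HHHB", transaction_id, 0, len(pdu) + 1, unit_id)
    return mbap + pdu


if __name__ == "__main__":
    # Hypothetical example: ask unit 1 for two registers starting at address 0.
    frame = read_holding_registers_request(transaction_id=1, unit_id=1,
                                           start_address=0, quantity=2)
    print(frame.hex())  # prints 000100000006010300000002
```

Notice what is missing from those twelve bytes: there is no username, password, or signature anywhere in the frame, which is one concrete example of an old protocol lacking basic security features. Keep experiments like this in a lab or simulator, not on an operational network.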

9. What questions should a disaster recovery or emergency management professional ask about the ICS devices in their environment? (Thanks @Tegyrius)

Lesley: Ask for a high-level, functional overview of the processes involved in the industrial environment. Try to gain an understanding of how those processes fit into one another and into your organizational mission. Ask operators about failure points in the process–what could cause steps down the line to fail? Are those points monitored, and how? Are they redundant? From a technology perspective, ask the age of the devices in the ICS network, whether they are still under any warranty or support contract, and how they are maintained, replaced, and updated. While the ICS lifecycle is typically much longer than the IT lifecycle, there should still be fleshed-out plans for replacing failed equipment, maintaining support, and upgrading the system with current security patches or mitigations.

For example, Windows XP in ICS is still Windows XP, and the security risk it poses should be mitigated through an upgrade (ideally) or through isolation and security monitoring (more commonly).

10. Is there a way cybersecurity academics or researchers can assist in the ICS space? (Thanks @Noura_7N)

Lesley: ICS is a broad space full of many different pieces of equipment from various vendors and eras. So yes, there’s tons of potential for research! I’d point to the lack of publicly-available tools to perform forensic analysis (memory, storage, configuration, or logs) on lower-level devices like controllers that are not running consumer operating systems. There are also gaps in protocol dissectors for more obscure ICS protocols in various verticals (although quite a few common ones are built into Wireshark).

Ethically, I’d steer clear of publicly disclosing research into exploits impacting critical infrastructure systems. While it can certainly be beneficial to conduct this research in labs and report vulnerabilities to vendors, the lifecycle for deploying mitigations in many operational environments is pretty long (or nonexistent for legacy devices!), and public disclosure could potentially pose a lot of risk to society for years to come. You’re dealing with clean water reaching houses and equipment operators not dying, not IoT light bulbs.