Why You Must Harden Your NonStop DevOps Processes
I normally begin my articles with a story. This time, the story is too painful to tell or to read, but you can just search on “malware”, “breach”, and “lawsuit” to know what the story is.
DevOps and DevSecOps have been hot topics for the past few years, with good reason. Putting a sound architecture in place for Continuous Integration (CI), Continuous Testing (CT), and Continuous Deployment (CD) pays off very clearly in reducing time to market, improving auditability, and reducing error rates. DevOps lets all stakeholders essentially speak the same language using the same technology in the pipeline. Operations gets the benefit of what Development does, and vice versa, as does everyone in between.
There is one key wrinkle in this sharing of processes and technologies that has come to light: the use of common underlying hardware technology, for example, the x86 platform used by L-series and vNonStop systems. Don’t get me wrong here, I’m not complaining. Costs are reduced, code generally ports more consistently, and management is somewhat easier, so that’s all good. Having a shared instruction set reduces the cost of building code generators, which is also good. And we’re getting close to an affordable personal NonStop (not yet though). But then, it also increases the reach of snippets of malware that can infect your code.
Here’s the nightmare scenario, assuming almost everything is done properly, of course:
- A developer makes a code change and unit tests it – it passes.
- They then commit and push their code to their git enterprise server and create a Pull/Merge request.
- The code is reviewed, approved, and merged into the integration branch. So far, so good.
- The build engine picks it up, builds it and puts the results in a repository with signatures.
- The code is tested, and the tests pass. Still good?
- The release is scheduled.
- The release is pulled from the repository and installed.
- And now we have a data breach. Wait what?
We have a simple but sound DevOps process, where everything looks on the up-and-up, check? Everything was reviewed and signed, and we can even verify the release signatures in production, so what is the problem here? What is the infection vector?
Let’s ignore all the things that humans have a direct hand in verifying in the process. That eliminates everything but step 4, where the build engine picks up the code and builds it. But wait, that’s just the build phase. Exactly.
In this scenario, the build machine is someone’s NSDEE workstation, neither hardened nor pristine.
Under legacy-style controls, when our NonStops ran on proprietary hardware, the build machines were dedicated proprietary machines that had clear separation of regions and security rules – typically our QA machines. Nothing was installed without explicit permission. The above scenario would not occur.
When cross compilation entered the mix, the instruction sets were sufficiently different that writing targeted malware to inject platform-specific code was possible, but far from easy. The path to infect proprietary objects was long and arduous. I don’t recall a case where it happened on our platform. But then, most organizations ran their release builds on their legacy machines, so the vulnerability disappeared.
Hardening is usually the process of securing a system by reducing its surface of vulnerability, which is larger when a system performs more functions; in principle a single-function system is more secure than a multipurpose one.[Wikipedia]
But now we have a very different situation. Suppose the build machine is someone’s desktop that drives an NSDEE build, either manually or with a local Jenkins running. Or suppose it is a server that developers can access over a network share. The C cross-compiler’s code generator, being x86, could be infected with x86-based malware. How it happened would be a different topic, but spear-phishing is a typical method. Without anyone knowing it, the build system is actually infecting signed production objects because the servers are not hardened. Production controls would not catch this situation, because the signature is applied to whatever the build produces, infected or not. You could, of course, argue that the code infected by the malware would not work on the target platform, and you might (only might) be correct. You might also be very wrong if the malware was targeted, an increasingly common situation.
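To make that point concrete, here is a minimal sketch of why verifying release signatures in production does not help when the build machine itself is compromised. Everything in it is an illustrative assumption: the toy `compile_source` function stands in for a cross-compiler, the `SIGNING_KEY` is made up, and HMAC is used only as a standard-library stand-in for whatever real signing scheme your pipeline uses.

```python
import hashlib
import hmac

# Hypothetical signing key held by the build server. A real pipeline would
# use an asymmetric key pair; HMAC keeps this sketch standard-library only.
SIGNING_KEY = b"build-server-secret"


def compile_source(source: bytes, compiler_infected: bool) -> bytes:
    """Toy 'compiler': wraps the reviewed source in a fake object header.

    An infected code generator appends a payload the source never contained.
    """
    obj = b"OBJ:" + source
    if compiler_infected:
        obj += b":INJECTED-PAYLOAD"
    return obj


def sign(artifact: bytes) -> str:
    """Sign whatever bytes the build produced."""
    return hmac.new(SIGNING_KEY, artifact, hashlib.sha256).hexdigest()


def verify(artifact: bytes, signature: str) -> bool:
    """Production-side check: does the artifact match its signature?"""
    return hmac.compare_digest(sign(artifact), signature)


reviewed_source = b"int main(void) { return 0; }"

clean_obj = compile_source(reviewed_source, compiler_infected=False)
infected_obj = compile_source(reviewed_source, compiler_infected=True)
infected_sig = sign(infected_obj)

# Signing happens after the code generator has run, so the infected object
# carries a perfectly valid signature.
signature_ok = verify(infected_obj, infected_sig)
objects_differ = clean_obj != infected_obj
print(signature_ok, objects_differ)
```

The reviewed source is identical in both builds; only the object differs. The signature proves the object came from the build server, not that the build server was clean.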
What about virtual machines (VMs) for your build machines? Just spin one up as needed. It’s a pristine and secure sandbox, right? Well, not necessarily. VMs are just software regions running on top of a specialized operating system, a hypervisor, that itself has access to files on the box. Each VM (let’s say, for example, Ubuntu Linux) is protected from the others, but what protects the hypervisor itself (perhaps Gentoo) or a Docker container? If access to that system is not very well secured, then it could get infected like any other computer, and all VMs could then be infected as well, without anyone knowing it. So, don’t browse from your VM console.
Is there a guaranteed solution? Not really, but you can get close. Obviously, we trust the contents of a SUT coming from HPE, but that itself may be compromised if they’re not careful. We have lawyers to deal with that situation. But putting a concrete wall around your build machine and repositories is the best mitigation we have seen so far. This practice is called hardening, but there are no one-size-fits-all products for this protection. You are building a biohazard shelter and putting the engine that builds your release assets, and the machine that stores your release assets, in that shelter, with a big giant hands-off sign.
DAS KOMPUTERMASCHINE IST NICHT FÜR DER GEFINGERPOKEN UND MITTENGRABEN!
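One small brick in that wall might be a pre-build check that the toolchain on the build machine has not changed since it was last known to be good. The sketch below is only an illustration under assumptions of my own: the manifest format (a mapping of relative tool name to expected SHA-256 digest) and the helper names `sha256_of` and `changed_tools` are hypothetical, not part of NSDEE or any HPE product.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def changed_tools(manifest: dict[str, str], root: Path) -> list[str]:
    """Return the tools that are missing or differ from the manifest.

    `manifest` maps a path relative to `root` to its expected digest
    (a hypothetical format chosen for this sketch).
    """
    bad = []
    for relative_name, expected in sorted(manifest.items()):
        tool = root / relative_name
        if not tool.is_file() or sha256_of(tool) != expected:
            bad.append(relative_name)
    return bad
```

A pipeline step could refuse to build whenever `changed_tools(...)` returns a non-empty list. The check is only as trustworthy as the manifest, so the manifest has to live somewhere the build machine cannot write to; otherwise whatever altered the compiler can alter the expected digests too.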
We still prefer doing the final builds on NonStop, whether in GUARDIAN or OSS, to protect the integrity of the build from malware, which still has a much harder time getting through NonStop security (SAFEGUARD, OSS ACLs). But if you must build elsewhere, protect your build machine as best you can. It is your responsibility to ensure that your DevOps pipeline has no infection vector that people who create malware can exploit. A lot of paranoia goes a long way in this business. And whatever you do, do not let developers, or any human for that matter, create the objects that get to production. You don’t know where their computers have been.