The slides in this presentation are available for download at bit.ly/cATO-slides
I’m Bryon Kroger, CEO of Rise8. Today I’m going to be talking to you about one of our favorite topics at Rise8: continuous ATO, or as we like to call it, continuous risk management framework (RMF) under ongoing authorization.
I’ll start by giving you a brief history of continuous authority to operate, an overview of some of the processes and technologies used to achieve continuous RMF, and then a recap of the most important points so you can get started in your own organization.
So your first question might be why am I talking to you about this? Well, I was one of the three co-founders of Kessel Run and served as COO for the first two years, scaling from one small team to over 500 people. In that role, I oversaw acquisitions, platform and application development, which gave me a great vantage point for how to build a software factory and how all three elements have to be in sync. But it also meant that when we were young and scrappy, I had to do the next most valuable thing that nobody else was doing. Continuous ATO was one of those things.
I stood on the shoulders of giants and had a ton of help from my team and from external teams and individual contributors. We implemented ongoing authorization for continuous delivery, wrote the playbook, pitched the idea to the new Cyber Innovation Office, led by Lauren, and operated under the first and only continuous ATO for the next year and a half.
I left Kessel Run two years ago and founded Rise8, where I’m helping other factories, like the Space Force’s Kobayashi Maru, implement what I now call cRMF. Our team brings deep knowledge of cloud security, automation, and compliance in both the private and public sectors.
Today, I will bring together the history, expertise, and lessons learned to provide you with a comprehensive overview of continuous ATO.
Now, I mentioned standing on the shoulders of giants. Quick plug for the talk When DevOps Meets Regulation: Integrating Continuous with Government by Jez Humble, who wrote the book on continuous delivery, literally, among other high-profile books and studies in the DevOps community. He served in the newly formed 18F and was instrumental in one of the first high-profile implementations of cloud common-controls inheritance. This was an inspiration for my vision for continuous ATO and where the name comes from.
He jokes that shipping software isn’t rocket science, yet the government has a digital launch checklist that could just as well be for space launches. He gives a great overview of how they overcame this: they built cloud.gov on open-source Cloud Foundry and got push-button deployments with a small number of application controls and templates for compliance documents. But it wasn’t perfect, and it still wasn’t continuous. There was a long lead time to assess and authorize systems, so my team and I dug into that digital launch checklist, particularly RMF, to see if we could make it continuous.
Before I talk about that, it’s worth emphasizing that continuous delivery isn’t a new DevOps thing. Not to mention, DevOps isn’t that new anymore. Continuous delivery comes from the Agile Manifesto principles. We can’t say we’re truly agile if we aren’t able to deliver continuously, but continuous delivery doesn’t mean we sacrifice quality and security. In fact, doing so is continuous breakage, and any delivery velocity will quickly fade.
Continuous delivery isn’t about accepting more risk, either. It’s actually an exercise in risk reduction. So we asked what if we could get valuable software released on demand with higher quality and reduce risk? You can! And the barrier wasn’t RMF.
I used to say it was a culture problem, but more on that in a minute. Bottom line: RMF makes it clear that you have significant flexibility in how the steps and tasks are carried out, and that you should make the process as effective, efficient, and cost-effective as possible. It specifies that you can tailor control selection itself, and it makes this flexibility clear over and over again, including the use of automation across all steps, making RMF indistinguishable from the SDLC. It even includes a list of tips for streamlining: using common controls, using cloud-based systems, organizationally defined controls, reuse of artifacts, reducing the complexity of the system itself, and many more.
Now, as I said, I used to call this a culture problem where people didn’t want to accept risk. This isn’t about risk tolerance, though. The way we’re doing things today actually creates risk through the cost of delay. Time is the largest vulnerability in our systems. People should be healthily risk-averse, in my opinion, but any truly risk-averse person would change this process. Fear of change is actually what’s stopping people. Or, for any Dune fans out there: fear is the mind-killer. And whatever you fear, somebody somewhere has probably already done it.
I talked about 18F, but another org had taken some of that same work and made it more awesome. The National Geospatial-Intelligence Agency, or NGA, was marketing it as "ATO in a day." They used the commercial version of Cloud Foundry, PCF, and provided common controls to all of their customers up through the application layer, much like 18F, but with even more controls, demonstrated on secret and top secret networks. Their process was also a bit better: they pulled categorization, implementation, and automated testing into the development phase. But control assessment was still done in an ATO-in-a-day event that had to be scheduled months out, and if there was feedback, you had to wait until the next event. So it was serial, and the assessment feedback loop was still not ideal. A big improvement, but we needed more.
In talking to those folks, the vision was there, but ultimately the assessors and authorizers didn’t trust developers, program leads, and even the documentation to an extent, and I can’t blame them since, looking back, they weren’t involved in any aspect of the SDLC until it was submitted for assessment and authorization, with rare exceptions. They also expressed that they weren’t staffed or resourced to do so. So we set out to address those concerns in the Kessel Run model.
Now, everybody immediately wants to jump to “trust the process”. In fact, you may have heard people refer to cATO as certifying the process and the people, even senior leaders. I have a difference of opinion. Our vision at the practitioner level was to bring together people, process, and technology and instill trust through involvement and transparency.
Now I call this my policy math slide. The idea is to identify all of the things that you need to make RMF compliant ATO happen in a DevOps paradigm. Then create trust and transparency around each. We’ve talked about inheritance, so I’ll just briefly recap here. The infrastructure and platform as a service are a prerequisite for DevOps outcomes in a high compliance organization. They provide the structure and opinionation that reduce total cost of ownership, reduce organizational and technical complexity, and focus effort above the value line. It’s worth noting that I still stand behind the decision to use PCF in 2017, despite the naysaying that has happened since. That structure and opinionation is why we were able to do what we did, not just with continuous ATO, but with capability development in general. If I had to do it all over again, I’d make the same choice.
Now, if I were to make that decision today, I would consider similar commercial offerings in the Kubernetes space, like D2iQ, Tanzu, or OpenShift, for most DoD problem sets. The cost of subscriptions is only one element; you have to pay attention to the total cost of ownership. I’ve watched several organizations try to DIY the PaaS layer, and the total cost of ownership, particularly FTEs for day-two operations and ongoing development, eats their lunch and dwarfs the subscription costs of even expensive options like PCF.
Now, with a true PaaS in place, we free up resources and focus to actually be able to achieve the often elusive shift left on compliance and security. We employ test driven development, cyber-driven development, and user-centered design, all brought together in a centralized CI/CD pipeline that is government-owned and immutable to buy down risks, not just to cybersecurity, but also to quality. This allows us to confidently ship code continuously with high quality and reduce risk. That cyber driven development will be the focus of most of what I talk about today, where we instill trust in the process through involvement and transparency.
We actually brought security controls assessment representatives, or SCARs, into Kessel Run on a permanent basis, but they retain their independence as direct reports to the security controls assessor (SCA). They begin meeting with the teams as soon as the SDLC begins, as I’ll discuss. But as I said, it isn’t just process. We have to instill trust in the people and the enabling tech.
All our teams were government-led teams in a government-led facility. Every team was a balanced team consisting of product management, product design, and product development, which creates a separation of concerns that is always in healthy tension. We trained those teams by pairing them with industry full-time, 40 hours a week, until they were mature enough to operate on their own while retaining the pairing practice. Developers were trained in extreme programming, which places heavy emphasis on test-driven development: before a single line of code is written, a test is written first. It also emphasizes pair programming, which creates two-person integrity that we can enforce through control measures on commit, and meets some code-review requirements.
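To make the test-first discipline concrete, here’s a minimal sketch. The function and its test are hypothetical illustrations, not code from Kessel Run:

```python
import re

# Step 1: the pair writes the failing test FIRST. It pins down the
# expected behavior before any implementation exists.
def test_redacts_ssn():
    assert redact_ssn("SSN: 123-45-6789") == "SSN: ***-**-****"

# Step 2: write only enough code to make that test pass.
def redact_ssn(text: str) -> str:
    """Mask anything shaped like a US Social Security number."""
    return re.sub(r"\d{3}-\d{2}-\d{4}", "***-**-****", text)

test_redacts_ssn()  # the test now passes; refactor safely from here
```

The value for assessors is that every behavior has a test that predates the code, which is part of what makes low-risk refactoring and continuous delivery credible.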
Further, we swap pairs daily, with rare exceptions, so there is often more than two-person integrity on any line of code, and also no knowledge silos or lost knowledge when team members turn over. All of these are risk-reduction mechanisms and help instill trust in assessors that developers are actually following the process. Not only does this ensure code quality, but it also helps reduce the amount of code written and keeps tech debt down by enabling low-risk refactoring. This reduction in sprawl reduces attack surfaces up front and on an ongoing basis, as functions that are no longer needed are deleted.
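As a sketch of how two-person integrity could be enforced at commit time: assuming pairs record the second author with the conventional Git "Co-authored-by:" trailer (an assumption for illustration, not a description of Kessel Run’s actual tooling), a server-side hook could check each commit message:

```python
# Hypothetical server-side check for two-person integrity on commits.
# Assumes the pairing partner is recorded with a "Co-authored-by:"
# trailer in the commit message.
def has_two_person_integrity(commit_message: str) -> bool:
    """True if the commit message names at least one co-author."""
    return any(
        line.lower().startswith("co-authored-by:")
        for line in commit_message.splitlines()
    )

paired = "Add audit logging\n\nCo-authored-by: Pat Dev <pat@example.mil>"
assert has_two_person_integrity(paired)
assert not has_two_person_integrity("Solo late-night hotfix")
```

A rejected push here is the "control measure on commit" idea: the pipeline refuses code that no second person has touched.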
Finally, we implemented a plethora of technology that was innovative at the time but that you see all over the place now. Unfortunately, the technology has started to become almost the sole focus of the continuous ATO conversation, and pipelines in particular. Pipelines aren’t magic, and they aren’t even half of the story. But how the tech enables the people and the process is very important. I’ll talk about some of these in particular, like SD Elements, but the key thing to note is that we gave our SCARs full access to backlogs, repos, scanning-tool rule sets, dashboards like ThreadFix, and administrator control over the security-requirements management function in SD Elements. SD Elements also has training modules for each control implementation to reinforce the people aspect as it enables the process.
So ultimately, in April of 2018, we got the factory of people, process, and technology signed off with a continuous ATO. Notice that it actually references ongoing authorization, not the cATO buzzword. And sure, it references a pipeline, but in reference to enabling feedback between security and dev teams, not as a magic bullet. It focuses instead on NIST 800-series and supplemental guidance. It also sets forth the conditions for ongoing authorization, which are quite robust: annual pen testing, continued SCAR involvement, SCAR real-time access to all scan results, vulnerability mitigation, and leadership responsibilities around enabling teams through education and training.
So now I’d like to dive a little deeper into how we actually implemented RMF in a continuous fashion. It starts with controls inheritance. PaaS is a prerequisite for DevOps outcomes in the federal space. In today’s landscape, you hear a lot about Kubernetes. Kubernetes is not a PaaS; it is a platform for building platforms. With Kubernetes on infrastructure as a service, you satisfy very few controls compared to something like what Kessel Run or Kobayashi Maru have implemented. Those PaaS implementations place less compliance burden on app teams so they can focus on releasing features.
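The inheritance math can be made concrete with a toy example. The control IDs and the split between layers below are purely illustrative, not an actual NIST 800-53 baseline breakdown:

```python
# Toy illustration of controls inheritance. The control sets and the
# IaaS/PaaS split are made up for illustration only.
baseline = {f"AC-{n}" for n in range(1, 26)} | {f"SC-{n}" for n in range(1, 46)}

# Controls fully satisfied below the app: hypothetical split.
inherited_iaas = {f"SC-{n}" for n in range(1, 30)}
inherited_paas = {f"SC-{n}" for n in range(30, 46)} | {f"AC-{n}" for n in range(1, 15)}

# What's left is the app team's residual responsibility.
residual = baseline - inherited_iaas - inherited_paas
print(f"{len(residual)} of {len(baseline)} controls left for the app team")
# prints "11 of 70 controls left for the app team"
```

The point of a true PaaS is to push as many controls as possible into the inherited sets, leaving the app team a small, stable residual they can manage in their backlog.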
Here’s a look at the overall process again. This time, I’ll step you through each part. First, we start with security education. SCARs would meet with each team after inception and give them a brief, much like this one, along with required training and an overview of process and technology. The following week, we would cover the required documentation. And I know what you’re thinking: how do you maintain ATO documentation? Well, I’d like to get away from eMASS in the long run, but many of our clients still require it. We also go above and beyond with tools like SD Elements and well-known risk and threat aggregators.
Now what is needed for an assessment? These documents provide several things. They are the body of evidence that was requested directly from our assessment teams and our authorizing official. You’re likely familiar with all of these. Like the plan of action, architecture diagrams…I’m showing you actual artifacts here just to give you an idea.
Then we have categorization and implementation. It starts with an hour-long meeting to assign the assessor and do other typical tasks for this phase. Traditional RMF categorizes systems according to confidentiality, integrity, and availability (CIA) impact levels. Once you input your data types, you should understand how your system should be categorized, and controls are selected based on that categorization. We translated the application-specific controls to OWASP ASVS tasks and then used ASVS levels to categorize the applications. We manage all of those tasks with a third-party product called SD Elements. This software helps us be more effective and creates traceability for security controls across the entire SDLC. It’s really powerful. It starts with a project survey, and this is one reason why we ask for an architecture diagram: it makes this process much easier. The survey takes the team’s inputs and pushes the ASVS tasks associated with the application directly into the development team’s backlog. In this case, you see them in a team’s Tracker backlog. We use a prescribed format for those comments to make them easier to review and approve, and the SCAR has to mark those items complete.
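The categorization step can be sketched as a high-water-mark calculation in the spirit of FIPS 199: for each security objective, the system inherits the worst impact level across all of its data types. The data types and levels below are illustrative:

```python
# Sketch of FIPS 199-style categorization: each data type carries
# confidentiality/integrity/availability impact levels, and the system
# takes the high-water mark across all of them.
LEVELS = {"low": 1, "moderate": 2, "high": 3}

def categorize(data_types: list) -> dict:
    """Return the per-objective high-water mark across all data types."""
    result = {}
    for objective in ("confidentiality", "integrity", "availability"):
        worst = max(data_types, key=lambda d: LEVELS[d[objective]])
        result[objective] = worst[objective]
    return result

# Illustrative data types, not real mission data.
mission_data = [
    {"confidentiality": "moderate", "integrity": "high", "availability": "low"},
    {"confidentiality": "low", "integrity": "moderate", "availability": "moderate"},
]
print(categorize(mission_data))
# prints {'confidentiality': 'moderate', 'integrity': 'high', 'availability': 'moderate'}
```

Once the category falls out of the data types like this, control selection (and in our case, the mapping to ASVS levels and tasks) follows mechanically from it.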
Here you see an actual back-and-forth between a SCAR and a developer, something you don’t see every day. Now we get to the part that everyone usually focuses on: the pipeline. I’ll just note that without all of the things we’ve discussed so far, the pipeline doesn’t really answer the mail on RMF. The pipeline we used ran security scans using SonarQube, Fortify, ThreadFix, and even SD Elements; we check for dependencies, vulnerabilities, and overall code coverage. The pipeline also performed unit, journey, and integration testing, and enforced certain release-engineering protocols.
Here’s an approximation of what the Kessel Run MVP pipeline looked like. I’m not going to go in-depth on this today. As I said, I think the pipeline is the table stakes, not the magic sauce. Throughout initial development, the security team does periodic control and scan reviews. The emphasis is on helping teams get to ready.
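As a rough illustration of what those pipeline gates enforce, here’s a sketch. The thresholds and result fields are placeholders, not Kessel Run’s actual configuration; a real pipeline would read them from scanner and test-runner output:

```python
# Sketch of pipeline quality/security gates. Field names and
# thresholds are illustrative placeholders.
def evaluate_gates(results: dict) -> list:
    """Return the list of failed gates; an empty list means cleared to release."""
    failures = []
    if results["coverage_pct"] < 80:      # unit-test coverage floor (placeholder)
        failures.append("coverage below threshold")
    if results["critical_vulns"] > 0:     # e.g. from SAST and dependency scans
        failures.append("unresolved critical vulnerabilities")
    if not results["tests_passed"]:       # unit, journey, and integration suites
        failures.append("test suite failed")
    return failures

build = {"coverage_pct": 91, "critical_vulns": 0, "tests_passed": True}
assert evaluate_gates(build) == []  # this build may proceed to release
```

The gates are mechanical; the trust comes from the SCAR having real-time access to the same results the gates evaluate.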
And finally, we get to the actual security assessment report (SAR). This is developed from all the previously gathered data from scans and controls implementation, and once all concerns are addressed, it is signed and teams have a thumbs-up to go to prod.
OK, so to recap: start with common controls up through the PaaS layer; it’s a prerequisite for DevOps outcomes. Capabilities at this layer around rotating, repaving, and repairing are also critical to high security and zero trust. That allows you to move beyond security-at-release-only and start shifting left, applying controls, including scans, at every build, during development, and even during our hiring, onboarding, and training processes for new team members. Then and only then can you truly enable continuous compliance on release and continuous monitoring, yielding a continuously secure, quality software system in production.
Now you’ll need some policy math. First, understand what infrastructure and platform as a service bring to the table in terms of controls inheritance, configuration control, and risk reduction. High controls inheritance lowers the burden placed on application development teams and frees up time and focus for them to shift left on two really important items: testing and compliance.
We recommend test-driven development to the maximum extent practical, where tests are written before code. We like using pairing when possible as well, which bakes in code reviews and two-person integrity when executed well. We also pull compliance left by injecting the controls the app team is responsible for directly into their backlog, providing a traceable means for assessors to spot-check. And by utilizing user-centered design, we also ensure that the code we write is only the code we need.
Two really important closing points:
- Continuous ATO is technology-neutral, and any authorizing official can authorize it under ongoing authorization. This NIST 800-53 quote makes that very, very clear.
- And finally, all of this is great, but everyone has to pull their weight. Cyber is not the bottleneck; it’s just often the convenient scapegoat. To enable what I’ve talked to you about today, you have to hit all aspects of JCIDS (the requirements process); planning, programming, and budgeting; and the acquisition system, across the functional areas of platform, application, and procurement. And it opens up a whole new set of problems around how we generate and resource requirements, then validate what’s actually produced as a result.
We’re helping our customers overcome these problems and have solutions for your organization. Visit our website and get in touch. We can’t wait to speak with you.