Software Delivery Performance and Mission Success

Summary:

Dive into the world of software delivery performance with Nathen Harvey's insightful talk, as he explores the vital role of software in driving mission success for technology-driven organizations. From the importance of DevOps practices to the transformative power of digital technology, this video is a must-watch for anyone interested in GovTech and digital transformation.

Transcript:

Nathen Harvey (00:18):

Good morning, and thank you so much for having me here today. I'm really excited to be here. So we're going to talk about just that: software delivery performance driving mission success. But we'll start with a question. What is your mission? What is the mission that you are on, and how can delivering software help with that? The truth is, I don't know. I don't know what mission you're on. I do know that software, and delivering software, can help you get there. The truth is that all of us, individually, with the teams we work in, and within the organizations we work in, are technology organizations. I love this quote from a former CTO of Alaska Airlines: "We are no longer an airline. We are a software company with wings." We're a software company. Every single one of us works in a software company or a software-driven, technology-driven organization. And technology is what enables those great user experiences.

(01:16):

Whether your users are fighters on the battlefield or your users are asylum seekers trying to come into our country, technology helps drive that mission success. So how can you and your organization optimize value delivery through investments in technology and in technologists, the people that are doing the work? Well, the first thing that you're going to need is a way to assess how things are going today. You want to understand: are you able to deliver technology to your users in a way that makes sense? Let's see. We'll go forward, not backwards, here. Because when you want to deliver software, you should always be moving forward. We need a way to prioritize what's holding us back: where should we work, and what should we focus on next?

(02:07):

And then finally, we need a way to get feedback on that. The truth is that as we're developing technology, here's the thing about code and technology. It's an experiment. In fact, every line of code that you write is a new experiment. Is that code doing what you expected it to do? Is it doing what you hoped it would do? We don't know. We need ways to compress those feedback cycles. And, in fact, what we need is organizational muscle and the institutional culture that allows us to get there. In short, what we need is a way to get better at getting better, and this is where DORA comes in. So everyone say hello to DORA. Hi DORA. You might not know what DORA is. I'll tell you about it very briefly here. So DORA is a research program that's been going on for nearly a decade now.

(02:59):

This research program has talked to over 36,000 professionals around the world, in every type of organization. We get input from 250-year-old banks. I don't know if banks are 250 years old. They probably are. We get information from government agencies. We get information and insights from technology startups that are months old. In order to gather this data, we're really asking this question: "how do your teams work together today to deliver and operate software?" And in order to get that information, we ask them these questions. I'll let you just read all of these questions and think about your answers to them briefly. Okay? I know you can't read any of those questions. The questions are a little bit less important right now for our purposes. So what we do is we ask these questions, and we go and investigate different capabilities. Capabilities that teams might have.

(03:55):

I'll talk about those capabilities in a minute. But what we're able to do by gathering data on those capabilities is a predictive analysis. We can say that this particular capability can predict a particular outcome, or help impact a certain outcome. What sort of capabilities? Well, this is a little bit less of an eye chart. You can read some of the capabilities here. We talk about and research technical capabilities. What technical capabilities do teams need in order to deliver software? But it's beyond technical capabilities alone. We also look at process capabilities and, probably most important, the culture of your organization. We know that the people, how they work together, and the way that they collaborate drive, more than anything else, how well a team is able to achieve their mission. Each year, the research publishes a report. You can grab this year's report at this URL: DORA.dev/report.

(04:54):

Of course, we're going to make all the slides available to you later, and you can see me after the talk. I'll hand you not a copy of the report, but a way to get to it. What are some of our key findings over the years? Well, here's the thing. We talk about delivering software. How do we know if we're good at delivering software? What measures can we use? So throughout the research program, we've looked at four key measures that help you assess your team's software delivery performance. Now, I want to be very careful here. What we're talking about is software delivery performance. That's that moment in time where you've built something, and you're ready to get it to your users. There are lots of things that happen before you build the thing. You have to decide what things you're going to build. You have to understand who your users are. And what do they need?

(05:49):

Our research focuses primarily on that delivery process. Why there? Well, part of the reason that we focus there is because we've run into so many organizations where that is a bottleneck. And the reason it's a bottleneck oftentimes has to do with incentives. I want to tell you a quick story from very early on in my career. I was working at a supply chain software company, and we went to an organization that cares a lot about their supply chain. This organization happened to be a consumer packaged goods organization, and one of the goods that they packaged up for consumers was maybe something that some of you had this morning...breakfast cereal. You've all had breakfast cereal. You know what breakfast cereal is, right? So we went and we talked to the inventory manager, and we said, "hey, tell me, what do you care about when it comes to breakfast cereal?"

(06:42):

The inventory manager said, "It's simple. What I care about is just-in-time delivery. A consumer goes into the grocery store, they take a box of cereal off of the shelf. I want to be delivering a new box of cereal to the back of that shelf at that same moment. I want to keep my inventory low. I want to keep flow through the system." And then we went to the production floor, and we talked to the plant manager. We said, "what matters to you? How do you get a promotion? How does your team get a bonus?" The plant manager said, "oh, it's easy. We get paid by the tonnage of cereal we produce every quarter." The tonnage of cereal we produce. So imagine this: you've got an inventory manager who wants just-in-time delivery. You've got a production floor manager who gets paid by the tonnage of cereal they produce.

(07:31):

At the end of the quarter, what happens? Well, at the end of the quarter, if I'm the production manager and I'm running low against my quota, I have a choice. I can make puffed rice cereal, or I can make bran flakes with raisins. Which would you make? And maybe more importantly, have any of you ever had a raisin bran treat? Yeah, no. Me either. Rice Krispies treats? Absolutely. So look, those incentives were fighting against one another. We find the same thing often when it comes to delivering software. We have teams that are responsible for running that software, and we have teams that are responsible for building that software. Well, as a former system administrator, as a former operations person, I know the best way to keep my systems stable...accept no change, no changes. If you don't change what's running, I understand what's running, and it will just keep working. Of course, there's a real problem with that.

(08:32):

Our customers and our mission need us to continually evolve, continually improve what we're delivering in terms of capabilities. So how do we know how well we're doing with software delivery performance? Well, to answer that question, DORA has looked at four key metrics that measure that software delivery performance. And we ask these four questions: from the time a line of code or a configuration change is committed to your version control system, how long does it take for that change to make it into production? And how frequently are you updating those production systems? And when you do, how frequently do those changes create a failure or an incident? Something that you have to address immediately? And when you do have to address something immediately, how long does it take to restore service? We call these the four key measures of software delivery performance. Two of them measure throughput: your deployment frequency and your change lead time. And two measure stability: your change fail rate and your failed deployment recovery time.
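As a rough sketch of how a team might compute those four measures from its own deployment history, here is a minimal example in Python. The talk doesn't prescribe any schema, so the record fields and the sample data below are illustrative assumptions, not a DORA-defined format.

```python
from datetime import datetime, timedelta
from statistics import median

# Hypothetical deployment records for one service over a 30-day window.
# Field names here are assumptions for illustration only.
deployments = [
    {"committed": datetime(2024, 5, 1, 9, 0), "deployed": datetime(2024, 5, 1, 15, 0),
     "caused_failure": False, "time_to_restore": None},
    {"committed": datetime(2024, 5, 2, 10, 0), "deployed": datetime(2024, 5, 3, 11, 0),
     "caused_failure": True, "time_to_restore": timedelta(hours=2)},
    {"committed": datetime(2024, 5, 6, 8, 30), "deployed": datetime(2024, 5, 6, 9, 45),
     "caused_failure": False, "time_to_restore": None},
]

window_days = 30

# Throughput: how often we deploy, and how long a change takes to reach production.
deployment_frequency = len(deployments) / window_days  # deploys per day
change_lead_time = median(d["deployed"] - d["committed"] for d in deployments)

# Stability: how often a deploy causes an incident, and how long recovery takes.
failures = [d for d in deployments if d["caused_failure"]]
change_fail_rate = len(failures) / len(deployments)
recovery_time = median(f["time_to_restore"] for f in failures) if failures else None

print(f"Deployment frequency: {deployment_frequency:.2f}/day")
print(f"Change lead time (median): {change_lead_time}")
print(f"Change fail rate: {change_fail_rate:.0%}")
print(f"Failed deployment recovery time (median): {recovery_time}")
```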

(09:34):

Taken together, we can use these four measures to assess our software delivery performance for any type of technology. It doesn't really matter how that technology makes it into production. Do you have to load some technology onto a helicopter and fly it out to a ship? We can measure that using these four performance measures. Are you continuously delivering code into the cloud and making it available to your users? We can measure that here. And these four measures also have to have shared accountability. We can't turn to half of the room and say, "those throughput measures, those are yours; those stability measures, those are yours." When we do, it's the same as deciding how much tonnage of cereal we should be producing today. Let's not do that. One of the things that's incredible that we've found over the years of this research, and this research has been going on for about a decade now, is that stability and velocity are not trade-offs of one another.

(10:35):

In fact, in order for your changes to be stable, they have to be fast. And when they're fast, they will be stable. It's a virtuous cycle here. And look, sometimes you might be thinking, "well, we're a highly regulated government agency; we are very risk-averse." The truth is, there will be a time when you have to move fast. There are security vulnerabilities around every corner. Security vulnerabilities that are in production that we didn't know about until this morning. We have to be able to move fast to address those, and if we don't have the organizational muscle to do so, we're not going to be able to act well. So the truth is that through stability we get velocity, and through velocity we get stability. It's a virtuous cycle. Now, when we look at these four software delivery metrics, what do we see about them? Remember, I said that we do a predictive analysis.

(11:33):

One of the things that we can predict is that these four software delivery metrics are predictive of better organizational performance. In the commercial world, that might mean better profit. In the government world, it might mean happier constituents. It might mean that people can get their applications for visas processed faster. But not only does it lead to better organizational outcomes, we find that it is also predictive of better well-being for the people on your team. Less burnout, less rework, higher job satisfaction. All of these are things that you want. But if these are how we measure software delivery performance, and those are the good things that we get from them, another question, or maybe the next question, is: how do we get better? How do we actually improve software delivery performance? And that's where our research into these capabilities comes in.

(12:29):

And we look at things like technical capabilities. What technical capabilities must your team have? And this research is program- and platform-agnostic. So we don't tell you to go out and buy a tool, and that if you buy this one tool you're going to achieve all of the glory when it comes to software delivery performance, because of course, we all know that that's not true. But what we do need is a capability. A capability like automated tests. What does your test automation capability look like? Remember, every line of code is an experiment. How do we get feedback on that experiment as fast as possible? Automated tests are one way that you'll get there. So those are the technical capabilities that we look at, of course. But we do look beyond technical capabilities as well. We look at things like process capabilities. Now, think about the change approval process.
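Before moving to the process side, here is a tiny sketch of what that automated-test capability can look like, using Python's built-in unittest. The visa_fee function is invented for illustration; it is not something from the talk.

```python
import unittest

def visa_fee(age: int) -> int:
    """Invented function under test: application fee in dollars, waived for minors."""
    return 0 if age < 18 else 160

class VisaFeeTest(unittest.TestCase):
    # Each assertion is a small experiment: does the code do what we expected?
    def test_minor_pays_nothing(self):
        self.assertEqual(visa_fee(12), 0)

    def test_adult_pays_standard_fee(self):
        self.assertEqual(visa_fee(30), 160)

if __name__ == "__main__":
    unittest.main()  # fast feedback on every change, ideally run automatically
```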

(13:22):

I've written a line of code. It's been through all of its automated tests. It's ready...it's ready to go to production. Well, sometimes we have to hand that change off to an external board, maybe a change approval board, who has no real insight into how the systems actually work. That change approval board was put in place for good reason. We want to reduce risk. And by reducing risk, or by looking at each of the changes that we're about to roll out, maybe we'll have a better idea of the risk profile of those changes, and we can slow down those big, risky changes. But what our research has shown time and time again is that those external bodies that have to approve those changes...one, they slow things down. No surprise there. But two, they increase risk. They actually make your changes less stable when they land in production. I was working with one organization that was looking specifically at their change approval process. And, just like we heard earlier, they went and they looked at the policies.

(14:29):

What are the policies within this organization? In this particular organization, they had this policy that said, "any change that's about to go to production must sit in the version control system for at least six days." End of policy. Why? Why does it have to sit for six days? I don't know. I had a theory though. Did anyone run the New York marathon yesterday? Yeah, me either. It was yesterday though, right? Yeah. So do you know what those marathon runners didn't do on Saturday? Probably run a marathon, right? And so it's just like this code that we're about to put into production. It's going to have its marathon moment. It's going to interact with real users. We want it to rest. We want it to rest for six days. It's got to take a break. Maybe carb load. I don't know. I'm not a runner.

(15:24):

But here's what they found in that organization. They found that, if you start the clock from the time the developer was done and said, "I'm committed to getting this into production," changes that waited more than three days were something like 600 times more likely to cause an incident.

(15:48):

Their policy was directly working against the stability of their changes. Once they gathered this data and had that introspection, they looked at it and were then able to change their policy and say, look, this is slowing us down and making our changes more risky. We need to eliminate this. We need to find ways to streamline that. But it gets worse. One of the things that we found in our research is that, usually, with a change approval process like that...as a developer, I have no idea what it takes to get my change from committed, to approved, and then deployed. And in fact, because I don't know what that process looks like, and, typically, my interaction with that process is highly varied...this change went through with no problem; this change, which I know to be less risky, got blocked and sent back to me...that increases burnout on the team.

(16:46):

As a developer, I don't know if my change is going to be approved. It's harder for me to leave that change at work. I think about that when I go home. I worry about that more often. That's not good for our organization. It's not good for our performance, and it's certainly not good for the individuals on our team. But beyond technical and process, we also look at cultural capabilities. You'll see on here, there's a thing that says generative organizational culture. Now, I'll tell you in a minute about this great sociologist named Ron Westrum who looked into culture in lots of different organizations. I credit Ron Westrum for being a futurist. He named one of his cultures a generative culture. He knew generative AI was coming, and so he thought, let's call this culture generative. Of course, generative in this case means really performance-oriented. But we find that those cultures really drive a lot of our ability to achieve these outcomes and drive those measures.

(17:49):

So, how do we apply this research? Well, it starts with thinking like, "this is how things are done around here." How many of you have heard, or maybe even said, this? Okay, you only have to raise your hand if you've heard this, because I know no one in this room has ever said, "well, this is just how things are done around here." So when I work with teams, I love to hear that. And then what I'd love to dig into next is what happens if something breaks? Because understanding what happens when something breaks gives you really great insights into, "hey, do y'all...?" No, of course. That was a joke. I'm sorry. I'm sorry. It's a ridiculous question to ask "if" something breaks, and I...apologies to Bryon because he said "no theater." That was a little bit of theater. When something breaks, when something breaks.

(18:47):

So what happens in your organization when something breaks? How do you respond? I see this service is broken. "Oh, it's not my fault. Works on my machine. Go talk to, I don't know...it's probably the networking team. Go talk to them." Do you have scapegoating in your organization? And finger pointing? Or maybe not that, but "man, that networking team is so bad, and I'm pretty sure it's that one person on the networking team...we should get them out of here. Justice must be served." Is that how you respond to failure? Unfortunately, you've seen this before. I know it's not where you work today, but you've seen this before.

(19:26):

But a different response. "Hey, let's investigate the system. This thing failed. This system never ceases to surprise me. There's always something new lurking around the corner. What can we learn from that?" This approach to failure really exposes what is the culture like within your organization?

(19:48):

And when we talk about culture, I'm not talking about free yogurt at breakfast and foosball tables. I'm talking, and I'm going to quote Westrum here: "Culture is defined as the organization's pattern of response to problems and opportunities it encounters." How does your organization respond to problems and opportunities? This is really important. And Westrum has done research in a ton of safety-critical organizations. He looks at emergency rooms, nuclear power plants. He investigated and worked with the people who created the Sidewinder missile. He identified this typology of organizational cultures. He's identified these three different types of organizations. You have the power-oriented, or pathological, organization; the rule-oriented; and the performance-oriented. So: pathological, bureaucratic, and generative. And I think it's important as you look at this...I would imagine, first, that you can see your team today. You can see the team that you work on right up in here.

(20:57):

Now, it is probably also true that your team does not fit squarely in one of those columns. Think of this as kind of a continuum. But understanding where you fit maybe gives you some insights into things that you can start to improve. I also want to pay particular attention to this last row here. What is the focus of the leader in each one of these types of organizations? The leader in that pathological or power-oriented organization? It's all about me. We're going to do things my way, and whatever we do, I'm going to grow in my power and stature within this organization. And I'm going to make choices that allow that to happen. In the bureaucratic organization, the leader is really focused on departmental turf and, as Westrum says, doing things "by the book": whatever the policy says, we're going to follow the policy, and we're going to follow it to a "T," all the time, every time.

(21:54):

And as Westrum points out, one of the challenges in this type of organization, is that not all leaders are reading from the same book. So we're going to do things by the book. Whose book? My book, not your book. My book. But it's that performance-oriented or that generative organization where we see the best success. Where responsibilities are shared. Where new ideas are welcomed. Where when something fails, we ask that question, "what can we learn from this failure? What can we learn from that?" So how do you change your culture?

(22:30):

How do you change culture? Well, there's this old saying, it is easier to act your way into a new way of thinking, than it is to think your way into a new way of acting.

(22:44):

So how do we change culture? We change culture by changing the tools. We change culture by changing how people work. Think about this. It looks like most of you will remember this time. There was a time, not too long ago, when, let's say, I wanted to write the strategic plan for 2024. I might create a document and call it "Strategic Plan 2024." And I want to get feedback from, well, not all of you, but let's say seven or eight of you in the audience. So I might take that document that I've written and attach it to an email and send it out to seven or eight of you, asking for feedback and input. And what's the result of that? Well, now I get seven or eight...let's be honest, I get five or six emails back with feedback, and they've all changed the document slightly.

(23:34):

And they all come back with names like "Strategic Plan_Lisa," "Strategic Plan_Jerome,"..."Strategic..." And then I merge them all together and it's "Strategic Plan_final," and I send it out for one final review, and we end up with "final_final" and "final October..." And it's a mess, right? This is a terrible way of collaborating. We don't see each other's input at all. But we change the tooling. We move to something like online docs. Now, when we move to online documentation, we can all open up the document at the same time. We can collaborate and add suggestions and comments in real time, in that document. And it doesn't even have to necessarily be in real time. It can still be asynchronous collaboration. But the point is that we're all seeing what everyone else is injecting into that document at the same time. We have increased collaboration within our organization.

(24:30):

We've changed how knowledge is shared. That is fundamentally changing the culture of our organization. And how did we do that? Well, it wasn't by showing up and saying, "October is culture innovation month. We're all going to innovate our culture, start collaborating more." That's not how we did it. We used new tools, and those tools helped us with that better collaboration. We find the same thing in our research. Actually, to be quite honest, in our research what we find is that as you increase some technical capabilities, your culture improves. What we can't see in our data, what we can't tell, is whether it's the technical capabilities that are driving the culture, or the culture that's driving the technical capabilities. We think the answer is "yes," but our data, and the way that we collect it, doesn't allow us to say definitively. So, for a moment, just come with me on a journey where it's the tools that are driving that culture.

(25:28):

Of course, there is something about the mindset there that matters as well. So what sort of things help drive our culture? I'm just going to take a look at two of the different capabilities that we look at. Two of the technical capabilities that drive culture. Version control. I don't know if you've ever done this. Sorry, let me rephrase that. I'm sure no one in this room has ever done this, but let me tell you about another job I had earlier in my career. If we wanted to change some code, we would log into the production server over SSH, and we would change the code, and then we would refresh the page, or rerun the script, and see what happened. And it was great. It was great. We did sometimes get a little nervous, so we invented version control. And every time before we wanted to change a file, we would copy that file to filename.bak, and then we were safe.

(26:23):

And we were safe. It's not safe. Eventually, we actually grew up and started using professional practices and version control tools. And version control is one of those foundational things. Every piece of your code, every piece of your configuration...you should be able to rebuild your platform, your technology, your software, from version control. We should be able to rebuild everything from version control. All of our application code, dependencies, configuration...all of that must be in there. And this version control gives us a great way to collaborate. And if you think about your own journey with different version control systems, think about the cultural differences between something like Subversion, which is not a distributed system, and something like Git, which is a distributed version control system. And some of you have never heard of Subversion and think I'm just up here being subversive. I'll tell you a little bit about it.

(27:18):

There's a central thing, and that's where all of the code lives, and you would only ever have a copy for yourself of whatever you checked out. With Git, we're now distributing all of the code; the entire version control system is available and distributed to everyone. Think about what that does in terms of trust. Again, in terms of collaboration. Changing the tools can help change the culture. Although, I will be honest, one of the things that was terrible about Subversion...it was so painful to create branches. And when you created a branch, it was so painful to reintegrate that branch. Maybe that was a good thing. Git fundamentally changed how we did that. It makes it very easy. Branches are cheap and lightweight. But that inexpensiveness of those branches maybe causes us to keep them around too long. So maybe we have to think about other ways of working when it comes to version control, specifically thinking about continuous integration.

(28:23):

Continuous integration is one of those things where, first, we've all heard the term before, but we might not all have a common understanding of what that term even means. Continuous integration means that we have an automated build process that's going to build our code every time a line of code is checked in. We have a suite of automated tests, and we're not allowing branches to live for a long time. We want to get those branches closed and merged back into our mainline as fast as possible. This changes our culture. And here's the most important thing about continuous integration, and I think one of the hardest things for us as a team and as an organization to truly grasp and truly put into practice. This first one: agreeing that fixing a broken build takes priority over any other work. So when I'm working in a continuous integration way, I have automated tests.

(29:24):

Those tests get run every time a line gets changed in my code repository. And if those tests fail, we have to have an organizational, or at least a team, commitment that we're going to stop other work. We're going to fix those broken tests. And of course, by fixing those broken tests, what I don't mean is commenting out those tests or removing those tests: "now the build is green again, let's continue forward." And this is one of those things that, in this room, we can probably all nod our heads and agree with: yes, of course, if the build breaks, we're going to stop and fix it. It's easy to agree to in principle. It is very difficult to do in practice. And the way that you do it in practice is, well, just that: by practice. By doing this over and over and over again. There are a bunch of ways for you to measure continuous integration as well.
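To make that commitment concrete, here is a minimal sketch of the kind of gate script a CI server might run on every commit. The build steps and the "src" layout are assumptions for illustration, not a prescribed setup.

```python
import subprocess
import sys

def step_passes(cmd: list[str]) -> bool:
    """Run one build step; a non-zero exit code means the build is broken."""
    return subprocess.run(cmd).returncode == 0

def main() -> int:
    steps = [
        ["python", "-m", "compileall", "-q", "src"],     # build: does the code even compile?
        ["python", "-m", "unittest", "discover", "-v"],  # test: do our experiments still pass?
    ]
    for step in steps:
        if not step_passes(step):
            # A red build blocks the merge. The fix is to repair the code or the
            # test, not to comment the failing test out so the build goes green.
            print(f"Build broken at: {' '.join(step)}", file=sys.stderr)
            return 1
    print("Build green: safe to merge back to mainline.")
    return 0

if __name__ == "__main__":
    sys.exit(main())
```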

(30:20):

And this is where our research program really tries to help you. We want to look at those four key metrics to help you understand how well are you performing when it comes to software delivery performance? But we want to go beyond that. Those four metrics are just kind of the surface. What you need to do, what you want to do as a team is set a baseline. How are we doing today? And then we want to look at those capabilities, whether they're technical, process, or cultural capabilities, and try to determine what's holding us back. And we want to improve that thing. We're looking for the constraint in the system.

(31:00):

So we're going to give you some tools that can help you with that. On DORA.dev, you'll find the DORA Quick Check, which is exactly as the name says. It allows you and your team to answer those questions, those software delivery performance questions. For example, for the primary application or service that you work on, what is the lead time for changes?

(31:24):

That is, changes go from committed all the way through to deployed in production. And again, as I said earlier, if we start with committed, there are a lot of things that happen over here, on this other side. Those things are absolutely important. We have to be able to measure those as well. But we find in our research, and in our experience, that it's that delivery process that is a significant bottleneck in many organizations. It's a good place to start. Figure out: what is your lead time? What is your deployment frequency? From there, we also help you start to assess some of the capabilities. The quick check itself will help you assess capabilities like continuous integration, like the culture of your organization, and whether or not you're using a loosely coupled architecture. From there, you can dig into a bunch of capability guides that will help you and your team ask better questions. Ask better questions about how we are doing with this particular capability.
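Speaking of measuring: one lightweight way a team could start putting a number on that lead time is sketched below. The git invocation is standard, but the deploy log (which commit reached production, and when) is an assumption; you would record that in your own pipeline.

```python
import subprocess
from datetime import datetime, timezone

def commit_time(sha: str) -> datetime:
    """Committer timestamp for a commit, read from Git as strict ISO-8601."""
    out = subprocess.run(
        ["git", "show", "-s", "--format=%cI", sha],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    return datetime.fromisoformat(out)

# Hypothetical deploy log: (commit sha, moment it reached production).
deploys = [
    ("1a2b3c4", datetime(2024, 5, 6, 17, 30, tzinfo=timezone.utc)),
]

for sha, deployed_at in deploys:
    # Lead time for changes: committed -> running in production.
    print(f"{sha}: lead time {deployed_at - commit_time(sha)}")
```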

(32:24):

What advice can we get from the collective? How can we measure that capability so that we can start to improve? And speaking of improving, the one thing that I know is certain, no one can improve alone. So I would invite you to join our DORA Community of Practice. This is an open community where we have researchers, practitioners, and leaders all coming together to discuss this idea of getting better at getting better. But I want to leave you with one thought. As you leave this room today. As you go back to your colleagues, I want you to go back with this message: "We will get it right this time."

(33:04):

Oh, I love the enthusiasm, but I'm sorry to tell you, you're probably not going to get it right this time. In fact, as we do anything new, we're unlikely to get anything right the first time. And so this really is about adopting and embracing a mindset and a practice of continuous improvement. Use DORA to understand how you're doing today. Use those capabilities to understand what might be holding you back, and go change something about that thing, that capability. We have bad tests? Let's try to improve our tests. But then let's use DORA to assess: is improving our tests helping our software delivery performance? Because at the end of the day, what we really want is a happy team. A happy team that's productive, that's building software, and delivering that software in a way that matters. We're delivering that software out to our users to help drive mission success, happy customers, whatever your mission is...

(34:16):

And then maybe one final, final thought. We're back to those docs. What comes next? What comes next after we start this improvement? Well, the truth is, we have a lot of AI capabilities coming in. There are going to be things beyond AI that come next. I don't know exactly what comes next, but I know that as a team and as an organization, if you focus on learning more, and learning faster, and adopting a learning mentality, that's what's going to help you build resilience and succeed in the future. With that, I'm at time. Thank you very much. My name is Nathen Harvey. I'm a developer advocate. I'll be around for part of the day. Come see me. Thank you.