Govtech Platforms Don't Have to Suck
Summary:
Bryon Kroger of Rise8 takes the stage at Prodacity to unravel the mysteries behind Govtech platforms and how they can be improved. If you've ever felt the pain of navigating platforms for software delivery in the federal government, you're not alone.
Transcript:
Bryon Kroger (00:19):
Today I am going to talk to you about Govtech platforms and how they don't have to suck. Which implies that they do. And I think most of you probably share that experience...that using platforms to deliver software in the federal government can sometimes be a real pain. And as fun as that is, I wanted to just start right in on that and platform strategy. But I'm going to take a good bit of time to actually recap the CNCF Platforms Working Group white paper. If you haven't read this, I highly encourage it. It is a brilliant resource. I'm going to hit the high points of it, the why behind platforms, defining it, some of the key attributes and challenges we see, how you actually measure success of your platform, and then the capabilities that you need to be looking for and how to have a strategy around those critical capabilities.
(01:07):
So, first got to start with "why?" Why platforms? But before I do, we have to talk about DevOps, right? We did the state of DevOps report and platforms are related to DevOps. I'll get there. But what is DevOps? It is not throwing the grenade over the wall. I gave you this definition last year for those of you that are returning, I'll probably recap it every year. This is a definition from John E. Vincent. He says, "Here's the secret. I'll tell you exactly what DevOps means. DevOps means giving a crap about your job enough to not pass the buck. DevOps means giving a crap about your job, enough to want to learn all the parts and not just your little world. Developers need to understand infrastructure. Operations people need to understand code. People just need to freaking work with each other and not just occupy space next to one another."
(01:57):
That's from John E. Vincent. And that applies more broadly, not just to dev and ops, but to every function of the organization. Stop just occupying space next to each other and start working together. Now, looks something like this...hashtag hug ops. It's really important to start building empathy and trust with those teammates crossing the bridge. And before I close on DevOps, I do have to note something. There's a lot of things that are getting inserted into DevOps, DevSecOps, DevSecTestOps, DevSec reliability or SRE ops, all kinds of things. Stop trying to make that happen. It's just not going to happen. This is what I do to dev stars, but I think DevSecOps, I just wish people would stop saying that word. I don't condone gun violence, but I did choose violence against DevSecOps today. Alright? DevOps is actually a culture, right? I think that it's a promise of cross-functional collaboration across the board. And what does this have to do with platforms? Well, inspired by the cross-functional cooperation that's promised by DevOps, platforms, and platform engineering in particular, have started to emerge as an explicit form of that cooperation in the enterprise. But each time that you tackle a constraint, you run into an equilibrium problem. It is like whack-a-mole. You address one constraint...new constraint pops up. And that is the dilemma of building platforms - the changes keep coming and you have to try to get to a state of equilibrium in your ecosystem.
(03:43):
And then finally, I think that we need to start thinking of ourselves in these roles as gardeners, not as architects. So, I want you to think about planting seeds as we go through this talk about platforms. Not "oh, this is really great. I'm going to architect a platform just like this in my enterprise." You're going to plant seeds, you're going to pull the weeds. You have to provide the right environment for your platforms to grow. You need good soil. Along the way, focus on sustainability and equilibrium, not architecture. And that's the kind of culture that DevOps and platforms enable and require.
(04:23):
So, according to the CNCF white paper, they identified several things that investing in platforms can get you. The first is reducing cognitive load on product teams, and accelerating product development and delivery. So this is a fairly obvious one, but I'm always shocked at the number of people that have their development teams that are supposed to be developing features for users, spending half of their time or more managing Kubernetes clusters.
(04:48):
Turns out that's not super valuable add. Your users do not care at all about your Kubernetes clusters. They just need you to ship the application. And so, reducing the cognitive load on product teams could mean providing abstractions, such as those that would be above the Kubernetes layer, to orchestrate your Kubernetes for you as a service, as an application team. I'm just hitting that as an API, and I'm able to start deploying my product. Improve reliability and resiliency of products. This one's really important. Again, if teams are really focused on feature delivery, week over week, month over month, then they get distracted by things that would revolve around reliability and resiliency. So, we want to make sure that our platforms are baking in that level of reliability and resiliency and that developer teams can just focus on app-level resiliency, app-level reliability.
(05:48):
Accelerate product development and delivery by reuse, sharing platform tools and knowledge across many teams in the enterprise.
(05:57):
So as you build those abstractions, you can imagine if you did not have them, every single team, if you have say 200 development teams in your organization, like Veterans Affairs, Air Force organizations, that's like a pretty small scale actually for them. Now, every abstraction, the work that goes into that, if you don't have it available, multiply that by 200, or 500, or a thousand. It's...everybody's doing the same work over and over and over again. And that's waste.
(06:28):
You can also reduce the risk of security, and regulatory and functional issues in products, through GRC in your platform. Rob's going to talk a lot more in depth about that next, but there's a lot of things that you can do here. And in fact, a lot of the things that you're told are reasons you can't do DevOps in the enterprise, are precisely the things that you can build right into your platform.
(06:48):
And then finally, enabling a cost-effective and productive use of services. And I say public clouds here...there's an interesting thing I'll get to later, but I do want to point out that most of us, even though we say we're using public cloud, like "oh, we go buy AWS, we go buy Azure..." it's GovCloud, and those are physically segregated servers. And I'll talk about...the game is being changed on that in the federal government right now. But for all intents and purposes, I would not call those public clouds. They're actually private clouds at that point. They're for whole of government. So it's very big private cloud, but it's still private nonetheless. And you can't really achieve the economies of scale you're looking for when you have physical segregation. Okay...so what is a platform? A lot of people use that word. I don't think it means the same thing to most people.
(07:41):
So I'm going to give you a few different definitions. From the white paper, they say a platform for cloud-native computing is an integrated collection of capabilities that are defined and presented according to the platform's users needs, not according to engineering wanting to build really cool engineering things, like "how cool can I make this Kubernetes cluster?" It's about what your application developers, in this case, actually need. It's a cross-cutting layer. It gives a consistent experience across the enterprise, and it is a consistent user experience for using and managing those capabilities, hopefully self-service and on demand through the use of APIs.
(08:19):
So who here is familiar with Team Topologies? Anybody? ...Got some folks, great! So it's a great book no matter where you sit in your organization, I very highly recommend it if you're in software. They identify four fundamental team topologies and those interact with one another in three main ways.
(08:38):
So you've got your stream aligned teams there in yellow. Think of those as primarily application teams. They deploy their capabilities consuming platform team capabilities in the blue at the bottom. And then there are complicated subdomains, when you have really complex integrations, as well as facilitating teams. We're not going to talk about those, but I think one thing that I do want to point out is in your platform journey, there are going to be new services that the platform needs. And a lot of times the best way to build them is in this collaboration model that's shown here, where the app teams actually work with the platform team, together, to build the thing that they need. And then as that thing becomes needed by other teams, it gets spun out into its own service. It gets offered as a service later on. And so that evolution can be really important because a lot of times platform teams become completely overwhelmed with the needs of the applications in the enterprise.
(09:34):
A lot of times enterprise platforms are forced to accommodate a wide range of capabilities all at once, very early on in their journeys. So, using team topologies, the team at Atlassian says that platform teams create capabilities that can be used by numerous product teams with little to no overhead. They minimize resources and cognitive load on that team. And, they create a cohesive experience. So you can see almost no matter where you look across the industry, the definition is pretty much the same. But I bet if I walked into any one of your organizations, there would be very few people that would say any of these things about what the platform is supposed to provide. And so, I wanted to get us all aligned on that definition, and I hope it's something that you can take back to your organizations as well.
(10:19):
So, I'll skip Martin Fowler's definition today. Platform maturity. I want to talk about really quickly. So according to CNCF, there are many use cases that the enterprise could meet with platforms, and it might progress something like the following here. So maybe at first, when you're starting in your journey, they can just provision capabilities on demand. And you would be shocked at the number of organizations that we walk into, where people can't provision capabilities on demand and immediately use them to run systems. And I'm sure you're all very familiar with that.
(10:53):
Then, later on, they might be able to provision service spaces on demand...use them to run pipelines. This is an interesting pattern that we see in government - a lot of people are moving to very centralized pipelines and they're not accessible on demand. It very much breaks the feedback loop. So the more that you can get into the stage of having those be available as templates or things that can be run on demand by developer teams, the better. Otherwise they become centralized dependencies that can cause failure for teams and block their path to production. And then administrators of third party software can provision required dependencies. This is one that we really struggle with in the federal government today. And then being able to provision complete environments. So as you can see, this is very much a journey, and I just point this out because so many teams are asked to stand up a platform inside of federal agencies, and they're expected, and we'll talk about DIY in a minute, but they're expected to build the platform themselves versus buy a commercial solution.
(11:54):
And they're expected to meet the needs of the entire enterprise overnight. And it is very much a journey to build these platforms. And every commercial platform that's available went through this journey at some stage with thousands of developers. And then finally, product developers and managers can actually observe all of the things on the platform. And observability is a huge function and I'll talk about how important it is in the federal space and what some of the things that we need to do with it.
(12:21):
Now, we've got to get some key attributes, that may be a pop culture reference that's lost on some folks, that's DJ Khaled, "major keys to success," but key attributes that affect the success of a platform is, number one, platform is a product. You have to think of your platform as a product just like any other product. And sometimes people tell me they're doing platform as a product, and I go in and I say, "can I talk to your product manager?"
(12:46):
And they're like, "well, we don't have one, but there's a senior engineer over here." And I'm like, "okay, well how about your UX researcher - your service designer?" And they're like, "what are those? Those for app teams, right?" No, you should actually do this exact thing with your platform. User experience is just as important as product management. When you're building out a platform, you need to have really good documentation and onboarding. I know when we all move to Agile, we're like no documentation. We prefer working software over documentation. I'll just call your attention then, in the Agile Manifesto, it says the things on the right are still important, we just care about the things on the left more. So yes, we care more about working software than comprehensive documentation, but especially when you're building a centralized platform that becomes a dependency for every single team, you need documentation and a good onboarding experience.
(13:37):
It needs to be self-service. And finally, it does need to reduce cognitive load for users. And there are so many platforms today that actually increase the cognitive load for users. It's actually harder to deploy to production. It's pretty wild to see. And then, it has to be optional and composable. And this is something that's kind of a lesson that I learned over time. When we started Kessel Run, we started out on Pivotal Cloud Foundry, which was probably the most structured opinionated platform ever, and very much like a Heroku-style experience, you needed to be a cloud-native app really to leverage that platform well. And it worked for us at Kessel Run, but when we wanted to start expanding to the enterprise, most of the enterprise users couldn't use our functionality.
(14:20):
Now that doesn't mean you shouldn't have a platform like a PCF in your enterprise, it's going to really depend on your needs. But ultimately, the wider variety of needs you're trying to serve, the more optional and composable your platform should be, and then it needs to be secure by default.
(14:36):
So, some of the jobs that a platform team is responsible for...researching platform user requirements. So don't just have UX researchers, make sure you're actually doing that and building out a feature roadmap around your UX. Marketing, evangelizing, advocating for the platform. This one gets lost a lot. If it's hard enough to get product managers and product designers on a platform team, oftentimes when you're building, or even buy building, inside the federal government...a platform, there's an aversion to marketing and evangelism. I'm telling you, if you're going to offer this as an enterprise service, you really need strong strategic communications, marketing, and evangelism just like the commercial companies that are trying to sell you platforms. And maybe you use one of theirs and combine it with other technologies.
(15:24):
Whatever you ultimately compose, be prepared to market and evangelize that. And then, you have to manage and develop interfaces for using and observing those capabilities. Observability becomes a really important function as you are trying to figure out what to do next. Every single stage you have to figure out what do users need next? You can go get qualitative interviews, but a lot of times the data is going to tell you more than you could ever get from a month's worth of user research.
(15:52):
Some challenges that we see with some of those jobs, platform teams treating those like products and developing them together with users. Huge problem here in the federal government. My plea to you is to put product managers and product designers on your platform teams. Please, please, please. Platform teams also need to carefully choose their priorities and initial partner teams. A lot of times enterprise platforms get mandated and everybody's told to use them all at once.
(16:20):
That's a really bad strategy. It will probably cause your platform team to fail. And we've seen this with several organizations in the Air Force and across Department of Defense. I've seen it in civilian agencies as well, and that's because they can't meet all their needs at once. Creates a really bad user experience, creates a death spiral too, of constantly responding to issues from users, onboarding issues, runtime issues, and teams can never get ahead and get back into the mode of developing features that those people need.
(16:50):
And then this one, getting the support of enterprise leadership by showing impact on value streams. A lot of teams get their leadership to mandate their platform, which becomes antithetical to being able to respond to user demands. If you're already mandated, then what incentive do you have to go talk to users and learn more about them? So, the first thing that you can do for the business is show how you're helping to get things into production faster, hopefully cheaper, and really impacting those value streams that your mission or your business depends on.
(17:30):
And so, some of the ways that we can reduce that cognitive load on the platform team, because they face the same challenges as application teams, is building the thinnest viable platform over implementations from managed service providers. So I would tell you, and I will get to the math on this, buy first, hopefully rent first if you can. We'll talk about that challenge in the federal government, and then where you have differentiation for your mission, start with thin viable layers.
(17:55):
Leverage open source frameworks and toolkits for all of those other things that you have to do. There's a never ending stream of things that you have to do when you're building a platform. And so for those things, really important to leverage open source and toolkits, templates, and those types of things. And then ensure that your platform teams are staffed appropriately for their domain and number of customers. 300 was a really fun movie, but it's not a great way to run your platform organization as it turns out.
(18:26):
So how do we measure platform success? User satisfaction and productivity...you could do things like active users and retention, net promoter scores...highly encouraged. It's an easy lightweight way to find out how your platform and your enterprise is doing. And then metrics for developer productivity. Really like the SPACE framework. Nicole Forsgren has a lot of great talks on that online that you can find. But it really helps take those DORA metrics and contextualize them and make sure that you're using them appropriately.
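For readers who have not computed a net promoter score before, here is a minimal sketch using the standard definition (percentage of ratings of 9 or 10 minus percentage of ratings of 0 through 6). The survey responses are hypothetical and are not data from the talk.

```python
def net_promoter_score(ratings):
    """Return NPS (-100 to 100) from 0-10 'would you recommend?' ratings."""
    if not ratings:
        raise ValueError("need at least one rating")
    promoters = sum(1 for r in ratings if r >= 9)   # ratings of 9 or 10
    detractors = sum(1 for r in ratings if r <= 6)  # ratings of 0 through 6
    return 100.0 * (promoters - detractors) / len(ratings)


# Hypothetical responses from platform users (illustrative only).
sample = [10, 9, 8, 7, 6, 3, 9, 10, 5, 8]
print(net_promoter_score(sample))
```

Tracking this number per release or per quarter matters more than any single reading.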
(18:54):
And then, organizational efficiency. So latency from request to fulfillment, say for a database or a test environment. Latency to build and deploy a brand new service into production. And then time for a new user to submit their first code changes to their product. That'll really give you a good idea of what your onboarding experience is like.
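The efficiency measures above are all elapsed-time metrics, so one small helper can cover them. This is a sketch only: the event log, its field layout, and the choice of the median as the summary statistic are assumptions, not something the talk or the white paper prescribes.

```python
from datetime import datetime
from statistics import median


def hours_between(start, end):
    """Elapsed time between two datetimes, in hours."""
    return (end - start).total_seconds() / 3600.0


def median_latency_hours(events):
    """events: iterable of (started_at, finished_at) datetime pairs."""
    return median(hours_between(s, e) for s, e in events)


# Hypothetical log: when a test database was requested and when it was usable.
db_requests = [
    (datetime(2024, 1, 8, 9, 0), datetime(2024, 1, 8, 9, 30)),
    (datetime(2024, 1, 9, 9, 0), datetime(2024, 1, 10, 9, 0)),
    (datetime(2024, 1, 10, 9, 0), datetime(2024, 1, 10, 11, 0)),
]
print(median_latency_hours(db_requests))
```

The same function works for onboarding if each pair is (account created, first code change submitted).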
(19:13):
Product and feature delivery...Nathen covered these, the DORA metrics, so I'm not going to go into those.
(19:20):
I think we hit those pretty hard. I do just want to point out something about reliability. If you dig into the State of DevOps Report, reliability practices have a non-linear, super linear, impact on organizational performance. It's very much a virtuous cycle. And the other way you can look at that...reliability, SRE, is really about eliminating toil at the end of the day. And toil is not like tech debt, it's not something you can pay off later. Toil creates a death spiral. Like one that you just cannot get out of because you're constantly onboarding new users, and it just eats you alive. So I would say invest in reliability practices early and often.
(19:59):
Capabilities of the platform. I'm going to breeze through these quickly in the interest of time because I want to talk about the cost model and how it impacts your platform strategy and government organizations, but we have things like web portals that are important.
(20:12):
This is how we interface with the underlying capabilities. So you can't just build the capabilities. You need to create really great ways to interface with them. APIs. And then environment and project templates and documentation search. These capabilities in particular get left out a lot. And people build really great underlying platforms, and not great ways to interact with them, and it becomes one of the toilsome burdens that starts your death spiral. So definitely don't overlook interfaces.
(20:38):
Ways to automate: build, test and deliver; provide the environments and resources that teams need; observe workloads, infrastructure, so basic things like compute, network, storage, data, and messaging capabilities; and then we need to be able to identify and authorize users and services, binding services to workloads via secrets...
(20:59):
Some of you're new to platform, and I'm going to get to why I'm showing you all this. Scan artifacts and enforce policy; store artifacts and registries in repos...who had no idea that there was that much going on in your platform?
(21:11):
Is anybody surprised by that? Usually leaders are very surprised by this. They're like, "go get me a platform." And they don't realize that there's a lot of stuff. It's like the new ESB. So, enterprise service buses. We would do crazy architecture diagrams, and then somebody got smart and they're like, "let's put all the complexity in this box and we'll call it the enterprise service bus." And that's what platforms are today. There is a lot going on in platforms. And people, it's like a black box that people hide a ton of complexity inside. And for this reason, people drastically under resource their teams. And when government is doing government cost estimates, they fail to account for I would say 80% of the costs. And here's why:
(21:54):
That top layer...I'm going to be very generous here...it's going to take way more than this, but just so people don't try to call my bluff or something, I'm going to use really low numbers. I'm going to say that that is going to take about a 12-person team to manage all of those interfaces. Probably double that. But if I use a blended rate of $175 an hour, that's $4.3 million just to run those teams. What about the other ones? It's going to take me 18 to run those critical capabilities, $6.5 million and it's going to take even more to run the rest of the capabilities putting me at about $8.7 million. That's quite a bill. And $20 million, to be exact. And there's a lot of assumptions in here.
(22:39):
First of all, that's labor only. That doesn't include any of the licenses. Maybe you're going to use free open source. I'll get to why you might rethink that later. But first assumption, is that you can actually hire this talent at all. And the government really struggles, even on contracts at $175 an hour, you're not going to get the people that can build this. I used $175 so that nobody was like, "that's a lot of money, Bryon," but probably looking at $250 an hour as a starting point for really competent people in the platform layer. And when you start getting up into the Kubernetes side of the house, you should be paying closer to $300 to $350. So, at an average rate of $175, probably not. And it also assumes that they're empowered and productive, which you know that once they get on contract, they're going to wait for an ID card so that they can log into your systems. That'll take a couple months. Then they got to get all their accounts, your onboarding experience, we talked about that...
(23:31):
And then, they try to ship things and they have to wait for the change approval board, and it just goes on and on. They're not very productive. And these teams are autonomous and require no management. I didn't factor in a management layer. You probably should, and you should probably add at least 15% for SEPM. Government folks know that, but that's systems engineering and program management. Going to take you to about $22 to $23 million, and then a five-year timeline or more.
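The DIY estimate can be reproduced with a few lines of arithmetic. This is a sketch under stated assumptions: 2,080 billable hours per person per year is assumed (the talk gives only rates, headcounts, and rounded totals), the figures are treated as annual, and the headcount of 24 for the remaining capabilities is inferred from the $8.7 million figure rather than stated.

```python
HOURS_PER_YEAR = 2080  # assumption: 40 hours/week x 52 weeks

# 12 and 18 are stated in the talk; 24 is inferred from the $8.7M figure.
TEAMS = {
    "interfaces": 12,
    "critical capabilities": 18,
    "remaining capabilities": 24,
}


def annual_labor_cost(headcount, hourly_rate, hours=HOURS_PER_YEAR):
    """Labor-only cost in dollars for one team for one year."""
    return headcount * hourly_rate * hours


def diy_estimate(hourly_rate, sepm_fraction=0.15):
    """Return (labor, labor plus SEPM overhead) in dollars."""
    labor = sum(annual_labor_cost(n, hourly_rate) for n in TEAMS.values())
    return labor, labor * (1 + sepm_fraction)


# $175/h is the talk's deliberately low blended rate; $250/h is its
# suggested starting point for competent platform engineers.
for rate in (175, 250):
    labor, with_sepm = diy_estimate(rate)
    print(f"${rate}/h: labor ${labor / 1e6:.1f}M, with 15% SEPM ${with_sepm / 1e6:.1f}M")
```

At the $175 rate this lands near the talk's rounded $20 million for labor and in the $22 to $23 million range once SEPM is added. Licenses, turnover, and day two operations are still excluded.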
(24:00):
So, one thing that I think is important, is to talk about value here. So we talk about cost, schedule and performance in federal acquisition regulation all the time. There's another side. That's like one side of an equation. Value is performance divided by the product of schedule and cost, right? That is value. I'm going to use a slightly different version of this now. This comes from Alex Hormozi, but instead of performance, we're going to break that down into the dream outcome.
(24:32):
The performance that you want and the likelihood of achievement. That's important. And your costs include a time delay as well as your effort and sacrifice, only, some of which is money. The money is the least of your worries here, I promise. So, I think a couple things is that when we look at our ability, our likelihood of achieving all of those things, those 13 critical capabilities, all the attributes, the way that we have to measure them, our likelihood of doing that with this contract vehicle that I've proposed, is very low. The time delay, I said five years at least, it's probably going to be way more than that. Effort and sacrifice, that's way up. We got the cost in dollars. It's not the total cost of ownership. I really only costed out day one. Now we have day two operations that we're concerned with, all the licensing, team turnover, all of those things.
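Written out, the two forms of the value equation described here look as follows. The first is the cost, schedule, and performance form as stated in the talk. The second follows Alex Hormozi's value equation as it is commonly presented, so treating the two cost terms as a product is an assumption carried over from that source, not something spelled out on stage.

```latex
\[
\text{Value} = \frac{\text{Performance}}{\text{Schedule} \times \text{Cost}}
\qquad\longrightarrow\qquad
\text{Value} =
\frac{\text{Dream outcome} \times \text{Likelihood of achievement}}
     {\text{Time delay} \times \text{Effort and sacrifice}}
\]
```

Read this way, DIY scores poorly on every term at once: a low likelihood of achievement in the numerator, and a long time delay and high effort in the denominator.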
(25:23):
And then it's a really high amount of effort and focus on the government's part. You are now focusing on the platform instead of the capabilities that you're trying to deliver to Veterans, to the energy sector, to warfighters... And so, folks, DIY is a low value endeavor for you until this equation changes. Until you can get access to all the talent, and you can do it in a cost-effective way. And all of those things...just don't DIY. It's a really, really bad journey. Now, you can do some things to change that, if I just can't talk you out of DIY. What I would say is, at Rise8, we charge a way higher rate than $175. I'll just say that. I make no apologies for it. But that can lead to a corresponding increase in cost, but a disproportional increase in the likelihood of achievement.
(26:19):
So, just paying a little bit more, like paying, let's just say 50% more, I'll even do like 30% more, is going to get you a level of talent that, and there's good research on this...is about a 4x improvement. So that increase in salary translates to 4x more performance on the individual level. And then if you can hire a contractor that knows how to organize those people into effective teams, that's when you can get to 10x. There's no 10x engineers, but there are 10x teams, and that is going to then lead to a disproportionately lowered time delay. And so now you can increase your value proposition. So if I can't talk you out of DIY, at least put out really strong rates. And I'm telling you $250 to $350 is where you should be playing, and that still might not get you what you need.
(27:09):
So what's the solution? Don't build when you can buy. Don't buy when you can rent. We have trouble renting, I'll get to that. But a buy example: this is a DOD customer that I'm familiar with. They bought platform licenses. So instead of building their own platform, they were spending about four and a half million a year in platform licenses. They had an eight person platform ops team to run that platform once it was installed, $2.9 million using the rates from before. And then they did a secure release team. So we need a path to prod for this and that can't come out of the box because we're a little bit bespoke. Okay, great. So $2.2 million, a total of $9.6 million, 50% cost savings right out of the box. That's conservative. A high likelihood of success and a six month time to value. And they did do it. So a hundred percent success rate. No, that's not how that works.
(28:03):
Then the best part about this is, I would say, I don't want to say low stress, but moderate stress compared to the high stress of DIY that we see, and a moderate level of effort below the value line. Platform is below your value line. Your value line is serving capabilities to your users, not building platforms.
(28:22):
So, I want to do a hypothetical rent example though. If we had a true PaaS model, something I want to point out is Google did something that's pretty awesome, they stood up to the government for years on physical segregation for GovCloud. And everybody thought they were crazy. I kind of thought they were crazy. But they stood their ground, and in the end, they convinced the government to move to a logical segregation model all the way up to the secret level, probably top secret in the future. And what that means is now government workloads are running on true public cloud, with logical segregation.
(28:59):
It's secure. But that means that we now have the economy of scale. Google didn't have to go out and buy servers dedicated to the government and teams dedicated to the government. And so, I think we'll see all the vendors move in that direction, but we all owe Google a huge thank you for doing that because it came with a huge cost of delay, by the way. The opportunity cost - it caused them to enter the market almost three years later than everybody else, but they stood their ground. And I think that we'll all be better for it.
(29:28):
But, so let's just say now that we can do platform subscriptions instead of a buy, which means that it's a managed service offering. I'll use Anthos since we're talking about Google just to give them a little love. So let's just say we use Anthos. Now we can get rid of that eight-person platform ops team because it's coming to us as a managed service offering.
(29:48):
So now we're at a 67% conservative cost savings. Renting is even better than buying. It has a much higher likelihood of success, immediate time to value, very low stress, and effort below the value line. And then there's an interesting thing that unlocks. I mentioned economy of scale, but now we can achieve economy of scale in the platform layer too, not just at the infrastructure layer. So maybe the cost of those subscriptions in the future, as we invest more in this, goes down to $3 million, and now we're at a 75% cost savings. So you can see, don't build when you can buy, don't buy when you can rent.
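The build, buy, and rent comparison comes down to a few sums. The sketch below uses the talk's rounded figures in millions of dollars per year against the rounded $20 million DIY baseline. It assumes the subscription initially costs the same $4.5 million as the licenses, which the talk implies but does not state.

```python
DIY_BASELINE = 20.0  # $M per year, the talk's rounded DIY labor figure


def savings_vs_diy(total_musd, baseline=DIY_BASELINE):
    """Fraction saved relative to the DIY baseline."""
    return 1 - total_musd / baseline


options = {
    "buy": {
        "licenses": 4.5,
        "platform ops team (8 people)": 2.9,
        "secure release team": 2.2,
    },
    # Assumption: subscription priced like the licenses, ops team no longer needed.
    "rent": {"subscriptions": 4.5, "secure release team": 2.2},
    # The talk's hypothetical future price once economies of scale kick in.
    "rent at scale": {"subscriptions": 3.0, "secure release team": 2.2},
}

for name, items in options.items():
    total = sum(items.values())
    print(f"{name}: ${total:.1f}M per year, {savings_vs_diy(total):.0%} below DIY")
```

The results land within a point or so of the 50%, 67%, and 75% quoted on stage, which are rounded.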
(30:25):
Now I want to talk about a team, it's actually the original team that I talked about that did a buy and deploy. They decided that that platform we just can't do...we're locked into the vendor. We got vendor lock. We need to build our own Kubernetes platform. And they tried. They have now spent four years and over a hundred million dollars trying to migrate from a working COTS platform to a DIY Kubernetes platform. The platform is still not in production. Not a single workload in production. Nothing. Incredible stress, and effort below the value line. In fact, the platform organization grew to be as large as the application side of the organization. And it's diverted all of their money and focus from delivering mission applications, and the warfighters are losing. Don't be that guy.
(31:11):
So, I mentioned total cost of ownership. I'm not going to go into as much detail on this one, but I at least want to give you an overview. Just maybe a quick sidebar here. And that's that there's a cost to develop, and that's to develop the platform itself, which you can get rid of if you buy one or rent one.
(31:30):
And then there's a cost to develop on the platform, which is a function of how much abstraction that it provides. So if your platform provides more abstraction for developers, provided that they can use those abstractions and they're not actually causing them pains because they need other things, so executed well I will say, that will be a cost savings multiplied by the number of teams that are using the platform.
(31:53):
So a lot of times when people are doing cost of ownership comparisons, they're not accounting for the cost of development on the platform later. There's the cost to operate, and that's all services and workloads. So all the platform services, all the application workloads from day zero to two. Hopefully everybody's familiar with these terms, they're more common now, but most of the government cost estimates I saw in the early days of Kubernetes, like five years ago when it was getting hot in government, people would stand up a cluster and a hello world app and they would be like, alright, "I'm going to extrapolate costs from this."
(32:25):
That does not work well. It is not a linear scaling pattern whatsoever. Day one is getting it installed and running. Day two is all of that operational maintenance, patching, support and operations. Those are what eat everybody's lunch. Day two. Day two wrecks everybody. So make sure that you're costing those out. And then the cost of compliance. And it's the sum of all the teams' costs that have to do compliance. In the federal government, you can inherit controls, common controls inheritance model, and what that means, Rob's going to talk about it in detail next, but it means that just like the cost to develop on the platform, if you can abstract controls away from application teams, now you can multiply that saved toil by the number of teams, and that's a cost savings on the compliance side of the house. And the compliance side of the house is not only a cost in dollars, it also accounts for the largest cost of delay.
(33:20):
The largest amount of delay in the deployment cycle in the federal government is ATO. Hands down. So definitely something to pay attention to. And then you have to account for all the licenses, subscriptions, other direct costs and FTEs. And I say all of them, because some government organizations started using government developers, which is great, I encourage that. But they also use it as a really good accounting gimmick on the books because now this platform over here is using all contractors, they're at $5 million, we're using all government, ours is "free." It's not free. It's still costing you money. And so you should probably use some sort of normalized rate for your government employees to at least track those costs.
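The cost buckets described above can be added up in a small sketch. This is only an illustration under stated assumptions: the field names, the way the buckets are split (for example, treating "operate" as non-labor cost so FTEs are not counted twice), and every number used with it are hypothetical and do not come from the talk.

```python
from dataclasses import dataclass


@dataclass
class PlatformCosts:
    """Annual cost inputs for one platform option (all hypothetical)."""
    develop_platform: float     # building the platform itself; 0 if bought or rented
    develop_on_per_team: float  # per-team cost to develop on the platform
    operate: float              # day-zero to day-two operations, excluding labor below
    compliance_per_team: float  # per-team compliance cost not inherited from the platform
    licenses_and_odcs: float    # licenses, subscriptions, other direct costs
    contractor_ftes: float
    contractor_rate: float      # annual cost per contractor FTE
    government_ftes: float
    government_rate: float      # normalized annual rate, so government labor is not "free"


def total_cost_of_ownership(costs: PlatformCosts, teams: int) -> float:
    """Sum every bucket; per-team costs scale with the number of teams."""
    per_team = (costs.develop_on_per_team + costs.compliance_per_team) * teams
    labor = (costs.contractor_ftes * costs.contractor_rate
             + costs.government_ftes * costs.government_rate)
    return (costs.develop_platform + per_team + costs.operate
            + costs.licenses_and_odcs + labor)
```

Comparing two options with the same `teams` value makes the point from the talk visible: an option staffed entirely by government employees still carries a labor cost once a normalized rate is applied, and better abstraction or control inheritance shows up as a lower per-team figure multiplied across every team.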
(34:03):
Now, this is the current CNCF landscape. And at this point, it's impossible to fit this on a slide anymore. I remember when it used to be much smaller.
(34:10):
I'm glad they provided the handy dandy CNCF landscape guide, which ironically claims that it's not that complicated. But to run all of that by yourself, DIY, it would take an army that you do not have. And so, this is a good rule of thumb here, that if you have a big organization or big scale, and low engineering maturity, which is most of federal government, you should really be looking at PaaS. PaaS is a prerequisite for DevOps outcomes in the federal space. That's my opinion. You could take it to the bank. I don't think we should be looking at end-to-end teams and SRE at this point in our journey. We have a lack of engineering maturity for that. And so PaaS is the way. Now whenever I talk about PaaS, people are like, "oh, Bryon, we've got the vendor lock." And I talked about a team that already went down that road, but I think this comic really sums up what I want to say, which is, "are you actually locked in or are you locked out?"
(35:14):
Locked out of mission outcomes, locked out of your DevOps journey...have you actually measured it? Is it IP lock? Which is I think everybody in government in particular has Oracle horror stories about getting sued. You're literally locked in. Or is it just a switching cost? Which I'll talk about in a minute. How high is it? Compare that to the alternative, which is are you locked out of your DevOps journey? Is there a cost of delay that you're eating? Also, your legacy has lock-in, right? Your free, open source has lock-in, all of your GOTS has lock-in. And in fact, some of the worst lock-in I've seen comes from GOTS, government off the shelf software. Every choice that you make has some degree of lock-in and DIY is the stickiest of all. It's the hardest to get out of. So I would say start with efficacy. Efficiency without efficacy is a total fool's errand, and efficiency comes second.
(36:16):
And then always manage your risks. And I'll talk about how to do that. If you want to know more about this topic, you've probably got somebody in your enterprise that talks about lock-in all the time and gives it as the reason you shouldn't buy anything that you want to buy. Highly recommend this article up on Martin Fowler's blog. Gregory, or sorry, Gregor Hohpe wrote it. It's fantastic. And he gives this two by two of switching costs and unique utility, and makes the case for when you should actually accept lock-in, and when to exercise caution. Not going to go into the details today, but just know that you can measure this, and it's really important. And I'll show you some of the math. He also notes that switching cost has two components, the actual cost of the switch against the likelihood of the switch. So if you're not likely to switch and the cost is low, you can gain incredible agility by adopting commercial solutions.
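The two components of switching cost mentioned above can be combined into a single expected value. This is a minimal sketch, assuming the two are simply multiplied; the function name and the example figures are hypothetical, not taken from the article or the talk.

```python
def expected_switching_cost(cost_of_switch: float, likelihood_of_switch: float) -> float:
    """Weight the cost of a switch by the probability it ever happens."""
    if not 0.0 <= likelihood_of_switch <= 1.0:
        raise ValueError("likelihood_of_switch must be between 0 and 1")
    return cost_of_switch * likelihood_of_switch


# Hypothetical example: a $4M migration with a 25% chance of ever being needed.
exposure = expected_switching_cost(4_000_000, 0.25)
```

The resulting number is what any upfront spending on lock-in avoidance should be weighed against: if the exposure is small, paying a lot to avoid it is hard to justify.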
(37:08):
So how does that look from a numbers perspective? From a risk perspective? As you invest upfront in avoiding lock-in, your liability exposure goes down, right? We want low liability. That sounds great. But then when you add back in those investment costs to the liability, you see that your returns diminish and eventually reverse. And so there's a real sweet spot here of finding what opinions you can accept from the commercial sector and what you need to do on your own. You'll want to reach a balance between accepting this risky business, and over-investing in trying to mitigate lock-in. And then I always like to talk about the fact that free open source is free like a puppy. You get locked into raising it. If you don't have the resources to raise that puppy, don't bring it to your enterprise. It craps on everybody's work, even if it's just number one, you won't be able to convince everybody it's raining.
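The sweet spot in the lock-in trade-off described above can be sketched by adding the upfront investment back to the remaining liability and picking the lowest total. The data points below are invented purely to show the shape of the curve (falling, then rising again); they are not figures from the talk.

```python
def total_exposure(investment: int, remaining_liability: int) -> int:
    """Upfront spend on avoiding lock-in plus the liability that is still left."""
    return investment + remaining_liability


def sweet_spot(options: list[tuple[int, int]]) -> tuple[int, int]:
    """Return the (investment, remaining_liability) pair with the lowest total."""
    return min(options, key=lambda option: total_exposure(*option))


# Hypothetical figures in thousands of dollars: each extra step of investment
# buys a smaller reduction in liability than the step before it.
options = [
    (0, 10_000),
    (1_000, 6_000),
    (2_000, 4_000),
    (3_000, 3_500),
    (4_000, 3_200),
    (5_000, 3_100),
]
best = sweet_spot(options)
```

With these made-up numbers, liability keeps falling as investment rises, but the combined total bottoms out partway along and then climbs, which is the "returns diminish and eventually reverse" pattern.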
(38:03):
If you raise a puppy right though, it can be a great companion, right? And it can make life better. But you have to know what it's going to take. You have to go in eyes wide open. You probably don't want a hundred free puppies. You don't want the whole CNCF landscape, I promise. Don't be that person. Know what you can handle. Know when to hire trainers. Nowhere though, is this more true than in the platform space.
(38:22):
So, just to recap, we covered the five reasons why we need platforms. So when done right, they help us deliver on the promise of early and continuous delivery of valuable software to our customers with high quality and reduced risk. We defined the platform. It's a foundation on which we can deliver high-quality product features at a higher pace, with reduced coordination costs and reduced risk. We talked about those major keys to platform and what platform teams should be focused on, some of the challenges that they run into, how to measure them, and the capabilities.
(38:59):
But I want to hit...just go back to the platform strategy and team topologies. This two by two is just a really good rule of thumb for most of govtech. Please use PaaS. PaaS is a prerequisite for DevOps outcomes. I just really want to emphasize that in the federal space. And you should either buy and operate, or rent. Another thing, remember that good engineering costs a lot, bad engineering costs even more. Hopefully, I convinced you that $175 an hour rates are not even enough. And I have seen platform and cloud contracts come in at $100 an hour rates, all across the enterprise. It's super common. Go check, ask your enterprise. Find out about their contract, and look at the rates, and you'll know why you're facing a lot of pain.
(39:48):
Another interesting thing though, as you're getting this talent, a lot of times through contracts, put your best talent on platform reliability, not on feature development. Really common that people are like, oh, we got our users. Yes, we love our users, but the best thing we can do for our users is put our best people on platform, and our most junior people on feature development. So I put that out there.
(40:12):
And some parting thoughts, whether you buy or rent, you still need highly skilled operators and administrators or UX will suffer. So even if you're renting, there's still administration that has to be done. We use Google at Rise8 and there's a fair amount of administration that we have to do to administer even just our Google suite. And again, if you hire low quality people to do that, your front end will suffer, and it'll cause downstream impacts. It's hard to get your email account. Imagine how hard it is to get your cloud account. Your cloud can't suck either. I've written about this pretty extensively. Follow me on LinkedIn for a good time. But I think it's really important to emphasize everything that I've said today applies to the cloud layer as well. And we see the same challenges at the infrastructure layer. You also need a killer compliance process. You need some process improvement and automation in that order. This handsome fellow is going to talk to you about that in a minute. But before he does, get in touch with me, connect with me on LinkedIn.