From Operations Anti-Patterns, DevOps Solutions by Jeffery D. Smith
This article covers:
§ Defining DevOps
§ The CAMS model
It’s 11:30pm on a Friday evening when John, the IT Operations manager, hears his phone ring. The ringtone is distinct, one that John has programmed so he instantly recognizes a call from the office. He answers the phone and on the other end is Valentina, one of the senior software developers at John’s office; there’s a problem in the production environment. In the last software release, there was additional functionality that changed how the application interacted with the database. Due to a lack of adequate hardware in the testing environments, the entire application couldn’t be tested prior to release. Around 10:30pm this evening, a scheduled task that only runs quarterly began executing. The job was missed during the testing phase, and even if it hadn’t been, there isn’t enough data in the test environment to create an accurate test. Valentina needs to stop the process, but she doesn’t have access to the production servers. She has spent the last 45 minutes searching through the company intranet site to find John’s contact information. John is the only person Valentina knows who has the production access she needs.
Killing the scheduled task isn’t straightforward. The task usually runs overnight and wasn’t designed to be stopped midway through processing. Because Valentina doesn’t have production access, her only alternative is to dictate a series of cryptic commands to John over the phone. After a few missteps, John and Valentina finally manage to stop the task. The two plan to regroup on Monday to figure out what went wrong and how to fix it for the next quarter. Now both John and Valentina must stay on guard over the weekend in case the behavior repeats itself with another job.
Chances are this story feels familiar to you. Having production code which hasn’t been properly tested feels like a scenario that could have been avoided, especially when it interrupts a team member on their off-time. Why is the testing environment insufficient for the needs of the development group? Why wasn’t the scheduled task written in a way that makes stopping and restarting it a straightforward process? What’s the value of the interaction between John and Valentina if John is going to blindly type what Valentina dictates? Not to mention the two probably skipped the change approval process in the organization. Nothing raises the safety of a change like five people approving something they don’t understand!
The questions raised here have become commonplace enough that many organizations don’t think to examine them in detail. The dysfunction detailed is often accepted as inescapable due to the difference in roles between Development and IT Operations teams. Instead of addressing the core issues, organizations continue to heap more approvals, more process and tighter restrictions onto the problem. In the minds of leadership they’re trading agility for safety, but in reality, they’re getting neither. When was the last time you said “Thank goodness for change control!”? These negative and sometimes wasteful interactions between teams and processes are exactly what DevOps is attempting to solve.
What is DevOps?
These days the question “What is DevOps?” feels like something you should ask a philosopher more than an engineer. I’ll give you the story and the history of DevOps before presenting you with my definition. If you ever want to start a fight at a conference though, you can pose the “What is DevOps?” question to a group of five people, then walk away and watch the carnage. Luckily, you’re reading this and not talking to me in the hallway, and I don’t mind putting my definition out there and seeing what happens. But first, the story.
In 2007, a systems administrator by the name of Patrick Debois was consulting on a large data center migration project for the Belgian government. Patrick was in charge of the testing for this migration, and he spent a fair amount of time working and coordinating with both the development and operations teams. Seeing the stark contrast between how development and operations teams functioned, Patrick got frustrated and started thinking of solutions to this problem.
Fast forward to 2008, when a developer by the name of Andrew Shafer attended the Agile Conference in Toronto. At the conference, he proposed an ad-hoc discussion session called “Agile Infrastructure.” Andrew received such poor feedback on his proposal that he didn’t even attend the session himself. In fact, only a single attendee joined the session: Patrick Debois. Because Patrick was passionate about discussing this topic, he tracked Andrew down in the hallway of the conference, where they had an extensive discussion about their ideas and goals. Directly out of those conversations, they formed a group called the Agile Systems Administrator Group.
In June of 2009, Patrick Debois was back in Belgium, watching a live stream of the O’Reilly Velocity 09 conference. At this conference, two employees from Flickr, John Allspaw and Paul Hammond, gave a talk titled “10 deploys per day. Dev & ops cooperation at Flickr.” Patrick was moved by the talk and was inspired to start his own conference in Ghent, Belgium. He invited developers and operations professionals to discuss different approaches to managing infrastructure and rethinking the way the teams worked together. Patrick called this two-day conference DevOps Days. A lot of the conversations about the conference were happening on Twitter, which limited the number of characters per message to 140. In order to save as many precious characters as possible, Patrick shortened the hashtag to be used on Twitter from #devopsdays to #devops, and with that, DevOps was born.
It’s been more than ten years since that fateful meeting. DevOps has moved beyond small web startups and has begun to penetrate larger enterprises. The success of DevOps brought the most cantankerous enemy of any movement: market forces.
According to LinkedIn Talent Solutions, in 2018 the most recruited job overall, not only in tech, was DevOps Engineer. Considering we’ve defined DevOps as a set of practices, it’s strange how a style of work quickly became a job title. You’ve never heard of an Agile Engineer, because it sounds silly. As transformational as DevOps is, it couldn’t escape market forces. With that much demand, the DevOps job title has led scores of candidates to rebrand themselves as DevOps Engineers. Product marketers are looking to cash in on the DevOps craze. Simple products like metrics and monitoring get rebranded into “DevOps Dashboards”, further diluting the meaning of the word. With the market pulling the term DevOps in different directions, it has splintered into different meanings for different people. I could spend the entire article arguing about what DevOps should and shouldn’t mean, but I’ll instead use the definition that I have proposed above. If you ever see me at a conference and want to see me go on a tirade, ask me what it’s like being a “DevOps Manager.”
What DevOps isn’t
Ironically, it might be easier to define what DevOps isn’t rather than what it is. Thanks to market forces, it will probably fall on deaf ears, but because this is my article, I figure I might as well go for it! For starters, it’s not about tools. If you’re reading this article hoping to learn about Jenkins or Docker or Kubernetes or AWS, you’re going to be sorely disappointed. Feel free to scream into the ether with your disdain.
DevOps isn’t about tools, but about how teams work together. Technology is definitely involved, but honestly the tools are less important than the people. You can install the latest version of Jenkins or sign up for CircleCI, but if you don’t have a solid test suite, the tool is useless. If you don’t have a culture where automated testing is valued, the tool doesn’t provide value. DevOps is about people first, then process, then tools. You need the people on board and ready for change. Once the people are on board, they need to be involved and engaged with creating the process. Once a process is created, you have the necessary input to pick the right tool! Many people focus on the tool first and try to work backwards from there. This is probably one of the top DevOps follies. You can’t choose a tool and then tell the people that they have to change all their processes. Our brains are wired to immediately be hostile to that type of approach. When a tool is launched like that, people feel that it’s happening to them, not through them. This is a major difference in how people approach and accept new ideas. You must have buy-in. In addition, when you get excited about a new tool, you begin applying it to problems you never had. When you buy a new table saw, suddenly everything in your home becomes a construction project. It’s the same thing with software tools.
All this is to say that the major focus of this article, and of DevOps, is people and their interactions. Although I may reference specific tools here and there, the article tries to avoid giving specific examples around architecture. Instead, the examples focus on capabilities, regardless of what tool provides that capability. The DevOps philosophy aims to place people first when addressing problems and to keep them top of mind throughout the transformation process.
CAMS, the pillars of DevOps
DevOps is structured around four pillars of attention and focus. Those pillars are Culture, Automation, Metrics and Sharing, or CAMS for short. Recently some people, including Andrew Clay Shafer himself, have taken to calling it CALMS, with the L standing for Lean. I prefer the original version. I highlight this difference in case you hear the term in the wild. As illustrated in figure 1, the pillars for DevOps are crucial to holding the entire structure up.
Figure 1. Culture, Automation, Metrics and Sharing are all necessary for a successful DevOps transformation.
- Culture is about changing the norms by which your teams operate. It might be new communication patterns between teams, or it might be entirely new team structures. Cultural changes are dictated by the type of cultural problems you have. Don’t underestimate the value and impact of a company’s culture on its technology outcomes. Most problems are people problems, not technology problems.
- Automation isn’t only about writing shell scripts. Scripting is definitely part of it, but automation is about freeing engineers from the mundane. It’s about empowering people to do their jobs safely and autonomously. Automation should be used as an expression of your cultural views on how work gets done within your organization. Saying “automated testing is a cultural value” is one thing, but backing it with automated checks and merge requirements enforces that cultural norm. When implemented properly, it sets a new standard for how work is completed.
- Metrics are the way you tell if something is working. The absence of errors isn’t sufficient. Metrics are used as a cultural reinforcer for how we evaluate our systems. It’s not enough for order processing to not produce errors; we should be able to show successful orders flowing through the system as well.
- Sharing is the idea that knowledge wants to be free! Humans often learn best when they’re teaching something to someone else. Sharing is about creating that (ready for it?) cultural reinforcer! Knowledge management is incredibly important in a world where we continue to build more and more complex systems.
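To make the Metrics pillar a little more concrete, here is a minimal Python sketch of the order-processing point: zero errors on its own tells you nothing if zero orders have succeeded either. The `OrderMetrics` class and its status labels are hypothetical, invented purely for illustration; they don't come from any particular monitoring tool, and a real system would likely publish these counts to whatever metrics platform the team already uses.

```python
from dataclasses import dataclass


@dataclass
class OrderMetrics:
    """Hypothetical in-memory counters for an order-processing job."""

    succeeded: int = 0
    failed: int = 0

    def record(self, ok: bool) -> None:
        # Count both outcomes so that success is observed, not assumed.
        if ok:
            self.succeeded += 1
        else:
            self.failed += 1

    def status(self) -> str:
        # Any failure is reported first.
        if self.failed > 0:
            return "failing"
        # No failures and no successes: nothing is known to be working.
        if self.succeeded == 0:
            return "unknown"
        return "healthy"


metrics = OrderMetrics()
print(metrics.status())  # prints "unknown": no errors, but no orders either
metrics.record(ok=True)
print(metrics.status())  # prints "healthy": a successful order was observed
```

The design choice worth noticing is the "unknown" state. A check that only looks for errors would report the idle system as fine, which is exactly the gap the Metrics pillar is meant to close.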
That’s all for this article.