Many people have tried to define what DevOps is, and it is challenging. There is no “manifesto” like there is for Agile. I personally like the CALMS acronym, but it also has some challenges. So I typically start talking about DevOps at a very high level: it involves People, Process, and Products/Tools, intentionally in that order. If the People and Process aspects of DevOps are broken or not addressed at all, then even amazing products/tools (Microsoft or not!) will have only minimal impact on the business, and you may not be on a DevOps journey at all. No tool, product, or DevOps practice implemented alone makes you “DevOps”! For more on this, check out “the three musketeers” blog post.
All of this being said, it is hard to wrap your head around DevOps, partly because the definition lacks clarity, partly because there are so many products and tools related to DevOps, and partly because people/process discussions are hard to correlate to tools/products. For instance, how do you go from talking about things like organizational structures, ITIL processes, and a blameless post-mortem culture to a concept like “create a Chef recipe for a cloud service application”? Even assuming you’ve got the people and process aspects of DevOps sorted out, where do you start?
Before you begin any DevOps journey, you first have to assess where you are, if you haven’t already. This might include one or more of the following:
- Gathering Metrics – like MTTR, MTTD, change/deployment rate, etc.
- A DevOps Self-Assessment – such as Microsoft’s DevOps Self-Assessment
- A consulting engagement – where the consultants might do something such as creating a value stream map
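To make the metrics bullet a bit more concrete, here is a minimal Python sketch of how MTTD, MTTR, and deployment rate could be computed from incident and deployment records. The `Incident` fields and the sample data are hypothetical, and teams define these metrics differently; this sketch assumes MTTD runs from failure start to detection and MTTR from failure start to resolution.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    # Hypothetical record layout; real incident tools expose different fields.
    started: datetime    # when the failure began
    detected: datetime   # when monitoring or a person noticed it
    resolved: datetime   # when service was restored


def mean_minutes(deltas):
    """Average an iterable of timedeltas and return the result in minutes."""
    deltas = list(deltas)
    return sum(d.total_seconds() for d in deltas) / len(deltas) / 60


def mttd(incidents):
    """Mean time to detect: failure start until detection."""
    return mean_minutes(i.detected - i.started for i in incidents)


def mttr(incidents):
    """Mean time to recover: failure start until resolution (one common definition)."""
    return mean_minutes(i.resolved - i.started for i in incidents)


def deployments_per_week(deploy_times, period_start, period_end):
    """Deployment rate over a period, normalised to a 7-day week."""
    count = sum(1 for t in deploy_times if period_start <= t < period_end)
    weeks = (period_end - period_start) / timedelta(days=7)
    return count / weeks


# Made-up sample data for illustration only.
incidents = [
    Incident(datetime(2024, 1, 3, 10, 0), datetime(2024, 1, 3, 10, 10), datetime(2024, 1, 3, 11, 0)),
    Incident(datetime(2024, 1, 9, 14, 0), datetime(2024, 1, 9, 14, 30), datetime(2024, 1, 9, 16, 0)),
]
deploys = [datetime(2024, 1, d) for d in (2, 4, 8, 11)]

print(mttd(incidents))   # 20.0
print(mttr(incidents))   # 90.0
print(deployments_per_week(deploys, datetime(2024, 1, 1), datetime(2024, 1, 15)))  # 2.0
```

Whatever definitions you pick, the useful part is capturing the same numbers before and after you change a practice, so the assessment gives you a baseline to compare against.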
After this, DevOps practices (~200 level) can help bridge the gap between the high-level (~50 to 100 level) DevOps people/process discussions and the (~300 level) product/tool discussion. They can help companies focus on specific areas of improvement and help scope the product/tool selection. DevOps practices are a collection of mostly well-known industry practices that can be implemented using a variety of products/tools (Microsoft or not). When you implement these practices, they will give you some form of business benefit and help with an aspect of your DevOps journey. To be clear and to reiterate: just because you implement one or more of these DevOps practices doesn’t mean “you’re now doing DevOps” if you haven’t addressed the people/process aspects first. For instance, developers on a feature team could implement infrastructure as code for their dev/test environments in isolation and then “lob the application over the wall” to an operations team that deploys it to production without using the infrastructure as code at all.
More specifically, here are what I call fundamental DevOps practices in no particular order:
- Infrastructure as Code (IaC) – is the practice in which the techniques, processes, and tool sets used in software development are leveraged to manage the deployment and configuration of systems, applications, and middleware. A significant number of testing and deployment defects occur when developers’ environments defining the application and underlying infrastructure differ from testing and production environments. Standardizing these environment definitions, putting them under version control, and deploying and configuring the infrastructure and application automatically from the code in version control, yields immediate benefits in consistency, time savings, error rates, and auditability.
- Continuous Integration (CI) – is the practice of merging all developers’ working copies of code into a shared mainline, producing a new build upon code check-in. Ideally, CI also involves libraries of integration or unit tests that are triggered automatically when new code is checked into source control. Once the automated tests complete successfully, a known good build of the software is produced.
- Automated Testing – is the practice where various tests such as load, functional, integration, and unit tests run automatically, either after you check in code (i.e. attached to CI) or through some other means of firing off one or more tests against a specific build or app. Manual tests can add value to your software, but they could be considered “waste” in a value stream since they slow down the delivery of value to customers and can become a significant bottleneck as your velocity of code changes increases. An automated test adds value in the value stream by efficiently raising code quality and finding defects before the software reaches customers’ hands.
- Application Performance Monitoring/Management (APM) – is the practice of having visibility into key metrics about your application as well as alerts and logging about the health of your applications. These metrics, alerts, and logs enable you to react in a timely manner to changing or business-impacting conditions. Ideally, these items are accessible via a variety of user-friendly interfaces that are easy to navigate and provide drill-downs to help facilitate taking action as well as root-cause analysis down to the line of code. While the user interfaces tend to be targeted more at Operations and business owners, having the right data requires collaboration with Development to appropriately instrument applications to deliver this data.
- Continuous Deployment (CD) – is the practice that usually comes after CI and can be implemented to push a new known good build to a single environment, either automatically or via automation that an authorized user can schedule. Note: Some people state that continuous deployment should mean deployment all the way through to production, but IMO this makes things really confusing (e.g. if I’m “continuously deploying” after every check-in to a single QA environment, does that not count, and if not, how do I describe it? If I am continuously deploying to all environments leading to production, am I then doing release management?). On a related note, Continuous Deployment and Continuous Delivery are often confused, partly because people tend to use the terms interchangeably (but incorrectly) and both could be abbreviated as CD. In my opinion, Continuous Delivery should not be abbreviated as “CD”; it is a special combination of these DevOps practices working together: Continuous Integration + Continuous Deployment + Automated Testing + Release Management. Defining continuous delivery this way still fits with the Continuous Delivery Wikipedia link and The Continuous Delivery book, allows more granularity and simplicity in the definition, and helps break apart the work of implementing the entire chain of practices.
- Release Management – is the practice which provides the ability to automate deployment of new applications as well as changes to applications across managed environments. Release Management facilitates packaging these changes into known, documented releases that are deployed via workflow through pipelines of ordered release stages (Release/Deployment Pipelines). The pipelines enable approvals, traceability and rollback if required. Lastly, roles, responsibilities and access levels for various artifacts and actions can also be managed through Release Management.
- Configuration Management – is the practice for establishing and maintaining consistency of a product’s performance, functional and physical attributes with its requirements, design and operational information throughout its life.
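As one illustration of the list above, the Release Management idea of ordered stages with approvals, traceability, and rollback can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the `Stage` and `Pipeline` classes, the stage names, and the `deploy` callback are all hypothetical and do not correspond to any particular product’s API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Stage:
    name: str                        # hypothetical stage name, e.g. "dev", "qa", "prod"
    requires_approval: bool = False  # gate that a person must sign off on


@dataclass
class Pipeline:
    stages: List[Stage]
    deployed: Dict[str, Optional[str]] = field(default_factory=dict)  # stage -> build
    history: List[str] = field(default_factory=list)                  # audit trail

    def release(self, build: str, deploy: Callable[[str, str], bool], approved_for=()):
        """Promote `build` through the stages in order.

        `deploy(stage_name, build)` is a caller-supplied function that returns
        True on success. Promotion stops at the first stage that lacks approval
        or fails; a failed stage is rolled back to its previous build, if any.
        Returns the names of the stages that now run `build`.
        """
        done = []
        for stage in self.stages:
            if stage.requires_approval and stage.name not in approved_for:
                self.history.append(f"{build}: waiting for approval at {stage.name}")
                break
            previous = self.deployed.get(stage.name)
            if deploy(stage.name, build):
                self.deployed[stage.name] = build
                self.history.append(f"{build}: deployed to {stage.name}")
                done.append(stage.name)
            else:
                # Roll back to the last known good build for this stage.
                if previous is not None:
                    deploy(stage.name, previous)
                self.history.append(f"{build}: failed at {stage.name}, rolled back to {previous}")
                break
        return done


def fake_deploy(stage_name, build):
    # Stand-in for a real deployment tool call; always succeeds here.
    return True


pipeline = Pipeline([Stage("dev"), Stage("qa"), Stage("prod", requires_approval=True)])
print(pipeline.release("build-41", fake_deploy))                         # ['dev', 'qa']
print(pipeline.release("build-41", fake_deploy, approved_for={"prod"}))  # ['dev', 'qa', 'prod']
```

The `history` list stands in for traceability: every promotion, approval wait, and rollback leaves a record. A real release tool would also handle a failing rollback, which this sketch ignores.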
Other DevOps practices, which I’d subjectively say fall outside the “fundamental” bucket (although they are important) and may be less well known, might include (in no particular order):
- Advanced Monitoring – is the practice of implementing continuous monitoring techniques that go beyond basic ICMP ping or HTTP 200 status code checks, such as synthetic/active monitoring, to ensure availability and performance. Synthetic monitoring records critical user interactions with an application and then continuously replays and monitors them to ensure the performance and availability of everything required to complete those user interactions.
- Capacity Management
- Feature Flags
- Self-Service Environments – is the practice where someone can go to some form of hosted interface/portal/site to request a new environment and then immediately upon request or approval, there is an automated process to provision the entire environment. This environment might be a production-realistic infrastructure with the latest build of software for instance.
- Automated Recovery (Rollback & Roll-Forward) – is the practice of being able to very quickly recover from failures by rolling back to a known good state of the infrastructure and application code or “rolling forward” by quickly identifying the issue and automating the complete deployment of the fix to the infrastructure or app code.
- Hypothesis Driven Development – is a way of thinking that can include a number of DevOps practices such as:
  - Testing in Production – is a practice where you deploy new parts of your code to only a fraction of the total user base in your production environment.
  - Fault Injection – is a practice where you try to break application code or highly-available infrastructure which supports your application. Most people know of this from Netflix’s Chaos Monkey and the Simian Army.
  - Usage Monitoring/Telemetry – is the practice where you gather actual user data/telemetry on what features/capabilities people are actually using in production.
  - A/B Testing (aka canary testing)
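Exposing new code to only a fraction of users, as described under Testing in Production, is often done with a feature flag plus deterministic bucketing. Here is a minimal Python sketch of that idea, assuming a hypothetical flag name (`new-checkout`) and made-up user IDs. Hashing the flag and user together keeps each user in a stable bucket, so the same person sees the same behaviour on every request.

```python
import hashlib


def bucket(flag_name: str, user_id: str) -> int:
    """Map a (flag, user) pair to a stable bucket in the range 0-99."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled(flag_name: str, user_id: str, rollout_percent: int) -> bool:
    """Return True if this user falls inside the rollout percentage."""
    return bucket(flag_name, user_id) < rollout_percent


# Made-up user IDs for illustration only.
users = [f"user-{n}" for n in range(1000)]
exposed = [u for u in users if is_enabled("new-checkout", u, 10)]
print(len(exposed))  # roughly 100 of 1000; the exact count depends on the hash
```

Because the bucket is fixed per user, raising `rollout_percent` from 10 to 50 only adds users and never removes anyone who already had the feature. Setting it to 0 acts as a quick off switch.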
How do you determine which practices to focus on first? I did give my “Top 5” list for Ops, but truthfully it depends on where you are (remember the assessment piece) and what your goals are. I could describe more of the benefits of each practice and explain how some of these practices relate to or rely on one another, but I’m trying to keep this blog post shorter than a novel :). Another useful way to think about the practices is Gene Kim’s “3 ways”; all of these practices fit under one or more of them.
Still reading? Would enjoy getting your feedback on what you think via comment to this post… i.e. Like or dislike these being called “DevOps practices”? Missing some key practice? Think of things differently? Was this helpful for you?