The “traditional” Operations role is fading as cloud computing is more widely adopted and organizations realize they need a DevOps transformation to deliver software and services to their customers.
Think for a minute – what might Ops look like for a rapidly growing startup that is developing a cloud-based service and application? Do they have on-premises servers? A manual process that takes days or weeks to provision a VM-based application environment? Tape backups? Likely none of these, of course.
So what? Our company is special and not a startup, so I can keep my functions in my Ops role. Start your special excuse here: We have a very complex application with lots of dependencies. We have super crazy security/regulatory requirements. Guess what – for that new app your company is starting to create, it can work the same way a startup does. Also, what happens if a startup competitor is shipping code from ideation to production 30x faster and completing those deployments 8,000x faster than you by operating in this way (see the Puppet Labs 2013 & 2014 DevOps reports)? Oh, and one more thing: you can apply DevOps principles to non-cloud applications too!
Alright, so what are the top newer skills an “evolved Ops” person will need to survive and thrive in a DevOps world? An evolved Ops person will definitely need plenty of soft skills to better interact with the business owners and developers, but for this article I’d like to focus on the technical. Traditional Ops technical skills like DNS, network monitoring, security, and scripting are still useful in the new world. The skills below are likely different from or newer than those of a “traditional Ops” role. In each of these areas, developers and operations should work together, with different levels of knowledge and responsibility.
- Configuration as Code – includes infrastructure configuration as code and application configuration as code. This means that only code is required to go from nothing to the infrastructure and application being fully provisioned and configured with the latest build/bits. There are a ton of different ways to accomplish this, and it is possible to do it without cloud computing, although it is far easier with cloud. I’d argue this is the most important skill because so many DevOps practices are built upon or made much easier with this foundation. The developer is going to have in-depth knowledge of what the application needs in order to work, and the Ops person is going to have expert knowledge of the infrastructure, scale, security/compliance, logging, and monitoring definitions required for production. Furthermore, if you do this using technologies like Chef, Puppet, or PowerShell DSC, you’d also be enabling configuration management practices that ensure your environments are always exactly what you’ve defined them to be (declarative).
Some example tools/technologies to enable this practice:
Ansible, Azure Resource Manager Deployment Templates, Azure xPlat CLI, Chef, Docker, Microsoft Azure Markup Language (MAML), Packer.io, Puppet, PowerShell, SaltStack, Vagrant.
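To make the “declarative” idea in the configuration as code item above a little more concrete, here is a minimal sketch in Python. It is not how Chef, Puppet, or PowerShell DSC are implemented; the `Environment` class, the `DESIRED` dictionary, and the `converge` function are all hypothetical names invented for illustration, and the “environment” is just an in-memory object standing in for a real server. The point it tries to show is that you describe the end state once, and running the same code repeatedly only changes what has drifted.

```python
from dataclasses import dataclass, field


@dataclass
class Environment:
    """Hypothetical in-memory stand-in for a real server or cloud environment."""
    packages: set = field(default_factory=set)
    settings: dict = field(default_factory=dict)


# The desired state: the part you would check into source control
# alongside the application. Package and setting names are made up.
DESIRED = {
    "packages": {"nginx", "myapp"},
    "settings": {"log_level": "info", "max_connections": 200},
}


def converge(env, desired):
    """Bring env in line with desired and return the list of changes made."""
    changes = []
    # Install anything that is declared but missing.
    for pkg in sorted(desired["packages"] - env.packages):
        env.packages.add(pkg)
        changes.append(f"install {pkg}")
    # Remove anything that is present but not declared.
    for pkg in sorted(env.packages - desired["packages"]):
        env.packages.remove(pkg)
        changes.append(f"remove {pkg}")
    # Correct any setting that differs from its declared value.
    for key, value in sorted(desired["settings"].items()):
        if env.settings.get(key) != value:
            env.settings[key] = value
            changes.append(f"set {key}={value}")
    return changes


if __name__ == "__main__":
    env = Environment(packages={"telnet"})
    print(converge(env, DESIRED))  # first run applies the changes
    print(converge(env, DESIRED))  # second run finds nothing left to do
```

Running `converge` a second time returns an empty list, which is the property that lets a configuration management tool be run on a schedule to keep environments matching their definition.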
- Application Architecture – An evolved Ops person would at a minimum understand how the fundamental architecture of the applications they support works, because they need to quickly and intelligently work out whether a problem is due to the infrastructure or the app code itself in order to get it resolved. Furthermore, if they are responsible for the resilience of the application (e.g. scalability, fault-tolerance, performance), they should understand what infrastructure needs to be created and how it should be configured, and be able to put that back into the config as code. A rock star Ops person might find bugs in the way the application works (e.g. memory leaks or disk IO bottlenecks) or even go as far as creating a fault injection system for the underlying infrastructure. Note: All this being said, I’m not saying Ops needs to be a full-blown architect or understand every intricate detail of how apps work.
- Self-Service Environments – This is generally not a new concept in the IT world, but where it is implemented at all, it’s likely either not fully automated, slow to actually deliver resources, or limited to the base infrastructure without the current build of the app on top of it. The infrastructure and application should ideally be provisioned in minutes or hours in a fully automated fashion. If config as code is already in place, it becomes much easier to put a wrapper of automation and process around it. Self-service environments help developers develop and test software more quickly and help the business control costs.
Some example tools/technologies to enable this:
Azure Automation, Azure Scheduler, System Center Orchestrator, System Center Service Manager, System Center Service Management Automation (SMA), Visual Studio Release Management
- Application Performance Monitoring – Availability monitoring is a given and has been around in the Ops world for a very long time, but it really only scratches the surface when it comes to quickly getting to the root cause of outages. Performance monitoring has not been around quite as long and can help get to the line of code where a problem exists, along with rich data like the call stack leading up to the issue. This can be a very powerful tool not only in production, but also in dev/test or performance testing environments.
Some example tools/technologies to enable this:
Microsoft Application Insights, System Center APM, New Relic
- Release Management – An evolved Ops person should be helping in some capacity to enable continuous deployment within the organization. Once a developer checks in code, that should result in a good build, and if the build passes it should be promoted through the running environment(s) all the way to production. If continuous deployment has already been implemented, they’d still need to understand the release management system for purposes of security/compliance/auditing, for recovery mechanisms like rollback or roll-forward, and to make sure the system is highly available.
Some example tools/technologies to enable this:
Hudson, Jenkins, Team Foundation Server (TFS), Visual Studio Online (VSO), Visual Studio Release Management (on-premises or cloud)
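The promotion and rollback behavior described in the release management item above can be sketched roughly as follows. This is only an illustration in Python, not how Jenkins, TFS, or Visual Studio Release Management actually work; the `promote` function, the `is_healthy` callback, and the environment and build names are all assumptions made up for the example.

```python
def promote(build, environments, is_healthy, deployed):
    """Deploy build to each environment in order, stopping at the first failure.

    deployed maps environment name -> build currently running there.
    is_healthy(env, build) is a hypothetical check, e.g. smoke tests.
    Returns a log of what happened, useful for auditing.
    """
    log = []
    for env in environments:
        previous = deployed.get(env)
        deployed[env] = build
        if is_healthy(env, build):
            log.append(f"{env}: deployed {build}")
            continue
        # Roll back: restore whatever was running before this attempt.
        if previous is None:
            del deployed[env]
        else:
            deployed[env] = previous
        log.append(f"{env}: {build} failed, rolled back to {previous}")
        break  # never promote a failing build any further
    return log


if __name__ == "__main__":
    running = {"dev": "1.0", "test": "1.0", "prod": "1.0"}
    # Pretend the new build only misbehaves in production.
    for line in promote("1.1", ["dev", "test", "prod"],
                        lambda env, build: env != "prod", running):
        print(line)
    print(running)
```

Even in a toy like this, the Ops concerns from the bullet show up: the returned log is the audit trail, the restore step is the rollback mechanism, and the ordered environment list is the path a build takes to production.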
Underlying skills that follow from the above:
- Source code control system fundamentals (e.g. Git or TFVC)
If you’re helping to create configuration as code, then you need to check it into source control alongside the application. You’ll also likely need to work with builds and other parts of your release pipeline.
- Software development lifecycle fundamentals
You now need to understand things like what a build, a sprint, and a scrum meeting are, and be able to work with and file bugs in whatever system your developers are using.
Where can you go to learn these skills? There are a TON of different places: resources I have personally created, the links embedded above, and other blog posts here, with more to come. Also, check out the 2014 DevOps summary post highlighting some of the most prevalent resources.