The tragedy of DevOps

Ever since I can remember, I have had a drive to optimize my daily life, from how I do my groceries to how I organize my work desk. Naturally, when I first heard of DevOps years ago, it immediately got my attention. It promises increased development speed and quality. So how come many organizations try it and fail to see any improvements? Sometimes they even get worse results.

The tragedy is that, as engineers, we tend to focus on the technical aspects of problems and ignore the non-technical ones. Many organizations fail because people don’t fully understand the DevOps Ways. Here are some of the most common misconceptions:

DevOps means a team of administrators who are focused on automation.
DevOps means automating everything.
DevOps means cool tools.
DevOps means working with cloud infrastructure.

The missing aspects

In my opinion, the non-technical aspects of DevOps are as important as the purely technical ones, if not more so. Let’s talk about those for a minute.

Local vs global optimization

Traditional IT development and operations organize teams by domain of expertise. Each team's goals are usually focused on local optimizations. This means a team might improve its own domain without considering whether the change degrades another. This usually causes tension between teams.

It's just like Ivan Krylov’s Fable “Swan, Pike & Crawfish”. Teams start to pull in opposite directions and get nowhere.

Original image from obrazovaka.ru.

Consider a business that aims to provide a stable and secure service (or product) with fast feature development. Now let's see how that translates to the teams.

Operations

The main goal for the Operations team is to achieve high resiliency and availability.

Operations might choose to limit the number of changes to production due to stability concerns. As we all know, most incidents can be traced back to a change.

Development

Fast feature delivery is the main goal of the Development team.

Code quality might be lowered for the sake of speed. Bugs can creep into production due to a lack of (or limited) testing. The team can also lose interest in how the system behaves in production.

Security

High security standards are the goal of the Security team.

Security might choose to implement controls that bring minimal benefit without considering how they might affect other teams. This can translate directly into a negative impact on the business.

What does that mean?

Each team has a goal that is in line with the business. But each team's local optimization leads to global degradation.

How should we address this?

Usually, you will hear that ‘you need to break the silos’ and ‘implement cross-functional teams’. That is not necessarily needed. It all boils down to trust and a common understanding of what the business goals are and how your decisions affect the other teams (and thus the business).

Feedback loops

Feedback loops are something that can be seen in all aspects of life. Some scientists even believe that they are the basis of consciousness.

How do feedback loops fit in DevOps?

No one is perfect, and that is normal. It becomes a problem if you stop trying to improve yourself.

This applies to systems as well. To improve a system, you need to observe it, identify weak points, and make enhancements. This is known as the Plan-Do-Check-Act (PDCA) or Deming cycle.

Let's look at two examples: one where the feedback loop is missing and one where it is ineffective.

Missing feedback loop

Consider a typical siloed organization in which the test and production environments have different architectures. Even if they are set up similarly, they often drift over time. This is usually caused by manual changes that are not applied to all environments. This leads to entropy, which sooner or later will bite you.

The Operations team might apply a web server configuration change after a production incident. This change should be communicated to Development and applied to the test environment. If this is not done, the difference can easily mask new issues that will only be encountered later in production.
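One way to close this loop is to compare the environments automatically instead of relying on someone remembering to pass the change along. The following is only a minimal sketch: the function name `find_drift`, the `MISSING` marker, and the configuration keys and values are all made up for illustration, and a real setup would load the settings from wherever each environment keeps them.

```python
# Hypothetical sketch: report settings that differ between two environments.
MISSING = "<missing>"  # marker for a setting that one environment lacks


def find_drift(test_config, prod_config):
    """Return {setting: (test_value, prod_value)} for every mismatch."""
    drift = {}
    for key in sorted(set(test_config) | set(prod_config)):
        test_value = test_config.get(key, MISSING)
        prod_value = prod_config.get(key, MISSING)
        if test_value != prod_value:
            drift[key] = (test_value, prod_value)
    return drift


# Illustrative values only: production was changed by hand after an incident.
test_env = {"worker_connections": 1024, "keepalive_timeout": 65}
prod_env = {
    "worker_connections": 4096,
    "keepalive_timeout": 65,
    "client_max_body_size": "20m",
}

for setting, (test_value, prod_value) in find_drift(test_env, prod_env).items():
    print(f"{setting}: test={test_value} production={prod_value}")
```

Run on a schedule, a check like this turns a silent manual change into a visible signal for both teams.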

Ineffective feedback loop

The Development team finishes a new feature, and functional tests are successful in the test environment. The new feature is deployed to the production environment, and then Operations notices a large increase in database response times, which impacts the business.

The issue was detected in production, which means that the Development team got feedback after the issue was visible to clients and already causing trouble for the business. Such a feedback loop is ineffective. The Development team should have been aware of the database response time increase before the feature reached production.
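To make the idea of earlier feedback concrete, here is a rough sketch of a check that could run in the test environment before a release. It assumes response times are already being measured there. The function name `latency_regressed`, the 20% tolerance, and the sample numbers are assumptions for illustration, not a recommendation.

```python
import statistics


def latency_regressed(baseline_ms, candidate_ms, tolerance=0.2):
    """Return True if the candidate's median response time exceeds the
    baseline median by more than the given tolerance (0.2 means 20%)."""
    baseline = statistics.median(baseline_ms)
    candidate = statistics.median(candidate_ms)
    return candidate > baseline * (1 + tolerance)


# Made-up database response times in milliseconds.
baseline_ms = [12.0, 14.0, 13.0, 15.0, 12.5]   # current release
candidate_ms = [31.0, 29.5, 35.0, 30.0, 33.0]  # build with the new feature

if latency_regressed(baseline_ms, candidate_ms):
    print("Database response time regressed; hold the release")
```

The exact threshold matters less than the fact that the Development team sees the signal before clients do.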

How should we address this?

You need to increase the flow of information between teams. This will allow you to identify and fix problems sooner.

Conclusion

The non-technical aspects are the starting point for a successful DevOps transformation. The automation and tools are just there to help and don't make a difference by themselves.

There is no single way to implement DevOps in an organization. If you want to learn more about this topic, read these books to understand DevOps and apply it successfully.