DevOps Teachings of COVID-19

COVID-19, or Coronavirus Disease 2019, has become a global pandemic because this strain of the RNA virus is relatively new to humans and no vaccine is available yet. But we are not here to talk about the virus itself. We are going to explore a few things that I think a DevOps engineer would be able to relate to.

Pandemic

I have read multiple articles asking why the World Health Organization (WHO) did not declare a pandemic sooner. Think about it like this: when a bug in your production system is reported by one user, the severity of that trouble ticket will not jump to critical right away. There needs to be a validation period (no matter how short) to analyze the report before moving forward. So maybe the reason WHO waited was to be sure of the severity and effect of the virus.
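The validation period can be pictured as a tiny triage rule. This is only a sketch: the `Ticket` class, the `reproduced` flag, and both thresholds are assumptions made up for illustration, not taken from any real ticketing tool.

```python
from dataclasses import dataclass, field

# Hypothetical thresholds, chosen only for illustration.
HIGH_AT = 3       # distinct reporters before a confirmed ticket becomes "high"
CRITICAL_AT = 10  # distinct reporters before it becomes "critical"


@dataclass
class Ticket:
    title: str
    reporters: set = field(default_factory=set)
    reproduced: bool = False  # set once the team has validated the report

    def add_report(self, user_id: str) -> None:
        # A set ignores repeated reports from the same user.
        self.reporters.add(user_id)

    def severity(self) -> str:
        # Until the report is validated, the ticket stays at "low".
        if not self.reproduced:
            return "low"
        count = len(self.reporters)
        if count >= CRITICAL_AT:
            return "critical"
        if count >= HIGH_AT:
            return "high"
        return "medium"
```

A single report keeps the ticket at "low". Only after validation, and as more distinct users report the same problem, does it climb towards "critical".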

Quarantine

So what do we do when we get a bug report that is affecting end users? The first thing to think about is stopping users from using the software so that fewer users are affected: lock down or suspend usage of the software for all users. This is equivalent to quarantine. Governments are imposing lockdowns and quarantines to avoid further spread of the virus, just like a good DevOps engineer would suspend the services.
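In code, a lockdown can be as small as a switch that every request checks first. The sketch below assumes a hypothetical `Service` class and `ServiceSuspended` exception. A real system would more likely do this with a feature flag or at the load balancer.

```python
class ServiceSuspended(Exception):
    """Raised when a request arrives while the service is locked down."""


class Service:
    def __init__(self) -> None:
        self.suspended = False

    def suspend(self) -> None:
        # "Quarantine": stop serving every user until the bug is understood.
        self.suspended = True

    def resume(self) -> None:
        self.suspended = False

    def handle(self, request: str) -> str:
        if self.suspended:
            raise ServiceSuspended("service is temporarily suspended")
        return f"processed {request}"
```

While `suspended` is set, no request reaches the faulty code path, so the number of affected users stops growing.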

C/FR

I have also seen people talking about the fatality rate of this virus being low. In my opinion, it doesn’t matter how low the fatality rate is; a fatality is always unexpected. A 0.0001% fatality rate is still 100% for the victim and the victim’s family.

Let me give you an example of the same thing in software. Suppose one of the end users has reported that, when withdrawing money from an ATM, the account balance always falls short by 1 cent per transaction. For simplicity, let’s assume that this only happens once the account balance reaches 100 euro or a multiple of 100 euro. 1 cent is 0.01% of 100 euro. That’s not much, but would you let this pass? Think about 30 million users out of 83.73 million (the population of Germany), each having 200 euros (an imaginary number) in their bank account and each losing 0.01%. That is 2 cents per user, which sums up to 600,000 euros. So no matter how small the number seems, the cumulative effect on our society as well as on the organization is far worse than the number shows.
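A rough calculation makes the cumulative effect concrete. The helper names below are hypothetical, and the loss per user is a parameter because it depends on how many affected transactions each user makes. Amounts are kept in whole cents to avoid floating-point rounding.

```python
def cumulative_loss_cents(users: int, loss_per_user_cents: int) -> int:
    """Total loss across all affected users, in whole cents."""
    return users * loss_per_user_cents


def cents_to_euro_text(cents: int) -> str:
    """Format an amount in cents as a readable euro figure."""
    euros, remainder = divmod(cents, 100)
    return f"{euros:,}.{remainder:02d} euro"


AFFECTED_USERS = 30_000_000  # the 30 million users from the example

# One lost cent per user:
print(cents_to_euro_text(cumulative_loss_cents(AFFECTED_USERS, 1)))
# prints "300,000.00 euro"

# Two lost cents per user:
print(cents_to_euro_text(cumulative_loss_cents(AFFECTED_USERS, 2)))
# prints "600,000.00 euro"
```

Even a single lost cent per user adds up to hundreds of thousands of euros.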

R&D

Now at this point, you might be thinking that suspending a service because of a bug is extreme. There are other ways to handle this, like a hot patch, a live rolling update to a new version, etc. I agree, but in the case of COVID-19 such measures all failed and the virus kept spreading. If, as a team leader or product owner, you think the bug can be fixed with a hot patch, then that’s fine. But if the impact is larger, it’s safer to suspend the service and take a bit of downtime to handle the bug properly.

So, how do we handle a bug? Let’s look at COVID-19 again. To determine its cause and its adverse effects on the human body, knowing a few things is crucial: the origin of the virus, patient zero, and the symptoms it showed. So how does all this map to a software bug?

To determine the cause of any software bug, we debug the source code to find the originating line(s) and then determine the effects of those particular lines of code on the whole ecosystem of the software. We also take into account what the user experienced when the bug occurred, e.g. Did you see a blue screen? An error message? What sort of error message did it show? Did the screen freeze? So, the origin of the bug has been determined, and patient zero and the original symptoms have been analyzed. So what now?

Isolation

One vital step towards solving the bug or the virus is being able to isolate it. The scientists who create a cure for a virus first try to isolate the virus so that they can create medicine that won’t hurt the human body. Developers debug source code the same way: they find the bug, isolate the responsible lines of code from the rest of the system, and work out how to solve the issue without affecting the system.
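To make this concrete, here is a sketch that isolates the ATM bug from the earlier example in one small function and reproduces it with a single check. The defect itself is invented for illustration, and `withdraw_buggy`, `withdraw_fixed`, and `reproduces_bug` are hypothetical names.

```python
def withdraw_buggy(balance_cents: int, amount_cents: int) -> int:
    new_balance = balance_cents - amount_cents
    # Invented defect: one extra cent disappears whenever the new
    # balance lands exactly on a multiple of 100 euro (10,000 cents).
    if new_balance > 0 and new_balance % 10_000 == 0:
        new_balance -= 1
    return new_balance


def withdraw_fixed(balance_cents: int, amount_cents: int) -> int:
    # The fix removes the faulty branch and touches nothing else.
    return balance_cents - amount_cents


def reproduces_bug(withdraw) -> bool:
    # 150 euro minus 50 euro should leave exactly 100 euro.
    return withdraw(15_000, 5_000) != 10_000
```

Once the faulty lines sit in one function with one reproducing check, the same check can confirm that the fix works and that nothing else changed.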

Time

Creating a vaccine, or in this case a fix for the introduced bug, should be straightforward and simple, shouldn’t it? We isolate the bug and fix it. But sometimes it’s not that simple to fix the bug or kill the virus without harming the system. So it takes time to determine the right course of action and to decide which components to use. In the case of COVID-19, it’s important to find the most feasible material that can bind to the spike proteins, and in the case of a software bug, it’s important to find the correct functions and logic to solve the bug without breaking other logic in the code ecosystem.

Now, I hope we understand why it’s sometimes time-consuming to find a fix for a software bug. Sometimes we encounter a bug that changes the way we think about the whole software architecture, just as COVID-19 has changed the way we look at our work and daily lifestyle.

Lifestyle

So, what to do? Some of WHO’s recommendations are that we (a) wash our hands with soap for at least 20 seconds, (b) cover our mouth when we sneeze or cough, (c) keep our distance from each other, and (d) stay home (no frequent visits to friends and no parties). All of these can be considered personal lifestyle habits. Social distancing is new and difficult, but it is a timely measure to prevent the spread of the disease.

So how does this help in the DevOps chain of software development? Think about these habits as the personal habits of individual developers and operations engineers. Developers need to write more tests so that pushed source code can’t infect the rest of the system. Operations or DevOps should put stricter policies and quality gates in place to make sure that all the corner cases are covered before starting a new build. Keep all of the involved individuals from the different teams informed about the current state of the build.
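A quality gate can be sketched as a function that lists every reason a build should not proceed. The `BuildReport` fields and the 80% coverage threshold are assumptions chosen for illustration. Real pipelines usually configure such gates in their CI tooling.

```python
from dataclasses import dataclass
from typing import List

MIN_COVERAGE_PERCENT = 80.0  # hypothetical threshold


@dataclass
class BuildReport:
    tests_failed: int
    coverage_percent: float
    open_critical_bugs: int


def gate_failures(report: BuildReport) -> List[str]:
    """Return the reasons the build is blocked; an empty list means it may proceed."""
    failures = []
    if report.tests_failed > 0:
        failures.append(f"{report.tests_failed} failing test(s)")
    if report.coverage_percent < MIN_COVERAGE_PERCENT:
        failures.append(
            f"coverage {report.coverage_percent}% is below {MIN_COVERAGE_PERCENT}%"
        )
    if report.open_critical_bugs > 0:
        failures.append(f"{report.open_critical_bugs} open critical bug(s)")
    return failures
```

Because the function returns the reasons rather than a bare yes or no, the same list can be sent to every team involved, which keeps them informed about the state of the build.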

Solidarity

Sometimes all these measures are not enough, because when the situation gets more complex it requires collaboration between different teams, and different minds think from different angles. WHO launched the Solidarity trial so that participating countries share their knowledge about the virus and have access to the same resources, making the process of creating a cure more solid and quicker.

The same goes for a team of engineers who approach the software development lifecycle from different perspectives. Sometimes, despite the differences, they need to work together to solve the bug faster and create a better user experience for all the end users.

Stay safe!