Part of being an engineering leader is communicating the needs of the engineering team to non-technical stakeholders. Take backups, for example. Backups cost money to store and take time to create. Theoretically, if everyone does their job perfectly, they shouldn’t be necessary, but experienced engineering leaders know to prepare for failure.
In this piece, we’ll look at how to decide what types of backup you need, how they will impact cost, and how you can show non-technical team members the importance of having backups. We’ll look at several cases where backups would have mitigated significant technical risk along the way.
Choosing What to Include in Backups
Organizations typically have a variety of data that’s worth backing up. Server configurations, digital media, code repositories, and databases are just a few of the mission-critical assets that should be considered.
In general, the more important a particular asset is to your business, the more critical it is that it’s backed up.
For example, if you’re using a completely custom server configuration that prevents your application from running on a generic server, this configuration needs to be backed up! However, if you’re using an AMI from Amazon Web Services or some other pre-packaged solution, backing up the state of your infrastructure is probably less necessary.
How Often Should You Run Backups?
The frequency of your backups can be evaluated similarly. If you run a small WordPress blog that publishes content a couple of times per week, daily backups are probably overkill. However, if you run an e-commerce store that’s processing hundreds of customer orders per hour, you should probably run backups at least once an hour.
Choosing what to back up and how frequently to do so is an important trade-off when evaluating the cost and effectiveness of a backup and restoration plan. If non-technical stakeholders push back on the cost of backups, you can reduce the frequency or scope of what’s backed up to find a suitable middle ground.
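To make this trade-off concrete for stakeholders, it can help to put rough numbers on it. The sketch below is a simplified model, not a pricing tool: the BackupPlan fields, the order volume, the backup size, and the storage price are all hypothetical figures chosen for illustration, and it assumes every backup is a full copy rather than an incremental one.

```python
from dataclasses import dataclass


@dataclass
class BackupPlan:
    name: str
    interval_hours: float   # time between backups
    backup_size_gb: float   # size of one full backup (assumed constant)
    retention_days: int     # how long each backup is kept


def worst_case_orders_lost(plan: BackupPlan, orders_per_hour: float) -> float:
    """Orders created since the last backup if a failure hits just before the next one."""
    return plan.interval_hours * orders_per_hour


def monthly_storage_cost(plan: BackupPlan, price_per_gb_month: float) -> float:
    """Cost of holding every retained backup, assuming full (not incremental) copies."""
    backups_retained = (24 / plan.interval_hours) * plan.retention_days
    return backups_retained * plan.backup_size_gb * price_per_gb_month


if __name__ == "__main__":
    # All figures below are made up for illustration.
    plans = [
        BackupPlan("hourly", interval_hours=1, backup_size_gb=5, retention_days=7),
        BackupPlan("daily", interval_hours=24, backup_size_gb=5, retention_days=7),
    ]
    for plan in plans:
        lost = worst_case_orders_lost(plan, orders_per_hour=200)
        cost = monthly_storage_cost(plan, price_per_gb_month=0.02)
        print(f"{plan.name}: up to {lost:.0f} orders at risk, ~${cost:.2f}/month storage")
```

Putting "orders at risk" next to "dollars per month" for each plan gives non-technical stakeholders one comparison to react to, rather than an abstract argument about frequency.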
However, if your team still isn’t sold on the need for backups, you need to make your case.
Making the Case
Having proper backups is an investment, but the reduced business risk often provides a significant return. Let’s look at a few situations where backups can make a difference. Hopefully, you can use these examples to justify putting a rock-solid backup and restore plan in place at your organization.
In July of 2019, Ubuntu Security reported that the credentials for a company-owned GitHub account were compromised. These compromised credentials were used to create repositories, issues, and more.
We can confirm that on 2019-07-06 there was a Canonical owned account on GitHub whose credentials were compromised and used to create repositories and issues among other activities. Canonical has removed the compromised account from the Canonical organisation in GitHub and is still investigating the extent of the breach, but there is no indication at this point that any source code or PII was affected. – Ubuntu Security
In this case, it seems that critical infrastructure was decoupled from GitHub, and the breach wasn’t allowed to spread. However, Canonical (the publisher of Ubuntu) had to restore various repositories and issue trackers to their previous state. When rolling back from an account compromise like this, backups are invaluable because they give you a known-good state to compare against the current one.
Additionally, attackers often leave backdoors in compromised codebases. This can allow them to gain greater access once the initial discovery and remediation process is completed. If you only have your infected codebase, it can be challenging to uncover all the infected files or possible vectors for future attacks.
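As a rough illustration of comparing a known-good backup against a possibly compromised copy, the sketch below hashes every file in two directory trees and reports what was added, removed, or modified. The function names and the directory layout are assumptions made for this example. For a Git repository you would more likely compare commits and refs directly, so treat this as a generic file-level check.

```python
import hashlib
from pathlib import Path


def hash_tree(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def compare_to_backup(backup_root: Path, current_root: Path) -> dict[str, list[str]]:
    """Report files added, removed, or modified relative to the known-good backup."""
    backup = hash_tree(backup_root)
    current = hash_tree(current_root)
    return {
        "added": sorted(set(current) - set(backup)),
        "removed": sorted(set(backup) - set(current)),
        "modified": sorted(
            name for name in set(backup) & set(current) if backup[name] != current[name]
        ),
    }
```

Anything listed under "added" or "modified" that your team cannot account for is a candidate for closer review. Without the backup there is nothing to compare against.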
In a ransomware attack, the attacker takes control of a codebase or set of infrastructure and encrypts it so that only the attacker can unlock it. In exchange for returning access to the victim, the attacker demands a ransom, usually in cryptocurrency. If the ransom is not paid within a certain timeframe, the attacker threatens to delete the files, rendering them unrecoverable.
This is exactly what happened in May of 2019 when ZDNet reported that, “Hundreds of developers have had Git source code repositories wiped and replaced with a ransom demand.” The hackers had modified the git histories to the point where the repositories were unusable, and they demanded payment within ten days to reverse the changes.
If a compromised repository is part of your organization’s core codebase, an attack like this will cause a severe disruption in operations. Developers will be unable to commit code, creating a complete stoppage in new feature development. Bug fixes and even support tickets (if managed through GitHub) could be affected. However, if you were hit with an attack like this and had a backup of your repository, you could restore from this backup and continue working on day-to-day tasks while the issue was resolved.
Ransomware attacks are most effective when the victim has no other option for accessing their data besides paying the ransom. Having a complete backup – even one that is a few hours old – ensures that you’re not helpless in the face of a ransomware attack.
Many companies have moved parts of their codebases to third-party providers. The reasons for this are many, but if code storage is not your core business, you can save time and money by relying on an outside service.
However, depending on a third party always carries some risk, and GitHub is no exception. No service can deliver 100% uptime, but when your entire business (or codebase) depends on GitHub’s availability, you might want to mitigate that risk by having your own backups.
For example, in June 2020, GitHub had a major outage that lasted for hours before stability returned. If you relied on GitHub to store your code, this meant much of the development work for the day was on hold until they resolved the issue. Outages like this hurt developer productivity and can put project timelines at risk, especially if they occur during a crucial launch window.
Like the ransomware scenario described above, the best way to mitigate service downtime is to have a plan in place and a backup of any repositories and associated metadata. While you can create these backups on your own, using a service like BackHub will make it significantly easier. With a proper backup and restore plan in place, what could potentially be a work-stopping outage can instead be reduced to just an alert and switchover.
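If you do decide to create repository backups yourself, one minimal approach is to take a timestamped mirror clone on a schedule. The sketch below assumes git is installed and that the machine running it can authenticate to the repository. The function names, the backup directory, and the example URL are hypothetical. Each run writes to a new directory, so an older snapshot stays untouched even if the upstream repository is later wiped or rewritten.

```python
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def snapshot_command(repo_url: str, backup_root: Path, now: datetime) -> list[str]:
    """Build the git command for a timestamped mirror clone of repo_url."""
    stamp = now.strftime("%Y%m%dT%H%M%SZ")
    # Derive a directory name from the last URL segment, e.g. "shop.git" -> "shop".
    name = repo_url.rstrip("/").rsplit("/", 1)[-1].removesuffix(".git")
    dest = backup_root / f"{name}-{stamp}.git"
    # --mirror makes a bare clone that includes every branch, tag, and other ref.
    return ["git", "clone", "--mirror", repo_url, str(dest)]


def run_snapshot(repo_url: str, backup_root: Path) -> None:
    """Run one snapshot now; raises CalledProcessError if git fails."""
    command = snapshot_command(repo_url, backup_root, datetime.now(timezone.utc))
    subprocess.run(command, check=True)


if __name__ == "__main__":
    # Hypothetical repository URL and backup location.
    run_snapshot("https://example.com/acme/shop.git", Path("/var/backups/git"))
```

A mirror clone captures Git data only. Issues and pull request discussions are not stored in the repository itself and would need to be exported separately, for example through the provider's API.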
Reducing Platform Dependence
One of the downsides to using a third-party provider for something that’s not your core business is that your business depends on that provider. Those providers are businesses in their own right and may face financial or regulatory pressures that limit your ability to use their platform.
For example, in the summer of 2019, GitHub was forced to comply with US export law and had to prevent users in Iran, Syria, Crimea, and other sanctioned regions from accessing its service. Anyone in the affected regions was cut off and forced to find a different repository host.
This is another case where backups would have been invaluable. With a backup, your repository can be restored and pushed to another provider, or you can maintain a self-hosted version of your repository. Events like these are rare, but having a backup is a small price to pay for maintaining access to your code.
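As a sketch of what pushing a backup to another provider can look like, the example below assumes the backup is a bare mirror clone on local disk and that an empty repository already exists at the new host. The function names and any paths or URLs you pass in are hypothetical.

```python
import subprocess
from pathlib import Path


def restore_command(backup_dir: Path, new_remote_url: str) -> list[str]:
    """Build the git command that pushes every ref in the backup to a new remote."""
    return ["git", "-C", str(backup_dir), "push", "--mirror", new_remote_url]


def restore_to_new_host(backup_dir: Path, new_remote_url: str) -> None:
    """Run the push; raises CalledProcessError if git fails."""
    subprocess.run(restore_command(backup_dir, new_remote_url), check=True)
```

Because --mirror overwrites the remote's refs and deletes any that are missing from the backup, this should only be pointed at a new, empty repository or one you intend to replace.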
Some of the case studies detailed above may be more applicable to your business than others, so you will need to decide which will resonate with your organization’s non-technical stakeholders. However, presenting concrete scenarios like these can be a great way to start the conversation. Framing backups as an investment that reduces business risk as opposed to an unnecessary expenditure can be powerful.