Maintaining backups for your Git repositories and having a plan to restore them are both essential. Many organizations and developers rely solely on a hosting provider like GitHub or Bitbucket, but hosting alone does not guarantee that you will always have access to your repositories or that they are safe from loss or tampering.
Threats may be external, such as a malicious actor gaining access to the code hosting account and erasing repositories, making irreversible changes to the code base, or introducing vulnerabilities that are difficult to track down. They may also be internal, such as an administrator permanently losing the credentials to a code hosting account, a developer rebasing and force-pushing without realizing it overwrites history, or a disgruntled employee deleting a repository. Without proper safeguards, a developer or organization stands to lose a lot.
However, with a backup and recovery plan, you can restore your repository to a recent state after an incident, with limited interruption. Deleted or corrupted repositories can be restored with minimal hassle, and if a code hosting account is compromised or inaccessible, you can set up an alternative account. Stakeholders who rely on your code can recover quickly as well. Backups are also important for preserving metadata such as wikis, pull requests, and issues, which are not available through cloning alone.
In this article, you’ll learn how to maintain a full or limited backup using git clone with cron scripts, SCM Backup, Syncthing, and Rewind Backups for GitHub. You’ll also see how each method works, its pros and cons, and whether it results in a genuine backup.
Local Backups with Git Clone in Cron Scripts
To create a local backup, you will write a script that clones your repositories on a regular schedule using cron jobs. Instead of overwriting a single clone, you’ll capture multiple snapshots of the repository taken at different times. This makes restoration easier, because you can pick the snapshot you want and restore it as-is. The cron job tool or generator you use will depend on what is available on your operating system.
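As a rough sketch of the snapshot idea, assuming Python is available on the backup machine, each run could write to a new timestamped directory instead of reusing one clone. The `snapshot_path` helper, the directory layout, and the crontab line in the comment are illustrative assumptions, not part of any particular tool.

```python
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical crontab entry (assumption): run the backup script daily at 02:00.
#   0 2 * * * /usr/bin/python3 /opt/backup/backup_repos.py


def snapshot_path(backup_root: Path, repo_name: str, now: datetime) -> Path:
    """Return a unique, sortable directory for one snapshot of one repository."""
    # A UTC timestamp in the name keeps snapshots in chronological order
    # when they are sorted alphabetically.
    stamp = now.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return backup_root / repo_name / f"{stamp}.git"


if __name__ == "__main__":
    # Example paths are placeholders.
    print(snapshot_path(Path("/var/backups/git"), "my-project", datetime.now(timezone.utc)))
```

Because every run gets its own directory, an older snapshot is never modified by a later backup.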
To capture as much of the repository as possible, use git clone --mirror to mirror your repositories. A mirror clone includes all of the remote’s refs: branches, tags, and any other refs. However, note that cloning does not give you an identical copy of the repository. The cloned repo is not a full backup; it lacks hooks, reflogs, configuration, description files, and other metadata.
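The mirror step could be wrapped in a small helper like the one below. This is a minimal sketch that assumes git is installed and on the PATH; the function names and the example URL are hypothetical.

```python
import subprocess
from pathlib import Path


def mirror_command(remote_url: str, destination: Path) -> list[str]:
    """Build the git invocation that creates a bare mirror clone."""
    return ["git", "clone", "--mirror", remote_url, str(destination)]


def mirror_repository(remote_url: str, destination: Path) -> None:
    """Run the mirror clone; raises CalledProcessError if git exits non-zero."""
    destination.parent.mkdir(parents=True, exist_ok=True)
    subprocess.run(mirror_command(remote_url, destination), check=True)


if __name__ == "__main__":
    # Placeholder URL and path; replace with your own repository and backup folder.
    print(mirror_command("https://example.com/org/project.git", Path("/var/backups/git/project.git")))
```

To restore from such a mirror, one option is to run git push --mirror against a new, empty remote from inside the mirrored directory.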
The major benefit of this method is that you aren’t relying on multiple external tools for backups. This limits your exposure to third-party security vulnerabilities and has the bonus of being free; however, it can be labor-intensive and requires technical knowledge of scripting and cron jobs, among other things. If you don’t already have this knowledge, expect a bit of a learning curve.
Compared to other methods that use tools, writing these scripts involves more work because you have to write them from scratch. This method can also get increasingly complicated if you’d like to add error monitoring, logging, job retries, or error notification. Without these features, you wouldn’t know when a job failed. Also, if you’re going to maintain multiple snapshots instead of rewriting a single clone, you’ll have to account for cleanups and archiving.
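To give a sense of the extra work involved, here is a hedged sketch of two of those pieces: a retry wrapper that logs failures, and a cleanup routine that keeps only the newest snapshots. It assumes snapshot directories have names that sort chronologically (for example, timestamps); all names and defaults are illustrative.

```python
import logging
import shutil
import time
from pathlib import Path
from typing import Callable

log = logging.getLogger("git-backup")


def run_with_retries(job: Callable[[], None], attempts: int = 3, delay_seconds: float = 30.0) -> bool:
    """Run job up to `attempts` times, logging each failure. Returns True on success."""
    for attempt in range(1, attempts + 1):
        try:
            job()
            return True
        except Exception:
            # log.exception records the traceback, so failed jobs are visible in the log.
            log.exception("backup attempt %d of %d failed", attempt, attempts)
            if attempt < attempts:
                time.sleep(delay_seconds)
    return False


def prune_snapshots(repo_dir: Path, keep: int) -> list[Path]:
    """Delete all but the `keep` newest snapshot directories; return what was removed."""
    # Assumption: directory names sort chronologically, so the oldest come first.
    snapshots = sorted((p for p in repo_dir.iterdir() if p.is_dir()), key=lambda p: p.name)
    stale = snapshots[:-keep] if keep > 0 else snapshots
    for path in stale:
        shutil.rmtree(path)
    return stale
```

A False return from `run_with_retries` is the point where a notification (email, chat message, and so on) could be sent; that part is left out here.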
Offline Backups with SCM Backup
SCM Backup is a backup tool that allows you to make an offline clone of a repository you have hosted on a code hosting provider like GitHub or Bitbucket. It’s especially helpful for backing up several repositories at once, across multiple users or organizations on your hosting service. It requires .NET Core, plus Git (or whichever version control system your repositories use), to be installed on the machine. It retrieves a list of all your repositories using the API of your code hosting service and then clones them into a local backup folder, except any repositories you’ve chosen to exclude. If a repository already exists in the backup folder, it simply updates it with the latest changes.
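The clone-or-update decision described above can be illustrated with a short sketch. This is not SCM Backup’s actual code (the tool is a .NET application); it is a simplified Python model in which the repository list, which the real tool fetches from the hosting API, is passed in as a plain list. The `plan_backup` name and its parameters are hypothetical.

```python
from pathlib import Path


def plan_backup(repo_names: list[str], excluded: set[str], backup_root: Path) -> list[tuple[str, str]]:
    """Decide per repository whether to clone it fresh or update an existing copy."""
    plan = []
    for name in repo_names:
        if name in excluded:
            # Excluded repositories are skipped entirely.
            continue
        # A folder that already exists is treated as a previous backup to update.
        action = "update" if (backup_root / name).exists() else "clone"
        plan.append((action, name))
    return plan


if __name__ == "__main__":
    # Placeholder repository names and backup folder.
    print(plan_backup(["api", "web", "scratch"], {"scratch"}, Path("/var/backups/scm")))
```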
SCM Backup is fairly easy to use and requires minimal technical knowledge. Once you’ve set up the configuration file, settings.yml, located in the backup folder, all you have to do is run the application. It is free, open source, and supports GitHub, GitLab, and Bitbucket. By specifying a user or an organization in the settings, you can back up all of their repositories without listing each one individually. It can run unattended, so you do not have to monitor regular backups. It also keeps a log of its operations that you can consult when errors occur, and you can configure email notifications for failed backups. Another plus is its wide range of configuration settings, which cover everything from the backup folder location to authentication credentials, email settings, and more. If your code hosting provider offers wikis, SCM Backup can back those up as well.
Although SCM Backup’s backup process is straightforward, it does have a few drawbacks. For one, the repositories it clones are bare: they contain no hooks, reflogs, or configuration files, and no hosting metadata such as issues, pull requests, or releases. Additionally, configuration settings can differ between code hosting providers. Lastly, as mentioned above, you need .NET Core installed on your machine to run it.
Online Backup with Syncthing
Syncthing is a GUI/CLI application that facilitates file syncing across several devices and is available across multiple platforms, including mobile. For syncing to be accomplished, all the devices need to have Syncthing installed on them and be configured to connect with one another. When Syncthing is run for the first time, it generates a unique identifier called the Device ID. On the Syncthing web GUI, you would add each device using the Device ID and then select what folders to sync. This process will be replicated across all the devices; once they have been configured, the syncing commences across them.
Note that, technically, syncing and backing up are two different things. While the aim of a backup is to retain a copy of an original file from a particular point in time for restoration in the case of a loss incident, the aim of syncing is to make sure that similar files across devices are identical at any point in time. When a loss incident occurs on a synced file, it is replicated on the other devices. Although there may be measures taken to prevent this, Syncthing is not the best option to rely upon for a backup.
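The difference can be shown with a toy model. The sketch below does not use Syncthing at all; it represents a folder as a dictionary of file names to contents, purely to illustrate why a deletion survives in a snapshot history but not in a synced copy. All names are illustrative.

```python
import copy


def sync(source: dict[str, str]) -> dict[str, str]:
    """A sync target always mirrors the source's current state, deletions included."""
    return dict(source)


def take_snapshot(history: list[dict[str, str]], source: dict[str, str]) -> None:
    """A backup appends an independent point-in-time copy and never rewrites old ones."""
    history.append(copy.deepcopy(source))


laptop = {"main.py": "print('v1')"}
snapshots: list[dict[str, str]] = []

take_snapshot(snapshots, laptop)  # backup taken while the file still exists
del laptop["main.py"]             # the loss incident
desktop = sync(laptop)            # the deletion is replicated to the synced device
take_snapshot(snapshots, laptop)  # a later backup records the loss but keeps the earlier copy
```

After this runs, the synced `desktop` copy is empty, while the first entry in `snapshots` still holds the deleted file.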
Some benefits of using Syncthing include that it is free and available on multiple platforms. Since it provides a GUI, it can be easier and more intuitive to use compared to the methods already covered. It also supports error notification and logging in case a syncing problem occurs, and it is customizable, allowing you to configure how syncing will work through various settings.
However, since Syncthing only works between individual devices, you cannot directly back up your repository from a code hosting provider. When an error occurs, syncing immediately stops and requires a manual fix of the error to restart. Syncing a Git repository among multiple devices may lead to repository corruption and conflicts, especially if individuals work on different branches. In these cases, syncing completely stops until all corruption and conflicts are solved.
Additionally, Syncthing is very resource-intensive compared to cloning a repository because its method of synchronization involves continuous scanning, hashing, and encryption. Finally, a major drawback of synchronization is that it maintains and replicates just one copy of a repository, compared to having several snapshots of a repository at various times.
Full Git Backups With Rewind
Rewind Backups for GitHub (formerly BackHub) is a repository backup web service that allows configuration of regular backups of repositories across GitHub, GitLab, or Bitbucket in just a few clicks using its web interface.
These backups are complete: not only do they include the repositories themselves, but they also include other hosting-service-specific metadata like wikis, issues, pull requests, releases, projects, milestones, etc. Backup snapshots are taken daily and have a 30-day lifespan. These snapshots are made available on Rewind’s platform for you to access and download as you please.
Besides maintaining backups, Rewind provides tools for effortless repository restorations on your hosting provider. It also allows you to set up redundant backups on AWS S3 in addition to the backups it maintains on its servers. For compliance purposes, it maintains an audit log that is essential for monitoring and security, and it offers an archive feature for inactive repositories. Also, it allows you to share snapshots with specific individuals or within your team, which may be useful if you are part of a large organization.
Being web-based, Rewind is easy to navigate and makes setting up backups a lot less complicated. It’s secure, as it only allows users with owner-level access to configure and share backups within the organization. It lets you set up email notifications for any errors during backup. It also automatically takes daily snapshots of the repositories you specify and retains them for thirty days. If this retention period is too short, it can upload snapshots to AWS S3, where they can be kept for as long as needed. As you can see, using a complete hosted solution like Rewind has many advantages.
Conclusion
While storing your code only on a hosting provider like GitHub is widely accepted, it does pose some risks. Your organization may fall victim to malicious users who gain access to your hosting account and then erase repositories, introduce vulnerabilities, restrict access, or irreparably modify your code. Loss incidents can also be caused, intentionally or not, by individuals within your organization. Your hosting provider may also encounter issues of its own and go down, making access to your repositories impossible.
For all these reasons, it’s of vital importance to maintain backups. With a backup, if such an incident occurs, you can quickly restore your repositories without causing much damage to your organization and other stakeholders. While there are many ways to back up repositories, be sure to prioritize a solution that ensures easy configuration and restoration while eliminating opportunities for corruption.
If set-it-and-forget-it Git backups sound good to you, why not try out Rewind’s backup solutions for developers today?