As microservices continue to gain adoption in enterprise software applications, more companies are using them to solve interesting problems at scale.
“Breaking a monolith into microservices has clear engineering benefits including improved flexibility, simplified scaling, and easier management, all of which result in better customer experiences.” –Mary Treseler, Vice President of Content Strategy at O’Reilly
But new technologies always bring new challenges, and microservices are no exception. One of the biggest hurdles in adopting microservices is the need for new tooling that works across multiple codebases, programming languages, and HTTP layers. Backing up GitHub repositories at scale is one example of a task that gets harder under a microservice architecture.
In this post, you’ll see how Mercado Libre—the largest online retailer in Latin America—uses BackHub to manage backups for their massive collection of over 13,000 repositories. You’ll learn about some of the unique challenges companies like Mercado Libre face when backing up their GitHub repositories, and how BackHub has designed an industry-leading solution to handle these challenges.
Introducing Mercado Libre
North American and European readers may not be familiar with Mercado Libre, but the company is an e-commerce giant in Latin America. They operate in eighteen countries and processed over $14 billion in payments in the first three quarters of 2020. In addition to their e-commerce platform, the company runs a point-of-sale system, a payments platform, and a growing logistics operation.
Technology at Mercado Libre
All these services were built in-house by their team of over 5,000 software engineers, product managers, and designers using a unique microservices architecture. The team now maintains over 13,000 GitHub repositories, which are managed by a custom-built orchestrator.
Each microservice is written in the best programming language for the job at hand (typically Java, Go, or Python) and independently deployed to Amazon Web Services or Google Cloud Platform. This architecture makes their system incredibly robust, but it introduced some unique challenges when the team started to explore backing up all their repositories.
Mercado Libre’s Journey to BackHub
As you might imagine, keeping track of 13,000 repositories is a big job in itself, but the problem gets even more complicated when you face the level of scrutiny that a publicly traded company like Mercado Libre faces. As their payment processing platform grew, Mercado Libre decided they needed to have internal backups of all their codebases hosted on GitHub.
Repository backups are a common part of SOC 2 compliance. GitHub is a very reliable place to keep your code, but that code is a critical part of your infrastructure, and backups ensure you have access to it at all times. Like many teams, Mercado Libre started by building an in-house backup solution.
One of their engineers created a script that ran nightly as an AWS Lambda function. It performed a `git clone` on each of Mercado Libre’s GitHub repositories and pushed the data to an S3 bucket. This simple solution worked for a while, but it soon became clear that it had some shortcomings.
First, there was very little visibility into the backup process. Engineers at Mercado Libre could check the dates on each repository in their S3 bucket, but nobody had time to check each repository every day. Second, the script didn’t use the GitHub API, so it wasn’t able to back up issues, pull requests, or metadata from each repository. If GitHub failed, this data would have been lost. Finally, there wasn’t an easy path to restore data from these backups. Mercado Libre’s engineers would have had to recreate all the GitHub repositories manually, so restoring all 13,000 repositories would have been a time-consuming task.
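To make the in-house approach concrete, here is a minimal sketch of what a nightly clone-and-upload script could look like. It is not Mercado Libre’s actual script: the function names, the use of `git clone --mirror`, and the use of the AWS CLI’s `aws s3 sync` for the upload are all assumptions made for illustration.

```python
import subprocess
from pathlib import Path


def backup_commands(repo_url: str, workdir: Path, bucket: str) -> list[list[str]]:
    """Return the commands that back up one repository (hypothetical helper)."""
    # "https://github.com/org/my-service.git" -> "my-service"
    name = repo_url.rstrip("/").rsplit("/", 1)[-1].removesuffix(".git")
    mirror = workdir / f"{name}.git"
    return [
        # A mirror clone copies every branch and tag, but no issues or pull requests.
        ["git", "clone", "--mirror", repo_url, str(mirror)],
        # Copy the bare repository into the bucket (assumes the AWS CLI is installed).
        ["aws", "s3", "sync", str(mirror), f"s3://{bucket}/{name}.git"],
    ]


def run_backup(repo_urls: list[str], workdir: Path, bucket: str) -> None:
    """Run the backup commands for every repository, stopping on the first error."""
    for url in repo_urls:
        for command in backup_commands(url, workdir, bucket):
            # check=True raises CalledProcessError if git or the upload fails.
            subprocess.run(command, check=True)
```

Even in this small form, the shortcomings are visible: nothing records whether a run succeeded, only Git data is captured, and restoring means recreating each repository by hand.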
After running the Lambda script for a while, someone on the engineering team noticed that the backups were no longer working. The permissions on the S3 bucket had been changed, so the IAM role performing the backups could no longer save git repositories to the bucket.
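A silent failure like this is usually caught by checking backup freshness automatically rather than by eye. The sketch below assumes you can obtain the timestamp of the newest backup for each repository (for example, from object metadata in the bucket). The function name and the 36-hour threshold are illustrative choices, not something described by Mercado Libre.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def stale_backups(
    last_backup: dict[str, Optional[datetime]],
    now: datetime,
    max_age: timedelta = timedelta(hours=36),
) -> list[str]:
    """Return repositories whose newest backup is missing or older than max_age."""
    stale = []
    for repo, timestamp in last_backup.items():
        # None means no backup was ever found for this repository.
        if timestamp is None or now - timestamp > max_age:
            stale.append(repo)
    return sorted(stale)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    report = stale_backups(
        {"payments-api": now - timedelta(hours=5), "checkout-web": now - timedelta(days=3)},
        now,
    )
    print(report)
```

A nightly job that alerts whenever this list is non-empty would have flagged the broken S3 permissions after the first missed run.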
After realizing the rabbit hole their team would be going down to ensure this didn’t happen again, Mercado Libre’s engineers started looking for a better solution. They asked GitHub, who recommended BackHub (now part of Rewind).
Adopting BackHub at Mercado Libre
When Mercado Libre approached BackHub to ask about backing up their 13,000 GitHub repositories, the BackHub team was excited.
“We were happy because we wanted to test the limits of our platform. It was going to be a bit of a challenge, but we knew it would be interesting to work on and were confident the BackHub platform could handle it.” –Steffen Müller, Security Engineer at BackHub
BackHub was already handling backups for over 80,000 repositories every day, but Mercado Libre would be the biggest single user on the platform, so it would give the BackHub engineers the chance to test a new level of scale.
Mercado Libre tried BackHub for three weeks and monitored the results. The initial backup took four days due to the time-based limits imposed by the GitHub API and the sheer volume of data that needed to be saved, but incremental backups after that took just an hour. Mercado Libre’s team was impressed with the results, but before we get to that, let’s look at how BackHub handles backups at this scale.
How BackHub Works
BackHub is the only GitHub-recommended solution for repository backups in the GitHub Marketplace. With years of experience backing up thousands of repositories, BackHub is uniquely able to solve problems like those faced by Mercado Libre. Let’s look at some of the technical factors that allow BackHub to operate reliably at this scale.
Observability and Alerts
Knowing when your software fails is always important, but it’s even more critical when you’re managing backups. BackHub’s team has therefore invested heavily in an alerting and observability toolchain that uses Amazon CloudWatch, Honeybadger, and a combination of Slack, email, and SMS notifications, depending on the severity level. This allows BackHub to detect and mitigate backup failures _before_ customers need to restore from those backups.
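As an illustration of severity-based routing, the sketch below maps a failure’s severity to notification channels. The severity levels, the thresholds, and the channel mapping are assumptions for the example. The source only says that Slack, email, and SMS are used depending on severity.

```python
from enum import IntEnum


class Severity(IntEnum):
    INFO = 1
    WARNING = 2
    CRITICAL = 3


# Hypothetical mapping: more severe failures reach more intrusive channels.
CHANNELS = {
    Severity.INFO: ["slack"],
    Severity.WARNING: ["slack", "email"],
    Severity.CRITICAL: ["slack", "email", "sms"],
}


def classify_failure(consecutive_failures: int) -> Severity:
    """Escalate as the same backup keeps failing (thresholds are illustrative)."""
    if consecutive_failures >= 3:
        return Severity.CRITICAL
    if consecutive_failures == 2:
        return Severity.WARNING
    return Severity.INFO


def channels_for(consecutive_failures: int) -> list[str]:
    """Return the channels that should be notified for this failure."""
    return CHANNELS[classify_failure(consecutive_failures)]
```

Escalating on repeated failures keeps one-off glitches out of people’s phones while making sure a persistently failing backup gets attention.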
Understanding GitHub’s API
Any developer can read GitHub’s API documentation and figure out which endpoints they should use to back up their repositories, but making reliable backups at scale is a little more nuanced than that. GitHub imposes rate limits and abuse detection methods to ensure that you don’t overuse their API, so BackHub has built proprietary algorithms that ensure reliable backups within these limits.
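BackHub’s algorithms are proprietary and not described here, but the basic building block of any rate-limit-aware client is deciding how long to pause based on GitHub’s response headers. The sketch below reads the documented `x-ratelimit-remaining`, `x-ratelimit-reset`, and `retry-after` headers. The function name and structure are illustrative.

```python
def seconds_to_wait(headers: dict[str, str], now: float) -> float:
    """Return how long to pause before the next GitHub API request.

    `now` is the current time in seconds since the Unix epoch.
    """
    # Header names are case-insensitive, so normalize them first.
    h = {key.lower(): value for key, value in headers.items()}

    # Secondary (abuse) limits tell the client how long to back off.
    if "retry-after" in h:
        return float(h["retry-after"])

    # Primary limit: requests are still available, no need to wait.
    if int(h.get("x-ratelimit-remaining", "1")) > 0:
        return 0.0

    # Quota exhausted: wait until the reset time (epoch seconds, UTC).
    reset = float(h.get("x-ratelimit-reset", now))
    return max(0.0, reset - now)
```

For example, a response with `x-ratelimit-remaining: 0` and a reset time 600 seconds in the future yields a 600-second pause. Backing up thousands of repositories within these limits also requires scheduling and prioritization that this sketch does not attempt.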
BackHub has also learned to handle all the documented and undocumented failures that GitHub’s API might return. Sometimes these errors are intermittent or hard to reproduce, like SSL connection timeouts. They might also be related to a specific API request that is too large or has a character-encoding problem.
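Intermittent failures such as connection timeouts are commonly handled by retrying with exponential backoff. The sketch below shows that general technique, not BackHub’s implementation. `TransientError` and `with_retries` are hypothetical names, and a real client would decide which exceptions count as transient.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Raised for failures that are worth retrying, e.g. a connection timeout."""


def with_retries(
    operation: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation(), retrying transient failures with exponential backoff."""
    for attempt in range(attempts):
        try:
            return operation()
        except TransientError:
            # Out of attempts: let the caller see the last error.
            if attempt == attempts - 1:
                raise
            # Wait 1x, 2x, 4x, ... the base delay before trying again.
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

Errors that are not transient, such as an oversized request or a character-encoding problem, will fail the same way on every retry, so they need a different code path rather than more attempts.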
“Anyone dealing with third-party APIs has to deal with new errors as they pop up, but at BackHub we have lots of experience figuring this out.” –Steffen Müller, Security Engineer at BackHub
For example, GitHub’s API initially failed to send Mercado Libre’s repository data using their webhook because the payload was too large. Fortunately, BackHub had seen this error before and was able to use their daily fallback trigger to request the data they needed from GitHub.
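One way a daily fallback can catch changes that webhooks missed is to compare each repository’s last push time, as reported by the GitHub API’s `pushed_at` field, with the time of its last successful backup. The sketch below assumes that approach. How BackHub’s fallback trigger actually works is not described in this post.

```python
from datetime import datetime, timezone


def repos_needing_fallback(
    pushed_at: dict[str, datetime],
    backed_up_at: dict[str, datetime],
) -> list[str]:
    """Return repositories with pushes newer than their last successful backup."""
    missed = []
    for repo, pushed in pushed_at.items():
        last_backup = backed_up_at.get(repo)
        # Never backed up, or pushed to since the last backup.
        if last_backup is None or pushed > last_backup:
            missed.append(repo)
    return sorted(missed)
```

Because this check does not depend on any webhook being delivered, an event that was dropped for having too large a payload is still picked up on the next daily run.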
Experience with Git at Scale
Finally, BackHub’s team has spent years developing expertise in git, the underlying version control system that powers GitHub. Their incremental backups are made possible through the use of multiple git clients so that they can make reliable backups of even the largest repositories quickly and securely.
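With the standard Git command line alone, the core idea of an incremental backup is to take a full mirror clone once and fetch only new objects afterwards. The sketch below shows that idea. It does not represent BackHub’s multi-client approach, and the function name is hypothetical.

```python
from pathlib import Path


def sync_command(repo_url: str, mirror: Path) -> list[str]:
    """Return the git command that brings a local mirror up to date."""
    # A bare repository always has a HEAD file at its top level.
    if (mirror / "HEAD").is_file():
        # Incremental: fetch new objects and drop refs deleted upstream.
        return ["git", "-C", str(mirror), "remote", "update", "--prune"]
    # First run: full mirror clone of every ref.
    return ["git", "clone", "--mirror", repo_url, str(mirror)]
```

The command can be executed with `subprocess.run(sync_command(url, mirror), check=True)`. After the first run, only objects added since the previous fetch are transferred, which is why incremental runs are so much faster than the initial backup.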
In addition to reliable nightly automated backups of your GitHub repositories, BackHub saves all your GitHub metadata, pull requests, comments, issues, and projects. They offer the ability to back up repositories to your AWS S3 account, and your data is encrypted in transit and at rest. Finally, if you ever need to restore your backed-up GitHub repositories, BackHub can do so in seconds.
Mercado Libre’s use case is an excellent example of how BackHub can save you weeks of engineering time while improving your infrastructure’s redundancy.
“BackHub has been very good for Mercado Libre. We know it’s working, and we don’t have to maintain our own solution anymore. It runs without any intervention from us.” –Mariano Guelar, Governance Project Leader at Mercado Libre
Mercado Libre’s team got BackHub up and running quickly, with very little configuration. Once their backups were being stored in S3 every night, they set up a lifecycle policy that moves files to Amazon S3 Glacier storage after sixty days. This allows them to stay compliant while minimizing their cloud storage costs.
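A lifecycle policy like this is a small piece of bucket configuration. The sketch below builds one in the shape that S3 expects, with a single rule that transitions objects to the `GLACIER` storage class after sixty days. The rule ID, the empty prefix, and the function name are assumptions. Mercado Libre’s actual policy is not shown in this post.

```python
def glacier_lifecycle(days: int = 60, prefix: str = "") -> dict:
    """Build an S3 lifecycle configuration that archives objects after `days`."""
    return {
        "Rules": [
            {
                "ID": f"archive-after-{days}-days",
                "Status": "Enabled",
                # An empty prefix applies the rule to every object in the bucket.
                "Filter": {"Prefix": prefix},
                "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
            }
        ]
    }


# With boto3 installed and credentials configured, the policy could be applied
# like this (bucket name is a placeholder):
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-backups",
#       LifecycleConfiguration=glacier_lifecycle(),
#   )
```

Objects in Glacier storage are cheaper to keep but slower to retrieve, which suits backups that are older than sixty days and rarely needed.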
If your organization is looking for a robust GitHub repository backup solution, check out BackHub. Whether you’re running a single monorepo or 13,000 small repositories, BackHub can ensure you never lose access to your code.