Recent attacks on the GitHub platform have led organizations to question the safety of the data they keep in GitHub repositories. These attacks have shown that GitHub’s built-in security features are not always adequate for enterprise-level security. In the past, developers have unknowingly committed sensitive files, including SSH keys, to public repositories, making them available to anyone who searches those repositories, including hackers.
This raises the question: is GitHub still safe to use? With over 80 million repositories worldwide, GitHub is easily the most popular platform for hosting and managing source code. It’s a vital part of many developers’ toolkits, and giving it up would be daunting.
Fortunately, with clearly defined security roles and responsibilities for your organization and your cloud provider, as well as an effective repository backup and restore system, you can ensure that your GitHub repos remain safe and secure. In this article, we’ll look at some of GitHub’s security concerns and discuss ways to overcome them.
Recent Attacks on the GitHub Platform
The risks of using GitHub can be better understood if we take a closer look at some of the recent attacks on the platform.
In March 2020, acting on information from a security researcher, GitHub Security Lab found the Octopus Scanner malware in 26 repositories hosted on the platform. Octopus Scanner targets open-source software and activates when a developer downloads an infected project from a GitHub repository. It only affects machines with the Apache NetBeans IDE for Java development installed.
Octopus Scanner then deploys a remote access Trojan (RAT) that infects build files and project source code. The RAT sends information back to the cybercriminals, allowing them to take control of the machine. Octopus Scanner also prevents source code files from being replaced or overwritten, making it harder to remove the malware.
In early 2021, security engineer Justin Perdok reported that cybercriminals were targeting GitHub repositories for unauthorized crypto-mining operations. The attackers forked repositories that had GitHub Actions enabled, added malicious code alongside the legitimate code, and filed pull requests to merge the compromised code back into the original repositories.
Because GitHub Actions runs automated workflows in response to repository events such as pull requests, filing the pull request was enough to trigger the download and execution of the crypto-mining software, with no action needed from the repository owner. By the time the attack was discovered, hackers had deployed almost 100 crypto-mining apps.
The Risks of Cloud Storage for Code
Cloud data storage services such as GitHub are among the most common cloud solutions used by enterprises. While most providers have security measures in place, there are still several potential risks associated with cloud storage systems. It’s important to understand the key concepts and the responsibilities of each party involved in maintaining security in the cloud.
Cloud-Security Concepts
Cloud security professionals use technology, protocols, and industry best practices to protect environments, data, and applications running in the cloud. They’re concerned with securing applications and data, virtual machines, operating systems, and physical infrastructure including network hardware and end-user devices.
Cloud service providers follow a shared security responsibility model, which makes all parties aware of their individual responsibilities for securing the cloud infrastructure. The division of responsibility between client and provider varies with the service being provided.
Vendor responsibilities include maintaining data centers and cloud infrastructure to prevent service interruptions, and providing backend mechanisms such as security features and tools for managing access controls. Client security teams are responsible for following best practices and configuring security controls to protect the organization against cyber threats, intentional and accidental data loss, and insider threats.
To maintain a secure environment, responsibilities must be clearly defined to indicate the assets, processes, and functions each party owns.
Comparing Cloud Providers with Self-Hosting
Given concerns about data security in online repositories like GitHub, some companies prefer to host repositories on their own hardware. The code lives in a secure space disconnected from the internet, giving you complete control over who can access it and what level of access they have. An on-premise solution may also be the only option for organizations in industries that prohibit data retention in cloud environments.
Besides having more control over the security of your code, an on-premise repository gives you complete control over your tech stack, reduces latency to improve performance, and offers greater compliance with regulatory controls.
There are several hosting platforms to choose from for repository management, including GitHub Enterprise, GitLab, and Bitbucket. Your hardware configuration will vary depending on the number of users accessing your repository, but a base infrastructure includes a network, power, and a server.
The repository can be set up on a bare-metal machine or a virtual server. Server specs depend on the number of people using the repository, but minimum requirements start at 4 CPU cores, 32 GB of memory, and a high-performance SSD with enough storage for files and data.
Security Challenges with On-Premise Git Repositories
While an on-premise solution gives you total control over access to your code, you’ll also be completely responsible for keeping it secure. Unlike cloud solutions, there is no shared responsibility: your security team alone must make sure that the repository is protected from hackers and internal threats.
Careless users, including developers, still present a security challenge. As we saw with the Octopus Scanner attack, there is always the chance of someone uploading compromised code to the main repository.
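Another careless-user risk is committing private keys by accident. As a minimal sketch of a check that could run before code is pushed, the following scans a working tree for PEM private key headers. The function names are hypothetical, and a real secret scanner would cover many more patterns than this one.

```python
import re
from pathlib import Path

# Header line that marks a PEM-encoded private key (RSA, EC, OpenSSH, and so on).
PRIVATE_KEY_HEADER = re.compile(r"-----BEGIN (?:[A-Z0-9]+ )*PRIVATE KEY-----")


def contains_private_key(text: str) -> bool:
    """Return True if the text contains a PEM private key header."""
    return PRIVATE_KEY_HEADER.search(text) is not None


def find_private_keys(root: str) -> list[Path]:
    """Return files under root (skipping .git) that appear to hold a private key."""
    hits = []
    for path in Path(root).rglob("*"):
        if ".git" in path.parts or not path.is_file():
            continue
        try:
            text = path.read_text(errors="ignore")
        except OSError:
            continue  # unreadable file: skip it rather than fail the scan
        if contains_private_key(text):
            hits.append(path)
    return hits
```

Calling `find_private_keys(".")` from the repository root returns the paths that need a closer look before anything is pushed.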
Give users the least access they need, and secure server access by restricting IP addresses, requiring VPN-only access, or using a virtual desktop environment. When there are multiple repositories, assign developers permissions only for the repositories they need to access.
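The two rules above (restrict by network, then grant access per repository) can be sketched as a deny-by-default check. Everything here is illustrative: the VPN subnet, the user names, and the repository names are made up, and a real Git server would enforce this through its own access control settings rather than through custom code.

```python
import ipaddress

# Hypothetical policy data: networks allowed to reach the Git server,
# and the repositories each developer is permitted to access.
ALLOWED_NETWORKS = [ipaddress.ip_network("10.8.0.0/24")]  # e.g. a VPN subnet
REPO_ACCESS = {
    "alice": {"billing-api", "web-frontend"},
    "bob": {"web-frontend"},
}


def ip_allowed(client_ip: str) -> bool:
    """Return True if the client address falls inside an allowed network."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in ALLOWED_NETWORKS)


def can_access(user: str, repo: str, client_ip: str) -> bool:
    """Deny by default: the network check and the per-repo grant must both pass."""
    return ip_allowed(client_ip) and repo in REPO_ACCESS.get(user, set())
```

With this policy, `can_access("bob", "billing-api", "10.8.0.15")` is rejected even though the request comes from the VPN, because bob was never granted that repository.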
The greatest security threat to an on-premise Git repository is the potential for data loss because of a natural disaster, hardware failure, or ransomware attack. All infrastructure, including the source code management solution, must be well maintained with updates and security patches and backed up regularly.
Cloud-based disaster recovery solutions give you almost immediate access to data backups. On-premise repositories are more challenging to restore in the event of data loss. Even if you store backups off-site, you won’t be able to avoid downtime while you retrieve and restore the backup data.
Advantages and Disadvantages of Cloud Code Storage Solutions
Reduced costs, greater scalability, and increased resiliency are common reasons businesses choose cloud storage solutions. Cloud-based storage eliminates the cost of purchasing extra storage to meet future needs, and the typical pay-as-you-go subscription model makes it easy to increase or decrease storage capacity on demand without changing IT infrastructure. Backing up repositories to cloud storage provides better resiliency against data loss and helps you recover faster from unexpected events.
But there are some disadvantages. It’s not always easy to migrate from one vendor’s solution to another, and differences between vendor platforms can cause configuration issues that create security vulnerabilities.
A cloud storage solution also gives you less control over your data. The cloud infrastructure is owned and managed by the cloud vendor, who handles the execution of services within the cloud infrastructure. Security controls may not be customizable to your organization’s specific needs.
If you’re in an industry regulated by government or industry requirements, you’ll need to choose a cloud provider that is compliant. Compliance breaches by a provider could subject your business to hefty penalties and fines.
Increasing Security of Repositories Through Backups
A good backup and recovery plan is critical for maintaining compliance, ensuring maximum repository uptime, and preventing data loss.
A cybercriminal’s ransom demands lose power when you can easily restore clean data from a viable backup. If an account is compromised, a rollback can restore repositories and metadata to a previously good state. Even with strong security controls in place, a codebase can be accidentally deleted, or a disgruntled employee could maliciously delete code from your repository.
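To make the idea of a restorable backup concrete, here is a minimal sketch of a dated mirror backup built on `git clone --mirror` and `git push --mirror`. The helper names and the directory layout are assumptions for illustration. This approach covers Git data only (commits, branches, tags); metadata such as issues and pull requests lives outside the Git repository and would need separate handling.

```python
import subprocess
from datetime import date
from pathlib import Path


def mirror_clone_command(repo_url: str, dest: Path) -> list[str]:
    # `git clone --mirror` copies every ref, not just the default branch.
    return ["git", "clone", "--mirror", repo_url, str(dest)]


def snapshot_path(backup_root: Path, repo_name: str, day: date) -> Path:
    # Hypothetical layout: <backup_root>/<repo_name>/<YYYY-MM-DD>.git
    return backup_root / repo_name / f"{day.isoformat()}.git"


def backup_repo(repo_url: str, repo_name: str, backup_root: Path, day: date) -> Path:
    """Create one dated mirror snapshot and return its path."""
    dest = snapshot_path(backup_root, repo_name, day)
    dest.parent.mkdir(parents=True, exist_ok=True)
    subprocess.run(mirror_clone_command(repo_url, dest), check=True)
    return dest


def restore_command(snapshot: Path, target_url: str) -> list[str]:
    # `git push --mirror` makes the target's refs match the snapshot,
    # overwriting or deleting refs on the target as needed.
    return ["git", f"--git-dir={snapshot}", "push", "--mirror", target_url]
```

A rollback then amounts to picking the last known-good snapshot and running the restore command against an empty or compromised repository. Because `--mirror` overwrites the target's refs, it should be pointed only at a repository you intend to replace.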
An efficient backup and recovery plan ensures that your code is safely backed up and can be easily retrieved and restored with little effect on application development timelines. With BackHub for Git repositories, you can create daily recurring backups for private and public GitHub repositories and easily restore repositories back to GitHub with the BackHub Restore App.
Getting Started with BackHub for GitHub Repositories
BackHub for GitHub creates daily snapshots of your data and keeps them for 30 days, with an option to store them for 365 days. Backups include the repository, all branches, and associated metadata including issues, milestones, pull requests, and releases. With BackHub, you can also sync backups offsite to your own cloud storage (Amazon S3) for long-term retention. BackHub also offers a complete account activity log for audit and compliance purposes.
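The retention windows described above come down to simple date arithmetic. The following sketch shows how a 30-day or 365-day window could decide which daily snapshots have expired. It is an illustration of the general idea, not BackHub's actual implementation, and the function name is hypothetical.

```python
from datetime import date, timedelta


def expired_snapshots(snapshot_days, today, retention_days=30):
    """Return the snapshot dates that fall outside the retention window."""
    cutoff = today - timedelta(days=retention_days)
    # A snapshot taken exactly on the cutoff day is still kept.
    return sorted(d for d in snapshot_days if d < cutoff)
```

For example, on June 30 with a 30-day window, a snapshot from May 31 is still retained while one from May 30 has expired. With `retention_days=365`, both are kept.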
Get started with a free 14-day trial of a Pro Plan or an Enterprise Plan. Pro Plans start at $14 USD for up to 10 repositories.
Once your account is configured, head over to GitHub Marketplace to install BackHub. Setup takes only a few minutes, and you can use your personal or company account to create daily snapshots of your GitHub repositories. During installation, you can select the repositories you want to back up. BackHub eliminates the need for creating, maintaining, and running daily GitHub repository backup scripts.
Conclusion
Whether on-premise or in the cloud, data can be vulnerable to accidental deletion, malware, corruption, and other security threats. As a cloud-based service, GitHub is not immune to these threats. Securing data in the cloud will always be a shared responsibility between you and your cloud provider. Data is best secured when you and your cloud provider are clear on your individual security roles.
An effective repository backup and recovery solution is the first step in protecting your code against security threats.