Top GitHub Compliance Concerns

Matthew Fuller | Last updated on April 13, 2023 | 8 minute read

Version control software such as Git has been a tremendous boon to the software development industry. By allowing developers to easily commit, share, and solicit feedback on versioned changes to code, Git, and supporting services like GitHub, have quickly risen in popularity. Individuals, startups, and enterprises alike have adopted GitHub as their internal source control repository of choice.

Yet like any other service in a connected corporate environment, GitHub and the Git workflow present their own set of security and compliance challenges that must be addressed by your organization’s compliance teams.

As a managed service, GitHub provides many options that can help you maintain compliance with the necessary standards, but it is still vitally important to evaluate the requirements of the compliance programs for which your organization is responsible, and ensure proper use and monitoring of these controls.

In this roundup, we’ll share six compliance concerns for engineering teams working in GitHub, and how to address them.

Evaluating Compliance Requirements

Compliance programs have vastly different requirements depending on the target industry and data being managed. To help distill these requirements into a manageable list, it can be useful to create a controls matrix, consisting of the superset of requirements. For example, you may need to comply with both Payment Card Industry (PCI) and Health Insurance Portability and Accountability Act (HIPAA) compliance programs, which address disparate parts of your data processing strategy, yet overlap in other areas. This matrix of controls can then help you when evaluating and implementing the security options offered by GitHub.
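As a rough illustration of the controls matrix idea, the sketch below merges the requirements of two programs into a single control-to-program mapping. The control names and the program-to-control assignments are hypothetical placeholders, not an authoritative reading of PCI or HIPAA; the real mapping should come from your compliance team.

```python
from collections import defaultdict

# Hypothetical, heavily simplified requirements. Real programs define many
# more controls, and these assignments are for illustration only.
PROGRAM_REQUIREMENTS = {
    "PCI": {"access control", "audit logging", "source code integrity"},
    "HIPAA": {"access control", "audit logging", "backup and restore"},
}


def build_controls_matrix(requirements):
    """Map each control to the sorted list of programs that require it."""
    matrix = defaultdict(list)
    for program, controls in requirements.items():
        for control in controls:
            matrix[control].append(program)
    return {control: sorted(programs) for control, programs in sorted(matrix.items())}


matrix = build_controls_matrix(PROGRAM_REQUIREMENTS)
for control, programs in matrix.items():
    print(f"{control}: {', '.join(programs)}")
```

Controls that list more than one program are the overlapping areas, and a single GitHub setting can often serve as evidence for all of them.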

In general, most compliance programs will focus on several key areas: data categorization, access control, permissions, auditing and access review, source code integrity, and backup and restore processes. Through careful implementation of GitHub’s provided features, adoption of third-party tools where necessary, and regular reviews, you can reduce the complexity of meeting these requirements.

Data Location

When deciding to incorporate GitHub into your organization, one of the most crucial early compliance decisions will be whether to use GitHub’s hosted SaaS service or to self-host GitHub using GitHub’s Enterprise option. The factors that weigh on this decision include your operational and uptime requirements, internal support criteria, and numerous others that exceed the scope of this article. From a compliance perspective, it’s important to determine whether any specific regulatory requirement prohibits the use of GitHub’s hosted options.

This investigation will focus heavily on the concept of data categorization – the classification of data that your organization plans to store in GitHub. In a majority of cases, this will include source code, configuration data, infrastructure documentation and diagrams, and other sensitive data. Depending on your use case, it may also include customer data, but it is unlikely to include customer billing information, health care records, or other protected data classes.

Access Control and User Management

Compliance requirements tend to be very particular about how services are accessed, who can access them, how that access is managed, and how it can be revoked when necessary. Each of these concerns can be directly addressed using various settings available to GitHub users. Subscribers to GitHub’s Enterprise plans can implement Single Sign-On (SSO) authentication using SAML-compliant identity providers. This option allows an organization’s access to GitHub to be managed using existing providers rather than re-implemented directly in GitHub with yet another combination of username and password to remember (or lose!).

Routing user management to an external identity service removes a significant portion of the regular access review checks that need to be conducted. For example, a majority of SOC 2’s quarterly user access reviews can be delegated to a federated identity provider, rather than to the team managing GitHub. If SSO is not used, these checks can become a time-consuming task of manually reviewing hundreds or even thousands of users in the GitHub organization.

If user management is done directly in GitHub, two-factor authentication (2FA) should be enforced at the organization level. This can be done via the GitHub organization settings pages and ensures that every user that accesses the GitHub organization uses 2FA when logging into the account.

A screenshot of a GitHub account with two factor authentication enabled.
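Enforcement can also be spot-checked from a script. The sketch below assumes a token belonging to an organization owner and uses the REST API’s members listing with the `2fa_disabled` filter; the organization name and helper names are placeholders, and only the first page of results is read.

```python
import json
import urllib.request

API_ROOT = "https://api.github.com"


def members_without_2fa_url(org):
    """Build the REST URL that lists organization members with 2FA disabled."""
    return f"{API_ROOT}/orgs/{org}/members?filter=2fa_disabled&per_page=100"


def fetch_json(url, token):
    """GET a GitHub REST endpoint and decode the JSON body (first page only)."""
    request = urllib.request.Request(
        url,
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def logins(members):
    """Extract sorted login names from a list of member objects."""
    return sorted(member["login"] for member in members)
```

Calling `logins(fetch_json(members_without_2fa_url("example-org"), token))` should return an empty list when every member has 2FA enabled; any names it does return are candidates for follow-up during the access review.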

Another enterprise feature that can improve security and reduce the scope of access reviews for compliance is IP address restriction. GitHub Enterprise allows organizations to upload a list of IP addresses, such as office or VPN egress addresses, which GitHub will use to restrict access to the GitHub organization and its associated repositories. While this is not a sufficient control by itself, it can greatly reduce the attack surface in the event a GitHub user’s credentials are compromised.
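To make the allow-list idea concrete, here is a small local sketch of the check an allow list performs. It does not call GitHub; it only shows how a source address is matched against approved ranges, which can be handy when validating the list you plan to upload. The ranges are reserved documentation addresses standing in for real office and VPN egress addresses.

```python
import ipaddress

# Hypothetical allow list: documentation ranges standing in for an office
# network and a single VPN egress address.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.17/32"),
]


def is_allowed(address, networks=ALLOWED_NETWORKS):
    """Return True when the address falls inside any approved network."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in networks)


for candidate in ("203.0.113.45", "198.51.100.18"):
    print(candidate, "allowed" if is_allowed(candidate) else "blocked")
```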

Although it is not a core requirement of most compliance programs, using groups to manage user access can significantly reduce the burden of access management and review. Imagine the complexity of validating access to hundreds of code repositories if users are added directly as contributors rather than to groups as contributors. While GitHub does support managing users via teams (GitHub’s terminology for group membership), there is not an easy way to enforce access via teams rather than direct access as a user. This is an important control to validate during the quarterly access reviews or even to automate using GitHub’s APIs.
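One way to automate that check is sketched below. It assumes you have already fetched, for each repository, the collaborators returned by the REST API’s collaborators listing with `affiliation=direct` (which, as we understand it, returns users granted access directly on the repository). The function and variable names are illustrative, and the sample data is invented.

```python
def direct_collaborators_url(owner, repo):
    """Build the REST URL listing users added directly to a repository."""
    return (
        f"https://api.github.com/repos/{owner}/{repo}"
        "/collaborators?affiliation=direct&per_page=100"
    )


def find_direct_grants(direct_by_repo, exceptions=frozenset()):
    """Return {repo: [logins]} for direct grants that are not approved exceptions."""
    findings = {}
    for repo, collaborators in direct_by_repo.items():
        flagged = sorted(
            user["login"] for user in collaborators if user["login"] not in exceptions
        )
        if flagged:
            findings[repo] = flagged
    return findings


# Invented sample data in the shape of the API response (login field only).
sample = {
    "example-org/app": [{"login": "alice"}, {"login": "deploy-bot"}],
    "example-org/docs": [],
}
print(find_direct_grants(sample, exceptions={"deploy-bot"}))
```

Keeping an explicit exceptions set makes the approved deviations visible to reviewers instead of burying them in the repository settings.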

Role-Based Access Control

GitHub has a built-in role-based access control (RBAC) system that ensures fine-grained access to repositories and settings. When users are added to the organization, they can either be an owner or a member. Owners are administrators; they have full access to the organization, its settings, other users, and source code. The number of owners should be kept to an absolute minimum, but be sure that at least one owner is available at all times to handle urgent setting changes or other critical activities. Finally, a user’s membership in an organization should be kept private to help avoid social engineering attacks where an attacker targets specific users based on known organization associations.

A screenshot of a GitHub repository with private membership enabled.

Members can be added directly to repositories with one of five roles, ranging from “read” (a basic read-only role) to “admin” (full access). However, rather than adding these members directly to the repository, they can be invited to a team instead. The team can then be granted the same roles on the repository. This is a much easier way of maintaining user access and helps reduce accidental over-provisioning.

A screenshot showing different user permissions available in GitHub.
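Team-based grants can be scripted as well. The sketch below only builds the request for the REST API’s “add or update team repository permissions” endpoint; it does not send it. Note that the API uses `pull` and `push` where the web interface says “read” and “write”. The organization, team, and repository names are placeholders.

```python
# Role names as shown in the web interface, mapped to the values the REST API expects.
ROLE_TO_API_PERMISSION = {
    "read": "pull",
    "triage": "triage",
    "write": "push",
    "maintain": "maintain",
    "admin": "admin",
}


def team_repo_grant(org, team_slug, owner, repo, role):
    """Describe the PUT request that grants a team a role on a repository."""
    if role not in ROLE_TO_API_PERMISSION:
        raise ValueError(f"unknown repository role: {role!r}")
    return {
        "method": "PUT",
        "url": f"https://api.github.com/orgs/{org}/teams/{team_slug}/repos/{owner}/{repo}",
        "body": {"permission": ROLE_TO_API_PERMISSION[role]},
    }


print(team_repo_grant("example-org", "backend", "example-org", "app", "write"))
```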

Third-Party Access

Users are not the only entities that can be given access to GitHub. Third-party GitHub applications, OAuth integrations, webhooks, and API integrations can all be given varying levels of access to GitHub’s settings and repository source code. It is critical that this access be limited, tightly scoped, and audited regularly. Remember that these applications are not owned or maintained by GitHub, so your organization bears full responsibility for their use.
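A periodic review of installed GitHub Apps could start from something like the following sketch. It assumes the installation objects have already been retrieved from the organization’s app installations listing in the REST API, and it flags two conditions we chose for illustration: access to all repositories, and any permission above read. The sample installations are invented.

```python
def review_installations(installations):
    """Return (app_slug, reasons) pairs for installations that deserve a closer look."""
    findings = []
    for installation in installations:
        reasons = []
        if installation.get("repository_selection") == "all":
            reasons.append("access to all repositories")
        writable = sorted(
            name
            for name, level in installation.get("permissions", {}).items()
            if level in ("write", "admin")
        )
        if writable:
            reasons.append("write access: " + ", ".join(writable))
        if reasons:
            findings.append((installation.get("app_slug", "unknown"), reasons))
    return findings


# Invented sample data in the shape of the API's installation objects.
sample_installations = [
    {
        "app_slug": "ci-bot",
        "repository_selection": "all",
        "permissions": {"contents": "write", "metadata": "read"},
    },
    {
        "app_slug": "docs-linter",
        "repository_selection": "selected",
        "permissions": {"contents": "read"},
    },
]
for app, reasons in review_installations(sample_installations):
    print(app, "->", "; ".join(reasons))
```

OAuth applications and webhooks are listed through separate settings pages and endpoints, so a full review would need to cover those as well.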

Auditing

A core tenet of most compliance programs is logging all access to systems. In disaster recovery and security postmortems, knowing who accessed a system, where the requests originated, and what data was accessed is key to recovering successfully after an incident. Fortunately, GitHub supports detailed audit logging that includes timestamps, IP addresses, usernames, and accessed resources. These logs can be accessed by organization owners from the organization’s settings in GitHub.
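Once audit log entries have been exported, whether from the settings page or from the audit log API available on Enterprise plans, a short script can summarize them for a review. The sketch below assumes each entry is a dictionary with `action` and `actor` fields; the sample entries are invented.

```python
from collections import Counter


def actions_by_actor(entries, action_prefix=""):
    """Count audit log entries per actor, optionally limited to an action prefix."""
    counts = Counter()
    for entry in entries:
        if entry.get("action", "").startswith(action_prefix):
            counts[entry.get("actor", "unknown")] += 1
    return counts


# Invented sample entries; real exports carry many more fields.
sample_entries = [
    {"@timestamp": 1681387200000, "action": "repo.access", "actor": "alice"},
    {"@timestamp": 1681387260000, "action": "org.add_member", "actor": "bob"},
    {"@timestamp": 1681387320000, "action": "repo.destroy", "actor": "alice"},
]
for actor, count in actions_by_actor(sample_entries, "repo.").most_common():
    print(actor, count)
```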

Code Security

Source code is perhaps the most sensitive data your organization will store in GitHub. Loss of control over this code can lead to well-crafted security infiltration attempts and exposure of your company’s intellectual property. GitHub ships with robust access control settings for repositories, but it is still necessary for your teams to implement these settings properly and routinely audit their configuration. Each repository can either be public or private. In most corporate environments, the default for all repositories should be “private.” GitHub provides an option to ensure that all newly created repositories default to private, which should be enabled at the organization level.
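A routine audit of repository visibility can be as simple as the sketch below. It assumes the repository objects were already fetched from the organization’s repository listing in the REST API, where each object carries `full_name` and a boolean `private` field. Repositories missing the field are flagged as well, on the assumption that it is safer to review them by hand.

```python
def public_repositories(repos):
    """Return the sorted names of repositories that are not marked private."""
    return sorted(repo["full_name"] for repo in repos if not repo.get("private", False))


# Invented sample data in the shape of the API's repository objects.
sample_repos = [
    {"full_name": "example-org/app", "private": True},
    {"full_name": "example-org/website", "private": False},
]
print(public_repositories(sample_repos))
```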

GitHub provides several additional features to help ensure the integrity of your source code repositories. Signed commits, a feature that allows developers to cryptographically prove code authorship, can provide greater confidence that the code being submitted originates from the intended developers. GitHub also supports approving known domains as contributors to the organization’s codebases. This helps prevent malicious users with similar-looking domains from accidentally being approved to submit code. Finally, branch protection can be enabled to require developers to move their code through approved workflows, such as pull requests, rather than pushing it directly to the upstream branch.
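Branch protection can also be applied through the REST API, which helps keep settings consistent across many repositories. The sketch below builds a request body for the “update branch protection” endpoint that requires pull request reviews and applies the rules to administrators too. The specific values (one approval, stale reviews dismissed, no status checks or push restrictions) are illustrative choices rather than recommendations from GitHub, and the request itself is not sent.

```python
import json


def branch_protection_url(owner, repo, branch):
    """Build the REST URL for a branch's protection settings."""
    return f"https://api.github.com/repos/{owner}/{repo}/branches/{branch}/protection"


def branch_protection_payload(required_approvals=1):
    """Build a PUT body that forces changes through reviewed pull requests."""
    return {
        "required_status_checks": None,  # no CI checks required in this sketch
        "enforce_admins": True,  # administrators follow the same rules
        "required_pull_request_reviews": {
            "required_approving_review_count": required_approvals,
            "dismiss_stale_reviews": True,
        },
        "restrictions": None,  # no extra limits on who may push
    }


print(branch_protection_url("example-org", "app", "main"))
print(json.dumps(branch_protection_payload(), indent=2))
```

Requiring signed commits is configured separately, through the branch’s `required_signatures` protection setting.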

Through a series of acquisitions, GitHub has also begun offering dependency and security scanning for repositories, to help ensure critical security risks are detected and fixed. Some dependency version issues can even be automatically patched using GitHub’s automated tool, Dependabot.

Backup and Restore

As a service provider, GitHub is generally responsible for the backup of its systems and user data. However, many compliance programs mandate that users of third-party tools like GitHub share responsibility for the backup and recovery of their data. In other words, while it is unlikely that GitHub would permanently lose your data, you still have a compliance obligation to demonstrate that you keep backups that can be restored in the event of an emergency or security incident.

To fully meet these requirements, backups must be taken at consistent intervals, stored off-site on different systems and hardware, and tested regularly as part of disaster recovery drills. Different compliance programs have varying acceptable timeframes and procedures, but the need for a trusted backup solution is consistent. Third-party providers, such as BackHub by Rewind, can remove much of the complexity from this process.
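For teams that want to see what a do-it-yourself baseline looks like, the sketch below takes a dated mirror clone of a repository with `git clone --mirror`. The backup root and repository names are placeholders. A mirror clone captures Git data (branches, tags, and other refs) but not issues, pull request discussions, or settings, so it is a starting point rather than a complete backup.

```python
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def mirror_command(clone_url, backup_root, repo_name, now=None):
    """Build the git command for a dated mirror clone of one repository."""
    now = now or datetime.now(timezone.utc)
    target = Path(backup_root) / now.strftime("%Y-%m-%d") / f"{repo_name}.git"
    return ["git", "clone", "--mirror", clone_url, str(target)]


def run_backup(clone_url, backup_root, repo_name):
    """Create the dated directory, run the mirror clone, and return its path."""
    command = mirror_command(clone_url, backup_root, repo_name)
    Path(command[-1]).parent.mkdir(parents=True, exist_ok=True)
    subprocess.run(command, check=True)  # raises if git exits with an error
    return command[-1]
```

Running `run_backup` on a schedule against storage outside GitHub, and periodically restoring one of the mirrors, covers the interval, off-site, and testing points described above.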

Conclusion

Although aligning your organization’s compliance requirements with your use of GitHub can seem daunting, there are many built-in and third-party options that can significantly reduce complexity, improve your security posture, and ensure your data remains safe and compliant according to your customers’ expectations. Careful selection of these controls is a good first step, but ongoing review and regular access audits are critical to continued success.


Matthew Fuller
Matthew Fuller is an accomplished founder and entrepreneur in the cloud security space. He began his career developing, deploying, and supporting cloud-native applications for startups and enterprises, during which he saw the difficulties in developing truly secure workloads in complex cloud environments. This led him to found CloudSploit, an open source and commercial cloud security configuration monitoring service, which was acquired in 2019 by Aqua Security.