zoominfo

Continuous Code Auditing for SOC 2 Compliance

by | Jul 5, 2021

A woman looking at a computer screen showing lines of code.

With the increased adoption of continuous deployment, more and more organizations have increased their deployment frequency in order to get their latest products into the hands of their customers as quickly as possible. A complement to this fast-paced deployment practice is continuous auditing. Both of these approaches constitute a leftward shift from the more traditional methodologies of software development.

This increased rate of deployment can lead to the occasional necessity for an emergency hotfix. Like many organizations, we are using GitHub to build and deploy our software. In some cases, emergency fixes are required to avoid real-world consequences including downtime or the degradation of services. Having these changes documented as code (rather than manipulating resources in a web console) is preferred. In many cases, MTTR (Mean time to repair) is more advantageous than MTTR (Mean time to resolve). At times, an emergency hotfix can solve the problem immediately, allowing for the appropriate retrospective to be spent on working towards a longer-term solution to the underlying problem.

At Rewind, we follow a change management process as part of SOC 2 compliance. This process allows for emergency changes, whereby changes can be approved but without the usual number of reviews for the change. There are times when an emergency fix may need to be released to production quickly, but allowing production engineers or alike the ability to impact production in emergency situations should be audited.

Working in conjunction with our SOC 2 auditor, we have developed (and open-sourced) tooling to scan pull requests within a specified time window leveraging GitHub’s search syntax. If any pull requests are found by the search query, they are logged in AWS CloudWatch Logs and the relevant CloudWatch Alarms are sent to SQS and then picked up by our operations team.

This notifies us of any emergency changes so that we can track the reason for the emergency change (required for SOC2 auditing) and triage them if necessary. This allows us to trust but verify, and also retain these records for an extended period of time, since GitHub only stores audit logs for 90 days. Certifications such as SOC 2 require a well-documented change management process and procedure, including how to deal with emergency changes and how these changes are audited.

How to Set Up a Continuous Code Monitor

The solution uses AWS SAM and its CLI to build, package, and deploy an AWS Lambda (written in Ruby). To reduce the size of the Lambda package, we leveraged AWS Lambda layers to package up our dependencies separately.

See the following CloudFormation snippet:

 AuditorLambdaFunction:
    Type: AWS::Serverless::Function # More info about Function Resource: https://github.com/awslabs/serverless-application-model/blob/master/versions/2016-10-31.md#awsserverlessfunction
    Properties:
      CodeUri: src/
      Handler: lambda.handler
      MemorySize: 384
      ReservedConcurrentExecutions: 1
      Role: !GetAtt LambdaRole.Arn
      Runtime: ruby2.7
      Timeout: 300
      Layers:
        - !Ref AuditorLambdaLayer
      Environment:
        Variables:
          GITHUB_ORG_NAME: !Ref GitHubOrgName
          GITHUB_TOKEN_SSM_PATH: !Ref GitHubTokenSSMPath
          LAST_TIME_CHECKED_SSM_PATH: !Ref LastTimeCheckedSSMPath
    Tags:
      function: github-pr-auditor
      service: common
      platform: common
      lambda: github-pr-auditor
      region: !Ref AWS::Region

  AuditorLambdaLayer:
    Type: AWS::Serverless::LayerVersion
    Properties:
      LayerName: github-pr-auditor-dependencies
      Description: Dependencies for github-pr-auditor
      ContentUri: lambda_layer
      CompatibleRuntimes:
        - ruby2.7
      RetentionPolicy: Retain
    Metadata:
      BuildMethod: makefile

 

As you can see above, the Function is dependent upon a single layer called AuditorLambdaLayer. The layer is packaged up separately and contains all of the dependencies defined in the Gemfile.lock necessary to run the application. One caveat we ran into was that SAM does not package up ruby gems (for Lambda Layers) in a way that the Lambda runtime is expecting. Luckily we found this issue and were able to work around it by reorganizing the files in the Layer by making use of a custom Makefile.

After ensuring that the Lambda itself ran as expected, we set up an AWS Events Rule to run the Lambda on a schedule. The following CloudFormation snippet defines this:

 LambdaSchedule:
    Type: "AWS::Events::Rule"
    Properties:
      Description: >
        A schedule for the Lambda function.
      ScheduleExpression: !Ref LambdaRate
      State: ENABLED
      Targets:
        - Arn: !Sub ${AuditorLambdaFunction.Arn}
          Id: LambdaSchedule

Schedule Expressions for Rules can be defined as strings such as “rate(24 hours)” or “rate(5 minutes)” depending on how frequently you want to run the code.

The last piece of the puzzle was to ensure that alarms would notify us when a certain type of log appeared.

 EmergencyChangeAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmName: !Ref AlarmEmergencyChangeName
      AlarmDescription: A GitHub PR emergency change was merged
      MetricName: GitHubEmergencyChange
      Namespace: GitHubAuditing
      Statistic: Sum
      Period: 300
      EvaluationPeriods: 1
      Threshold: 1
      TreatMissingData: notBreaching
      AlarmActions:
        - !Ref AlarmSNSTopicArn
      ComparisonOperator: GreaterThanOrEqualToThreshold

  EmergencyChangeFilter:
    Type: AWS::Logs::MetricFilter
    Properties:
      LogGroupName: !Ref LambdaLogGroup
      FilterPattern: |-
        "is non-compliant!"
      MetricTransformations:
        - MetricValue: "1"
          MetricNamespace: GitHubAuditing
          MetricName: GitHubEmergencyChange

We consider this type of alarm to be an “Emergency Change”. It matches the text “is non-compliant!” which is configured as a MetricFilter. These alarms then get sent to the configured AWS SNS topic, allowing for our operations team to be notified in a timely manner.

More Advanced Auditing

If you’re lucky enough to have a subscription to Github Enterprise, there are additional methods for auditing changes such as querying GitHub’s Audit Log API. There are good examples in The GitHub Enterprise Audit log API for GraphQL beginners. This enables a far more granular approach to auditing, allowing for many other types of actions to be monitored as well.

Wrapping Up

SOC 2 compliance is all about checks and balances. As a fast-growing startup, Rewind wants to continue to move quickly and continuously deploy code changes. But we need to balance that with the requirement for being able to audit when changes do not follow the usual PR review process. Rather than asking users to track these, we have created automated tooling to ensure we are following our own change management procedure.

Want to solve problems like this, with experts like these? Want to get paid at the same time? Check out Rewind’s open positions and discover a career in data science, engineering, or security. 

Dave Gallant
Dave Gallant
Dave is an experienced software developer who loves tooling, systems, security, and clouds. He’s constantly learning and reading new things. When he isn’t at a keyboard, Dave enjoys traveling the world to admire different architectural styles and sample local cuisines.