*Summary: this post talks about ways that organizations could implement security controls to log and audit code accesses to simplify incident response in the event of source code compromise.*

A large part of Carve’s customer base is software development organizations, for whom the source code represents the “crown jewels,” the intellectual property whose compromise would be a nightmare for the business. Protecting the code is paramount to these organizations’ security.

Over the past year, attacks targeting code hosted on the Internet (e.g. on GitHub) have been on the rise. These attacks have taken different shapes, from “watering hole” attacks such as the one on Codecov to the compromise of GitHub OAuth tokens issued to Heroku and Travis CI that hit the news recently.

Oftentimes, these attacks go undetected for a long time. For example, it appears that the Codecov compromise went undetected for two months. Once an attack is detected, however, responding to it properly is extremely important. A top priority when responding to unauthorized source code access is understanding who accessed which areas of the code, and when they committed changes.

Now, how would you know who accessed what from where (and if that’s legitimate) if you’re staring at two months’ worth of logs?

To start answering this question, we must identify all accesses to the code that we know have to be “good.” By excluding them from the superset of all events, we reduce the amount of suspicious activity that needs to be investigated. To accomplish this, we’d want to enumerate as many as possible of the locations that code would normally be accessed from (below I’m assuming an Internet-based code hosting solution such as GitHub), and ensure that we have proper logging mechanisms covering those locations.

Here are some common ones:

  • Internal network and/or VPN addresses
    These are easy. The IP address space for those is fairly contained and accesses from this space can be easily identified.
  • Software engineers’ residential IP addresses
    These are a little tougher because they frequently change. On one occasion, Carve approached this through indirect logging. We implemented a mechanism that would log the IP address and the identity of the user for every access to the corporate messenger service and push them into a separate Splunk index (not the messages themselves though; that would have been very questionable). This way, when we needed to review code access logs, we would be able to “put the user behind the keyboard,” and map code accesses to residential IP addresses that would show up in the messenger-sourced Splunk index.
  • Third-party, cloud-based CI/CD systems (e.g. CircleCI, Travis CI)
    Software shops use these widely, and they pose a fairly unique challenge. Many of these providers run on cloud infrastructure and spin up ephemeral runners (i.e. runners that come and go at arbitrary IP addresses within the cloud provider’s address space) that pull code from GitHub as part of the CI/CD pipeline, build it, and then disappear into the void. When you later review the GitHub security logs and try to account for a given access from a runner, all you might see is a random EC2 instance address that was associated with a runner at some point, and you can’t reliably map it to any known-good party. To tackle this challenge, we have been instrumenting CI/CD jobs across various providers (mileage will vary, as every provider has its subtleties) with additional logging that leaves an authoritative imprint in the logs with the identity, timestamp, and location of every ephemeral runner. As a result, we are confident that we can account for accesses from short-lived locations that we’d otherwise lose track of.
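
To make the idea concrete, here is a minimal sketch of how the three sources above could be used to strip known-good events out of a pile of code-access logs. It is an illustration under stated assumptions, not a description of our actual tooling: the record shapes (`CodeAccess`, `Sighting`), the example address range, and the time windows are all hypothetical, and real audit-log exports and Splunk results would need to be parsed into this form first.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from ipaddress import ip_address, ip_network

# Everything below is hypothetical: names, record shapes, ranges, and windows.
CORPORATE_NETWORKS = [ip_network("203.0.113.0/24")]  # office/VPN egress (example range)
MESSENGER_WINDOW = timedelta(hours=12)   # residential IPs change, but not every minute
RUNNER_WINDOW = timedelta(minutes=15)    # ephemeral runners are short-lived


@dataclass(frozen=True)
class Sighting:
    """An IP recorded by a trusted side channel (messenger login or CI runner imprint)."""
    identity: str        # user name, or runner/job identifier
    ip: str
    seen_at: datetime


@dataclass(frozen=True)
class CodeAccess:
    """One code-access event parsed from the code host's audit log."""
    actor: str
    ip: str
    at: datetime


def _near(sightings, ip, at, window, identity=None):
    """True if a sighting matches the IP (and identity, if given) within the window."""
    return any(
        s.ip == ip
        and (identity is None or s.identity == identity)
        and abs(s.seen_at - at) <= window
        for s in sightings
    )


def classify(event, messenger_sightings, runner_imprints):
    """Label an event with the known-good source that explains it, if any."""
    if any(ip_address(event.ip) in net for net in CORPORATE_NETWORKS):
        return "corporate"
    # Residential access: the same user was seen on the same IP by the messenger.
    if _near(messenger_sightings, event.ip, event.at, MESSENGER_WINDOW, identity=event.actor):
        return "residential"
    # CI access: an instrumented runner logged this IP at about the same time.
    if _near(runner_imprints, event.ip, event.at, RUNNER_WINDOW):
        return "ci-runner"
    return "unexplained"


def unexplained(events, messenger_sightings, runner_imprints):
    """Return only the events that no known-good source accounts for."""
    return [
        e for e in events
        if classify(e, messenger_sightings, runner_imprints) == "unexplained"
    ]
```

Whatever `unexplained` returns is the reduced set that deserves an investigator’s attention. The windows are judgment calls: a runner’s cloud IP address can be reassigned to someone else shortly after the job ends, so a tight window is the safer default there, while a residential address tends to stay with one user for longer.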

To summarize, I wanted to stress the usefulness of system-wide controls that would reduce the “noise” created by legitimate, “known-good” events that is so detrimental and distracting, especially when you are under stress trying to investigate an active incident. We at Carve have been actively helping our customers implement such controls, and we look forward to helping your organization as well.