GitGuardian, the security startup hunting down online secrets to keep companies safe from hackers

published 29 November 2019

When the login details of an Uber engineer were exposed in 2016 – signalling one of the most high-profile breaches of recent years – the names and addresses of 57 million riders and drivers were left at the mercy of hackers.

None of Uber’s corporate systems had been directly breached, though. Its security infrastructure was working as it should. Instead, the credentials were found buried within the code of an Uber developer’s personal GitHub account. This account and its repositories were hacked, reportedly due to poor password hygiene, and the stolen credentials were then used to access Uber’s vast datastore. This breach, which Uber sat on for a year, resulted in a then-record-breaking $148 million fine.

Yet despite this public lesson in how not to handle private credentials, so-called company secret leakage is an everyday occurrence.

The rise of secret leakage

Research from North Carolina State University found that in just six months between October 2017 and April 2018, more than half a million secrets were uploaded to GitHub repositories, including sensitive login details, access keys, auth tokens and private files. A 2019 SANS Institute survey found that half of company data breaches in the past 12 months were a result of credential hacking – higher than any other attack method among firms using cloud-based services.

This is where GitGuardian comes in.

Founded in 2017 by Jérémy Thomas and Eric Fourrier – a pair of applied mathematics graduates and software engineers specialising in data science, machine learning and AI – the Paris-based cybersecurity startup uses a combination of algorithms, including pattern matching and machine learning, to hunt for signs of company secrets in online code. According to the company’s figures, a staggering 3,000-plus secrets make their way online every day.

“The idea for GitGuardian came when Eric and I spotted a vulnerability buried in a GitHub repository,” CEO and co-founder Thomas tells IT Pro. “This vulnerability involved sensitive credentials relating to a major company being leaked online that had the potential to cost the firm tens of millions of dollars if they had got into the wrong hands. We alerted the company to the vulnerability and it was able to nullify it in less than a week.”

“We then built an algorithm and real-time monitoring platform that automated and significantly built upon the manual steps we took when we made that initial detection, and this platform attracted interest from GitHub’s own Scott Chacon as well as Solomon Hykes from Docker and Renaud Visage from EventBrite.”

How the cloud is fuelling secret leakage

The problem of sensitive data leakage stems in part from the increasing reliance of software developers on third-party services. To integrate such services, developers often juggle hundreds of credentials with varying sensitivity, from API keys used to provide mapping features on websites to Amazon Web Services login details and private cryptographic keys for servers. That is not to mention the many secrets designed to protect data surrounding payment systems, intellectual property and more.

In the process of handling these integrations, more than 40 million developers and almost 3 million businesses and organisations globally use GitHub, the public platform that lets developers share code and work collaboratively on projects. Either by accident (in the majority of cases) or occasionally knowingly, developers upload code with company secrets buried within it. As was seen with the Uber breach, hackers can theoretically scour this code, steal credentials and hack company accounts, all without the developer or their employer being any the wiser.

How GitGuardian plugs these leaks

GitGuardian’s technology works by first linking developers registered on GitHub to their respective companies. This already gives each company greater insight into who its developers are on GitHub and the level of public activity they’re involved in. This is especially important for developers’ personal repositories because they’re completely out of their companies’ control, yet too often contain corporate credentials.

Once linked, GitGuardian’s algorithms scrutinise any and all code changes, known as commits, made by these developers in real time, looking for signs of company secrets. Such signs range from code patterns to file types that have previously been found to contain credentials.

“Our algorithms scan the content of more than 2.5 million commits a day, covering over 300 types of secrets from keys to database connection strings, SSL certificates, usernames and passwords,” Thomas continues.
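To make the idea of scanning commits for code patterns and risky file types more concrete, here is a minimal Python sketch. It is not GitGuardian’s algorithm, which the article does not detail. The three patterns, the file-type lists and the names `Finding` and `scan_added_lines` are all assumptions chosen for illustration, and a real detector would cover far more secret types and use more than regular expressions.

```python
import re
from dataclasses import dataclass
from pathlib import PurePosixPath

# Illustrative patterns only (assumptions, not GitGuardian's detectors).
PATTERNS = {
    # AWS access key IDs commonly start with "AKIA" followed by 16 characters.
    "aws_access_key_id": re.compile(r"\b(AKIA[0-9A-Z]{16})\b"),
    # Header line of a PEM-encoded private key.
    "private_key_header": re.compile(
        r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"
    ),
    # A quoted value of 8+ characters assigned to something called "password".
    "password_assignment": re.compile(
        r"(?i)\bpassword\s*[:=]\s*['\"]([^'\"]{8,})['\"]"
    ),
}

# File types that often hold credentials (hypothetical short list).
RISKY_SUFFIXES = {".pem", ".pfx", ".p12"}
RISKY_NAMES = {".env"}


@dataclass(frozen=True)
class Finding:
    path: str
    line_no: int  # 0 means the finding is about the file itself
    detector: str


def scan_added_lines(path: str, added_lines: list[str]) -> list[Finding]:
    """Scan the lines a commit adds to one file and return any findings."""
    findings = []
    p = PurePosixPath(path)
    if p.suffix in RISKY_SUFFIXES or p.name in RISKY_NAMES:
        findings.append(Finding(path, 0, "risky_file_type"))
    for line_no, line in enumerate(added_lines, start=1):
        for detector, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append(Finding(path, line_no, detector))
    return findings


if __name__ == "__main__":
    sample = [
        "DEBUG = True",
        'aws_key = "AKIAIOSFODNN7EXAMPLE"',  # AWS's documented example key ID
        'password = "hunter2hunter2"',
    ]
    for finding in scan_added_lines("config/settings.py", sample):
        print(finding)
```

Simple patterns like these produce many false positives on their own (test fixtures, placeholder values), which is presumably one reason the company combines pattern matching with machine learning and user feedback.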

Once a leak occurs, it takes four seconds for GitGuardian to detect it and send an alert to the developer and their security team. On average, the information is removed within 25 minutes and the credential is revoked within the hour. For every alert, GitGuardian seeks feedback from its developers and security teams who rate the accuracy of the detection: were company secrets actually exposed or was it a false positive? Consequently, the algorithm is constantly evolving in response to new secrets and how they are leaked.
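The article does not say how this feedback is fed back into the algorithm. As one simple illustration, the true-or-false-positive ratings could be tallied per detector to show which detection rules are reliable and which need tuning. The function name `precision_by_detector` and the detector labels below are hypothetical.

```python
from collections import defaultdict


def precision_by_detector(feedback):
    """Return the share of alerts rated as real secrets, per detector.

    `feedback` is an iterable of (detector_name, was_real_secret) pairs,
    one pair per alert that a developer or security team has rated.
    """
    counts = defaultdict(lambda: [0, 0])  # detector -> [confirmed, total]
    for detector, was_real_secret in feedback:
        counts[detector][1] += 1
        if was_real_secret:
            counts[detector][0] += 1
    return {d: confirmed / total for d, (confirmed, total) in counts.items()}


if __name__ == "__main__":
    ratings = [
        ("aws_access_key_id", True),
        ("aws_access_key_id", True),
        ("password_assignment", False),
        ("password_assignment", True),
    ]
    print(precision_by_detector(ratings))
```

A detector whose precision drops could then be tightened or retrained, while one that stays high could trigger alerts with more confidence.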

This seems like a simple premise, even if the technology behind it is far from simple. But what’s to stop a hacker building a similar algorithm to intercept the secrets before GitGuardian’s platform spots them?

“GitGuardian is indeed competing with individual black hat hackers, as well as organised criminal groups,” Thomas explains. “We constantly improve our algorithms to be quicker and smarter than they are, and to be able to detect a wider scope of vulnerabilities, which requires a dedicated, highly skilled team.

“We're helped in this by our users and customers who give us feedback – at scale – that we reinject into our algorithms. Our white hat approach allows us to collect feedback and this gives us a tremendous edge over black hats. You can see this as the unfair advantage you get by doing good.”

GitGuardian has already supported global government organisations, more than 100 Fortune 500 companies and 400,000 individual developers. It’s now setting its sights on adding even more developers and companies to its platform to further improve its algorithm, and extend this technology for use on private sites.

“We started GitGuardian by tackling secrets in source code and private sites,” concludes Thomas. “Our ambition really is to be developers’ and cybersecurity professionals’ best friend when it comes to securing the vulnerability area that is emerging due to modern software development techniques [and] we’re on the road to doing this.”