Infrastructure Takeover through Certificate Transparency Scanning

Today, I am happy to publish the code for scanct, a program that searches certificate transparency logs for known self-hosted services, hoping to find exposed credentials such as AWS keys. You can find the source code here.

Overview

While Git repositories on the large GitHub/GitLab instances are already scanned widely, smaller self-hosted instances are rarely scanned for exposed secrets. To find candidate instances, we can use Certificate Transparency logs to collect hostnames that likely belong to GitLab instances. We can then use the GitLab API to enumerate public repositories, clone them, and scan them for secrets with a Git scanner such as gitleaks.
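To make the enumeration step concrete, here is a minimal sketch in Go of listing public projects on a candidate GitLab host via the /api/v4/projects endpoint. The host name, struct, and function names are illustrative and not taken from scanct; pagination and error handling are reduced to the essentials.

```go
// Minimal sketch of the GitLab discovery step: given a candidate host from
// the CT logs, list its publicly visible projects via the GitLab REST API.
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

type project struct {
	ID            int    `json:"id"`
	PathWithNS    string `json:"path_with_namespace"`
	HTTPURLToRepo string `json:"http_url_to_repo"`
}

func listPublicProjects(host string) ([]project, error) {
	client := &http.Client{Timeout: 15 * time.Second}
	url := fmt.Sprintf("https://%s/api/v4/projects?visibility=public&per_page=100", host)
	resp, err := client.Get(url)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status %d from %s", resp.StatusCode, host)
	}
	var projects []project
	if err := json.NewDecoder(resp.Body).Decode(&projects); err != nil {
		return nil, err
	}
	return projects, nil
}

func main() {
	projects, err := listPublicProjects("gitlab.example.com") // placeholder host
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, p := range projects {
		fmt.Println(p.PathWithNS, p.HTTPURLToRepo)
	}
}
```

The clone URLs returned here are what a scanner like gitleaks would then operate on.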

This process usually results in lots of findings and needs some clean-up afterwards. For now, I focused on high-value secrets such as AWS keys, which can be verified automatically using aws sts get-caller-identity. The program then hands valid keys to a human for further research and responsible disclosure.
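Here is a hedged sketch of the verification idea, assuming the aws CLI is installed: inject the candidate key pair via environment variables and run aws sts get-caller-identity; a successful call means STS accepted the credentials. The function name and environment handling are illustrative, not scanct's actual code.

```go
// Sketch of AWS key verification via the aws CLI (assumed to be installed).
package main

import (
	"fmt"
	"os"
	"os/exec"
)

func verifyAWSKey(accessKeyID, secretAccessKey string) (bool, string) {
	cmd := exec.Command("aws", "sts", "get-caller-identity", "--output", "json")
	// Use a minimal environment so local AWS config cannot interfere;
	// environment credentials take precedence over ~/.aws/credentials anyway.
	cmd.Env = []string{
		"PATH=" + os.Getenv("PATH"),
		"HOME=" + os.Getenv("HOME"),
		"AWS_DEFAULT_REGION=us-east-1", // STS is global, but the CLI wants a region
		"AWS_ACCESS_KEY_ID=" + accessKeyID,
		"AWS_SECRET_ACCESS_KEY=" + secretAccessKey,
	}
	out, err := cmd.Output()
	if err != nil {
		return false, "" // invalid or expired credentials (or CLI error)
	}
	return true, string(out) // JSON with Account, Arn and UserId
}

func main() {
	ok, identity := verifyAWSKey("AKIA...", "...") // placeholders
	fmt.Println(ok, identity)
}
```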

Inspired by a recent blog post, I also added support for Jenkins instances, verifying whether they have a publicly accessible /script endpoint or public job files. An open /script endpoint allows remote code execution on the Jenkins container or host and compromises all Jenkins secrets.
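The check itself can be as simple as requesting /script and looking at the status code. The following is a rough sketch, not scanct's actual implementation; the host name is a placeholder.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// scriptConsoleExposed requests /script and treats HTTP 200 as a publicly
// reachable Script Console. A production check should also inspect the body,
// since some setups answer 200 with a login form.
func scriptConsoleExposed(host string) (bool, error) {
	client := &http.Client{Timeout: 10 * time.Second}
	resp, err := client.Get("https://" + host + "/script")
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()
	return resp.StatusCode == http.StatusOK, nil
}

func main() {
	open, err := scriptConsoleExposed("jenkins.example.com")
	fmt.Println(open, err)
}
```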

Implementation

I implemented the program in Go; see here for the source code. While it was originally adapted from shhgit, I removed almost all of its code in the process due to different requirements.

The program uses a SQLite database for storing its data. This has the advantage that a crash does not lead to data loss; the program only needs to be restarted. It is also easier to enforce data consistency through the database than in application code.
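As an illustration of the idea (the real schema lives in the repository), a reduced instances table with a processed flag might look like this; the go-sqlite3 driver is an assumption on my part, not necessarily what scanct uses.

```go
// Illustrative reduction of the SQLite setup: an instances table with a
// processed flag, so a crashed run can simply be restarted and resume with
// the unprocessed rows.
package main

import (
	"database/sql"

	_ "github.com/mattn/go-sqlite3" // assumed driver
)

const schema = `
CREATE TABLE IF NOT EXISTS instances (
    id        INTEGER PRIMARY KEY,
    host      TEXT UNIQUE NOT NULL,
    processed INTEGER NOT NULL DEFAULT 0
);`

func openDB(path string) (*sql.DB, error) {
	db, err := sql.Open("sqlite3", path)
	if err != nil {
		return nil, err
	}
	if _, err := db.Exec(schema); err != nil {
		return nil, err
	}
	return db, nil
}

func main() {
	db, err := openDB("scanct.db")
	if err != nil {
		panic(err)
	}
	defer db.Close()
}
```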

The program processes its data in multiple stages (a conceptual sketch of a single stage follows the list):

  1. Analyze the fetched CT entries and fetch the remaining certificates, storing their subjects in instances as unprocessed.
  2. Look at unprocessed instances, match them against hostname patterns, run checks to verify whether the host actually contains a GitLab/Jenkins/… instance, and save that metadata to a software-specific table, e.g. jenkins or git_labs.
  3. Look at unprocessed git_labs, fetch all publicly available repositories and save them as unprocessed repositories.
  4. Look at unprocessed repositories, clone them, scan them for secrets and save the results as unprocessed findings.
  5. Look at unprocessed findings of type aws-access-token, verify them and save the valid ones as aws_keys.
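Conceptually, each stage follows the same pattern: select the unprocessed rows, do the stage-specific work, and mark each row as processed so a restarted run resumes where the previous one stopped. The following self-contained sketch illustrates that pattern; the table layout, names and go-sqlite3 driver are illustrative, not scanct's actual code.

```go
package main

import (
	"database/sql"
	"fmt"

	_ "github.com/mattn/go-sqlite3" // assumed driver, as in the previous sketch
)

// processInstances shows the conceptual shape of one pipeline stage: select
// unprocessed rows, run the stage-specific work, then mark each row as
// processed so a restarted run resumes where the previous one stopped.
func processInstances(db *sql.DB, handle func(id int, host string) error) error {
	rows, err := db.Query(`SELECT id, host FROM instances WHERE processed = 0`)
	if err != nil {
		return err
	}
	type entry struct {
		id   int
		host string
	}
	// Collect rows first so the update below does not run against an open cursor.
	var pending []entry
	for rows.Next() {
		var e entry
		if err := rows.Scan(&e.id, &e.host); err != nil {
			rows.Close()
			return err
		}
		pending = append(pending, e)
	}
	rows.Close()

	for _, e := range pending {
		if err := handle(e.id, e.host); err != nil {
			continue // leave the row unprocessed; it will be retried on the next run
		}
		if _, err := db.Exec(`UPDATE instances SET processed = 1 WHERE id = ?`, e.id); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	db, err := sql.Open("sqlite3", ":memory:")
	if err != nil {
		panic(err)
	}
	defer db.Close()
	db.SetMaxOpenConns(1) // keep the in-memory database on a single connection

	db.Exec(`CREATE TABLE instances (id INTEGER PRIMARY KEY, host TEXT, processed INTEGER DEFAULT 0)`)
	db.Exec(`INSERT INTO instances (host) VALUES ('gitlab.example.com')`)

	err = processInstances(db, func(id int, host string) error {
		fmt.Println("checking", host) // stage-specific work would go here
		return nil
	})
	fmt.Println(err)
}
```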

Fetching and parsing millions of certificates takes time, and the CT logs are rate-limited. To avoid refetching, all subjects are therefore stored on disk, so some disk space is needed.

Findings

I ran the program on the 2023 Let’s Encrypt logs up until the end of February; here is what I found:

The largest GitLab instances I encountered were:

The largest Jenkins instances I encountered were:

Due to the number of findings, it is practically impossible to report everything. I did, however, try to report all high-value findings.

What next?

In terms of scanct, the following would be possible:

I am unsure about the last two since, as of right now, scanct only reads instance data and never modifies it. User registration would mean spamming the instances, which is something I certainly don’t want.

If you have any ideas or comments, I am happy to answer your message!