Today, I am happy to publish the code for scanct, a program that searches certificate transparency logs for known self-hosted services, hoping to find exposed credentials such as AWS keys. You can find the source code here.
While Git repository scanning is already done on large GitHub/GitLab instances, smaller, self-hosted instances rarely scan for exposed secrets. To find possible instances, we can use Certificate Transparency Logs to get hostnames that probably contain GitLab instances. Then, we can use the GitLab API to find public repositories, clone them and scan for secrets using a git scanner like gitleaks.
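To make the repository discovery step more concrete, here is a minimal sketch of listing public projects through the GitLab REST API. It is written in Python for brevity and is not scanct's actual Go code. The helper names and the `max_pages` limit are my own assumptions; the endpoint `/api/v4/projects` with `visibility=public` and the `http_url_to_repo` field come from the GitLab API.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def projects_url(base_url: str, page: int, per_page: int = 100) -> str:
    """Build the URL for one page of public projects on a GitLab instance."""
    query = urlencode({"visibility": "public", "per_page": per_page, "page": page})
    return f"{base_url.rstrip('/')}/api/v4/projects?{query}"


def clone_urls(response_body: str) -> list[str]:
    """Extract the HTTPS clone URLs from one page of the JSON project listing."""
    projects = json.loads(response_body)
    return [p["http_url_to_repo"] for p in projects if p.get("http_url_to_repo")]


def list_public_repositories(base_url: str, max_pages: int = 10) -> list[str]:
    """Walk the paginated listing until a page yields no clone URLs.

    max_pages is a hypothetical safety limit so that a huge instance
    does not keep the scanner busy forever.
    """
    urls: list[str] = []
    for page in range(1, max_pages + 1):
        with urlopen(projects_url(base_url, page), timeout=10) as resp:
            batch = clone_urls(resp.read().decode("utf-8"))
        if not batch:
            break
        urls.extend(batch)
    return urls
```

Each returned URL could then be cloned and handed to a scanner such as gitleaks.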
This process usually results in lots of findings and needs some clean-up afterwards.
For now, I focused on high-value secrets such as AWS keys, which can automatically be verified using aws sts get-caller-identity.
The program can then give valid keys to a human for research and responsible disclosure.
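The verification step could look roughly like the following Python sketch (again, not scanct's Go implementation). It assumes the AWS CLI is installed and that both the access key ID and the matching secret access key were found. The function name and the injectable `run` parameter are hypothetical and only there to make the sketch testable.

```python
import json
import os
import subprocess


def verify_aws_key(access_key_id, secret_access_key, run=subprocess.run):
    """Return the caller identity as a dict if the key pair is valid, else None."""
    env = dict(
        os.environ,
        AWS_ACCESS_KEY_ID=access_key_id,
        AWS_SECRET_ACCESS_KEY=secret_access_key,
    )
    # Remove a session token inherited from the parent environment,
    # otherwise it would be sent together with the candidate key.
    env.pop("AWS_SESSION_TOKEN", None)
    result = run(
        ["aws", "sts", "get-caller-identity", "--output", "json"],
        env=env,
        capture_output=True,
        text=True,
        timeout=30,
    )
    if result.returncode != 0:
        return None  # the CLI exits non-zero for invalid or disabled keys
    return json.loads(result.stdout)
```

A non-None result contains the account and ARN the key belongs to, which is useful context for a responsible disclosure report.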
Inspired by a recent blog post, I also added support for Jenkins instances, verifying whether they have a publicly accessible /script endpoint or public job files.
An open /script endpoint leads to RCE on the Jenkins container/instance and compromises all Jenkins secrets.
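A read-only check for an open /script endpoint can be sketched as follows, in Python rather than scanct's Go. The heuristic is an assumption on my part: the endpoint is treated as open when an anonymous GET returns HTTP 200 and the page mentions "Script Console". All function names are hypothetical.

```python
from urllib.error import HTTPError
from urllib.request import urlopen


def script_console_url(base_url: str) -> str:
    """Return the URL of the script console for a Jenkins base URL."""
    return base_url.rstrip("/") + "/script"


def looks_like_open_console(status: int, body: str) -> bool:
    """Assumed heuristic: the page loads anonymously and names the console."""
    return status == 200 and "Script Console" in body


def fetch_page(url: str, timeout: float = 10.0) -> tuple[int, str]:
    """GET a page and return (status, body); status 0 means the host was unreachable."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except HTTPError as err:  # e.g. 403 when anonymous access is denied
        return err.code, ""
    except OSError:  # DNS failures, refused connections, timeouts
        return 0, ""


def has_open_script_console(base_url: str, fetch=fetch_page) -> bool:
    """Check a Jenkins instance without sending anything but a single GET."""
    status, body = fetch(script_console_url(base_url))
    return looks_like_open_console(status, body)
```

The check only reads a page and never submits a script, so it does not modify the instance.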
I implemented the program in Go; see here for the source code. While it was originally adapted from shhgit, I removed almost all of its code in the process due to different requirements.
The program uses a SQLite database for storing its data. This has the advantage that crashes do not lead to data loss, requiring only a program restart. Also, it is easier to enforce data consistency via an external database.
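To illustrate why a database makes restarts cheap, here is a small Python sketch using the standard sqlite3 module. It is not scanct's schema: the processed column and the function names are assumptions for illustration. Each row is marked as processed in its own transaction, so a crash loses at most the item that was in flight.

```python
import sqlite3


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create a hypothetical instances table."""
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS instances (
               id INTEGER PRIMARY KEY,
               hostname TEXT NOT NULL UNIQUE,
               processed INTEGER NOT NULL DEFAULT 0)"""
    )
    return db


def add_instance(db: sqlite3.Connection, hostname: str) -> None:
    """Insert a hostname; the UNIQUE constraint keeps re-runs free of duplicates."""
    with db:
        db.execute("INSERT OR IGNORE INTO instances (hostname) VALUES (?)", (hostname,))


def process_pending(db: sqlite3.Connection, handle) -> int:
    """Call handle(hostname) for every unprocessed row and return the count handled."""
    rows = db.execute(
        "SELECT id, hostname FROM instances WHERE processed = 0 ORDER BY id"
    ).fetchall()
    done = 0
    for row_id, hostname in rows:
        handle(hostname)
        with db:  # commit per row so finished work survives a later crash
            db.execute("UPDATE instances SET processed = 1 WHERE id = ?", (row_id,))
        done += 1
    return done
```

After a crash, calling process_pending again simply picks up the rows that are still unprocessed.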
The program processes its data in multiple stages:
1. Fetch certificates from the CT logs and save candidate hostnames as unprocessed instances.
2. For all unprocessed instances, determine the service type and save them as unprocessed jenkins or git_labs.
3. For all unprocessed git_labs, fetch all publicly available repositories and save them as unprocessed repositories.
4. For all unprocessed repositories, clone them, scan for secrets and save them as unprocessed findings.
5. For all unprocessed findings of type aws-access-token, verify them and save them as aws_keys.
Fetching and parsing millions of certificates takes some time, and the CT logs are rate-limited. Therefore, all subjects are stored on disk, so some disk space is needed.
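Once the certificate subjects are on disk, they have to be narrowed down to hostnames worth probing. The following Python sketch shows one possible filter. The heuristic (a first label starting with gitlab or jenkins) is purely my assumption for illustration and not necessarily the rule scanct applies.

```python
import re

# Hypothetical heuristic: the hostname starts with "gitlab" or "jenkins",
# followed by a dot or a hyphen (gitlab.example.com, jenkins-ci.example.org).
CANDIDATE = re.compile(r"^(gitlab|jenkins)[.-]")


def candidate_hostnames(subjects):
    """Return de-duplicated, lower-cased hostnames that look like GitLab or Jenkins hosts."""
    seen = set()
    result = []
    for subject in subjects:
        host = subject.strip().lower()
        if host.startswith("*."):
            continue  # wildcard entries do not name a concrete host
        if CANDIDATE.match(host) and host not in seen:
            seen.add(host)
            result.append(host)
    return result
```

De-duplicating early matters because certificates are renewed regularly, so the same hostname shows up in the logs many times.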
I ran the program on the 2023 Let’s Encrypt logs up until the end of February. Here is what I found:
/script access
The largest GitLab instances I encountered were:
The largest Jenkins instances I encountered were:
Due to the number of findings, it is basically impossible to report everything. I did, however, try to report all high-value findings.
In terms of scanct, the following would be possible:
git.
I am unsure about the last two since, as of right now, scanct only reads instance data, never modifying it.
User registration would lead to spamming the instances, which is something I certainly don’t want.
If you have any ideas or comments, I am happy to answer your message!