API Key Leaks on GitHub: What the Data Shows

Every year, tens of millions of secrets land on GitHub. They come from beginner mistakes, rushed commits, AI assistants, and CI misconfigurations. Here is what the data actually shows about how keys leak, how fast they are exploited, and what makes the problem worse than most developers think.

10 min read·May 2026·KeyVault Edge Team

29M+

Secrets detected on GitHub in 2025

Source: GitGuardian

8 min

Median time to first exploit attempt

Source: GitGuardian

2×

Higher leak rate with AI coding assistants

Source: GitGuardian

79%

Of leaked secrets are still valid after 5 days

Source: GitGuardian

The scale of the problem

GitGuardian's annual State of Secrets Sprawl report is the most comprehensive publicly available dataset on credential leakage. The 2026 edition (covering 2025 activity) found 29 million new secrets detected across public GitHub commits - up from 12.8 million in 2023. The growth is not slowing.

GitGuardian scans all public commits in real time, and the report is built from that data. It does not cover private repositories, meaning the 29 million figure is a floor, not a ceiling. Internal research from GitGuardian suggests private repositories contain roughly 4× the density of secrets found in public repos - developers are more careless when they believe code is not publicly visible.

The most commonly leaked credential types in 2025:

Provider / Type	Share of detections
Generic API keys (unbranded)	38%
OpenAI / AI provider keys	14%
AWS access keys	10%
Google Cloud / GCP credentials	8%
GitHub personal access tokens	7%
Stripe API keys	5%
Other (Twilio, SendGrid, etc.)	18%

Approximate distribution. Source: GitGuardian 2026 State of Secrets Sprawl.

How API keys end up on GitHub

The majority of leaks are ordinary workflow mistakes, not acts of deliberate negligence. They fall into five patterns:

The forgotten .env file

A developer creates a .env file for local development and commits it before adding it to .gitignore, so the commit history permanently contains the secret. Even after the file is deleted in a later commit, the key remains in git history and can be recovered with git log -p.
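
One way to close this gap is to ignore the file before anything is staged. The following is a minimal sketch, assuming the secrets file is named .env and sits at the repository root. ensure_env_ignored is a hypothetical helper name, and the demo runs in a throwaway repository created with mktemp.

```shell
# Sketch: make sure .env is ignored before anything is staged.
# ensure_env_ignored is a hypothetical helper, not a git command.
ensure_env_ignored() {
  # git check-ignore exits 0 when the path matches an ignore rule
  if git check-ignore -q .env; then
    echo "already ignored"
  else
    echo ".env" >> .gitignore
    echo "added to .gitignore"
  fi
}

# Demo in a throwaway repository
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
git init -q

first=$(ensure_env_ignored)
second=$(ensure_env_ignored)
echo "$first / $second"
```

Running the helper twice shows that it is safe to call repeatedly: the second call finds the rule already in place and leaves .gitignore alone. It does nothing for a .env file that is already tracked, which still needs to be untracked and its key rotated.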

The hardcoded quick test

A developer pastes a real key to test something quickly, intending to remove it. The commit is pushed before the cleanup. This is the single most common pattern in GitGuardian's incident data.
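
A common alternative to pasting the key is to read it from the environment and fail loudly when it is missing. This is a sketch under assumptions: EXAMPLE_API_KEY and require_key are hypothetical names, and the value used in the demo is a dummy string.

```shell
# Read the key from the environment instead of pasting it into the script.
# EXAMPLE_API_KEY and require_key are hypothetical names.
require_key() {
  # ${VAR:?message} aborts the current (sub)shell when VAR is unset or empty
  : "${EXAMPLE_API_KEY:?EXAMPLE_API_KEY is not set}"
  echo "key loaded (${#EXAMPLE_API_KEY} characters)"
}

# Run in subshells so a missing key does not end the demo
( unset EXAMPLE_API_KEY; require_key ) 2>/dev/null || echo "refused: no key in environment"
( EXAMPLE_API_KEY="dummy-value-for-demo"; require_key )
```

The script never contains the key and only reports its length, so there is nothing sensitive to clean up before the commit.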

CI/CD log exposure

A deployment script echoes environment variables for debugging. The CI log is set to public (common in open-source projects on GitHub Actions). Every variable is visible to anyone who views the run.
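
When a deployment script needs to confirm that variables are present, it can report names without values. This is a minimal sketch: print_env_names is a hypothetical helper, and DEPLOY_TOKEN and EXAMPLE_MISSING_VAR are made-up variable names.

```shell
# Debug helper for CI scripts: show which variables are set, never their values.
# print_env_names is a hypothetical helper; the variable names are examples.
print_env_names() {
  for name in "$@"; do
    # printenv prints nothing (and exits non-zero) for an unset variable;
    # the value is captured for the emptiness check only, never echoed
    if [ -n "$(printenv "$name")" ]; then
      echo "$name=*** (set)"
    else
      echo "$name=(unset)"
    fi
  done
}

export DEPLOY_TOKEN="not-a-real-token"
print_env_names DEPLOY_TOKEN EXAMPLE_MISSING_VAR
```

The log still answers the debugging question (is the variable there?) while the value stays out of a run that anyone can view.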

AI coding assistant autocomplete

The developer has a key in their clipboard or open in another window. The AI assistant autocompletes a code block using a value that resembles the format of an API key. The developer accepts the suggestion without reviewing it. GitGuardian found that Copilot- and Cursor-assisted commits leak at 2× the baseline rate.

Dependency or template exposure

A project template, Dockerfile, or third-party package has a placeholder that resembles a real key format. A developer replaces the placeholder with their real key and commits. The key is then in a widely-forked or frequently-cloned repository.

The 8-minute exploit window

GitGuardian runs automated detection against the GitHub event stream in real time. Their data shows the median time between a secret being pushed to a public repository and the first external use of that secret is 8 minutes. For high-value keys (AWS, OpenAI, Stripe), the median is lower.

This is because the attacker is also automated. Several well-documented botnets continuously monitor the GitHub public event API (the same API that powers GitHub's own notifications). When a commit is pushed, the bot downloads the diff, runs regex against it, and attempts to use any matched credential within seconds. The bot does not care if the commit is immediately reverted. Reverting a commit does not remove the key from git history, and the bot already has a copy.
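
The regex step described above can be sketched in a few lines. This is an illustration, not the rule set of any particular bot or scanner: scan_added_lines is a hypothetical helper, the two patterns cover only AWS access key IDs and classic GitHub personal access tokens, and the key in the demo is the placeholder used in AWS documentation.

```shell
# scan_added_lines reads a unified diff on stdin and prints the added lines
# that contain a credential-shaped string. Hypothetical helper; the patterns
# are illustrative, not a complete rule set.
scan_added_lines() {
  # Keep only added lines ("+..."), drop the "+++ b/file" header,
  # then match an AWS access key ID or a classic GitHub token
  grep -E '^\+' | grep -vE '^\+\+\+ ' |
    grep -E 'AKIA[0-9A-Z]{16}|ghp_[A-Za-z0-9]{36}'
}

# Demo diff: AKIAIOSFODNN7EXAMPLE is the AWS documentation placeholder
printf '%s\n' \
  '+++ b/config.py' \
  '+AWS_KEY = "AKIAIOSFODNN7EXAMPLE"' \
  '+DEBUG = True' \
  '-OLD_KEY = "AKIAIOSFODNN7EXAMPLE"' | scan_added_lines
```

Only the added line containing the key is printed. The removed line is ignored because this sketch looks at additions only, but an attacker reading the same diff would happily take the key from a removed line as well.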

The revoke-and-rotate timeline

T+0	Commit pushed to public repo
T+1 min	Bot detects key in GitHub event stream
T+3 min	Bot validates key is live (test request)
T+8 min	First exploit attempt (often mining or resale)
T+20 min	GitHub Secret Scanning alert fires (for partner providers)
T+45 min	Developer receives GitHub notification
T+60 min	Developer manually revokes key (if they act immediately)
T+60+ min	Damage already done

The important observation here is that detection-and-response is too slow. By the time any notification reaches a developer, the exploit window has already been active for at least 30 minutes. The only effective controls are pre-exposure (preventing the key from being committed) or exposure-tolerance (ensuring the leaked credential is worthless).

Git history is permanent - almost

79% of leaked secrets are still valid five days after detection. This figure is high because most developers do not know their key was leaked, or they believe that deleting the file and pushing a new commit is sufficient remediation. It is not.

Deleting a file in a git commit only removes it from the working tree. The file still exists in every prior commit. Anyone who clones the repository and runs git log -p can see every version of every file, including the deleted one with the key.
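
This behavior is easy to reproduce in a throwaway repository. The sketch below assumes git is installed. The file name, commit messages, and key value are made up for the demo, and the key is a placeholder, not a real credential.

```shell
# Throwaway repository showing that a deleted file stays in history.
# The key below is a made-up placeholder, not a real credential.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

# Commit the secret, then "fix" it by deleting the file in a second commit
echo "API_KEY=placeholder-not-a-real-key" > .env
git add .env
git commit -q -m "Add config"
git rm -q .env
git commit -q -m "Remove config"

# The working tree is clean, but the patch history still shows the value
[ ! -e .env ] && echo "working tree: .env is gone"
if git log -p --all -- .env | grep -q 'placeholder-not-a-real-key'; then
  echo "history: key is still recoverable"
fi
```

Both messages print: the file is gone from the checkout, and the value is still one git log -p away for anyone with a clone.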

Proper removal requires rewriting git history with git filter-repo (the modern replacement for git filter-branch) and force-pushing. Even then, the old commits can remain reachable on GitHub by their SHA and in cached views until they are garbage-collected, and any fork or clone taken before the rewrite still contains the secret. For any key that was ever in a public repo - even briefly - treat it as permanently compromised and rotate immediately.

If you need to remove a secret from git history

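# Install git-filter-repo
pip install git-filter-repo

# Remove a specific file from all history (run this in a fresh clone)
git filter-repo --path .env --invert-paths

# filter-repo removes the origin remote as a safety measure; re-add it.
# REPO_URL is a placeholder for your own remote URL.
git remote add origin "$REPO_URL"

# Force push all branches and tags (coordinate with your team first)
git push origin --force --all
git push origin --force --tags

# IMPORTANT: Rotate the key regardless.
# History rewriting is not a substitute for rotation.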

Why AI coding tools make this worse

GitGuardian's 2026 data introduced a new finding: commits made with the help of AI coding assistants leak secrets at approximately 2× the baseline rate. The mechanism is not fully understood, but several factors are likely contributors:

▸ AI assistants autocomplete credential placeholders using values from context - clipboard, open files, or recently pasted content. Developers accept suggestions without auditing every line.
▸ AI-assisted commits tend to be larger (more files, more context), and larger commits are harder for the developer to review before pushing.
▸ AI tools sometimes generate boilerplate with hardcoded example values whose format is identical to live credentials, so a real key among them is easy to overlook.
▸ Developers using AI tools often move faster and skip manual review steps, reducing the likelihood of catching a committed secret before push.

The practical implication: if your team uses AI coding tools, pre-commit scanning is more important, not less. The Gitleaks pre-commit hook runs in under a second and catches most credential patterns before they leave the developer's machine.
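
To show where a pre-commit check sits in the workflow, here is a sketch that installs a grep-based hook in a throwaway repository. The grep check is a deliberately simple stand-in for Gitleaks, not its implementation. It matches only AWS access key IDs, and the file name and key (the AWS documentation placeholder) are made up for the demo.

```shell
# Throwaway repo with a grep-based pre-commit hook (a stand-in for Gitleaks).
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

# The hook inspects staged additions only and blocks the commit on a match
mkdir -p .git/hooks
cat > .git/hooks/pre-commit <<'HOOK'
#!/bin/sh
if git diff --cached -U0 | grep -E '^\+' | grep -qE 'AKIA[0-9A-Z]{16}'; then
  echo "pre-commit: possible AWS access key ID in staged changes" >&2
  exit 1
fi
HOOK
chmod +x .git/hooks/pre-commit

# Stage a file containing the AWS documentation placeholder key
echo 'aws_key = "AKIAIOSFODNN7EXAMPLE"' > settings.ini
git add settings.ini
if git commit -q -m "Add settings" 2>/dev/null; then
  echo "commit went through"
else
  echo "commit blocked"
fi
```

The commit is refused before anything reaches history, which is the property that matters: the secret never leaves the developer's machine. A dedicated scanner covers far more credential formats than this single pattern.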

What the data means for your security posture

The key insight from GitGuardian's dataset is that detection-and-response is a losing strategy for credential leaks. The exploit window is shorter than the detection-to-response cycle by a factor of 5–10×.
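
One way to read that factor is to divide the response times in the timeline by the 8-minute exploit time. The sketch below does that arithmetic. It assumes the timeline values (8, 45, and 60 minutes) are the right inputs, and ratio_tenths is a hypothetical helper.

```shell
# Minutes after push, taken from the timeline in this article
first_exploit=8
notification=45
revocation=60

# ratio_tenths prints a/b truncated to one decimal place (integer arithmetic)
ratio_tenths() {
  tenths=$(( $1 * 10 / $2 ))
  echo "$(( tenths / 10 )).$(( tenths % 10 ))"
}

echo "head start before notification: $(( notification - first_exploit )) min"
echo "notification vs exploit: $(ratio_tenths "$notification" "$first_exploit")x"
echo "revocation vs exploit: $(ratio_tenths "$revocation" "$first_exploit")x"
```

With these inputs the attacker has a 37-minute head start on the notification, and the two ratios come out at 5.6× and 7.5×, both inside the 5–10× range.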

The two strategies that actually work are:

Strategy 1: Prevention

Pre-commit scanning (Gitleaks), .gitignore discipline, mandatory code review, no keys in CI logs. Prevents the vast majority of accidental leaks. Requires ongoing team discipline - the protection degrades when developers are under time pressure.

Strategy 2: Exposure tolerance

Design your credential architecture so that the thing that leaks is not the real secret - it is a host-bound token that is cryptographically worthless without your specific domain. The 8-minute exploit window becomes irrelevant when the leaked credential does nothing for an attacker. This is the architectural shift that addresses the problem described by the data at its root.

Prevention and exposure tolerance are complementary, not alternatives. Prevention reduces the attack surface; exposure tolerance ensures that the residual risk from prevention failures has zero consequence.

Make API key leaks a non-event

KeyVault Edge implements exposure-tolerance architecture: your real API keys are stored in an HSM, and what lives in your codebase is a host-bound token that is cryptographically useless to anyone outside your domain.

Get started free - 3 tokens, 100K requests/month