Host-Binding Encryption: The Architecture Behind KeyVault Edge
The KeyVault Edge security model rests on a single property: a sanitized token must be useless outside the domains and IP ranges it was issued for. This post is a full technical exposition of how that property is achieved and maintained.
The problem we're solving
API keys as issued by providers (OpenAI, Stripe, AWS, etc.) have one catastrophic property: they are origin-independent. A key issued to your account can be used from any IP address, any network, any country - as long as the key string is known.
This means that the security of your API access reduces entirely to the security of the key string itself. If the string leaks through any channel - source code, CI logs, a developer's email, a stolen laptop - the attacker has full access to your API account with no further barrier.
Host-binding solves this at the proxy layer without requiring provider cooperation. We intercept API requests, validate the origin, and forward only requests from authorized origins. The real key never leaves the proxy.
Sanitized token design
A KeyVault Edge sanitized token has the prefix kve_hb_ followed by a base58-encoded 32-byte random identifier. The token is a lookup key - it does not encode any secrets itself.
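As a rough illustration, a token of this shape could be produced as in the sketch below. The function names and the Bitcoin-style base58 alphabet are assumptions made for the example, not a description of our production code. Note that base58 output length varies slightly: 32 random bytes usually encode to 44 characters and occasionally to 43, so a fixed-length body would need a normalization step that this sketch does not attempt.

```javascript
// Bitcoin-style base58 alphabet (no 0, O, I, or l). Assumed for illustration.
const BASE58_ALPHABET =
  "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz";

// Encode a byte array as base58 by treating it as one big-endian integer.
function encodeBase58(bytes) {
  let value = 0n;
  for (const byte of bytes) {
    value = (value << 8n) | BigInt(byte);
  }
  let encoded = "";
  while (value > 0n) {
    encoded = BASE58_ALPHABET[Number(value % 58n)] + encoded;
    value /= 58n;
  }
  // By convention, each leading zero byte is written as a leading "1".
  for (const byte of bytes) {
    if (byte !== 0) break;
    encoded = "1" + encoded;
  }
  return encoded;
}

// Build a host-bound token from 32 bytes of randomness. The caller supplies
// the bytes, e.g. crypto.getRandomValues(new Uint8Array(32)).
function makeHostBoundToken(randomBytes) {
  if (randomBytes.length !== 32) {
    throw new Error("expected exactly 32 random bytes");
  }
  return "kve_hb_" + encodeBase58(randomBytes);
}

// Split a token into its parts; returns null if the prefix does not match.
function parseToken(token) {
  const prefix = "kve_hb_";
  if (!token.startsWith(prefix)) return null;
  return { system: "kve", variant: "hb", id: token.slice(prefix.length) };
}
```

Because the identifier is pure randomness, nothing about the real key or the authorized origins can be recovered from the token string itself.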
kve_hb_<YOUR_43_CHAR_TOKEN_HERE>
kve_ → KeyVault Edge token (identifies our system)
hb_ → host-bound variant (other variants exist for IP-bound, etc.)
7xKm… → 32-byte random identifier, base58-encoded (43 chars)
Total length: 50 chars - similar to OpenAI key length for drop-in compatibility
The token maps in our database to:
{
  token_id: "7xKm9Lp2QrNvTwY3ZsBcDfGhJmKpQsVxY2",
  real_key_ref: "<pointer to HSM-encrypted real key>",
  target_provider: "openai",
  authorized_origins: ["yourdomain.com", "staging.yourdomain.com"],
  authorized_ips: [], // optional IP allowlist
  created_at: "2026-01-15T10:23:00Z",
  last_used_at: "2026-05-10T14:22:11Z",
  request_count: 14823,
  status: "active"
}
Host-binding: the cryptographic constraint
Host-binding is enforced at the proxy layer, not at the token layer. The token itself carries no cryptographic binding - it is a lookup key. The binding is stored server-side and enforced on every request.
When the proxy receives a request with a kve_hb_ token in the Authorization header, it:
1. Extracts the token ID from the Authorization header
2. Looks up the token record in the edge KV store (sub-millisecond latency)
3. Extracts the Origin or Referer header from the request
4. Checks the origin against the authorized_origins list
5. If the origin is unauthorized: returns 403, logs the attempt, fires a breach alert
6. If the origin is authorized: fetches the encrypted real key from the HSM reference
7. Decrypts the real key in isolated Worker memory (never written to disk or logs)
8. Substitutes the real key into the Authorization header
9. Forwards the request to the target provider
10. Returns the provider response to the caller
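The first five steps can be sketched as a pure function. This is a simplified illustration, not our Worker source: the in-memory Map stands in for the edge KV store, the function names are hypothetical, and the 401 responses for a missing or unknown token are an assumption (the steps above only specify the 403 case). The headers argument only needs a get() method that accepts lowercase names, which is how a Worker's request.headers behaves.

```javascript
// Stand-in for the edge KV store: token ID -> token record.
// Field names follow the example record shown earlier in this post.
const tokenStore = new Map([
  ["7xKm9Lp2QrNvTwY3ZsBcDfGhJmKpQsVxY2", {
    target_provider: "openai",
    authorized_origins: ["yourdomain.com", "staging.yourdomain.com"],
    status: "active",
  }],
]);

// Step 1: pull the token ID out of "Authorization: Bearer kve_hb_<id>".
function extractTokenId(authorizationHeader) {
  const match = /^Bearer kve_hb_([1-9A-HJ-NP-Za-km-z]+)$/.exec(
    authorizationHeader ?? ""
  );
  return match ? match[1] : null;
}

// Step 3: take the hostname from Origin, falling back to Referer.
function extractOriginHost(headers) {
  const raw = headers.get("origin") ?? headers.get("referer");
  if (!raw) return null;
  try {
    return new URL(raw).hostname;
  } catch {
    return null; // malformed header value
  }
}

// Steps 1-5: decide whether the request may proceed to key decryption.
function authorize(headers, store) {
  const tokenId = extractTokenId(headers.get("authorization"));
  if (!tokenId) return { status: 401, authorized: false };

  const record = store.get(tokenId); // step 2: lookup
  if (!record || record.status !== "active") {
    return { status: 401, authorized: false, tokenId };
  }

  const host = extractOriginHost(headers);
  if (!host || !record.authorized_origins.includes(host)) {
    // Step 5: the caller would also log the attempt and fire the alert here.
    return { status: 403, authorized: false, tokenId, host };
  }
  return { status: 200, authorized: true, tokenId, record };
}
```

Only when authorize() reports authorized: true would the proxy continue to steps 6-10 and touch the encrypted real key.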
Cloudflare Workers: why V8 isolates matter
The proxy runs on Cloudflare Workers. The choice is architectural, not operational: V8 isolates provide a fundamentally different security boundary than traditional server processes.
No shared memory between requests
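Each Worker runs in a V8 isolate whose memory is separated from every other Worker on the same physical hardware. An isolate can be reused for several requests to the same Worker, so the decrypted real key is held only in request-scoped variables, never in global state; that is what keeps a key fetched for request N from leaking into request N+1.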
No file system access
Workers have no persistent file system. Decrypted key material cannot be written to disk, cached in a temp file, or accessed by another process on the same host.
300+ PoPs
Requests are served from the PoP nearest to the caller. Decryption therefore happens close to the caller, which reduces cross-region transit of key material.
No cold start latency for secrets
The encrypted key reference is fetched from Cloudflare KV at request time. Decryption adds ~0.5ms to request latency.
Request flow: end-to-end
// Your code (unchanged from standard OpenAI SDK usage)
import OpenAI from "openai";

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY, // kve_hb_...
  baseURL: process.env.OPENAI_BASE_URL, // https://openai.keyvaultedge.com/v1
});
const completion = await openai.chat.completions.create({ ... });
// ─── What happens at the network layer ───────────────────────────────────
// 1. SDK sends: POST https://openai.keyvaultedge.com/v1/chat/completions
//      Authorization: Bearer kve_hb_...
//      Origin: https://yourdomain.com
//
// 2. Worker receives request at nearest PoP
// 3. Validates Origin against token's authorized_origins
// 4. Fetches encrypted key from KV + HSM
// 5. Forwards: POST https://api.openai.com/v1/chat/completions
//      Authorization: Bearer sk-proj-real_key_here
//      (Origin header stripped from outbound request)
//
// 6. Streams response back to caller
// ─────────────────────────────────────────────────────────────────────────
The round-trip latency overhead is typically 2–5ms. For streaming responses, the Worker begins forwarding chunks immediately - the decrypt-and-forward happens before the first response chunk is returned.
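The rewrite in step 5 of the flow can be sketched as follows. This is an illustrative helper under stated assumptions: the UPSTREAM_BASE table and the buildUpstreamRequest name are hypothetical, and headers are modelled as a plain object rather than a Worker's Headers instance to keep the example self-contained.

```javascript
// Hypothetical mapping from proxy hostname to the provider's API base URL.
const UPSTREAM_BASE = {
  "openai.keyvaultedge.com": "https://api.openai.com",
};

// Build the outbound URL and headers for an already-authorized request.
function buildUpstreamRequest(inboundUrl, inboundHeaders, realKey) {
  const url = new URL(inboundUrl);
  const base = UPSTREAM_BASE[url.hostname];
  if (!base) {
    throw new Error(`no upstream configured for ${url.hostname}`);
  }

  const headers = {};
  for (const [name, value] of Object.entries(inboundHeaders)) {
    const lower = name.toLowerCase();
    if (lower === "origin") continue; // stripped from the outbound request
    headers[lower] = value;
  }
  // Replace the sanitized token with the real provider key.
  headers["authorization"] = `Bearer ${realKey}`;

  return { url: base + url.pathname + url.search, headers };
}
```

The path and query string pass through untouched, which is why only the base URL and the key need to change on the client side.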
Breach detection and alerting
Every request to the proxy is logged with the following metadata (no request body content is logged):
{
  timestamp: "2026-05-10T14:22:11.341Z",
  token_id: "7xKm9Lp2...",
  origin: "https://yourdomain.com",
  ip: "203.0.113.42",
  cf_country: "US",
  cf_pop: "SJC",
  path: "/v1/chat/completions",
  status: 200,
  authorized: true,
  latency_ms: 3
}
When authorized: false is logged (an unauthorized origin attempting to use a token), the system:
- Fires a webhook to the user's configured breach alert endpoint
- Sends an email notification within 30 seconds
- Adds the requesting origin and IP to the anomaly log in the dashboard
- Applies exponential backoff to subsequent unauthorized requests from the same IP
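One way to read the last item is a per-IP block window that doubles with each unauthorized attempt. The sketch below illustrates that idea only; the class name, the 1-second base window, and the 1-hour cap are assumptions chosen for the example, and an in-memory Map would not be shared across PoPs in a real deployment.

```javascript
// Assumed parameters, not taken from the production system.
const BASE_BLOCK_MS = 1000;          // first offence: 1 second
const MAX_BLOCK_MS = 60 * 60 * 1000; // cap: 1 hour

class UnauthorizedBackoff {
  constructor() {
    this.failures = new Map(); // ip -> { count, blockedUntil }
  }

  // Record an unauthorized attempt; returns the new block window in ms.
  recordFailure(ip, nowMs) {
    const previous = this.failures.get(ip);
    const count = (previous ? previous.count : 0) + 1;
    const blockMs = Math.min(BASE_BLOCK_MS * 2 ** (count - 1), MAX_BLOCK_MS);
    this.failures.set(ip, { count, blockedUntil: nowMs + blockMs });
    return blockMs;
  }

  // True while the IP is still inside its current block window.
  isBlocked(ip, nowMs) {
    const entry = this.failures.get(ip);
    return entry !== undefined && nowMs < entry.blockedUntil;
  }
}
```

With these numbers, three consecutive unauthorized attempts from one IP yield windows of 1, 2, and 4 seconds, while requests from other IPs are unaffected.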
Threat model
Token leaked in source code
Attacker gets a 403 from the proxy. Breach alert fires. Zero API access.
Token leaked in CI logs
Same as above. The host-binding check fails for any unauthorized origin.
Token used from a compromised authorized domain
Requests succeed (the domain is authorized). Anomalous usage patterns trigger alerts. Revoke token from dashboard.
KeyVault Edge infrastructure compromised
Real keys are stored encrypted in the HSM. Compromising Worker code alone would not expose raw keys; an attacker would also have to compromise the HSM. Scope: potentially all tokens; the underlying real keys can be revoked from each provider's dashboard.
Request body interception
All connections are TLS 1.3. Worker-to-provider leg is also TLS. No request body logging by design.
Try the architecture in your project
Two environment variable changes. No SDK changes. Free for up to 3 tokens and 100K requests per month.