
Host-Binding Encryption: The Architecture Behind KeyVault Edge

The KeyVault Edge security model rests on a single property: a sanitized token must be useless outside the domains and IP ranges it was issued for. This post is a full technical exposition of how that property is achieved and maintained.

15 min read · Feb 2026 · KeyVault Edge Team

The problem we're solving

API keys as issued by providers (OpenAI, Stripe, AWS, etc.) have one catastrophic property by default: they are origin-independent. A key issued to your account can be used from any IP address, any network, any country, as long as the key string is known.

This means that the security of your API access reduces entirely to the security of the key string itself. If the string leaks through any channel - source code, CI logs, a developer's email, a stolen laptop - the attacker has full access to your API account with no further barrier.

Host-binding solves this at the proxy layer without requiring provider cooperation. We intercept API requests, validate the origin, and only forward requests from authorized origins. The real key never leaves the proxy.

Sanitized token design

A KeyVault Edge sanitized token has the prefix kve_hb_ followed by a base58-encoded 32-byte random identifier. The token is a lookup key - it does not encode any secrets itself.

Token structure
kve_hb_<YOUR_43_CHAR_TOKEN_HERE>

kve_   → KeyVault Edge token (identifies our system)
hb_    → host-bound variant (other variants exist for IP-bound, etc.)
7xKm…  → 32-byte random identifier, base58-encoded (43 chars)

Total length: 50 chars (7-char prefix + 43-char identifier), similar to OpenAI key length for drop-in compatibility
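To make the format concrete, here is a minimal sketch of a validator for the structure above. The function name and the Bitcoin-style base58 alphabet (no 0, O, I, or l) are assumptions for illustration; the post does not specify which base58 alphabet is used.

```typescript
// Assumed alphabet: Bitcoin-style base58, which omits 0, O, I and l.
// The 43-character length comes from the token structure described above.
const HOST_BOUND_TOKEN = /^kve_hb_([1-9A-HJ-NP-Za-km-z]{43})$/;

/**
 * Returns the 43-character identifier if the string is a well-formed
 * host-bound token, or null otherwise. This checks shape only; whether
 * the token exists is decided by the server-side lookup.
 */
function parseHostBoundToken(token: string): string | null {
  const match = HOST_BOUND_TOKEN.exec(token);
  return match?.[1] ?? null;
}
```

A shape check like this is cheap enough to run before any store lookup, so malformed strings can be rejected without touching the database.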

The token maps in our database to:

Token record schema
{
  token_id:           "7xKm9Lp2QrNvTwY3ZsBcDfGhJmKpQsVxY2",
  real_key_ref:       "<pointer to HSM-encrypted real key>",
  target_provider:    "openai",
  authorized_origins: ["yourdomain.com", "staging.yourdomain.com"],
  authorized_ips:     [],  // optional IP allowlist
  created_at:         "2026-01-15T10:23:00Z",
  last_used_at:       "2026-05-10T14:22:11Z",
  request_count:      14823,
  status:             "active"
}
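For readers who prefer types, the record above could be modelled roughly as follows. This is a sketch, not our internal definition: the "revoked" status value and the recordUse helper are hypothetical and appear only for illustration.

```typescript
// "revoked" is an assumed value; the sample record only shows "active".
type TokenStatus = "active" | "revoked";

interface TokenRecord {
  token_id: string;
  real_key_ref: string;         // pointer to the HSM-encrypted real key
  target_provider: string;      // e.g. "openai"
  authorized_origins: string[]; // hostnames such as "yourdomain.com"
  authorized_ips: string[];     // optional IP allowlist
  created_at: string;           // ISO 8601 timestamp
  last_used_at: string;         // ISO 8601 timestamp
  request_count: number;
  status: TokenStatus;
}

/** Hypothetical helper: returns a copy with the usage fields updated. */
function recordUse(record: TokenRecord, now: Date): TokenRecord {
  return {
    ...record,
    last_used_at: now.toISOString(),
    request_count: record.request_count + 1,
  };
}
```

Note that nothing in the record contains the real key itself, only a reference to it.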

Host-binding: the cryptographic constraint

Host-binding is enforced at the proxy layer, not at the token layer. The token itself carries no cryptographic binding - it is a lookup key. The binding is stored server-side and enforced on every request.

When the proxy receives a request with a kve_hb_ token in the Authorization header, it:

  1. Extracts the token ID from the Authorization header
  2. Looks up the token record in the edge KV store (sub-millisecond latency)
  3. Extracts the Origin or Referer header from the request
  4. Checks the origin against the authorized_origins list
  5. If the origin is unauthorized: returns 403, logs the attempt, fires a breach alert
  6. If the origin is authorized: fetches the encrypted real key from the HSM reference
  7. Decrypts the real key in isolated Worker memory (never written to disk or logs)
  8. Substitutes the real key into the Authorization header
  9. Forwards the request to the target provider
  10. Returns the provider response to the caller
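Steps 1 to 5 can be sketched as a small pure function, kept separate from the KV lookup and the HSM call. Everything named here (extractToken, requestHostname, checkBinding, the Decision type) is hypothetical. The sketch also assumes header names arrive lowercased, and it returns 401 for an unknown or inactive token, which the list above does not specify.

```typescript
interface BindingRecord {
  authorized_origins: string[]; // hostnames, as in the token record
  status: string;
}

type Decision =
  | { allowed: true }
  | { allowed: false; status: 401 | 403; reason: string };

type Headers = Record<string, string | undefined>;

/** Step 1: pull the token out of "Authorization: Bearer kve_hb_...". */
function extractToken(authorization: string | undefined): string | null {
  const match = /^Bearer\s+(kve_hb_\S+)$/.exec(authorization ?? "");
  return match?.[1] ?? null;
}

/** Step 3: hostname of the Origin header, falling back to Referer. */
function requestHostname(headers: Headers): string | null {
  const raw = headers["origin"] ?? headers["referer"];
  if (!raw) return null;
  try {
    return new URL(raw).hostname;
  } catch {
    return null; // malformed header value
  }
}

/** Steps 4 and 5: compare the request hostname with the stored binding. */
function checkBinding(record: BindingRecord | undefined, headers: Headers): Decision {
  if (!record || record.status !== "active") {
    // Assumed behaviour: the post only describes the 403 case.
    return { allowed: false, status: 401, reason: "unknown or inactive token" };
  }
  const host = requestHostname(headers);
  if (host === null || !record.authorized_origins.includes(host)) {
    return { allowed: false, status: 403, reason: "origin not authorized" };
  }
  return { allowed: true };
}
```

Keeping the decision free of I/O means the same function can be exercised in unit tests without a KV store or HSM.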

Cloudflare Workers: why V8 isolates matter

The proxy runs on Cloudflare Workers. The choice is architectural, not operational: V8 isolates provide a fundamentally different security boundary than traditional server processes.

No shared memory between requests

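Workers run in V8 isolates, and each isolate's memory is walled off from every other Worker on the same physical hardware. A single isolate may serve more than one request, so keeping a decrypted real key out of request N+1 also depends on holding it only in request-scoped variables, never in global state.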

No file system access

Workers have no persistent file system. Decrypted key material cannot be written to disk, cached in a temp file, or accessed by another process on the same host.

300+ PoPs

Requests are served from the PoP nearest to the caller. The decryption happens close to the caller, reducing cross-region key material transit.

No cold start latency for secrets

The encrypted key reference is fetched from Cloudflare KV at request time. Decryption adds ~0.5ms to request latency.

Request flow: end-to-end

Typical OpenAI chat completion via KeyVault Edge
// Your code (unchanged from standard OpenAI SDK usage)
import OpenAI from "openai";

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,       // kve_hb_...
  baseURL: process.env.OPENAI_BASE_URL,     // https://openai.keyvaultedge.com/v1
});

const completion = await openai.chat.completions.create({ ... });

// ─── What happens at the network layer ───────────────────────────────────
// 1. SDK sends:  POST https://openai.keyvaultedge.com/v1/chat/completions
//                Authorization: Bearer kve_hb_...
//                Origin: https://yourdomain.com
//
// 2. Worker receives request at nearest PoP
// 3. Validates Origin against token's authorized_origins
// 4. Fetches encrypted key from KV + HSM
// 5. Forwards:   POST https://api.openai.com/v1/chat/completions
//                Authorization: Bearer sk-proj-real_key_here
//                (Origin header stripped from outbound request)
//
// 6. Streams response back to caller
// ─────────────────────────────────────────────────────────────────────────
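The rewrite between steps 1 and 5 of that trace can be sketched as below. The function name, the provider table, and the decision to drop Referer alongside Origin are assumptions; the trace above only states that the Authorization header is replaced and the Origin header is stripped.

```typescript
interface UpstreamRequest {
  url: string;
  headers: Record<string, string>;
}

// Assumed lookup table; only the OpenAI pair appears in the trace above.
const PROVIDER_BASE_URLS: Record<string, string> = {
  openai: "https://api.openai.com",
};

/**
 * Builds the outbound request: same path, provider host, real key in the
 * Authorization header, and no Origin (or Referer) header.
 */
function buildUpstreamRequest(
  provider: string,
  path: string,
  incoming: Record<string, string>,
  realKey: string,
): UpstreamRequest {
  const base = PROVIDER_BASE_URLS[provider];
  if (base === undefined) throw new Error(`unknown provider: ${provider}`);

  const headers: Record<string, string> = {};
  for (const [name, value] of Object.entries(incoming)) {
    const lower = name.toLowerCase();
    // Drop the sanitized token and the caller's origin information.
    if (lower === "authorization" || lower === "origin" || lower === "referer") continue;
    headers[lower] = value;
  }
  headers["authorization"] = `Bearer ${realKey}`;
  return { url: base + path, headers };
}
```

The sanitized token is dropped rather than overwritten in place, so it cannot survive under a differently cased header name.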

The round-trip latency overhead is typically 2–5ms. For streaming responses, the Worker begins forwarding chunks immediately - the decrypt-and-forward happens before the first token is returned.

Breach detection and alerting

Every request to the proxy is logged with the following metadata (no request body content is logged):

Request log entry
{
  timestamp:    "2026-05-10T14:22:11.341Z",
  token_id:     "7xKm9Lp2...",
  origin:       "https://yourdomain.com",
  ip:           "203.0.113.42",
  cf_country:   "US",
  cf_pop:       "SJC",
  path:         "/v1/chat/completions",
  status:       200,
  authorized:   true,
  latency_ms:   3
}
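A sketch of how such an entry might be assembled is shown below. The builder and its parameter names are hypothetical, and the 8-character truncation is inferred from the "7xKm9Lp2..." form in the sample. The builder takes no request body argument at all, which is one simple way to guarantee that body content cannot reach the log.

```typescript
interface RequestLogEntry {
  timestamp: string;
  token_id: string; // truncated form only
  origin: string;
  ip: string;
  cf_country: string;
  cf_pop: string;
  path: string;
  status: number;
  authorized: boolean;
  latency_ms: number;
}

/** Keeps the first 8 characters, matching the sample entry's "7xKm9Lp2..." */
function truncateTokenId(tokenId: string): string {
  return tokenId.length > 8 ? tokenId.slice(0, 8) + "..." : tokenId;
}

/** Metadata known once the request has been handled (hypothetical shape). */
type RequestMeta = Omit<RequestLogEntry, "timestamp" | "latency_ms">;

function buildLogEntry(meta: RequestMeta, startedAtMs: number, finishedAtMs: number): RequestLogEntry {
  return {
    ...meta,
    token_id: truncateTokenId(meta.token_id),
    timestamp: new Date(finishedAtMs).toISOString(),
    latency_ms: finishedAtMs - startedAtMs,
  };
}
```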

When authorized: false is logged (unauthorized origin attempting to use a token), the system:

  • Fires a webhook to the user's configured breach alert endpoint
  • Sends an email notification within 30 seconds
  • Adds the requesting origin and IP to the anomaly log in the dashboard
  • Throttles subsequent unauthorized requests from the same IP with exponentially increasing block periods (exponential backoff)
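The exponential backoff in the last bullet could look something like this. The base of 1 second, the one-hour cap, and the in-memory Map are all assumptions chosen to keep the sketch short; they are not the production values or storage.

```typescript
const BASE_BLOCK_SECONDS = 1;   // assumed starting penalty
const MAX_BLOCK_SECONDS = 3600; // assumed cap (one hour)

/** Block duration after the n-th unauthorized attempt: 1s, 2s, 4s, ... up to the cap. */
function blockSeconds(unauthorizedAttempts: number): number {
  if (unauthorizedAttempts <= 0) return 0;
  return Math.min(BASE_BLOCK_SECONDS * 2 ** (unauthorizedAttempts - 1), MAX_BLOCK_SECONDS);
}

/** Hypothetical per-IP counter; a real deployment would need shared, expiring state. */
class UnauthorizedTracker {
  private attempts = new Map<string, number>();

  /** Records one unauthorized request and returns how long to block this IP. */
  record(ip: string): number {
    const count = (this.attempts.get(ip) ?? 0) + 1;
    this.attempts.set(ip, count);
    return blockSeconds(count);
  }
}
```

The cap matters: without it, a long-running scan from a shared IP would eventually lock that address out for an unbounded period.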

Threat model

Token leaked in source code

Attacker gets a 403 from the proxy. Breach alert fires. Zero API access.

Token leaked in CI logs

Same as above. The host-binding check fails for any non-authorized origin.

Token used from a compromised authorized domain

Requests succeed (the domain is authorized). Anomalous usage patterns trigger alerts. Revoke the token from the dashboard.

KeyVault Edge infrastructure compromised

Real keys are stored encrypted in the HSM. Compromise of Worker code would not expose raw keys without also compromising the HSM. Scope: potentially all tokens; the underlying real keys can be revoked from each provider's dashboard.

Request body interception

All connections are TLS 1.3. The Worker-to-provider leg is also TLS. No request body is logged, by design.

Try the architecture in your project

Two environment variable changes. No SDK changes. Free for up to 3 tokens and 100K requests per month.
