Botyard Docs

Exfiltration scanning is Botyard's safety check for outbound bot requests. It is designed to catch cases where a bot attempts to send a sensitive Runtime Vault value to an LLM, search provider, or other proxied upstream.

What is scanned

The provider proxy collects every part of an outbound request that could reach an upstream provider and submits it for scanning:

request body
path
query string
forwarded custom headers

Internal transport headers, Botyard routing headers, and bot authentication headers are excluded: they are not forwarded upstream, so scanning them would only produce false positives.
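A minimal Python sketch of how the provider proxy might assemble that payload. The function name `build_scannable_payload`, the `header.` key prefix, and the excluded header names (including the `x-botyard-` prefix used here for routing and bot authentication headers) are hypothetical. This page does not list the actual excluded headers.

```python
from urllib.parse import urlsplit

# Hypothetical exclusions: transport headers plus a Botyard-internal prefix
# standing in for routing and bot authentication headers.
EXCLUDED_HEADERS = {"host", "content-length", "connection", "transfer-encoding"}
EXCLUDED_HEADER_PREFIXES = ("x-botyard-",)


def build_scannable_payload(url: str, headers: dict[str, str], body: bytes) -> dict[str, str]:
    """Collect the parts of an outbound request that could reach the upstream."""
    parts = urlsplit(url)
    payload = {
        "body": body.decode("utf-8", errors="replace"),
        "path": parts.path,
        "query": parts.query,
    }
    for name, value in headers.items():
        lowered = name.lower()
        if lowered in EXCLUDED_HEADERS or lowered.startswith(EXCLUDED_HEADER_PREFIXES):
            # Not forwarded upstream, so skip it to avoid false positives.
            continue
        payload[f"header.{lowered}"] = value
    return payload
```

Only forwarded custom headers end up in the payload, so a secret that appears solely in an internal header never triggers a match.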

What values are scanned for

The scanner compares outbound payloads against Runtime Vault variables marked Sensitive value. Variables marked Plain are not scanned for.

The scanner can detect the raw value and common encodings such as base64, base64url, and URL encoding. Audit records store the matched key path and encoding, not the secret value or raw payload.
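The sketch below illustrates encoded-variant matching under simple assumptions: each sensitive value is checked as raw text, as standard base64 and base64url with padding stripped, and as the percent-encoding produced by Python's `urllib.parse.quote`. `Match`, `encoded_variants`, and `find_matches` are illustrative names, not Botyard APIs. Each `Match` carries only the key path and encoding, mirroring what audit records keep.

```python
import base64
from dataclasses import dataclass
from urllib.parse import quote


@dataclass(frozen=True)
class Match:
    key_path: str  # Runtime Vault key that matched; never the value itself
    encoding: str  # "raw", "base64", "base64url", or "url"


def encoded_variants(secret: str) -> list[tuple[str, str]]:
    """Return (encoding, needle) pairs, dropping duplicate needles."""
    data = secret.encode("utf-8")
    candidates = [
        ("raw", secret),
        # Padding is stripped so unpadded encodings still match.
        ("base64", base64.b64encode(data).decode("ascii").rstrip("=")),
        ("base64url", base64.urlsafe_b64encode(data).decode("ascii").rstrip("=")),
        ("url", quote(secret, safe="")),
    ]
    seen: set[str] = set()
    variants = []
    for encoding, needle in candidates:
        # Skip empty needles and encodings identical to an earlier one
        # (e.g. URL encoding of a value with no reserved characters).
        if needle and needle not in seen:
            seen.add(needle)
            variants.append((encoding, needle))
    return variants


def find_matches(payload: dict[str, str], sensitive: dict[str, str]) -> list[Match]:
    """Check every payload part against every encoded form of every sensitive value."""
    matches = []
    for key_path, secret in sensitive.items():
        for encoding, needle in encoded_variants(secret):
            if any(needle in part for part in payload.values()):
                matches.append(Match(key_path, encoding))
    return matches
```

This naive version only catches a value encoded on its own. A value embedded inside a larger base64 blob can start at a different 3-byte alignment, and lowercase escapes such as `%2f` will not match the uppercase output of `quote`, so a production scanner needs to cover more cases.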

Where scanning happens

Scanning is centralized in the secrets service:

The provider proxy builds a scannable payload from the outgoing request.
The provider proxy sends that payload, plus org, bot, call type, and endpoint metadata, to the secrets service.
The secrets service loads the organization's sensitive values from encrypted storage, using a short-lived per-org scanner cache.
The scanner returns whether it matched a value and whether the caller should enforce a block.
The provider proxy then forwards the request, blocks it, or fails closed, as the scanner mode dictates.
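The exchange between the two services can be sketched as a request/response contract. The dataclasses, the `handle_scan` function, the `mode` argument, and the `load_sensitive` callback (standing in for the cached, decrypted per-org lookup) are assumptions for illustration, and only raw matching is shown to keep it short.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ScanRequest:
    org_id: str
    bot_id: str
    call_type: str  # e.g. "llm" or "search" (hypothetical values)
    endpoint: str
    payload: dict[str, str]


@dataclass(frozen=True)
class ScanResult:
    matched: bool
    enforce_block: bool


def handle_scan(
    req: ScanRequest,
    mode: str,
    load_sensitive: Callable[[str], dict[str, str]],
) -> ScanResult:
    # Plaintext values are loaded and compared here, inside the secrets service.
    sensitive = load_sensitive(req.org_id)
    matched = any(
        bool(secret) and secret in part
        for secret in sensitive.values()
        for part in req.payload.values()
    )
    # Only two booleans travel back to the provider proxy, never the values.
    return ScanResult(matched=matched, enforce_block=matched and mode in ("block", "strict"))
```

The design point is that `ScanResult` carries nothing the proxy could leak: it learns whether to block, not what matched.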

Plaintext sensitive values stay inside the secrets service's scanner cache. They are never copied into the provider proxy for scanning.

Scanner modes

Scanner behavior is mode-based:

Mode	On match	If the scanner cannot run
`log_only`	Record the match and allow the request.	Allow the request.
`block`	Record the match and block the request.	Allow the request.
`strict`	Record the match and block the request.	Fail closed with an unavailable error.

`log_only` is useful during rollout and tuning. `block` is the usual enforcement mode. `strict` is for environments where letting an unscanned request through is worse than temporarily failing the bot request.
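The mode table reduces to a small decision function. In this hypothetical sketch, `matched=None` stands for "the scanner could not run"; `Mode`, `Outcome`, and `decide` are illustrative names, and recording the match in the audit log is assumed to happen separately.

```python
from enum import Enum
from typing import Optional


class Mode(Enum):
    LOG_ONLY = "log_only"
    BLOCK = "block"
    STRICT = "strict"


class Outcome(Enum):
    FORWARD = "forward"
    BLOCK = "block"
    UNAVAILABLE = "unavailable"  # fail closed


def decide(mode: Mode, matched: Optional[bool]) -> Outcome:
    """Map scanner mode and scan result to what the proxy does with the request."""
    if matched is None:
        # Scanner could not run: only strict mode refuses to forward.
        return Outcome.UNAVAILABLE if mode is Mode.STRICT else Outcome.FORWARD
    if matched and mode in (Mode.BLOCK, Mode.STRICT):
        return Outcome.BLOCK
    return Outcome.FORWARD
```

`block` and `strict` differ only in the `None` branch, which is exactly the difference between the last two rows of the table.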

Rotation and cache invalidation

Runtime Vault writes invalidate the scanner cache after the database transaction commits. This avoids a race where a scan could refill the cache from old values while a rotation is still committing. If invalidation fails, the cache TTL still bounds how long stale scanner data can live.
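A minimal in-process sketch of that ordering. `ScannerCache`, `write_then_invalidate`, and the 60-second default TTL are hypothetical; the real cache and its TTL are not specified here. Invalidating only after `commit()` returns means a refill triggered by the invalidation reads the committed value, and if invalidation raises, the stale entry still expires once the TTL elapses.

```python
import logging
import time
from typing import Callable


class ScannerCache:
    """Per-org cache of sensitive values with a time-to-live."""

    def __init__(
        self,
        load: Callable[[str], dict[str, str]],
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._load = load
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[str, tuple[float, dict[str, str]]] = {}

    def get(self, org_id: str) -> dict[str, str]:
        now = self._clock()
        entry = self._entries.get(org_id)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # fresh enough: serve from cache
        values = self._load(org_id)  # miss or expired: reload from storage
        self._entries[org_id] = (now, values)
        return values

    def invalidate(self, org_id: str) -> None:
        self._entries.pop(org_id, None)


def write_then_invalidate(commit: Callable[[], None], cache: ScannerCache, org_id: str) -> None:
    commit()  # 1. the Runtime Vault transaction commits first
    try:
        cache.invalidate(org_id)  # 2. only then drop the cached values
    except Exception:
        # A failed invalidation is tolerated: the TTL still bounds staleness.
        logging.warning("scanner cache invalidation failed for %s", org_id)
```

Reversing the two steps would reopen the race: a scan arriving between invalidation and commit could reload the old value and keep it for a full TTL.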

Safety pattern

Exfiltration scanning is a data loss prevention control placed at the egress point. It is strongest when combined with Runtime Vault policy:

Runtime Vault controls which bot can request a secret.
Leases make plaintext access short-lived.
Exfiltration scanning checks whether sensitive values are leaving through provider calls.
Audit logs record lease and scanner events without storing raw payloads.

The scanner is not a substitute for least privilege or review. It is a tripwire at the point where model calls and search requests leave the platform.