Platform LLM Quotas

DriftWise rate-limits platform-LLM usage per org via two stacked gates: a weekly quota and an hourly rate limit. BYOK bypasses both.

How the caps apply

Plan	Weekly quota	Hourly rate limit
Free	5 / ISO-week	unlimited
Team	unlimited	20 / hour
Enterprise	unlimited	contract-negotiated (default unlimited)

Cap values use two sentinels. -1 means unlimited: the gate is skipped and the database is not touched. 0 means hard-off: every call is rejected immediately, which is used for enterprise accounts with paused contracts.

Bucket semantics

Weekly: ISO-week, reset at 00:00 UTC each Monday. Resets are absolute — there is no per-org anchor date.
Hourly: fixed window (not sliding). A caller can burst up to 20 calls at 12:59:xx and another 20 calls at 13:00:xx — this is a documented limitation of fixed-window limiters, not a safety feature. The weekly gate caps the worst-case blast radius; the hourly gate is smoothing, not an absolute throughput guarantee.
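The bucket boundaries described above can be sketched as follows. This is a minimal illustration, not DriftWise's implementation: the function names are hypothetical, and the only facts taken from the text are the Monday 00:00 UTC weekly reset and the fixed top-of-the-hour window.

```python
from datetime import datetime, timedelta, timezone


def week_bucket_start(now: datetime) -> datetime:
    """Return 00:00 UTC on the Monday of the ISO week containing `now`.

    `now` must be timezone-aware (hypothetical helper, not a DriftWise API).
    """
    now = now.astimezone(timezone.utc)
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight - timedelta(days=now.weekday())  # Monday is weekday 0


def hour_bucket_start(now: datetime) -> datetime:
    """Return the top of the UTC hour containing `now` (fixed window)."""
    now = now.astimezone(timezone.utc)
    return now.replace(minute=0, second=0, microsecond=0)


def week_resets_at(now: datetime) -> datetime:
    """Next weekly reset: the following Monday at 00:00 UTC."""
    return week_bucket_start(now) + timedelta(days=7)


def hour_resets_at(now: datetime) -> datetime:
    """Next hourly reset: the top of the next UTC hour."""
    return hour_bucket_start(now) + timedelta(hours=1)
```

Because the hourly key is derived only from the wall clock, a call at 12:59:30 and a call at 13:00:05 fall into different buckets, which is exactly the burst behaviour noted above.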

Every reserve call runs through a single transaction:

If weekly cap is finite, atomically increment the weekly bucket.
If hourly cap is finite, atomically increment the hourly bucket.
If the hourly gate denies after the weekly increment, the weekly counter is decremented in the same transaction — blocked calls never consume quota.
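The reserve steps above can be modelled with an in-memory sketch. Everything here is an assumption made for illustration: `QuotaStore`, `reserve`, and the bucket keys are hypothetical names, a lock stands in for the database transaction, and the check-then-increment order is one possible way to realise the atomic increment the text describes.

```python
import threading
from dataclasses import dataclass, field

UNLIMITED = -1  # gate skipped, counter never touched
HARD_OFF = 0    # every call rejected


@dataclass
class QuotaStore:
    """In-memory stand-in for the quota tables (hypothetical)."""
    weekly: dict = field(default_factory=dict)
    hourly: dict = field(default_factory=dict)
    lock: threading.Lock = field(default_factory=threading.Lock)


def reserve(store, org, week_key, hour_key, weekly_cap, hourly_cap) -> str:
    """Return "ok" or the denial code for one reservation attempt."""
    if weekly_cap == HARD_OFF or hourly_cap == HARD_OFF:
        return "plan_hard_off"
    with store.lock:  # stands in for the single transaction
        wk, hk = (org, week_key), (org, hour_key)
        weekly_taken = False
        if weekly_cap != UNLIMITED:
            if store.weekly.get(wk, 0) >= weekly_cap:
                return "plan_weekly_quota_exhausted"
            store.weekly[wk] = store.weekly.get(wk, 0) + 1
            weekly_taken = True
        if hourly_cap != UNLIMITED:
            if store.hourly.get(hk, 0) >= hourly_cap:
                if weekly_taken:
                    # Undo the weekly increment so the blocked call costs nothing.
                    store.weekly[wk] -= 1
                return "plan_hourly_rate_limit"
            store.hourly[hk] = store.hourly.get(hk, 0) + 1
        return "ok"
```

With a weekly cap of 10 and an hourly cap of 2, the third call in the same hour is denied by the hourly gate and the weekly counter stays at 2.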

HTTP response shapes

402 `plan_weekly_quota_exhausted`

{
  "code": "plan_weekly_quota_exhausted",
  "error": "plan_weekly_quota_exhausted",
  "message": "Weekly AI analysis quota reached for your plan.",
  "required_plan": "team",
  "used": 5,
  "cap": 5,
  "week_resets_at": "2026-04-27T00:00:00Z",
  "byok_config_url": "/api/v2/orgs/<org_id>/llm-config"
}

code and error always carry the same value — code is the stable machine-readable field, and error mirrors it for backward compatibility with pre-2025 API-key callers. Dispatch on code.
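A client-side dispatcher might look like the sketch below. It is an illustration under stated assumptions: the function names and the returned wording are hypothetical, and the body is assumed to be already parsed from JSON. It prefers `code` and falls back to `error`, which the text says carries the same value.

```python
from typing import Optional


def denial_code(body: dict) -> Optional[str]:
    """Prefer the stable `code` field; fall back to `error`, which mirrors it."""
    return body.get("code") or body.get("error")


def next_step(status: int, body: dict) -> str:
    """Map a denial response to a short next step (illustrative wording)."""
    code = denial_code(body)
    if status == 402 and code == "plan_weekly_quota_exhausted":
        return (
            f"Weekly quota used ({body['used']}/{body['cap']}); "
            f"resets at {body['week_resets_at']} or upgrade to {body['required_plan']}."
        )
    if status == 429 and code == "plan_hourly_rate_limit":
        return f"Hourly limit reached; retry after {body['hour_resets_at']}."
    if status == 402 and code == "plan_hard_off":
        return "Platform AI analyses are disabled; configure BYOK or contact support."
    return f"Unhandled response: HTTP {status}, code={code!r}"
```

Matching on the status and the code together keeps the two 402 variants apart without parsing the human-readable `message`.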

429 `plan_hourly_rate_limit`

{
  "error": "plan_hourly_rate_limit",
  "message": "hourly AI analysis rate limit reached",
  "used": 20,
  "cap": 20,
  "hour_resets_at": "2026-04-21T14:00:00Z",
  "byok_config_url": "/api/v2/orgs/<org_id>/llm-config"
}

A Retry-After header (in seconds, ceiling — rounded up, minimum 1) accompanies every 429. Ceiling semantics matter: a client that waits exactly Retry-After seconds must land in the next bucket, not the tail of the current one.
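The ceiling rule can be illustrated with a small calculation. This is a sketch, not the server's code: `retry_after_seconds` is a hypothetical name, and the only behaviour taken from the text is "round up, minimum 1".

```python
import math
from datetime import datetime, timezone


def retry_after_seconds(now: datetime, resets_at: datetime) -> int:
    """Seconds until the bucket resets, rounded up, never less than 1."""
    remaining = (resets_at - now).total_seconds()
    return max(1, math.ceil(remaining))


# 30.5 seconds before the reset: ceiling gives 31, so a client that waits
# exactly that long wakes up 0.5 s into the next bucket. Flooring to 30
# would wake it 0.5 s before the reset, still inside the exhausted bucket.
reset = datetime(2026, 4, 21, 14, 0, 0, tzinfo=timezone.utc)
now = datetime(2026, 4, 21, 13, 59, 29, 500000, tzinfo=timezone.utc)
wait = retry_after_seconds(now, reset)
```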

402 `plan_hard_off`

Returned for a contract-paused or explicitly disabled org, where weekly_platform_llm_quota = 0 or hourly_platform_llm_limit = 0. The body shape matches plan_weekly_quota_exhausted with these values: code is plan_hard_off, message is "Platform AI analyses are disabled for this plan.", required_plan is "team", and a bucket field carries "weekly" or "hourly". To unblock, configure BYOK or contact support.

Reading your usage

GET /orgs/:id/llm-usage returns the current-bucket used/cap for both the weekly and hourly gates plus the next reset timestamp for each. -1 in any *_cap field means unlimited. When byok_configured is true, the caps are advisory — BYOK requests bypass both gates. See LLM Providers for the BYOK setup and the billing tag of the API reference for the endpoint shape.
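A consumer of this endpoint could interpret the payload roughly as below. Treat the field names as assumptions: only the `*_cap` suffix and `byok_configured` come from the text, while `weekly_used`, `weekly_cap`, `hourly_used`, and `hourly_cap` are hypothetical spellings. Check the API reference for the real shape.

```python
from typing import Optional


def remaining(used: int, cap: int) -> Optional[int]:
    """Slots left in a bucket; None means unlimited (cap == -1)."""
    if cap == -1:
        return None
    return max(0, cap - used)


def summarize_usage(usage: dict) -> dict:
    """Reduce a usage payload (assumed field names) to what a UI needs."""
    if usage.get("byok_configured"):
        # BYOK requests bypass both gates, so the caps are advisory only.
        return {"gated": False, "weekly_remaining": None, "hourly_remaining": None}
    return {
        "gated": True,
        "weekly_remaining": remaining(usage["weekly_used"], usage["weekly_cap"]),
        "hourly_remaining": remaining(usage["hourly_used"], usage["hourly_cap"]),
    }
```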

Rollback on LLM failure

Platform-LLM errors never consume a quota slot. If the upstream provider returns any error (5xx, timeout, network failure, or a 4xx that bubbles out of the LLM client), DriftWise releases both the weekly and the hourly reservation in one atomic step. Every call path releases: /analyze falls back to a templated narrative and still releases, /generate-fix returns 500 and releases, and the drift-narrative worker marks the snapshot as errored and releases.
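The release-on-failure pattern can be sketched as follows, modelled on the /analyze path that returns a templated fallback. The names are hypothetical, and a plain dictionary stands in for the weekly and hourly counters that were incremented at reserve time.

```python
from typing import Callable, Dict, List


def call_with_rollback(
    counters: Dict[str, int],
    reserved_keys: List[str],
    call_llm: Callable[[], str],
    fallback: str,
) -> str:
    """Run `call_llm`; on any exception, release every reserved slot.

    `counters` holds the already-incremented bucket counts (assumption:
    an in-memory dict replaces the real datastore).
    """
    try:
        return call_llm()
    except Exception:
        for key in reserved_keys:
            counters[key] -= 1  # give the slot back
        return fallback


def failing_llm() -> str:
    """Simulates an upstream provider timeout."""
    raise TimeoutError("upstream provider timed out")


counters = {"weekly": 3, "hourly": 7}
narrative = call_with_rollback(counters, ["weekly", "hourly"], failing_llm, "templated narrative")
```

A handler that should surface the error instead, as /generate-fix does, would release in the same way and then re-raise rather than return a fallback.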

BYOK requests never touch the platform-quota counters in the first place, so there is nothing to release. Repeated BYOK failures instead increment the BYOK failure circuit breaker, which trips after consecutive failures and surfaces as 429 byok_rate_limited (with Retry-After) until the backoff window clears or a successful call resets the counter.
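A consecutive-failure circuit breaker of the kind described above can be sketched like this. The text does not state the threshold or the backoff length, so the defaults of 3 failures and 60 seconds are placeholders, and `ByokBreaker` is a hypothetical name.

```python
import math


class ByokBreaker:
    """Trips after `threshold` consecutive failures (placeholder values)."""

    def __init__(self, threshold: int = 3, backoff_seconds: float = 60.0):
        self.threshold = threshold
        self.backoff_seconds = backoff_seconds
        self.consecutive_failures = 0
        self.open_until = 0.0  # timestamp until which calls are rejected

    def allow(self, now: float) -> bool:
        """False while the backoff window is active (caller returns 429)."""
        return now >= self.open_until

    def record_failure(self, now: float) -> None:
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.threshold:
            self.open_until = now + self.backoff_seconds

    def record_success(self) -> None:
        """A successful call resets the counter and closes the breaker."""
        self.consecutive_failures = 0
        self.open_until = 0.0

    def retry_after(self, now: float) -> int:
        """Retry-After value in seconds: rounded up, minimum 1."""
        return max(1, math.ceil(self.open_until - now))
```

Timestamps are passed in explicitly so the behaviour is easy to test; a real caller would supply a monotonic clock reading.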

Breaking change — per-request BYOK removed (April 2026)

The legacy per-request BYOK shape — llm_config as a field on POST /analyze — has been removed. The strict request decoder rejects unknown fields before the handler runs, returning HTTP 400 with a body shaped like:

{ "error": "invalid request body: json: unknown field \"llm_config\"" }

Migrate to persisted BYOK via PUT /api/v2/orgs/:id/llm-config.

Why the change

One place to configure — the drift-narrative worker (async) now uses the same BYOK credential as the /analyze handler. Previously the worker only used the platform key, which meant free/team orgs never got BYOK narratives.
Quota is per-unit, not per-LLM-call — one /analyze invocation now consumes exactly one quota slot regardless of how many internal LLM calls it triggers.

Security trade-off

Per-request BYOK kept your key ephemeral — it never landed in DriftWise's datastore. Persisted BYOK stores the ciphertext on DriftWise infrastructure next to the encryption key. Mitigations in place:

AES-256-GCM envelope encryption.
GKE Secrets wrapped by a KMS keyring (prevent_destroy = true).
Audit log on every create/update/delete — provider name only, never key material.
Hard-delete on revocation; no soft-delete column.
Least-privilege on the server's encryption key.

If your security policy forbids any customer key material at rest on our infrastructure, open a ticket. An enterprise contract can negotiate a hybrid ephemeral path — that is a planned follow-up, not a current feature.
