
Rate limits

Rate limits are enforced per API key, after authentication. Each customer gets its own token bucket.
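To make the token bucket idea concrete, here is a minimal sketch in Python. It is not the gateway's implementation: the class name `TokenBucket`, the `allow` method, and the injectable `now` clock are all hypothetical names chosen for illustration. The bucket refills at `rate` tokens per second up to a capacity of `burst`, and each request consumes one token.

```python
import time


class TokenBucket:
    """Illustrative token bucket: refills at `rate` tokens/second, holds at most `burst`."""

    def __init__(self, rate, burst, now=time.monotonic):
        self.rate = float(rate)
        self.burst = float(burst)
        self.tokens = float(burst)  # start full so an initial burst is allowed
        self._now = now             # injectable clock (an assumption, handy for tests)
        self._last = now()

    def allow(self, cost=1.0):
        """Consume `cost` tokens and return True if available, otherwise return False."""
        t = self._now()
        # Refill for the elapsed time, never exceeding the burst capacity.
        self.tokens = min(self.burst, self.tokens + (t - self._last) * self.rate)
        self._last = t
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # a gateway would answer 429 at this point
```

With `rate=10` and `burst=20`, a client can send 20 requests back to back, after which it is held to roughly 10 requests per second until the bucket refills.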

Limits by tier

Tier          Approx. RPS    Burst (typical)
sandbox       10             20
growth        100            200
scale         500            1000
enterprise    1000           2000

These defaults can be overridden per deployment through the environment variables RATE_LIMIT_RPS and RATE_LIMIT_BURST.
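As a rough illustration of how such an override might be resolved, the sketch below reads the two variables and falls back to the tier defaults. The function name `effective_limits` and the `TIER_DEFAULTS` mapping are hypothetical, and it assumes the variables hold plain integers and apply to every tier in the deployment, which this page does not confirm.

```python
import os

# Tier defaults taken from the "Limits by tier" table: (requests per second, burst).
TIER_DEFAULTS = {
    "sandbox": (10, 20),
    "growth": (100, 200),
    "scale": (500, 1000),
    "enterprise": (1000, 2000),
}


def effective_limits(tier, env=None):
    """Return (rps, burst) for a tier, letting the environment override the defaults."""
    env = os.environ if env is None else env
    rps, burst = TIER_DEFAULTS[tier]
    # Assumption: when set, each variable is a plain integer string.
    rps = int(env.get("RATE_LIMIT_RPS", rps))
    burst = int(env.get("RATE_LIMIT_BURST", burst))
    return rps, burst
```

Passing `env` explicitly keeps the function easy to test without touching the real process environment.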

429 response

{
  "error": "rate limit exceeded"
}

Headers:

  • Retry-After: the number of seconds to wait before retrying. The gateway sets a short window (e.g. 1).

Example:

HTTP/1.1 429 Too Many Requests
Retry-After: 1
Content-Type: application/json

{"error":"rate limit exceeded"}
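A client may want to turn the Retry-After value into a sleep duration defensively. The gateway documents the value as seconds, but HTTP also permits an HTTP-date in this header, so the sketch below accepts both forms and falls back to a default when the value is missing or unparseable. The helper name `retry_after_seconds` and its `default` and `now` parameters are hypothetical.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(value, default=1.0, now=None):
    """Convert a Retry-After header value into a non-negative delay in seconds."""
    if value is None:
        return default
    value = value.strip()
    if value.isdigit():
        return float(value)  # the documented form: whole seconds
    try:
        when = parsedate_to_datetime(value)  # the HTTP-date form
    except (TypeError, ValueError):
        return default  # unparseable value: use the fallback
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    # A date in the past means no further waiting is needed.
    return max(0.0, (when - now).total_seconds())
```

For the example response above, `retry_after_seconds("1")` gives a one-second delay.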

Best practices

  • Use exponential backoff on 429 responses, and always wait at least as long as Retry-After indicates.
  • Queue bursts client-side instead of opening thousands of parallel connections.
  • Prefer webhooks over tight polling loops (see Webhooks).
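One simple way to queue a burst on the client is a fixed-size worker pool: all work is submitted up front, but only a bounded number of requests are in flight at once. The sketch below uses the standard library's `ThreadPoolExecutor`. The function name `run_bounded` and the default of 8 workers are illustrative assumptions, not recommended values for any tier.

```python
from concurrent.futures import ThreadPoolExecutor


def run_bounded(tasks, max_workers=8):
    """Run zero-argument callables with at most `max_workers` in flight.

    Results are returned in the same order as `tasks`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Extra tasks wait in the executor's internal queue
        # instead of each opening its own connection immediately.
        futures = [pool.submit(task) for task in tasks]
        return [f.result() for f in futures]
```

Each task would typically wrap a single API call that already retries on 429, so the pool caps concurrency while the per-request logic handles backoff.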

Python pattern

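import time
import requests

def get_with_backoff(url, headers, max_retries=5):
    """GET url, retrying on 429 with exponential backoff."""
    for i in range(max_retries):
        r = requests.get(url, headers=headers, timeout=30)
        if r.status_code != 429:
            return r
        if i == max_retries - 1:
            break  # out of retries: return the 429 without sleeping again
        try:
            # Retry-After is normally whole seconds; default to 2 if absent.
            ra = int(r.headers.get("Retry-After", "2"))
        except ValueError:
            ra = 2  # non-numeric value (e.g. an HTTP date)
        # Wait the server-requested time plus an exponentially growing delay.
        time.sleep(ra + 2 ** i)
    return r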