# Rate limits

Rate limiting is applied per API key (after authentication), using a token bucket per customer.
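The token-bucket model above can be sketched in a few lines. This is illustrative only, not the gateway's actual implementation; the `TokenBucket` class and its `allow` method are made up for this sketch:

```python
import time


class TokenBucket:
    """Minimal token bucket: refills `rate` tokens/sec up to `burst` capacity."""

    def __init__(self, rate, burst):
        self.rate = rate
        self.burst = burst
        self.tokens = burst  # start full, so short bursts are absorbed
        self.last = time.monotonic()

    def allow(self):
        """Spend one token if available; return whether the request passes."""
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

With `rate=10, burst=20` (the sandbox tier), a client can fire 20 requests back-to-back, then is held to roughly 10 per second.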
## Limits by tier
| Tier | Approx. RPS | Burst (typical) |
|---|---|---|
| sandbox | 10 | 20 |
| growth | 100 | 200 |
| scale | 500 | 1000 |
| enterprise | 1000 | 2000 |
Defaults can be overridden per deployment environment via the `RATE_LIMIT_RPS` and `RATE_LIMIT_BURST` environment variables.
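For example, a deployment might raise the defaults like this (the variable names come from this doc; the values are illustrative):

```shell
# Illustrative override for a single deployment environment.
export RATE_LIMIT_RPS=250
export RATE_LIMIT_BURST=500
```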
## 429 response

```json
{
  "error": "rate limit exceeded"
}
```
Headers:

- `Retry-After`: seconds to wait before retrying (the gateway sets a short window, e.g. `1`).
Example:

```http
HTTP/1.1 429 Too Many Requests
Retry-After: 1
Content-Type: application/json

{"error":"rate limit exceeded"}
```
## Best practices

- Use exponential backoff on 429 responses (respect `Retry-After`).
- Queue bursts client-side instead of opening thousands of parallel connections.
- Prefer webhooks over tight polling loops (see Webhooks).
## Python pattern

```python
import time

import requests


def get_with_backoff(url, headers, max_retries=5):
    """GET with exponential backoff on 429, honoring Retry-After."""
    for i in range(max_retries):
        r = requests.get(url, headers=headers, timeout=30)
        if r.status_code != 429:
            return r
        # Wait at least the server's hint, plus an exponentially growing delay.
        ra = int(r.headers.get("Retry-After", "2"))
        time.sleep(ra + 2 ** i)
    # Out of retries: return the last (429) response to the caller.
    return r
```