Rate Limiting

The API enforces per-minute and per-day request limits that depend on your plan, plus a separately tracked monthly quota.

Rate Limit Headers

Every response includes rate limit information:

X-RateLimit-Limit: 100
X-RateLimit-Remaining: 99
X-RateLimit-Reset: 1702837200

Header	Description
`X-RateLimit-Limit`	Maximum requests per window
`X-RateLimit-Remaining`	Remaining requests in window
`X-RateLimit-Reset`	Unix timestamp when limit resets
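
The headers above arrive as strings, so a client usually converts them before acting on them. The following is a minimal sketch, not part of the API or any SDK: `parseRateLimitHeaders` is a hypothetical helper name, and it assumes only that the headers object exposes a `get(name)` method, as the `headers` of a fetch response does.

```javascript
// Hypothetical helper (not part of the API): converts the three documented
// headers to numbers and computes the time left in the current window.
function parseRateLimitHeaders(headers, nowMs = Date.now()) {
  const toInt = (name) => {
    // parseInt yields NaN for missing (null/undefined) or malformed values.
    const value = Number.parseInt(headers.get(name), 10);
    return Number.isNaN(value) ? null : value;
  };

  const limit = toInt('X-RateLimit-Limit');
  const remaining = toInt('X-RateLimit-Remaining');
  const reset = toInt('X-RateLimit-Reset'); // Unix timestamp in seconds

  // Milliseconds until the window resets, never negative.
  const msUntilReset = reset === null ? null : Math.max(0, reset * 1000 - nowMs);

  return { limit, remaining, reset, msUntilReset };
}
```

Any field whose header is missing or malformed comes back as `null`, so callers can tell "no information" apart from "zero requests remaining".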

Default Limits

Plan	Requests/min	Requests/day
Free	10	100
Starter	60	1,000
Pro	300	10,000
Enterprise	Custom	Custom
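
One way to use the per-minute figures is to derive a minimum spacing between requests on the client. The sketch below copies the values from the table; `PLAN_LIMITS` and `minIntervalMs` are illustrative names rather than anything the API provides, and even pacing is an assumed strategy.

```javascript
// Values copied from the Default Limits table. Enterprise is omitted
// because its limits are custom.
const PLAN_LIMITS = {
  free: { perMinute: 10, perDay: 100 },
  starter: { perMinute: 60, perDay: 1000 },
  pro: { perMinute: 300, perDay: 10000 },
};

// Hypothetical helper: the smallest gap (in ms) between requests that keeps
// an evenly paced client within the plan's per-minute limit.
function minIntervalMs(plan) {
  const limits = PLAN_LIMITS[plan];
  if (!limits) return null; // unknown or custom plan
  return Math.ceil(60000 / limits.perMinute);
}
```

Pacing to the per-minute limit does not protect the daily allowance: a Free plan client sending 10 requests per minute uses all 100 daily requests in 10 minutes.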

Rate Limit Exceeded

When you exceed the limit, the API returns the following response:

Status: 429 Too Many Requests

{
  "error": "rate_limit_exceeded",
  "message": "Rate limit exceeded. Try again in 60 seconds.",
  "details": {
    "limit": 100,
    "remaining": 0,
    "reset": 1702837200,
    "retry_after": 60
  },
  "status": "error"
}
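
A client can read the wait time directly from this body. The sketch below is one possible approach: `retryDelayMs` is a hypothetical helper, and both the fallback to `details.reset` and the one-second default are assumptions, not documented behavior.

```javascript
// Hypothetical helper: returns how long to wait (in ms) before retrying,
// or null when the response is not a rate limit error.
function retryDelayMs(status, body, nowMs = Date.now()) {
  if (status !== 429 || !body || body.error !== 'rate_limit_exceeded') {
    return null;
  }

  const details = body.details || {};

  // Prefer the explicit retry_after value, given in seconds.
  if (Number.isFinite(details.retry_after)) {
    return Math.max(0, details.retry_after * 1000);
  }
  // Otherwise fall back to the reset time, a Unix timestamp in seconds.
  if (Number.isFinite(details.reset)) {
    return Math.max(0, details.reset * 1000 - nowMs);
  }
  // Assumed default when neither field is present.
  return 1000;
}
```

With the example body above, `retryDelayMs(429, body)` evaluates to `60000`.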

Best Practices

Implement Exponential Backoff

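// Resolve after the given number of milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Call fn, making at most maxRetries attempts and retrying only on 429 errors.
async function callWithRetry(fn, maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await fn();
    } catch (error) {
      if (error.status === 429 && i < maxRetries - 1) {
        // Prefer the server's retry_after (seconds); otherwise wait 1s, 2s, 4s, ...
        const retryAfter = error.details?.retry_after || Math.pow(2, i);
        await sleep(retryAfter * 1000);
        continue;
      }
      // Not a rate limit error, or no attempts left.
      throw error;
    }
  }
}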
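
When many clients back off on the same schedule, they tend to retry at the same moment. A common mitigation is to add random jitter to the exponential delay. The sketch below uses the "full jitter" variant, in which the wait is drawn uniformly between zero and the capped exponential delay. `backoffDelayMs`, `baseMs`, and `capMs` are illustrative names and values, not API settings.

```javascript
// Backoff delay with full jitter: a random wait between 0 and the capped
// exponential delay. baseMs and capMs are assumed defaults, and `random`
// is injectable so the function can be tested deterministically.
function backoffDelayMs(attempt, baseMs = 1000, capMs = 30000, random = Math.random) {
  // Exponential growth: baseMs, 2 * baseMs, 4 * baseMs, ... up to capMs.
  const exponential = Math.min(capMs, baseMs * 2 ** attempt);
  // Scale by a random factor in [0, 1).
  return Math.floor(random() * exponential);
}
```

The result is in milliseconds, so it can be passed straight to a sleep call when the server provides no `retry_after` value.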

Monitor Rate Limit Headers

const response = await fetch('/api/v3/llm/call', options);

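const remaining = response.headers.get('X-RateLimit-Remaining');
const reset = response.headers.get('X-RateLimit-Reset');

// Header values are strings (null when absent): check presence, then compare numerically.
if (remaining !== null && Number(remaining) < 10) {
  console.warn(`Low rate limit: ${remaining} requests remaining (resets at ${reset})`);
}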

Use Request Queuing

For high-volume applications, implement a queue to smooth out request bursts:

// Runs queued requests one at a time, spaced to stay at or below maxPerSecond.
class RequestQueue {
  constructor(maxPerSecond = 10) {
    this.queue = [];
    this.processing = false;
    this.interval = 1000 / maxPerSecond;
  }

  async add(request) {
    return new Promise((resolve, reject) => {
      this.queue.push({ request, resolve, reject });
      this.process();
    });
  }

  async process() {
    if (this.processing || this.queue.length === 0) return;

    this.processing = true;
    const { request, resolve, reject } = this.queue.shift();

    try {
      const result = await request();
      resolve(result);
    } catch (error) {
      reject(error);
    }

    // Wait one interval after this request settles before starting the next.
    setTimeout(() => {
      this.processing = false;
      this.process();
    }, this.interval);
  }
}

Quota Management

Monthly quotas are tracked separately from rate limits.

Check Quota

curl https://llm.zihin.ai/api/quota \
  -H "X-Api-Key: zhn_live_xxxxx"

Response:

{
  "success": true,
  "quota": {
    "limit": 10000,
    "used": 2500,
    "remaining": 7500,
    "reset_date": "2025-02-01T00:00:00.000Z"
  }
}
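
The `quota` object is enough to raise an early warning before the quota runs out. The following is a minimal sketch: `summarizeQuota` is a hypothetical helper, and the 80% warning threshold is an arbitrary assumption.

```javascript
// Hypothetical helper: summarizes the `quota` object from the response above.
function summarizeQuota(quota, nowMs = Date.now()) {
  // Share of the monthly quota already consumed, as a percentage.
  const percentUsed = quota.limit > 0 ? (quota.used / quota.limit) * 100 : 0;

  // Whole days (rounded up) until reset_date, never negative.
  const msUntilReset = Math.max(0, Date.parse(quota.reset_date) - nowMs);
  const daysUntilReset = Math.ceil(msUntilReset / 86400000);

  // Assumed threshold: flag usage at or above 80%.
  return { percentUsed, daysUntilReset, nearLimit: percentUsed >= 80 };
}
```

For the example response, `percentUsed` is `25` and `nearLimit` is `false`.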

Quota Exceeded

When you exceed the monthly quota, the API returns:

{
  "error": "quota_exceeded",
  "message": "Monthly quota exceeded",
  "details": {
    "limit": 10000,
    "used": 10000,
    "reset_date": "2025-02-01T00:00:00.000Z"
  },
  "status": "error"
}
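
Because a quota error does not clear until `reset_date`, it makes sense to handle it differently from a rate limit error, which clears within seconds. The sketch below branches on the `error` field. `nextAttempt` is a hypothetical helper, and the policy it encodes (retry rate limit errors automatically, never retry quota errors) is an assumption about sensible client behavior rather than a documented requirement.

```javascript
// Hypothetical helper: decides whether an error body is worth retrying
// automatically and how long the client would have to wait.
function nextAttempt(body, nowMs = Date.now()) {
  const details = (body && body.details) || {};

  switch (body && body.error) {
    case 'rate_limit_exceeded':
      // Short wait: retry_after is in seconds (assumed default of 1).
      return { retry: true, waitMs: (details.retry_after ?? 1) * 1000 };
    case 'quota_exceeded':
      // Long wait: nothing succeeds until reset_date, so do not auto-retry.
      return { retry: false, waitMs: Math.max(0, Date.parse(details.reset_date) - nowMs) };
    default:
      // Unrelated error: leave the decision to the caller.
      return { retry: false, waitMs: null };
  }
}
```

A caller might sleep for `waitMs` when `retry` is true and surface the error to the user otherwise.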
