Overview

Responsible scraping protects both your application and target websites. This guide covers rate limiting strategies, retry logic, and best practices for production use.
Important: Excessive requests can get your IP blocked and violate website terms of service. Always implement rate limiting.

Why rate limiting matters

Protect target websites

  • Prevents server overload
  • Respects website resources
  • Maintains good standing with site owners

Protect your application

  • Avoids IP bans
  • Prevents credit waste on failed requests
  • Ensures consistent data quality

Stay compliant

  • Respects robots.txt
  • Follows terms of service
  • Demonstrates good faith usage

ManyPi rate limits

API limits

All plans have the same rate limit:
  • Requests per minute: 60
  • Burst limit: 10 concurrent requests
Rate limits are applied per API key. You can create multiple API keys in your dashboard to scale horizontally (e.g., 3 API keys = 180 requests/minute).
While rate limits are the same across plans, your credit allocation varies by plan tier. Higher plans get more monthly credits for more scraping volume.
Pro tip: Create separate API keys for different services or environments (production, staging, batch jobs) to isolate rate limits and improve reliability.
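To distribute requests across several keys, a simple round-robin picker is enough. A minimal sketch (the environment variable names below are placeholders for your own configuration):
// Hypothetical helper: rotate through multiple API keys to scale throughput
const apiKeys = [
  process.env.MANYPI_API_KEY_1,
  process.env.MANYPI_API_KEY_2,
  process.env.MANYPI_API_KEY_3
].filter(Boolean) as string[];

let keyIndex = 0;

function getNextApiKey(): string {
  // Round-robin selection; assumes at least one key is configured
  const key = apiKeys[keyIndex];
  keyIndex = (keyIndex + 1) % apiKeys.length;
  return key;
}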

Rate limit headers

Every API response includes rate limit information:
HTTP/1.1 200 OK
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 55
X-RateLimit-Reset: 1640000000
  • X-RateLimit-Limit: Maximum requests per minute (60)
  • X-RateLimit-Remaining: Requests remaining in current window
  • X-RateLimit-Reset: Unix timestamp when limit resets
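One way to use these headers is to slow down when the remaining quota runs low. A minimal sketch (not an official client helper) that reads them from a fetch Response and waits for the window to reset:
// Sketch: pause when the rate limit window is nearly exhausted
async function throttleIfNeeded(response: Response): Promise<void> {
  const remaining = parseInt(response.headers.get('X-RateLimit-Remaining') ?? '1', 10);
  const reset = parseInt(response.headers.get('X-RateLimit-Reset') ?? '0', 10);

  if (remaining <= 1) {
    // Wait until the window resets (plus a small buffer) before sending more requests
    const waitMs = Math.max(0, reset * 1000 - Date.now()) + 500;
    await new Promise(resolve => setTimeout(resolve, waitMs));
  }
}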

Implementing rate limiting

Simple delay between requests

const sleep = (ms) => new Promise(resolve => setTimeout(resolve, ms));

async function scrapeWithDelay(urls) {
  const results = [];
  
  for (const url of urls) {
    const result = await scrapeUrl(url);
    results.push(result);
    
    // Wait 2 seconds between requests
    await sleep(2000);
  }
  
  return results;
}

Token bucket algorithm

More sophisticated rate limiting that allows bursts:
class RateLimiter {
  private tokens: number;
  private lastRefill: number;
  private readonly capacity: number;
  private readonly refillRate: number; // tokens per second

  constructor(capacity: number, refillRate: number) {
    this.capacity = capacity;
    this.refillRate = refillRate;
    this.tokens = capacity;
    this.lastRefill = Date.now();
  }

  private refill(): void {
    const now = Date.now();
    const timePassed = (now - this.lastRefill) / 1000;
    const tokensToAdd = timePassed * this.refillRate;
    
    this.tokens = Math.min(this.capacity, this.tokens + tokensToAdd);
    this.lastRefill = now;
  }

  async acquire(): Promise<void> {
    this.refill();
    
    if (this.tokens < 1) {
      const waitTime = (1 - this.tokens) / this.refillRate * 1000;
      await new Promise(resolve => setTimeout(resolve, waitTime));
      this.refill();
    }
    
    this.tokens -= 1;
  }
}

// Usage
const limiter = new RateLimiter(10, 2); // 10 tokens, refill 2 per second

async function scrapeWithRateLimit(urls: string[]) {
  const results = [];
  
  for (const url of urls) {
    await limiter.acquire();
    const result = await scrapeUrl(url);
    results.push(result);
  }
  
  return results;
}

Using p-limit for concurrency control

import pLimit from 'p-limit';

// Allow max 5 concurrent requests
const limit = pLimit(5);

async function scrapeConcurrently(urls) {
  const promises = urls.map(url => 
    limit(() => scrapeUrl(url))
  );
  
  return Promise.all(promises);
}

// With delay between batches
async function scrapeBatches(urls, batchSize = 10) {
  const results = [];
  
  for (let i = 0; i < urls.length; i += batchSize) {
    const batch = urls.slice(i, i + batchSize);
    const batchResults = await scrapeConcurrently(batch);
    results.push(...batchResults);
    
    // Wait between batches
    if (i + batchSize < urls.length) {
      await sleep(5000); // 5 second delay
    }
  }
  
  return results;
}

Retry logic

Exponential backoff

Retry failed requests with increasing delays:
async function scrapeWithRetry(
  url: string,
  maxRetries = 3,
  baseDelay = 1000
): Promise<any> {
  let lastError: Error | undefined;
  
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      const response = await fetch(
        'https://app.manypi.com/api/scrape/YOUR_SCRAPER_ID',
        {
          method: 'POST',
          headers: {
            'Authorization': `Bearer ${process.env.MANYPI_API_KEY}`,
            'Content-Type': 'application/json'
          },
          body: JSON.stringify({ url })
        }
      );
      
      if (response.status === 429) {
        // Rate limited - wait and retry
        const retryAfter = response.headers.get('Retry-After');
        const delay = retryAfter 
          ? parseInt(retryAfter) * 1000 
          : baseDelay * Math.pow(2, attempt);
        
        console.log(`Rate limited. Retrying in ${delay}ms...`);
        await sleep(delay);
        continue;
      }
      
      if (!response.ok) {
        throw new Error(`HTTP ${response.status}: ${response.statusText}`);
      }
      
      const data = await response.json();
      
      if (!data.success) {
        throw new Error(data.error);
      }
      
      return data;
      
    } catch (error) {
      lastError = error as Error;
      
      if (attempt < maxRetries - 1) {
        const delay = baseDelay * Math.pow(2, attempt);
        console.log(`Attempt ${attempt + 1} failed. Retrying in ${delay}ms...`);
        await sleep(delay);
      }
    }
  }
  
  throw new Error(`Failed after ${maxRetries} attempts: ${lastError?.message ?? 'rate limited'}`);
}

Retry with jitter

Add randomness to prevent thundering herd:
function calculateBackoff(attempt, baseDelay = 1000, maxDelay = 30000) {
  const exponentialDelay = baseDelay * Math.pow(2, attempt);
  const jitter = Math.random() * 1000; // 0-1000ms random jitter
  return Math.min(exponentialDelay + jitter, maxDelay);
}

async function scrapeWithJitter(url, maxRetries = 3) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await scrapeUrl(url);
    } catch (error) {
      if (attempt < maxRetries - 1) {
        const delay = calculateBackoff(attempt);
        await sleep(delay);
      } else {
        throw error;
      }
    }
  }
}

Error handling

Comprehensive error handling

interface ScrapeError {
  type: 'rate_limit' | 'network' | 'validation' | 'server' | 'unknown';
  message: string;
  retryable: boolean;
  retryAfter?: number;
}

async function scrapeWithErrorHandling(url: string): Promise<any> {
  try {
    const response = await fetch(/* ... */);
    const data = await response.json();
    
    if (!data.success) {
      const error: ScrapeError = {
        type: classifyError(data.errorType),
        message: data.error,
        retryable: isRetryable(data.errorType)
      };
      
      throw error;
    }
    
    return data;
    
  } catch (error) {
    if (error instanceof TypeError) {
      // Network error
      throw {
        type: 'network',
        message: 'Network request failed',
        retryable: true
      } as ScrapeError;
    }
    
    throw error;
  }
}

function classifyError(errorType: string): ScrapeError['type'] {
  switch (errorType) {
    case 'rate_limit_error':
      return 'rate_limit';
    case 'validation_error':
      return 'validation';
    case 'internal_error':
      return 'server';
    default:
      return 'unknown';
  }
}

function isRetryable(errorType: string): boolean {
  return ['rate_limit_error', 'internal_error', 'network_error']
    .includes(errorType);
}

Circuit breaker pattern

Prevent cascading failures:
class CircuitBreaker {
  private failures = 0;
  private lastFailureTime = 0;
  private state: 'closed' | 'open' | 'half-open' = 'closed';
  
  constructor(
    private threshold: number = 5,
    private timeout: number = 60000 // 1 minute
  ) {}
  
  async execute<T>(fn: () => Promise<T>): Promise<T> {
    if (this.state === 'open') {
      if (Date.now() - this.lastFailureTime > this.timeout) {
        this.state = 'half-open';
      } else {
        throw new Error('Circuit breaker is open');
      }
    }
    
    try {
      const result = await fn();
      this.onSuccess();
      return result;
    } catch (error) {
      this.onFailure();
      throw error;
    }
  }
  
  private onSuccess(): void {
    this.failures = 0;
    this.state = 'closed';
  }
  
  private onFailure(): void {
    this.failures++;
    this.lastFailureTime = Date.now();
    
    if (this.failures >= this.threshold) {
      this.state = 'open';
      console.log('Circuit breaker opened');
    }
  }
}

// Usage
const breaker = new CircuitBreaker(5, 60000);

async function scrapeWithCircuitBreaker(url: string) {
  return breaker.execute(() => scrapeUrl(url));
}

Production patterns

Queue-based processing

Use a queue for reliable, rate-limited scraping:
import Bull from 'bull';

// Create queue with a built-in rate limiter
const scrapeQueue = new Bull('scraping', {
  redis: { host: 'localhost', port: 6379 },
  limiter: {
    max: 10,        // 10 jobs
    duration: 60000 // per minute
  }
});

// Process up to 5 jobs concurrently; the limiter above caps throughput
scrapeQueue.process(5, async (job) => {
  const { url, scraperId } = job.data;
  
  try {
    const result = await scrapeUrl(url, scraperId);
    return result;
  } catch (error) {
    // Retry logic handled by Bull
    throw error;
  }
});

// Add jobs to queue
async function queueScrape(url: string, scraperId: string) {
  await scrapeQueue.add(
    { url, scraperId },
    {
      attempts: 3,
      backoff: {
        type: 'exponential',
        delay: 2000
      }
    }
  );
}

// Monitor queue
scrapeQueue.on('completed', (job, result) => {
  console.log(`Job ${job.id} completed`);
});

scrapeQueue.on('failed', (job, error) => {
  console.error(`Job ${job.id} failed:`, error.message);
});

Distributed rate limiting with Redis

Share rate limits across multiple servers:
import Redis from 'ioredis';

class DistributedRateLimiter {
  private redis: Redis;
  
  constructor(redisUrl: string) {
    this.redis = new Redis(redisUrl);
  }
  
  async checkLimit(
    key: string,
    limit: number,
    window: number // seconds
  ): Promise<boolean> {
    const now = Date.now();
    const windowStart = now - (window * 1000);
    
    // Remove old entries
    await this.redis.zremrangebyscore(key, 0, windowStart);
    
    // Count requests in current window
    const count = await this.redis.zcard(key);
    
    if (count >= limit) {
      return false;
    }
    
    // Add current request
    await this.redis.zadd(key, now, `${now}`);
    await this.redis.expire(key, window);
    
    return true;
  }
}

// Usage
const limiter = new DistributedRateLimiter('redis://localhost:6379');

async function scrapeWithDistributedLimit(url: string) {
  const allowed = await limiter.checkLimit(
    'scraping:rate-limit',
    100, // 100 requests
    60   // per 60 seconds
  );
  
  if (!allowed) {
    throw new Error('Rate limit exceeded');
  }
  
  return scrapeUrl(url);
}

Best practices

  • Check robots.txt for crawl-delay directives (see the sketch after this list)
  • Start with 2-3 second delays between requests
  • Monitor for 429 (Too Many Requests) responses
  • Adjust delays based on response times
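A minimal sketch of the robots.txt check from the first bullet (real robots.txt parsing handles per-user-agent groups; this only looks for a plain Crawl-delay directive):
// Sketch: read a site's Crawl-delay directive, returned in milliseconds
async function getCrawlDelay(origin: string): Promise<number | null> {
  const response = await fetch(`${origin}/robots.txt`);
  if (!response.ok) return null;

  const text = await response.text();
  const match = text.match(/^crawl-delay:\s*(\d+)/im);

  return match ? parseInt(match[1], 10) * 1000 : null;
}

// Usage: fall back to a 2 second delay if no directive is found
// const delayMs = (await getCrawlDelay('https://example.com')) ?? 2000;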
Schedule heavy scraping during low-traffic periods:
function isOffPeakHours() {
  const hour = new Date().getHours();
  // 2 AM - 6 AM local time
  return hour >= 2 && hour < 6;
}

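// Note: waitUntilOffPeak() and scrapeBatch() are assumed helpers, not shown here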
async function scrapeResponsibly(urls) {
  if (!isOffPeakHours()) {
    console.log('Waiting for off-peak hours...');
    await waitUntilOffPeak();
  }
  
  return scrapeBatch(urls);
}
Don’t re-scrape data that hasn’t changed:
const cache = new Map();

async function scrapeWithCache(url, ttl = 3600000) {
  const cached = cache.get(url);
  
  if (cached && Date.now() - cached.timestamp < ttl) {
    return cached.data;
  }
  
  const data = await scrapeUrl(url);
  cache.set(url, { data, timestamp: Date.now() });
  
  return data;
}
Track success rates and adjust accordingly:
class ScrapeMonitor {
  private stats = {
    total: 0,
    success: 0,
    failed: 0,
    rateLimited: 0
  };
  
  recordSuccess() {
    this.stats.total++;
    this.stats.success++;
  }
  
  recordFailure(type: string) {
    this.stats.total++;
    this.stats.failed++;
    if (type === 'rate_limit') {
      this.stats.rateLimited++;
    }
  }
  
  getSuccessRate() {
    return this.stats.success / this.stats.total;
  }
  
  shouldSlowDown() {
    // Slow down if >10% rate limited
    return this.stats.rateLimited / this.stats.total > 0.1;
  }
}
For high-volume scraping, consider rotating proxies:
const proxies = [
  'http://proxy1.example.com:8080',
  'http://proxy2.example.com:8080',
  'http://proxy3.example.com:8080'
];

let currentProxy = 0;

function getNextProxy() {
  const proxy = proxies[currentProxy];
  currentProxy = (currentProxy + 1) % proxies.length;
  return proxy;
}
Always use legitimate proxy services and respect website terms of service.

Monitoring rate limits

Check remaining quota

async function checkRateLimit() {
  const response = await fetch(
    'https://app.manypi.com/api/scrape/YOUR_SCRAPER_ID',
    {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${process.env.MANYPI_API_KEY}`,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({ url: 'https://example.com' })
    }
  );
  
  const remaining = response.headers.get('X-RateLimit-Remaining');
  const reset = response.headers.get('X-RateLimit-Reset');
  
  console.log(`Requests remaining: ${remaining}`);
  console.log(`Resets at: ${new Date(parseInt(reset!) * 1000)}`);
  
  return {
    remaining: parseInt(remaining!),
    resetAt: new Date(parseInt(reset!) * 1000)
  };
}

Alert on low quota

async function scrapeWithQuotaCheck(url: string) {
  const { remaining } = await checkRateLimit();
  
  if (remaining < 10) {
    await sendAlert('Low rate limit quota', {
      remaining,
      url
    });
  }
  
  return scrapeUrl(url);
}

Next steps