
Overview

Responsible scraping protects both your application and target websites. This guide covers rate limiting strategies, retry logic, and best practices for production use.
Important: Excessive requests can get your IP blocked and violate website terms of service. Always implement rate limiting.

Why rate limiting matters

Protect target websites

  • Prevents server overload
  • Respects website resources
  • Maintains good standing with site owners

Protect your application

  • Avoids IP bans
  • Prevents credit waste on failed requests
  • Ensures consistent data quality

Stay compliant

  • Respects robots.txt
  • Follows terms of service
  • Demonstrates good faith usage

ManyPi rate limits

API limits

All plans have the same rate limit:
  • Requests per minute: 60
  • Burst limit: 10 concurrent requests
Rate limits are applied per API key. You can create multiple API keys in your dashboard to scale horizontally (e.g., 3 API keys = 180 requests/minute).
While rate limits are the same across plans, your credit allocation varies by tier: higher plans include more monthly credits for greater scraping volume.
Pro tip: Create separate API keys for different services or environments (production, staging, batch jobs) to isolate rate limits and improve reliability.
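
As an illustrative sketch, requests can be round-robined across several keys to multiply throughput (the environment variable names below are placeholders):
// Rotate across multiple API keys; each key gets its own 60 req/min window
const apiKeys = [
  process.env.MANYPI_API_KEY_1!,
  process.env.MANYPI_API_KEY_2!,
  process.env.MANYPI_API_KEY_3!
];

let keyIndex = 0;

function nextApiKey(): string {
  const key = apiKeys[keyIndex];
  keyIndex = (keyIndex + 1) % apiKeys.length; // simple round-robin
  return key;
}

async function scrapeWithKeyRotation(url: string) {
  const response = await fetch(
    'https://app.manypi.com/api/scrape/YOUR_SCRAPER_ID',
    {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${nextApiKey()}`,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({ url })
    }
  );
  
  return response.json();
}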

Rate limit headers

Every API response includes rate limit information:
HTTP/1.1 200 OK
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 55
X-RateLimit-Reset: 1640000000
  • X-RateLimit-Limit: Maximum requests per minute (60)
  • X-RateLimit-Remaining: Requests remaining in current window
  • X-RateLimit-Reset: Unix timestamp when limit resets
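
A client can use these headers proactively instead of waiting for a 429. As a minimal sketch (the header parsing and the 500 ms buffer are illustrative):
const sleep = (ms: number) => new Promise(resolve => setTimeout(resolve, ms));

// Pause until the window resets once the remaining quota hits zero
async function respectRateLimitHeaders(response: Response) {
  const remaining = parseInt(response.headers.get('X-RateLimit-Remaining') ?? '1', 10);
  const reset = parseInt(response.headers.get('X-RateLimit-Reset') ?? '0', 10);
  
  if (remaining === 0) {
    // Wait until the reset timestamp, plus a small buffer
    const waitMs = Math.max(0, reset * 1000 - Date.now()) + 500;
    await sleep(waitMs);
  }
}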

Implementing rate limiting

Simple delay between requests

const sleep = (ms) => new Promise(resolve => setTimeout(resolve, ms));

async function scrapeWithDelay(urls) {
  const results = [];
  
  for (const url of urls) {
    const result = await scrapeUrl(url);
    results.push(result);
    
    // Wait 2 seconds between requests
    await sleep(2000);
  }
  
  return results;
}

Token bucket algorithm

More sophisticated rate limiting that allows bursts:
class RateLimiter {
  private tokens: number;
  private lastRefill: number;
  private readonly capacity: number;
  private readonly refillRate: number; // tokens per second

  constructor(capacity: number, refillRate: number) {
    this.capacity = capacity;
    this.refillRate = refillRate;
    this.tokens = capacity;
    this.lastRefill = Date.now();
  }

  private refill(): void {
    const now = Date.now();
    const timePassed = (now - this.lastRefill) / 1000;
    const tokensToAdd = timePassed * this.refillRate;
    
    this.tokens = Math.min(this.capacity, this.tokens + tokensToAdd);
    this.lastRefill = now;
  }

  async acquire(): Promise<void> {
    this.refill();
    
    if (this.tokens < 1) {
      const waitTime = (1 - this.tokens) / this.refillRate * 1000;
      await new Promise(resolve => setTimeout(resolve, waitTime));
      this.refill();
    }
    
    this.tokens -= 1;
  }
}

// Usage
const limiter = new RateLimiter(10, 2); // 10 tokens, refill 2 per second

async function scrapeWithRateLimit(urls: string[]) {
  const results = [];
  
  for (const url of urls) {
    await limiter.acquire();
    const result = await scrapeUrl(url);
    results.push(result);
  }
  
  return results;
}

Using p-limit for concurrency control

import pLimit from 'p-limit';

// Allow max 5 concurrent requests
const limit = pLimit(5);

async function scrapeConcurrently(urls) {
  const promises = urls.map(url => 
    limit(() => scrapeUrl(url))
  );
  
  return Promise.all(promises);
}

// With delay between batches
async function scrapeBatches(urls, batchSize = 10) {
  const results = [];
  
  for (let i = 0; i < urls.length; i += batchSize) {
    const batch = urls.slice(i, i + batchSize);
    const batchResults = await scrapeConcurrently(batch);
    results.push(...batchResults);
    
    // Wait between batches
    if (i + batchSize < urls.length) {
      await sleep(5000); // 5 second delay
    }
  }
  
  return results;
}

Retry logic

Exponential backoff

Retry failed requests with increasing delays:
async function scrapeWithRetry(
  url: string,
  maxRetries = 3,
  baseDelay = 1000
): Promise<any> {
  let lastError: Error | undefined;
  
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      const response = await fetch(
        'https://app.manypi.com/api/scrape/YOUR_SCRAPER_ID',
        {
          method: 'POST',
          headers: {
            'Authorization': `Bearer ${process.env.MANYPI_API_KEY}`,
            'Content-Type': 'application/json'
          },
          body: JSON.stringify({ url })
        }
      );
      
      if (response.status === 429) {
        // Rate limited - wait and retry
        const retryAfter = response.headers.get('Retry-After');
        const delay = retryAfter 
          ? parseInt(retryAfter) * 1000 
          : baseDelay * Math.pow(2, attempt);
        
        console.log(`Rate limited. Retrying in ${delay}ms...`);
        await sleep(delay);
        continue;
      }
      
      if (!response.ok) {
        throw new Error(`HTTP ${response.status}: ${response.statusText}`);
      }
      
      const data = await response.json();
      
      if (!data.success) {
        throw new Error(data.error);
      }
      
      return data;
      
    } catch (error) {
      lastError = error as Error;
      
      if (attempt < maxRetries - 1) {
        const delay = baseDelay * Math.pow(2, attempt);
        console.log(`Attempt ${attempt + 1} failed. Retrying in ${delay}ms...`);
        await sleep(delay);
      }
    }
  }
  
  throw new Error(`Failed after ${maxRetries} attempts: ${lastError?.message}`);
}

Retry with jitter

Add randomness to prevent thundering herd:
function calculateBackoff(attempt, baseDelay = 1000, maxDelay = 30000) {
  const exponentialDelay = baseDelay * Math.pow(2, attempt);
  const jitter = Math.random() * 1000; // 0-1000ms random jitter
  return Math.min(exponentialDelay + jitter, maxDelay);
}

async function scrapeWithJitter(url, maxRetries = 3) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await scrapeUrl(url);
    } catch (error) {
      if (attempt < maxRetries - 1) {
        const delay = calculateBackoff(attempt);
        await sleep(delay);
      } else {
        throw error;
      }
    }
  }
}

Error handling

Comprehensive error handling

interface ScrapeError {
  type: 'rate_limit' | 'network' | 'validation' | 'server' | 'unknown';
  message: string;
  retryable: boolean;
  retryAfter?: number;
}

async function scrapeWithErrorHandling(url: string): Promise<any> {
  try {
    const response = await fetch(/* ... */);
    const data = await response.json();
    
    if (!data.success) {
      const error: ScrapeError = {
        type: classifyError(data.errorType),
        message: data.error,
        retryable: isRetryable(data.errorType)
      };
      
      throw error;
    }
    
    return data;
    
  } catch (error) {
    if (error instanceof TypeError) {
      // Network error
      throw {
        type: 'network',
        message: 'Network request failed',
        retryable: true
      } as ScrapeError;
    }
    
    throw error;
  }
}

function classifyError(errorType: string): ScrapeError['type'] {
  switch (errorType) {
    case 'rate_limit_error':
      return 'rate_limit';
    case 'validation_error':
      return 'validation';
    case 'internal_error':
      return 'server';
    default:
      return 'unknown';
  }
}

function isRetryable(errorType: string): boolean {
  return ['rate_limit_error', 'internal_error', 'network_error']
    .includes(errorType);
}

Circuit breaker pattern

Prevent cascading failures:
class CircuitBreaker {
  private failures = 0;
  private lastFailureTime = 0;
  private state: 'closed' | 'open' | 'half-open' = 'closed';
  
  constructor(
    private threshold: number = 5,
    private timeout: number = 60000 // 1 minute
  ) {}
  
  async execute<T>(fn: () => Promise<T>): Promise<T> {
    if (this.state === 'open') {
      if (Date.now() - this.lastFailureTime > this.timeout) {
        this.state = 'half-open';
      } else {
        throw new Error('Circuit breaker is open');
      }
    }
    
    try {
      const result = await fn();
      this.onSuccess();
      return result;
    } catch (error) {
      this.onFailure();
      throw error;
    }
  }
  
  private onSuccess(): void {
    this.failures = 0;
    this.state = 'closed';
  }
  
  private onFailure(): void {
    this.failures++;
    this.lastFailureTime = Date.now();
    
    if (this.failures >= this.threshold) {
      this.state = 'open';
      console.log('Circuit breaker opened');
    }
  }
}

// Usage
const breaker = new CircuitBreaker(5, 60000);

async function scrapeWithCircuitBreaker(url: string) {
  return breaker.execute(() => scrapeUrl(url));
}

Production patterns

Queue-based processing

Use a queue for reliable, rate-limited scraping:
import Bull from 'bull';

// Create the queue with a built-in rate limiter
// (Bull's limiter option caps job throughput across all workers)
const scrapeQueue = new Bull('scraping', {
  redis: { host: 'localhost', port: 6379 },
  limiter: {
    max: 10,        // 10 jobs
    duration: 60000 // per minute
  }
});

// Process up to 5 jobs concurrently
scrapeQueue.process(5, async (job) => {
  const { url, scraperId } = job.data;
  
  // Throwing here lets Bull apply the retry/backoff options set per job
  return scrapeUrl(url, scraperId);
});

// Add jobs to queue
async function queueScrape(url: string, scraperId: string) {
  await scrapeQueue.add(
    { url, scraperId },
    {
      attempts: 3,
      backoff: {
        type: 'exponential',
        delay: 2000
      }
    }
  );
}

// Monitor queue
scrapeQueue.on('completed', (job, result) => {
  console.log(`Job ${job.id} completed`);
});

scrapeQueue.on('failed', (job, error) => {
  console.error(`Job ${job.id} failed:`, error.message);
});

Distributed rate limiting with Redis

Share rate limits across multiple servers:
import Redis from 'ioredis';

class DistributedRateLimiter {
  private redis: Redis;
  
  constructor(redisUrl: string) {
    this.redis = new Redis(redisUrl);
  }
  
  async checkLimit(
    key: string,
    limit: number,
    window: number // seconds
  ): Promise<boolean> {
    const now = Date.now();
    const windowStart = now - (window * 1000);
    
    // Remove entries that fell out of the sliding window
    await this.redis.zremrangebyscore(key, 0, windowStart);
    
    // Count requests in the current window (note: this check-then-add
    // is not atomic; a Lua script would make it exact under contention)
    const count = await this.redis.zcard(key);
    
    if (count >= limit) {
      return false;
    }
    
    // Add current request; the random suffix keeps members unique when
    // multiple servers record requests in the same millisecond
    await this.redis.zadd(key, now, `${now}-${Math.random()}`);
    await this.redis.expire(key, window);
    
    return true;
  }
}

// Usage
const limiter = new DistributedRateLimiter('redis://localhost:6379');

async function scrapeWithDistributedLimit(url: string) {
  const allowed = await limiter.checkLimit(
    'scraping:rate-limit',
    100, // 100 requests
    60   // per 60 seconds
  );
  
  if (!allowed) {
    throw new Error('Rate limit exceeded');
  }
  
  return scrapeUrl(url);
}

Best practices

Respect target websites

  • Check robots.txt for crawl-delay directives (see the sketch after this list)
  • Start with 2-3 second delays between requests
  • Monitor for 429 (Too Many Requests) responses
  • Adjust delays based on response times
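
Checking for a crawl-delay directive can look like this minimal sketch (the parsing is simplified and ignores user-agent groups, which a full robots.txt parser would respect):
async function getCrawlDelay(siteUrl: string): Promise<number | null> {
  const robotsUrl = new URL('/robots.txt', siteUrl).toString();
  const response = await fetch(robotsUrl);
  
  if (!response.ok) return null;
  
  // Match the first Crawl-delay directive, case-insensitively
  const match = (await response.text()).match(/^crawl-delay:\s*(\d+)/im);
  
  return match ? parseInt(match[1], 10) * 1000 : null; // milliseconds
}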

Scrape during off-peak hours

Schedule heavy scraping during low-traffic periods:
function isOffPeakHours() {
  const hour = new Date().getHours();
  // 2 AM - 6 AM local time
  return hour >= 2 && hour < 6;
}

async function waitUntilOffPeak() {
  // Re-check every 10 minutes until the off-peak window opens
  while (!isOffPeakHours()) {
    await sleep(10 * 60 * 1000);
  }
}

async function scrapeResponsibly(urls) {
  if (!isOffPeakHours()) {
    console.log('Waiting for off-peak hours...');
    await waitUntilOffPeak();
  }
  
  return scrapeBatches(urls);
}

Cache results

Don't re-scrape data that hasn't changed:
const cache = new Map();

async function scrapeWithCache(url, ttl = 3600000) {
  const cached = cache.get(url);
  
  if (cached && Date.now() - cached.timestamp < ttl) {
    return cached.data;
  }
  
  const data = await scrapeUrl(url);
  cache.set(url, { data, timestamp: Date.now() });
  
  return data;
}

Monitor your success rate

Track success rates and adjust accordingly:
class ScrapeMonitor {
  private stats = {
    total: 0,
    success: 0,
    failed: 0,
    rateLimited: 0
  };
  
  recordSuccess() {
    this.stats.total++;
    this.stats.success++;
  }
  
  recordFailure(type: string) {
    this.stats.total++;
    this.stats.failed++;
    if (type === 'rate_limit') {
      this.stats.rateLimited++;
    }
  }
  
  getSuccessRate() {
    return this.stats.total === 0 ? 1 : this.stats.success / this.stats.total;
  }
  
  shouldSlowDown() {
    // Slow down if >10% of requests were rate limited
    return this.stats.total > 0 && this.stats.rateLimited / this.stats.total > 0.1;
  }
}
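
One way to wire the monitor into a scraping loop, reusing the sleep and scrapeUrl helpers from earlier (the delay values are illustrative):
const monitor = new ScrapeMonitor();

async function scrapeAdaptively(urls: string[]) {
  for (const url of urls) {
    try {
      await scrapeUrl(url);
      monitor.recordSuccess();
    } catch (error: any) {
      monitor.recordFailure(error?.type ?? 'unknown');
    }
    
    // Double the delay while more than 10% of requests are rate limited
    await sleep(monitor.shouldSlowDown() ? 4000 : 2000);
  }
}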

Rotate proxies for high volume

For high-volume scraping, consider rotating proxies:
const proxies = [
  'http://proxy1.example.com:8080',
  'http://proxy2.example.com:8080',
  'http://proxy3.example.com:8080'
];

let currentProxy = 0;

function getNextProxy() {
  const proxy = proxies[currentProxy];
  currentProxy = (currentProxy + 1) % proxies.length;
  return proxy;
}
Always use legitimate proxy services and respect website terms of service.

Monitoring rate limits

Check remaining quota

// Note: this issues a real request (consuming a credit) just to read the
// headers; in practice, inspect them on responses you already make.
async function checkRateLimit() {
  const response = await fetch(
    'https://app.manypi.com/api/scrape/YOUR_SCRAPER_ID',
    {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${process.env.MANYPI_API_KEY}`,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({ url: 'https://example.com' })
    }
  );
  
  const remaining = response.headers.get('X-RateLimit-Remaining');
  const reset = response.headers.get('X-RateLimit-Reset');
  
  console.log(`Requests remaining: ${remaining}`);
  console.log(`Resets at: ${new Date(parseInt(reset!) * 1000)}`);
  
  return {
    remaining: parseInt(remaining!),
    resetAt: new Date(parseInt(reset!) * 1000)
  };
}

Alert on low quota

async function scrapeWithQuotaCheck(url: string) {
  const { remaining } = await checkRateLimit();
  
  if (remaining < 10) {
    // sendAlert is a placeholder for your own alerting hook
    // (e.g., a Slack webhook or email)
    await sendAlert('Low rate limit quota', {
      remaining,
      url
    });
  }
  
  return scrapeUrl(url);
}

Next steps

View usage

Monitor your API usage and rate limits

Upgrade plan

Get more monthly credits with Pro or Business plans

API Reference

See complete API documentation

Contact support

Need custom rate limits? Get in touch