Overview
Responsible scraping protects both your application and target websites. This guide covers rate limiting strategies, retry logic, and best practices for production use.
Important: Excessive requests can get your IP blocked and violate website terms of service. Always implement rate limiting.
Why rate limiting matters
Protect target websites
Prevents server overload
Respects website resources
Maintains good standing with site owners
Protect your application
Avoids IP bans
Prevents credit waste on failed requests
Ensures consistent data quality
Legal and ethical compliance
Respects robots.txt
Follows terms of service
Demonstrates good faith usage
ManyPi rate limits
API limits
All plans have the same rate limit:
Requests per minute: 60
Burst limit: 10 concurrent requests
Rate limits are applied per API key. You can create multiple API keys in your dashboard to scale horizontally (e.g., 3 API keys = 180 requests/minute).
While rate limits are the same across plans, your credit allocation varies by plan tier. Higher tiers include more monthly credits, so they support higher scraping volume.
Pro tip: Create separate API keys for different services or environments (production, staging, batch jobs) to isolate rate limits and improve reliability.
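For example, here is a minimal sketch of spreading traffic across several keys. The environment variable names and the round-robin helper are illustrative assumptions, not part of the ManyPi API:
// Hypothetical helper: round-robin across multiple API keys to spread load.
// The key names and rotation strategy are illustrative, not prescribed by ManyPi.
const apiKeys = [
  process.env.MANYPI_API_KEY_PROD!,
  process.env.MANYPI_API_KEY_BATCH!
];

let keyIndex = 0;

function getNextApiKey(): string {
  const key = apiKeys[keyIndex];
  keyIndex = (keyIndex + 1) % apiKeys.length;
  return key;
}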
Every API response includes rate limit information:
HTTP/1.1 200 OK
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 55
X-RateLimit-Reset: 1640000000
X-RateLimit-Limit: Maximum requests per minute (60)
X-RateLimit-Remaining: Requests remaining in current window
X-RateLimit-Reset: Unix timestamp when limit resets
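As a minimal sketch, you can use these headers to pause until the window resets once the remaining quota reaches zero. The helper name and fallback values below are assumptions for illustration:
// Illustrative helper: pause until the rate limit window resets
// when X-RateLimit-Remaining reports no requests left.
async function waitIfRateLimited(response: Response): Promise<void> {
  const remaining = parseInt(response.headers.get('X-RateLimit-Remaining') ?? '1', 10);
  const reset = parseInt(response.headers.get('X-RateLimit-Reset') ?? '0', 10);

  if (remaining <= 0) {
    const waitMs = Math.max(0, reset * 1000 - Date.now());
    await new Promise(resolve => setTimeout(resolve, waitMs));
  }
}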
Implementing rate limiting
Simple delay between requests
const sleep = (ms) => new Promise(resolve => setTimeout(resolve, ms));

async function scrapeWithDelay(urls) {
  const results = [];

  for (const url of urls) {
    const result = await scrapeUrl(url);
    results.push(result);

    // Wait 2 seconds between requests
    await sleep(2000);
  }

  return results;
}
Token bucket algorithm
More sophisticated rate limiting that allows bursts:
class RateLimiter {
  private tokens: number;
  private lastRefill: number;
  private readonly capacity: number;
  private readonly refillRate: number; // tokens per second

  constructor(capacity: number, refillRate: number) {
    this.capacity = capacity;
    this.refillRate = refillRate;
    this.tokens = capacity;
    this.lastRefill = Date.now();
  }

  private refill(): void {
    const now = Date.now();
    const timePassed = (now - this.lastRefill) / 1000;
    const tokensToAdd = timePassed * this.refillRate;

    this.tokens = Math.min(this.capacity, this.tokens + tokensToAdd);
    this.lastRefill = now;
  }

  async acquire(): Promise<void> {
    this.refill();

    if (this.tokens < 1) {
      const waitTime = (1 - this.tokens) / this.refillRate * 1000;
      await new Promise(resolve => setTimeout(resolve, waitTime));
      this.refill();
    }

    this.tokens -= 1;
  }
}

// Usage
const limiter = new RateLimiter(10, 2); // 10 tokens, refill 2 per second

async function scrapeWithRateLimit(urls: string[]) {
  const results = [];

  for (const url of urls) {
    await limiter.acquire();
    const result = await scrapeUrl(url);
    results.push(result);
  }

  return results;
}
Using p-limit for concurrency control
import pLimit from 'p-limit';

// Allow max 5 concurrent requests
const limit = pLimit(5);

async function scrapeConcurrently(urls) {
  const promises = urls.map(url =>
    limit(() => scrapeUrl(url))
  );

  return Promise.all(promises);
}

// With delay between batches
async function scrapeBatches(urls, batchSize = 10) {
  const results = [];

  for (let i = 0; i < urls.length; i += batchSize) {
    const batch = urls.slice(i, i + batchSize);
    const batchResults = await scrapeConcurrently(batch);
    results.push(...batchResults);

    // Wait between batches
    if (i + batchSize < urls.length) {
      await sleep(5000); // 5 second delay
    }
  }

  return results;
}
Retry logic
Exponential backoff
Retry failed requests with increasing delays:
async function scrapeWithRetry(
  url: string,
  maxRetries = 3,
  baseDelay = 1000
): Promise<any> {
  let lastError: Error | undefined;

  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      const response = await fetch(
        'https://app.manypi.com/api/scrape/YOUR_SCRAPER_ID',
        {
          method: 'POST',
          headers: {
            'Authorization': `Bearer ${process.env.MANYPI_API_KEY}`,
            'Content-Type': 'application/json'
          },
          body: JSON.stringify({ url })
        }
      );

      if (response.status === 429) {
        // Rate limited - wait and retry
        const retryAfter = response.headers.get('Retry-After');
        const delay = retryAfter
          ? parseInt(retryAfter, 10) * 1000
          : baseDelay * Math.pow(2, attempt);

        console.log(`Rate limited. Retrying in ${delay}ms...`);
        await sleep(delay);
        continue;
      }

      if (!response.ok) {
        throw new Error(`HTTP ${response.status}: ${response.statusText}`);
      }

      const data = await response.json();

      if (!data.success) {
        throw new Error(data.error);
      }

      return data;
    } catch (error) {
      lastError = error as Error;

      if (attempt < maxRetries - 1) {
        const delay = baseDelay * Math.pow(2, attempt);
        console.log(`Attempt ${attempt + 1} failed. Retrying in ${delay}ms...`);
        await sleep(delay);
      }
    }
  }

  throw new Error(`Failed after ${maxRetries} attempts: ${lastError?.message ?? 'rate limited'}`);
}
Retry with jitter
Add randomness to the backoff delay to prevent the thundering herd problem, where many clients retry at the same moment:
function calculateBackoff(attempt, baseDelay = 1000, maxDelay = 30000) {
  const exponentialDelay = baseDelay * Math.pow(2, attempt);
  const jitter = Math.random() * 1000; // 0-1000ms random jitter

  return Math.min(exponentialDelay + jitter, maxDelay);
}

async function scrapeWithJitter(url, maxRetries = 3) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await scrapeUrl(url);
    } catch (error) {
      if (attempt < maxRetries - 1) {
        const delay = calculateBackoff(attempt);
        await sleep(delay);
      } else {
        throw error;
      }
    }
  }
}
Error handling
Comprehensive error handling
interface ScrapeError {
  type: 'rate_limit' | 'network' | 'validation' | 'server' | 'unknown';
  message: string;
  retryable: boolean;
  retryAfter?: number;
}

async function scrapeWithErrorHandling(url: string): Promise<any> {
  try {
    const response = await fetch(/* ... */);
    const data = await response.json();

    if (!data.success) {
      const error: ScrapeError = {
        type: classifyError(data.errorType),
        message: data.error,
        retryable: isRetryable(data.errorType)
      };
      throw error;
    }

    return data;
  } catch (error) {
    if (error instanceof TypeError) {
      // Network error
      throw {
        type: 'network',
        message: 'Network request failed',
        retryable: true
      } as ScrapeError;
    }
    throw error;
  }
}

function classifyError(errorType: string): ScrapeError['type'] {
  switch (errorType) {
    case 'rate_limit_error':
      return 'rate_limit';
    case 'validation_error':
      return 'validation';
    case 'internal_error':
      return 'server';
    default:
      return 'unknown';
  }
}

function isRetryable(errorType: string): boolean {
  return ['rate_limit_error', 'internal_error', 'network_error']
    .includes(errorType);
}
Circuit breaker pattern
Prevent cascading failures:
class CircuitBreaker {
  private failures = 0;
  private lastFailureTime = 0;
  private state: 'closed' | 'open' | 'half-open' = 'closed';

  constructor(
    private threshold: number = 5,
    private timeout: number = 60000 // 1 minute
  ) {}

  async execute<T>(fn: () => Promise<T>): Promise<T> {
    if (this.state === 'open') {
      if (Date.now() - this.lastFailureTime > this.timeout) {
        this.state = 'half-open';
      } else {
        throw new Error('Circuit breaker is open');
      }
    }

    try {
      const result = await fn();
      this.onSuccess();
      return result;
    } catch (error) {
      this.onFailure();
      throw error;
    }
  }

  private onSuccess(): void {
    this.failures = 0;
    this.state = 'closed';
  }

  private onFailure(): void {
    this.failures++;
    this.lastFailureTime = Date.now();

    if (this.failures >= this.threshold) {
      this.state = 'open';
      console.log('Circuit breaker opened');
    }
  }
}

// Usage
const breaker = new CircuitBreaker(5, 60000);

async function scrapeWithCircuitBreaker(url: string) {
  return breaker.execute(() => scrapeUrl(url));
}
Production patterns
Queue-based processing
Use a queue for reliable, rate-limited scraping:
import Bull from 'bull';

// Create queue and configure rate limiting
// (Bull takes the limiter as a queue option, not a process option)
const scrapeQueue = new Bull('scraping', {
  redis: { host: 'localhost', port: 6379 },
  limiter: {
    max: 10,        // 10 jobs
    duration: 60000 // per minute
  }
});

// Process up to 5 jobs concurrently
scrapeQueue.process(5, async (job) => {
  const { url, scraperId } = job.data;

  try {
    const result = await scrapeUrl(url, scraperId);
    return result;
  } catch (error) {
    // Retry logic handled by Bull
    throw error;
  }
});

// Add jobs to queue
async function queueScrape(url: string, scraperId: string) {
  await scrapeQueue.add(
    { url, scraperId },
    {
      attempts: 3,
      backoff: {
        type: 'exponential',
        delay: 2000
      }
    }
  );
}

// Monitor queue
scrapeQueue.on('completed', (job, result) => {
  console.log(`Job ${job.id} completed`);
});

scrapeQueue.on('failed', (job, error) => {
  console.error(`Job ${job.id} failed:`, error.message);
});
Distributed rate limiting with Redis
Share rate limits across multiple servers:
import Redis from 'ioredis';

class DistributedRateLimiter {
  private redis: Redis;

  constructor(redisUrl: string) {
    this.redis = new Redis(redisUrl);
  }

  async checkLimit(
    key: string,
    limit: number,
    window: number // seconds
  ): Promise<boolean> {
    const now = Date.now();
    const windowStart = now - (window * 1000);

    // Remove old entries
    await this.redis.zremrangebyscore(key, 0, windowStart);

    // Count requests in current window
    const count = await this.redis.zcard(key);

    if (count >= limit) {
      return false;
    }

    // Add current request
    await this.redis.zadd(key, now, `${now}`);
    await this.redis.expire(key, window);

    return true;
  }
}

// Usage
const limiter = new DistributedRateLimiter('redis://localhost:6379');

async function scrapeWithDistributedLimit(url: string) {
  const allowed = await limiter.checkLimit(
    'scraping:rate-limit',
    100, // 100 requests
    60   // per 60 seconds
  );

  if (!allowed) {
    throw new Error('Rate limit exceeded');
  }

  return scrapeUrl(url);
}
Best practices
Respect target website rate limits
Check robots.txt for crawl-delay directives (see the sketch after this list)
Start with 2-3 second delays between requests
Monitor for 429 (Too Many Requests) responses
Adjust delays based on response times
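A minimal sketch of checking a site's robots.txt for a Crawl-delay directive before scraping. The parsing here is an intentionally simplified assumption; a full parser would also handle user-agent groups and wildcards:
// Simplified sketch: read Crawl-delay from robots.txt (if present).
async function getCrawlDelaySeconds(siteOrigin: string): Promise<number | null> {
  const response = await fetch(`${siteOrigin}/robots.txt`);
  if (!response.ok) return null;

  const text = await response.text();
  const match = text.match(/^crawl-delay:\s*(\d+(\.\d+)?)/im);
  return match ? parseFloat(match[1]) : null;
}

// Usage: fall back to a 2 second delay when no directive is found
// const delaySeconds = (await getCrawlDelaySeconds('https://example.com')) ?? 2;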
Scrape during off-peak hours
Schedule heavy scraping during low-traffic periods:
function isOffPeakHours() {
  const hour = new Date().getHours();
  // 2 AM - 6 AM local time
  return hour >= 2 && hour < 6;
}

async function scrapeResponsibly(urls) {
  if (!isOffPeakHours()) {
    console.log('Waiting for off-peak hours...');
    await waitUntilOffPeak();
  }

  return scrapeBatch(urls);
}
Don’t re-scrape data that hasn’t changed:
const cache = new Map();

async function scrapeWithCache(url, ttl = 3600000) {
  const cached = cache.get(url);

  if (cached && Date.now() - cached.timestamp < ttl) {
    return cached.data;
  }

  const data = await scrapeUrl(url);
  cache.set(url, { data, timestamp: Date.now() });

  return data;
}
Track success rates and adjust accordingly:
class ScrapeMonitor {
  private stats = {
    total: 0,
    success: 0,
    failed: 0,
    rateLimited: 0
  };

  recordSuccess() {
    this.stats.total++;
    this.stats.success++;
  }

  recordFailure(type: string) {
    this.stats.total++;
    this.stats.failed++;

    if (type === 'rate_limit') {
      this.stats.rateLimited++;
    }
  }

  getSuccessRate() {
    return this.stats.success / this.stats.total;
  }

  shouldSlowDown() {
    // Slow down if >10% rate limited
    return this.stats.rateLimited / this.stats.total > 0.1;
  }
}
Use proxy rotation (if needed)
For high-volume scraping, consider rotating proxies:
const proxies = [
  'http://proxy1.example.com:8080',
  'http://proxy2.example.com:8080',
  'http://proxy3.example.com:8080'
];

let currentProxy = 0;

function getNextProxy() {
  const proxy = proxies[currentProxy];
  currentProxy = (currentProxy + 1) % proxies.length;
  return proxy;
}
Always use legitimate proxy services and respect website terms of service.
Monitoring rate limits
Check remaining quota
async function checkRateLimit() {
  const response = await fetch(
    'https://app.manypi.com/api/scrape/YOUR_SCRAPER_ID',
    {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${process.env.MANYPI_API_KEY}`,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({ url: 'https://example.com' })
    }
  );

  const remaining = response.headers.get('X-RateLimit-Remaining');
  const reset = response.headers.get('X-RateLimit-Reset');

  console.log(`Requests remaining: ${remaining}`);
  console.log(`Resets at: ${new Date(parseInt(reset!) * 1000)}`);

  return {
    remaining: parseInt(remaining!),
    resetAt: new Date(parseInt(reset!) * 1000)
  };
}
Alert on low quota
async function scrapeWithQuotaCheck(url: string) {
  const { remaining } = await checkRateLimit();

  if (remaining < 10) {
    await sendAlert('Low rate limit quota', {
      remaining,
      url
    });
  }

  return scrapeUrl(url);
}
Next steps