
Overview

ManyPI lets you convert any website into a reliable, structured API without writing complex scraping code. Simply describe what data you want, and our AI handles the rest.
Perfect for: E-commerce monitoring, lead generation, content aggregation, market research, and any scenario where you need structured data from websites.

How it works

  1. Create a scraper: Define what data you want to extract using natural language or a JSON schema.
  2. AI extracts the data: Our AI intelligently navigates the page and extracts exactly what you specified.
  3. Get structured JSON: Receive clean, validated data in your defined format via API or dashboard.
  4. Integrate anywhere: Use the API in your applications, automation workflows, or data pipelines.
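
In practice, step 3 is a single authenticated POST to your scraper's endpoint. Here is a minimal sketch; the scraper ID, API key, and target URL are placeholders, and the endpoint and request shape match the integration examples later on this page:
Node.js
const response = await fetch(
  'https://app.manypi.com/api/scrape/YOUR_SCRAPER_ID',
  {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${process.env.MANYPI_API_KEY}`,
      'Content-Type': 'application/json'
    },
    // The page you want structured data from
    body: JSON.stringify({ url: 'https://example.com/some-page' })
  }
);

const { data } = await response.json(); // Validated JSON in your defined schema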

Real-world examples

Example 1: E-commerce product API

Goal: Monitor competitor prices and product availability.
Prompt
Extract product information from this e-commerce page:
- Product title
- Current price in USD
- Original price if on sale
- Star rating (out of 5)
- Number of reviews
- Availability status (in stock or out of stock)
- Main product image URL
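
A scrape of a typical product page would then return JSON along these lines (field names and values are illustrative; the actual keys follow the schema generated for your scraper):
JSON
{
  "title": "Wireless Noise-Cancelling Headphones",
  "price": 199.99,
  "originalPrice": 249.99,
  "rating": 4.6,
  "reviewCount": 1284,
  "availability": "in stock",
  "imageUrl": "https://example.com/images/headphones.jpg"
}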

Example 2: Job listings API

Goal: Aggregate job postings from multiple career sites.
Prompt
Extract job listing information:
- Job title
- Company name
- Location (city and state/country)
- Salary range if available
- Job type (full-time, part-time, contract, remote)
- Posted date
- Application deadline if shown
- Required skills (as an array)
- Job description summary
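
Step 1 also accepts a JSON schema instead of a prose prompt. A sketch of an equivalent schema might look like this (property names are illustrative); note the nullable salary and deadline fields and the skills array:
JSON
{
  "type": "object",
  "properties": {
    "jobTitle": { "type": "string" },
    "companyName": { "type": "string" },
    "location": { "type": "string" },
    "salaryRange": { "type": ["string", "null"], "description": "Only present when listed" },
    "jobType": { "type": "string", "description": "full-time, part-time, contract, or remote" },
    "postedDate": { "type": "string" },
    "applicationDeadline": { "type": ["string", "null"] },
    "requiredSkills": { "type": "array", "items": { "type": "string" } },
    "descriptionSummary": { "type": "string" }
  }
}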

Example 3: News article API

Goal: Create a content aggregation feed from multiple news sources.
Prompt
Extract article information from news websites:
- Article headline
- Author name
- Publication date and time
- Article category/section
- Full article text
- Featured image URL
- Tags or keywords
- Estimated reading time

Best practices

Write specific prompts

Good: “Extract product title, price in USD format, star rating out of 5, and boolean availability status”
Bad: “Get product info”
Specific prompts lead to more accurate schemas and better extraction results.
Test on multiple pages

Test your scraper on 3-5 different pages from the same site to ensure consistency:
  • Different product types
  • Pages with missing data (out of stock, no reviews)
  • Pages with special formatting
This helps catch edge cases before production use.
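A quick way to run this check is to loop over a few representative URLs and flag fields that come back empty. The sketch below assumes a scrapeUrl(url) helper that POSTs to your scraper endpoint (the same helper referenced in the caching example below):
Node.js
const testUrls = [
  'https://example.com/product/popular-item',        // typical page
  'https://example.com/product/out-of-stock-item',   // missing data
  'https://example.com/product/unusual-layout-item'  // special formatting
];

for (const url of testUrls) {
  const data = await scrapeUrl(url); // assumed helper that calls the scrape endpoint
  const missing = Object.entries(data)
    .filter(([, value]) => value === null || value === undefined)
    .map(([key]) => key);
  console.log(url, missing.length ? `missing: ${missing.join(', ')}` : 'all fields present');
}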
Handle missing data

Not all pages have all fields. Make optional fields nullable in your schema:
{
  "properties": {
    "price": { "type": "number" },
    "salePrice": { 
      "type": ["number", "null"],
      "description": "Only present during sales"
    }
  }
}
Add retry logic

Websites can be temporarily unavailable. Add retry logic to your integration:
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function scrapeWithRetry(url, maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      const response = await fetch(/* ... */);
      if (response.ok) return await response.json();
      throw new Error(`Request failed with status ${response.status}`);
    } catch (error) {
      if (i === maxRetries - 1) throw error;
      await sleep(1000 * 2 ** i); // Exponential backoff: 1s, 2s, 4s, ...
    }
  }
}
Cache when possible

For data that doesn’t change frequently, implement caching:
const cache = new Map();
const CACHE_TTL = 3600000; // 1 hour

async function getCachedData(url) {
  const cached = cache.get(url);
  if (cached && Date.now() - cached.timestamp < CACHE_TTL) {
    return cached.data;
  }
  
  const data = await scrapeUrl(url); // your own helper that calls the scrape endpoint
  cache.set(url, { data, timestamp: Date.now() });
  return data;
}
Set up notifications

Use email notifications to stay informed:
  1. Go to Integrations in your dashboard
  2. Enable email notifications for your scraper
  3. Get notified when scrapes complete or fail
  4. Monitor credit usage and set up alerts

Common patterns

Pattern 1: Scheduled scraping

Run scrapers on a schedule using cron jobs or cloud functions:
const cron = require('node-cron');

// Run every day at 9 AM
cron.schedule('0 9 * * *', async () => {
  console.log('Starting daily scrape...');
  
  const urls = await getUrlsToScrape(); // your own helper, e.g. load target URLs from a database
  
  for (const url of urls) {
    await scrapeAndStore(url); // your own helper: call the scrape endpoint and persist the result
  }
  
  console.log('Daily scrape complete');
});

Pattern 2: Webhook integration

Trigger scrapes from external events:
Express.js
app.post('/webhook/new-product', async (req, res) => {
  const { productUrl } = req.body;
  
  // Scrape the new product
  const response = await fetch(
    'https://app.manypi.com/api/scrape/YOUR_SCRAPER_ID',
    {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${process.env.MANYPI_API_KEY}`,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({ url: productUrl })
    }
  );
  
  const { data } = await response.json();
  
  // Process and store
  await processNewProduct(data);
  
  res.json({ success: true });
});

Pattern 3: Batch processing

Process multiple URLs efficiently:
Python
import asyncio
import aiohttp

async def scrape_batch(urls, scraper_id, api_key):
    async with aiohttp.ClientSession() as session:
        tasks = []
        
        for url in urls:
            task = scrape_url(session, url, scraper_id, api_key)
            tasks.append(task)
        
        results = await asyncio.gather(*tasks)
        return results

async def scrape_url(session, url, scraper_id, api_key):
    async with session.post(
        f'https://app.manypi.com/api/scrape/{scraper_id}',
        headers={
            'Authorization': f'Bearer {api_key}',
            'Content-Type': 'application/json'
        },
        json={'url': url}
    ) as response:
        return await response.json()

# Usage
urls = ['url1', 'url2', 'url3', ...]
results = asyncio.run(scrape_batch(urls, 'scraper-id', 'api-key'))

Limitations and considerations

Respect robots.txt and terms of service

Always check a website’s robots.txt file and terms of service before scraping. Some sites explicitly prohibit automated access.
Rate limiting

Be mindful of request frequency. Excessive requests can:
  • Get your IP blocked
  • Overload target servers
  • Consume credits quickly
Implement reasonable delays between requests (1-2 seconds minimum).
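A minimal way to do this in a sequential loop is sketched below (the delay value is an assumption; tune it for the target site):
Node.js
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

for (const url of urls) {
  await scrapeAndStore(url); // your own helper, as in Pattern 1 above
  await sleep(1500); // pause ~1.5 seconds between requests
}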
Dynamic content

ManyPI handles JavaScript-rendered content automatically. However, some sites use advanced anti-bot measures. Contact support if you encounter issues with specific sites.

Next steps