Crawlee

WHAT IS IT?

Crawlee is Apify's Node.js/TypeScript framework for building web crawlers that hold up in production. It puts plain HTTP scraping (Cheerio, JSDOM) and headless browser automation (Playwright, Puppeteer) behind a single API. The goal: ship scrapers that look like real users and are far less likely to trip modern bot protections, without homegrown hacks.

WHY IS IT INTERESTING?

  • Unified HTTP and headless API: Same routing logic, same queue handling, whether you're firing Cheerio or driving Playwright. Swapping engines becomes an implementation detail, not a rewrite.
  • Anti-blocking out of the box: Realistic TLS fingerprints, coherent browser headers, session and proxy rotation. Defaults are tuned to reduce the odds of getting flagged by Cloudflare, DataDome and similar services.
  • Persistent smart queue: Automatic request deduplication, BFS or DFS traversal, resume-after-crash semantics. Your crawl state survives interruptions.
  • Native autoscaling: The framework adjusts concurrency based on available CPU and memory, so it adapts to whatever hardware it runs on without manual tuning.
  • TypeScript first: Full types, solid IDE autocomplete, clean integration into modern Node projects. The CLI scaffolds a ready-to-run project in seconds.
  • Apify ecosystem: The library runs fine on its own locally, and deploys to the Apify platform with little friction if you ever need managed production hosting.
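
To make the "persistent smart queue" bullet concrete, here is a minimal in-memory sketch of request deduplication with BFS or DFS ordering. It is a conceptual illustration only, not Crawlee's implementation or API: DedupQueue, normalizeUrl, crawlOrder and the sample link graph are all hypothetical names and data invented for this example, and nothing here is persisted to disk.

```typescript
// Conceptual sketch only. This is NOT Crawlee's code or API.
// DedupQueue, normalizeUrl, crawlOrder and `site` are hypothetical.

type Strategy = "bfs" | "dfs";

// Collapse trivially different spellings of a URL into one key:
// drop the #fragment and any trailing slashes.
function normalizeUrl(raw: string): string {
  const hashAt = raw.indexOf("#");
  const base = hashAt === -1 ? raw : raw.slice(0, hashAt);
  return base.replace(/\/+$/, "");
}

class DedupQueue {
  private pending: string[] = [];
  private seen = new Set<string>();
  private strategy: Strategy;

  constructor(strategy: Strategy) {
    this.strategy = strategy;
  }

  // Returns false when the URL was already enqueued at some point.
  add(url: string): boolean {
    const key = normalizeUrl(url);
    if (this.seen.has(key)) return false;
    this.seen.add(key);
    this.pending.push(key);
    return true;
  }

  // BFS takes the oldest pending URL, DFS takes the newest.
  next(): string | undefined {
    return this.strategy === "bfs" ? this.pending.shift() : this.pending.pop();
  }
}

// Walk a link graph (page URL -> outgoing links) and return the visit order.
function crawlOrder(
  start: string,
  links: Record<string, string[]>,
  strategy: Strategy,
): string[] {
  const queue = new DedupQueue(strategy);
  queue.add(start);
  const visited: string[] = [];
  let current = queue.next();
  while (current !== undefined) {
    visited.push(current);
    for (const link of links[current] ?? []) queue.add(link);
    current = queue.next();
  }
  return visited;
}

// Made-up site: /a links to /b with a fragment, /b links back to the root.
const site: Record<string, string[]> = {
  "https://example.com": ["https://example.com/a", "https://example.com/b"],
  "https://example.com/a": ["https://example.com/a1", "https://example.com/b#top"],
  "https://example.com/b": ["https://example.com/"],
};

console.log(crawlOrder("https://example.com", site, "bfs"));
// root, /a, /b, /a1
console.log(crawlOrder("https://example.com", site, "dfs"));
// root, /b, /a, /a1
```

Each page is visited once even though /b and the root are linked more than once. A real queue would also write the seen set and pending list to storage, which is what makes resume-after-crash possible.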

USE CASES

  • Scraping JavaScript-heavy sites that require an actual browser engine
  • Large-scale data collection to train LLMs or feed RAG pipelines
  • Price, stock or content monitoring across e-commerce catalogs
  • Structured extraction from SaaS apps that don't expose a public API
  • Automated competitive intelligence with deduplication and crash recovery