WHAT IS IT?
Crawlee is Apify's Node.js/TypeScript framework for building web crawlers that actually hold up in production. It unifies plain HTTP scraping (Cheerio, JSDOM) and headless browser automation (Playwright, Puppeteer) behind a single API. The goal: ship scrapers that look like real users and get past modern bot protections without homegrown hacks.
WHY IS IT INTERESTING?
- Unified HTTP and headless API: Same routing logic, same queue handling, whether you're parsing with Cheerio or driving Playwright. Swapping engines becomes an implementation detail, not a rewrite.
- Anti-blocking out of the box: Realistic TLS fingerprints, coherent browser headers, session and proxy rotation. Defaults are tuned to avoid getting flagged by Cloudflare, DataDome and friends.
- Persistent smart queue: Automatic request deduplication, BFS or DFS traversal, resume-after-crash semantics. Your crawl state survives interruptions.
- Native autoscaling: The framework adjusts concurrency based on available CPU and memory. No manual tuning needed: it adapts to whatever hardware it runs on.
- TypeScript first: Full types, solid IDE autocomplete, clean integration into modern Node projects. The CLI scaffolds a ready-to-run project in seconds.
- Apify ecosystem: The library runs fine locally and deploys without friction to the Apify platform if you ever need managed production hosting.
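To make the "persistent smart queue" idea concrete, here is a minimal conceptual sketch in plain TypeScript. It is not Crawlee's implementation or API: the names DedupQueue, normalizeUrl, snapshot and restore are hypothetical, and the URL normalization is deliberately simplistic. It only illustrates the three behaviors named above: deduplication, BFS or DFS ordering, and state that can be saved and reloaded after an interruption.

```typescript
// Conceptual sketch only, NOT Crawlee's code. All names are hypothetical.

type Traversal = 'bfs' | 'dfs';

interface QueueSnapshot {
    traversal: Traversal;
    pending: string[];
    seen: string[];
}

// Assumption: two URLs are "the same" if they differ only by a
// #fragment or a single trailing slash. Real crawlers normalize more.
function normalizeUrl(url: string): string {
    const hashIndex = url.indexOf('#');
    const withoutFragment = hashIndex === -1 ? url : url.slice(0, hashIndex);
    return withoutFragment.replace(/\/$/, '');
}

class DedupQueue {
    private traversal: Traversal;
    private pending: string[] = [];
    private seen = new Set<string>();

    constructor(traversal: Traversal = 'bfs') {
        this.traversal = traversal;
    }

    // Returns true if the URL was newly enqueued, false if it was a duplicate.
    add(url: string): boolean {
        const key = normalizeUrl(url);
        if (this.seen.has(key)) return false;
        this.seen.add(key);
        this.pending.push(key);
        return true;
    }

    // BFS takes the oldest pending URL (FIFO); DFS takes the newest (LIFO).
    next(): string | undefined {
        return this.traversal === 'bfs' ? this.pending.shift() : this.pending.pop();
    }

    // Serialize the full state so it could be written to disk between steps.
    snapshot(): string {
        const state: QueueSnapshot = {
            traversal: this.traversal,
            pending: this.pending.slice(),
            seen: Array.from(this.seen),
        };
        return JSON.stringify(state);
    }

    // Rebuild a queue from a snapshot, e.g. after a crash.
    static restore(json: string): DedupQueue {
        const state = JSON.parse(json) as QueueSnapshot;
        const queue = new DedupQueue(state.traversal);
        queue.pending = state.pending.slice();
        queue.seen = new Set(state.seen);
        return queue;
    }
}

// Usage: the second add is dropped as a duplicate of the first.
const queue = new DedupQueue('bfs');
queue.add('https://example.com/');
queue.add('https://example.com/#top');
queue.add('https://example.com/a');

const first = queue.next();

// Simulate a restart: save the state, reload it, and keep crawling.
const resumed = DedupQueue.restore(queue.snapshot());
const second = resumed.next();
```

Because the set of seen URLs is saved together with the pending list, a restored queue still refuses URLs it handled before the interruption. In a real project you would rely on the framework's own queue rather than rolling one like this.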
USE CASES
- Scraping JavaScript-heavy sites that require an actual browser engine
- Large-scale data collection to train LLMs or feed RAG pipelines
- Price, stock or content monitoring across e-commerce catalogs
- Structured extraction from SaaS apps that don't expose a public API
- Automated competitive intelligence with deduplication and crash recovery
