Crawlee

2026.04.15	FRAMEWORK	TypeScript	★ 29k

WHAT IS IT?

Crawlee is Apify's Node.js/TypeScript framework for building web crawlers that actually hold up in production. It unifies plain HTTP scraping (Cheerio, JSDOM) and headless browser automation (Playwright, Puppeteer) behind a single API. The goal: ship scrapers that look like real users and sail past modern bot protections without homegrown hacks.

WHY IS IT INTERESTING?

Unified HTTP and headless API: Same routing logic, same queue handling, whether you're firing Cheerio or driving Playwright. Swapping engines becomes an implementation detail, not a rewrite.
Anti-blocking out of the box: Realistic TLS fingerprints, coherent browser headers, session and proxy rotation. Defaults are tuned to reduce the odds of being flagged by Cloudflare, DataDome and similar services.
Persistent smart queue: Automatic request deduplication, BFS or DFS traversal, resume-after-crash semantics. Your crawl state survives interruptions.
Native autoscaling: The framework adjusts concurrency based on available CPU and memory. No manual tuning is needed; it adapts to whatever hardware it runs on.
TypeScript first: Full types, solid IDE autocomplete, clean integration into modern Node projects. The CLI scaffolds a ready-to-run project in seconds.
Apify ecosystem: The library runs fine locally and deploys without friction onto the Apify platform if you ever need managed production hosting.
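
To make the "persistent smart queue" idea above concrete, here is a minimal, dependency-free sketch of the three behaviors it names: deduplication, breadth-first or depth-first ordering, and resuming from saved state. This is not Crawlee's implementation or API. The names `SketchQueue`, `uniqueKeyFor`, `snapshot` and `restore` are hypothetical, and the normalization rules (drop the fragment, sort query parameters) are an assumption chosen for illustration.

```typescript
// Illustrative only: a tiny in-memory queue, NOT Crawlee's RequestQueue.
type Traversal = 'bfs' | 'dfs';

// Hypothetical helper: build a deduplication key from a URL.
// Assumption: fragments and query parameter order do not identify a page.
function uniqueKeyFor(url: string): string {
  const parsed = new URL(url);
  parsed.hash = '';            // drop "#fragment"
  parsed.searchParams.sort();  // make "?a=1&b=2" equal to "?b=2&a=1"
  return parsed.toString();
}

class SketchQueue {
  private traversal: Traversal;
  private pending: string[] = [];
  private seen = new Set<string>();

  constructor(traversal: Traversal = 'bfs') {
    this.traversal = traversal;
  }

  // Returns true if the URL was enqueued, false if it was a duplicate.
  add(url: string): boolean {
    const key = uniqueKeyFor(url);
    if (this.seen.has(key)) return false;
    this.seen.add(key);
    this.pending.push(url);
    return true;
  }

  // BFS takes the oldest pending URL, DFS takes the newest.
  next(): string | undefined {
    return this.traversal === 'bfs' ? this.pending.shift() : this.pending.pop();
  }

  // Serialize the state so a crawl can resume after a crash or restart.
  snapshot(): string {
    return JSON.stringify({
      traversal: this.traversal,
      pending: this.pending,
      seen: Array.from(this.seen),
    });
  }

  // Rebuild a queue from a snapshot produced by snapshot().
  static restore(json: string): SketchQueue {
    const state = JSON.parse(json) as {
      traversal: Traversal;
      pending: string[];
      seen: string[];
    };
    const queue = new SketchQueue(state.traversal);
    queue.pending = state.pending;
    queue.seen = new Set(state.seen);
    return queue;
  }
}

// Usage: the second URL differs only in parameter order and fragment.
const demo = new SketchQueue('bfs');
demo.add('https://example.com/a?x=1&y=2');     // true, enqueued
demo.add('https://example.com/a?y=2&x=1#top'); // false, duplicate
```

In a real deployment the snapshot would be written to disk or a database after each change rather than kept in memory. In Crawlee itself, each request carries a `uniqueKey` used for deduplication; the project documentation describes the exact normalization and storage behavior.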

USE CASES

Scraping JavaScript-heavy sites that require an actual browser engine
Large-scale data collection to train LLMs or feed RAG pipelines
Price, stock or content monitoring across e-commerce catalogs
Structured extraction from SaaS apps that don't expose a public API
Automated competitive intelligence with deduplication and crash recovery

#javascript #typescript #web-scraping #crawler #automation #headless-browser #nodejs

SOURCES

REPO	https://github.com/apify/crawlee
SITE	https://crawlee.dev
LICENSE	Apache-2.0