
MarkItDown
Microsoft's Python library and CLI to convert PDF, Word, Excel, images, audio and more into clean Markdown, ready for LLMs and text analysis pipelines.
3 entries tagged with "ocr"


Self-hosted, open-source PDF toolkit with 50+ built-in tools. Merge, split, OCR, sign, convert and compress PDFs, with a REST API and workflow automation.

JavaScript OCR library that runs in the browser and Node.js. Recognizes text in 100+ languages via WebAssembly with no server-side processing needed.