AdaptiveOcr

Adaptive OCR pipeline: heuristic-first, with a per-page OCR fallback. Born-digital pages are extracted directly from the PDF content stream, skipping rasterization, segmentation, and OCR entirely; on typical documents this takes under 2 s per page. Image-based pages and pages from non-PDF inputs fall through to OCR transparently, so callers don't need to know the document type up front. No VLM provider is required.
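The per-page routing described above can be sketched roughly as follows. This is a minimal illustration, not AdaptiveOcr's actual API: `Page`, `PageResult`, `looks_born_digital`, `process_document`, and the `min_chars` threshold are all hypothetical names. The sketch assumes that "born-digital" is decided by a simple heuristic (the page's extracted text layer has enough non-whitespace characters) and that the OCR engine is injected as a plain callable.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Page:
    # Hypothetical page model: text_layer holds text recovered from the PDF
    # content stream (None for non-PDF input); image holds the page raster.
    text_layer: Optional[str]
    image: Optional[bytes] = None


@dataclass
class PageResult:
    text: str
    method: str  # "direct" (content stream) or "ocr" (fallback)


def looks_born_digital(text_layer: Optional[str], min_chars: int = 20) -> bool:
    """Assumed heuristic: a page counts as born-digital if its text layer
    contains at least `min_chars` non-whitespace characters."""
    if text_layer is None:
        return False
    return sum(1 for ch in text_layer if not ch.isspace()) >= min_chars


def process_document(
    pages: List[Page],
    ocr: Callable[[Page], str],
    min_chars: int = 20,
) -> List[PageResult]:
    """Route each page independently: direct extraction first, OCR otherwise."""
    results: List[PageResult] = []
    for page in pages:
        if looks_born_digital(page.text_layer, min_chars):
            # Fast path: use the embedded text as-is (no rasterization,
            # segmentation, or OCR).
            results.append(PageResult(page.text_layer, "direct"))
        else:
            # Fallback: image-only or non-PDF page is handed to OCR.
            results.append(PageResult(ocr(page), "ocr"))
    return results


if __name__ == "__main__":
    doc = [
        Page("Quarterly report: revenue grew 12% year over year."),
        Page(None, image=b"raster bytes"),
    ]
    # Stand-in OCR engine for the example; a real one would read page.image.
    for result in process_document(doc, ocr=lambda page: "<ocr text>"):
        print(result.method, "->", result.text)
```

Because the decision is made per page, a mixed document (for example, a born-digital report with a few scanned appendix pages) only pays the OCR cost on the pages that need it. The caller never has to classify the document itself.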