Speeding up the first ICR operation by pre-downloading models
Pre-downloading vision API models with the warmup feature lets teams eliminate first-request latency, ensure predictable performance across all document processing operations, and verify deployment readiness before accepting production traffic. The warmup feature provides predictable startup behavior, deployment verification, and consistent performance whether you're building:
- User-facing applications, where the first user request must be fast without model download delays.
- Batch processing systems that start large jobs immediately, without waiting for model downloads.
- Containerized services with health checks that verify all dependencies are available before accepting traffic.
- Offline-capable systems that download models while connected, then process documents without internet access later.
- Production APIs with strict SLA requirements, where every request must meet consistent latency targets.

The vision API uses AI models for document understanding features like intelligent content recognition (ICR), including layout detection, text recognition, table extraction, equation recognition, and key-value pair detection. These models are downloaded on demand when first used, which introduces latency during the initial processing request. Warmup pre-downloads all required models before processing documents, ensuring consistent performance from the first request.
This guide demonstrates using the warmup feature to pre-download vision API models for ICR operations, eliminating first-request latency and ensuring predictable document processing performance from application startup.
How Nutrient helps you achieve this
The Nutrient Python SDK handles vision API model management, download orchestration, and caching infrastructure. With the SDK, you don't need to worry about:
- Managing AI model downloads, storage locations, and cache invalidation strategies
- Determining which models are required for specific vision engine configurations (ICR, OCR, VLM)
- Handling network failures during model downloads and implementing retry logic
- Coordinating model availability checks with application readiness probes and health endpoints
Instead, Nutrient provides an API that handles all the complexity behind the scenes, enabling you to focus on your business logic.
Complete implementation
Below is a complete working example that demonstrates pre-downloading vision API models using the warmup feature for ICR operations. The following lines set up the Python application. Start by importing the required classes from the SDK: the Document class for opening documents, Vision for accessing vision API operations including warmup, and the VisionEngine enumeration for specifying ICR mode:
```python
from nutrient_sdk import Document, Vision
from nutrient_sdk.settings import VisionEngine
```

Warming up the vision API
Open a document using a context manager, configure the vision settings for ICR mode, and call warmup to pre-download all required models. The following code opens an image or document file (PNG format in this example) using Document.open() with a file path parameter. The context manager pattern (the with statement) ensures the document is properly closed after processing, releasing memory and document data regardless of whether warmup succeeds or fails. Assigning document.settings.vision_settings.engine specifies which vision engine to use; setting it to VisionEngine.Icr selects the ICR engine instead of alternatives like OCR or VLM-enhanced modes. ICR provides advanced document understanding, including layout detection, text recognition, table extraction, equation recognition, and key-value pair detection. The Vision.set() static method creates a vision processor instance bound to the document with the configured ICR settings. The vision.warmup() method triggers the model download process, fetching all AI models required for ICR operations from the SDK's model repository and caching them locally for subsequent processing. The print statements provide feedback during the potentially multi-second download, letting users know when downloads are in progress and when models are ready for document processing:
```python
with Document.open("input.png") as document:
    # Configure the ICR engine.
    document.settings.vision_settings.engine = VisionEngine.Icr

    # Create a Vision instance.
    vision = Vision.set(document)

    # Pre-download all required models.
    # This ensures subsequent extract_content() calls are fast.
    print("Downloading Vision models...")
    vision.warmup()
    print("Models ready!")
```

Processing documents after warmup
Once warmup completes, all subsequent vision operations execute without model download delays, ensuring predictable and fast processing. The following code demonstrates document processing after warmup. The vision.extract_content() method performs ICR operations on the document, applying layout detection to identify document structure (headings, paragraphs, tables, lists), text recognition to extract textual content with high accuracy, table extraction to identify table structures and cell contents, equation recognition to detect mathematical equations, and key-value pair detection to identify form fields and labeled data. Because warmup has already downloaded all required models, extract_content() executes immediately, without network delays or download waits. The method returns extracted content as a JSON string containing the document's structural and textual information. A nested context manager with open() writes the JSON output to "output.json" and automatically closes the file when writing completes; the outer context manager then closes the document, ensuring proper resource cleanup:
```python
    # Now extract_content() won't need to download anything.
    content_json = vision.extract_content()

    with open("output.json", "w") as f:
        f.write(content_json)
```

Best practices
Consider these patterns for using warmup effectively in production environments:
- Application startup — Call warmup during initialization before accepting requests, ensuring models are ready when the first user request arrives, and eliminating latency spikes during initial traffic.
- Background thread — Run warmup asynchronously on a background thread to avoid blocking the main thread during application startup, enabling the application to initialize other components in parallel while models download.
- Health checks — Include warmup status in readiness probes for containerized deployments (Kubernetes, ECS, Docker Swarm), ensuring orchestrators only route traffic to instances where all vision models are available and processing is ready.
- Deployment pipelines — Integrate warmup into deployment scripts or initialization containers to verify model availability before marking deployments as successful, catching network or storage issues early in the deployment process.
- Offline environments — Run warmup while connected to download models, and then disconnect and process documents offline later, enabling air-gapped or restricted network environments to use vision features without continuous internet access.
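The background-thread and health-check patterns above can be sketched in plain Python. Everything here is illustrative: download_models is a stand-in for the real vision.warmup() call, and is_ready is the kind of flag a readiness probe could check.

```python
import threading

# Stand-in for the real warmup call (e.g. vision.warmup()); any
# long-running, idempotent initialization fits this pattern.
def download_models():
    pass  # In a real application, this downloads and caches models.

models_ready = threading.Event()

def warmup_in_background():
    """Run warmup off the main thread and signal readiness when done."""
    def _run():
        download_models()
        models_ready.set()  # Readiness probes can now report healthy.
    thread = threading.Thread(target=_run, daemon=True)
    thread.start()
    return thread

def is_ready() -> bool:
    """Suitable as the body of a Kubernetes-style readiness endpoint."""
    return models_ready.is_set()

warmup_in_background()
models_ready.wait(timeout=5)  # Block like this only in scripts/tests.
print(is_ready())  # → True
```

Because the worker thread is a daemon, it never prevents process shutdown, and because the Event is only ever set (never cleared), the readiness check is safe to call from any thread.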
What gets downloaded?
The warmup process downloads AI models based on your VisionSettings engine configuration — different engines require different model sets with varying download sizes and initialization times:
- ICR mode (VisionEngine.Icr) — Downloads layout detection models for identifying document structure (headings, paragraphs, tables, lists), text recognition models for extracting textual content with high accuracy, table extraction models for identifying table structures and cell contents, equation recognition models for detecting mathematical equations, and key-value pair detection models for identifying form fields and labeled data. ICR mode provides the most comprehensive document understanding capabilities.
- OCR mode (VisionEngine.Ocr) — Downloads OCR trained data files for text recognition operations, including language-specific character recognition models and text extraction algorithms. OCR mode provides basic text extraction without advanced layout understanding.
- VLM-enhanced mode (VisionEngine.Vlm) — Downloads the same models as ICR mode (layout detection, text recognition, table extraction, equation recognition, key-value pair detection), plus any VLM-specific resources for enhanced document understanding with vision language models. VLM mode combines traditional ICR with AI-powered semantic understanding.
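As a rough summary, the engine-to-model relationship above can be written down as data. The labels below are descriptive shorthand invented for this sketch, not identifiers from the SDK, which manages real model names internally:

```python
# Illustrative summary of which model families each engine pulls in.
ICR_MODELS = {
    "layout-detection",
    "text-recognition",
    "table-extraction",
    "equation-recognition",
    "key-value-detection",
}

MODELS_BY_ENGINE = {
    "icr": ICR_MODELS,
    "ocr": {"ocr-trained-data"},            # Basic text recognition only.
    "vlm": ICR_MODELS | {"vlm-resources"},  # ICR models plus VLM extras.
}

# VLM mode includes everything ICR mode downloads.
print(MODELS_BY_ENGINE["vlm"] >= MODELS_BY_ENGINE["icr"])  # → True
```

The superset check makes the relationship explicit: warming up in VLM mode also satisfies ICR's model requirements, while OCR mode downloads a much smaller, disjoint set.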
Models are cached locally in the SDK’s model directory after download, persisting across application restarts. Subsequent application restarts won’t need to download models again unless the cache is cleared, the SDK version changes with updated models, or the model directory is deleted. The cache location is managed automatically by the SDK, with no manual cache management required for typical deployments. For containerized environments, consider mounting a persistent volume for the model cache directory to avoid redownloading models on every container restart, or prebaking downloaded models into the container image during the build process.
Conclusion
The vision API warmup workflow consists of several key operations:
- Open a document using a context manager for automatic resource cleanup after warmup and processing complete.
- The SDK supports multiple document formats, including PNG, JPEG, PDF, and TIFF for vision operations.
- Access the vision settings with document.settings.vision_settings.engine to configure the vision engine.
- Set the engine to ICR by assigning VisionEngine.Icr, enabling advanced document understanding with layout detection, text recognition, table extraction, equation recognition, and key-value pair detection.
- Alternative engines include OCR mode for basic text extraction and VLM-enhanced mode for semantic understanding with vision language models.
- Create a vision instance with Vision.set(), bound to the document with the configured engine settings.
- Call vision.warmup() to trigger the pre-download of all AI models required for the configured vision engine, fetching models from the SDK's model repository and caching them locally.
- Warmup downloads different model sets based on the engine configuration: ICR downloads comprehensive document understanding models, OCR downloads text recognition models, and VLM downloads the ICR models plus semantic understanding resources.
- Print statements provide feedback during model downloads, informing users about download progress and completion for potentially multi-second operations.
- After warmup completes, call vision.extract_content() to perform ICR operations without model download delays, ensuring predictable and fast processing for all subsequent requests.
- The extract_content() method returns extracted content as JSON, including document structure (headings, paragraphs, tables, lists), textual content, table structures, equations, and key-value pairs.
- Write the extracted JSON to a file using a nested context manager with open() for automatic resource cleanup after writing completes.
- Handle NutrientException for vision processing failures, including model download errors, processing failures, or configuration issues.
- The context manager ensures proper resource cleanup when processing completes or exceptions occur.
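Since model downloads can fail on transient network errors, it can be worth wrapping the warmup call in retry logic. A minimal retry-with-backoff sketch follows; the warmup parameter stands in for vision.warmup(), and the broad Exception stands in for the SDK's NutrientException, whose import path isn't shown in this guide:

```python
import time

def warmup_with_retry(warmup, attempts=3, base_delay=1.0):
    """Call `warmup` until it succeeds, backing off exponentially.

    In real code, catch the SDK's NutrientException instead of the
    broad Exception used in this sketch.
    """
    for attempt in range(attempts):
        try:
            warmup()
            return True
        except Exception:
            if attempt == attempts - 1:
                raise  # Out of attempts; surface the failure.
            time.sleep(base_delay * (2 ** attempt))
    return False

# Demo with a flaky stand-in that fails once, then succeeds.
calls = {"n": 0}
def flaky_warmup():
    calls["n"] += 1
    if calls["n"] == 1:
        raise RuntimeError("simulated download failure")

print(warmup_with_retry(flaky_warmup, base_delay=0.01))  # → True
```

Re-raising on the final attempt (rather than returning False) keeps failures visible to deployment pipelines and readiness checks, which is usually what you want during startup.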
Nutrient handles AI model downloads, storage locations, cache management, network failure retry logic, model availability verification, and download orchestration, so you don't need to understand model repository protocols or manage model infrastructure manually. The warmup feature provides predictable startup behavior and deployment verification across all of the scenarios described earlier: user-facing applications, where the first request must be fast without model download delays; batch processing systems that start large jobs immediately; containerized services whose health checks and Kubernetes readiness probes verify all dependencies before accepting traffic; offline-capable systems that download models while connected and then process documents without internet access in air-gapped environments; and production APIs with strict SLA requirements, where every request meets consistent latency targets without variance from model downloads.
You can download this ready-to-use sample package, fully configured to help you integrate warmup into your application startup.