Healthcare teams spend hours manually keying data from forms, records, and scanned documents. Nutrient Data Extraction API turns those documents into validated, traceable structured data — so your team reviews exceptions, not every field.
Trusted by enterprises, governments, and teams building document workflows at scale
USE CASES
Extract patient demographics, insurance details, consent fields, checkboxes, signatures, and supporting information from intake documents.
Structure provider, payer, procedure, diagnosis, date, authorization, and supporting information for review and routing.
Automate extraction of payer details, dates, amounts, codes, patient responsibility, payment details, and structured tables from EOBs and claims documents.
Extract test names, values, reference ranges, dates, prescription details, provider information, and structured fields from lab and prescription documents.
Prepare mixed scanned and digital medical records for review, search, AI, and downstream workflow use.
GOVERNED EXTRACTION
LLMs can reason over documents — but medical workflows need deterministic, auditable output grounded in the source file, not generated answers that vary between runs.
Typed output stays tied to the source document, so the same file yields the same values instead of generated answers that change between runs.
Flag uncertain values before they move downstream into patient records or administrative systems.
Every extracted value is anchored to its source location for traceability and human review.
Tables, forms, checkboxes, and key-value regions are preserved, not flattened into plain text.
Support validation steps before structured data enters patient records, claims systems, or administrative workflows.
Source context and page detail support audit trails in regulated healthcare environments.
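To illustrate how source anchoring can feed an audit trail, here is a minimal Python sketch that runs on your side after extraction. It assumes the element shape shown in the Spatial JSON sample on this page; the `audit_record` helper, the `reviewed_by` field, and the idea of hashing the source file are illustrative assumptions, not part of the API.

```python
import hashlib
from datetime import datetime, timezone
from typing import Optional


def audit_record(source_bytes: bytes, element: dict,
                 reviewer: Optional[str] = None) -> dict:
    """Build one audit-trail entry linking an extracted value to its source.

    Hypothetical helper: the field names mirror the sample element on this
    page; everything else is an assumption made for illustration.
    """
    return {
        # Fingerprint of the file the value came from (computed locally).
        "source_sha256": hashlib.sha256(source_bytes).hexdigest(),
        "label": element["label"],
        "value": element["value"],
        "confidence": element["confidence"],
        # Page and bounds are copied through unchanged so a reviewer can
        # locate the value in the original document.
        "page": element["page"],
        "bounds": element["bounds"],
        "reviewed_by": reviewer,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


SAMPLE_ELEMENT = {
    "type": "key_value_pair",
    "label": "Diagnosis code",
    "value": "J45.40",
    "confidence": 0.97,
    "page": 1,
    "bounds": [82, 164, 220, 188],
}

record = audit_record(b"example document bytes", SAMPLE_ELEMENT)
```

The sketch copies `bounds` through without interpreting it, because this page does not state the coordinate convention.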
WHAT YOU CAN EXTRACT
Names, DOBs, NPI numbers, addresses, and identifiers from intake and referral documents.
ICD-10, CPT, and related codes extracted with confidence scores for validation before downstream use.
Payer names, policy numbers, group IDs, and coverage details from insurance and claims documents.
Test names, result values, units, and reference ranges preserved from lab reports.
Structured key-value regions, handwritten fields, and checkbox states captured with intelligent character recognition (ICR), each with a confidence score for review.
Tables from EOBs, prior authorization forms, and claims documents, preserved with row and column context.
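As an example of validating extracted codes before downstream use, the Python sketch below pairs a loose shape check with the confidence score. The regular expressions only test whether a string looks like an ICD-10 or CPT code; they do not confirm that the code exists in a current code set. The function names and the 0.95 threshold are assumptions chosen for illustration.

```python
import re

# Loose shape checks only (assumption): letter, digit, alphanumeric, then an
# optional dot with 1-4 alphanumerics for ICD-10; five characters with four
# leading digits for CPT. Neither pattern proves the code is valid.
ICD10_SHAPE = re.compile(r"^[A-Z][0-9][0-9A-Z](\.[0-9A-Z]{1,4})?$")
CPT_SHAPE = re.compile(r"^[0-9]{4}[0-9A-Z]$")

SHAPES = {"icd10": ICD10_SHAPE, "cpt": CPT_SHAPE}


def looks_like_icd10(code: str) -> bool:
    return bool(ICD10_SHAPE.match(code.strip().upper()))


def looks_like_cpt(code: str) -> bool:
    return bool(CPT_SHAPE.match(code.strip().upper()))


def needs_review(code: str, confidence: float, kind: str,
                 threshold: float = 0.95) -> bool:
    """Return True when a code should go to a human reviewer.

    A code is routed to review if its shape is wrong for its kind or its
    confidence falls below the (hypothetical) threshold.
    """
    shape_ok = bool(SHAPES[kind].match(code.strip().upper()))
    return (not shape_ok) or confidence < threshold
```

In practice the shape check would sit in front of a lookup against the code set your organization licenses.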
HOW IT WORKS
Parse
Turn medical PDFs, scans, images, and Office files into document structure.
Extract
Identify text, tables, forms, handwriting, checkboxes, codes, dates, and signatures.
Map
Define the fields your workflow needs and get back validated, typed output.
Structure
Return typed JSON for systems and validation, or Markdown for search and AI workflows.
Process
Send structured data into review queues, administrative workflows, or downstream applications.
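The Map and Process steps above can be pictured with a short Python sketch that turns extracted key-value elements into a typed record on your side. It assumes the element shape from the Spatial JSON sample on this page; `FIELD_MAP`, `IntakeRecord`, and `map_elements` are hypothetical names, and this is not how the API itself defines fields.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical mapping from document labels to the fields a workflow needs.
FIELD_MAP = {
    "Patient name": "patient_name",
    "DOB": "date_of_birth",
    "Diagnosis code": "diagnosis_code",
}

REQUIRED = ("patient_name", "diagnosis_code")


@dataclass
class IntakeRecord:
    patient_name: str
    diagnosis_code: str
    date_of_birth: Optional[date] = None


def map_elements(elements: list) -> IntakeRecord:
    """Map key-value elements to a typed record, failing loudly on gaps."""
    raw = {}
    for el in elements:
        if el.get("type") != "key_value_pair":
            continue  # tables and other element types are ignored here
        field = FIELD_MAP.get(el.get("label"))
        if field:
            raw[field] = el["value"]

    missing = [name for name in REQUIRED if name not in raw]
    if missing:
        raise ValueError(f"missing required fields: {missing}")

    dob = raw.get("date_of_birth")
    return IntakeRecord(
        patient_name=raw["patient_name"],
        diagnosis_code=raw["diagnosis_code"],
        # ISO dates such as 1985-03-12 parse directly; other formats raise.
        date_of_birth=date.fromisoformat(dob) if dob else None,
    )
```

Raising on a missing required field keeps incomplete records out of review queues instead of passing them along silently.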
OUTPUT FORMATS
Confidence scores, coordinates, and page context included.
Choose the output per request: "json" or Markdown.
Spatial JSON
For extraction · validation · review
{
  "status": "processed",
  "pages": [
    {
      "elements": [
        {
          "type": "key_value_pair",
          "label": "Patient name",
          "value": "Sarah Chen",
          "confidence": 0.99,
          "page": 1,
          "bounds": [82, 128, 284, 152]
        },
        {
          "type": "key_value_pair",
          "label": "Diagnosis code",
          "value": "J45.40",
          "confidence": 0.97,
          "page": 1,
          "bounds": [82, 164, 220, 188]
        }
      ]
    }
  ]
}
Markdown
For RAG · search · knowledge bases
# Patient Intake Form
**Patient** Sarah Chen
**DOB** 1985-03-12
**Insurance ID** UHC-8821047
## Diagnosis
| Code | Description |
| --- | --- |
| J45.40 | Moderate persistent asthma |
**Provider** Dr. Marcus Webb
**Date** 2024-11-08
No persistent document storage
All documents are processed and discarded. No input or resulting documents are stored on our infrastructure.
SOC 2 Type 2
Backed by Nutrient’s SOC 2 Type 2 security practices, built for use in business-critical and compliance-sensitive workflows.
Trust and compliance
TLS encryption by default
All API communication is TLS-encrypted. Documents in transit are protected end to end.
HIPAA-compatible architecture
No document retention, encrypted transport, and access controls designed to support HIPAA-sensitive workflows.
Documents are not stored. All uploaded documents are processed and immediately discarded. No input files or extracted content are retained on Nutrient infrastructure after the API response is returned.
Handwriting is supported. For hand-completed fields, checkboxes, and handwritten values, the API applies intelligent character recognition (ICR) to capture what standard OCR misses. Every extracted value includes a confidence score so uncertain fields can be flagged for human review before moving downstream.
Data Extraction API processes PDFs, images (including scans and photos), Word, Excel, and PowerPoint files. It handles scanned PDFs, fillable forms, and mixed digital/image-based documents without requiring a separate OCR pipeline.
Every extracted element includes confidence scores, page references, and coordinates so you can compare outputs, flag low-confidence fields for human review, and trace values back to the source document before they move into patient records, claims systems, or administrative workflows.
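A minimal Python sketch of that review step, assuming a response shaped like the Spatial JSON sample on this page: it walks every page, then splits elements into auto-accepted and needs-review lists. The `triage` function and the 0.98 threshold are hypothetical; pick a threshold that matches your own risk tolerance.

```python
REVIEW_THRESHOLD = 0.98  # hypothetical cut-off, not an API default

# Response copied from the Spatial JSON sample on this page.
SAMPLE_RESPONSE = {
    "status": "processed",
    "pages": [{
        "elements": [
            {"type": "key_value_pair", "label": "Patient name",
             "value": "Sarah Chen", "confidence": 0.99,
             "page": 1, "bounds": [82, 128, 284, 152]},
            {"type": "key_value_pair", "label": "Diagnosis code",
             "value": "J45.40", "confidence": 0.97,
             "page": 1, "bounds": [82, 164, 220, 188]},
        ]
    }],
}


def triage(response: dict, threshold: float = REVIEW_THRESHOLD):
    """Split extracted elements into (accepted, needs_review) lists.

    Elements without a confidence value are treated as 0.0 so they are
    always sent to a reviewer rather than silently accepted.
    """
    accepted, needs_review = [], []
    for page in response.get("pages", []):
        for element in page.get("elements", []):
            if element.get("confidence", 0.0) >= threshold:
                accepted.append(element)
            else:
                needs_review.append(element)
    return accepted, needs_review


accepted, needs_review = triage(SAMPLE_RESPONSE)
```

Each element in `needs_review` still carries its `page` and `bounds`, so a reviewer can be shown the exact region of the source document.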
Nutrient Data Extraction API is an extraction layer — not an EHR, claims platform, or workflow management system. It extracts structured data from medical documents so that data can be reviewed, validated, and routed into the systems your team already uses.