Extract structured data from medical documents with reliable precision

Healthcare teams spend hours manually keying data from forms, records, and scanned documents. Nutrient Data Extraction API turns those documents into validated, traceable structured data — so your team reviews exceptions, not every field.

Patient intake form with allergies and medication tables alongside extracted structured JSON output

Trusted by enterprises, governments, and teams building document workflows at scale

Used by Lufthansa, Disney, Autodesk, UBS, Dropbox, IBM

USE CASES

Example medical document workflows

Patient intake

Extract patient demographics, insurance details, consent fields, checkboxes, signatures, and supporting information from intake documents.

Prior authorization and referrals

Structure provider, payer, procedure, diagnosis, date, authorization, and supporting information for review and routing.

EOBs and claims documents

Automate extraction of payer details, dates, amounts, codes, patient responsibility, payment details, and structured tables from EOBs and claims documents.

Lab results and prescriptions

Extract test names, values, reference ranges, dates, prescription details, provider information, and structured fields from lab and prescription documents.

Medical records and packets

Prepare mixed scanned and digital medical records for review, search, AI, and downstream workflow use.

GOVERNED EXTRACTION

Why not rely on LLM-only extraction for medical documents?

LLMs can reason over documents — but medical workflows need deterministic, auditable output grounded in the source file, not generated answers that vary between runs.

Predictable structured output

Typed output stays tied to the source document, so the same file yields the same fields instead of generated answers that change between runs.

Confidence signals for routing and review

Flag uncertain values before they move downstream into patient records or administrative systems.

Coordinates and page references

Every extracted value is anchored to its source location for traceability and human review.

Layout-aware structure

Tables, forms, checkboxes, and key-value regions are preserved, not flattened into plain text.

Human review before downstream use

Support validation steps before structured data enters patient records, claims systems, or administrative workflows.

Auditability for compliance workflows

Source context and page detail support audit trails in regulated healthcare environments.

WHAT YOU CAN EXTRACT

Structured fields from any medical document

Structured data extracted from a medical record document
Patient and provider details

Names, DOBs, NPI numbers, addresses, and identifiers from intake and referral documents.


Diagnosis and procedure codes

ICD-10, CPT, and related codes extracted with confidence scores for validation before downstream use.


Payer and insurance information

Payer names, policy numbers, group IDs, and coverage details from insurance and claims documents.


Lab values and reference ranges

Test names, result values, units, and reference ranges preserved from lab reports.


Form fields and handwritten values

Structured key-value regions, handwritten fields, and checkbox states captured with intelligent character recognition (ICR), with confidence scores for review.


Tables and multirow structured data

Tables from EOBs, prior authorization forms, and claims documents, preserved with row and column context.


HOW IT WORKS

From medical document to structured output

Parse

Turn medical PDFs, scans, images, and Office files into document structure.

Extract

Identify text, tables, forms, handwriting, checkboxes, codes, dates, and signatures.

Map

Define the fields your workflow needs and get back validated, typed output.

Structure

Return typed JSON for systems and validation, or Markdown for search and AI workflows.

Process

Send structured data into review queues, administrative workflows, or downstream applications.
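To make the Map step concrete, here is a minimal client-side sketch of copying labeled values into a typed record. It is an illustration only, not the API's own mapping mechanism: `FIELD_MAP`, `IntakeRecord`, and `map_elements` are hypothetical names, and the element shape is assumed to match the key-value pairs in the sample output on this page.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical mapping from extracted labels to the fields a workflow needs.
FIELD_MAP = {
    "Patient name": "patient_name",
    "DOB": "date_of_birth",
    "Diagnosis code": "diagnosis_code",
}


@dataclass
class IntakeRecord:
    patient_name: Optional[str] = None
    date_of_birth: Optional[date] = None
    diagnosis_code: Optional[str] = None


def map_elements(elements: list[dict]) -> IntakeRecord:
    """Copy labeled key-value elements into a typed record."""
    record = IntakeRecord()
    for element in elements:
        field_name = FIELD_MAP.get(element.get("label"))
        if field_name is None:
            continue  # label is not needed by this workflow
        value = element["value"]
        if field_name == "date_of_birth":
            # Convert ISO dates to a real date; raises ValueError on bad input.
            value = date.fromisoformat(value)
        setattr(record, field_name, value)
    return record


# Sample elements (assumed shape, trimmed to the keys used here).
elements = [
    {"type": "key_value_pair", "label": "Patient name", "value": "Sarah Chen"},
    {"type": "key_value_pair", "label": "DOB", "value": "1985-03-12"},
    {"type": "key_value_pair", "label": "Diagnosis code", "value": "J45.40"},
]
record = map_elements(elements)
print(record)
```

Typing the date at this stage means a malformed value fails loudly before it reaches a downstream system.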

OUTPUT FORMATS

Spatial JSON. Or Markdown. From the same API.

Confidence scores, coordinates, and page context included.
Choose the output per request: JSON or Markdown.

Spatial JSON

For extraction · validation · review

.json
{
  "status": "processed",
  "pages": [{
    "elements": [
      {
        "type": "key_value_pair",
        "label": "Patient name",
        "value": "Sarah Chen",
        "confidence": 0.99,
        "page": 1,
        "bounds": [82, 128, 284, 152]
      },
      {
        "type": "key_value_pair",
        "label": "Diagnosis code",
        "value": "J45.40",
        "confidence": 0.97,
        "page": 1,
        "bounds": [82, 164, 220, 188]
      }
    ]
  }]
}
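As a rough sketch of how the confidence scores above could drive review routing, the Python below splits the sample response into accepted values and values that need a human look. The 0.98 cutoff and the `route` helper are assumptions made for illustration; a real workflow would likely tune thresholds per field.

```python
import json

REVIEW_THRESHOLD = 0.98  # hypothetical cutoff; tune per field and workflow

# The sample response from this page, parsed as-is.
response = json.loads("""
{
  "status": "processed",
  "pages": [{
    "elements": [
      {"type": "key_value_pair", "label": "Patient name", "value": "Sarah Chen",
       "confidence": 0.99, "page": 1, "bounds": [82, 128, 284, 152]},
      {"type": "key_value_pair", "label": "Diagnosis code", "value": "J45.40",
       "confidence": 0.97, "page": 1, "bounds": [82, 164, 220, 188]}
    ]
  }]
}
""")


def route(response: dict, threshold: float = REVIEW_THRESHOLD):
    """Split elements into (accepted, needs_review) by confidence."""
    accepted, needs_review = [], []
    for page in response["pages"]:
        for element in page["elements"]:
            bucket = accepted if element["confidence"] >= threshold else needs_review
            bucket.append(element)
    return accepted, needs_review


accepted, needs_review = route(response)
for element in needs_review:
    # Page and bounds let a reviewer jump straight to the source region.
    print(f'Review "{element["label"]}" on page {element["page"]} at {element["bounds"]}')
```

With this cutoff the patient name passes through and the diagnosis code is queued for review, along with the page and bounds a reviewer needs to find it.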

Markdown

For RAG · search · knowledge bases

.md
# Patient Intake Form
**Patient** Sarah Chen
**DOB** 1985-03-12
**Insurance ID** UHC-8821047
## Diagnosis
| Code | Description |
| --- | --- |
| J45.40 | Moderate persistent asthma |
**Provider** Dr. Marcus Webb
**Date** 2024-11-08
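For search and RAG use, Markdown like the sample above is commonly split into heading-scoped chunks before indexing. The sketch below shows one simple way to do that; `split_by_heading` is a hypothetical helper, and the chunking strategy is an assumption rather than something the API prescribes.

```python
import re

# The Markdown sample from this page.
markdown = """# Patient Intake Form
**Patient** Sarah Chen
**DOB** 1985-03-12
**Insurance ID** UHC-8821047
## Diagnosis
| Code | Description |
| --- | --- |
| J45.40 | Moderate persistent asthma |
**Provider** Dr. Marcus Webb
**Date** 2024-11-08
"""

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def split_by_heading(text: str) -> list[dict]:
    """Group lines under the nearest preceding Markdown heading."""
    chunks = []
    current = {"heading": None, "lines": []}
    for line in text.splitlines():
        match = HEADING.match(line)
        if match:
            # Close the previous chunk unless it is the empty initial one.
            if current["heading"] is not None or current["lines"]:
                chunks.append(current)
            current = {"heading": match.group(2).strip(), "lines": []}
        else:
            current["lines"].append(line)
    if current["heading"] is not None or current["lines"]:
        chunks.append(current)
    return [
        {"heading": c["heading"], "text": "\n".join(c["lines"]).strip()}
        for c in chunks
    ]


chunks = split_by_heading(markdown)
for chunk in chunks:
    print(chunk["heading"], "->", len(chunk["text"]), "characters")
```

Keeping the table inside its own heading-scoped chunk preserves the link between a code and its description at retrieval time.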

Trust and compliance

Built for medical document workflows in production

No persistent document storage

All documents are processed and discarded. No input or resulting documents are stored on our infrastructure.

SOC 2 Type 2

Backed by Nutrient’s SOC 2 Type 2 security practices, built for use in business-critical and compliance-sensitive workflows.

TLS encryption by default

All API communication is TLS-encrypted. Documents in transit are protected end to end.

HIPAA-compatible architecture

No document retention, encrypted transport, and access controls designed to support HIPAA-sensitive workflows.

Medical document extraction questions

Does Nutrient store medical documents after processing?

No. All uploaded documents are processed and immediately discarded. No input files or extracted content are retained on Nutrient infrastructure after the API response is returned.

Can the API handle handwritten medical forms?

Yes. For hand-completed fields, checkboxes, and handwritten values, the API applies intelligent character recognition (ICR) to capture what standard OCR misses. Every extracted value includes a confidence score so uncertain fields can be flagged for human review before moving downstream.

What file formats are supported?

Data Extraction API processes PDFs, images (including scans and photos), Word, Excel, and PowerPoint files. It handles scanned PDFs, fillable forms, and mixed digital/image-based documents without requiring a separate OCR pipeline.

How do I validate extracted data before it enters downstream systems?

Every extracted element includes confidence scores, page references, and coordinates so you can compare outputs, flag low-confidence fields for human review, and trace values back to the source document before they move into patient records, claims systems, or administrative workflows.
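One way to picture this validation step: combine the confidence score with a simple format check, and carry the page and coordinates along so a reviewer can find the value in the source. The sketch below is illustrative only. The `validate` helper, the 0.95 cutoff, and the second sample element are hypothetical, and the regular expression checks only the general shape of an ICD-10-CM code, not whether the code exists.

```python
import re

# Simplified ICD-10-CM shape: letter, digit, alphanumeric, then an optional
# dot followed by 1 to 4 alphanumerics. Format only, not a code-set lookup.
ICD10_SHAPE = re.compile(r"^[A-Z][0-9][0-9A-Z](\.[0-9A-Z]{1,4})?$")


def validate(element: dict, min_confidence: float = 0.95) -> dict:
    """Return the value with its source location and any review issues."""
    issues = []
    if element["confidence"] < min_confidence:
        issues.append("low confidence")
    if element["label"] == "Diagnosis code" and not ICD10_SHAPE.match(element["value"]):
        issues.append("unexpected code format")
    return {
        "label": element["label"],
        "value": element["value"],
        # Page and bounds are passed through untouched for traceability.
        "source": {"page": element["page"], "bounds": element["bounds"]},
        "issues": issues,
    }


elements = [
    # Taken from the sample output on this page.
    {"label": "Diagnosis code", "value": "J45.40", "confidence": 0.97,
     "page": 1, "bounds": [82, 164, 220, 188]},
    # Hypothetical misread (leading "J" read as "1") to show a flagged value.
    {"label": "Diagnosis code", "value": "145.40", "confidence": 0.62,
     "page": 2, "bounds": [82, 164, 220, 188]},
]

results = [validate(element) for element in elements]
for result in results:
    status = "ok" if not result["issues"] else ", ".join(result["issues"])
    print(result["value"], "page", result["source"]["page"], "->", status)
```

Values with an empty `issues` list can move downstream, while the rest go to a review queue with their source location attached.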

Is this a full healthcare platform or an extraction API?

Nutrient Data Extraction API is an extraction layer — not an EHR, claims platform, or workflow management system. It extracts structured data from medical documents so that data can be reviewed, validated, and routed into the systems your team already uses.

GET STARTED

Start extracting data from medical documents

No credit card required.

Your free account includes:

    • 5,000 Data Extraction API credits per month
    • Parse PDFs, scans, images, and Office files
    • Typed JSON output with confidence scores and source context
    • No persistent document storage