How we made fast, reliable document RAG

Austin Nguyen

July 7, 2025

Ever tried to extract data from a PDF with a complex layout? If you’ve built systems that need to understand tables, invoices, or structured documents, you know PDFs are notoriously difficult to parse. While formats like Word or HTML have clear structural elements (headings, paragraphs, tables), PDFs are fundamentally different.

PDFs are based on PostScript — a page description language that tells printers how to draw text and images at specific coordinates. Inside a PDF file, you won’t find <table> tags or paragraph markers. Instead, you’ll see instructions like “draw 'Revenue' at position (100, 200) in 12pt Arial.”
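To make that concrete, here is a minimal Python sketch that reads a tiny, hand-written content stream and recovers where each piece of text is drawn. The operators (`BT`, `ET`, `Tf`, `Td`, `Tj`) are standard PDF text operators, but the stream, the `extract_text_runs` helper, and its simplifications (no compression, no transforms, no font encodings) are assumptions made for illustration, not how any production parser works.

```python
import re

# A simplified, hand-written content stream (not taken from a real file).
CONTENT_STREAM = """
BT
/F1 12 Tf
100 200 Td
(Revenue) Tj
ET
BT
/F1 12 Tf
300 200 Td
(1,104,786) Tj
ET
"""

# Matches either a literal string "(...)" or any other whitespace-delimited token.
TOKEN = re.compile(r"\((?:[^()\\]|\\.)*\)|/?[^\s()]+")


def extract_text_runs(stream):
    """Return (x, y, font_size, text) tuples from a simplified stream.

    Only Tf (set font), Td (move text position) and Tj (show text)
    inside BT/ET blocks are handled.
    """
    runs = []
    operands = []
    x = y = size = 0.0
    for token in TOKEN.findall(stream):
        if token == "BT":
            # A new text object starts with the position reset to the origin.
            x = y = 0.0
            operands.clear()
        elif token == "Tf":
            size = float(operands[-1])
            operands.clear()
        elif token == "Td":
            # Td offsets the current text position by (tx, ty).
            x += float(operands[-2])
            y += float(operands[-1])
            operands.clear()
        elif token == "Tj":
            runs.append((x, y, size, operands[-1][1:-1]))
            operands.clear()
        elif token == "ET":
            operands.clear()
        else:
            operands.append(token)
    return runs
```

Note that the output is only a list of positioned strings. Nothing in it says that "Revenue" and "1,104,786" belong to the same table row; that relationship has to be inferred from the coordinates.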

This makes PDFs essentially unstructured data, which is a nightmare for AI agents trying to understand document content. But at Nutrient, we’ve been working with PDFs for more than 10 years, and we’ve developed an efficient, high-quality solution using layout analysis.

What is layout analysis?

Layout analysis is the process of understanding the visual structure of a document — identifying tables, columns, headers, and other elements based on their position and appearance rather than explicit markup. Think of it as teaching computers to “see” documents the way humans do.
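One small piece of this idea can be sketched in Python: recovering rows from positioned words by clustering them on their vertical coordinate. The `group_into_rows` function, the `(x, y, text)` tuple shape, and the `y_tolerance` parameter are all hypothetical names chosen for this sketch. It is a generic heuristic and should not be read as Nutrient's algorithm.

```python
def group_into_rows(words, y_tolerance=2.0):
    """Cluster (x, y, text) words into rows, top to bottom, left to right.

    PDF coordinates start at the bottom-left corner, so a larger y means
    higher on the page. Words whose y differs from the first word of the
    current row by at most y_tolerance are treated as the same row.
    """
    rows = []  # each entry is (row_y, [(x, text), ...])
    for x, y, text in sorted(words, key=lambda w: (-w[1], w[0])):
        if rows and abs(rows[-1][0] - y) <= y_tolerance:
            rows[-1][1].append((x, text))
        else:
            rows.append((y, [(x, text)]))
    # Sort each row by x so cells read left to right.
    return [[text for _, text in sorted(cells)] for _, cells in rows]


words = [
    (300, 700.5, "1,104,786"),
    (100, 700.0, "REVENUE"),
    (100, 680.0, "Delivery"),
    (420, 700.0, "1,133,736"),
    (300, 680.0, "1,607"),
    (420, 679.6, "1,249"),
]
rows = group_into_rows(words)
```

Here `rows` comes out as two rows of three cells each, even though the input words arrive out of order and with slightly uneven baselines. A real system also has to find column boundaries, merged cells, and headers, which this sketch leaves out.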

Real-world example: Extracting financial tables

To see layout analysis in action, this section will use a statement of income as an example. The goal is to extract content from the following PDF and transform it into chunks that can be fed to a retrieval-augmented generation (RAG) system. This is a common pattern where AI models retrieve relevant document chunks to answer questions.

PDF of an example statement of income
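For readers new to the pattern, the retrieval step of RAG can be sketched in a few lines of Python. Production systems typically rank chunks with embeddings; the word-overlap scoring below is a deliberately simple stand-in, and `tokenize` and `retrieve` are hypothetical helpers rather than part of any Nutrient API.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, chunks, top_k=1):
    """Return the top_k chunks sharing the most tokens with the question."""
    question_tokens = tokenize(question)
    ranked = sorted(
        chunks,
        key=lambda chunk: len(question_tokens & tokenize(chunk)),
        reverse=True,
    )
    return ranked[:top_k]


chunks = [
    "XYZ COMPANY LIMITED STATEMENT OF INCOME",
    "GROSS PROFIT 364,672 348,765",
    "The accompanying summary of significant accounting policies",
]
best = retrieve("What was the gross profit?", chunks)
```

Whatever the ranking method, the retrieved chunk is all the model gets to see, so the quality of the chunks themselves matters a great deal.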

Without layout analysis, you can extract all the text from the PDF, and the AI may still understand parts of the document. However, it’ll miss the contextual information that this page is actually a table. For financial documents, it’s crucial for the AI to understand table information so that it can accurately answer questions about the company. Plain text extraction produces chunks like these:

Chunk 1:
XYZ COMPANY LIMITED STATEMENT OF INCOME AND RETAINED EARNINGS FOR THE YEAR ENDED JUNE 30, 2002 UNAUDITED - See "Notice to Reader" 2002  2001 REVENUE  $ 1,104,786  $ 1,133,736 COST OF SALES Opening inventory  156,657  146,278  Delivery  1,607  1,249  Purchases  740,994  794,101    941,628  899,258  Closing inventory  159,144  156,657    784,971  740,114  GROSS PROFIT  348,765  364,672  OPERATING EXPENSES (schedule)  339,905  286,817  INCOME FROM OPERATIONS  8,860  77,855  OTHER INCOME (EXPENSES)

Chunk 2:
Loss on disposal of property, plant and equipment  --  (387)  Gain on sale of investment  16,149  -- Miscellaneous  (1,101)  337    (50)  15,048  NET INCOME BEFORE TAX  8,810  92,903  INCOME TAX EXPENSE  --  14,387 NET INCOME  8,810  78,516  (DEFICIT) - Beginning of Year  (54,160)  (61,350)  DIVIDENDS  --  (16,000)  RETAINED EARNINGS (DEFICIT) - End of Year  $  17,166  $  (61,350)

Chunk 3:
The accompanying summary of significant accounting policies and notes are an integral part of these financial statements.
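The article doesn't say which splitter produced the chunks above, but fixed-size splitting is one common way to end up with output like this. The Python sketch below uses a hypothetical `chunk_by_words` helper to show how a splitter that ignores layout can separate a row label from its numbers.

```python
def chunk_by_words(text, max_words):
    """Split text into chunks of at most max_words words, ignoring layout."""
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


flat_text = "REVENUE 1,104,786 1,133,736 GROSS PROFIT 364,672 348,765"
flat_chunks = chunk_by_words(flat_text, max_words=4)
```

With `max_words=4`, the first chunk ends in "GROSS" and the second begins with "PROFIT", so neither chunk contains the full label next to its values.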

With layout analysis, chunks are divided according to the content of the page, and any tables on the page are detected and kept intact:

Chunk 1:
XYZ COMPANY LIMITED STATEMENT OF INCOME AND RETAINED EARNINGS FOR THE YEAR ENDED JUNE 30, 2002 UNAUDITED - See "Notice to Reader"

Chunk 2:
| | | 2002  | 2001 |
|---|---|---|---|
| REVENUE  | | $ 1,104,786 |  $ 1,133,736 |
| COST OF SALES | | | |
| Opening inventory | |  156,657  | 146,278  |
| Delivery | |  1,607  | 1,249  |
| Purchases | |  740,994  | 794,101  |
| | | 899,258  | 941,628  |
| Closing inventory | |  159,144  | 156,657  |
| | | 740,114  | 784,971  |
| GROSS PROFIT  | | 364,672  | 348,765  |
| OPERATING EXPENSES (schedule)  | | 286,817  | 339,905  |
| INCOME FROM OPERATIONS  | | 77,855  | 8,860  |
| OTHER INCOME (EXPENSES) | | | |
| Loss on disposal of property, plant and equipment | |  -- |  (387)  |
| Gain on sale of investment | |  16,149  | -- |
| Miscellaneous | |  (1,101)  | 337  |
| | | 15,048  | (50)  |
| NET INCOME BEFORE TAX  | | 92,903  | 8,810  |
| INCOME TAX EXPENSE  | | 14,387 | --  |
| NET INCOME  | | 78,516  | 8,810  |
| (DEFICIT) - Beginning of Year  | | (61,350)  | (54,160)  |
| DIVIDENDS  | | -- |  (16,000)  |
| RETAINED EARNINGS (DEFICIT) - End of Year  | | $ 17,166  | $ (61,350) |

Chunk 3:
The accompanying summary of significant accounting policies and notes are an integral part of these financial statements.
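Once a page has been split into typed blocks, turning them into chunks like the ones above is mostly a serialization step. The Python sketch below assumes a hypothetical block format, `("text", content)` or `("table", header, rows)`, and hypothetical helpers `rows_to_markdown` and `build_chunks`. The detection of those blocks is the hard part and is not shown.

```python
def rows_to_markdown(header, rows):
    """Serialize a header and a list of rows as a Markdown table."""
    lines = [
        "| " + " | ".join(header) + " |",
        "|" + "---|" * len(header),
    ]
    for row in rows:
        lines.append("| " + " | ".join(row) + " |")
    return "\n".join(lines)


def build_chunks(blocks):
    """Emit one chunk per block, keeping every table whole."""
    chunks = []
    for block in blocks:
        if block[0] == "table":
            _, header, rows = block
            chunks.append(rows_to_markdown(header, rows))
        else:
            chunks.append(block[1])
    return chunks


blocks = [
    ("text", "XYZ COMPANY LIMITED STATEMENT OF INCOME AND RETAINED EARNINGS"),
    ("table", ["", "2002", "2001"], [["REVENUE", "$ 1,104,786", "$ 1,133,736"]]),
]
layout_chunks = build_chunks(blocks)
```

Keeping each table in a single chunk means a retrieved chunk carries the column headers along with the values, so the model can tell which year a number belongs to.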

Our innovation: Lightning-fast layout analysis without neural networks

Traditional layout analysis relies on neural networks that analyze page images, which is a compute-intensive process that can be slow and resource-hungry. That’s fine when resources are abundant, but for our AI Assistant, we needed something faster to eliminate waiting time between document upload and processing.

We developed a novel approach: Instead of processing images, our algorithm analyzes the raw PDF data directly, i.e. the actual positioning instructions and text elements. The results are impressive:

Our algorithm: ~10 seconds for a 1,000-page document
Traditional ML approaches (e.g. Docling): 8+ minutes for the same 1,000-page document

That’s nearly a 50x speed improvement!
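The arithmetic behind that figure, using only the two timings quoted above, looks like this in Python. Since the ML timing is given as "8+ minutes", the computed ratio is a lower bound.

```python
pages = 1_000
ours_seconds = 10       # ~10 seconds, as quoted above
ml_seconds = 8 * 60     # 8+ minutes, taken at its lower bound

speedup = ml_seconds / ours_seconds           # 48.0
ours_pages_per_second = pages / ours_seconds  # 100.0
ml_pages_per_second = pages / ml_seconds      # about 2.08
```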

Enabling on-demand AI: The push for zero wait time

The 50x speed improvement isn’t just a benchmark — it fundamentally changes how you can use AI with documents. Traditional approaches force users to wait minutes or even hours before their documents are ready for AI interaction, or require the backend to pre-ingest the document up front, which can also be wasteful and costly.

Our approach enables true on-demand AI experiences:

Instant ingestion — Upload large reports and start querying in seconds, not minutes
Real-time processing — Process documents as users upload them — no batch processing or background jobs needed
Interactive workflows — Enable live document Q&A sessions without preprocessing delays
Scalable architecture — Handle concurrent document uploads without GPU bottlenecks

In addition, as more companies look into using on-device LLMs for privacy, compliance, and cost reasons, our new algorithm provides unique advantages in the on-device environment, because it:

Doesn’t require heavy computation to run layout analysis
Doesn’t require the on-device LLMs to support multimodal inference
Can be adapted to run in any runtime (browser, desktop, mobile, cloud, etc.)

Layout analysis means better document understanding and less hallucination

“If you can’t measure it, you can’t improve it.” —Peter Drucker

In the spirit of the quote above, it’s important to measure the effectiveness of this new technique on document understanding. In a future post, I’ll cover how Nutrient evaluates the performance of our AI. In short, we evaluate the AI agent’s responses over thousands of document question-and-answer cases. Our dataset is modified from the public dataset REALKie, which focuses on enterprise documents with complex layouts.

We use many metrics for our evaluations, but here are the ones relevant to this blog post:

Accuracy

Does the AI agent’s response match the correct answer?

Accuracy is influenced by both the retrieval quality and the LLM quality.

Context usefulness

Does the retrieved document chunk contain the information needed to answer the question correctly?

Think of this as “Did we find the right needle in the haystack?”
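As a rough illustration of how these two metrics differ, here is a Python sketch that scores a handful of evaluation cases. The `EvalCase` structure and the substring-based matching are assumptions made for this example. The article doesn't describe how Nutrient scores responses, and a real evaluation would likely need something more robust than string containment.

```python
from dataclasses import dataclass


@dataclass
class EvalCase:
    expected_answer: str
    agent_answer: str
    retrieved_chunks: list


def normalize(text):
    """Lowercase and collapse whitespace so comparisons are less brittle."""
    return " ".join(text.lower().split())


def accuracy(cases):
    """Fraction of cases whose agent answer contains the expected answer."""
    hits = sum(
        normalize(c.expected_answer) in normalize(c.agent_answer)
        for c in cases
    )
    return hits / len(cases)


def context_usefulness(cases):
    """Fraction of cases where some retrieved chunk contains the answer."""
    hits = sum(
        any(
            normalize(c.expected_answer) in normalize(chunk)
            for chunk in c.retrieved_chunks
        )
        for c in cases
    )
    return hits / len(cases)


cases = [
    EvalCase(
        "364,672",
        "Gross profit in 2002 was 364,672.",
        ["| GROSS PROFIT | 364,672 | 348,765 |"],
    ),
    EvalCase(
        "14,387",
        "I could not find the tax expense.",
        ["XYZ COMPANY LIMITED"],
    ),
]
```

In the second case, retrieval never surfaced the right chunk, so the model had no chance of answering correctly. Measuring context usefulness separately helps show whether a failure came from retrieval or from the LLM.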

With our new layout analysis algorithm enabled, our testing showed:

20.28 percent higher accuracy — Turning 1 in 5 previously failed queries into successful, accurate responses
76.04 percent better context retrieval — Nearly doubled the ability to find relevant information

What does this mean in practice? Consider a financial analyst querying a 500-page annual report:

Without layout analysis — AI might miss crucial data in tables, leading to incomplete or wrong answers
With layout analysis — AI understands table structure, correctly extracting revenue figures, year-over-year comparisons, and financial ratios

See it in action with AI Assistant 1.5.0

Excited about the improvement in document understanding? You can try it yourself in version 1.5.0 of AI Assistant. The release also adds support for using AI Assistant with multiple documents at the same time.