---
title: "Advanced PDF data extraction to CSV/XLSX"
canonical_url: "https://www.nutrient.io/guides/document-automation-server/content-extraction/extract-tabular-data-from-pdf/advanced-export-to-csvxlsx/"
md_url: "https://www.nutrient.io/guides/document-automation-server/content-extraction/extract-tabular-data-from-pdf/advanced-export-to-csvxlsx.md"
last_updated: "2026-05-25T18:42:17.671Z"
description: "Easily extract tabular data from PDF to CSV/XLSX using advanced export features and refine text with regex for accurate output."
---

# Extract and export data from PDF files easily

Advanced Export lets you export selected areas of a PDF page to CSV or XLSX files.

For an Advanced Export job, the **Select Variables** tab looks like this.![Select Variables tab](@/assets/guides/document-automation-server/content-extraction/image105.png)

The file displayed is either the one selected in the **Location Settings** tab or one you select by clicking the **Open File** button.

If there is no Document Automation Server (DAS) Content Extraction variable, click the **Add Item** button.

Select the area on the displayed file containing the information required.![Select Area on Displayed File](@/assets/guides/document-automation-server/content-extraction/image106.png)

Click the Camera icon to capture the text.![Captured Text](@/assets/guides/document-automation-server/content-extraction/image107.png)

The extracted text contains more than the actual invoice number. This is because the area is sized to cover invoices with slightly different layouts.

This is the same area on the invoice that makes up the second page of the example document.![Extra Captured Area](@/assets/guides/document-automation-server/content-extraction/image108.png)

Click **Done**.

The text needs to be refined.

For this example file, the invoice number matches the regular expression `[a-z][0-9]+`.

The first part, `[a-z]`, matches a single alphabetic character. The second part, `[0-9]+`, matches one or more digits. If invoice numbers can also contain hyphens, the second part needs to be `[0-9-]+` instead.

The pattern is added to the column settings by selecting **text in zone** where **text matches pattern** and entering the pattern in the textbox. You can view tips on regular expressions by clicking the **?**.![Column Settings](@/assets/guides/document-automation-server/content-extraction/image109.png)

Check the extracted text using the Camera icon in the column settings.
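
The behavior of a pattern like this can also be checked outside the product. The following is a minimal sketch in Python, not the DAS implementation: the `find_invoice_number` helper and the sample zone text are hypothetical, and case-insensitive matching is an assumption made so that an uppercase prefix matches `[a-z]`.

```python
import re
from typing import Optional

# Pattern from the guide: one letter followed by one or more digits.
# re.IGNORECASE is an assumption; the regex engine used by DAS may behave differently.
INVOICE_PATTERN = re.compile(r"[a-z][0-9]+", re.IGNORECASE)


def find_invoice_number(zone_text: str) -> Optional[str]:
    """Return the first substring of zone_text that matches the pattern, or None."""
    match = INVOICE_PATTERN.search(zone_text)
    return match.group(0) if match else None


# Hypothetical text captured from a zone that is larger than the invoice number.
sample = "Invoice No:\nA12345\nDate"
print(find_invoice_number(sample))
```

Because the search returns only the matching substring, the extra text captured by the oversized area is discarded.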

Add another item.

Select an area that covers the **Grand Total** on both pages.![Selecting Area Displaying the Grand Total](@/assets/guides/document-automation-server/content-extraction/image110.png)

Click the Camera icon on Column 3 to capture the text in the selected area.

Check that the text contains the value you are after.

Click **Done**.

The pattern for this selection is more complicated. The literal text **Grand Total** identifies the beginning of the selection. Next come one or more whitespace characters, followed by one or more digits, a decimal point, and then two digits.![Pattern for Grand Total](@/assets/guides/document-automation-server/content-extraction/image111.png)

Click the Camera icon to see the extracted text.![Captured Text](@/assets/guides/document-automation-server/content-extraction/image112.png)
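
As a rough illustration of a pattern with this shape, the sketch below expresses it in Python. The exact pattern shown in the screenshot may differ; the regular expression, the `find_grand_total_text` helper, and the sample text are assumptions made for this example.

```python
import re
from typing import Optional

# Literal label, one or more whitespace characters, digits, a decimal point, two digits.
# This is an assumed rendering of the pattern described in the guide.
GRAND_TOTAL_PATTERN = re.compile(r"Grand Total\s+[0-9]+\.[0-9]{2}")


def find_grand_total_text(zone_text: str) -> Optional[str]:
    """Return the 'Grand Total <amount>' substring if present, otherwise None."""
    match = GRAND_TOTAL_PATTERN.search(zone_text)
    return match.group(0) if match else None


# Hypothetical text captured from a zone covering the totals area.
sample = "Subtotal 100.00\nTax 8.50\nGrand Total   108.50\nThank you"
print(find_grand_total_text(sample))
```

The match still includes the **Grand Total** label, which is why a further refinement step is needed to isolate the amount.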

The next task is to refine this text further by removing the “Grand Total” label and everything before it, leaving only the amount.

Select **all text in paragraph after value**, set the value to `Grand Total`, and then select **text matches pattern** with a pattern that matches one or more digits optionally followed by a decimal point and two digits.![Refining the Text](@/assets/guides/document-automation-server/content-extraction/image113.png)

Click the Camera icon to see the extracted text.![Captured Text](@/assets/guides/document-automation-server/content-extraction/image114.png)
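
The two-stage refinement (keep the text after a value, then apply a pattern) can be sketched in Python as follows. This is an approximation of the idea rather than the product's logic; the `amount_after_value` helper, the regular expression, and the sample strings are hypothetical.

```python
import re
from typing import Optional

# One or more digits, optionally followed by a decimal point and two digits.
AMOUNT_PATTERN = re.compile(r"[0-9]+(?:\.[0-9]{2})?")


def amount_after_value(paragraph: str, value: str = "Grand Total") -> Optional[str]:
    """Return the first amount that appears after `value`, or None if not found."""
    # Stage 1: keep only the text that follows the first occurrence of the value.
    _, found, after = paragraph.partition(value)
    if not found:
        return None
    # Stage 2: apply the amount pattern to the remaining text.
    match = AMOUNT_PATTERN.search(after)
    return match.group(0) if match else None


# Hypothetical paragraph text containing an earlier amount that must be ignored.
print(amount_after_value("Subtotal 100.00 Grand Total 108.50"))
```

Note that this pattern does not handle thousands separators: for a value such as `1,234.50`, it would match only the leading `1`.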

Save the job and run it.

The output file will contain the following:![Output File](@/assets/guides/document-automation-server/content-extraction/image115.png)
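
For orientation, the sketch below shows how extracted values map to CSV text, using Python's standard `csv` module. The column names, the sample values, and the one-row-per-invoice layout are assumptions made for illustration; the actual output of a job is the one shown in the screenshot above.

```python
import csv
import io
from typing import Iterable, Sequence


def rows_to_csv(rows: Iterable[Sequence[str]], header: Sequence[str]) -> str:
    """Serialize a header and data rows to CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical values: one row per invoice page, one column per extraction item.
rows = [("A12345", "108.50"), ("B67890", "42.00")]
print(rows_to_csv(rows, ("Invoice Number", "Grand Total")))
```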

---

## Related pages

- [Effortlessly extract tables from PDFs](/guides/document-automation-server/content-extraction/extract-tabular-data-from-pdf/export-to-csvxlsx.md)
- [Effortlessly extract tables from PDF files](/guides/document-automation-server/content-extraction/extract-tabular-data-from-pdf.md)