ContentExtractionSettings

Settings for ContentExtraction. Values fall back through three levels: document → SDK → built-in default. A write applies to a single document when made through that document’s settings, and to the whole SDK when made through SdkSettings.
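The fallback order can be pictured with a small stand-in. This is only a sketch of the lookup rule described above, not the SDK's implementation: SettingsLayer, BUILT_IN_DEFAULTS, and the get/set methods are hypothetical names introduced for illustration, while the two default values are the ones listed on this page.

```python
# Built-in defaults, copied from the "Default" entries on this page.
BUILT_IN_DEFAULTS = {
    "enable_ocr_extraction": True,
    "minimum_zone_confidence": 0.5,
}


class SettingsLayer:
    """Hypothetical stand-in for one settings level (not the SDK class)."""

    def __init__(self, parent=None):
        self._parent = parent   # the next level to consult, if any
        self._overrides = {}    # values written at this level only

    def set(self, name, value):
        # A write touches only the level it is made on.
        self._overrides[name] = value

    def get(self, name):
        # Walk document -> SDK -> built-in default until a value is found.
        if name in self._overrides:
            return self._overrides[name]
        if self._parent is not None:
            return self._parent.get(name)
        return BUILT_IN_DEFAULTS[name]


sdk_level = SettingsLayer()
doc_a = SettingsLayer(parent=sdk_level)
doc_b = SettingsLayer(parent=sdk_level)

sdk_level.set("minimum_zone_confidence", 0.7)  # SDK-wide default
doc_a.set("minimum_zone_confidence", 0.9)      # override for one document
```

In this model, doc_a reads its own override, doc_b falls through to the SDK-level value, and a field nobody has written resolves to the built-in default.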

Tags: Vision, Advanced

from nutrient_sdk import ContentExtractionSettings

Construction

ContentExtractionSettings is accessed through a Document instance for per-document overrides, or via SdkSettings for SDK-wide defaults.

# Assumed import path, matching the ContentExtractionSettings import above
from nutrient_sdk import Document, SdkSettings

# Per-document override
with Document.open("input.pdf") as doc:
    settings = doc.settings.content_extraction_settings
    settings.some_field = new_value  # mutate fields directly

# SDK-wide default (applies to all documents)
SdkSettings.content_extraction_settings.some_field = new_value

Settings are configured by writing to fields on the returned object. The settings property itself cannot be reassigned: doc.settings.content_extraction_settings = other_settings is rejected.
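The difference between mutating a field and reassigning the property can be shown with a getter-only Python property. This is a sketch under assumptions: ExtractionSettings and DocumentSettings are hypothetical stand-ins, and the source does not say which exception the SDK raises. AttributeError is simply what a plain Python property without a setter produces.

```python
class ExtractionSettings:
    """Hypothetical stand-in for the settings object (not the SDK class)."""

    def __init__(self) -> None:
        self.enable_table_extraction = True


class DocumentSettings:
    """Hypothetical stand-in for the object behind doc.settings."""

    def __init__(self) -> None:
        self._content_extraction_settings = ExtractionSettings()

    @property
    def content_extraction_settings(self) -> ExtractionSettings:
        # Getter only: no setter is defined, so assignment raises AttributeError.
        return self._content_extraction_settings


settings = DocumentSettings()

# Mutating a field on the returned object works.
settings.content_extraction_settings.enable_table_extraction = False

# Replacing the whole object does not.
try:
    settings.content_extraction_settings = ExtractionSettings()
    reassigned = True
except AttributeError:
    reassigned = False
```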

Properties

enable_full_page_ocr_fallback

@property
def enable_full_page_ocr_fallback(self) -> bool
@enable_full_page_ocr_fallback.setter
def enable_full_page_ocr_fallback(self, value: bool) -> None

Indicates whether to run full-page OCR when no zones are detected. This provides a simple OCR-only pipeline without segmentation.

Type: bool

Default: True
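As a rough illustration of the decision this flag controls, the sketch below picks a step for a page from its zone count. The function name plan_page and the step labels are hypothetical, and the "skip" outcome when the fallback is disabled is an assumption, since the source does not state what happens in that case.

```python
def plan_page(zone_count: int, enable_full_page_ocr_fallback: bool = True) -> str:
    """Return a label for the step a page would take in this simplified model."""
    if zone_count > 0:
        # Zones were detected, so the fallback is not needed.
        return "process_zones"
    if enable_full_page_ocr_fallback:
        # No zones detected: OCR the whole page instead.
        return "full_page_ocr"
    # Assumption: with the fallback off, a zoneless page yields nothing.
    return "skip"
```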


enable_image_extraction

@property
def enable_image_extraction(self) -> bool
@enable_image_extraction.setter
def enable_image_extraction(self, value: bool) -> None

Indicates whether image metadata extraction is enabled for image zones.

Type: bool

Default: True


enable_ocr_extraction

@property
def enable_ocr_extraction(self) -> bool
@enable_ocr_extraction.setter
def enable_ocr_extraction(self, value: bool) -> None

Indicates whether OCR extraction is enabled for text zones.

Type: bool

Default: True


enable_table_extraction

@property
def enable_table_extraction(self) -> bool
@enable_table_extraction.setter
def enable_table_extraction(self, value: bool) -> None

Indicates whether table structure extraction is enabled for table zones.

Type: bool

Default: True
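The three flags above each gate one kind of zone: image, text, and table. A minimal sketch of that mapping follows. ExtractionFlags is a hypothetical mirror of the documented properties, and the zone kind labels are assumed, since this page does not name the SDK's zone types.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ExtractionFlags:
    """Hypothetical mirror of the three per-zone-type flags (defaults as documented)."""
    enable_image_extraction: bool = True
    enable_ocr_extraction: bool = True
    enable_table_extraction: bool = True


# Assumed zone kind labels mapped to the flag that gates each one.
FLAG_FOR_ZONE_KIND = {
    "image": "enable_image_extraction",
    "text": "enable_ocr_extraction",
    "table": "enable_table_extraction",
}


def enabled_zone_kinds(flags: ExtractionFlags) -> List[str]:
    """List the zone kinds whose extractor is switched on."""
    return [kind for kind, flag in FLAG_FOR_ZONE_KIND.items() if getattr(flags, flag)]


# Example: turn off table structure extraction only.
flags = ExtractionFlags(enable_table_extraction=False)
```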


minimum_zone_confidence

@property
def minimum_zone_confidence(self) -> float
@minimum_zone_confidence.setter
def minimum_zone_confidence(self, value: float) -> None

Minimum confidence a zone must have to be processed, in the range 0.0 to 1.0. Zones below this threshold are skipped.

Type: float

Default: 0.5
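The threshold rule can be sketched as a simple filter. Zone and zones_to_process are hypothetical names, not SDK types, and the range check that raises ValueError is an assumption; the source states only the 0.0 to 1.0 range, not how out-of-range values are handled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Zone:
    """Hypothetical detected zone; the SDK's own zone type is not shown here."""
    kind: str
    confidence: float


def zones_to_process(zones: List[Zone], minimum_zone_confidence: float = 0.5) -> List[Zone]:
    """Keep zones at or above the threshold; zones below it are skipped."""
    if not 0.0 <= minimum_zone_confidence <= 1.0:
        # Assumed validation, not documented SDK behavior.
        raise ValueError("minimum_zone_confidence must be between 0.0 and 1.0")
    return [z for z in zones if z.confidence >= minimum_zone_confidence]


zones = [Zone("text", 0.92), Zone("table", 0.50), Zone("image", 0.31)]
kept = zones_to_process(zones)
```

A zone whose confidence equals the threshold is kept, since only zones below it are skipped. Raising the threshold trades recall for precision: fewer low-confidence zones reach the extractors.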