Coordinate spaces

All spatial data in the JSON response — element bounds, word-level bounds, table cell bounds, and page.width / page.height — shares a single coordinate system per page.

Axes and origin

The origin sits at the top-left corner of the page. The X axis increases to the right and the Y axis increases downward.

Figure: Coordinate space diagram showing the origin at the top-left of the page, with X increasing right, Y increasing down, and an element’s bounding box labeled with x, y, width, and height.

This matches the convention used by most graphics APIs and UI frameworks, so coordinates can be used directly when drawing overlays on a rendered page.
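Some downstream formats, such as PDF user space, place the origin at the bottom-left with Y increasing upward. The following is a minimal sketch of flipping a box into such a space, assuming the target uses the same scale as the page canvas. The function name `to_bottom_left_origin` is illustrative and not part of the API.

```python
def to_bottom_left_origin(bounds, page_height):
    """Flip a top-left-origin box into a bottom-left-origin, Y-up space.

    Assumes the target space has the same scale as the page canvas.
    In the flipped space, "y" is the distance from the page bottom
    to the bottom edge of the box.
    """
    return {
        "x": bounds["x"],
        "y": page_height - (bounds["y"] + bounds["height"]),
        "width": bounds["width"],
        "height": bounds["height"],
    }


box = {"x": 200, "y": 400, "width": 556, "height": 97}
flipped = to_bottom_left_origin(box, page_height=2200)
print(flipped)
# {'x': 200, 'y': 1703, 'width': 556, 'height': 97}
```

Only the Y value changes; X, width, and height carry over unchanged.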

Bounding boxes

Every bounds object has four fields.

Field	Meaning
`x`	Left edge (distance from page left)
`y`	Top edge (distance from page top)
`width`	Horizontal extent
`height`	Vertical extent

The bottom-right corner of an element is (x + width, y + height). All bounds fall within the page canvas: 0 ≤ x ≤ x + width ≤ page.width and 0 ≤ y ≤ y + height ≤ page.height.
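As an illustration of the corner arithmetic, the sketch below derives the four edges of a box and checks that it lies inside the page canvas. The helper names and the `tolerance` argument are assumptions made for this example, not API features.

```python
def corners(bounds):
    """Return (left, top, right, bottom) for a bounds object."""
    left, top = bounds["x"], bounds["y"]
    return left, top, left + bounds["width"], top + bounds["height"]


def within_page(bounds, page, tolerance=0.0):
    """Check that a box lies inside the page canvas.

    `tolerance` is a hypothetical allowance for rounding in client code;
    it is not an API parameter.
    """
    left, top, right, bottom = corners(bounds)
    return (
        left >= -tolerance
        and top >= -tolerance
        and right <= page["width"] + tolerance
        and bottom <= page["height"] + tolerance
    )


page = {"width": 1700, "height": 2200}
box = {"x": 200, "y": 400, "width": 556, "height": 97}
print(corners(box))
# (200, 400, 756, 497)
print(within_page(box, page))
# True
```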

Word-level bounds (when includeWords is true) and table cell bounds use the same coordinate space as their parent element.
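Because nested bounds share the page coordinate space, boxes can be combined without any conversion. The sketch below computes the smallest box enclosing a set of word boxes. The word values are made up for illustration, and `union_bounds` is a hypothetical helper rather than something the API provides.

```python
def union_bounds(boxes):
    """Return the smallest box enclosing every box in `boxes`.

    All boxes are expected to be in the same page coordinate space.
    """
    boxes = list(boxes)
    if not boxes:
        raise ValueError("need at least one box")
    left = min(b["x"] for b in boxes)
    top = min(b["y"] for b in boxes)
    right = max(b["x"] + b["width"] for b in boxes)
    bottom = max(b["y"] + b["height"] for b in boxes)
    return {"x": left, "y": top, "width": right - left, "height": bottom - top}


# Made-up word boxes spanning two lines of text.
words = [
    {"x": 200, "y": 400, "width": 120, "height": 40},
    {"x": 330, "y": 402, "width": 90, "height": 38},
    {"x": 200, "y": 450, "width": 150, "height": 40},
]
print(union_bounds(words))
# {'x': 200, 'y': 400, 'width': 220, 'height': 90}
```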

Units

All spatial coordinates are expressed in render-space pixels, regardless of input type.

Input type	Unit	How it works
PDF	Pixels	Pages are rendered internally at the extraction DPI. Bounds and page dimensions use that render canvas.
Office (DOCX, XLSX, PPTX, etc.)	Pixels	Documents are converted and rendered before extraction. Coordinates use that render canvas.
Images (PNG, JPEG, TIFF, etc.)	Pixels	Images are processed at native resolution. Page dimensions equal the image’s pixel dimensions.

The key contract is that bounds, nested word/cell bounds, and page.width/page.height are always in the same coordinate space. You can scale from that page canvas to any display or downstream coordinate system.
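One downstream system that is often convenient is a normalized space, where each value is a fraction of the page size and so does not depend on the render resolution. This is a minimal sketch; `normalize_bounds` is an illustrative name, not an API function.

```python
def normalize_bounds(bounds, page):
    """Express a box as fractions of the page width and height (0.0 to 1.0)."""
    return {
        "x": bounds["x"] / page["width"],
        "y": bounds["y"] / page["height"],
        "width": bounds["width"] / page["width"],
        "height": bounds["height"] / page["height"],
    }


page = {"width": 1700, "height": 2200}
box = {"x": 170, "y": 550, "width": 850, "height": 1100}
print(normalize_bounds(box, page))
# {'x': 0.1, 'y': 0.25, 'width': 0.5, 'height': 0.5}
```

Multiplying the normalized values by any target width and height recovers coordinates in that target space.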

Mapping to a rendered page

Because element bounds and page dimensions share the same coordinate space, you can transform both with the same scale factor to map coordinates into any target space — a rendered image, a browser canvas, or a UI component.

Scale factor

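Compute a single scale factor from the page width and the width of the target space. This assumes the target preserves the page’s aspect ratio, so the same factor applies to both axes:

scale = target_page_width / page.width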

Then multiply every coordinate by this scale:

target_x      = bounds.x      × scale
target_y      = bounds.y      × scale
target_width  = bounds.width  × scale
target_height = bounds.height × scale

This works because the API guarantees that elements, words, and cells all sit in the same coordinate system as the page.
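A width-based scale factor assumes the target keeps the page’s aspect ratio. When the target area has a fixed width and height, one common approach is to take the smaller of the two axis ratios and center the scaled page. The sketch below shows that approach; the helper names and the centering behavior are assumptions for this example, not something the API prescribes.

```python
def fit_scale(page, target_width, target_height):
    """Uniform scale that fits the whole page inside the target area."""
    return min(target_width / page["width"], target_height / page["height"])


def to_target(bounds, page, target_width, target_height):
    """Map a box into a target area, centering the scaled page."""
    scale = fit_scale(page, target_width, target_height)
    # Leftover space is split evenly on both sides of the scaled page.
    offset_x = (target_width - page["width"] * scale) / 2
    offset_y = (target_height - page["height"] * scale) / 2
    return {
        "x": bounds["x"] * scale + offset_x,
        "y": bounds["y"] * scale + offset_y,
        "width": bounds["width"] * scale,
        "height": bounds["height"] * scale,
    }


page = {"width": 1700, "height": 2200}
box = {"x": 200, "y": 400, "width": 556, "height": 97}
print(to_target(box, page, target_width=1000, target_height=1100))
# {'x': 175.0, 'y': 200.0, 'width': 278.0, 'height': 48.5}
```

Here the height is the limiting dimension, so the scale is 0.5 and the page is shifted 75 px to the right to center it.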

Example: Mapping to a display canvas

The API returns a US Letter PDF with page.width = 1700, page.height = 2200 (render-space pixels). Suppose you want to display the page in an 850-pixel-wide container:

scale = 850 / 1700 = 0.5

An element with bounds: { x: 200, y: 400, width: 556, height: 97 } maps to:

display_x      = 200 × 0.5 = 100 px
display_y      = 400 × 0.5 = 200 px
display_width  = 556 × 0.5 = 278 px
display_height = 97  × 0.5 = 48.5 px

Python

def to_display_coords(bounds, page, display_width):
    """Map API bounds to display coordinates."""
    scale = display_width / page["width"]
    return {
        "x": bounds["x"] * scale,
        "y": bounds["y"] * scale,
        "width": bounds["width"] * scale,
        "height": bounds["height"] * scale,
    }

# API returns render-space pixels; display at 850 px wide.
page = {"width": 1700, "height": 2200}
bounds = {"x": 200, "y": 400, "width": 556, "height": 97}
print(to_display_coords(bounds, page, display_width=850))
# {'x': 100.0, 'y': 200.0, 'width': 278.0, 'height': 48.5}

JavaScript

function toDisplayCoords(bounds, page, displayWidth) {
  const scale = displayWidth / page.width;
  return {
    x: bounds.x * scale,
    y: bounds.y * scale,
    width: bounds.width * scale,
    height: bounds.height * scale,
  };
}

// API returns render-space pixels; display at 850 px wide.
const page = { width: 1700, height: 2200 };
const bounds = { x: 200, y: 400, width: 556, height: 97 };
console.log(toDisplayCoords(bounds, page, 850));
// { x: 100, y: 200, width: 278, height: 48.5 }

Example: Images at native resolution

For image inputs, the page dimensions equal the image’s native pixel dimensions. If you display the image at its original size, the bounds map directly with no transformation. If you resize the image, apply the same scale factor approach:

scale = display_width / page.width
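When cropping a region out of an image, the scaled bounds usually need to become whole pixels. The sketch below rounds outward so the crop never cuts into the element. It returns a (left, top, right, bottom) tuple, the layout that Pillow’s `Image.crop` accepts. The function name `to_pixel_box` and the outward-rounding choice are assumptions for this example.

```python
import math


def to_pixel_box(bounds, scale=1.0):
    """Scale a box and round outward to whole pixels.

    Returns (left, top, right, bottom). Left and top round down,
    right and bottom round up, so the box fully covers the element.
    """
    left = math.floor(bounds["x"] * scale)
    top = math.floor(bounds["y"] * scale)
    right = math.ceil((bounds["x"] + bounds["width"]) * scale)
    bottom = math.ceil((bounds["y"] + bounds["height"]) * scale)
    return left, top, right, bottom


page = {"width": 1700, "height": 2200}
box = {"x": 200, "y": 400, "width": 556, "height": 97}

# Image displayed at its original size: no transformation needed.
print(to_pixel_box(box))
# (200, 400, 756, 497)

# Image resized from 1700 px wide to 850 px wide.
scale = 850 / page["width"]
print(to_pixel_box(box, scale))
# (100, 200, 378, 249)
```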

Example: Drawing overlays on a browser canvas

When rendering a page in a browser at an arbitrary size, you can use the same approach:

JavaScript

function drawElementOverlay(ctx, element, page, canvasWidth) {
  const scale = canvasWidth / page.width;
  const { x, y, width, height } = element.bounds;

  ctx.strokeStyle = "rgba(255, 0, 0, 0.5)";
  ctx.lineWidth = 2;
  ctx.strokeRect(x * scale, y * scale, width * scale, height * scale);
}

Converting between coordinate spaces

You can chain transformations to go between any two coordinate spaces by converting through the API’s page coordinate space as an intermediate step.

For example, to convert from a display position back to API coordinates (useful for hit testing — checking which element a user clicked on):

Python

def from_display_coords(display_x, display_y, page, display_width):
    """Convert display coordinates back to API coordinates."""
    scale = page["width"] / display_width
    return {
        "x": display_x * scale,
        "y": display_y * scale,
    }

# User clicks at pixel (100, 200) on an 850 px wide display
# of a page with API dimensions 1700 × 2200.
page = {"width": 1700, "height": 2200}
api_point = from_display_coords(100, 200, page, display_width=850)
print(api_point)
# {'x': 200.0, 'y': 400.0}

JavaScript

function fromDisplayCoords(displayX, displayY, page, displayWidth) {
  const scale = page.width / displayWidth;
  return {
    x: displayX * scale,
    y: displayY * scale,
  };
}

// User clicks at pixel (100, 200) on an 850 px wide display
// of a page with API dimensions 1700 × 2200.
const page = { width: 1700, height: 2200 };
console.log(fromDisplayCoords(100, 200, page, 850));
// { x: 200, y: 400 }
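The same idea extends to chaining two conversions, for example mapping a point from a thumbnail to a zoomed view by passing through API page coordinates. This is a minimal sketch; `convert_between_displays` is a hypothetical helper, and both displays are assumed to show the full page at its original aspect ratio.

```python
def convert_between_displays(x, y, page, from_width, to_width):
    """Map a point from one display width to another via page coordinates."""
    # Step 1: source display -> API page coordinates.
    to_api = page["width"] / from_width
    api_x, api_y = x * to_api, y * to_api
    # Step 2: API page coordinates -> destination display.
    to_dest = to_width / page["width"]
    return {"x": api_x * to_dest, "y": api_y * to_dest}


page = {"width": 1700, "height": 2200}
# Point (100, 200) on an 850 px wide thumbnail, mapped to a 3400 px wide view.
print(convert_between_displays(100, 200, page, from_width=850, to_width=3400))
# {'x': 400.0, 'y': 800.0}
```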

To check if a point falls inside an element’s bounds:

Python

def contains(bounds, x, y):
    """Check whether a point (in API coordinates) falls inside bounds."""
    return (
        bounds["x"] <= x <= bounds["x"] + bounds["width"]
        and bounds["y"] <= y <= bounds["y"] + bounds["height"]
    )

JavaScript

function contains(bounds, x, y) {
  return (
    x >= bounds.x &&
    x <= bounds.x + bounds.width &&
    y >= bounds.y &&
    y <= bounds.y + bounds.height
  );
}
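Putting the two steps together, a hit test converts the click into API coordinates and then looks for an element that contains it. The sketch below returns the smallest containing element, which is one reasonable way to prefer a table cell over the table around it. The element list and the `id` field are made up for illustration; only the `bounds` shape comes from the sections above.

```python
def element_at(elements, display_x, display_y, page, display_width):
    """Return the smallest element under a display-space point, or None."""
    # Convert the click from display space to API page coordinates.
    scale = page["width"] / display_width
    x, y = display_x * scale, display_y * scale

    def hit(b):
        return (
            b["x"] <= x <= b["x"] + b["width"]
            and b["y"] <= y <= b["y"] + b["height"]
        )

    def area(element):
        return element["bounds"]["width"] * element["bounds"]["height"]

    hits = [e for e in elements if hit(e["bounds"])]
    return min(hits, key=area, default=None)


page = {"width": 1700, "height": 2200}
# Hypothetical elements; the "id" field is invented for this example.
elements = [
    {"id": "table", "bounds": {"x": 100, "y": 300, "width": 1500, "height": 600}},
    {"id": "cell", "bounds": {"x": 200, "y": 400, "width": 556, "height": 97}},
]

clicked = element_at(elements, 150, 220, page, display_width=850)
print(clicked["id"])
# cell
print(element_at(elements, 10, 10, page, display_width=850))
# None
```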