API Reference - Docling

DocumentConverter

The main class for converting documents.

Class Definition

Python

from docling.document_converter import DocumentConverter

converter = DocumentConverter(
    format=None,
    pipeline="default",
    vlm_model=None,
    ocr_enabled=True,
    ocr_language="eng"
)

Parameters

Parameter	Type	Default	Description
format	InputFormat	None	Input format (e.g. PDF, DOCX). Auto-detected when None.
pipeline	str	"default"	Processing pipeline to use ("default", "vlm", etc.).
vlm_model	str	None	Vision-language model (VLM) to use (e.g. "granite_docling").
ocr_enabled	bool	True	Enable OCR for scanned documents.
ocr_language	str	"eng"	OCR language code.
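The table says `format` is auto-detected when left as `None`. The sketch below illustrates the simplest form of such detection, a lookup on the file extension. `guess_format` and its mapping are hypothetical and not part of Docling; the real converter may also look at MIME types or file content.

```python
from pathlib import PurePath
from typing import Optional
from urllib.parse import urlparse

# Hypothetical extension-to-format table. The values are plain lowercase
# strings standing in for InputFormat members; extend as needed.
_EXTENSION_MAP = {
    ".pdf": "pdf",
    ".docx": "docx",
    ".pptx": "pptx",
    ".xlsx": "xlsx",
    ".html": "html",
    ".htm": "html",
    ".md": "md",
}


def guess_format(source: str) -> Optional[str]:
    """Guess a format name from the extension of a file path or URL.

    Returns None when there is no extension or it is not in the table,
    in which case other signals (MIME type, file content) would be needed.
    """
    # For URLs only the path component names the file; this drops query strings.
    path = urlparse(source).path if "://" in source else source
    suffix = PurePath(path).suffix.lower()
    return _EXTENSION_MAP.get(suffix)
```

For example, `guess_format("report.PDF")` gives `"pdf"` and `guess_format("https://example.com/slides.pptx?download=1")` gives `"pptx"`, while a name without an extension gives `None`.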

Methods

convert(source)

Convert a document from a file path or URL.

Python

result = converter.convert("document.pdf")
result = converter.convert("https://example.com/document.pdf")

Parameters:

source (str) - File path or URL to the document

Returns: ConversionResult object
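`convert()` handles one source per call. When processing a folder of files or a list of URLs, one failing document usually should not abort the rest. The sketch below assumes only that the converter has a `convert(source)` method as described here; `convert_many` and the `SupportsConvert` protocol are hypothetical names, and catching a broad `Exception` is a deliberate choice so that every failure is recorded instead of raised.

```python
from typing import Any, Dict, Iterable, List, Protocol, Tuple


class SupportsConvert(Protocol):
    """Anything with a convert(source) method, such as DocumentConverter."""

    def convert(self, source: str) -> Any: ...


def convert_many(
    converter: SupportsConvert,
    sources: Iterable[str],
) -> Tuple[Dict[str, Any], List[Tuple[str, Exception]]]:
    """Convert each source in turn, collecting results and failures.

    A failure on one source does not stop the batch; the exception is
    kept alongside the source that caused it.
    """
    results: Dict[str, Any] = {}
    failures: List[Tuple[str, Exception]] = []
    for source in sources:
        try:
            results[source] = converter.convert(source)
        except Exception as exc:  # broad on purpose: report, don't abort
            failures.append((source, exc))
    return results, failures
```

Usage might look like `results, failures = convert_many(DocumentConverter(), ["a.pdf", "https://example.com/b.pdf"])`, after which `failures` can be logged or retried.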

ConversionResult

Result object returned by DocumentConverter.convert().

Properties

document - DoclingDocument object containing the converted document
metadata - Dictionary containing conversion metadata

DoclingDocument

Docling's unified representation of a converted document.

Properties

pages - List of Page objects
tables - List of Table objects
images - List of Image objects
structure - Document structure information
metadata - Document metadata dictionary

Export Methods

export_to_markdown(table_format="grid", include_images=True)

Export document to Markdown format.

Parameters:

table_format (str) - Table format: "grid", "pipe", or "simple"
include_images (bool) - Whether to include image references

Returns: str - Markdown representation
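The `table_format` values are named but not shown. The standalone sketch below renders a small table in the conventional "pipe" and "grid" layouts (as used by tools such as pandoc and tabulate) so the difference is visible. It is an illustration, not Docling's renderer; Docling's exact padding and alignment may differ, and "simple" is omitted.

```python
from typing import List, Sequence


def _column_widths(header: Sequence[str], rows: Sequence[Sequence[str]]) -> List[int]:
    """Width of each column: its longest cell (header included), at least 3."""
    return [max(3, *(len(cell) for cell in column)) for column in zip(header, *rows)]


def _row_line(cells: Sequence[str], widths: List[int]) -> str:
    """One table row with cells left-aligned and padded to the column width."""
    return "| " + " | ".join(c.ljust(w) for c, w in zip(cells, widths)) + " |"


def render_pipe(header: Sequence[str], rows: Sequence[Sequence[str]]) -> str:
    """Pipe layout: header, a dashed separator, then one line per row."""
    widths = _column_widths(header, rows)
    separator = "|" + "|".join("-" * (w + 2) for w in widths) + "|"
    body = [_row_line(row, widths) for row in rows]
    return "\n".join([_row_line(header, widths), separator] + body)


def render_grid(header: Sequence[str], rows: Sequence[Sequence[str]]) -> str:
    """Grid layout: rows boxed by +---+ rules, with '=' under the header."""
    widths = _column_widths(header, rows)

    def rule(char: str) -> str:
        return "+" + "+".join(char * (w + 2) for w in widths) + "+"

    lines = [rule("-"), _row_line(header, widths), rule("=")]
    for row in rows:
        lines += [_row_line(row, widths), rule("-")]
    return "\n".join(lines)
```

With header `["Name", "Qty"]` and one row `["Apple", "3"]`, the pipe form is three lines (header, `|-------|-----|`, row), while the grid form boxes every row and marks the header with `+=======+=====+`.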

export_to_html(include_styles=True, include_images=True)

Export document to HTML format.

Parameters:

include_styles (bool) - Include CSS styles
include_images (bool) - Include image references

Returns: str - HTML representation

export_to_dict()

Export the document to a Python dictionary (a lossless, JSON-serializable representation).

Returns: dict - Dictionary representation
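Because the dictionary is described as a lossless JSON representation, a natural use is saving it to disk and loading it later. The helpers below are a hypothetical sketch and assume the dictionary holds only JSON-compatible values (strings, numbers, booleans, lists, nested dicts); any other type makes `json.dumps` raise `TypeError`.

```python
import json
from pathlib import Path
from typing import Any, Dict


def save_document_dict(doc_dict: Dict[str, Any], path: Path) -> None:
    """Write the dict from export_to_dict() to disk as UTF-8 JSON."""
    # ensure_ascii=False keeps non-ASCII text readable instead of \u escapes.
    text = json.dumps(doc_dict, ensure_ascii=False, indent=2)
    path.write_text(text, encoding="utf-8")


def load_document_dict(path: Path) -> Dict[str, Any]:
    """Read a previously saved document dictionary back into memory."""
    return json.loads(path.read_text(encoding="utf-8"))
```

A typical call is `save_document_dict(doc.export_to_dict(), Path("document.json"))`.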

export_to_doctags()

Export the document to DocTags, Docling's compact tag-based markup format intended for use with AI models.

Returns: str - DocTags representation

Table

Represents a table extracted from a document.

Properties

rows - List of TableRow objects
columns - List of column headers

Methods

export_to_markdown()

Export table to Markdown format.

export_to_html()

Export table to HTML format.
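Besides Markdown and HTML, extracted tables often end up in spreadsheets. This reference does not describe the fields of a TableRow object, so the sketch below assumes each row has already been reduced to a plain sequence of cell values; `table_to_csv` is a hypothetical helper. Using the standard `csv` module means commas and quotes inside cells are escaped correctly.

```python
import csv
import io
from typing import Sequence


def table_to_csv(columns: Sequence[str], rows: Sequence[Sequence[object]]) -> str:
    """Serialize column headers plus row cells to a CSV string.

    Assumes each row is already a sequence of cell values; extracting
    those values from a TableRow object is left to the caller.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(rows)
    return buffer.getvalue()
```

For instance, a cell containing `red, round` is written as `"red, round"` so the comma does not split the column.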

Image

Represents an image extracted from a document.

Properties

filename - Image filename
image_type - Type of image (diagram, photo, etc.)
width - Image width
height - Image height
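The size and type properties are handy for filtering, for example to skip icons or to count diagrams. The sketch below uses a hypothetical `ImageInfo` dataclass that mirrors the four properties listed above rather than Docling's own Image class, and treats `width` and `height` as numbers in whatever unit the library reports.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class ImageInfo:
    """Hypothetical stand-in with the Image properties listed above."""

    filename: str
    image_type: str
    width: int
    height: int


def large_images(images: Iterable[ImageInfo], min_width: int, min_height: int) -> List[ImageInfo]:
    """Keep only images at least min_width by min_height (e.g. to drop icons)."""
    return [img for img in images if img.width >= min_width and img.height >= min_height]


def count_by_type(images: Iterable[ImageInfo]) -> Dict[str, int]:
    """Count images per image_type, e.g. {"diagram": 2, "photo": 1}."""
    return dict(Counter(img.image_type for img in images))
```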

InputFormat

Enumeration of supported input formats.

Python

from docling.datamodel.base_models import InputFormat

InputFormat.PDF
InputFormat.DOCX
InputFormat.PPTX
InputFormat.XLSX
InputFormat.HTML
InputFormat.MD
# ... and more

Exceptions

DoclingError

Base exception class for Docling-specific errors.

Python

from docling.document_converter import DocumentConverter
from docling.exceptions import DoclingError

converter = DocumentConverter()

try:
    result = converter.convert("document.pdf")
except DoclingError as e:
    print(f"Docling error: {e}")
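Conversions from a URL can fail for transient reasons such as a dropped connection. The sketch below wraps any zero-argument callable in a retry loop with exponential backoff. `with_retries` is a hypothetical helper, not part of Docling, and because this reference does not say which failures DoclingError covers, the exception types to retry are a parameter rather than hard-coded.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    action: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 3,
    delay_seconds: float = 0.5,
) -> T:
    """Call action(), retrying only when it raises one of retry_on.

    The wait starts at delay_seconds and doubles after each failure;
    once attempts are exhausted the last exception is re-raised.
    Exceptions not listed in retry_on propagate immediately.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except retry_on:
            if attempt == attempts:
                raise
            time.sleep(delay_seconds)
            delay_seconds *= 2
    raise AssertionError("unreachable")
```

A call might read `result = with_retries(lambda: converter.convert("https://example.com/document.pdf"), retry_on=(DoclingError,))`. Retrying only helps with failures that can go away; a malformed file fails the same way every time, so keep the tuple narrow.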

Complete Example

Python

from docling.document_converter import DocumentConverter
from docling.datamodel.base_models import InputFormat
from docling.exceptions import DoclingError
import json

# Initialize converter
converter = DocumentConverter(
    format=InputFormat.PDF,
    pipeline="default"
)

try:
    # Convert document
    result = converter.convert("document.pdf")
    doc = result.document
    
    # Access document properties
    print(f"Pages: {len(doc.pages)}")
    print(f"Tables: {len(doc.tables)}")
    print(f"Images: {len(doc.images)}")
    
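    # Export to different formats
    markdown = doc.export_to_markdown()
    html = doc.export_to_html()
    json_data = doc.export_to_dict()

    # Save the lossless dictionary export as a JSON file
    with open("document.json", "w", encoding="utf-8") as f:
        json.dump(json_data, f, ensure_ascii=False, indent=2)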
    
    # Work with pages
    for page in doc.pages:
        print(f"Page {page.page_number}: {len(page.content)} elements")
    
    # Work with tables
    for table in doc.tables:
        print(table.export_to_markdown())
    
except DoclingError as e:
    print(f"Error: {e}")

Additional Resources

Full Documentation - Detailed usage guide
Code Examples - Real-world examples
Getting Started - Quick start guide
GitHub Repository - Source code and issues
