DocumentConverter
The main class for converting documents.
Class Definition
```python
from docling.document_converter import DocumentConverter

converter = DocumentConverter(
    format=None,
    pipeline="default",
    vlm_model=None,
    ocr_enabled=True,
    ocr_language="eng",
)
```
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| format | InputFormat | None | Input format (PDF, DOCX, etc.). Auto-detected if None. |
| pipeline | str | "default" | Processing pipeline to use ("default", "vlm", etc.). |
| vlm_model | str | None | Vision-language model (VLM) to use, e.g. "granite_docling". |
| ocr_enabled | bool | True | Enable OCR for scanned documents. |
| ocr_language | str | "eng" | OCR language code (e.g. "eng" for English). |
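The table says format is auto-detected when left as None. This reference does not describe how Docling actually detects formats, and the real logic may inspect file contents rather than names. As an illustration of the simplest possible heuristic only, here is a sketch that guesses from the file extension of a path or URL. Format is a local stand-in enum whose member names mirror a few InputFormat members, and guess_format is a hypothetical helper, not part of Docling.

```python
from __future__ import annotations

import os
from enum import Enum
from urllib.parse import urlparse


class Format(Enum):
    """Local stand-in mirroring a few InputFormat member names (not Docling's enum)."""
    PDF = "pdf"
    DOCX = "docx"
    PPTX = "pptx"
    XLSX = "xlsx"
    HTML = "html"
    MARKDOWN = "md"


# Hypothetical extension table used only by this sketch
EXTENSIONS = {
    ".pdf": Format.PDF,
    ".docx": Format.DOCX,
    ".pptx": Format.PPTX,
    ".xlsx": Format.XLSX,
    ".html": Format.HTML,
    ".htm": Format.HTML,
    ".md": Format.MARKDOWN,
}


def guess_format(source: str) -> Format | None:
    """Guess a format from the extension of a path or URL; return None if unknown."""
    # For URLs, ignore the query string and fragment and look only at the path part
    path = urlparse(source).path if "://" in source else source
    ext = os.path.splitext(path)[1].lower()
    return EXTENSIONS.get(ext)


print(guess_format("https://example.com/document.pdf"))  # Format.PDF
print(guess_format("notes.txt"))  # None
```

An extension heuristic is cheap but easily fooled by misnamed files, which is one reason a real converter may prefer content sniffing.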
Methods
convert(source)
Convert a document from a file path or URL.
```python
result = converter.convert("document.pdf")  # local file path
result = converter.convert("https://example.com/document.pdf")  # remote URL
```
Parameters:
- source (str) - File path or URL to the document
Returns: ConversionResult object
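Since convert() accepts either a file path or a URL, it can be handy to pre-check a batch of inputs before starting long conversions, for example to catch mistyped local paths early. A minimal sketch using only the standard library; classify_source and missing_local_files are hypothetical helpers, and DocumentConverter is assumed to handle the path/URL distinction on its own.

```python
from __future__ import annotations

from pathlib import Path
from urllib.parse import urlparse


def classify_source(source: str) -> str:
    """Return "url" for http(s) sources, otherwise "path"."""
    scheme = urlparse(source).scheme.lower()
    return "url" if scheme in ("http", "https") else "path"


def missing_local_files(sources: list[str]) -> list[str]:
    """List local paths that do not exist; URLs are not checked."""
    return [s for s in sources if classify_source(s) == "path" and not Path(s).exists()]


print(classify_source("https://example.com/document.pdf"))  # url
print(classify_source("document.pdf"))  # path
```

A Windows path such as C:\docs\report.pdf parses with scheme "c", so it is still classified as a path.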
ConversionResult
Result object returned by DocumentConverter.convert().
Properties
- document - DoclingDocument object containing the converted document
- metadata - Dictionary containing conversion metadata
DoclingDocument
The unified document representation format.
Properties
- pages - List of Page objects
- tables - List of Table objects
- images - List of Image objects
- structure - Document structure information
- metadata - Document metadata dictionary
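To unit-test code that consumes ConversionResult and DoclingDocument objects without running a real conversion, a lightweight test double with the documented attributes can help. FakeDocument, FakeResult, and summarize are hypothetical names for illustration; they mirror only the attributes listed in this reference, not the full Docling classes.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class FakeDocument:
    """Test double exposing the attributes documented for DoclingDocument."""
    pages: list[Any] = field(default_factory=list)
    tables: list[Any] = field(default_factory=list)
    images: list[Any] = field(default_factory=list)
    structure: Any = None
    metadata: dict[str, Any] = field(default_factory=dict)


@dataclass
class FakeResult:
    """Test double exposing the attributes documented for ConversionResult."""
    document: FakeDocument
    metadata: dict[str, Any] = field(default_factory=dict)


def summarize(result) -> dict[str, int]:
    """Count pages, tables, and images on any ConversionResult-like object."""
    doc = result.document
    return {"pages": len(doc.pages), "tables": len(doc.tables), "images": len(doc.images)}


result = FakeResult(document=FakeDocument(pages=[1, 2, 3], tables=["t1"]))
print(summarize(result))  # {'pages': 3, 'tables': 1, 'images': 0}
```

Because summarize relies only on attribute access, the same function should work unchanged on a real result object that exposes these attributes.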
Export Methods
export_to_markdown(table_format="grid", include_images=True)
Export document to Markdown format.
Parameters:
- table_format (str) - Table format: "grid", "pipe", or "simple"
- include_images (bool) - Whether to include image references
Returns: str - Markdown representation
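The three table_format values match style names from the Python tabulate library, which suggests (but this reference does not confirm) that tables are rendered in those styles. To show roughly what "pipe" output looks like, here is a minimal, dependency-free sketch. to_pipe_table is a hypothetical helper, not part of Docling, and real output may pad columns or add alignment colons.

```python
from __future__ import annotations


def _cell(value: object) -> str:
    # Escape literal pipes so they are not read as column separators
    return str(value).replace("|", "\\|")


def to_pipe_table(headers: list[object], rows: list[list[object]]) -> str:
    """Render a minimal, unpadded Markdown pipe table."""
    lines = [
        "| " + " | ".join(_cell(h) for h in headers) + " |",
        "|" + "|".join("---" for _ in headers) + "|",
    ]
    for row in rows:
        lines.append("| " + " | ".join(_cell(c) for c in row) + " |")
    return "\n".join(lines)


print(to_pipe_table(["Name", "Qty"], [["apple", 3], ["pear", 5]]))
```

Pipe tables render in most Markdown viewers (GitHub, CommonMark extensions), whereas grid tables are mainly understood by tools such as Pandoc, which is one reason to pick "pipe" for web-facing output.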
export_to_html(include_styles=True, include_images=True)
Export document to HTML format.
Parameters:
include_styles(bool) - Include CSS stylesinclude_images(bool) - Include image references
Returns: str - HTML representation
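Both export_to_markdown() and export_to_html() return plain strings, so persisting them is ordinary file I/O. A minimal sketch; save_exports is a hypothetical helper, and the demo uses placeholder strings instead of real exports so it runs without Docling installed.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def save_exports(out_dir: str | Path, stem: str, exports: dict[str, str]) -> list[Path]:
    """Write each exported string to <out_dir>/<stem>.<ext> as UTF-8 text."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for ext, text in exports.items():
        path = out / f"{stem}.{ext}"
        path.write_text(text, encoding="utf-8")
        written.append(path)
    return written


# Demo with placeholder strings; with Docling you would pass
# doc.export_to_markdown() and doc.export_to_html() instead.
with tempfile.TemporaryDirectory() as tmp:
    paths = save_exports(tmp, "document", {"md": "# Title\n", "html": "<h1>Title</h1>"})
    names = [p.name for p in paths]
    html_text = paths[1].read_text(encoding="utf-8")

print(names)  # ['document.md', 'document.html']
```

Writing with an explicit encoding="utf-8" avoids platform-dependent defaults when documents contain non-ASCII text.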
export_to_dict()
Export the document to a Python dictionary (a lossless JSON representation).
Returns: dict - Dictionary representation
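Since export_to_dict() is described as a lossless JSON representation, a quick sanity check before persisting any such dictionary is a round trip through the standard json module. This is a generic sketch: roundtrips_through_json is a hypothetical helper, and the sample dictionaries are made up rather than real Docling output.

```python
import json


def roundtrips_through_json(data: dict) -> bool:
    """Return True if data survives a json.dumps/json.loads round trip unchanged."""
    try:
        return json.loads(json.dumps(data)) == data
    except (TypeError, ValueError):
        # TypeError: values json cannot encode (e.g. sets); ValueError: circular references
        return False


print(roundtrips_through_json({"pages": [{"page_number": 1}], "title": None}))  # True
print(roundtrips_through_json({"bbox": (0, 0, 10, 10)}))  # False: the tuple comes back as a list
print(roundtrips_through_json({1: "a"}))  # False: the int key comes back as "1"
```

The check is strict on purpose: tuples and non-string keys are silently changed by JSON, so a False result flags data that would not reload identically.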
export_to_doctags()
Export the document to the DocTags markup format, intended for use with AI models.
Returns: str - DocTags representation
Page
Represents a single page in a document.
Properties
- page_number - Page number (1-indexed)
- content - List of content elements
- width - Page width
- height - Page height
Methods
export_to_markdown()
Export page to Markdown.
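Because page_number is 1-indexed while Python lists are 0-indexed, looking pages up by their page_number is less error-prone than indexing doc.pages by position. A small sketch using SimpleNamespace objects as stand-ins for Page; get_page is a hypothetical helper.

```python
from types import SimpleNamespace


def get_page(pages, page_number: int):
    """Return the page whose 1-indexed page_number matches, or raise KeyError."""
    for page in pages:
        if page.page_number == page_number:
            return page
    raise KeyError(f"no page numbered {page_number}")


# Stand-ins for Page objects with the documented page_number and content attributes
pages = [SimpleNamespace(page_number=n, content=[]) for n in (1, 2, 3)]
print(get_page(pages, 2).page_number)  # 2
```

Page 1 lives at list index 0, so a naive pages[page_number] would silently return the wrong page (or raise IndexError on the last one).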
Table
Represents a table extracted from a document.
Properties
- rows - List of TableRow objects
- columns - List of column headers
Methods
export_to_markdown()
Export table to Markdown format.
export_to_html()
Export table to HTML format.
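For spreadsheets or data pipelines, CSV is often more convenient than Markdown or HTML. This reference does not document the structure of TableRow, so the sketch below assumes each row can be iterated to yield its cell values; that assumption may not hold for the real class and may need adapting. table_to_csv is a hypothetical helper built on the standard csv module.

```python
import csv
import io


def table_to_csv(columns, rows) -> str:
    """Serialize column headers and rows to CSV text.

    Assumes each row is an iterable of cell values (hypothetical for TableRow).
    """
    buf = io.StringIO()
    writer = csv.writer(buf)  # default dialect: comma-separated, "\r\n" line endings
    writer.writerow(columns)
    for row in rows:
        writer.writerow(list(row))
    return buf.getvalue()


print(table_to_csv(["Name", "Qty"], [["apple", 3], ["pear, ripe", 5]]), end="")
```

The csv module quotes cells containing commas or quotes automatically, which a hand-rolled join would get wrong.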
Image
Represents an image extracted from a document.
Properties
- filename - Image filename
- image_type - Type of image (diagram, photo, etc.)
- width - Image width
- height - Image height
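A common task is to tally or filter extracted images by image_type, for example to keep only diagrams. This sketch assumes image_type values are plain strings such as "diagram" or "photo"; the reference lists those categories but not their exact representation. SimpleNamespace objects stand in for Image, and count_by_type and images_of_type are hypothetical helpers.

```python
from collections import Counter
from types import SimpleNamespace


def count_by_type(images) -> Counter:
    """Tally images by their image_type attribute."""
    return Counter(img.image_type for img in images)


def images_of_type(images, image_type: str) -> list:
    """Keep only images whose image_type matches exactly."""
    return [img for img in images if img.image_type == image_type]


images = [
    SimpleNamespace(filename="fig1.png", image_type="diagram", width=640, height=480),
    SimpleNamespace(filename="fig2.png", image_type="photo", width=800, height=600),
    SimpleNamespace(filename="fig3.png", image_type="diagram", width=320, height=240),
]
print(count_by_type(images))  # Counter({'diagram': 2, 'photo': 1})
print([img.filename for img in images_of_type(images, "diagram")])  # ['fig1.png', 'fig3.png']
```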
InputFormat
Enumeration of supported input formats.
```python
from docling.datamodel.base_models import InputFormat

InputFormat.PDF
InputFormat.DOCX
InputFormat.PPTX
InputFormat.XLSX
InputFormat.HTML
InputFormat.MARKDOWN
# ... and more
```
Exceptions
DoclingError
Base exception class for Docling-specific errors.
```python
from docling.exceptions import DoclingError

try:
    converter.convert("document.pdf")
except DoclingError as e:
    print(f"Docling error: {e}")
```
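When converting many files, you may prefer to record failures and keep going rather than stop at the first DoclingError. A minimal sketch: convert_all is a hypothetical helper that accepts any convert callable (for example converter.convert) plus the exception type to catch. The demo uses a fake converter and a FakeError class so it runs without Docling installed.

```python
from __future__ import annotations

from typing import Any, Callable


def convert_all(convert: Callable[[str], Any], sources: list[str],
                error_type: type[BaseException] = Exception):
    """Convert each source, collecting failures of error_type instead of stopping.

    Exceptions that are not instances of error_type still propagate.
    """
    results: dict[str, Any] = {}
    failures: dict[str, str] = {}
    for source in sources:
        try:
            results[source] = convert(source)
        except error_type as exc:
            failures[source] = str(exc)
    return results, failures


class FakeError(Exception):
    """Stand-in for DoclingError in this demo."""


def fake_convert(source: str) -> str:
    if source.endswith(".bad"):
        raise FakeError(f"cannot convert {source}")
    return f"converted {source}"


ok, failed = convert_all(fake_convert, ["a.pdf", "b.bad"], FakeError)
print(ok)  # {'a.pdf': 'converted a.pdf'}
print(failed)  # {'b.bad': 'cannot convert b.bad'}
```

With Docling, the call would look like convert_all(converter.convert, sources, DoclingError). Catching only the library's base exception keeps genuine bugs (for example a TypeError in your own code) visible.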
Complete Example
```python
import json

from docling.document_converter import DocumentConverter
from docling.datamodel.base_models import InputFormat
from docling.exceptions import DoclingError

# Initialize converter
converter = DocumentConverter(
    format=InputFormat.PDF,
    pipeline="default",
)

try:
    # Convert document
    result = converter.convert("document.pdf")
    doc = result.document

    # Access document properties
    print(f"Pages: {len(doc.pages)}")
    print(f"Tables: {len(doc.tables)}")
    print(f"Images: {len(doc.images)}")

    # Export to different formats
    markdown = doc.export_to_markdown()
    html = doc.export_to_html()
    json_data = doc.export_to_dict()

    # Save the lossless dictionary export as JSON
    with open("document.json", "w", encoding="utf-8") as f:
        json.dump(json_data, f, indent=2)

    # Work with pages
    for page in doc.pages:
        print(f"Page {page.page_number}: {len(page.content)} elements")

    # Work with tables
    for table in doc.tables:
        print(table.export_to_markdown())
except DoclingError as e:
    print(f"Error: {e}")
```
Additional Resources
- Full Documentation - Detailed usage guide
- Code Examples - Real-world examples
- Getting Started - Quick start guide
- GitHub Repository - Source code and issues