Advanced PDF Understanding

Docling's PDF parsing goes beyond plain text extraction. Its models recover document structure: page layout, reading order, tables, formulas, code blocks, and image types.

Page Layout Detection

Automatically detect and understand complex page layouts including multi-column documents, headers, footers, and sidebars. Docling's layout model (Heron) provides fast and accurate page structure analysis.

Reading Order

Intelligently determine the correct reading order of content, even in complex multi-column layouts. This ensures that extracted text maintains logical flow and meaning.
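
The toy sketch below illustrates the problem that reading-order detection solves. It is not Docling's algorithm. It assumes text blocks arrive as bounding boxes and that the page has two equal-width columns; `Block` and `reading_order` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """A text block with the top-left corner of its bounding box (hypothetical type)."""
    text: str
    x: float  # left edge, in page units
    y: float  # top edge, in page units (0 = top of the page)


def reading_order(blocks: list[Block], page_width: float) -> list[str]:
    """Order blocks for a two-column page: left column first, each column top to bottom."""
    midpoint = page_width / 2

    def sort_key(block: Block) -> tuple[int, float]:
        column = 0 if block.x < midpoint else 1
        return (column, block.y)

    return [block.text for block in sorted(blocks, key=sort_key)]


# Blocks listed in the order a naive top-to-bottom scan would produce.
blocks = [
    Block("Left, first paragraph", x=50, y=100),
    Block("Right, first paragraph", x=320, y=100),
    Block("Left, second paragraph", x=50, y=300),
    Block("Right, second paragraph", x=320, y=300),
]
ordered = reading_order(blocks, page_width=600)
```

A naive scan interleaves the two columns; sorting by column and then by vertical position keeps each column's paragraphs together.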

Table Structure Detection

Accurately detect and extract tables from PDFs, preserving cell relationships, headers, and data structure. Tables are exported in formats that maintain their structure for downstream processing.

Formula Recognition

Detect and extract mathematical formulas and equations from documents, preserving their structure for use in scientific and technical applications.

Code Block Detection

Identify and extract code blocks from documents, preserving their formatting and, where it can be detected, the programming language.

Image Classification

Classify images within documents (diagrams, charts, photos, etc.) to better understand document content and structure.
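
As a minimal sketch, the PDF features above can be switched on through pipeline options. The option names below are assumed to match the `PdfPipelineOptions` fields in recent Docling releases, so check them against your installed version. `enrichment_settings` and `build_pdf_converter` are hypothetical helper names.

```python
def enrichment_settings(formulas: bool = True, code: bool = True,
                        pictures: bool = True) -> dict[str, bool]:
    """Collect the PDF pipeline switches for the features described above."""
    return {
        "do_table_structure": True,          # table structure detection
        "do_formula_enrichment": formulas,   # formula recognition
        "do_code_enrichment": code,          # code block detection
        "do_picture_classification": pictures,  # image classification
    }


def build_pdf_converter(settings: dict[str, bool]):
    """Return a DocumentConverter whose PDF pipeline uses the given switches."""
    # Imported lazily so this module loads even when docling is not installed.
    from docling.datamodel.base_models import InputFormat
    from docling.datamodel.pipeline_options import PdfPipelineOptions
    from docling.document_converter import DocumentConverter, PdfFormatOption

    options = PdfPipelineOptions(**settings)
    return DocumentConverter(
        format_options={InputFormat.PDF: PdfFormatOption(pipeline_options=options)}
    )
```

Calling `build_pdf_converter(enrichment_settings(code=False))` would, under these assumptions, give a converter that skips code enrichment and keeps the other features enabled.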

Multi-Format Support

Docling supports a wide variety of document formats, allowing you to process diverse document types with a single unified API.

Document Formats

  • PDF - Native PDF parsing with advanced layout understanding
  • Word (DOCX) - Microsoft Word document processing
  • PowerPoint (PPTX) - Presentation file parsing
  • Excel (XLSX) - Spreadsheet data extraction
  • Markdown - Markdown file processing
  • HTML - Web page and HTML document parsing
  • AsciiDoc - AsciiDoc format support
  • CSV - Comma-separated value file parsing
  • WebVTT - Web Video Text Tracks subtitle parsing

Audio Formats

  • MP3 - Audio transcription with ASR models
  • WAV - Waveform audio file processing

Image Formats

  • PNG - Portable Network Graphics
  • JPEG - Joint Photographic Experts Group
  • TIFF - Tagged Image File Format
  • BMP - Bitmap images
  • WEBP - WebP image format
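
The sketch below shows how one converter can handle a mixed batch of files. The file extensions are assumptions derived from the format lists above, and the exact set accepted depends on the Docling version. Audio files may also need additional ASR dependencies. `supported_files` and `convert_files` are hypothetical helper names.

```python
from pathlib import Path

# Extensions assumed from the format lists above; adjust to your Docling version.
SUPPORTED_SUFFIXES = {
    ".pdf", ".docx", ".pptx", ".xlsx", ".md", ".html", ".adoc", ".csv", ".vtt",
    ".mp3", ".wav",
    ".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp", ".webp",
}


def supported_files(paths) -> list[Path]:
    """Keep only the paths whose extension appears in SUPPORTED_SUFFIXES."""
    return [Path(p) for p in paths if Path(p).suffix.lower() in SUPPORTED_SUFFIXES]


def convert_files(paths):
    """Yield (file name, document) pairs for every supported input file."""
    # Imported lazily so this module loads even when docling is not installed.
    from docling.document_converter import DocumentConverter

    converter = DocumentConverter()
    for result in converter.convert_all(supported_files(paths)):
        yield result.input.file.name, result.document
```

Filtering before conversion keeps unsupported files out of the batch, so a single stray file does not abort the run.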

Extensive OCR Support

Docling provides comprehensive OCR capabilities for scanned documents and images, ensuring that even non-digital documents can be processed effectively.

Scanned PDF Processing

Process scanned PDFs with high-accuracy OCR, extracting text while maintaining document structure and layout information.

Image OCR

Extract text from images in various formats, supporting multiple languages and character sets.

Visual Language Models

Docling supports several Visual Language Models (VLMs), including GraniteDocling, which combine vision and language understanding to interpret document content. These models can be accelerated with MLX on Apple Silicon hardware.

Unified Document Representation

All documents are converted into Docling's unified, expressive DoclingDocument format, providing a consistent interface regardless of source format.

Structured Data Access

Access document components and their properties through a clean, programmatic API. Navigate pages, paragraphs, tables, images, and other elements with ease.
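
A minimal sketch of programmatic access, assuming the `iterate_items()` method of `DoclingDocument`, which yields `(item, level)` pairs in reading order. `count_labels` and `load_document` are hypothetical helper names, and the label values depend on the document.

```python
from collections import Counter


def count_labels(doc) -> Counter:
    """Count the items of a DoclingDocument-like object by their label."""
    counts = Counter()
    for item, _level in doc.iterate_items():
        # Labels are enum members in Docling; fall back to the raw value otherwise.
        label = getattr(item.label, "value", item.label)
        counts[str(label)] += 1
    return counts


def load_document(source: str):
    """Convert a local path or URL and return the resulting document."""
    # Imported lazily so this module loads even when docling is not installed.
    from docling.document_converter import DocumentConverter

    return DocumentConverter().convert(source).document
```

With Docling installed, `count_labels(load_document("report.pdf"))` would return a count per element type for a hypothetical local file `report.pdf`.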

Metadata Preservation

Preserve document metadata, structure, and relationships throughout the conversion process.

Flexible Export Formats

Export parsed documents to formats optimized for different use cases, from AI processing to human-readable output.

Markdown

Export to structured Markdown with tables, formatting, and code blocks preserved. Perfect for documentation and content management systems.

HTML

Generate rich HTML output with styling, suitable for web display and further processing.

JSON

Lossless JSON representation preserving all document structure, metadata, and relationships. Ideal for programmatic processing and storage.

DocTags

Structured document tags format designed for AI and RAG systems, providing semantic markup of document content.

Plain Text

Simple text extraction for basic use cases and compatibility with legacy systems.
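
The export formats above can be sketched as one helper per step. The `export_to_*` method names are assumed to match recent `DoclingDocument` releases (`export_to_doctags` in particular may be named differently in older versions), and the output file names are arbitrary choices for illustration.

```python
import json
from pathlib import Path


def render_exports(doc) -> dict[str, str]:
    """Render a DoclingDocument into each export format, keyed by file name."""
    return {
        "document.md": doc.export_to_markdown(),
        "document.html": doc.export_to_html(),
        "document.json": json.dumps(doc.export_to_dict(), indent=2),
        "document.doctags": doc.export_to_doctags(),
        "document.txt": doc.export_to_text(),
    }


def write_exports(doc, out_dir: str) -> list[Path]:
    """Write every rendered export into out_dir and return the written paths."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for name, content in render_exports(doc).items():
        path = target / name
        path.write_text(content, encoding="utf-8")
        written.append(path)
    return written
```

Separating rendering from writing makes it easy to send the same strings to a database or an API instead of the file system.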

AI Framework Integrations

Docling provides plug-and-play integrations with popular AI frameworks, making it easy to incorporate document processing into your AI applications.

LangChain

Native integration with LangChain document loaders, enabling seamless document processing in LangChain workflows.

LlamaIndex

Integrate Docling with LlamaIndex for RAG (Retrieval-Augmented Generation) applications.

CrewAI

Use Docling with CrewAI for multi-agent document processing workflows.

Haystack

Integration with Haystack for enterprise-grade document processing pipelines.

MCP Server

Connect to any agent using the Model Context Protocol (MCP) server, enabling Docling to work with a wide range of AI agents and tools.

Local Execution & Security

Docling is designed with security and privacy in mind, supporting local execution for sensitive data.

Local Processing

Run all document processing locally on your infrastructure. No data leaves your environment, ensuring complete privacy and security.

Air-Gapped Environments

Docling runs in air-gapped environments without internet connectivity or external services, provided the required models are already available locally.
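
A minimal sketch of an offline setup, assuming the model artifacts were fetched in advance on a connected machine (the Docling documentation describes a `docling-tools models download` command for this) and copied to a local directory. The `artifacts_path` option is assumed to match the `PdfPipelineOptions` field of recent releases; `offline_converter` is a hypothetical helper name.

```python
from pathlib import Path


def offline_converter(artifacts_dir: str):
    """Build a PDF converter that loads its models from a local directory."""
    path = Path(artifacts_dir)
    if not path.is_dir():
        # Fail early instead of letting the pipeline attempt a download.
        raise FileNotFoundError(f"model directory not found: {path}")

    # Imported lazily so this module loads even when docling is not installed.
    from docling.datamodel.base_models import InputFormat
    from docling.datamodel.pipeline_options import PdfPipelineOptions
    from docling.document_converter import DocumentConverter, PdfFormatOption

    options = PdfPipelineOptions(artifacts_path=str(path))
    return DocumentConverter(
        format_options={InputFormat.PDF: PdfFormatOption(pipeline_options=options)}
    )
```

Checking the directory up front turns a missing model copy into an immediate, readable error.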

Data Privacy

Your documents never leave your control. All processing happens on your hardware, giving you complete control over sensitive information.

Command-Line Interface

Docling includes a powerful and convenient CLI for quick document conversions directly from your terminal.

Simple Usage

Convert documents with a single command, supporting URLs and local file paths.

Pipeline Options

Choose between different processing pipelines, including VLM (Visual Language Model) support for enhanced understanding.

Output Formats

Specify output formats and options directly from the command line.

Structured Information Extraction

Docling includes beta support for structured information extraction, allowing you to extract specific information from documents in a structured format.

Metadata Extraction

Coming soon: extract document metadata, including titles, authors, references, and language.

Chart Understanding

Coming soon: understand and extract data from charts, including bar charts, pie charts, and line plots.

Chemistry Understanding

Coming soon: process complex chemistry documents, including molecular structures.

What's New

  • 📤 Structured information extraction [🧪 beta]
  • 📑 New layout model (Heron) by default, for faster PDF parsing
  • 🔌 MCP server for agentic applications
  • 💬 Parsing of Web Video Text Tracks (WebVTT) files