Advanced PDF Understanding
Docling's PDF parsing goes well beyond plain text extraction. Its models recover document structure and semantics: page layout, reading order, tables, formulas, code blocks, and images.
Page Layout Detection
Automatically detect and understand complex page layouts including multi-column documents, headers, footers, and sidebars. Docling's layout model (Heron) provides fast and accurate page structure analysis.
Reading Order
Intelligently determine the correct reading order of content, even in complex multi-column layouts. This ensures that extracted text maintains logical flow and meaning.
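To see why reading order needs layout awareness, consider a two-column page. The toy heuristic below is only an illustration and is not how Docling's models work. The `Block` type, the coordinates, and the half-page column split are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """A hypothetical text block located by its top-left corner."""
    text: str
    x: float  # left edge, in page units
    y: float  # top edge; larger values are lower on the page


def naive_order(blocks):
    # Sorting purely top-to-bottom interleaves the two columns.
    return [b.text for b in sorted(blocks, key=lambda b: (b.y, b.x))]


def column_aware_order(blocks, page_width):
    # Toy rule: split the page in half, read the left column fully, then the right.
    mid = page_width / 2
    left = sorted((b for b in blocks if b.x < mid), key=lambda b: b.y)
    right = sorted((b for b in blocks if b.x >= mid), key=lambda b: b.y)
    return [b.text for b in left + right]


blocks = [
    Block("L1", 50, 100), Block("R1", 350, 100),
    Block("L2", 50, 200), Block("R2", 350, 200),
]

print(naive_order(blocks))                # ['L1', 'R1', 'L2', 'R2']
print(column_aware_order(blocks, 600))    # ['L1', 'L2', 'R1', 'R2']
```

The naive ordering mixes sentences from both columns, which is the failure that a learned reading-order step is meant to avoid on real, less regular layouts.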
Table Structure Detection
Accurately detect and extract tables from PDFs, preserving cell relationships, headers, and data structure. Tables are exported in formats that maintain their structure for downstream processing.
Formula Recognition
Detect and extract mathematical formulas and equations from documents, preserving their structure for use in scientific and technical applications.
Code Block Detection
Identify and extract code blocks from documents, maintaining proper formatting and syntax highlighting information.
Image Classification
Classify images within documents (diagrams, charts, photos, etc.) to better understand document content and structure.
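The following sketch shows one way to switch these PDF features on from Python. It assumes the `docling` package is installed and that the option names (`do_table_structure`, `do_formula_enrichment`, `do_code_enrichment`, `do_picture_classification`) match your installed release, so check them against your version. The helper names `enable_enrichments` and `build_converter` are hypothetical.

```python
def enable_enrichments(options):
    """Turn on table, formula, code, and picture-classification steps.

    `options` is expected to be a PdfPipelineOptions instance; the attribute
    names are assumed to match the installed Docling release.
    """
    options.do_table_structure = True
    options.do_formula_enrichment = True
    options.do_code_enrichment = True
    options.do_picture_classification = True
    return options


def build_converter():
    # Imported lazily so this module loads even where docling is not installed.
    from docling.datamodel.base_models import InputFormat
    from docling.datamodel.pipeline_options import PdfPipelineOptions
    from docling.document_converter import DocumentConverter, PdfFormatOption

    options = enable_enrichments(PdfPipelineOptions())
    return DocumentConverter(
        format_options={InputFormat.PDF: PdfFormatOption(pipeline_options=options)}
    )
```

Calling `build_converter().convert("paper.pdf")` (with a hypothetical file name) would then run the PDF pipeline with the extra enrichment steps enabled. Expect these steps to add processing time.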
Multi-Format Support
Docling supports a wide variety of document formats, allowing you to process diverse document types with a single unified API.
Document Formats
- PDF - Native PDF parsing with advanced layout understanding
- Word (DOCX) - Microsoft Word document processing
- PowerPoint (PPTX) - Presentation file parsing
- Excel (XLSX) - Spreadsheet data extraction
- Markdown - Markdown file processing
- HTML - Web page and HTML document parsing
- AsciiDoc - AsciiDoc format support
- CSV - Comma-separated value file parsing
- WebVTT - Web Video Text Tracks subtitle parsing
Audio Formats
- MP3 - Audio transcription with ASR models
- WAV - Waveform audio file processing
Image Formats
- PNG - Portable Network Graphics
- JPEG - Joint Photographic Experts Group
- TIFF - Tagged Image File Format
- BMP - Bitmap images
- WEBP - WebP image format
Learn more about getting started with different formats.
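As a minimal sketch of the single-API idea, the helper below sends any supported source through the same `DocumentConverter.convert` call. The function name `convert_to_markdown` and the optional `converter` parameter are assumptions added for illustration, and the example requires the `docling` package when no converter is passed in.

```python
def convert_to_markdown(source, converter=None):
    """Convert one document (local path or URL) and return it as Markdown."""
    if converter is None:
        # Imported lazily so the helper can be loaded without docling present.
        from docling.document_converter import DocumentConverter
        converter = DocumentConverter()
    result = converter.convert(source)
    return result.document.export_to_markdown()
```

The same call would be used for, say, `"slides.pptx"`, `"notes.md"`, or `"scan.png"` (hypothetical file names). The converter picks the backend from the input format.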
Extensive OCR Support
Docling provides comprehensive OCR capabilities for scanned documents and images, ensuring that even non-digital documents can be processed effectively.
Scanned PDF Processing
Process scanned PDFs with high-accuracy OCR, extracting text while maintaining document structure and layout information.
Image OCR
Extract text from images in various formats, supporting multiple languages and character sets.
Visual Language Models
Docling supports several Visual Language Models (VLMs), including GraniteDocling, which combine vision and language processing to improve understanding of document content. These models can be accelerated with MLX on Apple Silicon hardware.
See examples of OCR processing in action.
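A short sketch of how OCR might be configured on the PDF pipeline. It assumes the pipeline options expose `do_ocr` and `ocr_options.lang` under those names in your installed release. The accepted language codes depend on the OCR engine in use, so the values shown are placeholders. `configure_ocr` is a hypothetical helper name.

```python
def configure_ocr(options, languages):
    """Enable OCR and set the recognition languages on a pipeline-options object.

    `options` is expected to be a PdfPipelineOptions instance; the language
    codes must be ones the configured OCR engine understands.
    """
    options.do_ocr = True
    options.ocr_options.lang = list(languages)
    return options


def build_ocr_converter(languages):
    # Imported lazily so this module loads even where docling is not installed.
    from docling.datamodel.base_models import InputFormat
    from docling.datamodel.pipeline_options import PdfPipelineOptions
    from docling.document_converter import DocumentConverter, PdfFormatOption

    options = configure_ocr(PdfPipelineOptions(), languages)
    return DocumentConverter(
        format_options={InputFormat.PDF: PdfFormatOption(pipeline_options=options)}
    )
```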
Unified Document Representation
All documents are converted into Docling's unified, expressive DoclingDocument format, providing a consistent interface regardless of source format.
Structured Data Access
Access document components and their properties through a clean, programmatic API. Navigate pages, paragraphs, tables, images, and other elements with ease.
Metadata Preservation
Preserve document metadata, structure, and relationships throughout the conversion process.
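For a first look at a converted document, a sketch like the following counts its main components. It assumes the `DoclingDocument` exposes `texts`, `tables`, and `pictures` collections. `summarize_document` is a hypothetical helper, not part of Docling.

```python
def summarize_document(doc):
    """Count the main component types in a DoclingDocument-like object."""
    return {
        "texts": len(doc.texts),
        "tables": len(doc.tables),
        "pictures": len(doc.pictures),
    }
```

Because every source format is converted into the same representation, a summary like this works unchanged whether the input was a PDF, a DOCX file, or an HTML page.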
Flexible Export Formats
Export parsed documents to formats optimized for different use cases, from AI processing to human-readable output.
Markdown
Export to structured Markdown with tables, formatting, and code blocks preserved. Perfect for documentation and content management systems.
HTML
Generate rich HTML output with styling, suitable for web display and further processing.
JSON
Serialize to a lossless JSON representation that preserves all document structure, metadata, and relationships. Ideal for programmatic processing and storage.
DocTags
Export to DocTags, a structured tag format designed for AI and RAG systems that provides semantic markup of document content.
Plain Text
Simple text extraction for basic use cases and compatibility with legacy systems.
Learn more about export options in the documentation.
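The export formats listed above can be produced from one converted document. The sketch below assumes the `DoclingDocument` methods `export_to_markdown`, `export_to_html`, `export_to_dict`, `export_to_doctags`, and `export_to_text` exist under those names in your installed release. The helper names and file extensions are choices made for this example.

```python
import json
from pathlib import Path


def render_exports(doc):
    """Return each export of a DoclingDocument-like object, keyed by file extension."""
    return {
        "md": doc.export_to_markdown(),
        "html": doc.export_to_html(),
        "json": json.dumps(doc.export_to_dict()),
        "doctags": doc.export_to_doctags(),
        "txt": doc.export_to_text(),
    }


def write_exports(doc, out_dir, stem):
    """Write every export to out_dir as <stem>.<ext> and return the paths written."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for ext, content in render_exports(doc).items():
        path = out_dir / f"{stem}.{ext}"
        path.write_text(content, encoding="utf-8")
        paths.append(path)
    return paths
```

In practice `doc` would be the `document` attribute of a conversion result. JSON is the format to keep if you want to reload the full structure later, while Markdown and plain text are lossy views.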
AI Framework Integrations
Docling provides plug-and-play integrations with popular AI frameworks, making it easy to incorporate document processing into your AI applications.
LangChain
Native integration with LangChain document loaders, enabling seamless document processing in LangChain workflows.
LlamaIndex
Integrate Docling with LlamaIndex for RAG (Retrieval-Augmented Generation) applications.
Crew AI
Use Docling with Crew AI for multi-agent document processing workflows.
Haystack
Integration with Haystack for enterprise-grade document processing pipelines.
MCP Server
Connect Docling to any MCP-capable agent through its Model Context Protocol (MCP) server, so that a wide range of AI agents and tools can call Docling.
Explore all available integrations and learn how to use them.
Local Execution & Security
Docling is designed with security and privacy in mind, supporting local execution for sensitive data.
Local Processing
Run all document processing locally on your infrastructure. No data leaves your environment, ensuring complete privacy and security.
Air-Gapped Environments
Fully functional in air-gapped environments without requiring internet connectivity or external services.
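One possible offline setup, sketched under assumptions: the model files have already been copied to a local directory, and the PDF pipeline options accept an `artifacts_path` pointing at that directory. The function name `offline_converter` and any directory you pass in are hypothetical, so verify the option name against your installed release.

```python
from pathlib import Path


def offline_converter(artifacts_path):
    """Build a converter that loads models from a local directory only."""
    artifacts_dir = Path(artifacts_path)
    if not artifacts_dir.is_dir():
        # Fail early instead of letting a missing directory trigger a download attempt.
        raise FileNotFoundError(f"Model directory not found: {artifacts_dir}")

    # Imported lazily so this module loads even where docling is not installed.
    from docling.datamodel.base_models import InputFormat
    from docling.datamodel.pipeline_options import PdfPipelineOptions
    from docling.document_converter import DocumentConverter, PdfFormatOption

    options = PdfPipelineOptions(artifacts_path=str(artifacts_dir))
    return DocumentConverter(
        format_options={InputFormat.PDF: PdfFormatOption(pipeline_options=options)}
    )
```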
Data Privacy
Your documents never leave your control. All processing happens on your hardware, giving you complete control over sensitive information.
Command-Line Interface
Docling includes a powerful and convenient CLI for quick document conversions directly from your terminal.
Simple Usage
Convert documents with a single command, supporting URLs and local file paths.
Pipeline Options
Choose between different processing pipelines, including VLM (Visual Language Model) support for enhanced understanding.
Output Formats
Specify output formats and options directly from the command line.
Learn more about the CLI and its capabilities.
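For scripting, the CLI can also be driven from Python. The sketch below assembles a command line using the `--to`, `--output`, and `--pipeline` options as understood at the time of writing, so confirm them with `docling --help` on your installation. The helper names and the example sources are hypothetical.

```python
import shutil
import subprocess


def docling_command(source, to="md", output_dir=None, pipeline=None):
    """Build the argument list for one `docling` CLI conversion."""
    cmd = ["docling", "--to", to]
    if output_dir is not None:
        cmd += ["--output", str(output_dir)]
    if pipeline is not None:
        cmd += ["--pipeline", pipeline]
    cmd.append(source)  # local path or URL
    return cmd


def run_docling(source, **kwargs):
    """Run the conversion, raising if the CLI is missing or exits with an error."""
    if shutil.which("docling") is None:
        raise RuntimeError("docling CLI not found on PATH")
    return subprocess.run(docling_command(source, **kwargs), check=True)


print(docling_command("report.pdf"))
# ['docling', '--to', 'md', 'report.pdf']
print(docling_command("https://example.com/report.pdf", to="json", pipeline="vlm"))
# ['docling', '--to', 'json', '--pipeline', 'vlm', 'https://example.com/report.pdf']
```

Passing the arguments as a list avoids shell quoting problems with paths or URLs that contain spaces or special characters.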
Structured Information Extraction
Docling includes beta support for structured information extraction, allowing you to extract specific information from documents in a structured format.
Metadata Extraction
Extract document metadata including titles, authors, references, and language information (coming soon).
Chart Understanding
Understand and extract data from charts including bar charts, pie charts, line plots, and more (coming soon).
Chemistry Understanding
Process complex chemistry documents including molecular structures (coming soon).
What's New
- Structured information extraction [beta]
- New layout model (Heron) by default, for faster PDF parsing
- MCP server for agentic applications
- Parsing of Web Video Text Tracks (WebVTT) files