AI Framework Integrations

Docling provides native integrations with leading AI frameworks, making it easy to incorporate advanced document processing into your AI applications.

LangChain

Use Docling as a document loader to bring converted documents directly into your LangChain workflows.

Installation

Terminal
pip install langchain-docling langchain langchain-community langchain-openai faiss-cpu

Basic Usage

Python
from langchain_docling import DoclingLoader

# Load document (returned as pre-chunked pieces by default)
loader = DoclingLoader(file_path="document.pdf")
documents = loader.load()

# Use in a LangChain retrieval chain
from langchain.chains import RetrievalQA
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()
vectorstore = FAISS.from_documents(documents, embeddings)

qa_chain = RetrievalQA.from_chain_type(
    llm=your_llm,  # any LangChain LLM or chat model instance
    retriever=vectorstore.as_retriever()
)

response = qa_chain.invoke({"query": "What is this document about?"})
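
By default, DoclingLoader returns the document as pre-chunked pieces ready for retrieval. Depending on your langchain-docling version, an export_type option (assumed available here) lets you load the full document as a single Markdown document instead:

Python
from langchain_docling import DoclingLoader
from langchain_docling.loader import ExportType

# Load the whole document as one Markdown document
# rather than the default chunked output
loader = DoclingLoader(
    file_path="document.pdf",
    export_type=ExportType.MARKDOWN
)
documents = loader.load()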

Advanced Configuration

Python
from docling.datamodel.base_models import InputFormat
from docling.document_converter import DocumentConverter, PdfFormatOption
from docling.pipeline.vlm_pipeline import VlmPipeline
from langchain_docling import DoclingLoader

# Use Docling's VLM pipeline (GraniteDocling by default in recent releases)
converter = DocumentConverter(
    format_options={InputFormat.PDF: PdfFormatOption(pipeline_cls=VlmPipeline)}
)

loader = DoclingLoader(
    file_path="document.pdf",
    converter=converter
)
documents = loader.load()

Learn more: LangChain Docling Documentation

LlamaIndex

Use Docling with LlamaIndex for RAG (Retrieval-Augmented Generation) applications.

Installation

Terminal
pip install llama-index llama-index-readers-docling

Basic Usage

Python
from llama_index.core import VectorStoreIndex
from llama_index.readers.docling import DoclingReader

# Load documents
reader = DoclingReader()
documents = reader.load_data("document.pdf")

# Create index
index = VectorStoreIndex.from_documents(documents)

# Query
query_engine = index.as_query_engine()
response = query_engine.query("Summarize this document")
print(response)
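
Docling exports structured Markdown, so LlamaIndex's Markdown-aware node parser is a natural companion to the reader. A minimal sketch:

Python
from llama_index.core import VectorStoreIndex
from llama_index.core.node_parser import MarkdownNodeParser
from llama_index.readers.docling import DoclingReader

reader = DoclingReader()
documents = reader.load_data("document.pdf")

# Split nodes along Markdown structure (headings, sections)
index = VectorStoreIndex.from_documents(
    documents,
    transformations=[MarkdownNodeParser()]
)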

With Custom Configuration

Python
from docling.datamodel.base_models import InputFormat
from docling.document_converter import DocumentConverter, PdfFormatOption
from docling.pipeline.vlm_pipeline import VlmPipeline
from llama_index.readers.docling import DoclingReader

# Pass a custom Docling converter (here: the VLM pipeline) to the reader
reader = DoclingReader(
    doc_converter=DocumentConverter(
        format_options={InputFormat.PDF: PdfFormatOption(pipeline_cls=VlmPipeline)}
    )
)
documents = reader.load_data("document.pdf")

CrewAI

Integrate Docling with CrewAI for multi-agent document processing workflows.

Installation

Terminal
pip install crewai docling

Usage Example

Python
from crewai import Agent, Task, Crew
from docling.document_converter import DocumentConverter

# Convert document with Docling
converter = DocumentConverter()
result = converter.convert("document.pdf")
document_content = result.document.export_to_markdown()

# Use the converted content in a CrewAI agent
researcher = Agent(
    role='Research Analyst',
    goal='Analyze documents and extract insights',
    backstory='Expert at document analysis',
    verbose=True
)

task = Task(
    description=f'Analyze this document: {document_content[:1000]}...',
    expected_output='A concise list of key insights from the document',
    agent=researcher
)

crew = Crew(
    agents=[researcher],
    tasks=[task]
)

result = crew.kickoff()
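
Recent CrewAI releases also include a native Docling knowledge source that handles conversion for you. A sketch building on the agent and task above, assuming your CrewAI version ships CrewDoclingSource:

Python
from crewai import Crew
from crewai.knowledge.source.crew_docling_source import CrewDoclingSource

# CrewAI parses these files with Docling under the hood
knowledge = CrewDoclingSource(file_paths=["document.pdf"])

crew = Crew(
    agents=[researcher],
    tasks=[task],
    knowledge_sources=[knowledge]
)
result = crew.kickoff()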

Haystack

Use Docling with Haystack for enterprise-grade document processing pipelines.

Installation

Terminal
pip install docling-haystack haystack-ai

Basic Usage

Python
from haystack import Pipeline
from haystack.components.writers import DocumentWriter
from haystack.document_stores.in_memory import InMemoryDocumentStore
from docling_haystack.converter import DoclingConverter

# Docling's converter is a regular Haystack 2.x component
document_store = InMemoryDocumentStore()

indexing_pipeline = Pipeline()
indexing_pipeline.add_component("converter", DoclingConverter())
indexing_pipeline.add_component("writer", DocumentWriter(document_store=document_store))
indexing_pipeline.connect("converter", "writer")

# Convert and index the document in one run
indexing_pipeline.run({"converter": {"paths": ["document.pdf"]}})
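
Once documents are indexed, querying is just a second Haystack pipeline. For example, with the in-memory BM25 retriever:

Python
from haystack import Pipeline
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever

query_pipeline = Pipeline()
query_pipeline.add_component(
    "retriever", InMemoryBM25Retriever(document_store=document_store)
)

results = query_pipeline.run({"retriever": {"query": "What is this document about?"}})
print(results["retriever"]["documents"])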

MCP Server

Connect Docling to any agent via the Model Context Protocol (MCP). Docling's MCP server enables integration with a wide range of AI agents and tools.

Features

  • Standardized interface for document processing
  • Works with any MCP-compatible agent
  • Real-time document conversion
  • Support for multiple document formats

Usage

The MCP server lets agents request document conversions through a standardized protocol, so any MCP-compatible agent or development tool can call on Docling for real-time document processing.
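
As a quick-start sketch (the package and command names below are assumptions; check the Docling MCP project for the exact invocation):

Terminal
pip install docling-mcp
docling-mcp-server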

Custom Integrations

You can easily integrate Docling into any Python application or framework.

Basic Integration Pattern

Python
from docling.document_converter import DocumentConverter

def process_document_for_your_framework(file_path):
    """Convert a document and return it in your framework's format."""
    converter = DocumentConverter()
    result = converter.convert(file_path)

    # Export to your preferred format
    markdown = result.document.export_to_markdown()

    # Replace this with your framework's document type;
    # a plain dict with content and metadata works as a starting point
    return [{"content": markdown, "metadata": {"source": file_path}}]

# Use in your application
documents = process_document_for_your_framework("document.pdf")

Integration Best Practices

1. Error Handling

Always wrap document conversion in try-except blocks:

Python
from docling.document_converter import DocumentConverter
from docling.exceptions import ConversionError

try:
    converter = DocumentConverter()
    result = converter.convert("document.pdf")
    markdown = result.document.export_to_markdown()
except ConversionError as e:
    # Handle Docling conversion errors
    print(f"Docling conversion error: {e}")
except Exception as e:
    # Handle other errors
    print(f"Error: {e}")

2. Caching

Cache converted documents to avoid reprocessing:

Python
import hashlib
import os
import pickle

from docling.document_converter import DocumentConverter

def get_cached_document(file_path):
    cache_dir = ".docling_cache"
    os.makedirs(cache_dir, exist_ok=True)
    
    # Create cache key from file hash
    with open(file_path, "rb") as f:
        file_hash = hashlib.md5(f.read()).hexdigest()
    
    cache_path = os.path.join(cache_dir, f"{file_hash}.pkl")
    
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    
    # Convert and cache
    converter = DocumentConverter()
    result = converter.convert(file_path)
    
    with open(cache_path, "wb") as f:
        pickle.dump(result.document, f)
    
    return result.document
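
Pickle ties the cache to your Python environment; since DoclingDocument is a Pydantic model, JSON is a more portable alternative. A sketch:

Python
import json

from docling_core.types.doc import DoclingDocument

def save_document(document, path):
    # Serialize the DoclingDocument to portable JSON
    with open(path, "w", encoding="utf-8") as f:
        json.dump(document.export_to_dict(), f)

def load_document(path):
    with open(path, encoding="utf-8") as f:
        return DoclingDocument.model_validate(json.load(f))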

3. Batch Processing

Process multiple documents efficiently:

Python
from docling.document_converter import DocumentConverter
from concurrent.futures import ThreadPoolExecutor

def process_documents(file_paths):
    # Reuse a single converter so models are loaded only once
    converter = DocumentConverter()

    def convert_one(path):
        return converter.convert(path).document
    
    with ThreadPoolExecutor(max_workers=4) as executor:
        documents = list(executor.map(convert_one, file_paths))
    
    return documents
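
Docling also provides a built-in batch API, convert_all, which yields one conversion result per input and avoids managing a thread pool yourself:

Python
from docling.document_converter import DocumentConverter

converter = DocumentConverter()

# convert_all yields a ConversionResult per input file
results = converter.convert_all(["a.pdf", "b.pdf", "c.pdf"])
documents = [res.document for res in results]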

Integration Examples

See more detailed examples in our examples section.

Getting Help