AI Framework Integrations
Docling provides native integrations with leading AI frameworks, making it easy to incorporate advanced document processing into your AI applications.
LangChain
Docling provides a document loader for LangChain, so you can bring converted documents directly into your LangChain workflows.
Installation
pip install langchain-docling
Basic Usage
from langchain_docling import DoclingLoader
# Load document
loader = DoclingLoader("document.pdf")
documents = loader.load()
# Use in a LangChain RAG chain
# (also requires: pip install langchain langchain-community langchain-openai faiss-cpu)
from langchain.chains import RetrievalQA
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings
embeddings = OpenAIEmbeddings()
vectorstore = FAISS.from_documents(documents, embeddings)
qa_chain = RetrievalQA.from_chain_type(
    llm=your_llm,  # any LangChain LLM or chat model instance
    retriever=vectorstore.as_retriever(),
)
response = qa_chain.invoke({"query": "What is this document about?"})
print(response["result"])
Advanced Configuration
To run the conversion through Docling's VLM pipeline (for example with the Granite-Docling model), configure a DocumentConverter yourself and pass it to the loader:
from docling.datamodel.base_models import InputFormat
from docling.document_converter import DocumentConverter, PdfFormatOption
from docling.pipeline.vlm_pipeline import VlmPipeline
from langchain_docling import DoclingLoader
# Use the VLM pipeline for PDF conversion
converter = DocumentConverter(
    format_options={InputFormat.PDF: PdfFormatOption(pipeline_cls=VlmPipeline)}
)
loader = DoclingLoader(
    "document.pdf",
    converter=converter,
)
documents = loader.load()
Learn more: LangChain Docling Documentation
LlamaIndex
Use Docling with LlamaIndex for RAG (Retrieval-Augmented Generation) applications.
Installation
pip install llama-index llama-index-readers-docling
Basic Usage
from llama_index.readers.docling import DoclingReader
from llama_index.core import VectorStoreIndex
# Load documents
reader = DoclingReader()
documents = reader.load_data("document.pdf")
# Create index
index = VectorStoreIndex.from_documents(documents)
# Query
query_engine = index.as_query_engine()
response = query_engine.query("Summarize this document")
print(response)
With Custom Configuration
To run the conversion through Docling's VLM pipeline (for example with the Granite-Docling model), pass a configured DocumentConverter to the reader:
from docling.datamodel.base_models import InputFormat
from docling.document_converter import DocumentConverter, PdfFormatOption
from docling.pipeline.vlm_pipeline import VlmPipeline
from llama_index.readers.docling import DoclingReader
converter = DocumentConverter(
    format_options={InputFormat.PDF: PdfFormatOption(pipeline_cls=VlmPipeline)}
)
reader = DoclingReader(doc_converter=converter)
documents = reader.load_data("document.pdf")
CrewAI
Integrate Docling with CrewAI for multi-agent document processing workflows.
Installation
pip install crewai docling
Usage Example
from crewai import Agent, Task, Crew
from docling.document_converter import DocumentConverter
# Convert document with Docling
converter = DocumentConverter()
result = converter.convert("document.pdf")
document_content = result.document.export_to_markdown()
# Use in a CrewAI agent
researcher = Agent(
    role='Research Analyst',
    goal='Analyze documents and extract insights',
    backstory='Expert at document analysis',
    verbose=True
)
task = Task(
    description=f'Analyze this document: {document_content[:1000]}...',
    expected_output='A concise summary of the key insights in the document',
    agent=researcher
)
crew = Crew(
    agents=[researcher],
    tasks=[task]
)
result = crew.kickoff()
Haystack
Use Docling with Haystack for enterprise-grade document processing pipelines.
Installation
pip install docling-haystack haystack-ai
Basic Usage
from docling_haystack.converter import DoclingConverter
from haystack import Pipeline
from haystack.components.writers import DocumentWriter
from haystack.document_stores.in_memory import InMemoryDocumentStore
# Document store that will hold the converted documents
document_store = InMemoryDocumentStore()
# Indexing pipeline: Docling converter -> document writer
pipeline = Pipeline()
pipeline.add_component("converter", DoclingConverter())
pipeline.add_component("writer", DocumentWriter(document_store=document_store))
pipeline.connect("converter", "writer")
# Convert and store the document
pipeline.run({"converter": {"paths": ["document.pdf"]}})
# ... add a retriever and a generator on top of the store for a full RAG pipeline (see the sketch below) ...
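To query the indexed content, you can put a retriever on top of the store. A minimal sketch using Haystack's built-in in-memory BM25 retriever (no embeddings required):
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
retriever = InMemoryBM25Retriever(document_store=document_store)
hits = retriever.run(query="What is this document about?")
for doc in hits["documents"][:3]:
    print(doc.content[:200])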
MCP Server
Docling can be connected to any MCP-compatible agent through its Model Context Protocol (MCP) server, which enables integration with a wide range of AI agents and tools.
Features
- Standardized interface for document processing
- Works with any MCP-compatible agent
- Real-time document conversion
- Support for multiple document formats
Usage
The MCP server lets agents request document conversions through a standardized protocol, so Docling can work with various AI agents and development tools for real-time document processing.
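As a minimal getting-started sketch, you would install the server package, launch it, and then register it in your MCP client's server configuration; the package and entry-point names below (docling-mcp, docling-mcp-server) are assumptions based on the Docling MCP project and should be checked against its documentation:
# Assumed package and entry-point names -- verify against the Docling MCP project
pip install docling-mcp
docling-mcp-server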
Custom Integrations
You can easily integrate Docling into any Python application or framework.
Basic Integration Pattern
from docling.document_converter import DocumentConverter
def process_document_for_your_framework(file_path):
    """Convert document and return in your framework's format"""
    converter = DocumentConverter()
    result = converter.convert(file_path)
    # Export to your preferred format
    markdown = result.document.export_to_markdown()
    # Convert to your framework's document format
    # ... your conversion logic ...
    return your_framework_documents
# Use in your application
documents = process_document_for_your_framework("document.pdf")
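As a concrete, purely illustrative instance of this pattern, the sketch below returns plain text-plus-metadata dictionaries (the record layout is an assumption, not a required schema), which most frameworks can ingest directly:
from docling.document_converter import DocumentConverter
def load_as_records(file_path):
    """Convert a document into simple text + metadata records (illustrative only)."""
    converter = DocumentConverter()
    result = converter.convert(file_path)
    markdown = result.document.export_to_markdown()
    # One record per file; split further if your framework expects chunks
    return [{"text": markdown, "metadata": {"source": file_path}}]
records = load_as_records("document.pdf")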
Integration Best Practices
1. Error Handling
Always wrap document conversion in try-except blocks:
from docling.document_converter import DocumentConverter
from docling.exceptions import ConversionError
try:
    converter = DocumentConverter()
    result = converter.convert("document.pdf")
    markdown = result.document.export_to_markdown()
except ConversionError as e:
    # Handle Docling-specific conversion errors
    print(f"Docling conversion error: {e}")
except Exception as e:
    # Handle other errors
    print(f"Error: {e}")
2. Caching
Cache converted documents to avoid reprocessing:
import hashlib
import os
import pickle
from docling.document_converter import DocumentConverter
def get_cached_document(file_path):
    cache_dir = ".docling_cache"
    os.makedirs(cache_dir, exist_ok=True)
    # Create cache key from file hash
    with open(file_path, "rb") as f:
        file_hash = hashlib.md5(f.read()).hexdigest()
    cache_path = os.path.join(cache_dir, f"{file_hash}.pkl")
    # Return the cached document if one exists
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    # Convert and cache
    converter = DocumentConverter()
    result = converter.convert(file_path)
    with open(cache_path, "wb") as f:
        pickle.dump(result.document, f)
    return result.document
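If you'd rather not rely on pickle, the converted DoclingDocument can also be cached as JSON; a minimal sketch, assuming docling-core's export_to_dict() and the pydantic model_validate() loader:
import json
from docling_core.types.doc import DoclingDocument
def save_document_json(document, cache_path):
    """Serialize a DoclingDocument to a JSON cache file."""
    with open(cache_path, "w", encoding="utf-8") as f:
        json.dump(document.export_to_dict(), f)
def load_document_json(cache_path):
    """Rebuild a DoclingDocument from a JSON cache file."""
    with open(cache_path, "r", encoding="utf-8") as f:
        return DoclingDocument.model_validate(json.load(f))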
3. Batch Processing
Process multiple documents efficiently:
from docling.document_converter import DocumentConverter
from concurrent.futures import ThreadPoolExecutor
def process_documents(file_paths):
    converter = DocumentConverter()
    def convert_one(path):
        return converter.convert(path).document
    with ThreadPoolExecutor(max_workers=4) as executor:
        documents = list(executor.map(convert_one, file_paths))
    return documents
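DocumentConverter also exposes convert_all() for batch conversion, so you can hand it the whole list and let Docling iterate over the inputs; a minimal sketch:
from docling.document_converter import DocumentConverter
converter = DocumentConverter()
# convert_all yields one ConversionResult per input
results = converter.convert_all(["report.pdf", "slides.pptx"])
documents = [res.document for res in results]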
Integration Examples
See more detailed examples in our examples section.
Getting Help
- Check the documentation for detailed API reference
- View code examples for common use cases
- Visit the GitHub repository for issues and discussions
- Refer to framework-specific documentation for integration details