Basic Usage
The Docling CLI provides a simple and powerful way to convert documents without writing any Python code.
Convert from URL
docling https://arxiv.org/pdf/2206.01062
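This downloads and converts the PDF, then writes the result as a Markdown file to the current directory, named after the source document. The converted document is saved to a file, not printed to stdout.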
Convert Local File
docling document.pdf
This writes document.md to the current directory.
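Specify Output Directory
The --output option sets the directory where results are saved (by default, the current directory). Output files are named after the input file, so the following writes converted/document.md:
docling document.pdf --output converted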
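Since --output names a directory rather than a file, scripts sometimes need to predict which file a conversion produces. The minimal sketch below assumes that Docling names Markdown output after the input's file stem (document.pdf becomes document.md); docling_md_path is a hypothetical helper, not part of Docling.

```shell
# Hypothetical helper (not part of Docling): print the Markdown path that
# `docling SRC --output OUT_DIR` is assumed to write, i.e. OUT_DIR/<stem>.md.
docling_md_path() {
  src=$1
  out_dir=${2:-.}            # docling's --output defaults to the current directory
  name=$(basename "$src")    # drop any leading directories
  printf '%s/%s.md\n' "$out_dir" "${name%.*}"   # drop the last extension
}

# Usage: convert, then read the result from the predicted location.
# docling reports/q3.pdf --output converted
# head -20 "$(docling_md_path reports/q3.pdf converted)"
```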
Command Options
Output Format
Specify the output format with --to:
# Markdown (default)
docling document.pdf --to md
# HTML
docling document.pdf --to html
# JSON
docling document.pdf --to json
# DocTags
docling document.pdf --to doctags
--to can be repeated to produce several formats in one run:
docling document.pdf --to md --to json
Pipeline Selection
Choose the processing pipeline:
# Standard pipeline (default)
docling document.pdf --pipeline standard
# VLM pipeline
docling document.pdf --pipeline vlm
Visual Language Models
Use Visual Language Models for enhanced document understanding:
docling --pipeline vlm --vlm-model granite_docling document.pdf
On Apple Silicon, MLX acceleration is automatically used when available.
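The VLM pipeline downloads additional models and needs more compute, so an unattended script may want a fallback. The sketch below assumes docling exits with a non-zero status when a conversion fails, and simply reruns the file with the standard pipeline; convert_prefer_vlm is a hypothetical name, not a Docling feature.

```shell
# Hypothetical wrapper (not part of Docling): try the VLM pipeline first and
# rerun with the standard pipeline if docling exits with a non-zero status.
convert_prefer_vlm() {
  src=$1
  out_dir=$2
  if docling "$src" --pipeline vlm --vlm-model granite_docling --output "$out_dir"; then
    return 0
  fi
  echo "VLM pipeline failed for $src; retrying with --pipeline standard" >&2
  docling "$src" --pipeline standard --output "$out_dir"
}
```

For example: convert_prefer_vlm complex_document.pdf converted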
Advanced Options
Version Information
docling --version
Help
docling --help
Verbose Output
Get detailed information about the conversion process (-v for info logging, -vv for debug logging):
docling document.pdf -v
docling document.pdf -vv
Common Use Cases
Quick Document Preview
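Quickly preview a document's content by converting into a scratch directory and reading the first lines:
docling document.pdf --output /tmp/docling-preview && head -50 /tmp/docling-preview/document.md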
Batch Conversion
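Convert multiple documents using shell scripting:
for file in *.pdf; do
  docling "$file" --output converted
done
docling also accepts several sources, or a directory, in one invocation, which avoids starting a new process (and reloading models) for every file:
docling *.pdf --output converted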
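Re-running a batch loop converts every file again. Below is a minimal sketch that skips PDFs whose Markdown output already exists, assuming outputs are named <stem>.md inside the --output directory; convert_missing is a hypothetical helper.

```shell
# Hypothetical helper (not part of Docling): convert each PDF in IN_DIR into
# OUT_DIR, skipping files whose <stem>.md (assumed output name) already exists.
convert_missing() {
  in_dir=$1
  out_dir=$2
  mkdir -p "$out_dir"
  for pdf in "$in_dir"/*.pdf; do
    [ -e "$pdf" ] || continue   # the glob matched no files
    stem=$(basename "$pdf" .pdf)
    if [ -e "$out_dir/$stem.md" ]; then
      echo "skip (already converted): $stem" >&2
      continue
    fi
    docling "$pdf" --output "$out_dir" || echo "failed: $pdf" >&2
  done
}
```

For example: convert_missing documents converted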
Pipeline Integration
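Because results are written to files, convert first and then feed the output file into a shell pipeline:
docling document.pdf --output converted && grep -i "keyword" converted/document.md | head -20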
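The CLI writes its results to files rather than to stdout. If you pipe converted documents often, a small wrapper can convert into a scratch directory and print the Markdown, as sketched below. docling_cat is a hypothetical helper, and it assumes the scratch directory receives one .md file per conversion.

```shell
# Hypothetical wrapper (not part of Docling): convert SRC into a temporary
# directory and print the resulting Markdown on stdout, so it can be piped.
docling_cat() {
  src=$1
  tmp=$(mktemp -d) || return 1
  # Keep anything docling itself prints on stdout out of the pipeline.
  docling "$src" --output "$tmp" >/dev/null
  status=$?
  if [ "$status" -eq 0 ]; then
    cat "$tmp"/*.md   # the scratch directory holds only this conversion's output
    status=$?
  fi
  rm -rf "$tmp"
  return "$status"
}
```

For example: docling_cat document.pdf | grep -i "keyword" | head -20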
Save to File
Results are always written to files rather than to stdout. Use --output to choose the target directory:
docling document.pdf --output converted
# writes converted/document.md
Supported Input Formats
The CLI supports all formats that Docling can process:
- PDF files
- Word documents (DOCX)
- PowerPoint presentations (PPTX)
- Excel spreadsheets (XLSX)
- Markdown files
- HTML files
- Images (PNG, JPEG, TIFF, BMP, WEBP)
- Audio files (MP3, WAV)
- And more...
Output Formats
Choose from multiple output formats:
- Markdown (--to md): human-readable format with tables and formatting
- HTML (--to html): rich HTML output with styling
- JSON (--to json): lossless JSON representation of the document
- DocTags (--to doctags): structured markup format for AI systems
Performance Tips
Large Documents
For very large documents, consider:
- Using the standard pipeline for faster processing
- Splitting very large PDFs into smaller files before conversion
- Using the VLM pipeline only when enhanced understanding is required
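To decide whether the VLM pipeline is worth its cost on your documents, time both pipelines on a representative file. This is a rough sketch with one-second wall-clock resolution (via date +%s); time_pipelines is hypothetical, and --pipeline vlm here uses whichever VLM model docling selects by default.

```shell
# Hypothetical helper (not part of Docling): convert SRC once per pipeline and
# print "<pipeline> <ok|failed> <seconds>s" (tab-separated, coarse timing).
time_pipelines() {
  src=$1
  out_root=${2:-pipeline-timing}
  for pipeline in standard vlm; do
    mkdir -p "$out_root/$pipeline"
    start=$(date +%s)
    if docling "$src" --pipeline "$pipeline" --output "$out_root/$pipeline" >/dev/null 2>&1; then
      result=ok
    else
      result=failed
    fi
    end=$(date +%s)
    printf '%s\t%s\t%ss\n' "$pipeline" "$result" "$((end - start))"
  done
}
```

The first run may include model downloads, so run it twice before comparing the numbers.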
Multiple Documents
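For batch processing, run several conversions in parallel. Each docling process loads its own models, so keep the number of parallel jobs within your available memory:
# Using GNU parallel (if available)
parallel -j 4 docling {} --output converted ::: *.pdf
# Using xargs (-P sets the number of parallel jobs; -print0/-0 handle spaces in names)
find . -name "*.pdf" -print0 | xargs -0 -n 1 -P 4 docling --output converted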
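When PDFs from different folders are converted into one --output directory, two files with the same name (say a/report.pdf and b/report.pdf) are assumed to map to the same output file, so one conversion would overwrite the other. duplicate_stems is a hypothetical pre-flight check that lists such names before a parallel run:

```shell
# Hypothetical pre-flight check (not part of Docling): print PDF stems that
# occur more than once under DIR; outputs sharing one --output directory are
# assumed to be named <stem>.md and would overwrite each other.
duplicate_stems() {
  find "${1:-.}" -type f -name '*.pdf' -exec basename {} .pdf \; | sort | uniq -d
}
```

Empty output means every PDF maps to a distinct output name.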
Error Handling
The CLI provides clear error messages for common issues:
- File not found errors
- Unsupported format errors
- Conversion errors
For debugging, add -v for info logging or -vv for debug logging to get detailed error information.
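Debug output can be long, so it is often easier to capture it in a file and keep it only when something goes wrong. The sketch below assumes docling's log messages go to stderr (Python's logging default) and that a failed conversion yields a non-zero exit status; convert_with_log is a hypothetical wrapper.

```shell
# Hypothetical wrapper (not part of Docling): run a conversion with debug
# logging (-vv), keep the stderr log only if docling exits non-zero.
convert_with_log() {
  src=$1
  out_dir=$2
  log=$(mktemp) || return 1
  docling "$src" -vv --output "$out_dir" 2>"$log"
  status=$?
  if [ "$status" -eq 0 ]; then
    rm -f "$log"
  else
    echo "conversion of $src failed (exit $status); debug log kept at $log" >&2
  fi
  return "$status"
}
```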
Examples
Convert Research Paper
docling https://arxiv.org/pdf/2408.09869 --output papers
Convert with VLM
docling --pipeline vlm --vlm-model granite_docling complex_document.pdf --output converted
Extract Tables to JSON
docling document.pdf --to json --output converted
The resulting converted/document.json holds the full document, including its tables.
Integration with Scripts
The CLI can be easily integrated into shell scripts and automation workflows:
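#!/bin/bash
# Convert all PDFs in a directory
INPUT_DIR="documents"
OUTPUT_DIR="converted"
mkdir -p "$OUTPUT_DIR"
for pdf in "$INPUT_DIR"/*.pdf; do
  [ -e "$pdf" ] || continue  # no PDFs matched the glob
  filename=$(basename "$pdf" .pdf)
  # docling names the output after the input: "$OUTPUT_DIR/$filename.md"
  if docling "$pdf" --output "$OUTPUT_DIR"; then
    echo "Converted: $filename"
  else
    echo "Failed: $filename" >&2
  fi
done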
Comparison with Python API
The CLI is well suited for:
- Quick conversions
- Shell scripting
- One-off document processing
- Testing and debugging
For more complex workflows, use the Python API.
Next Steps
- Try converting your first document
- Explore different output formats
- Check out examples for more use cases
- Read the documentation for advanced features