Basic Usage

The Docling CLI converts documents from the command line, with no Python code required.

Convert from URL

Terminal
docling https://arxiv.org/pdf/2206.01062

This downloads and converts the PDF, then writes the result as a Markdown file in the current directory. The file is named after the source document.

Convert Local File

Terminal
docling document.pdf

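Specify Output Directory

Terminal
docling document.pdf --output converted

The --output option takes a directory rather than a file name. The result is written there under the input's base name, here converted/document.md. The default is the current directory.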

Command Options

Output Format

Select the output format with --to:

Terminal
# Markdown (default)
docling document.pdf --to md

# HTML
docling document.pdf --to html

# JSON
docling document.pdf --to json

# DocTags
docling document.pdf --to doctags

Pipeline Selection

Choose the processing pipeline:

Terminal
# Standard pipeline (default)
docling document.pdf --pipeline standard

# VLM pipeline
docling document.pdf --pipeline vlm

Visual Language Models

Convert with a Visual Language Model (VLM) and select the model with --vlm-model:

Terminal
docling --pipeline vlm --vlm-model granite_docling document.pdf

On Apple Silicon, MLX acceleration is automatically used when available.

Advanced Options

Version Information

Terminal
docling --version

Help

Terminal
docling --help

Verbose Output

Log details of the conversion process. The short form -v enables info-level logging, and -vv enables debug-level logging:

Terminal
docling document.pdf --verbose

Common Use Cases

Quick Document Preview

Convert a document, then print the first 50 lines of the Markdown output:

Terminal
docling document.pdf --output preview
head -50 preview/document.md

Batch Conversion

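Convert multiple documents with a shell loop. Each result is written to the converted directory under the input's base name:

Terminal
mkdir -p converted
for file in *.pdf; do
    [ -e "$file" ] || continue  # skip the literal pattern when no PDF matches
    docling "$file" --output converted
done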
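
For repeated runs over a growing folder, it can help to skip files that were already converted. The sketch below is one way to do that, under two assumptions: that --output names a directory, and that the Markdown result is written there as the input's base name with an .md extension. The function name convert_new and the default directory name converted are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: convert only PDFs whose Markdown output is missing or older than the source.
# Assumption: `docling FILE --output DIR` writes DIR/<base name>.md.

convert_new() {
    local outdir="${1:-converted}"   # hypothetical default directory name
    if [ "$#" -gt 0 ]; then shift; fi
    local file base out
    mkdir -p "$outdir"
    for file in "$@"; do
        [ -f "$file" ] || continue   # skip unmatched globs and non-files
        base="$(basename "$file" .pdf)"
        out="$outdir/$base.md"
        if [ -e "$out" ] && [ "$out" -nt "$file" ]; then
            echo "Up to date: $file"
            continue
        fi
        docling "$file" --output "$outdir" || echo "Failed: $file" >&2
    done
}
```

A call such as convert_new converted *.pdf would then reconvert only the PDFs that changed since the last run.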

Pipeline Integration

Docling writes its result to a file, so convert first and then pass that file to other tools:

Terminal
docling document.pdf --output converted
grep -i "keyword" converted/document.md | head -20

Save to File

To save the result under a specific file name, convert into a directory and then rename the output:

Terminal
docling document.pdf --output converted
mv converted/document.md output.md

Supported Input Formats

The CLI supports all formats that Docling can process:

  • PDF files
  • Word documents (DOCX)
  • PowerPoint presentations (PPTX)
  • Excel spreadsheets (XLSX)
  • Markdown files
  • HTML files
  • Images (PNG, JPEG, TIFF, BMP, WEBP)
  • Audio files (MP3, WAV)
  • Other formats, including CSV and AsciiDoc

See all supported formats.

Output Formats

Choose from multiple output formats:

  • Markdown - Human-readable format with tables and formatting
  • HTML - Rich HTML output with styling
  • JSON - Lossless JSON representation
  • DocTags - Structured format for AI systems

Performance Tips

Large Documents

For very large documents, consider:

  • Using the standard (default) pipeline for faster processing
  • Splitting the document and processing it in chunks if needed
  • Using the VLM pipeline only when enhanced understanding is required

Multiple Documents

For batch processing, run several conversions in parallel:

Terminal
# Using GNU parallel (if available)
parallel docling {} --output converted ::: *.pdf

# Using xargs, four conversions at a time
find . -name "*.pdf" -print0 | xargs -0 -n 1 -P 4 docling --output converted

Error Handling

The CLI provides clear error messages for common issues:

  • File not found errors
  • Unsupported format errors
  • Conversion errors

For debugging, use the --verbose flag to get detailed error information.
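
In scripts, a failed conversion is easier to handle when it is detected explicitly. The following sketch wraps a single local-file conversion, checks the exit status, and also confirms that an output file appeared. It assumes that --output names a directory and that the Markdown result lands there under the input's base name. The function name convert_checked is hypothetical. Whether every failure yields a non-zero exit status is not confirmed here, which is why the file check is included.

```shell
#!/usr/bin/env bash
# Sketch: run one conversion and report failure through the return code.
# Assumption: `docling FILE --output DIR` writes DIR/<base name>.md.

convert_checked() {
    local src="$1" outdir="${2:-.}"
    local stem expected

    if [ ! -f "$src" ]; then
        echo "error: input not found: $src" >&2
        return 2
    fi

    stem="$(basename "$src")"
    stem="${stem%.*}"             # drop the last extension, e.g. report.pdf -> report
    expected="$outdir/$stem.md"

    if ! docling "$src" --output "$outdir"; then
        echo "error: docling exited with a failure status for $src" >&2
        echo "hint: rerun with --verbose for details" >&2
        return 1
    fi

    if [ ! -s "$expected" ]; then
        echo "error: expected output missing or empty: $expected" >&2
        return 1
    fi

    echo "$expected"              # print the output path on success
}
```

A stale file left over from an earlier run would pass the second check, so clear the output directory first if that matters.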

Examples

Convert Research Paper

Terminal
docling https://arxiv.org/pdf/2408.09869 --output papers

Convert with VLM

Terminal
docling --pipeline vlm --vlm-model granite_docling complex_document.pdf --output converted

Extract Tables to JSON

The JSON export contains the full document, including its tables:

Terminal
docling document.pdf --to json --output converted
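
Because the JSON export covers the whole document, the tables still have to be picked out of it afterwards. A minimal sketch with jq (a separate tool that must be installed) follows. It assumes the exported JSON has a top-level "tables" array, as in the DoclingDocument format, and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: inspect tables in a Docling JSON export with jq.
# Assumption: the export has a top-level "tables" array.

# Print the number of tables in the export (0 when the key is absent).
count_tables() {
    jq '(.tables // []) | length' "$1"
}

# Write only the tables array to a separate file.
extract_tables() {
    jq '.tables // []' "$1" > "$2"
}
```

For example, extract_tables document.json tables.json would leave just the table entries in tables.json, where document.json stands for whatever file the conversion produced.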

Integration with Scripts

The CLI can be easily integrated into shell scripts and automation workflows:

Bash Script
#!/bin/bash
# Convert all PDFs in a directory

INPUT_DIR="documents"
OUTPUT_DIR="converted"

mkdir -p "$OUTPUT_DIR"

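for pdf in "$INPUT_DIR"/*.pdf; do
    [ -e "$pdf" ] || continue  # no PDFs matched the pattern
    filename=$(basename "$pdf" .pdf)
    # --output takes a directory; the result is written as $OUTPUT_DIR/$filename.md
    if docling "$pdf" --output "$OUTPUT_DIR"; then
        echo "Converted: $filename"
    else
        echo "Failed: $filename" >&2
    fi
done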

Comparison with Python API

The CLI is well suited to:

  • Quick conversions
  • Shell scripting
  • One-off document processing
  • Testing and debugging

For more complex workflows, use the Python API.

Next Steps

  • Try converting your first document
  • Explore different output formats
  • Check out examples for more use cases
  • Read the documentation for advanced features