What is Paperless-ngx?
Paperless-ngx is a self-hosted document management system that turns physical documents into a searchable online archive. It runs OCR on your scans, indexes the extracted text, and lets you search everything through the web UI or a REST API.
Scanned receipts, invoices, letters — all searchable in seconds.
Quick Start
mkdir paperless && cd paperless
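wget https://raw.githubusercontent.com/paperless-ngx/paperless-ngx/main/docker/compose/docker-compose.sqlite.yml -O docker-compose.yml
# The compose file loads its settings from docker-compose.env, so fetch that as well
wget https://raw.githubusercontent.com/paperless-ngx/paperless-ngx/main/docker/compose/docker-compose.env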
docker compose up -d
docker compose run --rm webserver createsuperuser
Open http://localhost:8000.
The REST API
export PL_URL="http://localhost:8000/api"
export PL_TOKEN="your-token"
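If you would rather not copy the token out of the web UI, it can also be requested over HTTP. The sketch below assumes the `/api/token/` endpoint, which takes `username` and `password` form fields and replies with `{"token": "..."}`. `pl_request_token` and `pl_extract_token` are hypothetical helper names, and the `CURL` override is only a convenience for previewing the command.

```shell
# Hypothetical helper: ask the API for a token. $1 = username, $2 = password.
# Set CURL=echo to print the command instead of sending it.
pl_request_token() {
  ${CURL:-curl} -sS -X POST "${PL_URL:-http://localhost:8000/api}/token/" \
    --data-urlencode "username=$1" \
    --data-urlencode "password=$2"
}

# Hypothetical helper: read the JSON reply on stdin and print the token value.
# sed keeps this dependency-free; jq -r '.token' is the sturdier choice.
pl_extract_token() {
  sed -n 's/.*"token"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}
```

Typical use would be `export PL_TOKEN="$(pl_request_token admin 'your-password' | pl_extract_token)"`.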
Upload Documents
# correspondent and tags take numeric IDs, not names; repeat -F "tags=..." once per tag
curl -X POST "$PL_URL/documents/post_document/" \
-H "Authorization: Token $PL_TOKEN" \
-F "document=@invoice.pdf" \
-F "title=March Invoice" \
-F "correspondent=1" \
-F "tags=2" \
-F "tags=5"
After the upload, Paperless OCRs the document, extracts its text, classifies it, and indexes it for search.
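The upload call returns as soon as the file is queued, not when processing finishes. Here is a minimal sketch of waiting for the result, assuming `post_document/` replies with the task UUID as a quoted JSON string and that `/api/tasks/?task_id=<uuid>` reports a `status` of `SUCCESS` or `FAILURE`. The helper names and `PL_POLL_DELAY` are hypothetical, and the `sed` parsing is a stand-in for `jq -r '.[0].status'`.

```shell
# Strip quotes and whitespace from the JSON string returned by post_document/.
pl_task_id() {
  tr -d '"[:space:]'
}

# Read a tasks/ response on stdin and print the first "status" value found.
pl_task_status() {
  tr ',' '\n' \
    | sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' \
    | head -n 1
}

# Poll until the task finishes. $1 = task UUID, $2 = max attempts (default 30).
# Returns 0 on SUCCESS, 1 on FAILURE, 2 if the attempts run out.
pl_wait_for_task() {
  attempts=${2:-30}
  while [ "$attempts" -gt 0 ]; do
    status=$(${CURL:-curl} -sS "$PL_URL/tasks/?task_id=$1" \
      -H "Authorization: Token $PL_TOKEN" | pl_task_status)
    case "$status" in
      SUCCESS) return 0 ;;
      FAILURE) return 1 ;;
    esac
    attempts=$((attempts - 1))
    # Pause between polls (seconds); PL_POLL_DELAY is an assumed knob.
    [ "$attempts" -gt 0 ] && sleep "${PL_POLL_DELAY:-2}"
  done
  return 2
}
```

Pipe the upload response through `pl_task_id` to get the UUID, then call `pl_wait_for_task "$task"` before searching for the new document.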
Search Documents
# Full-text search
curl -s "$PL_URL/documents/?query=invoice+march+2026" \
-H "Authorization: Token $PL_TOKEN" | jq '.results[] | {title, correspondent, created}'
# Filter by tags
curl -s "$PL_URL/documents/?tags__id__in=2,5" \
-H "Authorization: Token $PL_TOKEN"
# Filter by date range
curl -s "$PL_URL/documents/?created__date__gt=2026-01-01&created__date__lt=2026-04-01" \
-H "Authorization: Token $PL_TOKEN"
# Filter by correspondent
curl -s "$PL_URL/documents/?correspondent__name__icontains=Acme" \
-H "Authorization: Token $PL_TOKEN"
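Hand-encoding query strings gets error-prone once a search term contains spaces or symbols. One way around it, sketched here, is to let curl do the encoding with `-G` and `--data-urlencode`. `pl_search` is a hypothetical wrapper, and `page_size` is assumed to be accepted by the list endpoint as a pagination parameter.

```shell
# Hypothetical wrapper: query documents/ with any number of key=value filters.
# curl appends each one to the URL and percent-encodes its value.
pl_search() {
  n=$#
  # Rewrite each positional argument as "--data-urlencode <arg>", in order.
  while [ "$n" -gt 0 ]; do
    set -- "$@" --data-urlencode "$1"
    shift
    n=$((n - 1))
  done
  # Set CURL=echo to print the command instead of sending it.
  ${CURL:-curl} -sS -G "$PL_URL/documents/" \
    -H "Authorization: Token $PL_TOKEN" "$@"
}
```

For example, `pl_search "query=invoice march 2026" "created__date__gt=2026-01-01" "page_size=50" | jq '.count'` would report how many documents match.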
Get Document
# Get metadata
curl -s "$PL_URL/documents/DOC_ID/" \
-H "Authorization: Token $PL_TOKEN" | jq '{title, content, tags, correspondent, created}'
# Download original (without ?original=true you get the archived, OCR'd version when one exists)
curl -o document.pdf "$PL_URL/documents/DOC_ID/download/?original=true" \
-H "Authorization: Token $PL_TOKEN"
# Download thumbnail (recent releases serve WebP; older ones served PNG)
curl -o thumb.webp "$PL_URL/documents/DOC_ID/thumb/" \
-H "Authorization: Token $PL_TOKEN"
# Get OCR'd text
curl -s "$PL_URL/documents/DOC_ID/" \
-H "Authorization: Token $PL_TOKEN" | jq -r '.content'
Tags, Correspondents, Document Types
# Create tag (JSON bodies need an explicit Content-Type; curl -d defaults to form encoding)
curl -X POST "$PL_URL/tags/" \
-H "Authorization: Token $PL_TOKEN" \
-H "Content-Type: application/json" \
-d '{"name": "Tax 2026", "color": "#3498db", "is_inbox_tag": false}'
# Create correspondent
curl -X POST "$PL_URL/correspondents/" \
-H "Authorization: Token $PL_TOKEN" \
-H "Content-Type: application/json" \
-d '{"name": "Acme Corp", "matching_algorithm": 1, "match": "acme"}'
# Create document type
curl -X POST "$PL_URL/document_types/" \
-H "Authorization: Token $PL_TOKEN" \
-H "Content-Type: application/json" \
-d '{"name": "Invoice", "matching_algorithm": 1, "match": "invoice"}'
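Uploads and filters refer to tags, correspondents, and document types by numeric ID, so it helps to be able to look one up by name. This is a sketch, assuming the list endpoints accept a `name__iexact` filter; `pl_find_by_name` is a hypothetical helper.

```shell
# Hypothetical helper: find a tag, correspondent, or document type by name.
# $1 = endpoint (tags, correspondents, document_types), $2 = name to look up.
# Prints the raw JSON list response; the match is case-insensitive.
pl_find_by_name() {
  # Set CURL=echo to print the command instead of sending it.
  ${CURL:-curl} -sS -G "$PL_URL/$1/" \
    -H "Authorization: Token $PL_TOKEN" \
    --data-urlencode "name__iexact=$2"
}
```

`pl_find_by_name correspondents "Acme Corp" | jq '.results[0].id'` would then print the ID to pass as `correspondent` when uploading.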
Auto-Matching
Paperless can automatically assign tags, correspondents, and types:
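# Set matching rules: algorithm 1 (any word) matches documents containing "receipt" or "invoice"
curl -X PATCH "$PL_URL/tags/TAG_ID/" \
-H "Authorization: Token $PL_TOKEN" \
-H "Content-Type: application/json" \
-d '{"matching_algorithm": 1, "match": "receipt invoice"}'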
Matching algorithms, by the numeric ID the API expects: 0 none, 1 any word, 2 all words, 3 exact, 4 regex, 5 fuzzy, 6 auto (ML).
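The API takes the algorithm as an integer rather than a name, which is easy to get wrong in scripts. A small lookup, sketched under the assumption that the values run 0 to 6 as in current Paperless-ngx releases, keeps the calls readable. `pl_match_id` and its short names are hypothetical.

```shell
# Hypothetical helper: map a matching algorithm name to its integer ID.
pl_match_id() {
  case "$1" in
    none)  echo 0 ;;
    any)   echo 1 ;;  # any word
    all)   echo 2 ;;  # all words
    exact) echo 3 ;;  # literal string
    regex) echo 4 ;;
    fuzzy) echo 5 ;;
    auto)  echo 6 ;;  # ML classifier
    *) echo "unknown matching algorithm: $1" >&2; return 1 ;;
  esac
}
```

A request body can then be built with `"{\"matching_algorithm\": $(pl_match_id regex), \"match\": \"receipt|invoice\"}"`.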
Features
- OCR: Tesseract-based text extraction (100+ languages)
- Full-text search: Find any word in any document
- Auto-classification: ML-based tag/type assignment
- Email consumption: Scan email attachments automatically
- Mobile scanning: Upload from phone camera
- Workflows: Rules for automatic processing
Paperless vs Google Drive
| Feature | Paperless-ngx | Google Drive |
|---|---|---|
| OCR | Automatic | Manual |
| Full-text search | Yes (OCR'd) | Yes (native PDFs) |
| Auto-classify | ML-based | No |
| Self-hosted | Yes | No |
| API | Full REST | Yes |
| Storage | Limited only by your own disk | 15 GB on the free tier |
Need document automation or data extraction tools?
📧 spinov001@gmail.com
🔧 My tools on Apify Store
How do you manage documents? Physical or digital-first?