Docling is good at taking a file or URL and converting it to Markdown. You can also run it as an API (docling-serve), which is convenient when several services need document conversion.
Here is how you can add it to your Docker Compose stack:
  yourappname-docling-serve:
    image: quay.io/docling-project/docling-serve-cpu
    container_name: yourappname-docling-serve
    restart: unless-stopped
    ports:
      - 5001:5001
    env_file:
      - .env
In the env file or environment you can enable the UI for testing: DOCLING_SERVE_ENABLE_UI=1
If you have an NVIDIA GPU available, you can use a different Docker image than quay.io/docling-project/docling-serve-cpu.
In production, make sure to remove this part, which exposes the port to outside traffic:
...
    ports:
      - 5001:5001
...
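With the port mapping removed, other containers in the same Compose network can still reach the service by its service name. A minimal sketch, assuming a hypothetical DOCLING_SERVE_BASE_URL variable that your own app reads (docling-serve itself does not use it) and a hypothetical helper name:

```python
import os

# Fallback: the Compose service name from the stack above, on the container port.
DEFAULT_BASE_URL = "http://yourappname-docling-serve:5001"


def docling_endpoint(path: str, env=None) -> str:
    """Build a docling-serve URL from DOCLING_SERVE_BASE_URL or the service name."""
    env = os.environ if env is None else env
    # DOCLING_SERVE_BASE_URL is a hypothetical variable for your app's config.
    base = env.get("DOCLING_SERVE_BASE_URL", DEFAULT_BASE_URL).rstrip("/")
    return f"{base}/{path.lstrip('/')}"
```

During local development you could set DOCLING_SERVE_BASE_URL=http://localhost:5001 and leave it unset inside the stack.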
Here is an example of how you can make a request to the docling-serve API:
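import os

import requests

# Assumption: the API key comes from the environment; adjust to your setup.
DOCLING_SERVE_API_KEY = os.environ.get("DOCLING_SERVE_API_KEY", "")
DOCLING_SERVE_URL = "http://localhost:5001/v1/convert/source"


def get_markdown_from_facsimil_image_url(url: str):
    # The endpoint lives in DOCLING_SERVE_URL so it does not overwrite
    # the `url` argument, which is the document to convert.
    payload = {
        "options": {
            "from_formats": [
                "docx",
                "pptx",
                "html",
                "image",
                "pdf",
                "asciidoc",
                "md",
                "xlsx",
            ],
            "to_formats": ["md", "json", "html", "text", "doctags"],
            "image_export_mode": "placeholder",
            "ocr": True,
            "force_ocr": True,
            "ocr_engine": "easyocr",
            "ocr_lang": ["ro"],
            "pdf_backend": "dlparse_v2",
            "table_mode": "accurate",
            "abort_on_error": False,
        },
        "sources": [{"kind": "http", "url": url}],
    }
    response = requests.post(
        DOCLING_SERVE_URL,
        json=payload,
        headers={"X-Api-Key": DOCLING_SERVE_API_KEY},
    )
    response.raise_for_status()  # fail loudly on HTTP errors
    data = response.json()
    return data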
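The function above returns the whole JSON response, so you still need to pull the Markdown out of it. A minimal sketch, assuming the response has the shape {"document": {"md_content": ...}, "status": ..., "errors": [...]} (check http://localhost:5001/docs for the schema your version returns); extract_markdown is a hypothetical helper name:

```python
def extract_markdown(data: dict) -> str:
    """Return the Markdown text from a convert response, or raise if missing."""
    # Assumed shape: {"document": {"md_content": "..."}, "status": ..., "errors": [...]}
    document = data.get("document") or {}
    markdown = document.get("md_content")
    if not markdown:
        raise ValueError(
            f"No Markdown in response "
            f"(status={data.get('status')!r}, errors={data.get('errors')!r})"
        )
    return markdown
```

Raising instead of returning an empty string makes a failed conversion visible, which matters here because the request sets abort_on_error to False.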
More endpoints are documented at: http://localhost:5001/docs
Docling serve is a FastAPI app, so you can read the code to see what else is possible. For example, you can use RQ for processing jobs by setting DOCLING_SERVE_ENG_RQ_REDIS_URL=redis://:password@docker-compose-redis:6379/1 and DOCLING_SERVE_ENG_KIND=rq (the default is local).
More envs here: docling settings
More on this here: docling-serve
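Since the RQ engine needs both variables, a small preflight check in a deploy script can catch a half-configured setup. A minimal sketch using only the two variable names mentioned above; check_engine_env is a hypothetical helper, not part of docling-serve:

```python
import os


def check_engine_env(env=None) -> str:
    """Return the configured engine kind, validating the RQ settings."""
    env = os.environ if env is None else env
    kind = env.get("DOCLING_SERVE_ENG_KIND", "local")  # "local" is the default
    if kind == "rq" and not env.get("DOCLING_SERVE_ENG_RQ_REDIS_URL"):
        raise ValueError(
            "DOCLING_SERVE_ENG_KIND=rq requires DOCLING_SERVE_ENG_RQ_REDIS_URL"
        )
    return kind
```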