The problem
I kept hitting the same wall when building RAG pipelines over research papers: every generic PDF parser I tried mangled the equations.
Adobe Extract, AWS Textract, pdfplumber, PyMuPDF — they all collapse display math into plain-text garbage. Attention(Q, K, V) = softmax(QKᵀ / √d_k) V becomes something like:
QKT √dk
Attention(Q,K,V ) = softmax(
)V (1)
Unusable. Your embedding model sees a soup of tokens. Your LLM has no idea what the equation means. Your RAG answers are wrong on anything math-heavy.
What I tried
I benchmarked the obvious options on a handful of arxiv papers I cared about:
- Docling (IBM): drops every display equation as a placeholder. ~5/12 on a controlled equation-extraction benchmark.
- Nougat (Meta): the results were actually good when it worked, but the repo is essentially unmaintained and the dependency tree is a minefield.
- Mistral OCR: cheap and general-purpose, but equation fidelity is inconsistent on papers with dense notation.
- LlamaParse: optimized for "give me RAG chunks", not "preserve the math".
- Marker (github.com/datalab-to/marker): the only OSS tool that consistently produced clean LaTeX. Scored ~10.5/12 on the same benchmark Docling scored 5 on.
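For context on how scores like 10.5/12 arise: the benchmark gave partial credit per equation rather than strict pass/fail. I haven't published the harness, but the scorer had roughly this shape — normalize the extracted LaTeX, then fall back to a similarity ratio for near misses. The normalization rules and thresholds below are illustrative, not the exact ones I used:

```python
import re
from difflib import SequenceMatcher

def normalize_latex(s: str) -> str:
    """Strip whitespace and a few cosmetic variants so that, e.g.,
    '\\frac{QK^T}{\\sqrt{d_k}}' and '\\frac{ QK^{T} }{ \\sqrt{d_k} }'
    compare as equal."""
    s = re.sub(r"\s+", "", s)
    s = s.replace("^{T}", "^T").replace("\\left", "").replace("\\right", "")
    return s

def score_equation(extracted: str, reference: str) -> float:
    """1.0 for an exact normalized match; otherwise a similarity ratio,
    so a mangled-but-recognizable equation earns partial credit."""
    a, b = normalize_latex(extracted), normalize_latex(reference)
    if a == b:
        return 1.0
    return round(SequenceMatcher(None, a, b).ratio(), 2)
```

Summing a float like this per equation over a 12-equation set is what produces non-integer totals.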
Why I didn't just use Marker directly
Marker is the right tool, but running it yourself is not trivial:
- 5GB of model weights to download on first run
- CUDA + PyTorch + transformers + torchvision version dance
- GPU server to host it (T4 or better — CPU inference takes ~10x longer)
- A queue because parses take 60–180 seconds and you can't block an HTTP request that long
- Idle GPU bills when nobody is parsing anything
For a side project, this was 2+ days of yak shaving before I could POST my first PDF. I wanted a one-line API.
What I built
I wrapped Marker in a Modal deployment and put an async FastAPI app on top of it. Two endpoints:
# Submit a paper
curl -X POST https://scientific-paper-parser1.p.rapidapi.com/parse-paper \
-H "X-RapidAPI-Key: $KEY" \
-F "url=https://arxiv.org/pdf/1706.03762"
# → {"call_id": "fc-01K...", "status": "queued"}
# Poll for the result
curl https://scientific-paper-parser1.p.rapidapi.com/parse-paper/$ID \
-H "X-RapidAPI-Key: $KEY"
# → {"status": "done", "result": {"markdown": "...", ...}}
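From the client side, the flow is submit once, then poll until done. Here is a minimal polling helper with the HTTP call injected as a callable, so the retry logic stands on its own; the function name, interval, and timeout are my sketch, not part of the API:

```python
import time

def poll_until_done(poll_fn, interval=5.0, max_wait=600.0):
    """Call poll_fn() until it reports status 'done', or give up.

    poll_fn returns a dict shaped like the API responses above:
    {"status": "processing"} or {"status": "done", "result": {...}}.
    """
    deadline = time.monotonic() + max_wait
    while time.monotonic() < deadline:
        resp = poll_fn()
        if resp.get("status") == "done":
            return resp["result"]
        time.sleep(interval)  # parses take 60-180s; no point polling faster
    raise TimeoutError("parse did not finish within max_wait")
```

In practice `poll_fn` would be a lambda wrapping the GET request with your RapidAPI key.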
And on the same Attention paper, it returns:
$$\text{Attention}(Q, K, V) = \text{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right) V$$
Clean LaTeX. It embeds well in any RAG pipeline, renders in any markdown viewer with math support, and goes straight into Claude or GPT without preprocessing.
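One practical note on embedding this output: when chunking the markdown for retrieval, treat each $$...$$ block as atomic so an equation never gets split across chunks. A minimal chunker sketch — the size threshold and greedy packing strategy are illustrative, not a prescribed pipeline:

```python
import re

def chunk_markdown(md: str, max_chars: int = 1000):
    """Split parsed markdown into RAG-sized chunks, keeping each
    $$...$$ display equation whole so no chunk contains half an equation."""
    # Capturing group keeps the equations in the split output.
    pieces = re.split(r"(\$\$.*?\$\$)", md, flags=re.DOTALL)
    chunks, cur = [], ""
    for piece in pieces:
        if cur and len(cur) + len(piece) > max_chars:
            chunks.append(cur)
            cur = ""
        cur += piece  # a single oversized piece stays whole by design
    if cur:
        chunks.append(cur)
    return chunks
```

A piece longer than max_chars is kept intact rather than split, which is the behavior you want for a long equation.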
The Modal architecture
Three things made the economics work:
1. Persistent volume for model weights. The first container to start downloads Marker's ~5GB of weights into a Modal Volume; every later container mounts that volume and reuses them. Cold start with a warm volume is ~10 seconds instead of ~5 minutes.
import modal

app = modal.App("paper-parser")  # app name is illustrative
models_volume = modal.Volume.from_name("marker-models", create_if_missing=True)

@app.cls(
    volumes={"/root/.cache/datalab": models_volume},
    gpu="T4",
    scaledown_window=300,
)
class Parser:
    @modal.enter()
    def load_models(self):
        # Runs once per container start; weights load from the mounted volume.
        from marker.converters.pdf import PdfConverter
        from marker.models import create_model_dict
        self.converter = PdfConverter(artifact_dict=create_model_dict())
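The parse method itself isn't shown above. Stripped of the Modal scaffolding, its core is "bytes in, temp file, converter, markdown out". Here's that core as a plain function with the converter and text-extraction step injected as callables — in the real deployment those would be the PdfConverter instance (which takes a file path) and Marker's text-extraction helper; this factoring and the names are my sketch, not the deployed code:

```python
import tempfile

def parse_pdf_bytes(pdf_bytes, converter, extract_text):
    """Run a Marker-style converter over raw PDF bytes.

    converter: callable taking a file path, returning a rendered document
    extract_text: callable pulling markdown out of the rendered document
    """
    # Marker's converter wants a path, not bytes, so spill to a temp file.
    with tempfile.NamedTemporaryFile(suffix=".pdf") as tmp:
        tmp.write(pdf_bytes)
        tmp.flush()
        rendered = converter(tmp.name)
    return {"markdown": extract_text(rendered)}
```

Keeping the method's body this thin is also what makes it easy to test without a GPU.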
2. spawn-and-poll for long parses. A 50-page paper takes 90–180 seconds. You can't hold an HTTP connection open that long, especially not behind a CDN. Modal's function.spawn() returns a FunctionCall object you can look up by ID later:
from fastapi import FastAPI, File, Form, UploadFile
import modal

api = FastAPI()

@api.post("/parse-paper")
async def submit(file: UploadFile = File(None), url: str = Form(None)):
    pdf_bytes = await _fetch_pdf(file, url)   # read the upload or download the URL
    call = Parser().parse.spawn(pdf_bytes)    # fire-and-forget onto the GPU worker
    return {"call_id": call.object_id, "status": "queued"}

@api.get("/parse-paper/{call_id}")
async def poll(call_id: str):
    call = modal.FunctionCall.from_id(call_id)
    try:
        # timeout=0 returns immediately if finished, raises TimeoutError if not
        result = call.get(timeout=0)
        return {"status": "done", "result": result}
    except TimeoutError:
        return {"status": "processing"}
3. Scale-to-zero. scaledown_window=300 keeps a warm container for 5 minutes after the last request. After that, the container shuts down and idle cost drops to zero. I pay only for the seconds I'm actually parsing something.
The business side
I put it behind a RapidAPI listing so distribution is one click for anyone comparing parsing APIs. Free tier is 2 papers/month (no credit card) and paid plans start at $9/mo for 75 papers.
I'm not trying to beat Marker on quality (it IS Marker). I'm not trying to beat Mistral OCR on price (I can't). I'm solving one specific problem: "I want Marker quality without running a GPU server."
Honest about what this is not
- Not my model. It's Marker (Apache 2.0), hosted. I'm explicit about this on the landing page.
- Not the cheapest per-page option. Mistral OCR is cheaper if you don't care about equation fidelity.
- Not for scanned PDFs. Typeset only — Marker doesn't do OCR.
- Not for arxiv-only workflows. There's a free tool called arxiv2md that parses arxiv's HTML source if that's all you need.
Where this fits
If you're doing RAG over biorxiv, chemrxiv, published journal PDFs, internal research docs, or any scientific PDF that isn't on arxiv, and equation fidelity matters for your answers, this saves you a weekend.
Landing: https://paper-parser-landing.vercel.app
API: https://rapidapi.com/kjyounai/api/scientific-paper-parser1
Feedback welcome — especially if you've tried self-hosting Marker before or have opinions on the async polling pattern. Happy to answer questions about the Modal setup or the Marker tradeoffs in the comments.