SentinelCipher

Posted on Jun 7

How I built PACE: an open source content analysis pipeline with parallel LLM batching (and what I learned)

#python #opensource #llm #showdev

I built PACE because I was drowning in content I needed to process.
Research papers, YouTube talks, long articles. I kept pasting things into AI chat interfaces one piece at a time, getting inconsistent output with no repeatable structure. It worked, but it did not scale and it certainly did not feel like a system.
So I built one.
PACE (Precise Analysis and Compilation of Extracts) is an open source Streamlit app that ingests content from 5 sources and outputs a structured 10-section report. This post covers the architecture decisions, what worked, and what did not.
Repo: github.com/AshayK003/PACE

The pipeline overview

Input (YouTube / PDF / Article / Audio / Text)
-> Ingest
-> Clean + Chunk
-> Parallel LLM Analysis (3 batches, 10 sections)
-> Final Synthesis
-> Markdown or PDF report
Every stage is modular. Ingestors live in app/ingestors/, each inheriting from BaseIngestor and implementing validate() and ingest(). Adding a new source means adding one file and inheriting the base class.

The ingestor choices

YouTube: youtube-transcript-api. No API key, no OAuth, just a URL. Works for anything with auto-generated or manual captions.
PDF: PyMuPDF4LLM combined with pdfplumber for table extraction. PyMuPDF4LLM runs at 0.09 seconds per page and stays under 1GB RAM, which matters a lot on Streamlit Community Cloud where memory is limited.
Articles: trafilatura. I tested several extractors against each other. trafilatura consistently had the best signal to noise ratio on real world news articles and blog posts. It's not the most popular library but it outperforms readability and newspaper3k on F1 score in published benchmarks.
Audio: faster-whisper for local speech to text. This tab is disabled on Streamlit Cloud because it requires local compute. Worth including for self-hosters.

Semantic chunking without embeddings

Long content needs to be chunked before going into an LLM context window. Most approaches either split naively by character count (destroys semantic coherence) or use embeddings to find meaningful boundaries (adds an API call and a vector dependency).
I used semchunk, which does semantic splitting based on sentence structure and content similarity without requiring embeddings. It keeps related content together and stays cheap to run. For a tool designed to work with free-tier LLMs this was the right call.

The parallel batching decision

This was the biggest performance unlock.
The naive approach is sequential: call the LLM, get section 1, call again, get section 2, repeat 10 times. At 2 to 3 seconds per call, that is 20 to 30 seconds minimum.
PACE groups the 10 analysis sections into 3 batches and fires them concurrently with asyncio. Each batch handles multiple sections in a single LLM call, and the 3 batches run in parallel.
Result: total analysis time dropped from 45 seconds to under 20 seconds. Around 60% faster in practice.
The tradeoff is that prompt construction gets more complex. You have to instruct the model to return multiple labeled sections in one response, then parse them back out reliably. The parser in app/analyzers/parser.py handles this and has 9 dedicated tests covering edge cases.

LLM provider strategy

I built the LLM client against the OpenAI-compatible API interface which every major provider now supports. This means the same client code works with Gemini, Groq, Cerebras, Mistral, DeepSeek, and OpenRouter without any provider-specific logic.
There is a built-in free tier key for people who want to try the tool without signing up anywhere. For heavier use, BYOK from the sidebar. The key stays in Streamlit session state and never hits disk.
The LRU cache (50 entries, 1 hour TTL) means re-analyzing the same content costs zero LLM calls on repeat runs.

Security was not optional
PACE makes HTTP requests based on user-supplied URLs. That is a classic SSRF vector. I added DNS resolution with IP blocking before any outbound request goes through. Private IP ranges, cloud metadata endpoints, and localhost are all blocked.
Other security layers:

File upload validates magic bytes, not just extension
50k character input cap prevents prompt stuffing
Prompt injection detection on user inputs
Error sanitization strips file paths, API keys, and internal details from any error message the user sees

All of this is covered in app/security.py with 40 tests in test_security.py.

Testing: 215 tests across 9 modules

ModuleTeststest_analyzers.py30test_security.py40test_ingestors.py31test_output.py38test_cleaner.py20test_chunker.py10test_config.py14test_parser.py9test_integration.py16
The integration tests were the most valuable. They test full pipeline runs with various content types and failure modes. Every time I changed the batching logic or the parser, the integration tests caught regressions before I manually tested anything.

Deployment

Streamlit Community Cloud is zero cost and handles multi-user sessions automatically. Deployment steps:

Push to GitHub
Go to share.streamlit.io
Set OPENCODE_ZEN_KEY in secrets

Done. The only caveat is that audio transcription requires local compute so that tab is hidden on cloud deployments.

Contributing

The codebase is designed to be easy to extend in three specific ways:
New ingestor: add app/ingestors/my_source.py, inherit BaseIngestor, implement validate() and ingest().
New analysis step: add a prompt to app/analyzers/prompts.py, register it in ALL_PROMPTS.
New LLM preset: add an entry to the presets dict in app/ui/sidebar.py.
All contributions need tests. Run pytest before opening a PR. All 215 must pass.

What I would do differently
The prompt engineering took way longer than expected. Getting LLMs to return structured multi-section output consistently across different providers required many iterations. If I rebuilt this, I would have started with a dedicated output validation layer earlier rather than treating it as a late-stage concern.
I would also add a web scraping fallback for paywalled articles sooner. Right now trafilatura fails gracefully, but a secondary fetch strategy would improve reliability.

Links

Repo: github.com/AshayK003/PACE
MIT license. Stars and PRs welcome.

DEV Community