I was recently automating my real-estate newsletter and needed the LLM to find daily articles, read them, extract facts, and write them up in a premade structured format.
Surprisingly, the hard part wasn’t prompting or controlling output; it was getting the relevant articles into the context window.
Raw articles were too large, so I ended up scraping → distilling → passing only facts/claims to the LLM.
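To make the distilling step concrete, here is a minimal sketch of the idea, not my actual pipeline. It assumes a crude heuristic (a sentence containing a number counts as a "fact") and a character budget; the names `distill_facts`, `build_prompt`, and `max_chars` are made up for illustration, and the sample article text is invented. A real version would more likely use an LLM or a proper extractor for this step.

```python
import re

# Assumption: split on whitespace that follows sentence-ending punctuation.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")
# Assumption: a sentence containing a digit is treated as a "fact".
_HAS_NUMBER = re.compile(r"\d")


def distill_facts(article_text: str, max_chars: int = 600) -> list[str]:
    """Return number-bearing sentences, in order, within a character budget."""
    facts: list[str] = []
    used = 0
    for sentence in _SENTENCE_END.split(article_text.strip()):
        sentence = sentence.strip()
        if not sentence or not _HAS_NUMBER.search(sentence):
            continue  # skip filler sentences with no figures
        if used + len(sentence) > max_chars:
            break  # stop once the budget would be exceeded
        facts.append(sentence)
        used += len(sentence)
    return facts


def build_prompt(facts: list[str], template: str) -> str:
    """Fill a premade template with a bulleted list of facts."""
    bullets = "\n".join(f"- {fact}" for fact in facts)
    return template.format(facts=bullets)


# Invented sample text, only to show the shape of the data.
article = (
    "The housing market had a busy week. "
    "Median home prices rose 4.2% year over year in March. "
    "Analysts were surprised. "
    "Mortgage rates averaged 6.8% for a 30-year fixed loan."
)
facts = distill_facts(article)
prompt = build_prompt(facts, "Write the newsletter using only these facts:\n{facts}")
```

Only the two number-bearing sentences reach the prompt, so the LLM sees a short fact list instead of the full article.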
It made me wonder: How are others handling large context in real-world pipelines?
Progressive summarization? Fact extraction? Retrieval + synthesis? Just bigger context models? Curious what’s actually working in practice.
Anyway, I was thinking of building a library or API anyone can use for this: you send a request with a query and get back articles summarised into just the facts, which the LLM can write from instead of raw articles, all with fewer API calls and at lower cost.
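Roughly, the query-in, facts-out shape I have in mind could look like the sketch below. Nothing here exists yet: `DistilledArticle`, `facts_for_query`, and the `search` and `extract` callables are all hypothetical names, and the search and extraction backends are passed in as plain functions so the sketch stays runnable without any real service.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical types: a raw article is (url, title, full_text).
RawArticle = tuple[str, str, str]


@dataclass
class DistilledArticle:
    """What the caller gets back: source info plus extracted facts only."""
    url: str
    title: str
    facts: list[str] = field(default_factory=list)


def facts_for_query(
    query: str,
    search: Callable[[str], list[RawArticle]],
    extract: Callable[[str], list[str]],
    max_articles: int = 5,
) -> list[DistilledArticle]:
    """Find articles for a query and return them reduced to facts."""
    results: list[DistilledArticle] = []
    for url, title, text in search(query)[:max_articles]:
        facts = extract(text)
        if facts:  # drop articles that yield nothing usable
            results.append(DistilledArticle(url=url, title=title, facts=facts))
    return results
```

The caller would then pass only the `facts` lists to the LLM, so the raw article text never enters the context window.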