What it does
Patents to Markdown for RAG pulls patents from Google Patents (US, EP, and WO families) and converts them into clean, chunked Markdown ready for retrieval-augmented generation and LLM pipelines. It extracts the abstract, claims, and full description, then segments the text into chunks sized by token count, so you can embed and index the output without cleaning up HTML or PDFs yourself.
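To make the chunking idea concrete, here is a minimal sketch of splitting text by token count. It is not the actor's actual implementation: the function name `chunk_text`, the `max_tokens` parameter, the running-index `chunk_id`, and the use of whitespace-separated words as a stand-in for model tokens are all assumptions made for illustration.

```python
def chunk_text(text: str, max_tokens: int = 200) -> list[dict]:
    """Split text into consecutive chunks of at most max_tokens tokens."""
    if max_tokens < 1:
        raise ValueError("max_tokens must be at least 1")

    # Assumption: one whitespace-separated word counts as one token.
    # A real pipeline would use the tokenizer of its embedding model.
    words = text.split()

    chunks = []
    for start in range(0, len(words), max_tokens):
        piece = words[start:start + max_tokens]
        chunks.append({
            "chunk_id": len(chunks),      # hypothetical id scheme: running index
            "token_count": len(piece),
            "markdown": " ".join(piece),
        })
    return chunks
```

Each chunk carries its own `token_count`, so a downstream step can check it against an embedding model's input limit without re-tokenizing.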
Who it's for
Built for AI engineers developing patent search, IP analysts assembling prior-art corpora, and legal-tech teams that need patent text in a format an LLM can actually consume.
Sample fields / output
patent_number, title, abstract, claims, description, assignee, inventors, filing_date, publication_date, jurisdiction, markdown, chunk_id, token_count
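If you want to validate records before indexing them, the fields above could be modelled roughly as follows. The field names come from the list; the Python types (for example `inventors` as a list of strings, dates as strings, `chunk_id` as a string) and the helper `parse_record` are assumptions, since the exact output schema is not documented here.

```python
from dataclasses import dataclass, fields


@dataclass
class PatentChunk:
    # Field names follow the listing; the types are assumed, not documented.
    patent_number: str
    title: str
    abstract: str
    claims: str
    description: str
    assignee: str
    inventors: list[str]
    filing_date: str
    publication_date: str
    jurisdiction: str
    markdown: str
    chunk_id: str
    token_count: int


def parse_record(raw: dict) -> PatentChunk:
    """Build a PatentChunk from a raw dict, failing loudly on missing fields."""
    names = [f.name for f in fields(PatentChunk)]
    missing = [name for name in names if name not in raw]
    if missing:
        raise KeyError(f"record is missing fields: {missing}")
    # Extra keys in the raw record are ignored.
    return PatentChunk(**{name: raw[name] for name in names})
```

Failing on a missing field at load time tends to be easier to debug than discovering an empty `markdown` value after the embeddings have been written.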
Example use cases
- Build a prior-art RAG knowledge base for a patent-search assistant.
- Feed chunked claims and descriptions into an embeddings index for semantic search.
- Generate LLM-ready context for freedom-to-operate and invalidity analysis.
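For the LLM-context use cases above, the `token_count` field makes it cheap to stay inside a prompt budget. The sketch below assumes the chunks have already been ranked by relevance by your own retriever (ranking is out of scope here); `pack_context` and `budget` are hypothetical names, and the greedy strategy is one simple option, not something the actor prescribes.

```python
def pack_context(chunks: list[dict], budget: int) -> list[dict]:
    """Greedily keep chunks, in the given order, while the token total fits the budget."""
    selected = []
    used = 0
    for chunk in chunks:
        cost = chunk["token_count"]
        if used + cost > budget:
            # This chunk does not fit; a later, smaller chunk may still fit.
            continue
        selected.append(chunk)
        used += cost
    return selected
```

A possible call, with made-up placeholder records:

```python
ranked = [
    {"chunk_id": "a", "token_count": 300},
    {"chunk_id": "b", "token_count": 500},
    {"chunk_id": "c", "token_count": 150},
]
context = pack_context(ranked, budget=500)
# Keeps "a" and "c": "b" would push the total past 500 tokens.
```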
Try Patents to Markdown for RAG on Apify.
FAQ
Which patent offices are covered?
US, EP, and WO families via Google Patents, including abstract, claims, and full description.
What format is the output?
Clean Markdown split into token-sized chunks, each with a chunk_id and token_count for direct embedding.
Do I need a Google Patents login?
No login is required.