This is a submission for the Open Source AI Challenge with pgai and Ollama
What I Built
This is a conversational RAG app where all the RAG pipelines are entirely built in PostgreSQL procedures using PL/pgSQL!
The idea behind this app stems from my master's thesis work. I have to do a systematic literature review, and doing it manually is tedious. So I created this small app that lets me upload a full-text paper and chat with it, create summaries, highlights, and key results, massively streamlining the systematic literature review process.
Of course, this app would work with any kind of data; we just need to change the system prompt a bit!😏
Key Features:
- Summarize research papers (journal articles, conference papers, etc.)
- Create highlights/key insights
- Automatic processing using pgai Vectorizer
- Chat with individual papers
- Save multiple chat sessions
Initially I wanted to use Ollama for everything, but pgai Vectorizer does not currently support Ollama, so I opted for OpenAI.
Demo
KawanPaper
KawanPaper is your go-to app for chatting with research papers (journal articles, conference papers, and more).
Features:
- PDF upload and automatic parsing
- Generate key insights from research papers
- Chat with a specific paper
Setup
Make sure you have an up-to-date Docker installation, then clone this repo. We will divide the installation process into three parts: MinIO setup, database migration, and launching the app.
Configuration
- Main configuration: copy the `.env.example` file to `.env`
- Docker Compose configuration: copy `docker.env.example` to `docker.env`
These configs have predefined values to make deployment easier. Note that there are some env vars we still need to define:

- `.env`: `VITE_MINIO_ACCESS_KEY`, `VITE_MINIO_SECRET_KEY`
- `docker.env`: `OPENAI_API_KEY`

Add your OpenAI key to `docker.env`; as for the MinIO credentials, we will create them in the next step.
Minio Setup
This was a new thing for me; back in the day we could…
Tools Used
- TimescaleDB as the main database to store the documents and their embeddings
- pgai to access OpenAI services from inside the database
- pgvector to store document embeddings
- pgvectorscale to create indexes on the embeddings
- pgai Vectorizer to automatically create embeddings from the uploaded papers
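To make the Vectorizer piece concrete, here is a minimal sketch of what wiring these tools together might look like; the table, column, and model names below are my assumptions for illustration, not taken from the actual repo:

```sql
-- Illustrative sketch only: the source table (papers), text column (full_text),
-- and embedding model are assumptions, not the project's actual schema.
SELECT ai.create_vectorizer(
    'public.papers'::regclass,
    destination => 'papers_embeddings',
    embedding   => ai.embedding_openai('text-embedding-3-small', 1536),
    chunking    => ai.chunking_recursive_character_text_splitter('full_text')
);

-- pgvectorscale's DiskANN index accelerates similarity search on the
-- embedding store that the vectorizer maintains.
CREATE INDEX ON papers_embeddings_store USING diskann (embedding);
```

Once a vectorizer like this is in place, inserts and updates on the source table are queued for embedding automatically, which is what makes the "automatic processing" feature possible without any application-side embedding code.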
Prize Categories
Vectorizer Vibe, All the Extensions
Tech Stack
- PostgreSQL (TimescaleDB)
- Minio
- Remix
Such a small tech stack for a RAG app😊 We could make it even smaller by storing blobs in Postgres, but I don't like that idea.
Conversational RAG in PL/pgSQL
In this SQL script I implemented two Postgres routines, a function and a procedure, to build the conversational RAG pipeline. This is the heart and soul of the app.
I got the idea from this LangChain tutorial.
```sql
CREATE FUNCTION contextualize_question(p_session_id VARCHAR(36), p_query TEXT) RETURNS TEXT
CREATE PROCEDURE chat_with_paper(p_session_id VARCHAR(36), p_chat_content TEXT)
```
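As a rough illustration of what the history-aware reformulation step can look like in PL/pgSQL (in the spirit of that LangChain tutorial), here is a minimal sketch. The chat history table, model name, and prompt are my assumptions, not the app's actual code:

```sql
-- Hypothetical sketch: rewrite a follow-up question into a standalone one
-- using the prior chat history for this session. The chat_messages table
-- and its columns (role, content, created_at) are illustrative assumptions.
CREATE OR REPLACE FUNCTION contextualize_question(p_session_id VARCHAR(36), p_query TEXT)
RETURNS TEXT AS $$
DECLARE
    v_history  JSONB;
    v_response JSONB;
BEGIN
    -- Collect the session's prior turns as OpenAI-style message objects
    SELECT COALESCE(
               jsonb_agg(jsonb_build_object('role', role, 'content', content)
                         ORDER BY created_at),
               '[]'::jsonb)
      INTO v_history
      FROM chat_messages
     WHERE session_id = p_session_id;

    -- Ask the model to reformulate the latest question so it stands alone
    v_response := ai.openai_chat_complete(
        'gpt-4o-mini',
        jsonb_build_array(
            jsonb_build_object('role', 'system', 'content',
                'Given the chat history, rewrite the latest user question as a standalone question.')
        ) || v_history || jsonb_build_array(
            jsonb_build_object('role', 'user', 'content', p_query)
        )
    );

    -- Pull the assistant's reply out of the OpenAI-shaped JSON response
    RETURN v_response->'choices'->0->'message'->>'content';
END;
$$ LANGUAGE plpgsql;
```

`chat_with_paper` would then embed the standalone question, retrieve the nearest chunks with pgvector's `<=>` distance operator, and pass them to the model as context before storing the answer back into the session.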
I never thought I would be writing LLM chain/pipeline using SQL instead of Haystack, LangChain, or LlamaIndex, but here we are!
It's exciting to imagine what pgai could bring to LLMs in databases in the future.
Final Thoughts
This has been an interesting journey, because the idea of running LLMs directly in the database felt really weird at first. But after learning it over the last two days, I find it genuinely interesting: it could revolutionize data mining pipelines for non-AI engineers. I imagine data analysts and researchers could easily get insights from database systems without major changes to their existing setups.
One of my favorite experiences in this project was learning how to write Postgres procedures and functions in PL/pgSQL. It was a really interesting journey, especially because LLM apps that used to be written with LangChain, Haystack, or LlamaIndex I now implemented in pure PL/pgSQL to build a conversational RAG.