Visakh Vijayan

Built with AI: A blog assistant

So I have been working on this blogging platform, which basically has blogs generated using AI models. There is a mix of models, and each model generates different content based on the personality used to generate it.

You can check them out over here: www.dumpd.in

Since the AI age is all about speed, users wanted a crisper way to get the gist of a blog. Basically, they wanted a way to interact with the blog via an assistant.

So today we made our very own cute assistant called - Dumpy!

Here is what we did (the no-fluff version)

  1. We vectorised the blogs using a cron job and Pinecone.
  2. We added an assistant that hits an API and gets the summary.
  3. We asked GitHub Copilot to add a button to the website.

Here is the detailed version

  • Vectorising the blogs
  1. Since the blogs were already generated, we had to vectorise them separately. We wrote a cron job that picks up a random blog every 5 minutes and sends it to the vectoriser.
  2. Vectorising is the process of converting content into an embedding. An embedding is just a numerical representation of a piece of text in a set number of dimensions, and a dimension is basically a property of the text - sentiment, word order, formality, and so on. The more dimensions a text is mapped into, the more faithfully it is represented. Think of it like this: you give a piece of text, the text is attached to multiple properties, and it gets represented as a point in that space. You will understand why we do this in a while. For now, we take the content and vectorise it.
  3. While vectorising, we chunk the content. A blog has a lot of paragraphs, and it is not recommended to convert the whole text into one embedding. So we split it into chunks and save a separate embedding for each. Why we chunk is explained in the next point, and there is a sketch of this whole pipeline after the list.
  4. Now that we had our embeddings in place, the next thing was to query them. A vector search is a semantic search, unlike a normal keyword search. In a normal search we look for word matches; in a vector search we do similarity matching. So when you give a query, you are asking the system - hey, find things that are similar to this - and it returns the best n matches. If we had stored a whole blog as one embedding, any query against it would return the entire blog, since that is all there is to match. Instead we want only the paragraphs that make sense for the question. Hence the chunking, so that only the relevant chunks are returned (see the retrieval sketch after the list).
  5. Once we get the relevant chunks, we send all of them to OpenAI and ask it to summarise. We tell it something like - hey, this is the question the user asked, and these are the paragraphs that I feel are relevant to it, so please make a summary out of them (see the summarise sketch after the list).
  6. And once that is done, we return the summary to the user.
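
For the curious, here is a minimal sketch of what the vectorise step could look like in Python. We really do use Pinecone and OpenAI, but the index name, embedding model, chunk size, and helper names below are illustrative, not necessarily what runs on dumpd.in:

```python
import os
from openai import OpenAI
from pinecone import Pinecone

openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment
pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
index = pc.Index("blog-chunks")  # hypothetical index name

def chunk_text(text: str, max_chars: int = 1000) -> list[str]:
    """Naive paragraph-based chunking; real splitters also add overlap."""
    chunks, current = [], ""
    for para in text.split("\n\n"):
        if current and len(current) + len(para) > max_chars:
            chunks.append(current.strip())
            current = ""
        current += para + "\n\n"
    if current.strip():
        chunks.append(current.strip())
    return chunks

def vectorise_blog(blog_id: str, content: str) -> None:
    """What the cron job calls for one random blog every 5 minutes."""
    chunks = chunk_text(content)
    # One embedding per chunk, not one for the whole blog.
    resp = openai_client.embeddings.create(
        model="text-embedding-3-small",  # illustrative model choice
        input=chunks,
    )
    index.upsert(vectors=[
        {
            "id": f"{blog_id}-{i}",
            "values": item.embedding,
            # Store the raw text so we can hand it to the summariser later.
            "metadata": {"blog_id": blog_id, "text": chunk},
        }
        for i, (item, chunk) in enumerate(zip(resp.data, chunks))
    ])
```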
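
Answering a question is then a similarity query against those stored vectors. Again a sketch reusing the clients above; the `top_k` value and the metadata filter are our own illustrative choices:

```python
def retrieve_chunks(blog_id: str, question: str, top_k: int = 5) -> list[str]:
    # Embed the question so it lives in the same vector space as the chunks.
    q_emb = openai_client.embeddings.create(
        model="text-embedding-3-small",
        input=[question],
    ).data[0].embedding

    # Similarity search: the top_k most similar chunks from this blog only.
    results = index.query(
        vector=q_emb,
        top_k=top_k,
        include_metadata=True,
        filter={"blog_id": {"$eq": blog_id}},
    )
    return [match.metadata["text"] for match in results.matches]
```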
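
And the final step is the summarisation call. The model and prompt wording here are placeholders; the shape of the request is the point:

```python
def summarise(question: str, chunks: list[str]) -> str:
    context = "\n\n".join(chunks)
    resp = openai_client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[
            {
                "role": "system",
                "content": "Answer the user's question as a short summary, "
                           "using only the provided context.",
            },
            {
                "role": "user",
                "content": f"Question: {question}\n\nContext:\n{context}",
            },
        ],
    )
    return resp.choices[0].message.content
```

Dumpy then just returns that string to the user.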

Here is a demo of the assistant in action -

https://youtu.be/r2Fubjh_E2E
