DEV Community

Discussion on: Build an Article Recommendation Engine With AI/ML

Dennis Groß

Hey Tyler, I'm an absolute AI beginner, but I find this topic super interesting.

So if I got the idea correctly:

You use the GloVe algorithm to represent your articles as multidimensional vectors. The Euclidean distance between two article vectors describes how relevant they are to each other.

So a small Euclidean distance => similar articles.

Then you use the k-nearest neighbors algorithm, which essentially compares the Euclidean distances between articles to find the most relevant content.

What I don't completely understand are the details of how the code works.

(1) In which line of the code do we transform the articles into vectors?
(2) And where do we compare the Euclidean distances of the vectors to find the most relevant content?

Thanks for sharing, and sorry for the basic questions :))

cheers Dennis

Tyler Hawkins Author

Hey Dennis, thanks for reading! You've got the basic idea down, and these are good questions.

For the first question, the articles are transformed into "vector embeddings" on lines 52-53:

encoded_articles = model.encode(data['title_and_content'], show_progress_bar=True)
data['article_vector'] = pd.Series(encoded_articles.tolist())

And then those vector embeddings get uploaded to the index on lines 59-60:

items_to_upload = [(, row.article_vector) for i, row in data.iterrows()]

For your second question about where and how the similarity search is actually done, that's handled in the query_pinecone method. Specifically, line 83 is where we get the results:

query_results = pinecone_index.query(queries=[reading_history_vector], top_k=10)

Now what's interesting about this is that since Pinecone is a managed similarity search service, it takes care of all this for you. If you were to build something like this on your own without using Pinecone, you'd have to write a lot more code to perform the search yourself.
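To give a rough feel for what that hand-written search might look like, here is a minimal brute-force sketch in plain Python. It is not how Pinecone works internally and it is not part of the article's code: the function names, the toy 2-D vectors, and the article IDs are all made up for illustration, and it assumes Euclidean distance as the similarity measure, as Dennis described.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def brute_force_top_k(query_vector, items, top_k=10):
    """Return the top_k (id, distance) pairs closest to query_vector.

    `items` is a list of (id, vector) tuples. Every vector is compared
    against the query, so the cost grows linearly with the number of items.
    """
    scored = [
        (item_id, euclidean_distance(query_vector, vector))
        for item_id, vector in items
    ]
    # Smallest distance first: a small distance means a more similar article.
    scored.sort(key=lambda pair: pair[1])
    return scored[:top_k]


# Hypothetical toy "embeddings"; real article vectors have hundreds of dimensions.
items = [
    ("article-a", [0.0, 0.0]),
    ("article-b", [1.0, 1.0]),
    ("article-c", [5.0, 5.0]),
]

results = brute_force_top_k([0.9, 1.2], items, top_k=2)
for item_id, distance in results:
    print(item_id, round(distance, 3))
```

This is fine for a handful of vectors, but comparing the query against every stored vector gets slow as the collection grows, which is the kind of work a dedicated similarity search service handles for you.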

So Pinecone becomes sort of a facade over all the underlying details, which makes it look like magic, but also simplifies your job a whole bunch if you're not a machine learning expert.

Hope that helps!