Discussion on: Build an Article Recommendation Engine With AI/ML


hey Tyler, I am an absolute AI beginner but find this topic super interesting.

So if I got the idea correctly:

You use the GloVe algorithm to represent your articles as multidimensional vectors. The Euclidean distance between two article vectors describes how relevant they are to each other.

So a small Euclidean distance => similar articles.

Then you use the k-nearest neighbors algorithm, which essentially compares the Euclidean distances between articles to find the most relevant content.

What I don't understand completely are the details on how the code works.

(1) On which line of app.py do we transform the articles into vectors?
(2) And where do we compare the Euclidean distances between the vectors to find the most relevant content?

Thanks for sharing, and sorry for the basic questions :))

cheers Dennis

Tyler Hawkins • Sep 3 '21

Hey Dennis, thanks for reading! You've got the basic idea down, and these are good questions.

For the first question, the articles are transformed into "vector embeddings" on lines 52-53:

encoded_articles = model.encode(data['title_and_content'], show_progress_bar=True)
data['article_vector'] = pd.Series(encoded_articles.tolist())

And then those vector embeddings get uploaded to the index on lines 59-60:

items_to_upload = [(row.id, row.article_vector) for i, row in data.iterrows()]
pinecone_index.upsert(items=items_to_upload)

For your second question about where and how the similarity search is actually done, that's handled in the query_pinecone method. Line 83 specifically is where we get the results:

query_results = pinecone_index.query(queries=[reading_history_vector], top_k=10)

Now what's interesting about this is that since Pinecone is a managed similarity search service, it takes care of the distance comparison and ranking for you. If you were to build something like this on your own without using Pinecone, then you'd have to write a lot more code to handle performing the search.

So Pinecone becomes sort of a facade over all the underlying details, which makes it look like magic, but also simplifies your job a whole bunch if you're not a machine learning expert.
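
To give a rough idea of what "performing the search" yourself would involve, here's a minimal brute-force sketch in plain Python. It assumes Euclidean distance, as in your summary; the metric Pinecone actually uses depends on how the index is configured, and a real service relies on much faster approximate methods. The nearest_articles function, the article IDs, and the toy 2-D vectors are all made up for illustration and are not part of app.py.

```python
import heapq
import math


def nearest_articles(query_vector, article_vectors, top_k=10):
    """Brute-force k-nearest-neighbor search using Euclidean distance.

    article_vectors maps an article ID to its embedding vector.
    Returns a list of (article_id, distance) pairs, closest first.
    """
    # Compute the distance from the query to every stored vector.
    scored = (
        (article_id, math.dist(query_vector, vector))
        for article_id, vector in article_vectors.items()
    )
    # Keep only the top_k smallest distances (smallest = most similar).
    return heapq.nsmallest(top_k, scored, key=lambda pair: pair[1])


# Toy 2-D "embeddings" (hypothetical; real embeddings have hundreds of dimensions).
article_vectors = {
    "a1": [0.0, 0.0],
    "a2": [1.0, 1.0],
    "a3": [5.0, 5.0],
    "a4": [0.9, 1.2],
}
reading_history_vector = [1.0, 1.0]

results = nearest_articles(reading_history_vector, article_vectors, top_k=2)
for article_id, distance in results:
    print(article_id, round(distance, 4))
```

This compares the query against every single article, which is fine for a handful of vectors but gets slow quickly as the collection grows. That scaling problem is a big part of what a managed service handles for you.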

Hope that helps!