DEV Community

Discussion on: Build an Article Recommendation Engine With AI/ML

Tyler Hawkins Author

Hey Dennis, thanks for reading! You've got the basic idea down, and these are good questions.

For the first question, the articles are transformed into "vector embeddings" on lines 52-53:

encoded_articles = model.encode(data['title_and_content'], show_progress_bar=True)
data['article_vector'] = pd.Series(encoded_articles.tolist())

And then those vector embeddings get uploaded to the index on lines 59-60:

items_to_upload = [(row.id, row.article_vector) for i, row in data.iterrows()]
pinecone_index.upsert(items=items_to_upload)

For your second question about where/how the similarity search is actually done, that's handled in the query_pinecone method. Specifically, line 83 is where we get the results:

query_results = pinecone_index.query(queries=[reading_history_vector], top_k=10)

Now what's interesting about this is that since Pinecone is a managed similarity search service, it takes care of all this for you. If you were to build something like this on your own without using Pinecone, then you'd have to write a lot more code to handle performing the search.
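To give a rough idea of what "handling the search yourself" might involve, here's a minimal brute-force sketch in plain Python. It's not what Pinecone does internally. It assumes cosine similarity as the metric (the metric isn't something I've stated above), and the names cosine_similarity, brute_force_query, and toy_items are made up for this example:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as "not similar to anything"
    return dot / (norm_a * norm_b)


def brute_force_query(query_vector, items, top_k=10):
    """Score every stored vector against the query and keep the best top_k.

    `items` is a list of (id, vector) pairs, the same shape as
    items_to_upload in the article's code.
    """
    scored = [
        (item_id, cosine_similarity(query_vector, vector))
        for item_id, vector in items
    ]
    # Highest similarity first
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy data: three tiny "article vectors"
toy_items = [
    ("a", [1.0, 0.0, 0.0]),
    ("b", [0.9, 0.1, 0.0]),
    ("c", [0.0, 1.0, 0.0]),
]
results = brute_force_query([1.0, 0.0, 0.0], toy_items, top_k=2)
```

This compares the query against every single vector, so it gets slow as the number of articles grows. Storage, indexing, and doing this quickly at scale are the parts a managed service handles for you.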

So Pinecone becomes sort of a facade over all the underlying details, which makes it look like magic, but also simplifies your job a whole bunch if you're not a machine learning expert.

Hope that helps!