DEV Community

Fernando André Fernandes


Content-Based recommendations using sentence embeddings and Elasticsearch

Content-based recommendations

Description

In this article, I explain how we can produce content-based recommendations using sentence embeddings and Elasticsearch's capabilities.
I will try to be as clear as I can and give an overall view of how we've been doing this at Jumpseller.

A definition

Content-based recommendations use a set of attributes that characterize what the recommender-systems literature calls an item (a song, a film, a product) to build a profile that represents it. The same can be done for users, the people registered in the system.

An example

A common illustration is a movie system: we could use movie names, descriptions, categories, cast, and other attributes to build a profile according to some defined heuristic, and then recommend movies with similar descriptions and similar names.
For example, Star Wars (Episode IV – A New Hope) and Star Wars (Episode V – The Empire Strikes Back) are very much alike in terms of description, name, and category.
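As a minimal sketch of one such heuristic, assuming items are plain Python dicts with hypothetical `name`, `description`, `categories`, and `cast` keys, we can concatenate the chosen attributes into a single text that an embedding model would later turn into a vector. The field names, their order, and the sample movie are illustrative assumptions, not the schema used at Jumpseller.

```python
from typing import Any, Dict, List

# Hypothetical attribute names; a real catalogue will have its own schema.
PROFILE_FIELDS: List[str] = ["name", "description", "categories", "cast"]


def build_item_text(item: Dict[str, Any], fields: List[str] = PROFILE_FIELDS) -> str:
    """Concatenate the chosen attributes of an item into one profile text.

    List-valued attributes (e.g. categories, cast) are joined with commas;
    missing or empty attributes are skipped, and trailing periods are
    stripped so the ". " separator does not produce "..".
    """
    parts: List[str] = []
    for field in fields:
        value = item.get(field)
        if not value:
            continue
        if isinstance(value, (list, tuple)):
            value = ", ".join(str(v) for v in value)
        parts.append(str(value).strip().rstrip("."))
    return ". ".join(parts)


movie = {
    "name": "Star Wars (Episode IV - A New Hope)",
    "description": "A young farm boy joins a rebellion against a galactic empire.",
    "categories": ["Sci-Fi", "Adventure"],
    "cast": ["Mark Hamill", "Harrison Ford"],
}
print(build_item_text(movie))
```

Which attributes to include, and in what order, is exactly the "defined heuristic" part; weighting a field more heavily could be as simple as repeating it.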

How do we build an item profile?

Item profiles can be built in multiple ways. One option is to use the sentences that characterize the items in our system (names, descriptions, and so on). These sentences can, in turn, be turned into sentence embeddings: vectors that represent the sentences of a text corpus, so that sentences with similar meanings end up close to each other.
There are several ways to produce sentence embeddings; here are three:

  1. BERT - Bidirectional Encoder Representations from Transformers, a pre-trained language representation model designed and published by Google. It is a good basis for producing sentence embeddings.
  2. Doc2Vec - An extension of Word2Vec that learns embeddings for whole sentences or documents rather than single words.
  3. Word2Vec - An NLP algorithm that uses a neural network to output word embeddings. We can then combine these word embeddings into sentence-level embeddings; a simple trick is to average all the word embeddings in a sentence into a single vector.
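The averaging trick from item 3 in the list above can be sketched in a few lines of plain Python. To keep it self-contained, the word vectors below are a tiny, made-up lookup table standing in for a trained Word2Vec model (in practice they would come from a library such as gensim). Tokenization is a naive lowercase whitespace split, and out-of-vocabulary words are simply ignored; both are simplifying assumptions.

```python
from typing import Dict, List, Optional

# Toy 3-dimensional "word embeddings" standing in for a trained Word2Vec model.
# The values are invented purely for illustration.
WORD_VECTORS: Dict[str, List[float]] = {
    "star": [0.9, 0.1, 0.0],
    "wars": [0.8, 0.2, 0.1],
    "space": [0.7, 0.0, 0.3],
    "romance": [0.0, 0.9, 0.4],
}


def sentence_embedding(sentence: str,
                       vectors: Dict[str, List[float]] = WORD_VECTORS) -> Optional[List[float]]:
    """Average the vectors of the known words in a sentence.

    Returns None when no word of the sentence is in the vocabulary.
    """
    tokens = sentence.lower().split()
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return None
    dims = len(known[0])
    # Component-wise mean over all known word vectors.
    return [sum(vec[i] for vec in known) / len(known) for i in range(dims)]


print(sentence_embedding("Star Wars in space"))
```

Averaging ignores word order, so it is a crude but cheap baseline compared with BERT- or Doc2Vec-based embeddings.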

Figure source: Kenter, Tom. (2017). Text Understanding for Computers.

How do we persist these item profiles?


Elasticsearch is a good storage choice for operations like these. Its dense_vector field type lets us index our documents (items) with a vector field whose dimensionality matches our embeddings (up to the per-field limit of the Elasticsearch version in use).
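A minimal sketch of such a mapping, written as the Python dicts you would send as request bodies. The field names (`name`, `description`, `embedding`) and the 3-dimension size are assumptions for illustration; `dims` must match the length of the vectors your embedding model produces (768 for BERT-base, for example). With the official `elasticsearch` Python client the mapping would typically be passed to `es.indices.create`, but the exact keyword arguments differ between client versions, so check the documentation for yours.

```python
import json
from typing import Any, Dict, List

EMBEDDING_DIMS = 3  # assumption: must equal the length of your sentence embeddings


def item_index_mapping(dims: int = EMBEDDING_DIMS) -> Dict[str, Any]:
    """Create-index body for an items index with a dense_vector profile field."""
    return {
        "mappings": {
            "properties": {
                "name": {"type": "text"},
                "description": {"type": "text"},
                # dense_vector stores one fixed-length float vector per document.
                "embedding": {"type": "dense_vector", "dims": dims},
            }
        }
    }


def item_document(name: str, description: str, embedding: List[float],
                  dims: int = EMBEDDING_DIMS) -> Dict[str, Any]:
    """Build a document to index, raising locally if the vector length is wrong."""
    if len(embedding) != dims:
        raise ValueError(f"expected {dims} dimensions, got {len(embedding)}")
    return {"name": name, "description": description, "embedding": embedding}


print(json.dumps(item_index_mapping(), indent=2))
```

Checking the vector length before indexing surfaces a mismatched model early, instead of as an indexing error from Elasticsearch.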

How do we perform recommendations?

Recommendations are made by computing the nearest neighbors of each item. We start by choosing a similarity measure, e.g. cosine similarity. Elasticsearch makes this easier, since it provides a built-in cosineSimilarity function that can be used in script_score queries.
Since we store our items in Elasticsearch, we can use this function as described in its documentation.
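To make the idea concrete, here is a brute-force sketch in plain Python: cosine similarity between two vectors, and the top-k most similar items to a given item, excluding the item itself. It is followed by a function that builds a comparable Elasticsearch request body using a `script_score` query with the Painless `cosineSimilarity` function. The `embedding` field name, the `k` default, and the toy catalogue are assumptions; `+ 1.0` is added because Elasticsearch does not accept negative scores. Early 7.x releases expected `doc['embedding']` instead of the field name as a string inside the script, so check the documentation for your version.

```python
import math
from typing import Any, Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_neighbors(item_id: str, items: Dict[str, List[float]],
                      k: int = 3) -> List[Tuple[str, float]]:
    """Brute-force top-k most similar items to item_id, excluding itself."""
    query = items[item_id]
    scored = [(other_id, cosine_similarity(query, vec))
              for other_id, vec in items.items() if other_id != item_id]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def similar_items_query(item_id: str, query_vector: List[float],
                        field: str = "embedding", k: int = 3) -> Dict[str, Any]:
    """Elasticsearch request body ranking items by cosine similarity to query_vector."""
    return {
        "size": k,
        "query": {
            "script_score": {
                # Score every document except the item we are recommending from.
                "query": {"bool": {"must_not": {"ids": {"values": [item_id]}}}},
                "script": {
                    # +1.0 shifts cosine similarity from [-1, 1] to [0, 2],
                    # because Elasticsearch rejects negative scores.
                    "source": f"cosineSimilarity(params.query_vector, '{field}') + 1.0",
                    "params": {"query_vector": query_vector},
                },
            }
        },
    }


catalogue = {
    "episode-iv": [0.9, 0.1, 0.0],
    "episode-v": [0.8, 0.2, 0.1],
    "rom-com": [0.0, 0.9, 0.4],
}
print(nearest_neighbors("episode-iv", catalogue, k=2))
```

Note that `script_score` computes the similarity for every matching document, so, like the Python version, it is an exact brute-force search; Elasticsearch simply does it next to the data.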
