<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: andre</title>
    <description>The latest articles on DEV Community by andre (@andre_z).</description>
    <link>https://dev.to/andre_z</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F308719%2F03fcdb51-90d8-4dff-af07-361906d4247c.jpeg</url>
      <title>DEV Community: andre</title>
      <link>https://dev.to/andre_z</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/andre_z"/>
    <language>en</language>
    <item>
      <title>Neural Search Tutorial. Part 2.</title>
      <dc:creator>andre</dc:creator>
      <pubDate>Wed, 23 Feb 2022 19:39:10 +0000</pubDate>
      <link>https://dev.to/qdrant/neural-search-tutorial-part-2-1k0o</link>
      <guid>https://dev.to/qdrant/neural-search-tutorial-part-2-1k0o</guid>
      <description>&lt;p&gt;In the first part, we've learned about the fundamentals of Neural Search. Let's now start with the hands-on tutorial.&lt;/p&gt;

&lt;h2&gt;Which model could be used?&lt;/h2&gt;

&lt;p&gt;Ideally, use a model specifically trained to determine closeness of meaning, for example one trained on Semantic Textual Similarity (STS) datasets. Current state-of-the-art models can be found on this &lt;a href="https://paperswithcode.com/sota/semantic-textual-similarity-on-sts-benchmark?p=roberta-a-robustly-optimized-bert-pretraining"&gt;leaderboard&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;However, specially trained models are not the only option. If a model is trained on a large enough dataset, its internal features can serve as embeddings too. For instance, you can take any model pre-trained on ImageNet and cut off its last layer. The penultimate layer of a neural network typically forms the highest-level features, which, however, do not correspond to specific classes. The output of this layer can be used as an embedding.&lt;/p&gt;

&lt;h2&gt;What tasks is neural search good for?&lt;/h2&gt;

&lt;p&gt;Neural search has the greatest advantage in areas where the query cannot be formulated precisely. Querying a table in a SQL database is not the best place for neural search.&lt;/p&gt;

&lt;p&gt;On the contrary, if the query itself is fuzzy, or it cannot be formulated as a set of conditions — neural search can help you. If the search query is a picture, sound file or long text, neural network search is almost the only option.&lt;/p&gt;

&lt;p&gt;If you want to build a recommendation system, the neural approach can also be useful. The user’s actions can be encoded in vector space in the same way as a picture or text. And having those vectors, it is possible to find semantically similar users and determine the next probable user actions.&lt;/p&gt;
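&lt;p&gt;A toy sketch of that idea: encode each user as the average of the embedding vectors of the items they interacted with, then compare users by cosine similarity. All item names and numbers below are made up for illustration.&lt;/p&gt;

```python
import numpy as np

# Hypothetical item embeddings (in practice these come from an encoder).
item_vectors = {
    "item_a": np.array([1.0, 0.0, 0.0]),
    "item_b": np.array([0.9, 0.1, 0.0]),
    "item_c": np.array([0.0, 0.0, 1.0]),
}

def user_vector(history):
    # Encode a user as the mean of the vectors of items they touched.
    return np.mean([item_vectors[i] for i in history], axis=0)

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

u1 = user_vector(["item_a", "item_b"])
u2 = user_vector(["item_a"])
u3 = user_vector(["item_c"])

# u1 and u2 share tastes, so they end up closer than u1 and u3.
print(cosine(u1, u2), cosine(u1, u3))
```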

&lt;h2&gt;Let’s build our own&lt;/h2&gt;

&lt;p&gt;With all that said, let’s build our own neural search. As an example, we will make a search for startups by their descriptions. This demo will show cases where full-text search works better and cases where neural search works better.&lt;br&gt;
We will use data from &lt;a href="https://www.startups-list.com/"&gt;startups-list.com&lt;/a&gt;. Each record contains the name, a paragraph describing the company, the location and a picture. Raw parsed data can be found at &lt;a href="https://storage.googleapis.com/generall-shared-data/startups_demo.json"&gt;this link&lt;/a&gt;.&lt;/p&gt;
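&lt;p&gt;A minimal sketch of loading such records, assuming the file is line-delimited JSON; the field names below are illustrative, so check the actual file for the exact keys. An inline sample stands in for the download here:&lt;/p&gt;

```python
import json

# Stand-in for the downloaded file: one JSON object per line,
# with hypothetical field names.
sample = '{"name": "Acme", "description": "We make rockets.", "city": "Berlin"}\n'

records = [json.loads(line) for line in sample.splitlines() if line.strip()]
descriptions = [r["description"] for r in records]
print(descriptions)  # ['We make rockets.']
```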

&lt;h2&gt;Prepare data for neural search&lt;/h2&gt;

&lt;p&gt;To search our descriptions in vector space, we first need to encode them into vector representations. Since the descriptions are textual data, we can use a pre-trained language model. As mentioned above, for the task of text search there is a whole set of pre-trained models specifically tuned for semantic similarity.&lt;/p&gt;

&lt;p&gt;One of the easiest libraries for working with pre-trained language models, in my opinion, is &lt;a href="https://github.com/UKPLab/sentence-transformers"&gt;sentence-transformers&lt;/a&gt; by UKPLab. It provides a convenient way to download and use many pre-trained models, mostly based on the transformer architecture. The transformer is not the only architecture suitable for neural search, but for our task it is more than enough.&lt;/p&gt;

&lt;p&gt;We will use a model called &lt;code&gt;distilbert-base-nli-stsb-mean-tokens&lt;/code&gt;. DistilBERT means that the size of this model has been reduced by a special distillation technique compared to the original BERT, which matters for the speed of our service and its demand for resources. The &lt;code&gt;stsb&lt;/code&gt; part of the name means that the model was fine-tuned for the Semantic Textual Similarity task.&lt;br&gt;
The complete code for data preparation, with detailed comments, can be found and run in this &lt;a href="https://colab.research.google.com/drive/1kPktoudAP8Tu8n8l-iVMOQhVmHkWV_L9?usp=sharing"&gt;Colab Notebook&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;To be continued... &lt;/p&gt;

</description>
      <category>tutorial</category>
      <category>machinelearning</category>
      <category>datascience</category>
      <category>python</category>
    </item>
    <item>
      <title>Neural Search Tutorial. Part 1.</title>
      <dc:creator>andre</dc:creator>
      <pubDate>Wed, 16 Feb 2022 15:47:03 +0000</pubDate>
      <link>https://dev.to/qdrant/neural-search-tutorial-part-1-2p9o</link>
      <guid>https://dev.to/qdrant/neural-search-tutorial-part-1-2p9o</guid>
      <description>&lt;p&gt;Information retrieval technology is one of the main technologies that enabled the modern Internet to exist. These days, search technology is the heart of a variety of applications. From web-pages search to product recommendations. For many years, this technology didn’t get much change until neural networks came into play.&lt;br&gt;
In this tutorial we are going to find answers to these questions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What is the difference between regular and neural search?&lt;/li&gt;
&lt;li&gt;What neural networks could be used for search?&lt;/li&gt;
&lt;li&gt;In what tasks is neural network search useful?&lt;/li&gt;
&lt;li&gt;How to build and deploy your own neural search service step by step?&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;What is neural search?&lt;/h2&gt;

&lt;p&gt;A regular full-text search, such as Google’s, consists of searching for keywords inside a document. For this reason, the algorithm cannot take into account the real meaning of the query and the documents. Many documents that might interest the user are never found because they use different wording.&lt;/p&gt;

&lt;p&gt;Neural search tries to solve exactly this problem — it attempts to search not by keywords but by meaning. To achieve this, the search works in two steps. In the first step, a specially trained neural network encoder converts the query and the searched objects into vector representations called embeddings. The encoder must be trained so that similar objects, such as texts with the same meaning or similar-looking pictures, receive close vector representations.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Famvwjmkzjskxgv9lfr3x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Famvwjmkzjskxgv9lfr3x.png" alt="vector search tutorial qdrant" width="800" height="404"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;A neural encoder places cats closer together.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Having this vector representation, it is easy to understand what the second step should be. To find documents similar to the query you now just need to find the nearest vectors. The most convenient way to determine the distance between two vectors is to calculate the cosine distance. The usual Euclidean distance can also be used, but it is not so efficient due to the curse of dimensionality.&lt;/p&gt;
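&lt;p&gt;A tiny illustration of this second step with hypothetical embeddings: cosine similarity is just the dot product of L2-normalized vectors, so ranking documents against a query takes only a few lines of numpy.&lt;/p&gt;

```python
import numpy as np

query = np.array([0.9, 0.1, 0.0])       # made-up query embedding
documents = np.array([
    [1.0, 0.0, 0.0],                     # close in meaning to the query
    [0.0, 0.0, 1.0],                     # unrelated
])

def normalize(v):
    # L2-normalize along the last axis.
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

# Cosine similarity of the query against every document at once.
scores = normalize(documents) @ normalize(query)
best = int(np.argmax(scores))
print(best)  # index of the nearest document: 0
```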

&lt;h3&gt;Which model could be used?&lt;/h3&gt;

&lt;p&gt;Ideally, use a model specifically trained to determine closeness of meaning, for example one trained on Semantic Textual Similarity (STS) datasets. Current state-of-the-art models can be found on the STS benchmark leaderboard.&lt;br&gt;
However, specially trained models are not the only option. If a model is trained on a large enough dataset, its internal features can serve as embeddings too. For instance, you can take any model pre-trained on ImageNet and cut off its last layer. The penultimate layer of a neural network typically forms the highest-level features, which, however, do not correspond to specific classes. The output of this layer can be used as an embedding.&lt;/p&gt;

&lt;h3&gt;What tasks is neural search good for?&lt;/h3&gt;

&lt;p&gt;Neural search has the greatest advantage in areas where the query cannot be formulated precisely. Querying a table in a SQL database is not the best place for neural search.&lt;br&gt;
On the contrary, if the query itself is fuzzy, or it cannot be formulated as a set of conditions — neural search can help you. If the search query is a picture, sound file or long text, neural network search is almost the only option.&lt;br&gt;
If you want to build a recommendation system, the neural approach can also be useful. The user’s actions can be encoded in vector space in the same way as a picture or text. And having those vectors, it is possible to find semantically similar users and determine the next probable user actions.&lt;/p&gt;

&lt;p&gt;Let's get our hands dirty in the next part of the tutorial.&lt;/p&gt;

</description>
      <category>tutorial</category>
      <category>machinelearning</category>
      <category>datascience</category>
      <category>python</category>
    </item>
    <item>
      <title>Qdrant raises pre-seed to build the future of open-source neural search technology</title>
      <dc:creator>andre</dc:creator>
      <pubDate>Tue, 15 Feb 2022 18:38:23 +0000</pubDate>
      <link>https://dev.to/qdrant/qdrant-raises-pre-seed-to-build-the-future-of-open-source-neural-search-technology-525n</link>
      <guid>https://dev.to/qdrant/qdrant-raises-pre-seed-to-build-the-future-of-open-source-neural-search-technology-525n</guid>
      <description>&lt;p&gt;Qdrant is a Berlin-based open-source based deep-tech start-up developing the leading neural search technology to bring applied AI solutions to the next level and make &lt;strong&gt;metric learning&lt;/strong&gt; practical. Our flagship product - &lt;strong&gt;neural search engine&lt;/strong&gt; provides a production-ready service with a convenient API to store, search, and manage vectors along with the additional payload. Qdrant engine is tailored to extended filtering support making it useful for all sorts of neural-network or semantic-based matching, recommendations, faceted search, etc. The area of application is quite broad and ranges from semantic search and product recommendations for e-commerce to image recognition and anomaly detection. The main focus is on processing unstructured data.&lt;/p&gt;

&lt;p&gt;We are happy to announce a €2 million pre-seed round of financing from two European funds: 42CAP, an industry-specialized German fund based in Munich, and IBB Ventures, an early-stage VC fund based in Berlin, joined by several business angels with deep industry know-how. With this investment, we plan to grow the community of early adopters and to establish our technology as a standard by releasing the first major version of our open-source neural search engine later this year.&lt;/p&gt;

&lt;p&gt;Check out our GitHub repository for details: &lt;a href="https://github.com/qdrant/qdrant"&gt;https://github.com/qdrant/qdrant&lt;/a&gt;&lt;br&gt;
PS: &lt;a href="https://qdrant.join.com"&gt;We are looking&lt;/a&gt; for Rust Engineers 🦀 and ML Engineers 😉&lt;/p&gt;

</description>
      <category>news</category>
      <category>startup</category>
    </item>
  </channel>
</rss>
