
Davide Santangelo

Python ML - Similarity with spaCy

Intro

I recently started learning ML and Data Science, and I want to share with you a powerful Python library for semantic text analysis called spaCy.

Description

Similarity is determined by comparing word vectors or “word embeddings”, multi-dimensional meaning representations of a word.
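
To make the idea of comparing vectors concrete, here is a minimal sketch of cosine similarity, which spaCy's documentation describes as its default similarity estimate. The three-dimensional vectors and the `toy_vectors` name are invented for illustration; real embeddings have hundreds of dimensions and come from a trained model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen in this sketch for all-zero vectors
    return dot / (norm_a * norm_b)


# Hypothetical 3-dimensional "embeddings", made up for this example only.
toy_vectors = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}

cat_dog = cosine_similarity(toy_vectors["cat"], toy_vectors["dog"])
cat_car = cosine_similarity(toy_vectors["cat"], toy_vectors["car"])
print(f"cat vs dog: {cat_dog:.3f}")
print(f"cat vs car: {cat_car:.3f}")
```

Vectors that point in a similar direction score close to 1, so in this toy data "cat" ends up closer to "dog" than to "car".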

In the words of its own documentation, spaCy is designed to help you do real work: to build real products, or gather real insights. The library respects your time, and tries to avoid wasting it. It's easy to install, and its API is simple and productive. Its authors like to think of spaCy as the Ruby on Rails of Natural Language Processing.

spaCy is able to compare two objects and predict how similar they are. Predicting similarity is useful for building recommendation systems or flagging duplicates. For example, you can suggest content to a user that’s similar to what they’re currently looking at, or label a support ticket as a duplicate if it’s very similar to an existing one.
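
The duplicate-flagging idea could be sketched roughly as follows. Everything here is an assumption made for illustration: the `find_duplicates` helper, the `token_overlap` stand-in scorer, the sample tickets, and the threshold values are hypothetical and not part of spaCy.

```python
def find_duplicates(new_text, existing_texts, similarity, threshold=0.9):
    """Return (text, score) pairs whose score meets the threshold, best first."""
    scored = [(text, similarity(new_text, text)) for text in existing_texts]
    matches = [(text, score) for text, score in scored if score >= threshold]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)


def token_overlap(a, b):
    """Stand-in scorer: Jaccard overlap of lowercase tokens (not spaCy's method)."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


# Hypothetical support tickets, invented for this example.
tickets = [
    "Cannot log in to my account",
    "Invoice shows the wrong amount",
]

duplicates = find_duplicates(
    "cannot log in to account", tickets, token_overlap, threshold=0.5
)
print(duplicates)
```

With spaCy installed and a pipeline loaded as `nlp`, the scorer could instead be `lambda a, b: nlp(a).similarity(nlp(b))`. The right threshold depends on the model and the data, so treat any fixed value as a starting point to tune. spaCy's documentation also recommends a model that ships with word vectors (for example `en_core_web_md`) for more meaningful similarity scores.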

Installation instructions

pip

Using pip, spaCy releases are available as source packages and binary wheels (as of v2.0.13).

pip3 install spacy

Models

spaCy’s models can be installed as Python packages. This means that they’re a component of your application, just like any other module. They’re versioned and can be defined as a dependency in your requirements.txt. Models can be installed from a download URL or a local directory, manually or via pip. Their data can be located anywhere on your file system.

python3 -m spacy download en_core_web_sm

spaCy provides models for many languages; the complete list is available in the spaCy documentation.

Example

Here is a short Python script that compares two strings.

import spacy

nlp = spacy.load("en_core_web_sm")

first_text = nlp(input("insert first text: "))
second_text = nlp(input("insert second text: "))

print(f"similarity: {first_text.similarity(second_text)}")

Result

insert first text: i'm a software developer
insert second text: i'm a software web developer

similarity: 0.9302790237853475

Enjoy!!
