DEV Community

Python NLP libraries to learn and use in 2021

amananandrai
Data Science and Machine Learning Enthusiast

Natural Language Processing (NLP) is one of the most sought-after fields of Machine Learning. It is the technology behind Google Assistant, Amazon's Alexa, Microsoft's Cortana, and Apple's Siri. It can be used to build chatbots, give relevant answers to search queries, translate between languages, and support movie recommender systems by analyzing reviews and assigning ratings.

Python is one of the most popular languages for Machine Learning, and it is widely used in the NLP community as well.

In this blog, we will look at some popular Python NLP libraries that can be used for tasks such as text summarization, question answering, sentiment analysis, POS tagging, tokenization, and named entity recognition. The libraries we will discuss are:

  • NLTK
  • spaCy
  • Gensim
  • Hugging Face's Transformers
  • Flair
  • TextBlob

If you are planning to learn NLP with Python in 2021, try the libraries described below.


NLTK

NLTK stands for Natural Language Toolkit. It is one of the most popular libraries for NLP and supports many tasks, such as classification, tokenization, stemming, tagging, parsing, and semantic reasoning. Some of the modules in the NLTK library are discussed in the article below.

It is one of the oldest NLP libraries, developed in 2001 by Steven Bird and Edward Loper in the Department of Computer and Information Science at the University of Pennsylvania. Despite its age, it is still one of the most widely used libraries for NLP.
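As a small illustration of the tokenization and stemming support mentioned above, here is a minimal sketch, assuming NLTK is installed (`pip install nltk`). It uses `wordpunct_tokenize`, a regex-based tokenizer, and `PorterStemmer`; neither needs any extra NLTK data downloads. The example sentence is made up.

```python
from nltk.stem import PorterStemmer
from nltk.tokenize import wordpunct_tokenize

# wordpunct_tokenize splits on the regex \w+|[^\w\s]+, so no data download is needed.
text = "The cats were running and jumping."
tokens = wordpunct_tokenize(text)

# The Porter stemmer strips common English suffixes ("-s", "-ing", ...).
stemmer = PorterStemmer()
stems = [stemmer.stem(token) for token in tokens]

print(tokens)
print(stems)
```

`nltk.word_tokenize` usually produces more linguistically sensible tokens, but it requires downloading NLTK's Punkt tokenizer data first.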

Link: https://www.nltk.org/


spaCy

spaCy is a free, open-source library for advanced Natural Language Processing (NLP) in Python. Unlike NLTK, which is widely used for teaching and research, spaCy focuses on providing software for production use. It was developed in 2015 by Matthew Honnibal and Ines Montani. It can be used for tasks such as tokenization, part-of-speech (POS) tagging, dependency parsing, lemmatization, named entity recognition (NER), entity linking, and similarity detection. To learn more about spaCy and how to use it, visit the link below.
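A minimal sketch, assuming spaCy is installed. `spacy.blank("en")` builds an English pipeline that contains only the rule-based tokenizer, so the tokenization part runs without downloading a model. The `named_entities` helper is a hypothetical name used for illustration; it needs a trained pipeline such as `en_core_web_sm`, which is installed separately.

```python
import spacy

# A blank English pipeline: tokenizer only, no trained components.
nlp = spacy.blank("en")
doc = nlp("spaCy isn't slow, it's built for production.")
tokens = [token.text for token in doc]
print(tokens)


def named_entities(text, model_name="en_core_web_sm"):
    """Hypothetical helper: return (entity text, entity label) pairs.

    Requires a trained pipeline, installed with:
        python -m spacy download en_core_web_sm
    """
    trained = spacy.load(model_name)
    return [(ent.text, ent.label_) for ent in trained(text).ents]
```

The tokenizer already applies English-specific rules, for example splitting "isn't" into "is" and "n't".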

Link: https://spacy.io/


Gensim

Gensim is an open-source Python library for topic modelling in NLP. It includes streamed, parallelized implementations of fastText, word2vec, and doc2vec, as well as latent semantic analysis (LSA, LSI, SVD), non-negative matrix factorization (NMF), latent Dirichlet allocation (LDA), tf-idf, and random projections. LSA, NMF, and LDA are topic-modelling algorithms, while fastText, word2vec, and doc2vec learn vector embeddings of words or documents. Gensim represents texts as semantic vectors and uses them to find similar, semantically related documents. It was developed in 2009 by Radim Řehůřek. The core concepts of Gensim are:

  • Document: some text.
  • Corpus: a collection of documents.
  • Vector: a mathematically convenient representation of a document.
  • Model: an algorithm for transforming vectors from one representation to another.

Link: https://github.com/RaRe-Technologies/gensim


Transformers (Hugging Face)

It is one of the most popular NLP libraries of recent years. It provides deep learning models, is built primarily on PyTorch, and supports TensorFlow models as well. It includes implementations of recent language models such as BERT, RoBERTa, GPT, and T5, which can be used for advanced NLP tasks like question answering, text summarization, machine translation, and sentiment analysis. Some of these tasks are covered in the post below.


It is named Transformers because its models are based on the Transformer architecture, introduced in the paper "Attention Is All You Need."
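The sketch below, assuming `transformers` v4 and PyTorch are installed, builds a tiny BERT model from a `BertConfig` with arbitrary sizes and random (untrained) weights, then runs one forward pass. It runs offline and shows that the model is an ordinary PyTorch module producing one contextual vector per token. The hypothetical `sentiment` helper shows the usual route to a pretrained model, `pipeline("sentiment-analysis")`, which downloads a default model the first time it is called.

```python
import torch
from transformers import BertConfig, BertModel, pipeline

# Arbitrary, deliberately tiny sizes so the model builds instantly.
# hidden_size must be divisible by num_attention_heads.
config = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=4,
    intermediate_size=64,
)
model = BertModel(config)  # randomly initialized, not pretrained
model.eval()

# One sequence of five made-up token ids (all below vocab_size).
input_ids = torch.tensor([[1, 5, 7, 9, 2]])
with torch.no_grad():
    outputs = model(input_ids=input_ids)

# One 32-dimensional contextual vector per input token.
print(outputs.last_hidden_state.shape)  # torch.Size([1, 5, 32])


def sentiment(texts):
    """Hypothetical helper: classify texts with a pretrained model.

    pipeline() downloads a default sentiment model on first use.
    """
    classifier = pipeline("sentiment-analysis")
    return classifier(texts)  # list of {"label": ..., "score": ...} dicts
```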

Link: https://github.com/huggingface/transformers


Flair

Flair allows you to apply state-of-the-art NLP models to your text for tasks such as named entity recognition (NER), part-of-speech (PoS) tagging, word sense disambiguation, and classification, with support for a rapidly growing number of languages. It is a deep learning library built on PyTorch, developed by Alan Akbik and released in 2018.

Link: https://github.com/flairNLP/flair


TextBlob

TextBlob is an open-source Python library for processing textual data. It provides a simple API for common NLP tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, and classification. It is built on top of NLTK. It is suited to simple NLP tasks and is a good starting point for beginners.

Link: https://textblob.readthedocs.io/en/dev/


I hope you liked this list of NLP libraries and will try some of them in 2021.
