Keita Onabuta

Microsoft NLP Recipes

microsoft / nlp-recipes

Natural Language Processing Best Practices & Examples

NLP Best Practices

In recent years, natural language processing (NLP) has grown rapidly in quality and usability, which has helped drive business adoption of artificial intelligence (AI) solutions. Over the same period, researchers have applied newer deep learning methods to NLP, and data scientists have started moving from traditional methods to state-of-the-art (SOTA) deep neural network (DNN) algorithms that use language models pretrained on large text corpora.

This repository contains examples and best practices for building NLP systems, provided as Jupyter notebooks and utility functions. The focus of the repository is on state-of-the-art methods and common scenarios that are popular among researchers and practitioners working on problems involving text and language.

Overview

The goal of this repository is to build a comprehensive set of tools and examples that leverage recent advances in NLP algorithms, neural architectures, and distributed machine learning systems. The content is based on…

This post introduces the collection of NLP best practices published by Microsoft.

What is NLP Recipes?

It is a collection of natural language processing (NLP) best practices whose development is led by data scientists at Microsoft.

Contents

As of March 28, 2020, the following content is available.

| Scenario | Models | Description | Languages |
|---|---|---|---|
| Text Classification | BERT, XLNet, RoBERTa | Text classification is a supervised learning method of learning and predicting the category or the class of a document given its text content. | English, Hindi, Arabic |
| Named Entity Recognition | BERT | Named entity recognition (NER) is the task of classifying words or key phrases of a text into predefined entities of interest. | English |
| Text Summarization | BERTSum | Text summarization is a language generation task of summarizing the input text into a shorter paragraph of text. | English |
| Entailment | BERT, XLNet, RoBERTa | Textual entailment is the task of classifying the binary relation between two natural-language texts, text and hypothesis, to determine if the text agrees with the hypothesis or not. | English |
| Question Answering | BiDAF, BERT, XLNet | Question answering (QA) is the task of retrieving or generating a valid answer for a given query in natural language, provided with a passage related to the query. | English |
| Sentence Similarity | BERT, GenSen | Sentence similarity is the process of computing a similarity score given a pair of text documents. | English |
| Embeddings | Word2Vec, fastText, GloVe | Embedding is the process of converting a word or a piece of text into a vector in a continuous space of real numbers, usually of low dimension. | English |
| Sentiment Analysis | Dependency Parser, GloVe | Provides an example of training and using Aspect Based Sentiment Analysis with Azure ML and Intel NLP Architect. | English |
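
To make the Embeddings and Sentence Similarity scenarios more concrete, here is a minimal sketch in plain Python. It is not code from the nlp-recipes repository: the 3-dimensional word vectors in `TOY_VECTORS` are invented for illustration, and the helper names `embed_sentence`, `cosine_similarity`, and `sentence_similarity` are hypothetical. It assumes a sentence can be represented by averaging its word vectors and that similarity is measured with the cosine of the angle between two vectors, which is one common baseline among several.

```python
import math

# Hypothetical 3-dimensional word vectors, invented for illustration only.
# Real embeddings (Word2Vec, fastText, GloVe) are learned from large corpora
# and usually have tens to hundreds of dimensions.
TOY_VECTORS = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.7, 0.3, 0.1],
    "sat": [0.1, 0.9, 0.1],
    "ran": [0.2, 0.8, 0.1],
    "stocks": [0.0, 0.1, 0.9],
    "fell": [0.1, 0.3, 0.8],
}


def embed_sentence(sentence):
    """Represent a sentence as the average of its known word vectors."""
    vectors = [TOY_VECTORS[w] for w in sentence.lower().split() if w in TOY_VECTORS]
    if not vectors:
        raise ValueError("no known words in sentence: %r" % sentence)
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def sentence_similarity(s1, s2):
    """Similarity score for a pair of sentences (higher means more similar)."""
    return cosine_similarity(embed_sentence(s1), embed_sentence(s2))


if __name__ == "__main__":
    print(sentence_similarity("cat sat", "dog ran"))
    print(sentence_similarity("cat sat", "stocks fell"))
```

With these toy vectors, the pair "cat sat" / "dog ran" scores higher than "cat sat" / "stocks fell". The models listed in the table (BERT, GenSen) produce far richer sentence representations, but the final comparison step can follow the same pattern.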

Features

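  • Based on HuggingFace models
  • NLP models can be developed with very little code
  • Currently the main focus is English only (a Japanese version of text classification will be published at a later date!)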

Top comments (2)

Novandi Banitama

Not in english??

Keita Onabuta

Sorry it's Japanese.