Huynh-Chinh

GETTING STARTED WITH NATURAL LANGUAGE PROCESSING

Introduction

Natural language processing (NLP) is concerned with enabling computers to interpret, analyze, and generate human language, both written and spoken. Typical tasks include answering questions, translating between languages, identifying the language of a text, summarizing documents, analyzing sentiment, checking spelling, and recognizing speech. The field sits at the intersection of linguistics, artificial intelligence, and computer science.

Roadmap of NLP for Machine Learning

1. Pre-processing

  • Sentence cleaning
  • Stop Words
  • Regular Expression
  • Tokenization
  • N-grams (Unigram, Bigram, Trigram)
  • Text Normalization
  • Stemming
  • Lemmatization
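
As a rough illustration of the first few steps above (sentence cleaning, regular expressions, tokenization, stop-word removal, and n-grams), here is a minimal sketch using only the Python standard library. The stop-word list and the function names are assumptions made for this example; real projects usually rely on the much longer lists and tokenizers shipped with libraries such as NLTK or spaCy.

```python
import re

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "and", "to", "in", "on"}

def clean(text):
    """Lowercase the text and replace every character that is not a-z, 0-9, or whitespace."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    # Collapse runs of whitespace left behind by the substitution.
    return re.sub(r"\s+", " ", text).strip()

def tokenize(text):
    """Split cleaned text into word tokens on whitespace."""
    return clean(text).split()

def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]

def ngrams(tokens, n):
    """Return all n-grams (as tuples) over a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = remove_stop_words(tokenize("The cat sat on the mat, and the dog barked!"))
print(tokens)             # ['cat', 'sat', 'mat', 'dog', 'barked']
print(ngrams(tokens, 2))  # bigrams; use n=1 for unigrams, n=3 for trigrams
```

Stemming and lemmatization are left out on purpose: both need language-specific rules or dictionaries, which is where a library is the better choice.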

2. Linguistics

  • Part-of-Speech Tags
  • Constituency Parsing
  • Dependency Parsing
  • Syntactic Parsing
  • Semantic Analysis
  • Lexical Semantics
  • Coreference Resolution
  • Chunking
  • Entity Extraction / Named Entity Recognition (NER)
  • Named Entity Disambiguation / Entity Linking
  • Knowledge Graphs

3. Word Embeddings

a. Frequency-based Word Embedding

  • One Hot Encoding
  • Bag of Words or CountVectorizer()
  • TF-IDF or TfidfVectorizer()
  • Co-occurrence Matrix, Co-occurrence Vector
  • Hashing Vectorizer
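
To make the frequency-based idea concrete, here is a small sketch of bag-of-words counts and TF-IDF weights written in plain Python. The three toy documents and the helper names are assumptions for illustration. The IDF used here is the textbook form log(N / df); scikit-learn's TfidfVectorizer applies smoothing and normalization by default, so its numbers will differ.

```python
import math
from collections import Counter

# Toy corpus (an assumption for this example), already lowercase and punctuation-free.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs",
]
tokenized = [d.split() for d in docs]

# Vocabulary: every distinct token, sorted so the column order is stable.
vocab = sorted({t for doc in tokenized for t in doc})

def bag_of_words(tokens):
    """Count vector: how often each vocabulary word occurs in one document."""
    counts = Counter(tokens)
    return [counts[w] for w in vocab]

def idf(word):
    """Inverse document frequency, textbook form log(N / df)."""
    df = sum(1 for doc in tokenized if word in doc)
    return math.log(len(tokenized) / df)

def tfidf(tokens):
    """TF-IDF vector: relative term frequency times inverse document frequency."""
    counts = Counter(tokens)
    return [(counts[w] / len(tokens)) * idf(w) for w in vocab]

bow = [bag_of_words(doc) for doc in tokenized]
weights = [tfidf(doc) for doc in tokenized]

print(vocab)
print(bow[0])
print([round(x, 3) for x in weights[0]])
```

In the first document "the" occurs twice and "cat" once, yet "cat" gets the larger TF-IDF weight because it appears in only one document of the corpus.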

b. Pretrained Word Embedding

  • Word2Vec (by Google): CBOW, Skip-Gram
  • GloVe (by Stanford)
  • fastText (by Facebook)
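
The difference between the two Word2Vec variants is easiest to see in the training examples they consume. The sketch below only builds those examples; it does not train any vectors. The function names and the fixed window size are assumptions for illustration, and real Word2Vec implementations add details such as subsampling of frequent words and negative sampling.

```python
def skipgram_pairs(tokens, window=2):
    """(center, context) pairs: Skip-Gram predicts each context word from the center word."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

def cbow_examples(tokens, window=2):
    """(context words, target) examples: CBOW predicts the target word from its context."""
    examples = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        examples.append((context, target))
    return examples

sentence = "the quick brown fox jumps".split()
print(skipgram_pairs(sentence, window=1))
print(cbow_examples(sentence, window=1))
```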

4. Topic Modeling

  • Latent Semantic Analysis (LSA)
  • Probabilistic Latent Semantic Analysis (pLSA)
  • Latent Dirichlet Allocation (LDA)
  • lda2Vec
  • Non-Negative Matrix Factorization (NMF)

5. NLP with Deep Learning

  • Classical Machine Learning baselines (Logistic Regression, SVM, Naïve Bayes)
  • Embedding Layer
  • Artificial Neural Network
  • Deep Neural Network
  • Convolutional Neural Network
  • RNN/LSTM/GRU
  • Bi-RNN/Bi-LSTM/Bi-GRU
  • Pretrained Language Models: ELMo, ULMFiT
  • Sequence-to-Sequence/Encoder-Decoder
  • Transformers (attention mechanism)
  • Encoder-only Transformers: BERT
  • Decoder-only Transformers: GPT
  • Transfer Learning
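
The attention mechanism behind Transformers can be sketched in a few lines. The example below implements scaled dot-product attention, softmax(QKᵀ / √d_k) V, on plain Python lists. The toy vectors and function names are assumptions for illustration; real models use tensor libraries, learned projection matrices, and multiple heads.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention; returns (outputs, attention weights)."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        out = [sum(w * v[c] for w, v in zip(weights, values))
               for c in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

# Toy inputs: one query that points in the same direction as the first key.
queries = [[1.0, 0.0]]
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0]]

outputs, attn = attention(queries, keys, values)
print(attn[0])     # the first key receives the larger weight
print(outputs[0])  # so the output leans toward the first value vector
```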

6. Example Use cases

  • Sentiment Analysis
  • Question Answering
  • Language Translation
  • Text/Intent Classification
  • Text Summarization
  • Text Similarity
  • Text Clustering
  • Text Generation
  • Chatbots (DialogFlow, RASA, Self-made Bots)
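
As one small example of the first use case, here is a sketch of sentiment analysis with a multinomial Naïve Bayes classifier written in plain Python. The four training sentences, the labels, and the `predict` helper are assumptions for illustration. A real system would train on thousands of labelled examples and would more likely use a library implementation.

```python
import math
from collections import Counter, defaultdict

# Toy labelled data (an assumption for this example).
train = [
    ("i love this movie", "pos"),
    ("what a great film", "pos"),
    ("i hate this movie", "neg"),
    ("what a terrible film", "neg"),
]

class_docs = Counter()              # number of documents per class
word_counts = defaultdict(Counter)  # word frequencies per class
vocab = set()
for text, label in train:
    class_docs[label] += 1
    for word in text.split():
        word_counts[label][word] += 1
        vocab.add(word)

def predict(text):
    """Return the class with the highest log-probability under Naive Bayes."""
    scores = {}
    for label in class_docs:
        # Log prior: share of training documents with this label.
        score = math.log(class_docs[label] / len(train))
        total = sum(word_counts[label].values())
        for word in text.split():
            if word in vocab:  # words never seen in training are ignored
                # Laplace (add-one) smoothing avoids log(0) for unseen word/class pairs.
                score += math.log((word_counts[label][word] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

print(predict("i love this great film"))
print(predict("terrible movie i hate it"))
```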

7. Libraries

  • NLTK
  • spaCy
  • Gensim

Conclusion
Thank you very much for taking the time to read this. I would really appreciate any feedback in the comment section.
Enjoy🎉
