<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Jakub Czakon</title>
    <description>The latest articles on DEV Community by Jakub Czakon (@jakubczakon).</description>
    <link>https://dev.to/jakubczakon</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F297472%2F4e616e0f-0635-471f-a20b-f2afb03701de.jpeg</url>
      <title>DEV Community: Jakub Czakon</title>
      <link>https://dev.to/jakubczakon</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/jakubczakon"/>
    <language>en</language>
    <item>
      <title>Text Classification: All Tips and Tricks from 5 Kaggle Competitions
</title>
      <dc:creator>Jakub Czakon</dc:creator>
      <pubDate>Fri, 29 May 2020 08:45:26 +0000</pubDate>
      <link>https://dev.to/jakubczakon/text-classification-all-tips-and-tricks-from-5-kaggle-competitions-34fh</link>
      <guid>https://dev.to/jakubczakon/text-classification-all-tips-and-tricks-from-5-kaggle-competitions-34fh</guid>
      <description>&lt;p&gt;&lt;em&gt;This article was originally posted by &lt;a href="https://www.linkedin.com/in/shahules/?originalSubdomain=in"&gt;Shahul ES&lt;/a&gt; on &lt;a href="https://neptune.ai/blog/text-classification-tips-and-tricks-kaggle-competitions?utm_source=devto&amp;amp;utm_medium=crosspost&amp;amp;utm_campaign=blog-text-classification-tips-and-tricks-kaggle-competitions"&gt;Neptune blog&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;In this article, I will discuss some great tips and tricks to improve the performance of your text classification model. These tricks are obtained from solutions of some of Kaggle’s top NLP competitions.&lt;/p&gt;

&lt;p&gt;Namely, I’ve gone through:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://www.kaggle.com/c/jigsaw-unintended-bias-in-toxicity-classification"&gt;Jigsaw Unintended Bias in Toxicity Classification&lt;/a&gt; – $65,000&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge/"&gt;Toxic Comment Classification Challenge&lt;/a&gt; – $35,000&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.kaggle.com/c/quora-insincere-questions-classification"&gt;Quora Insincere Questions Classification&lt;/a&gt; – $25,000&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.kaggle.com/c/google-quest-challenge"&gt;Google QUEST Q&amp;amp;A Labeling&lt;/a&gt; – $25,000&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.kaggle.com/c/tensorflow2-question-answering"&gt;TensorFlow 2.0 Question Answering&lt;/a&gt; – $50,000&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;and found a ton of great ideas.&lt;/p&gt;

&lt;p&gt;Without much lag, let’s begin.&lt;/p&gt;

&lt;h1&gt;
  
  
  Dealing with larger datasets
&lt;/h1&gt;

&lt;p&gt;One issue you might face in any machine learning competition is the size of your dataset. If your data is large (3GB or more is already a lot for Kaggle kernels and basic laptops), you may find it difficult to load and process with limited resources. Here are some articles and kernels that I have found useful in such situations.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Optimize the memory by &lt;a href="https://www.kaggle.com/shrutimechlearn/large-data-loading-trick-with-ms-malware-data"&gt;reducing the size of some attributes&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Use open-source libraries such as &lt;a href="https://www.kaggle.com/yuliagm/how-to-work-with-big-datasets-on-16g-ram-dask"&gt;Dask to read and manipulate the data&lt;/a&gt;; it performs parallel computing and saves memory&lt;/li&gt;
&lt;li&gt;Use &lt;a href="https://github.com/rapidsai/cudf"&gt;cudf&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Convert data to &lt;a href="https://arrow.apache.org/docs/python/parquet.html"&gt;parquet&lt;/a&gt; format&lt;/li&gt;
&lt;li&gt;Convert data to &lt;a href="https://medium.com/@snehotosh.banerjee/feather-a-fast-on-disk-format-for-r-and-python-data-frames-de33d0516b03"&gt;feather&lt;/a&gt; format&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Small datasets and external data
&lt;/h1&gt;

&lt;p&gt;But, what can one do if the dataset is small? Let’s see some techniques to tackle this situation.&lt;/p&gt;

&lt;p&gt;One way to increase the performance of any machine learning model is to use external data that contains variables that influence the target variable.&lt;/p&gt;

&lt;p&gt;Let’s see some of the external datasets.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use of &lt;a href="https://rajpurkar.github.io/SQuAD-explorer/"&gt;squad&lt;/a&gt; data for Question Answering tasks&lt;/li&gt;
&lt;li&gt;Other &lt;a href="http://nlpprogress.com/english/question_answering.html"&gt;datasets&lt;/a&gt; for QA tasks&lt;/li&gt;
&lt;li&gt;Wikitext long term dependency language modeling &lt;a href="https://www.salesforce.com/products/einstein/ai-research/the-wikitext-dependency-language-modeling-dataset/"&gt;dataset&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://archive.org/download/stackexchange"&gt;Stackexchange data&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Prepare a dictionary of commonly misspelled words and corrected words.&lt;/li&gt;
&lt;li&gt;Use of &lt;a href="https://www.kaggle.com/kyakovlev/jigsaw-general-helper-public"&gt;helper datasets&lt;/a&gt; for cleaning&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.kaggle.com/cdeotte/pseudo-labeling-qda-0-969/"&gt;Pseudo labeling&lt;/a&gt; is the process of adding confidently predicted test data to your training data&lt;/li&gt;
&lt;li&gt;Use different data &lt;a href="https://www.kaggle.com/shahules/tackling-class-imbalance"&gt;sampling methods&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Text augmentation by &lt;a href="https://arxiv.org/pdf/1502.01710.pdf"&gt;Exchanging words with synonyms&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Text augmentation by &lt;a href="https://arxiv.org/pdf/1703.02573.pdf"&gt;noising in RNN&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Text augmentation by &lt;a href="https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge/discussion/48038"&gt;translation to other languages and back&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
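
&lt;p&gt;As a minimal sketch of the pseudo labeling idea from the list above (not the linked notebook’s code), the snippet below keeps only the test examples whose predicted probability is confident enough. The function name select_pseudo_labels, the binary setup, and the 0.95 threshold are illustrative assumptions.&lt;/p&gt;

```python
from operator import ge  # ge(a, b) is True when a is at least b


def select_pseudo_labels(texts, probabilities, threshold=0.95):
    """Return (text, label) pairs for confident binary predictions.

    probabilities holds the predicted probability of class 1 for each text.
    The threshold value is an assumption; tune it on validation data.
    """
    selected = []
    for text, p in zip(texts, probabilities):
        label = int(ge(p, 0.5))          # predicted class: 1 or 0
        confidence = max(p, 1.0 - p)     # probability of the predicted class
        if ge(confidence, threshold):
            selected.append((text, label))
    return selected


# Hypothetical test-set predictions from an already trained model
test_texts = ["great answer", "not sure about this", "spam spam spam"]
test_probs = [0.98, 0.50, 0.01]
pseudo_labeled = select_pseudo_labels(test_texts, test_probs)
```

&lt;p&gt;The selected pairs can then be appended to the training set before retraining. Keeping the threshold high limits the amount of label noise added this way.&lt;/p&gt;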

&lt;h1&gt;
  
  
  Data Exploration and Gaining insights
&lt;/h1&gt;

&lt;p&gt;Data exploration always helps to better understand the data and gain insights from it. Before starting to develop machine learning models, top competitors always read/do a lot of exploratory data analysis for the data. This helps in feature engineering and cleaning of the data.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Twitter data &lt;a href="https://www.kaggle.com/jagangupta/stop-the-s-toxic-comments-eda"&gt;exploration methods&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Simple &lt;a href="https://www.kaggle.com/nz0722/simple-eda-text-preprocessing-jigsaw"&gt;EDA for tweets&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.kaggle.com/tunguz/just-some-simple-eda"&gt;EDA&lt;/a&gt; for Quora data&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.kaggle.com/kailex/r-eda-for-q-gru"&gt;EDA&lt;/a&gt; in R for Quora data&lt;/li&gt;
&lt;li&gt;Complete &lt;a href="https://www.kaggle.com/codename007/start-from-here-quest-complete-eda-fe"&gt;EDA&lt;/a&gt; with stack exchange data&lt;/li&gt;
&lt;li&gt;My previous article on &lt;a href="https://neptune.ai/blog/exploratory-data-analysis-natural-language-processing-tools?utm_source=devto&amp;amp;utm_medium=crosspost&amp;amp;utm_campaign=blog-text-classification-tips-and-tricks-kaggle-competitions"&gt;EDA for natural language processing&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Data Cleaning
&lt;/h1&gt;

&lt;p&gt;Data cleaning is an important and integral part of any NLP problem. Text data always needs some preprocessing and cleaning before we can represent it in a suitable form.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Use this &lt;a href="https://www.kaggle.com/sudalairajkumar/getting-started-with-text-preprocessing"&gt;notebook&lt;/a&gt; to clean social media data&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://www.kaggle.com/kyakovlev/preprocessing-bert-public"&gt;Data cleaning&lt;/a&gt; for BERT&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Use &lt;a href="https://textblob.readthedocs.io/en/dev/quickstart.html"&gt;textblob&lt;/a&gt; to correct misspellings&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://www.kaggle.com/theoviel/improve-your-score-with-some-text-preprocessing"&gt;Cleaning&lt;/a&gt; for pre-trained embeddings&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://www.pythonprogramming.in/language-detection-and-translation-using-textblob.html"&gt;Language detection and translation&lt;/a&gt; for multilingual tasks&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Preprocessing for Glove &lt;a href="https://www.kaggle.com/christofhenkel/how-to-preprocessing-for-glove-part1-eda"&gt;part 1&lt;/a&gt; and &lt;a href="https://www.kaggle.com/christofhenkel/how-to-preprocessing-for-glove-part2-usage"&gt;part 2&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://www.kaggle.com/sunnymarkliu/more-text-cleaning-to-increase-word-coverage"&gt;Increasing word coverage&lt;/a&gt; to get more from pre-trained word embeddings&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Text Representations
&lt;/h1&gt;

&lt;p&gt;Before we feed our text data to the Neural network or ML model, the text input needs to be represented in a suitable format. These representations determine the performance of the model to a large extent.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pretrained &lt;a href="https://nlp.stanford.edu/projects/glove/"&gt;Glove&lt;/a&gt; vectors&lt;/li&gt;
&lt;li&gt;Pretrained &lt;a href="https://fasttext.cc/docs/en/english-vectors.html"&gt;fasttext&lt;/a&gt; vectors&lt;/li&gt;
&lt;li&gt;Pretrained &lt;a href="https://radimrehurek.com/gensim/models/word2vec.html"&gt;word2vec&lt;/a&gt; vectors&lt;/li&gt;
&lt;li&gt;My previous article on these &lt;a href="https://neptune.ai/blog/document-classification-small-datasets?utm_source=devto&amp;amp;utm_medium=crosspost&amp;amp;utm_campaign=blog-text-classification-tips-and-tricks-kaggle-competitions"&gt;3 embeddings&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Combining &lt;a href="https://www.kaggle.com/c/quora-insincere-questions-classification/discussion/71778/"&gt;pre-trained vectors&lt;/a&gt;. This can help in better representation of text and decreasing OOV words&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://cogcomp.seas.upenn.edu/page/resource_view/106"&gt;Paragram&lt;/a&gt; embeddings&lt;/li&gt;
&lt;li&gt;&lt;a href="https://tfhub.dev/google/universal-sentence-encoder/1"&gt;Universal Sentence Encoder&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Use USE to generate &lt;a href="https://www.kaggle.com/abhishek/distilbert-use-features-oof"&gt;sentence-level features&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;3 methods to &lt;a href="https://dev.to/gosia67316552/text-classification-all-tips-and-tricks-from-5-kaggle-competitions-1iig-temp-slug-882302/edit"&gt;combine embeddings&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Contextual embeddings models&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/google-research/bert"&gt;BERT&lt;/a&gt; Bidirectional Encoder Representations from Transformers&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/openai/finetune-transformer-lm"&gt;GPT&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/pytorch/fairseq/tree/master/examples/roberta"&gt;Roberta&lt;/a&gt; a Robustly Optimized BERT&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/google-research/ALBERT"&gt;Albert&lt;/a&gt; a Lite BERT for Self-supervised Learning of Language Representations&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/huggingface/transformers/tree/master/examples/distillation"&gt;Distilbert&lt;/a&gt; a lighter version of BERT&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/zihangdai/xlnet/"&gt;XLNET&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Modeling
&lt;/h1&gt;

&lt;p&gt;Model architecture&lt;/p&gt;

&lt;p&gt;Choosing the right architecture is important for developing a good machine learning model. Recurrent sequence models such as LSTMs and GRUs perform well on NLP problems and are always worth trying. Stacking 2 layers of LSTM/GRU networks is a common approach.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge/discussion/52644/"&gt;Stacking Bidirectional CuDNNLSTM&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.kaggle.com/sakami/google-quest-single-lstm/"&gt;Stacking LSTM networks&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.kaggle.com/christofhenkel/keras-baseline-lstm-attention-5-fold/"&gt;LSTM with attention and 5-fold cross-validation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.kaggle.com/c/quora-insincere-questions-classification/discussion/80568/"&gt;Bidirectional LSTM with 1D convolutions&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.kaggle.com/c/quora-insincere-questions-classification/discussion/80542/"&gt;Unfreeze and tune embedding&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.kaggle.com/wowfattie/3rd-place"&gt;BiLSTM with Global maxpooling&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.kaggle.com/c/quora-insincere-questions-classification/discussion/80495"&gt;Attention weighted average&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.kaggle.com/gmhost/gru-capsule"&gt;GRU+ Capsule network&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.kaggle.com/christofhenkel/inceptioncnn-with-flip"&gt;InceptionCNN with flip&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.kaggle.com/yuval6967/toxic-bert-plain-vanila"&gt;Plain vanilla network with BERT&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.kaggle.com/taindow/simple-cudnngru-python-keras"&gt;CuDNNGRU network&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge/discussion/52719"&gt;TextCNN with pooling layers&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.kaggle.com/christofhenkel/bert-embeddings-lstm"&gt;BERT embeddings with LSTM&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/1905.09788"&gt;Multi-sample dropouts&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.kaggle.com/c/google-quest-challenge/discussion/129978"&gt;Siamese transformer network&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.kaggle.com/akensert/bert-base-tf2-0-now-huggingface-transformer"&gt;Global Average pooling of hidden layers BERT&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.kaggle.com/c/jigsaw-unintended-bias-in-toxicity-classification/discussion/92867"&gt;Different Bert based models&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Distilling BERT — &lt;a href="https://www.kaggle.com/c/jigsaw-unintended-bias-in-toxicity-classification/discussion/97135"&gt;BERT performance using Logistic Regression&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="https://medium.com/nvidia-ai/a-guide-to-optimizer-implementation-for-bert-at-scale-8338cc7f45fd"&gt;Different learning rates among the layers of BERT&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/1905.05583"&gt;Finetuning Bert for text classification&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Loss functions
&lt;/h1&gt;

&lt;p&gt;Choosing a proper loss function for your NN model can noticeably improve its performance by giving the optimizer a loss surface it can optimize well.&lt;/p&gt;

&lt;p&gt;You can try different loss functions or even write a custom loss function that matches your problem. Some of the popular loss functions are&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://ml-cheatsheet.readthedocs.io/en/latest/loss_functions.html"&gt;Binary cross-entropy&lt;/a&gt; for binary classification&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://peltarion.com/knowledge-center/documentation/modeling-view/build-an-ai-model/loss-functions/categorical-crossentropy"&gt;Categorical cross-entropy&lt;/a&gt; for multi-class classification&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://leimao.github.io/blog/Focal-Loss-Explained/"&gt;Focal loss&lt;/a&gt; used for unbalanced datasets&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/andrijdavid/FocalLoss/blob/master/focalloss.py"&gt;Weighted focal loss&lt;/a&gt; for multilabel classification&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.tensorflow.org/api_docs/python/tf/nn/sigmoid_cross_entropy_with_logits"&gt;Weighted kappa&lt;/a&gt; for multiclass classification&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.tensorflow.org/api_docs/python/tf/nn/sigmoid_cross_entropy_with_logits"&gt;BCE with logit loss&lt;/a&gt; to get sigmoid cross-entropy&lt;/li&gt;
&lt;li&gt;Custom &lt;a href="https://www.kaggle.com/c/jigsaw-unintended-bias-in-toxicity-classification/discussion/103280"&gt;mimic loss&lt;/a&gt; used in Jigsaw unintended bias classification competition&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.kaggle.com/c/jigsaw-unintended-bias-in-toxicity-classification/discussion/101630"&gt;MTL custom loss&lt;/a&gt; used in &lt;a href="https://www.kaggle.com/c/jigsaw-unintended-bias-in-toxicity-classification/overview"&gt;jigsaw unintended&lt;/a&gt; bias classification competition&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Optimizers
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://towardsdatascience.com/stochastic-gradient-descent-clearly-explained-53d239905d31"&gt;Stochastic gradient descent&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://towardsdatascience.com/understanding-rmsprop-faster-neural-network-learning-62e116fcf29a"&gt;RMSprop&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://medium.com/konvergen/an-introduction-to-adagrad-f130ae871827"&gt;Adagrad&lt;/a&gt; allows the learning rate to adapt based on parameters&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://machinelearningmastery.com/adam-optimization-algorithm-for-deep-learning/"&gt;Adam&lt;/a&gt; for fast and easy convergence&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.kaggle.com/httpwwwfszyc/bert-keras-with-warmup-and-excluding-wd-parameters/"&gt;Adam with warmup&lt;/a&gt; to add a learning rate warmup phase to the Adam algorithm&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://huggingface.co/transformers/migration.html#optimizers-bertadam-openaiadam-are-now-adamw-schedules-are-standard-pytorch-schedules"&gt;Bert Adam&lt;/a&gt; for Bert based models&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://arxiv.org/pdf/1908.03265.pdf"&gt;Rectified Adam&lt;/a&gt; for stabilizing training and accelerating convergence&lt;/li&gt;
&lt;/ul&gt;
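
&lt;p&gt;To make the warmup idea concrete, here is a small sketch of a learning rate schedule with linear warmup followed by linear decay. It is a generic illustration, not the implementation from the linked kernel; the function name and the default values are assumptions.&lt;/p&gt;

```python
def warmup_linear_decay(step, base_lr=1e-3, warmup_steps=100, total_steps=1000):
    """Learning rate at a given optimizer step.

    The rate grows linearly from 0 to base_lr over warmup_steps,
    then decays linearly back to 0 at total_steps.
    """
    # Fraction of the warmup completed (exceeds 1.0 once warmup is over)
    warmup = step / max(1, warmup_steps)
    # Fraction of the decay phase remaining (clipped at 0.0 after total_steps)
    decay = max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))
    # The smallest of the three factors is the active one
    return base_lr * min(1.0, warmup, decay)


# Learning rates over a short, hypothetical run
schedule = [warmup_linear_decay(s, base_lr=1.0, warmup_steps=10, total_steps=110)
            for s in (0, 5, 10, 60, 110)]
```

&lt;p&gt;The returned value can be passed to whichever optimizer you use at each step, for example through a scheduler callback.&lt;/p&gt;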

&lt;h1&gt;
  
  
  Callback methods
&lt;/h1&gt;

&lt;p&gt;Callbacks are always useful to monitor the performance of your model while training and trigger some necessary actions that can enhance the performance of your model.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/ModelCheckpoint"&gt;Model checkpoint&lt;/a&gt; for monitoring and saving weights&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/LearningRateScheduler"&gt;Learning rate scheduler&lt;/a&gt; to change the learning rate on a schedule during training and help the model converge&lt;/li&gt;
&lt;li&gt;Simple custom callbacks using &lt;a href="https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/LambdaCallback"&gt;lambda callbacks&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="https://machinelearningmastery.com/check-point-deep-learning-models-keras/"&gt;Custom Checkpointing&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Building your &lt;a href="https://keunwoochoi.wordpress.com/2016/07/16/keras-callbacks/"&gt;custom callbacks&lt;/a&gt; for various use cases&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/ReduceLROnPlateau"&gt;Reduce on plateau&lt;/a&gt; to reduce the learning rate when a metric has stopped improving&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/EarlyStopping"&gt;Early Stopping&lt;/a&gt; to stop training when the model stops improving&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://machinelearningmastery.com/snapshot-ensemble-deep-learning-neural-network/"&gt;Snapshot ensembling&lt;/a&gt; to get a variety of model checkpoints in one training&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/1802.10026"&gt;Fast geometric ensembling&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.kaggle.com/c/google-quest-challenge/discussion/119371"&gt;Stochastic Weight Averaging (SWA)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/1905.05583"&gt;Dynamic learning rate decay&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Evaluation and cross-validation
&lt;/h1&gt;

&lt;p&gt;Choosing a suitable validation strategy is very important to avoid huge shake-ups or poor performance of the model in the private test set.&lt;/p&gt;

&lt;p&gt;A single traditional 80:20 train-validation split is often not reliable enough. In most cases, cross-validation gives a more stable estimate of model performance.&lt;/p&gt;

&lt;p&gt;There are several variations of k-fold cross-validation, such as group k-fold; choose the one that matches the structure of your data.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://machinelearningmastery.com/k-fold-cross-validation/"&gt;K-fold cross-validation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.StratifiedKFold.html"&gt;Stratified KFold cross-validation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GroupKFold.html"&gt;Group KFold&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.kaggle.com/konradb/adversarial-validation-and-other-scary-terms"&gt;Adversarial validation&lt;/a&gt; to check if train and test distributions are similar or not&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.kaggle.com/ratthachat/quest-cv-analysis-on-different-splitting-methods/"&gt;CV analysis of different strategies&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
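
&lt;p&gt;The sketch below illustrates what stratification means: every fold receives roughly the same share of each class. It is a simplified pure-Python illustration under that assumption (the function name is hypothetical); in practice you would normally use scikit-learn’s StratifiedKFold linked above.&lt;/p&gt;

```python
from collections import defaultdict


def stratified_folds(labels, n_splits=5):
    """Split sample indices into n_splits folds with balanced class counts."""
    # Group sample indices by their class label
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    # Deal the indices of each class round-robin across the folds.
    # The position counter continues across classes so fold sizes stay even.
    folds = [[] for _ in range(n_splits)]
    position = 0
    for label in sorted(by_class):
        for index in by_class[label]:
            folds[position % n_splits].append(index)
            position += 1
    return folds


# Imbalanced toy labels: eight negatives and two positives
toy_labels = [0] * 8 + [1] * 2
toy_folds = stratified_folds(toy_labels, n_splits=2)
```

&lt;p&gt;Each fold serves once as the validation set while the remaining folds are used for training. With the toy labels above, both folds contain four negatives and one positive.&lt;/p&gt;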

&lt;h1&gt;
  
  
  Runtime tricks
&lt;/h1&gt;

&lt;p&gt;You can use a few tricks to decrease the runtime and also improve model performance.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://www.kaggle.com/bminixhofer/speed-up-your-rnn-with-sequence-bucketing/"&gt;Sequence bucketing&lt;/a&gt; to save runtime and improve performance&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.kaggle.com/c/jigsaw-unintended-bias-in-toxicity-classification/discussion/97443"&gt;Take tokens from the head and tail of the input&lt;/a&gt; when it is longer than 512 tokens&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.kaggle.com/c/jigsaw-unintended-bias-in-toxicity-classification/discussion/89498"&gt;Use the GPU efficiently&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.kaggle.com/c/jigsaw-unintended-bias-in-toxicity-classification/discussion/96876"&gt;Free keras memory&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://machinelearningmastery.com/save-load-keras-deep-learning-models/"&gt;Save and load models&lt;/a&gt; to save runtime and memory&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.kaggle.com/c/jigsaw-unintended-bias-in-toxicity-classification/discussion/93230"&gt;Don’t Save Embedding in RNN Solutions&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Load &lt;a href="https://www.kaggle.com/c/quora-insincere-questions-classification/discussion/77968"&gt;word2vec&lt;/a&gt; vectors without key vectors&lt;/li&gt;
&lt;/ul&gt;
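
&lt;p&gt;The head and tail trick from the list above can be sketched in a few lines. The split between head and tail (128 tokens from the head here) is an assumption chosen for illustration, not a value taken from the linked discussion.&lt;/p&gt;

```python
def head_tail_truncate(tokens, max_len=512, head_len=128):
    """Keep the first head_len tokens and the last (max_len - head_len) tokens.

    Sequences that already fit into max_len are returned unchanged.
    Assumes head_len is strictly smaller than max_len.
    """
    tail_len = max_len - head_len
    head = tokens[:head_len]
    # Take the tail only from what is left after the head,
    # so short sequences are never duplicated.
    tail = tokens[head_len:][-tail_len:]
    return head + tail


# A 10-token input squeezed into 6 tokens: 2 from the head, 4 from the tail
truncated = head_tail_truncate(list(range(10)), max_len=6, head_len=2)
```

&lt;p&gt;The idea is that the beginning and the end of a long text often carry more signal than the middle, so both are kept instead of cutting only at the end.&lt;/p&gt;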

&lt;h1&gt;
  
  
  Model ensembling
&lt;/h1&gt;

&lt;p&gt;In a competitive environment, you won’t get to the top of the leaderboard without ensembling. Selecting the appropriate ensembling/stacking method is very important to get the maximum performance out of your models.&lt;/p&gt;

&lt;p&gt;Let’s see some of the popular ensembling techniques used in Kaggle competitions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://machinelearningmastery.com/weighted-average-ensemble-for-deep-learning-neural-networks/"&gt;Weighted average ensemble&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://machinelearningmastery.com/stacking-ensemble-for-deep-learning-neural-networks/"&gt;Stacked generalization ensemble&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge/discussion/52224"&gt;Out of folds predictions&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.kaggle.com/suicaokhoailang/blending-with-linear-regression-0-688-lb"&gt;Blending with linear regression&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Use &lt;a href="https://github.com/optuna/optuna"&gt;optuna&lt;/a&gt; to determine blending weights&lt;/li&gt;
&lt;li&gt;&lt;a href="https://medium.com/data-design/reaching-the-depths-of-power-geometric-ensembling-when-targeting-the-auc-metric-2f356ea3250e"&gt;Power average ensemble&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.kaggle.com/c/jigsaw-unintended-bias-in-toxicity-classification/discussion/100661"&gt;Power 3.5 blending strategy&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
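
&lt;p&gt;Two of the simpler techniques above, weighted averaging and power averaging, can be sketched as follows. The function names are hypothetical, and the power of 3.5 only mirrors the title of the linked discussion; blending weights and powers should be tuned on out-of-fold predictions.&lt;/p&gt;

```python
def weighted_average(predictions, weights):
    """Blend per-model prediction lists using one weight per model."""
    total = sum(weights)
    # zip(*predictions) yields, for each sample, the predictions of all models
    return [sum(w * p for w, p in zip(weights, column)) / total
            for column in zip(*predictions)]


def power_average(predictions, power=3.5):
    """Raise each prediction to a power before averaging across models."""
    return [sum(p ** power for p in column) / len(column)
            for column in zip(*predictions)]


# Two hypothetical models scoring the same two samples
model_a = [0.2, 0.8]
model_b = [0.4, 0.6]
blended = weighted_average([model_a, model_b], weights=[1, 3])
powered = power_average([[0.5, 1.0], [0.5, 0.0]], power=2.0)
```

&lt;p&gt;Power averaging changes the scale of the predictions, so it is mainly suited to ranking metrics such as AUC, where only the order of the predictions matters.&lt;/p&gt;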

&lt;h1&gt;
  
  
  Final thoughts
&lt;/h1&gt;

&lt;p&gt;In this article, you saw many popular and effective ways to improve the performance of your NLP classification model. Hopefully, you will find them useful in your projects.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This article was originally &lt;a href="https://neptune.ai/blog/text-classification-tips-and-tricks-kaggle-competitions?utm_source=devto&amp;amp;utm_medium=crosspost&amp;amp;utm_campaign=blog-text-classification-tips-and-tricks-kaggle-competitions"&gt;posted on neptune.ml/blog&lt;/a&gt; where you can find more in-depth articles for machine learning practitioners.&lt;/em&gt; &lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>python</category>
      <category>datascience</category>
    </item>
    <item>
      <title>6 GAN Architectures You Really Should Know</title>
      <dc:creator>Jakub Czakon</dc:creator>
      <pubDate>Mon, 25 May 2020 14:45:22 +0000</pubDate>
      <link>https://dev.to/jakubczakon/6-gan-architectures-you-really-should-know-5h4b</link>
      <guid>https://dev.to/jakubczakon/6-gan-architectures-you-really-should-know-5h4b</guid>
      <description>

&lt;p&gt;&lt;em&gt;This article was originally posted by &lt;a href="https://www.linkedin.com/in/shibsankardas/?originalSubdomain=in"&gt;Shibsankar Das&lt;/a&gt; on the &lt;a href="https://neptune.ai/blog/6-gan-architectures?utm_source=devto&amp;amp;utm_medium=crosspost&amp;amp;utm_campaign=blog-6-gan-architectures"&gt;Neptune blog&lt;/a&gt; where you can find more in-depth articles for machine learning practitioners.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;Generative Adversarial Networks (GANs) were first introduced in 2014 by Ian Goodfellow et al., and the topic has since opened up a new area of research.&lt;/p&gt;

&lt;p&gt;Within a few years, the research community came up with plenty of papers on this topic, some of which have very interesting names :). You have CycleGAN, followed by BiCycleGAN, followed by ReCycleGAN and so on.&lt;/p&gt;

&lt;p&gt;With the invention of GANs, generative models started showing promising results in generating realistic images. GANs have shown tremendous success in computer vision and, more recently, promising results in audio and text as well.&lt;/p&gt;

&lt;p&gt;Some of the most popular GAN formulations are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Transforming an image from one domain to another (CycleGAN),&lt;/li&gt;
&lt;li&gt;Generating an image from a textual description (text-to-image),&lt;/li&gt;
&lt;li&gt;Generating very high-resolution images (ProgressiveGAN) and many more.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In this article, we will talk about some of the most popular GAN architectures, particularly &lt;strong&gt;6 architectures that you should know&lt;/strong&gt; to have a diverse coverage on Generative Adversarial Networks (GANs).&lt;/p&gt;

&lt;p&gt;Namely:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CycleGAN&lt;/li&gt;
&lt;li&gt;StyleGAN&lt;/li&gt;
&lt;li&gt;pixelRNN&lt;/li&gt;
&lt;li&gt;text-2-image&lt;/li&gt;
&lt;li&gt;DiscoGAN&lt;/li&gt;
&lt;li&gt;lsGAN&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  GAN 101 and Vanilla GAN
&lt;/h1&gt;

&lt;p&gt;There are 2 kinds of models in the context of Supervised Learning: Generative and Discriminative Models. Discriminative Models are primarily used to solve the Classification task, where the model usually learns a decision boundary to predict which class a data point belongs to. On the other hand, Generative Models are primarily used to generate synthetic data points that follow the same probability distribution as the training data. Our topic of discussion, &lt;strong&gt;Generative Adversarial Networks (GANs), is an example of a Generative Model.&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The primary objective of the Generative Model is to learn the unknown probability distribution of the population from which the training observations are sampled. Once the model is successfully trained, you can sample new, “generated” observations that follow the training distribution.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Let’s discuss the core concepts of GAN formulation.&lt;/p&gt;

&lt;p&gt;A GAN comprises two independent networks, a Generator and a Discriminator.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Generator generates synthetic samples given a random noise [sampled from a latent space] and the Discriminator is a binary classifier that discriminates between whether the input sample is real [output a scalar value 1] or fake [output a scalar value 0].&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A sample generated by the Generator is termed a fake sample. As you can see in Fig1 and Fig2, when a data point from the training dataset is given as input to the Discriminator, it calls it out as a real sample, whereas it calls out a data point generated by the Generator as fake.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--yi0XPfIv--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i1.wp.com/neptune.ai/wp-content/uploads/fig.1-Generator-and-Discriminator.png%3Fw%3D556%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--yi0XPfIv--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i1.wp.com/neptune.ai/wp-content/uploads/fig.1-Generator-and-Discriminator.png%3Fw%3D556%26ssl%3D1" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Fig1: Generator and Discriminator as GAN building blocks&lt;/p&gt;

&lt;p&gt;The beauty of this formulation is the adversarial nature between the Generator and the Discriminator.&lt;/p&gt;

&lt;p&gt;The Discriminator wants to do its job in the best possible way. &lt;strong&gt;When a fake sample [one generated by the Generator] is given to the Discriminator, it wants to call it out as fake, but the Generator wants to generate samples such that the Discriminator mistakenly calls them real. In some sense, the Generator is trying to fool the Discriminator.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--7rzqIX5S--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i2.wp.com/neptune.ai/wp-content/uploads/fig2-Generator-and-Discriminator.png%3Fw%3D556%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--7rzqIX5S--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i2.wp.com/neptune.ai/wp-content/uploads/fig2-Generator-and-Discriminator.png%3Fw%3D556%26ssl%3D1" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Fig2: Generator and Discriminator as GAN building blocks&lt;/p&gt;

&lt;p&gt;Let us have a quick look at the objective function and how the optimization is done. &lt;strong&gt;It’s a min-max optimization formulation where the Generator wants to minimize the objective function whereas the Discriminator wants to maximize the same objective function.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Fig3 depicts the objective function being optimized. The Discriminator function is denoted D and the Generator function is denoted G. Pz is the probability distribution of the latent space, which is usually a Gaussian distribution. Pdata is the probability distribution of the training dataset. When x is sampled from Pdata, the Discriminator wants to classify it as a real sample. G(z) is a generated sample; when G(z) is given as input to the Discriminator, it wants to classify it as a fake one.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--XahvgqHw--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i0.wp.com/neptune.ai/wp-content/uploads/fig3-Objective-function.png%3Fw%3D620%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--XahvgqHw--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i0.wp.com/neptune.ai/wp-content/uploads/fig3-Objective-function.png%3Fw%3D620%26ssl%3D1" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Fig3: Objective function in GAN formulation&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Discriminator wants to drive D(G(z)) to 0. Hence it wants to maximize (1-D(G(z))), whereas the Generator wants to force D(G(z)) to 1 so that the Discriminator makes a mistake and calls a generated sample real. Hence the Generator wants to minimize (1-D(G(z))).&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--9gEILB6r--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i0.wp.com/neptune.ai/wp-content/uploads/fig4.-Objective-function-in-GAN-formulation.png%3Fw%3D636%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--9gEILB6r--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i0.wp.com/neptune.ai/wp-content/uploads/fig4.-Objective-function-in-GAN-formulation.png%3Fw%3D636%26ssl%3D1" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Fig4: Objective function in GAN formulation&lt;/p&gt;
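&lt;p&gt;To make the min-max objective concrete, here is a minimal sketch in plain Python. It estimates V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))] from lists of Discriminator outputs. The function name gan_value and the sample probabilities are assumptions made for illustration only.&lt;/p&gt;

```python
import math


def gan_value(d_real, d_fake):
    """Monte Carlo estimate of V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))].

    d_real: Discriminator outputs D(x) on real samples, each in (0, 1).
    d_fake: Discriminator outputs D(G(z)) on generated samples, each in (0, 1).
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


# Hypothetical Discriminator outputs, chosen only for illustration.
# Case 1: the Discriminator correctly scores the fakes close to 0.
confident_d = gan_value(d_real=[0.9, 0.8], d_fake=[0.1, 0.2])
# Case 2: the Generator fools the Discriminator, which scores fakes close to 1.
fooled_d = gan_value(d_real=[0.9, 0.8], d_fake=[0.9, 0.8])

print("value when D spots the fakes:", confident_d)
print("value when G fools D:", fooled_d)
```

&lt;p&gt;The value is higher when the Discriminator spots the fakes and lower when it is fooled, which is why the Discriminator maximizes this quantity while the Generator minimizes it.&lt;/p&gt;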

&lt;h1&gt;
  
  
  CycleGAN:
&lt;/h1&gt;

&lt;p&gt;CycleGAN is a very popular GAN architecture primarily being used to learn transformation between images of different styles.&lt;/p&gt;

&lt;p&gt;As an example, this kind of formulation can learn:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a map between artistic and realistic images,&lt;/li&gt;
&lt;li&gt;a transformation between images of horses and zebras,&lt;/li&gt;
&lt;li&gt;a transformation between winter images and summer images,&lt;/li&gt;
&lt;li&gt;and so on&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;FaceApp is one of the most popular examples of CycleGAN where human faces are transformed into different age groups. &lt;/p&gt;

&lt;p&gt;As an example, let’s say X is a set of images of horses and Y is a set of images of zebras.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The goal is to learn a mapping function G: X-&amp;gt; Y such that images generated by G(X) are indistinguishable from the images of Y. This objective is achieved using an adversarial loss. This formulation not only learns G, but also learns an inverse mapping function F: Y-&amp;gt;X and uses a cycle-consistency loss to enforce F(G(X)) ≈ X and vice versa.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Training data for image-to-image translation comes in 2 kinds; CycleGAN is designed for the second, unpaired kind.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Paired data is a set of images {Xi, Yi} where each Xi has its Yi counterpart.&lt;/li&gt;
&lt;li&gt;Unpaired data is a set of images from X and another set of images from Y, without any match between Xi and Yi.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--pukSaVIh--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i2.wp.com/neptune.ai/wp-content/uploads/fig5-training-procedure-for-cyclegan.png%3Fw%3D596%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--pukSaVIh--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i2.wp.com/neptune.ai/wp-content/uploads/fig5-training-procedure-for-cyclegan.png%3Fw%3D596%26ssl%3D1" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Fig5: The training procedure for CycleGAN.&lt;/p&gt;

&lt;p&gt;As mentioned earlier, 2 functions are being learned: G, which transforms X to Y, and F, which transforms Y to X. The model therefore comprises two individual GANs, so you will find 2 Discriminator functions, Dx and Dy.&lt;/p&gt;

&lt;p&gt;As part of the adversarial formulation, Discriminator Dx tries to tell real images from X apart from translated images F(Y). Similarly, Discriminator Dy tries to tell real images from Y apart from translated images G(X).&lt;/p&gt;

&lt;p&gt;Along with the adversarial loss, CycleGAN uses a cycle-consistency loss to enable training without paired images. This additional loss pushes the model to minimize the reconstruction error so that F(G(X)) ≈ X and G(F(Y)) ≈ Y.&lt;/p&gt;

&lt;p&gt;So, all in all, the CycleGAN formulation comprises 3 individual losses as follows:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ElYB4fC1--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i1.wp.com/neptune.ai/wp-content/uploads/CycleGAN-formulation.png%3Fw%3D624%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ElYB4fC1--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i1.wp.com/neptune.ai/wp-content/uploads/CycleGAN-formulation.png%3Fw%3D624%26ssl%3D1" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;and as part of optimization, the following loss function is optimized.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--TYoQD0-D--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i2.wp.com/neptune.ai/wp-content/uploads/Optimized-loss-function-CycleGan.png%3Fw%3D380%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--TYoQD0-D--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i2.wp.com/neptune.ai/wp-content/uploads/Optimized-loss-function-CycleGan.png%3Fw%3D380%26ssl%3D1" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let’s take a look at some of the results from CycleGAN. As you can see, the model has learned transformations that convert an image of a zebra into a horse and a summer image into its winter counterpart, and vice versa.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--JF7zcYsz--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i1.wp.com/neptune.ai/wp-content/uploads/Results-from-CycleGan.png%3Fw%3D601%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--JF7zcYsz--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i1.wp.com/neptune.ai/wp-content/uploads/Results-from-CycleGan.png%3Fw%3D601%26ssl%3D1" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The following code snippet shows how the different losses are computed. Please refer to the following reference for the complete code flow.&lt;/p&gt;

&lt;p&gt;CycleGAN&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Generator G translates X -&amp;gt; Y
# Generator F translates Y -&amp;gt; X.
&lt;/span&gt;&lt;span class="n"&gt;fake_y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;generator_g&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;real_x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;training&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;cycled_x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;generator_f&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fake_y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;training&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;fake_x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;generator_f&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;real_y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;training&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;cycled_y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;generator_g&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fake_x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;training&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# same_x and same_y are used for identity loss.
&lt;/span&gt;&lt;span class="n"&gt;same_x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;generator_f&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;real_x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;training&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;same_y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;generator_g&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;real_y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;training&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;disc_real_x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;discriminator_x&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;real_x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;training&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;disc_real_y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;discriminator_y&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;real_y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;training&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;disc_fake_x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;discriminator_x&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fake_x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;training&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;disc_fake_y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;discriminator_y&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fake_y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;training&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# calculate the loss
&lt;/span&gt;&lt;span class="n"&gt;gen_g_loss&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;generator_loss&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;disc_fake_y&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;gen_f_loss&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;generator_loss&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;disc_fake_x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;total_cycle_loss&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;calc_cycle_loss&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;real_x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cycled_x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; \
                   &lt;span class="n"&gt;calc_cycle_loss&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;real_y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;cycled_y&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Total generator loss = adversarial loss + cycle loss + identity loss
&lt;/span&gt;&lt;span class="n"&gt;total_gen_g_loss&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;gen_g_loss&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;total_cycle_loss&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;identity_loss&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;real_y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;same_y&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;total_gen_f_loss&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;gen_f_loss&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;total_cycle_loss&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;identity_loss&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;real_x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;same_x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;disc_x_loss&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;discriminator_loss&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;disc_real_x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;disc_fake_x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;disc_y_loss&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;discriminator_loss&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;disc_real_y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;disc_fake_y&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
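&lt;p&gt;The snippet above calls helpers such as calc_cycle_loss and identity_loss without defining them. Below is a minimal sketch of what they might look like, assuming both are L1 distances scaled by a weight LAMBDA. The value 10, the 0.5 factor, and the helper mean_abs_diff are assumptions made here, and images are simplified to flat lists of pixel values instead of tensors.&lt;/p&gt;

```python
LAMBDA = 10.0  # cycle-loss weight; an assumed value, tune it for your data


def mean_abs_diff(a, b):
    """Mean absolute (L1) difference between two equal-length pixel lists."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


def calc_cycle_loss(real_image, cycled_image):
    # Penalizes F(G(x)) for drifting away from x (and G(F(y)) from y).
    return LAMBDA * mean_abs_diff(real_image, cycled_image)


def identity_loss(real_image, same_image):
    # Penalizes a generator for changing an image already in its target domain.
    return LAMBDA * 0.5 * mean_abs_diff(real_image, same_image)


# Toy "images" as flat lists of pixel intensities (hypothetical values).
real_x = [0.0, 0.5, 1.0]
cycled_x = [0.1, 0.5, 0.8]  # stands in for F(G(real_x))
same_x = [0.0, 0.5, 1.0]    # stands in for F(real_x), a perfect identity here

cycle = calc_cycle_loss(real_x, cycled_x)
identity = identity_loss(real_x, same_x)
print("cycle loss:", cycle, "identity loss:", identity)
```

&lt;p&gt;A perfect round trip gives a cycle loss of zero, so minimizing it pushes F to undo G and G to undo F.&lt;/p&gt;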



&lt;p&gt;The following is an example where an image of a horse has been transformed into an image that looks like a zebra.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--GNdUnolr--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i0.wp.com/neptune.ai/wp-content/uploads/input-and-predicted-image.png%3Fw%3D601%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--GNdUnolr--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i0.wp.com/neptune.ai/wp-content/uploads/input-and-predicted-image.png%3Fw%3D601%26ssl%3D1" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;References:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://arxiv.org/abs/1703.10593"&gt;Research Paper:&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;TensorFlow has a well-documented tutorial on CycleGAN. Please refer to the following URL:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.tensorflow.org/tutorials/generative/cyclegan"&gt;https://www.tensorflow.org/tutorials/generative/cyclegan&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  StyleGAN:
&lt;/h1&gt;

&lt;p&gt;Can you guess which image (from the following 2 images) is real and which one is generated by GAN?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--EmEthffy--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i1.wp.com/neptune.ai/wp-content/uploads/styleGan-images-example.png%3Fw%3D588%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--EmEthffy--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i1.wp.com/neptune.ai/wp-content/uploads/styleGan-images-example.png%3Fw%3D588%26ssl%3D1" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In fact, both images were generated by a GAN formulation called StyleGAN.&lt;/p&gt;

&lt;p&gt;StyleGAN is a GAN formulation capable of generating very high-resolution images, up to 1024 x 1024. &lt;strong&gt;The idea is to build a stack of layers where the initial layers generate low-resolution images (starting from 4 x 4) and later layers gradually increase the resolution.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The easiest way for a GAN to generate high-resolution images would be to memorize images from the training dataset and add random noise to an existing image when generating a new one. StyleGAN doesn’t do that; rather, it learns features of human faces and generates a new image of a face that doesn’t exist in reality. If this sounds interesting, visit &lt;a href="https://thispersondoesnotexist.com/"&gt;https://thispersondoesnotexist.com/&lt;/a&gt;. Each visit to this URL generates a new image of a person who doesn’t exist.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--miukr2fI--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i1.wp.com/neptune.ai/wp-content/uploads/StyleGan-architecture.png%3Fresize%3D263%252C300%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--miukr2fI--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i1.wp.com/neptune.ai/wp-content/uploads/StyleGan-architecture.png%3Fresize%3D263%252C300%26ssl%3D1" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This figure depicts the typical architecture of StyleGAN. The latent space vector z is passed through a mapping network comprising 8 fully connected layers, whereas the synthesis network comprises 18 layers, two for each resolution from 4 x 4 to 1024 x 1024. The output of the last layer is converted to an RGB image through a separate convolution layer. This architecture has 26.2 million trainable parameters, and because of this very high number, the model requires a huge number of training images to train successfully.&lt;/p&gt;

&lt;p&gt;Each layer is normalized using Adaptive instance normalization (AdaIN) function as follows:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Oq6D0-7L--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i0.wp.com/neptune.ai/wp-content/uploads/AdaIN-function.png%3Fresize%3D300%252C51%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Oq6D0-7L--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i0.wp.com/neptune.ai/wp-content/uploads/AdaIN-function.png%3Fresize%3D300%252C51%26ssl%3D1" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;where each feature map xi is normalized separately, and then scaled and biased using the corresponding scalar components from style y. Thus the dimensionality of y is twice the number of feature maps on that layer.&lt;/p&gt;
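&lt;p&gt;The AdaIN operation described above can be sketched in a few lines of plain Python for a single feature map. This is only an illustration: the function name adain, the small eps term, and the sample values are assumptions, and a real implementation would operate on tensors with one (scale, bias) pair per feature map.&lt;/p&gt;

```python
import math


def adain(feature_map, y_scale, y_bias, eps=1e-8):
    """AdaIN for one feature map: y_scale * (x - mean) / std + y_bias."""
    n = len(feature_map)
    mean = sum(feature_map) / n
    var = sum((v - mean) ** 2 for v in feature_map) / n
    std = math.sqrt(var + eps)  # eps is an added guard against division by zero
    return [y_scale * (v - mean) / std + y_bias for v in feature_map]


# One flattened feature map and a hypothetical (scale, bias) pair from style y.
x_i = [1.0, 2.0, 3.0, 4.0]
out = adain(x_i, y_scale=2.0, y_bias=0.5)

# After AdaIN the feature map takes its statistics from the style.
out_mean = sum(out) / len(out)
out_std = math.sqrt(sum((v - out_mean) ** 2 for v in out) / len(out))
print("mean:", out_mean, "std:", out_std)
```

&lt;p&gt;Because each feature map needs one scale and one bias, the style vector y has twice as many components as there are feature maps, as stated above.&lt;/p&gt;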

&lt;p&gt;&lt;strong&gt;References:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Paper:  &lt;a href="https://arxiv.org/pdf/1812.04948.pdf"&gt;https://arxiv.org/pdf/1812.04948.pdf&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Github: &lt;a href="https://github.com/NVlabs/stylegan"&gt;https://github.com/NVlabs/stylegan&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  PixelRNN
&lt;/h1&gt;

&lt;p&gt;PixelRNN is an example of an auto-regressive generative model.&lt;/p&gt;

&lt;p&gt;In the era of social media, plenty of images are out there. But it’s extremely difficult to learn the distribution of natural images in an unsupervised setting. PixelRNN models the discrete probability distribution of an image and predicts its pixels sequentially along the two spatial dimensions.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--YNTYd4-L--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i2.wp.com/neptune.ai/wp-content/uploads/PixelRNN.png%3Fresize%3D163%252C159%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--YNTYd4-L--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i2.wp.com/neptune.ai/wp-content/uploads/PixelRNN.png%3Fresize%3D163%252C159%26ssl%3D1" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;RNNs are powerful at learning conditional distributions; LSTMs in particular are good at learning long-term dependencies in a series of pixels. This formulation works in a progressive fashion where the model predicts the next pixel Xi+1 given all pixels X0 to Xi.&lt;/p&gt;
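&lt;p&gt;The pixel-by-pixel idea can be sketched with a toy model. In the snippet below, next_pixel_probs is a hypothetical stand-in for the trained LSTM (it simply prefers to repeat the previous pixel), and pixels are binary to keep things small. It is not PixelRNN itself, only an illustration of the chain rule that auto-regressive models rely on.&lt;/p&gt;

```python
import math
import random


def next_pixel_probs(previous_pixels):
    """Stand-in for the LSTM: returns [P(next = 0), P(next = 1)].

    Hypothetical rule: the next pixel tends to repeat the last one.
    """
    if not previous_pixels:
        return [0.5, 0.5]
    return [0.8, 0.2] if previous_pixels[-1] == 0 else [0.2, 0.8]


def log_likelihood(pixels):
    # Chain rule: log p(x) = sum over i of log p(x_i | x_0 .. x_{i-1}).
    return sum(
        math.log(next_pixel_probs(pixels[:i])[pixels[i]])
        for i in range(len(pixels))
    )


def sample(num_pixels, rng):
    # Generate one pixel at a time, each conditioned on all previous ones.
    pixels = []
    for _ in range(num_pixels):
        probs = next_pixel_probs(pixels)
        pixels.append(rng.choices([0, 1], weights=probs)[0])
    return pixels


row = sample(8, random.Random(0))
print("sampled pixels:", row)
print("log-likelihood:", log_likelihood(row))
```

&lt;p&gt;Note that the same conditional model is used both to score an existing image and to generate a new one.&lt;/p&gt;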

&lt;p&gt;Auto-regressive models like PixelRNN learn an explicit data distribution, whereas GANs learn an implicit one. A GAN therefore doesn’t expose the probability distribution itself; it only allows us to sample observations from the learned distribution.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--rrf-A29s--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i2.wp.com/neptune.ai/wp-content/uploads/pixelRNN-LSTM.png%3Fresize%3D256%252C219%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--rrf-A29s--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i2.wp.com/neptune.ai/wp-content/uploads/pixelRNN-LSTM.png%3Fresize%3D256%252C219%26ssl%3D1" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The figure depicts an individual residual block of PixelRNN. The network is trained with a stack of several such layers. The input map to the PixelRNN LSTM layer has 2h features. The input-to-state component reduces the number of features by producing h features per gate. After applying the recurrent layer, the output map is upsampled back to 2h features per position via a 1 × 1 convolution, and the input map is added to the output map.&lt;/p&gt;

&lt;p&gt;[Source:&lt;a href="https://arxiv.org/pdf/1601.06759.pdf#page=9&amp;amp;zoom=100,0,0"&gt;https://arxiv.org/pdf/1601.06759.pdf#page=9&amp;amp;zoom=100,0,0&lt;/a&gt;]&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;References:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Paper: &lt;a href="https://arxiv.org/pdf/1601.06759.pdf"&gt;https://arxiv.org/pdf/1601.06759.pdf&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Github: &lt;a href="https://github.com/carpedm20/pixel-rnn-tensorflow"&gt;https://github.com/carpedm20/pixel-rnn-tensorflow&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  text-2-image
&lt;/h1&gt;

&lt;p&gt;Generative Adversarial Networks are good at generating random images. As an example, a GAN trained on images of cats can generate random images of a cat with two eyes, two ears, and whiskers. But the color pattern on the cat could be very random. So, random images are often not useful for solving business use cases. Asking a GAN to generate an image that matches our expectations is an extremely difficult task.&lt;/p&gt;

&lt;p&gt;In this section, we will talk about a GAN architecture that made significant progress in generating meaningful images based on an explicit textual description. This GAN formulation takes a textual description as input and generates an RGB image that was described in the textual description.&lt;/p&gt;

&lt;p&gt;As an example, given “this flower has a lot of small round pink petals” as input, it will generate an image of a flower having round pink petals.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--RzLi2OSC--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i1.wp.com/neptune.ai/wp-content/uploads/text-2-image.png%3Fw%3D601%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--RzLi2OSC--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i1.wp.com/neptune.ai/wp-content/uploads/text-2-image.png%3Fw%3D601%26ssl%3D1" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;In this formulation, instead of giving only noise as input to the Generator, the textual description is first transformed into a text embedding, concatenated with noise vector and then given as input to the Generator.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;As an example, the textual description is transformed into a 256-dimensional embedding and concatenated with a 100-dimensional noise vector (sampled from a latent space, which is usually a Normal distribution).&lt;/p&gt;

&lt;p&gt;This formulation will help the Generator to generate images that are aligned with the input description instead of generating random images.&lt;/p&gt;
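&lt;p&gt;As a minimal sketch of this conditioning step, the snippet below concatenates a text embedding with a noise vector using the dimensions quoted above. The function name generator_input and the all-zero placeholder embedding are assumptions; a real embedding would come from a trained text encoder.&lt;/p&gt;

```python
import random

TEXT_DIM = 256   # embedding size quoted in the text above
NOISE_DIM = 100  # noise size quoted in the text above


def generator_input(text_embedding, rng):
    """Concatenate a text embedding with a freshly sampled Gaussian noise vector."""
    noise = [rng.gauss(0.0, 1.0) for _ in range(NOISE_DIM)]
    return text_embedding + noise


# A placeholder embedding; a real one would come from a trained text encoder.
embedding = [0.0] * TEXT_DIM
g_input = generator_input(embedding, random.Random(0))
print("generator input size:", len(g_input))
```

&lt;p&gt;The same description paired with different noise vectors gives the Generator different inputs, so it can produce several images for one caption.&lt;/p&gt;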

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--MzhtPEgb--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i1.wp.com/neptune.ai/wp-content/uploads/text-2-image-Generator.png%3Fw%3D601%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--MzhtPEgb--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i1.wp.com/neptune.ai/wp-content/uploads/text-2-image-Generator.png%3Fw%3D601%26ssl%3D1" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For the Discriminator, instead of only an image, a pair of image and text embedding is sent as input. The output signal is either 0 or 1. Earlier, the Discriminator’s responsibility was just to predict whether a given image is real or fake.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Now, the Discriminator has one additional responsibility. Along with identifying whether the given image is real or fake, it also predicts the likelihood that the given image and text are aligned with each other.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This formulation forces the Generator not only to generate images that look real but also to generate images that are aligned with the input textual description.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--bDM3IR8d--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i2.wp.com/neptune.ai/wp-content/uploads/text-2-image-Generator-real-images.png%3Fw%3D601%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--bDM3IR8d--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i2.wp.com/neptune.ai/wp-content/uploads/text-2-image-Generator-real-images.png%3Fw%3D601%26ssl%3D1" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To fulfill the 2-fold responsibility of the Discriminator, a series of different (image, text) pairs is given as input to the model during training:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Pair of (Real Image, Real Caption) as input and target variable is set to 1&lt;/li&gt;
&lt;li&gt;Pair of (Wrong Image, Real Caption) as input and target variable is set to 0&lt;/li&gt;
&lt;li&gt;Pair of (Fake Image, Real Caption) as input and target variable is set to 0&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The pair of Real Image and Real Caption is given so that the model learns whether a given image and text pair are aligned with each other. Wrong Image, Real Caption means the image is not as described in the caption. In this case, the target variable is set to 0 so that the model learns that the given image and caption are not aligned. Fake Image means an image generated by the Generator. In this case, the target variable is set to 0 so that the Discriminator can distinguish between real and fake images.&lt;/p&gt;
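&lt;p&gt;The three pair types and their targets can be sketched as a Discriminator loss in plain Python. This is a simplified illustration: the use of binary cross-entropy, the unweighted sum over the three pairs, and the names bce, TARGETS, and discriminator_loss are all assumptions, and the probabilities are made up.&lt;/p&gt;

```python
import math


def bce(prediction, target):
    """Binary cross-entropy for one Discriminator output in (0, 1)."""
    return -(target * math.log(prediction) + (1 - target) * math.log(1 - prediction))


# Pair type and target, exactly as listed above.
TARGETS = {
    "real_image_real_caption": 1,
    "wrong_image_real_caption": 0,
    "fake_image_real_caption": 0,
}


def discriminator_loss(predictions):
    """Sum the per-pair losses; predictions maps pair type to D(image, text)."""
    return sum(bce(predictions[name], target) for name, target in TARGETS.items())


# Hypothetical Discriminator outputs for one training step.
good = discriminator_loss({
    "real_image_real_caption": 0.9,
    "wrong_image_real_caption": 0.1,
    "fake_image_real_caption": 0.1,
})
undecided = discriminator_loss({
    "real_image_real_caption": 0.5,
    "wrong_image_real_caption": 0.5,
    "fake_image_real_caption": 0.5,
})
print("loss when D is right:", good, "loss when D is undecided:", undecided)
```

&lt;p&gt;The second pair type is what teaches the Discriminator about alignment: the image is real, so only the mismatch with the caption can explain the target of 0.&lt;/p&gt;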

&lt;p&gt;Each image in the training dataset comes with 10 different textual descriptions of its properties.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--hpe3CX7z--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i2.wp.com/neptune.ai/wp-content/uploads/Image-Generator-textual-description.png%3Fw%3D624%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--hpe3CX7z--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i2.wp.com/neptune.ai/wp-content/uploads/Image-Generator-textual-description.png%3Fw%3D624%26ssl%3D1" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The following are some of the results from a trained text-2-image model.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--hClojlhC--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i0.wp.com/neptune.ai/wp-content/uploads/Image-Generator-textual-description-2.png%3Fw%3D601%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--hClojlhC--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i0.wp.com/neptune.ai/wp-content/uploads/Image-Generator-textual-description-2.png%3Fw%3D601%26ssl%3D1" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;References:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Research Paper:&lt;/strong&gt; &lt;a href="https://arxiv.org/pdf/1605.05396.pdf"&gt;https://arxiv.org/pdf/1605.05396.pdf&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Github:&lt;/strong&gt; &lt;a href="https://github.com/paarthneekhara/text-to-image"&gt;https://github.com/paarthneekhara/text-to-image&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  DiscoGAN
&lt;/h1&gt;

&lt;p&gt;DiscoGAN has become very popular because of its ability to learn cross-domain relations from unsupervised data.&lt;/p&gt;

&lt;p&gt;For humans, cross-domain relations are very natural. Given images from two different domains, a human can figure out how they are related to each other. As an example, in the following figure, we have images from 2 different domains, and at a single glance we can easily figure out that they are related by their exterior color.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--HIJBJubv--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i2.wp.com/neptune.ai/wp-content/uploads/Different-images-ML.png%3Fw%3D559%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--HIJBJubv--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i2.wp.com/neptune.ai/wp-content/uploads/Different-images-ML.png%3Fw%3D559%26ssl%3D1" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now, building a Machine Learning model to figure out such a relation given unpaired images from 2 different domains is an extremely difficult task.&lt;/p&gt;

&lt;p&gt;DiscoGAN has shown promising results in learning such a relation across 2 different domains.&lt;/p&gt;

&lt;p&gt;The core concept of DiscoGAN is very similar to that of CycleGAN:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Both learn 2 individual transformation functions: one learns a transformation from domain X to domain Y, whereas the other learns the reverse mapping. Both use a reconstruction loss as a measure of how well the original image is reconstructed after being transformed across domains twice.&lt;/li&gt;
&lt;li&gt;Both follow the principle that if we transform an image from domain1 to domain2 and then back to domain1, it should match the original image.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;The primary difference between DiscoGAN and CycleGAN is that DiscoGAN uses two reconstruction losses, one for each domain, whereas CycleGAN uses a single cycle-consistency loss.&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--n-ZGSkQ_--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i2.wp.com/neptune.ai/wp-content/uploads/DiscoGan-vs-CycleGAN.png%3Fw%3D593%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--n-ZGSkQ_--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i2.wp.com/neptune.ai/wp-content/uploads/DiscoGan-vs-CycleGAN.png%3Fw%3D593%26ssl%3D1" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Figure: (a) Vanilla GAN (b) GAN with reconstruction loss (c) DiscoGAN architecture&lt;/p&gt;

&lt;p&gt;Like CycleGAN, DiscoGAN is built on the idea of a reconstruction loss. When an image is transformed from one domain to another and then transformed back to the original domain, the generated image should be as close as possible to the original one. The quantitative difference between the two is the reconstruction loss, and during training the model tries to minimize it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Or4S7dnt--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i2.wp.com/neptune.ai/wp-content/uploads/Gab-and-Gba.png%3Fw%3D624%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Or4S7dnt--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i2.wp.com/neptune.ai/wp-content/uploads/Gab-and-Gba.png%3Fw%3D624%26ssl%3D1" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So, the model comprises two GAN networks called GAB and GBA. In the above figure, the model is trying to learn the cross-domain relation in terms of direction. After an image is reconstructed, its direction should be the same as in the original.&lt;/p&gt;
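
&lt;p&gt;To make the reconstruction idea concrete, here is a minimal NumPy sketch. It is not the DiscoGAN implementation: the two generators are replaced by hypothetical linear maps (g_ab and g_ba), and mean squared error is assumed as the distance. It only illustrates that the round trip from domain A to B and back to A gives a near-zero loss when the second mapping inverts the first, and a larger loss otherwise.&lt;/p&gt;

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the two generators. Real generators are neural
# networks; linear maps are used here only to keep the sketch short.
W_ab = np.eye(4) + 0.1 * rng.normal(size=(4, 4))   # "G_AB": domain A to B
W_ba_good = np.linalg.inv(W_ab)                     # a mapping that inverts G_AB
W_ba_bad = rng.normal(size=(4, 4))                  # an unrelated mapping


def g_ab(x):
    """Map a batch from domain A to domain B."""
    return x @ W_ab


def g_ba(y, weights):
    """Map a batch from domain B back to domain A."""
    return y @ weights


def reconstruction_loss(x, weights):
    """Mean squared error between x and its A to B to A round trip (assumed metric)."""
    x_rec = g_ba(g_ab(x), weights)
    return float(np.mean((x - x_rec) ** 2))


x_a = rng.normal(size=(8, 4))   # a toy batch of flattened "images" from domain A
loss_good = reconstruction_loss(x_a, W_ba_good)
loss_bad = reconstruction_loss(x_a, W_ba_bad)
print(loss_good, loss_bad)
```

&lt;p&gt;In a full model, the same loss would be computed in the other direction as well (B to A and back to B), and both terms would be added to the adversarial losses.&lt;/p&gt;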

&lt;p&gt;&lt;strong&gt;References:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Research Paper:&lt;/strong&gt; &lt;a href="https://arxiv.org/pdf/1703.05192.pdf"&gt;https://arxiv.org/pdf/1703.05192.pdf&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Github:&lt;/strong&gt; &lt;a href="https://github.com/SKTBrain/DiscoGAN"&gt;https://github.com/SKTBrain/DiscoGAN&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  lsGAN
&lt;/h1&gt;

&lt;p&gt;In recent times, Generative Adversarial Networks have demonstrated impressive performance for unsupervised tasks.&lt;/p&gt;

&lt;p&gt;In a regular GAN, the discriminator uses a cross-entropy loss function, which sometimes leads to vanishing gradient problems. &lt;strong&gt;Instead, lsGAN proposes to use the least-squares loss function for the discriminator.&lt;/strong&gt; This formulation leads to higher-quality generated images.&lt;/p&gt;

&lt;p&gt;Earlier, for the vanilla GAN, we saw the following min-max optimization formulation, where the Discriminator is a binary classifier that uses sigmoid cross-entropy loss during optimization.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--qQoFdRRb--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i2.wp.com/neptune.ai/wp-content/uploads/GAN-formulation.png%3Fw%3D606%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--qQoFdRRb--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i2.wp.com/neptune.ai/wp-content/uploads/GAN-formulation.png%3Fw%3D606%26ssl%3D1" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As mentioned earlier, this formulation often causes vanishing gradient problems for data points that are on the correct side of the decision boundary but far away from the dense area. The least-squares formulation addresses this issue, provides more stable learning, and generates better images.&lt;/p&gt;

&lt;p&gt;The following is the reformulated optimization objective for lsGAN, where:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a is the label for fake sample,&lt;/li&gt;
&lt;li&gt;b is the label for real sample and&lt;/li&gt;
&lt;li&gt;c denotes the value that the Generator wants the Discriminator to believe for a fake sample.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--JlCOe_CY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i1.wp.com/neptune.ai/wp-content/uploads/GAN-formulation-reformulated.png%3Fw%3D606%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--JlCOe_CY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i1.wp.com/neptune.ai/wp-content/uploads/GAN-formulation-reformulated.png%3Fw%3D606%26ssl%3D1" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now, we have 2 individual loss functions that are being optimized. One is being minimized with respect to the Discriminator and the other one is being minimized with respect to the Generator.&lt;/p&gt;

&lt;p&gt;lsGAN has a huge advantage compared to vanilla GAN. In vanilla GAN, as the Discriminator uses binary cross-entropy loss, the loss for an observation is close to 0 as long as it is on the correct side of the decision boundary.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;But in the case of lsGAN, the model penalizes an observation if it is a long way from the decision boundary, even if it is on the correct side of the decision boundary.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This penalization forces the Generator to generate samples closer to the decision boundary. It also mitigates the vanishing gradient problem, as far-away points produce larger gradients when updating the Generator.&lt;/p&gt;
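
&lt;p&gt;The two least-squares objectives can be sketched in a few lines of NumPy. This is an illustration rather than the reference implementation: the label values a = 0, b = 1 and c = 1 are one common choice and are assumed here, and the discriminator scores are made-up numbers.&lt;/p&gt;

```python
import numpy as np


def lsgan_discriminator_loss(d_real, d_fake, a=0.0, b=1.0):
    """0.5 * mean((D(x) - b)^2) + 0.5 * mean((D(G(z)) - a)^2)."""
    return 0.5 * np.mean((d_real - b) ** 2) + 0.5 * np.mean((d_fake - a) ** 2)


def lsgan_generator_loss(d_fake, c=1.0):
    """0.5 * mean((D(G(z)) - c)^2)."""
    return 0.5 * np.mean((d_fake - c) ** 2)


# Made-up raw discriminator scores (no sigmoid), for illustration only.
d_real = np.array([0.9, 1.1, 1.0])
d_fake = np.array([0.1, -0.2, 0.3])

d_loss = lsgan_discriminator_loss(d_real, d_fake)
g_loss = lsgan_generator_loss(d_fake)

# A fake sample scored 5.0 sits on the "real" side of the boundary, yet it is
# still penalized because its score is far from the target value c = 1.
far_sample_loss = lsgan_generator_loss(np.array([5.0]))
print(d_loss, g_loss, far_sample_loss)
```

&lt;p&gt;Because the penalty grows quadratically with the distance from the target label, samples far from the boundary keep contributing a gradient, which is the behavior described above.&lt;/p&gt;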

&lt;p&gt;&lt;strong&gt;References:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Research Paper:&lt;/strong&gt; &lt;a href="https://arxiv.org/pdf/1611.04076.pdf"&gt;https://arxiv.org/pdf/1611.04076.pdf&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Github:&lt;/strong&gt; &lt;a href="https://github.com/xudonmao/LSGAN"&gt;https://github.com/xudonmao/LSGAN&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Final thoughts
&lt;/h1&gt;

&lt;p&gt;One thing is common to all the GAN architectures we have talked about: each of them is built on the principle of adversarial loss, and they all have a Generator and a Discriminator that compete to fool each other. GANs have shown tremendous success over the last few years and have become one of the most popular research topics in the machine learning research community. In the future, we will likely see a lot of progress in this domain.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The following Git repository has consolidated an extensive list of GAN papers.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/hindupuravinash/the-gan-zoo"&gt;https://github.com/hindupuravinash/the-gan-zoo&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;References:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;a href="https://github.com/hindupuravinash/the-gan-zoo"&gt;https://github.com/hindupuravinash/the-gan-zoo&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/pdf/1703.10593.pdf"&gt;https://arxiv.org/pdf/1703.10593.pdf&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.tensorflow.org/tutorials/generative/cyclegan"&gt;https://www.tensorflow.org/tutorials/generative/cyclegan&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://thispersondoesnotexist.com/"&gt;https://thispersondoesnotexist.com/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/pdf/1812.04948.pdf"&gt;https://arxiv.org/pdf/1812.04948.pdf&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/NVlabs/stylegan"&gt;https://github.com/NVlabs/stylegan&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/pdf/1601.06759.pdf#page=9&amp;amp;zoom=100,0,0"&gt;https://arxiv.org/pdf/1601.06759.pdf#page=9&amp;amp;zoom=100,0,0&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/pdf/1601.06759.pdf"&gt;https://arxiv.org/pdf/1601.06759.pdf&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/carpedm20/pixel-rnn-tensorflow"&gt;https://github.com/carpedm20/pixel-rnn-tensorflow&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/pdf/1605.05396.pdf"&gt;https://arxiv.org/pdf/1605.05396.pdf&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/paarthneekhara/text-to-image"&gt;https://github.com/paarthneekhara/text-to-image&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/pdf/1703.05192.pdf"&gt;https://arxiv.org/pdf/1703.05192.pdf&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/SKTBrain/DiscoGAN"&gt;https://github.com/SKTBrain/DiscoGAN&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/pdf/1611.04076.pdf"&gt;https://arxiv.org/pdf/1611.04076.pdf&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/xudonmao/LSGAN"&gt;https://github.com/xudonmao/LSGAN&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>machinelearning</category>
      <category>python</category>
      <category>datascience</category>
    </item>
    <item>
      <title>Image Segmentation: Tips and Tricks from 39 Kaggle Competitions</title>
      <dc:creator>Jakub Czakon</dc:creator>
      <pubDate>Tue, 19 May 2020 13:39:22 +0000</pubDate>
      <link>https://dev.to/jakubczakon/image-segmentation-tips-and-tricks-from-39-kaggle-competitions-l97</link>
      <guid>https://dev.to/jakubczakon/image-segmentation-tips-and-tricks-from-39-kaggle-competitions-l97</guid>
      <description>&lt;p&gt;&lt;em&gt;This article was originally posted by &lt;a href="https://www.linkedin.com/in/mwitiderrick/"&gt;Derrick Mwiti&lt;/a&gt; on the &lt;a href="https://neptune.ai/blog/image-segmentation-tips-and-tricks-from-kaggle-competitions?utm_source=devto&amp;amp;utm_medium=crosspost&amp;amp;utm_campaign=blog-image-segmentation-tips-and-tricks-from-kaggle-competitions"&gt;Neptune blog&lt;/a&gt; where you can find more in-depth articles for machine learning practitioners.&lt;/em&gt; &lt;/p&gt;




&lt;p&gt;Imagine if you could get all the tips and tricks you need to hammer a Kaggle competition. I have gone over 39 Kaggle competitions including&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://www.kaggle.com/c/data-science-bowl-2017/"&gt;Data Science Bowl 2017&lt;/a&gt; – $1,000,000&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.kaggle.com/c/intel-mobileodt-cervical-cancer-screening"&gt;Intel &amp;amp; MobileODT Cervical Cancer Screening&lt;/a&gt; – $100,000&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.kaggle.com/c/data-science-bowl-2018"&gt;2018 Data Science Bowl&lt;/a&gt; – $100,000&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.kaggle.com/c/airbus-ship-detection"&gt;Airbus Ship Detection Challenge&lt;/a&gt;  – $60,000&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.kaggle.com/c/planet-understanding-the-amazon-from-space"&gt;Planet: Understanding the Amazon from Space&lt;/a&gt; – $60,000&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.kaggle.com/c/aptos2019-blindness-detection"&gt;APTOS 2019 Blindness Detection&lt;/a&gt; – $50,000&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.kaggle.com/c/human-protein-atlas-image-classification"&gt;Human Protein Atlas Image Classification&lt;/a&gt; – $37,000&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.kaggle.com/c/siim-acr-pneumothorax-segmentation"&gt;SIIM-ACR Pneumothorax Segmentation&lt;/a&gt; – $30,000&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.kaggle.com/c/inclusive-images-challenge"&gt;Inclusive Images Challenge&lt;/a&gt; – $25,000&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;– and extracted that knowledge for you. Dig in.&lt;/p&gt;

&lt;h1&gt;
  
  
  Contents
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;External Data &lt;/li&gt;
&lt;li&gt;Preprocessing &lt;/li&gt;
&lt;li&gt;Data Augmentations &lt;/li&gt;
&lt;li&gt;Modeling &lt;/li&gt;
&lt;li&gt;Hardware Setups &lt;/li&gt;
&lt;li&gt;Loss Functions &lt;/li&gt;
&lt;li&gt;Training Tips &lt;/li&gt;
&lt;li&gt;Evaluation and Cross-validation &lt;/li&gt;
&lt;li&gt;Ensembling Methods &lt;/li&gt;
&lt;li&gt;Post Processing &lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  External Data
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;Use of the &lt;a href="https://luna16.grand-challenge.org/"&gt;LUng Nodule Analysis (LUNA16) Grand Challenge&lt;/a&gt; data because it contains detailed annotations from radiologists&lt;/li&gt;
&lt;li&gt;Use of the &lt;a href="https://wiki.cancerimagingarchive.net/display/Public/LIDC-IDRI"&gt;LIDC-IDRI&lt;/a&gt; data because it had radiologist descriptions of each tumor that they found&lt;/li&gt;
&lt;li&gt;Use &lt;a href="https://www.flickr.com/creativecommons/00"&gt;Flickr CC, Wikimedia Commons datasets&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Use &lt;a href="https://www.proteinatlas.org/about/download"&gt;Human Protein Atlas Dataset&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Use &lt;a href="https://www.mdpi.com/2306-5729/3/3/25"&gt;IDRiD&lt;/a&gt; dataset&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Data Exploration and Gaining Insights
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.149.2721&amp;amp;rep=rep1&amp;amp;type=pdf"&gt;Clustering of 3D segmentation&lt;/a&gt; with the 0.5 threshold&lt;/li&gt;
&lt;li&gt;Identify if there is a &lt;a href="https://www.kaggle.com/c/inclusive-images-challenge/discussion/72450#433005"&gt;substantial difference in train/test label distributions&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Preprocessing
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;Perform blob detection using the &lt;a href="https://en.wikipedia.org/wiki/Blob_detection#The_difference_of_Gaussians_approach"&gt;Difference of Gaussians (DoG) method&lt;/a&gt;, for example with the implementation available in the skimage package.&lt;/li&gt;
&lt;li&gt;Use of &lt;a href="https://www.mdpi.com/2072-4292/11/2/114/pdf-vor"&gt;patch-based inputs for training&lt;/a&gt; in order to reduce the time of training&lt;/li&gt;
&lt;li&gt;Use &lt;a href="https://github.com/rapidsai/cudf"&gt;cudf&lt;/a&gt; for loading data instead of &lt;a href="https://towardsdatascience.com/a-quick-introduction-to-the-pandas-python-library-f1b678f34673"&gt;Pandas&lt;/a&gt; because it has a faster reader&lt;/li&gt;
&lt;li&gt;Ensure that all the images have the &lt;a href="https://github.com/albumentations-team/albumentations"&gt;same orientation&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Apply contrast limited adaptive &lt;a href="https://towardsdatascience.com/histogram-equalization-5d1013626e64"&gt;histogram equalization&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Use &lt;a href="https://docs.opencv.org/master/d9/df8/tutorial_root.html"&gt;OpenCV&lt;/a&gt; for all general image preprocessing&lt;/li&gt;
&lt;li&gt;Employ &lt;a href="https://towardsdatascience.com/review-suggestive-annotation-deep-active-learning-framework-biomedical-image-segmentation-e08e4b931ea6"&gt;automatic active learning&lt;/a&gt; and add manual annotations&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/albumentations-team/albumentations"&gt;Resize all images to the same resolution&lt;/a&gt; in order to apply the same model to scans of different thicknesses&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/albumentations-team/albumentations"&gt;Convert scan images&lt;/a&gt; into normalized 3D numpy arrays&lt;/li&gt;
&lt;li&gt;Apply single &lt;a href="http://kaiminghe.com/"&gt;image haze removal&lt;/a&gt; using the dark channel prior&lt;/li&gt;
&lt;li&gt;Convert all data to &lt;a href="https://www.ncbi.nlm.nih.gov/books/NBK547721/"&gt;Hounsfield units&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Find duplicate images using &lt;a href="https://www.kaggle.com/c/human-protein-atlas-image-classification/discussion/77269#583768"&gt;pair-wise correlation on RGBY&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Make labels more balanced by &lt;a href="https://www.sebastiansylvan.com/post/importancesampling/"&gt;developing a sampler&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Apply pseudo labeling to test data in order &lt;a href="https://arxiv.org/abs/1908.02983"&gt;to improve score&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/albumentations-team/albumentations"&gt;Scale down images/masks to 320×480&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://towardsdatascience.com/histogram-equalization-5d1013626e64"&gt;Histogram equalization&lt;/a&gt; (CLAHE) with kernel size 32×32&lt;/li&gt;
&lt;li&gt;Convert &lt;a href="https://www.kaggle.com/c/siim-acr-pneumothorax-segmentation/discussion/97120#560788"&gt;DCM to PNG&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Calculate the &lt;a href="https://www.kaggle.com/c/cdiscount-image-classification-challenge/discussion/45798"&gt;md5 hash for each image&lt;/a&gt; to detect duplicate images&lt;/li&gt;
&lt;/ul&gt;
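
&lt;p&gt;As an illustration of the md5 tip above, the sketch below groups files by the md5 digest of their raw bytes using only the Python standard library. The file names and byte strings are hypothetical placeholders; in practice the bytes would be read from disk. Note that this only catches byte-for-byte copies, not resized or re-encoded versions of the same picture.&lt;/p&gt;

```python
import hashlib
from collections import defaultdict


def find_duplicates(images):
    """Group image names by the md5 digest of their raw bytes.

    `images` is a hypothetical mapping of file name to file content (bytes).
    Returns a list of groups, each holding the names that share one digest.
    """
    groups = defaultdict(list)
    for name, content in images.items():
        groups[hashlib.md5(content).hexdigest()].append(name)
    # Keep only digests shared by more than one file.
    return [sorted(names) for names in groups.values() if len(names) > 1]


# Placeholder data standing in for real image files.
images = {
    "a.png": b"fake-image-bytes-1",
    "b.png": b"fake-image-bytes-2",
    "c.png": b"fake-image-bytes-1",  # byte-for-byte copy of a.png
}
duplicates = find_duplicates(images)
print(duplicates)
```

&lt;p&gt;Each returned group can then be reduced to a single representative before splitting the data into folds, so that copies of one image do not end up in both train and validation sets.&lt;/p&gt;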

&lt;h1&gt;
  
  
  Data Augmentations
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;Use &lt;a href="https://github.com/albumentations-team/albumentations"&gt;albumentations package&lt;/a&gt; for augmentations&lt;/li&gt;
&lt;li&gt;Apply random &lt;a href="https://github.com/albumentations-team/albumentations"&gt;rotation by 90 degrees&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Use &lt;a href="https://github.com/albumentations-team/albumentations"&gt;horizontal, vertical or both flips&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Attempt &lt;a href="https://www.kaggle.com/c/data-science-bowl-2018/discussion/54741#477226"&gt;heavy geometric transformations&lt;/a&gt;: Elastic Transform, Perspective Transform, Piecewise Affine transforms, pincushion distortion&lt;/li&gt;
&lt;li&gt;Apply &lt;a href="https://www.kaggle.com/c/data-science-bowl-2018/discussion/54741#477226"&gt;random HSV&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Use of &lt;a href="http://juliandewit.github.io/kaggle-ndsb2017/"&gt;loss-less augmentation&lt;/a&gt; for generalization to prevent loss of useful image information&lt;/li&gt;
&lt;li&gt;Apply &lt;a href="https://www.kaggle.com/c/data-science-bowl-2018/discussion/54741#477226"&gt;channel shuffling&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Do &lt;a href="https://www.kdnuggets.com/2018/05/data-augmentation-deep-learning-limited-data.html"&gt;data augmentation&lt;/a&gt; based on class &lt;a href="https://towardsdatascience.com/deep-learning-unbalanced-training-data-solve-it-like-this-6c528e9efea6"&gt;frequency&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Apply &lt;a href="https://www.kaggle.com/c/data-science-bowl-2018/discussion/54741#477226"&gt;Gaussian noise&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Use &lt;a href="https://en.wikipedia.org/wiki/Octahedral_symmetry#The_isometries_of_the_cube"&gt;lossless permutations of 3D images&lt;/a&gt; for data augmentation&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/albumentations-team/albumentations"&gt;Rotate&lt;/a&gt; by a random angle from 0 to 45 degrees&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/albumentations-team/albumentations"&gt;Scale&lt;/a&gt; by a random factor from 0.8 to 1.2&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/albumentations-team/albumentations"&gt;Brightness&lt;/a&gt; changing&lt;/li&gt;
&lt;li&gt;Randomly change hue, saturation and value&lt;/li&gt;
&lt;li&gt;Apply D4 augmentations&lt;/li&gt;
&lt;li&gt;Contrast limited adaptive histogram equalization&lt;/li&gt;
&lt;li&gt;Use the AutoAugment augmentation strategy&lt;/li&gt;
&lt;/ul&gt;
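
&lt;p&gt;The D4 augmentations mentioned in the list are the 8 symmetries of a square: 4 rotations by multiples of 90 degrees, each with and without a flip. A minimal NumPy sketch, assuming images are arrays with height and width as the first two axes (the helper name d4_transforms is made up for this example):&lt;/p&gt;

```python
import numpy as np


def d4_transforms(image):
    """Return the 8 dihedral (D4) variants of an image.

    Rotations act on the first two axes (height, width), so arrays with a
    trailing channel axis are handled as well.
    """
    variants = []
    for k in range(4):
        rotated = np.rot90(image, k)          # rotate by k * 90 degrees
        variants.append(rotated)
        variants.append(np.fliplr(rotated))   # mirrored copy of the rotation
    return variants


image = np.arange(6).reshape(2, 3)            # toy 2x3 "image" with distinct pixels
variants = d4_transforms(image)

# All 8 variants differ for an image without symmetries.
unique = {(v.shape, v.tobytes()) for v in variants}
print(len(variants), len(unique))
```

&lt;p&gt;For segmentation, the same transform has to be applied to the image and its mask. These transforms are lossless because they only permute pixels, without any interpolation.&lt;/p&gt;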

&lt;h1&gt;
  
  
  Modeling
&lt;/h1&gt;

&lt;h1&gt;
  
  
  Architectures
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;Use of a &lt;a href="https://arxiv.org/abs/1505.04597"&gt;U-net&lt;/a&gt; based architecture. Adopted the concepts and applied them to 3D input tensors&lt;/li&gt;
&lt;li&gt;Employing automatic active learning and adding &lt;a href="https://medium.com/dataturks/manually-annotated-open-datasets-5cea7b1e5890"&gt;manual annotations&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;The &lt;a href="https://ai.googleblog.com/2016/08/improving-inception-and-image.html"&gt;inception-ResNet v2 architecture&lt;/a&gt; for training features with different receptive fields&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.cs.cmu.edu/~rsalakhu/papers/oneshot1.pdf"&gt;Siamese networks&lt;/a&gt; with adversarial training&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://towardsdatascience.com/understanding-and-coding-a-resnet-in-keras-446d7ff84d33"&gt;ResNet50&lt;/a&gt;, &lt;a href="https://arxiv.org/abs/1610.02357"&gt;Xception&lt;/a&gt;, &lt;a href="https://arxiv.org/abs/1602.07261"&gt;Inception ResNet&lt;/a&gt; v2 x 5 with a Dense (FC) layer as the final layer&lt;/li&gt;
&lt;li&gt;Use of a &lt;a href="https://keras.io/layers/pooling/"&gt;global max-pooling layer&lt;/a&gt; which returns a fixed-length output no matter the input size&lt;/li&gt;
&lt;li&gt;Use of &lt;a href="https://arxiv.org/abs/1904.03076"&gt;stacked dilated convolutions&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/1711.06396"&gt;VoxelNet&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.kaggle.com/c/planet-understanding-the-amazon-from-space/discussion/36887#207397"&gt;Replace plus sign in LinkNet skip connections with concat and conv1x1&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/pdf/1711.02512.pdf"&gt;Generalized mean pooling&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Keras &lt;a href="https://www.tensorflow.org/api_docs/python/tf/keras/applications/NASNetLarge"&gt;NASNetLarge&lt;/a&gt; to train the model from scratch using 224x224x3 inputs&lt;/li&gt;
&lt;li&gt;Use of the &lt;a href="https://towardsdatascience.com/understanding-1d-and-3d-convolution-neural-network-keras-9d8f76e29610"&gt;3D convnet&lt;/a&gt; to slide over the images&lt;/li&gt;
&lt;li&gt;ImageNet-pre-trained &lt;a href="https://towardsdatascience.com/review-resnet-winner-of-ilsvrc-2015-image-classification-localization-detection-e39402bfa5d8"&gt;ResNet152&lt;/a&gt; as the feature extractor&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.kaggle.com/c/planet-understanding-the-amazon-from-space/discussion/36887#207397"&gt;Replace the final fully-connected layers of ResNet by 3 fully connected layers with dropout&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Use &lt;a href="https://www.kaggle.com/c/siim-acr-pneumothorax-segmentation/discussion/107518#619543"&gt;ConvTranspose&lt;/a&gt; in the decoder&lt;/li&gt;
&lt;li&gt;Applying the VGG baseline architecture&lt;/li&gt;
&lt;li&gt;Implementing the &lt;a href="http://vlg.cs.dartmouth.edu/c3d/"&gt;C3D&lt;/a&gt; network with adjusted receptive fields and a 64 unit bottleneck layer on the end of the network&lt;/li&gt;
&lt;li&gt;Use of &lt;a href="https://towardsdatascience.com/understanding-semantic-segmentation-with-unet-6be4f42d4b47"&gt;UNet&lt;/a&gt; type architectures with pre-trained weights to improve convergence and performance of binary segmentation on 8-bit RGB input images&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://arxiv.org/abs/1707.03718"&gt;LinkNet&lt;/a&gt; since it’s fast and memory efficient&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/matterport/Mask_RCNN"&gt;Mask R-CNN&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/microsoft/CNTK/tree/master/Examples/Image/Classification/GoogLeNet/BN-Inception"&gt;BN-Inception&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/1908.02990"&gt;Fast Point R-CNN&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/osmr/imgclsmob/blob/master/pytorch/pytorchcv/models/seresnext.py"&gt;Seresnext&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://towardsdatascience.com/understanding-semantic-segmentation-with-unet-6be4f42d4b47"&gt;UNet&lt;/a&gt; and &lt;a href="https://arxiv.org/abs/1706.05587"&gt;Deeplabv3&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/1506.01497"&gt;Faster R-CNN&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://paperswithcode.com/paper/squeeze-and-excitation-networks"&gt;SENet154&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.kaggle.com/pytorch/resnet152"&gt;ResNet152&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/pdf/1707.07012.pdf"&gt;NASNet-A-Large&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.kaggle.com/c/siim-acr-pneumothorax-segmentation/discussion/107795#619987"&gt;EfficientNetB4&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.kaggle.com/pytorch/resnet101"&gt;ResNet101&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.groundai.com/project/gapnet-graph-attention-based-point-neural-network-for-exploiting-local-feature-of-point-cloud/1"&gt;GAPNet&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/pdf/1712.00559.pdf"&gt;PNASNet-5-Large&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.kaggle.com/pytorch/densenet121"&gt;Densenet121&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://machinelearningmastery.com/how-to-develop-an-auxiliary-classifier-gan-ac-gan-from-scratch-with-keras/"&gt;AC-GAN&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.kaggle.com/c/sp-society-camera-model-identification/discussion/49602#282979"&gt;XceptionNet (96), XceptionNet (299), Inception v3 (139), InceptionResNet v2 (299), DenseNet121 (224)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.kaggle.com/c/siim-acr-pneumothorax-segmentation/discussion/107824#650999"&gt;AlbuNet (resnet34)&lt;/a&gt; from &lt;a href="https://github.com/ternaus/TernausNet"&gt;ternausnets&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="https://medium.com/the-downlinq/a-deep-dive-into-the-spacenet-4-winning-algorithms-8d611a5dfe25"&gt;SpaceNet&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.kaggle.com/c/siim-acr-pneumothorax-segmentation/discussion/107824#650999"&gt;Resnet50&lt;/a&gt; from &lt;a href="https://github.com/SpaceNetChallenge/SpaceNet_Off_Nadir_Solutions/tree/master/selim_sef/zoo"&gt;selim_sef SpaceNet 4&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.kaggle.com/c/siim-acr-pneumothorax-segmentation/discussion/107824"&gt;SCSEUnet (seresnext50)&lt;/a&gt; from &lt;a href="https://github.com/SpaceNetChallenge/SpaceNet_Off_Nadir_Solutions/tree/master/selim_sef/zoo"&gt;selim_sef SpaceNet 4&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;A custom &lt;a href="https://www.kaggle.com/c/data-science-bowl-2018/discussion/54835#320935"&gt;Unet and Linknet&lt;/a&gt; &lt;a href="https://www.kaggle.com/c/data-science-bowl-2018/discussion/54835#320935"&gt;architecture&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.kaggle.com/c/siim-acr-pneumothorax-segmentation/discussion/107872"&gt;FPNetResNet50 (5 folds)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.kaggle.com/c/siim-acr-pneumothorax-segmentation/discussion/107872"&gt;FPNetResNet101 (5 folds)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.kaggle.com/c/siim-acr-pneumothorax-segmentation/discussion/107872"&gt;FPNetResNet101 (7 folds with different seeds)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.kaggle.com/c/siim-acr-pneumothorax-segmentation/discussion/107872"&gt;PANetDilatedResNet34 (4 folds)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.kaggle.com/c/siim-acr-pneumothorax-segmentation/discussion/107872"&gt;PANetResNet50 (4 folds)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.kaggle.com/c/siim-acr-pneumothorax-segmentation/discussion/107872"&gt;EMANetResNet101 (2 folds)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/fizyr/keras-retinanet"&gt;RetinaNet&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/msracver/Deformable-ConvNets"&gt;Deformable R-FCN&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/msracver/Relation-Networks-for-Object-Detection"&gt;Deformable Relation Networks&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Hardware Setups
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.kaggle.com/c/cdiscount-image-classification-challenge/discussion/45724"&gt;Use of the AWS GPU instance p2.xlarge with a NVIDIA K80 GPU&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.kaggle.com/c/cdiscount-image-classification-challenge/discussion/45724"&gt;Pascal Titan-X GPU&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.kaggle.com/c/carvana-image-masking-challenge/discussion/40121#226179"&gt;Use of 8 TITAN X GPUs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.kaggle.com/c/carvana-image-masking-challenge/discussion/40121#226179"&gt;6 GPUs: 2×1080Ti + 4×1080&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.kaggle.com/c/cdiscount-image-classification-challenge/discussion/45724"&gt;Server with 8×NVIDIA Tesla P40, 256 GB RAM and 28 CPU cores&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.kaggle.com/c/cdiscount-image-classification-challenge/discussion/45850"&gt;Intel Core i7 5930k, 2×1080, 64 GB of RAM, 2x512GB SSD, 3TB HDD&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.kaggle.com/c/inclusive-images-challenge/discussion/72450"&gt;GCP 1x P100, 8x CPU, 15 GB RAM, SSD or 2x P100, 16x CPU, 30 GB RAM; NVIDIA Tesla P100 GPU with 16GB of RAM&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.kaggle.com/c/human-protein-atlas-image-classification/discussion/77325"&gt;980Ti GPU, 2600k CPU, and 14GB RAM&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Loss Functions
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://towardsdatascience.com/metrics-to-evaluate-your-semantic-segmentation-model-6bcb99639aa2"&gt;Dice Coefficient&lt;/a&gt; because it works well with imbalanced data&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.kaggle.com/lyakaap/weighing-boundary-pixels-loss-script-by-keras2"&gt;Weighted boundary loss&lt;/a&gt; whose aim is to reduce the distance between the predicted segmentation and the ground truth&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://pytorch.org/docs/stable/nn.html?highlight=multilabelsoftmarginloss#torch.nn.MultiLabelSoftMarginLoss"&gt;MultiLabelSoftMarginLoss&lt;/a&gt; that creates a criterion that optimizes a multi-label one-versus-all loss based on max-entropy, between input and target&lt;/li&gt;
&lt;li&gt;Balanced cross entropy (BCE) with logit loss that involves weighting the positive and negative examples by a certain coefficient&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.kaggle.com/c/siim-acr-pneumothorax-segmentation/discussion/107981"&gt;Lovasz&lt;/a&gt; that performs direct optimization of the mean intersection-over-union loss in neural networks based on the convex Lovasz extension of sub-modular losses&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.kaggle.com/c/siim-acr-pneumothorax-segmentation/discussion/107687"&gt;FocalLoss + Lovasz&lt;/a&gt; obtained by summing the Focal and Lovasz losses&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://arxiv.org/abs/1801.07698"&gt;Arc margin loss&lt;/a&gt; that incorporates margin in order to maximise face class separability&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.tensorflow.org/addons/api_docs/python/tfa/losses/npairs_loss"&gt;Npairs loss&lt;/a&gt; that computes the npairs loss between y_true and y_pred.&lt;/li&gt;
&lt;li&gt;A combination of &lt;a href="https://www.kaggle.com/c/carvana-image-masking-challenge/discussion/40199"&gt;BCE and Dice loss&lt;/a&gt; functions&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://arxiv.org/pdf/1704.03135.pdf"&gt;LSEP&lt;/a&gt; – a pairwise ranking loss that is smooth everywhere and thus easier to optimize&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://ydwen.github.io/papers/WenECCV16.pdf"&gt;Center loss&lt;/a&gt; that simultaneously learns a center for deep features of each class and penalizes the distances between the deep features and their corresponding class centers&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://arxiv.org/abs/1803.00130"&gt;Ring Loss&lt;/a&gt; that augments standard loss functions such as Softmax&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.tensorflow.org/addons/tutorials/losses_triplet"&gt;Hard triplet loss&lt;/a&gt; that trains a network to embed features of the same class close together while maximizing the embedding distance between different classes&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.kaggle.com/c/siim-acr-pneumothorax-segmentation/discussion/107687"&gt;1 + BCE – Dice&lt;/a&gt; that involves subtracting the Dice score from the BCE loss and then adding 1&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.kaggle.com/c/carvana-image-masking-challenge/discussion/40144"&gt;Binary cross-entropy&lt;/a&gt; – log(dice), that is, the binary cross-entropy minus the log of the Dice coefficient&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/SpaceNetChallenge/SpaceNet_Off_Nadir_Solutions/blob/master/selim_sef/training/losses.py"&gt;Combinations&lt;/a&gt; of BCE, dice and focal&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.kaggle.com/c/siim-acr-pneumothorax-segmentation/discussion/107981"&gt;Lovasz Loss&lt;/a&gt; that performs direct optimization of the mean intersection-over-union loss&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.kaggle.com/c/siim-acr-pneumothorax-segmentation/discussion/107795#619987"&gt;BCE + DICE&lt;/a&gt; – the Dice loss is obtained by computing a smoothed Dice coefficient&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.kaggle.com/c/human-protein-atlas-image-classification/discussion/77320"&gt;Focal loss with Gamma 2&lt;/a&gt; that is an improvement to the standard cross-entropy criterion&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.kaggle.com/c/siim-acr-pneumothorax-segmentation/discussion/107546"&gt;BCE + DICE + Focal&lt;/a&gt; – this is basically a summation of the three loss functions&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.kaggle.com/c/siim-acr-pneumothorax-segmentation/discussion/107981"&gt;Active Contour Loss&lt;/a&gt; that incorporates the area and size information and integrates the information in a dense deep learning model&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.kaggle.com/c/siim-acr-pneumothorax-segmentation/discussion/107603"&gt;1024 * BCE(results, masks) + BCE(cls, cls_target)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.kaggle.com/c/aptos2019-blindness-detection/discussion/108058"&gt;Focal + kappa&lt;/a&gt; – Kappa is a loss function for multi-class classification of ordinal data in deep learning. In this case we sum it and the focal loss&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://arxiv.org/pdf/1801.07698v1.pdf"&gt;ArcFaceLoss&lt;/a&gt; — Additive Angular Margin Loss for Deep Face Recognition&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.kaggle.com/c/siim-acr-pneumothorax-segmentation/discussion/107522"&gt;soft Dice trained on positives only&lt;/a&gt; – Soft Dice uses predicted probabilities&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.kaggle.com/c/siim-acr-pneumothorax-segmentation/discussion/108397"&gt;2.7 * BCE(pred_mask, gt_mask) + 0.9 * DICE(pred_mask, gt_mask) + 0.1 * BCE(pred_empty, gt_empty)&lt;/a&gt; which is a custom loss used by the Kaggler&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.kaggle.com/c/aptos2019-blindness-detection/discussion/108065"&gt;nn.SmoothL1Loss()&lt;/a&gt; that creates a criterion that uses a squared term if the absolute element-wise error falls below 1 and an L1 term otherwise&lt;/li&gt;
&lt;li&gt;Use of the &lt;a href="https://towardsdatascience.com/why-using-mean-squared-error-mse-cost-function-for-binary-classification-is-a-bad-idea-933089e90df7"&gt;Mean Squared Error objective function&lt;/a&gt; in scenarios where it seems to work better than the &lt;a href="https://machinelearningmastery.com/cross-entropy-for-machine-learning/"&gt;binary cross-entropy objective function&lt;/a&gt;.
&lt;/li&gt;
&lt;/ul&gt;
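
&lt;p&gt;To illustrate how the combined losses in this list are assembled, here is a minimal sketch of BCE + Dice in plain Python. The function names (&lt;code&gt;bce&lt;/code&gt;, &lt;code&gt;soft_dice&lt;/code&gt;, &lt;code&gt;bce_dice_loss&lt;/code&gt;), the &lt;code&gt;smooth&lt;/code&gt; constant and the equal weighting are illustrative assumptions, not code from any of the linked solutions. A real pipeline would use tensor operations from a deep learning framework.&lt;/p&gt;

```python
import math


def bce(probs, targets, eps=1e-7):
    """Mean binary cross-entropy over flat lists of probabilities and 0/1 targets."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(probs)


def soft_dice(probs, targets, smooth=1.0):
    """Smoothed Dice coefficient computed on predicted probabilities."""
    intersection = sum(p * t for p, t in zip(probs, targets))
    return (2.0 * intersection + smooth) / (sum(probs) + sum(targets) + smooth)


def bce_dice_loss(probs, targets, bce_weight=1.0, dice_weight=1.0):
    """Weighted sum of BCE and (1 - Dice); the weights are assumptions."""
    dice_term = 1.0 - soft_dice(probs, targets)
    return bce_weight * bce(probs, targets) + dice_weight * dice_term
```

&lt;p&gt;The weighted variants listed above, such as 2.7 * BCE + 0.9 * DICE, correspond to changing &lt;code&gt;bce_weight&lt;/code&gt; and &lt;code&gt;dice_weight&lt;/code&gt; in a sketch like this one.&lt;/p&gt;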

&lt;h1&gt;
  
  
  Training tips
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://machinelearningmastery.com/learning-rate-for-deep-learning-neural-networks/"&gt;Try different learning rates&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://machinelearningmastery.com/use-different-batch-sizes-training-predicting-python-keras/"&gt;Try different batch sizes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Use &lt;a href="https://www.kaggle.com/c/carvana-image-masking-challenge/discussion/38125#2139200"&gt;SGD with momentum and manual learning rate scheduling&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Too much &lt;a href="https://medium.com/secure-and-private-ai-writing-challenge/data-augmentation-increases-accuracy-of-your-model-but-how-aa1913468722"&gt;augmentation&lt;/a&gt; will reduce the accuracy&lt;/li&gt;
&lt;li&gt;Train on image &lt;a href="https://www.kaggle.com/c/understanding_cloud_organization/discussion/115115"&gt;crops and predict&lt;/a&gt; on full images&lt;/li&gt;
&lt;li&gt;Use Keras’s &lt;a href="https://www.kaggle.com/c/bengaliai-cv19/discussion/135998"&gt;ReduceLROnPlateau()&lt;/a&gt; to reduce the learning rate&lt;/li&gt;
&lt;li&gt;Train &lt;a href="https://www.kaggle.com/c/inclusive-images-challenge/discussion/72450"&gt;without augmentation until plateau&lt;/a&gt;, then apply soft and hard augmentation for some epochs&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.kaggle.com/c/tgs-salt-identification-challenge/discussion/65763"&gt;Freeze all layers except the&lt;/a&gt; last one and use 1000 images from &lt;a href="https://www.kaggle.com/c/quickdraw-doodle-recognition/discussion/72892"&gt;Stage1 for tuning&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Make labels more balanced by &lt;a href="https://www.sebastiansylvan.com/post/importancesampling/"&gt;developing a sampler&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Use of &lt;a href="https://www.kaggle.com/c/sp-society-camera-model-identification/discussion/49314"&gt;class aware sampling&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Use dropout and augmentation while tuning the last layer&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.kaggle.com/c/carvana-image-masking-challenge/discussion/38298"&gt;Pseudo Labeling&lt;/a&gt; to improve score&lt;/li&gt;
&lt;li&gt;Use &lt;a href="https://www.kaggle.com/c/sp-society-camera-model-identification/discussion/49299"&gt;Adam reducing LR on plateau with patience 2–4&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Use &lt;a href="https://www.kaggle.com/c/sp-society-camera-model-identification/discussion/49299"&gt;Cyclic LR with SGD&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Reduce the &lt;a href="https://machinelearningmastery.com/understand-the-dynamics-of-learning-rate-on-deep-learning-neural-networks/"&gt;learning rate&lt;/a&gt; by a factor of two if validation loss does not improve for two consecutive epochs&lt;/li&gt;
&lt;li&gt;Repeat &lt;a href="https://medium.com/kaggle-blog/carvana-image-masking-challenge-1st-place-winners-interview-78fcc5c887a8"&gt;the worst batch&lt;/a&gt; out of 10 batches&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.kaggle.com/c/carvana-image-masking-challenge/discussion/40126"&gt;Train with default UNET&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.kaggle.com/c/carvana-image-masking-challenge/discussion/40126"&gt;Overlap tiles&lt;/a&gt; so that each edge pixel is covered twice&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.kaggle.com/c/rsna-pneumonia-detection-challenge/discussion/71022"&gt;Hyperparameter tuning: learning rate on training, non-maximum suppression and score threshold on inference&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Remove &lt;a href="https://www.kaggle.com/c/rsna-pneumonia-detection-challenge/discussion/70505"&gt;bounding boxes&lt;/a&gt; with low confidence scores&lt;/li&gt;
&lt;li&gt;Train different &lt;a href="https://machinelearningmastery.com/ensemble-methods-for-deep-learning-neural-networks/"&gt;convolutional neural networks&lt;/a&gt; then build an ensemble&lt;/li&gt;
&lt;li&gt;Stop &lt;a href="https://www.kaggle.com/c/human-protein-atlas-image-classification/discussion/77320"&gt;training when the F1 score is decreasing&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://blog.slavv.com/differential-learning-rates-59eff5209a4f"&gt;Differential learning rate&lt;/a&gt; with gradual reducing&lt;/li&gt;
&lt;li&gt;Train ANNs &lt;a href="https://www.kaggle.com/c/statoil-iceberg-classifier-challenge/discussion/48207"&gt;in a stacking way using 5 folds&lt;/a&gt; and 30 repeats&lt;/li&gt;
&lt;li&gt;Keep track of your experiments using &lt;a href="https://docs.neptune.ai/?utm_source=devto&amp;amp;utm_medium=crosspost&amp;amp;utm_campaign=blog-image-segmentation-tips-and-tricks-from-kaggle-competitions"&gt;Neptune&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
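
&lt;p&gt;The tip about halving the learning rate when the validation loss stalls for two consecutive epochs can be sketched in a few lines of plain Python. The helper name &lt;code&gt;schedule_lr&lt;/code&gt; and its default values are hypothetical. In a real training loop you would more likely rely on a framework callback such as the &lt;code&gt;ReduceLROnPlateau()&lt;/code&gt; mentioned above.&lt;/p&gt;

```python
def schedule_lr(val_losses, initial_lr=0.01, factor=0.5, patience=2):
    """Return the learning rate in effect after each epoch.

    The rate is multiplied by `factor` whenever the validation loss has not
    improved for `patience` consecutive epochs.
    """
    lr = initial_lr
    best = float("inf")
    stale_epochs = 0
    history = []
    for loss in val_losses:
        if loss >= best:
            stale_epochs += 1      # no improvement this epoch
        else:
            best = loss            # new best loss resets the counter
            stale_epochs = 0
        if stale_epochs == patience:
            lr *= factor
            stale_epochs = 0
        history.append(lr)
    return history
```

&lt;p&gt;With &lt;code&gt;factor=0.5&lt;/code&gt; and &lt;code&gt;patience=2&lt;/code&gt; this matches the "factor of two after two epochs" rule. Other values of &lt;code&gt;patience&lt;/code&gt; cover the "patience 2–4" variant.&lt;/p&gt;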

&lt;h1&gt;
  
  
  Evaluation and cross-validation
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;Use a &lt;a href="https://www.kaggle.com/c/inclusive-images-challenge/discussion/71433"&gt;non-uniform split stratified&lt;/a&gt; by classes&lt;/li&gt;
&lt;li&gt;Avoid &lt;a href="https://elitedatascience.com/overfitting-in-machine-learning"&gt;overfitting&lt;/a&gt; by applying &lt;a href="https://machinelearningmastery.com/early-stopping-to-avoid-overtraining-neural-network-models/"&gt;cross-validation&lt;/a&gt; while &lt;a href="https://machinelearningmastery.com/introduction-to-regularization-to-reduce-overfitting-and-improve-generalization-error/"&gt;tuning&lt;/a&gt; the last layer&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.kaggle.com/c/rsna-pneumonia-detection-challenge/discussion/70421"&gt;10-fold CV ensemble for classification&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Combination &lt;a href="https://www.kaggle.com/c/rsna-pneumonia-detection-challenge/discussion/70421"&gt;of 5 10-fold CV&lt;/a&gt; ensembles for detection&lt;/li&gt;
&lt;li&gt;Sklearn’s &lt;a href="https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.StratifiedKFold.html"&gt;stratified K fold function&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="https://machinelearningmastery.com/k-fold-cross-validation/"&gt;5 KFold Cross-Validation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Adversarial &lt;a href="https://www.kaggle.com/c/PLAsTiCC-2018/discussion/75011"&gt;Validation &amp;amp; Weighting&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
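
&lt;p&gt;To show what stratification means in practice, here is a simplified, standard-library-only sketch that deals the samples of each class round-robin across folds. The helper &lt;code&gt;stratified_folds&lt;/code&gt; is hypothetical and does no shuffling. For real work, the scikit-learn stratified K fold function linked above is the better choice.&lt;/p&gt;

```python
from collections import defaultdict


def stratified_folds(labels, n_folds=5):
    """Assign each sample index to a fold so class proportions stay similar."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        # Deal the samples of each class round-robin across the folds.
        for position, index in enumerate(indices):
            folds[position % n_folds].append(index)
    return folds
```

&lt;p&gt;Each fold then serves once as the validation set while the remaining folds are used for training.&lt;/p&gt;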

&lt;h1&gt;
  
  
  Ensembling methods
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;Use simple &lt;a href="https://www.kaggle.com/c/human-protein-atlas-image-classification/discussion/77256"&gt;majority voting&lt;/a&gt; for ensemble&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.analyticsvidhya.com/blog/2017/06/which-algorithm-takes-the-crown-light-gbm-vs-xgboost/"&gt;XGBoost&lt;/a&gt; on the &lt;a href="https://www.kaggle.com/c/data-science-bowl-2017/discussion/31551"&gt;max malignancy at 3 zoom levels&lt;/a&gt;, the z-location and the &lt;a href="https://www.kaggle.com/c/statoil-iceberg-classifier-challenge/discussion/48207"&gt;amount of strange tissue&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.analyticsvidhya.com/blog/2017/06/which-algorithm-takes-the-crown-light-gbm-vs-xgboost/"&gt;LightGBM&lt;/a&gt; for models &lt;a href="https://www.kaggle.com/c/quickdraw-doodle-recognition/discussion/73738"&gt;with too many&lt;/a&gt; classes. This was done for raw data features only.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.analyticsvidhya.com/blog/2017/08/catboost-automated-categorical-data/"&gt;CatBoost&lt;/a&gt; for &lt;a href="https://www.kaggle.com/c/cdiscount-image-classification-challenge/discussion/45733"&gt;a second-layer model&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Training with 7 features for the &lt;a href="https://machinelearningmastery.com/gentle-introduction-gradient-boosting-algorithm-machine-learning/"&gt;gradient boosting classifier&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Use ‘curriculum learning’ to speed up model training. In this technique, models are first trained on simple samples and then progressively moved on to harder ones.&lt;/li&gt;
&lt;li&gt;Ensemble with &lt;a href="https://www.kaggle.com/c/inclusive-images-challenge/discussion/72450"&gt;ResNet50, InceptionV3, and InceptionResNetV2&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/ahrnbom/ensemble-objdet"&gt;Ensemble method&lt;/a&gt; for object detection&lt;/li&gt;
&lt;li&gt;An ensemble of &lt;a href="https://www.analyticsvidhya.com/blog/2019/07/computer-vision-implementing-mask-r-cnn-image-segmentation/"&gt;Mask RCNN&lt;/a&gt;, &lt;a href="https://machinelearningmastery.com/how-to-perform-object-detection-with-yolov3-in-keras/"&gt;YOLOv3&lt;/a&gt;, and &lt;a href="https://towardsdatascience.com/faster-r-cnn-for-object-detection-a-technical-summary-474c5b857b46"&gt;Faster RCNN&lt;/a&gt; architectures together with a &lt;a href="https://towardsdatascience.com/understanding-and-visualizing-densenets-7f688092391a"&gt;DenseNet-121&lt;/a&gt; classification network&lt;/li&gt;
&lt;/ul&gt;
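
&lt;p&gt;Simple majority voting, the first item in this list, can be sketched as follows. The function name &lt;code&gt;majority_vote&lt;/code&gt; and the input layout (one list of predicted labels per model) are assumptions made for illustration.&lt;/p&gt;

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine class predictions from several models by simple majority voting.

    `predictions_per_model` holds one list of predicted labels per model.
    Ties go to the label that appears first among the tied ones.
    """
    ensembled = []
    for votes in zip(*predictions_per_model):
        ensembled.append(Counter(votes).most_common(1)[0][0])
    return ensembled
```

&lt;p&gt;Voting only needs hard labels, so it also works when the models do not output comparable probabilities.&lt;/p&gt;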

&lt;h1&gt;
  
  
  Post Processing
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;Apply &lt;a href="https://towardsdatascience.com/test-time-augmentation-tta-and-how-to-perform-it-with-keras-4ac19b67fb4d"&gt;test time augmentation&lt;/a&gt;: present an image to the model several times with different random transformations and average the predictions&lt;/li&gt;
&lt;li&gt;Equalize test prediction &lt;a href="https://machinelearningmastery.com/probability-metrics-for-imbalanced-classification/"&gt;probabilities&lt;/a&gt; instead of only using predicted classes&lt;/li&gt;
&lt;li&gt;Apply &lt;a href="https://machinelearningmastery.com/arithmetic-geometric-and-harmonic-means-for-machine-learning/"&gt;geometric mean&lt;/a&gt; to the &lt;a href="https://medium.com/@flawnsontong1/what-is-geometric-deep-learning-b2adb662d91d"&gt;predictions&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.kaggle.com/c/carvana-image-masking-challenge/discussion/40126"&gt;Overlap tiles during inference so that each edge pixel&lt;/a&gt; is covered at least three times, because U-Net tends to make poor predictions around edge areas.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.kaggle.com/c/rsna-pneumonia-detection-challenge/discussion/70632"&gt;Non-maximum suppression&lt;/a&gt; and bounding box shrinkage&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.kaggle.com/c/data-science-bowl-2018/discussion/54741"&gt;Watershed post processing&lt;/a&gt; to detach objects in instance segmentation problems.&lt;/li&gt;
&lt;/ul&gt;
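
&lt;p&gt;Two of the tips above, averaging test time augmentation predictions and taking the geometric mean of predictions, reduce to a few lines. This is a minimal sketch for scalar scores. The names &lt;code&gt;geometric_mean&lt;/code&gt; and &lt;code&gt;average_tta&lt;/code&gt;, and the &lt;code&gt;predict&lt;/code&gt; and &lt;code&gt;transforms&lt;/code&gt; arguments, are hypothetical. For segmentation masks you would also have to undo each transformation on the predicted mask before averaging.&lt;/p&gt;

```python
import math


def geometric_mean(probabilities, eps=1e-12):
    """Geometric mean of the probabilities predicted for one sample."""
    log_sum = sum(math.log(max(p, eps)) for p in probabilities)
    return math.exp(log_sum / len(probabilities))


def average_tta(predict, image, transforms):
    """Arithmetic mean of predictions over several transformed copies of an image."""
    scores = [predict(transform(image)) for transform in transforms]
    return sum(scores) / len(scores)
```

&lt;p&gt;The geometric mean penalizes disagreement more strongly than the arithmetic mean, because a single low probability pulls the result down.&lt;/p&gt;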

&lt;h1&gt;
  
  
  Final Thoughts
&lt;/h1&gt;

&lt;p&gt;Hopefully, this article gave you some background on image segmentation tips and tricks, along with some tools and frameworks that you can use to start competing.&lt;/p&gt;

&lt;p&gt;We’ve covered tips on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;architectures,&lt;/li&gt;
&lt;li&gt;training tricks,&lt;/li&gt;
&lt;li&gt;losses,&lt;/li&gt;
&lt;li&gt;pre-processing,&lt;/li&gt;
&lt;li&gt;post-processing,&lt;/li&gt;
&lt;li&gt;ensembling,&lt;/li&gt;
&lt;li&gt;tools and frameworks.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to go deeper down the rabbit hole, simply follow the links and see how the best image segmentation models are built.&lt;/p&gt;

&lt;p&gt;Happy segmenting!&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>python</category>
      <category>datascience</category>
    </item>
    <item>
      <title>Image segmentation in 2020: Architectures, Losses, Datasets, and Frameworks</title>
      <dc:creator>Jakub Czakon</dc:creator>
      <pubDate>Mon, 11 May 2020 10:23:59 +0000</pubDate>
      <link>https://dev.to/jakubczakon/image-segmentation-in-2020-architectures-losses-datasets-and-frameworks-29b</link>
      <guid>https://dev.to/jakubczakon/image-segmentation-in-2020-architectures-losses-datasets-and-frameworks-29b</guid>
      <description>&lt;p&gt;&lt;em&gt;This article was originally posted by Derrick Mwiti on &lt;a href="https://neptune.ai/blog/image-segmentation-in-2020?utm_source=devto&amp;amp;utm_medium=crosspost&amp;amp;utm_campaign=blog-image-segmentation-2020"&gt;neptune.ml/blog&lt;/a&gt; where you can find more in-depth articles for machine learning practitioners.&lt;/em&gt; &lt;/p&gt;




&lt;p&gt;In this piece, we’ll take a plunge into the world of image segmentation using deep learning. We’ll talk about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;what image segmentation is and the two main types of image segmentation&lt;/li&gt;
&lt;li&gt;image segmentation architectures&lt;/li&gt;
&lt;li&gt;loss functions used in image segmentation&lt;/li&gt;
&lt;li&gt;frameworks that you can use for your image segmentation projects&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let's dive in.&lt;/p&gt;

&lt;h1&gt;
  
  
  What is Image Segmentation?
&lt;/h1&gt;

&lt;p&gt;As the term suggests, image segmentation is the process of dividing an image into multiple segments. In this process, every pixel in the image is associated with an object type. There are two major types of image segmentation: semantic segmentation and instance segmentation.&lt;br&gt;
In semantic segmentation, all objects of the same type are marked with one class label, while in instance segmentation each object of the same type gets its own separate label.&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--v6EhpDF9--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/600/0%2A1tUeRSJFHD4eRL0T" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--v6EhpDF9--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/600/0%2A1tUeRSJFHD4eRL0T" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Image Segmentation Architectures
&lt;/h1&gt;

&lt;p&gt;The basic architecture in image segmentation consists of an encoder and a decoder.&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--2LaRRV1j--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/600/0%2AcK1yrnwMlWoJiNND" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--2LaRRV1j--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/600/0%2AcK1yrnwMlWoJiNND" alt="image"&gt;&lt;/a&gt;&lt;br&gt;
The encoder extracts features from the image through filters. The decoder is responsible for generating the final output, which is usually a segmentation mask containing the outline of the object. Most segmentation models follow this design or a variant of it.&lt;br&gt;
Let's look at a couple.&lt;/p&gt;

&lt;h3&gt;
  
  
  U-Net
&lt;/h3&gt;

&lt;p&gt;U-Net is a convolutional neural network originally developed for segmenting biomedical images. When visualized, its architecture looks like the letter U, hence the name U-Net. The architecture is made up of two parts: the contracting path on the left and the expansive path on the right. The purpose of the contracting path is to capture context, while the role of the expansive path is to aid in precise localization.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--aCQqpsD5--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/600/0%2ASV44NgY9D_QoRwON" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--aCQqpsD5--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/600/0%2ASV44NgY9D_QoRwON" alt="image"&gt;&lt;/a&gt;&lt;br&gt;
The contracting path repeatedly applies two three-by-three convolutions, each followed by a rectified linear unit (ReLU), and then a two-by-two max-pooling operation for downsampling.&lt;br&gt;
U-Net's full implementation can be found &lt;a href="https://lmb.informatik.uni-freiburg.de/people/ronneber/u-net/"&gt;here&lt;/a&gt;.&lt;/p&gt;
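
&lt;p&gt;To make the downsampling in the contracting path concrete, the sketch below tracks the spatial size of the feature map through each stage. It assumes unpadded convolutions, a stride-2 pooling step and four downsampling stages. The helper name &lt;code&gt;contracting_path_sizes&lt;/code&gt; and the 572-pixel input used in the note after the code are illustrative choices.&lt;/p&gt;

```python
def contracting_path_sizes(input_size, depth=4):
    """Spatial size of the feature map after each downsampling stage.

    Each stage applies two unpadded 3x3 convolutions (each trims 2 pixels)
    followed by a 2x2 max-pooling with stride 2 (halves the size).
    """
    sizes = []
    size = input_size
    for _ in range(depth):
        size = size - 2 - 2   # two unpadded 3x3 convolutions
        size = size // 2      # 2x2 max-pooling
        sizes.append(size)
    return sizes
```

&lt;p&gt;Under these assumptions, a 572-pixel-wide input shrinks to 284, 140, 68 and finally 32 pixels. With padded convolutions only the pooling step would change the size.&lt;/p&gt;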

&lt;h3&gt;
  
  
  FastFCN - Fast Fully Convolutional Network
&lt;/h3&gt;

&lt;p&gt;In this architecture, a Joint Pyramid Upsampling (JPU) module is used to replace &lt;a href="https://arxiv.org/pdf/1808.08931.pdf"&gt;dilated convolutions since they consume a lot of memory and time&lt;/a&gt;. It uses a fully convolutional network at its core while applying JPU for upsampling. JPU upsamples the low-resolution feature maps to high-resolution feature maps.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--4jrnNXAA--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/600/0%2AClXddCzzpAP_nsge" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--4jrnNXAA--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/600/0%2AClXddCzzpAP_nsge" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Gated-SCNN
&lt;/h3&gt;

&lt;p&gt;This architecture is a two-stream CNN. In this model, a separate branch, the shape stream, is used to process image shape information, in particular boundary information.&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--psh6ooQ2--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/600/0%2A15SrrunB5P-nzdMa" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--psh6ooQ2--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/600/0%2A15SrrunB5P-nzdMa" alt="image"&gt;&lt;/a&gt;&lt;br&gt;
You can implement it by checking out the code here.&lt;/p&gt;

&lt;h3&gt;
  
  
  DeepLab
&lt;/h3&gt;

&lt;p&gt;In this architecture, convolutions with upsampled filters (atrous convolutions) are used for tasks that involve dense prediction. Segmentation of objects at multiple scales is done via atrous spatial pyramid pooling. Finally, DCNNs are combined with a fully connected conditional random field to improve the localization of object boundaries. Atrous convolution is achieved by upsampling the filters through the insertion of zeros, or equivalently by sparsely sampling the input feature maps.&lt;/p&gt;
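
&lt;p&gt;The "insertion of zeros" can be illustrated in one dimension. The sketch below is a toy example in plain Python, not DeepLab code. The function names &lt;code&gt;dilate_filter&lt;/code&gt; and &lt;code&gt;atrous_conv1d&lt;/code&gt; are hypothetical, and the convolution is computed without padding.&lt;/p&gt;

```python
def dilate_filter(weights, rate):
    """Upsample a 1-D filter by inserting (rate - 1) zeros between its taps."""
    dilated = []
    for position, weight in enumerate(weights):
        if position:
            dilated.extend([0.0] * (rate - 1))
        dilated.append(weight)
    return dilated


def atrous_conv1d(signal, weights, rate):
    """Unpadded 1-D convolution (cross-correlation) with a dilated filter."""
    kernel = dilate_filter(weights, rate)
    span = len(kernel)
    return [
        sum(k * signal[start + offset] for offset, k in enumerate(kernel))
        for start in range(len(signal) - span + 1)
    ]
```

&lt;p&gt;With a rate of 2, a three-tap filter covers five input positions while still using only three weights, which is how the receptive field grows without adding parameters.&lt;/p&gt;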

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Ui7PLzFS--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/600/0%2AjchTboNpy0YnuKNi" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Ui7PLzFS--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/600/0%2AjchTboNpy0YnuKNi" alt="image"&gt;&lt;/a&gt;&lt;br&gt;
You can try its implementation on either &lt;a href="https://github.com/fregu856/deeplabv3"&gt;PyTorch&lt;/a&gt; or &lt;a href="https://github.com/sthalles/deeplab_v3"&gt;TensorFlow&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Mask R-CNN
&lt;/h3&gt;

&lt;p&gt;In this &lt;a href="https://github.com/facebookresearch/Detectron"&gt;architecture&lt;/a&gt;, objects are classified and localized using a bounding box and semantic segmentation that classifies each pixel into a set of categories. Every region of interest gets a segmentation mask. A class label and a bounding box are produced as the final output. The architecture is an extension of the Faster R-CNN. The Faster R-CNN is made up of a deep convolutional network that proposes the regions and a detector that utilizes the regions.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--f6j04DaT--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/600/0%2AHTH_L3LddlEvNF_s" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--f6j04DaT--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/600/0%2AHTH_L3LddlEvNF_s" alt="image"&gt;&lt;/a&gt;&lt;br&gt;
Here is an image of the result obtained on the COCO test set.&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--2SHTfS67--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/600/0%2A8HoWpLqEchV8GHqv" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--2SHTfS67--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/600/0%2A8HoWpLqEchV8GHqv" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Image Segmentation Loss functions
&lt;/h1&gt;

&lt;p&gt;Semantic segmentation models usually use a simple categorical cross-entropy loss function during training. However, if you are interested in getting the granular information of an image, then you have to resort to slightly more advanced loss functions.&lt;br&gt;
Let's go through a couple of them.&lt;/p&gt;

&lt;h3&gt;
  
  
  Focal Loss
&lt;/h3&gt;

&lt;p&gt;This loss is an improvement to the standard cross-entropy criterion. Its shape is changed so that the loss assigned to well-classified examples is down-weighted, which helps when classes are heavily imbalanced. The cross-entropy loss is multiplied by a scaling factor that decays to zero as confidence in the correct class increases. This factor automatically down-weights the contribution of easy examples at training time and focuses the model on the hard ones.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--jhF1fy9i--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/600/0%2A3Q669UYXaeoMXwUf" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--jhF1fy9i--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/600/0%2A3Q669UYXaeoMXwUf" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;
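
&lt;p&gt;For a single example, the scaling described above can be written directly. This is a minimal sketch in plain Python. The function name &lt;code&gt;focal_loss&lt;/code&gt;, the default &lt;code&gt;gamma=2.0&lt;/code&gt; and the clipping constant are illustrative assumptions, and the optional class-balancing weight is left out.&lt;/p&gt;

```python
import math


def focal_loss(p_correct, gamma=2.0, eps=1e-7):
    """Focal loss for one example, given the probability of its true class."""
    p = min(max(p_correct, eps), 1.0 - eps)   # clip to avoid log(0)
    scaling = (1.0 - p) ** gamma              # decays to zero as confidence grows
    return -scaling * math.log(p)
```

&lt;p&gt;Setting &lt;code&gt;gamma=0&lt;/code&gt; removes the scaling and recovers the ordinary cross-entropy for that example.&lt;/p&gt;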

&lt;h3&gt;
  
  
  Dice loss
&lt;/h3&gt;

&lt;p&gt;This loss is obtained by computing a smoothed Dice coefficient. It is one of the most commonly used losses in segmentation problems.&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--jgKldFMO--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/600/0%2A7Qx3lNciGIx7JDFt" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--jgKldFMO--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/600/0%2A7Qx3lNciGIx7JDFt" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Intersection over Union (IoU)-balanced Loss
&lt;/h3&gt;

&lt;p&gt;The IoU-balanced classification loss aims at increasing the gradient of samples with high IoU and decreasing the gradient of samples with low IoU. In this way, the localization accuracy of machine learning models is increased.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--E52-7MQR--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/600/0%2AuDzMNd_p34UZQmeQ" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--E52-7MQR--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/600/0%2AuDzMNd_p34UZQmeQ" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Boundary loss
&lt;/h3&gt;

&lt;p&gt;One variant of the boundary loss is applied to tasks with highly unbalanced segmentations. It takes the form of a distance metric on the space of contours rather than regions. In this manner, it tackles the problem posed by regional losses for highly imbalanced segmentation tasks.&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--bi6p8Kb6--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/600/0%2A2sicSw4kigyrxL1R" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--bi6p8Kb6--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/600/0%2A2sicSw4kigyrxL1R" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Weighted cross-entropy
&lt;/h3&gt;

&lt;p&gt;In one variant of cross-entropy, all positive examples are weighted by a certain coefficient. It is used in scenarios that involve class imbalance.&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--XATUtczm--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/600/0%2A4-CteV23pCnpS1py" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--XATUtczm--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/600/0%2A4-CteV23pCnpS1py" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;
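
&lt;p&gt;The weighting of positive examples can be sketched for the binary case as follows. The function name &lt;code&gt;weighted_bce&lt;/code&gt; and the default &lt;code&gt;pos_weight&lt;/code&gt; are hypothetical. In practice the coefficient is often chosen from the ratio of negative to positive pixels, which is an assumption on my part rather than something stated above.&lt;/p&gt;

```python
import math


def weighted_bce(probs, targets, pos_weight=2.0, eps=1e-7):
    """Binary cross-entropy where positive examples are scaled by `pos_weight`."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)   # clip to avoid log(0)
        total -= pos_weight * t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(probs)
```

&lt;p&gt;A &lt;code&gt;pos_weight&lt;/code&gt; above 1 makes missed positives more expensive than false alarms, which is useful when the foreground class is rare.&lt;/p&gt;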

&lt;h3&gt;
  
  
  Lovász-Softmax loss
&lt;/h3&gt;

&lt;p&gt;This loss performs direct optimization of the mean intersection-over-union loss in neural networks based on the convex Lovász extension of submodular losses.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--DuRO8I35--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/600/0%2AiUjI9M8gCrkrr-_-" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--DuRO8I35--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/600/0%2AiUjI9M8gCrkrr-_-" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Other losses worth mentioning are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;TopK loss&lt;/strong&gt; whose aim is to ensure that networks concentrate on hard samples during the training process.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Distance penalized CE loss&lt;/strong&gt; that directs the network to boundary regions that are hard to segment.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sensitivity-Specificity (SS)&lt;/strong&gt; loss that computes the weighted sum of the mean squared difference of specificity and sensitivity.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hausdorff distance (HD) loss&lt;/strong&gt; that estimates the Hausdorff distance from a convolutional neural network.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are just a couple of loss functions used in image segmentation. To explore many more check out this repo.&lt;/p&gt;

&lt;h1&gt;
  
  
  Image Segmentation Datasets
&lt;/h1&gt;

&lt;p&gt;If you are still here, chances are that you might be asking yourself where you can get some datasets to get started.&lt;br&gt;
Let's look at a few.&lt;/p&gt;

&lt;h3&gt;
  
  
  Common Objects in Context - COCO Dataset
&lt;/h3&gt;

&lt;p&gt;COCO is a large-scale object detection, segmentation, and captioning dataset. It contains 80 object categories and 91 stuff categories, and it includes 250,000 people annotated with keypoints. Its download size is 37.57 GiB. It is available under the &lt;a href="https://medium.com/r/?url=https%3A%2F%2Fwww.apache.org%2Flicenses%2FLICENSE-2.0"&gt;Apache 2.0 License&lt;/a&gt; and can be downloaded from &lt;a href="http://cocodataset.org/#download"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  PASCAL Visual Object Classes (PASCAL VOC)
&lt;/h3&gt;

&lt;p&gt;PASCAL has 9963 images with 20 different classes. The training/validation set is a 2GB tar file. The dataset can be downloaded from the official website.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Cityscapes Dataset
&lt;/h3&gt;

&lt;p&gt;This dataset contains images of city scenes. It can be used to evaluate the performance of vision algorithms in urban scenarios. The dataset can be downloaded from here.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Cambridge-driving Labeled Video Database - CamVid
&lt;/h3&gt;

&lt;p&gt;This is a motion-based segmentation and recognition dataset. It contains 32 semantic classes. This link contains further explanations and download links to the dataset.&lt;/p&gt;

&lt;h1&gt;
  
  
  Image Segmentation Frameworks
&lt;/h1&gt;

&lt;p&gt;Now that you are armed with possible datasets, let's mention a few tools/frameworks that you can use to get started.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://docs.fast.ai/vision.image.html#ImageSegment"&gt;FastAI library&lt;/a&gt; - given an image this library is able to create a mask of the objects in the image.&lt;/li&gt;
&lt;li&gt;
&lt;a href="http://www.fexovi.com/sefexa.html"&gt;Sefexa Image Segmentation Tool&lt;/a&gt; - Sefexa is a free tool that can be used for Semi-automatic image segmentation, analysis of images, and creation of ground truth&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/facebookresearch/deepmask"&gt;Deepmask&lt;/a&gt; - Deepmask by Facebook Research is a Torch implementation of DeepMask and SharpMask&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/facebookresearch/multipathnet"&gt;MultiPath&lt;/a&gt; - This is a Torch implementation of the object detection network from "A MultiPath Network for Object Detection".&lt;/li&gt;
&lt;li&gt;OpenCV - This is an open-source computer vision library with over 2500 optimized algorithms.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/frankkramer-lab/MIScnn/wiki"&gt;MIScnn&lt;/a&gt; - This is an open-source library for medical image segmentation. It allows setting up pipelines with state-of-the-art convolutional neural networks and deep learning models in a few lines of code.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.fritz.ai/image-segmentation/"&gt;Fritz&lt;/a&gt; - Fritz offers several computer vision tools including image segmentation tools for mobile devices.&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Final Thoughts
&lt;/h1&gt;

&lt;p&gt;Hopefully, this article has given you some background on image segmentation, along with some tools and frameworks that you can use to get started.&lt;/p&gt;

&lt;p&gt;We’ve covered:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;what image segmentation is,&lt;/li&gt;
&lt;li&gt;a couple of image segmentation architectures,&lt;/li&gt;
&lt;li&gt;some image segmentation losses,&lt;/li&gt;
&lt;li&gt;image segmentation tools and frameworks.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For more information check out the links attached to each of the architectures and frameworks.&lt;/p&gt;

&lt;p&gt;Happy segmenting!&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This article was originally &lt;a href="https://neptune.ai/blog/image-segmentation-in-2020?utm_source=devto&amp;amp;utm_medium=crosspost&amp;amp;utm_campaign=blog-image-segmentation-2020"&gt;posted on neptune.ml/blog&lt;/a&gt; where you can find more in-depth articles for machine learning practitioners.&lt;/em&gt; &lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>python</category>
      <category>datascience</category>
    </item>
    <item>
      <title>Document Classification: 7 pragmatic approaches for small datasets
</title>
      <dc:creator>Jakub Czakon</dc:creator>
      <pubDate>Thu, 30 Apr 2020 07:59:25 +0000</pubDate>
      <link>https://dev.to/jakubczakon/document-classification-7-pragmatic-approaches-for-small-datasets-5ej7</link>
      <guid>https://dev.to/jakubczakon/document-classification-7-pragmatic-approaches-for-small-datasets-5ej7</guid>
      <description>&lt;p&gt;&lt;em&gt;This article was originally posted by Shahul ES on &lt;a href="https://neptune.ai/blog/document-classification-small-datasets?utm_source=devto&amp;amp;utm_medium=crosspost&amp;amp;utm_campaign=blog-document-classification-small-datasets"&gt;neptune.ml/blog&lt;/a&gt; where you can find more in-depth articles for machine learning practitioners.&lt;/em&gt; &lt;/p&gt;




&lt;p&gt;Document or text classification is one of the predominant tasks in natural language processing. It has many applications, including news type classification, spam filtering, and toxic comment identification.&lt;/p&gt;

&lt;p&gt;In big organizations, datasets are large and training deep learning text classification models from scratch is feasible, but &lt;strong&gt;for the majority of real-life problems your dataset is small, and if you want to build a machine learning model you need to be smart.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In this article, I will talk about pragmatic approaches towards text representation which make document classification on small datasets doable.&lt;/p&gt;

&lt;h1&gt;
  
  
  Text Classification 101
&lt;/h1&gt;

&lt;p&gt;The text classification workflow begins by cleaning and preparing a corpus from the dataset. This corpus is then converted into a numeric form using one of several text representation methods, and the result is used for modeling.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--fWayhKwv--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i2.wp.com/neptune.ai/wp-content/uploads/Screenshot-from-2020-02-28-15-56-49.png%3Fw%3D810%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--fWayhKwv--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i2.wp.com/neptune.ai/wp-content/uploads/Screenshot-from-2020-02-28-15-56-49.png%3Fw%3D810%26ssl%3D1" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this article, &lt;strong&gt;we will focus on the “Text Representation”&lt;/strong&gt; step of this pipeline.&lt;/p&gt;

&lt;h1&gt;
  
  
  Example text classification dataset
&lt;/h1&gt;

&lt;p&gt;We will use the data from the &lt;a href="https://www.kaggle.com/c/nlp-getting-started"&gt;Real or Not? NLP with disaster tweets&lt;/a&gt; Kaggle competition. Here, the &lt;strong&gt;task is to predict which tweets are about real disasters and which ones are not.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you want to follow the article step-by-step you may want to install all the libraries that I used for the analysis. &lt;/p&gt;

&lt;p&gt;Let’s take a look at our data,&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;

&lt;span class="n"&gt;tweet&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'../input/nlp-getting-started/train.csv'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;test&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'../input/nlp-getting-started/test.csv'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;tweet&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;head&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--JDJFhHe4--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i2.wp.com/neptune.ai/wp-content/uploads/output1-1.png%3Fw%3D643%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--JDJFhHe4--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i2.wp.com/neptune.ai/wp-content/uploads/output1-1.png%3Fw%3D643%26ssl%3D1" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The data consists of id, keyword, location, text, and target, which is binary. We will only use the tweet text to predict the target.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'There are {} rows and {} columns in train'&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tweet&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="n"&gt;tweet&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]))&lt;/span&gt;
&lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'There are {} rows and {} columns in test'&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;test&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="n"&gt;test&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--X7vkdw5o--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i2.wp.com/neptune.ai/wp-content/uploads/output2-1.png%3Fw%3D954%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--X7vkdw5o--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i2.wp.com/neptune.ai/wp-content/uploads/output2-1.png%3Fw%3D954%26ssl%3D1" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The training dataset has fewer than 8,000 tweets. That, combined with the fact that tweets are 280 characters at most, makes it a tricky, small(ish) dataset.&lt;/p&gt;

&lt;h1&gt;
  
  
  Text data preparation
&lt;/h1&gt;

&lt;p&gt;Before we get into any NLP task, we need to do some data preprocessing and basic cleaning. It is not a focus of this article but if you want to read more about this step check out this &lt;a href="https://towardsdatascience.com/nlp-for-beginners-cleaning-preprocessing-text-data-ae8e306bef0f"&gt;article&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;In short, we will:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Tokenize&lt;/strong&gt;: the process by which sentences are converted to a list of tokens or words.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Remove stopwords&lt;/strong&gt;: drop words like ‘a’ or ‘the’&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lemmatize&lt;/strong&gt;: reduce the inflectional forms of each word into a common base or root (“studies”, “studying” -&amp;gt; “study”).
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;nltk.corpus&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;stopwords&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;nltk.stem&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;WordNetLemmatizer&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;nltk.tokenize&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;word_tokenize&lt;/span&gt;

&lt;span class="c1"&gt;# English stopwords (the NLTK tokenizer, stopwords and WordNet data must be downloaded first)
&lt;/span&gt;&lt;span class="n"&gt;stop&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stopwords&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;words&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'english'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;preprocess_news&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="s"&gt;'''Function to preprocess and create corpus'''&lt;/span&gt;
    &lt;span class="n"&gt;new_corpus&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;

    &lt;span class="n"&gt;lem&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;WordNetLemmatizer&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"text"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="c1"&gt;# tokenize and drop stopwords
&lt;/span&gt;        &lt;span class="n"&gt;words&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;w&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;word_tokenize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;stop&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
        &lt;span class="c1"&gt;# reduce each word to its base form
&lt;/span&gt;        &lt;span class="n"&gt;words&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;lem&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;lemmatize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;w&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;words&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

        &lt;span class="n"&gt;new_corpus&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;words&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;new_corpus&lt;/span&gt;

&lt;span class="n"&gt;corpus&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;preprocess_news&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tweet&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Now, let’s see how to represent this corpus so that we can feed this into any machine learning algorithm.&lt;/p&gt;

&lt;h1&gt;
  
  
  Text Representation
&lt;/h1&gt;

&lt;p&gt;Text cannot be used directly as input to a machine learning model; it needs to be converted into a numeric format first. This is known as text representation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CountVectorizer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;CountVectorizer provides an easy way to vectorize and represent a collection of text documents. It tokenizes the input text, builds a vocabulary of known words, and then represents each document as a vector of word counts over this vocabulary.&lt;/p&gt;

&lt;p&gt;Let’s understand it by using an example,&lt;br&gt;
&lt;/p&gt;

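&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;sklearn.feature_extraction.text&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;CountVectorizer&lt;/span&gt;

&lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"She sells seashells in the seashore"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;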
&lt;span class="c1"&gt;# create the transform
&lt;/span&gt;&lt;span class="n"&gt;vectorizer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;CountVectorizer&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="c1"&gt;# tokenize and build vocab
&lt;/span&gt;&lt;span class="n"&gt;vectorizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# summarize
&lt;/span&gt;&lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vectorizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;vocabulary_&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# encode document
&lt;/span&gt;&lt;span class="n"&gt;vector&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;vectorizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;transform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# summarize encoded vector
&lt;/span&gt;&lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;toarray&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--q6fvZfvB--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i1.wp.com/neptune.ai/wp-content/uploads/output3-1.png%3Fw%3D936%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--q6fvZfvB--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i1.wp.com/neptune.ai/wp-content/uploads/output3-1.png%3Fw%3D936%26ssl%3D1" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can see that CountVectorizer has built a vocabulary out of the given text and then represented the text as a SciPy sparse matrix of word counts. We can try to transform another text using this vocabulary and observe the output to get a better understanding.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;vectorizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;transform&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="s"&gt;"I sell seashells in the seashore"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;toarray&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ilY1keUM--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i0.wp.com/neptune.ai/wp-content/uploads/output4-1.png%3Fw%3D889%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ilY1keUM--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i0.wp.com/neptune.ai/wp-content/uploads/output4-1.png%3Fw%3D889%26ssl%3D1" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can see that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Index positions 3 and 4 are zero, meaning that the vocabulary words at those positions, “sells” and “she”, do not occur in the new text. All other positions are 1, meaning that those vocabulary words do occur in it.&lt;/li&gt;
&lt;li&gt;The new words “I” and “sell” are not in the vocabulary built during fitting, so they are ignored.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now that you understand how CountVectorizer works, we can fit and transform our corpus using it.&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;vec&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;CountVectorizer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_df&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;max_features&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;vec&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tweet&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;vec&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;transform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tweet&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;CountVectorizer has a few important parameters that you should adjust to your problem:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;max_features&lt;/strong&gt;: build a vocabulary that only considers the top n tokens ordered by term frequency across the corpus.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;min_df&lt;/strong&gt;: when building the vocabulary, ignore terms that have a document frequency strictly lower than the given threshold.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;max_df&lt;/strong&gt;: when building the vocabulary, ignore terms that have a document frequency strictly higher than the given threshold.&lt;/li&gt;
&lt;/ul&gt;
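
&lt;p&gt;To make the effect of these thresholds concrete, here is a minimal pure-Python sketch of document-frequency filtering. It is not scikit-learn's implementation: the helper name build_vocabulary, the whitespace tokenizer, and the toy documents are assumptions made for illustration, and only integer thresholds are handled.&lt;/p&gt;

```python
from collections import Counter

def build_vocabulary(docs, min_df=1, max_df=None, max_features=None):
    """Toy vocabulary builder that mimics document-frequency filtering."""
    tokenized = [doc.lower().split() for doc in docs]
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Term frequency: total number of occurrences across the corpus.
    term_freq = Counter(term for tokens in tokenized for term in tokens)
    upper = len(docs) if max_df is None else max_df
    # Keep terms whose document frequency lies between min_df and max_df.
    kept = [t for t in doc_freq if doc_freq[t] in range(min_df, upper + 1)]
    # max_features keeps only the most frequent of the remaining terms.
    kept.sort(key=lambda t: (-term_freq[t], t))
    if max_features is not None:
        kept = kept[:max_features]
    return sorted(kept)

# Hypothetical toy corpus, not taken from the competition data.
docs = ["the sea is calm", "the storm hit the coast", "the coast is flooded"]
print(build_vocabulary(docs, min_df=2))            # ['coast', 'is', 'the']
print(build_vocabulary(docs, min_df=2, max_df=2))  # ['coast', 'is']
```

&lt;p&gt;Raising min_df removes rare terms such as “calm”, while lowering max_df removes terms such as “the” that occur in almost every document.&lt;/p&gt;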

&lt;p&gt;What usually helps with selecting reasonable values (or ranges for hyperparameter optimization methods) is good exploratory data analysis. Check out my other article to read about it.&lt;/p&gt;

&lt;h1&gt;
  
  
  TfidfVectorizer
&lt;/h1&gt;

&lt;p&gt;One issue with CountVectorizer is that common words like “the” will appear many times (unless you remove them at the preprocessing stage) even though they carry little information. One popular alternative is TfidfVectorizer. TF-IDF stands for &lt;strong&gt;Term frequency-inverse document frequency.&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Term Frequency&lt;/strong&gt;: This summarizes how often a given word appears within a document.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Inverse Document Frequency&lt;/strong&gt;: This downscales words that appear a lot across documents.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let’s look at an example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;sklearn.feature_extraction.text&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;TfidfVectorizer&lt;/span&gt;
&lt;span class="c1"&gt;# list of text documents
&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"She sells seashells by the seashore"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s"&gt;"The sea."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s"&gt;"The seashore"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="c1"&gt;# create the transform
&lt;/span&gt;&lt;span class="n"&gt;vectorizer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;TfidfVectorizer&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="c1"&gt;# tokenize and build vocab
&lt;/span&gt;&lt;span class="n"&gt;vectorizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# summarize
&lt;/span&gt;&lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vectorizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;vocabulary_&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vectorizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;idf_&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# encode document
&lt;/span&gt;&lt;span class="n"&gt;vector&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;vectorizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;transform&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]])&lt;/span&gt;
&lt;span class="c1"&gt;# summarize encoded vector
&lt;/span&gt;&lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;toarray&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--KadGgrGJ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i2.wp.com/neptune.ai/wp-content/uploads/output5-2.png%3Fw%3D918%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--KadGgrGJ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i2.wp.com/neptune.ai/wp-content/uploads/output5-2.png%3Fw%3D918%26ssl%3D1" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The vocabulary now consists of 7 words, and the inverse document frequency is calculated for each of them, assigning the lowest score to “the”, which appears in all three documents.&lt;/p&gt;

&lt;p&gt;Each document vector is then normalized (to unit L2 norm by default), so the scores fall between 0 and 1, and this text representation can be used as input to any machine learning model.&lt;/p&gt;
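
&lt;p&gt;The arithmetic behind these scores can be sketched by hand. The snippet below assumes the smoothed formula that scikit-learn uses by default, idf(t) = ln((1 + n) / (1 + df(t))) + 1, followed by L2 normalization. The whitespace tokenizer and the helper name tfidf_vector are simplifications introduced for illustration.&lt;/p&gt;

```python
import math
from collections import Counter

# Same three documents as above, lowercased and without punctuation.
docs = ["she sells seashells by the seashore", "the sea", "the seashore"]
tokenized = [doc.split() for doc in docs]
vocab = sorted({t for tokens in tokenized for t in tokens})
n_docs = len(docs)

# Document frequency: in how many documents each term occurs.
doc_freq = Counter(t for tokens in tokenized for t in set(tokens))
# Smoothed inverse document frequency (assumed default behaviour).
idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1 for t in vocab}

def tfidf_vector(tokens):
    """Raw term counts times idf, scaled to unit L2 norm."""
    counts = Counter(tokens)
    raw = [counts[t] * idf[t] for t in vocab]
    norm = math.sqrt(sum(v * v for v in raw))
    return [v / norm for v in raw] if norm else raw

vector = tfidf_vector(tokenized[0])
for term, weight in zip(vocab, vector):
    print(term, round(weight, 3))
```

&lt;p&gt;Because “the” occurs in every document, its idf is exactly 1, the smallest possible value under this formula, so it receives the lowest weight among the words present in the first document.&lt;/p&gt;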

&lt;h1&gt;
  
  
  Word2vec
&lt;/h1&gt;

&lt;p&gt;The big issue with the above approaches is that the context of a word is lost when representing it. Word embeddings provide a much better representation of words in NLP by encoding some context information. They provide a &lt;strong&gt;mapping from a word to a corresponding n-dimensional vector.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Y6TbE6Uj--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i1.wp.com/neptune.ai/wp-content/uploads/word2vec.png%3Fresize%3D1024%252C608%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Y6TbE6Uj--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i1.wp.com/neptune.ai/wp-content/uploads/word2vec.png%3Fresize%3D1024%252C608%26ssl%3D1" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Word2Vec was developed at Google by Tomas Mikolov, et al. and uses a &lt;strong&gt;shallow neural network&lt;/strong&gt; to learn word embeddings. The vectors are learned by understanding the context in which the word occurs. Specifically, it looks at co-occurring words.&lt;/p&gt;

&lt;p&gt;Given below is the co-occurrence matrix for the sentence “The cat sat on the mat”.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--lGjsFNkY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i0.wp.com/neptune.ai/wp-content/uploads/coocurencematrix.png%3Fw%3D347%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--lGjsFNkY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i0.wp.com/neptune.ai/wp-content/uploads/coocurencematrix.png%3Fw%3D347%26ssl%3D1" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;
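
&lt;p&gt;Co-occurrence counts like these can be computed with a few lines of Python. This is only a sketch: the helper name cooccurrence_counts and the window size of 1 are assumptions, and the matrix in the figure may have been built with a different window.&lt;/p&gt;

```python
from collections import Counter

def cooccurrence_counts(tokens, window=1):
    """Count how often each ordered word pair appears within `window` positions."""
    counts = Counter()
    for i, word in enumerate(tokens):
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j != i:
                counts[(word, tokens[j])] += 1
    return counts

tokens = "the cat sat on the mat".split()
counts = cooccurrence_counts(tokens, window=1)
print(counts[("the", "cat")])  # 1: "cat" directly follows the first "the"
print(counts[("cat", "mat")])  # 0: the two words are never adjacent
```

&lt;p&gt;Each row of the resulting matrix describes a word by its neighbours, which is the signal word2vec learns from.&lt;/p&gt;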

&lt;p&gt;Word2vec is composed of two different models:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Continuous Bag of Words&lt;/strong&gt; (CBOW) model can be thought of as learning word embeddings by training a model to &lt;strong&gt;predict a word given its context&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Skip-Gram&lt;/strong&gt; model is the opposite, learning word embeddings by training a model to &lt;strong&gt;predict context given a word&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The basic idea of word embeddings is that words that occur in similar contexts tend to be closer to each other in vector space. Let’s check how to implement word2vec in Python.&lt;/p&gt;
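
&lt;p&gt;Before reaching for a library, it may help to see what the two models are trained on. The sketch below only generates the training examples and does not train anything; the helper name training_pairs and the window size are assumptions made for illustration.&lt;/p&gt;

```python
def training_pairs(tokens, window=1):
    """Return (cbow, skipgram) training examples for a list of tokens."""
    cbow, skipgram = [], []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        # CBOW: the surrounding words are the input, the centre word is the label.
        cbow.append((context, target))
        # Skip-gram: the centre word is the input, each context word is a label.
        skipgram.extend((target, c) for c in context)
    return cbow, skipgram

tokens = "the cat sat on the mat".split()
cbow, skipgram = training_pairs(tokens, window=1)
print(cbow[2])        # (['cat', 'on'], 'sat')
print(skipgram[:3])   # [('the', 'cat'), ('cat', 'the'), ('cat', 'sat')]
```

&lt;p&gt;CBOW produces one example per word, while Skip-Gram produces one example per (word, context word) pair, which is part of why Skip-Gram tends to be slower to train.&lt;/p&gt;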

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;gensim&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;gensim.models&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Word2Vec&lt;/span&gt;

&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;gensim&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Word2Vec&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;corpus&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
                               &lt;span class="n"&gt;min_count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;size&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;window&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Now that you have created your word2vec model, here are some of the important parameters that you can change to observe the differences:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;size&lt;/strong&gt;: the dimensionality of the resulting embedding vector for each word (renamed vector_size in gensim 4.0).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;min_count&lt;/strong&gt;: when building the vocabulary, ignore words whose total frequency in the corpus is lower than the given threshold.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;window&lt;/strong&gt;: the maximum distance between the current word and the surrounding words considered when building the representation. Also known as the window size.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In this article, we focus on pragmatic approaches for small datasets, so we will use pre-trained word vectors instead of training vectors on our own corpus. With little training data, this usually yields better performance.&lt;/p&gt;

&lt;p&gt;First, you will have to download the trained vectors from here. Then you can load the vectors using gensim.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;gensim.models&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;KeyedVectors&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;load_word2vec&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;word2vecDict&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;KeyedVectors&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;load_word2vec_format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="s"&gt;'../input/word2vec-google/GoogleNews-vectors-negative300.bin'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;binary&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;unicode_errors&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'ignore'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;embeddings_index&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="c1"&gt;# gensim 3.x API; in gensim 4+ iterate over word2vecDict.key_to_index instead
&lt;/span&gt;    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;word&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;word2vecDict&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;vocab&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;embeddings_index&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;word&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;word2vecDict&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;word_vec&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;word&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;embeddings_index&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Let’s check the embedding,&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;w2v_model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;load_word2vec&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;w2v_model&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'London'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s---INpolFq--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i2.wp.com/neptune.ai/wp-content/uploads/output6-1.png%3Fw%3D763%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s---INpolFq--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i2.wp.com/neptune.ai/wp-content/uploads/output6-1.png%3Fw%3D763%26ssl%3D1" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can see that the word is represented using a 300-dimensional vector. So every word in your corpus can be represented like this and this embedding matrix is used to train your model.&lt;/p&gt;

&lt;h1&gt;
  
  
  FastText
&lt;/h1&gt;

&lt;p&gt;Now, let’s learn about fastText, which is available as a module in &lt;a href="https://radimrehurek.com/gensim/"&gt;gensim&lt;/a&gt;. FastText was developed by Facebook and delivers strong performance and speed in text classification tasks.&lt;/p&gt;

&lt;p&gt;It supports both Continuous Bag of Words and Skip-Gram models. The main difference between previous models and &lt;strong&gt;FastText is that it breaks each word into several character n-grams&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Let’s take the word orange for example.&lt;/p&gt;

&lt;p&gt;The trigrams of the word orange are ora, ran, ang, and nge (ignoring the starting and ending boundaries of the word).&lt;/p&gt;

&lt;p&gt;The word &lt;strong&gt;embedding vector (text representation) for orange will be the sum of the vectors of these n-grams&lt;/strong&gt;. Rare words or typos can now be properly represented, since it is highly likely that some of their n-grams also appear in other words.&lt;/p&gt;

&lt;p&gt;For example, for a word like stupedofantabulouslyfantastic, which might never have appeared in any corpus, gensim might return one of two things: a zero vector or a random vector with low magnitude.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;FastText&lt;/strong&gt;, however, &lt;strong&gt;can produce better vectors by breaking the word&lt;/strong&gt; into chunks and using the vectors for those chunks to create a final vector for the word. In this particular case, the final vector might be closer to the vectors of fantastic and fantabulous.&lt;/p&gt;
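
&lt;p&gt;To make the n-gram idea concrete, here is a minimal sketch in plain Python. It is not how fastText is implemented internally: real fastText adds word boundary markers, uses several n-gram lengths, and hashes n-grams into buckets. The helper names char_ngrams and ngram_vector and the toy two-dimensional vectors are assumptions made up for illustration.&lt;/p&gt;

```python
def char_ngrams(word, n=3):
    """Return the character n-grams of a word, ignoring boundary markers."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def ngram_vector(word, ngram_vectors, dim, n=3):
    """Sum the vectors of the word's known n-grams (all zeros if none are known)."""
    total = [0.0] * dim
    for gram in char_ngrams(word, n):
        vec = ngram_vectors.get(gram)
        if vec is not None:
            total = [t + v for t, v in zip(total, vec)]
    return total


# Toy, made-up 2-dimensional n-gram vectors (hypothetical values)
toy_vectors = {
    "ora": [1.0, 0.0],
    "ran": [0.0, 1.0],
    "ang": [1.0, 1.0],
}

print(char_ngrams("orange"))                        # ['ora', 'ran', 'ang', 'nge']
print(ngram_vector("orange", toy_vectors, dim=2))   # [2.0, 2.0]
```

&lt;p&gt;The trigram nge has no entry in the toy table, so it simply contributes nothing. This is the same mechanism that lets an unseen word still receive a useful vector as long as some of its n-grams are known.&lt;/p&gt;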

&lt;p&gt;Again, we will use a pre-trained model rather than training our own word embeddings.&lt;/p&gt;

&lt;p&gt;For this, you can download pre-trained vectors from here.&lt;/p&gt;

&lt;p&gt;Each line of this file contains a word and its corresponding n-dimensional vector. We will create a dictionary from this file that maps each word to its vector representation.&lt;br&gt;
&lt;/p&gt;

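&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;tqdm&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;tqdm&lt;/span&gt;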

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;load_fasttext&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;

    &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'loading word embeddings...'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;embeddings_index&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
    &lt;span class="n"&gt;f&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'../input/fasttext/wiki.simple.vec'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;encoding&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'utf-8'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;tqdm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;values&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="n"&gt;rsplit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;' '&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;word&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;values&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;coefs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;asarray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:],&lt;/span&gt; &lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'float32'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;embeddings_index&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;word&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;coefs&lt;/span&gt;
    &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'found %s word vectors'&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embeddings_index&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;embeddings_index&lt;/span&gt;

&lt;span class="n"&gt;embeddings_index&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;load_fasttext&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Tq4cnkNz--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i0.wp.com/neptune.ai/wp-content/uploads/output7-1.png%3Fw%3D512%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Tq4cnkNz--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i0.wp.com/neptune.ai/wp-content/uploads/output7-1.png%3Fw%3D512%26ssl%3D1" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let’s check the embedding for a word,&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;embeddings_index&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'london'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s---INpolFq--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i2.wp.com/neptune.ai/wp-content/uploads/output6-1.png%3Fw%3D763%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s---INpolFq--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i2.wp.com/neptune.ai/wp-content/uploads/output6-1.png%3Fw%3D763%26ssl%3D1" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  GloVe (Global Vectors for Word Representation)
&lt;/h1&gt;

&lt;p&gt;GloVe stands for global vectors for word representation. It is an unsupervised learning algorithm developed by Stanford. The basic idea of GloVe is to derive a semantic relationship between words using a co-occurrence matrix. The idea is very similar to word2vec but there are slight differences. Go here to read more.&lt;/p&gt;

&lt;p&gt;For this, we will use pre-trained GloVe vectors, which are trained on large corpora. These tend to perform better than vectors trained from scratch, especially on small datasets. You can download them from here.&lt;/p&gt;

&lt;p&gt;After downloading we can load our pre-trained word model. Before that, you should understand the format in which it is made available. Each line contains a word and its corresponding n-dimensional vector representation. Like this,&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--pJ5Qk6T5--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i1.wp.com/neptune.ai/wp-content/uploads/glove_vectors.png%3Fw%3D684%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--pJ5Qk6T5--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i1.wp.com/neptune.ai/wp-content/uploads/glove_vectors.png%3Fw%3D684%26ssl%3D1" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So, to use this you should first prepare a dictionary that contains the mapping between word and corresponding vector. This can be called an embedding dictionary.&lt;/p&gt;

&lt;p&gt;Let’s create one for our purpose.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;load_glove&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;embedding_dict&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
    &lt;span class="n"&gt;path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;'../input/glove-global-vectors-for-word-representation/glove.6B.100d.txt'&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nb"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'r'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;encoding&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'utf-8'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;values&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;split&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="n"&gt;word&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;values&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="n"&gt;vectors&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;asarray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:],&lt;/span&gt; &lt;span class="s"&gt;'float32'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;embedding_dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;word&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;vectors&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;embedding_dict&lt;/span&gt;


&lt;span class="n"&gt;embeddings_index&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;load_glove&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Now we have a dictionary that maps every word in the pre-trained GloVe vocabulary to its corresponding vector. Let’s check the embedding for a word.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;embeddings_index&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'london'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--yb9cgCWi--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i1.wp.com/neptune.ai/wp-content/uploads/glove_vector_shape.png%3Fw%3D760%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--yb9cgCWi--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i1.wp.com/neptune.ai/wp-content/uploads/glove_vector_shape.png%3Fw%3D760%26ssl%3D1" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Universal Sentence Encoding
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--1C2pLVar--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i0.wp.com/neptune.ai/wp-content/uploads/universal_sentence_encoding.png%3Fw%3D512%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--1C2pLVar--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i0.wp.com/neptune.ai/wp-content/uploads/universal_sentence_encoding.png%3Fw%3D512%26ssl%3D1" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Until now we have been dealing with word representations, and these techniques are most useful for word-level operations. Sometimes we need sentence-level operations instead. Encoders that map a whole sentence to a single vector are called sentence encoders.&lt;/p&gt;

&lt;p&gt;A good sentence encoder is expected to encode sentences in such a way that the vectors of &lt;strong&gt;similar sentences have a minimal distance between them in the vector space.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For example,&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It is sunny today&lt;/li&gt;
&lt;li&gt;It is rainy today&lt;/li&gt;
&lt;li&gt;It is cloudy today&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These sentences will be encoded and represented so that they are close to each other in the vector space.&lt;/p&gt;
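
&lt;p&gt;One common way to measure how close two sentence vectors are is cosine similarity. The sketch below is only an illustration: the three-dimensional vectors are made-up stand-ins, not real encoder output, and the cosine_similarity helper is an assumption rather than something taken from the encoder library.&lt;/p&gt;

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Made-up 3-dimensional "sentence vectors" (hypothetical values)
sunny = [0.9, 0.1, 0.2]       # stands in for "It is sunny today"
rainy = [0.8, 0.2, 0.3]       # stands in for "It is rainy today"
unrelated = [-0.1, 0.9, -0.4] # stands in for a sentence on another topic

print(cosine_similarity(sunny, rainy))
print(cosine_similarity(sunny, unrelated))
```

&lt;p&gt;With a good sentence encoder, the first score (two weather sentences) should come out noticeably higher than the second.&lt;/p&gt;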

&lt;p&gt;Let’s move on and check how to implement universal sentence encoder and find similar sentences using it.&lt;/p&gt;

&lt;p&gt;You can download the pre-trained model from here.&lt;/p&gt;

&lt;p&gt;We will load the module using TensorFlow Hub.&lt;br&gt;
&lt;/p&gt;

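&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;tensorflow_hub&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;hub&lt;/span&gt;

&lt;span class="n"&gt;module_url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"../input/universalsentenceencoderlarge4"&lt;/span&gt;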
&lt;span class="c1"&gt;# Import the Universal Sentence Encoder's TF Hub module
&lt;/span&gt;&lt;span class="n"&gt;embed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;hub&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;module_url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Next, we will create the embedding for each sentence in our list.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;sentence_list&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;question_text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tolist&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;sentence_emb&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;embed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sentence_list&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="s"&gt;'outputs'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;numpy&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Here is an article to &lt;a href="https://www.dlology.com/blog/keras-meets-universal-sentence-encoder-transfer-learning-for-text-data/"&gt;read more about universal sentence encoder.&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  ELMo, BERT, and others
&lt;/h1&gt;

&lt;p&gt;When using any of the above embedding methods one thing we forget about is the context in which the word was used. This is &lt;strong&gt;one of the main drawbacks of such word representation models.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For example, the word “stick” will be represented by the same vector regardless of the context in which it is used, which doesn’t make much sense. With recent developments in NLP and models like BERT (Bidirectional Encoder Representations from Transformers), context-dependent word representations have become possible. &lt;a href="http://jalammar.github.io/illustrated-bert/"&gt;Here is an article to read more.&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Text Classification
&lt;/h1&gt;

&lt;p&gt;In this section, we will prepare the embedding matrix which is passed to the Keras Embedding layer to learn text representations. You can use the same steps to prepare the corpus for any word-level embedding methods.&lt;/p&gt;

&lt;p&gt;Let’s create a word index with the &lt;strong&gt;Keras Tokenizer&lt;/strong&gt;, fix a maximum sentence length, and pad each sentence in our corpus to that length using pad_sequences.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;MAX_LEN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;
&lt;span class="n"&gt;tokenizer_obj&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;Tokenizer&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;tokenizer_obj&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fit_on_texts&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;corpus&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;sequences&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tokenizer_obj&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;texts_to_sequences&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;corpus&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;tweet_pad&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;pad_sequences&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sequences&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="n"&gt;maxlen&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;MAX_LEN&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="n"&gt;truncating&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'post'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="n"&gt;padding&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'post'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
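
&lt;p&gt;To see what truncating='post' and padding='post' do to a single sequence, here is a plain-Python sketch. The pad_post helper is hypothetical and is not part of Keras; it only mimics the effect of those two settings on one list of token ids.&lt;/p&gt;

```python
def pad_post(sequence, maxlen, value=0):
    """Cut extra items off the end, then pad at the end, to exactly maxlen items."""
    trimmed = sequence[:maxlen]
    return trimmed + [value] * (maxlen - len(trimmed))


print(pad_post([4, 8, 15], maxlen=5))              # [4, 8, 15, 0, 0]
print(pad_post([4, 8, 15, 16, 23, 42], maxlen=5))  # [4, 8, 15, 16, 23]
```

&lt;p&gt;Short sequences get zeros appended, and long ones lose their trailing tokens, so every row ends up with the same length.&lt;/p&gt;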



&lt;p&gt;Let’s check the number of unique words in our corpus,&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;word_index&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tokenizer_obj&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;word_index&lt;/span&gt;
&lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'Number of unique words:'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;word_index&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Using this word index dictionary and embedding dictionary you can create an &lt;strong&gt;embedding matrix&lt;/strong&gt; for our corpus. This embedding matrix is passed on to the &lt;strong&gt;embedding layer&lt;/strong&gt; of the neural network to learn word representations.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;prepare_matrix&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding_dict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;emb_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Keras word indices start at 1, so add one row; row 0 stays zero for padding&lt;/span&gt;
    &lt;span class="n"&gt;num_words&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;word_index&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
    &lt;span class="n"&gt;embedding_matrix&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;zeros&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;num_words&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;emb_size&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;word&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;tqdm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;word_index&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;()):&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;num_words&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;continue&lt;/span&gt;

        &lt;span class="c1"&gt;# Words missing from the pre-trained vectors keep an all-zero row&lt;/span&gt;
        &lt;span class="n"&gt;emb_vec&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;embedding_dict&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;word&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;emb_vec&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;embedding_matrix&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;emb_vec&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;embedding_matrix&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
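
&lt;p&gt;A toy version of this lookup, written with plain lists, may help show what ends up in each row. It assumes Keras-style word indices that start at 1, so row 0 is left for padding. The build_matrix helper, the tiny vocabulary, and the two-dimensional vectors are all made up for illustration.&lt;/p&gt;

```python
def build_matrix(word_index, embedding_dict, emb_size):
    # One extra row because indices start at 1; row 0 is reserved for padding
    matrix = [[0.0] * emb_size for _ in range(len(word_index) + 1)]
    for word, i in word_index.items():
        vec = embedding_dict.get(word)
        if vec is not None:
            # Copy the pre-trained vector into the row for this word's index
            matrix[i] = list(vec)
    return matrix


# Hypothetical vocabulary and pre-trained vectors
toy_word_index = {"london": 1, "rain": 2, "zzxq": 3}
toy_embeddings = {"london": [0.1, 0.2], "rain": [0.3, 0.4]}

matrix = build_matrix(toy_word_index, toy_embeddings, emb_size=2)
for row in matrix:
    print(row)
# [0.0, 0.0]   padding row
# [0.1, 0.2]   london
# [0.3, 0.4]   rain
# [0.0, 0.0]   zzxq has no pre-trained vector
```

&lt;p&gt;Row i of the matrix holds the vector for the word whose index is i, which is exactly the layout an embedding layer expects when it looks up token ids.&lt;/p&gt;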



&lt;p&gt;We can now define our neural network and pass this embedding matrix to its Embedding layer as the initial weights, setting &lt;strong&gt;trainable=False&lt;/strong&gt; to prevent the weights from being updated during training.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;new_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding_matrix&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;inp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Input&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;MAX_LEN&lt;/span&gt;&lt;span class="p"&gt;,))&lt;/span&gt;

    &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Embedding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding_matrix&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;embedding_matrix&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;weights&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;embedding_matrix&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                  &lt;span class="n"&gt;trainable&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)(&lt;/span&gt;&lt;span class="n"&gt;inp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Bidirectional&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;LSTM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;return_sequences&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'lstm_layer'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
             &lt;span class="n"&gt;dropout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;recurrent_dropout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;))(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;GlobalAveragePooling1D&lt;/span&gt;&lt;span class="p"&gt;()(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Dense&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;activation&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"sigmoid"&lt;/span&gt;&lt;span class="p"&gt;)(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;inp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;outputs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;compile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;loss&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'binary_crossentropy'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                  &lt;span class="n"&gt;optimizer&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'adam'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                  &lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'accuracy'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;For example, to run the model using word2vec embeddings,&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;embeddings_index&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;load_word2vec&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;embedding_matrix&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;prepare_matrix&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embeddings_index&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;new_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding_matrix&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                  &lt;span class="n"&gt;batch_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                  &lt;span class="n"&gt;epochs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                  &lt;span class="n"&gt;validation_data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;y_test&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                  &lt;span class="n"&gt;verbose&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--x31GkbZ0--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i2.wp.com/neptune.ai/wp-content/uploads/output8-1.png%3Fw%3D816%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--x31GkbZ0--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i2.wp.com/neptune.ai/wp-content/uploads/output8-1.png%3Fw%3D816%26ssl%3D1" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can call your desired type of embeddings and follow the same steps to implement any of them.&lt;/p&gt;

&lt;h1&gt;
  
  
  Comparison
&lt;/h1&gt;

&lt;p&gt;So which text classification method worked best in our example problem?&lt;/p&gt;

&lt;p&gt;You can use Neptune to compare the performance of our model using different embeddings by simply setting up an experiment.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--EdWCJ7xA--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i2.wp.com/neptune.ai/wp-content/uploads/Screenshot-from-2020-02-28-17-09-51.png%3Fresize%3D1024%252C433%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--EdWCJ7xA--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i2.wp.com/neptune.ai/wp-content/uploads/Screenshot-from-2020-02-28-17-09-51.png%3Fresize%3D1024%252C433%26ssl%3D1" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;GloVe embeddings performed slightly better on the test set than the other two embeddings. You may be able to get better results by cleaning the data more thoroughly and tuning the model.&lt;/p&gt;

&lt;p&gt;You can &lt;a href="https://ui.neptune.ai/shared/blog-text-classification/experiments?viewId=a68c3b2a-65c0-412d-94ca-c0c749e0a84f&amp;amp;utm_source=devto&amp;amp;utm_medium=crosspost&amp;amp;utm_campaign=blog-document-classification-small-datasets"&gt;explore experiments&lt;/a&gt; here if you want to.&lt;/p&gt;

&lt;h1&gt;
  
  
  Final Thoughts
&lt;/h1&gt;

&lt;p&gt;In this article, we discussed and implemented different feature representation methods for text classification that you can use for smaller datasets.&lt;/p&gt;

&lt;p&gt;Hopefully, you will find them useful in your projects.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This article was originally &lt;a href="https://neptune.ai/blog/document-classification-small-datasets?utm_source=devto&amp;amp;utm_medium=crosspost&amp;amp;utm_campaign=blog-document-classification-small-datasets"&gt;posted on neptune.ml/blog&lt;/a&gt; where you can find more in-depth articles for machine learning practitioners.&lt;/em&gt; &lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>datascience</category>
      <category>python</category>
    </item>
    <item>
      <title>8 Creators and Core Contributors Talk About Their Model Training Libraries From PyTorch Ecosystem</title>
      <dc:creator>Jakub Czakon</dc:creator>
      <pubDate>Tue, 21 Apr 2020 10:41:57 +0000</pubDate>
      <link>https://dev.to/jakubczakon/8-creators-and-core-contributors-talk-about-their-model-training-libraries-from-pytorch-ecosystem-5fc5</link>
      <guid>https://dev.to/jakubczakon/8-creators-and-core-contributors-talk-about-their-model-training-libraries-from-pytorch-ecosystem-5fc5</guid>
      <description>&lt;p&gt;I started using Pytorch to train my models back in early 2018 with 0.3.1 release. I got hooked by the Pythonic feel, ease of use and flexibility.&lt;/p&gt;

&lt;p&gt;It was just so much easier to do things in PyTorch than in TensorFlow or Theano.&lt;br&gt;
But &lt;strong&gt;something I missed was a Keras-like high-level interface to PyTorch&lt;/strong&gt;, and there was not much out there back then.&lt;/p&gt;

&lt;p&gt;Fast-forward to 2020, and &lt;strong&gt;we have 6 high-level training APIs in the &lt;a href="https://pytorch.org/ecosystem/"&gt;PyTorch Ecosystem&lt;/a&gt;&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Skorch&lt;/li&gt;
&lt;li&gt;Catalyst&lt;/li&gt;
&lt;li&gt;Fastai&lt;/li&gt;
&lt;li&gt;PyTorch Ignite&lt;/li&gt;
&lt;li&gt;PyTorch Lightning&lt;/li&gt;
&lt;li&gt;TorchBearer&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But which one should you choose?&lt;br&gt;
What are the pros and cons of using each one?&lt;/p&gt;

&lt;p&gt;I thought: who can &lt;strong&gt;explain the differences between those libraries&lt;/strong&gt; better than the authors themselves?&lt;br&gt;
I picked up my proverbial phone and asked them to write an article with me. They all agreed and this is how this post was created!&lt;/p&gt;

&lt;p&gt;So I asked the authors to talk about the following aspects of their libraries:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Philosophy of the project&lt;/li&gt;
&lt;li&gt;API structure&lt;/li&gt;
&lt;li&gt;The learning curve for new users&lt;/li&gt;
&lt;li&gt;Built-in features (what you get out-of-the-box)&lt;/li&gt;
&lt;li&gt;Extension capabilities (simplicity of integration in research)&lt;/li&gt;
&lt;li&gt;Reproducibility&lt;/li&gt;
&lt;li&gt;Distributed training&lt;/li&gt;
&lt;li&gt;Productionalization&lt;/li&gt;
&lt;li&gt;Popularity&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;… and they really did answer thoroughly 🙂&lt;/p&gt;


&lt;h1&gt;
  
  
  &lt;a href="https://github.com/skorch-dev/skorch"&gt;Skorch&lt;/a&gt;
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--88fDNi-z--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/600/1%2AicnImNpQhb6DUfhPAZEQ9g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--88fDNi-z--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/600/1%2AicnImNpQhb6DUfhPAZEQ9g.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;philosophy&lt;/strong&gt; behind skorch development can be summarized as follows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;follow the sklearn API&lt;/li&gt;
&lt;li&gt;don’t hide PyTorch&lt;/li&gt;
&lt;li&gt;don’t reinvent the wheel&lt;/li&gt;
&lt;li&gt;be hackable&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These principles laid out the design space within which we operate. Regarding the &lt;strong&gt;scikit-learn API&lt;/strong&gt;, it presents itself, most obviously, in how you &lt;strong&gt;train and predict&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;skorch&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;NeuralNetClassifier&lt;/span&gt;

&lt;span class="n"&gt;net&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;NeuralNetClassifier&lt;/span&gt;&lt;span class="p"&gt;(...)&lt;/span&gt;
&lt;span class="n"&gt;net&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;net&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Because skorch is using this &lt;strong&gt;simple and well-established API&lt;/strong&gt; everyone should be able to start using it very quickly.&lt;/p&gt;

&lt;p&gt;But the &lt;strong&gt;sklearn integration goes deeper&lt;/strong&gt; than calling “fit” and “predict”. You can seamlessly integrate your skorch model within sklearn &lt;code&gt;Pipeline&lt;/code&gt;s, use sklearn’s numerous metrics (no need to re-implement F1, R², etc.), and use it with GridSearchCV.&lt;/p&gt;

&lt;p&gt;When it comes to &lt;strong&gt;parameter sweeps&lt;/strong&gt;: you can use any other hyperparameter search strategy as long as there is a sklearn-compatible implementation.&lt;/p&gt;

&lt;p&gt;We are especially proud that &lt;strong&gt;you can search on almost any hyper-parameter without additional work&lt;/strong&gt;. For example, if your module has an initialization parameter called num_units, you can grid search that parameter right away.&lt;/p&gt;

&lt;p&gt;Here is a list of things you can grid search out-of-the-box:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;any parameter on your Module (number of units and layers, nonlinearity, dropout rate, …)&lt;/li&gt;
&lt;li&gt;optimizer (learning rate, momentum…)&lt;/li&gt;
&lt;li&gt;criterion&lt;/li&gt;
&lt;li&gt;DataLoader (batch size, shuffling, …)&lt;/li&gt;
&lt;li&gt;callbacks (any parameter, even on your custom callbacks)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is what it looks like in code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;sklearn.model_selection&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;GridSearchCV&lt;/span&gt;

&lt;span class="n"&gt;params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="s"&gt;'lr'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;0.01&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.02&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="s"&gt;'max_epochs'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="s"&gt;'module__num_units'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="s"&gt;'optimizer__momentum'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;0.6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.9&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.95&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="s"&gt;'iterator_train__shuffle'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="s"&gt;'callbacks__mycallback__someparam'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;net&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;NeuralNetClassifier&lt;/span&gt;&lt;span class="p"&gt;(...)&lt;/span&gt;
&lt;span class="n"&gt;gs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;GridSearchCV&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;net&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cv&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;scoring&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'accuracy'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;gs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;gs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;best_score_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;gs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;best_params_&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;As far as I’m aware, no other framework provides this flexibility. On top of that, by using the dask parallel backend, you can &lt;strong&gt;&lt;a href="https://skorch.readthedocs.io/en/stable/user/parallelism.html"&gt;distribute the hyper-parameter search&lt;/a&gt;&lt;/strong&gt; across your cluster without too much hassle.&lt;/p&gt;
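&lt;p&gt;The double-underscore names in the grid above (such as &lt;code&gt;module__num_units&lt;/code&gt; or &lt;code&gt;optimizer__momentum&lt;/code&gt;) follow the sklearn convention for addressing parameters of nested components. The snippet below is only a simplified, standard-library sketch of that routing idea, not skorch's actual implementation; &lt;code&gt;route_params&lt;/code&gt; and the &lt;code&gt;net&lt;/code&gt; bucket are hypothetical names made up for illustration.&lt;/p&gt;

```python
from collections import defaultdict


def route_params(params):
    """Split flat 'component__name' keys into per-component dicts.

    Keys without a double underscore are kept under the 'net' entry.
    Hypothetical helper for illustration only.
    """
    routed = defaultdict(dict)
    for key, value in params.items():
        prefix, sep, rest = key.partition("__")
        if sep:
            # 'module__num_units' becomes routed['module']['num_units']
            routed[prefix][rest] = value
        else:
            # plain names such as 'lr' belong to the net itself
            routed["net"][key] = value
    return dict(routed)


flat = {
    "lr": 0.01,
    "max_epochs": 10,
    "module__num_units": 20,
    "optimizer__momentum": 0.9,
    "callbacks__mycallback__someparam": 3,
}
routed = route_params(flat)
print(routed["module"])     # {'num_units': 20}
print(routed["callbacks"])  # {'mycallback__someparam': 3}
```

&lt;p&gt;Note that only the first double underscore is split here, so the callback entry keeps its own nested name and could be routed one level further in the same way.&lt;/p&gt;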

&lt;p&gt;Using the mature sklearn API, skorch users can &lt;strong&gt;avoid the boilerplate code&lt;/strong&gt; that is typically seen when writing train loops, validation loops, and hyper-parameter search in pure PyTorch.&lt;/p&gt;

&lt;p&gt;From the PyTorch side, we decided not to hide the backend behind an abstraction layer, as is the case in Keras, for example. Instead, &lt;strong&gt;we expose numerous components known from PyTorch&lt;/strong&gt;. As a user, you can use PyTorch’s Dataset (think torchvision, including TTA), DataLoader, and learning rate schedulers. Most importantly, you can use PyTorch Modules with almost no restrictions.&lt;/p&gt;

&lt;p&gt;We thus made a conscious effort to &lt;strong&gt;re-use as many existing features from sklearn and PyTorch as possible&lt;/strong&gt; instead of re-inventing the wheel. This makes skorch &lt;strong&gt;easy to use on top of your existing codebase&lt;/strong&gt; or to remove it after your initial experimentation phase without any lock-in effect.&lt;/p&gt;

&lt;p&gt;For instance, you can replace the neural net with any sklearn model or you can extract the PyTorch module and use it without skorch.&lt;/p&gt;

&lt;p&gt;On top of re-using existing features, we added some of our own. Most notably, skorch &lt;strong&gt;works with many common data types&lt;/strong&gt; out-of-the-box. On top of Datasets, you can use:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;numpy arrays,&lt;/li&gt;
&lt;li&gt;torch tensors,&lt;/li&gt;
&lt;li&gt;pandas DataFrames,&lt;/li&gt;
&lt;li&gt;Python dictionaries holding heterogeneous data,&lt;/li&gt;
&lt;li&gt;external/custom datasets like &lt;a href="https://nbviewer.jupyter.org/github/skorch-dev/skorch/blob/master/notebooks/Transfer_Learning.ipynb"&gt;ImageFolder from torchvision&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We’ve put extra effort into making these work well with sklearn.&lt;/p&gt;

&lt;p&gt;Additionally, we implemented a simple yet &lt;strong&gt;powerful callback system&lt;/strong&gt;, which you can use to &lt;strong&gt;adapt most of skorch’s behavior to your liking&lt;/strong&gt;. Some of the callbacks that we provide are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;learning rate schedulers,&lt;/li&gt;
&lt;li&gt;scoring functions (using custom or sklearn metrics),&lt;/li&gt;
&lt;li&gt;early stopping,&lt;/li&gt;
&lt;li&gt;checkpointing,&lt;/li&gt;
&lt;li&gt;parameter freezing,&lt;/li&gt;
&lt;li&gt;and TensorBoard and Neptune integration.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If this is not enough to satisfy your customization needs, &lt;strong&gt;we took pains to facilitate implementing your own callbacks or your own model trainers&lt;/strong&gt;. Our documentation contains examples of how to implement &lt;a href="https://skorch.readthedocs.io/en/stable/user/callbacks.html#callback-base-class"&gt;custom callbacks&lt;/a&gt; and custom trainers, modifying every possible behavior right down to the training step.&lt;/p&gt;

&lt;p&gt;The philosophy of not re-inventing the wheel should make skorch easy to learn for anyone who is familiar with sklearn and PyTorch. And since we designed skorch around customization and flexibility, it shouldn’t be too hard to master. To learn more about skorch check out these &lt;a href="https://github.com/skorch-dev/skorch/tree/master/examples"&gt;examples&lt;/a&gt; and &lt;a href="https://skorch.readthedocs.io/en/stable/user/callbacks.html#callback-base-class"&gt;notebooks&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Skorch is &lt;strong&gt;geared towards, and used in, production&lt;/strong&gt;. We addressed some common issues regarding productionalization, specifically:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;we make sure to &lt;strong&gt;be backward compatible&lt;/strong&gt; and to give a sufficiently long deprecation period where necessary,&lt;/li&gt;
&lt;li&gt;you can &lt;strong&gt;train on GPU and serve on CPU&lt;/strong&gt;,&lt;/li&gt;
&lt;li&gt;you can &lt;strong&gt;pickle a whole sklearn&lt;/strong&gt; Pipeline containing the skorch model for later re-use,&lt;/li&gt;
&lt;li&gt;we provide a helper function to &lt;strong&gt;&lt;a href="https://github.com/skorch-dev/skorch/tree/master/examples/cli"&gt;turn your training code into a command line script&lt;/a&gt;&lt;/strong&gt; that exposes all your model parameters, including their documentation, as command line arguments, with just three lines of extra code.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That being said, I have implemented, or know people who have implemented, more &lt;strong&gt;research&lt;/strong&gt;-y stuff, like &lt;strong&gt;GANs&lt;/strong&gt; and numerous types of &lt;strong&gt;semi-supervised learning&lt;/strong&gt; techniques. This does require more profound knowledge of skorch, though, so you might have to dig deeper into the docs or ask us for pointers on GitHub.&lt;/p&gt;

&lt;p&gt;I personally haven’t come across anyone using skorch with reinforcement learning, but I would like to hear what experience people had with that.&lt;/p&gt;

&lt;p&gt;Since our initial release of skorch in the summer of 2017, the project has matured a lot and an &lt;strong&gt;active community has grown&lt;/strong&gt; around it. In a typical week, a handful of issues are opened on GitHub or a question is asked on Stack Overflow. We answer most questions within a day, and if there is a good feature request or bug report, we try to guide the reporter towards implementing it themselves.&lt;/p&gt;

&lt;p&gt;This way, &lt;strong&gt;we have had more than 20 contributors over the project’s lifetime, with 3 of them being regulars&lt;/strong&gt;, which means the project’s health is not dependent on a single person.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The big difference between skorch and some other higher-level frameworks&lt;/strong&gt;, say fastai, is that skorch doesn’t come “batteries-included”. That means it’s up to the user to implement their own modules or to use the modules of one of the many existing collections (say, torchvision). Skorch provides the skeleton, but you have to bring the meat.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When not to use Skorch&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;you have super custom PyTorch code, possibly reinforcement learning&lt;/li&gt;
&lt;li&gt;you need backend-agnostic code (switching between PyTorch, TensorFlow, …)&lt;/li&gt;
&lt;li&gt;you have no need at all for the sklearn API&lt;/li&gt;
&lt;li&gt;you must avoid even a very slight performance overhead&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;When to use skorch&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;gain sklearn API and all associated benefits like hyper-parameter search&lt;/li&gt;
&lt;li&gt;most PyTorch workflows just work&lt;/li&gt;
&lt;li&gt;avoid boilerplate, standardize code&lt;/li&gt;
&lt;li&gt;use some of the many utilities discussed above&lt;/li&gt;
&lt;/ul&gt;




&lt;h1&gt;
  
  
  &lt;a href="https://catalyst-team.github.io/catalyst/"&gt;Catalyst&lt;/a&gt;
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--7uyQjzzI--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/600/1%2AE7dVs1hLgLIVfYgDsCMGGw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--7uyQjzzI--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/600/1%2AE7dVs1hLgLIVfYgDsCMGGw.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Philosophy&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The idea behind &lt;a href="https://github.com/catalyst-team/catalyst"&gt;Catalyst&lt;/a&gt; is quite simple:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;collect all the technical, dev-heavy, Deep Learning stuff in a framework,&lt;/li&gt;
&lt;li&gt;make it easy to re-use boring day-to-day components,&lt;/li&gt;
&lt;li&gt;focus on research and hypothesis testing in our projects.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To make that happen we looked at a typical Deep Learning project, which usually has the following structure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;stage&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;stages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;epoch&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;epochs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;dataloader&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;dataloaders&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;batch&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;dataloader&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;handle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;batch&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;If you think about it, most of the time, all you need to do is specify the handle method for the new model and how batches of data should be fed to that model. &lt;strong&gt;Why, then, is so much of our time spent implementing pipelines and debugging training loops rather than developing something new or testing a hypothesis?&lt;/strong&gt;&lt;/p&gt;
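&lt;p&gt;To make the nested loop above concrete, here is a minimal runnable sketch that uses plain Python lists in place of real dataloaders. The names &lt;code&gt;run_experiment&lt;/code&gt; and &lt;code&gt;handle_batch&lt;/code&gt; and the toy data are assumptions for illustration only; they are not Catalyst APIs.&lt;/p&gt;

```python
def run_experiment(stages, dataloaders, handle):
    """Run the generic stage / epoch / loader / batch loop.

    `stages` maps a stage name to its number of epochs, `dataloaders`
    maps a loader name to an iterable of batches, and `handle` is the
    only project-specific piece.
    """
    log = []
    for stage, num_epochs in stages.items():
        for epoch in range(num_epochs):
            for loader_name, loader in dataloaders.items():
                for batch in loader:
                    log.append((stage, epoch, loader_name, handle(batch)))
    return log


def handle_batch(batch):
    # Stand-in for a forward pass: here we just sum the batch.
    return sum(batch)


stages = {"warmup": 1, "finetune": 2}
dataloaders = {"train": [[1, 2], [3, 4]], "valid": [[5, 6]]}
log = run_experiment(stages, dataloaders, handle_batch)
print(len(log))  # 9 handled batches: 3 epochs x 3 batches
print(log[0])    # ('warmup', 0, 'train', 3)
```

&lt;p&gt;Everything except &lt;code&gt;handle_batch&lt;/code&gt; is the same from project to project, which is the part a framework can own.&lt;/p&gt;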

&lt;p&gt;We realized that it is possible to &lt;strong&gt;separate the engineering from the research&lt;/strong&gt; so that we can &lt;strong&gt;invest our time once&lt;/strong&gt; in the high-quality, reusable &lt;strong&gt;engineering&lt;/strong&gt; backbone and &lt;strong&gt;use it across all the projects.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That is how Catalyst was born: an open-source PyTorch framework that lets you write compact but full-featured pipelines, &lt;strong&gt;abstracts engineering boilerplate away&lt;/strong&gt;, and lets you focus on the main part of your project.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Our mission at Catalyst.Team is to use our software engineering and deep learning expertise to standardize workflows and enable cross-domain communication between deep learning and reinforcement learning researchers.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;We believe that reduced development friction and free flow of ideas will lead to future breakthroughs in DL and such an R&amp;amp;D Ecosystem will help make that happen.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The learning curve&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Catalyst can be easily adopted by both DL newcomers and seasoned experts thanks to two APIs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Notebook API&lt;/strong&gt;, which was developed with a focus on &lt;strong&gt;easy experimentation and Jupyter Notebooks&lt;/strong&gt; usage - to start your path into reproducible DL research.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Config API&lt;/strong&gt;, which mostly focuses on &lt;strong&gt;scalability and the CLI&lt;/strong&gt; - to bring the power of DL/RL even to large clusters.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When it comes to PyTorch user experience we really want to keep it as simple as possible:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You define your loaders, model, criterion, optimizer, and scheduler as you usually would:
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;torch&lt;/span&gt;

&lt;span class="c1"&gt;# data
&lt;/span&gt;&lt;span class="n"&gt;loaders&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;"train"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;...,&lt;/span&gt; &lt;span class="s"&gt;"valid"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;...}&lt;/span&gt;

&lt;span class="c1"&gt;# model, criterion, optimizer
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Net&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;criterion&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CrossEntropyLoss&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;optimizer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;optim&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Adam&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;parameters&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="n"&gt;scheduler&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;optim&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;lr_scheduler&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ReduceLROnPlateau&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;optimizer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;ul&gt;
&lt;li&gt;and you pass those PyTorch objects to the Catalyst Runner:
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;catalyst.dl&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SupervisedRunner&lt;/span&gt;

&lt;span class="c1"&gt;# experiment setup
&lt;/span&gt;&lt;span class="n"&gt;logdir&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"./logdir"&lt;/span&gt;
&lt;span class="n"&gt;num_epochs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;42&lt;/span&gt;

&lt;span class="c1"&gt;# model runner
&lt;/span&gt;&lt;span class="n"&gt;runner&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;SupervisedRunner&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# model training
&lt;/span&gt;&lt;span class="n"&gt;runner&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;train&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;criterion&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;criterion&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;optimizer&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;optimizer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;scheduler&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;scheduler&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;loaders&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;loaders&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;logdir&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;logdir&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;num_epochs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;num_epochs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;verbose&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;blockquote&gt;
&lt;p&gt;Clearly decoupled engineering from deep learning with almost no boilerplate. This is what we feel deep learning code should look like.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;To get started with both APIs you can follow our &lt;a href="https://github.com/catalyst-team/catalyst#docs-and-examples"&gt;tutorials and pipelines&lt;/a&gt; or if you don’t want to choose, just check out the most common ones: &lt;a href="https://colab.research.google.com/github/catalyst-team/catalyst/blob/master/examples/notebooks/classification-tutorial.ipynb"&gt;classification&lt;/a&gt; and &lt;a href="https://colab.research.google.com/github/catalyst-team/catalyst/blob/master/examples/notebooks/segmentation-tutorial.ipynb"&gt;segmentation&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Design and Architecture&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The most interesting part about &lt;strong&gt;Notebook and Config API is that they use the same “backend” logic&lt;/strong&gt; – Experiment, Runner, State and Callback abstractions, which are the core features of Catalyst.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Experiment&lt;/strong&gt;: an abstraction that contains information about the experiment – a model, a criterion, an optimizer, a scheduler, and their hyperparameters. It also contains information about the data and transformations used. In general, the Experiment knows what you would like to run.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/catalyst-team/catalyst/blob/master/catalyst/core/runner.py"&gt;&lt;strong&gt;Runner&lt;/strong&gt;&lt;/a&gt;: a class that knows how to run an experiment. It contains all the logic of &lt;strong&gt;how&lt;/strong&gt; to run the experiment: stages (another distinctive feature of Catalyst), epochs, and batches.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/catalyst-team/catalyst/blob/master/catalyst/core/state.py"&gt;&lt;strong&gt;State&lt;/strong&gt;&lt;/a&gt;: intermediate storage between Experiment and Runner that holds the current &lt;strong&gt;state&lt;/strong&gt; of the Experiment – model, criterion, optimizer, schedulers, metrics, loggers, loaders, etc.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/catalyst-team/catalyst/blob/master/catalyst/core/callback.py"&gt;&lt;strong&gt;Callback&lt;/strong&gt;&lt;/a&gt;: a powerful abstraction that lets you customize your experiment run logic. To give users maximum flexibility and extensibility we allow callback execution anywhere in the training loop:
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;on_stage_start&lt;/span&gt;
    &lt;span class="n"&gt;on_epoch_start&lt;/span&gt;
        &lt;span class="n"&gt;on_loader_start&lt;/span&gt;
            &lt;span class="n"&gt;on_batch_start&lt;/span&gt;
            &lt;span class="c1"&gt;# ... 
&lt;/span&gt;            &lt;span class="n"&gt;on_batch_end&lt;/span&gt;
        &lt;span class="n"&gt;on_loader_end&lt;/span&gt;
    &lt;span class="n"&gt;on_epoch_end&lt;/span&gt;
&lt;span class="n"&gt;on_stage_end&lt;/span&gt;

&lt;span class="n"&gt;on_exception&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;By implementing these methods you can make any additional logic possible.&lt;/p&gt;
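&lt;p&gt;As a rough illustration of the callback idea, the sketch below wires a few of the hook names listed above into a toy training loop. It is a standard-library approximation under stated assumptions, not Catalyst's real classes: &lt;code&gt;Callback&lt;/code&gt;, &lt;code&gt;BatchCounter&lt;/code&gt;, &lt;code&gt;fire&lt;/code&gt;, &lt;code&gt;train&lt;/code&gt;, and the dictionary-based state are all hypothetical.&lt;/p&gt;

```python
class Callback:
    """Hypothetical base class: every hook is a no-op by default."""

    def on_epoch_start(self, state):
        pass

    def on_batch_end(self, state):
        pass

    def on_epoch_end(self, state):
        pass


class BatchCounter(Callback):
    """Counts batches per epoch and records the totals."""

    def __init__(self):
        self.history = []

    def on_epoch_start(self, state):
        state["batches_seen"] = 0

    def on_batch_end(self, state):
        state["batches_seen"] += 1

    def on_epoch_end(self, state):
        self.history.append(state["batches_seen"])


def fire(callbacks, hook, state):
    # Call the same hook on every registered callback, in order.
    for callback in callbacks:
        getattr(callback, hook)(state)


def train(loader, num_epochs, callbacks):
    state = {}
    for _ in range(num_epochs):
        fire(callbacks, "on_epoch_start", state)
        for batch in loader:
            state["batch"] = batch  # a real runner would do the forward pass here
            fire(callbacks, "on_batch_end", state)
        fire(callbacks, "on_epoch_end", state)
    return state


counter = BatchCounter()
train(loader=[[1, 2], [3, 4], [5, 6]], num_epochs=2, callbacks=[counter])
print(counter.history)  # [3, 3]
```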

&lt;p&gt;As a result, you can &lt;strong&gt;implement any Deep Learning pipeline in a few lines of code&lt;/strong&gt; (and after Catalyst.RL 2.0 release – Reinforcement Learning pipeline), combining it from available primitives (thanks to the community, their number is growing every day).&lt;/p&gt;

&lt;p&gt;Everything else (Models, Criterions, Optimizers, Schedulers) are pure PyTorch primitives. &lt;strong&gt;Catalyst does not create any wrappers or abstractions&lt;/strong&gt; on top but rather makes it easy to reuse those building blocks between different frameworks and domains.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Extension capabilities / Simplicity of integration in research&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Thanks to its flexible framework design and callback mechanism, Catalyst is easily extendable for a large number of DL-based projects. You can check out our Catalyst-powered repositories on &lt;a href="https://github.com/catalyst-team/awesome-catalyst-list#repositories"&gt;awesome-catalyst-list&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you are interested in &lt;strong&gt;Reinforcement Learning&lt;/strong&gt;, there are a large number of RL-based repos and competition solutions as well. To compare Catalyst.RL with other RL frameworks you could check out the &lt;a href="https://docs.google.com/spreadsheets/d/1EeFPd-XIQ3mq_9snTlAZSsFY7Hbnmd7P5bbT8LPuMn0/edit#gid=0"&gt;Open Source RL list&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Other built-in features (what you get out of the box)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Knowing that you can extend it easily gives comfort but there are &lt;strong&gt;a ton of features that you get out-of-the-box&lt;/strong&gt;. Some of them include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Thanks to its flexible callback system, Catalyst &lt;strong&gt;easily integrates common Deep Learning&lt;/strong&gt; best practices such as gradient accumulation, gradient clipping, weight decay correction, top-K best checkpoints saving, tensorboard integration, and many other useful day-to-day deep learning utils.&lt;/li&gt;
&lt;li&gt;Thanks to our contributors and contrib modules, &lt;strong&gt;Catalyst has access to all recent SOTA features&lt;/strong&gt;, like AdamW, OneCycle, SWA, Ranger, LookAhead, and many other research developments.&lt;/li&gt;
&lt;li&gt;Moreover, &lt;strong&gt;we integrate with&lt;/strong&gt; &lt;strong&gt;popular libraries&lt;/strong&gt; such as Nvidia apex, &lt;a href="https://github.com/albumentations-team/albumentations"&gt;Albumentations&lt;/a&gt;, &lt;a href="https://github.com/qubvel/segmentation_models.pytorch"&gt;SMP&lt;/a&gt;, &lt;a href="https://github.com/huggingface/transformers"&gt;transformers&lt;/a&gt;, wandb, and neptune.ai out of the box to make your research more user-friendly. Thanks to these integrations, Catalyst has full support for test-time augmentations, mixed precision, and distributed training.&lt;/li&gt;
&lt;li&gt;For the industry needs, we also have framework-wise support for PyTorch tracing which makes putting models in production easier. Furthermore, we deploy predefined Catalyst-based docker images with each release for easier integration.&lt;/li&gt;
&lt;li&gt;Finally, we support additional solutions for both model serving – ReAction (industry-oriented) and experiments monitoring – Alchemy (research-oriented).&lt;/li&gt;
&lt;/ul&gt;
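&lt;p&gt;To make one of the listed utilities concrete, here is a minimal sketch of top-K best checkpoint tracking in plain Python. It is not Catalyst's implementation: the TopKCheckpoints class, its update() and best() methods, and the loss values are all hypothetical, and no files are written.&lt;/p&gt;

```python
import heapq


class TopKCheckpoints:
    """Remembers the k epochs with the lowest validation loss (hypothetical helper)."""

    def __init__(self, k: int) -> None:
        self.k = k
        self._heap = []  # entries are (-loss, epoch), so the root is the worst kept

    def update(self, epoch: int, valid_loss: float) -> None:
        heapq.heappush(self._heap, (-valid_loss, epoch))
        if len(self._heap) > self.k:
            heapq.heappop(self._heap)  # evict the highest-loss entry

    def best(self) -> list:
        """Return (loss, epoch) pairs, best first."""
        return sorted((-neg_loss, epoch) for neg_loss, epoch in self._heap)


tracker = TopKCheckpoints(k=2)
for epoch, loss in enumerate([0.9, 0.5, 0.7, 0.4]):  # made-up losses
    tracker.update(epoch, loss)
print(tracker.best())  # [(0.4, 3), (0.5, 1)]
```

&lt;p&gt;In a real pipeline the update would be triggered by a callback at the end of each epoch, and evicting an entry would also delete the corresponding checkpoint file.&lt;/p&gt;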

&lt;p&gt;Everything is integrated into the library and covered by CI tests (we have a dedicated gpu-server for that). And thanks to Catalyst scripts, you can schedule a large number of experiments and run them in parallel over all available GPUs from the command line (check catalyst-parallel-run for more info).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reproducibility&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We’ve put a lot of work into making the experiments that you run with Catalyst reproducible. Thanks to library-wise determinism, &lt;strong&gt;Catalyst-based experiments are reproducible not only between several runs on one server but also between several runs over different servers&lt;/strong&gt; and different hardware parts (with docker encapsulation, of course). See experiments here if interested.&lt;/p&gt;

&lt;p&gt;Moreover, Reinforcement Learning experiments are also reproducibility-oriented (as far as RL can be reproducible). For example, with synchronous experiment runs, you can achieve very close performance, thanks to determinism in sampled trajectories. This is notoriously hard and, as far as I am aware, Catalyst has the most reproducible RL pipelines out there.&lt;/p&gt;

&lt;p&gt;To achieve this new level of reproducibility in DL and RL we had to create several additional features:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Full source code dumping&lt;/strong&gt;: thanks to Experiments, Runner and Callbacks abstractions, it’s quite easy to save these primitives for further usage.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Catalyst source code dumping&lt;/strong&gt;: with this feature, you can always reproduce experiment results, even when working with the dev version of Catalyst.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Environment versioning&lt;/strong&gt;: Catalyst dumps pip and conda package versions (which can later be used to define your docker images).&lt;/li&gt;
&lt;li&gt;Finally, Catalyst supports several &lt;strong&gt;monitoring tools&lt;/strong&gt;, like Alchemy, Neptune.ai, Wandb to store all your experiment metrics and additional info for better research progress tracking and reproducibility.&lt;/li&gt;
&lt;/ul&gt;
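&lt;p&gt;The idea behind environment versioning and determinism can be sketched with the standard library alone. This is only an illustration under stated assumptions, not what Catalyst does internally: snapshot_environment() and seeded_samples() are hypothetical helpers, and only packages visible to importlib.metadata are recorded (no conda metadata).&lt;/p&gt;

```python
import platform
import random
from importlib import metadata


def snapshot_environment() -> dict:
    """Collect the Python version and the installed package versions."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installs that lack a name
            packages[name] = dist.metadata.get("Version", "unknown")
    return {"python": platform.python_version(), "packages": packages}


def seeded_samples(seed: int, n: int) -> list:
    """Draw n pseudo-random numbers from a generator with a fixed seed."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]


snapshot = snapshot_environment()
print(snapshot["python"], len(snapshot["packages"]), "packages recorded")

# The same seed gives the same sequence, which is the basis of repeatable runs.
print(seeded_samples(42, 3) == seeded_samples(42, 3))  # True
```

&lt;p&gt;Storing such a snapshot next to the experiment logs makes it possible to rebuild a matching environment later.&lt;/p&gt;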

&lt;p&gt;Thanks to those library-wise solutions, you can be sure that the pipelines you implement in Catalyst are reproducible with all the experiment logs and checkpoints saved for future reference.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Distributed training&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Based on our integrations, Catalyst already has native support for distributed training. Moreover, we support Slurm training and are working on better Kubernetes integration for both DL and RL pipelines.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Productionalization&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Now that we know how Catalyst helps with deep learning research we can talk about &lt;strong&gt;deploying trained models to production&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;As was already mentioned, Catalyst &lt;strong&gt;supports model tracing out-of-the-box&lt;/strong&gt;. It lets you convert PyTorch models (that use Python code) to TorchScript models (that have everything integrated). TorchScript is a way to create serializable and optimizable models from PyTorch code. Any TorchScript program can be saved from a Python process and loaded in a process where there is no Python dependency.&lt;/p&gt;

&lt;p&gt;Additionally, to help Catalyst users deploy their pipelines into production systems, Catalyst.Team has a &lt;strong&gt;&lt;a href="https://github.com/catalyst-team/catalyst#docker"&gt;Docker Hub&lt;/a&gt; with pre-built Catalyst-based images&lt;/strong&gt; (including fp16 support).&lt;/p&gt;

&lt;p&gt;Moreover, to help researchers bring their ideas into production and real-world applications, we’ve created Catalyst.Ecosystem:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/catalyst-team/reaction"&gt;&lt;strong&gt;Reaction&lt;/strong&gt;&lt;/a&gt;: our own &lt;strong&gt;PyTorch Serving solution&lt;/strong&gt; with sync/async API, batch mode support, quest, and all other typical backends that you would expect from a well-designed production system.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/catalyst-team/alchemy"&gt;&lt;strong&gt;Alchemy&lt;/strong&gt;&lt;/a&gt;: our &lt;strong&gt;monitoring tools&lt;/strong&gt; for experiment tracking, model comparison and research results sharing.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Popularity&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Since the first PyPI release 12 months ago, Catalyst has gained ~1.5k stars on GitHub and over 100k downloads. We are proud to be part of such an Open Source Ecosystem and extremely grateful to all our users and contributors for constant support and feedback.&lt;/p&gt;

&lt;p&gt;One online community that was especially helpful was ods.ai, one of the largest Slack communities for Data Scientists and Machine Learning practitioners in the world (40k+ users). Without their ideas and feedback, Catalyst wouldn’t have gotten where it is today.&lt;/p&gt;

&lt;p&gt;Special thanks to our early-adopters,&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.kaggle.com/backaggle"&gt;Bac Nguyen Xuan&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.kaggle.com/bloodaxe"&gt;Eugene Khvedchenya&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.kaggle.com/alexgaziev"&gt;Alex Gaziev&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;and &lt;a href="https://github.com/catalyst-team/catalyst/graphs/contributors"&gt;contributors&lt;/a&gt;
that make it all worth it.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Acknowledgments&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Since the beginning of Catalyst’s development, a lot of people have influenced it in a lot of different ways. As a token of my appreciation, a HUGE THANK YOU to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://www.linkedin.com/in/roman-tezikov/"&gt;Roman Tezikov&lt;/a&gt; for great Catalyst tutorials&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.linkedin.com/in/yauheni-kachan/"&gt;Eugene Kachan&lt;/a&gt; for many Config API improvements and pipelines&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.linkedin.com/in/dkuryakin/"&gt;David Kuryakin&lt;/a&gt; for ReAction design&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.facebook.com/grinchuk.alexey"&gt;Aleksey Grinchuk&lt;/a&gt; and &lt;a href="https://www.linkedin.com/in/vkhrulkov/"&gt;Valentin Khrulkov&lt;/a&gt; for many RL algorithms implemented together&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.linkedin.com/in/alexgaziev/"&gt;Alex Gaziev&lt;/a&gt; for a bunch of Config API improvements&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.linkedin.com/in/andrey-zharkov-8554a1153/"&gt;Andrey Zharkov&lt;/a&gt; and &lt;a href="https://www.linkedin.com/in/artem-zolkin-b5155571/"&gt;Artem Zolkin&lt;/a&gt; for Catalyst.GAN initiative&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.linkedin.com/in/kashnitskiy/"&gt;Yury Kashnitsky&lt;/a&gt; for Catalyst.NLP movement&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.linkedin.com/in/ewan-semyonov/"&gt;Evgeny Semyonov&lt;/a&gt; for &lt;a href="https://github.com/catalyst-team/mlcomp"&gt;MLComp&lt;/a&gt; creation&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.linkedin.com/in/cvtalks/"&gt;Eugene Khvedchenya&lt;/a&gt; for &lt;a href="https://github.com/BloodAxe/pytorch-toolbelt"&gt;Pytorch-toolbelt&lt;/a&gt; library&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.linkedin.com/in/bac-nguyen-xuan-70340b66/"&gt;Nguyen Xuan Bac&lt;/a&gt; and &lt;a href="https://www.linkedin.com/in/andlukyane/"&gt;Andrey Lukyanenko&lt;/a&gt; for many Kaggle Catalyst-based solutions&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.linkedin.com/in/vsevolod-poletaev-468071165/"&gt;Vsevolod Poletaev&lt;/a&gt; for Experiment idea and PoC&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.linkedin.com/in/belskikh/"&gt;Aleksandr Belskikh&lt;/a&gt; for Callbacks-based system inspiration&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.linkedin.com/in/n01z3/"&gt;Artur Kuzin&lt;/a&gt; for multi-stage pipelines support requirement&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.linkedin.com/in/iglovikov/"&gt;Vladimir Iglovikov&lt;/a&gt; for countless pieces of useful advice&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.facebook.com/istepanenko"&gt;Ivan Stepanenko&lt;/a&gt; for awesome Catalyst.Ecosystem design&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Thanks to all that support, Catalyst has become part of the Kaggle Docker image and was added to the &lt;a href="https://pytorch.org/ecosystem/"&gt;PyTorch Ecosystem&lt;/a&gt;, and we are now developing &lt;a href="https://docs.google.com/presentation/d/1D-yhVOg6OXzjo9K_-IS5vSHLPIUxp1PEkFGnpRcNCNU/edit#slide=id.g6d52115af0_0_163"&gt;our own DL R&amp;amp;D Ecosystem&lt;/a&gt; to accelerate your research and production needs.&lt;/p&gt;

&lt;p&gt;To read more about &lt;strong&gt;Catalyst.Ecosystem&lt;/strong&gt;, please check &lt;a href="https://docs.google.com/presentation/d/1D-yhVOg6OXzjo9K_-IS5vSHLPIUxp1PEkFGnpRcNCNU/edit#slide=id.g6d52115af0_0_163"&gt;our vision&lt;/a&gt; and &lt;a href="https://github.com/catalyst-team/catalyst/blob/master/MANIFEST.md"&gt;project manifesto&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Finally, we are always happy to help our &lt;a href="https://github.com/catalyst-team/awesome-catalyst-list#trusted-by"&gt;Catalyst.Friends&lt;/a&gt;: companies/startups/research labs, who are already using Catalyst or are considering using it for their next project.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Thanks for reading, and…&lt;br&gt;
Break the cycle – use Catalyst!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;When to use Catalyst&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You want a flexible and reusable codebase without boilerplate.&lt;/li&gt;
&lt;li&gt;You want to share your expertise with other researchers from different Deep Learning areas.&lt;/li&gt;
&lt;li&gt;You want to boost your research speed with Catalyst.Ecosystem.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;When not to use Catalyst&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You have only just started your deep learning path – in that case, low-level PyTorch is a great introduction.&lt;/li&gt;
&lt;li&gt;You want to create very specific, custom pipelines with a bunch of irreproducible tricks 🙂&lt;/li&gt;
&lt;/ul&gt;
&lt;h1&gt;
  
  
  &lt;a href="https://docs.fast.ai/"&gt;Fastai&lt;/a&gt;
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--2kL7tC8Y--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/600/1%2AImh2w0mMFDcAgIFh3LxIVg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--2kL7tC8Y--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/600/1%2AImh2w0mMFDcAgIFh3LxIVg.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;What follows is about the &lt;strong&gt;version 2 of fastai that will be released in July 2020&lt;/strong&gt;. You can preview it &lt;a href="https://github.com/fastai/fastai2"&gt;here&lt;/a&gt; and it is documented &lt;a href="http://dev.fast.ai/"&gt;here&lt;/a&gt;. If you read this post after it has been released, it will be in the &lt;a href="https://github.com/fastai/fastai_"&gt;main repository&lt;/a&gt; and will be documented &lt;a href="https://docs.fast.ai/"&gt;there&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Fastai is a deep learning library which provides:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;practitioners&lt;/strong&gt;: with high-level components that can quickly and easily provide state of the art results in standard deep learning domains,&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;researchers&lt;/strong&gt;: with low-level components that can be mixed and matched to build new things.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It aims to do both things without substantial compromises in ease of use, flexibility, or performance.&lt;/p&gt;

&lt;p&gt;This is &lt;strong&gt;possible thanks to a carefully layered architecture&lt;/strong&gt;. It expresses common underlying patterns of many deep learning and data processing techniques in terms of &lt;strong&gt;decoupled abstractions&lt;/strong&gt;. What is important is that these abstractions can be &lt;strong&gt;expressed clearly and concisely&lt;/strong&gt; which makes fastai approachable and &lt;strong&gt;rapidly productive, but also deeply hackable and configurable&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;A high-level API offers &lt;strong&gt;customizable models with sensible defaults&lt;/strong&gt;, which is built on top of a &lt;strong&gt;hierarchy of lower-level building blocks&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This article covers a representative subset of the features of the library. For details, see the &lt;a href="https://arxiv.org/abs/2002.04688"&gt;fastai paper&lt;/a&gt; and the documentation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;API&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When talking about the fastai API, one needs to distinguish between the &lt;strong&gt;high-level and the mid/low-level APIs&lt;/strong&gt;.&lt;br&gt;
We will talk about both in the following sections.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;High-level API&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The high-level API is very useful to beginners and practitioners who are &lt;strong&gt;mainly interested in applying pre-existing deep learning methods&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;It offers concise APIs for main application areas:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;vision,&lt;/li&gt;
&lt;li&gt;text,&lt;/li&gt;
&lt;li&gt;tabular,&lt;/li&gt;
&lt;li&gt;time-series analysis,&lt;/li&gt;
&lt;li&gt;recommendation (collaborative filtering)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These &lt;strong&gt;APIs choose intelligent default values&lt;/strong&gt; and behaviors based on all available information.&lt;/p&gt;

&lt;p&gt;For instance, fastai provides a Learner &lt;strong&gt;class&lt;/strong&gt; which brings together architecture, optimizer, and data, and &lt;strong&gt;automatically chooses an appropriate loss function where possible&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;To give another example, a training set should generally be shuffled, while a validation set should not. fastai provides a single DataLoaders &lt;strong&gt;class&lt;/strong&gt; which automatically &lt;strong&gt;constructs validation and training data loaders with these details already handled&lt;/strong&gt;.&lt;/p&gt;
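&lt;p&gt;The shuffling rule can be illustrated with a small stand-alone sketch. This is not fastai code: SimpleLoaders and batches() are hypothetical names, the data is just a range of integers, and a real loader would reshuffle the training set on every epoch rather than once.&lt;/p&gt;

```python
import random


def batches(items: list, batch_size: int, shuffle: bool, seed: int = 0) -> list:
    """Split items into batches, optionally shuffling a copy first."""
    order = list(items)
    if shuffle:
        random.Random(seed).shuffle(order)
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]


class SimpleLoaders:
    """Hypothetical container that shuffles the training split only."""

    def __init__(self, train_items: list, valid_items: list, batch_size: int) -> None:
        self.train = batches(train_items, batch_size, shuffle=True)
        self.valid = batches(valid_items, batch_size, shuffle=False)


loaders = SimpleLoaders(list(range(10)), list(range(10, 16)), batch_size=4)
print(loaders.train)  # shuffled order
print(loaders.valid)  # original order is preserved
```

&lt;p&gt;Keeping both splits behind one object means the caller cannot accidentally shuffle validation data or forget to shuffle training data.&lt;/p&gt;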

&lt;p&gt;To see those “clear and concise code” principles in action, let’s fine-tune an &lt;a href="http://www.image-net.org/"&gt;ImageNet&lt;/a&gt; model on the &lt;a href="https://www.robots.ox.ac.uk/~vgg/data/pets/"&gt;Oxford-IIIT Pet dataset&lt;/a&gt; and achieve close to state-of-the-art accuracy within a couple of minutes of training on a single GPU:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;fastai.vision.all&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;

&lt;span class="n"&gt;path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;untar_data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;URLs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;PETS&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;dls&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ImageDataLoaders&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;from_name_re&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;fnames&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;get_image_files&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="s"&gt;"images"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;pat&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;r'/([^/]+)_\d+.jpg$'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;item_tfms&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;RandomResizedCrop&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;450&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;min_scale&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.75&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; 
    &lt;span class="n"&gt;batch_tfms&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;aug_transforms&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;224&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_warp&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; 
                &lt;span class="n"&gt;Normalize&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;from_stats&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;imagenet_stats&lt;/span&gt;&lt;span class="p"&gt;)])&lt;/span&gt;

&lt;span class="n"&gt;learn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cnn_learner&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dls&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;resnet34&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;error_rate&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;learn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fine_tune&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;This is not an excerpt&lt;/strong&gt;. These are all of the lines of code necessary for this task.&lt;br&gt;
Each line of code does one important task, allowing the user to focus on what they need to do, rather than minor details:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;fastai.vision.all&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;imports all the necessary pieces&lt;/strong&gt; from the library. It’s important to note that the library has been designed carefully to avoid these styles of imports cluttering the namespace.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;untar_data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;URLs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;PETS&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;downloads a standard dataset&lt;/strong&gt; from the fast.ai datasets collection (if not previously downloaded) to a configurable location, extracts it (if not previously extracted), and returns a pathlib.Path object with the extracted location.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;dls&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ImageDataLoaders&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;from_name_re&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;fnames&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;get_image_files&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="s"&gt;"images"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;pat&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;r'/([^/]+)_\d+.jpg$'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;item_tfms&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;RandomResizedCrop&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;450&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;min_scale&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.75&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; 
    &lt;span class="n"&gt;batch_tfms&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;aug_transforms&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;224&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_warp&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; 
    &lt;span class="n"&gt;Normalize&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;from_stats&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;imagenet_stats&lt;/span&gt;&lt;span class="p"&gt;)])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;sets up the DataLoaders. Note the &lt;strong&gt;separation of item-level and batch-level transforms&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;item&lt;/strong&gt; transforms are applied to &lt;strong&gt;individual images on the CPU&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;batch&lt;/strong&gt; transforms are applied &lt;strong&gt;to a mini batch on the GPU&lt;/strong&gt; (if available).&lt;/li&gt;
&lt;/ul&gt;
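&lt;p&gt;The regular expression passed to from_name_re is what turns a file name into a label: its single capture group is the class name. The sketch below applies the same pattern with Python's re module; label_from_name() is a hypothetical helper and the file paths are made up for illustration.&lt;/p&gt;

```python
import re

# Pattern copied from the example; the captured group is the label.
PAT = r'/([^/]+)_\d+.jpg$'


def label_from_name(filename: str) -> str:
    """Return the class name embedded in a file path."""
    match = re.search(PAT, filename)
    if match is None:
        raise ValueError(f"no label found in {filename!r}")
    return match.group(1)


print(label_from_name("/data/images/great_pyrenees_173.jpg"))  # great_pyrenees
print(label_from_name("/data/images/Abyssinian_1.jpg"))        # Abyssinian
```

&lt;p&gt;Because the group excludes slashes and the pattern is anchored at the end of the string, only the last path component contributes to the label, and the trailing underscore plus index is dropped.&lt;/p&gt;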

&lt;p&gt;aug_transforms() selects a set of data augmentations. As always in fastai, a default that works well across a variety of vision datasets is chosen but can be fully customized if needed.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;learn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cnn_learner&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dls&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;resnet34&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;error_rate&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;creates a Learner, which &lt;strong&gt;combines an optimizer, a model, and the data&lt;/strong&gt; to train on. &lt;strong&gt;Each application (vision, text, tabular) has a customized function that creates a Learner&lt;/strong&gt;, which automatically handles whatever details it can for the user. For instance, in this image classification problem, it will:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;download an ImageNet-pretrained model, if not already available,&lt;/li&gt;
&lt;li&gt;remove the classification head of the model,&lt;/li&gt;
&lt;li&gt;replace it with a head appropriate for this particular dataset,&lt;/li&gt;
&lt;li&gt;set appropriate optimizer, weight decay, learning rate, and so forth
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;learn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fine_tune&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;fine-tunes the model. In this case, it is using the 1-cycle policy, which is a recent best practice for training deep learning models but is not widely available in other libraries. A lot of things happen under the hood in .fine_tune():&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;annealing both the learning rates and the momentums,&lt;/li&gt;
&lt;li&gt;printing metrics on the validation set,&lt;/li&gt;
&lt;li&gt;displaying results in an HTML or console table&lt;/li&gt;
&lt;li&gt;recording losses and metrics after every batch and so forth.&lt;/li&gt;
&lt;li&gt;A GPU will be used if one is available.&lt;/li&gt;
&lt;li&gt;It will first train the head for one epoch while the body of the model is frozen, then fine-tune for as many epochs as given (here 4) using discriminative learning rates.&lt;/li&gt;
&lt;/ul&gt;
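&lt;p&gt;To give a feel for what annealing the learning rate and the momentum means, here is a simplified sketch of a 1-cycle schedule with cosine annealing. It is an illustration rather than fastai's implementation: one_cycle() and cosine_anneal() are hypothetical helpers, and all default values (peak learning rate, divisors, warm-up fraction, momentum range) are assumptions chosen for the example.&lt;/p&gt;

```python
import math


def cosine_anneal(start: float, end: float, pct: float) -> float:
    """Move from start to end along a half cosine as pct goes 0 -> 1."""
    return end + (start - end) / 2 * (1 + math.cos(math.pi * pct))


def one_cycle(pct: float, lr_max: float = 1e-2, div: float = 25.0,
              div_final: float = 1e5, pct_start: float = 0.25,
              moms: tuple = (0.95, 0.85)) -> tuple:
    """Return (learning_rate, momentum) at training progress pct in [0, 1]."""
    if pct_start >= pct:  # warm-up: learning rate rises, momentum falls
        p = pct / pct_start
        return (cosine_anneal(lr_max / div, lr_max, p),
                cosine_anneal(moms[0], moms[1], p))
    # annealing: learning rate falls, momentum rises
    p = (pct - pct_start) / (1 - pct_start)
    return (cosine_anneal(lr_max, lr_max / div_final, p),
            cosine_anneal(moms[1], moms[0], p))


for pct in (0.0, 0.25, 1.0):
    lr, mom = one_cycle(pct)
    print(f"pct={pct:.2f} lr={lr:.7f} mom={mom:.3f}")
```

&lt;p&gt;The learning rate and the momentum move in opposite directions: a high learning rate is paired with a low momentum at the peak, and the reverse holds at both ends of training.&lt;/p&gt;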

&lt;p&gt;One of the &lt;strong&gt;strengths of the fastai library is how consistent the API is across applications&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;For example, fine-tuning a pretrained model on the IMDb dataset (a text classification task) using ULMFiT can be done in 5 lines of code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;fastai.text.all&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;

&lt;span class="n"&gt;path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;untar_data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;URLs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;IMDB&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;dls&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;TextDataLoaders&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;from_folder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;valid&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'test'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;learn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;text_classifier_learner&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dls&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;AWD_LSTM&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;drop_mult&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;accuracy&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;learn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fine_tune&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;1e-2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Users get a very &lt;strong&gt;similar experience in other domains&lt;/strong&gt; like tabular, time series or recommendation systems. Once a Learner has been trained, you can explore the results with the command learn.show_results(). How those results are presented depends on the application: in vision you get labeled pictures, while in text you get a dataframe summarizing samples, targets and predictions. In our pets classification example you would get something like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--zuRIGZY4--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i1.wp.com/neptune.ai/wp-content/uploads/vision_fastai.png%3Fw%3D704%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--zuRIGZY4--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i1.wp.com/neptune.ai/wp-content/uploads/vision_fastai.png%3Fw%3D704%26ssl%3D1" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the IMDb classification problem, you’d get something like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--4as9TvBv--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i2.wp.com/neptune.ai/wp-content/uploads/text.png%3Fw%3D796%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--4as9TvBv--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i2.wp.com/neptune.ai/wp-content/uploads/text.png%3Fw%3D796%26ssl%3D1" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Another important high-level API component is the &lt;strong&gt;data block API&lt;/strong&gt;, which is an expressive API for data loading. It is the first attempt we are aware of to systematically define all of the steps necessary to prepare data for a deep learning model, and to give users a mix-and-match recipe book for combining these pieces (which we refer to as data blocks).&lt;/p&gt;

&lt;p&gt;Here is an example of how to use the data block API to get the &lt;a href="http://yann.lecun.com/exdb/mnist/"&gt;MNIST&lt;/a&gt; dataset ready for modeling:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;mnist&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;DataBlock&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;blocks&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ImageBlock&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cls&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;PILImageBW&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;CategoryBlock&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; 
    &lt;span class="n"&gt;get_items&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;get_image_files&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;splitter&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;GrandparentSplitter&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="n"&gt;get_y&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;parent_label&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;dls&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;mnist&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dataloaders&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;untar_data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;URLs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;MNIST_TINY&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;batch_tfms&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;Normalize&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
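&lt;p&gt;To make the mix-and-match idea concrete, here is a minimal pure-Python sketch of how such pieces could be combined. It is not fastai code: the names &lt;code&gt;parent_label_sketch&lt;/code&gt;, &lt;code&gt;grandparent_split_sketch&lt;/code&gt; and &lt;code&gt;build_datasets&lt;/code&gt;, as well as the file paths, are hypothetical and only mimic the roles of &lt;code&gt;get_y&lt;/code&gt; and &lt;code&gt;splitter&lt;/code&gt; in the example above.&lt;/p&gt;

```python
from pathlib import PurePosixPath

# Hypothetical file listing; the assumed layout is split/label/file.
files = [
    PurePosixPath("mnist_tiny/train/3/0001.png"),
    PurePosixPath("mnist_tiny/train/7/0002.png"),
    PurePosixPath("mnist_tiny/valid/3/0003.png"),
]


def parent_label_sketch(path):
    """Label an item with the name of its parent folder."""
    return path.parent.name


def grandparent_split_sketch(items, train_name="train", valid_name="valid"):
    """Split item indices by the name of the grandparent folder."""
    train = [i for i, p in enumerate(items) if p.parent.parent.name == train_name]
    valid = [i for i, p in enumerate(items) if p.parent.parent.name == valid_name]
    return train, valid


def build_datasets(items, splitter, get_y):
    """Combine the pieces: split the items, then pair each one with its label."""
    train_idx, valid_idx = splitter(items)
    train = [(items[i], get_y(items[i])) for i in train_idx]
    valid = [(items[i], get_y(items[i])) for i in valid_idx]
    return train, valid


train_ds, valid_ds = build_datasets(files, grandparent_split_sketch, parent_label_sketch)
```

&lt;p&gt;Because the splitter and the labelling function are plain arguments, swapping either one changes the dataset without touching the rest of the pipeline.&lt;/p&gt;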



&lt;p&gt;&lt;strong&gt;Mid and low-level API&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In the previous section, you saw how you can get a lot done quickly with the high-level API, which has a ton of out-of-the-box functionality. However, there are situations &lt;strong&gt;when you need to tweak things or extend what is already there&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This is where middle and low-level APIs come into the picture:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;mid-level API&lt;/strong&gt; provides the core deep learning and data-processing methods for each of these applications,&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;low-level API&lt;/strong&gt; provides a library of optimized primitives and functional and object-oriented foundations, which allow the mid-level to be developed and customized.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The training loop can be customized using the Learner’s &lt;strong&gt;novel two-way callback system&lt;/strong&gt;. It allows gradients, data, losses, control flow, and &lt;strong&gt;anything&lt;/strong&gt; else &lt;strong&gt;to be read and changed at any point during training.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;There is a rich history of using callbacks to allow for customization of numeric software, and today nearly all modern deep learning libraries provide this functionality. However, fastai’s callback system is the first that we are aware of that supports the design principles necessary for &lt;strong&gt;complete two-way callbacks&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A callback &lt;strong&gt;should be available at every single point during training&lt;/strong&gt;, which gives users full flexibility,&lt;/li&gt;
&lt;li&gt;Every callback should be able to &lt;strong&gt;access every piece of information available&lt;/strong&gt; at that stage in the training loop, including hyper-parameters, losses, gradients, input and target data, and so forth,&lt;/li&gt;
&lt;li&gt;Every callback should be &lt;strong&gt;able to modify all these pieces of information&lt;/strong&gt;, at any time before they are used.&lt;/li&gt;
&lt;/ul&gt;
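&lt;p&gt;The two-way idea can be illustrated with a toy loop in plain Python. This is only a sketch under simplifying assumptions (a single scalar weight and a squared-error loss), not the fastai Learner; &lt;code&gt;LoopState&lt;/code&gt;, &lt;code&gt;train&lt;/code&gt; and both callbacks are hypothetical names.&lt;/p&gt;

```python
class LoopState:
    """Mutable state shared between the training loop and its callbacks."""

    def __init__(self, lr):
        self.lr = lr
        self.weight = 0.0
        self.grad = 0.0
        self.loss = None


def halve_lr_callback(state):
    """Reads a hyper-parameter and changes it before it is used."""
    state.lr = state.lr / 2


def clip_grad_callback(state):
    """Reads the gradient and overwrites it before the update step."""
    state.grad = max(-1.0, min(1.0, state.grad))


def train(targets, state, before_batch=(), after_backward=()):
    """Toy loop: fit one scalar weight to each target with a squared error."""
    for target in targets:
        for callback in before_batch:
            callback(state)
        state.loss = (state.weight - target) ** 2
        state.grad = 2 * (state.weight - target)
        for callback in after_backward:
            callback(state)
        state.weight -= state.lr * state.grad
    return state


state = train([10.0], LoopState(lr=0.5), [halve_lr_callback], [clip_grad_callback])
```

&lt;p&gt;Both callbacks receive the full loop state, so each one can read any value and overwrite it before the loop uses it.&lt;/p&gt;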

&lt;p&gt;All the tweaks of the training loop (different schedulers, mixed-precision training, reporting on &lt;a href="https://www.tensorflow.org/tensorboard"&gt;TensorBoard&lt;/a&gt;, &lt;a href="https://www.wandb.com/"&gt;wandb&lt;/a&gt;, &lt;a href="https://neptune.ai/"&gt;Neptune&lt;/a&gt;, or equivalent, &lt;a href="https://arxiv.org/abs/1710.09412"&gt;MixUp&lt;/a&gt;, oversampling strategies, distributed training, GAN training…) are implemented in callbacks that the &lt;strong&gt;end-user can mix and match with their own, making it easier to experiment with things&lt;/strong&gt; and do ablation studies. Convenience methods are there to add those callbacks for the user, making training in mixed precision as easy as saying&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;learn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;learn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;to_fp16&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;or &lt;strong&gt;training in a distributed environment&lt;/strong&gt; as easy as&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;learn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;learn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;to_distributed&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;fastai also provides a &lt;strong&gt;new, generic optimizer abstraction&lt;/strong&gt; that allows recent optimization techniques, like LAMB, RAdam or AdamW, to be implemented in a few lines of code.&lt;/p&gt;

&lt;p&gt;This is possible thanks to &lt;strong&gt;refactoring optimizer abstractions&lt;/strong&gt; into two basic pieces:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;stats&lt;/strong&gt;, which track and aggregate statistics such as gradient moving averages&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;steppers&lt;/strong&gt;, which combine stats and hyper-parameters to “step” the weights using some function.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This foundation has allowed us to write most of fastai’s optimizers in 2-3 lines of code, while in other popular libraries that would take you 50+.&lt;/p&gt;
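&lt;p&gt;As a rough illustration of the stats/steppers split, here is a scalar-only sketch in plain Python. It is not fastai’s optimizer code: &lt;code&gt;SketchOptimizer&lt;/code&gt;, &lt;code&gt;average_grad&lt;/code&gt;, &lt;code&gt;weight_decay_step&lt;/code&gt; and &lt;code&gt;momentum_step&lt;/code&gt; are hypothetical names, and the hyper-parameter values are arbitrary.&lt;/p&gt;

```python
def average_grad(state, grad, mom=0.9, **hypers):
    """Stat: keep a running (momentum-style) sum of the gradients."""
    state["grad_avg"] = mom * state.get("grad_avg", 0.0) + grad
    return state


def weight_decay_step(weight, grad, state, lr, wd=0.0, **hypers):
    """Stepper: shrink the weight slightly toward zero."""
    return weight * (1 - lr * wd)


def momentum_step(weight, grad, state, lr, **hypers):
    """Stepper: move the weight along the tracked gradient average."""
    return weight - lr * state["grad_avg"]


class SketchOptimizer:
    """Composes stats and steppers; every function sees the same hyper-parameters."""

    def __init__(self, stats, steppers, **hypers):
        self.stats, self.steppers, self.hypers = stats, steppers, hypers
        self.state = {}

    def step(self, weight, grad):
        for stat in self.stats:
            self.state = stat(self.state, grad, **self.hypers)
        for stepper in self.steppers:
            weight = stepper(weight, grad, self.state, **self.hypers)
        return weight


# SGD with momentum and weight decay, assembled from the pieces above.
opt = SketchOptimizer([average_grad], [weight_decay_step, momentum_step],
                      lr=0.1, mom=0.9, wd=0.01)
w1 = opt.step(1.0, 0.5)
w2 = opt.step(w1, 0.5)
```

&lt;p&gt;In this sketch a new optimizer variant is just a different list of stats and steppers, which is the property the paragraph above describes.&lt;/p&gt;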

&lt;p&gt;There are many other mid-tier and low-level APIs that &lt;strong&gt;make it easy for researchers and developers to build new methods on top of a fast and flexible foundation&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The library is already in &lt;strong&gt;wide use in research, industry, and teaching&lt;/strong&gt;.&lt;br&gt;
We have used it to create a complete, and very popular deep learning course: &lt;a href="https://course.fast.ai/"&gt;Practical deep learning for coders&lt;/a&gt; (the first video of the last iteration has 256k views). &lt;/p&gt;

&lt;p&gt;The &lt;a href="https://github.com/fastai/fastai"&gt;repository&lt;/a&gt; has &lt;strong&gt;16.9k stars and is used in more than 2,000 projects&lt;/strong&gt; at the time of writing. The community is very active on the &lt;a href="https://forums.fast.ai/"&gt;fast.ai forum&lt;/a&gt;, be it to clarify points of the course that are unclear, help with debugging or team up to tackle a new deep learning project.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When to use fastai&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The goal is to have something easy enough for beginners but flexible enough for researchers/practitioners.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;When not to use fastai&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The only thing I can think of is that you wouldn’t use fastai to serve in production a model you trained in a different framework, since we don’t deal with that aspect.&lt;/li&gt;
&lt;/ul&gt;


&lt;h1&gt;
  
  
  &lt;a href="https://pytorch.org/ignite/"&gt;PyTorch Ignite&lt;/a&gt;
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--sOZlpNDS--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/600/1%2AMZaglxrstCX6_9T0t4cDpQ.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--sOZlpNDS--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/600/1%2AMZaglxrstCX6_9T0t4cDpQ.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://pytorch.org/ignite/"&gt;PyTorch Ignite&lt;/a&gt; is a high-level library that helps with training neural networks in PyTorch. Since its beginning in 2018, our goal has been to:&lt;/p&gt;
&lt;h1&gt;
  
  
  “make the common things easy and the hard things possible”.
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;Why use Ignite?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Ignite’s high level of abstraction &lt;strong&gt;assumes little about the type of model, or models,&lt;/strong&gt; that the user is training. We only require the user to &lt;strong&gt;define the closure to be run in the training and optional validation loop&lt;/strong&gt;. This gives users a lot of flexibility and allows them to use Ignite for tasks such as co-training multiple models (e.g. GANs) or tracking multiple losses and metrics in the training loop.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ignite concepts and API&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;There are a few core objects in Ignite’s API that you need to learn:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Engine&lt;/strong&gt;: the essence of the library&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Events &amp;amp; Handlers&lt;/strong&gt;: interaction with the Engine (e.g. early stopping, checkpoints, logging)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Metrics&lt;/strong&gt;: out-of-the-box metrics for various tasks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We will present some basics to explain the main ideas, but feel free to dig deeper into the &lt;a href="https://pytorch.org/ignite/"&gt;examples&lt;/a&gt; in the repository.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Engine&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It simply loops over provided data, executes a processing function and returns a result.&lt;/p&gt;

&lt;p&gt;A &lt;strong&gt;Trainer is an Engine with the model’s weight update&lt;/strong&gt; as its processing function.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;ignite.engine&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Engine&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;update_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;trainer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;batch&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;train&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;optimizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;zero_grad&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;prepare_batch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;batch&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;y_pred&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;loss&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;criterion&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_pred&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;loss&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;backward&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;optimizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;step&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;loss&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;trainer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Engine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;update_model&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;trainer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_epochs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;An &lt;strong&gt;Evaluator (an object used to validate the model) is an Engine with on-line metric computation logic&lt;/strong&gt; as its processing function.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;torch&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;ignite.engine&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Engine&lt;/span&gt;

&lt;span class="n"&gt;total_loss&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;compute_metrics&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;batch&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;batch&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;eval&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;no_grad&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="n"&gt;y_pred&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;loss&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;criterion&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_pred&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;total_loss&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;loss&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;loss&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;evaluator&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Engine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;compute_metrics&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;evaluator&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_epochs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;f"Loss: {torch.tensor(total_loss).mean()}"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Together, these two snippets silently train a model and compute the mean loss.&lt;/p&gt;

&lt;p&gt;In the next section we will see how to make the training and validation more user-friendly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Events &amp;amp; Handlers&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In order &lt;strong&gt;to improve the flexibility of Engine&lt;/strong&gt; and allow users to interact at each step of the run, &lt;strong&gt;we introduced events and handlers&lt;/strong&gt;. The idea is that users can execute custom code inside the training loop as an event handler, similar to callbacks in other libraries.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;fire_event&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Events&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;STARTED&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="n"&gt;epoch&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;max_epochs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;fire_event&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Events&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;EPOCH_STARTED&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# run once on data
&lt;/span&gt;    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;batch&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;fire_event&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Events&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ITERATION_STARTED&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;output&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;process_function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;batch&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;fire_event&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Events&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ITERATION_COMPLETED&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;fire_event&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Events&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;EPOCH_COMPLETED&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;fire_event&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Events&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;COMPLETED&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;At each fire_event call, all its event handlers are executed. For example, users may want to set up some run-dependent variables at the beginning of training (Events.STARTED) and update the learning rate on each iteration (Events.ITERATION_COMPLETED). With Ignite the code will look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;train_loader&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="err"&gt;…&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="err"&gt;…&lt;/span&gt;
&lt;span class="n"&gt;optimizer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="err"&gt;…&lt;/span&gt;
&lt;span class="n"&gt;criterion&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt;
&lt;span class="n"&gt;lr_scheduler&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="err"&gt;…&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;process_function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;batch&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# … user function to update model weights
&lt;/span&gt;
&lt;span class="n"&gt;trainer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Engine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;process_function&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="n"&gt;trainer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;on&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Events&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;STARTED&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;setup_logging_folder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# create a folder for the run
&lt;/span&gt;    &lt;span class="c1"&gt;# set up some run dependent variables
&lt;/span&gt;
&lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="n"&gt;trainer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;on&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Events&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ITERATION_COMPLETED&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;update_lr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;lr_scheduler&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;step&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;trainer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;train_loader&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_epochs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The cool thing about handlers (vs “callback” interfaces) is that a handler can be any function with the correct signature&lt;/strong&gt; (we only require the first argument to be engine), e.g. a lambda, a simple function, a class method, etc. There is no need to inherit from an interface and override its abstract methods.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;trainer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;add_event_handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;Events&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;STARTED&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Start training"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="c1"&gt;# attach handler with args, kwargs
&lt;/span&gt;&lt;span class="n"&gt;mydata&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;on_training_ended&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Training is ended. mydata={}"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;


&lt;span class="n"&gt;trainer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;add_event_handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;Events&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;COMPLETED&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;on_training_ended&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mydata&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
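&lt;p&gt;To show what happens behind &lt;code&gt;add_event_handler&lt;/code&gt;, here is a stripped-down engine in plain Python. It is a sketch of the dispatch idea only, not Ignite’s implementation: &lt;code&gt;MiniEngine&lt;/code&gt; is a hypothetical class and events are plain strings instead of the &lt;code&gt;Events&lt;/code&gt; enum.&lt;/p&gt;

```python
from collections import defaultdict


class MiniEngine:
    """Loops over data and fires named events that handlers can hook into."""

    def __init__(self, process_function):
        self.process_function = process_function
        self.handlers = defaultdict(list)
        self.iteration = 0
        self.output = None

    def add_event_handler(self, event_name, handler, *args, **kwargs):
        # Extra args and kwargs are stored and passed to the handler when it fires.
        self.handlers[event_name].append((handler, args, kwargs))

    def fire_event(self, event_name):
        for handler, args, kwargs in self.handlers[event_name]:
            handler(self, *args, **kwargs)

    def run(self, data, max_epochs=1):
        self.fire_event("STARTED")
        for _ in range(max_epochs):
            self.fire_event("EPOCH_STARTED")
            for batch in data:
                self.iteration += 1
                self.fire_event("ITERATION_STARTED")
                self.output = self.process_function(self, batch)
                self.fire_event("ITERATION_COMPLETED")
            self.fire_event("EPOCH_COMPLETED")
        self.fire_event("COMPLETED")


log = []
engine = MiniEngine(lambda eng, batch: batch * 2)
engine.add_event_handler("STARTED", lambda eng: log.append("start"))
engine.add_event_handler("ITERATION_COMPLETED", lambda eng: log.append(eng.output))
engine.add_event_handler("COMPLETED", lambda eng, tag: log.append(tag), "done")
engine.run([1, 2], max_epochs=2)
```

&lt;p&gt;Any callable whose first argument is the engine can be registered here, which mirrors the signature rule described above.&lt;/p&gt;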



&lt;p&gt;&lt;strong&gt;Built-in events filtering&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;There are cases when users would like to execute the code periodically/once or with a custom rule like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;run the validation every 5 epochs,&lt;/li&gt;
&lt;li&gt;store a checkpoint every 1000 iterations,&lt;/li&gt;
&lt;li&gt;change a variable on the 20th epoch,&lt;/li&gt;
&lt;li&gt;log gradients on the first 10 iterations.&lt;/li&gt;
&lt;li&gt;etc.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Ignite provides such &lt;strong&gt;flexibility to separate “the code to execute” from the logic “when to execute the code”&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;For example, to &lt;strong&gt;run the validation every 5 epochs&lt;/strong&gt;, you can simply write:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="n"&gt;trainer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;on&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Events&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;EPOCH_COMPLETED&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;every&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run_validation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# run validation
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Similarly, to &lt;strong&gt;change some training variable once on the 20th epoch&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="n"&gt;trainer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;on&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Events&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;EPOCH_STARTED&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;once&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;change_training_variable&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# ...
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;More generally, users can provide their own event filtering function:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;first_x_iters&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# event is the 1-based iteration counter, so this keeps iterations 1 to 10
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;

&lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="n"&gt;trainer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;on&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Events&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ITERATION_COMPLETED&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event_filter&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;first_x_iters&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;log_gradients&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# …
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
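&lt;p&gt;The three kinds of filter used in this section boil down to simple predicates on the event counter. The sketch below expresses them in plain Python, assuming the counter starts at 1; &lt;code&gt;every&lt;/code&gt;, &lt;code&gt;once&lt;/code&gt;, &lt;code&gt;first&lt;/code&gt; and &lt;code&gt;fired_events&lt;/code&gt; are hypothetical helpers, not Ignite functions.&lt;/p&gt;

```python
def every(n):
    """Fire on every n-th occurrence of the event."""
    return lambda engine, event: event % n == 0


def once(n):
    """Fire only on the n-th occurrence of the event."""
    return lambda engine, event: event == n


def first(n):
    """Fire on the first n occurrences of the event (counter starts at 1)."""
    return lambda engine, event: event in range(1, n + 1)


def fired_events(event_filter, total):
    """List the event counts, out of 1..total, that pass the filter."""
    return [e for e in range(1, total + 1) if event_filter(None, e)]


validation_epochs = fired_events(every(5), 20)
change_epochs = fired_events(once(20), 25)
logged_iterations = fired_events(first(10), 30)
```

&lt;p&gt;Keeping the predicate separate from the handler is what lets the same handler be reused with a different schedule.&lt;/p&gt;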



&lt;p&gt;&lt;strong&gt;Out-of-the-box handlers&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Ignite provides a list of handlers and metrics to simplify users’ code:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Checkpoint&lt;/strong&gt;: saves training checkpoints (composed of trainer, model(s), optimizer(s), lr scheduler(s), etc.) and the best models (by validation score)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;EarlyStopping&lt;/strong&gt;: stops the training if no progress is made (by validation score)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;TerminateOnNan&lt;/strong&gt;: stops the training if NaN is encountered&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Optimizer Parameters Scheduling&lt;/strong&gt;: concatenate, add a warm-up, setup linear or cosine annealing, linear piecewise scheduling of any optimizer parameter (lr, momentum, betas, …)&lt;/li&gt;
&lt;/ul&gt;
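&lt;p&gt;To give a feel for what a concatenated schedule computes, the following plain-Python sketch chains a linear warm-up with cosine annealing. It only illustrates the arithmetic and does not use Ignite’s scheduler classes; the function names and the values (10 warm-up steps, a 90-step cycle, a peak of 0.1) are assumptions chosen for the example.&lt;/p&gt;

```python
import math


def linear_warmup(step, warmup_steps, start_value, end_value):
    """Move linearly from start_value to end_value over warmup_steps."""
    return start_value + (end_value - start_value) * step / warmup_steps


def cosine_annealing(step, cycle_size, start_value, end_value):
    """Follow half a cosine wave from start_value down to end_value."""
    progress = step / cycle_size
    return end_value + 0.5 * (start_value - end_value) * (1 + math.cos(math.pi * progress))


def warmup_then_cosine(step, warmup_steps=10, cycle_size=90, peak=0.1, floor=0.0):
    """Concatenate the two schedules: warm up first, then anneal."""
    if step in range(warmup_steps):
        return linear_warmup(step, warmup_steps, 0.0, peak)
    return cosine_annealing(step - warmup_steps, cycle_size, peak, floor)


# Learning rate for each of the 101 steps (0 to 100 inclusive).
schedule = [warmup_then_cosine(step) for step in range(101)]
```

&lt;p&gt;The same pattern applies to any optimizer parameter, since the schedule is just a function from the step count to a value.&lt;/p&gt;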

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--XZukBdez--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i2.wp.com/neptune.ai/wp-content/uploads/ignite_scheduling.png%3Fresize%3D1024%252C246%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--XZukBdez--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i2.wp.com/neptune.ai/wp-content/uploads/ignite_scheduling.png%3Fresize%3D1024%252C246%26ssl%3D1" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;There are also handlers for logging to common platforms: TensorBoard, Visdom, MLflow, Polyaxon or Neptune (batch losses, metrics, GPU memory/utilization, optimizer parameters and more).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Hpd5cJIo--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i1.wp.com/neptune.ai/wp-content/uploads/ignite_logging.png%3Fresize%3D1024%252C463%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Hpd5cJIo--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i1.wp.com/neptune.ai/wp-content/uploads/ignite_logging.png%3Fresize%3D1024%252C463%26ssl%3D1" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Metrics&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Ignite also provides a &lt;strong&gt;list of out-of-the-box metrics for various tasks&lt;/strong&gt;: Precision, Recall, Accuracy, Confusion Matrix, IoU, etc., plus about 20 regression metrics.&lt;/p&gt;

&lt;p&gt;For example, below we compute accuracy on the validation dataset:&lt;br&gt;
&lt;/p&gt;

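&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;ignite.engine&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Engine&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;ignite.metrics&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Accuracy&lt;/span&gt;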

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;compute_predictions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;batch&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# …
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;y_pred&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_true&lt;/span&gt;

&lt;span class="n"&gt;evaluator&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Engine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;compute_predictions&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;metric&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Accuracy&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;metric&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;attach&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;evaluator&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"val_accuracy"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;evaluator&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;val_loader&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# The attached metric is stored in evaluator.state.metrics, e.g. 0.98765
&lt;/span&gt;&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;evaluator&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"val_accuracy"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Go &lt;a href="https://pytorch.org/ignite/metrics.html#complete-list-of-metrics"&gt;here&lt;/a&gt; and &lt;a href="https://pytorch.org/ignite/contrib/metrics.html"&gt;here&lt;/a&gt; to see the full list of available metrics.&lt;/p&gt;

&lt;p&gt;Ignite metrics have this cool property that &lt;strong&gt;users can compose their own metrics using basic arithmetic operations&lt;/strong&gt; or torch methods:&lt;br&gt;
&lt;/p&gt;

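&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;ignite.metrics&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Precision&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Recall&lt;/span&gt;

&lt;span class="n"&gt;precision&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Precision&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;average&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;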
&lt;span class="n"&gt;recall&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Recall&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;average&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;F1_per_class&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;precision&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;recall&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;precision&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;recall&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;F1_mean&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;F1_per_class&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# torch mean method
&lt;/span&gt;&lt;span class="n"&gt;F1_mean&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;attach&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"F1"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
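
&lt;p&gt;The arithmetic behind the composition above can be checked by hand. The sketch below recomputes per-class F1 and its mean in plain Python from per-class counts, without Ignite. The function name, the zero-division guards and the counts are assumptions made for illustration; they are not taken from Ignite's implementation.&lt;/p&gt;

```python
def per_class_f1(true_positives, false_positives, false_negatives):
    """Per-class F1 composed as precision * recall * 2 / (precision + recall)."""
    scores = []
    for tp, fp, fn in zip(true_positives, false_positives, false_negatives):
        # Guard against empty classes (an assumption of this sketch).
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(precision * recall * 2 / denom if denom else 0.0)
    return scores


# Hypothetical counts for a 3-class problem.
f1_per_class = per_class_f1([8, 5, 0], [2, 5, 0], [2, 0, 3])
# Mean over classes, mirroring F1_per_class.mean() in the snippet above.
f1_mean = sum(f1_per_class) / len(f1_per_class)
```

&lt;p&gt;Keeping precision and recall per class (&lt;code&gt;average=False&lt;/code&gt;) is what lets the mean be taken after F1 is formed, rather than averaging precision and recall first.&lt;/p&gt;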



&lt;h1&gt;
  
  
  Library structure
&lt;/h1&gt;

&lt;p&gt;The library is composed of two main modules:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
The &lt;strong&gt;Core&lt;/strong&gt; module contains the building blocks such as Engine, metrics, and some essential handlers. It has &lt;strong&gt;PyTorch as the only dependency&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;
The &lt;strong&gt;Contrib&lt;/strong&gt; module may depend on other libraries (e.g. scikit-learn, tensorboardX, visdom, tqdm, etc.) and may introduce backward-incompatible changes between versions.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Both modules are largely covered by unit tests.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Extension capabilities / Simplicity of integration in research&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We believe that our event/handler system is rather flexible and gives people the ability to interact with every part of the training process. Because of that, &lt;strong&gt;we’ve seen Ignite being used to train GANs&lt;/strong&gt; (we provide two basic examples to train &lt;a href="https://github.com/pytorch/ignite/tree/master/examples/gan"&gt;DCGAN&lt;/a&gt; and &lt;a href="https://github.com/pytorch/ignite/blob/master/examples/notebooks/CycleGAN.ipynb"&gt;CycleGAN&lt;/a&gt;) or &lt;strong&gt;Reinforcement Learning models&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;According to GitHub’s “Used by” feature, Ignite has been &lt;strong&gt;used by researchers&lt;/strong&gt; for their papers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;BatchBALD: Efficient and Diverse Batch Acquisition for Deep Bayesian Active Learning, &lt;a href="https://github.com/BlackHC/BatchBALD"&gt;github&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;A Model to Search for Synthesizable Molecules, &lt;a href="https://github.com/john-bradshaw/molecule-chef"&gt;github&lt;/a&gt; &lt;/li&gt;
&lt;li&gt;Localised Generative Flows,  &lt;a href="https://github.com/jrmcornish/lgf"&gt;github&lt;/a&gt; &lt;/li&gt;
&lt;li&gt;Extracting T Cell Function and Differentiation Characteristics from the Biomedical Literature, &lt;a href="https://github.com/hammerlab/t-cell-relation-extraction"&gt;github&lt;/a&gt; &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Because of these (and other) research projects, we strongly believe that &lt;strong&gt;Ignite gives you enough flexibility to do deep learning research&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Integrations with other libraries/frameworks&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Ignite &lt;strong&gt;plays nicely with other libraries or frameworks if their features do not overlap&lt;/strong&gt;. Some cool integrations that we have include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;hyperparameter tuning with Ax (&lt;a href="https://github.com/pytorch/ignite/blob/master/examples/notebooks/Cifar10_Ax_hyperparam_tuning.ipynb"&gt;Ignite example&lt;/a&gt;).&lt;/li&gt;
&lt;li&gt;hyperparameter tuning with Optuna (&lt;a href="https://github.com/optuna/optuna/blob/master/examples/pytorch_ignite_simple.py"&gt;Optuna example&lt;/a&gt;).&lt;/li&gt;
&lt;li&gt;logging to TensorBoard, Visdom, MLflow, Polyaxon, Neptune (Ignite’s code), Chainer UI (Chainer’s code).&lt;/li&gt;
&lt;li&gt;training with mixed precision using Nvidia Apex (&lt;a href="https://github.com/pytorch/ignite/tree/master/examples/references"&gt;Ignite’s examples&lt;/a&gt;).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Reproducibility&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We’ve put a lot of effort into making Ignite training reproducible:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Ignite’s &lt;strong&gt;Engine automatically handles the random states&lt;/strong&gt; and, where possible, forces the data loaders to provide the same data samples across runs;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Ignite &lt;strong&gt;integrates with experiment tracking systems&lt;/strong&gt; like MLflow, Polyaxon, Neptune. This helps to keep track of software, parameter, and data dependencies of ML experiments;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;We provide several examples and &lt;a href="https://github.com/pytorch/ignite/tree/master/examples/references"&gt;“references”&lt;/a&gt; (inspired by torchvision) of &lt;strong&gt;reproducible training on vision tasks&lt;/strong&gt; (e.g. classification on CIFAR10, ImageNet, and segmentation on Pascal VOC12).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
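
&lt;p&gt;The first point rests on a simple idea: if the random state is fixed, the order in which samples are drawn is fixed too. The sketch below illustrates that idea with Python's standard library only. It is not how Ignite's Engine does it internally; &lt;code&gt;shuffled_indices&lt;/code&gt; and the seed values are hypothetical.&lt;/p&gt;

```python
import random


def shuffled_indices(num_samples, seed):
    """Return a shuffled sample order that depends only on the seed."""
    # A dedicated generator avoids touching the global random state.
    rng = random.Random(seed)
    indices = list(range(num_samples))
    rng.shuffle(indices)
    return indices


# Two "runs" with the same seed visit the samples in the same order.
run_a = shuffled_indices(10, seed=42)
run_b = shuffled_indices(10, seed=42)
# A different seed gives another valid permutation of the same samples.
run_c = shuffled_indices(10, seed=7)
```

&lt;p&gt;With PyTorch, the same principle applies to the generators that drive shuffling and augmentation, which is why restoring random states matters when resuming a run.&lt;/p&gt;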

&lt;p&gt;&lt;strong&gt;Distributed training&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Distributed training is also &lt;strong&gt;supported by Ignite, but we leave it up to the user to set up the type of parallelism&lt;/strong&gt;: model or data.&lt;/p&gt;

&lt;p&gt;For example, in a data-parallel distributed configuration, users are required to correctly set up the distributed process group, wrap the model, use a distributed sampler, etc. Ignite handles the metrics computation by reducing the values across all processes.&lt;/p&gt;
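
&lt;p&gt;To see why a metric has to be reduced across processes rather than averaged per process, consider accuracy. The sketch below simulates two processes with a plain list of counts and uses no distributed backend; the function name and the numbers are assumptions for illustration, not Ignite's implementation.&lt;/p&gt;

```python
def reduce_accuracy(per_process_counts):
    """Global accuracy from (num_correct, num_examples) pairs, one per process."""
    # Summing the raw counts plays the role of an all-reduce with a sum.
    total_correct = sum(correct for correct, _ in per_process_counts)
    total_examples = sum(examples for _, examples in per_process_counts)
    return total_correct / total_examples


# Hypothetical counts: process 0 saw 100 examples, process 1 only 10.
counts = [(90, 100), (5, 10)]
global_accuracy = reduce_accuracy(counts)
# Averaging per-process accuracies weights both processes equally,
# which is wrong when they saw different numbers of examples.
naive_average = sum(correct / examples for correct, examples in counts) / len(counts)
```

&lt;p&gt;Here the reduced value is 95/110, while the naive average of 0.9 and 0.5 is 0.7; reducing the counts first gives the figure a single-process evaluation over the same data would report.&lt;/p&gt;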

&lt;p&gt;We &lt;strong&gt;provide several examples&lt;/strong&gt; (&lt;a href="https://github.com/pytorch/ignite/tree/master/examples/contrib/cifar10#distributed-training"&gt;e.g. distributed CIFAR10&lt;/a&gt;) to show how to use Ignite in a distributed configuration.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Popularity&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;At the time of writing, Ignite had about &lt;strong&gt;2.5k stars&lt;/strong&gt; and, according to GitHub’s “Used by” feature, was &lt;strong&gt;used by 205 repositories&lt;/strong&gt;.&lt;br&gt;
Some honorable mentions are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/huggingface/transfer-learning-conv-ai"&gt;State-of-the-Art Conversational AI with Transfer Learning&lt;/a&gt; by HuggingFace&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/huggingface/naacl_transfer_learning_tutorial"&gt;Tutorial on Transfer Learning in NLP held at NAACL 2019&lt;/a&gt; by HuggingFace&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Thomas Wolf from HuggingFace also left some awesome feedback for the library in &lt;a href="https://medium.com/huggingface/how-to-build-a-state-of-the-art-conversational-ai-with-transfer-learning-2d818ac26313"&gt;one of his blog articles&lt;/a&gt; (Thanks, Thomas!):&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Using the awesome PyTorch ignite framework and the new API for Automatic Mixed Precision (FP16/32) provided by NVIDIA’s apex, we were able to distill our +3k lines of competition code in less than 250 lines of training code with distributed and FP16 options!”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/PacktPublishing/Deep-Reinforcement-Learning-Hands-On-Second-Edition"&gt;Deep-Reinforcement-Learning-Hands-On-Second-Edition&lt;/a&gt;
by Max Lapan: a book on deep reinforcement learning whose second-edition examples are built with Ignite.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/Project-MONAI/MONAI"&gt;Project MONAI&lt;/a&gt;: AI Toolkit for Healthcare Imaging. This project, focused primarily on healthcare research and on developing DL models for medical imaging, uses Ignite for end-to-end training.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For other use cases, please take a look at &lt;a href="https://github.com/pytorch/ignite#they-use-ignite"&gt;Ignite’s GitHub page&lt;/a&gt; and its “Used by” list.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When to use Ignite&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;When you want to remove boilerplate and standardize your code using the highly customizable modules of Ignite’s API.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;When you need well-factored code but don’t want to sacrifice the flexibility to support complicated training strategies.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;When you want to use the rich array of utilities such as metrics, handlers, and loggers to evaluate and debug your model with ease.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;When not to use Ignite&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;When you have highly customized PyTorch code for which Ignite’s API would only add overhead.&lt;/li&gt;
&lt;li&gt;When you are completely satisfied with the pure PyTorch API or another high-level library.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Thank you for reading! &lt;strong&gt;PyTorch-Ignite&lt;/strong&gt; is presented to you with love by the &lt;a href="https://github.com/pytorch/ignite/graphs/contributors"&gt;PyTorch community&lt;/a&gt;!&lt;/p&gt;


&lt;h1&gt;
  
  
  &lt;a href="https://github.com/PyTorchLightning/pytorch-lightning"&gt;PyTorch Lightning&lt;/a&gt;
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--OWcQn7Bz--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/600/1%2AADiqqoQmTkvs3l2mkJ6ErQ.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--OWcQn7Bz--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/600/1%2AADiqqoQmTkvs3l2mkJ6ErQ.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Philosophy&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/PyTorchLightning/pytorch-lightning"&gt;PyTorch Lightning&lt;/a&gt; is a very lightweight wrapper on PyTorch which is more like a &lt;strong&gt;coding standard than a framework&lt;/strong&gt;. The format allows you to get rid of a ton of boilerplate code while &lt;strong&gt;keeping it easy to follow&lt;/strong&gt;.&lt;br&gt;
The use of hooks, standard across every part of the training, means you can override any part of the internal functionality, down to how the backward pass is done, which makes it &lt;strong&gt;extremely flexible&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The result is a framework that &lt;strong&gt;gives researchers, students, and production teams the ultimate flexibility&lt;/strong&gt; to try crazy ideas without having to learn yet another framework while automating away all the engineering details.&lt;/p&gt;

&lt;p&gt;Lightning has two additional, more ambitious motivations: &lt;strong&gt;reproducibility of research and democratization of best practices&lt;/strong&gt; in the deep learning community.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Notable features&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Train on CPU, GPU or TPUs without changing your code!&lt;/li&gt;
&lt;li&gt;Only library to support TPU training (Trainer(num_tpu_cores=8))&lt;/li&gt;
&lt;li&gt;Trivial multi-node training&lt;/li&gt;
&lt;li&gt;Trivial multi-GPU training&lt;/li&gt;
&lt;li&gt;Trivial 16-bit precision support&lt;/li&gt;
&lt;li&gt;Built-in performance profiler (Trainer(profiler=True))&lt;/li&gt;
&lt;li&gt;Tons of integrations with libraries like tensorboard, comet.ml, neptune.ai, etc… (Trainer(logger=NeptuneLogger(...)))&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Team&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Lightning has 90+ contributors and a &lt;a href="https://pytorch-lightning.readthedocs.io/en/latest/governance.html"&gt;core team of 8 contributors&lt;/a&gt; who make sure the project moves forward lightning fast.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Documentation&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://pytorch-lightning.readthedocs.io/en/latest/"&gt;Lightning documentation&lt;/a&gt; is extremely thorough yet simple and easy to use.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;API&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;At the core, Lightning has an API that &lt;strong&gt;centers around two objects, the Trainer and the LightningModule&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The Trainer abstracts away all the engineering details and the LightningModule captures all the science/research code. This decoupling makes the research code more readable and allows it to run on arbitrary hardware.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--jaTg_eaE--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/600/1%2AofLdA4IQ6TnQ3owo5wSzdg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--jaTg_eaE--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/600/1%2AofLdA4IQ6TnQ3owo5wSzdg.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;
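
&lt;p&gt;The split between the two objects can be sketched without any deep learning library. In the toy example below, a hypothetical &lt;code&gt;TinyModule&lt;/code&gt; holds the "science" (a one-parameter model, its loss and its data) and a hypothetical &lt;code&gt;TinyTrainer&lt;/code&gt; owns the loop and calls the module's hooks. The hook names mirror Lightning's, but both classes are assumptions written for illustration and are not Lightning's implementation.&lt;/p&gt;

```python
class TinyModule:
    """Research code: a one-parameter model y = w * x."""

    def __init__(self):
        self.w = 0.0

    def training_step(self, batch, batch_idx):
        # Squared-error loss and its gradient with respect to w.
        x, y = batch
        error = self.w * x - y
        return {"loss": error ** 2, "grad": 2 * error * x}

    def configure_optimizers(self):
        # Stand-in for an optimizer: just a learning rate for plain SGD.
        return 0.05

    def train_dataloader(self):
        # Samples of the target function y = 3 * x.
        return [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]


class TinyTrainer:
    """Engineering code: owns the loop and calls the module's hooks."""

    def __init__(self, max_epochs):
        self.max_epochs = max_epochs

    def fit(self, module):
        lr = module.configure_optimizers()
        for _ in range(self.max_epochs):
            for batch_idx, batch in enumerate(module.train_dataloader()):
                out = module.training_step(batch, batch_idx)
                # The trainer, not the module, applies the update.
                module.w -= lr * out["grad"]
        return module


model = TinyTrainer(max_epochs=20).fit(TinyModule())
```

&lt;p&gt;Because the loop lives in the trainer, the module never mentions devices, epochs or logging; swapping in a trainer that runs on different hardware would leave the module unchanged.&lt;/p&gt;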

&lt;p&gt;&lt;strong&gt;LightningModule&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;All the &lt;strong&gt;research logic&lt;/strong&gt; goes into LightningModule.&lt;/p&gt;

&lt;p&gt;For example, in a cancer detection system, this part would handle the main things like the object detection model, data loaders for medical images etc.&lt;/p&gt;

&lt;p&gt;It groups the core &lt;strong&gt;ingredients you need to build a deep learning system&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The computations (init, forward).&lt;/li&gt;
&lt;li&gt;What happens in the training loop (training_step).&lt;/li&gt;
&lt;li&gt;What happens in the validation loop (validation_step).&lt;/li&gt;
&lt;li&gt;What happens in the testing loop (test_step).&lt;/li&gt;
&lt;li&gt;The optimizer(s) to use (configure_optimizers).&lt;/li&gt;
&lt;li&gt;The data to use (train, test, val dataloaders).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let’s take a look at the example from the docs and unpack what is happening there.&lt;br&gt;
&lt;/p&gt;

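&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;os&lt;/span&gt;

&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;torch&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;torch.nn.functional&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;F&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;torch.utils.data&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DataLoader&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;torchvision&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;transforms&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;torchvision.datasets&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;MNIST&lt;/span&gt;

&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;pytorch_lightning&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pl&lt;/span&gt;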


&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;MNISTExample&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pl&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;LightningModule&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nb"&gt;super&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="n"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="c1"&gt;# not the best model...
&lt;/span&gt;        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;l1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Linear&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;28&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;28&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;forward&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;relu&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;l1&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;view&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;training_step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;batch&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;batch_idx&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# REQUIRED
&lt;/span&gt;        &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;batch&lt;/span&gt;
        &lt;span class="n"&gt;y_hat&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;forward&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;loss&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;F&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cross_entropy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_hat&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;tensorboard_logs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;'train_loss'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;loss&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;'loss'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;loss&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'log'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tensorboard_logs&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;validation_step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;batch&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;batch_idx&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# OPTIONAL
&lt;/span&gt;        &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;batch&lt;/span&gt;
        &lt;span class="n"&gt;y_hat&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;forward&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;'val_loss'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;F&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cross_entropy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_hat&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;validation_end&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;outputs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# OPTIONAL
&lt;/span&gt;        &lt;span class="n"&gt;avg_loss&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stack&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'val_loss'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
                                &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;outputs&lt;/span&gt;&lt;span class="p"&gt;]).&lt;/span&gt;&lt;span class="n"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;tensorboard_logs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;'val_loss'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;avg_loss&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;'avg_val_loss'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;avg_loss&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'log'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tensorboard_logs&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;batch&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;batch_idx&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# OPTIONAL
&lt;/span&gt;        &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;batch&lt;/span&gt;
        &lt;span class="n"&gt;y_hat&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;forward&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;'test_loss'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;F&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cross_entropy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_hat&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_end&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;outputs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# OPTIONAL
&lt;/span&gt;        &lt;span class="n"&gt;avg_loss&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stack&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'test_loss'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
                                &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;outputs&lt;/span&gt;&lt;span class="p"&gt;]).&lt;/span&gt;&lt;span class="n"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;tensorboard_logs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;'test_loss'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;avg_loss&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;'avg_test_loss'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;avg_loss&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'log'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tensorboard_logs&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;configure_optimizers&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# REQUIRED
&lt;/span&gt;        &lt;span class="c1"&gt;# can return multiple optimizers
&lt;/span&gt;        &lt;span class="c1"&gt;# and learning_rate schedulers
&lt;/span&gt;        &lt;span class="c1"&gt;# (LBFGS is automatically supported,
&lt;/span&gt;        &lt;span class="c1"&gt;# no need for closure function)
&lt;/span&gt;        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;optim&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Adam&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;parameters&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;lr&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.02&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="n"&gt;pl&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data_loader&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;train_dataloader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# REQUIRED
&lt;/span&gt;        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;DataLoader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;MNIST&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;getcwd&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;train&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;download&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                  &lt;span class="n"&gt;transform&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;transforms&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ToTensor&lt;/span&gt;&lt;span class="p"&gt;()),&lt;/span&gt; &lt;span class="n"&gt;batch_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="n"&gt;pl&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data_loader&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;val_dataloader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# OPTIONAL
&lt;/span&gt;        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;DataLoader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;MNIST&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;getcwd&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;train&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;download&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                  &lt;span class="n"&gt;transform&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;transforms&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ToTensor&lt;/span&gt;&lt;span class="p"&gt;()),&lt;/span&gt; &lt;span class="n"&gt;batch_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="n"&gt;pl&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data_loader&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_dataloader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# OPTIONAL
&lt;/span&gt;        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;DataLoader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;MNIST&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;getcwd&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;train&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;download&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                  &lt;span class="n"&gt;transform&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;transforms&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ToTensor&lt;/span&gt;&lt;span class="p"&gt;()),&lt;/span&gt; &lt;span class="n"&gt;batch_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;As you can see, the LightningModule &lt;strong&gt;builds on top of pure PyTorch code&lt;/strong&gt; and simply organizes it into nine methods:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;__init__()&lt;/strong&gt;: Defines our model or multiple models, and initializes the weights.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;forward()&lt;/strong&gt;:  You can think of it as your standard PyTorch forward method but with additional flexibility to define what you want to happen at the prediction/inference level.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;training_step()&lt;/strong&gt;: Defines what happens in the training loop. It combines a forward pass, loss calculation, and any other logic you want to execute during training.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;validation_step()&lt;/strong&gt;: Defines what happens in the validation loop. For example, you can calculate the loss or accuracy for each batch and store them in the logs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;validation_end()&lt;/strong&gt;: Everything that you want to happen after the validation loop ends. For example, you may want to calculate the average loss or accuracy over the validation batches.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;test_step()&lt;/strong&gt;: What you want to happen to each batch at inference time. You can put your Test Time Augmentation logic or other things here.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;test_end()&lt;/strong&gt;: Similarly to validation_end, you can use it to aggregate the batch results calculated during test_step&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;configure_optimizers()&lt;/strong&gt;: initialize an optimizer or multiple optimizers&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;train/val/test_dataloader()&lt;/strong&gt;: Return your PyTorch DataLoaders for the train, validation, and test sets.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Since every PyTorch Lightning system needs to implement those methods, it is really easy to see exactly what is happening in the research.&lt;/p&gt;

&lt;p&gt;For example, to understand what a paper is doing, all you have to do is look at the training_step of the LightningModule!&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;This readability and the close mapping between the core research concepts and their implementation lie at the core of Lightning.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Trainer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is &lt;strong&gt;where the engineering part of deep learning happens&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;In the cancer detection system, this might mean how many GPUs you use, when you save checkpoints, when you stop training, etc. These details make up a lot of the “secret sauce” of research, yet they are standard best practices across deep learning projects (i.e., not hugely relevant to cancer detection).&lt;/p&gt;

&lt;p&gt;Notice that the LightningModule has nothing about GPUs or 16-bit precision or early stopping or logging or anything like that. &lt;strong&gt;All of that is automatically handled by the trainer&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;pytorch_lightning&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Trainer&lt;/span&gt;

&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;MNISTExample&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# most basic trainer, uses good defaults
&lt;/span&gt;&lt;span class="n"&gt;trainer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Trainer&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;    
&lt;span class="n"&gt;trainer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;That’s all it takes to train this model! The trainer handles everything for you including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Early stopping&lt;/li&gt;
&lt;li&gt;Automatic logging to Tensorboard (or comet, mlflow, neptune, etc…)&lt;/li&gt;
&lt;li&gt;Auto checkpointing&lt;/li&gt;
&lt;li&gt;And more (we’ll talk about that in the next sections)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All of this is free out of the box!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The learning curve&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Since the LightningModule simply reorganizes pure PyTorch objects and everything is “out in the open”, it is &lt;strong&gt;trivial to refactor your PyTorch code to the Lightning format&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;For more information about making the switch from pure PyTorch to Lightning read &lt;a href="https://towardsdatascience.com/how-to-refactor-your-pytorch-code-to-get-these-42-benefits-of-pytorch-lighting-6fdd0dc97538"&gt;this article&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Built-in features (what you get out of the box)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Lightning gives &lt;strong&gt;a ton of advanced features out-of-the-box.&lt;/strong&gt;&lt;br&gt;
For instance, it takes a one-liner to use things like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Multi-gpu training
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;Trainer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;gpus&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;ul&gt;
&lt;li&gt;TPU training
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;Trainer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;num_tpu_cores&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;ul&gt;
&lt;li&gt;Multi-node training
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;Trainer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;gpus&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;num_nodes&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;distributed_backend&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'ddp'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;ul&gt;
&lt;li&gt;Gradient Clipping
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;Trainer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;gradient_clip_val&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;2.0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;ul&gt;
&lt;li&gt;Accumulated Gradients
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;Trainer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;accumulate_grad_batches&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;ul&gt;
&lt;li&gt;16-bit precision
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;Trainer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;use_amp&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;ul&gt;
&lt;li&gt;Truncated back-propagation through time
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;Trainer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;truncated_bptt_steps&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;ul&gt;
&lt;li&gt;and a lot more.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you would like to see the full list of &lt;a href="https://pytorch-lightning.readthedocs.io/en/latest/"&gt;free-magic features go here&lt;/a&gt;.&lt;/p&gt;
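&lt;p&gt;To make one of these flags more concrete, here is a minimal pure-Python sketch of norm-based gradient clipping, which is the kind of operation a setting like gradient_clip_val controls. It assumes clipping by the global L2 norm; the helper name clip_by_global_norm is hypothetical and this is not Lightning's own code.&lt;/p&gt;

```python
import math


def clip_by_global_norm(grads, max_norm):
    """Rescale a flat list of gradient values so their L2 norm is at most max_norm.

    Hypothetical helper for illustration only.
    """
    total_norm = math.sqrt(sum(g * g for g in grads))
    # The scale stays at 1.0 when the norm is already within the limit.
    scale = min(1.0, max_norm / (total_norm + 1e-6))
    return [g * scale for g in grads], total_norm


# A gradient with norm 5.0 is scaled down to a norm of (almost exactly) 2.0.
clipped, norm_before = clip_by_global_norm([3.0, 4.0], max_norm=2.0)
norm_after = math.sqrt(sum(g * g for g in clipped))

# A gradient that is already small is left untouched.
small, _ = clip_by_global_norm([0.3, 0.4], max_norm=2.0)
```

&lt;p&gt;The direction of the gradient is preserved; only its length is capped, which is why clipping is often used to tame occasional exploding updates.&lt;/p&gt;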

&lt;p&gt;&lt;strong&gt;Extension capabilities / Simplicity of integration in research&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Having a bunch of built-in functionality is great, but &lt;strong&gt;for researchers, it’s crucial to not have to learn yet another library, and to directly control key parts of research&lt;/strong&gt; such as data processing without having other abstractions operate on them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;This flexible format allows for the most freedom in training and validating.&lt;/strong&gt; This interface should be thought of as a system, not as a model. The system might have multiple models (GANs, seq-2-seq, etc…) or just one model, such as this simple MNIST example.&lt;/p&gt;

&lt;p&gt;Thus researchers are &lt;strong&gt;free to try as many crazy things as they want&lt;/strong&gt;, and ONLY have to worry about the LightningModule.&lt;/p&gt;

&lt;p&gt;But maybe you need even MORE flexibility. In this case, you can do things like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Change how the backward step is done.&lt;/li&gt;
&lt;li&gt;Change how 16-bit is initialized.&lt;/li&gt;
&lt;li&gt;Add your own way of doing distributed training.&lt;/li&gt;
&lt;li&gt;Add Learning rate schedulers.&lt;/li&gt;
&lt;li&gt;Use multiple optimizers.&lt;/li&gt;
&lt;li&gt;Change the frequency of optimizer updates.&lt;/li&gt;
&lt;li&gt;And many many more things.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Under the hood, &lt;strong&gt;everything in Lightning is implemented as hooks that can be overridden by the user&lt;/strong&gt;. This makes EVERY single aspect of training highly configurable — which is exactly the flexibility a research or production team needs.&lt;/p&gt;
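&lt;p&gt;The hook idea can be sketched in plain Python. This is only an illustration of the pattern, not Lightning's implementation: the class names BaseSystem, TinyTrainer, and MySystem, and the log argument, are hypothetical.&lt;/p&gt;

```python
class BaseSystem:
    """Hypothetical base class: every step of the loop is an overridable hook."""

    def training_step(self, batch):
        raise NotImplementedError

    def backward(self, loss, log):
        # Default behaviour; a subclass may replace it.
        log.append(f"default backward on loss={loss}")

    def on_epoch_end(self, log):
        # Default hook does nothing.
        pass


class TinyTrainer:
    """Hypothetical loop that only talks to the system through its hooks."""

    def fit(self, system, batches):
        log = []
        for batch in batches:
            loss = system.training_step(batch)
            system.backward(loss, log)
        system.on_epoch_end(log)
        return log


class MySystem(BaseSystem):
    def training_step(self, batch):
        return sum(batch)

    def backward(self, loss, log):
        # Custom backward pass: scale the loss before "backpropagating".
        log.append(f"custom backward on scaled loss={loss * 0.5}")


log = TinyTrainer().fit(MySystem(), [[1, 2], [3, 4]])
```

&lt;p&gt;The loop never changes; behaviour is customized purely by overriding methods, and anything not overridden falls back to the default.&lt;/p&gt;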

&lt;p&gt;But wait, you say… this is too simple for your use case? No worries, Lightning was designed during my PhD research at NYU and Facebook AI Research to be as flexible as possible for researchers.&lt;/p&gt;

&lt;p&gt;Here are some examples:&lt;/p&gt;

&lt;p&gt;Need your &lt;strong&gt;own backward pass&lt;/strong&gt;? Override this hook:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;backward&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;use_amp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;loss&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;optimizer&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;use_amp&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;amp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scale_loss&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;loss&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;optimizer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;scaled_loss&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;scaled_loss&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;backward&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;loss&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;backward&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Need &lt;strong&gt;your own amp init&lt;/strong&gt;? Override this hook:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;configure_apex&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;amp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;optimizers&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;amp_level&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;optimizers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;amp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;initialize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;optimizers&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;opt_level&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;amp_level&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;optimizers&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Want to go as deep as adding your own DDP implementation? Override these two hooks:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;configure_ddp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;device_ids&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Lightning DDP simply routes to test_step, val_step, etc...
&lt;/span&gt;    &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;LightningDistributedDataParallel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;device_ids&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;device_ids&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;find_unused_parameters&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;init_ddp_connection&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# use slurm job id for the port number
&lt;/span&gt;    &lt;span class="c1"&gt;# guarantees unique ports across jobs from same grid search
&lt;/span&gt;    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# use the last 4 numbers in the job id as the id
&lt;/span&gt;        &lt;span class="n"&gt;default_port&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'SLURM_JOB_ID'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;default_port&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;default_port&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;:]&lt;/span&gt;

        &lt;span class="c1"&gt;# all ports should be in the 10k+ range
&lt;/span&gt;        &lt;span class="n"&gt;default_port&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;default_port&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;15000&lt;/span&gt;

    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;default_port&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;12910&lt;/span&gt;

    &lt;span class="c1"&gt;# if user gave a port number, use that one instead
&lt;/span&gt;    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;default_port&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'MASTER_PORT'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'MASTER_PORT'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;default_port&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# figure out the root node addr
&lt;/span&gt;    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;root_node&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'SLURM_NODELIST'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;' '&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;root_node&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;'127.0.0.2'&lt;/span&gt;

    &lt;span class="n"&gt;root_node&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;trainer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;resolve_root_node_address&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;root_node&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'MASTER_ADDR'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;root_node&lt;/span&gt;
    &lt;span class="n"&gt;dist&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;init_process_group&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="s"&gt;'nccl'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;rank&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;proc_rank&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;world_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;world_size&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
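&lt;p&gt;The port-selection logic in the hook above can be isolated into a small function that is easy to test without a cluster. This is a sketch under assumptions: the function name pick_master_port is hypothetical, and it takes a plain dictionary instead of reading os.environ directly.&lt;/p&gt;

```python
def pick_master_port(environ, fallback=12910):
    """Choose a rendezvous port from a dict of environment variables.

    Hypothetical helper mirroring the precedence used in the hook above.
    """
    # An explicit MASTER_PORT always wins.
    if "MASTER_PORT" in environ:
        return int(environ["MASTER_PORT"])
    job_id = environ.get("SLURM_JOB_ID", "")
    if job_id[-4:].isdigit():
        # Last four digits of the job id, shifted into the 15000+ range.
        return int(job_id[-4:]) + 15000
    # Neither variable is usable: fall back to a fixed port.
    return fallback


port_from_job = pick_master_port({"SLURM_JOB_ID": "8812345"})
port_default = pick_master_port({})
port_explicit = pick_master_port({"MASTER_PORT": "29500", "SLURM_JOB_ID": "8812345"})
```

&lt;p&gt;Deriving the port from the job id means that two jobs from the same grid search that land on the same node are unlikely to collide.&lt;/p&gt;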








&lt;p&gt;There are dozens of hooks like these, and we add more as researchers request them.&lt;/p&gt;

&lt;p&gt;The bottom line is that &lt;strong&gt;Lightning is trivial to use for a new user and infinitely extensible if you’re a researcher or production team&lt;/strong&gt; working with the bleeding-edge AI research.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Readability and moving towards Reproducibility&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;As I mentioned, Lightning was created with a second, broader and more ambitious motivation: reproducibility. While true reproducibility requires standard code, standard seeds, standard hardware, etc., Lightning contributes to reproducible research in two ways:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
to &lt;strong&gt;standardize the format of the ML code&lt;/strong&gt;, and&lt;/li&gt;
&lt;li&gt;
to &lt;strong&gt;decouple the engineering from the science&lt;/strong&gt; so that the approach can be tested in different systems.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The result is an expressive, powerful API for doing research.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If every research project and paper was implemented using the LightningModule template, it would be very easy to find out what’s going on (but perhaps not easy to understand haha)&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Distributed training&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Lightning &lt;strong&gt;makes multi-GPU or even multi-GPU multi-node training trivial.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For instance, if you want to train the above example on multiple GPUs just add the following flags to the trainer:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;trainer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Trainer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;gpus&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;distributed_backend&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'dp'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;    
&lt;span class="n"&gt;trainer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Using the above flags will run this model on 4 GPUs.&lt;br&gt;
If you want to run on say 16 GPUs, where you have 4 machines each with 4 GPUs, change the trainer flags to this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;trainer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Trainer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;gpus&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;num_nodes&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;distributed_backend&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'ddp'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;trainer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;And submit the following SLURM job:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;#!/bin/bash -l
&lt;/span&gt;
&lt;span class="c1"&gt;# SLURM SUBMIT SCRIPT
#SBATCH --nodes=4
#SBATCH --gres=gpu:4
#SBATCH --ntasks-per-node=4
#SBATCH --mem=0
#SBATCH --time=0-02:00:00
&lt;/span&gt;
&lt;span class="c1"&gt;# activate conda env
&lt;/span&gt;&lt;span class="n"&gt;source&lt;/span&gt; &lt;span class="n"&gt;activate&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;

&lt;span class="c1"&gt;# -------------------------
# debugging flags (optional)
&lt;/span&gt; &lt;span class="n"&gt;export&lt;/span&gt; &lt;span class="n"&gt;NCCL_DEBUG&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;INFO&lt;/span&gt;
 &lt;span class="n"&gt;export&lt;/span&gt; &lt;span class="n"&gt;PYTHONFAULTHANDLER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;

&lt;span class="c1"&gt;# on your cluster you might need these:
# set the network interface
# export NCCL_SOCKET_IFNAME=^docker0,lo
&lt;/span&gt;
&lt;span class="c1"&gt;# might need the latest cuda
# module load NCCL/2.4.7-1-cuda.10.0
# -------------------------
&lt;/span&gt;
&lt;span class="c1"&gt;# run script from above
&lt;/span&gt;&lt;span class="n"&gt;srun&lt;/span&gt; &lt;span class="n"&gt;python3&lt;/span&gt; &lt;span class="n"&gt;mnist_example&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;py&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;This is crazy simple considering how much happens under the hood.&lt;/p&gt;
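&lt;p&gt;One piece of that bookkeeping is working out how many processes take part and which rank each one gets. The sketch below assumes the common convention of one process per GPU, with ranks assigned node by node; the function name global_rank is hypothetical and not part of Lightning's API.&lt;/p&gt;

```python
def global_rank(node_rank, local_rank, gpus_per_node):
    """Rank of a process across the whole job, assuming one process per GPU."""
    return node_rank * gpus_per_node + local_rank


num_nodes, gpus_per_node = 4, 4

# Total number of processes that join the process group.
world_size = num_nodes * gpus_per_node

# Every (node, GPU) pair maps to a distinct rank.
ranks = [
    global_rank(node, gpu, gpus_per_node)
    for node in range(num_nodes)
    for gpu in range(gpus_per_node)
]
last_rank = global_rank(3, 3, gpus_per_node)
```

&lt;p&gt;With 4 machines of 4 GPUs each, this gives a world size of 16, which matches the 16 tasks requested in the SLURM script (4 nodes with 4 tasks per node).&lt;/p&gt;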

&lt;p&gt;For more information about distributed training with PyTorch Lightning, read this article about &lt;a href="https://towardsdatascience.com/how-to-train-a-gan-on-128-gpus-using-pytorch-9a5b27a52c73"&gt;“How To Train A GAN On 128 GPUs Using PyTorch”&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Productionalization&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Lightning models can be easily deployed because they’re still simple PyTorch models under the hood. This means we can leverage all the engineering advancements from the PyTorch community on supporting deployment.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Popularity&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;PyTorch Lightning has over &lt;strong&gt;3800 stars on GitHub&lt;/strong&gt; and has recently hit &lt;strong&gt;110k downloads&lt;/strong&gt;.&lt;br&gt;
More importantly, the community is growing rapidly with &lt;strong&gt;over 90 contributors, many from the top AI labs in the world&lt;/strong&gt; adding new features daily.&lt;br&gt;
You can talk to us on &lt;a href="https://github.com/PyTorchLightning/pytorch-lightning/issues"&gt;GitHub&lt;/a&gt; or &lt;a href="https://pytorch-lightning.slack.com/join/shared_invite/enQtODU5ODIyNTUzODQwLTFkMDg5Mzc1MDBmNjEzMDgxOTVmYTdhYjA1MDdmODUyOTg2OGQ1ZWZkYTQzODhhNzdhZDA3YmNhMDhlMDY4YzQ"&gt;Slack&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When to use PyTorch Lightning&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Lightning is made for &lt;strong&gt;professional researchers and production teams working on cutting-edge research&lt;/strong&gt;. It’s great when you know what you need to do. This focus means it adds advanced features for people looking to test and build things very quickly without getting bogged down in the details.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;When not to use PyTorch Lightning&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Although Lightning is made for professional researchers and data scientists, newcomers can still benefit. We recommend that newcomers first build a simple MNIST system from scratch using pure PyTorch. This will show them how to set up a training loop, etc. Once they understand how that works and how the forward and backward passes work, they can move on to Lightning.&lt;/li&gt;
&lt;/ul&gt;




&lt;h1&gt;
  
  
  &lt;a href="https://github.com/pytorchbearer/torchbearer"&gt;Torchbearer&lt;/a&gt;
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--JLmjHS-D--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/600/1%2ApsxackvUNoQ4dR4QFjDFow.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--JLmjHS-D--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/600/1%2ApsxackvUNoQ4dR4QFjDFow.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Our part of the blog will be a little different from the others because &lt;strong&gt;&lt;a href="https://github.com/pytorchbearer/torchbearer"&gt;torchbearer&lt;/a&gt; is coming to an end&lt;/strong&gt; (sort of). In particular, we are joining the PyTorch-Lightning team. The move came about from a meeting with William Falcon at NeurIPS 2019, and was recently &lt;a href="https://medium.com/pytorch/pytorch-frameworks-unite-torchbearer-joins-pytorch-lightning-c588e1e68c98"&gt;announced on the PyTorch blog&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;So, instead of trying to sell you torchbearer, we thought we should write about what we did well, what we did wrong, and why we are moving to Lightning.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What we did well&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The lib got pretty popular and got to &lt;strong&gt;500+ stars on GitHub&lt;/strong&gt; which was far more than we had ever imagined.&lt;/li&gt;
&lt;li&gt;We became a &lt;strong&gt;part of the PyTorch ecosystem&lt;/strong&gt;. It was an important experience for us that allowed us to feel like a valued part of a wider community.&lt;/li&gt;
&lt;li&gt;We’ve built a comprehensive set of built-in callbacks and metrics. This was one of our key successes; &lt;strong&gt;a lot of powerful outcomes can be achieved in a single line of code with torchbearer&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;An important feature of torchbearer that &lt;strong&gt;enables extreme flexibility&lt;/strong&gt; is the state object. This is a mutable dictionary that houses all of the variables that are in use by the core training loop. By editing these variables in callbacks at different points in the loop, most highly complex outcomes can be achieved.&lt;/li&gt;
&lt;li&gt;It was always important to us that torchbearer had &lt;strong&gt;good documentation&lt;/strong&gt;. We focused on example-led docs that can be executed in your browser with Google Colab. The example library has been a success, giving quick information on the more powerful use cases of torchbearer.&lt;/li&gt;
&lt;li&gt;A final thing to note is that torchbearer has been &lt;strong&gt;used by both of us over the past two years for our PhD research&lt;/strong&gt;. We count this as a success because we have almost &lt;strong&gt;never had to change the torchbearer API&lt;/strong&gt; in order to prototype our ideas, even the ridiculous ones!&lt;/li&gt;
&lt;/ul&gt;
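
&lt;p&gt;To make the state object idea concrete, here is a minimal plain-Python sketch of a training loop that shares one mutable dictionary with its callbacks. It does not use the torchbearer API; run_loop, decay_lr, record_history and the dictionary keys are hypothetical names chosen only for illustration.&lt;/p&gt;

```python
# Plain-Python sketch of a shared, mutable state dictionary.
# This is NOT torchbearer code; every name here is hypothetical.

def decay_lr(state):
    # A callback may overwrite any entry of the shared state.
    state["lr"] = state["lr"] * 0.5


def record_history(state):
    # A later callback sees whatever earlier callbacks wrote.
    state["history"].append((state["epoch"], state["lr"]))


def run_loop(epochs, callbacks):
    # The loop owns the state and hands the same dict to every callback.
    state = {"epoch": 0, "lr": 0.1, "history": []}
    for epoch in range(epochs):
        state["epoch"] = epoch
        for callback in callbacks:
            callback(state)
    return state


final_state = run_loop(3, [decay_lr, record_history])
print(final_state["history"])
```

&lt;p&gt;Because every callback can write to every key, the order of the callbacks changes the result, which hints at why tracking changes to such a state becomes hard as more objects act on it.&lt;/p&gt;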

&lt;p&gt;&lt;strong&gt;What we did wrong&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The state object, which makes this library so flexible, is also problematic. The ability to access any part of the library from any other lends itself to abuse in the same way that global variables do. In particular, &lt;strong&gt;determining how and when a particular variable in the state object was changed is challenging&lt;/strong&gt; once more than one object is acting on it. Additionally, for state to be effective you need to know what each variable is and in which callbacks you can access it, so the &lt;strong&gt;learning curve is steep&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;By its nature, &lt;strong&gt;torchbearer does not lend itself to distributed training&lt;/strong&gt;, or even to some extent low precision training. Since every part of state is available at all times, how do you chunk this and distribute it across devices? PyTorch can deal with this in some way, in that torchbearer can be used when distributed, but it is unclear exactly what is happening to state at these times.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Changing the core training loop was non-trivial&lt;/strong&gt;. Torchbearer offers a way to completely write your own core loop, but you then have to manually write in callback points to keep all the built-in Torchbearer functionality working. Coupled with a lower standard of documentation compared to other aspects of the library, this made custom loops overly complicated and likely completely unknown to most users.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Managing an open-source project while working on our PhDs ended up being more difficult than expected&lt;/strong&gt;. As a result, some parts of the library were thoroughly tested and stable (since they were important for our PhD work), while others were under-developed and buggy.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;During our initial growth, we decided to dramatically change the core API&lt;/strong&gt;. This significantly improved Torchbearer, but also meant a lot of effort moving from one version to the next. It felt justified as we were still pre 1.0.0 stable release but it certainly contributed to some users choosing other libraries.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why are we joining PyTorch Lightning?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The first key reason for our willingness to move to Lightning is its popularity. With Lightning we &lt;strong&gt;become part of the fastest-growing PyTorch training library&lt;/strong&gt;, which has already eclipsed many of its competitors.&lt;/li&gt;
&lt;li&gt;The second key reason for our move, and a key part of the success of Lightning, is that &lt;strong&gt;it was built from the ground up to support distributed training and low precision&lt;/strong&gt;, both challenging to implement in torchbearer. These practical considerations made in the early stages of Lightning’s development are invaluable to the modern deep learning practitioner and &lt;strong&gt;would be challenging to retro-fit in torchbearer.&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;In addition, at Lightning &lt;strong&gt;we will be part of a larger team of core developers.&lt;/strong&gt; This will enable us to ensure greater stability and to support a broader range of use cases than is possible with just two developers as we have now.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Ultimately, we have always believed that the best way to move things forward would be to join efforts with another library. This is our chance to do that and help Lightning become the best training library for PyTorch.&lt;/p&gt;

&lt;h1&gt;
  
  
  (Subjective) Comparison and Final Thoughts
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--dJS7lOoq--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/450/1%2AIJ6dNQCyblH2bn1SzLbsxA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--dJS7lOoq--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/450/1%2AIJ6dNQCyblH2bn1SzLbsxA.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;At this point, I want to give a…&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;huge THANK YOU to all the authors!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Wow, this is a lot of first-hand info and I hope it will make it easier to choose the library that works for you.&lt;/p&gt;

&lt;p&gt;As I was working on this article with them and looking closer at what their libraries have to offer (and creating some Pull Requests), I gained &lt;strong&gt;my own personal perspective&lt;/strong&gt; that I want to share with you here.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/skorch-dev/skorch"&gt;&lt;strong&gt;Skorch&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you want an sklearn-like API then &lt;strong&gt;Skorch&lt;/strong&gt; is your lib. It is well tested and documented. It actually &lt;strong&gt;gives more flexibility than I had anticipated&lt;/strong&gt; before working on this article, which was a nice surprise. That said, the &lt;strong&gt;focus of this lib is not cutting edge research but rather production applications&lt;/strong&gt;. I feel that it really delivers on its promise and does exactly what it was built to do. I really respect tools/libs like that.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.fast.ai/"&gt;&lt;strong&gt;Fastai&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fastai&lt;/strong&gt; for a long time &lt;strong&gt;has been a great choice for people getting into deep learning&lt;/strong&gt;. It can get you state-of-the-art results in 10 lines of almost magical code. But there is another side to the library, perhaps lesser-known, that lets you access &lt;strong&gt;lower-level APIs&lt;/strong&gt; and create custom building blocks that &lt;strong&gt;give researchers and practitioners flexibility to implement very complex systems&lt;/strong&gt;. Maybe it was the uber-popular fastai deep learning course that created a false image of this library in my mind but I will definitely take it for a spin in the future, especially with the recent v2 pre-release.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://pytorch.org/ignite/"&gt;&lt;strong&gt;Pytorch Ignite&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ignite&lt;/strong&gt; is an interesting animal. Its &lt;strong&gt;engine, event and handler API&lt;/strong&gt; is &lt;strong&gt;a bit exotic&lt;/strong&gt; (for my personal taste), but with it you can &lt;strong&gt;do pretty much whatever you want&lt;/strong&gt;. It has a ton of features out-of-the-box and I definitely understand why many researchers use it in their daily work. It &lt;strong&gt;took me a moment to get familiar with the framework&lt;/strong&gt; but you just need to stop thinking in “callback terms” and you’ll be fine. That said, the API doesn’t speak to me as clearly as some other libs. You should check it out though, as it may be a great choice for you.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://catalyst-team.github.io/catalyst/"&gt;&lt;strong&gt;Catalyst&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Before looking into &lt;strong&gt;Catalyst&lt;/strong&gt; I thought it was a heavy(ish) framework for creating deep learning pipelines. Now my view is completely different. &lt;strong&gt;It decouples engineering stuff from research in a beautiful way&lt;/strong&gt;. Pure PyTorch objects go into a trainer that deals with the training. It is very flexible and has a separate module that deals with Reinforcement Learning. It also &lt;strong&gt;gives you a lot of features out-of-the-box when it comes to reproducibility, and serving models in production.&lt;/strong&gt; And those multistage pipelines I told you about? You can easily create them with minimal overhead. Overall I think it is a &lt;strong&gt;great project and a lot of people out there could benefit from using it.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/PyTorchLightning/pytorch-lightning"&gt;&lt;strong&gt;Pytorch Lightning&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lightning&lt;/strong&gt; also wants to separate science from engineering and I think it does a great job at that. There are just a ton of in-built features that make it even more appealing.&lt;br&gt;
But something that makes this library a bit different is that it enables &lt;strong&gt;reproducibility by making deep learning research implementations readable&lt;/strong&gt;. It is really easy to follow the logic inside of the LightningModule where the training step (among other things) is not abstracted away. I think communicating research projects in this way can be extremely effective. &lt;strong&gt;It is getting very popular very quickly&lt;/strong&gt; and with &lt;strong&gt;authors of Torchbearer joining the core developer team&lt;/strong&gt; I think that &lt;strong&gt;this project has a bright future&lt;/strong&gt; in front of it, Lightning bright even 🙂&lt;/p&gt;

&lt;p&gt;So which one should you choose?&lt;br&gt;
As always it depends but I think you now have enough information to make a good decision!&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This article was originally &lt;a href="https://neptune.ai/blog/model-training-libraries-pytorch-ecosystem?utm_source=devto&amp;amp;utm_medium=crosspost&amp;amp;utm_campaign=blog-model-training-libraries-pytorch-ecosystem"&gt;posted on neptune.ml/blog&lt;/a&gt; where you can find more in-depth articles for machine learning practitioners.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>python</category>
      <category>datascience</category>
    </item>
    <item>
      <title>Keras Metrics: Everything You Need To Know</title>
      <dc:creator>Jakub Czakon</dc:creator>
      <pubDate>Mon, 13 Apr 2020 07:58:50 +0000</pubDate>
      <link>https://dev.to/jakubczakon/keras-metrics-everything-you-need-to-know-5138</link>
      <guid>https://dev.to/jakubczakon/keras-metrics-everything-you-need-to-know-5138</guid>
      <description>&lt;p&gt;This article was originally posted by Derrick Mwiti on &lt;a href="https://neptune.ai/blog/keras-metrics?utm_source=devto&amp;amp;utm_medium=post&amp;amp;utm_campaign=blog-keras-metrics" rel="noopener noreferrer"&gt;neptune.ml/blog&lt;/a&gt; where you can find more in-depth articles for machine learning practitioners. &lt;/p&gt;




&lt;p&gt;Keras metrics are functions that are used to evaluate the performance of your deep learning model. Choosing a good metric for your problem is usually a difficult task.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;you need to understand &lt;strong&gt;which metrics are already available in Keras&lt;/strong&gt; and tf.keras and how to use them,&lt;/li&gt;
&lt;li&gt;in many situations you need to &lt;strong&gt;define your own custom metric&lt;/strong&gt; because the metric you are looking for doesn’t ship with Keras,&lt;/li&gt;
&lt;li&gt;sometimes you want to &lt;strong&gt;monitor model performance by looking at charts like ROC curve or Confusion Matrix&lt;/strong&gt; after every epoch.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Lucky for you, this article explains all that!&lt;/p&gt;

&lt;h1&gt;
  
  
  Keras metrics 101
&lt;/h1&gt;

&lt;p&gt;In Keras, metrics are passed during the compile stage as shown below. You can pass several metrics as a comma-separated list.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;keras&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;metrics&lt;/span&gt;

&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;compile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;loss&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;mean_squared_error&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;optimizer&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;sgd&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
              &lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mae&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                       &lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;categorical_accuracy&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;How should you choose those evaluation metrics?&lt;/p&gt;

&lt;p&gt;Some of them are available in Keras, others in tf.keras. Sometimes you need to implement your own custom metrics.&lt;/p&gt;

&lt;p&gt;Let’s go over all of those situations.&lt;/p&gt;

&lt;h1&gt;
  
  
  Which metrics are available in Keras?
&lt;/h1&gt;

&lt;p&gt;Keras provides a rich pool of inbuilt metrics. Depending on your problem, you’ll use different ones.&lt;/p&gt;

&lt;p&gt;Let’s look at some of the problems you may be working on.&lt;/p&gt;

&lt;h1&gt;
  
  
  Binary classification
&lt;/h1&gt;

&lt;p&gt;Binary classification metrics are used on computations that involve just two classes. A good example is building a deep learning &lt;strong&gt;model to tell cats from dogs&lt;/strong&gt;. We have two classes to predict and the threshold determines the point of separation between them. &lt;strong&gt;binary_accuracy&lt;/strong&gt; and &lt;strong&gt;accuracy&lt;/strong&gt; are two such functions in Keras.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;binary_accuracy&lt;/strong&gt;, for example, computes the mean accuracy rate across all predictions for binary classification problems.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;keras&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;binary_accuracy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_pred&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;threshold&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
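
&lt;p&gt;As a rough illustration of what the threshold does, the following plain-Python sketch mirrors the idea behind binary_accuracy without calling Keras. The helper and the sample values are made up for this example; predictions strictly above the threshold are treated as class 1.&lt;/p&gt;

```python
# Plain-Python sketch of thresholded binary accuracy (not the Keras source).

def binary_accuracy(y_true, y_pred, threshold=0.5):
    # Turn probabilities into hard 0/1 predictions.
    predicted = [1 if p > threshold else 0 for p in y_pred]
    # Count how many hard predictions equal the true label.
    matches = [1 if t == p else 0 for t, p in zip(y_true, predicted)]
    return sum(matches) / len(matches)


y_true = [1, 0, 1, 1]
y_pred = [0.9, 0.2, 0.4, 0.7]  # made-up predicted probabilities

score = binary_accuracy(y_true, y_pred)
lenient_score = binary_accuracy(y_true, y_pred, threshold=0.3)
print(score, lenient_score)
```

&lt;p&gt;Lowering the threshold to 0.3 flips the third prediction to class 1, so the same model output scores higher. The threshold is part of the metric, not of the model.&lt;/p&gt;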



&lt;p&gt;The &lt;strong&gt;accuracy&lt;/strong&gt; metric computes the accuracy rate across all predictions. y_true represents the true labels while y_pred represents the predicted ones.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;keras&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;accuracy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_pred&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A &lt;strong&gt;confusion matrix&lt;/strong&gt; is a table showing the true positives, true negatives, false positives, and false negatives. keras.metrics does not ship a confusion_matrix function; one option is the scikit-learn implementation, which expects class labels rather than predicted probabilities.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.metrics&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;confusion_matrix&lt;/span&gt;

&lt;span class="nf"&gt;confusion_matrix&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_pred&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi2.wp.com%2Fneptune.ai%2Fwp-content%2Fuploads%2Fconfusion_matrix.png%3Ffit%3D289%252C90%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi2.wp.com%2Fneptune.ai%2Fwp-content%2Fuploads%2Fconfusion_matrix.png%3Ffit%3D289%252C90%26ssl%3D1" alt="image"&gt;&lt;/a&gt;&lt;br&gt;
In the above confusion matrix, the model made 3305 + 375 correct predictions and 106 + 714 wrong predictions.&lt;/p&gt;
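
&lt;p&gt;The counts quoted above are enough to recover the overall accuracy. A short worked calculation, using only the four numbers from the text:&lt;/p&gt;

```python
# Overall accuracy from the confusion matrix counts quoted in the text.
correct = 3305 + 375   # predictions on the diagonal
wrong = 106 + 714      # predictions off the diagonal
total = correct + wrong

accuracy = correct / total
print(total, round(accuracy, 4))
```

&lt;p&gt;So the model is right on roughly 82% of the 4,500 examples.&lt;/p&gt;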

&lt;p&gt;You can also visualize it as a matplotlib chart which we will cover later.&lt;/p&gt;

&lt;h1&gt;
  
  
  Multiclass classification
&lt;/h1&gt;

&lt;p&gt;These metrics are used for classification &lt;strong&gt;problems involving more than two classes&lt;/strong&gt;. Extending our animal classification example you can have three animals, cats, dogs, and bears. Since we are classifying more than two animals, this is a multiclass classification problem.&lt;/p&gt;

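&lt;p&gt;With integer-encoded labels, the shape of y_true is the number of entries by 1, that is (n, 1), while the shape of y_pred is the number of entries by the number of classes, (n, c). With one-hot encoded labels, which is what categorical_accuracy expects, y_true has the same (n, c) shape as y_pred.&lt;/p&gt;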

&lt;p&gt;&lt;strong&gt;categorical_accuracy&lt;/strong&gt; metric computes the mean accuracy rate across all predictions.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;keras&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;categorical_accuracy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_pred&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;sparse_categorical_accuracy&lt;/strong&gt; is similar to categorical_accuracy but is used &lt;strong&gt;when the targets are sparse&lt;/strong&gt;, meaning integer class indices instead of one-hot vectors. A great example of this is working with text in deep learning problems such as word2vec. In this case, one works with &lt;strong&gt;thousands of classes&lt;/strong&gt; with the aim of predicting the next word. One-hot encoding y_true here would produce a huge matrix that is almost all zeros, so storing a single class index per entry is far more economical.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;keras&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sparse_categorical_accuracy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_pred&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
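
&lt;p&gt;The difference between the two metrics is only in how the labels are encoded. The plain-Python sketch below (helper names and data are illustrative, not Keras internals) scores the same predictions against one-hot labels and against integer labels.&lt;/p&gt;

```python
# Plain-Python sketch: same predictions, two label encodings.

def argmax(row):
    # Index of the largest value in a list.
    return max(range(len(row)), key=lambda i: row[i])


def categorical_accuracy(y_true_one_hot, y_pred):
    # Labels are one-hot rows, so take the argmax of both sides.
    hits = [argmax(t) == argmax(p) for t, p in zip(y_true_one_hot, y_pred)]
    return sum(hits) / len(hits)


def sparse_categorical_accuracy(y_true_ids, y_pred):
    # Labels are already class indices, so only the prediction needs argmax.
    hits = [t == argmax(p) for t, p in zip(y_true_ids, y_pred)]
    return sum(hits) / len(hits)


y_pred = [[0.1, 0.7, 0.2], [0.6, 0.3, 0.1], [0.2, 0.2, 0.6]]
one_hot_labels = [[0, 1, 0], [0, 0, 1], [0, 0, 1]]
integer_labels = [1, 2, 2]

dense_score = categorical_accuracy(one_hot_labels, y_pred)
sparse_score = sparse_categorical_accuracy(integer_labels, y_pred)
print(dense_score, sparse_score)
```

&lt;p&gt;Both calls give the same score. The sparse variant simply avoids materialising the one-hot matrix.&lt;/p&gt;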



&lt;p&gt;&lt;strong&gt;top_k_categorical_accuracy&lt;/strong&gt; computes the top-k categorical accuracy rate. We take the k classes with the highest predicted scores and check whether the correct class is among them. If it is, we count the prediction as correct.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;keras&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;top_k_categorical_accuracy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_pred&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
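
&lt;p&gt;A small plain-Python sketch of the top-k idea follows. It takes integer class labels for brevity, which is an assumption of this example (the Keras function above is given one-hot y_true), and the data is made up.&lt;/p&gt;

```python
# Plain-Python sketch of top-k accuracy with integer labels (illustrative only).

def top_k_accuracy(y_true_ids, y_pred, k=2):
    hits = 0
    for true_id, scores in zip(y_true_ids, y_pred):
        # Class indices ordered from highest to lowest score.
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        # A hit means the true class is among the k best-scored classes.
        if true_id in ranked[:k]:
            hits += 1
    return hits / len(y_true_ids)


y_pred = [[0.1, 0.7, 0.2], [0.6, 0.3, 0.1], [0.2, 0.2, 0.6]]
y_true_ids = [1, 1, 2]

top1 = top_k_accuracy(y_true_ids, y_pred, k=1)
top2 = top_k_accuracy(y_true_ids, y_pred, k=2)
print(top1, top2)
```

&lt;p&gt;The second example is a miss at k=1 but a hit at k=2, because its true class has the second-highest score.&lt;/p&gt;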



&lt;h1&gt;
  
  
  Regression
&lt;/h1&gt;

&lt;p&gt;The metrics used in regression problems include Mean Squared Error, Mean Absolute Error, and Mean Absolute Percentage Error. These metrics are used when predicting numerical values such as sales and prices of houses. Check out this resource for a complete guide on regression metrics.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;keras&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;metrics&lt;/span&gt;

&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;compile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;loss&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;mse&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;optimizer&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;adam&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
              &lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mean_squared_error&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
                       &lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mean_absolute_error&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
                       &lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mean_absolute_percentage_error&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
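
&lt;p&gt;For intuition about what these three numbers measure, here is a plain-Python sketch of the formulas on a tiny made-up dataset. The percentage error is scaled by 100, matching how Keras reports it, and the sketch assumes no target is zero.&lt;/p&gt;

```python
# Plain-Python sketch of MSE, MAE and MAPE on made-up values.

def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_percentage_error(y_true, y_pred):
    # Expressed as a percentage; assumes every true value is non-zero.
    return 100 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [100.0, 200.0, 400.0]   # e.g. true prices
y_pred = [110.0, 180.0, 400.0]   # e.g. predicted prices

mse = mean_squared_error(y_true, y_pred)
mae = mean_absolute_error(y_true, y_pred)
mape = mean_absolute_percentage_error(y_true, y_pred)
print(round(mse, 2), mae, round(mape, 2))
```

&lt;p&gt;MSE punishes the larger miss (20) much more than the smaller one (10), while MAPE treats both as the same 10% relative error.&lt;/p&gt;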



&lt;h1&gt;
  
  
  How to create a custom metric in Keras?
&lt;/h1&gt;

&lt;p&gt;As we had mentioned earlier, Keras also allows you to define your own custom metrics.&lt;/p&gt;

&lt;p&gt;The function you define &lt;strong&gt;has to take y_true and y_pred as arguments and must return a single tensor value&lt;/strong&gt;. These objects are of type Tensor with float32 data type. The shape of the object is the number of rows by 1. For example, if you have 4,500 entries the shape will be (4500, 1).&lt;/p&gt;

&lt;p&gt;You can use the function by passing it at the compilation stage of your deep learning model.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;compile&lt;/span&gt;&lt;span class="p"&gt;(...,&lt;/span&gt; &lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;your_custom_metric&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  How to calculate F1 score in Keras (precision, and recall as a bonus)?
&lt;/h1&gt;

&lt;p&gt;Let’s see how you can compute the &lt;strong&gt;f1 score, precision and recall in Keras&lt;/strong&gt;. We will create it for the multiclass scenario but you can also use it for binary classification.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://neptune.ai/blog/evaluation-metrics-binary-classification#10?utm_source=devto&amp;amp;utm_medium=post&amp;amp;utm_campaign=blog-keras-metrics" rel="noopener noreferrer"&gt;f1 score is the harmonic mean of precision and recall&lt;/a&gt;. So to calculate f1 we need to create functions that calculate precision and recall first. Note that in the multiclass scenario you need to look at all classes, not just the positive class (which is the case for binary classification).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;keras&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;backend&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;K&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;recall&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_pred&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# True positives: label and rounded prediction are both 1&lt;/span&gt;
    &lt;span class="n"&gt;true_positives&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;K&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;K&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;K&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;clip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_true&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;y_pred&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;
    &lt;span class="n"&gt;all_positives&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;K&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;K&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;K&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;clip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;

    &lt;span class="n"&gt;recall&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;true_positives&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;all_positives&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;K&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;epsilon&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;recall&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;precision&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_pred&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;true_positives&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;K&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;K&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;K&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;clip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_true&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;y_pred&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;

    &lt;span class="n"&gt;predicted_positives&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;K&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;K&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;K&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;clip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_pred&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;
    &lt;span class="n"&gt;precision&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;true_positives&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;predicted_positives&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;K&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;epsilon&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;precision&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;f1_score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_pred&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Short local names avoid shadowing the precision and recall functions above&lt;/span&gt;
    &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;precision&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_pred&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;recall&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_pred&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;K&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;epsilon&lt;/span&gt;&lt;span class="p"&gt;()))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The next step is to use these functions at the compilation stage of our deep learning model. We are also adding the Keras accuracy metric that is available by default.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;compile&lt;/span&gt;&lt;span class="p"&gt;(...,&lt;/span&gt;&lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;accuracy&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;f1_score&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;precision&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;recall&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let’s now fit the model to the training set.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;epochs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now you can evaluate your model and access the metrics you have just created.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;loss&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
&lt;span class="n"&gt;accuracy&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
&lt;span class="n"&gt;f1_score&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;precision&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;recall&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;evaluate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x_test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;verbose&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Great, you now know how to create custom metrics in Keras.&lt;/p&gt;

&lt;p&gt;That said, sometimes you can use something that is already there, just in a different library like tf.keras 🙂&lt;/p&gt;

&lt;h1&gt;
  
  
  Which metrics are available in tf.keras?
&lt;/h1&gt;

&lt;p&gt;Recently &lt;strong&gt;Keras has become a standard API in TensorFlow&lt;/strong&gt; and there are a lot of useful metrics that you can use.&lt;/p&gt;

&lt;p&gt;Let’s look at some of them.&lt;br&gt;
Unlike in Keras, where you can simply call the functions in keras.metrics, in tf.keras you have to instantiate a Metric class. &lt;/p&gt;

&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;keras&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Accuracy&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;There is quite a bit of overlap between keras metrics and tf.keras.&lt;/strong&gt; However, there are some metrics that you can only find in tf.keras.&lt;/p&gt;

&lt;p&gt;Let’s take a look at those.&lt;/p&gt;

&lt;h1&gt;
  
  
  tf.keras Classification Metrics
&lt;/h1&gt;

&lt;p&gt;tf.keras.metrics.AUC computes the approximate AUC (area under the curve) of the ROC curve via a &lt;a href="https://www.khanacademy.org/math/ap-calculus-ab/ab-integration-new/ab-6-2/a/left-and-right-riemann-sums" rel="noopener noreferrer"&gt;Riemann sum&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;compile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;sgd&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;loss&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;mse&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;keras&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;AUC&lt;/span&gt;&lt;span class="p"&gt;()])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
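
&lt;p&gt;To make the Riemann-sum idea concrete, here is a minimal pure-Python sketch that approximates the ROC AUC by summing trapezoids between evenly spaced thresholds. The function name roc_auc_riemann and the num_thresholds argument are illustrative assumptions, not the tf.keras implementation, and the sketch assumes binary labels with both classes present.&lt;/p&gt;

```python
def roc_auc_riemann(y_true, y_score, num_thresholds=200):
    """Approximate ROC AUC by summing trapezoids between ROC points.

    Illustrative sketch only: assumes binary labels (0 or 1), scores in
    [0, 1], and at least one positive and one negative example.
    """
    positives = sum(y_true)
    negatives = len(y_true) - positives

    # Thresholds run from just above 1 down to just below 0, so the
    # resulting curve starts at (0, 0) and ends at (1, 1).
    inner = [1.0 - i / (num_thresholds - 1) for i in range(1, num_thresholds - 1)]
    thresholds = [1.0 + 1e-7] + inner + [-1e-7]

    # One (false positive rate, true positive rate) point per threshold.
    points = []
    for t in thresholds:
        tp = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s >= t)
        fp = sum(1 for y, s in zip(y_true, y_score) if y == 0 and s >= t)
        points.append((fp / negatives, tp / positives))

    # Riemann sum: width of each FPR step times the average TPR height.
    auc = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        auc += (x1 - x0) * (y0 + y1) / 2.0
    return auc


print(round(roc_auc_riemann([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]), 3))  # 0.75
```

&lt;p&gt;More thresholds give a finer approximation at the cost of more computation, which is the same trade-off you control when configuring the built-in metric.&lt;/p&gt;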



&lt;p&gt;The precision and recall metrics that we implemented by hand earlier are available out of the box in tf.keras.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;compile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;sgd&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;loss&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;mse&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
               &lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;keras&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Precision&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; 
                        &lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;keras&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Recall&lt;/span&gt;&lt;span class="p"&gt;()])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  tf.keras Segmentation Metrics
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;tf.keras.metrics.MeanIoU&lt;/strong&gt; – Mean Intersection-Over-Union is a metric used for the evaluation of semantic image segmentation models. The IoU is first calculated for each class, as shown below, and the per-class values are then averaged:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi0.wp.com%2Fneptune.ai%2Fwp-content%2Fuploads%2FIOU.png%3Fw%3D767%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi0.wp.com%2Fneptune.ai%2Fwp-content%2Fuploads%2FIOU.png%3Fw%3D767%26ssl%3D1" alt="image"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;compile&lt;/span&gt;&lt;span class="p"&gt;(...&lt;/span&gt; &lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;keras&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;MeanIoU&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;num_classes&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
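
&lt;p&gt;As a rough illustration of what the metric measures, the sketch below computes a per-class IoU and averages it in plain Python. The helper name mean_iou is hypothetical, labels are assumed to be flat lists of integer class ids, and classes that appear in neither list are skipped, which is an assumption of this sketch.&lt;/p&gt;

```python
def mean_iou(y_true, y_pred, num_classes):
    """Average the intersection-over-union across classes.

    Illustrative sketch only: y_true and y_pred are flat lists of integer
    class ids, one entry per pixel.
    """
    ious = []
    for c in range(num_classes):
        # Pixels where both the label and the prediction are class c.
        intersection = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        # Pixels where either the label or the prediction is class c.
        union = sum(1 for t, p in zip(y_true, y_pred) if t == c or p == c)
        if union:  # skip classes absent from both labels and predictions
            ious.append(intersection / union)
    return sum(ious) / len(ious) if ious else 0.0


print(round(mean_iou([0, 0, 1, 1], [0, 1, 0, 1], num_classes=2), 3))  # 0.333
```

&lt;p&gt;In this toy example each class has one correctly labelled pixel out of three pixels in its union, so both per-class IoUs are 1/3 and so is their mean.&lt;/p&gt;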



&lt;h1&gt;
  
  
  tf.keras Regression Metrics
&lt;/h1&gt;

&lt;p&gt;Just like Keras, tf.keras has similar regression metrics. We won’t dwell on them much, but there is one interesting metric worth highlighting called &lt;strong&gt;MeanRelativeError&lt;/strong&gt;.&lt;br&gt;&lt;br&gt;
&lt;strong&gt;MeanRelativeError&lt;/strong&gt; takes the absolute error for an observation and divides it by a constant. This constant, the &lt;strong&gt;normalizer&lt;/strong&gt;, can be the same for all observations or different for each sample.&lt;/p&gt;

&lt;p&gt;Therefore, the mean relative error is the average of the relative errors.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;keras&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;MeanRelativeError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;normalizer&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
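
&lt;p&gt;The arithmetic behind this metric is short enough to check by hand. The sketch below is a plain-Python stand-in (the function name mean_relative_error is an assumption, and zero normalizers are not handled) that divides each absolute error by its normalizer and averages the results.&lt;/p&gt;

```python
def mean_relative_error(y_true, y_pred, normalizer):
    """Average of |y_true - y_pred| / normalizer over all observations.

    Illustrative sketch only: assumes three equal-length lists and
    non-zero normalizer values.
    """
    relative_errors = [
        abs(t - p) / n for t, p, n in zip(y_true, y_pred, normalizer)
    ]
    return sum(relative_errors) / len(relative_errors)


# Absolute errors are [1, 1, 4, 5]; dividing by [1, 3, 2, 3] gives
# [1, 1/3, 2, 5/3], which sums to 5 and averages to 1.25.
print(round(mean_relative_error([1, 3, 2, 3], [2, 4, 6, 8], [1, 3, 2, 3]), 3))  # 1.25
```

&lt;p&gt;Using the true values as the normalizer, as in this example, turns the metric into an average percentage-style error.&lt;/p&gt;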



&lt;h1&gt;
  
  
  How to create a custom metric in tf.keras?
&lt;/h1&gt;

&lt;p&gt;In tf.keras you can create a custom metric by extending the tf.keras.metrics.Metric class.&lt;br&gt;
To do so you have to override the update_state, result, and reset_state methods:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;update_state()&lt;/strong&gt; does all the updates to state variables and calculates the metric,&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;result()&lt;/strong&gt; returns the value for the metric from state variables,&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;reset_state()&lt;/strong&gt; sets the metric value at the beginning of each epoch to a predefined constant (typically 0)
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;MulticlassTruePositives&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;keras&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Metric&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;multiclass_true_positives&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nf"&gt;super&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;MulticlassTruePositives&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;true_positives&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_weight&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;tp&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;initializer&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;zeros&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;update_state&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_pred&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sample_weight&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;y_pred&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reshape&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;argmax&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_pred&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="n"&gt;values&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cast&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;int32&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cast&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_pred&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;int32&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;values&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cast&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;float32&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;sample_weight&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;sample_weight&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cast&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sample_weight&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;float32&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;values&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;multiply&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sample_weight&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;true_positives&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;assign_add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reduce_sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;result&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;true_positives&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;reset_states&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# The state of the metric will be reset at the start of each epoch.
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;true_positives&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;assign&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Then we simply pass it at the compile stage:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;compile&lt;/span&gt;&lt;span class="p"&gt;(...,&lt;/span&gt;&lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;MulticlassTruePositives&lt;/span&gt;&lt;span class="p"&gt;()])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
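
&lt;p&gt;The reason for splitting the work across three methods is that the metric is fed one batch at a time and has to accumulate state between calls. The plain-Python stand-in below (the class name TruePositivesSketch is hypothetical and it does not depend on TensorFlow) mimics that lifecycle under the assumption that y_pred holds one list of class scores per sample.&lt;/p&gt;

```python
class TruePositivesSketch:
    """Plain-Python stand-in for a stateful metric (illustration only)."""

    def __init__(self):
        self.true_positives = 0.0

    def update_state(self, y_true, y_pred):
        # Pick the index of the highest score as the predicted class.
        predicted = [
            max(range(len(scores)), key=scores.__getitem__) for scores in y_pred
        ]
        # Accumulate correct predictions on top of earlier batches.
        self.true_positives += sum(
            1.0 for t, p in zip(y_true, predicted) if t == p
        )

    def result(self):
        return self.true_positives

    def reset_state(self):
        # Called between epochs so that counts do not leak across them.
        self.true_positives = 0.0


metric = TruePositivesSketch()
metric.update_state([0, 1], [[0.9, 0.1], [0.2, 0.8]])  # batch 1: two correct
metric.update_state([1], [[0.7, 0.3]])                 # batch 2: none correct
print(metric.result())  # 2.0
metric.reset_state()
print(metric.result())  # 0.0
```

&lt;p&gt;The real tf.keras class follows the same pattern, except that the running total lives in a variable created with add_weight so that it can be updated inside a compiled graph.&lt;/p&gt;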



&lt;h1&gt;
  
  
  Performance charts: ROC curve and Confusion Matrix in Keras
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;Sometimes the performance cannot be represented as one number&lt;/strong&gt; but rather as a performance chart. Examples of such charts are the ROC curve and the confusion matrix. In those cases, you may want to log those charts somewhere for further inspection.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;To do that, you need to create a callback&lt;/strong&gt; that will track the performance of your model at the end of every epoch. Then, you can review how the charts change over time in a folder or an experiment tracking tool.&lt;br&gt;
So let’s do that.&lt;/p&gt;

&lt;p&gt;First, we need a callback that creates the ROC curve and the confusion matrix at the end of each epoch.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;keras.callbacks&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Callback&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;matplotlib.pyplot&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;scikitplot.metrics&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;plot_confusion_matrix&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;plot_roc&lt;/span&gt;


&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;PerformanceVisualizationCallback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Callback&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;validation_data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;image_dir&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nf"&gt;super&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;validation_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;validation_data&lt;/span&gt;

        &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;makedirs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image_dir&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;exist_ok&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;image_dir&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;image_dir&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;on_epoch_end&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;epoch&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;logs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{}):&lt;/span&gt;
        &lt;span class="n"&gt;y_pred&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;asarray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;validation_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]))&lt;/span&gt;
        &lt;span class="n"&gt;y_true&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;validation_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;             
        &lt;span class="n"&gt;y_pred_class&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;argmax&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_pred&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# plot and save confusion matrix
&lt;/span&gt;        &lt;span class="n"&gt;fig&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ax&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;subplots&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;figsize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="nf"&gt;plot_confusion_matrix&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_pred_class&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;fig&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;savefig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;image_dir&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;confusion_matrix_epoch_&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;epoch&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

       &lt;span class="c1"&gt;# plot and save roc curve
&lt;/span&gt;        &lt;span class="n"&gt;fig&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ax&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;subplots&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;figsize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="nf"&gt;plot_roc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_pred&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;fig&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;savefig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;image_dir&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;roc_curve_epoch_&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;epoch&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now we simply pass it to the model.fit() callbacks argument.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;performance_cbk&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;PerformanceVisualizationCallback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                      &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                      &lt;span class="n"&gt;validation_data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;validation_data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                      &lt;span class="n"&gt;image_dir&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;performance_vizualizations&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;history&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;x_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;epochs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;validation_data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;validation_data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;callbacks&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;performance_cbk&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The callbacks argument takes a list, so you can pass several callbacks at once if you want to.&lt;/p&gt;

&lt;p&gt;Now you will be able to look at those visualizations as your model trains:&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/v9BQNgevRKE"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;h1&gt;
  
  
  &lt;strong&gt;Note:&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;If you want to log everything to an experiment tracking tool like Neptune, your callback will look a bit different:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;keras.callbacks&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Callback&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;neptune&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;scikitplot.metrics&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;plot_confusion_matrix&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;plot_roc&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;matplotlib.pyplot&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;

&lt;span class="n"&gt;neptune&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;init&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;jakub-czakon/examples&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;neptune&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_experiment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;keras-metrics&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;NeptuneLoggerCallback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Callback&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;validation_data&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nf"&gt;super&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;validation_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;validation_data&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;on_batch_end&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;batch&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;logs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{}):&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;log_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;log_value&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;logs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;items&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
            &lt;span class="n"&gt;neptune&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log_metric&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;batch_&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;log_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;log_value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;on_epoch_end&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;epoch&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;logs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{}):&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;log_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;log_value&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;logs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;items&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
            &lt;span class="n"&gt;neptune&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log_metric&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;epoch_&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;log_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;log_value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;y_pred&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;asarray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;validation_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]))&lt;/span&gt;
        &lt;span class="n"&gt;y_true&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;validation_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

        &lt;span class="n"&gt;y_pred_class&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;argmax&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_pred&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;fig&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ax&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;subplots&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;figsize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="nf"&gt;plot_confusion_matrix&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_pred_class&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;neptune&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log_image&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;confusion_matrix&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fig&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;fig&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ax&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;subplots&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;figsize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="nf"&gt;plot_roc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_pred&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;neptune&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log_image&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;roc_curve&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fig&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice that you &lt;strong&gt;don’t need to create folders for images, as the charts are sent to your tool directly&lt;/strong&gt;. On the flip side, you have to create an experiment to start tracking your runs.&lt;br&gt;
Once you have that, it is business as usual.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;neptune_logger&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;NeptuneLoggerCallback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                     &lt;span class="n"&gt;validation_data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;validation_data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;history&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;x_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;epochs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;validation_data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;validation_data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;callbacks&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;neptune_logger&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can explore metrics and performance charts in the app.&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/X_Dz40bjKnA"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;h1&gt;
  
  
  How to plot Keras history object?
&lt;/h1&gt;

&lt;p&gt;Whenever fit() is called, it returns a &lt;strong&gt;History&lt;/strong&gt; object that can be used to visualize the training history. &lt;strong&gt;It contains a dictionary with loss and metric values&lt;/strong&gt; at each epoch calculated both for training and validation datasets.&lt;/p&gt;

&lt;p&gt;For example, let’s extract the ‘accuracy’ metric and use matplotlib to plot it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;matplotlib.pyplot&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;

&lt;span class="n"&gt;history&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
                    &lt;span class="n"&gt;validation_split&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.25&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
                    &lt;span class="n"&gt;epochs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;batch_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;verbose&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Plot training &amp;amp; validation accuracy values
&lt;/span&gt;&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;plot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;accuracy&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;plot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;val_accuracy&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;title&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Model accuracy&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ylabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Accuracy&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;xlabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Epoch&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;legend&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Train&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Test&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;loc&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;upper left&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;show&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi2.wp.com%2Fneptune.ai%2Fwp-content%2Fuploads%2Fhistory.png%3Fw%3D398%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi2.wp.com%2Fneptune.ai%2Fwp-content%2Fuploads%2Fhistory.png%3Fw%3D398%26ssl%3D1" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;
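
&lt;p&gt;Beyond plotting, the dictionary stored in history.history can be inspected directly, for example to find the epoch with the best validation score. The snippet below is a minimal, dependency-free sketch: best_epoch is a hypothetical helper and the values in example_history are made-up placeholders shaped like history.history, not results from a real run.&lt;/p&gt;

```python
def best_epoch(history_dict, metric, mode="max"):
    """Return (epoch_number, value) of the best epoch for a metric, 1-based."""
    values = history_dict[metric]
    pick = max if mode == "max" else min
    index = pick(range(len(values)), key=values.__getitem__)
    return index + 1, values[index]


# Placeholder values shaped like `history.history` (made up for this sketch).
example_history = {
    "accuracy": [0.81, 0.88, 0.91, 0.93],
    "val_accuracy": [0.80, 0.86, 0.89, 0.88],
    "loss": [0.62, 0.41, 0.30, 0.24],
    "val_loss": [0.60, 0.44, 0.36, 0.39],
}

# Highest validation accuracy, then lowest validation loss.
print(best_epoch(example_history, "val_accuracy"))
print(best_epoch(example_history, "val_loss", mode="min"))
```

&lt;p&gt;With a real run you would pass history.history instead of example_history. The available keys depend on the metrics given to compile() and on whether validation data was provided.&lt;/p&gt;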

&lt;h1&gt;
  
  
  Keras Metrics Example
&lt;/h1&gt;

&lt;p&gt;Ok, so you’ve come a long way and learned a bunch. To refresh your memory, let’s put it all together in a single example.&lt;br&gt;
We’ll start by taking the MNIST dataset and creating a simple fully connected model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;tensorflow&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;tf&lt;/span&gt;

&lt;span class="n"&gt;mnist&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;keras&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;datasets&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mnist&lt;/span&gt;

&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x_test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_test&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;mnist&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load_data&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;x_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x_test&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;x_train&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mf"&gt;255.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x_test&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mf"&gt;255.0&lt;/span&gt;
&lt;span class="n"&gt;validation_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;x_test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_test&lt;/span&gt;

&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;keras&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Sequential&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
    &lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;keras&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;layers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Flatten&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;input_shape&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;28&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;28&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;keras&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;layers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Dense&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;512&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;activation&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;relu&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;keras&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;layers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Dropout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;keras&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;layers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Dense&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;activation&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;softmax&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We’ll create a custom metric, multiclass &lt;strong&gt;f1 score in keras&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
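&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# K is the Keras backend module used by the metric functions below&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;tensorflow.keras&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;backend&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;K&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;recall&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_pred&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;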
    &lt;span class="n"&gt;y_true&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;K&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ones_like&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_true&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
    &lt;span class="n"&gt;true_positives&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;K&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;K&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;K&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;clip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_true&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;y_pred&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;
    &lt;span class="n"&gt;all_positives&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;K&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;K&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;K&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;clip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;

    &lt;span class="n"&gt;recall&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;true_positives&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;all_positives&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;K&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;epsilon&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;recall&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;precision&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_pred&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;y_true&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;K&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ones_like&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_true&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
    &lt;span class="n"&gt;true_positives&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;K&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;K&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;K&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;clip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_true&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;y_pred&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;

    &lt;span class="n"&gt;predicted_positives&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;K&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;K&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;K&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;clip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_pred&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;
    &lt;span class="n"&gt;precision&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;true_positives&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;predicted_positives&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;K&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;epsilon&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;precision&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;f1_score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_pred&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Call the precision() and recall() helpers defined above&lt;/span&gt;
    &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;precision&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_pred&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;recall&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_pred&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;K&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;epsilon&lt;/span&gt;&lt;span class="p"&gt;()))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
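
&lt;p&gt;To make clear what a multiclass F1 score measures, here is a minimal, dependency-free sketch that computes the macro-averaged F1 from integer class labels. It is an illustration only, not the Keras metric above: macro_f1 is a hypothetical helper, the labels are made up, and it works on whole label lists rather than on batches of predicted probabilities.&lt;/p&gt;

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Macro-averaged F1 over the classes present in y_true or y_pred."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for true_label, pred_label in zip(y_true, y_pred):
        if true_label == pred_label:
            tp[true_label] += 1
        else:
            fp[pred_label] += 1  # predicted this class, but it was wrong
            fn[true_label] += 1  # missed an example of the true class

    scores = []
    for label in sorted(set(y_true) | set(y_pred)):
        denominator = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2*TP / (2*TP + FP + FN); use 0 when a class has no counts.
        scores.append(2 * tp[label] / denominator if denominator else 0.0)
    return sum(scores) / len(scores)


# Illustrative labels only (made up for this sketch, not MNIST results).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
print(round(macro_f1(y_true, y_pred), 4))
```

&lt;p&gt;Each class gets its own F1 from its true positive, false positive and false negative counts, and the macro average weights all classes equally regardless of how many examples they have.&lt;/p&gt;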



&lt;p&gt;We’ll also create a custom tf.keras metric, &lt;strong&gt;MulticlassTruePositives&lt;/strong&gt; to be exact:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;MulticlassTruePositives&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;keras&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Metric&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;multiclass_true_positives&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nf"&gt;super&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;MulticlassTruePositives&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;true_positives&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_weight&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;tp&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;initializer&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;zeros&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;update_state&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_pred&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sample_weight&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;y_pred&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reshape&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;argmax&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_pred&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="n"&gt;values&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cast&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;int32&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cast&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_pred&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;int32&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;values&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cast&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;float32&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;sample_weight&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;sample_weight&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cast&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sample_weight&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;float32&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;values&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;multiply&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sample_weight&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;true_positives&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;assign_add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reduce_sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;result&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;true_positives&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;reset_states&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# The state of the metric will be reset at the start of each epoch.
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;true_positives&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;assign&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We’ll &lt;strong&gt;compile the Keras model&lt;/strong&gt; with our metrics:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
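&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;keras&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;tensorflow&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;tf&lt;/span&gt;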

&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;compile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;optimizer&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;sgd&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
              &lt;span class="n"&gt;loss&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;sparse_categorical_crossentropy&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
              &lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;accuracy&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                       &lt;span class="c1"&gt;# labels are integer class ids, so use the sparse metric variants&lt;/span&gt;
                       &lt;span class="n"&gt;keras&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sparse_categorical_accuracy&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                       &lt;span class="n"&gt;f1_score&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                       &lt;span class="n"&gt;recall_score&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                       &lt;span class="n"&gt;precision_score&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                       &lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;keras&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;SparseTopKCategoricalAccuracy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                       &lt;span class="nc"&gt;MulticlassTruePositives&lt;/span&gt;&lt;span class="p"&gt;()])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We’ll implement a Keras &lt;strong&gt;callback that plots the ROC curve and confusion matrix&lt;/strong&gt; and saves them to a folder:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;keras.callbacks&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Callback&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;matplotlib.pyplot&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;scikitplot.metrics&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;plot_confusion_matrix&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;plot_roc&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;PerformanceVisualizationCallback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Callback&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;validation_data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;image_dir&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nf"&gt;super&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;validation_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;validation_data&lt;/span&gt;

        &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;makedirs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image_dir&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;exist_ok&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;image_dir&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;image_dir&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;on_epoch_end&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;epoch&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;logs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{}):&lt;/span&gt;
        &lt;span class="n"&gt;y_pred&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;asarray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;validation_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]))&lt;/span&gt;
        &lt;span class="n"&gt;y_true&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;validation_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;             
        &lt;span class="n"&gt;y_pred_class&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;argmax&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_pred&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# plot and save confusion matrix
&lt;/span&gt;        &lt;span class="n"&gt;fig&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ax&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;subplots&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;figsize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="nf"&gt;plot_confusion_matrix&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_pred_class&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;fig&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;savefig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;image_dir&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;confusion_matrix_epoch_&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;epoch&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fig&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# release the figure so figures do not pile up across epochs&lt;/span&gt;

        &lt;span class="c1"&gt;# plot and save roc curve
&lt;/span&gt;        &lt;span class="n"&gt;fig&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ax&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;subplots&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;figsize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="nf"&gt;plot_roc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_pred&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;fig&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;savefig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;image_dir&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;roc_curve_epoch_&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;epoch&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fig&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;performance_viz_cbk&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;PerformanceVisualizationCallback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                                       &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                       &lt;span class="n"&gt;validation_data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;validation_data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                       &lt;span class="n"&gt;image_dir&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;performance_charts&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
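
&lt;p&gt;As a rough illustration of what the callback plots: the confusion matrix is a table of counts of (true class, predicted class) pairs, where the predicted class is the argmax over the per-class scores returned by the model. The sketch below is plain Python and assumes integer class labels; &lt;code&gt;confusion_counts&lt;/code&gt; is a hypothetical helper written for this explanation, not part of scikit-plot:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def confusion_counts(y_true, y_prob):
    """Count (true class, predicted class) pairs from per-class scores.

    Rows index the true class, columns index the predicted class.
    """
    n_classes = len(y_prob[0])
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for true_class, scores in zip(y_true, y_prob):
        # argmax over the class scores, like np.argmax(y_pred, axis=1)
        predicted = max(range(n_classes), key=lambda i: scores[i])
        matrix[true_class][predicted] += 1
    return matrix


# Three samples, two classes: the last sample is a class-1 example predicted as class 0.
counts = confusion_counts([0, 1, 1], [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Off-diagonal entries are the misclassifications, which is why the plot is useful for spotting classes that the model confuses with each other.&lt;/p&gt;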



&lt;p&gt;We’ll &lt;strong&gt;run training&lt;/strong&gt; and monitor the performance:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;history&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;x_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;epochs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;validation_data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;validation_data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;callbacks&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;performance_viz_cbk&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We’ll &lt;strong&gt;visualize metrics from the Keras history object:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;plot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;accuracy&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;plot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;val_accuracy&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;title&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Model accuracy&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ylabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Accuracy&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;xlabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Epoch&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;legend&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Train&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Validation&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;loc&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;upper left&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;show&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
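
&lt;p&gt;The same pattern extends to every metric stored in the history object. Here is a minimal sketch, assuming &lt;code&gt;history.history&lt;/code&gt; is a dict of per-epoch lists in which validation curves carry a &lt;code&gt;val_&lt;/code&gt; prefix, as in the accuracy example. The names &lt;code&gt;pair_history_metrics&lt;/code&gt; and &lt;code&gt;plot_history&lt;/code&gt; are hypothetical helpers, not part of Keras:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def pair_history_metrics(history_dict):
    """Group each training curve with its 'val_' counterpart (or None)."""
    pairs = {}
    for name, values in history_dict.items():
        if name.startswith('val_'):
            continue  # validation curves are attached to their training metric
        pairs[name] = (values, history_dict.get('val_' + name))
    return pairs


def plot_history(history_dict):
    """Draw one figure per metric, with train and validation curves."""
    # Imported lazily so the grouping helper works without matplotlib.
    import matplotlib.pyplot as plt

    for name, (train, val) in pair_history_metrics(history_dict).items():
        plt.figure()
        plt.plot(train, label='Train')
        if val is not None:
            plt.plot(val, label='Validation')
        plt.title(f'Model {name}')
        plt.xlabel('Epoch')
        plt.ylabel(name)
        plt.legend(loc='upper left')
    plt.show()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With the training run from before, this would be called as &lt;code&gt;plot_history(history.history)&lt;/code&gt;.&lt;/p&gt;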



&lt;p&gt;You can also monitor and explore your experiments in a tool like TensorBoard or Neptune. You just need to &lt;strong&gt;add another callback or modify the one you created earlier&lt;/strong&gt;:&lt;/p&gt;

&lt;h1&gt;
  
  
  TensorBoard
&lt;/h1&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;tensorflow.keras.callbacks&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;TensorBoard&lt;/span&gt;

&lt;span class="n"&gt;tensorboard_cbk&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;TensorBoard&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;log_dir&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;logs/training-example/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;history&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(...,&lt;/span&gt; &lt;span class="n"&gt;callbacks&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;performance_viz_cbk&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
                                    &lt;span class="n"&gt;tensorboard_cbk&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With TensorBoard you need to start a local server and explore your runs in the browser.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;tensorboard&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;logdir&lt;/span&gt; &lt;span class="n"&gt;logs&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;training&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;example&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi0.wp.com%2Fneptune.ai%2Fwp-content%2Fuploads%2Ftensorboard_terminal.png%3Fw%3D1057%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi0.wp.com%2Fneptune.ai%2Fwp-content%2Fuploads%2Ftensorboard_terminal.png%3Fw%3D1057%26ssl%3D1" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;
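
&lt;p&gt;If you train more than once, writing every run to the same &lt;code&gt;log_dir&lt;/code&gt; can make the curves hard to tell apart. One common workaround is to give each run its own timestamped subdirectory, on the assumption that TensorBoard lists each subdirectory of &lt;code&gt;--logdir&lt;/code&gt; as a separate run. A minimal sketch follows; &lt;code&gt;make_run_log_dir&lt;/code&gt; is a hypothetical helper, not part of TensorFlow:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import os
from datetime import datetime


def make_run_log_dir(base_dir='logs/training-example', now=None):
    """Return a per-run subdirectory such as logs/training-example/20200529-084526."""
    now = now or datetime.now()
    return os.path.join(base_dir, now.strftime('%Y%m%d-%H%M%S'))


# Hypothetical usage with the TensorBoard callback shown earlier:
# tensorboard_cbk = TensorBoard(log_dir=make_run_log_dir())
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The &lt;code&gt;tensorboard --logdir&lt;/code&gt; command can keep pointing at the parent folder, so all runs show up side by side.&lt;/p&gt;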

&lt;h1&gt;
  
  
  Neptune
&lt;/h1&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;neptune&lt;/span&gt;

&lt;span class="n"&gt;neptune&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;init&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;jakub-czakon/examples&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;neptune&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_experiment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;keras-metrics&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;NeptuneLoggerCallback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Callback&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;validation_data&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nf"&gt;super&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;validation_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;validation_data&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;on_batch_end&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;batch&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;logs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{}):&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;log_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;log_value&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;logs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;items&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
            &lt;span class="n"&gt;neptune&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log_metric&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;batch_&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;log_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;log_value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;on_epoch_end&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;epoch&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;logs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{}):&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;log_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;log_value&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;logs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;items&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
            &lt;span class="n"&gt;neptune&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log_metric&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;epoch_&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;log_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;log_value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;y_pred&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;asarray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;validation_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]))&lt;/span&gt;
        &lt;span class="n"&gt;y_true&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;validation_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

        &lt;span class="n"&gt;y_pred_class&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;argmax&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_pred&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;fig&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ax&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;subplots&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;figsize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="nf"&gt;plot_confusion_matrix&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_pred_class&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;neptune&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log_image&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;confusion_matrix&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fig&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;fig&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ax&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;subplots&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;figsize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="nf"&gt;plot_roc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_pred&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;neptune&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log_image&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;roc_curve&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fig&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;neptune_logger&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;NeptuneLoggerCallback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                       &lt;span class="n"&gt;validation_data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;validation_data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;history&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(...,&lt;/span&gt; &lt;span class="n"&gt;callbacks&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;neptune_logger&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Check this &lt;a href="https://ui.neptune.ai/jakub-czakon/examples/e/EX-501/charts?utm_source=devto&amp;amp;utm_medium=post&amp;amp;utm_campaign=blog-keras-metrics" rel="noopener noreferrer"&gt;example experiment run&lt;/a&gt; if you are interested:&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/X_Dz40bjKnA"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;h1&gt;
  
  
  Final Thoughts
&lt;/h1&gt;

&lt;p&gt;Hopefully, this article gave you some background on model evaluation techniques in Keras.&lt;/p&gt;

&lt;p&gt;We’ve covered:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;built-in methods in keras and tf.keras,&lt;/li&gt;
&lt;li&gt;implementation of your own custom metrics,&lt;/li&gt;
&lt;li&gt;how you can visualize custom performance charts as your model is training.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For more information check out the &lt;a href="https://github.com/keras-team/keras/blob/master/keras/metrics.py" rel="noopener noreferrer"&gt;Keras Repository&lt;/a&gt; and &lt;a href="https://www.tensorflow.org/api_docs/python/tf/keras/metrics" rel="noopener noreferrer"&gt;TensorFlow Metrics documentation&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Happy training!&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>datascience</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>How to Do Hyperparameter Tuning on Any Python Script in 3 Easy Steps</title>
      <dc:creator>Jakub Czakon</dc:creator>
      <pubDate>Wed, 25 Mar 2020 10:12:48 +0000</pubDate>
      <link>https://dev.to/jakubczakon/how-to-do-hyperparameter-tuning-on-any-python-script-in-3-easy-steps-26np</link>
      <guid>https://dev.to/jakubczakon/how-to-do-hyperparameter-tuning-on-any-python-script-in-3-easy-steps-26np</guid>
      <description>&lt;p&gt;This article was originally &lt;a href="https://neptune.ai/blog/hyperparameter-tuning-on-any-python-script?utm_source=devto&amp;amp;utm_medium=crosspost&amp;amp;utm_campaign=blog-hyperparameter-tuning-on-any-python-script"&gt;posted on neptune.ai/blog&lt;/a&gt;, where you can find more in-depth articles for machine learning practitioners.&lt;/p&gt;




&lt;p&gt;You wrote a Python script that trains and evaluates your machine learning model, and now you would like to tune its hyperparameters automatically to improve performance?&lt;/p&gt;

&lt;p&gt;I got you!&lt;/p&gt;

&lt;p&gt;In this article, I will show you how to convert your script into an objective function that can be optimized with any hyperparameter optimization library.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--3rub5Zgc--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i2.wp.com/neptune.ai/wp-content/uploads/hyperparametrization-1.png%3Fresize%3D1024%252C496%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--3rub5Zgc--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i2.wp.com/neptune.ai/wp-content/uploads/hyperparametrization-1.png%3Fresize%3D1024%252C496%26ssl%3D1" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It will take just 3 steps and you will be tuning model parameters like there is no tomorrow.&lt;/p&gt;

&lt;p&gt;Ready?&lt;/p&gt;

&lt;p&gt;Let's go!&lt;/p&gt;

&lt;p&gt;I suppose your main.py script looks something like this one:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;lightgbm&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;lgb&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;sklearn.model_selection&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;train_test_split&lt;/span&gt;

&lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'data/train.csv'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;nrows&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;X&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;drop&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="s"&gt;'ID_code'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'target'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'target'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;X_valid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
&lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_valid&lt;/span&gt; &lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;train_test_split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;test_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;random_state&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1234&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;train_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;lgb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Dataset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;valid_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;lgb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Dataset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_valid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;y_valid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;reference&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;train_data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;'objective'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;'binary'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="s"&gt;'metric'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;'auc'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="s"&gt;'learning_rate'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="s"&gt;'max_depth'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="s"&gt;'num_leaves'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="s"&gt;'feature_fraction'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="s"&gt;'subsample'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;lgb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;train&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;train_data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                  &lt;span class="n"&gt;num_boost_round&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                  &lt;span class="n"&gt;early_stopping_rounds&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                  &lt;span class="n"&gt;valid_sets&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;valid_data&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                  &lt;span class="n"&gt;valid_names&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'valid'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;best_score&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'valid'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="s"&gt;'auc'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'validation AUC:'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;h1&gt;
  
  
  Step 1: Decouple search parameters from code
&lt;/h1&gt;

&lt;p&gt;Take the parameters that you want to tune and put them in a dictionary at the top of your script. By doing that you effectively decouple search parameters from the rest of the code.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;lightgbm&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;lgb&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;sklearn.model_selection&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;train_test_split&lt;/span&gt;

&lt;span class="n"&gt;SEARCH_PARAMS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;'learning_rate'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                 &lt;span class="s"&gt;'max_depth'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                 &lt;span class="s"&gt;'num_leaves'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                 &lt;span class="s"&gt;'feature_fraction'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                 &lt;span class="s"&gt;'subsample'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'../data/train.csv'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;nrows&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;X&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;drop&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="s"&gt;'ID_code'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'target'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'target'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;X_valid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_valid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;train_test_split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;test_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;random_state&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1234&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;train_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;lgb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Dataset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;valid_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;lgb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Dataset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_valid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;y_valid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;reference&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;train_data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;'objective'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;'binary'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="s"&gt;'metric'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;'auc'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;SEARCH_PARAMS&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;lgb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;train&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;train_data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                  &lt;span class="n"&gt;num_boost_round&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                  &lt;span class="n"&gt;early_stopping_rounds&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                  &lt;span class="n"&gt;valid_sets&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;valid_data&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                  &lt;span class="n"&gt;valid_names&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'valid'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;best_score&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'valid'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="s"&gt;'auc'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'validation AUC:'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
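
&lt;p&gt;The &lt;code&gt;**SEARCH_PARAMS&lt;/code&gt; line in the script above relies on dictionary unpacking to merge the tunable parameters back into the fixed ones. Here is a minimal sketch of that mechanism using only the standard library. The name &lt;code&gt;FIXED_PARAMS&lt;/code&gt; is hypothetical and used for illustration only, and the search dictionary is shortened to two keys.&lt;/p&gt;

```python
# Fixed settings that are not part of the search (hypothetical name).
FIXED_PARAMS = {'objective': 'binary', 'metric': 'auc'}

# Parameters we want to tune, kept separate from the rest of the script.
SEARCH_PARAMS = {'learning_rate': 0.4, 'max_depth': 15}

# Unpacking merges both dictionaries; if a key appears twice,
# the right-most value is kept.
params = {**FIXED_PARAMS, **SEARCH_PARAMS}
print(params)

# Trying a new configuration only touches the search dictionary.
new_params = {**FIXED_PARAMS, **{'learning_rate': 0.1, 'max_depth': 5}}
print(new_params['learning_rate'])
```

&lt;p&gt;Because the fixed settings never change, a tuning library only has to produce a new search dictionary for each trial.&lt;/p&gt;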



&lt;h1&gt;
  
  
  Step 2: Wrap training and evaluation into a function
&lt;/h1&gt;

&lt;p&gt;Now, you can put the entire training and evaluation logic inside of a train_evaluate function. This function takes parameters as input and outputs the validation score.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;lightgbm&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;lgb&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;sklearn.model_selection&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;train_test_split&lt;/span&gt;

&lt;span class="n"&gt;SEARCH_PARAMS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;'learning_rate'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                 &lt;span class="s"&gt;'max_depth'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                 &lt;span class="s"&gt;'num_leaves'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                 &lt;span class="s"&gt;'feature_fraction'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                 &lt;span class="s"&gt;'subsample'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;train_evaluate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;search_params&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'../data/train.csv'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;nrows&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;X&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;drop&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="s"&gt;'ID_code'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'target'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'target'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;X_valid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_valid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;train_test_split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;test_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;random_state&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1234&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;train_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;lgb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Dataset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;valid_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;lgb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Dataset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_valid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;y_valid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;reference&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;train_data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;'objective'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;'binary'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
              &lt;span class="s"&gt;'metric'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;'auc'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
              &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;search_params&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;lgb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;train&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;train_data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                      &lt;span class="n"&gt;num_boost_round&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                      &lt;span class="n"&gt;early_stopping_rounds&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                      &lt;span class="n"&gt;valid_sets&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;valid_data&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                      &lt;span class="n"&gt;valid_names&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'valid'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;best_score&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'valid'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="s"&gt;'auc'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s"&gt;'__main__'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;train_evaluate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;SEARCH_PARAMS&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'validation AUC:'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;h1&gt;
  
  
  Step 3: Run Hyperparameter Tuning script
&lt;/h1&gt;

&lt;p&gt;We are almost there.&lt;/p&gt;

&lt;p&gt;All you need to do now is to use this train_evaluate function as an objective for the black-box optimization library of your choice.&lt;/p&gt;
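
&lt;p&gt;To illustrate why the choice of optimizer does not matter much, here is a rough sketch of the simplest possible one: plain random search written with the standard library. The &lt;code&gt;train_evaluate&lt;/code&gt; below is a toy stand-in (an assumption, not the LightGBM function from Step 2) so that the snippet runs on its own, and &lt;code&gt;random_search&lt;/code&gt; is a hypothetical helper. The sampling ranges are also chosen only for illustration.&lt;/p&gt;

```python
import random

def train_evaluate(search_params):
    # Toy stand-in for the real train_evaluate from Step 2 (assumption):
    # the score peaks at learning_rate=0.1 and max_depth=8.
    lr_penalty = (search_params['learning_rate'] - 0.1) ** 2
    depth_penalty = ((search_params['max_depth'] - 8) / 30) ** 2
    return 1.0 - lr_penalty - depth_penalty

def random_search(objective, n_trials=20, seed=1234):
    # Sample configurations at random and keep the best-scoring one.
    rng = random.Random(seed)
    best_score, best_params = float('-inf'), None
    for _ in range(n_trials):
        params = {'learning_rate': rng.uniform(0.01, 0.5),
                  'max_depth': rng.randint(1, 30)}
        score = objective(params)
        if score > best_score:
            best_score, best_params = score, params
    return best_score, best_params

best_score, best_params = random_search(train_evaluate)
print('best score:', best_score)
print('best params:', best_params)
```

&lt;p&gt;The optimizer only ever sees a function that maps a parameter dictionary to a score, which is exactly the interface Step 2 produced.&lt;/p&gt;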

&lt;p&gt;I will use Scikit-Optimize, which I have described in great detail in another article, but you can use any hyperparameter optimization library out there.&lt;/p&gt;

&lt;p&gt;In a nutshell, I:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;define the search SPACE,&lt;/li&gt;
&lt;li&gt;create the objective function that will be minimized,&lt;/li&gt;
&lt;li&gt;run the optimization via skopt.forest_minimize function.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In this example, I will try 30 different configurations, starting with 10 randomly chosen parameter sets.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;skopt&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;script_step2&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;train_evaluate&lt;/span&gt;

&lt;span class="n"&gt;SPACE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="n"&gt;skopt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;space&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Real&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.01&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'learning_rate'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prior&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'log-uniform'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;skopt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;space&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Integer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'max_depth'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;skopt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;space&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Integer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'num_leaves'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;skopt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;space&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Real&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'feature_fraction'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prior&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'uniform'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;skopt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;space&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Real&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'subsample'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prior&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'uniform'&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;

&lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="n"&gt;skopt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;utils&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;use_named_args&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;SPACE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;objective&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;1.0&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;train_evaluate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;skopt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;forest_minimize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;objective&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;SPACE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n_calls&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n_random_starts&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;best_auc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;1.0&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fun&lt;/span&gt;
&lt;span class="n"&gt;best_params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;

&lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'best result: '&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;best_auc&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'best parameters: '&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;best_params&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;This is it.&lt;/p&gt;

&lt;p&gt;The results object contains information about the best score and parameters that produced it.&lt;/p&gt;
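&lt;p&gt;One detail worth spelling out: results.x is a plain list ordered like SPACE, not a dictionary. The sketch below shows one way to turn it back into named parameters. It uses stand-in objects with made-up placeholder values instead of a real search, so only the attribute names (name, x, fun) mirror skopt; everything else is an assumption for illustration.&lt;/p&gt;

```python
from types import SimpleNamespace

# Stand-ins for the skopt objects used above (assumption: no real search is run).
# Each skopt dimension carries a .name; the result object carries .x and .fun.
SPACE = [SimpleNamespace(name='learning_rate'),
         SimpleNamespace(name='max_depth'),
         SimpleNamespace(name='num_leaves')]
results = SimpleNamespace(x=[0.05, 12, 40], fun=-0.91)  # placeholder values

# results.x is ordered like SPACE, so pair each value with its dimension name.
best_params = {dim.name: value for dim, value in zip(SPACE, results.x)}
best_auc = -1.0 * results.fun  # undo the sign flip applied in objective()

print(best_params)
print(best_auc)
```

&lt;p&gt;With a real search you would drop the stand-ins and reuse the SPACE and results variables from the script above.&lt;/p&gt;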




&lt;h1&gt;
  
  
  Note:
&lt;/h1&gt;

&lt;p&gt;If you want to visualize your training and save diagnostic charts after it finishes, you can add one callback and one function call to log every hyperparameter search to Neptune.&lt;/p&gt;

&lt;p&gt;Just use this &lt;a href="https://docs.neptune.ai/integrations/optuna.html?utm_source=devto&amp;amp;utm_medium=crosspost&amp;amp;utm_campaign=blog-hyperparameter-tuning-on-any-python-script"&gt;monitoring helper function&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;neptune&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;neptunecontrib.monitoring.skopt&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;sk_utils&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;skopt&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;script_step2&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;train_evaluate&lt;/span&gt;

&lt;span class="n"&gt;neptune&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;init&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'jakub-czakon/blog-hpo'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;neptune&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;create_experiment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'hpo-on-any-script'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;upload_source_files&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'*.py'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="n"&gt;SPACE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="n"&gt;skopt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;space&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Real&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.01&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'learning_rate'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prior&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'log-uniform'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;skopt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;space&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Integer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'max_depth'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;skopt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;space&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Integer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'num_leaves'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;skopt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;space&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Real&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'feature_fraction'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prior&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'uniform'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;skopt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;space&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Real&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'subsample'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prior&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'uniform'&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;

&lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="n"&gt;skopt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;utils&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;use_named_args&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;SPACE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;objective&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;1.0&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;train_evaluate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;monitor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sk_utils&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NeptuneMonitor&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;skopt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;forest_minimize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;objective&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;SPACE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n_calls&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n_random_starts&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;callback&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;monitor&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="n"&gt;sk_utils&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;log_results&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;neptune&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stop&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Now, when you run your parameter sweep you will see the following:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--n-G-c9RP--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://i0.wp.com/neptune.ai/wp-content/uploads/optuna_monitoring.gif%3Ffit%3D600%252C383%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--n-G-c9RP--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://i0.wp.com/neptune.ai/wp-content/uploads/optuna_monitoring.gif%3Ffit%3D600%252C383%26ssl%3D1" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Check out the &lt;a href="https://ui.neptune.ai/jakub-czakon/blog-hpo/e/BLOG-369/charts?utm_source=devto&amp;amp;utm_medium=crosspost&amp;amp;utm_campaign=blog-hyperparameter-tuning-on-any-python-script"&gt;skopt hyperparameter sweep experiment&lt;/a&gt; with all the code, charts and results.&lt;/p&gt;




&lt;h1&gt;
  
  
  Final thoughts
&lt;/h1&gt;

&lt;p&gt;In this article, you've learned how to optimize hyperparameters of pretty much any Python script in just 3 steps.&lt;/p&gt;

&lt;p&gt;Hopefully, with this knowledge, you will build better machine learning models with less effort.&lt;/p&gt;

&lt;p&gt;Happy training!&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Exploratory Data Analysis for Natural Language Processing: A Complete Guide to Python Tools</title>
      <dc:creator>Jakub Czakon</dc:creator>
      <pubDate>Mon, 23 Mar 2020 06:40:31 +0000</pubDate>
      <link>https://dev.to/jakubczakon/exploratory-data-analysis-for-natural-language-processing-a-complete-guide-to-python-tools-2nk5</link>
      <guid>https://dev.to/jakubczakon/exploratory-data-analysis-for-natural-language-processing-a-complete-guide-to-python-tools-2nk5</guid>
      <description>&lt;p&gt;This article was originally posted by Shahul ES on the &lt;a href="https://neptune.ai/blog/exploratory-data-analysis-natural-language-processing-tools?utm_source=devto&amp;amp;utm_medium=crosspost&amp;amp;utm_campaign=blog-exploratory-data-analysis-natural-language-processing-tools"&gt;Neptune blog&lt;/a&gt;.&lt;/p&gt;




&lt;p&gt;Exploratory data analysis is one of the most important parts of any machine learning workflow, and Natural Language Processing is no different. But &lt;strong&gt;which tools should you choose to explore and visualize text data efficiently&lt;/strong&gt;?&lt;/p&gt;

&lt;p&gt;In this article, we will &lt;strong&gt;discuss and implement nearly all the major techniques&lt;/strong&gt; that you can use to understand your text data, and give you a complete(ish) tour of the Python tools that get the job done.&lt;/p&gt;

&lt;h1&gt;
  
  
  Before we start: Dataset and Dependencies
&lt;/h1&gt;

&lt;p&gt;In this article, we will use a &lt;a href="https://www.kaggle.com/therohk/million-headlines"&gt;million news headlines dataset&lt;/a&gt; from Kaggle.&lt;/p&gt;

&lt;p&gt;If you want to follow the analysis step by step, you may want to install the following libraries:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
   pandas matplotlib numpy &lt;span class="se"&gt;\&lt;/span&gt;
   nltk seaborn scikit-learn gensim pyldavis &lt;span class="se"&gt;\&lt;/span&gt;
   wordcloud textblob spacy textstat
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Now, we can take a look at the data.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;news&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'data/abcnews-date-text.csv'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;nrows&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;news&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;head&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--7Rfsry7S--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i2.wp.com/neptune.ai/wp-content/uploads/output1.png%3Fw%3D979%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--7Rfsry7S--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i2.wp.com/neptune.ai/wp-content/uploads/output1.png%3Fw%3D979%26ssl%3D1" alt="image1"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The dataset contains only two columns: the publish date and the headline text.&lt;/p&gt;

&lt;p&gt;For simplicity, I will be exploring the first &lt;strong&gt;10000 rows&lt;/strong&gt; of this dataset. Since the headlines are sorted by publish_date, these rows cover about &lt;strong&gt;2 months, from February 19, 2003 to April 7, 2003&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Ok, I think we are ready to start our data exploration!&lt;/p&gt;

&lt;h1&gt;
  
  
  Analyzing text statistics
&lt;/h1&gt;

&lt;p&gt;Text statistics visualizations are simple but very insightful techniques.&lt;/p&gt;

&lt;p&gt;They include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;word frequency analysis,&lt;/li&gt;
&lt;li&gt;sentence length analysis,&lt;/li&gt;
&lt;li&gt;average word length analysis,&lt;/li&gt;
&lt;li&gt;etc.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those really help &lt;strong&gt;explore the fundamental characteristics&lt;/strong&gt; of the text data.&lt;/p&gt;

&lt;p&gt;To do so, we will be mostly using &lt;strong&gt;histograms&lt;/strong&gt; (continuous data) and &lt;strong&gt;bar charts&lt;/strong&gt; (categorical data).&lt;/p&gt;

&lt;p&gt;First, I’ll take a look at the number of characters in each headline. This gives us a rough idea of the headline length.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;news&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'headline_text'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="n"&gt;hist&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--jzrTo0ah--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i0.wp.com/neptune.ai/wp-content/uploads/text_statistics_nr_characters.png%3Fw%3D951%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--jzrTo0ah--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i0.wp.com/neptune.ai/wp-content/uploads/text_statistics_nr_characters.png%3Fw%3D951%26ssl%3D1" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt; &lt;a href="https://ui.neptune.ai/o/neptune-ml/org/eda-nlp-tools/n/1-0-character-length-histogram-27f4f679-09fd-4490-b4e3-9020acc1c55d/0e390d7b-fd8a-4612-8a08-d388cce901a7?utm_source=devto&amp;amp;utm_medium=crosspost&amp;amp;utm_campaign=blog-exploratory-data-analysis-natural-language-processing-tools"&gt;Code Snippet that Generates this Chart&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The histogram shows that news headlines range from 10 to 70 characters, and most of them are between 25 and 55 characters long.&lt;/p&gt;

&lt;p&gt;Now, we will move on to data exploration at the word level. Let’s plot the number of words appearing in each news headline.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;split&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;\
    &lt;span class="nb"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)).&lt;/span&gt;\
    &lt;span class="n"&gt;hist&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--yblFNfmY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i0.wp.com/neptune.ai/wp-content/uploads/text_statistics_nr_words.png%3Fw%3D951%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--yblFNfmY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i0.wp.com/neptune.ai/wp-content/uploads/text_statistics_nr_words.png%3Fw%3D951%26ssl%3D1" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt; &lt;a href="https://ui.neptune.ai/neptune-ai/eda-nlp-tools/n/1-1-word-number-histogram-aff0bde6-6ad1-45cf-a8f8-68a2ad7da521/e4cee3db-8d07-4dc6-8584-063b11e76809?utm_source=devto&amp;amp;utm_medium=crosspost&amp;amp;utm_campaign=blog-exploratory-data-analysis-natural-language-processing-tools"&gt;Code Snippet that Generates this Chart&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It is clear that the number of words in news headlines ranges from 2 to 12 and mostly falls between 5 and 7 words.&lt;/p&gt;
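&lt;p&gt;If you want to see what these two histograms are counting without pandas, here is a minimal standard-library sketch. The headlines list is hypothetical sample data standing in for news['headline_text'], and the printed bars are only a rough text version of a histogram.&lt;/p&gt;

```python
from collections import Counter

# Hypothetical headlines, standing in for news['headline_text'].
headlines = [
    "council approves new water plan",
    "rain helps farmers",
    "police investigate city robbery",
    "storm closes coastal road overnight",
]

char_counts = [len(h) for h in headlines]          # characters per headline
word_counts = [len(h.split()) for h in headlines]  # words per headline

# A Counter over the word counts is a histogram in table form.
for n_words, n_headlines in sorted(Counter(word_counts).items()):
    print(n_words, '#' * n_headlines)
```

&lt;p&gt;The pandas one-liners above do the same thing in bulk: .str.len() and .str.split() compute the per-headline values, and .hist() bins and draws them.&lt;/p&gt;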

&lt;p&gt;Up next, let’s check the &lt;strong&gt;average word length&lt;/strong&gt; in each headline.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;news&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'headline_text'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;split&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;\
   &lt;span class="nb"&gt;apply&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;]).&lt;/span&gt; \
   &lt;span class="nb"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)).&lt;/span&gt;&lt;span class="n"&gt;hist&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--5u2ik8Ah--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i1.wp.com/neptune.ai/wp-content/uploads/text_statistics_word_lenght.png%3Fw%3D951%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--5u2ik8Ah--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i1.wp.com/neptune.ai/wp-content/uploads/text_statistics_word_lenght.png%3Fw%3D951%26ssl%3D1" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt; &lt;a href="https://ui.neptune.ai/neptune-ai/eda-nlp-tools/n/1-2-word-length-histogram-6204616c-6314-4ddd-9398-fe73415c09ff/e5c67525-6a16-4751-b4c5-4c64c1ad2730?utm_source=devto&amp;amp;utm_medium=crosspost&amp;amp;utm_campaign=blog-exploratory-data-analysis-natural-language-processing-tools"&gt;Code Snippet that Generates this Chart&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The average word length ranges from 3 to 9, with 5 being the most common length. Does it mean that people are using really short words in news headlines?&lt;/p&gt;

&lt;p&gt;Let’s find out.&lt;/p&gt;

&lt;p&gt;One reason why this may not be true is stopwords. &lt;strong&gt;Stopwords are the words that are most commonly used in any language&lt;/strong&gt;, such as “the”, “a”, “an” etc. Since these words are usually short, they may have pulled the averages in the above graph toward smaller values.&lt;/p&gt;

&lt;p&gt;Analyzing the amount and the types of stopwords can give us some good insights into the data.&lt;/p&gt;
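&lt;p&gt;To make the effect concrete, here is a small sketch that compares the average word length of one headline with and without stopwords. The headline and the tiny stop set are hand-picked for illustration (an assumption); the rest of the article uses the full nltk English list.&lt;/p&gt;

```python
from statistics import mean

# Tiny hand-picked stopword set for illustration only (assumption);
# the article itself uses the full nltk English list.
stop = {"to", "in", "for", "the", "a", "an", "of", "on"}

headline = "man charged for theft of a car in the city"  # hypothetical example
words = headline.split()
content_words = [w for w in words if w not in stop]

print(mean(len(w) for w in words))          # average over all words
print(mean(len(w) for w in content_words))  # average with stopwords removed
```

&lt;p&gt;In this example half of the words are stopwords, and dropping them raises the average word length noticeably.&lt;/p&gt;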

&lt;p&gt;To get the list of stopwords, you can use the &lt;a href="https://www.nltk.org/"&gt;nltk library&lt;/a&gt;. NLTK ships stopword lists for many languages. Since we are only dealing with English news, I will use the English list.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;nltk&lt;/span&gt;
&lt;span class="n"&gt;nltk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;download&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'stopwords'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;stop&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stopwords&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;words&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'english'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Now, we’ll create the corpus.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;corpus&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="n"&gt;new&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;news&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'headline_text'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;split&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;new&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;new&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tolist&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;corpus&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;word&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;new&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;word&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;collections&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;defaultdict&lt;/span&gt;
&lt;span class="n"&gt;dic&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;defaultdict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;word&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;corpus&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;word&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;stop&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;dic&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;word&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;+=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Then we plot the top stopwords.&lt;/p&gt;
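&lt;p&gt;The plotting step is not shown here, so below is a minimal sketch of how the counts could be ranked before charting. The stop set and corpus are tiny stand-ins (an assumption) so that the snippet runs on its own, and the final plotting call is only suggested in a comment.&lt;/p&gt;

```python
from collections import defaultdict

# Stand-ins for the variables built above (assumption: tiny sample data).
stop = {"to", "in", "for", "the", "of"}
corpus = "police to meet in sydney for talks to end strike in port".split()

dic = defaultdict(int)
for word in corpus:
    if word in stop:
        dic[word] += 1

# Sort by count, highest first, and keep the ten most frequent stopwords.
top = sorted(dic.items(), key=lambda item: item[1], reverse=True)[:10]
words = [word for word, count in top]
counts = [count for word, count in top]
print(top)

# With matplotlib imported as plt, plt.bar(words, counts) would draw the chart.
```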

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--LaNuYhif--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i1.wp.com/neptune.ai/wp-content/uploads/text_statistics_top_stopwords.png%3Fw%3D951%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--LaNuYhif--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i1.wp.com/neptune.ai/wp-content/uploads/text_statistics_top_stopwords.png%3Fw%3D951%26ssl%3D1" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt; &lt;a href="https://ui.neptune.ai/neptune-ai/eda-nlp-tools/n/1-3-top-stopwords-barchart-b953763c-3fea-4331-bff0-429411793e5f/5c0fca05-ba07-4564-a02e-c44b08bfb8cb?utm_source=devto&amp;amp;utm_medium=crosspost&amp;amp;utm_campaign=blog-exploratory-data-analysis-natural-language-processing-tools"&gt;Code Snippet that Generates this Chart&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We can clearly see that stopwords such as “to”, “in” and “for” dominate in news headlines.&lt;/p&gt;

&lt;p&gt;Now that &lt;strong&gt;we know which stopwords occur frequently in our text, let’s inspect which words other than these stopwords occur frequently&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;We will use the &lt;a href="https://pymotw.com/2/collections/counter.html"&gt;Counter class&lt;/a&gt; from the collections module to count the occurrences of each word; its most_common() method returns them as a list of (word, count) tuples. This is a &lt;strong&gt;very useful tool when we deal with word-level analysis&lt;/strong&gt; in natural language processing.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;counter&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;Counter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;corpus&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;most&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;counter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;most_common&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[],&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;word&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;count&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;most&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="mi"&gt;40&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;word&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;stop&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;word&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;count&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;sns&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;barplot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--vfvewc-k--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i1.wp.com/neptune.ai/wp-content/uploads/text_statistics_top_nonstopwords.png%3Fw%3D980%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--vfvewc-k--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i1.wp.com/neptune.ai/wp-content/uploads/text_statistics_top_nonstopwords.png%3Fw%3D980%26ssl%3D1" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt; &lt;a href="https://ui.neptune.ai/neptune-ai/eda-nlp-tools/n/1-4-top-non-stopwords-barchart-36267acc-a418-4a5f-a3ba-67a3b51dde12/b57bc536-8cec-46a7-918c-60fba6f2c83d?utm_source=devto&amp;amp;utm_medium=crosspost&amp;amp;utm_campaign=blog-exploratory-data-analysis-natural-language-processing-tools"&gt;Code Snippet that Generates this Chart&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Wow! The words “us”, “Iraq” and “war” have dominated the headlines over the last 15 years.&lt;/p&gt;

&lt;p&gt;Here “us” could mean either the USA or us (you and me). “us” is not a stopword, but the other words in the chart all relate to the US-Iraq war, so “us” here probably refers to the USA.&lt;/p&gt;

&lt;h1&gt;
  
  
  Ngram exploration
&lt;/h1&gt;

&lt;p&gt;N-grams are simply &lt;strong&gt;contiguous sequences of n words&lt;/strong&gt;, for example “river bank” or “The Three Musketeers”.&lt;br&gt;
A sequence of two words is called a bigram, a sequence of three words is called a trigram, and so on.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Looking at most frequent n-grams can give you a better understanding of the context&lt;/strong&gt; in which the word was used.&lt;/p&gt;

&lt;p&gt;To generate n-grams we will use the ngrams function from nltk.util. For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;nltk.util&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ngrams&lt;/span&gt;
&lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ngrams&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="s"&gt;'I'&lt;/span&gt; &lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s"&gt;'went'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s"&gt;'to'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s"&gt;'the'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s"&gt;'river'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s"&gt;'bank'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--gSE6FOMH--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i1.wp.com/neptune.ai/wp-content/uploads/output2.png%3Fw%3D691%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--gSE6FOMH--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i1.wp.com/neptune.ai/wp-content/uploads/output2.png%3Fw%3D691%26ssl%3D1" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now that we know how to create n-grams, let’s visualize them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;To build a representation of our vocabulary we will use CountVectorizer&lt;/strong&gt;. CountVectorizer tokenizes the corpus and converts it into a matrix of token counts. It is available in &lt;a href="https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.CountVectorizer.html"&gt;sklearn.feature_extraction.text&lt;/a&gt;.&lt;/p&gt;
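
&lt;p&gt;As a rough illustration of the counting involved, the sketch below tallies n-grams with nothing but the standard library. It is an assumption-laden stand-in, not how CountVectorizer is implemented: it splits on whitespace only, and the function name top_ngrams and the toy headlines are made up for this example.&lt;/p&gt;

```python
from collections import Counter


def top_ngrams(docs, n=2, k=3):
    """Count whitespace-token n-grams over all docs; return the k most common."""
    counts = Counter()
    for doc in docs:
        tokens = doc.lower().split()
        # Slide a window of n tokens over the document.
        for i in range(len(tokens) - n + 1):
            counts[' '.join(tokens[i:i + n])] += 1
    return counts.most_common(k)


# Hypothetical headlines, used only to exercise the function.
toy_headlines = [
    'man to face court',
    'woman to face court over crash',
    'anti war protest in city',
]
top_two = top_ngrams(toy_headlines, n=2, k=2)
print(top_two)
```

&lt;p&gt;On real data the scikit-learn version is preferable, since it also handles punctuation and builds a sparse matrix that scales to large corpora.&lt;/p&gt;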

&lt;p&gt;So with all this, we will analyze the top bigrams in our news headlines.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_top_ngram&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;corpus&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;vec&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;CountVectorizer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ngram_range&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;)).&lt;/span&gt;&lt;span class="n"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;corpus&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;bag_of_words&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;vec&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;transform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;corpus&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;sum_words&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bag_of_words&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
    &lt;span class="n"&gt;words_freq&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[(&lt;/span&gt;&lt;span class="n"&gt;word&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sum_words&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;idx&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; 
                  &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;word&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;idx&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;vec&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;vocabulary_&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;()]&lt;/span&gt;
    &lt;span class="n"&gt;words_freq&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;words_freq&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;reverse&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;words_freq&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;top_n_bigrams&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;get_top_ngram&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;news&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'headline_text'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)[:&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nb"&gt;zip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;top_n_bigrams&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;sns&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;barplot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--4gfoHwhT--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i2.wp.com/neptune.ai/wp-content/uploads/bigrams.png%3Fw%3D987%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--4gfoHwhT--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i2.wp.com/neptune.ai/wp-content/uploads/bigrams.png%3Fw%3D987%26ssl%3D1" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt; &lt;a href="https://ui.neptune.ai/neptune-ai/eda-nlp-tools/n/2-0-top-ngrams-barchart-671a187d-c3b4-475a-bc9e-8aa6c937923b/c427446f-7b0e-4621-b791-47b0fd31a39e?utm_source=devto&amp;amp;utm_medium=crosspost&amp;amp;utm_campaign=blog-exploratory-data-analysis-natural-language-processing-tools"&gt;Code Snippet that Generates this Chart&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We can observe that bigrams related to war, such as “anti war” and “killed in”, dominate the news headlines.&lt;/p&gt;

&lt;p&gt;How about trigrams?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;top_tri_grams&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;get_top_ngram&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;news&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'headline_text'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nb"&gt;zip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;top_tri_grams&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;sns&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;barplot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--MlhjN7D2--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i1.wp.com/neptune.ai/wp-content/uploads/trigrams.png%3Fresize%3D1024%252C663%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--MlhjN7D2--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i1.wp.com/neptune.ai/wp-content/uploads/trigrams.png%3Fresize%3D1024%252C663%26ssl%3D1" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt; &lt;a href="https://ui.neptune.ai/neptune-ai/eda-nlp-tools/n/2-0-top-ngrams-barchart-671a187d-c3b4-475a-bc9e-8aa6c937923b/c427446f-7b0e-4621-b791-47b0fd31a39e?utm_source=devto&amp;amp;utm_medium=crosspost&amp;amp;utm_campaign=blog-exploratory-data-analysis-natural-language-processing-tools"&gt;Code Snippet that Generates this Chart&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We can see that many of these trigrams are variations of “to face court” and “anti war protest”. &lt;strong&gt;This means that we should put some effort into data cleaning&lt;/strong&gt; and see whether we can combine such near-synonymous terms into one clean token.&lt;/p&gt;

&lt;h1&gt;
  
  
  Topic Modeling exploration with pyLDAvis
&lt;/h1&gt;

&lt;p&gt;Topic modeling is the process of &lt;strong&gt;using unsupervised learning techniques to extract the main topics that occur in a collection of documents&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://towardsdatascience.com/light-on-math-machine-learning-intuitive-guide-to-latent-dirichlet-allocation-437c81220158"&gt;Latent Dirichlet Allocation&lt;/a&gt; (LDA) is an easy to use and efficient model for topic modeling. Each document is represented by the distribution of topics and each topic is represented by the distribution of words.&lt;/p&gt;

&lt;p&gt;Once we categorize our documents in topics we can dig into further &lt;strong&gt;data exploration for each topic or topic group&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;But before getting into topic modeling we have to pre-process our data a little. We will:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;tokenize: convert each sentence into a list of tokens (words).&lt;/li&gt;
&lt;li&gt;remove stopwords&lt;/li&gt;
&lt;li&gt;lemmatize: reduce the inflected forms of each word to a common base form (lemma).&lt;/li&gt;
&lt;li&gt;convert to a bag of words: a mapping where the keys are words (or n-grams/tokens) and the values are the number of times each one occurs in the text.&lt;/li&gt;
&lt;/ul&gt;
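
&lt;p&gt;The steps above can be sketched with the standard library alone. This is a simplified illustration under stated assumptions: the stopword set is a tiny hypothetical one, tokens are runs of at least three letters, and lemmatization is left out because it needs a dictionary such as WordNet.&lt;/p&gt;

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword set; use a full list in practice.
TOY_STOPWORDS = {'the', 'and', 'over', 'for', 'with'}


def preprocess(headline):
    """Lowercase, keep runs of 3 or more letters, and drop stopwords."""
    tokens = re.findall(r'[a-z]{3,}', headline.lower())
    return [t for t in tokens if t not in TOY_STOPWORDS]


toy_docs = ['Police probe the crash', 'War protest over the war in Iraq']
toy_corpus = [preprocess(d) for d in toy_docs]

# Bag of words: token mapped to its number of occurrences.
bag_of_words = Counter(token for doc in toy_corpus for token in doc)
print(toy_corpus)
print(bag_of_words.most_common(3))
```

&lt;p&gt;The regular expression doubles as the length filter, so short tokens such as “in” never reach the stopword check.&lt;/p&gt;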

&lt;p&gt;With NLTK you can tokenize and lemmatize easily:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;nltk&lt;/span&gt;
&lt;span class="n"&gt;nltk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;download&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'punkt'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;nltk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;download&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'wordnet'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;preprocess_news&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;corpus&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="n"&gt;stem&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;PorterStemmer&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;lem&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;WordNetLemmatizer&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;news&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'headline_text'&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="n"&gt;words&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;w&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;word_tokenize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;news&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;stop&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;

        &lt;span class="n"&gt;words&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;lem&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;lemmatize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;w&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;words&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

        &lt;span class="n"&gt;corpus&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;words&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;corpus&lt;/span&gt;
&lt;span class="n"&gt;corpus&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;preprocess_news&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;news&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Now, let’s create the bag-of-words model using gensim:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;dic&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;gensim&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;corpora&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Dictionary&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;corpus&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;bow_corpus&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;dic&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;doc2bow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;corpus&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
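
&lt;p&gt;Each entry of bow_corpus is a list of (token_id, count) pairs for one document. The sketch below mimics that shape with plain dictionaries so the structure is easier to see. The helper names build_vocab and to_bow are hypothetical, and assigning ids in order of first appearance is an assumption of this sketch, not necessarily what gensim does.&lt;/p&gt;

```python
from collections import Counter


def build_vocab(tokenized_docs):
    """Assign an integer id to each distinct token, in order of first appearance."""
    vocab = {}
    for doc in tokenized_docs:
        for token in doc:
            vocab.setdefault(token, len(vocab))
    return vocab


def to_bow(doc, vocab):
    """Return sorted (token_id, count) pairs for one tokenized document."""
    counts = Counter(vocab[t] for t in doc if t in vocab)
    return sorted(counts.items())


# Hypothetical tokenized headlines.
toy_docs = [['war', 'protest', 'war'], ['police', 'probe', 'war']]
toy_vocab = build_vocab(toy_docs)
toy_bow = [to_bow(doc, toy_vocab) for doc in toy_docs]
print(toy_vocab)
print(toy_bow)
```

&lt;p&gt;Word order is discarded in this representation; only the identity of each token and how often it occurs are kept, which is all LDA needs.&lt;/p&gt;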



&lt;p&gt;And we can finally create the LDA model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;lda_model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;gensim&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;LdaMulticore&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bow_corpus&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
                                   &lt;span class="n"&gt;num_topics&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
                                   &lt;span class="n"&gt;id2word&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dic&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;                                    
                                   &lt;span class="n"&gt;passes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                   &lt;span class="n"&gt;workers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;lda_model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;show_topics&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--GhzTNipC--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i0.wp.com/neptune.ai/wp-content/uploads/output3.png%3Fw%3D764%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--GhzTNipC--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i0.wp.com/neptune.ai/wp-content/uploads/output3.png%3Fw%3D764%26ssl%3D1" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Topic 0 seems to relate to the Iraq war and police. Topic 3 points to the involvement of Australia in the Iraq war.&lt;/p&gt;

&lt;p&gt;You can print all the topics and try to make sense of them, but there are tools that can help you run this data exploration more efficiently. One such tool is &lt;a href="https://github.com/bmabey/pyLDAvis"&gt;pyLDAvis&lt;/a&gt;, which &lt;strong&gt;visualizes the results of LDA interactively&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;pyLDAvis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;enable_notebook&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;vis&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pyLDAvis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;gensim&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;prepare&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lda_model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bow_corpus&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dic&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;vis&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/IcJKA0QN0QY"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;&lt;strong&gt; &lt;a href="https://ui.neptune.ai/neptune-ai/eda-nlp-tools/n/3-0-topic-modeling-vis-ddd6a861-62d0-40cb-9207-ebd5b47d74d0/e7cb3e68-cc7b-443e-992b-414640a55a0b?utm_source=devto&amp;amp;utm_medium=crosspost&amp;amp;utm_campaign=blog-exploratory-data-analysis-natural-language-processing-tools"&gt;Code Snippet that Generates this Chart&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;On the left side, the area of each circle represents the importance of the topic relative to the corpus. As there are four topics, we have four circles.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The distance between the centers of the circles indicates the similarity between the topics. Here you can see that topic 3 and topic 4 overlap, which indicates that these two topics are similar.&lt;/li&gt;
&lt;li&gt;On the right side, the bar chart for each topic shows its top 30 most relevant words. For example, in topic 1 the most relevant words are police, new, may, war, etc.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So in our case, we can see a lot of words and topics associated with war in the news headlines.&lt;/p&gt;

&lt;h1&gt;
  
  
  Wordcloud
&lt;/h1&gt;

&lt;p&gt;A wordcloud is a great way to represent text data. The size of each word in the wordcloud indicates its frequency or importance.&lt;/p&gt;

&lt;p&gt;Creating a &lt;a href="https://amueller.github.io/word_cloud/index.html"&gt;wordcloud in Python&lt;/a&gt; is easy, but we need the data in the form of a corpus. Luckily, I prepared it in the previous section.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;wordcloud&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;WordCloud&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;STOPWORDS&lt;/span&gt;
&lt;span class="n"&gt;stopwords&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;STOPWORDS&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;show_wordcloud&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;wordcloud&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;WordCloud&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;background_color&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'white'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;stopwords&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;stopwords&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;max_words&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;max_font_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;scale&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;random_state&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;wordcloud&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;wordcloud&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;fig&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;figure&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;figsize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'off'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;imshow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;wordcloud&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;show&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;show_wordcloud&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;corpus&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--WMe7mEdK--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i2.wp.com/neptune.ai/wp-content/uploads/wordclouds.png%3Fw%3D683%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--WMe7mEdK--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i2.wp.com/neptune.ai/wp-content/uploads/wordclouds.png%3Fw%3D683%26ssl%3D1" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt; &lt;a href="https://ui.neptune.ai/neptune-ai/eda-nlp-tools/n/4-0-wordclouds-853dfded-4d17-4f37-83e4-15ec53f74e60/5833b046-3cf9-4c0f-8fbf-4a5933da924e?utm_source=devto&amp;amp;utm_medium=crosspost&amp;amp;utm_campaign=blog-exploratory-data-analysis-natural-language-processing-tools"&gt;Code Snippet that Generates this Chart&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Again, you can see that the terms associated with the war stand out, which indicates that these words occurred frequently in the news headlines.&lt;/p&gt;

&lt;p&gt;There are &lt;strong&gt;many parameters that can be adjusted&lt;/strong&gt;. Some of the most prominent ones are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;stopwords: The set of words that are blocked from appearing in the image.&lt;/li&gt;
&lt;li&gt;max_words: Indicates the maximum number of words to be displayed.&lt;/li&gt;
&lt;li&gt;max_font_size: maximum font size.&lt;/li&gt;
&lt;/ul&gt;

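&lt;p&gt;There are many more options to create beautiful word clouds. For more details, you can refer to the &lt;a href="https://amueller.github.io/word_cloud/index.html"&gt;wordcloud documentation&lt;/a&gt;.&lt;/p&gt;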

&lt;h1&gt;
  
  
  Sentiment analysis
&lt;/h1&gt;

&lt;p&gt;Sentiment analysis is a very common natural language processing task in which we &lt;strong&gt;determine if the text is positive, negative or neutral&lt;/strong&gt;. This is very useful for finding the sentiment associated with reviews and comments, which can give us valuable insights from text data.&lt;/p&gt;

&lt;p&gt;There are many projects that will help you do sentiment analysis in Python. I personally like TextBlob and Vader Sentiment.&lt;/p&gt;

&lt;h1&gt;
  
  
  TextBlob
&lt;/h1&gt;

&lt;p&gt;TextBlob is a Python library built on top of NLTK. It has been around for some time and is very easy and convenient to use.&lt;br&gt;
The sentiment property of TextBlob returns two values:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;polarity: a floating-point number in the range [-1,1], where 1 means a positive statement and -1 means a negative statement.&lt;/li&gt;
&lt;li&gt;subjectivity: refers to how much the text is shaped by personal opinions and feelings rather than facts. It is a floating-point value in the range [0,1], where 0 is very objective and 1 is very subjective.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I will run this function on our news headlines.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;textblob&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;TextBlob&lt;/span&gt;
&lt;span class="n"&gt;TextBlob&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'100 people killed in Iraq'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;sentiment&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--USHrFELa--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i0.wp.com/neptune.ai/wp-content/uploads/output4.png%3Fw%3D787%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--USHrFELa--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i0.wp.com/neptune.ai/wp-content/uploads/output4.png%3Fw%3D787%26ssl%3D1" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;TextBlob claims that the text “100 people killed in Iraq” is negative and is not an opinion or feeling but rather a factual statement. I think we can agree with TextBlob here.&lt;/p&gt;

&lt;p&gt;Now that we know how to calculate those sentiment scores &lt;strong&gt;we can visualize them using a histogram and explore data even further&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;polarity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;TextBlob&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;sentiment&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;polarity&lt;/span&gt;
&lt;span class="n"&gt;news&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'polarity_score'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;news&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'headline_text'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;\
   &lt;span class="nb"&gt;apply&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;polarity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;news&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'polarity_score'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;hist&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--kfFltImy--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i0.wp.com/neptune.ai/wp-content/uploads/sentiment_histogram.png%3Fw%3D951%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--kfFltImy--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i0.wp.com/neptune.ai/wp-content/uploads/sentiment_histogram.png%3Fw%3D951%26ssl%3D1" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt; &lt;a href="https://ui.neptune.ai/neptune-ai/eda-nlp-tools/n/5-0-polarity-score-histogram-7435097b-2554-423d-82f9-a4dfce94ea9b/03b75d0b-4b3b-49c0-a130-b9e63a9e5bf9?utm_source=devto&amp;amp;utm_medium=crosspost&amp;amp;utm_campaign=blog-exploratory-data-analysis-natural-language-processing-tools"&gt;Code Snippet that Generates this Chart&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You can see that the polarity mainly ranges between 0.00 and 0.20. This indicates that the majority of the news headlines are neutral.&lt;/p&gt;

&lt;p&gt;Let’s dig a bit deeper by classifying the news as negative, positive and neutral based on the scores.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;sentiment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="s"&gt;'neg'&lt;/span&gt;
    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="s"&gt;'neu'&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="s"&gt;'pos'&lt;/span&gt;

&lt;span class="n"&gt;news&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'polarity'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;news&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'polarity_score'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;\
   &lt;span class="nb"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;sentiment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bar&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;news&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;polarity&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;value_counts&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;news&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;polarity&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;value_counts&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--IcGHryWH--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i0.wp.com/neptune.ai/wp-content/uploads/polarity_barchart.png%3Fw%3D951%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--IcGHryWH--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i0.wp.com/neptune.ai/wp-content/uploads/polarity_barchart.png%3Fw%3D951%26ssl%3D1" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt; &lt;a href="https://ui.neptune.ai/neptune-ai/eda-nlp-tools/n/5-1-sentiment-barchart-1da2f77b-db4e-4636-b186-0328dcbb791b/ea6a3450-6d61-4b3f-9274-f1f0c241fa5c?utm_source=devto&amp;amp;utm_medium=crosspost&amp;amp;utm_campaign=blog-exploratory-data-analysis-natural-language-processing-tools"&gt;Code Snippet that Generates this Chart&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Yep, about 70% of the headlines are neutral, with only 18% positive and 11% negative.&lt;/p&gt;
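
&lt;p&gt;Shares like these can be computed directly from the polarity scores. Here is a minimal sketch that uses only the standard library; the helper names and the values in &lt;code&gt;example_scores&lt;/code&gt; are made up for illustration and are not taken from the news dataset.&lt;/p&gt;

```python
from collections import Counter


def sentiment_label(score):
    # Same rule as the sentiment() helper: the sign of the polarity score.
    if score == 0:
        return "neu"
    return "pos" if score > 0 else "neg"


def class_shares(scores):
    # Percentage of items per label, rounded to one decimal place.
    labels = [sentiment_label(s) for s in scores]
    counts = Counter(labels)
    return {label: round(100 * n / len(labels), 1) for label, n in counts.items()}


# Hypothetical polarity scores (illustrative only).
example_scores = [0.0, 0.0, 0.5, -0.2, 0.0, 0.1, 0.0, -0.7, 0.0, 0.0]
shares = class_shares(example_scores)
print(shares)
```

&lt;p&gt;With a pandas column, &lt;code&gt;news['polarity'].value_counts(normalize=True)&lt;/code&gt; should give the same kind of breakdown as fractions.&lt;/p&gt;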

&lt;p&gt;Let’s take a look at &lt;strong&gt;some of the positive and negative headlines&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;news&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;news&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'polarity'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="s"&gt;'pos'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="s"&gt;'headline_text'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;head&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--n2WUhKDA--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i1.wp.com/neptune.ai/wp-content/uploads/output5-1.png%3Fw%3D786%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--n2WUhKDA--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i1.wp.com/neptune.ai/wp-content/uploads/output5-1.png%3Fw%3D786%26ssl%3D1" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Positive news headlines are mostly about some victory in sports.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;news&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;news&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'polarity'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="s"&gt;'neg'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="s"&gt;'headline_text'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;head&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--kLsr7goE--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i1.wp.com/neptune.ai/wp-content/uploads/output6.png%3Fw%3D790%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--kLsr7goE--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i1.wp.com/neptune.ai/wp-content/uploads/output6.png%3Fw%3D790%26ssl%3D1" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Yep, pretty negative news headlines indeed.&lt;/p&gt;

&lt;h1&gt;
  
  
  Vader Sentiment Analysis
&lt;/h1&gt;

&lt;p&gt;The next library we are going to discuss is VADER. &lt;strong&gt;VADER works better at detecting negative sentiment&lt;/strong&gt;. It is very useful for sentiment analysis of social media text.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;VADER, or Valence Aware Dictionary and Sentiment Reasoner,&lt;/strong&gt; is a rule- and lexicon-based, open-source sentiment analysis library released under the MIT license.&lt;/p&gt;

&lt;p&gt;VADER's sentiment analysis class &lt;strong&gt;returns a dictionary with scores for how negative, neutral and positive the text is&lt;/strong&gt;, plus an overall compound score. We can then pick the sentiment with the highest score.&lt;/p&gt;
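
&lt;p&gt;To make the selection step concrete, here is a small sketch that picks the label by key rather than by position. It assumes a dictionary with the keys &lt;code&gt;neg&lt;/code&gt;, &lt;code&gt;neu&lt;/code&gt;, &lt;code&gt;pos&lt;/code&gt; and &lt;code&gt;compound&lt;/code&gt;, which is how NLTK's &lt;code&gt;polarity_scores&lt;/code&gt; reports its result; the function name &lt;code&gt;pick_label&lt;/code&gt; and the numbers are hypothetical.&lt;/p&gt;

```python
def pick_label(scores):
    # Compare only the three class scores; the compound score is ignored here.
    return max(("neg", "neu", "pos"), key=lambda k: scores[k])


# Hypothetical dictionary shaped like the analyzer's output (values are made up).
example = {"neg": 0.0, "neu": 0.41, "pos": 0.59, "compound": 0.62}
label = pick_label(example)
print(label)
```

&lt;p&gt;Selecting by key avoids relying on the order of the dictionary values.&lt;/p&gt;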

&lt;p&gt;We will do the same analysis using VADER and check if there is much difference.&lt;br&gt;
&lt;/p&gt;

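&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;nltk&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;nltk.sentiment.vader&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SentimentIntensityAnalyzer&lt;/span&gt;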
&lt;span class="n"&gt;nltk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;download&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'vader_lexicon'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;sid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;SentimentIntensityAnalyzer&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_vader_score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sent&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# polarity_scores returns a dict with the keys neg, neu, pos and compound
&lt;/span&gt;    &lt;span class="n"&gt;ss&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sid&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;polarity_scores&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sent&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# drop the trailing compound score; argmax gives 0=neg, 1=neu, 2=pos
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;argmax&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ss&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;&lt;span class="p"&gt;())[:&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;news&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'polarity'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;news&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'headline_text'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;\
    &lt;span class="nb"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;get_vader_score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;polarity&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;news&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'polarity'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s"&gt;'neg'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s"&gt;'neu'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s"&gt;'pos'&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bar&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;polarity&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;value_counts&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;polarity&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;value_counts&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--FerNNqb3--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i1.wp.com/neptune.ai/wp-content/uploads/vader_polarity_barchart.png%3Fw%3D951%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--FerNNqb3--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i1.wp.com/neptune.ai/wp-content/uploads/vader_polarity_barchart.png%3Fw%3D951%26ssl%3D1" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt; &lt;a href="https://ui.neptune.ai/neptune-ai/eda-nlp-tools/n/5-1-sentiment-barchart-1da2f77b-db4e-4636-b186-0328dcbb791b/ea6a3450-6d61-4b3f-9274-f1f0c241fa5c?utm_source=devto&amp;amp;utm_medium=crosspost&amp;amp;utm_campaign=blog-exploratory-data-analysis-natural-language-processing-tools"&gt;Code Snippet that Generates this Chart&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Yep, the distribution is slightly different. Even more headlines are classified as neutral (85%), and the share of negative headlines has increased to 13%.&lt;/p&gt;
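
&lt;p&gt;As far as I know, the VADER documentation suggests another way to label text: threshold the compound score at 0.05 and -0.05 instead of taking the largest of the three class scores. Below is a minimal sketch of that rule. It assumes a dictionary with a &lt;code&gt;compound&lt;/code&gt; key as returned by &lt;code&gt;polarity_scores&lt;/code&gt;; the function name and the sample values are hypothetical. It may give a different split than the chart above.&lt;/p&gt;

```python
def label_from_compound(scores, threshold=0.05):
    # Thresholds follow the commonly cited VADER convention (an assumption here).
    compound = scores["compound"]
    if compound >= threshold:
        return "pos"
    if compound > -threshold:
        return "neu"
    return "neg"


# Hypothetical score dictionaries (values are made up).
print(label_from_compound({"compound": 0.62}))
print(label_from_compound({"compound": 0.0}))
print(label_from_compound({"compound": -0.6}))
```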

&lt;h1&gt;
  
  
  Named Entity Recognition
&lt;/h1&gt;

&lt;p&gt;Named entity recognition is an information extraction method in which entities present in the text are classified into predefined entity types such as “Person”, “Place”, “Organization”, etc. By using &lt;strong&gt;NER we can get great insights about the types of entities present in the given text dataset&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Let us consider an example of a news article.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--47smS3zk--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i1.wp.com/neptune.ai/wp-content/uploads/news_image.png%3Fw%3D658%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--47smS3zk--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i1.wp.com/neptune.ai/wp-content/uploads/news_image.png%3Fw%3D658%26ssl%3D1" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the above news, the named entity recognition model should be able to identify&lt;br&gt;
entities such as RBI as an organization, Mumbai and India as Places, etc.&lt;/p&gt;

&lt;p&gt;There are three standard libraries to do Named Entity Recognition:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://nlp.stanford.edu/software/CRF-NER.shtml"&gt;Stanford NER&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://spacy.io/"&gt;spaCy&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.nltk.org/"&gt;NLTK&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In this tutorial, &lt;strong&gt;I will use spaCy&lt;/strong&gt;, an open-source library for advanced natural language processing tasks. It is written in Cython and is known for its industrial applications. Besides NER, &lt;strong&gt;spaCy provides many other features such as part-of-speech (POS) tagging and word vectors&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://spacy.io/api/annotation#section-named-entities"&gt;SpaCy’s named entity recognition&lt;/a&gt; has been trained on the &lt;a href="https://catalog.ldc.upenn.edu/LDC2013T19"&gt;OntoNotes 5 corpus&lt;/a&gt; and it supports the following entity types:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--o_WUpNX_--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i2.wp.com/neptune.ai/wp-content/uploads/spacy_ner.png%3Fw%3D647%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--o_WUpNX_--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i2.wp.com/neptune.ai/wp-content/uploads/spacy_ner.png%3Fw%3D647%26ssl%3D1" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;There are three &lt;a href="https://spacy.io/models/en/"&gt;pre-trained models for English&lt;/a&gt; in spaCy. I will use en_core_web_sm for our task but you can try other models.&lt;/p&gt;

&lt;p&gt;To use it we have to download it first:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight shell"&gt;&lt;code&gt;python &lt;span class="nt"&gt;-m&lt;/span&gt; spacy download en_core_web_sm
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Now we can initialize the language model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;spacy&lt;/span&gt;
&lt;span class="n"&gt;nlp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;spacy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"en_core_web_sm"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;One of the nice things about spaCy is that we only need to apply the nlp function once; the entire background pipeline will return the objects we need.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;nlp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'India and Iran have agreed to boost the economic viability &lt;/span&gt;&lt;span class="se"&gt;\
&lt;/span&gt;&lt;span class="s"&gt;of the strategic Chabahar port through various measures, &lt;/span&gt;&lt;span class="se"&gt;\
&lt;/span&gt;&lt;span class="s"&gt;including larger subsidies to merchant shipping firms using the facility, &lt;/span&gt;&lt;span class="se"&gt;\
&lt;/span&gt;&lt;span class="s"&gt;people familiar with the development said on Thursday.'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;[(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;label_&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ents&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--3zFn0qQg--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i1.wp.com/neptune.ai/wp-content/uploads/output7.png%3Fw%3D785%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--3zFn0qQg--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i1.wp.com/neptune.ai/wp-content/uploads/output7.png%3Fw%3D785%26ssl%3D1" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We can see that India and Iran are recognized as geopolitical entities (GPE), Chabahar as a person (PERSON) and Thursday as a date (DATE). Note that Chabahar is a port, so the PERSON label is a misclassification.&lt;/p&gt;

&lt;p&gt;We can also visualize the output using the displacy module in spaCy.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;spacy&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;displacy&lt;/span&gt;
&lt;span class="n"&gt;displacy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;render&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;style&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'ent'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--J44Jjh2b--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i2.wp.com/neptune.ai/wp-content/uploads/output8.png%3Fw%3D792%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--J44Jjh2b--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i2.wp.com/neptune.ai/wp-content/uploads/output8.png%3Fw%3D792%26ssl%3D1" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This creates a very neat &lt;strong&gt;visualization of the sentence with the recognized entities&lt;/strong&gt; where each entity type is marked in different colors.&lt;/p&gt;

&lt;p&gt;Now that we know how to perform NER we can explore the data even further by doing a variety of visualizations on the named entities extracted from our dataset.&lt;/p&gt;

&lt;p&gt;First, we will &lt;strong&gt;run the named entity recognition on our news&lt;/strong&gt; headlines and store the entity types.&lt;br&gt;
&lt;/p&gt;

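&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;collections&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Counter&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;ner&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;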
    &lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;nlp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;label_&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;X&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ents&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;ent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;news&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'headline_text'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;\
    &lt;span class="nb"&gt;apply&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ner&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;ent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;sub&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;ent&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;counter&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;Counter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ent&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;count&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;counter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;most_common&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Now, we can visualize the entity frequencies:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nb"&gt;zip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;count&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;sns&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;barplot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--QAYdvvTv--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i2.wp.com/neptune.ai/wp-content/uploads/ner_frequencies_entity.png%3Fresize%3D1024%252C672%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--QAYdvvTv--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i2.wp.com/neptune.ai/wp-content/uploads/ner_frequencies_entity.png%3Fresize%3D1024%252C672%26ssl%3D1" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt; &lt;a href="https://ui.neptune.ai/neptune-ai/eda-nlp-tools/n/6-0-named-entity-barchart-9012f4a0-3761-4ebf-9c25-d4f363858010/ac08ec73-ddd3-4a42-a35b-b2311eb9d075?utm_source=devto&amp;amp;utm_medium=crosspost&amp;amp;utm_campaign=blog-exploratory-data-analysis-natural-language-processing-tools"&gt;Code Snippet that Generates this Chart&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Now we can see that GPE and ORG entities dominate the news headlines, followed by PERSON entities.&lt;/p&gt;
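
&lt;p&gt;The flatten-count-unzip pattern used for this chart can be tried without running spaCy. This is a minimal sketch on hypothetical per-headline label lists; the variable names and the data are made up for illustration.&lt;/p&gt;

```python
from collections import Counter
from itertools import chain

# Hypothetical entity labels per headline, shaped like the output of ner().
labels_per_headline = [["GPE", "ORG"], [], ["GPE"], ["PERSON", "GPE"], ["ORG"]]

# Flatten the list of lists into one list of labels.
flat = list(chain.from_iterable(labels_per_headline))

# most_common() returns (label, count) pairs sorted by descending count.
count = Counter(flat).most_common()

# Unzip the pairs into two parallel lists that can be passed to a bar plot.
labels, freqs = map(list, zip(*count))
print(labels, freqs)
```

&lt;p&gt;Headlines without any entity contribute an empty list and simply drop out when the lists are flattened.&lt;/p&gt;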

&lt;p&gt;We can also &lt;strong&gt;visualize the most common tokens per entity&lt;/strong&gt;. Let’s check which places appear the most in news headlines.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;ner&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;ent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"GPE"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;nlp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;X&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ents&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;label_&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;ent&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;gpe&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;news&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'headline_text'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nb"&gt;apply&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ner&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;gpe&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;gpe&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;counter&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;Counter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;gpe&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nb"&gt;zip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;counter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;most_common&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;
&lt;span class="n"&gt;sns&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;barplot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--m8YJY4C6--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i0.wp.com/neptune.ai/wp-content/uploads/ner_gpe-1.png%3Fw%3D1001%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--m8YJY4C6--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i0.wp.com/neptune.ai/wp-content/uploads/ner_gpe-1.png%3Fw%3D1001%26ssl%3D1" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt; &lt;a href="https://ui.neptune.ai/neptune-ai/eda-nlp-tools/n/6-1-most-common-named-entity-barchart-0614fdac-0400-4460-ac3a-b3c5669906a0/4d3d398d-df9d-484c-97ec-07390ba4dd21?utm_source=devto&amp;amp;utm_medium=crosspost&amp;amp;utm_campaign=blog-exploratory-data-analysis-natural-language-processing-tools"&gt;Code Snippet that Generates this Chart&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I think we can confirm that “us” means the USA in news headlines. Let’s also find the most common names that appear in news headlines.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;per&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;news&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'headline_text'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nb"&gt;apply&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ner&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s"&gt;"PERSON"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;per&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;per&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;counter&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;Counter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;per&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nb"&gt;zip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;counter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;most_common&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;
&lt;span class="n"&gt;sns&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;barplot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--5LzN1Ltb--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i1.wp.com/neptune.ai/wp-content/uploads/ner_gpe.png%3Fw%3D981%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--5LzN1Ltb--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i1.wp.com/neptune.ai/wp-content/uploads/ner_gpe.png%3Fw%3D981%26ssl%3D1" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt; &lt;a href="https://ui.neptune.ai/neptune-ai/eda-nlp-tools/n/6-1-most-common-named-entity-barchart-0614fdac-0400-4460-ac3a-b3c5669906a0/4d3d398d-df9d-484c-97ec-07390ba4dd21?utm_source=devto&amp;amp;utm_medium=crosspost&amp;amp;utm_campaign=blog-exploratory-data-analysis-natural-language-processing-tools"&gt;Code Snippet that Generates this Chart&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Saddam Hussein and George Bush were the presidents of Iraq and the USA during wartime. We can also see that the model is far from perfect: it classifies “vic govt” and “nsw govt” as persons rather than government agencies.&lt;/p&gt;

&lt;h1&gt;
  
  
  Exploration through Parts of Speech Tagging in Python
&lt;/h1&gt;

&lt;p&gt;Parts of speech (POS) tagging is a &lt;strong&gt;method that assigns part of speech labels to words in a sentence&lt;/strong&gt;. There are eight main parts of speech:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Noun (NN) - Joseph, London, table, cat, teacher, pen, city&lt;/li&gt;
&lt;li&gt;Verb (VB) - read, speak, run, eat, play, live, walk, have, like, are, is&lt;/li&gt;
&lt;li&gt;Adjective (JJ) - beautiful, happy, sad, young, fun, three&lt;/li&gt;
&lt;li&gt;Adverb (RB) - slowly, quietly, very, always, never, too, well, tomorrow&lt;/li&gt;
&lt;li&gt;Preposition (IN) - at, on, in, from, with, near, between, about, under&lt;/li&gt;
&lt;li&gt;Conjunction (CC) - and, or, but, because, so, yet, unless, since, if&lt;/li&gt;
&lt;li&gt;Pronoun (PRP) - I, you, we, they, he, she, it, me, us, them, him, her, this&lt;/li&gt;
&lt;li&gt;Interjection (UH) - Ouch! Wow! Great! Help! Oh! Hey! Hi!&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is not a straightforward task, as the same word may be used in different sentences in different contexts. However, once you do it, there are a lot of helpful visualizations that you can create that can give you additional insights into your dataset.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;I will use nltk to do the parts of speech tagging&lt;/strong&gt;, but there are other libraries that do a good job (spaCy, TextBlob).&lt;/p&gt;

&lt;p&gt;Let’s look at an example.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;nltk&lt;/span&gt;
&lt;span class="n"&gt;sentence&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"The greatest comeback stories in 2019"&lt;/span&gt;
&lt;span class="n"&gt;tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;word_tokenize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sentence&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;nltk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pos_tag&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tokens&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--oZwZdo8v--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i0.wp.com/neptune.ai/wp-content/uploads/output9.png%3Fw%3D786%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--oZwZdo8v--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i0.wp.com/neptune.ai/wp-content/uploads/output9.png%3Fw%3D786%26ssl%3D1" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You can also visualize a sentence’s parts of speech and its dependency graph with the spacy.displacy module.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;doc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;nlp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'The greatest comeback stories in 2019'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;displacy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;render&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;style&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'dep'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;jupyter&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;options&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;'distance'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;90&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--EvrDXEyL--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i2.wp.com/neptune.ai/wp-content/uploads/output10.png%3Fw%3D788%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--EvrDXEyL--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i2.wp.com/neptune.ai/wp-content/uploads/output10.png%3Fw%3D788%26ssl%3D1" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We can observe various dependency tags here. For example, the det tag denotes the relationship between the determiner “the” and the noun “stories”.&lt;/p&gt;

&lt;p&gt;You can check the list of dependency tags and their meanings &lt;a href="https://universaldependencies.org/u/dep/index.html"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;OK, now that we know what POS tagging is, let’s use it to explore our headlines dataset.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;pos&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;pos&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;nltk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pos_tag&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;word_tokenize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;pos&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nb"&gt;zip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;pos&lt;/span&gt;&lt;span class="p"&gt;)))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;pos&lt;/span&gt;
&lt;span class="n"&gt;tags&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;news&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'headline_text'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nb"&gt;apply&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;pos&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;tags&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;tags&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;counter&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;Counter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tags&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nb"&gt;zip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;counter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;most_common&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;))))&lt;/span&gt;
&lt;span class="n"&gt;sns&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;barplot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--60FNnCwl--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i1.wp.com/neptune.ai/wp-content/uploads/pos_tagging_frequencies.png%3Fw%3D948%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--60FNnCwl--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i1.wp.com/neptune.ai/wp-content/uploads/pos_tagging_frequencies.png%3Fw%3D948%26ssl%3D1" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt; &lt;a href="https://ui.neptune.ai/neptune-ai/eda-nlp-tools/n/7-0-parts-of-speach-barchart-9140250c-50d2-4343-b5e2-b3f5fb9c2089/15b07733-f02d-4a7c-b2fc-05ecffdf3e7b?utm_source=devto&amp;amp;utm_medium=crosspost&amp;amp;utm_campaign=blog-exploratory-data-analysis-natural-language-processing-tools"&gt;Code Snippet that Generates this Chart&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We can clearly see that the noun (NN) dominates in news headlines, followed by the adjective (JJ). This is typical for news articles, while &lt;strong&gt;more artistic forms of writing often show a higher adjective (JJ) frequency&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;You can dig deeper into this by investigating &lt;strong&gt;which singular nouns occur most commonly in news headlines&lt;/strong&gt;. Let us find out.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_adjs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;adj&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="n"&gt;pos&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;nltk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pos_tag&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;word_tokenize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;word&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;tag&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;pos&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;tag&lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="s"&gt;'NN'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;adj&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;word&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;adj&lt;/span&gt;
&lt;span class="n"&gt;words&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;news&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'headline_text'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nb"&gt;apply&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;get_adjs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;words&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;words&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;counter&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;Counter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;words&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nb"&gt;zip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;counter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;most_common&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;))))&lt;/span&gt;
&lt;span class="n"&gt;sns&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;barplot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--C0qoN0mD--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i1.wp.com/neptune.ai/wp-content/uploads/pos_tagging_noun_freqs.png%3Fw%3D967%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--C0qoN0mD--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i1.wp.com/neptune.ai/wp-content/uploads/pos_tagging_noun_freqs.png%3Fw%3D967%26ssl%3D1" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt; &lt;a href="https://ui.neptune.ai/neptune-ai/eda-nlp-tools/n/7-1-most-common-part-of-speach-barchart-3f896e91-e21c-4ea7-811f-02acb497479f/a41302f3-8803-47ce-98e5-bb1a73eda5cc?utm_source=devto&amp;amp;utm_medium=crosspost&amp;amp;utm_campaign=blog-exploratory-data-analysis-natural-language-processing-tools"&gt;Code Snippet that Generates this Chart&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Nouns such as “war”, “iraq”, and “man” dominate the news headlines. You can visualize and examine other parts of speech by changing the tag checked in the function above.&lt;/p&gt;
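
&lt;p&gt;As a minimal sketch of that idea, the helper below generalizes the filter to any tag. To stay self-contained, it works on (word, tag) pairs that a tagger such as nltk.pos_tag has already produced. The name words_with_tag and the hand-written sample pairs are illustrative assumptions, not part of the original notebook.&lt;/p&gt;

```python
from collections import Counter


def words_with_tag(tagged_headlines, tag):
    """Collect every word whose POS tag equals `tag`.

    `tagged_headlines` is a list of headlines, each a list of
    (word, tag) pairs such as the output of nltk.pos_tag.
    """
    return [word
            for headline in tagged_headlines
            for word, word_tag in headline
            if word_tag == tag]


# Hand-written sample in Penn Treebank style (illustrative, not real tagger output).
sample = [
    [("new", "JJ"), ("war", "NN"), ("fears", "NNS")],
    [("man", "NN"), ("charged", "VBN"), ("over", "IN"), ("war", "NN")],
]

noun_counts = Counter(words_with_tag(sample, "NN"))
adjective_counts = Counter(words_with_tag(sample, "JJ"))
print(noun_counts.most_common(2))
print(adjective_counts.most_common(2))
```

&lt;p&gt;With the real data, you would build tagged_headlines by applying nltk.pos_tag(word_tokenize(x)) to each headline and then plot the most common words as before.&lt;/p&gt;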

&lt;h1&gt;
  
  
  Exploring through text complexity
&lt;/h1&gt;

&lt;p&gt;It can be very informative to know &lt;strong&gt;how readable (or how difficult to read) the text is&lt;/strong&gt; and what type of reader can fully understand it. Do we need a college degree to understand the message, or can a first-grader clearly see the point?&lt;/p&gt;

&lt;p&gt;You can actually assign a number, called a readability index, to a document or text. &lt;strong&gt;A readability index is a numeric value that indicates how difficult (or easy) it is to read and understand a text&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;There are many readability score formulas available for the English language. Some of the most prominent ones are:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--YsbM2A1q--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://miro.medium.com/max/762/1%2AV2-4vPyIAM8T7YqGRBiesQ.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--YsbM2A1q--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://miro.medium.com/max/762/1%2AV2-4vPyIAM8T7YqGRBiesQ.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/shivam5992/textstat"&gt;Textstat&lt;/a&gt; is a cool Python library that provides an implementation of all these text statistics calculation methods. Let’s use Textstat to implement Flesch Reading Ease index.&lt;/p&gt;

&lt;p&gt;We can compute the score for every headline and plot a histogram to visualize the output.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;textstat&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;flesch_reading_ease&lt;/span&gt;
&lt;span class="n"&gt;news&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'headline_text'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;\
   &lt;span class="nb"&gt;apply&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;flesch_reading_ease&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)).&lt;/span&gt;&lt;span class="n"&gt;hist&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--crRpSEum--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i1.wp.com/neptune.ai/wp-content/uploads/text_complexity.png%3Fw%3D951%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--crRpSEum--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i1.wp.com/neptune.ai/wp-content/uploads/text_complexity.png%3Fw%3D951%26ssl%3D1" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt; &lt;a href="https://ui.neptune.ai/neptune-ai/eda-nlp-tools/n/8-0-text-complexity-histogram-b00e38f2-5710-4efe-85c2-77b6366dbe3b/b6b8dd8f-3ec6-4fd1-a548-7daf889444e5?utm_source=devto&amp;amp;utm_medium=crosspost&amp;amp;utm_campaign=blog-exploratory-data-analysis-natural-language-processing-tools"&gt;Code Snippet that Generates this Chart&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Almost all of the readability scores fall above 60. On the Flesch scale, a score of 60 to 70 corresponds to plain English that an average 13- to 15-year-old student can understand, and higher scores are easier still. Let’s check the news headlines that have a readability score below 5.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;reading&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;reading&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;news&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;iloc&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="s"&gt;'headline_text'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;head&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--iuQQz6Sn--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i1.wp.com/neptune.ai/wp-content/uploads/output11.png%3Fw%3D788%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--iuQQz6Sn--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i1.wp.com/neptune.ai/wp-content/uploads/output11.png%3Fw%3D788%26ssl%3D1" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can see some of the complex words being used in news headlines, like “capitulation”, “interim”, and “entrapment”. These words may have caused the scores to fall under 5.&lt;/p&gt;
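
&lt;p&gt;To see why long words pull the score down, it helps to look at the formula itself. The Flesch Reading Ease score is 206.835 − 1.015 × (words per sentence) − 84.6 × (syllables per word). The sketch below applies it to hand-counted values. The function name and the example counts are illustrative assumptions, and Textstat’s own syllable counting may give slightly different numbers.&lt;/p&gt;

```python
def flesch_reading_ease_from_counts(words, sentences, syllables):
    """Flesch Reading Ease from raw counts (higher means easier to read)."""
    words_per_sentence = words / sentences
    syllables_per_word = syllables / words
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


# Two hypothetical five-word headlines, with syllables counted by hand.
short_words = flesch_reading_ease_from_counts(words=5, sentences=1, syllables=6)
long_words = flesch_reading_ease_from_counts(words=5, sentences=1, syllables=13)

print(round(short_words, 1))
print(round(long_words, 1))
```

&lt;p&gt;Both headlines have the same length, yet the one built from many-syllable words scores far lower, because the syllables-per-word term carries the largest weight in the formula.&lt;/p&gt;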

&lt;h1&gt;
  
  
  Final Thoughts
&lt;/h1&gt;

&lt;p&gt;In this article, we discussed and implemented various exploratory data analysis methods for text data. Some are common, some are lesser known, but all of them could be a great addition to your data exploration toolkit.&lt;/p&gt;

&lt;p&gt;Hopefully, you will find some of them useful in your current and future projects.&lt;/p&gt;

&lt;p&gt;To make data exploration even easier, I have created an &lt;strong&gt;“Exploratory Data Analysis for Natural Language Processing Template”&lt;/strong&gt; that you can use for your work.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt; &lt;a href="https://ui.neptune.ai/neptune-ai/eda-nlp-tools/n/8-0-text-complexity-histogram-b00e38f2-5710-4efe-85c2-77b6366dbe3b/95b28cf3-a123-4104-bdd9-358e123b8a58?utm_source=devto&amp;amp;utm_medium=crosspost&amp;amp;utm_campaign=blog-exploratory-data-analysis-natural-language-processing-tools"&gt;Get Exploratory Data Analysis for Natural Language Processing Template&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Also, as you may have seen already, &lt;strong&gt;for every chart in this article, there is a code snippet&lt;/strong&gt; that creates it. Just click on the button below a chart.&lt;/p&gt;

&lt;p&gt;Happy exploring!&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>python</category>
      <category>exploratorydataanalysis</category>
      <category>naturallanguageprocessing</category>
    </item>
    <item>
      <title>Optuna vs Hyperopt: Which Hyperparameter Optimization Library Should You Choose?</title>
      <dc:creator>Jakub Czakon</dc:creator>
      <pubDate>Mon, 13 Jan 2020 10:39:50 +0000</pubDate>
      <link>https://dev.to/jakubczakon/optuna-vs-hyperopt-which-hyperparameter-optimization-library-should-you-choose-j7a</link>
      <guid>https://dev.to/jakubczakon/optuna-vs-hyperopt-which-hyperparameter-optimization-library-should-you-choose-j7a</guid>
      <description>&lt;p&gt;This article was originally &lt;a href="https://neptune.ml/blog/optuna-vs-hyperopt?utm_source=devto&amp;amp;utm_medium=crosspost&amp;amp;utm_campaign=blog-optuna-vs-hyperopt&amp;amp;utm_content=originally-posted-on"&gt;posted on neptune.ml/blog&lt;/a&gt; where you can find more in-depth articles for machine learning practitioners. &lt;/p&gt;




&lt;p&gt;Wondering which library you should choose for hyperparameter optimization?&lt;/p&gt;

&lt;p&gt;Been using Hyperopt for a while and feel like changing?&lt;/p&gt;

&lt;p&gt;Just heard about &lt;a href="https://optuna.org/"&gt;Optuna&lt;/a&gt; and you want to see how it works?&lt;/p&gt;

&lt;p&gt;Good!&lt;/p&gt;

&lt;p&gt;In this article I will:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;show you an example of using Optuna and Hyperopt on a real problem,&lt;/li&gt;
&lt;li&gt;compare Optuna vs Hyperopt on API, documentation, functionality, and more,&lt;/li&gt;
&lt;li&gt;give you my overall score and recommendation on which hyperparameter optimization library you should use.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let’s do it.&lt;/p&gt;
&lt;h1&gt;
  
  
  Evaluation criteria
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;Ease of use and API&lt;/li&gt;
&lt;li&gt;Options, methods, and hyper(hyperparameters)
&lt;ul&gt;
&lt;li&gt;Search Space&lt;/li&gt;
&lt;li&gt;Optimization Methods&lt;/li&gt;
&lt;li&gt;Callbacks&lt;/li&gt;
&lt;li&gt;Persisting and Restarting&lt;/li&gt;
&lt;li&gt;Run Pruning&lt;/li&gt;
&lt;li&gt;Handling Exceptions&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Documentation&lt;/li&gt;
&lt;li&gt;Visualizations&lt;/li&gt;
&lt;li&gt;Speed and Parallelization&lt;/li&gt;
&lt;li&gt;Experimental Results&lt;/li&gt;
&lt;/ul&gt;



&lt;h1&gt;
  
  
  Ease of use and API
&lt;/h1&gt;

&lt;p&gt;In this section, I want to run a basic hyperparameter tuning script with both libraries and see how natural and easy to use each API is.&lt;/p&gt;
&lt;h4&gt;
  
  
  Optuna
&lt;/h4&gt;

&lt;p&gt;You define your &lt;strong&gt;search space and objective in one function&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Moreover, you sample the hyperparameters from the trial object. Because of that, the &lt;strong&gt;parameter space is defined at execution time&lt;/strong&gt;. For those of you who like PyTorch because of this &lt;strong&gt;imperative approach&lt;/strong&gt;, Optuna will feel natural.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;objective&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;trial&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;'learning_rate'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;trial&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;suggest_loguniform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'learning_rate'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.01&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
              &lt;span class="s"&gt;'max_depth'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;trial&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;suggest_int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'max_depth'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
              &lt;span class="s"&gt;'num_leaves'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;trial&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;suggest_int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'num_leaves'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
              &lt;span class="s"&gt;'min_data_in_leaf'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;trial&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;suggest_int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'min_data_in_leaf'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
              &lt;span class="s"&gt;'feature_fraction'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;trial&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;suggest_uniform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'feature_fraction'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
              &lt;span class="s"&gt;'subsample'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;trial&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;suggest_uniform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'subsample'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;train_evaluate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Then, you create the study object and optimize it. What is great is that &lt;strong&gt;you can choose&lt;/strong&gt; whether you want to &lt;strong&gt;maximize or minimize&lt;/strong&gt; your objective. That is useful when optimizing a metric like AUC because you don’t have to change the sign of the objective before training and then convert the best results after training to get a positive score.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;study&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;optuna&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;create_study&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;direction&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'maximize'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;study&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;optimize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;objective&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n_trials&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;That is it.&lt;/p&gt;

&lt;p&gt;Everything you may want to know about the optimization is available in the study object.&lt;/p&gt;

&lt;p&gt;What I love about Optuna is that I get to define how I want to sample my search space on the fly, which gives me &lt;strong&gt;a lot of&lt;/strong&gt; flexibility. The ability to choose the direction of optimization is also pretty nice.&lt;/p&gt;

&lt;p&gt;If you want to see the full code example you can scroll down to the Example script.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;10 / 10&lt;/strong&gt;&lt;/p&gt;




&lt;h4&gt;
  
  
  Hyperopt
&lt;/h4&gt;

&lt;p&gt;You start by defining your parameter search space:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;SPACE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;'learning_rate'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; 
&lt;span class="n"&gt;hp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;loguniform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'learning_rate'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.01&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
         &lt;span class="s"&gt;'max_depth'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; 
&lt;span class="n"&gt;hp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'max_depth'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
         &lt;span class="s"&gt;'num_leaves'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; 
&lt;span class="n"&gt;hp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'num_leaves'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
         &lt;span class="s"&gt;'subsample'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; 
&lt;span class="n"&gt;hp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;uniform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'subsample'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Then, you create an objective function that you want to minimize. That means you will have to &lt;strong&gt;flip the sign of your objective&lt;/strong&gt; for a higher-is-better metric like AUC.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;objective&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;1.0&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;train_evaluate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Finally, you instantiate the &lt;em&gt;Trials()&lt;/em&gt; object and minimize your objective on the parameter search SPACE.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;trials&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Trials&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;fmin&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;objective&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;SPACE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;trials&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;trials&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;algo&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tpe&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;suggest&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_evals&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;…and done!&lt;/p&gt;

&lt;p&gt;All the information about the hyperparameters that were tested and the corresponding score are kept in the trials object.&lt;/p&gt;

&lt;p&gt;The thing that I don’t like is the fact that I need to instantiate the Trials() even in the simplest of cases. I would rather have fmin return the trials and do the instantiation by default. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;9 / 10&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;Both libraries do a good job here but I feel that &lt;strong&gt;Optuna is slightly better&lt;/strong&gt; because of the flexibility, imperative approach to sampling parameters and a bit less boilerplate. &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Ease of use and API&lt;/strong&gt;&lt;br&gt;
Optuna &amp;gt; Hyperopt&lt;/p&gt;
&lt;/blockquote&gt;





&lt;h2&gt;
  
  
  Options, methods, and hyper(hyperparameters)
&lt;/h2&gt;

&lt;p&gt;In real-life scenarios running hyperparameter optimization requires a lot of additional options away from the golden path. Areas that I am particularly interested in are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;search space&lt;/li&gt;
&lt;li&gt;optimization methods/algorithms&lt;/li&gt;
&lt;li&gt;callbacks&lt;/li&gt;
&lt;li&gt;persisting and restarting parameter sweeps&lt;/li&gt;
&lt;li&gt;pruning unpromising runs&lt;/li&gt;
&lt;li&gt;handling exceptions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In this section, I will compare Optuna and Hyperopt on exactly those.&lt;/p&gt;


&lt;h3&gt;
  
  
  Search Space
&lt;/h3&gt;

&lt;p&gt;In this section I want to compare the search space definition, flexibility in defining a complex space and sampling options for each parameter type (Float, Integer, Categorical).&lt;/p&gt;

&lt;h4&gt;
  
  
  Optuna
&lt;/h4&gt;

&lt;p&gt;You can find sampling options for all hyperparameter types:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;for categorical parameters you can use trial.suggest_categorical&lt;/li&gt;
&lt;li&gt;for integers there is trial.suggest_int&lt;/li&gt;
&lt;li&gt;for float parameters you have trial.suggest_uniform, trial.suggest_loguniform and the even more exotic trial.suggest_discrete_uniform&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Especially for the integer parameters, you could wish for more options but it deals with most use-cases.&lt;/p&gt;

&lt;p&gt;A great feature of this library is that you sample from the parameter space on-the-fly and you can do it however you like.&lt;/p&gt;

&lt;p&gt;You can use if statements, you can change intervals from which you search, you can use the information from the &lt;em&gt;trial&lt;/em&gt; object to guide your search.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;objective&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;trial&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;classifier_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;trial&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;suggest_categorical&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'classifier'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'SVC'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'RandomForest'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;classifier_name&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s"&gt;'SVC'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;svc_c&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;trial&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;suggest_loguniform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'svc_c'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;1e-10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;1e10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;classifier_obj&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sklearn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;svm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;SVC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;C&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;svc_c&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;rf_max_depth&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;trial&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;suggest_loguniform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'rf_max_depth'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="n"&gt;classifier_obj&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sklearn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ensemble&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RandomForestClassifier&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_depth&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;rf_max_depth&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="p"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;This is awesome, you can do literally anything!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;10 / 10&lt;/strong&gt;&lt;/p&gt;




&lt;h4&gt;
  
  
  Hyperopt
&lt;/h4&gt;

&lt;p&gt;Search space is where Hyperopt really gives you a ton of sampling options:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;for categorical parameters you have hp.choice&lt;/li&gt;
&lt;li&gt;for integers you get hp.randint, hp.quniform, hp.qloguniform and hp.qlognormal&lt;/li&gt;
&lt;li&gt;for floats we have hp.normal, hp.uniform, hp.lognormal and hp.loguniform&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As far as I know this is the most extensive sampling functionality out there.&lt;/p&gt;

&lt;p&gt;You define your search space before you run optimization but you can create very complex parameter spaces:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;SPACE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;hp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'classifier_type'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="s"&gt;'type'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;'naive_bayes'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="s"&gt;'type'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;'svm'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s"&gt;'C'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;hp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;lognormal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'svm_C'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="s"&gt;'kernel'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;hp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'svm_kernel'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;'ktype'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;'linear'&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;'ktype'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;'RBF'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'width'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;hp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;lognormal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'svm_rbf_width'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)},&lt;/span&gt;
            &lt;span class="p"&gt;]),&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="s"&gt;'type'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;'dtree'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s"&gt;'criterion'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;hp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'dtree_criterion'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'gini'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'entropy'&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt;
        &lt;span class="s"&gt;'max_depth'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;hp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'dtree_max_depth'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;hp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;qlognormal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'dtree_max_depth_int'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)]),&lt;/span&gt;
        &lt;span class="s"&gt;'min_samples_split'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;hp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;qlognormal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'dtree_min_samples_split'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;By combining &lt;em&gt;hp.choice&lt;/em&gt; with other sampling methods we can have conditional spaces. This is &lt;strong&gt;useful when you are optimizing hyperparameters for a machine learning&lt;/strong&gt; pipeline that involves preprocessing, feature engineering and model training.&lt;/p&gt;
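
&lt;p&gt;To make the conditional structure concrete, here is a minimal sketch of how an objective might branch on a sample drawn from such a nested space. The two dictionaries are hand-written stand-ins for what Hyperopt would pass to the objective (Hyperopt itself is not called and the values are made up), and &lt;em&gt;describe_config&lt;/em&gt; is a hypothetical helper used only for illustration.&lt;/p&gt;

```python
# Hand-written stand-ins for samples drawn from the nested SPACE above.
# Hyperopt itself is not called here; the values are made up for illustration.
svm_sample = {'type': 'svm', 'C': 1.7,
              'kernel': {'ktype': 'RBF', 'width': 0.4}}
tree_sample = {'type': 'dtree', 'criterion': 'gini',
               'max_depth': None, 'min_samples_split': 4.0}


def describe_config(params):
    """Hypothetical helper: branch on the sampled 'type' like an objective would."""
    kind = params['type']
    if kind == 'naive_bayes':
        return 'naive_bayes()'
    if kind == 'svm':
        kernel = params['kernel']
        # Only the RBF branch carries a 'width' entry.
        width = kernel.get('width')
        return 'svm(C={}, kernel={}, width={})'.format(
            params['C'], kernel['ktype'], width)
    if kind == 'dtree':
        # The quantized (q*) samplers return floats, so cast before use.
        split = int(params['min_samples_split'])
        return 'dtree(criterion={}, max_depth={}, min_samples_split={})'.format(
            params['criterion'], params['max_depth'], split)
    raise ValueError('unknown classifier type: {}'.format(kind))


print(describe_config(svm_sample))
print(describe_config(tree_sample))
```

&lt;p&gt;In a real objective each branch would build and train the corresponding model instead of returning a string.&lt;/p&gt;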

&lt;p&gt;&lt;strong&gt;10 / 10&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I have to say &lt;strong&gt;I like them both&lt;/strong&gt;. I can define nested search spaces easily and I have a lot of sampling options for all the parameter types. &lt;strong&gt;Optuna has an imperative parameter definition&lt;/strong&gt;, which gives more flexibility while &lt;strong&gt;Hyperopt has more parameter sampling options&lt;/strong&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Search Space&lt;/strong&gt;&lt;br&gt;
Optuna = Hyperopt&lt;/p&gt;
&lt;/blockquote&gt;





&lt;h3&gt;
  
  
  Optimization methods
&lt;/h3&gt;

&lt;p&gt;Both Optuna and Hyperopt &lt;strong&gt;are using the same optimization methods&lt;/strong&gt; under the hood. They have:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;rand.suggest&lt;/strong&gt; (Hyperopt) and  &lt;strong&gt;samplers.random.RandomSampler&lt;/strong&gt; (Optuna)&lt;/p&gt;

&lt;p&gt;Your standard random search over the parameters.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;tpe.suggest&lt;/strong&gt; (Hyperopt) and  &lt;strong&gt;samplers.tpe.sampler.TPESampler&lt;/strong&gt; (Optuna)&lt;/p&gt;

&lt;p&gt;Tree of Parzen Estimators (TPE). The idea behind this method is similar to what was explained in the previous blog post about Scikit Optimize. We &lt;strong&gt;use a cheap surrogate model to estimate the performance of the expensive objective function&lt;/strong&gt; on a set of parameters.&lt;/p&gt;

&lt;p&gt;The difference between the methods used in Scikit Optimize and Tree of Parzen Estimators (TPE) is that instead of estimating the actual performance (point estimation) we want to estimate the density in the tails. We want to be able to tell whether a run will be good (right tail) or bad (left tail). &lt;/p&gt;

&lt;p&gt;I like the following explanation taken from the &lt;a href="https://www.automl.org/book/"&gt;AutoML_Book&lt;/a&gt; by amazing folks over at AutoML.org Freiburg.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Instead of modeling the probability p(y|λ) of observations y given the configurations λ, the Tree Parzen Estimator models density functions p(λ|y &amp;lt; α) and p(λ|y ≥ α). Given a percentile α (usually set to 15%), the observations are divided in good observations and bad observations and simple 1-d Parzen windows are used to model the two distributions.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;By using p(λ|y &amp;lt; α) and p(λ|y ≥ α) you can estimate the expected improvement of a parameter configuration over previous best.&lt;/p&gt;
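
&lt;p&gt;The idea can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not the code that Optuna or Hyperopt actually run: it handles a single float parameter, uses a fixed kernel bandwidth, treats lower loss as better, and the names &lt;em&gt;parzen_density&lt;/em&gt;, &lt;em&gt;tpe_score&lt;/em&gt; and &lt;em&gt;good_fraction&lt;/em&gt; (which plays the role of α) are made up for this example. Candidates are ranked by the ratio of the two densities, so a higher ratio means the candidate looks more like past good runs than past bad ones.&lt;/p&gt;

```python
import math


def parzen_density(x, centers, bandwidth=0.5):
    """Average of Gaussian kernels centred on the observed values (1-d Parzen window)."""
    norm = bandwidth * math.sqrt(2.0 * math.pi)
    total = sum(math.exp(-0.5 * ((x - c) / bandwidth) ** 2) for c in centers)
    return total / (len(centers) * norm)


def tpe_score(x, history, good_fraction=0.25):
    """Density ratio good(x) / bad(x) for a candidate parameter value x.

    history is a list of (parameter_value, loss) pairs; lower loss is better.
    """
    ranked = sorted(history, key=lambda pair: pair[1])
    n_good = max(1, int(len(ranked) * good_fraction))
    good = [value for value, _ in ranked[:n_good]]
    bad = [value for value, _ in ranked[n_good:]]
    # The small constant avoids division by zero far away from all bad points.
    return parzen_density(x, good) / (parzen_density(x, bad) + 1e-12)


# Toy history: the loss is smallest near parameter value 2.0.
history = [(v / 2.0, (v / 2.0 - 2.0) ** 2) for v in range(0, 13)]
candidates = [0.5, 2.0, 5.0]
best = max(candidates, key=lambda x: tpe_score(x, history))
print(best)
```

&lt;p&gt;Real implementations handle many parameters, mixed types and adaptive bandwidths, but the split into good and bad observations followed by a density ratio is the core of the method.&lt;/p&gt;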

&lt;p&gt;Interestingly, both for Optuna and Hyperopt, there are no options to specify the &lt;strong&gt;α&lt;/strong&gt; parameter in the optimizer.&lt;/p&gt;

&lt;h4&gt;
  
  
  Optuna
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;integration.SkoptSampler&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Optuna lets you use samplers from Scikit-Optimize (skopt). &lt;/p&gt;

&lt;p&gt;Skopt offers a bunch of Tree-Based methods as a choice for your surrogate model. &lt;/p&gt;

&lt;p&gt;In order to use them you need to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;create a &lt;em&gt;SkoptSampler&lt;/em&gt; instance specifying the parameters of the surrogate model and acquisition function in the &lt;em&gt;skopt_kwargs&lt;/em&gt; argument, &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;pass the sampler instance to the &lt;em&gt;optuna.create_study&lt;/em&gt; method&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;optuna.integration&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SkoptSampler&lt;/span&gt;

&lt;span class="n"&gt;sampler&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;SkoptSampler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;skopt_kwargs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;'base_estimator'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s"&gt;'ET'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                     &lt;span class="s"&gt;'n_random_starts'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                     &lt;span class="s"&gt;'acq_func'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s"&gt;'EI'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                     &lt;span class="s"&gt;'acq_func_kwargs'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;'xi'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mf"&gt;0.02&lt;/span&gt;&lt;span class="p"&gt;}})&lt;/span&gt;
&lt;span class="n"&gt;study&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;optuna&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;create_study&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sampler&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;sampler&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;study&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;optimize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;objective&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n_trials&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;pruners.SuccessiveHalvingPruner&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You can also use one of the multi-armed bandit methods called Asynchronous Successive Halving Algorithm (ASHA). If you are interested in the details, &lt;a href="https://arxiv.org/abs/1810.05934"&gt;please read the paper&lt;/a&gt;, but the general idea is to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;run a bunch of parameter configurations for some time&lt;/li&gt;
&lt;li&gt;prune (half of) the least promising runs&lt;/li&gt;
&lt;li&gt;run the remaining configurations for some more time&lt;/li&gt;
&lt;li&gt;prune (half of) the least promising runs again&lt;/li&gt;
&lt;li&gt;stop when only one configuration is left&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By doing so, the search can focus on the more promising runs. However, the static allocation of the budgets to configurations is a problem in practice (which a newer approach called HyperBand solves). &lt;/p&gt;
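
&lt;p&gt;The loop described above can be sketched roughly as follows. This is the simple synchronous version of successive halving, not the asynchronous scheduling that ASHA adds and not Optuna’s implementation; &lt;em&gt;successive_halving&lt;/em&gt;, &lt;em&gt;toy_evaluate&lt;/em&gt; and the &lt;em&gt;eta&lt;/em&gt; parameter are assumptions made for this example.&lt;/p&gt;

```python
import random


def successive_halving(configs, evaluate, min_budget=1, eta=2):
    """Synchronous successive halving: keep the top 1/eta of configs each round.

    evaluate(config, budget) must return a score where higher is better.
    """
    survivors = list(configs)
    budget = min_budget
    while len(survivors) > 1:
        scored = sorted(survivors, key=lambda c: evaluate(c, budget), reverse=True)
        keep = max(1, len(scored) // eta)
        survivors = scored[:keep]
        budget *= eta  # survivors get a larger budget in the next round
    return survivors[0]


def toy_evaluate(learning_rate, budget):
    """Hypothetical stand-in for training: the score peaks at learning_rate = 0.1."""
    return -abs(learning_rate - 0.1) * (1.0 + 1.0 / budget)


rng = random.Random(0)
configs = [0.1] + [rng.uniform(0.2, 1.0) for _ in range(7)]
print(successive_halving(configs, toy_evaluate))
```

&lt;p&gt;With eight starting configurations and &lt;em&gt;eta=2&lt;/em&gt;, the sketch runs three rounds (8, 4, then 2 survivors) and doubles the budget each time.&lt;/p&gt;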

&lt;p&gt;It is very easy to use ASHA in Optuna. Just pass a &lt;em&gt;SuccessiveHalvingPruner&lt;/em&gt; to &lt;em&gt;.create_study()&lt;/em&gt; and you are good to go:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;optuna.pruners&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SuccessiveHalvingPruner&lt;/span&gt;

&lt;span class="n"&gt;study&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;optuna&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;create_study&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pruner&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;SuccessiveHalvingPruner&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="n"&gt;study&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;optimize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;objective&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n_trials&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Nice and simple.&lt;/p&gt;

&lt;p&gt;If you would like to learn more, you may want to check out my &lt;a href="https://neptune.ml/blog/scikit-optimize?utm_source=devto&amp;amp;utm_medium=crosspost&amp;amp;utm_campaign=blog-optuna-vs-hyperopt&amp;amp;utm_content=other-neptune-post"&gt;article about Scikit Optimize&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Overall, there are a lot of options when it comes to optimization functions right now. However, some important ones, like &lt;a href="https://github.com/automl/HpBandSter"&gt;Hyperband or BOHB&lt;/a&gt;, are missing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;8 / 10&lt;/strong&gt;&lt;/p&gt;




&lt;h4&gt;
  
  
  Hyperopt
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;atpe.suggest&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Recently added, adaptive TPE was invented at ElectricBrain and it is actually a series of (not so) little improvements that they experimented with on top of TPE. &lt;/p&gt;

&lt;p&gt;The authors explain their approach and modifications they made to TPE thoroughly in this fascinating &lt;a href="https://www.electricbrain.io/post/learning-to-optimize"&gt;blog post&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;It is super easy to use. Instead of tpe.suggest you need to pass &lt;em&gt;atpe.suggest&lt;/em&gt; to your &lt;em&gt;fmin&lt;/em&gt; function.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;hyperopt&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;fmin&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;atpe&lt;/span&gt;

&lt;span class="n"&gt;best&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;fmin&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;objective&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;SPACE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
            &lt;span class="n"&gt;max_evals&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
            &lt;span class="n"&gt;algo&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;atpe&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;suggest&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;I really like this effort to include new optimization algorithms in the library, especially since it’s a new original approach not just an integration with the existing algorithm.&lt;/p&gt;

&lt;p&gt;Hopefully, in the future, multi-armed bandit methods like Hyperband, BOHB, or tree-based methods like SMAC3 will be included as well. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;8 / 10&lt;/strong&gt;&lt;/p&gt;




&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Optimization methods&lt;/strong&gt;&lt;br&gt;
Optuna = Hyperopt&lt;/p&gt;
&lt;/blockquote&gt;





&lt;h3&gt;
  
  
  Callbacks
&lt;/h3&gt;

&lt;p&gt;In this section, I want to see how easy it is to define callbacks to monitor/snapshot/modify training after each iteration. It is useful, especially when your training is long and/or distributed.&lt;/p&gt;

&lt;h4&gt;
  
  
  Optuna
&lt;/h4&gt;

&lt;p&gt;User callbacks are &lt;strong&gt;nicely supported&lt;/strong&gt; through the callbacks argument of the .optimize() method. Just pass a list of callables that take study and trial as input and you are good to go.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;neptune_monitor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;study&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;trial&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;neptune&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;log_metric&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'run_score'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;trial&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;neptune&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;log_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'run_parameters'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;trial&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="p"&gt;...&lt;/span&gt;
&lt;span class="n"&gt;study&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;optimize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;objective&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n_trials&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;callbacks&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;neptune_monitor&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Because you can access both &lt;em&gt;study&lt;/em&gt; and &lt;em&gt;trial&lt;/em&gt;, you have all the flexibility you need to checkpoint, stop early, or modify the future search.&lt;/p&gt;
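&lt;p&gt;To make the early stopping idea concrete, here is a minimal sketch of such a callback. The class name and the patience logic are my own illustration, not something Optuna ships. It assumes a maximized objective and uses only &lt;em&gt;trial.value&lt;/em&gt; and &lt;em&gt;study.stop()&lt;/em&gt;.&lt;/p&gt;

```python
class StopWhenNoImprovement:
    """Hypothetical Optuna-style callback: stop after `patience` trials without a new best.

    Assumes the study maximizes its objective. It only touches
    `trial.value` and `study.stop()`.
    """

    def __init__(self, patience=20):
        self.patience = patience
        self.best_seen = None
        self.stale_trials = 0

    def __call__(self, study, trial):
        if trial.value is None:
            # Trial produced no value (for example it failed): no improvement.
            self.stale_trials += 1
        elif self.best_seen is None or trial.value > self.best_seen:
            # New best score: remember it and reset the counter.
            self.best_seen = trial.value
            self.stale_trials = 0
        else:
            self.stale_trials += 1

        if self.stale_trials >= self.patience:
            # Ask the study to finish after the current trial.
            study.stop()
```

&lt;p&gt;You would pass an instance in the same list as any other callback, for example callbacks=[neptune_monitor, StopWhenNoImprovement(patience=20)].&lt;/p&gt;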

&lt;p&gt;&lt;strong&gt;10 / 10&lt;/strong&gt;&lt;/p&gt;




&lt;h4&gt;
  
  
  Hyperopt
&lt;/h4&gt;

&lt;p&gt;There are no callbacks per se, but you can put your callback function inside the objective and it will be executed every time the objective is called.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;monitor_callback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;neptune&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;send_metric&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'run_score'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;neptune&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;send_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'run_parameters'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;objective&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;1.0&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;train_evaluate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
    &lt;span class="n"&gt;monitor_callback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;I don’t love it but I guess I can live with that.&lt;/p&gt;
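&lt;p&gt;If you end up with several such hooks, one way to keep the objective tidy is a small decorator. This is only a sketch: &lt;em&gt;with_callbacks&lt;/em&gt; is a hypothetical helper, not part of Hyperopt, and it assumes the objective returns a plain float score. The toy objective stands in for the real training code.&lt;/p&gt;

```python
import functools


def with_callbacks(*callbacks):
    """Wrap an objective so every callback runs after each evaluation.

    Hypothetical helper: each callback is called as callback(params, score).
    """
    def decorator(objective):
        @functools.wraps(objective)
        def wrapped(params):
            score = objective(params)
            for callback in callbacks:
                callback(params, score)
            return score
        return wrapped
    return decorator


# Usage: collect every (params, score) pair in a plain list.
history = []


@with_callbacks(lambda params, score: history.append((params, score)))
def objective(params):
    # Toy stand-in for -1.0 * train_evaluate(params).
    return (params['x'] - 3.0) ** 2
```

&lt;p&gt;The wrapped function keeps the same signature, so it can be handed to fmin like any other objective.&lt;/p&gt;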

&lt;p&gt;&lt;strong&gt;6 / 10&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Optuna makes it really easy with the callbacks argument, while in Hyperopt you have to modify the objective.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Callbacks&lt;/strong&gt;&lt;br&gt;
Optuna &amp;gt; Hyperopt&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you want to monitor your Optuna experiments and log all the charts, visualizations, and results you can &lt;a href="https://neptune-contrib.readthedocs.io/user_guide/monitoring/optuna.html"&gt;use Neptune helpers&lt;/a&gt;:&lt;/p&gt;




&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;opt_utils.NeptuneMonitor&lt;/em&gt;: logs run scores and run parameters and plots the scores so far&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;opt_utils.log_study&lt;/em&gt;: logs the best results, the best parameters, and the study object itself
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Just add this to your script:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;neptune&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;optuna&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;neptunecontrib.monitoring.optuna&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;opt_utils&lt;/span&gt;

&lt;span class="n"&gt;neptune&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;init&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'jakub-czakon/blog-hpo'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;neptune&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;create_experiment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'optuna sweep'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;monitor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;opt_utils&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NeptuneMonitor&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;study&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;optuna&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;create_study&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;direction&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'maximize'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;study&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;optimize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;objective&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n_trials&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;callbacks&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;monitor&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;opt_utils&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;log_study&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;study&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/o5O13RE9_O8"&gt;
&lt;/iframe&gt;
&lt;/p&gt;





&lt;h2&gt;
  
  
  Persisting and restarting
&lt;/h2&gt;

&lt;p&gt;Saving and loading your hyperparameter searches can save you time, money, and can help get better results. Let’s compare both frameworks on that.&lt;/p&gt;

&lt;h4&gt;
  
  
  Optuna
&lt;/h4&gt;

&lt;p&gt;Simply use &lt;em&gt;joblib.dump&lt;/em&gt; to pickle the &lt;em&gt;study&lt;/em&gt; object.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;study&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;optimize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;objective&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n_trials&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;joblib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dump&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;study&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'artifacts/study.pkl'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;… and you can load it later with joblib.load to restart your search.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;study&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;joblib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'artifacts/study.pkl'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;study&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;optimize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;objective&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n_trials&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;That’s it.&lt;/p&gt;
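&lt;p&gt;In practice you often want one script that either resumes a saved search or starts a fresh one. Below is a rough sketch of that pattern. Note that it swaps in the standard-library &lt;em&gt;pickle&lt;/em&gt; module in place of joblib so that it runs without extra dependencies, and that &lt;em&gt;load_or_create&lt;/em&gt; and &lt;em&gt;save&lt;/em&gt; are hypothetical helper names.&lt;/p&gt;

```python
import os
import pickle


def load_or_create(path, factory):
    """Return the object pickled at `path`, or build a fresh one with `factory()`."""
    if os.path.exists(path):
        with open(path, 'rb') as f:
            return pickle.load(f)
    return factory()


def save(obj, path):
    """Pickle `obj` to `path`, creating the parent directory if needed."""
    directory = os.path.dirname(path)
    if directory:
        os.makedirs(directory, exist_ok=True)
    with open(path, 'wb') as f:
        pickle.dump(obj, f)
```

&lt;p&gt;With Optuna, the factory could be something like lambda: optuna.create_study(direction='maximize'), and you would call save(study, 'artifacts/study.pkl') after each call to .optimize().&lt;/p&gt;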

&lt;p&gt;For &lt;strong&gt;distributed setups&lt;/strong&gt;, you can use the &lt;strong&gt;name&lt;/strong&gt; of the study and the &lt;strong&gt;URL to the database&lt;/strong&gt; where your distributed study is stored to instantiate a new study. For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;study&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;optuna&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;create_study&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                    &lt;span class="n"&gt;study_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'example-study'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
                    &lt;span class="n"&gt;storage&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'sqlite:///example.db'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
                    &lt;span class="n"&gt;load_if_exists&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Nice and easy.&lt;/p&gt;

&lt;p&gt;More about running distributed hyperparameter optimization with Optuna in the Speed and Parallelization section.  &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;10 / 10&lt;/strong&gt;&lt;/p&gt;




&lt;h4&gt;
  
  
  Hyperopt
&lt;/h4&gt;

&lt;p&gt;Similarly to Optuna, use &lt;em&gt;joblib.dump&lt;/em&gt; to pickle the &lt;em&gt;trials&lt;/em&gt; object.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;trials&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Trials&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  
&lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;fmin&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;objective&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;SPACE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;trials&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;trials&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
         &lt;span class="n"&gt;algo&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tpe&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;suggest&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_evals&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;joblib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dump&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;trials&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'artifacts/hyperopt_trials.pkl'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;… load it with joblib.load and restart.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;trials&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;joblib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'artifacts/hyperopt_trials.pkl'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;fmin&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;objective&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;SPACE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;trials&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;trials&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
         &lt;span class="n"&gt;algo&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tpe&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;suggest&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_evals&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Simple and works with no problems.&lt;/p&gt;

&lt;p&gt;If you are optimizing hyperparameters in a &lt;strong&gt;distributed&lt;/strong&gt; fashion, you can load a &lt;em&gt;MongoTrials()&lt;/em&gt; object that connects to MongoDB. More about running distributed hyperparameter optimization with Hyperopt in the Speed and Parallelization section.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;10 / 10&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;Both make it easy and get the job done.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Persisting and restarting&lt;/strong&gt;&lt;br&gt;
Optuna = Hyperopt&lt;/p&gt;
&lt;/blockquote&gt;





&lt;h2&gt;
  
  
  Run Pruning
&lt;/h2&gt;

&lt;p&gt;Not all hyperparameter configurations are created equal. For some of them, you can tell very quickly that they will not produce high scores. Ideally, you would like to stop those runs as soon as possible and try different parameters instead.&lt;/p&gt;

&lt;p&gt;Optuna gives you an option to do that with &lt;strong&gt;Pruning Callbacks&lt;/strong&gt;. Many machine learning frameworks are supported:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;KerasPruningCallback, TFKerasPruningCallback&lt;/li&gt;
&lt;li&gt;TensorFlowPruningHook&lt;/li&gt;
&lt;li&gt;PyTorchIgnitePruningHandler, PyTorchLightningPruningCallback&lt;/li&gt;
&lt;li&gt;FastAIPruningCallback&lt;/li&gt;
&lt;li&gt;LightGBMPruningCallback&lt;/li&gt;
&lt;li&gt;XGBoostPruningCallback&lt;/li&gt;
&lt;li&gt;and more&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can read about them in the docs.&lt;/p&gt;

&lt;p&gt;For example, in the case of LightGBM training, you would pass this callback to the lgb.train function.&lt;br&gt;
&lt;/p&gt;

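&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;lightgbm&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;lgb&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;optuna.integration&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LightGBMPruningCallback&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;sklearn.model_selection&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;train_test_split&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;train_evaluate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pruning_callback&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;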
    &lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;X_valid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_valid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;train_test_split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;test_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;random_state&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1234&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;train_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;lgb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Dataset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;valid_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;lgb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Dataset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_valid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;y_valid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;reference&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;train_data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;callbacks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;pruning_callback&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;pruning_callback&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;

    &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;lgb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;train&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;train_data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                      &lt;span class="n"&gt;num_boost_round&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;NUM_BOOST_ROUND&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                      &lt;span class="n"&gt;early_stopping_rounds&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;EARLY_STOPPING_ROUNDS&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                      &lt;span class="n"&gt;valid_sets&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;valid_data&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                      &lt;span class="n"&gt;valid_names&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'valid'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                      &lt;span class="n"&gt;callbacks&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;callbacks&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;best_score&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'valid'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="s"&gt;'auc'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;objective&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;trial&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;'learning_rate'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;trial&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;suggest_loguniform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'learning_rate'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.01&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
              &lt;span class="s"&gt;'max_depth'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;trial&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;suggest_int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'max_depth'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
              &lt;span class="s"&gt;'num_leaves'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;trial&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;suggest_int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'num_leaves'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
              &lt;span class="s"&gt;'min_data_in_leaf'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;trial&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;suggest_int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'min_data_in_leaf'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
              &lt;span class="s"&gt;'feature_fraction'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;trial&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;suggest_uniform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'feature_fraction'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
              &lt;span class="s"&gt;'subsample'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;trial&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;suggest_uniform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'subsample'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;

    &lt;span class="n"&gt;pruning_callback&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;LightGBMPruningCallback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;trial&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'auc'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'valid'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
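    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;train_evaluate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pruning_callback&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;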
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Only Optuna gives you this option so it is a clear win.&lt;/p&gt;
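&lt;p&gt;To give some intuition for how a pruning decision can be made, here is a simplified sketch of a median rule: a run is stopped when its score at a given step is worse than the median score that earlier runs had at the same step. This is my own illustration for a maximized metric, not Optuna’s actual implementation, and &lt;em&gt;should_prune&lt;/em&gt; is a hypothetical name.&lt;/p&gt;

```python
import statistics


def should_prune(current_value, step, previous_curves):
    """Simplified median rule for a maximized metric (illustration only).

    previous_curves: one list per earlier run, holding the metric
    reported at each step (list index = step).
    """
    # Collect what earlier runs scored at this step, skipping shorter runs.
    values_at_step = [curve[step] for curve in previous_curves if len(curve) > step]
    if not values_at_step:
        return False  # nothing to compare against yet
    # Prune when the current run is below the median of earlier runs.
    return statistics.median(values_at_step) > current_value
```

&lt;p&gt;A pruning callback does roughly this kind of check after each boosting round or epoch, using the intermediate scores reported for the trial.&lt;/p&gt;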

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Run Pruning&lt;/strong&gt;&lt;br&gt;
Optuna &amp;gt; Hyperopt&lt;/p&gt;


&lt;/blockquote&gt;


&lt;h2&gt;
  
  
  Handling Exceptions
&lt;/h2&gt;

&lt;p&gt;If one of your runs fails due to a wrong parameter combination, a random training error, or some other problem, you could lose all the &lt;em&gt;parameter_configuration:score&lt;/em&gt; pairs evaluated so far in a study.&lt;/p&gt;

&lt;p&gt;You can use callbacks to save this information after every iteration or use a DB to store it as explained in the Speed and Parallelization section.&lt;/p&gt;

&lt;p&gt;However, you may want to let the study continue even when an exception happens. To make that possible, Optuna lets you pass the allowed exceptions to the &lt;em&gt;.optimize()&lt;/em&gt; method.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;objective&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;trial&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;'learning_rate'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;trial&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;suggest_loguniform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'learning_rate'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.01&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
              &lt;span class="s"&gt;'max_depth'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;trial&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;suggest_int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'max_depth'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
              &lt;span class="s"&gt;'num_leaves'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;trial&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;suggest_int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'num_leaves'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;

    &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;non_existent_variable&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;train_evaluate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;study&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;optuna&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;create_study&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;direction&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'maximize'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;study&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;optimize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;objective&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n_trials&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;catch&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;NameError&lt;/span&gt;&lt;span class="p"&gt;,))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Again, only Optuna supports this.&lt;/p&gt;
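&lt;p&gt;Conceptually, catching allowed exceptions means that a failing trial is recorded as failed and the search simply moves on. The sketch below shows that idea with a plain loop. It is an illustration under my own assumptions, not Optuna’s internals. &lt;em&gt;run_trials&lt;/em&gt; and the result dictionaries are hypothetical.&lt;/p&gt;

```python
def run_trials(objective, candidates, catch=()):
    """Evaluate every candidate; exceptions listed in `catch` mark the trial as failed.

    Any exception type not listed in `catch` still propagates and ends the loop.
    """
    results = []
    for params in candidates:
        try:
            value = objective(params)
            state = 'complete'
        except catch:
            # Allowed exception: record the failure and keep searching.
            value = None
            state = 'fail'
        results.append({'params': params, 'value': value, 'state': state})
    return results
```

&lt;p&gt;With an empty catch tuple nothing is swallowed, so the first error stops the whole search, which is the behavior you get when no allowed exceptions are specified.&lt;/p&gt;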

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Handling Exceptions&lt;/strong&gt;&lt;br&gt;
Optuna &amp;gt; Hyperopt&lt;/p&gt;
&lt;/blockquote&gt;





&lt;h2&gt;
  
  
  Documentation
&lt;/h2&gt;

&lt;p&gt;When you are a user of a library or a framework, it is absolutely crucial to be able to find the information you need when you need it. This is where documentation and support channels come into the picture, and they can make or break a library.&lt;/p&gt;

&lt;p&gt;Let’s see how Optuna and Hyperopt compare on that.&lt;/p&gt;

&lt;h4&gt;
  
  
  Optuna
&lt;/h4&gt;

&lt;p&gt;It is &lt;strong&gt;really good&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;There is a &lt;a href="https://optuna.org/"&gt;proper webpage&lt;/a&gt; that explains all the basic concepts and shows you where to find more information.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--gKI3A7ds--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i0.wp.com/neptune.ml/wp-content/uploads/optuna_docs.png%3Fw%3D927%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--gKI3A7ds--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i0.wp.com/neptune.ml/wp-content/uploads/optuna_docs.png%3Fw%3D927%26ssl%3D1" alt="optunadocs"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Also, there is complete and very easy-to-understand &lt;a href="https://optuna.readthedocs.io/en/latest/tutorial/index.html"&gt;documentation on read-the-docs&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;It contains:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tutorials with both simple and advanced examples&lt;/li&gt;
&lt;li&gt;API Reference with all the functions containing beautiful docstrings. To give you an idea, imagine having charts inside your docstrings so that you can better understand what is happening inside your function. Check out the &lt;a href="https://optuna.readthedocs.io/en/latest/reference/samplers.html#optuna.samplers.BaseSampler"&gt;&lt;em&gt;BaseSampler&lt;/em&gt;&lt;/a&gt; if you don’t believe me.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It is also important to mention that the supporting team from &lt;a href="https://preferred.jp/ja/"&gt;Preferred Networks&lt;/a&gt; really takes care of this project. They respond to GitHub issues, and a community is growing around it with great feature ideas and PRs coming in. Check out the &lt;a href="https://github.com/pfnet/optuna/issues"&gt;GitHub project issues section&lt;/a&gt; to see what is going on there.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;10 / 10&lt;/strong&gt;&lt;/p&gt;




&lt;h4&gt;
  
  
  Hyperopt
&lt;/h4&gt;

&lt;p&gt;It was recently updated and now it is quite &lt;strong&gt;alright&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;You can find it &lt;a href="http://hyperopt.github.io/hyperopt/scaleout/mongodb/"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;You can easily find information about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;how to get started&lt;/li&gt;
&lt;li&gt;how to define both simple and advanced search spaces&lt;/li&gt;
&lt;li&gt;how to run the installation&lt;/li&gt;
&lt;li&gt;how to run Hyperopt in parallel via MongoDB or Spark&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Unfortunately, there were some things that I didn’t like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;missing API reference with the docstrings for all functions/methods&lt;/li&gt;
&lt;li&gt;docstrings themselves are missing for most methods/functions, which forces you to read the implementation (there are some positive side effects here:) )&lt;/li&gt;
&lt;li&gt;no examples of using Adaptive TPE. I wasn’t sure if I was using it correctly or whether I should specify some additional (hyper)hyperparameters. Missing docstrings didn’t help me here either.&lt;/li&gt;
&lt;li&gt;some links in the docs lead to 404 pages.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Overall, it has improved a lot lately, but I was still a bit lost at times. I hope that with time it will get even better so stay tuned. &lt;/p&gt;

&lt;p&gt;The good thing is, &lt;strong&gt;there are a lot of blog posts about it&lt;/strong&gt;. Some of them that I found useful are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://medium.com/district-data-labs/parameter-tuning-with-hyperopt-faa86acdfdce"&gt;“Parameter Tuning with Hyperopt”&lt;/a&gt; by District Data Labs&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://medium.com/vooban-ai/hyperopt-tutorial-for-optimizing-neural-networks-hyperparameters-e3102814b919"&gt;“Hyperopt tutorial for Optimizing Neural Networks Hyperparameters”&lt;/a&gt; by Vooban&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://blog.goodaudience.com/on-using-hyperopt-advanced-machine-learning-a2dde2ccece7"&gt;“On Using Hyperopt: Advanced Machine Learning”&lt;/a&gt; by Tanay Agrawal&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://towardsdatascience.com/an-introductory-example-of-bayesian-optimization-in-python-with-hyperopt-aae40fff4ff0"&gt;“An Introductory Example of Bayesian Optimization in Python with Hyperopt”&lt;/a&gt; by Will Koehrsen&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The documentation is &lt;strong&gt;not the strongest side&lt;/strong&gt; of this project but because it’s a classic there are a lot of resources out there.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;6 / 10&lt;/strong&gt;&lt;/p&gt;




&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Documentation&lt;/strong&gt;&lt;br&gt;
Optuna &amp;gt; Hyperopt&lt;/p&gt;


&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Jump back to the evaluation criteria -&amp;gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Visualizations
&lt;/h2&gt;

&lt;p&gt;Visualizing hyperparameter searches can be very useful. You can gain information on interactions between parameters and see where you should search next.&lt;/p&gt;

&lt;p&gt;That is why I want to compare the visualization suites that Optuna and Hyperopt offer.&lt;/p&gt;

&lt;h4&gt;
  
  
  Optuna
&lt;/h4&gt;

&lt;p&gt;A few great visualizations are available in the &lt;em&gt;optuna.visualization&lt;/em&gt; module:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;plot_contour&lt;/strong&gt;: plots parameter interactions on an interactive chart. You can choose which hyperparameters you would like to explore.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;plot_contour&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;study&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'learning_rate'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                            &lt;span class="s"&gt;'max_depth'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                            &lt;span class="s"&gt;'num_leaves'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                            &lt;span class="s"&gt;'min_data_in_leaf'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                            &lt;span class="s"&gt;'feature_fraction'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                            &lt;span class="s"&gt;'subsample'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/SEZtjnfibpU"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;plot_optimization_history&lt;/strong&gt;: shows the scores from all trials as well as the best score so far at each point.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;plot_optimization_history&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;study&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/r9xaZ5DiMOg"&gt;
&lt;/iframe&gt;
&lt;/p&gt;
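
&lt;p&gt;To make the “best score so far” line concrete, here is a minimal, library-free sketch of how such a curve can be derived from per-trial scores. The helper name &lt;em&gt;best_so_far&lt;/em&gt; and the score values are made up for illustration; this is not Optuna’s plotting code.&lt;/p&gt;

```python
from itertools import accumulate


def best_so_far(scores, maximize=True):
    """Return the running best score after each trial."""
    pick = max if maximize else min
    return list(accumulate(scores, pick))


# Made-up AUC values for five trials, in the order they were run
trial_scores = [0.81, 0.84, 0.83, 0.85, 0.82]
running_best = best_so_far(trial_scores)
print(running_best)
```

&lt;p&gt;Plotting &lt;em&gt;trial_scores&lt;/em&gt; as points and &lt;em&gt;running_best&lt;/em&gt; as a line gives the same kind of picture: flat stretches in the line are trials that did not improve on the best result.&lt;/p&gt;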

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;plot_parallel_coordinate&lt;/strong&gt;: interactively visualizes the hyperparameters and scores
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;plot_parallel_coordinate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;study&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/CkaHQRWSoRw"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;plot_slice&lt;/strong&gt;: shows the &lt;strong&gt;evolution of the search&lt;/strong&gt;. You can see where in the hyperparameter space your search went and &lt;strong&gt;which parts of the space were explored more&lt;/strong&gt;.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;plot_slice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;study&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/Qz6nN0tBPZg"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;Overall, &lt;strong&gt;visualizations in Optuna are incredible!&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;They let you zoom in on the hyperparameter interactions and help you decide on how to run your next parameter sweep. Amazing job.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;10 / 10&lt;/strong&gt;&lt;/p&gt;




&lt;h4&gt;
  
  
  Hyperopt
&lt;/h4&gt;

&lt;p&gt;There are three visualization functions in the &lt;em&gt;hyperopt.plotting&lt;/em&gt; module: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;main_plot_history&lt;/strong&gt;: shows you the results of each iteration and highlights the best score.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;main_plot_history&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;trials&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--AATfnpBG--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i1.wp.com/neptune.ml/wp-content/uploads/hyperopt_main_plot-history.png%3Fw%3D864%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--AATfnpBG--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i1.wp.com/neptune.ml/wp-content/uploads/hyperopt_main_plot-history.png%3Fw%3D864%26ssl%3D1" alt="hyperopt_vis1"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;main_plot_histogram&lt;/strong&gt;: shows you the histogram of results over all iterations.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;main_plot_histogram&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;trials&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Mz9-ZcXQ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i1.wp.com/neptune.ml/wp-content/uploads/hyperopt_main_plot_historgram.png%3Fw%3D864%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Mz9-ZcXQ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i1.wp.com/neptune.ml/wp-content/uploads/hyperopt_main_plot_historgram.png%3Fw%3D864%26ssl%3D1" alt="hyperopt main_plot_histogram"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;main_plot_vars&lt;/strong&gt;: I don’t really know what it does, as I couldn’t get it to run and there were neither docstrings nor examples (again, the documentation is far from perfect).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Summing up, there are some basic visualization utilities but they &lt;strong&gt;are not super useful&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3 / 10&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;I am very impressed by the visualizations available in Optuna. Useful, interactive, and beautiful. &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Visualizations&lt;/strong&gt;&lt;br&gt;
Optuna &amp;gt; Hyperopt&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you want to play with those visualizations you can use the study object that I saved as ‘study.pkl’ for each experiment. &lt;/p&gt;

&lt;p&gt;For example &lt;a href="https://ui.neptune.ml/jakub-czakon/blog-hpo/e/BLOG-404/artifacts?utm_source=devto&amp;amp;utm_medium=crosspost&amp;amp;utm_campaign=blog-optuna-vs-hyperopt&amp;amp;utm_content=explore-experiment"&gt;go to artifacts of this one&lt;/a&gt;.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Jump back to the evaluation criteria -&amp;gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Speed and Parallelization
&lt;/h2&gt;

&lt;p&gt;When it comes to hyperparameter optimization, being able to distribute your training on your machine or many machines (cluster) can be crucial. &lt;/p&gt;

&lt;p&gt;That is why I checked the distributed training options for both Optuna and Hyperopt.&lt;/p&gt;

&lt;h4&gt;
  
  
  Optuna
&lt;/h4&gt;

&lt;p&gt;You can run distributed hyperparameter optimization on one machine or a cluster of machines and it is actually really simple.&lt;/p&gt;

&lt;p&gt;For one machine, you simply change the n_jobs parameter in your .optimize() method.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;study&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;optimize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;objective&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n_trials&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n_jobs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
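
&lt;p&gt;As a rough picture of what running trials in parallel on one machine means, the sketch below evaluates candidate points with a thread pool from the standard library. The toy objective and the fixed list of candidates are assumptions made for illustration; a real sampler would choose new points based on earlier results.&lt;/p&gt;

```python
import random
from concurrent.futures import ThreadPoolExecutor


def objective(x):
    # Toy objective with its maximum at x = 2
    return -(x - 2) ** 2


rng = random.Random(0)
candidates = [rng.uniform(-10, 10) for _ in range(100)]

# Four worker threads evaluate the candidates concurrently
with ThreadPoolExecutor(max_workers=4) as pool:
    values = list(pool.map(objective, candidates))

# Pair each value with its candidate and keep the best pair
best_value, best_x = max(zip(values, candidates))
print(best_x, best_value)
```

&lt;p&gt;Threads only speed things up when the objective spends its time outside the Python interpreter lock, which is typically the case when training is done by a native library.&lt;/p&gt;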



&lt;p&gt;To run it on a cluster, all you need to do is &lt;strong&gt;create a study that resides in a database&lt;/strong&gt; (&lt;a href="https://optuna.readthedocs.io/en/latest/tutorial/distributed.html"&gt;you can choose among many Relational DBs&lt;/a&gt;). &lt;/p&gt;

&lt;p&gt;There are two options to do that. You can do it via command-line interface:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight shell"&gt;&lt;code&gt;optuna create-study &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--study-name&lt;/span&gt; &lt;span class="s2"&gt;"distributed-example"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--storage&lt;/span&gt; &lt;span class="s2"&gt;"sqlite:///example.db"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;You can also create a study in your optimization script.&lt;/p&gt;

&lt;p&gt;By using load_if_exists=True you can treat your master script and worker scripts in the same way which simplifies things a lot!&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;study&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;optuna&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;create_study&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;study_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'distributed-example'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;storage&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'sqlite:///example.db'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;load_if_exists&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;study&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;optimize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;objective&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n_trials&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Finally, you can run your worker scripts from many machines and they will all use the same information from the study database.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;terminal&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt; &lt;span class="n"&gt;python&lt;/span&gt; &lt;span class="n"&gt;run_worker&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;py&lt;/span&gt;
&lt;span class="n"&gt;terminal&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;25&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt; &lt;span class="n"&gt;python&lt;/span&gt; &lt;span class="n"&gt;run_worker&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;py&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
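
&lt;p&gt;The idea of workers coordinating through a shared database can be sketched with nothing but the standard library. This is a conceptual illustration, not how Optuna’s storage layer is implemented: the table layout, the &lt;em&gt;run_worker&lt;/em&gt; helper, and the toy objective are all assumptions, and the “workers” run one after another here rather than on separate machines.&lt;/p&gt;

```python
import os
import random
import sqlite3
import tempfile


def open_study(db_path):
    """Connect to the shared database, creating the trials table if needed."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS trials "
        "(id INTEGER PRIMARY KEY, x REAL, value REAL)"
    )
    conn.commit()
    return conn


def run_worker(db_path, n_trials, seed):
    """One worker: sample near the best x seen by ANY worker, record the result."""
    rng = random.Random(seed)
    conn = open_study(db_path)
    for _ in range(n_trials):
        row = conn.execute(
            "SELECT x FROM trials ORDER BY value DESC LIMIT 1"
        ).fetchone()
        # No history yet: sample uniformly; otherwise perturb the best point
        x = rng.uniform(-10, 10) if row is None else row[0] + rng.gauss(0, 1)
        value = -(x - 2) ** 2  # toy objective, maximum at x = 2
        conn.execute("INSERT INTO trials (x, value) VALUES (?, ?)", (x, value))
        conn.commit()
    conn.close()


db_path = os.path.join(tempfile.mkdtemp(), "example.db")
for worker_seed in range(3):  # three "workers" sharing one database file
    run_worker(db_path, n_trials=20, seed=worker_seed)

conn = open_study(db_path)
n_rows, best_value = conn.execute(
    "SELECT COUNT(*), MAX(value) FROM trials"
).fetchone()
conn.close()
print(n_rows, best_value)
```

&lt;p&gt;Every worker runs the same code and creates the table only if it is missing, which mirrors the role load_if_exists=True plays in the Optuna snippet above.&lt;/p&gt;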



&lt;p&gt;Easy and works like a charm!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;10 / 10&lt;/strong&gt;&lt;/p&gt;




&lt;h4&gt;
  
  
  Hyperopt
&lt;/h4&gt;

&lt;p&gt;You can distribute your computation over a cluster of machines. Good, step-by-step instructions can be found in this &lt;a href="https://blog.goodaudience.com/on-using-hyperopt-advanced-machine-learning-a2dde2ccece7"&gt;blog post by Tanay Agrawal&lt;/a&gt; but in a nutshell, you need to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Start a server with MongoDB&lt;/strong&gt; on it which will consume results from your worker training scripts and send out the next parameter set to try,&lt;/li&gt;
&lt;li&gt;In your training script, instead of &lt;em&gt;Trials()&lt;/em&gt; create a &lt;em&gt;MongoTrials()&lt;/em&gt; object pointing to the database server you have started in the previous step,&lt;/li&gt;
&lt;li&gt;Move your objective function to a separate objective.py script and rename it to function,&lt;/li&gt;
&lt;li&gt;Compile your Python training script,&lt;/li&gt;
&lt;li&gt;Run &lt;strong&gt;hyperopt-mongo-worker&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Though it gets the job done, it doesn’t feel quite perfect. You need to do some juggling around the objective function, and starting MongoDB could have been provided in the CLI to make things easier.&lt;/p&gt;

&lt;p&gt;It is also important to mention that &lt;strong&gt;integration with Spark&lt;/strong&gt; via the SparkTrials object was recently added. There is a &lt;a href="http://hyperopt.github.io/hyperopt/scaleout/spark/"&gt;step by step guide&lt;/a&gt; to help you get started and you can even use the &lt;a href="https://github.com/hyperopt/hyperopt/blob/master/download_spark_dependencies.sh"&gt;spark-installation script&lt;/a&gt; to make things easier.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;best&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;hyperopt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fmin&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;objective&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                     &lt;span class="n"&gt;space&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;search_space&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                     &lt;span class="n"&gt;algo&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;hyperopt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tpe&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;suggest&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                     &lt;span class="n"&gt;max_evals&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                     &lt;span class="n"&gt;trials&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;hyperopt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;SparkTrials&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Works exactly the way you would expect it to work.&lt;/p&gt;

&lt;p&gt;Nice and simple!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;9 / 10&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;Both libraries support distributed training, which is great. However, Optuna does a slightly better job with a simpler, more user-friendly interface.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Speed and Parallelization&lt;/strong&gt;&lt;br&gt;
Optuna = Hyperopt&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;&lt;strong&gt;Jump back to the evaluation criteria -&amp;gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Experimental results*
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;Just to be clear, these are the &lt;strong&gt;results on just one example problem&lt;/strong&gt; and &lt;strong&gt;one run per lib/configuration&lt;/strong&gt;, and they do not guarantee generalization. To run a proper benchmark, you would run it multiple times on various datasets.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That being said, as a practitioner, I would hope to see some improvements over the random search for each problem. Otherwise, why bother with an HPO library?&lt;/p&gt;

&lt;p&gt;Ok, so as an example let’s tweak the hyperparameters of the &lt;strong&gt;lightGBM&lt;/strong&gt; model on a tabular, &lt;strong&gt;binary classification&lt;/strong&gt; problem. If you want to use the same dataset as I did you should:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;download it from kaggle&lt;/li&gt;
&lt;li&gt;use the &lt;strong&gt;first 10000 rows&lt;/strong&gt; from the &lt;em&gt;train.csv&lt;/em&gt; file&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To make the training quick I fixed the &lt;strong&gt;number of boosting rounds to 300 with a 30 round early stopping&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;lightgbm&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;lgb&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;sklearn.model_selection&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;train_test_split&lt;/span&gt;

&lt;span class="n"&gt;NUM_BOOST_ROUND&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;300&lt;/span&gt;
&lt;span class="n"&gt;EARLY_STOPPING_ROUNDS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;train_evaluate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;X_valid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_valid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;train_test_split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
                                                          &lt;span class="n"&gt;test_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
                                                          &lt;span class="n"&gt;random_state&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1234&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;train_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;lgb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Dataset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;valid_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;lgb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Dataset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_valid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;y_valid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;reference&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;train_data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;lgb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;train&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;train_data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                      &lt;span class="n"&gt;num_boost_round&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;NUM_BOOST_ROUND&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                      &lt;span class="n"&gt;early_stopping_rounds&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;EARLY_STOPPING_ROUNDS&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                      &lt;span class="n"&gt;valid_sets&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;valid_data&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; 
                      &lt;span class="n"&gt;valid_names&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'valid'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

    &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;best_score&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'valid'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="s"&gt;'auc'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;All the training and evaluation logic is put inside the train_evaluate function. We can &lt;strong&gt;treat it as a black box&lt;/strong&gt; that takes the data and hyperparameter set and produces the AUC evaluation score.&lt;/p&gt;
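
&lt;p&gt;Because the tuner only needs a function that maps parameters to a score, even a few lines of plain Python can drive the search. The sketch below is a bare-bones random search; &lt;em&gt;random_search&lt;/em&gt; and &lt;em&gt;toy_train_evaluate&lt;/em&gt; are hypothetical names, and the toy function stands in for a real training run so that the example executes instantly.&lt;/p&gt;

```python
import random


def random_search(score_fn, space, n_trials, seed=0):
    """Maximize score_fn by sampling each parameter uniformly from its range."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        params = {
            name: rng.uniform(low, high) for name, (low, high) in space.items()
        }
        trials.append((score_fn(params), params))
    # Pick the trial with the highest score
    best_score, best_params = max(trials, key=lambda trial: trial[0])
    return best_params, best_score


def toy_train_evaluate(params):
    # Hypothetical stand-in for train_evaluate(X, y, params); peaks at 0.1
    return 1.0 - (params["learning_rate"] - 0.1) ** 2


space = {"learning_rate": (0.01, 0.5)}
best_params, best_score = random_search(toy_train_evaluate, space, n_trials=100)
print(best_params, best_score)
```

&lt;p&gt;With real data you would pass something like &lt;em&gt;lambda params: train_evaluate(X, y, params)&lt;/em&gt; as the score function. An HPO library replaces the uniform sampling with a smarter strategy but keeps the same black-box contract.&lt;/p&gt;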

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You can actually turn every script that takes parameters as inputs and outputs the score into such a train_evaluate function. Once that is done, you can treat it as a black box and tune your parameters. &lt;/p&gt;

&lt;p&gt;I show how to do that step-by-step in a different post &lt;a href="https://neptune.ml/blog/hyperparameter-tuning-on-any-python-script?utm_source=devto&amp;amp;utm_medium=crosspost&amp;amp;utm_campaign=blog-optuna-vs-hyperopt&amp;amp;utm_content=other-neptune-post"&gt;“How to Do Hyperparameter Tuning on Any Python Script in 3 Easy Steps​”&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--E5osEzZ9--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i1.wp.com/neptune.ml/wp-content/uploads/hyperparametrization-1.png%3Fw%3D1072%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--E5osEzZ9--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i1.wp.com/neptune.ml/wp-content/uploads/hyperparametrization-1.png%3Fw%3D1072%26ssl%3D1" alt="otherhpopost"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;To train a model on a set of parameters you need to run something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;

&lt;span class="n"&gt;N_ROWS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10000&lt;/span&gt;
&lt;span class="n"&gt;TRAIN_PATH&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;'/mnt/ml-team/minerva/open-solutions/santander/data/train.csv'&lt;/span&gt;

&lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;TRAIN_PATH&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;nrows&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;N_ROWS&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;X&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;drop&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="s"&gt;'ID_code'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'target'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'target'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;MODEL_PARAMS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;'boosting'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;'gbdt'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="s"&gt;'objective'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s"&gt;'binary'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="s"&gt;'metric'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;'auc'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="s"&gt;'num_threads'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="s"&gt;'learning_rate'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;train_evaluate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;MODEL_PARAMS&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'Validation AUC: {}'&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;For this study, I tried to find the best parameters within a &lt;strong&gt;100-run budget&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;I ran 6 experiments:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Random search (from hyperopt) as a reference &lt;/li&gt;
&lt;li&gt;Tree of Parzen Estimator search strategies for both Optuna and Hyperopt&lt;/li&gt;
&lt;li&gt;Adaptive TPE from Hyperopt&lt;/li&gt;
&lt;li&gt;TPE from Optuna with a pruning callback for more runs but within the same time frame. It turns out that 400 runs with pruning take as much time as 100 runs without it. &lt;/li&gt;
&lt;li&gt;Optuna with Random Forest surrogate model from skopt.Sampler&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You may want to scroll down to the Example Script at the end.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--kw-u2aVF--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i1.wp.com/neptune.ml/wp-content/uploads/optuna_vs_dashboard.png%3Fw%3D758%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--kw-u2aVF--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i1.wp.com/neptune.ml/wp-content/uploads/optuna_vs_dashboard.png%3Fw%3D758%26ssl%3D1" alt="npt dashboard"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you want to explore all of those experiments in more detail you can simply go to the &lt;a href="https://ui.neptune.ml/jakub-czakon/blog-hpo/experiments?filterId=21c55071-3b32-45e2-aed6-2420e798a5f4&amp;amp;viewId=25f1c505-2dcd-4320-8cb9-94210cc5f580&amp;amp;utm_source=devto&amp;amp;utm_medium=crosspost&amp;amp;utm_campaign=blog-optuna-vs-hyperopt&amp;amp;utm_content=explore-dashboard"&gt;experiment dashboard&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://neptune.ml/register?utm_source=devto&amp;amp;utm_medium=crosspost&amp;amp;utm_campaign=blog-optuna-vs-hyperopt&amp;amp;utm_content=register"&gt;Register for the free tool for experiment tracking and management&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Both Optuna and Hyperopt improved over the random search&lt;/strong&gt; which is good.&lt;/p&gt;

&lt;p&gt;The TPE implementation from &lt;strong&gt;Optuna was slightly better than Hyperopt’s&lt;/strong&gt; Adaptive TPE but not by much. On the other hand, when running hyperparameter optimization, those small improvements are exactly what you are going for.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What is interesting is that the TPE implementations from Hyperopt and Optuna give vastly different results on this problem&lt;/strong&gt;. Maybe the cutoff point between good and bad parameter configurations λ is chosen differently or sampling methods have defaults that work better for this particular problem. &lt;/p&gt;

&lt;p&gt;Moreover, &lt;strong&gt;using pruning decreased training time by 4x&lt;/strong&gt;. I could run 400 searches in the time it takes to run 100 without pruning. On the flip side, &lt;strong&gt;using pruning got a lower score&lt;/strong&gt;. It may be different for your problem, but it is important to consider that when deciding whether to use pruning.&lt;/p&gt;
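
&lt;p&gt;Both effects, less compute and a possibly lower final score, can be seen in a simplified median-rule pruner. This is a sketch under stated assumptions rather than Optuna’s actual pruner: the learning curves are made up, and a trial is stopped as soon as its score falls below the median of completed trials at the same step.&lt;/p&gt;

```python
from statistics import median


def should_prune(step, value, finished_curves):
    """Median rule: stop a trial whose score trails the median at the same step."""
    peers = [curve[step] for curve in finished_curves if len(curve) > step]
    return bool(peers) and median(peers) > value


def run_trial(curve, finished_curves):
    """Replay a learning curve step by step; return the values actually computed."""
    reported = []
    for step, value in enumerate(curve):
        reported.append(value)
        if should_prune(step, value, finished_curves):
            break
    return reported


# Made-up validation AUC per checkpoint for three hypothetical trials
curves = [
    [0.70, 0.80, 0.85],
    [0.72, 0.82, 0.86],
    [0.60, 0.65, 0.90],  # slow starter that would have finished best
]

finished = []
steps_run = 0
for curve in curves:
    reported = run_trial(curve, finished)
    steps_run += len(reported)
    if len(reported) == len(curve):
        finished.append(reported)

best_found = max(curve[-1] for curve in finished)
print(steps_run, best_found)
```

&lt;p&gt;Here 7 of the 9 possible training steps are run, but the third trial is cut after its first checkpoint, so the search settles for 0.86 and never sees the 0.90 that trial would have reached.&lt;/p&gt;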

&lt;p&gt;For this section, I assigned points based on the improvements over the random search strategy.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Hyperopt&lt;/strong&gt; got (0.850 – 0.844) * 1000 = &lt;strong&gt;6&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Optuna&lt;/strong&gt; got (0.854 – 0.844) * 1000 = &lt;strong&gt;10&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
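
&lt;p&gt;The scoring rule above can be sketched in a few lines. This is only an illustration: the scores are the ones quoted in this section, while the scale of 1000 points per unit of score is an assumption chosen so that the totals come out as the whole numbers 6 and 10.&lt;/p&gt;

```python
# Validation scores quoted in this section.
RANDOM_SEARCH_SCORE = 0.844
SCORES = {"Hyperopt": 0.850, "Optuna": 0.854}

# Assumption: one point per 0.001 of improvement over random search.
POINTS_PER_UNIT = 1000


def improvement_points(score, baseline=RANDOM_SEARCH_SCORE):
    """Return the improvement over the baseline, rounded to whole points."""
    return round((score - baseline) * POINTS_PER_UNIT)


points = {name: improvement_points(score) for name, score in SCORES.items()}
print(points)
```

&lt;p&gt;Rounding is there only to absorb floating-point noise in the subtraction.&lt;/p&gt;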

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Experimental results&lt;/strong&gt;&lt;br&gt;
Optuna = Hyperopt&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;&lt;strong&gt;Jump back to the evaluation criteria -&amp;gt;&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusions
&lt;/h2&gt;

&lt;p&gt;Let’s take a look at the overall scores:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--4jeyLwX9--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i1.wp.com/neptune.ml/wp-content/uploads/optuna_vs_hyperopt_conclusion-1.png%3Fw%3D661%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--4jeyLwX9--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i1.wp.com/neptune.ml/wp-content/uploads/optuna_vs_hyperopt_conclusion-1.png%3Fw%3D661%26ssl%3D1" alt="comparison table"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Even if you look at it generously and consider only the features that both libraries share, &lt;strong&gt;Optuna is a better framework&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;It is on par or slightly better on all criteria and: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;it has better documentation&lt;/li&gt;
&lt;li&gt;it has a much better visualization suite&lt;/li&gt;
&lt;li&gt;it has some features like pruning, callbacks, and exception handling that Hyperopt doesn’t support&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;After doing all this research I am convinced that &lt;strong&gt;Optuna is a great library&lt;/strong&gt; for hyperparameter optimization.&lt;/p&gt;

&lt;p&gt;Moreover, I think that &lt;strong&gt;you should strongly consider switching from Hyperopt&lt;/strong&gt; if you were using that in the past.&lt;/p&gt;





&lt;h2&gt;
  
  
  Example script
&lt;/h2&gt;



&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;lightgbm&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;lgb&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;neptune&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;neptunecontrib.monitoring.optuna&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;opt_utils&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;optuna&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;sklearn.model_selection&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;train_test_split&lt;/span&gt;

&lt;span class="n"&gt;N_ROWS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;10000&lt;/span&gt;
&lt;span class="n"&gt;TRAIN_PATH&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;'../data/train.csv'&lt;/span&gt;
&lt;span class="n"&gt;NUM_BOOST_ROUND&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;300&lt;/span&gt;
&lt;span class="n"&gt;EARLY_STOPPING_ROUNDS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;
&lt;span class="n"&gt;STATIC_PARAMS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;'boosting'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;'gbdt'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                 &lt;span class="s"&gt;'objective'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;'binary'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                 &lt;span class="s"&gt;'metric'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;'auc'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                 &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="n"&gt;N_TRIALS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;
&lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;TRAIN_PATH&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;nrows&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;N_ROWS&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;X&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;drop&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="s"&gt;'ID_code'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'target'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'target'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;train_evaluate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;X_valid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_valid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;train_test_split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;test_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;random_state&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1234&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;train_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;lgb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Dataset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;valid_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;lgb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Dataset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_valid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;y_valid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;reference&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;train_data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;lgb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;train&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;train_data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                      &lt;span class="n"&gt;num_boost_round&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;NUM_BOOST_ROUND&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                      &lt;span class="n"&gt;early_stopping_rounds&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;EARLY_STOPPING_ROUNDS&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                      &lt;span class="n"&gt;valid_sets&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;valid_data&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                      &lt;span class="n"&gt;valid_names&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'valid'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;best_score&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'valid'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="s"&gt;'auc'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;objective&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;trial&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;'learning_rate'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;trial&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;suggest_loguniform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'learning_rate'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.01&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
              &lt;span class="s"&gt;'max_depth'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;trial&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;suggest_int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'max_depth'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
              &lt;span class="s"&gt;'num_leaves'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;trial&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;suggest_int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'num_leaves'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
              &lt;span class="s"&gt;'min_data_in_leaf'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;trial&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;suggest_int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'min_data_in_leaf'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
              &lt;span class="s"&gt;'feature_fraction'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;trial&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;suggest_uniform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'feature_fraction'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
              &lt;span class="s"&gt;'subsample'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;trial&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;suggest_uniform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'subsample'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;
    &lt;span class="n"&gt;all_params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;STATIC_PARAMS&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;train_evaluate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;all_params&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="n"&gt;neptune&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;init&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'jakub-czakon/blog-hpo'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;neptune&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;create_experiment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'optuna sweep'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;monitor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;opt_utils&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NeptuneMonitor&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;study&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;optuna&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;create_study&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;direction&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'maximize'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;study&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;optimize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;objective&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n_trials&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;N_TRIALS&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;callbacks&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;monitor&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;opt_utils&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;log_study&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;study&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;neptune&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stop&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;






</description>
      <category>machinelearning</category>
      <category>python</category>
      <category>hyperparameteroptimization</category>
    </item>
    <item>
      <title>24 Evaluation Metrics for Binary Classification (And When to Use Them)</title>
      <dc:creator>Jakub Czakon</dc:creator>
      <pubDate>Fri, 20 Dec 2019 20:57:10 +0000</pubDate>
      <link>https://dev.to/jakubczakon/24-evaluation-metrics-for-binary-classification-and-when-to-use-them-4042</link>
      <guid>https://dev.to/jakubczakon/24-evaluation-metrics-for-binary-classification-and-when-to-use-them-4042</guid>
      <description>&lt;p&gt;This article was originally &lt;a href="https://neptune.ml/blog/evaluation-metrics-binary-classification?utm_source=devto&amp;amp;utm_medium=cross_posting&amp;amp;utm_campaign=evaluation_metrics_for_binary_classification&amp;amp;utm_content=originally_posted_on" rel="noopener noreferrer"&gt;posted on neptune.ml/blog&lt;/a&gt; where you can find more in-depth articles for machine learning practitioners. &lt;/p&gt;




&lt;p&gt;Not sure which evaluation metric you should choose for your binary classification problem? After reading this blog post you should have a good idea.&lt;/p&gt;

&lt;p&gt;You will learn about a bunch of common and lesser-known evaluation metrics and charts to &lt;strong&gt;understand how to choose&lt;/strong&gt; the model performance &lt;strong&gt;metric for your problem&lt;/strong&gt;. Specifically, for each metric, I will talk about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What is the &lt;strong&gt;definition&lt;/strong&gt; and &lt;strong&gt;intuition&lt;/strong&gt; behind it,&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;non-technical explanation&lt;/strong&gt; that you can communicate to business stakeholders,&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;How to calculate or plot it&lt;/strong&gt;,&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;When&lt;/strong&gt; should you &lt;strong&gt;use it&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With that, you will understand the trade-offs so that making metric-related decisions will be easier.&lt;/p&gt;

&lt;p&gt;I will present all the good stuff in a moment, but first, let’s define our classification problem.&lt;/p&gt;

&lt;h1&gt;
  
  
  Before we start: problem definition
&lt;/h1&gt;

&lt;p&gt;You will be using those evaluation metrics in the context of a project, so I prepared an example fraud-detection problem based on a recent &lt;a href="https://www.kaggle.com/c/ieee-fraud-detection/overview" rel="noopener noreferrer"&gt;Kaggle competition&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;I selected &lt;strong&gt;43 features&lt;/strong&gt; and sampled &lt;strong&gt;66000 observations&lt;/strong&gt; from the original dataset adjusting the &lt;strong&gt;fraction of positive class to 0.09&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Then I trained a bunch of LightGBM classifiers with different hyperparameters. I only used the &lt;em&gt;learning_rate&lt;/em&gt; and &lt;em&gt;n_estimators&lt;/em&gt; parameters because I wanted to have an intuition as to which models are “truly” better. Specifically, I suspect that a model with only 10 trees is worse than a model with 100 trees. Of course, as you use more trees and smaller learning rates, it gets tricky, but I think it is a decent proxy.&lt;/p&gt;

&lt;p&gt;So for combinations of &lt;em&gt;learning_rate&lt;/em&gt; and &lt;em&gt;n_estimators&lt;/em&gt;, I did the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;defined hyperparameter values:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;MODEL_PARAMS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;random_state&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1234&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;    
                &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;learning_rate&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;                
                &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;n_estimators&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;trained the model:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;lightgbm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;LGBMClassifier&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;MODEL_PARAMS&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;predicted on test data:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;y_test_pred&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict_proba&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;logged all the metrics for each run:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nf"&gt;log_binary_classification_metrics&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_test_pred&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For the full code base &lt;a href="https://github.com/neptune-ml/blog-binary-classification-metrics" rel="noopener noreferrer"&gt;go to this repository&lt;/a&gt; or scroll down to the example script. &lt;/p&gt;

&lt;p&gt;You can also &lt;a href="https://ui.neptune.ml/neptune-ml/binary-classification-metrics/experiments?filterId=20f71748-85ad-499d-a72e-68962bcd36a0&amp;amp;utm_source=devto&amp;amp;utm_medium=cross_posting&amp;amp;utm_campaign=evaluation_metrics_for_binary_classification&amp;amp;utm_content=explore_dashboard&amp;amp;utm_term=experiment_compare" rel="noopener noreferrer"&gt;explore experiment runs&lt;/a&gt; with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;evaluation metrics&lt;/li&gt;
&lt;li&gt;performance charts&lt;/li&gt;
&lt;li&gt;metric by threshold plots&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi1.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Fexperiment_dashboard.png%3Fw%3D958%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi1.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Fexperiment_dashboard.png%3Fw%3D958%26ssl%3D1" alt="experiment dashboard"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Ok, now we are ready to talk about those classification metrics!&lt;/p&gt;

&lt;h1&gt;
  
  
  Learn about the following evaluation metrics &lt;a&gt;&lt;/a&gt;
&lt;/h1&gt;

&lt;ol&gt;
&lt;li&gt;Confusion Matrix&lt;/li&gt;
&lt;li&gt;False positive rate | Type-I error&lt;/li&gt;
&lt;li&gt;False negative rate | Type-II error&lt;/li&gt;
&lt;li&gt;True negative rate | Specificity&lt;/li&gt;
&lt;li&gt;Negative predictive value&lt;/li&gt;
&lt;li&gt;False discovery rate&lt;/li&gt;
&lt;li&gt;True positive rate | Recall | Sensitivity&lt;/li&gt;
&lt;li&gt;Positive predictive value | Precision&lt;/li&gt;
&lt;li&gt;Accuracy&lt;/li&gt;
&lt;li&gt;F beta score&lt;/li&gt;
&lt;li&gt;F1 score&lt;/li&gt;
&lt;li&gt;F2 score&lt;/li&gt;
&lt;li&gt;Cohen Kappa&lt;/li&gt;
&lt;li&gt;Matthews correlation coefficient | MCC&lt;/li&gt;
&lt;li&gt;ROC curve&lt;/li&gt;
&lt;li&gt;ROC AUC score&lt;/li&gt;
&lt;li&gt;Precision-Recall curve&lt;/li&gt;
&lt;li&gt;PR AUC | Average precision&lt;/li&gt;
&lt;li&gt;Log loss&lt;/li&gt;
&lt;li&gt;Brier score&lt;/li&gt;
&lt;li&gt;Cumulative gain chart&lt;/li&gt;
&lt;li&gt;Lift curve | Lift chart&lt;/li&gt;
&lt;li&gt;Kolmogorov-Smirnov plot&lt;/li&gt;
&lt;li&gt;Kolmogorov-Smirnov statistic&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I know it is a lot to go over at once. That is why you can jump to the section that is interesting to you and read just that.&lt;/p&gt;

&lt;h1&gt;
  
  
  1. Confusion Matrix &lt;a&gt;&lt;/a&gt;
&lt;/h1&gt;

&lt;h3&gt;
  
  
  How to compute:
&lt;/h3&gt;

&lt;p&gt;It is a common way of presenting true positive (tp), true negative (tn), false positive (fp) and false negative (fn) predictions. Those values are presented in the form of a matrix where the Y-axis shows the true classes while the X-axis shows the predicted classes.&lt;/p&gt;

&lt;p&gt;It is calculated on class predictions, which means the outputs from your model need to be thresholded first.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.metrics&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;confusion_matrix&lt;/span&gt;

&lt;span class="n"&gt;y_pred_class&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;y_pred_pos&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;threshold&lt;/span&gt;
&lt;span class="n"&gt;cm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;confusion_matrix&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_pred_class&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;tn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ravel&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
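
&lt;p&gt;To make the four cells concrete, here is a minimal pure-Python sketch that counts them by hand, assuming labels are encoded as 0 and 1 and that scores strictly above the threshold count as positive. The helper name and the toy data are hypothetical; the result is returned in the same tn, fp, fn, tp order that the scikit-learn snippet above unpacks.&lt;/p&gt;

```python
def confusion_counts(y_true, y_pred_pos, threshold=0.5):
    """Count tn, fp, fn, tp for 0/1 labels and positive-class scores."""
    tn = fp = fn = tp = 0
    for label, score in zip(y_true, y_pred_pos):
        predicted_positive = score > threshold  # same rule as the snippet above
        if label == 1 and predicted_positive:
            tp += 1
        elif label == 1:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


# Hypothetical toy data, not taken from the experiments in this article.
y_true = [0, 0, 0, 0, 1, 1, 1, 0]
y_pred_pos = [0.1, 0.4, 0.6, 0.2, 0.9, 0.3, 0.7, 0.05]

tn, fp, fn, tp = confusion_counts(y_true, y_pred_pos, threshold=0.5)
print(tn, fp, fn, tp)
```

&lt;p&gt;In practice you would keep using the library function; the loop is only meant to show what each cell counts.&lt;/p&gt;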



&lt;h3&gt;
  
  
  How does it look:
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi2.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Fconf_matrix.png%3Fresize%3D1024%252C768%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi2.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Fconf_matrix.png%3Fresize%3D1024%252C768%26ssl%3D1" alt="confusion matrix"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So in this example, we can see that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;11918&lt;/em&gt; predictions were &lt;em&gt;true negatives&lt;/em&gt;,&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;872&lt;/em&gt; were &lt;em&gt;true positives&lt;/em&gt;,&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;82&lt;/em&gt; were &lt;em&gt;false positives&lt;/em&gt;,&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;333&lt;/em&gt; predictions were &lt;em&gt;false negatives&lt;/em&gt;.&lt;/li&gt;
&lt;/ul&gt;
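
&lt;p&gt;As a quick worked example, the four counts listed above are enough to derive the class balance and the two error rates discussed in the next sections. The variable names are illustrative; only the counts come from the confusion matrix.&lt;/p&gt;

```python
# Counts read from the confusion matrix above.
tn, tp, fp, fn = 11918, 872, 82, 333

total = tn + tp + fp + fn
positive_fraction = (tp + fn) / total   # share of truly positive observations
false_positive_rate = fp / (fp + tn)    # false alerts among actual negatives
false_negative_rate = fn / (fn + tp)    # missed positives among actual positives

print(f"observations:        {total}")
print(f"positive fraction:   {positive_fraction:.3f}")
print(f"false positive rate: {false_positive_rate:.4f}")
print(f"false negative rate: {false_negative_rate:.3f}")
```

&lt;p&gt;The positive fraction comes out at roughly 0.09, which matches the class balance set up in the problem definition.&lt;/p&gt;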

&lt;p&gt;Also, as we already know, this is an imbalanced problem. By the way, if you want to read more about imbalanced problems I recommend taking a look at this &lt;a href="https://www.svds.com/learning-imbalanced-classes/" rel="noopener noreferrer"&gt;article by Tom Fawcett&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  When to use it:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Pretty much always. I like to see the nominal values rather than normalized ones to get a feel for how the model is doing on different, often imbalanced, classes.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Jump back to the evaluation metrics list -&amp;gt;&lt;/strong&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  2. False Positive Rate | Type I error &lt;a&gt;&lt;/a&gt;
&lt;/h1&gt;

&lt;p&gt;When we predict something that isn’t actually there, we are contributing to the false positive rate. You can think of it as a &lt;strong&gt;fraction of false alerts&lt;/strong&gt; that will be raised based on your model predictions.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi1.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Ffpr_eq.png%3Fzoom%3D1.100000023841858%26fit%3D257%252C86%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi1.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Ffpr_eq.png%3Fzoom%3D1.100000023841858%26fit%3D257%252C86%26ssl%3D1" alt="false positive rate"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  How to compute:
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.metrics&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;confusion_matrix&lt;/span&gt;

&lt;span class="n"&gt;y_pred_class&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;y_pred_pos&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;threshold&lt;/span&gt;
&lt;span class="n"&gt;tn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;confusion_matrix&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_pred_class&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;ravel&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;false_positive_rate&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;fp&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fp&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;tn&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  How models score in this metric (threshold=0.5):
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi0.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Ffpr.png%3Fw%3D552%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi0.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Ffpr.png%3Fw%3D552%26ssl%3D1" alt="false positive rate"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For all the models, the Type-I error rate is pretty low, but by adjusting the threshold we can get an even lower one. Since we have true negatives in the denominator, our error will tend to be low just because the dataset is imbalanced.&lt;/p&gt;

&lt;h3&gt;
  
  
  How does it depend on the threshold:
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi0.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Ffpr_by_thres.png%3Fresize%3D1024%252C768%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi0.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Ffpr_by_thres.png%3Fresize%3D1024%252C768%26ssl%3D1" alt="false positive rate"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Obviously, if we increase the threshold, only higher-scored observations will be classified as positive. In our example, we can see that to reach a perfect FPR of 0 we need to increase the threshold to 0.83. However, that will likely mean that only very few observations are classified as positive.&lt;/p&gt;
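
&lt;p&gt;To make the threshold dependence concrete, here is a minimal sketch that recomputes the false positive rate at a few thresholds. The &lt;code&gt;y_true&lt;/code&gt; and &lt;code&gt;y_pred_pos&lt;/code&gt; arrays are made-up toy data (an assumption for illustration), not the scores behind the chart above.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import numpy as np
from sklearn.metrics import confusion_matrix

# Toy data (assumption): 8 clean and 2 fraudulent transactions.
y_true = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])
y_pred_pos = np.array([0.05, 0.10, 0.20, 0.30, 0.35, 0.45, 0.60, 0.70, 0.65, 0.90])

fpr_by_threshold = {}
for threshold in [0.25, 0.5, 0.75]:
    # np.greater is the elementwise "score is greater than threshold" comparison.
    y_pred_class = np.greater(y_pred_pos, threshold).astype(int)
    # labels=[0, 1] keeps the matrix 2x2 even if only one class is predicted.
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred_class, labels=[0, 1]).ravel()
    fpr_by_threshold[threshold] = fp / (fp + tn)

print(fpr_by_threshold)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;On this toy data the rate drops from 0.625 to 0.25 to 0 as the threshold rises.&lt;/p&gt;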

&lt;h3&gt;
  
  
  When to use it:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;You would rarely use this metric alone. Usually, it serves as an auxiliary metric alongside some other metric,&lt;/li&gt;
&lt;li&gt;If the &lt;strong&gt;cost of dealing with an alert is high&lt;/strong&gt; you should consider increasing the threshold to get fewer alerts.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Jump back to the evaluation metrics list -&amp;gt;&lt;/strong&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  3. False Negative Rate | Type II error &lt;a&gt;&lt;/a&gt;
&lt;/h1&gt;

&lt;p&gt;When we fail to predict something that is actually there, we are contributing to the false negative rate. You can think of it as a &lt;strong&gt;fraction of missed fraudulent transactions&lt;/strong&gt; that your model lets through.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi0.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Ffnr_eq.png%3Fzoom%3D1.100000023841858%26fit%3D262%252C86%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi0.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Ffnr_eq.png%3Fzoom%3D1.100000023841858%26fit%3D262%252C86%26ssl%3D1" alt="false negative rate"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  How to compute:
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.metrics&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;confusion_matrix&lt;/span&gt;

&lt;span class="n"&gt;y_pred_class&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;y_pred_pos&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;threshold&lt;/span&gt;
&lt;span class="n"&gt;tn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;confusion_matrix&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_pred_class&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;ravel&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;false_negative_rate&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;fn&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tp&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  How models score in this metric (threshold=0.5):
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi2.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Ffnr.png%3Fw%3D547%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi2.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Ffnr.png%3Fw%3D547%26ssl%3D1" alt="false negative rate"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We can see that in our example, type-2 errors are quite a bit higher than type-1 errors. Interestingly, our &lt;a href="https://ui.neptune.ml/neptune-ml/binary-classification-metrics/e/BIN-98/logs?utm_source=devto&amp;amp;utm_medium=cross_posting&amp;amp;utm_campaign=evaluation_metrics_for_binary_classification&amp;amp;utm_content=explore_experiment" rel="noopener noreferrer"&gt;BIN-98 experiment&lt;/a&gt; that had the lowest type-1 error has the highest type-2 error. There is a simple explanation: our dataset is imbalanced, and with type-2 error we don’t have true negatives in the denominator.&lt;/p&gt;

&lt;h3&gt;
  
  
  How does it depend on the threshold:
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi1.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Ffnr_by_thres.png%3Fresize%3D1024%252C768%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi1.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Ffnr_by_thres.png%3Fresize%3D1024%252C768%26ssl%3D1" alt="false negative rate"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If we decrease the threshold, more observations will be classified as positive. At a certain threshold, we will mark everything as positive (fraudulent for example). We can actually get to the FNR of 0.083 by decreasing the threshold to 0.01.&lt;/p&gt;
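
&lt;p&gt;The trade-off can be sketched on toy data: pushing the threshold below every score drives the false negative rate to zero, but only because everything gets flagged. The arrays below are hypothetical and chosen purely for illustration.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import numpy as np
from sklearn.metrics import confusion_matrix

# Toy data (assumption): 6 clean and 4 fraudulent transactions.
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
y_pred_pos = np.array([0.1, 0.2, 0.3, 0.4, 0.6, 0.05, 0.2, 0.45, 0.7, 0.9])

rates = {}
for threshold in [0.5, 0.01]:
    y_pred_class = np.greater(y_pred_pos, threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred_class, labels=[0, 1]).ravel()
    rates[threshold] = {
        "fnr": float(fn / (tp + fn)),  # missed frauds among all frauds
        "fpr": float(fp / (fp + tn)),  # false alerts among all clean cases
    }

print(rates)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;At the threshold of 0.01 every toy transaction is marked as fraudulent, so the FNR is 0 while the FPR is 1.&lt;/p&gt;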

&lt;h3&gt;
  
  
  When to use it:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Usually, it is not used alone but rather with some other metric,&lt;/li&gt;
&lt;li&gt;If the cost of letting fraudulent transactions through is high and the value you get from the users is not, you can consider focusing on this number.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Jump back to the evaluation metrics list -&amp;gt;&lt;/strong&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  4. True Negative Rate | Specificity &lt;a&gt;&lt;/a&gt;
&lt;/h1&gt;

&lt;p&gt;It measures how many observations, out of all negative observations, we have classified as negative. In our fraud detection example, it tells us how many transactions, out of all non-fraudulent transactions, we marked as clean.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi1.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Ftnr_eq.png%3Fzoom%3D1.100000023841858%26fit%3D260%252C84%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi1.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Ftnr_eq.png%3Fzoom%3D1.100000023841858%26fit%3D260%252C84%26ssl%3D1" alt="true negative rate"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  How to compute:
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.metrics&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;confusion_matrix&lt;/span&gt;

&lt;span class="n"&gt;y_pred_class&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;y_pred_pos&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;threshold&lt;/span&gt;
&lt;span class="n"&gt;tn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;confusion_matrix&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_pred_class&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;ravel&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;true_negative_rate&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tn&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tn&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;fp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
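
&lt;p&gt;As a cross-check, specificity is simply recall computed for the negative class, and it is the complement of the false positive rate. The sketch below verifies both on hypothetical labels (the arrays are assumptions for illustration), using the &lt;code&gt;pos_label&lt;/code&gt; argument of &lt;code&gt;recall_score&lt;/code&gt;.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import numpy as np
from sklearn.metrics import confusion_matrix, recall_score

# Toy labels (assumption): 8 clean and 2 fraudulent transactions.
y_true = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])
y_pred_class = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 0])

tn, fp, fn, tp = confusion_matrix(y_true, y_pred_class, labels=[0, 1]).ravel()
true_negative_rate = tn / (tn + fp)
false_positive_rate = fp / (fp + tn)

# Treating class 0 as the "positive" label turns recall into specificity.
specificity = recall_score(y_true, y_pred_class, pos_label=0)

print(true_negative_rate, specificity, 1 - false_positive_rate)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;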



&lt;h3&gt;
  
  
  How models score in this metric (threshold=0.5):
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi0.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Ftnr.png%3Fw%3D543%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi0.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Ftnr.png%3Fw%3D543%26ssl%3D1" alt="true negative rate"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Very high specificity for all the models. If you think about it, in our imbalanced problem you would expect that. Classifying negative cases as negative is a lot easier than classifying positive cases and hence the score is high.&lt;/p&gt;

&lt;h3&gt;
  
  
  How does it depend on the threshold:
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi0.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Ftnr_by_thres.png%3Fresize%3D1024%252C768%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi0.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Ftnr_by_thres.png%3Fresize%3D1024%252C768%26ssl%3D1" alt="true negative rate"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The higher the threshold, the more truly negative observations we can recall. We can see that starting from, say, threshold=0.4 our model is doing really well at classifying negative cases as negative.&lt;/p&gt;

&lt;h3&gt;
  
  
  When to use it:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Usually, you don’t use it alone but rather as an auxiliary metric,&lt;/li&gt;
&lt;li&gt;When you really want to be sure that you are right when you say something is safe. A typical example would be a doctor telling a patient “you are healthy”. Making a mistake here and telling a sick person they are safe and can go home is something you may want to avoid.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Jump back to the evaluation metrics list -&amp;gt;&lt;/strong&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  5. Negative Predictive Value &lt;a&gt;&lt;/a&gt;
&lt;/h1&gt;

&lt;p&gt;It measures how many predictions out of all negative predictions were correct. You can think of it as precision for negative class. With our example, it tells us what is the fraction of correctly predicted clean transactions in all non-fraudulent predictions.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi1.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Fnpv_eq.png%3Fzoom%3D1.100000023841858%26fit%3D267%252C84%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi1.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Fnpv_eq.png%3Fzoom%3D1.100000023841858%26fit%3D267%252C84%26ssl%3D1" alt="negative predictive value"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  How to compute:
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.metrics&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;confusion_matrix&lt;/span&gt;

&lt;span class="n"&gt;y_pred_class&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;y_pred_pos&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;threshold&lt;/span&gt;
&lt;span class="n"&gt;tn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;confusion_matrix&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_pred_class&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;ravel&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;negative_predictive_value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tn&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tn&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  How models score in this metric (threshold=0.5):
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi0.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Fnpv.png%3Fw%3D598%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi0.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Fnpv.png%3Fw%3D598%26ssl%3D1" alt="negative predictive value"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;All models score really high, and no wonder, since with an imbalanced problem it is easy to predict the negative class.&lt;/p&gt;

&lt;h3&gt;
  
  
  How does it depend on the threshold:
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi1.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Fnpv_by_thres.png%3Fresize%3D1024%252C768%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi1.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Fnpv_by_thres.png%3Fresize%3D1024%252C768%26ssl%3D1" alt="negative predictive value"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The higher the threshold the more cases are classified as negative and the score goes down. However, in our imbalanced example even at a very high threshold, the negative predictive value is still good.&lt;/p&gt;
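
&lt;p&gt;Because negative predictive value is precision for the negative class, one way to sketch this trend is to call &lt;code&gt;precision_score&lt;/code&gt; with &lt;code&gt;pos_label=0&lt;/code&gt; at several thresholds. The data below is a made-up toy example, not the dataset used in the charts.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import numpy as np
from sklearn.metrics import precision_score

# Toy data (assumption): 6 clean and 4 fraudulent transactions.
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
y_pred_pos = np.array([0.1, 0.2, 0.3, 0.4, 0.6, 0.05, 0.2, 0.45, 0.7, 0.9])

npv_by_threshold = {}
for threshold in [0.25, 0.5, 0.8]:
    y_pred_class = np.greater(y_pred_pos, threshold).astype(int)
    # Precision with class 0 as the "positive" label equals tn / (tn + fn).
    npv_by_threshold[threshold] = precision_score(y_true, y_pred_class, pos_label=0)

print(npv_by_threshold)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;On this toy data the value falls from 3/4 to 5/7 to 6/9 as more fraudulent cases slip into the negative predictions.&lt;/p&gt;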

&lt;h3&gt;
  
  
  When to use it:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;When we care about high precision on negative predictions. For example, imagine we really don’t want to have any additional process for screening the transactions predicted as clean. In that case, we may want to make sure that our negative predictive value is high.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Jump back to the evaluation metrics list -&amp;gt;&lt;/strong&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  6. False Discovery Rate &lt;a&gt;&lt;/a&gt;
&lt;/h1&gt;

&lt;p&gt;It measures how many predictions out of all positive predictions were incorrect. You can think of it as simply 1-precision. With our example, it tells us what is the fraction of incorrectly predicted fraudulent transactions in all fraudulent predictions.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi0.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Ffdr_eq.png%3Fzoom%3D1.100000023841858%26fit%3D257%252C84%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi0.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Ffdr_eq.png%3Fzoom%3D1.100000023841858%26fit%3D257%252C84%26ssl%3D1" alt="false discovery rate"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  How to compute:
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.metrics&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;confusion_matrix&lt;/span&gt;

&lt;span class="n"&gt;y_pred_class&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;y_pred_pos&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;threshold&lt;/span&gt;
&lt;span class="n"&gt;tn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;confusion_matrix&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_pred_class&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;ravel&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;false_discovery_rate&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;fp&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tp&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;fp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
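
&lt;p&gt;Since the false discovery rate is 1-precision, the two ways of computing it should agree. Here is a small check on hypothetical labels (assumed purely for illustration):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import numpy as np
from sklearn.metrics import confusion_matrix, precision_score

# Toy labels (assumption): 4 positive predictions, 3 of them correct.
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
y_pred_class = np.array([0, 0, 0, 0, 1, 0, 0, 1, 1, 1])

tn, fp, fn, tp = confusion_matrix(y_true, y_pred_class, labels=[0, 1]).ravel()
false_discovery_rate = fp / (tp + fp)

# Precision is tp / (tp + fp), so the two values add up to 1.
fdr_from_precision = 1 - precision_score(y_true, y_pred_class)

print(false_discovery_rate, fdr_from_precision)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;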



&lt;h3&gt;
  
  
  How models score in this metric (threshold=0.5):
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi0.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Ffdr.png%3Fw%3D569%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi0.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Ffdr.png%3Fw%3D569%26ssl%3D1" alt="false discovery rate"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The “best model” is an incredibly shallow LightGBM, which we expect to be incorrect (a deeper model should work better).&lt;/p&gt;

&lt;p&gt;That is an important takeaway: looking at precision (or recall) alone can lead you to select a suboptimal model.&lt;/p&gt;

&lt;h3&gt;
  
  
  How does it depend on the threshold:
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi1.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Ffdr_by_thres.png%3Fresize%3D1024%252C768%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi1.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Ffdr_by_thres.png%3Fresize%3D1024%252C768%26ssl%3D1" alt="false discovery rate"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The higher the threshold, the fewer positive predictions there are. With fewer positive predictions, the ones that remain classified as positive have higher certainty scores. Hence, the false discovery rate goes down.&lt;/p&gt;

&lt;h3&gt;
  
  
  When to use it:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Again, it usually doesn’t make sense to use it alone but rather coupled with other metrics like recall.&lt;/li&gt;
&lt;li&gt;When raising false alerts is costly and you want all the positive predictions to be worth looking at, you should optimize for precision (that is, minimize the false discovery rate).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Jump back to the evaluation metrics list -&amp;gt;&lt;/strong&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  7. True Positive Rate | Recall | Sensitivity
&lt;/h1&gt;

&lt;p&gt;&lt;a&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It measures how many observations, out of all positive observations, we have classified as positive. It tells us how many fraudulent transactions we recalled from all fraudulent transactions.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi0.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Ftpr_Eq.png%3Fzoom%3D1.100000023841858%26fit%3D256%252C84%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi0.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Ftpr_Eq.png%3Fzoom%3D1.100000023841858%26fit%3D256%252C84%26ssl%3D1" alt="true positive rate"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When you are optimizing recall you want to &lt;strong&gt;put all guilty in prison&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  How to compute:
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.metrics&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;confusion_matrix&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;recall_score&lt;/span&gt;

&lt;span class="n"&gt;y_pred_class&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;y_pred_pos&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;threshold&lt;/span&gt;
&lt;span class="n"&gt;tn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;confusion_matrix&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_pred_class&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;ravel&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;true_positive_rate&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tp&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tp&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# or simply
&lt;/span&gt;
&lt;span class="nf"&gt;recall_score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_pred_class&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  How models score in this metric (threshold=0.5):
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi0.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Ftpr.png%3Fw%3D571%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi0.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Ftpr.png%3Fw%3D571%26ssl%3D1" alt="true positive rate"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Our best model reaches a recall of 0.72 at the threshold of 0.5, which means it catches 72% of fraudulent transactions. The difference in recall between our models is quite significant, and we can clearly see better and worse models. Of course, for every model, we can adjust the threshold to recall all fraudulent transactions.&lt;/p&gt;

&lt;h3&gt;
  
  
  How does it depend on the threshold:
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi0.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Ftpr_by_thres.png%3Fresize%3D1024%252C768%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi0.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Ftpr_by_thres.png%3Fresize%3D1024%252C768%26ssl%3D1" alt="true positive rate"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For the threshold of 0.1, we classify the vast majority of transactions as fraudulent and hence get really high recall of 0.917. As the threshold increases the recall falls.&lt;/p&gt;
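
&lt;p&gt;If you want the whole recall-versus-threshold relationship without looping by hand, scikit-learn’s &lt;code&gt;precision_recall_curve&lt;/code&gt; returns it in one call. The sketch below uses made-up scores (an assumption for illustration). Note that this function counts scores greater than or equal to a threshold as positive, which differs slightly from the strict comparison used in the snippets above.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import numpy as np
from sklearn.metrics import precision_recall_curve

# Toy data (assumption): 6 clean and 4 fraudulent transactions.
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
y_pred_pos = np.array([0.1, 0.2, 0.3, 0.4, 0.6, 0.05, 0.25, 0.45, 0.7, 0.9])

precision, recall, thresholds = precision_recall_curve(y_true, y_pred_pos)

# recall has one more entry than thresholds; the final entry (recall of 0)
# has no corresponding threshold, so it is skipped here.
for t, r in zip(thresholds, recall[:-1]):
    print(f"threshold={t:.2f} recall={r:.2f}")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;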

&lt;h3&gt;
  
  
  When to use it:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Usually, you will not use it alone but rather coupled with other metrics like precision,&lt;/li&gt;
&lt;li&gt;That being said, recall is a go-to metric, when you really care about catching all fraudulent transactions even at a cost of false alerts. Potentially it is cheap for you to process those alerts and very expensive when the transaction goes unseen.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Jump back to the evaluation metrics list -&amp;gt;&lt;/strong&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  8. Positive Predictive Value | Precision &lt;a&gt;&lt;/a&gt;
&lt;/h1&gt;

&lt;p&gt;It measures how many observations predicted as positive are in fact positive. Taking our fraud detection example, it tells us what fraction of the transactions we flagged as fraudulent really are fraudulent.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi1.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Fppv_eq.png%3Fzoom%3D1.100000023841858%26fit%3D255%252C84%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi1.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Fppv_eq.png%3Fzoom%3D1.100000023841858%26fit%3D255%252C84%26ssl%3D1" alt="positive predictive value"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When you are optimizing precision you want to make sure that &lt;strong&gt;people that you put in prison are guilty&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  How to compute:
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.metrics&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;confusion_matrix&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;precision_score&lt;/span&gt;

&lt;span class="n"&gt;y_pred_class&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;y_pred_pos&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;threshold&lt;/span&gt;
&lt;span class="n"&gt;tn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;confusion_matrix&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_pred_class&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;ravel&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;positive_predictive_value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tp&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tp&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;fp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# or simply
&lt;/span&gt;
&lt;span class="nf"&gt;precision_score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_pred_class&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  How models score in this metric (threshold=0.5):
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi1.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Fppv.png%3Fw%3D597%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi1.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Fppv.png%3Fw%3D597%26ssl%3D1" alt="positive predictive value"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It seems like all the models have pretty high precision at this threshold. The “best model” is an incredibly shallow LightGBM, which obviously smells fishy. That is an important takeaway: looking at precision (or recall) alone can lead you to select a suboptimal model.&lt;/p&gt;

&lt;p&gt;Of course, for every model, we can adjust the threshold to increase precision. That is because if we take a small fraction of high scoring predictions the precision on those will likely be high.&lt;/p&gt;

&lt;h3&gt;
  
  
  How does it depend on the threshold:
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi0.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Fppv_by_thres.png%3Fresize%3D1024%252C768%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi0.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Fppv_by_thres.png%3Fresize%3D1024%252C768%26ssl%3D1" alt="positive predictive value"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The higher the threshold, the better the precision, and with a threshold of 0.68 we can actually get a perfectly precise model. Over this threshold, the model doesn’t classify anything as positive and so we don’t plot it.&lt;/p&gt;
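
&lt;p&gt;When nothing is classified as positive, precision is undefined because tp + fp is zero. The sketch below shows one way to handle that case with the &lt;code&gt;zero_division&lt;/code&gt; argument of &lt;code&gt;precision_score&lt;/code&gt; (available in scikit-learn 0.22 and later). The scores are toy values assumed for illustration, and returning 0 here is a design choice rather than the only sensible option.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import numpy as np
from sklearn.metrics import precision_score

# Toy data (assumption): the highest score is exactly 0.68.
y_true = np.array([0, 0, 0, 1, 1])
y_pred_pos = np.array([0.1, 0.3, 0.4, 0.55, 0.68])

results = {}
for threshold in [0.5, 0.68]:
    y_pred_class = np.greater(y_pred_pos, threshold).astype(int)
    n_positive_predictions = int(y_pred_class.sum())
    # zero_division=0 returns 0.0 without a warning when tp + fp == 0.
    precision = precision_score(y_true, y_pred_class, zero_division=0)
    results[threshold] = (n_positive_predictions, float(precision))

print(results)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;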

&lt;h3&gt;
  
  
  When to use it:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Again, it usually doesn’t make sense to use it alone but rather coupled with other metrics like recall.&lt;/li&gt;
&lt;li&gt;When raising false alerts is costly and you want all the positive predictions to be worth looking at, you should optimize for precision.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Jump back to the evaluation metrics list -&amp;gt;&lt;/strong&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  9. Accuracy &lt;a&gt;&lt;/a&gt;
&lt;/h1&gt;

&lt;p&gt;It measures how many observations, both positive and negative, were correctly classified.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi1.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Facc_eq.png%3Fw%3D422%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi1.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Facc_eq.png%3Fw%3D422%26ssl%3D1" alt="accuracy"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You &lt;strong&gt;shouldn’t use accuracy on imbalanced problems&lt;/strong&gt;. On such problems, it is easy to get a high accuracy score by simply classifying all observations as the majority class. For example, in our case, by classifying all transactions as non-fraudulent we can get an accuracy of over 0.9.&lt;/p&gt;
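
&lt;p&gt;A quick way to see the problem is to score a “model” that always predicts the majority class. The 95/5 class split below is a hypothetical example, not the exact ratio of our dataset.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import numpy as np
from sklearn.metrics import accuracy_score, recall_score

# Toy labels (assumption): 95 clean and 5 fraudulent transactions.
y_true = np.array([0] * 95 + [1] * 5)

# A baseline that marks every transaction as non-fraudulent.
y_pred_all_negative = np.zeros_like(y_true)

baseline_accuracy = accuracy_score(y_true, y_pred_all_negative)
baseline_recall = recall_score(y_true, y_pred_all_negative)

print(baseline_accuracy, baseline_recall)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The baseline reaches an accuracy of 0.95 while catching none of the fraudulent transactions (recall of 0).&lt;/p&gt;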

&lt;h3&gt;
  
  
  How to compute:
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.metrics&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;confusion_matrix&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;accuracy_score&lt;/span&gt;

&lt;span class="n"&gt;y_pred_class&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;y_pred_pos&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;threshold&lt;/span&gt;
&lt;span class="n"&gt;tn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;confusion_matrix&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_pred_class&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;ravel&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;accuracy&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tp&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;tn&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tp&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;fp&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;fn&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;tn&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# or simply
&lt;/span&gt;
&lt;span class="nf"&gt;accuracy_score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_pred_class&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  How models score in this metric (threshold=0.5):
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi1.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Facc.png%3Fw%3D504%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi1.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Facc.png%3Fw%3D504%26ssl%3D1" alt="accuracy"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We can see that for all the models we beat the dummy model (all clean transactions) by a large margin. Also, the models that we’d expect to be better are in fact at the top.&lt;/p&gt;

&lt;h3&gt;
  
  
  How does it depend on the threshold:
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi1.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Facc_by_thres.png%3Fresize%3D1024%252C768%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi1.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Facc_by_thres.png%3Fresize%3D1024%252C768%26ssl%3D1" alt="accuracy"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With accuracy, you can really use charts like the one above to determine the optimal threshold. In this case, choosing a threshold slightly above the standard 0.5 could bump the score by a tiny bit (0.9686 -&amp;gt; 0.9688).&lt;/p&gt;
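&lt;p&gt;A threshold chart like this can be approximated with a simple sweep. The sketch below is dependency-free and uses made-up labels and scores; the helper name &lt;code&gt;accuracy_at&lt;/code&gt; is hypothetical and not part of any library:&lt;/p&gt;

```python
def accuracy_at(y_true, y_score, threshold):
    """Accuracy when scores strictly above `threshold` are labelled positive."""
    hits = sum(1 for t, s in zip(y_true, y_score) if t == int(s > threshold))
    return hits / len(y_true)


# Hypothetical labels and predicted positive-class scores.
y_true = [0, 0, 0, 0, 1, 0, 1, 1]
y_score = [0.05, 0.10, 0.30, 0.45, 0.40, 0.55, 0.70, 0.90]

# Candidate thresholds 0.05, 0.10, ..., 0.95.
thresholds = [i / 20 for i in range(1, 20)]

# max() keeps the first threshold that reaches the best accuracy.
best_threshold = max(thresholds, key=lambda t: accuracy_at(y_true, y_score, t))
best_accuracy = accuracy_at(y_true, y_score, best_threshold)

print("accuracy at 0.5:", accuracy_at(y_true, y_score, 0.5))
print("best threshold:", best_threshold, "accuracy:", best_accuracy)
```

&lt;p&gt;In practice you would run the sweep on a validation set rather than on the data used for the final evaluation, otherwise the chosen threshold is tuned to the test data.&lt;/p&gt;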

&lt;h3&gt;
  
  
  When to use it:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;When your problem is balanced, using accuracy is usually a good start. An additional benefit is that it is really easy to explain to non-technical stakeholders in your project.&lt;/li&gt;
&lt;li&gt;When every class is equally important to you.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Jump back to the evaluation metrics list -&amp;gt;&lt;/strong&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  10. F beta score &lt;a&gt;&lt;/a&gt;
&lt;/h1&gt;

&lt;p&gt;Simply put, it combines precision and recall into one metric. The higher the score the better our model is. You can calculate it in the following way:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi0.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Ffbeta_eq.png%3Fw%3D604%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi0.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Ffbeta_eq.png%3Fw%3D604%26ssl%3D1" alt="f beta score"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When choosing beta in your F-beta score, &lt;strong&gt;the more you care about recall&lt;/strong&gt; over precision, &lt;strong&gt;the higher beta&lt;/strong&gt; you should choose. For example, with F1 score we care equally about recall and precision, while with F2 score recall is twice as important to us.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi0.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Ff_by_beta.png%3Fw%3D933%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi0.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Ff_by_beta.png%3Fw%3D933%26ssl%3D1" alt="f beta score"&gt;&lt;/a&gt;&lt;/p&gt;

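&lt;p&gt;With 0 &amp;lt; beta &amp;lt; 1 we care more about precision, so the optimal threshold moves higher. With beta &amp;gt; 1 our optimal threshold moves toward lower thresholds, and with beta=1 it is somewhere in the middle.&lt;/p&gt;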

&lt;h3&gt;
  
  
  How to compute:
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.metrics&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;fbeta_score&lt;/span&gt;

&lt;span class="n"&gt;y_pred_class&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;y_pred_pos&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;threshold&lt;/span&gt;
&lt;span class="nf"&gt;fbeta_score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_pred_class&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;beta&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
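&lt;p&gt;To see what beta does, it can help to compute the score by hand. This sketch uses the standard definition F-beta = (1 + beta^2) * precision * recall / (beta^2 * precision + recall); the confusion-matrix counts are hypothetical:&lt;/p&gt;

```python
def f_beta(precision, recall, beta):
    """F-beta from precision and recall; beta > 1 weights recall more heavily."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical confusion-matrix counts.
tp, fp, fn = 6, 2, 6
precision = tp / (tp + fp)  # 0.75
recall = tp / (tp + fn)     # 0.5

# Precision is higher than recall here, so the score drops as beta grows.
for beta in (0.5, 1, 2):
    print(f"beta={beta}: {f_beta(precision, recall, beta):.3f}")
```

&lt;p&gt;Because this toy model is better at precision than at recall, putting more weight on recall (a larger beta) lowers its score.&lt;/p&gt;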



&lt;p&gt;&lt;strong&gt;Jump back to the evaluation metrics list -&amp;gt;&lt;/strong&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  11. F1 score (beta=1) &lt;a&gt;&lt;/a&gt;
&lt;/h1&gt;

&lt;p&gt;It’s the harmonic mean between precision and recall.&lt;/p&gt;

&lt;h3&gt;
  
  
  How models score in this metric (threshold=0.5):
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi1.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Ff1.png%3Fw%3D501%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi1.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Ff1.png%3Fw%3D501%26ssl%3D1" alt="f1 score"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As we can see combining precision and recall gave us a more realistic view of our models. We get 0.808 for the best one and a lot of room for improvement.&lt;/p&gt;

&lt;p&gt;What is good is that it seems to be ranking our models correctly with those larger lightGBMs at the top.&lt;/p&gt;

&lt;h3&gt;
  
  
  How does it depend on the threshold:
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi0.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Ff1_by_thres.png%3Fresize%3D1024%252C768%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi0.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Ff1_by_thres.png%3Fresize%3D1024%252C768%26ssl%3D1" alt="f1 score"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We can &lt;strong&gt;adjust the threshold to optimize F1 score&lt;/strong&gt;. Notice that for both precision and recall you could get perfect scores by increasing or decreasing the threshold. The good thing is that &lt;strong&gt;you can find a sweet spot&lt;/strong&gt; for the F1 metric. As you can see, getting the threshold just right can actually improve your score a bit (0.8077 -&amp;gt; 0.8121).&lt;/p&gt;

&lt;h3&gt;
  
  
  When to use it:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Pretty much in every binary classification problem. It is my go-to metric when working on those problems. It can be easily explained to business stakeholders.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Jump back to the evaluation metrics list -&amp;gt;&lt;/strong&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  12. F2 score (beta=2) &lt;a&gt;&lt;/a&gt;
&lt;/h1&gt;

&lt;p&gt;It’s a metric that combines precision and recall, putting &lt;strong&gt;2x emphasis on recall&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  How models score in this metric (threshold=0.5):
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi0.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Ff2-1.png%3Fw%3D505%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi0.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Ff2-1.png%3Fw%3D505%26ssl%3D1" alt="f2 score"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This score is even lower than F1 for all the models, but it can be increased considerably by adjusting the threshold.&lt;br&gt;
Again, it seems to be ranking our models correctly, at least in this simple example.&lt;/p&gt;

&lt;h3&gt;
  
  
  How does it depend on the threshold:
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi1.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Ff2_by_thres.png%3Fresize%3D1024%252C768%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi1.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Ff2_by_thres.png%3Fresize%3D1024%252C768%26ssl%3D1" alt="f2 score"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We can see that with a lower threshold, and therefore more true positives recalled, we get a higher score. You can usually &lt;strong&gt;find a sweet spot&lt;/strong&gt; for the threshold. The possible gain from 0.755 -&amp;gt; 0.803 shows how &lt;strong&gt;important&lt;/strong&gt; threshold adjustments can be here.&lt;/p&gt;

&lt;h3&gt;
  
  
  When to use it:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;I’d consider using it when recalling positive observations (fraudulent transactions) is more important than being precise about it&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Jump back to the evaluation metrics list -&amp;gt;&lt;/strong&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  13. Cohen Kappa Metric &lt;a&gt;&lt;/a&gt;
&lt;/h1&gt;

&lt;p&gt;In simple words, Cohen Kappa tells you how much better your model is than a random classifier that predicts based on class frequencies.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi1.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Fcohen_kappa_eq.png%3Fzoom%3D1.100000023841858%26fit%3D184%252C76%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi1.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Fcohen_kappa_eq.png%3Fzoom%3D1.100000023841858%26fit%3D184%252C76%26ssl%3D1" alt="cohen kappa metric"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To calculate it one needs to calculate two things: &lt;strong&gt;“observed agreement” (po)&lt;/strong&gt; and &lt;strong&gt;“expected agreement” (pe)&lt;/strong&gt;. Observed agreement (po) is simply how our classifier predictions agree with the ground truth, which means it is just accuracy. The expected agreement (pe) is how the predictions of the &lt;strong&gt;random classifier that samples according to class frequencies&lt;/strong&gt; agree with the ground truth, or accuracy of the random classifier.&lt;/p&gt;

&lt;p&gt;From an interpretation standpoint, I like that it extends something very easy to explain (accuracy) to situations where your dataset is imbalanced by incorporating a baseline (dummy) classifier.&lt;/p&gt;
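&lt;p&gt;The two ingredients can be computed by hand. The sketch below follows the standard definition, where the expected agreement is built from the class frequencies of both the ground truth and the predictions; the labels are hypothetical and the function name &lt;code&gt;cohen_kappa&lt;/code&gt; is just an illustrative helper:&lt;/p&gt;

```python
def cohen_kappa(y_true, y_pred):
    """Cohen Kappa = (po - pe) / (1 - pe) for two equally long label lists."""
    n = len(y_true)
    labels = sorted(set(y_true) | set(y_pred))
    # Observed agreement: plain accuracy.
    p_o = sum(1 for t, p in zip(y_true, y_pred) if t == p) / n
    # Expected agreement: chance overlap implied by the class frequencies.
    p_e = sum((y_true.count(c) / n) * (y_pred.count(c) / n) for c in labels)
    return (p_o - p_e) / (1 - p_e)


# Hypothetical imbalanced labels: 8 negatives, 2 positives.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_model = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]
y_dummy = [0] * 10  # always predicts the majority class

print("model kappa:", round(cohen_kappa(y_true, y_model), 3))
print("dummy kappa:", round(cohen_kappa(y_true, y_dummy), 3))
```

&lt;p&gt;Both predictors have an accuracy of 0.8 on this toy data, yet only the first one gets a kappa above zero, which illustrates how the baseline is built into the metric.&lt;/p&gt;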

&lt;h3&gt;
  
  
  How to compute:
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.metrics&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;cohen_kappa_score&lt;/span&gt;

&lt;span class="nf"&gt;cohen_kappa_score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_pred_class&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  How models score in this metric (threshold=0.5):
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi1.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Fkappa.png%3Fw%3D529%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi1.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Fkappa.png%3Fw%3D529%26ssl%3D1" alt="cohen kappa metric"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We can easily distinguish the worst/best models based on this metric. Also, we can see that there is still a lot of room to improve our best model.&lt;/p&gt;

&lt;h3&gt;
  
  
  How does it depend on the threshold:
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi1.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Fkappa_by_thres.png%3Fresize%3D1024%252C768%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi1.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Fkappa_by_thres.png%3Fresize%3D1024%252C768%26ssl%3D1" alt="cohen kappa metric"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With a chart like the one above we can find a threshold that optimizes Cohen Kappa. In this case, it is at 0.31, giving us some improvement (0.7909 -&amp;gt; 0.7947) over the standard 0.5.&lt;/p&gt;

&lt;h3&gt;
  
  
  When to use it:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;This metric is not used heavily in the context of classification. Yet it can work really well for imbalanced problems and seems like a great companion/alternative to accuracy.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Jump back to the evaluation metrics list -&amp;gt;&lt;/strong&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  14. Matthews Correlation Coefficient | MCC &lt;a&gt;&lt;/a&gt;
&lt;/h1&gt;

&lt;p&gt;It’s a correlation between predicted classes and ground truth. It can be calculated based on values from the confusion matrix:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi1.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Fmcc_eq.png%3Fw%3D738%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi1.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Fmcc_eq.png%3Fw%3D738%26ssl%3D1" alt="matthews correlation coefficient"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Alternatively, you could also calculate the correlation between y_true and y_pred.&lt;/p&gt;
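&lt;p&gt;The equivalence between the confusion-matrix formula and a plain correlation can be checked on a small example. This is a dependency-free sketch with hypothetical labels; both helper functions are illustrative and not library APIs:&lt;/p&gt;

```python
import math


def mcc_from_counts(tp, tn, fp, fn):
    """MCC from the four confusion-matrix cells."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def pearson(x, y):
    """Pearson correlation between two equally long numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical binary labels and predicted classes.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)
tn = sum(1 for t, p in pairs if t == 0 and p == 0)
fp = sum(1 for t, p in pairs if t == 0 and p == 1)
fn = sum(1 for t, p in pairs if t == 1 and p == 0)

mcc = mcc_from_counts(tp, tn, fp, fn)
corr = pearson(y_true, y_pred)
print(round(mcc, 3), round(corr, 3))
```

&lt;p&gt;Note that the correlation is taken between the true labels and the thresholded class predictions, not the raw scores.&lt;/p&gt;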

&lt;h3&gt;
  
  
  How to compute:
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.metrics&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;matthews_corrcoef&lt;/span&gt;

&lt;span class="n"&gt;y_pred_class&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;y_pred_pos&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;threshold&lt;/span&gt;
&lt;span class="nf"&gt;matthews_corrcoef&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_pred_class&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  How models score in this metric (threshold=0.5):
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi0.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Fmcc.png%3Fw%3D564%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi0.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Fmcc.png%3Fw%3D564%26ssl%3D1" alt="matthews correlation coefficient"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We can clearly see improvements in our model quality and a lot of room to grow, which I really like. Also, it ranks our models reasonably and puts models that you’d expect to be better on top. Of course, MCC depends on the threshold that we choose.&lt;/p&gt;

&lt;h3&gt;
  
  
  How does it depend on the threshold:
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi1.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Fmcc_by_thres.png%3Fresize%3D1024%252C768%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi1.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Fmcc_by_thres.png%3Fresize%3D1024%252C768%26ssl%3D1" alt="matthews correlation coefficient"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We can adjust the threshold to optimize MCC. In our case, the best score is at 0.53 but what I really like is that it is not super sensitive to threshold changes.&lt;/p&gt;

&lt;h3&gt;
  
  
  When to use it:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;When working on imbalanced problems,&lt;/li&gt;
&lt;li&gt;When you want to have something easily interpretable.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Jump back to the evaluation metrics list -&amp;gt;&lt;/strong&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  15. ROC Curve &lt;a&gt;&lt;/a&gt;
&lt;/h1&gt;

&lt;p&gt;It is a chart that visualizes the tradeoff between true positive rate (TPR) and false positive rate (FPR). Basically, for every threshold, we calculate TPR and FPR and plot it on one chart.&lt;/p&gt;

&lt;p&gt;Of course, the higher the TPR and the lower the FPR at each threshold, the better, so classifiers whose curves sit closer to the top-left corner are better.&lt;/p&gt;

&lt;p&gt;Extensive discussion of ROC Curve and ROC AUC score can be found in this &lt;a href="http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.10.9777&amp;amp;rep=rep1&amp;amp;type=pdf" rel="noopener noreferrer"&gt;article by Tom Fawcett&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  How to compute:
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;scikitplot.metrics&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;plot_roc&lt;/span&gt;

&lt;span class="n"&gt;fig&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ax&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;subplots&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="nf"&gt;plot_roc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_pred&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
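&lt;p&gt;The plotting helper hides the per-threshold computation, so here is a minimal sketch of how the points of the curve come about. It assumes hypothetical labels and scores and treats every distinct score as a candidate threshold; &lt;code&gt;roc_points&lt;/code&gt; is an illustrative helper, not a library function:&lt;/p&gt;

```python
def roc_points(y_true, y_score):
    """Return (FPR, TPR) pairs, one per distinct score used as threshold."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    points = []
    # Walk thresholds from strict to lenient; scores >= threshold are positive.
    for threshold in sorted(set(y_score), reverse=True):
        tp = sum(1 for t, s in zip(y_true, y_score) if t == 1 and s >= threshold)
        fp = sum(1 for t, s in zip(y_true, y_score) if t == 0 and s >= threshold)
        points.append((fp / n_neg, tp / n_pos))
    return points


# Hypothetical labels and positive-class scores.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

for fpr, tpr in roc_points(y_true, y_score):
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```

&lt;p&gt;Plotting these pairs, with the origin added as the starting point, gives the ROC curve for the positive class.&lt;/p&gt;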



&lt;h3&gt;
  
  
  How does it look:
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi2.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Froc_auc_curve.png%3Fresize%3D1024%252C768%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi2.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Froc_auc_curve.png%3Fresize%3D1024%252C768%26ssl%3D1" alt="roc curve"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We can see a healthy ROC curve, pushed towards the top-left side both for positive and negative class. It is not clear which one performs better across the board as with FPR &amp;lt; ~0.15 positive class is higher and starting from FPR~0.15 the negative class is above.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Jump back to the evaluation metrics list -&amp;gt;&lt;/strong&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  16. ROC AUC score &lt;a&gt;&lt;/a&gt;
&lt;/h1&gt;

&lt;p&gt;In order to get one number that tells us how good our curve is, we can calculate the Area Under the ROC Curve, or ROC AUC score. The more top-left your curve is the higher the area and hence higher ROC AUC score.&lt;/p&gt;

&lt;p&gt;Alternatively, &lt;a href="https://en.wikipedia.org/wiki/Mann%E2%80%93Whitney_U_test#Area-under-curve_(AUC)_statistic_for_ROC_curves" rel="noopener noreferrer"&gt;it can be shown that ROC AUC score is equivalent to calculating the rank correlation&lt;/a&gt; between predictions and targets. From an interpretation standpoint, this view is more useful because it shows &lt;strong&gt;how good at ranking predictions your model is&lt;/strong&gt;. It tells you the probability that a randomly chosen positive instance is ranked higher than a randomly chosen negative instance.&lt;/p&gt;
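&lt;p&gt;The ranking interpretation can be checked directly by comparing every positive with every negative. This brute-force sketch (quadratic in the number of observations, so only suitable for small illustrations) uses hypothetical data and counts ties as half a win:&lt;/p&gt;

```python
def pairwise_auc(y_true, y_score):
    """Share of (positive, negative) pairs where the positive scores higher."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical labels and positive-class scores.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

auc = pairwise_auc(y_true, y_score)
print(auc)
```

&lt;p&gt;Three of the four positive/negative pairs are ordered correctly here, which matches the area under the ROC curve for the same data.&lt;/p&gt;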

&lt;h3&gt;
  
  
  How to compute:
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.metrics&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;roc_auc_score&lt;/span&gt;

&lt;span class="n"&gt;roc_auc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;roc_auc_score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_pred_pos&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  How models score in this metric:
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi1.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Froc_auc_score.png%3Fw%3D500%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi1.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Froc_auc_score.png%3Fw%3D500%26ssl%3D1" alt="roc auc score"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We can see improvements and the models that one would guess to be better are indeed scoring higher. Also, the score is independent of the threshold which comes in handy.&lt;/p&gt;

&lt;h3&gt;
  
  
  When to use it:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;You &lt;strong&gt;should use it&lt;/strong&gt; when you ultimately &lt;strong&gt;care about ranking predictions&lt;/strong&gt; and not necessarily about outputting well-calibrated probabilities (read this &lt;a href="https://machinelearningmastery.com/calibrated-classification-model-in-scikit-learn/" rel="noopener noreferrer"&gt;article by Jason Brownlee&lt;/a&gt; if you want to learn about probability calibration).&lt;/li&gt;
&lt;li&gt;You &lt;strong&gt;should not use it&lt;/strong&gt; when your &lt;strong&gt;data is heavily imbalanced&lt;/strong&gt;. It was discussed extensively in this &lt;a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4349800/" rel="noopener noreferrer"&gt;article by Takaya Saito and Marc Rehmsmeier&lt;/a&gt;. The intuition is the following: false positive rate for highly imbalanced datasets is pulled down due to a large number of true negatives.&lt;/li&gt;
&lt;li&gt;You &lt;strong&gt;should use it when you care equally about positive and negative classes&lt;/strong&gt;. It naturally extends the imbalanced data discussion from the last section. If we care about true negatives as much as we care about true positives then it totally makes sense to use ROC AUC.&lt;/li&gt;
&lt;/ul&gt;
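&lt;p&gt;The intuition about the false positive rate being pulled down can be illustrated with simple arithmetic. The counts below are hypothetical and chosen only to show the effect on a heavily imbalanced dataset (100 positives versus 99,900 negatives):&lt;/p&gt;

```python
# Hypothetical confusion-matrix counts for a heavily imbalanced problem.
tp, fn = 90, 10         # 100 positive observations
fp, tn = 900, 99_000    # 99,900 negative observations

tpr = tp / (tp + fn)        # recall / true positive rate
fpr = fp / (fp + tn)        # false positive rate, dominated by the many TNs
precision = tp / (tp + fp)  # share of flagged observations that are positive

print(f"TPR={tpr:.3f}  FPR={fpr:.4f}  precision={precision:.3f}")
```

&lt;p&gt;In ROC space this model looks excellent (high TPR at a tiny FPR), even though only about one in eleven flagged observations is actually positive.&lt;/p&gt;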

&lt;p&gt;&lt;strong&gt;Jump back to the evaluation metrics list -&amp;gt;&lt;/strong&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  17. Precision-Recall Curve &lt;a&gt;&lt;/a&gt;
&lt;/h1&gt;

&lt;p&gt;It is a curve that combines precision (PPV) and recall (TPR) in a single visualization. For every threshold, you calculate PPV and TPR and plot them. The higher on the y-axis your curve is, the better your model performs.&lt;/p&gt;

&lt;p&gt;You can use this plot to make an educated decision when it comes to the classic precision/recall dilemma. Typically, the higher the recall, the lower the precision. Knowing &lt;strong&gt;at which recall your precision starts to fall fast&lt;/strong&gt; can help you choose the threshold and deliver a better model.&lt;/p&gt;
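&lt;p&gt;One way to turn that dilemma into a decision is to fix a minimum acceptable precision and pick the threshold with the best recall among those that satisfy it. The sketch below assumes hypothetical data and a hypothetical precision floor of 0.75; &lt;code&gt;pr_points&lt;/code&gt; is an illustrative helper:&lt;/p&gt;

```python
def pr_points(y_true, y_score):
    """(threshold, precision, recall) for every distinct score, highest first."""
    n_pos = sum(y_true)
    points = []
    for threshold in sorted(set(y_score), reverse=True):
        # Labels of the observations flagged as positive at this threshold.
        flagged = [t for t, s in zip(y_true, y_score) if s >= threshold]
        tp = sum(flagged)
        points.append((threshold, tp / len(flagged), tp / n_pos))
    return points


# Hypothetical labels and positive-class scores.
y_true = [1, 1, 0, 1, 0, 1, 0, 0]
y_score = [0.95, 0.90, 0.80, 0.70, 0.60, 0.50, 0.30, 0.10]

min_precision = 0.75  # hypothetical business requirement
eligible = [pt for pt in pr_points(y_true, y_score) if pt[1] >= min_precision]

# Among thresholds that keep precision high enough, take the best recall.
threshold, precision, recall = max(eligible, key=lambda pt: pt[2])
print(f"threshold={threshold}, precision={precision:.2f}, recall={recall:.2f}")
```

&lt;p&gt;The same idea works the other way round: fix a minimum recall and pick the threshold with the highest precision.&lt;/p&gt;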

&lt;h3&gt;
  
  
  How to compute:
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;scikitplot.metrics&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;plot_precision_recall&lt;/span&gt;

&lt;span class="n"&gt;fig&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ax&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;subplots&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="nf"&gt;plot_precision_recall&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_pred&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  How does it look:
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi2.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Fprec_rec_curve.png%3Fresize%3D1024%252C768%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi2.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Fprec_rec_curve.png%3Fresize%3D1024%252C768%26ssl%3D1" alt="precision recall curve"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We can see that for the negative class we maintain high precision and high recall almost throughout the entire range of thresholds. For the positive class precision is starting to fall as soon as we are recalling 0.2 of true positives and by the time we hit 0.8, it decreases to around 0.7.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Jump back to the evaluation metrics list -&amp;gt;&lt;/strong&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  18. PR AUC score | Average precision &lt;a&gt;&lt;/a&gt;
&lt;/h1&gt;

&lt;p&gt;Similarly to ROC AUC score you can calculate the Area Under the Precision-Recall Curve to get one number that describes model performance.&lt;/p&gt;

&lt;p&gt;You can also think about PR AUC as the average of precision scores calculated for each recall threshold [0.0, 1.0]. You can also adjust this definition to suit your business needs by choosing/clipping recall thresholds if needed.&lt;/p&gt;
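&lt;p&gt;As a rough illustration of the "average of precision scores" view, the sketch below computes average precision as a step-wise sum: each precision value is weighted by the increase in recall since the previous threshold. The data is hypothetical and &lt;code&gt;average_precision&lt;/code&gt; is an illustrative helper rather than the scikit-learn function:&lt;/p&gt;

```python
def average_precision(y_true, y_score):
    """Sum of precision at each threshold, weighted by the gain in recall."""
    n_pos = sum(y_true)
    ap, prev_recall = 0.0, 0.0
    # Walk thresholds from strict to lenient so recall never decreases.
    for threshold in sorted(set(y_score), reverse=True):
        flagged = [t for t, s in zip(y_true, y_score) if s >= threshold]
        tp = sum(flagged)
        precision, recall = tp / len(flagged), tp / n_pos
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


# Hypothetical labels and positive-class scores.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

ap = average_precision(y_true, y_score)
print(round(ap, 4))
```

&lt;p&gt;Thresholds that do not recall any additional positives contribute nothing to the sum, so only the precision at points where recall actually increases matters.&lt;/p&gt;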

&lt;h3&gt;
  
  
  How to compute:
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.metrics&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;average_precision_score&lt;/span&gt;

&lt;span class="nf"&gt;average_precision_score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_pred_pos&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  How models score in this metric:
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi1.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Fpr_rec_score.png%3Fw%3D541%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi1.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Fpr_rec_score.png%3Fw%3D541%26ssl%3D1" alt="precision recall curve"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The models that we suspect to be “truly” better are in fact better in this metric, which is definitely a good thing. Overall, we can see high scores, but they are way less optimistic than the ROC AUC scores (0.96+).&lt;/p&gt;

&lt;h3&gt;
  
  
  When to use it:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;when you want to &lt;strong&gt;communicate precision/recall decision&lt;/strong&gt; to other stakeholders&lt;/li&gt;
&lt;li&gt;when you want to &lt;strong&gt;choose the threshold that fits the business problem&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;when your data is &lt;strong&gt;heavily imbalanced&lt;/strong&gt;. As mentioned before, it was discussed extensively in this &lt;a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4349800/" rel="noopener noreferrer"&gt;article by Takaya Saito and Marc Rehmsmeier&lt;/a&gt;. The intuition is the following: since PR AUC focuses mainly on the positive class (PPV and TPR) it cares less about the frequent negative class.&lt;/li&gt;
&lt;li&gt;when &lt;strong&gt;you care more about positive than negative class&lt;/strong&gt;. If you care more about the positive class and hence PPV and TPR you should go with Precision-Recall curve and PR AUC (average precision).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Jump back to the evaluation metrics list -&amp;gt;&lt;/strong&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  19. Log loss &lt;a&gt;&lt;/a&gt;
&lt;/h1&gt;

&lt;p&gt;Log loss is often used as the objective function that is optimized under the hood of machine learning models. Yet, it can also be used as a performance metric.&lt;/p&gt;

&lt;p&gt;Basically, we calculate the difference between ground truth and predicted score for every observation and average those errors over all observations. For one observation the error formula reads:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi0.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Flogloss_eq-1.png%3Fw%3D947%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi0.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Flogloss_eq-1.png%3Fw%3D947%26ssl%3D1" alt="log loss"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The more certain our model is that an observation is positive when it is, in fact, positive, the lower the error. This relationship is not linear, though, so it is worth looking at how the error changes as the gap between the prediction and the ground truth grows:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi0.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Flog_los_chart.png%3Fw%3D724%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi0.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Flog_los_chart.png%3Fw%3D724%26ssl%3D1" alt="log loss"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So our model gets punished very heavily when we are certain about something that is untrue. For example, when we give a score of 0.9999 to an observation that is negative our loss jumps through the roof. That is why sometimes it makes sense to clip your predictions to decrease the risk of that happening.&lt;/p&gt;
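&lt;p&gt;To make the effect of clipping concrete, here is a minimal sketch on made-up toy data. The helper &lt;code&gt;binary_log_loss&lt;/code&gt; and its &lt;code&gt;clip&lt;/code&gt; parameter are hypothetical names introduced for illustration, and the clip value of 0.05 is an arbitrary assumption rather than a recommendation.&lt;/p&gt;

```python
import numpy as np
from sklearn.metrics import log_loss

# Hypothetical toy data: the last observation is negative,
# but the model is almost certain that it is positive.
y_true = np.array([1, 0, 1, 0])
y_pred_pos = np.array([0.9, 0.2, 0.7, 0.9999])


def binary_log_loss(y_true, y_pred_pos, clip=None):
    """Average binary log loss; optionally clip scores to [clip, 1 - clip]."""
    y = np.asarray(y_true, dtype=float)
    p = np.asarray(y_pred_pos, dtype=float)
    if clip is not None:
        p = np.clip(p, clip, 1 - clip)
    # Per-observation error, then the average over all observations.
    errors = -(y * np.log(p) + (1 - y) * np.log(1 - p))
    return errors.mean()


raw = binary_log_loss(y_true, y_pred_pos)
clipped = binary_log_loss(y_true, y_pred_pos, clip=0.05)

print("log loss without clipping:", raw)
print("log loss with clipping:   ", clipped)
print("matches sklearn:", np.isclose(raw, log_loss(y_true, y_pred_pos)))
```

&lt;p&gt;A single confidently wrong prediction dominates the unclipped average. Clipping caps how much any one observation can contribute, at the cost of never rewarding the model for being fully certain.&lt;/p&gt;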

&lt;p&gt;If you want to learn more about log-loss read this &lt;a href="https://towardsdatascience.com/understanding-binary-cross-entropy-log-loss-a-visual-explanation-a3ac6025181a" rel="noopener noreferrer"&gt;article by Daniel Godoy&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  How to compute:
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.metrics&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;log_loss&lt;/span&gt;

&lt;span class="nf"&gt;log_loss&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_pred&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  How models score in this metric:
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi1.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Flog_los_table.png%3Fw%3D500%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi1.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Flog_los_table.png%3Fw%3D500%26ssl%3D1" alt="log loss"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It is difficult to see a strong improvement in these scores or to get an intuitive feeling for how good a model is. Also, the model that was chosen as the best one before (BIN-101) is in the middle of the pack, which suggests that using log-loss as a performance metric can be a risky proposition.&lt;/p&gt;

&lt;h3&gt;
  
  
  When to use it:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Pretty much &lt;strong&gt;always there is&lt;/strong&gt; a performance &lt;strong&gt;metric that better matches your&lt;/strong&gt; business &lt;strong&gt;problem&lt;/strong&gt;. Because of that, I would use log-loss as the objective for your model and evaluate performance with some other metric.&lt;/li&gt;
&lt;/ul&gt;


&lt;h1&gt;
  
  
  20. Brier score &lt;a&gt;&lt;/a&gt;
&lt;/h1&gt;

&lt;p&gt;It is a measure of how far your predictions lie from the true values. For one observation it simply reads:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi2.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Fbrier_loss_eq.png%3Fw%3D436%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi2.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Fbrier_loss_eq.png%3Fw%3D436%26ssl%3D1" alt="brier score"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Basically, it is the mean squared error in the probability space, and because of that, it is often used to assess how well calibrated the predicted probabilities of a machine learning model are. If you want to read more about probability calibration I recommend that you read this &lt;a href="https://machinelearningmastery.com/calibrated-classification-model-in-scikit-learn/" rel="noopener noreferrer"&gt;article by Jason Brownlee&lt;/a&gt;.&lt;/p&gt;
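&lt;p&gt;As a quick check of the "mean squared error in the probability space" reading, the sketch below computes the score by hand on made-up toy data and compares it with scikit-learn. The arrays are illustrative assumptions, not results from the experiments in this post.&lt;/p&gt;

```python
import numpy as np
from sklearn.metrics import brier_score_loss

# Hypothetical toy data: true labels and predicted positive-class probabilities.
y_true = np.array([1, 0, 1, 0, 1])
y_pred_pos = np.array([0.9, 0.1, 0.6, 0.4, 0.8])

# Squared distance between prediction and label, averaged over observations.
manual_brier = np.mean((y_pred_pos - y_true) ** 2)
sklearn_brier = brier_score_loss(y_true, y_pred_pos)

print("manual Brier score: ", manual_brier)
print("sklearn Brier score:", sklearn_brier)
# The square root gives a typical distance in probability units.
print("root of Brier score:", np.sqrt(manual_brier))
```

&lt;p&gt;Taking the square root puts the number back on the probability scale, which is how a statement such as "our predictions were off by about X on average" is usually derived.&lt;/p&gt;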

&lt;p&gt;It can be a great supplement to your ROC AUC score and other metrics that focus on other things.&lt;/p&gt;

&lt;h3&gt;
  
  
  How to compute:
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.metrics&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;brier_score_loss&lt;/span&gt;

&lt;span class="nf"&gt;brier_score_loss&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_pred_pos&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  How models score in this metric:
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi0.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Fbrier_table.png%3Fw%3D504%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi0.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Fbrier_table.png%3Fw%3D504%26ssl%3D1" alt="brier score"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Model from the &lt;a href="https://ui.neptune.ml/neptune-ml/binary-classification-metrics/e/BIN-101/logs?utm_source=devto&amp;amp;utm_medium=cross_posting&amp;amp;utm_campaign=evaluation_metrics_for_binary_classification&amp;amp;utm_content=explore_experiment" rel="noopener noreferrer"&gt;experiment BIN-101&lt;/a&gt; has the best calibration and for that model, on average our predictions were off by 0.16 (√0.0263309).&lt;/p&gt;

&lt;h3&gt;
  
  
  When to use it:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;When &lt;strong&gt;you care about calibrated probabilities&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;


&lt;h1&gt;
  
  
  21. Cumulative gains chart &lt;a&gt;&lt;/a&gt;
&lt;/h1&gt;

&lt;p&gt;In simple words, it helps you gauge how much you gain by using your model over a random model for a given fraction of top scored predictions.&lt;/p&gt;

&lt;p&gt;Simply put:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;you order your predictions from highest to lowest and&lt;/li&gt;
&lt;li&gt;for every percentile you calculate the fraction of true positive observations up to that percentile.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It makes it easy to see the benefits of using your model to target given groups of users/accounts/transactions especially if you really care about sorting them.&lt;/p&gt;
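&lt;p&gt;The two steps above can be sketched in a few lines of NumPy. This is an illustrative sketch on made-up data, not the scikit-plot implementation; the function name &lt;code&gt;cumulative_gain&lt;/code&gt; is a hypothetical helper.&lt;/p&gt;

```python
import numpy as np


def cumulative_gain(y_true, y_pred_pos):
    """Return (fraction of sample, fraction of positives captured so far)."""
    y_true = np.asarray(y_true)
    # Step 1: order observations from the highest score to the lowest.
    order = np.argsort(-np.asarray(y_pred_pos), kind="stable")
    # Step 2: count positives found up to each position in that ordering.
    hits = np.cumsum(y_true[order])
    depth = np.arange(1, len(y_true) + 1) / len(y_true)
    gain = hits / y_true.sum()
    return depth, gain


# Hypothetical toy data: 3 positives among 10 observations.
y_true = np.array([1, 0, 1, 0, 0, 1, 0, 0, 0, 0])
y_pred_pos = np.array([0.9, 0.8, 0.7, 0.3, 0.2, 0.6, 0.1, 0.4, 0.05, 0.15])

depth, gain = cumulative_gain(y_true, y_pred_pos)
for d, g in zip(depth, gain):
    print(f"top {d:.0%} of predictions covers {g:.0%} of positives")
```

&lt;p&gt;A random ordering would, on average, cover positives in proportion to the depth, so the distance between this curve and the diagonal is the gain from using the model.&lt;/p&gt;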

&lt;h3&gt;
  
  
  How to compute:
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
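&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;matplotlib.pyplot&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;scikitplot.metrics&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;plot_cumulative_gain&lt;/span&gt;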

&lt;span class="n"&gt;fig&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ax&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;subplots&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="nf"&gt;plot_cumulative_gain&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_pred&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  How does it look:
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi1.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Fcum_gain_chart.png%3Fresize%3D1024%252C768%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi1.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Fcum_gain_chart.png%3Fresize%3D1024%252C768%26ssl%3D1" alt="cumulative gain chart"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We can see that our cumulative gains chart shoots up very quickly as we increase the sample of highest-scored predictions. By the time we get to the 20th percentile, over 90% of positive cases are covered.&lt;/p&gt;

&lt;p&gt;Say we were to use our model to assign possible fraudulent transactions for processing and we needed to prioritize. We could use this chart to tell us where it makes the most sense to choose a cutoff.&lt;/p&gt;

&lt;h3&gt;
  
  
  When to use it:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Whenever you want to select the most promising customers or transactions to target and you want to use your model for sorting.&lt;/li&gt;
&lt;li&gt;It can be a good addition to ROC AUC score which measures ranking/sorting performance of your model.&lt;/li&gt;
&lt;/ul&gt;


&lt;h1&gt;
  
  
  22. Lift curve | lift chart &lt;a&gt;&lt;/a&gt;
&lt;/h1&gt;

&lt;p&gt;It is pretty much just a different representation of the cumulative gains chart:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;we order the predictions from highest to lowest&lt;/li&gt;
&lt;li&gt;for every percentile, we calculate the fraction of true positive observations up to that percentile for our model and for the random model,&lt;/li&gt;
&lt;li&gt;we calculate the ratio of those fractions and plot it.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It tells you how much better your model is than a random model for the given percentile of top scored predictions.&lt;/p&gt;
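&lt;p&gt;Since a random model is expected to cover positives in proportion to the depth, the ratio described above reduces to gain divided by depth. The sketch below illustrates this on made-up data; &lt;code&gt;lift_curve&lt;/code&gt; is a hypothetical helper and not the scikit-plot implementation.&lt;/p&gt;

```python
import numpy as np


def lift_curve(y_true, y_pred_pos):
    """Return (fraction of sample, lift over a random model at that depth)."""
    y_true = np.asarray(y_true)
    # Order observations from the highest score to the lowest.
    order = np.argsort(-np.asarray(y_pred_pos), kind="stable")
    depth = np.arange(1, len(y_true) + 1) / len(y_true)
    # Fraction of all positives captured by our model up to each depth.
    gain = np.cumsum(y_true[order]) / y_true.sum()
    # A random model is expected to capture a fraction equal to the depth.
    return depth, gain / depth


# Hypothetical toy data: 3 positives among 10 observations.
y_true = np.array([1, 0, 1, 0, 0, 1, 0, 0, 0, 0])
y_pred_pos = np.array([0.9, 0.8, 0.7, 0.3, 0.2, 0.6, 0.1, 0.4, 0.05, 0.15])

depth, lift = lift_curve(y_true, y_pred_pos)
for d, value in zip(depth, lift):
    print(f"top {d:.0%}: {value:.2f}x better than random")
```

&lt;p&gt;Because the gain can never exceed 1, the lift at depth d is bounded by 1/d, and it always ends at exactly 1.0 once the whole sample is included.&lt;/p&gt;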

&lt;h3&gt;
  
  
  How to compute:
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
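&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;matplotlib.pyplot&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;scikitplot.metrics&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;plot_lift_curve&lt;/span&gt;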

&lt;span class="n"&gt;fig&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ax&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;subplots&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="nf"&gt;plot_lift_curve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_pred&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  How does it look:
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi1.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Flift_curve_chart.png%3Fresize%3D1024%252C768%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi1.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Flift_curve_chart.png%3Fresize%3D1024%252C768%26ssl%3D1" alt="lift curve"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So for the top 10% of predictions, our model is over 10x better than random, for the top 20% it is over 4x better, and so on.&lt;/p&gt;

&lt;h3&gt;
  
  
  When to use it:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Whenever you want to select the most promising customers or transactions to target and you want to use your model for sorting.&lt;/li&gt;
&lt;li&gt;It can be a good addition to ROC AUC score which measures ranking/sorting performance of your model.&lt;/li&gt;
&lt;/ul&gt;


&lt;h1&gt;
  
  
  23. Kolmogorov-Smirnov plot &lt;a&gt;&lt;/a&gt;
&lt;/h1&gt;

&lt;p&gt;KS plot helps to assess the separation between prediction distributions for positive and negative classes.&lt;/p&gt;

&lt;p&gt;In order to create it you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;sort your observations by the prediction score,&lt;/li&gt;
&lt;li&gt;for every cutoff point [0.0, 1.0] of the sorted dataset (depth) calculate the proportion of true positives and true negatives in this depth,&lt;/li&gt;
&lt;li&gt;plot those fractions, positive(depth)/positive(all), negative(depth)/negative(all), on Y-axis and dataset depth on X-axis.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So it works similarly to cumulative gains chart but instead of just looking at positive class it looks at the separation between positive and negative class.&lt;/p&gt;

&lt;p&gt;A good explanation of KS plot and KS statistic can be found in this &lt;a href="http://rstudio-pubs-static.s3.amazonaws.com/303414_fb0a43efb0d7433983fdc9adcf87317f.html" rel="noopener noreferrer"&gt;article by Riaz Khan&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  How to compute:
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
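&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;matplotlib.pyplot&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;scikitplot.metrics&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;plot_ks_statistic&lt;/span&gt;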

&lt;span class="n"&gt;fig&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ax&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;subplots&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="nf"&gt;plot_ks_statistic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_pred&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  How does it look:
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi0.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Fks_plot.png%3Fresize%3D1024%252C768%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi0.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Fks_plot.png%3Fresize%3D1024%252C768%26ssl%3D1" alt="kolmogorov smirnov plot"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So we can see that the largest difference is at a cutoff point of 0.034 of top predictions. After that threshold, the separation decreases at a moderate rate as we increase the percentage of top predictions, and around 0.8 it starts to drop very quickly. So even though the best separation is at 0.034, we could potentially push the cutoff a bit higher to get more positively classified observations.&lt;/p&gt;


&lt;h1&gt;
  
  
  24. Kolmogorov-Smirnov statistic &lt;a&gt;&lt;/a&gt;
&lt;/h1&gt;

&lt;p&gt;To turn the KS plot into a single number that we can use as a metric, we look at all thresholds (dataset cutoffs) from the KS plot and take the largest distance (separation) between the distributions of true positive and true negative observations.&lt;/p&gt;

&lt;p&gt;If there is a threshold for which all observations above are truly positive and all observations below are truly negative we get a perfect KS statistic of 1.0.&lt;/p&gt;
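&lt;p&gt;The definition above can be sketched directly: walk down the sorted predictions, track what fraction of positives and of negatives has been seen so far, and keep the largest gap. This is an illustrative NumPy sketch on made-up data with no tied scores; &lt;code&gt;ks_statistic&lt;/code&gt; is a hypothetical helper, and the comparison with &lt;code&gt;scipy.stats.ks_2samp&lt;/code&gt; is only a sanity check.&lt;/p&gt;

```python
import numpy as np
from scipy.stats import ks_2samp


def ks_statistic(y_true, y_pred_pos):
    """Largest gap between cumulative fractions of positives and negatives."""
    y_true = np.asarray(y_true)
    # Sort observations from the highest score to the lowest.
    order = np.argsort(-np.asarray(y_pred_pos), kind="stable")
    sorted_true = y_true[order]
    # Cumulative share of each class captured up to every cutoff.
    pos_frac = np.cumsum(sorted_true) / sorted_true.sum()
    neg_frac = np.cumsum(1 - sorted_true) / (1 - sorted_true).sum()
    return np.max(np.abs(pos_frac - neg_frac))


# Hypothetical toy data: 3 positives among 10 observations.
y_true = np.array([1, 0, 1, 0, 0, 1, 0, 0, 0, 0])
y_pred_pos = np.array([0.9, 0.8, 0.7, 0.3, 0.2, 0.6, 0.1, 0.4, 0.05, 0.15])

ks = ks_statistic(y_true, y_pred_pos)
reference = ks_2samp(y_pred_pos[y_true == 1], y_pred_pos[y_true == 0]).statistic
print("KS statistic:", ks)
print("scipy two-sample KS:", reference)

# A perfectly separating model reaches the maximum value of 1.0.
perfect = ks_statistic([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1])
print("perfect separation:", perfect)
```

&lt;p&gt;With tied scores the two computations can differ slightly, because this sketch evaluates the gap after every single observation rather than only at distinct score values.&lt;/p&gt;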

&lt;h3&gt;
  
  
  How to compute:
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;scikitplot.helpers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;binary_ks_curve&lt;/span&gt;

&lt;span class="n"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;binary_ks_curve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_pred_pos&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ks_stat&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;res&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  How models score in this metric:
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi1.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Fks_table.png%3Fw%3D512%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi1.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Fks_table.png%3Fw%3D512%26ssl%3D1" alt="kolmogorov smirnov statistics"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Using the KS statistic as the metric, we ranked BIN-101 as the best model, which is the one we expect to be truly the best.&lt;/p&gt;

&lt;h3&gt;
  
  
  When to use it:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;When your problem is about sorting/prioritizing the most relevant observations and you care equally about positive and negative classes.&lt;/li&gt;
&lt;li&gt;It can be a good addition to ROC AUC score which measures ranking/sorting performance of your model.&lt;/li&gt;
&lt;/ul&gt;


&lt;h1&gt;
  
  
  Final Thoughts
&lt;/h1&gt;

&lt;p&gt;In this blog post, you’ve learned about various classification metrics and performance charts.&lt;/p&gt;

&lt;p&gt;We went over metric definitions and interpretations, learned how to calculate them, and talked about when to use them.&lt;/p&gt;

&lt;p&gt;Hopefully, with all that knowledge you will be fully equipped to deal with metric-related problems in your future projects.&lt;/p&gt;

&lt;h1&gt;
  
  
  Bonus
&lt;/h1&gt;

&lt;p&gt;To help you use the information from this blog post to the fullest, I have prepared:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
logging helper function that calculates and logs all the metrics, performance charts, and metric by threshold charts&lt;/li&gt;
&lt;li&gt;
binary classification metrics cheatsheet with everything I talked about digested into a few pages.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Check those out below!&lt;/p&gt;

&lt;h3&gt;
  
  
  Logging helper function &lt;a&gt;&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;You can log all of the metrics and performance charts that we covered for your machine learning project with just one function call and explore them in Neptune:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;install the package:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;neptune-contrib[all]

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;import and run:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;neptunecontrib.monitoring.metrics&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;npt_metrics&lt;/span&gt;

&lt;span class="n"&gt;npt_metrics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log_binary_classification_metrics&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_pred&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;explore everything in the app:&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi0.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Fnpt_logged_metrics.gif%3Ffit%3D800%252C539%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi0.wp.com%2Fneptune.ml%2Fwp-content%2Fuploads%2Fnpt_logged_metrics.gif%3Ffit%3D800%252C539%26ssl%3D1" alt="evaluation metrics logger"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Binary classification metrics cheatsheet &lt;a&gt;&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;We’ve created a cheatsheet that takes all the content I went over in this blog post and puts it into a digestible, few-page document that you can print and use whenever you need anything related to binary classification metrics.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://neptune.ml/?jet_download=13718" rel="noopener noreferrer"&gt;Download binary classification metrics cheatsheet&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Example script &lt;a&gt;&lt;/a&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;lightgbm&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;matplotlib.pyplot&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;neptune&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;neptunecontrib.monitoring.utils&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pickle_and_send_artifact&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;neptunecontrib.monitoring.metrics&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;log_binary_classification_metrics&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;neptunecontrib.versioning.data&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;log_data_version&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;

&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rcParams&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;font.size&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;18&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rcParams&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;figure.figsize&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;]})&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;style&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;use&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;seaborn-whitegrid&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Define parameters
&lt;/span&gt;&lt;span class="n"&gt;PROJECT_NAME&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;neptune-ml/binary-classification-metrics&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;

&lt;span class="n"&gt;TRAIN_PATH&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;data/train.csv&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;span class="n"&gt;TEST_PATH&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;data/test.csv&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;span class="n"&gt;NROWS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;

&lt;span class="n"&gt;MODEL_PARAMS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;random_state&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1234&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;learning_rate&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;n_estimators&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1500&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Load data
&lt;/span&gt;&lt;span class="n"&gt;train&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;TRAIN_PATH&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;nrows&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;NROWS&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;test&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;TEST_PATH&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;nrows&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;NROWS&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;feature_names&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;col&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;col&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;train&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;col&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;isFraud&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt;

&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;train&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;feature_names&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;train&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;isFraud&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_test&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;test&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;feature_names&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;test&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;isFraud&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# Start experiment
&lt;/span&gt;&lt;span class="n"&gt;neptune&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;init&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;PROJECT_NAME&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;neptune&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_experiment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;lightGBM training&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                          &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;MODEL_PARAMS&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                          &lt;span class="n"&gt;upload_source_files&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;train.py&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;environment.yaml&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="nf"&gt;log_data_version&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;TRAIN_PATH&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prefix&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;train_&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;log_data_version&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;TEST_PATH&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prefix&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;test_&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Train model
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;lightgbm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;LGBMClassifier&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;MODEL_PARAMS&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Evaluate model
&lt;/span&gt;&lt;span class="n"&gt;y_test_pred&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict_proba&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;log_binary_classification_metrics&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_test_pred&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;pickle_and_send_artifact&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;y_test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_test_pred&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;test_predictions.pkl&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;neptune&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stop&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>machinelearning</category>
      <category>python</category>
    </item>
  </channel>
</rss>
