<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Kamil A. Kaczmarek</title>
    <description>The latest articles on DEV Community by Kamil A. Kaczmarek (@kamil_k7k).</description>
    <link>https://dev.to/kamil_k7k</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F430709%2F4a7f6a2d-c7f8-487f-928e-58d4b75b6195.png</url>
      <title>DEV Community: Kamil A. Kaczmarek</title>
      <link>https://dev.to/kamil_k7k</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/kamil_k7k"/>
    <language>en</language>
    <item>
      <title>Random Forest Regression: When Does It Fail and Why?</title>
      <dc:creator>Kamil A. Kaczmarek</dc:creator>
      <pubDate>Tue, 18 Aug 2020 22:33:26 +0000</pubDate>
      <link>https://dev.to/kamil_k7k/random-forest-regression-when-does-it-fail-and-why-19i8</link>
      <guid>https://dev.to/kamil_k7k/random-forest-regression-when-does-it-fail-and-why-19i8</guid>
      <description>&lt;p&gt;&lt;em&gt;This article was originally written by &lt;a href="https://www.linkedin.com/in/mwitiderrick/"&gt;Derrick Mwiti&lt;/a&gt; and posted on the &lt;a href="https://neptune.ai/blog/random-forest-regression-when-does-it-fail-and-why?utm_source=devto&amp;amp;utm_medium=crosspost&amp;amp;utm_campaign=blog-random-forest-regression-when-does-it-fail-and-why"&gt;Neptune blog&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;In this article, we’ll look at a major problem with using Random Forest for regression, which is &lt;strong&gt;extrapolation&lt;/strong&gt;. &lt;/p&gt;

&lt;p&gt;We’ll cover the following items:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Random Forest Regression vs Linear Regression&lt;/li&gt;
&lt;li&gt;Random Forest Regression Extrapolation Problem&lt;/li&gt;
&lt;li&gt;Potential solutions&lt;/li&gt;
&lt;li&gt;Should you use Random Forest for Regression?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let’s dive in.&lt;/p&gt;




&lt;h1&gt;Random Forest Regression vs Linear Regression&lt;/h1&gt;

&lt;p&gt;Random Forest Regression is quite a robust algorithm; however, the question is: &lt;strong&gt;should you use it for regression?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Why not use Linear Regression instead? The function in a Linear Regression can easily be written as &lt;code&gt;y = mx + c&lt;/code&gt;, while a complex Random Forest Regression seems like a black box that can’t easily be represented as a function. &lt;/p&gt;

&lt;p&gt;Generally, Random Forests produce better results, work well on large datasets, and can handle missing data by creating estimates for it. However, they pose a major challenge: they can’t extrapolate beyond the range of the training data. We’ll dive deeper into this challenge in a minute. &lt;/p&gt;

&lt;h3&gt;Decision Tree Regression&lt;/h3&gt;

&lt;p&gt;Decision Trees are great for obtaining non-linear relationships between input features and the target variable.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;inner working of a Decision Tree can be thought of as a bunch of if-else conditions.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It starts at the very top with one node. This node then splits into a left and right node — decision nodes. These nodes then split into their respective right and left nodes. &lt;/p&gt;

&lt;p&gt;The bottom-most nodes are referred to as leaves or terminal nodes.&lt;/p&gt;

&lt;p&gt;The value in a leaf is usually the mean of the observations that fall within that specific region. For instance, in the right-most leaf node below, 552.889 is the average of the 5 samples.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--O8Ww2-hK--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://lh5.googleusercontent.com/d5QSiPN3bIKdLCWkdOlnRLpLleMMo5ut904gtGUBP3Q3244u1BVMHgqkcXeEo9HtLoRU6agt--Y_U_aG1Oxosf7voq9YBcxJOIQ6cW2YiSmQZ2zLLZO-CcVsK46powAHxlPrzoDC" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--O8Ww2-hK--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://lh5.googleusercontent.com/d5QSiPN3bIKdLCWkdOlnRLpLleMMo5ut904gtGUBP3Q3244u1BVMHgqkcXeEo9HtLoRU6agt--Y_U_aG1Oxosf7voq9YBcxJOIQ6cW2YiSmQZ2zLLZO-CcVsK46powAHxlPrzoDC" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;How far this splitting goes is known as the depth of the tree, and it is one of the hyperparameters that can be tuned. The maximum depth of the tree is specified to prevent the tree from becoming too deep — a scenario that leads to overfitting. &lt;/p&gt;
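&lt;p&gt;A minimal sketch of both ideas, assuming scikit-learn’s &lt;code&gt;DecisionTreeRegressor&lt;/code&gt; and toy data rather than the diamonds dataset: the depth of the splitting is capped by &lt;code&gt;max_depth&lt;/code&gt;, and each leaf predicts the mean of the training samples that land in it.&lt;/p&gt;

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# toy non-linear data standing in for the diamonds features
rng = np.random.RandomState(0)
X = np.sort(rng.uniform(0, 10, 200)).reshape(-1, 1)
y = np.sin(X).ravel()

# max_depth caps how far the splitting goes, guarding against overfitting
tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y)
print("depth:", tree.get_depth())  # never exceeds 3

# each leaf predicts the mean of the training samples that fall into it
leaf_ids = tree.apply(X)
leaf_mean = y[leaf_ids == leaf_ids[0]].mean()
print(np.isclose(leaf_mean, tree.predict(X[:1])[0]))  # True
```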

&lt;h3&gt;Random Forest Regression&lt;/h3&gt;

&lt;p&gt;Random Forest is an ensemble of decision trees. This is to say that many trees, constructed in a certain “random” way, form a Random Forest. &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Each tree is created from a different sample of rows and at each node, a different sample of features is selected for splitting. &lt;/li&gt;
&lt;li&gt;Each of the trees makes its own individual prediction. &lt;/li&gt;
&lt;li&gt;These predictions are then averaged to produce a single result. &lt;/li&gt;
&lt;/ul&gt;
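&lt;p&gt;These steps can be verified in a few lines, assuming scikit-learn’s &lt;code&gt;RandomForestRegressor&lt;/code&gt; on toy data: the forest’s prediction equals the average of the individual trees’ predictions.&lt;/p&gt;

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.RandomState(0)
X = rng.uniform(0, 10, (200, 2))   # each tree sees a bootstrap sample of rows
y = 2 * X[:, 0] + np.sin(X[:, 1])

forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# the forest's prediction is the average of the individual trees' predictions
per_tree = np.stack([tree.predict(X[:5]) for tree in forest.estimators_])
print(np.allclose(per_tree.mean(axis=0), forest.predict(X[:5])))  # True
```

This is exactly the averaging step described in the list above.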

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--UaAg2DP7--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://lh5.googleusercontent.com/ZzEagZv-a3KFCSr610aEvXIPTRTWV_cFdgpXoDFlj_r7A8ex5L0aE33aLuQptfngJLiT5xX3yk8LwGAyvOVY9rIsqe1ZZmIvg71yQIdXuVxuPpgmvm85aAxmP32M-ODq6_E3uxrV" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--UaAg2DP7--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://lh5.googleusercontent.com/ZzEagZv-a3KFCSr610aEvXIPTRTWV_cFdgpXoDFlj_r7A8ex5L0aE33aLuQptfngJLiT5xX3yk8LwGAyvOVY9rIsqe1ZZmIvg71yQIdXuVxuPpgmvm85aAxmP32M-ODq6_E3uxrV" alt="image"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;source: &lt;a href="https://commons.wikimedia.org/wiki/File:Random_forest_diagram_complete.png"&gt;Wikimedia&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This averaging makes a Random Forest better than a single Decision Tree: it improves accuracy and reduces overfitting. &lt;/p&gt;

&lt;p&gt;A prediction from the Random Forest Regressor is an average of the predictions produced by the trees in the forest. &lt;/p&gt;

&lt;h3&gt;Example of trained Linear Regression and Random Forest&lt;/h3&gt;

&lt;p&gt;To dive in further, let’s look at an example of a Linear Regression and a Random Forest Regression. For this, we’ll apply the Linear Regression and a Random Forest Regression to the same dataset and compare the results. &lt;/p&gt;

&lt;p&gt;Let’s take this example dataset, where you should predict the price of diamonds based on other features like carat, depth, table, x, y and z. Here is the distribution of price:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--YgjJiOEf--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://lh5.googleusercontent.com/6XreM_NDuvsLtTK-nyeaT2kEbD6-ENHcc5DCbjM4_n04kVXQIzPwU_-rb18OKK_uZfqRQfq7x1CjcJxSJxdENi5KVVaVVuiUZZ3ahOxqWGI96UaEUr955FGw8Tri0KSam_31FvHS" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--YgjJiOEf--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://lh5.googleusercontent.com/6XreM_NDuvsLtTK-nyeaT2kEbD6-ENHcc5DCbjM4_n04kVXQIzPwU_-rb18OKK_uZfqRQfq7x1CjcJxSJxdENi5KVVaVVuiUZZ3ahOxqWGI96UaEUr955FGw8Tri0KSam_31FvHS" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--UVYxvrU---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://lh3.googleusercontent.com/FQ3TkJhrVdNFzo40aYyHgO2bwXYAbD7kgX6EHi_bvH08kbGEcu55OKNGMKCU5_iSWEaCW71oq5ysyyWFCC0rw0Ahi-miGxWvyFe_TwP2Msjo1E1iOHcdMygFmLKirbuOA7IS8alq" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--UVYxvrU---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://lh3.googleusercontent.com/FQ3TkJhrVdNFzo40aYyHgO2bwXYAbD7kgX6EHi_bvH08kbGEcu55OKNGMKCU5_iSWEaCW71oq5ysyyWFCC0rw0Ahi-miGxWvyFe_TwP2Msjo1E1iOHcdMygFmLKirbuOA7IS8alq" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We can see that the &lt;strong&gt;price ranges from 326 to 18823.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Let’s train the Linear Regression model and run predictions on the validation set.&lt;/p&gt;

&lt;p&gt;The distribution of predicted prices is the following:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--FGXeTfP4--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://lh3.googleusercontent.com/K_XtAJSvytdOYMrU6gqmhAPzYwhYd7-PKxfegUceQP3pj8ePXcPDK9QtOVwjeT8SbfQr3RCDG8aKst02qGnvfQ49C8FC1nyx0joQddgEgIlUal4o9Gpgcn7CKfxvhW5NM-Nx51Oq" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--FGXeTfP4--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://lh3.googleusercontent.com/K_XtAJSvytdOYMrU6gqmhAPzYwhYd7-PKxfegUceQP3pj8ePXcPDK9QtOVwjeT8SbfQr3RCDG8aKst02qGnvfQ49C8FC1nyx0joQddgEgIlUal4o9Gpgcn7CKfxvhW5NM-Nx51Oq" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--kLhzpidA--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://lh6.googleusercontent.com/bYkQceurBuITEmOiGaaRY6e0W8DItepqjEWP3QCVS-Wn-UP0EaXr3hjVOSX0-1NtcZ_oxECfSNOSZ_UH5_FWOgc31W0axILap77zFFo3ciEhECzL3At-tX4fgjO69P3GhYjQkmlZ" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--kLhzpidA--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://lh6.googleusercontent.com/bYkQceurBuITEmOiGaaRY6e0W8DItepqjEWP3QCVS-Wn-UP0EaXr3hjVOSX0-1NtcZ_oxECfSNOSZ_UH5_FWOgc31W0axILap77zFFo3ciEhECzL3At-tX4fgjO69P3GhYjQkmlZ" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Predicted prices are clearly outside the range of values of “price” seen in the training dataset.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A Linear Regression model, just like the name suggests, fits a linear model to the data. A simple way to think about it is in the form of &lt;code&gt;y = mx + c&lt;/code&gt;. Since it fits a linear model, it can produce values outside the training set range during prediction. &lt;strong&gt;It is able to extrapolate based on the data.&lt;/strong&gt;&lt;/p&gt;
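&lt;p&gt;A minimal way to see this, assuming scikit-learn and a synthetic linear dataset in place of the diamonds data:&lt;/p&gt;

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# train on a purely linear relationship, y = 3x + 2, with x between 0 and 10
X = np.arange(0, 10, 0.5).reshape(-1, 1)
y = 3 * X.ravel() + 2

lin = LinearRegression().fit(X, y)
pred = lin.predict([[20.0]])[0]  # a query far beyond the training range
print(pred)  # about 62.0, well above any target seen during training
```

Since the true relationship is linear, the model recovers the slope and intercept and happily predicts beyond the training range.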

&lt;p&gt;Let’s now look at the results obtained from a Random Forest Regressor using the same dataset.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--fTMXkp8y--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://lh6.googleusercontent.com/93-amJ52_eGP2poE6wMoMG4RWkvH5Jw-w7fwSfq0unViy0TttQy8pUSHzzAKv9nSrqjU2LJqrht18N4tdRnylHQ5w_A4MGGkAiXzMmL1N_Jd1JuecRTH7d71oav74iv-Ca1Ysxt7" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--fTMXkp8y--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://lh6.googleusercontent.com/93-amJ52_eGP2poE6wMoMG4RWkvH5Jw-w7fwSfq0unViy0TttQy8pUSHzzAKv9nSrqjU2LJqrht18N4tdRnylHQ5w_A4MGGkAiXzMmL1N_Jd1JuecRTH7d71oav74iv-Ca1Ysxt7" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--SH294VC0--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://lh3.googleusercontent.com/cirOkDSeWWMEd947nOxhcQf7jE5nxpiIDd4SsXkTUmigVvCS9-wVenoAdS34EmPomFyR75WsJbwnRkPudVgMY8NUTsk2ahwsJxVdtQmMQxGP_FkyZYhlgDVi4HRT09Yq1ixuuFMK" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--SH294VC0--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://lh3.googleusercontent.com/cirOkDSeWWMEd947nOxhcQf7jE5nxpiIDd4SsXkTUmigVvCS9-wVenoAdS34EmPomFyR75WsJbwnRkPudVgMY8NUTsk2ahwsJxVdtQmMQxGP_FkyZYhlgDVi4HRT09Yq1ixuuFMK" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;These values are clearly &lt;strong&gt;within the range of 326 to 18823&lt;/strong&gt;, just like in our training set. There are no values outside that range. &lt;strong&gt;Random Forest cannot extrapolate.&lt;/strong&gt;&lt;/p&gt;
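&lt;p&gt;You can reproduce this behaviour on synthetic data, assuming scikit-learn (the experiments above use the diamonds dataset): a forest trained on a perfect linear trend still refuses to predict beyond the targets it has seen.&lt;/p&gt;

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# a perfectly linear relationship, y = 3x + 2, with x between 0 and 10
X = np.arange(0, 10, 0.5).reshape(-1, 1)
y = 3 * X.ravel() + 2

forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
pred = forest.predict([[20.0]])[0]  # a query far beyond the training range

# the prediction never leaves the range of targets seen during training
print(np.less_equal(pred, y.max()) and np.greater_equal(pred, y.min()))  # True
```

The forest can only average leaf means learned from the training targets, so its answer for x = 20 is pinned near the top of the training range.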




&lt;h1&gt;Extrapolation Problem&lt;/h1&gt;

&lt;p&gt;As you have seen above, when using a Random Forest Regressor, the predicted values are never outside the training set values for the target variable.&lt;/p&gt;

&lt;p&gt;If you look at the prediction values, they will look like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--PwOlqJ92--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://lh3.googleusercontent.com/aTc9sANC3x-luOlDmSVMy9rUkSp6K1JMYgODnWl_2iPaAfPgk-ee8Sm2orKIxl-LDnVss8u11_IxgpuLuFhBF_4yOcwl2LwsDXJ2xHHQZS_DUghDK-jU2kX1-tgX3s24WZz-euja" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--PwOlqJ92--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://lh3.googleusercontent.com/aTc9sANC3x-luOlDmSVMy9rUkSp6K1JMYgODnWl_2iPaAfPgk-ee8Sm2orKIxl-LDnVss8u11_IxgpuLuFhBF_4yOcwl2LwsDXJ2xHHQZS_DUghDK-jU2kX1-tgX3s24WZz-euja" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;a href="https://www.researchgate.net/publication/327298817_Random_forest_as_a_generic_framework_for_predictive_modeling_of_spatial_and_spatio-temporal_variables"&gt;source&lt;/a&gt;: Hengl, Tomislav et. al “Random forest as a generic framework for predictive modeling of spatial and spatio-temporal variables”. PeerJ. 6. e5518. 10.7717/peerj.5518.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Wondering why?&lt;/p&gt;

&lt;p&gt;Let’s explore that phenomenon here. The data used above has the following columns for predicting the price: carat, depth, table, x, y, and z.&lt;/p&gt;

&lt;p&gt;The diagram below shows one decision tree from the Random Forest Regressor. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--nPakNEmN--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://lh4.googleusercontent.com/geF5UxG5sX2r4OSXvoRW0An6lsrucPUdOzC3_qP2yoVNreLRhx-Q6XOZmEgGiQknDAlh1cIxbubJISwxy2of_QM-KVBvBEWA6PzOcICek3Ol9wqoLb8NWbvbG_lUzn-okQmX1UiP" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--nPakNEmN--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://lh4.googleusercontent.com/geF5UxG5sX2r4OSXvoRW0An6lsrucPUdOzC3_qP2yoVNreLRhx-Q6XOZmEgGiQknDAlh1cIxbubJISwxy2of_QM-KVBvBEWA6PzOcICek3Ol9wqoLb8NWbvbG_lUzn-okQmX1UiP" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let’s zoom in on a smaller section of this tree. For example, there are 4 samples with depth &amp;lt;= 62.75, x &amp;lt;= 5.545, carat &amp;lt;= 0.905, and z &amp;lt;= 3.915. The price predicted for these is 2775.75, which is the mean of these four samples. Therefore, &lt;strong&gt;any observation in the test set that falls into this leaf will be predicted as 2775.75.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--anoiuth0--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://lh4.googleusercontent.com/tpwTzOrAuYhpaWu-sybNBCQkFBdaTZhP845A-YhNs_4fTBV-2b4FcAcuMIUqKycwP5RbT7vmPGEhXNm3AdeGVwyhPfDS_yZyO15pRhlZaslocc8scoasQtdsDtXbTP_gd9z4FweF" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--anoiuth0--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://lh4.googleusercontent.com/tpwTzOrAuYhpaWu-sybNBCQkFBdaTZhP845A-YhNs_4fTBV-2b4FcAcuMIUqKycwP5RbT7vmPGEhXNm3AdeGVwyhPfDS_yZyO15pRhlZaslocc8scoasQtdsDtXbTP_gd9z4FweF" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is to say that when the Random Forest Regressor is tasked with predicting values not previously seen, it will always predict an average of the values seen previously. Obviously, the average of a sample cannot fall outside the highest and lowest values in the sample. &lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;Random Forest Regressor is unable to discover trends that would enable it to extrapolate values that fall outside the training set.&lt;/strong&gt; When faced with such a scenario, the regressor assumes that the prediction will fall close to the maximum value in the training set. The figure above illustrates that.&lt;/p&gt;

&lt;h3&gt;Potential Solutions&lt;/h3&gt;

&lt;p&gt;Ok, so how can you deal with this extrapolation problem?&lt;/p&gt;

&lt;p&gt;There are a couple of options:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use a linear model, such as SVM regression, Linear Regression, etc.&lt;/li&gt;
&lt;li&gt;Build a deep learning model, because neural nets are able to extrapolate (they are basically stacked linear regression models on steroids).&lt;/li&gt;
&lt;li&gt;Combine predictors using stacking. For example, you can create a stacking regressor using a linear model and a Random Forest Regressor.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Use modified versions of Random Forest&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;
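&lt;p&gt;For the stacking option, here is a sketch using scikit-learn’s &lt;code&gt;StackingRegressor&lt;/code&gt; (an illustration on toy data, not a recipe from the article): the linear base model supplies a global trend that the forest alone cannot extrapolate.&lt;/p&gt;

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import LinearRegression

X = np.arange(0, 10, 0.25).reshape(-1, 1)
y = 3 * X.ravel() + 2

# the linear base model supplies a global trend; the forest adds flexibility
stack = StackingRegressor(
    estimators=[
        ("linear", LinearRegression()),
        ("forest", RandomForestRegressor(n_estimators=50, random_state=0)),
    ],
    final_estimator=LinearRegression(),
).fit(X, y)

print(stack.predict([[20.0]])[0])  # combined prediction at an unseen point
```

Whether the stack extrapolates well depends on how the final estimator weights the two base models, so treat this purely as a starting point.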

&lt;p&gt;One such extension is &lt;a href="https://arxiv.org/pdf/1904.10416.pdf"&gt;Regression-Enhanced Random Forests&lt;/a&gt; (RERFs). The authors of this paper propose a technique that borrows the strengths of penalized parametric regression to give better results on extrapolation problems.&lt;/p&gt;

&lt;p&gt;Specifically, there are two steps to the process:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;run Lasso before Random Forest, &lt;/li&gt;
&lt;li&gt;train a Random Forest on the residuals from Lasso. &lt;/li&gt;
&lt;/ul&gt;
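&lt;p&gt;The two-step recipe can be sketched as follows, assuming scikit-learn; the exact penalties and tuning in the RERF paper differ, so this is only an illustration:&lt;/p&gt;

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.ensemble import RandomForestRegressor

rng = np.random.RandomState(0)
X = rng.uniform(0, 10, (300, 3))
y = 2 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(0, 0.1, 300)

# step 1: a penalized linear model (Lasso) captures the global trend
lasso = Lasso(alpha=0.1).fit(X, y)
residuals = y - lasso.predict(X)

# step 2: a Random Forest models what the linear part left unexplained
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, residuals)

def rerf_predict(X_new):
    # final prediction: linear trend plus the forest's correction
    return lasso.predict(X_new) + forest.predict(X_new)

print(rerf_predict(X[:3]).shape)  # (3,)
```

The linear part extrapolates the global trend, while the forest only adjusts within the range of residuals it has seen.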

&lt;p&gt;Since Random Forest is a fully nonparametric predictive algorithm, it may not efficiently incorporate known relationships between the response and the predictors (the response values being the observed values Y1, …, Yn from the training data). RERFs are able to incorporate such known relationships, which is another benefit of using Regression-Enhanced Random Forests for regression problems. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--sZ-2e2wx--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://lh6.googleusercontent.com/dbxTPJM7BudrFSYY4lSDneKHYwxkt5XvXgRRU-3FMAzDflmOr96tVdcBW6idIjj556UqOwqLHOkPVF9Fm_pOPWPrxnjI4U8kAoOXFXB9nHWNepcK_ZjQtXfdIFAmRszn-0lF94MO" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--sZ-2e2wx--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://lh6.googleusercontent.com/dbxTPJM7BudrFSYY4lSDneKHYwxkt5XvXgRRU-3FMAzDflmOr96tVdcBW6idIjj556UqOwqLHOkPVF9Fm_pOPWPrxnjI4U8kAoOXFXB9nHWNepcK_ZjQtXfdIFAmRszn-0lF94MO" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;a href="https://arxiv.org/abs/1904.10416"&gt;source&lt;/a&gt;: Haozhe Zhang et. al 2019, Regression-Enhanced Random Forests&lt;/em&gt;&lt;/p&gt;




&lt;h1&gt;Final Thoughts&lt;/h1&gt;

&lt;p&gt;At this point, you might be wondering whether or not you should use a Random Forest for regression problems.&lt;/p&gt;

&lt;p&gt;Let’s look at that. &lt;/p&gt;

&lt;h3&gt;When to use it&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;When the data has a non-linear trend and extrapolation outside the training data is not important.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;When not to use it&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;When your data is in time series form. Time series problems require identifying a growing or decreasing trend, which a Random Forest Regressor will not be able to formulate. &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Hopefully, this article gave you some background into the inner workings of Random Forest Regression.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This article was originally written by &lt;a href="https://www.linkedin.com/in/mwitiderrick/"&gt;Derrick Mwiti&lt;/a&gt; and posted on the &lt;a href="https://neptune.ai/blog/random-forest-regression-when-does-it-fail-and-why?utm_source=devto&amp;amp;utm_medium=crosspost&amp;amp;utm_campaign=blog-random-forest-regression-when-does-it-fail-and-why"&gt;Neptune blog&lt;/a&gt; where you can find more in-depth articles for machine learning practitioners.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>python</category>
      <category>datascience</category>
      <category>regression</category>
    </item>
    <item>
      <title>The Best Tools, Libraries, Frameworks and Methodologies that Machine Learning Teams Use – Things We Learned from 41 ML Startups</title>
      <dc:creator>Kamil A. Kaczmarek</dc:creator>
      <pubDate>Thu, 30 Jul 2020 16:02:24 +0000</pubDate>
      <link>https://dev.to/kamil_k7k/the-best-tools-libraries-frameworks-and-methodologies-that-machine-learning-teams-use-things-we-learned-from-41-ml-startups-2ep8</link>
      <guid>https://dev.to/kamil_k7k/the-best-tools-libraries-frameworks-and-methodologies-that-machine-learning-teams-use-things-we-learned-from-41-ml-startups-2ep8</guid>
      <description>&lt;p&gt;&lt;em&gt;This article was originally written by &lt;a href="https://www.linkedin.com/in/jakub-czakon-2b797b69/" rel="noopener noreferrer"&gt;Jakub Czakon&lt;/a&gt; posted on the Neptune blog&lt;/em&gt;. &lt;/p&gt;




&lt;p&gt;Setting up a good tool stack for your Machine Learning team is important to work efficiently and be able to focus on delivering results. If you work at a startup, you know that it is especially important to set up an environment that can grow with your team, the needs of your users, and the rapidly evolving ML landscape. &lt;/p&gt;

&lt;p&gt;We wondered: &lt;strong&gt;“What are the best tools, libraries and frameworks that ML startups use to tackle this challenge?”&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;And to answer that question we asked &lt;strong&gt;41 Machine Learning startups&lt;/strong&gt; from all over the world.&lt;/p&gt;

&lt;p&gt;The result?&lt;/p&gt;

&lt;p&gt;A ton of great advice that we grouped into:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Methodology&lt;/li&gt;
&lt;li&gt;Software development setup&lt;/li&gt;
&lt;li&gt;Machine Learning frameworks&lt;/li&gt;
&lt;li&gt;MLOps&lt;/li&gt;
&lt;li&gt;Unexpected 🙂&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Read on to figure out what will work for your machine learning team.&lt;/p&gt;

&lt;h1&gt;Good methodology is the key&lt;/h1&gt;

&lt;p&gt;Tools are only as strong as the methodology that employs them. &lt;/p&gt;

&lt;p&gt;If you run around training models on some randomly acquired data and deploy whatever model you can get your hands on, sooner or later there will be trouble 🙂&lt;/p&gt;

&lt;p&gt;Kai Mildenberger from &lt;a href="https://psyml.co/" rel="noopener noreferrer"&gt;psyML&lt;/a&gt; says that: &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;To us, the careful versioning of all the training and testing data is probably the most essential tool/methodology. We expect that to remain one of the most key elements in our toolbox, even as all of the techniques and mathematical models iterate forever. A second aspect might be to be extremely hypothesis driven. We use that as the single most important methodology to develop models.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I think having a strong understanding of what you want to use your tools for (and that you actually need them) is the very first step. &lt;/p&gt;

&lt;p&gt;That said it is important to know what is out there and what people in similar situations use successfully. &lt;/p&gt;

&lt;p&gt;Let’s dive right into that!&lt;/p&gt;

&lt;h1&gt;Software development tooling is the backbone of ML teams&lt;/h1&gt;

&lt;p&gt;The development environment is the foundation of every team’s workflow. So it was very interesting to learn what tools companies around the world consider the best in this area.&lt;br&gt;&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi2.wp.com%2Fneptune.ai%2Fwp-content%2Fuploads%2FGIF-software-development.gif%3Fw%3D1200%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi2.wp.com%2Fneptune.ai%2Fwp-content%2Fuploads%2FGIF-software-development.gif%3Fw%3D1200%26ssl%3D1" alt="image"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Source: giphy.com&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;ML teams use various tools as an IDE. Many teams like &lt;a href="https://simplereport.ca/" rel="noopener noreferrer"&gt;SimpleReport&lt;/a&gt; and &lt;a href="https://www.hypergiant.com/" rel="noopener noreferrer"&gt;Hypergiant&lt;/a&gt; use Jupyter Notebooks and Jupyter Lab with its ecosystem of NB Extensions. &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Jupyter Notebook is very useful for quick experiments and visualization, especially when exchanging ideas between multiple team members. Because we use Tensorflow, Google Colab is a natural extension to share our code more easily.” – says Wenxi Chen from Juji.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Various flavours of Jupyter have been mentioned as well. Deepnote (a hosted Jupyter Notebook solution) is “loved for their ML stuff” by the team of Intersect Labs while Google Colab “is a natural extension to share our code more easily” for the &lt;a href="https://juji.io/" rel="noopener noreferrer"&gt;Juji&lt;/a&gt; team.&lt;/p&gt;

&lt;p&gt;Others choose more standard software development IDEs. Among those, PyCharm, touted by Or Izchak from &lt;a href="https://www.hotelmize.com/" rel="noopener noreferrer"&gt;Hotelmize&lt;/a&gt; as “the best Python IDE”, and Visual Studio Code, used by &lt;a href="https://www.scanta.io/" rel="noopener noreferrer"&gt;Scanta&lt;/a&gt; for its “ease of connectivity with Azure and many ML-based extensions provided”, were mentioned the most.&lt;/p&gt;

&lt;p&gt;For teams that use R language like SimpleReport, RStudio was a clear winner when it comes to the IDE of choice. As Kenton White from &lt;a href="https://advancedsymbolics.com/" rel="noopener noreferrer"&gt;Advanced Symbolics&lt;/a&gt; mentions:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;We mostly use R + RStudio for analysis and model building.  The workhorse for our AI modeling is VARX for time series forecasts.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;When it comes to code versioning, GitHub is a clear favourite. As Daniel Hanchen from &lt;a href="https://daniel3112.typeform.com/to/K84Qu0" rel="noopener noreferrer"&gt;Umbra AI&lt;/a&gt; mentions:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Github (now free for all teams!!) with its super robust version control system and easy repository sharing functionality is super useful for most ML teams.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Among the most popular languages we have Python, R, and, interestingly, &lt;a href="https://clojure.org/" rel="noopener noreferrer"&gt;Clojure&lt;/a&gt;, mentioned by Wenxi Chen from &lt;a href="https://juji.io/" rel="noopener noreferrer"&gt;Juji&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;As for the environment/infrastructure setup, notable mentions from ML startups are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;“AWS as the platform for deployment” &lt;a href="https://simplereport.ca/" rel="noopener noreferrer"&gt;(Simple Report)&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;“Anaconda serves as our goto tool for running ML experiments due to its &lt;em&gt;live code&lt;/em&gt; feature wherein it can be used to combine software code, computational output, explanatory text, and multimedia resources in a single document.” (&lt;a href="http://www.scanta.io/" rel="noopener noreferrer"&gt;Scanta&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;“Redis dominates as an in-memory data structure store due to its support for different kinds of abstract data structures, such as strings, lists, maps, sets, sorted sets, HyperLogLogs, bitmaps, streams, and spatial indexes.” (&lt;a href="https://www.scanta.io/" rel="noopener noreferrer"&gt;Scanta&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;“&lt;a href="https://www.snowflake.com/" rel="noopener noreferrer"&gt;Snowflake&lt;/a&gt; and Amazon S3 for data storage.” (&lt;a href="https://www.hypergiant.com/" rel="noopener noreferrer"&gt;Hypergiant&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;“Spark-pyspark – very simple api for distributing job to work on big data.” (&lt;a href="https://www.hotelmize.com/" rel="noopener noreferrer"&gt;Hotelmize&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;Sooo many Machine Learning Frameworks&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi2.wp.com%2Fneptune.ai%2Fwp-content%2Fuploads%2FGIF-Choices.gif%3Fw%3D1200%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi2.wp.com%2Fneptune.ai%2Fwp-content%2Fuploads%2FGIF-Choices.gif%3Fw%3D1200%26ssl%3D1" alt="image"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Source: giphy.com&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;An integrated development environment is crucial, but one needs a good ML framework on top of that to transform the vision into a project. The range of tools pointed out by the startups is quite diverse here. &lt;/p&gt;

&lt;p&gt;For playing with tabular data, Pandas was mentioned the most. &lt;/p&gt;

&lt;p&gt;An additional benefit of using Pandas, mentioned by Nemo D’Qrill, the CEO of &lt;a href="https://sigmapolaris.com/" rel="noopener noreferrer"&gt;Sigma Polaris&lt;/a&gt;, is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I'd say that Pandas is probably one of the most valuable tools, in particular when working in collaboration with external developers on various projects. Having all data files in the form of data frames, across teams and individual developers, makes for a much smoother collaboration and unnecessary hassle.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;An interesting library mentioned by a software developer from &lt;a href="https://www.hotelmize.com/" rel="noopener noreferrer"&gt;Hotelmize&lt;/a&gt; was &lt;a href="https://github.com/dovpanda-dev/dovpanda" rel="noopener noreferrer"&gt;dovpanda&lt;/a&gt; – a Python extension library for pandas which gives you insights on your pandas code and data while you work.&lt;/p&gt;

&lt;p&gt;When it comes to visualization, matplotlib is used the most, by the likes of &lt;a href="https://www.trustium.com/" rel="noopener noreferrer"&gt;Trustium&lt;/a&gt;, &lt;a href="https://www.hotelmize.com/" rel="noopener noreferrer"&gt;Hotelmize&lt;/a&gt;, &lt;a href="https://www.hypergiant.com/" rel="noopener noreferrer"&gt;Hypergiant&lt;/a&gt; and others. &lt;/p&gt;

&lt;p&gt;Plotly was also a common choice. As developers from &lt;a href="https://www.wordnerds.ai/" rel="noopener noreferrer"&gt;Wordnerds&lt;/a&gt; explain, it is used “for great visualisations to make data understandable and look good”. Dash, a tool for building interactive dashboards on top of Plotly charts, was recommended by Theodoros Giannakopoulos from &lt;a href="https://behavioralsignals.com/" rel="noopener noreferrer"&gt;Behavioral Signals&lt;/a&gt; for ML teams that need to present their analytical results in a nice, user-friendly manner. &lt;/p&gt;

&lt;p&gt;For more standard machine learning problems most teams like &lt;a href="https://www.wordnerds.ai/" rel="noopener noreferrer"&gt;Wordnerds&lt;/a&gt;, &lt;a href="https://www.sensitrust.io/" rel="noopener noreferrer"&gt;Sensitrust&lt;/a&gt; or &lt;a href="https://behavioralsignals.com/" rel="noopener noreferrer"&gt;Behavioral Signals&lt;/a&gt; use Scikit-Learn. ML team from &lt;a href="https://www.ischoolconnect.com/" rel="noopener noreferrer"&gt;iSchoolConnect&lt;/a&gt; explains why it is such a great tool:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;It is one of the most popular toolkits used by machine learning researchers, engineers, and developers. The ease with which you can get what you want is amazing! From feature engineering to interpretability, scikit-learn provides you with every functionality.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Truth be told, Pandas and Scikit-learn are really the workhorses of ML teams all over the world.&lt;/p&gt;

&lt;p&gt;As Michael Phillips, Data Scientist from &lt;a href="https://numer.ai/" rel="noopener noreferrer"&gt;Numerai&lt;/a&gt; says:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Modern Python libraries like Pandas and Scikit-learn have 99% of the tools that an ML team needs to excel. Though simple, these tools have extraordinary power in the hands of an experienced data scientist.&lt;/p&gt;
&lt;/blockquote&gt;
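&lt;p&gt;To make this concrete, here is a minimal sketch of that Pandas + Scikit-learn workflow. The data and column names are purely illustrative; the tiny synthetic dataset is only there to make the snippet self-contained:&lt;/p&gt;

```python
# A hedged sketch of a typical pandas + scikit-learn loop:
# feature engineering in pandas, modelling in scikit-learn.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Synthetic, illustrative data -- replace with your own frame.
df = pd.DataFrame({
    "clicks": [1, 5, 2, 8, 0, 7, 3, 9],
    "minutes": [10.0, 3.5, 8.0, 1.0, 12.0, 2.0, 9.5, 0.5],
    "converted": [0, 1, 0, 1, 0, 1, 0, 1],
})
# Feature engineering with pandas ...
df["clicks_per_minute"] = df["clicks"] / (df["minutes"] + 1e-9)

X = df[["clicks", "minutes", "clicks_per_minute"]]
y = df["converted"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# ... and modelling with scikit-learn.
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)
print(model.score(X_test, y_test))
```

&lt;p&gt;The same fitted pipeline object can later be serialized and reused for inference, which is part of why this duo travels so well between teams.&lt;/p&gt;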

&lt;p&gt;In my opinion, while this may be true for the general ML team population, in the case of ML startups a lot of work goes into state-of-the-art methods, which usually means deep learning models.&lt;/p&gt;

&lt;p&gt;When it comes to general deep learning frameworks we had many different opinions.&lt;/p&gt;

&lt;p&gt;Many teams like &lt;a href="https://www.wordnerds.ai/" rel="noopener noreferrer"&gt;Wordnerds&lt;/a&gt; and &lt;a href="https://behavioralsignals.com/" rel="noopener noreferrer"&gt;Behavioral Signals&lt;/a&gt; choose PyTorch.&lt;/p&gt;

&lt;p&gt;The team of ML experts from &lt;a href="https://www.ischoolconnect.com/" rel="noopener noreferrer"&gt;iSchoolConnect&lt;/a&gt; tells us why so many ML practitioners and researchers choose PyTorch.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If you want to go deep into the waters, PyTorch is the right tool for you! Initially, it will take time to get accustomed to it, but once you get comfortable with it there is nothing like it! The library is even optimized for quickly training and evaluating your ML models.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;But it is still TensorFlow and Keras that lead in popularity.&lt;/p&gt;

&lt;p&gt;Most teams like Strayos and &lt;a href="https://repetere.ai/" rel="noopener noreferrer"&gt;Repetere&lt;/a&gt; choose them as their ML development framework. Cedar Milazzo from &lt;a href="https://www.trustium.com/" rel="noopener noreferrer"&gt;Trustium&lt;/a&gt; said:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Tensorflow, of course. Especially with 2.0! Eager execution was what TF really needed and now it’s here. I should note that when I say “tensorflow” I mean “tensorflow + keras” since keras is now built into TF.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It’s also important to mention that you don’t have to choose one framework and exclude others.&lt;/p&gt;

&lt;p&gt;For example, &lt;a href="https://melodia.io/" rel="noopener noreferrer"&gt;Melodia&lt;/a&gt;’s Founder, Omid Aryan said that:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The tools that have been most beneficial to us are TensorFlow, PyTorch, and Python’s old scikit-learn tools.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;There are some popular frameworks for more specialized applications.&lt;/p&gt;

&lt;p&gt;In Natural Language Processing we’ve heard:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;“&lt;a href="https://huggingface.co/" rel="noopener noreferrer"&gt;Huggingface&lt;/a&gt;: it’s the most advanced and highest performance NLP library ever created. It’s the first of its kind in that researchers are directly contributing to a highly scalable NLP library. It separates itself from other similar tools by having production level tools available a few months after a newer model is published” says Ben Lamm, the CEO of &lt;a href="https://www.hypergiant.com/" rel="noopener noreferrer"&gt;Hypergiant&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;“Spacy is a very cool natural language toolkit. NLTK is by far the most popular and I certainly use it, but spacy does lots of things NLTK can’t do so well, such as stemming and dependency parsing.” mentions Cedar Milazzo, the CEO of &lt;a href="https://www.trustium.com/" rel="noopener noreferrer"&gt;Trustium&lt;/a&gt;.
&lt;/li&gt;
&lt;li&gt;“Gensim is good for word vectors and document vectors too, and I believe it isn’t so popular.” adds Cedar Milazzo.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In Computer Vision:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;“&lt;a href="https://opencv.org/" rel="noopener noreferrer"&gt;OpenCV&lt;/a&gt; is indispensable for computer vision work” for &lt;a href="https://www.hypergiant.com/" rel="noopener noreferrer"&gt;Hypergiant&lt;/a&gt;. Their CEO says *“It’s a classic CV ensemble of methods from the 1960s until 2014 that are useful pre and post processing and can work well in scenarios where a neural network would be overkill.” *&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It’s also worth noting that not every team implements deep learning models themselves.&lt;/p&gt;

&lt;p&gt;As Iuliia Gribanova and Lance Seidman from &lt;a href="https://munchron.com/" rel="noopener noreferrer"&gt;Munchron&lt;/a&gt; say, there are now API services where you can outsource some (or all) of the work:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Google ML kit is currently one of the best easy-to-entry tools that lets mobile developers easily embed ML API services like face recognition, image labeling, and other items that Google offers into an Android or iOS App. But additionally, you can also bring in your own TF (TensorFlow) lite models to run experiments and then bring them into production using Google’s ML Kit.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I think it’s important to mention that you can’t always choose the latest and greatest libraries; often the tool stack gets handed to you when you join the team.&lt;/p&gt;

&lt;p&gt;As Naureen Mahmood from &lt;a href="https://www.meshcapade.com/" rel="noopener noreferrer"&gt;Meshcapade&lt;/a&gt; shared:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;“In the past, some important autodiff libraries that have made it possible for us to run multiple joint optimizations, and in doing so helped us build some of the core tech we still use today, are Chumpy &amp;amp; OpenDR. Now there are fancier and faster ones out there, like PyTorch and TensorFlow.”&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;When it comes to model deployment Patricia Thaine from &lt;a href="https://www.private-ai.ca/" rel="noopener noreferrer"&gt;Private AI&lt;/a&gt; mentions &lt;em&gt;“tflite, flask, tfjs and coreml”&lt;/em&gt; as their frameworks of choice. She also suggests that visualizing models is very important to them and they are using &lt;a href="https://github.com/lutzroeder/netron" rel="noopener noreferrer"&gt;Netron&lt;/a&gt; for that.&lt;/p&gt;

&lt;p&gt;But there are tools beyond frameworks that can help ML teams deliver real value quickly. &lt;/p&gt;

&lt;p&gt;This is where MLOps comes in.&lt;/p&gt;

&lt;h1&gt;
  
  
  MLOps is becoming more important for machine learning startups
&lt;/h1&gt;

&lt;p&gt;You may be wondering what MLOps is or why you should care.&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi1.wp.com%2Fneptune.ai%2Fwp-content%2Fuploads%2FGIF-What-did-you-say-1.gif%3Fw%3D1200%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi1.wp.com%2Fneptune.ai%2Fwp-content%2Fuploads%2FGIF-What-did-you-say-1.gif%3Fw%3D1200%26ssl%3D1" alt="image"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Source: giphy.com&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The term alludes to DevOps and describes tools used for operationalization of machine learning activities.&lt;/p&gt;

&lt;p&gt;Jean-Christophe Petkovich, CTO at &lt;a href="https://acerta.ca/" rel="noopener noreferrer"&gt;Acerta&lt;/a&gt;, provided us with an extremely thorough explanation of how their ML team approaches MLOps. It was so good that I decided to share it (almost) in full:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I think most of the interesting tools that are going to see broader adoption in 2020 are centered around MLOps. There was a big push to build those tools last year, and this year we’re going to find out who the winners will be. &lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;em&gt;For me, MLflow seems to be in the lead for tracking experiments, artifacts, and outcomes. A lot of what we’ve built internally for this purpose are extensions to the functionality of MLflow to incorporate more data tracking similar to how DVC tracks data.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The other big names in MLOps are Kubeflow, Airflow and TFX with Apache Beam—all tools designed for capturing data science workflows and pipelines end-to-end.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;There are several ingredients for a complete MLOps system:&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;em&gt;You need to be able to build model artifacts that contain all the information needed to preprocess your data and generate a result.&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Once you can build model artifacts, you have to be able to track the code that builds them, and the data they were trained and tested on.&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;You need to keep track of how all three of these things, the models, their code, and their data, are related.&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Once you can track all these things, you can also mark them ready for staging, and production, and run them through a CI/CD process.&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Finally, to actually deploy them at the end of that process, you need some way to spin up a service based on that model artifact.&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;When it comes to tracking, MLflow is our pick. It’s tried-and-true at Acerta, as several of our employees already used it as part of their personal workflows, and now it’s the de facto tracking tool for our data scientists.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;For tracking data pipelines or workflows themselves, we are currently developing against Kubeflow since we’re already on Kubernetes making deployment a breeze, and our internal model pipelining infrastructure meshes well with the Kubeflow component concept.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;On top of all of this MLOps development, there’s a shift toward building feature stores—basically specialized data lakes for storing  preprocessed data in various forms—but I haven’t seen any serious contenders that really stand out yet.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;These are all tools that need to be in place—I know a lot of places are doing their own home-baked solutions to this problem, but I think this year we’re going to see a lot more standardization around machine learning applications.”&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Emily Kruger from &lt;a href="https://kaskada.com/" rel="noopener noreferrer"&gt;Kaskada&lt;/a&gt;, which, incidentally, is a startup building a feature store solution 🙂 adds:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The most useful tools from our perspective are feature stores, automated deployment pipelines, and experimentation platforms. All these tools address challenges with MLOps, which is an important emerging space for data teams, especially those running ML models in production and at scale.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;OK, so in light of this, what are other teams using to solve these problems?&lt;/p&gt;

&lt;p&gt;Some teams prefer end-to-end platforms, others create everything in-house. Many teams are somewhere in between with a mix of some specific tools and home-grown solutions.&lt;/p&gt;

&lt;p&gt;In terms of larger platforms, two names that were mentioned often were:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Amazon SageMaker, which according to the ML team from &lt;a href="https://vcv.ai/" rel="noopener noreferrer"&gt;VCV&lt;/a&gt; &lt;em&gt;“has a variety of tools for distributed collaboration”&lt;/em&gt; and which &lt;a href="https://simplereport.ca/" rel="noopener noreferrer"&gt;SimpleReport&lt;/a&gt; chose as their deployment platform.&lt;/li&gt;
&lt;li&gt;Azure, which as the &lt;a href="https://www.scanta.io/" rel="noopener noreferrer"&gt;Scanta&lt;/a&gt; team tells us &lt;em&gt;“serves as a way to build, train, and deploy our Machine Learning applications as well as it helps in adding intelligence in our applications via their Language, Vision, and Speech recognition support. Azure has been our choice of IaaS due to rapid deployments and low-cost Virtual Machines.”&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When it comes to experiment tracking, ML startups use various options:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Strayos uses Comet ML “for model collaboration and results sharing”.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.hotelmize.com/" rel="noopener noreferrer"&gt;Hotelmize&lt;/a&gt; and others are going with tensorboard which “is the best tool to visualize your model behavior, specially for neural network models.”&lt;/li&gt;
&lt;li&gt;“MLflow seems to be in the lead for tracking experiments, artifacts, and outcomes,” as Jean-Christophe Petkovich, CTO at &lt;a href="https://acerta.ca/" rel="noopener noreferrer"&gt;Acerta&lt;/a&gt;, mentioned before.&lt;/li&gt;
&lt;li&gt;Other teams like &lt;a href="https://repetere.ai/" rel="noopener noreferrer"&gt;Repetere&lt;/a&gt; try to keep it simple and say that &lt;em&gt;”Our tooling is very simple, we use tensorflow and s3 to version model artifacts for analysis”.&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;
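&lt;p&gt;Under the hood, all of these trackers record a similar run object. The toy sketch below is not any specific tool's API (MLflow, Neptune and Comet each have their own); it just shows the kind of params-and-metrics record such tools persist for every run:&lt;/p&gt;

```python
# A toy, illustrative experiment "tracker": real tools add UIs, artifact
# storage and collaboration on top of a record much like this one.
import json
import time
import uuid

class RunTracker:
    def __init__(self, experiment):
        self.record = {
            "run_id": uuid.uuid4().hex,
            "experiment": experiment,
            "started_at": time.time(),
            "params": {},
            "metrics": [],
        }

    def log_param(self, name, value):
        self.record["params"][name] = value

    def log_metric(self, name, value, step):
        self.record["metrics"].append({"name": name, "value": value, "step": step})

    def save(self, path):
        # Persist the run so it can be compared against other runs later.
        with open(path, "w") as f:
            json.dump(self.record, f, indent=2)

run = RunTracker(experiment="lr-sweep")
run.log_param("lr", 0.01)
for step in range(3):
    run.log_metric("loss", 1.0 / (step + 1), step=step)
run.save("run.json")
```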

&lt;p&gt;Typically, experiment tracking tools keep track of metrics and hyperparameters but as James Kaplan from &lt;a href="https://meetkai.com/" rel="noopener noreferrer"&gt;MeetKai&lt;/a&gt; points out:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;“The most useful types of ML tools for us are anything that helps with dealing with model regressions caused by everything except the model architecture. Most of these are tools we have built ourselves, but I assume there are many existing options out there. We like to look at confusion matrices that can be visually diff’d under scenarios such as:&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;em&gt;new data added to the training set (and the provenance of said data)&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;quantization configurations&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;pruning/distillation&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;We have found that being able to track performance across new data additions is far more important than being able to just track performance across hyperparameters of the model itself. This is especially so when datasets grow/change far faster than model configurations.”&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Speaking of pruning/distillation, Malte Pietsch, Co-Founder of deepset, explains that:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;We see an increasing need for tools that help us profile &amp;amp; optimize models in terms of speed and hardware utilization. With the growing size of NLP models, it becomes increasingly important to make training and inference more efficient. &lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;em&gt;While we are still looking for the ideal tooling here, we found pytest-benchmark, NVIDIA’s Nsight Systems and kernprof quite helpful.”&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Another interesting tool for benchmarking training/inference is &lt;a href="https://mlperf.org/" rel="noopener noreferrer"&gt;MLPerf&lt;/a&gt; suggested by Anton Lokhmotov from &lt;a href="http://dividiti.com/" rel="noopener noreferrer"&gt;Dividiti&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Experimenting with models is undoubtedly very important, but putting models in front of end-users is where the magic happens (for most of us). On that front, Rosa Lin from &lt;a href="https://tolstoy.ai/" rel="noopener noreferrer"&gt;Tolstoy&lt;/a&gt; mentioned using streamlit.io, which is a “great tool for building ML model web apps easily.”&lt;/p&gt;

&lt;p&gt;A valuable word of warning when it comes to using ML-focused solutions comes from Gianvito Pio, Co-Founder of &lt;a href="https://www.sensitrust.io/" rel="noopener noreferrer"&gt;Sensitrust&lt;/a&gt;:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;“There are also tools like KNIME and Orange that allow you to design an entire pipeline in a drag-and-drop fashion, as well as AutoML tools (see AutoWEKA, auto-sklearn and JADBio) that will automatically select the most appropriate model for a specific task.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;However, in my opinion, strong expertise in the Machine Learning and AI areas is still necessary. Even the “best, automated” tool can be misused without a good background in the field.”&lt;/em&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Unexpected
&lt;/h1&gt;

&lt;p&gt;OK, when I started working on this, some answers like PyTorch, Pandas or Jupyter Lab were what I expected. &lt;/p&gt;

&lt;p&gt;But one answer we received was really out-of-the-box.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi2.wp.com%2Fneptune.ai%2Fwp-content%2Fuploads%2FGIF-what.gif%3Fw%3D1200%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi2.wp.com%2Fneptune.ai%2Fwp-content%2Fuploads%2FGIF-what.gif%3Fw%3D1200%26ssl%3D1" alt="image"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Source: giphy.com&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;It put all the other things in perspective and made me think that perhaps we should take a step back and take a look at the larger picture.&lt;/p&gt;

&lt;p&gt;Christopher Penn from &lt;a href="https://www.trustinsights.ai/" rel="noopener noreferrer"&gt;Trust Insights&lt;/a&gt; suggested that ML teams should use a rather interesting “tool”:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Wetware – the hardware and software combination that sits between your ears – is the most important, most useful, most powerful machine learning tool you have. &lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;em&gt;Far, FAR too many people are hoping AI is a magic wand that solves everything with little to no human input. The reverse is true; AI requires more management and scrutiny than ever, because we lack so much visibility into complex models.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Interpretability and explainability are the greatest challenges we face right now, in the wake of massive scandals about bias and discrimination. And AI vendors make this worse by focusing on post hoc explanations of models instead of building the expensive but worthwhile interpretations and checkpoints into models.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;So, wetware – the human in the loop – is the most useful tool in 2020 and for the foreseeable future.”&lt;/em&gt;&lt;/p&gt;




&lt;h1&gt;
  
  
  Our perspective
&lt;/h1&gt;

&lt;p&gt;Since we are building tools for ML teams and some of our customers are AI startups, I think it makes sense to give you our perspective.&lt;/p&gt;

&lt;p&gt;So we see:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A lot of teams use the Jupyter ecosystem for exploration and PyCharm/VSCode for development&lt;/li&gt;
&lt;li&gt;For deep learning, people are using TensorFlow, Keras and PyTorch. Notably, we see more and more people using &lt;a href="https://neptune.ai/blog/model-training-libraries-pytorch-ecosystem?utm_source=devto&amp;amp;utm_medium=crosspost&amp;amp;utm_campaign=blog-tools-libraries-frameworks-methodologies-ml-startups-roundup" rel="noopener noreferrer"&gt;high-level PyTorch training libraries like Lightning, Ignite, Catalyst, fastai and Skorch&lt;/a&gt;,&lt;/li&gt;
&lt;li&gt;For visual exploration, people are using matplotlib, plotly, altair and hiplot (hyperparameter visualizations)&lt;/li&gt;
&lt;li&gt;For running hyperparameter sweeps and general run orchestration some &lt;a href="https://medium.com/ynap-tech/part-ii-artificial-intelligence-successfully-navigating-from-experimentation-to-business-value-b37ddf75332c" rel="noopener noreferrer"&gt;teams like YNAP choose AWS SageMaker&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;For experiment tracking we see open-source packages like TensorBoard, MLflow and Sacred &lt;a href="https://docs.neptune.ai/integrations/introduction.html?utm_source=devto&amp;amp;utm_medium=crosspost&amp;amp;utm_campaign=blog-tools-libraries-frameworks-methodologies-ml-startups-roundup" rel="noopener noreferrer"&gt;(Neptune integrates with all of them)&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;… and since those are our customers, naturally they use neptune-notebooks for tracking explorations in Jupyter notebooks and Neptune for experiment tracking and organization of their machine learning projects. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi0.wp.com%2Fneptune.ai%2Fwp-content%2Fuploads%2FNeptune_ai-Infographic-1-1.png%3Fw%3D800%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi0.wp.com%2Fneptune.ai%2Fwp-content%2Fuploads%2FNeptune_ai-Infographic-1-1.png%3Fw%3D800%26ssl%3D1" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;This article was originally written by &lt;a href="https://www.linkedin.com/in/jakub-czakon-2b797b69/" rel="noopener noreferrer"&gt;Jakub Czakon&lt;/a&gt; and posted on the &lt;a href="https://neptune.ai/blog/tools-libraries-frameworks-methodologies-ml-startups-roundup?utm_source=devto&amp;amp;utm_medium=crosspost&amp;amp;utm_campaign=blog-tools-libraries-frameworks-methodologies-ml-startups-roundup" rel="noopener noreferrer"&gt;Neptune blog&lt;/a&gt;. You can find more in-depth articles for machine learning practitioners there.&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>startup</category>
      <category>datascience</category>
    </item>
    <item>
      <title>How to Do Data Exploration for Image Segmentation and Object Detection (Things I Had to Learn the Hard Way)</title>
      <dc:creator>Kamil A. Kaczmarek</dc:creator>
      <pubDate>Tue, 28 Jul 2020 12:29:05 +0000</pubDate>
      <link>https://dev.to/kamil_k7k/how-to-do-data-exploration-for-image-segmentation-and-object-detection-things-i-had-to-learn-the-hard-way-1067</link>
      <guid>https://dev.to/kamil_k7k/how-to-do-data-exploration-for-image-segmentation-and-object-detection-things-i-had-to-learn-the-hard-way-1067</guid>
      <description>&lt;p&gt;&lt;em&gt;This article was originally written by &lt;a href="https://www.linkedin.com/in/jakubcieslik/" rel="noopener noreferrer"&gt;Jakub Cieślik&lt;/a&gt; and posted on the &lt;a href="https://neptune.ai/blog/data-exploration-for-image-segmentation-and-object-detection?utm_source=devto&amp;amp;utm_medium=crosspost&amp;amp;utm_campaign=blog-data-exploration-for-image-segmentation-and-object-detection" rel="noopener noreferrer"&gt;Neptune blog&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;I've been working with object detection and image segmentation problems for many years. An important realization I made is that people don't put the same amount of effort and emphasis on data exploration and results analysis as they normally would in any other non-image machine learning project.&lt;/p&gt;

&lt;p&gt;Why is it so?&lt;/p&gt;

&lt;p&gt;I believe there are two major reasons for it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;People don't understand object detection and image segmentation models in depth&lt;/strong&gt; and treat them as black boxes, in which case they don't even know what to look at or what the assumptions are.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;It can be quite tedious from a technical&lt;/strong&gt; point of view as we don't have good image data exploration tools.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In my opinion, image datasets are not really an exception: understanding how to adjust the system to match our data is a critical step to success.&lt;/p&gt;

&lt;p&gt;In this article I will share with you how I approach data exploration for image segmentation and object detection problems. Specifically:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Why you should care about image and object dimensions,&lt;/li&gt;
&lt;li&gt;Why small objects can be problematic for many deep learning architectures,&lt;/li&gt;
&lt;li&gt;Why tackling class imbalances can be quite hard,&lt;/li&gt;
&lt;li&gt;Why a good visualization is worth a thousand metrics,&lt;/li&gt;
&lt;li&gt;The pitfalls of data augmentation.&lt;/li&gt;
&lt;/ul&gt;




&lt;h1&gt;
  
  
  The need for data exploration for image segmentation and object detection
&lt;/h1&gt;

&lt;p&gt;Data exploration is key to a lot of machine learning processes. That said, when it comes to object detection and image segmentation datasets there is &lt;strong&gt;no straightforward way to systematically do data exploration&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;There are multiple things that distinguish working with regular image datasets from object and segmentation ones:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The label is strongly bound to the image. Suddenly you have to be careful about whatever you do to your images, as it can break the image-label mapping.&lt;/li&gt;
&lt;li&gt;There are usually many more labels per image.&lt;/li&gt;
&lt;li&gt;There are many more hyperparameters to tune (especially if you train on your custom datasets)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This makes evaluation, results exploration and error analysis much harder. You will also find that choosing a single performance measure for your system can be quite tricky - in that case manual exploration might still be a critical step.&lt;/p&gt;

&lt;h2&gt;
  
  
  Data Quality and Common Problems
&lt;/h2&gt;

&lt;p&gt;The &lt;strong&gt;first thing you should do&lt;/strong&gt; when working on any machine learning problem (image segmentation, object detection included) &lt;strong&gt;is assessing quality and understanding your data.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Common &lt;strong&gt;data problems when training&lt;/strong&gt; Object Detection and Image Segmentation models include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Image dimensions and aspect ratios (especially dealing with extreme values)&lt;/li&gt;
&lt;li&gt;Labels composition - imbalances, bounding box sizes, aspect ratios (for instance a lot of small objects)&lt;/li&gt;
&lt;li&gt;Data preparation not suitable for your dataset.&lt;/li&gt;
&lt;li&gt;Modelling approach not aligned with the data.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those will be &lt;strong&gt;especially important if you train on custom datasets that are significantly different from typical benchmark datasets such as COCO&lt;/strong&gt;. In the next chapters, I will show you how to spot the problems I mentioned and how to address them.&lt;/p&gt;

&lt;h3&gt;
  
  
  General Data Quality
&lt;/h3&gt;

&lt;p&gt;This one is simple and rather obvious; this step would also be the same for all image problems, not just object detection or image segmentation. What we need to do here is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;get the general feel of a dataset and inspect it visually.&lt;/li&gt;
&lt;li&gt;make sure it's not corrupt and does not contain any obvious artifacts (for instance black-only images)&lt;/li&gt;
&lt;li&gt;make sure that &lt;strong&gt;all&lt;/strong&gt; the files are readable - you don't want to find that out in the middle of your training.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;My tip here is to visualize as many pictures as possible. There are multiple ways of doing this. Depending on the size of the dataset, some might be more suitable than others.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Plot them in a Jupyter notebook using matplotlib.&lt;/li&gt;
&lt;li&gt;Use dedicated tooling like Google Facets to explore image data (&lt;a href="https://pair-code.github.io/facets/" rel="noopener noreferrer"&gt;https://pair-code.github.io/facets/&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use HTML rendering to visualize and explore&lt;/strong&gt; in a notebook.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I'm a huge fan of the last option; it works great in Jupyter notebooks (even for thousands of pictures at the same time!). Try doing that with matplotlib. What's more, you can install a hover-zoom extension that will allow you to zoom in on individual pictures to inspect them in high resolution.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F600%2F1%2ACpgadKPbWUX7TF7PfkMhiw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F600%2F1%2ACpgadKPbWUX7TF7PfkMhiw.png" alt="image"&gt;&lt;/a&gt;&lt;br&gt;
Fig 1. 500 COCO pictures visualized using HTML-rendered thumbnails&lt;/p&gt;
&lt;h3&gt;
  
  
  Image sizes and aspect ratios
&lt;/h3&gt;

&lt;p&gt;In the real world, datasets are unlikely to contain images of the same sizes and aspect ratios. &lt;strong&gt;Inspecting basic dataset&lt;/strong&gt; statistics such as aspect ratios, image widths and heights &lt;strong&gt;will help you make important decisions&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Can you, and should you, do destructive resizing? (Destructive means resizing that changes the aspect ratio.)&lt;/li&gt;
&lt;li&gt;For non-destructive resizing what should be your desired output resolution and amount of padding?&lt;/li&gt;
&lt;li&gt;Deep learning models might have hyperparameters you have to tune depending on the above (for instance anchor sizes and ratios), or they might even have strong requirements when it comes to minimum input image size.&lt;/li&gt;
&lt;/ul&gt;
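&lt;p&gt;A quick, assumption-laden sketch of how such an inspection might look: the (width, height) pairs could come from Pillow's &lt;code&gt;Image.open(path).size&lt;/code&gt;, a COCO annotation file, or anywhere else, and the 0.7/1.5 "typical" band below is just an example threshold:&lt;/p&gt;

```python
# Sketch: bucket a dataset's (width, height) pairs by aspect ratio.
# The lo/hi thresholds are illustrative, not a standard.
from collections import Counter

def aspect_ratio_buckets(sizes, lo=0.7, hi=1.5):
    """Count images whose aspect ratio is below lo, above hi, or in between."""
    buckets = Counter()
    for w, h in sizes:
        ar = w / h
        if lo > ar:
            buckets["narrow"] += 1
        elif ar > hi:
            buckets["wide"] += 1
        else:
            buckets["typical"] += 1
    return buckets

def size_summary(sizes):
    """Min/max width and height, to spot extreme values early."""
    widths = [w for w, _ in sizes]
    heights = [h for _, h in sizes]
    return {
        "min_w": min(widths), "max_w": max(widths),
        "min_h": min(heights), "max_h": max(heights),
    }

sizes = [(640, 480), (1920, 400), (400, 1200)]  # illustrative values
print(aspect_ratio_buckets(sizes))
print(size_summary(sizes))
```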

&lt;p&gt;&lt;a href="https://d2l.ai/chapter_computer-vision/anchor.html" rel="noopener noreferrer"&gt;Good resources about anchors&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;A special case would be if your dataset consists of images that are really big (4K+), which is not that unusual in satellite imagery or some medical modalities. For most cutting-edge models in 2020, you will not be able to fit even a single 4K image per (server-grade) GPU due to memory constraints. In that case, you need to figure out what will realistically be useful for your DL algorithms.&lt;/p&gt;

&lt;p&gt;Two approaches I have seen are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Training your model on image patches (randomly selected during training or extracted before training)&lt;/li&gt;
&lt;li&gt;Resizing the entire dataset upfront to avoid doing this every time you load your data.&lt;/li&gt;
&lt;/ul&gt;
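&lt;p&gt;The patch-based route can be as simple as sampling a random crop window each time an image is loaded. The sketch below only computes the crop box; applying it (and, crucially, cropping the labels to match) is left to your image library of choice, e.g. Pillow's &lt;code&gt;Image.crop&lt;/code&gt;. The 512 px patch size is an arbitrary example:&lt;/p&gt;

```python
# Sketch: pick a random patch window inside a (possibly huge) image.
# Only coordinates are computed here; boxes/masks must be cropped to match.
import random

def random_patch_box(img_w, img_h, patch=512):
    """Return a (left, top, right, bottom) crop box fully inside the image."""
    pw = min(patch, img_w)   # clamp for images smaller than the patch
    ph = min(patch, img_h)
    left = random.randint(0, img_w - pw)
    top = random.randint(0, img_h - ph)
    return (left, top, left + pw, top + ph)

# e.g. a crop from a large 4096x2160 image
print(random_patch_box(4096, 2160))
```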

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F600%2F0%2AYpRbjlwF9S9SRHyS" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F600%2F0%2AYpRbjlwF9S9SRHyS" alt="image"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Fig 2. Histogram of image aspect ratios in the COCO dataset&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In general, I would expect most datasets to fall into one of three categories.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Uniformly distributed, where most of the images have the same dimensions&lt;/strong&gt; - here the only decision you will have to make is how much to resize (if at all). This will mainly depend on object areas, sizes and aspect ratios.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Slightly bimodal distribution, but with most of the images in the aspect ratio range of (0.7 ... 1.5)&lt;/strong&gt;, similar to the COCO dataset. I believe other "natural-looking" datasets would follow a similar distribution - for those types of datasets you should be fine going with a non-destructive resize -&amp;gt; pad approach. Padding will be necessary, but to a degree that is manageable and will not blow up the size of the dataset too much.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dataset with a lot of extreme values&lt;/strong&gt; (very wide images mixed with very narrow ones) - this case is much trickier, and there are more advanced techniques to avoid excessive padding. You might consider sampling batches of images based on the aspect ratio. Remember that this can introduce a bias into your sampling process - so make sure it is acceptable or weak enough to ignore.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The mmdetection framework supports this out of the box by implementing a GroupSampler that samples based on aspect ratios.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F600%2F0%2AaQX-gxuLbxXHMZpx" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F600%2F0%2AaQX-gxuLbxXHMZpx" alt="image"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Fig 3 and 4. Example images (resized and padded) with extreme aspect ratios from the COCO dataset&lt;/em&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Label (objects) sizes and dimensions
&lt;/h3&gt;

&lt;p&gt;Here we start looking at our targets (labels). In particular, &lt;strong&gt;we are interested in knowing how the sizes and aspect ratios are distributed.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Why is this important?&lt;/p&gt;

&lt;p&gt;Depending on your modelling approach &lt;strong&gt;most of the frameworks will have design limitations&lt;/strong&gt;. As I mentioned earlier, those models are designed to perform well on benchmark datasets. If for whatever reason your data is different, training them might be impossible. Let's have a look at a &lt;a href="https://github.com/facebookresearch/detectron2/blob/master/configs/Base-RetinaNet.yaml#L8" rel="noopener noreferrer"&gt;default config for Retinanet from detectron2&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;ANCHOR_GENERATOR&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;SIZES&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="err"&gt;!!&lt;/span&gt;&lt;span class="n"&gt;python&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="nb"&gt;object&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="nb"&gt;apply&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="nb"&gt;eval&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[[x, x * 2**(1.0/3), x * 2**(2.0/3) ] for x in [32, 64, 128, 256, 512 ]]&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What you can see there is that &lt;strong&gt;for different feature maps the anchors we generate will have a certain size range:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;for instance, if your dataset contains only really big objects, it might be possible to simplify the model a lot;&lt;/li&gt;
&lt;li&gt;on the other hand, if you have small images with small objects (for instance 10x10px), given this config it can happen that you will not be able to train the model.&lt;/li&gt;
&lt;/ul&gt;
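&lt;p&gt;The SIZES entry above is just an evaluated Python list comprehension. Expanding it in plain Python makes the per-feature-map size ranges explicit (this is only a sketch of the config expression, not of detectron2's actual anchor generation):&lt;/p&gt;

```python
# Expand the SIZES expression from the RetinaNet config above.
# Each inner list holds the three anchor sizes used on one feature map:
# the base size x, plus x scaled by 2**(1/3) and 2**(2/3).
sizes = [[x, x * 2 ** (1.0 / 3), x * 2 ** (2.0 / 3)] for x in [32, 64, 128, 256, 512]]

for level, group in enumerate(sizes):
    print(f"feature map {level}: anchor sizes {[round(s, 1) for s in group]}")

# The smallest anchor is 32px, so e.g. a dataset full of ~10px
# objects would struggle to match any anchor under this default config.
```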

&lt;p&gt;The most important things to consider when it comes to box or mask dimensions are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Aspect ratios&lt;/li&gt;
&lt;li&gt;Size (Area)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F600%2F0%2AxqQm9g2b5CNOFsU9" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F600%2F0%2AxqQm9g2b5CNOFsU9" alt="image"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Fig 5. aspect ratio of bounding boxes in the coco dataset&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The tail of this distribution (Fig 5.) is quite long. There will be instances with extreme aspect ratios. Depending on the use case and dataset, it might or might not be fine to ignore them - this should be further inspected.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F600%2F0%2AaBtoIxqKAC_XmbtU" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F600%2F0%2AaBtoIxqKAC_XmbtU" alt="image"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Fig 6. Mean area of bounding box per category&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This is especially true for anchor-based models (most object detection / image segmentation models), where there is a step of matching ground truth labels with predefined anchor boxes (a.k.a. prior boxes).&lt;/p&gt;

&lt;p&gt;Remember that you control how those prior boxes are generated with hyperparameters like the number of boxes, their aspect ratio, and size. Not surprisingly, you need to make sure those settings are aligned with your dataset distributions and expectations.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F600%2F0%2AFWNx20uSDKvlazjn" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F600%2F0%2AFWNx20uSDKvlazjn" alt="image"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Fig 7. The Image shows &lt;a href="https://d2l.ai/chapter_computer-vision/anchor.html" rel="noopener noreferrer"&gt;anchor boxes&lt;/a&gt; at different scales and aspect ratios.&lt;/em&gt;&lt;/p&gt;
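&lt;p&gt;To make the size / aspect-ratio parameterization concrete, here is a minimal sketch (a hypothetical helper, not the API of any particular framework) of how prior box shapes are typically generated - one box of area size^2 for every (size, aspect ratio) pair:&lt;/p&gt;

```python
import math

def anchor_shapes(sizes, aspect_ratios):
    """Return (width, height) pairs: one box of area size**2 for every
    (size, aspect_ratio) combination, where aspect_ratio = width / height."""
    shapes = []
    for s in sizes:
        for r in aspect_ratios:
            shapes.append((s * math.sqrt(r), s / math.sqrt(r)))
    return shapes

# 3 sizes x 3 ratios = 9 anchor shapes per feature map location.
for w, h in anchor_shapes([32, 64, 128], [0.5, 1.0, 2.0]):
    print(f"{w:6.1f} x {h:6.1f}  (AR = {w / h:.2f})")
```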

&lt;p&gt;An important thing to &lt;strong&gt;keep in mind is that labels will be transformed together with the image.&lt;/strong&gt; So if you are making an image smaller during a preprocessing step, the absolute size of the ROIs will also shrink.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If you feel that object size might be an issue in your problem and you don't want to enlarge the images too much&lt;/strong&gt; (for instance to keep the desired performance or memory footprint) you can &lt;strong&gt;try to solve it with a Crop -&amp;gt; Resize approach.&lt;/strong&gt; Keep in mind that this can be quite tricky (you need to handle what happens if you cut through a bounding box or segmentation mask).&lt;/p&gt;

&lt;p&gt;Big objects, on the other hand, are usually not problematic from a modelling perspective (although you still have to make sure that they will be matched with anchors). The problem with them is more indirect: essentially, &lt;strong&gt;the more big objects a class has, the more likely it is that it will be underrepresented in the dataset&lt;/strong&gt;. Most of the time the average area of objects in a given class will be inversely proportional to the (label) count.&lt;/p&gt;
&lt;h3&gt;
  
  
  Partially labeled data
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;When creating and labeling an image detection dataset, missing annotations are potentially a huge issue.&lt;/strong&gt; The worst scenario is when you have false negatives already in your ground truth - essentially, you did not annotate objects even though they are present in the dataset.&lt;/p&gt;

&lt;p&gt;In most of the modeling approaches, everything that was not labeled or did not match with an anchor is considered background. This means that &lt;strong&gt;it will generate conflicting signals that will hurt the learning process a LOT.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is also a reason why you can't really mix datasets with non-overlapping classes and train one model (there are some ways to mix datasets, though - for instance by soft labeling one dataset with a model trained on another one).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F600%2F0%2AANNWz_TVrivgN7eb" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F600%2F0%2AANNWz_TVrivgN7eb" alt="image"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Fig 8. Shows the problem of mixing datasets - notice for example that on the right image a person is not labeled. One way to solve this problem is to soft label the dataset with a model trained on the other one. &lt;a href="https://arxiv.org/pdf/1812.02611.pdf" rel="noopener noreferrer"&gt;Source&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Imbalances
&lt;/h3&gt;

&lt;p&gt;Class imbalances can be a bit of a problem when it comes to object detection. Normally, in image classification for example, one can easily oversample or downsample the dataset and control each class's contribution to the loss.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F600%2F0%2AEBRaALlTewzmco2u" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F600%2F0%2AEBRaALlTewzmco2u" alt="image"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Fig 9. Object counts per class&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;You can imagine this is more challenging when you have an object detection dataset with co-occurring classes, since you can't really drop some of the labels (because you would send mixed signals as to what the background is).&lt;/p&gt;

&lt;p&gt;In that case you end up with the same problem as shown in the partially labeled data paragraph. Once you start resampling on an image level, you have to be aware of the fact that multiple classes will be upsampled at the same time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You may want to try other solutions like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Adding weights to the loss (making the contributions of some boxes or pixels higher)&lt;/li&gt;
&lt;li&gt;Preprocessing your data differently: for example you could do some custom cropping that rebalances the dataset on the object level&lt;/li&gt;
&lt;/ul&gt;
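&lt;p&gt;As a concrete illustration of the loss-weighting idea, inverse-frequency weights can be derived directly from per-class object counts (a sketch with made-up counts; the normalization scheme is a design choice):&lt;/p&gt;

```python
# Hypothetical per-class object counts (e.g. read off a histogram
# like the one in Fig 9) -- the numbers here are made up.
counts = {"person": 26000, "car": 4300, "stop sign": 180}

total = sum(counts.values())
n_classes = len(counts)

# Inverse-frequency weights: a class appearing with the average
# frequency (total / n_classes objects) gets weight 1.0.
weights = {c: total / (n_classes * n) for c, n in counts.items()}

for c, w in weights.items():
    print(f"{c:>10}: weight {w:.2f}")
# Rare classes ("stop sign") end up with much larger weights than
# frequent ones ("person"); these can then scale per-box or
# per-pixel terms in the loss.
```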
&lt;h2&gt;
  
  
  Understanding augmentation and preprocessing sequences
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Preprocessing and data augmentation is an integral part of any computer vision system.&lt;/strong&gt; If you do it well you can gain a lot but if you screw up it can really cost you.&lt;/p&gt;

&lt;p&gt;Data augmentation is by far the most important and widely used regularization technique (in image segmentation / object detection ).&lt;/p&gt;

&lt;p&gt;Applying it to object detection and segmentation problems is more challenging than in simple image classification because some transformations (like rotation, or crop) need to be applied not only to the source image but also to the target (masks or bounding boxes). Common transformations that require a target transform include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Affine transformations,&lt;/li&gt;
&lt;li&gt;Cropping,&lt;/li&gt;
&lt;li&gt;Distortions,&lt;/li&gt;
&lt;li&gt;Scaling,&lt;/li&gt;
&lt;li&gt;Rotations&lt;/li&gt;
&lt;li&gt;and many more.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It is crucial to do data exploration on batches of augmented images and targets to avoid costly mistakes (dropping bounding boxes, etc).&lt;/p&gt;
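&lt;p&gt;To make the image-plus-target coupling concrete, here is a minimal sketch (plain Python with a hypothetical helper name - libraries like Albumentations handle this for you) of how a horizontal flip must also remap COCO-style xywh bounding boxes:&lt;/p&gt;

```python
def hflip_boxes(boxes, image_width):
    """Horizontally flip COCO-style [x, y, w, h] boxes.

    Only the top-left x changes: the box's old right edge (x + w),
    mirrored around the image center, becomes the new left edge."""
    return [[image_width - (x + w), y, w, h] for x, y, w, h in boxes]

# A 100px-wide image with one box hugging the left border:
print(hflip_boxes([[0, 10, 30, 40]], image_width=100))  # [[70, 10, 30, 40]]
# Flipping twice is the identity -- a cheap sanity check for any
# target transform you write.
```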

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Basic augmentations are a part of deep learning frameworks like PyTorch or Tensorflow but if you need more advanced functionalities you need to use one of the augmentation libraries available in the python ecosystem. My recommendations are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/albumentations-team/albumentations" rel="noopener noreferrer"&gt;Albumentations&lt;/a&gt; (I'll use it in this post)&lt;/li&gt;
&lt;li&gt;&lt;a href="https://imgaug.readthedocs.io/en/latest/" rel="noopener noreferrer"&gt;Imgaug&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/mdbloice/Augmentor" rel="noopener noreferrer"&gt;Augmentor&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  The minimal preprocessing setup
&lt;/h2&gt;

&lt;p&gt;Whenever I'm building a new system I want to keep it very basic on the preprocessing and augmentation level to minimize the risk of introducing bugs early on. &lt;strong&gt;The basic principles I would recommend you follow are:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Disable augmentation&lt;/li&gt;
&lt;li&gt;Avoid destructive resizing&lt;/li&gt;
&lt;li&gt;Always inspect the outputs visually&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let's continue our COCO example. From the previous steps we know that the majority of our images have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;an aspect ratio (width / height) of about 1.5,&lt;/li&gt;
&lt;li&gt;an average width of about 600 and an average height of about 500.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Setting the averages as our basic preprocessing resize values seems to be a reasonable thing to do, unless there is a strong requirement on the model side to have bigger pictures. For instance, a resnet50 backbone model has a minimum size requirement of 32×32 (this is related to the number of downsampling layers).&lt;/p&gt;

&lt;p&gt;In Albumentations the basic setup implementation will look something like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;LongestMaxSize(avg_height) - this will rescale the image based on the longest side preserving the aspect ratio&lt;/li&gt;
&lt;li&gt;PadIfNeeded(avg_height, avg_width, border_mode=0, value=0) - this will 0-pad the image up to the target height and width (border_mode=0 is OpenCV's constant-fill mode)&lt;/li&gt;
&lt;/ul&gt;
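&lt;p&gt;The two steps boil down to simple arithmetic. Here is a sketch (plain Python, not the Albumentations implementation) of the resulting image and padding sizes, assuming our 500/600 targets:&lt;/p&gt;

```python
def resize_and_pad_dims(width, height, max_size=500, pad_w=600, pad_h=500):
    """Return ((resized_w, resized_h), (total_pad_w, total_pad_h)) for a
    LongestMaxSize -> PadIfNeeded pipeline: scale so the longest side
    equals max_size (preserving aspect ratio), then 0-pad up to pad_w x pad_h."""
    scale = max_size / max(width, height)
    rw, rh = round(width * scale), round(height * scale)
    return (rw, rh), (pad_w - rw, pad_h - rh)

# A wide 1200x400 image: scaled down to 500x167, then padded by
# 100px horizontally and 333px vertically in total.
print(resize_and_pad_dims(1200, 400))  # ((500, 167), (100, 333))
```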

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F600%2F0%2AF-TsZa44JUjzwTUl" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F600%2F0%2AF-TsZa44JUjzwTUl" alt="image"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Fig 10&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh5.googleusercontent.com%2FXp0bPjh77PzyMWkyIuvnBsp_polDV9thCR8tvhwXcKNcPPaVrX0ndh6nUtwSfoNFr4n7tfumQLr5JIMESj-f91wnjldR7WY1IiUvnWDRanLqjeGO44nsJ_rDiqcVUUWoRYhiCr22" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh5.googleusercontent.com%2FXp0bPjh77PzyMWkyIuvnBsp_polDV9thCR8tvhwXcKNcPPaVrX0ndh6nUtwSfoNFr4n7tfumQLr5JIMESj-f91wnjldR7WY1IiUvnWDRanLqjeGO44nsJ_rDiqcVUUWoRYhiCr22" alt="image"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Fig 11&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Fig 10 and 11. MaxSize-&amp;gt;Pad output for two pictures with drastically different aspect ratios&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;As you can see in figures 10 and 11, the preprocessing results in an image of 500×600 with reasonable 0-padding for both pictures.&lt;/p&gt;

&lt;p&gt;When you use padding there are many options for how to fill the empty space. In the basic setup I suggest that you go with the default constant 0 value.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When you experiment with more advanced methods like reflection padding, always explore your augmentations visually.&lt;/strong&gt; Remember that you are running the risk of introducing false negatives, especially in object detection problems (reflecting an object without having a label for it).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F600%2F0%2AHzE9VQ0vUXqZz1KB" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F600%2F0%2AHzE9VQ0vUXqZz1KB" alt="image"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Fig 12. Notice how reflection-padding creates false negative errors in our annotations. The cat's reflection (top of the picture) has no label!&lt;/em&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Augmentation - Rotations
&lt;/h3&gt;

&lt;p&gt;Rotations are powerful and useful augmentations but they should be used with caution. Have a look at fig 13. below which was generated using a Rotate(45)-&amp;gt;Resize-&amp;gt;Pad pipeline.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F600%2F0%2AyXdtvMq2JmL2Ingj" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F600%2F0%2AyXdtvMq2JmL2Ingj" alt="image"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Fig 13. Rotations can be harmful to your bounding box labels&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The problem is that if we use standard bounding boxes (without an angle parameter), covering a rotated object can be less efficient (box-area to object-area will increase). &lt;strong&gt;This happens during rotation augmentations and it can harm the data.&lt;/strong&gt; Notice that we have also introduced false positive labels in the top left corner. This is because we crop-rotated the image.&lt;/p&gt;

&lt;p&gt;My recommendation is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You might want to give up on rotations if you have a lot of objects with aspect ratios far from one.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Another thing you can consider is using 90, 180, 270 degree non-cropping rotations (if they make sense for your problem) - they will not destroy any bounding boxes.&lt;/p&gt;
&lt;h3&gt;
  
  
  Augmentations - Key takeaways
&lt;/h3&gt;

&lt;p&gt;As you see, spatial transforms can be quite tricky and a lot of unexpected things can happen (especially for object detection problems).&lt;/p&gt;

&lt;p&gt;So if you decide to use those spatial augmentations make sure to do some data exploration and visually inspect your data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Do you really need spatial augmentations? I believe that in many scenarios you will not need them - as usual, keep things simple and gradually add complexity.&lt;/p&gt;

&lt;p&gt;From my experience, a good starting point (without spatial transforms) for natural-looking datasets (similar to COCO) is the following pipeline:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;transforms&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="nc"&gt;LongestMaxSize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="nc"&gt;HorizontalFlip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="nc"&gt;PadIfNeeded&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;600&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;border_mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="nc"&gt;JpegCompression&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;quality_lower&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;70&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;quality_upper&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="nc"&gt;RandomBrightnessContrast&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.3&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="nc"&gt;Cutout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_h_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_w_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Of course things like max_size or cutout sizes are arbitrary and have to be adjusted.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F600%2F0%2AMnvctm1YC5A9r5Wq" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F600%2F0%2AMnvctm1YC5A9r5Wq" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fig 14. Augmentation results with cutout, jpeg compression and contrast/brightness adjustments&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best Practice:&lt;/strong&gt;&lt;br&gt;
One thing I did not mention yet that I feel is pretty important: &lt;strong&gt;Always load the whole dataset (together with your preprocessing and augmentation pipeline).&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="o"&gt;%%&lt;/span&gt;&lt;span class="n"&gt;timeit&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;data_loader&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;pass&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Two lines of code that will save you a lot of time.&lt;/strong&gt; First of all, you will understand what the overhead of the data loading is and if you see a clear performance bottleneck you might consider fixing it right away. More importantly, you will catch potential issues with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;corrupted files,&lt;/li&gt;
&lt;li&gt;labels that can't be transformed, etc.,&lt;/li&gt;
&lt;li&gt;anything fishy that can interrupt training down the line.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Results understanding
&lt;/h2&gt;

&lt;p&gt;Inspecting model results and performing error analysis can be a tricky process for these types of problems. A single metric rarely tells you the whole story, and even if you do have one, interpreting it can be relatively hard.&lt;/p&gt;

&lt;p&gt;Let's have a look at the official COCO challenge and how the evaluation process looks there (all the results I will be showing are for a Mask R-CNN model with a resnet50 backbone).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F600%2F0%2A0tn5D-W79GPdXdWK" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F600%2F0%2A0tn5D-W79GPdXdWK" alt="image"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Fig 15. Coco evaluation output&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;It returns the &lt;a href="https://medium.com/@jonathan_hui/map-mean-average-precision-for-object-detection-45c121a31173" rel="noopener noreferrer"&gt;AP&lt;/a&gt; and AR for various groups of observations partitioned by IOU (Intersection over Union of predictions and ground truth) and Area. So even the official COCO evaluation is not just one metric and there is a good reason for it.&lt;/p&gt;

&lt;p&gt;Let's focus on the IoU=0.50:0.95 notation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What this means is the following: AP and AR are calculated as the average of precisions and recalls computed at different IoU settings (from 0.5 to 0.95 with a 0.05 step)&lt;/strong&gt;. What we gain here is a more robust evaluation process - a model will only score high if it's pretty good at both localizing and classifying.&lt;/p&gt;
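&lt;p&gt;The IoU those thresholds are applied to is itself simple to compute. A minimal sketch for axis-aligned boxes in corner (xyxy) format:&lt;/p&gt;

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# The IoU=0.50:0.95 thresholds used by the COCO evaluation:
thresholds = [0.5 + 0.05 * i for i in range(10)]

# A detection shifted 25px against a 100x100 ground truth box:
score = iou([0, 0, 100, 100], [25, 0, 125, 100])
print(score)  # 0.6 -- a match at the looser thresholds, a miss at the stricter ones
```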

&lt;p&gt;Of course, your problem and dataset might be different. Maybe you need an extremely accurate detector - in that case, choosing AP@0.90IoU might be a good idea.&lt;/p&gt;

&lt;p&gt;The downside (of the coco eval tool) is that by default all the values are averaged for all the classes and all images. This might be fine in a competition-like setup where we want to evaluate the models on all the classes but &lt;strong&gt;in real-life situations where you train models on custom datasets (often with fewer classes) you really want to know how your model performs on a per-class basis&lt;/strong&gt;. Looking at per-class metrics is extremely valuable, as it might give you important insights:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;help you compose a new dataset better&lt;/li&gt;
&lt;li&gt;make better decisions when it comes to data augmentation, data sampling etc.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F600%2F0%2AyHvhOTpID76kxXsr" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F600%2F0%2AyHvhOTpID76kxXsr" alt="image"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Fig 16. Per class AP&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Figure 16 gives you a lot of useful information. There are a few things you might consider:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Add more data to low performing classes&lt;/li&gt;
&lt;li&gt;For classes that score well, maybe you can consider downsampling them to speed up the training and maybe help with the performance of other less frequent classes.&lt;/li&gt;
&lt;li&gt;Spot any obvious correlations for instance classes with small objects performing poorly.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Visualizing results
&lt;/h3&gt;

&lt;p&gt;Ok, so if looking at single metrics is not enough what should you do?&lt;/p&gt;

&lt;p&gt;I would definitely suggest spending some time on manual results exploration - in combination with the hard metrics from the previous analysis, visualizations will help you get the big picture.&lt;/p&gt;

&lt;p&gt;Since exploring predictions of image detection and image segmentation models can get quite messy I would suggest you do it step by step. On the gif below I show how this can be done using the coco inspector tool.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://lh3.googleusercontent.com/Qbepbcs4jQzdyaFjnxEEj3Ostz2P76UTjgXGMOmjxpk77EvLHyGEJVrlJtKzsZRcAQLbZsRdGMJFih27VXMzF9iXiMt_6t0pODXGZF5fJ26Yma3M-c3urYa90ZTzMtD6NqAbJRGB" rel="noopener noreferrer"&gt;gif available here&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;On the gif we can see how all the important information is visualized:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Red masks - predictions&lt;/li&gt;
&lt;li&gt;Orange masks - overlap of predictions and ground truth masks&lt;/li&gt;
&lt;li&gt;Green masks - ground truth&lt;/li&gt;
&lt;li&gt;Dashed bounding boxes - false positives (predictions without a match)&lt;/li&gt;
&lt;li&gt;Orange boxes - true positives (predictions matched with ground truth)&lt;/li&gt;
&lt;li&gt;Green boxes - ground truth&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Results understanding - per image scores
&lt;/h3&gt;

&lt;p&gt;By looking at the hard metrics and inspecting images visually we most likely have a pretty good idea of what's going on. But looking at results of random images (or grouped by class) is likely not an optimal way of doing this. &lt;strong&gt;If you want to really dive in and spot edge cases of your model, I suggest calculating per image metrics&lt;/strong&gt; (for instance AP or Recall).&lt;/p&gt;

&lt;p&gt;Below is an example of an image I found by doing exactly that.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F600%2F0%2ALEfTyeknmjLbGW_h" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F600%2F0%2ALEfTyeknmjLbGW_h" alt="image"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Fig 18. Image with a very low AP score&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In the example above (Fig 18.) we can see two false positive stop sign predictions - from that we can deduce that our model understands what a stop sign is but not what other traffic signs are.&lt;/p&gt;

&lt;p&gt;Perhaps we can add new classes to our dataset or use our "stop sign detector" to label other traffic signs and then create a new "traffic sign" label to overcome this problem.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F600%2F0%2ASPzPxcmCWvQVWbuU" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F600%2F0%2ASPzPxcmCWvQVWbuU" alt="image"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Fig 19. Example of an image with a good score &amp;gt; 0.5 AP&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sometimes we will also learn that our model is doing better than it would seem from the scores alone.&lt;/strong&gt; That's also useful information - for instance, in the example above our model detected a keyboard on the laptop, but this is actually not labeled in the original dataset.&lt;/p&gt;
&lt;h2&gt;
  
  
  COCO format
&lt;/h2&gt;

&lt;p&gt;The way a COCO dataset is organized can be a bit intimidating at first.&lt;/p&gt;

&lt;p&gt;It consists of a set of dictionaries mapping from one to another. It's also intended to be used together with the pycocotools library, which builds a rather confusing API on top of the dataset metadata file.&lt;/p&gt;

&lt;p&gt;Nonetheless, &lt;strong&gt;the COCO dataset (and the COCO format) has become a standard way of organizing object detection and image segmentation datasets.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In COCO we follow the &lt;strong&gt;xywh&lt;/strong&gt; convention for bounding box encodings, or as I like to call it, tlwh: &lt;strong&gt;(top-left-width-height)&lt;/strong&gt;. That way you cannot confuse it with, for instance, cwh: &lt;strong&gt;(center-point, w, h)&lt;/strong&gt;. Mask labels (segmentations) are run-length encoded &lt;a href="https://www.kaggle.com/c/data-science-bowl-2018/overview/evaluation" rel="noopener noreferrer"&gt;(RLE explanation)&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F600%2F0%2ALuK6g3Oqw1ttGtou" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F600%2F0%2ALuK6g3Oqw1ttGtou" alt="image"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Fig 20. The COCO dataset annotations format&lt;/em&gt;&lt;/p&gt;
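&lt;p&gt;Stripped down to its essential fields, a ground truth annotations file is just three cross-referencing lists; the skeleton below uses illustrative ids and values:&lt;/p&gt;

```python
import json

# Minimal skeleton of a COCO ground truth annotations file.
# The lists reference each other through ids: each annotation points
# to an image via image_id and to a category via category_id.
coco_gt = {
    "images": [
        {"id": 1, "file_name": "000000000139.jpg", "width": 640, "height": 426}
    ],
    "annotations": [
        {
            "id": 1,
            "image_id": 1,       # refers to images[].id
            "category_id": 18,   # refers to categories[].id
            "bbox": [73.0, 41.0, 120.0, 80.0],  # xywh convention
            "area": 9600.0,
            "iscrowd": 0,
        }
    ],
    "categories": [
        {"id": 18, "name": "dog", "supercategory": "animal"}
    ],
}

with open("ground_truth_annotations.json", "w") as f:
    json.dump(coco_gt, f)
```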

&lt;p&gt;There are still very important advantages to having a widely adopted standard:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Labeling tools and services export and import COCO-like datasets.&lt;/li&gt;
&lt;li&gt;The evaluation and scoring code (used for the COCO competition) is pretty well optimized and battle-tested.&lt;/li&gt;
&lt;li&gt;Multiple open-source datasets follow it.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In the previous paragraph, I used the COCO eval functionality, which is another benefit of following the COCO standard. To take advantage of it, you need to format your predictions in the same way as your COCO dataset is constructed; then calculating metrics is as simple as calling &lt;code&gt;COCOeval(gt_dataset, pred_dataset)&lt;/code&gt;.&lt;/p&gt;
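&lt;p&gt;For reference, a sketch of what that looks like: each detection is one dictionary in the standard COCO results format, and the ids must match the ground truth dataset. The values below are made up, and the commented-out evaluation part assumes pycocotools is installed:&lt;/p&gt;

```python
import json

# One entry per detection, in the standard COCO results format.
predictions = [
    {"image_id": 1, "category_id": 18,
     "bbox": [75.0, 40.0, 118.0, 82.0],  # xywh convention
     "score": 0.93},
    {"image_id": 1, "category_id": 73,
     "bbox": [12.0, 300.0, 60.0, 25.0],
     "score": 0.51},
]

with open("predictions.json", "w") as f:
    json.dump(predictions, f)

# With pycocotools, evaluation then boils down to:
#   from pycocotools.coco import COCO
#   from pycocotools.cocoeval import COCOeval
#   gt = COCO("ground_truth_annotations.json")
#   dt = gt.loadRes("predictions.json")
#   ev = COCOeval(gt, dt, iouType="bbox")
#   ev.evaluate(); ev.accumulate(); ev.summarize()
```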


&lt;h1&gt;
  
  
  COCO dataset explorer
&lt;/h1&gt;

&lt;p&gt;In order to streamline the process of data and results exploration (especially for object detection), I wrote a tool that operates on COCO datasets.&lt;/p&gt;

&lt;p&gt;Essentially, you provide it with the ground truth dataset and (optionally) the predictions dataset, and it will do the rest for you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Calculate most of the metrics I presented in this post&lt;/li&gt;
&lt;li&gt;Easily visualize the dataset's ground truths and predictions&lt;/li&gt;
&lt;li&gt;Inspect COCO metrics and per-class AP metrics&lt;/li&gt;
&lt;li&gt;Inspect per-image scores&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F600%2F1%2Ap-IhXKE8GGdSWpSaiMf30g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F600%2F1%2Ap-IhXKE8GGdSWpSaiMf30g.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To use COCO dataset explorer tool you need to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Clone the project &lt;a href="https://github.com/i008/COCO-dataset-explorer" rel="noopener noreferrer"&gt;repository&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/i008/COCO-dataset-explorer.git
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;ul&gt;
&lt;li&gt;Download the example data I used, or use your own data in the COCO format:&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://drive.google.com/file/d/1wxIagenNdCt_qphEe8gZYK7H2_to9QXl/view" rel="noopener noreferrer"&gt;Example COCO format dataset with predictions&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you downloaded the example data, you will need to extract it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;tar&lt;/span&gt; &lt;span class="nt"&gt;-xvf&lt;/span&gt; coco_data.tar
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should have the following directory structure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;COCO-dataset-explorer
    |coco_data
        |images
            |000000000139.jpg
            |000000000285.jpg
            |000000000632.jpg
            |...
        |ground_truth_annotations.json
        |predictions.json
|coco_explorer.py
|Dockerfile
|environment.yml
|...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Set up the environment with all the dependencies:&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;conda &lt;span class="nb"&gt;env &lt;/span&gt;update&lt;span class="p"&gt;;&lt;/span&gt;
conda activate cocoexplorer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Run the Streamlit app, specifying files with ground truth and predictions in the COCO format, and the image directory:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;streamlit run coco_explorer.py &lt;span class="nt"&gt;--&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--coco_train&lt;/span&gt; coco_data/ground_truth_annotations.json &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--coco_predictions&lt;/span&gt; coco_data/predictions.json  &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--images_path&lt;/span&gt; coco_data/images/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; You can also run this with docker:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;docker run &lt;span class="nt"&gt;-p&lt;/span&gt; 8501:8501 &lt;span class="nt"&gt;-it&lt;/span&gt; &lt;span class="nt"&gt;-v&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;pwd&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;/coco_data:/coco_data i008/coco_explorer  &lt;span class="se"&gt;\&lt;/span&gt;
    streamlit run  coco_explorer.py &lt;span class="nt"&gt;--&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--coco_train&lt;/span&gt; /coco_data/ground_truth_annotations.json &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--coco_predictions&lt;/span&gt; /coco_data/predictions.json  &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--images_path&lt;/span&gt; /coco_data/images/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Explore the dataset in the browser. By default, it will run on &lt;a href="http://localhost:8501/" rel="noopener noreferrer"&gt;http://localhost:8501/&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h1&gt;
  
  
  Final words
&lt;/h1&gt;

&lt;p&gt;I hope that with this post I convinced you that data exploration in object detection and image segmentation is as important as in any other branch of machine learning.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;I'm confident that the effort we make at this stage of the project pays off in the long run.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The knowledge we gather allows us to make better-informed modeling decisions, avoid multiple training pitfalls, and gain more confidence in the training process and in the predictions our model produces.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This article was originally written by &lt;a href="https://www.linkedin.com/in/jakubcieslik/" rel="noopener noreferrer"&gt;Jakub Cieślik&lt;/a&gt; and posted on the &lt;a href="https://neptune.ai/blog/data-exploration-for-image-segmentation-and-object-detection?utm_source=devto&amp;amp;utm_medium=crosspost&amp;amp;utm_campaign=blog-data-exploration-for-image-segmentation-and-object-detection" rel="noopener noreferrer"&gt;Neptune blog&lt;/a&gt;. You can find more in-depth articles for machine learning practitioners there.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>datascience</category>
      <category>computerscience</category>
      <category>python</category>
    </item>
    <item>
      <title>The Best NLP/NLU Papers from the ICLR 2020 Conference</title>
      <dc:creator>Kamil A. Kaczmarek</dc:creator>
      <pubDate>Fri, 24 Jul 2020 16:17:23 +0000</pubDate>
      <link>https://dev.to/kamil_k7k/the-best-nlp-nlu-papers-from-the-iclr-2020-conference-3ipg</link>
      <guid>https://dev.to/kamil_k7k/the-best-nlp-nlu-papers-from-the-iclr-2020-conference-3ipg</guid>
      <description>&lt;p&gt;&lt;em&gt;This article was originally posted on the &lt;a href="https://neptune.ai/blog/iclr-2020-nlp-nlu?utm_source=hashnode&amp;amp;utm_medium=crosspost&amp;amp;utm_campaign=blog-iclr-2020-nlp-nlu" rel="noopener noreferrer"&gt;Neptune blog&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;The International Conference on Learning Representations &lt;strong&gt;(ICLR)&lt;/strong&gt; took place last week, and I had the pleasure of participating in it. ICLR is an event dedicated to &lt;strong&gt;research on all aspects of representation learning, commonly known as deep learning&lt;/strong&gt;. This year the event was a bit different, as it went virtual due to the coronavirus pandemic. However, the online format didn't change the great atmosphere of the event. It was engaging and interactive, and attracted 5600 attendees (twice as many as last year). If you're interested in what the organizers think about the unusual online arrangement of the conference, you can read about it here.&lt;/p&gt;

&lt;p&gt;Over 1300 speakers presented many interesting papers, so I decided to create a series of blog posts summarizing the best of them in four main areas: deep learning, reinforcement learning, generative modeling, and NLP/NLU.&lt;/p&gt;

&lt;p&gt;This is the last post of the series, in which I want to share the &lt;strong&gt;10 best Natural Language Processing/Understanding contributions from ICLR&lt;/strong&gt;.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;ALBERT: A Lite BERT for Self-supervised Learning of Language Representations&lt;/li&gt;
&lt;li&gt;A Mutual Information Maximization Perspective of Language Representation Learning&lt;/li&gt;
&lt;li&gt;Mogrifier LSTM&lt;/li&gt;
&lt;li&gt;High Fidelity Speech Synthesis with Adversarial Networks&lt;/li&gt;
&lt;li&gt;Reformer: The Efficient Transformer&lt;/li&gt;
&lt;li&gt;DeFINE: Deep Factorized Input Token Embeddings for Neural Sequence Modeling&lt;/li&gt;
&lt;li&gt;Depth-Adaptive Transformer&lt;/li&gt;
&lt;li&gt;On Identifiability in Transformers&lt;/li&gt;
&lt;li&gt;Mirror-Generative Neural Machine Translation&lt;/li&gt;
&lt;li&gt;FreeLB: Enhanced Adversarial Training for Natural Language Understanding&lt;/li&gt;
&lt;/ol&gt;




&lt;h1&gt;
  
  
  Best Natural Language Processing/Understanding Papers
&lt;/h1&gt;

&lt;h3&gt;
  
  
  1. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
&lt;/h3&gt;

&lt;p&gt;A new pretraining method that establishes new state-of-the-art results on the GLUE, RACE, and SQuAD benchmarks while having fewer parameters compared to BERT-large.&lt;br&gt;
&lt;em&gt;(TL;DR, from &lt;a href="https://openreview.net/group?id=ICLR.cc/2020/Conference" rel="noopener noreferrer"&gt;OpenReview.net&lt;/a&gt;)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;| &lt;a href="https://openreview.net/forum?id=H1eA7AEtvS" rel="noopener noreferrer"&gt;Paper&lt;/a&gt; | &lt;a href="https://github.com/google-research/ALBERT" rel="noopener noreferrer"&gt;Code&lt;/a&gt; |&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F600%2F0%2AchjpPFJWawOzxmXZ" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F600%2F0%2AchjpPFJWawOzxmXZ" alt="image"&gt;&lt;/a&gt;&lt;br&gt;
The L2 distances and cosine similarity (in terms of degree) of the input and output embedding of each layer for BERT-large and ALBERT-large.&lt;br&gt;
&lt;em&gt;(source: Fig 1, from the paper)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F450%2F0%2A2UyoFHZCX4RIx7Ik" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F450%2F0%2A2UyoFHZCX4RIx7Ik" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;First author: Zhenzhong Lan&lt;br&gt;
| &lt;a href="https://www.linkedin.com/in/zhenzhong-lan/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; |&lt;/p&gt;




&lt;h3&gt;
  
  
  2. A Mutual Information Maximization Perspective of Language Representation Learning
&lt;/h3&gt;

&lt;p&gt;Word representation is a common task in NLP. Here, the authors formulate a new framework that combines classical word embedding techniques (like Skip-gram) with more modern approaches based on contextual embeddings (BERT, XLNet).&lt;/p&gt;

&lt;p&gt;| &lt;a href="https://openreview.net/forum?id=Syx79eBKwr" rel="noopener noreferrer"&gt;Paper&lt;/a&gt; |&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F600%2F0%2Ar7dNdOiu73fGJlLH" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F600%2F0%2Ar7dNdOiu73fGJlLH" alt="image"&gt;&lt;/a&gt;&lt;br&gt;
The left plot shows F1 scores of BERT-NCE and INFOWORD as we increase the percentage of training examples on SQuAD (dev). The right plot shows F1 scores of INFOWORD on SQuAD (dev) as a function of λDIM.&lt;br&gt;
&lt;em&gt;(source: Fig 1, from the paper)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F450%2F0%2ADivJX3GGqbeAf-fC" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F450%2F0%2ADivJX3GGqbeAf-fC" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;First author: Lingpeng Kong&lt;br&gt;
| &lt;a href="https://twitter.com/ikekong?lang=en" rel="noopener noreferrer"&gt;Twitter&lt;/a&gt; | &lt;a href="https://ikekonglp.github.io/" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; | &lt;a href="(https://ikekonglp.github.io/)"&gt;Website&lt;/a&gt; |&lt;/p&gt;




&lt;h3&gt;
  
  
  3. Mogrifier LSTM
&lt;/h3&gt;

&lt;p&gt;An LSTM extension with state-of-the-art language modelling results.&lt;br&gt;
&lt;em&gt;(TL;DR, from &lt;a href="https://openreview.net/group?id=ICLR.cc/2020/Conference" rel="noopener noreferrer"&gt;OpenReview.net&lt;/a&gt;)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;| &lt;a href="https://openreview.net/forum?id=SJe5P6EYvS" rel="noopener noreferrer"&gt;Paper&lt;/a&gt; |&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F600%2F0%2Az0B2sm3LF_SMEezI" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F600%2F0%2Az0B2sm3LF_SMEezI" alt="image"&gt;&lt;/a&gt;&lt;br&gt;
Mogrifier with 5 rounds of updates. The previous state h&lt;sub&gt;0&lt;/sub&gt; = h&lt;sub&gt;prev&lt;/sub&gt; is transformed linearly (dashed arrows), fed through a sigmoid, and gates x&lt;sub&gt;-1&lt;/sub&gt; = x in an elementwise manner, producing x&lt;sub&gt;1&lt;/sub&gt;. Conversely, the linearly transformed x&lt;sub&gt;1&lt;/sub&gt; gates h&lt;sub&gt;0&lt;/sub&gt; and produces h&lt;sub&gt;2&lt;/sub&gt;. After a number of repetitions of this mutual gating cycle, the last values of the h and x sequences are fed to an LSTM cell. The prev subscript of h is omitted to reduce clutter.&lt;br&gt;
&lt;em&gt;(source: Fig 1, from the paper)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F450%2F0%2AGijr_t1eTrT6quVZ" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F450%2F0%2AGijr_t1eTrT6quVZ" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;First author: Gábor Melis&lt;br&gt;
&lt;a href="https://twitter.com/gabormelis" rel="noopener noreferrer"&gt;Twitter&lt;/a&gt; | &lt;a href="https://www.linkedin.com/in/melisgabor/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; | &lt;a href="https://github.com/melisgl" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; | &lt;a href="http://quotenil.com/" rel="noopener noreferrer"&gt;Website&lt;/a&gt; |&lt;/p&gt;




&lt;h3&gt;
  
  
  4. High Fidelity Speech Synthesis with Adversarial Networks
&lt;/h3&gt;

&lt;p&gt;We introduce GAN-TTS, a Generative Adversarial Network for Text-to-Speech, which achieves Mean Opinion Score (MOS) 4.2.&lt;br&gt;
&lt;em&gt;(TL;DR, from &lt;a href="https://openreview.net/group?id=ICLR.cc/2020/Conference" rel="noopener noreferrer"&gt;OpenReview.net&lt;/a&gt;)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;| &lt;a href="https://openreview.net/forum?id=r1gfQgSFDr" rel="noopener noreferrer"&gt;Paper&lt;/a&gt; | &lt;a href="https://github.com/mbinkowski/DeepSpeechDistances" rel="noopener noreferrer"&gt;Code&lt;/a&gt; |&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F600%2F0%2AfQ7K26DNWdOL7Tm-" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F600%2F0%2AfQ7K26DNWdOL7Tm-" alt="image"&gt;&lt;/a&gt;&lt;br&gt;
Residual blocks used in the model. Convolutional layers have the same number of input and output channels and no dilation unless stated otherwise. h - hidden layer representation, l - linguistic features, z - noise vector, m - channel multiplier, m = 2 for downsampling blocks (i.e. if their downsample factor is greater than 1) and m = 1 otherwise; M - G's input channels, M = 2N in blocks 3, 6, 7, and M = N otherwise; size refers to kernel size.&lt;br&gt;
&lt;em&gt;(source: Fig 1, from the paper)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F450%2F1%2AymJxmlBbiuvd4p2rgRT0Nw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F450%2F1%2AymJxmlBbiuvd4p2rgRT0Nw.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;First author: Mikołaj Bińkowski&lt;br&gt;
| &lt;a href="https://www.linkedin.com/in/mikolaj-binkowski/?originalSubdomain=uk" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; | &lt;a href="https://github.com/mbinkowski" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; |&lt;/p&gt;




&lt;h3&gt;
  
  
  5. Reformer: The Efficient Transformer
&lt;/h3&gt;

&lt;p&gt;Efficient Transformer with locality-sensitive hashing and reversible layers.&lt;br&gt;
&lt;em&gt;(TL;DR, from &lt;a href="https://openreview.net/group?id=ICLR.cc/2020/Conference" rel="noopener noreferrer"&gt;OpenReview.net&lt;/a&gt;)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;| &lt;a href="https://openreview.net/forum?id=rkgNKkHtvB" rel="noopener noreferrer"&gt;Paper&lt;/a&gt; | &lt;a href="https://github.com/google/trax/tree/master/trax/models/reformer" rel="noopener noreferrer"&gt;Code&lt;/a&gt; |&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F600%2F0%2A06wKAXu043gtGfXU" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F600%2F0%2A06wKAXu043gtGfXU" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;An angular locality sensitive hash uses random rotations of spherically projected points to establish buckets by an argmax over signed axes projections. In this highly simplified 2D depiction, two points x and y are unlikely to share the same hash buckets (above) for the three different angular hashes unless their spherical projections are close to one another (below).&lt;br&gt;
&lt;em&gt;(source: Fig 1, from the paper)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Main authors&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F450%2F0%2AjmzbXWOxRg0k1VOw" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F450%2F0%2AjmzbXWOxRg0k1VOw" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Nikita Kitaev&lt;br&gt;
| &lt;a href="https://www.linkedin.com/in/nikitakitaev/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; | &lt;a href="https://github.com/nikitakit" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; | &lt;a href="https://kitaev.io/" rel="noopener noreferrer"&gt;Website&lt;/a&gt; |&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F450%2F0%2Aw1zDSuhVvz7FET4n" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F450%2F0%2Aw1zDSuhVvz7FET4n" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Łukasz Kaiser&lt;br&gt;
| &lt;a href="https://twitter.com/lukaszkaiser?lang=en" rel="noopener noreferrer"&gt;Twitter&lt;/a&gt; | &lt;a href="https://www.linkedin.com/in/lukaszkaiser/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; | &lt;a href="https://github.com/lukaszkaiser" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; |&lt;/p&gt;




&lt;h3&gt;
  
  
  6. DeFINE: Deep Factorized Input Token Embeddings for Neural Sequence Modeling
&lt;/h3&gt;

&lt;p&gt;DeFINE uses a deep, hierarchical, sparse network with new skip connections to learn better word embeddings efficiently.&lt;br&gt;
&lt;em&gt;(TL;DR, from &lt;a href="https://openreview.net/group?id=ICLR.cc/2020/Conference" rel="noopener noreferrer"&gt;OpenReview.net&lt;/a&gt;)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;| &lt;a href="https://openreview.net/forum?id=rJeXS04FPH" rel="noopener noreferrer"&gt;Paper&lt;/a&gt; |&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F600%2F0%2A3if_ghSXQ-Or99FN" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F600%2F0%2A3if_ghSXQ-Or99FN" alt="image"&gt;&lt;/a&gt;&lt;br&gt;
With DeFINE, Transformer-XL learns input (embedding) and output (classification) representations in low n-dimensional space rather than high m-dimensional space, thus reducing parameters significantly while having a minimal impact on the performance.&lt;br&gt;
&lt;em&gt;(source: Fig 1, from the paper)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F450%2F1%2A_YYFhHPem3SvDfKNz8zUzA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F450%2F1%2A_YYFhHPem3SvDfKNz8zUzA.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;First author: Sachin Mehta&lt;br&gt;
| &lt;a href="https://twitter.com/sacmehtauw" rel="noopener noreferrer"&gt;Twitter&lt;/a&gt; | &lt;a href="https://www.linkedin.com/in/sachinmehtangb/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; | &lt;a href="https://github.com/sacmehta" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; | &lt;a href="https://sacmehta.github.io/" rel="noopener noreferrer"&gt;Website&lt;/a&gt; |&lt;/p&gt;




&lt;h3&gt;
  
  
  7. Depth-Adaptive Transformer
&lt;/h3&gt;

&lt;p&gt;Sequence model that dynamically adjusts the amount of computation for each input.&lt;br&gt;
&lt;em&gt;(TL;DR, from &lt;a href="https://openreview.net/group?id=ICLR.cc/2020/Conference" rel="noopener noreferrer"&gt;OpenReview.net&lt;/a&gt;)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;| &lt;a href="https://openreview.net/forum?id=SJg7KhVKPH" rel="noopener noreferrer"&gt;Paper&lt;/a&gt; |&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F600%2F0%2AB6ozj_uaztXK5Lmh" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F600%2F0%2AB6ozj_uaztXK5Lmh" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Training regimes for decoder networks able to emit outputs at any layer. Aligned training optimizes all output classifiers C&lt;sub&gt;n&lt;/sub&gt; simultaneously, assuming all previous hidden states for the current layer are available. Mixed training samples M paths of random exits at which the model is assumed to have exited; missing previous hidden states are copied from below.&lt;br&gt;
&lt;em&gt;(source: Fig 1, from the paper)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F450%2F0%2AaeJhXYq_e--ECoKH" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F450%2F0%2AaeJhXYq_e--ECoKH" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;First author: Maha Elbayad&lt;br&gt;
| &lt;a href="https://twitter.com/melbayad" rel="noopener noreferrer"&gt;Twitter&lt;/a&gt; | &lt;a href="https://www.linkedin.com/in/elbayadm/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; | &lt;a href="https://github.com/elbayadm" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; | &lt;a href="http://elbayadm.github.io/" rel="noopener noreferrer"&gt;Website&lt;/a&gt; |&lt;/p&gt;




&lt;h3&gt;
  
  
  8. On Identifiability in Transformers
&lt;/h3&gt;

&lt;p&gt;We investigate the identifiability and interpretability of attention distributions and tokens within contextual embeddings in the self-attention based BERT model.&lt;br&gt;
&lt;em&gt;(TL;DR, from &lt;a href="https://openreview.net/group?id=ICLR.cc/2020/Conference" rel="noopener noreferrer"&gt;OpenReview.net&lt;/a&gt;)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;| &lt;a href="https://openreview.net/forum?id=BJg1f6EFDB" rel="noopener noreferrer"&gt;Paper&lt;/a&gt; |&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F600%2F0%2AVO5UxKp6p02Jr3bD" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F600%2F0%2AVO5UxKp6p02Jr3bD" alt="image"&gt;&lt;/a&gt;&lt;br&gt;
(a) Each point represents the Pearson correlation coefficient of effective attention and raw attention as a function of token length. (b) Raw attention vs. (c) effective attention, where each point represents the average (effective) attention of a given head to a token type.&lt;br&gt;
&lt;em&gt;(source: Fig 1, from the paper)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F450%2F0%2A3Tq8rFaA96Buo4r0" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F450%2F0%2A3Tq8rFaA96Buo4r0" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;First author: Gino Brunner&lt;br&gt;
| &lt;a href="https://twitter.com/ginozkz" rel="noopener noreferrer"&gt;Twitter&lt;/a&gt; | &lt;a href="https://www.linkedin.com/in/gino-brunner-7a3a6582/?originalSubdomain=ch" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; | &lt;a href="https://disco.ethz.ch/members/brunnegi" rel="noopener noreferrer"&gt;Website&lt;/a&gt; |&lt;/p&gt;




&lt;h3&gt;
  
  
  9. Mirror-Generative Neural Machine Translation
&lt;/h3&gt;

&lt;p&gt;Translation approaches known as Neural Machine Translation (NMT) models depend on the availability of large parallel corpora, constructed as language pairs. Here, a new method is proposed for translating in both directions using generative neural machine translation.&lt;/p&gt;

&lt;p&gt;| &lt;a href="https://openreview.net/forum?id=HkxQRTNYPH" rel="noopener noreferrer"&gt;Paper&lt;/a&gt; |&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F600%2F0%2A3L5JLcbCNSBF6xLU" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F600%2F0%2A3L5JLcbCNSBF6xLU" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The graphical model of MGNMT.&lt;br&gt;
&lt;em&gt;(source: Fig 1, from the paper)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F450%2F0%2AufVZqYFwXaAB4bOX" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F450%2F0%2AufVZqYFwXaAB4bOX" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;First author: Zaixiang Zheng&lt;br&gt;
| &lt;a href="https://twitter.com/zaixiang93" rel="noopener noreferrer"&gt;Twitter&lt;/a&gt; | &lt;a href="https://zhengzx-nlp.github.io/" rel="noopener noreferrer"&gt;Website&lt;/a&gt; |&lt;/p&gt;




&lt;h3&gt;
  
  
  10. FreeLB: Enhanced Adversarial Training for Natural Language Understanding
&lt;/h3&gt;

&lt;p&gt;Here, the authors propose a new algorithm, called FreeLB, which formulates a novel approach to adversarial training of language models.&lt;/p&gt;

&lt;p&gt;| &lt;a href="https://openreview.net/forum?id=BygzbyHFvB" rel="noopener noreferrer"&gt;Paper&lt;/a&gt; | &lt;a href="https://github.com/zhuchen03/FreeLB" rel="noopener noreferrer"&gt;Code&lt;/a&gt; |&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F600%2F0%2ABYo9D2wKf_Pdl3_Y" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F600%2F0%2ABYo9D2wKf_Pdl3_Y" alt="image"&gt;&lt;/a&gt;&lt;br&gt;
The algorithm's pseudo-code.&lt;br&gt;
&lt;em&gt;(source: Fig 1, from the paper)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F450%2F1%2A7XGs1UhLdiOuQ6Lc93Uubg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F450%2F1%2A7XGs1UhLdiOuQ6Lc93Uubg.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;First author: Chen Zhu&lt;br&gt;
| &lt;a href="https://www.linkedin.com/in/zhuchen917/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; | &lt;a href="https://github.com/zhuchen03" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; | &lt;a href="http://www.cs.umd.edu/~chenzhu/" rel="noopener noreferrer"&gt;Website&lt;/a&gt; |&lt;/p&gt;




&lt;h1&gt;
  
  
  Summary
&lt;/h1&gt;

&lt;p&gt;The depth and breadth of the ICLR publications are quite inspiring. This post focuses on "Natural Language Processing", one of the main areas discussed at the conference. According to &lt;a href="https://www.analyticsvidhya.com/blog/2020/05/key-takeaways-iclr-2020/" rel="noopener noreferrer"&gt;this analysis&lt;/a&gt;, these areas include:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Deep learning&lt;/li&gt;
&lt;li&gt;Reinforcement learning&lt;/li&gt;
&lt;li&gt;Generative models&lt;/li&gt;
&lt;li&gt;Natural Language Processing/Understanding&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In order to create a more complete overview of the top papers at ICLR, we have built a series of posts, each focused on one of the topics mentioned above. This is the last one, so you may want to check out the others for the full picture.&lt;/p&gt;

&lt;p&gt;We would be happy to extend our list, so feel free to share other interesting NLP/NLU papers with us.&lt;/p&gt;

&lt;p&gt;In the meantime - happy reading!&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This article was originally posted on the &lt;a href="https://neptune.ai/blog" rel="noopener noreferrer"&gt;Neptune blog&lt;/a&gt; where you can find more in-depth articles for machine learning practitioners.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>datascience</category>
      <category>python</category>
    </item>
    <item>
      <title>Tabular Data Binary Classification: All Tips and Tricks from 5 Kaggle Competitions</title>
      <dc:creator>Kamil A. Kaczmarek</dc:creator>
      <pubDate>Wed, 22 Jul 2020 09:37:19 +0000</pubDate>
      <link>https://dev.to/kamil_k7k/tabular-data-binary-classification-all-tips-and-tricks-from-5-kaggle-competitions-1aim</link>
      <guid>https://dev.to/kamil_k7k/tabular-data-binary-classification-all-tips-and-tricks-from-5-kaggle-competitions-1aim</guid>
      <description>&lt;p&gt;&lt;em&gt;This article was originally written by &lt;a href="https://www.linkedin.com/in/shahules/"&gt;Shahul Es&lt;/a&gt; and posted on the &lt;a href="https://neptune.ai/blog/tabular-data-binary-classification-tips-and-tricks-from-5-kaggle-competitions?utm_source=devto&amp;amp;utm_medium=crosspost&amp;amp;utm_campaign=blog-tabular-data-binary-classification-tips-and-tricks-from-5-kaggle-competitions"&gt;Neptune blog&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;In this article, I will discuss some great tips and tricks to improve the performance of your structured data binary classification model. These tricks are obtained from solutions of some of Kaggle’s top tabular data competitions. Without further ado, let’s begin.&lt;/p&gt;

&lt;p&gt;These are the five competitions that I have gone through to create this article:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.kaggle.com/c/home-credit-default-risk/"&gt;Home credit default risk&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.kaggle.com/c/santander-customer-transaction-prediction/notebooks"&gt;Santander Customer Transaction Prediction&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.kaggle.com/c/vsb-power-line-fault-detection/overview/evaluation"&gt;VSB Power Line Fault Detection&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.kaggle.com/c/microsoft-malware-prediction/overview"&gt;Microsoft Malware Prediction&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.kaggle.com/c/ieee-fraud-detection/overview/evaluation/"&gt;IEEE-CIS Fraud Detection&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Dealing with larger datasets
&lt;/h1&gt;

&lt;p&gt;One issue you might face in any machine learning competition is the size of your dataset. If your data is large (3 GB+ is already a lot for Kaggle kernels and basic laptops), you may find it difficult to load and process it with limited resources. Here are links to some articles and kernels that I have found useful in such situations.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Faster &lt;a href="https://www.kaggle.com/c/home-credit-default-risk/discussion/59575"&gt;data loading with pandas&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Data compression techniques to &lt;a href="https://www.kaggle.com/nickycan/compress-70-of-dataset"&gt;reduce the size of data by 70%&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Optimize the memory by &lt;a href="https://www.kaggle.com/shrutimechlearn/large-data-loading-trick-with-ms-malware-data"&gt;reducing the size of some attributes&lt;/a&gt;. &lt;/li&gt;
&lt;li&gt;Use open-source libraries such as &lt;a href="https://www.kaggle.com/yuliagm/how-to-work-with-big-datasets-on-16g-ram-dask"&gt;Dask to read and manipulate the data&lt;/a&gt;, it performs parallel computing and saves up memory space. &lt;/li&gt;
&lt;li&gt;Use &lt;a href="https://github.com/rapidsai/cudf"&gt;cudf&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Convert data to &lt;a href="https://arrow.apache.org/docs/python/parquet.html"&gt;parquet&lt;/a&gt; format.&lt;/li&gt;
&lt;li&gt;Converting data to &lt;a href="https://medium.com/@snehotosh.banerjee/feather-a-fast-on-disk-format-for-r-and-python-data-frames-de33d0516b03"&gt;feather&lt;/a&gt; format.&lt;/li&gt;
&lt;li&gt;Reducing memory usage for &lt;a href="https://www.kaggle.com/mjbahmani/reducing-memory-size-for-ieee"&gt;optimizing RAM&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
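&lt;p&gt;To make the "reduce the size of some attributes" trick concrete, here is a minimal sketch of the idea behind the memory-reduction kernels linked above (standard library only; the helper name is illustrative, real kernels operate on pandas dtypes):&lt;/p&gt;

```python
# Sketch of the dtype-downcasting idea: inspect a column's value range
# and pick the narrowest signed integer type that can hold it, instead
# of the 8-byte int64 pandas uses by default.
INT_RANGES = [
    ("int8", -2**7, 2**7 - 1),
    ("int16", -2**15, 2**15 - 1),
    ("int32", -2**31, 2**31 - 1),
    ("int64", -2**63, 2**63 - 1),
]

def smallest_int_dtype(values):
    """Return the narrowest signed integer dtype name that fits all values."""
    lo, hi = min(values), max(values)
    for name, dtype_min, dtype_max in INT_RANGES:
        if dtype_min <= lo and hi <= dtype_max:
            return name
    raise ValueError("values exceed the int64 range")

print(smallest_int_dtype([0, 3, 120]))   # int8: 1 byte per entry instead of 8
print(smallest_int_dtype([0, 40_000]))   # int32
```

&lt;p&gt;With pandas you would apply the same logic per column via &lt;code&gt;astype&lt;/code&gt;, which is exactly what the linked kernels automate.&lt;/p&gt;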

&lt;h1&gt;
  
  
  Data exploration
&lt;/h1&gt;

&lt;p&gt;Data exploration always helps to better understand the data and gain insights from it. Before starting to develop machine learning models, top competitors always do a lot of exploratory data analysis. This helps with feature engineering and data cleaning.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;EDA for Microsoft &lt;a href="https://www.kaggle.com/youhanlee/my-eda-i-want-to-see-all"&gt;malware detection&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Time Series &lt;a href="https://www.kaggle.com/cdeotte/time-split-validation-malware-0-68"&gt;EDA for malware detection&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Complete &lt;a href="https://www.kaggle.com/codename007/home-credit-complete-eda-feature-importance"&gt;EDA for home credit loan prediction&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Complete &lt;a href="https://www.kaggle.com/gpreda/santander-eda-and-prediction"&gt;EDA for Santader prediction&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;EDA for &lt;a href="https://www.kaggle.com/go1dfish/basic-eda"&gt;VSB Power Line Fault Detection&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Data preparation
&lt;/h1&gt;

&lt;p&gt;After data exploration, the first thing to do is to use those insights to prepare the data, tackling issues like class imbalance, encoding of categorical data, and so on. Let’s see the methods used to do it.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Methods to &lt;a href="https://www.kaggle.com/shahules/tackling-class-imbalance"&gt;tackle class imbalance&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Data augmentation by &lt;a href="https://machinelearningmastery.com/smote-oversampling-for-imbalanced-classification/"&gt;Synthetic Minority Oversampling Technique&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://machinelearningmastery.com/smote-oversampling-for-imbalanced-classification/"&gt;Fast inplace shuffle for augmentation&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Finding &lt;a href="https://www.kaggle.com/yag320/list-of-fake-samples-and-public-private-lb-split"&gt;synthetic samples in the dataset&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.kaggle.com/jackvial/dwt-signal-denoising"&gt;Signal denoising&lt;/a&gt; used in signal processing competitions.&lt;/li&gt;
&lt;li&gt;Finding &lt;a href="https://www.kaggle.com/jpmiller/patterns-of-missing-data"&gt;patterns of missing data&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Methods to handle &lt;a href="https://towardsdatascience.com/6-different-ways-to-compensate-for-missing-values-data-imputation-with-examples-6022d9ca0779"&gt;missing data&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;An overview of various &lt;a href="https://www.kaggle.com/shahules/an-overview-of-encoding-techniques"&gt;encoding techniques for categorical data&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Building &lt;a href="https://www.kaggle.com/c/home-credit-default-risk/discussion/64598"&gt;model to predict missing values&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Random &lt;a href="https://www.kaggle.com/brandenkmurray/randomly-shuffled-data-also-work"&gt;shuffling of data&lt;/a&gt; to create new synthetic training set.&lt;/li&gt;
&lt;/ul&gt;
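&lt;p&gt;Besides resampling tricks like SMOTE, a common way to tackle class imbalance is to reweight the loss by inverse class frequency. A minimal sketch (the formula matches scikit-learn's "balanced" heuristic; the helper name is ours):&lt;/p&gt;

```python
from collections import Counter

def class_weights(labels):
    """Inverse-frequency weights: weight[c] = n_samples / (n_classes * count[c]).
    Rarer classes get larger weights, so the loss pays more attention to
    minority-class mistakes."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * cnt) for c, cnt in counts.items()}

# 90/10 imbalance: the minority class is weighted 9x the majority class.
weights = class_weights([0] * 90 + [1] * 10)
print(weights)
```

&lt;p&gt;Most gradient-boosting libraries accept such weights directly, e.g. via LightGBM's &lt;code&gt;class_weight&lt;/code&gt; or &lt;code&gt;scale_pos_weight&lt;/code&gt; parameters.&lt;/p&gt;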

&lt;h1&gt;
  
  
  Feature engineering
&lt;/h1&gt;

&lt;p&gt;Next, you can check the most popular features and feature engineering techniques used in these top Kaggle competitions. The feature engineering part varies from problem to problem, depending on the domain.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Target &lt;a href="https://medium.com/@pouryaayria/k-fold-target-encoding-dfe9a594874b"&gt;encoding cross validation&lt;/a&gt; for better encoding.&lt;/li&gt;
&lt;li&gt;Entity embedding to &lt;a href="https://www.kaggle.com/abhishek/entity-embeddings-to-handle-categories"&gt;handle categories&lt;/a&gt;. &lt;/li&gt;
&lt;li&gt;Encoding &lt;a href="https://www.kaggle.com/avanwyk/encoding-cyclical-features-for-deep-learning"&gt;cyclic features for deep learning&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Manual &lt;a href="https://www.kaggle.com/willkoehrsen/introduction-to-manual-feature-engineering"&gt;feature engineering methods&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Automated feature engineering techniques &lt;a href="https://www.kaggle.com/willkoehrsen/automated-feature-engineering-basics"&gt;using featuretools&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Top hand-crafted features used in &lt;a href="https://www.kaggle.com/sanderf/7th-place-solution-microsoft-malware-prediction"&gt;Microsoft malware detection&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Denoising NN for &lt;a href="https://towardsdatascience.com/applied-deep-learning-part-3-autoencoders-1c083af4d798"&gt;feature extraction&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Feature engineering &lt;a href="https://www.kaggle.com/cdeotte/rapids-feature-engineering-fraud-0-96/"&gt;using RAPIDS framework&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Things to remember while processing &lt;a href="https://www.kaggle.com/c/ieee-fraud-detection/discussion/108575"&gt;features using LGBM&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.kaggle.com/c/home-credit-default-risk/discussion/64593"&gt;Lag features and moving averages&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://medium.com/machine-learning-researcher/dimensionality-reduction-pca-and-lda-6be91734f567"&gt;Principal component analysis&lt;/a&gt; for dimensionality reduction.&lt;/li&gt;
&lt;li&gt;LDA for &lt;a href="https://medium.com/machine-learning-researcher/dimensionality-reduction-pca-and-lda-6be91734f567"&gt;dimensionality reduction&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Best hand-crafted LGBM features for &lt;a href="https://www.kaggle.com/c/microsoft-malware-prediction/discussion/85157"&gt;Microsoft malware detection&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Generating &lt;a href="https://www.kaggle.com/philippsinger/frequency-features-without-test-data-information"&gt;frequency features&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Dropping variables with &lt;a href="https://www.kaggle.com/bogorodvo/lightgbm-baseline-model-using-sparse-matrix"&gt;different train and test distribution&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.kaggle.com/c/home-credit-default-risk/discussion/64593"&gt;Aggregate time series features&lt;/a&gt; for home credit competition.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.kaggle.com/c/home-credit-default-risk/discussion/64593"&gt;Time Series features&lt;/a&gt; used in home credit default risk.&lt;/li&gt;
&lt;li&gt;Scale, standardize, and &lt;a href="https://towardsdatascience.com/scale-standardize-or-normalize-with-scikit-learn-6ccc7d176a02"&gt;normalize with sklearn&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Handcrafted features for &lt;a href="https://www.kaggle.com/c/home-credit-default-risk/discussion/57750"&gt;Home default risk competition&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Hand-crafted &lt;a href="https://www.kaggle.com/c/santander-customer-transaction-prediction/discussion/89070"&gt;features used in Santander Transaction Prediction&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
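&lt;p&gt;Target encoding with cross-validation (the first bullet above) is easy to get wrong in a way that leaks labels. A minimal, pandas-free sketch of the out-of-fold idea, with deterministic round-robin folds for illustration:&lt;/p&gt;

```python
def kfold_target_encode(categories, targets, n_folds=5, prior=0.5):
    """Out-of-fold target encoding: each row's category is replaced by the
    mean target computed on the *other* folds, so a row never sees its own
    label through the encoding."""
    n = len(categories)
    fold_of = [i % n_folds for i in range(n)]  # deterministic folds for the demo
    encoded = [prior] * n                      # prior covers unseen categories
    for fold in range(n_folds):
        sums, counts = {}, {}
        for i in range(n):                     # fit on out-of-fold rows only
            if fold_of[i] != fold:
                c = categories[i]
                sums[c] = sums.get(c, 0.0) + targets[i]
                counts[c] = counts.get(c, 0) + 1
        for i in range(n):                     # encode the held-out rows
            if fold_of[i] == fold and categories[i] in counts:
                c = categories[i]
                encoded[i] = sums[c] / counts[c]
    return encoded

cats = ["a", "a", "a", "a", "b", "b", "b", "b"]
ys   = [1, 1, 0, 0, 1, 0, 0, 0]
print(kfold_target_encode(cats, ys, n_folds=2))
```

&lt;p&gt;In practice you would also shuffle before splitting and smooth rare categories toward the global mean, but the leak-prevention structure stays the same.&lt;/p&gt;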

&lt;h1&gt;
  
  
  Feature selection
&lt;/h1&gt;

&lt;p&gt;After generating many features from your data, you need to decide which of them to use in order to get maximum performance out of your model. This step also includes identifying the impact each feature has on your model. Let’s see some of the most popular feature selection methods.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Six ways to do &lt;a href="https://www.kaggle.com/sz8416/6-ways-for-feature-selection"&gt;feature selection using sklearn&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.kaggle.com/c/ieee-fraud-detection/discussion/107877#latest-635386"&gt;Permutation feature importance&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.kaggle.com/tunguz/adversarial-ieee/"&gt;Adversarial feature validation&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Feature selection using &lt;a href="https://www.kaggle.com/ogrellier/feature-selection-with-null-importances"&gt;null importance&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Tree explainer using &lt;a href="https://github.com/slundberg/shap"&gt;SHAP&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;DeepNN explainer using &lt;a href="https://github.com/slundberg/shap"&gt;SHAP&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
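&lt;p&gt;The intuition behind permutation feature importance fits in a few lines: shuffle one column at a time and measure how much the score drops. A standard-library sketch, with a toy scoring function standing in for a fitted model:&lt;/p&gt;

```python
import random

def permutation_importance(score_fn, X, n_repeats=5, seed=0):
    """Shuffle one column at a time and record the average drop in score.
    Columns the model relies on cause large drops; irrelevant columns
    cause none. `score_fn(X)` stands in for scoring a fitted model."""
    rng = random.Random(seed)
    base_score = score_fn(X)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            col = [row[j] for row in X]
            rng.shuffle(col)
            X_perm = [row[:j] + [col[i]] + row[j + 1:] for i, row in enumerate(X)]
            drops.append(base_score - score_fn(X_perm))
        importances.append(sum(drops) / n_repeats)
    return importances

# Toy setup: the "model" predicts y from column 0; column 1 is constant.
X = [[float(i), 7.0] for i in range(20)]
y = [row[0] for row in X]
score = lambda M: -sum((m[0] - yi) ** 2 for m, yi in zip(M, y)) / len(M)

imp = permutation_importance(score, X)
print(imp)  # column 0 gets a large importance; the constant column gets zero
```

&lt;p&gt;On real estimators, scikit-learn's &lt;code&gt;sklearn.inspection.permutation_importance&lt;/code&gt; implements the same idea properly.&lt;/p&gt;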

&lt;h1&gt;
  
  
  Modeling
&lt;/h1&gt;

&lt;p&gt;After handcrafting and selecting your features, you should choose the right machine learning algorithm to make your predictions. Here is a collection of some of the most-used ML models in structured data classification challenges.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html"&gt;Random forest classifier&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;XGBoost : &lt;a href="https://xgboost.readthedocs.io/en/latest/"&gt;Gradient boosted decision trees&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://lightgbm.readthedocs.io/en/latest/"&gt;LightGBM&lt;/a&gt; for distributed and faster training.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://catboost.ai/docs/concepts/about.html"&gt;CatBoost&lt;/a&gt; to handle categorical data.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.kaggle.com/cdeotte/modified-naive-bayes-santander-0-899"&gt;Naive Bayes&lt;/a&gt; classifier.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.kaggle.com/blackblitz/gaussian-naive-bayes"&gt;Gaussian Naive Bayes&lt;/a&gt; model.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.kaggle.com/nawidsayed/lightgbm-and-cnn-3rd-place-solution/notebook"&gt;LGBM + CNN  model&lt;/a&gt; used in 3rd place solution of Santander Customer Transaction Prediction&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.kaggle.com/mathormad/knowledge-distillation-with-nn-rankgauss"&gt;Knowledge distillation&lt;/a&gt; in Neural Network.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://medium.com/@dhirajreddy13/factorization-machines-and-follow-the-regression-leader-for-dummies-7657652dce69"&gt;Follow the regularized leader&lt;/a&gt; method.&lt;/li&gt;
&lt;li&gt;Comparison between &lt;a href="https://www.kaggle.com/c/home-credit-default-risk/discussion/60921"&gt;LGB boosting methods&lt;/a&gt; (goss, gbdt and dart).&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.kaggle.com/abazdyrev/keras-nn-focal-loss-experiments"&gt;NN + focal loss experiment&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Keras &lt;a href="https://www.kaggle.com/ryches/keras-nn-starter-w-time-series-split"&gt;NN with timeseries splitter&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;5th place &lt;a href="https://www.kaggle.com/c/santander-customer-transaction-prediction/discussion/88929"&gt;NN architecture with code&lt;/a&gt; for Santander Transaction prediction.&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Hyperparameter tuning
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;LGBM &lt;a href="https://www.kaggle.com/mlisovyi/lightgbm-hyperparameter-optimisation-lb-0-761"&gt;hyperparameter tuning&lt;/a&gt; methods.&lt;/li&gt;
&lt;li&gt;Automated &lt;a href="https://www.kaggle.com/willkoehrsen/automated-model-tuning"&gt;model tuning&lt;/a&gt; methods.&lt;/li&gt;
&lt;li&gt;Parameter tuning with &lt;a href="https://www.kaggle.com/bigironsphere/parameter-tuning-in-one-function-with-hyperopt"&gt;hyperopt&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;a href="http://krasserm.github.io/2018/03/21/bayesian-optimization/"&gt;Bayesian optimization&lt;/a&gt; for hyperparameter tuning.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.kaggle.com/nicapotato/gpyopt-hyperparameter-optimisation-gpu-lgbm"&gt;Gpyopt Hyperparameter Optimisation&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
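&lt;p&gt;Before reaching for Bayesian optimization, plain random search is a surprisingly strong baseline and fits in a few lines. A sketch with a toy objective standing in for cross-validated model performance (the parameter grid below is purely illustrative):&lt;/p&gt;

```python
import random

def random_search(objective, space, n_trials=100, seed=42):
    """Sample hyperparameters from `space` (name -> candidate values),
    evaluate each draw, and keep the best-scoring configuration."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {name: rng.choice(values) for name, values in space.items()}
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Toy objective standing in for, say, cross-validated AUC of an LGBM model.
space = {
    "num_leaves": [15, 31, 63, 127],
    "learning_rate": [0.3, 0.1, 0.05, 0.01],
}
objective = lambda p: -abs(p["num_leaves"] - 63) / 63 - abs(p["learning_rate"] - 0.05)

best, score = random_search(objective, space)
print(best, score)
```

&lt;p&gt;Libraries like hyperopt and GPyOpt replace the uniform sampling above with a model of the objective, which pays off when each trial is expensive.&lt;/p&gt;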

&lt;h1&gt;
  
  
  Evaluation
&lt;/h1&gt;

&lt;p&gt;Choosing a suitable validation strategy is very important to avoid huge shake-ups, or poor performance of the model on the private test set. &lt;/p&gt;

&lt;p&gt;The traditional 80:20 split doesn’t work in many cases. Cross-validation usually estimates model performance better than a single train-validation split. &lt;/p&gt;

&lt;p&gt;There are different variations of KFold cross-validation, such as group k-fold, and the right one should be chosen accordingly.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://machinelearningmastery.com/k-fold-cross-validation/"&gt;K-fold cross-validation&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.StratifiedKFold.html"&gt;Stratified KFold cross-validation&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GroupKFold.html"&gt;Group KFold&lt;/a&gt;&lt;/li&gt;
&lt;li&gt; &lt;a href="https://neptune.ai/blog/tabular-data-binary-classification-tips-and-tricks-from-5-kaggle-competitions"&gt;Adversarial validation&lt;/a&gt; to check if train and test distributions are similar or not.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.kaggle.com/cdeotte/time-split-validation-malware-0-68"&gt;Time Series split&lt;/a&gt; validation.&lt;/li&gt;
&lt;li&gt;Extensive &lt;a href="https://www.kaggle.com/mpearmain/extended-timeseriessplitter"&gt;time series splitter&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
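&lt;p&gt;The point of stratified K-fold is easiest to see in a small sketch: assign samples to folds class by class, so every fold keeps the overall class ratio (standard library only; scikit-learn's &lt;code&gt;StratifiedKFold&lt;/code&gt; does this properly, with optional shuffling):&lt;/p&gt;

```python
from collections import defaultdict

def stratified_folds(labels, n_folds=5):
    """Round-robin the indices of each class across folds, so every fold
    preserves the overall class ratio (important for imbalanced data)."""
    folds = [[] for _ in range(n_folds)]
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    for indices in by_class.values():
        for pos, idx in enumerate(indices):
            folds[pos % n_folds].append(idx)
    return folds

labels = [0] * 80 + [1] * 20            # 80/20 imbalance
folds = stratified_folds(labels)
# every fold holds 16 negatives and 4 positives, matching the 80/20 ratio
print([sum(labels[i] for i in f) for f in folds])
```

&lt;p&gt;A plain, unstratified split on the same data could easily leave a fold with almost no positives, which makes metrics like AUC unstable.&lt;/p&gt;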

&lt;p&gt;&lt;em&gt;Note:&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;There are various metrics that you can use to evaluate the performance of your tabular models. A bunch of useful &lt;a href="https://neptune.ai/blog/evaluation-metrics-binary-classification?utm_source=hackernoon&amp;amp;utm_medium=crosspost&amp;amp;utm_campaign=blog-tabular-data-binary-classification-tips-and-tricks-from-5-kaggle-competitions"&gt;classification metrics are listed and explained here&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Other training tricks
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://www.kaggle.com/vinhnguyen/gpu-acceleration-for-lightgbm"&gt;GPU acceleration&lt;/a&gt; for LGBM.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.kaggle.com/c/jigsaw-unintended-bias-in-toxicity-classification/discussion/89498"&gt;Use the GPU efficiently&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.kaggle.com/c/jigsaw-unintended-bias-in-toxicity-classification/discussion/89498"&gt;Free keras memory&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://machinelearningmastery.com/save-load-keras-deep-learning-models/"&gt;Save and load models&lt;/a&gt; to save runtime and memory.&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Ensemble
&lt;/h1&gt;

&lt;p&gt;In a competitive environment, you won’t get to the top of the leaderboard without ensembling. Selecting the appropriate ensembling/stacking method is very important to get the maximum performance out of your models. &lt;/p&gt;

&lt;p&gt;Let’s see some of the popular ensembling techniques used in kaggle competitions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://machinelearningmastery.com/weighted-average-ensemble-for-deep-learning-neural-networks/"&gt;Weighted average ensemble&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://machinelearningmastery.com/stacking-ensemble-for-deep-learning-neural-networks/"&gt;Stacked generalization ensemble&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge/discussion/52224"&gt;Out of folds predictions&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.kaggle.com/suicaokhoailang/blending-with-linear-regression-0-688-lb"&gt;Blending with linear regression&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Use &lt;a href="https://github.com/optuna/optuna"&gt;optuna&lt;/a&gt; to determine blending weights.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://medium.com/data-design/reaching-the-depths-of-power-geometric-ensembling-when-targeting-the-auc-metric-2f356ea3250e"&gt;Power average ensemble&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.kaggle.com/c/jigsaw-unintended-bias-in-toxicity-classification/discussion/100661"&gt;Power 3.5 blending strategy&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.kaggle.com/c/microsoft-malware-prediction/discussion/80368#478088"&gt;Blending diverse models&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Different &lt;a href="https://www.kaggle.com/stocks/stacking-higher-and-higher"&gt;stacking approaches&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.kaggle.com/lucaskg/20th-solution-part-2-auc-weight-optimization"&gt;AUC weight optimization&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.kaggle.com/paulorzp/gmean-of-low-correlation-lb-0-952x"&gt;Geometric mean&lt;/a&gt; for low correlation predictions.&lt;/li&gt;
&lt;li&gt;Weighted &lt;a href="https://www.kaggle.com/shaz13/magic-of-weighted-average-rank-0-80/input"&gt;rank average&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
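&lt;p&gt;Rank averaging is worth a closer look, because it explains why blending works even when models output scores on different scales: AUC only depends on ordering, so raw scores can be replaced by normalised ranks before averaging. A standard-library sketch (ties are ignored for brevity):&lt;/p&gt;

```python
def rank_average(predictions, weights=None):
    """Weighted rank averaging: map each model's scores to ranks in [0, 1],
    then take a weighted mean. Safe for blending models whose raw outputs
    live on different scales, as long as the metric is rank-based (AUC)."""
    n_models, n = len(predictions), len(predictions[0])
    weights = weights or [1.0] * n_models
    total = sum(weights)
    blended = [0.0] * n
    for preds, w in zip(predictions, weights):
        order = sorted(range(n), key=lambda i: preds[i])
        for rank, i in enumerate(order):
            blended[i] += (w / total) * (rank / (n - 1))
    return blended

model_a = [0.10, 0.40, 0.90, 0.20]   # calibrated probabilities
model_b = [2.0, 9.0, 30.0, 1.0]      # uncalibrated margins from another model
print(rank_average([model_a, model_b]))
```

&lt;p&gt;A plain weighted average of the two lists above would be dominated by model_b's larger numbers; rank averaging treats both models equally.&lt;/p&gt;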

&lt;h1&gt;
  
  
  Final thoughts
&lt;/h1&gt;

&lt;p&gt;In this article, you saw many popular and effective ways to improve the performance of your tabular data binary classification model. Hopefully, you will find them useful in your projects.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This article was originally written by &lt;a href="https://www.linkedin.com/in/shahules/"&gt;Shahul Es&lt;/a&gt; and posted on the &lt;a href="https://neptune.ai/blog/tabular-data-binary-classification-tips-and-tricks-from-5-kaggle-competitions?utm_source=devto&amp;amp;utm_medium=crosspost&amp;amp;utm_campaign=blog-tabular-data-binary-classification-tips-and-tricks-from-5-kaggle-competitions"&gt;Neptune blog&lt;/a&gt;, where you can find more in-depth articles for machine learning practitioners.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>datascience</category>
      <category>python</category>
    </item>
    <item>
      <title>How to Keep Track of PyTorch Lightning Experiments with Neptune</title>
      <dc:creator>Kamil A. Kaczmarek</dc:creator>
      <pubDate>Thu, 16 Jul 2020 14:57:37 +0000</pubDate>
      <link>https://dev.to/kamil_k7k/how-to-keep-track-of-pytorch-lightning-experiments-with-neptune-13h3</link>
      <guid>https://dev.to/kamil_k7k/how-to-keep-track-of-pytorch-lightning-experiments-with-neptune-13h3</guid>
      <description>&lt;p&gt;&lt;em&gt;This article was originally written by &lt;a href="https://www.linkedin.com/in/jakub-czakon-2b797b69/"&gt;Jakub Czakon&lt;/a&gt; and posted on the &lt;a href="https://neptune.ai/blog/pytorch-lightning-neptune-integration?utm_source=devto&amp;amp;utm_medium=crosspost&amp;amp;utm_campaign=blog-pytorch-lightning-neptune-integration"&gt;Neptune blog&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;Working with PyTorch Lightning and wondering which logger should you choose to keep track of your experiments?&lt;/p&gt;

&lt;p&gt;Thinking of using PyTorch Lightning to structure your Deep Learning code and wouldn't mind learning about its logging functionality?&lt;/p&gt;

&lt;p&gt;Didn't know that Lightning has a pretty awesome Neptune integration?&lt;/p&gt;

&lt;p&gt;This article is (very likely) for you.&lt;/p&gt;

&lt;h1&gt;
  
  
  Why PyTorch Lightning and Neptune?
&lt;/h1&gt;

&lt;p&gt;If you've never heard of it, &lt;a href="https://github.com/PyTorchLightning/pytorch-lightning"&gt;PyTorch Lightning&lt;/a&gt; is a very lightweight wrapper on top of PyTorch, which is more like a coding standard than a framework. The format allows you to get rid of a ton of boilerplate code while keeping it easy to follow.&lt;/p&gt;

&lt;p&gt;The result is a framework that gives researchers, students, and production teams the ultimate flexibility to try crazy ideas without having to learn yet another framework while automating away all the engineering details.&lt;/p&gt;

&lt;p&gt;Some great features that you can get out-of-the-box are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Train on CPU, GPU or TPUs without changing your code,&lt;/li&gt;
&lt;li&gt;Trivial multi-GPU and multi-node training,&lt;/li&gt;
&lt;li&gt;Trivial 16-bit precision support,&lt;/li&gt;
&lt;li&gt;Built-in performance profiler (&lt;code&gt;Trainer(profile=True)&lt;/code&gt;).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;and a &lt;a href="https://pytorch-lightning.readthedocs.io/en/latest/"&gt;ton of other great functionalities&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;But with this great power of running experiments easily and flexibility in tweaking anything you want comes a problem.&lt;/p&gt;

&lt;p&gt;How to keep track of all the changes like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;losses and metrics,&lt;/li&gt;
&lt;li&gt;hyperparameters&lt;/li&gt;
&lt;li&gt;model binaries&lt;/li&gt;
&lt;li&gt;validation predictions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;and other things that will help you organize your experimentation process?&lt;/p&gt;

&lt;p&gt;Fortunately, PyTorch lightning gives you an option to easily connect loggers to the &lt;code&gt;pl.Trainer&lt;/code&gt; and one of the supported loggers that can track all of the things mentioned before (and many others) is the &lt;code&gt;NeptuneLogger&lt;/code&gt; which saves your experiments in… you guessed it &lt;a href="https://docs.neptune.ai/integrations/pytorch_lightning.html?utm_source=devto&amp;amp;utm_medium=crosspost&amp;amp;utm_campaign=blog-pytorch-lightning-neptune-integration"&gt;Neptune&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Neptune not only tracks your experiment artifacts but also:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;lets you monitor everything live,&lt;/li&gt;
&lt;li&gt;gives you a nice UI where you can filter, group and compare various experiment runs,&lt;/li&gt;
&lt;li&gt;lets you access the experiment data you logged programmatically, from a Python script or Jupyter Notebook.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The best part is that this integration really is trivial to use.&lt;/p&gt;

&lt;p&gt;Let me show you how it looks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;:&lt;br&gt;
You can also check out this &lt;a href="https://colab.research.google.com/github/neptune-ai/neptune-colab-examples/blob/master/pytorch_lightning-integration.ipynb"&gt;colab notebook&lt;/a&gt; and play with the examples we will talk about.&lt;/p&gt;
&lt;h1&gt;
  
  
  Basic Integration
&lt;/h1&gt;

&lt;p&gt;In the simplest case you just create the &lt;code&gt;NeptuneLogger&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;pytorch_lightning.logging.neptune&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;NeptuneLogger&lt;/span&gt;
&lt;span class="n"&gt;neptune_logger&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;NeptuneLogger&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"ANONYMOUS"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;project_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"shared/pytorch-lightning-integration"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;and pass it to the logger argument of &lt;code&gt;Trainer&lt;/code&gt; and fit your model.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;pytorch_lightning&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Trainer&lt;/span&gt;
&lt;span class="n"&gt;trainer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Trainer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;neptune_logger&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;trainer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;By doing so you get your:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Metrics and losses logged and charts created,&lt;/li&gt;
&lt;li&gt;Hyperparameters saved (if defined via lightning &lt;code&gt;hparams&lt;/code&gt;),&lt;/li&gt;
&lt;li&gt;Hardware utilization logged&lt;/li&gt;
&lt;li&gt;Git info and execution script logged&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Check out &lt;a href="https://ui.neptune.ai/shared/pytorch-lightning-integration/e/PYTOR-121/details?utm_source=medium&amp;amp;utm_medium=crosspost&amp;amp;utm_campaign=blog-pytorch-lightning-neptune-integration&amp;amp;utm_source=devto&amp;amp;utm_medium=crosspost&amp;amp;utm_campaign=blog-pytorch-lightning-neptune-integration"&gt;this experiment&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--a6bW2jJq--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/600/1%2A5zDMu_mVK1sZ3eXSXHHZ4A.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--a6bW2jJq--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/600/1%2A5zDMu_mVK1sZ3eXSXHHZ4A.png" alt="image"&gt;&lt;/a&gt;&lt;br&gt;
You can monitor your experiments, compare them, and share them with others.&lt;br&gt;
Not too bad for a 4-liner.&lt;br&gt;
But with just a bit more effort you can get a lot more.&lt;/p&gt;
&lt;h1&gt;
  
  
  Advanced Options
&lt;/h1&gt;

&lt;p&gt;Neptune gives you a lot of customization options and you can simply log more experiment-specific things, like image predictions, model weights, performance charts and more.&lt;/p&gt;

&lt;p&gt;All of that functionality is available for Lightning users and in the next sections I will show you how to leverage Neptune to the fullest.&lt;/p&gt;
&lt;h3&gt;
  
  
  Logging extra information at NeptuneLogger creation
&lt;/h3&gt;

&lt;p&gt;When you are creating the logger you can log additional useful information:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;code: snapshot scripts, jupyter notebooks, config files, and more&lt;/li&gt;
&lt;li&gt;hyperparameters: log learning rate, number of epochs and other things (if you are using the lightning &lt;code&gt;hparams&lt;/code&gt; object it will be logged automatically)&lt;/li&gt;
&lt;li&gt;properties: log data locations, data versions, or other things&lt;/li&gt;
&lt;li&gt;tags: add tags like "resnet50" or "no-augmentation" to organize your runs.&lt;/li&gt;
&lt;li&gt;name: every experiment deserves a meaningful name, so let's not use "default" every time, shall we? 🙂&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Just pass this information to your logger:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;neptune_logger&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;NeptuneLogger&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"ANONYMOUS"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;project_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"shared/pytorch-lightning-integration"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;experiment_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"default"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Optional,
&lt;/span&gt;    &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;"max_epochs"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s"&gt;"batch_size"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;  &lt;span class="c1"&gt;# Optional,
&lt;/span&gt;    &lt;span class="n"&gt;tags&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"pytorch-lightning"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"mlp"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;  &lt;span class="c1"&gt;# Optional,
&lt;/span&gt;    &lt;span class="n"&gt;upload_source_files&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"**/*.py"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"*.yaml"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;  &lt;span class="c1"&gt;# Optional,
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;… and proceed as before to get an &lt;a href="https://ui.neptune.ai/shared/pytorch-lightning-integration/experiments?viewId=1e01a374-00a6-4cef-af20-ea99e1fc9fab&amp;amp;utm_source=devto&amp;amp;utm_medium=crosspost&amp;amp;utm_campaign=blog-pytorch-lightning-neptune-integration"&gt;organized dashboard like this one&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--3rWlCfS1--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/600/0%2AwJnwGnRmFKRXUSEw" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--3rWlCfS1--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/600/0%2AwJnwGnRmFKRXUSEw" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Logging extra things during training
&lt;/h3&gt;

&lt;p&gt;A lot of interesting information can be logged during training.&lt;/p&gt;

&lt;p&gt;You may be interested in monitoring things like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;model predictions after each epoch (think prediction masks or overlaid bounding boxes)&lt;/li&gt;
&lt;li&gt;diagnostic charts like ROC AUC curve or Confusion Matrix&lt;/li&gt;
&lt;li&gt;model checkpoints, or other objects&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It is really simple. Just go to your &lt;code&gt;LightningModule&lt;/code&gt; and call methods of the Neptune experiment available as &lt;code&gt;self.logger.experiment&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;For example, we can log histograms of losses after each epoch:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;CoolSystem&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pl&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;LightningModule&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;validation_end&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;outputs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# OPTIONAL
&lt;/span&gt;        &lt;span class="n"&gt;avg_loss&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stack&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'val_loss'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;outputs&lt;/span&gt;&lt;span class="p"&gt;]).&lt;/span&gt;&lt;span class="n"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;tensorboard_logs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;'val_loss'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;avg_loss&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="c1"&gt;# log debugging images like histogram of losses
&lt;/span&gt;        &lt;span class="n"&gt;fig&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;figure&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;losses&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stack&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'val_loss'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;numpy&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;outputs&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;hist&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;losses&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;experiment&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;log_image&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'loss_histograms'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fig&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;close&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fig&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;'avg_val_loss'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;avg_loss&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'log'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tensorboard_logs&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;a href="https://ui.neptune.ai/shared/pytorch-lightning-integration/e/PYTOR-119/logs?utm_source=devto&amp;amp;utm_medium=crosspost&amp;amp;utm_campaign=blog-pytorch-lightning-neptune-integration"&gt;Explore them for yourself&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--_bEcaG0j--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/600/0%2A1UzbqF71meU9Oqlk" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--_bEcaG0j--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/600/0%2A1UzbqF71meU9Oqlk" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Other things you may want to log during training are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;self.logger.experiment.log_metric&lt;/code&gt; # log custom metrics&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;self.logger.experiment.log_text&lt;/code&gt; # log text values&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;self.logger.experiment.log_artifact&lt;/code&gt; # log files&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;self.logger.experiment.log_image&lt;/code&gt; # log images, charts&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;self.logger.experiment.set_property&lt;/code&gt; # add key:value pairs&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;self.logger.experiment.append_tag&lt;/code&gt; # add tags for organization&lt;/li&gt;
&lt;/ul&gt;
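&lt;p&gt;As a quick offline illustration of the call pattern (not the real client — &lt;code&gt;FakeExperiment&lt;/code&gt; below is a hypothetical stand-in that only records calls), here is how a few of these methods might be used:&lt;/p&gt;

```python
# Hypothetical stand-in for the Neptune experiment object, used only to
# show the call pattern offline; in a LightningModule you would call
# the same methods on self.logger.experiment instead.
class FakeExperiment:
    def __init__(self):
        self.metrics = {}     # metric name -> list of logged values
        self.properties = {}  # key:value pairs
        self.tags = []        # organizational tags

    def log_metric(self, name, value):
        self.metrics.setdefault(name, []).append(value)

    def set_property(self, key, value):
        self.properties[key] = value

    def append_tag(self, tag):
        self.tags.append(tag)

experiment = FakeExperiment()
experiment.log_metric('custom_f1', 0.87)       # a custom metric
experiment.set_property('data_version', 'v2')  # a key:value property
experiment.append_tag('baseline')              # a tag for organization

print(experiment.metrics['custom_f1'])  # [0.87]
```

&lt;p&gt;With the real client, swapping &lt;code&gt;experiment&lt;/code&gt; for &lt;code&gt;self.logger.experiment&lt;/code&gt; inside your &lt;code&gt;LightningModule&lt;/code&gt; should give the same calls against Neptune.&lt;/p&gt;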

&lt;p&gt;Pretty cool, right?&lt;/p&gt;

&lt;p&gt;But … that is not all you can do!&lt;/p&gt;

&lt;h3&gt;
  
  
  Logging things after training has finished
&lt;/h3&gt;

&lt;p&gt;Tracking your experiment doesn't have to stop when your &lt;code&gt;.fit()&lt;/code&gt; loop ends.&lt;/p&gt;

&lt;p&gt;You may want to track the metrics from &lt;code&gt;trainer.test(model)&lt;/code&gt;, or calculate some additional validation metrics and log them.&lt;/p&gt;

&lt;p&gt;To do that you just need to tell &lt;code&gt;NeptuneLogger&lt;/code&gt; not to close after fit:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;neptune_logger&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;NeptuneLogger&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"ANONYMOUS"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;project_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"shared/pytorch-lightning-integration"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;close_after_fit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;...&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;… and you can keep logging 🙂&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Test metrics:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;trainer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Additional (external) metrics:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;sklearn.metrics&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;accuracy_score&lt;/span&gt;
&lt;span class="p"&gt;...&lt;/span&gt;
&lt;span class="n"&gt;accuracy&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;accuracy_score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_pred&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;neptune_logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;experiment&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;log_metric&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'test_accuracy'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;accuracy&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Performance charts on test set:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;scikitplot.metrics&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;plot_confusion_matrix&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;matplotlib.pyplot&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;
&lt;span class="p"&gt;...&lt;/span&gt;
&lt;span class="n"&gt;fig&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ax&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;subplots&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;figsize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;plot_confusion_matrix&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_pred&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;neptune_logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;experiment&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;log_image&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'confusion_matrix'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fig&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The whole model checkpoints directory:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;neptune_logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;experiment&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;log_artifact&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'my/checkpoints'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
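&lt;p&gt;If you would rather upload checkpoint files one by one (say, to control their names individually), a small helper can walk the directory. The sketch below runs against a hypothetical &lt;code&gt;FakeExperiment&lt;/code&gt; stand-in so it works without a Neptune connection; &lt;code&gt;log_checkpoint_files&lt;/code&gt; is my name, not part of any API:&lt;/p&gt;

```python
import os
import tempfile

# Hypothetical stand-in that records log_artifact calls instead of
# uploading; with the real client you would pass
# neptune_logger.experiment here instead.
class FakeExperiment:
    def __init__(self):
        self.artifacts = []

    def log_artifact(self, path):
        self.artifacts.append(path)

def log_checkpoint_files(experiment, checkpoints_dir):
    """Log every file under checkpoints_dir as a separate artifact."""
    for root, _, files in os.walk(checkpoints_dir):
        for name in sorted(files):
            experiment.log_artifact(os.path.join(root, name))

# Demo on a throwaway directory holding two dummy checkpoint files.
with tempfile.TemporaryDirectory() as tmp:
    for fname in ('epoch=3.ckpt', 'epoch=7.ckpt'):
        open(os.path.join(tmp, fname), 'w').close()
    exp = FakeExperiment()
    log_checkpoint_files(exp, tmp)

print(len(exp.artifacts))  # 2
```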



&lt;p&gt;&lt;a href="https://ui.neptune.ai/shared/pytorch-lightning-integration/e/PYTOR-119/logs?utm_source=devto&amp;amp;utm_medium=crosspost&amp;amp;utm_campaign=blog-pytorch-lightning-neptune-integration"&gt;Go to this experiment&lt;/a&gt; to see how those objects are logged:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--cI2O-jVZ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/600/1%2ABtuhJUdbldY6b_0WFPT5lA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--cI2O-jVZ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/600/1%2ABtuhJUdbldY6b_0WFPT5lA.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;But … there is even more!&lt;/p&gt;

&lt;p&gt;Neptune lets you fetch experiments after training.&lt;/p&gt;

&lt;p&gt;Let me show you how.&lt;/p&gt;

&lt;h3&gt;
  
  
  Fetching your experiment information directly into your notebooks
&lt;/h3&gt;

&lt;p&gt;You can fetch experiments after they have finished, analyze the results, and update metrics, artifacts, or other things if you want to.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;neptune&lt;/span&gt;

&lt;span class="n"&gt;project&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;neptune&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;init&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'shared/pytorch-lightning-integration'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;project&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_leaderboard&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="n"&gt;head&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;For example, let's fetch the experiments dashboard into a pandas DataFrame, or visualize it with HiPlot via the &lt;a href="https://docs.neptune.ai/integrations/hiplot.html?utm_source=devto&amp;amp;utm_medium=crosspost&amp;amp;utm_campaign=blog-pytorch-lightning-neptune-integration"&gt;Neptune HiPlot integration&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;neptunecontrib.viz&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;make_parallel_coordinates_plot&lt;/span&gt;

&lt;span class="n"&gt;make_parallel_coordinates_plot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
           &lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'train_loss'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'val_loss'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'test_accuracy'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
           &lt;span class="n"&gt;params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'max_epochs'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'batch_size'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'lr'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--q6w4etk5--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/600/1%2Axi1VBLUTgN4enzBQphYgQA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--q6w4etk5--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/600/1%2Axi1VBLUTgN4enzBQphYgQA.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;or fetch a single experiment and update it with some external metric calculated after training:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;exp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;project&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_experiments&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'PYTOR-63'&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;exp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;log_metric&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'some_external_metric'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.92&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
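&lt;p&gt;Since &lt;code&gt;get_leaderboard()&lt;/code&gt; returns a pandas DataFrame, you can slice and sort it like any other table. The sketch below uses a hand-built DataFrame; the column names &lt;code&gt;channel_val_loss&lt;/code&gt; and &lt;code&gt;parameter_lr&lt;/code&gt; are assumptions for illustration and may not match your project's columns exactly:&lt;/p&gt;

```python
import pandas as pd

# Hand-built stand-in for the DataFrame returned by
# project.get_leaderboard(); the column names are illustrative only.
leaderboard = pd.DataFrame({
    'id': ['PYTOR-61', 'PYTOR-62', 'PYTOR-63'],
    'channel_val_loss': [0.41, 0.35, 0.38],
    'parameter_lr': [0.02, 0.01, 0.005],
})

# Pick the experiment with the lowest validation loss.
best = leaderboard.loc[leaderboard['channel_val_loss'].idxmin()]
print(best['id'])  # PYTOR-62
```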



&lt;p&gt;As you can see, there are a lot of things you can log to Neptune from PyTorch Lightning.&lt;/p&gt;

&lt;p&gt;If you want to go deeper into this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.neptune.ai/integrations/pytorch_lightning.html?utm_source=devto&amp;amp;utm_medium=crosspost&amp;amp;utm_campaign=blog-pytorch-lightning-neptune-integration"&gt;read the integration docs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://neptune.ai/?utm_source=devto&amp;amp;utm_medium=crosspost&amp;amp;utm_campaign=blog-pytorch-lightning-neptune-integration"&gt;go check out Neptune&lt;/a&gt; to see other things it can do,&lt;/li&gt;
&lt;li&gt;&lt;a href="https://colab.research.google.com/github/neptune-ai/neptune-colab-examples/blob/master/pytorch_lightning-integration.ipynb"&gt;try out Lightning + Neptune on colab&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Final Thoughts
&lt;/h1&gt;

&lt;p&gt;PyTorch Lightning is a great library that helps you with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;organizing your deep learning code to make it easily understandable to other people,&lt;/li&gt;
&lt;li&gt;outsourcing development boilerplate to a team of seasoned engineers,&lt;/li&gt;
&lt;li&gt;accessing a lot of state-of-the-art functionalities with almost no changes to your code&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With Neptune integration, you get some additional things for free:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;you can monitor and keep track of your deep learning experiments&lt;/li&gt;
&lt;li&gt;you can share your research with other people easily&lt;/li&gt;
&lt;li&gt;you and your team can access experiment metadata and collaborate more efficiently.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Hopefully, with all that power, you will know exactly what you (and other people) tried, and your deep learning research will move at lightning speed 🙂&lt;/p&gt;

&lt;h1&gt;
  
  
  Bonus: Full PyTorch Lightning tracking script
&lt;/h1&gt;



&lt;div class="highlight"&gt;&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--upgrade&lt;/span&gt; torch pytorch-lightning &lt;span class="se"&gt;\&lt;/span&gt;
    neptune-client neptune-contrib[viz] &lt;span class="se"&gt;\&lt;/span&gt;
    matplotlib scikit-plot 
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;





&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;os&lt;/span&gt;

&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;torch&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;torch.nn&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;functional&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;F&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;torch.utils.data&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DataLoader&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;torchvision.datasets&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;MNIST&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;torchvision&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;transforms&lt;/span&gt;

&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;pytorch_lightning&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pl&lt;/span&gt;

&lt;span class="n"&gt;MAX_EPOCHS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;7&lt;/span&gt;
&lt;span class="n"&gt;LR&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.02&lt;/span&gt;
&lt;span class="n"&gt;BATCHSIZE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;32&lt;/span&gt;
&lt;span class="n"&gt;CHECKPOINTS_DIR&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;'my_models/checkpoints'&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;CoolSystem&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pl&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;LightningModule&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nb"&gt;super&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;CoolSystem&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="c1"&gt;# not the best model...
&lt;/span&gt;        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;l1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Linear&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;28&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;28&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;forward&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;relu&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;l1&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;view&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;training_step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;batch&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;batch_idx&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# REQUIRED
&lt;/span&gt;        &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;batch&lt;/span&gt;
        &lt;span class="n"&gt;y_hat&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;forward&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;loss&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;F&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cross_entropy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_hat&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;tensorboard_logs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;'train_loss'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;loss&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;'loss'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;loss&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'log'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tensorboard_logs&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;validation_step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;batch&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;batch_idx&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# OPTIONAL
&lt;/span&gt;        &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;batch&lt;/span&gt;
        &lt;span class="n"&gt;y_hat&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;forward&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;'val_loss'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;F&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cross_entropy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_hat&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;validation_end&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;outputs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# OPTIONAL
&lt;/span&gt;        &lt;span class="n"&gt;avg_loss&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stack&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'val_loss'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;outputs&lt;/span&gt;&lt;span class="p"&gt;]).&lt;/span&gt;&lt;span class="n"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;tensorboard_logs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;'val_loss'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;avg_loss&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="n"&gt;fig&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;figure&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;losses&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stack&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'val_loss'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;numpy&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;outputs&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;hist&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;losses&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;experiment&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;log_image&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'loss_histograms'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fig&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;'avg_val_loss'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;avg_loss&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'log'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tensorboard_logs&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;batch&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;batch_idx&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# OPTIONAL
&lt;/span&gt;        &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;batch&lt;/span&gt;
        &lt;span class="n"&gt;y_hat&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;forward&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;'test_loss'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;F&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cross_entropy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_hat&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_end&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;outputs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# OPTIONAL
&lt;/span&gt;        &lt;span class="n"&gt;avg_loss&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stack&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'test_loss'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;outputs&lt;/span&gt;&lt;span class="p"&gt;]).&lt;/span&gt;&lt;span class="n"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;tensorboard_logs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;'test_loss'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;avg_loss&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;'avg_test_loss'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;avg_loss&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'log'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tensorboard_logs&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;configure_optimizers&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# REQUIRED
&lt;/span&gt;        &lt;span class="c1"&gt;# can return multiple optimizers and learning_rate schedulers
&lt;/span&gt;        &lt;span class="c1"&gt;# (LBFGS it is automatically supported, no need for closure function)
&lt;/span&gt;        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;optim&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Adam&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;parameters&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;lr&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;LR&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="n"&gt;pl&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data_loader&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;train_dataloader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# REQUIRED
&lt;/span&gt;        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;DataLoader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;MNIST&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;getcwd&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;train&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;download&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;transform&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;transforms&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ToTensor&lt;/span&gt;&lt;span class="p"&gt;()),&lt;/span&gt; &lt;span class="n"&gt;batch_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;BATCHSIZE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="n"&gt;pl&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data_loader&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;val_dataloader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# OPTIONAL
&lt;/span&gt;        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;DataLoader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;MNIST&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;getcwd&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;train&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;download&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;transform&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;transforms&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ToTensor&lt;/span&gt;&lt;span class="p"&gt;()),&lt;/span&gt; &lt;span class="n"&gt;batch_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;BATCHSIZE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="n"&gt;pl&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data_loader&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_dataloader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# OPTIONAL
&lt;/span&gt;        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;DataLoader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;MNIST&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;getcwd&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;train&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;download&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;transform&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;transforms&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ToTensor&lt;/span&gt;&lt;span class="p"&gt;()),&lt;/span&gt; &lt;span class="n"&gt;batch_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;BATCHSIZE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;pytorch_lightning.loggers.neptune&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;NeptuneLogger&lt;/span&gt;

&lt;span class="n"&gt;neptune_logger&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;NeptuneLogger&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"ANONYMOUS"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;project_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"shared/pytorch-lightning-integration"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;close_after_fit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;experiment_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"default"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Optional,
&lt;/span&gt;    &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;"max_epochs"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;MAX_EPOCHS&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s"&gt;"batch_size"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;BATCHSIZE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s"&gt;"lr"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;LR&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="c1"&gt;# Optional,
&lt;/span&gt;    &lt;span class="n"&gt;tags&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"pytorch-lightning"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"mlp"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;upload_source_files&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'*.py'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s"&gt;'*.yaml'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;upload_stderr&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;upload_stdout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;model_checkpoint&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pl&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;callbacks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ModelCheckpoint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filepath&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;CHECKPOINTS_DIR&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;pytorch_lightning&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Trainer&lt;/span&gt;

&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;CoolSystem&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;trainer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Trainer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_epochs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;MAX_EPOCHS&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                  &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;neptune_logger&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                  &lt;span class="n"&gt;checkpoint_callback&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model_checkpoint&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                  &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;trainer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;trainer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Get predictions on external test
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;

&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;freeze&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;test_loader&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;DataLoader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;MNIST&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;getcwd&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;train&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;download&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;transform&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;transforms&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ToTensor&lt;/span&gt;&lt;span class="p"&gt;()),&lt;/span&gt; &lt;span class="n"&gt;batch_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;256&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;y_true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_pred&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[],[]&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;test_loader&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;y_hat&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;forward&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;argmax&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;cpu&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="n"&gt;detach&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="n"&gt;numpy&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cpu&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="n"&gt;detach&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="n"&gt;numpy&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;y_true&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;y_pred&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_hat&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;test_loader&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;break&lt;/span&gt;
&lt;span class="n"&gt;y_true&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;hstack&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_true&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;y_pred&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;hstack&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_pred&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Log additional metrics
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;sklearn.metrics&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;accuracy_score&lt;/span&gt;

&lt;span class="n"&gt;accuracy&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;accuracy_score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_pred&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;neptune_logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;experiment&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;log_metric&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'test_accuracy'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;accuracy&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Log charts
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;scikitplot.metrics&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;plot_confusion_matrix&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;matplotlib.pyplot&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;

&lt;span class="n"&gt;fig&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ax&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;subplots&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;figsize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;plot_confusion_matrix&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_pred&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;neptune_logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;experiment&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;log_image&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'confusion_matrix'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fig&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Save checkpoints folder
&lt;/span&gt;&lt;span class="n"&gt;neptune_logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;experiment&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;log_artifact&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;CHECKPOINTS_DIR&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# You can stop the experiment
&lt;/span&gt;&lt;span class="n"&gt;neptune_logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;experiment&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stop&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;






&lt;p&gt;&lt;em&gt;This article was originally written by &lt;a href="https://www.linkedin.com/in/jakub-czakon-2b797b69/"&gt;Jakub Czakon&lt;/a&gt; and posted on the &lt;a href="https://neptune.ai/blog/pytorch-lightning-neptune-integration?utm_source=devto&amp;amp;utm_medium=crosspost&amp;amp;utm_campaign=blog-pytorch-lightning-neptune-integration"&gt;Neptune blog&lt;/a&gt;. You can find there more in-depth articles for machine learning practitioners.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>python</category>
      <category>datascience</category>
    </item>
    <item>
      <title>Understanding LightGBM Parameters (and How to Tune Them)</title>
      <dc:creator>Kamil A. Kaczmarek</dc:creator>
      <pubDate>Tue, 14 Jul 2020 21:13:08 +0000</pubDate>
      <link>https://dev.to/kamil_k7k/understanding-lightgbm-parameters-and-how-to-tune-them-14n0</link>
      <guid>https://dev.to/kamil_k7k/understanding-lightgbm-parameters-and-how-to-tune-them-14n0</guid>
      <description>&lt;p&gt;&lt;em&gt;This article was originally written by &lt;a href="https://www.linkedin.com/in/mjbahmani/"&gt;MJ Bahmani&lt;/a&gt; and posted on the &lt;a href="https://neptune.ai/blog/lightgbm-parameters-guide?utm_source=devto&amp;amp;utm_medium=crosspost&amp;amp;utm_campaign=blog-lightgbm-parameters-guide"&gt;Neptune blog.&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;I've been using &lt;a href="https://github.com/microsoft/LightGBM/tree/master/python-package"&gt;lightGBM&lt;/a&gt; for a while now. It's been my go-to algorithm for most tabular data problems. The list of awesome features is long and I suggest that you take a look if you haven't already.&lt;br&gt;
But I was always interested in understanding which parameters have the biggest impact on performance and how I should tune lightGBM parameters to get the most out of it.&lt;br&gt;
I figured I should do some research, understand more about lightGBM parameters… and share my journey.&lt;br&gt;
Specifically I:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Took a deep-dive into &lt;a href="https://lightgbm.readthedocs.io/en/latest/index.html"&gt;LightGBM's documentation&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Went through Laurae's articles &lt;a href="https://sites.google.com/view/lauraepp/parameters"&gt;Lauraepp: xgboost / LightGBM parameters&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Looked into the &lt;a href="https://github.com/microsoft/LightGBM"&gt;LightGBM GitHub Repository&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Ran some experiments myself&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As I was doing that I gained a lot more knowledge about lightGBM parameters. My hope is that after reading this article you will be able to answer the following questions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which Gradient Boosting methods are implemented in LightGBM and what are their differences?&lt;/li&gt;
&lt;li&gt;Which parameters are important in general?&lt;/li&gt;
&lt;li&gt;Which regularization parameters need to be tuned?&lt;/li&gt;
&lt;li&gt;How to tune lightGBM parameters in python?&lt;/li&gt;
&lt;/ul&gt;
&lt;h1&gt;
  
  
  Gradient Boosting methods
&lt;/h1&gt;

&lt;p&gt;With LightGBM you can run different types of Gradient Boosting methods. You have: GBDT, DART, and GOSS which can be specified with the "boosting" parameter.&lt;br&gt;
In the next sections, I will explain and compare these methods with each other.&lt;/p&gt;
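To make that concrete, here is a minimal sketch of how each method is selected via the "boosting" key of the params dict passed to lightgbm.train(). The auxiliary values (drop_rate, top_rate, other_rate) are real LightGBM parameter names, but the numbers are illustrative assumptions, not tuned recommendations.

```python
# The boosting variant is chosen with the "boosting" key in the params
# dict that lightgbm.train() receives.

gbdt_params = {"objective": "binary", "boosting": "gbdt"}  # classic GBDT

dart_params = {"objective": "binary", "boosting": "dart",
               "drop_rate": 0.1}  # fraction of trees dropped per iteration

goss_params = {"objective": "binary", "boosting": "goss",
               "top_rate": 0.2,    # fraction of large-gradient rows to keep
               "other_rate": 0.1}  # fraction of small-gradient rows to sample
```

Each dict would then be passed as the first argument of lightgbm.train(params, train_set, ...).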
&lt;h3&gt;
  
  
  lgbm gbdt (gradient boosted decision trees)
&lt;/h3&gt;

&lt;p&gt;This method is the traditional Gradient Boosting Decision Tree that was first suggested in this article, and it is the algorithm behind some great libraries like XGBoost and pGBRT.&lt;br&gt;
These days gbdt is widely used because of its accuracy, efficiency, and stability. You probably know that gbdt is an ensemble model of decision trees, but what does that mean exactly?&lt;br&gt;
Let me give you the gist.&lt;br&gt;
It is based on three important principles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Weak learners (decision trees)&lt;/li&gt;
&lt;li&gt;Gradient Optimization&lt;/li&gt;
&lt;li&gt;Boosting Technique&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So in the gbdt method we have a lot of decision trees (weak learners). Those trees are built sequentially:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the first tree learns how to fit the target variable&lt;/li&gt;
&lt;li&gt;the second tree learns how to fit the residual (the difference between the predictions of the first tree and the ground truth)&lt;/li&gt;
&lt;li&gt;the third tree learns how to fit the residuals of the second tree, and so on.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All those trees are trained by propagating the gradients of errors throughout the system.&lt;br&gt;
The main drawback of gbdt is that finding the best split points in each tree node is a time- and memory-consuming operation; other boosting methods try to tackle that problem.&lt;/p&gt;
&lt;h3&gt;
  
  
  dart gradient boosting
&lt;/h3&gt;

&lt;p&gt;In this outstanding paper, you can learn all about DART gradient boosting, a method that uses dropout (standard in neural networks) to improve model regularization and deal with some other less-obvious problems.&lt;br&gt;
Namely, gbdt suffers from over-specialization, which means that trees added at later iterations tend to impact the predictions of only a few instances and make a negligible contribution to the remaining instances. Adding dropout makes it harder for trees at later iterations to specialize on those few samples, and hence improves performance.&lt;/p&gt;
&lt;h3&gt;
  
  
  lgbm goss (Gradient-based One-Side Sampling)
&lt;/h3&gt;

&lt;p&gt;In fact, the most important reason for naming this method lightgbm is its use of &lt;a href="https://papers.nips.cc/paper/6907-lightgbm-a-highly-efficient-gradient-boosting-decision-tree.pdf"&gt;the Goss&lt;/a&gt; method, proposed in this paper. Goss is the newer and lighter gbdt implementation (hence "light" gbm).&lt;br&gt;
The standard gbdt is reliable, but it is not fast enough on large datasets, so goss proposes a gradient-based sampling method to avoid searching the whole search space. The intuition is that a data instance with a small gradient is already well-trained, while an instance with a large gradient still needs more training. So we have &lt;strong&gt;two sides&lt;/strong&gt; here: data instances with large gradients and data instances with small gradients. Goss keeps all the data with large gradients and takes a random sample &lt;strong&gt;(that's why it is called One-Side Sampling)&lt;/strong&gt; of the data with small gradients. This makes the search space smaller, so goss can converge faster. For more insight into goss, you can check this &lt;a href="https://towardsdatascience.com/what-makes-lightgbm-lightning-fast-a27cf0d9785e"&gt;blog post&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Let's put those differences in a table:&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--JgfRmcSF--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/600/1%2ASTNImQL15FVgTJ-ieySrYw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--JgfRmcSF--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/600/1%2ASTNImQL15FVgTJ-ieySrYw.png" alt="image"&gt;&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Note&lt;/strong&gt;: If you set boosting to rf, the lightgbm algorithm behaves as a random forest, not boosted trees! According to the documentation, to use rf you must set bagging_fraction and feature_fraction to values smaller than 1.&lt;/p&gt;
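As a sketch of that note, a random-forest-style configuration might look like the following; the fractions are illustrative values, chosen only to satisfy the documented constraint that both must be below 1.

```python
# Parameters that switch LightGBM into random-forest mode.
rf_params = {
    "boosting": "rf",
    "bagging_fraction": 0.8,  # row subsampling, must be below 1.0 for rf
    "bagging_freq": 1,        # perform bagging at every iteration
    "feature_fraction": 0.8,  # column subsampling, must be below 1.0 for rf
}
```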
&lt;h1&gt;
  
  
  Regularization
&lt;/h1&gt;

&lt;p&gt;In this section, I will cover some important regularization parameters of lightgbm. Obviously, those are the parameters that you need to tune to fight overfitting.&lt;br&gt;
You should be aware that for small datasets (&amp;lt;10000 records) lightGBM may not be the best choice. Tuning lightgbm parameters may not help you there.&lt;br&gt;
In addition, lightgbm uses the &lt;a href="https://lightgbm.readthedocs.io/en/latest/Features.html#leaf-wise-best-first-tree-growth"&gt;leaf-wise&lt;/a&gt; tree growth algorithm, while XGBoost uses depth-wise tree growth. The leaf-wise method allows the trees to converge faster, but the chance of over-fitting increases.&lt;br&gt;
This talk from one of the PyData conferences may give you more insight into Xgboost and Lightgbm. It's worth watching!&lt;br&gt;
&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/5CWwwtEM2TA"&gt;
&lt;/iframe&gt;
&lt;br&gt;
&lt;strong&gt;Note&lt;/strong&gt;: If someone asks you what the main difference between LightGBM and XGBoost is, you can easily say: they differ in how they are implemented.&lt;/p&gt;

&lt;p&gt;According to &lt;a href="https://lightgbm.readthedocs.io/en/latest/Parameters-Tuning.html#deal-with-over-fitting"&gt;lightGBM documentation&lt;/a&gt;, when facing  overfitting you may want to do the following parameter tuning: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use small max_bin&lt;/li&gt;
&lt;li&gt;Use small num_leaves&lt;/li&gt;
&lt;li&gt;Use min_data_in_leaf and min_sum_hessian_in_leaf&lt;/li&gt;
&lt;li&gt;Use bagging by setting bagging_fraction and bagging_freq&lt;/li&gt;
&lt;li&gt;Use feature sub-sampling by setting feature_fraction&lt;/li&gt;
&lt;li&gt;Use bigger training data&lt;/li&gt;
&lt;li&gt;Try lambda_l1, lambda_l2 and min_gain_to_split for regularization&lt;/li&gt;
&lt;li&gt;Try max_depth to avoid growing deep trees&lt;/li&gt;
&lt;/ul&gt;
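&lt;p&gt;Taken together, the checklist above maps to a single parameter dictionary. Here is a sketch with purely illustrative values (assumptions on my part, not recommendations); tune each of them on your own validation data:&lt;/p&gt;

```python
# Illustrative anti-overfitting settings only -- every value here is an
# assumption to show the shape of the dictionary, not a recommendation.
anti_overfit_params = {
    'max_bin': 127,             # smaller max_bin -> coarser histograms
    'num_leaves': 31,           # keep well below 2**max_depth
    'max_depth': 7,             # cap the depth of each tree
    'min_data_in_leaf': 50,     # require more samples per leaf
    'bagging_fraction': 0.8,    # row subsampling ...
    'bagging_freq': 5,          # ... performed every 5 iterations
    'feature_fraction': 0.8,    # column subsampling per tree
    'lambda_l1': 0.1,           # L1 regularization
    'lambda_l2': 0.1,           # L2 regularization
    'min_gain_to_split': 0.01,  # minimum gain required to make a split
}
```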

&lt;p&gt;In the following sections, I will explain each of those parameters in a bit more detail.&lt;/p&gt;
&lt;h3&gt;
  
  
  lambda_l1
&lt;/h3&gt;

&lt;p&gt;Lambda_l1 (and lambda_l2) control L1/L2 regularization and, along with min_gain_to_split, are used to combat overfitting. I highly recommend using parameter tuning (explored in a later section) to figure out the best values for these parameters.&lt;/p&gt;
&lt;h3&gt;
  
  
  num_leaves
&lt;/h3&gt;

&lt;p&gt;Surely &lt;strong&gt;num_leaves&lt;/strong&gt; is one of the most important parameters that control the &lt;strong&gt;complexity&lt;/strong&gt; of the model. With it, you set the maximum number of leaves each weak learner has. A large num_leaves increases accuracy on the training set but also the chance of getting hurt by overfitting. According to the documentation, one simple rule of thumb is &lt;strong&gt;num_leaves = 2^(max_depth)&lt;/strong&gt;; however, considering that in lightgbm a leaf-wise tree is deeper than a level-wise tree, you need to be careful about overfitting!&lt;br&gt;
&lt;strong&gt;As a result, it is necessary to tune num_leaves together with max_depth.&lt;/strong&gt;&lt;/p&gt;
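&lt;p&gt;A minimal sketch of that relationship: start from the 2^(max_depth) upper bound and back off by some factor. The shrink factor below is an arbitrary illustrative choice of mine, not an official recommendation:&lt;/p&gt;

```python
def suggested_num_leaves(max_depth, shrink=0.7):
    # A leaf-wise tree can use up to 2**max_depth leaves; using that
    # many usually overfits, so back off by an illustrative factor.
    return max(2, int((2 ** max_depth) * shrink))

print(suggested_num_leaves(5))  # 22, well below the 32-leaf upper bound
```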

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--3A8G3t_T--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/600/0%2AAPaeoCx2c_0z-VaR" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--3A8G3t_T--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/600/0%2AAPaeoCx2c_0z-VaR" alt="image"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--W_zd0LJ---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/600/0%2AxELcdiOJlE8vhJnO" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--W_zd0LJ---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/600/0%2AxELcdiOJlE8vhJnO" alt="image"&gt;&lt;/a&gt;&lt;br&gt;
Photo on &lt;a href="https://lightgbm.readthedocs.io/en/latest/Features.html"&gt;lightgbm documentation&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  subsample
&lt;/h3&gt;

&lt;p&gt;With &lt;a href="https://lightgbm.readthedocs.io/en/latest/Parameters.html#learning-control-parameters"&gt;subsample&lt;/a&gt; (or bagging_fraction) you can specify the percentage of rows used per tree-building iteration. That means some rows will be randomly selected for fitting each learner (tree). This improves generalization and also the speed of training.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--CRrO50hn--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/600/0%2A6QxzzJiv3nHjQIC7" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--CRrO50hn--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/600/0%2A6QxzzJiv3nHjQIC7" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I suggest using smaller subsample values for the baseline models and increasing this value later, when you are done with other experiments (different feature selections, different tree architectures).&lt;/p&gt;
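&lt;p&gt;One detail worth knowing from the LightGBM parameter docs: bagging_fraction only takes effect when bagging_freq is greater than 0. A sketch with illustrative values:&lt;/p&gt;

```python
# Values are illustrative only. Per the LightGBM parameter docs,
# bagging_fraction takes effect only when bagging_freq > 0.
bagging_params = {
    'bagging_fraction': 0.5,  # use 50% of the rows for each tree
    'bagging_freq': 1,        # re-sample the rows every iteration
    'bagging_seed': 42,       # fix the seed for reproducibility
}
```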
&lt;h3&gt;
  
  
  feature_fraction
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://lightgbm.readthedocs.io/en/latest/Parameters.html#learning-control-parameters"&gt;Feature fraction&lt;/a&gt; or sub_feature deals with column sampling, LightGBM will randomly select a subset of features on each iteration (tree). For example, if you set it to 0.6, LightGBM will select 60% of features before training each tree.&lt;br&gt;
There are two usage for this feature:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Can be used to speed up training&lt;/li&gt;
&lt;li&gt;Can be used to deal with overfitting&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--bDuz3aY_--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/600/0%2AqGw4lx7pINrkVNYX" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--bDuz3aY_--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/600/0%2AqGw4lx7pINrkVNYX" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  max_depth
&lt;/h3&gt;

&lt;p&gt;This parameter controls the max depth of each trained tree and will have an impact on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The best value for the num_leaves parameter&lt;/li&gt;
&lt;li&gt;Model Performance&lt;/li&gt;
&lt;li&gt;Training Time&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Pay attention: if you use a large value of max_depth, your model will likely &lt;strong&gt;overfit&lt;/strong&gt; the train set.&lt;/p&gt;
&lt;h3&gt;
  
  
  max_bin
&lt;/h3&gt;

&lt;p&gt;Binning is a technique for representing data in a discrete view (histogram). Lightgbm uses a histogram-based algorithm to find the optimal split point while creating a weak learner. Therefore, each continuous numeric feature (e.g. the number of views for a video) should be split into discrete bins.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Ga8_saQL--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/600/0%2AoMI3hXxBdqCj2Cgb" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Ga8_saQL--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/600/0%2AoMI3hXxBdqCj2Cgb" alt="image"&gt;&lt;/a&gt;&lt;br&gt;
The photo on &lt;a href="https://mlexplained.com/2018/01/05/lightgbm-and-xgboost-explained/"&gt;LightGBM and XGBoost Explained&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Also, in this &lt;a href="https://github.com/huanzhang12/lightgbm-gpu"&gt;GitHub repo&lt;/a&gt;, you can find some comprehensive experiments that thoroughly explain the effect of changing max_bin on CPU and GPU.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--qoT7d_VL--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://lh5.googleusercontent.com/znWhx-b6ov_c8rjk3HTPoyWjIJP0M-1owWsPT_xH6OnVl02o5vxQmdvtruQiRZBVUm0bWUIoUAw1lrYbaN3KGsYsVKC8Sya6YePyiWNDtFBNNBZUSYfZJf3Zp9V8mM4XIkIGVI9C" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--qoT7d_VL--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://lh5.googleusercontent.com/znWhx-b6ov_c8rjk3HTPoyWjIJP0M-1owWsPT_xH6OnVl02o5vxQmdvtruQiRZBVUm0bWUIoUAw1lrYbaN3KGsYsVKC8Sya6YePyiWNDtFBNNBZUSYfZJf3Zp9V8mM4XIkIGVI9C" alt="image"&gt;&lt;/a&gt;&lt;br&gt;
Clock time after 500 iterations - &lt;a href="https://github.com/huanzhang12/lightgbm-gpu"&gt;GitHub repo&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you set max_bin to 255, it means each feature can have at most 255 unique bin values. A small max_bin gives faster speed, while a larger value improves accuracy.&lt;/p&gt;
&lt;h1&gt;
  
  
  Training parameters
&lt;/h1&gt;

&lt;p&gt;Training time! When you train your model with lightgbm, some typical issues that may come up are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Training is a time-consuming process&lt;/li&gt;
&lt;li&gt;Dealing with Computational Complexity (CPU/GPU RAM constraints)&lt;/li&gt;
&lt;li&gt;Dealing with categorical features&lt;/li&gt;
&lt;li&gt;Having an unbalanced dataset&lt;/li&gt;
&lt;li&gt;The need for custom metrics&lt;/li&gt;
&lt;li&gt;Adjustments that need to be made for Classification or Regression problems&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In this section, we will try to explain those points in detail.&lt;/p&gt;
&lt;h3&gt;
  
  
  num_iterations
&lt;/h3&gt;

&lt;p&gt;Num_iterations specifies the number of boosting iterations (trees to build). The more trees you build the more accurate your model can be at the cost of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Longer training time&lt;/li&gt;
&lt;li&gt;Higher chance of overfitting&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Start with a lower number of trees to build a baseline and increase it later when you want to squeeze the last % out of your model.&lt;br&gt;
It is recommended to use a smaller &lt;strong&gt;learning_rate&lt;/strong&gt; with a larger &lt;strong&gt;num_iterations&lt;/strong&gt;. Also, you should use early_stopping_rounds if you go for higher num_iterations, to stop your training when it is no longer learning anything useful.&lt;/p&gt;
&lt;h3&gt;
  
  
  early_stopping_rounds
&lt;/h3&gt;

&lt;p&gt;This parameter will stop training if the validation metric has not improved in the last early stopping rounds. It should be defined together with the number &lt;strong&gt;of iterations&lt;/strong&gt;. If you set it too large, you increase the chance of &lt;strong&gt;overfitting&lt;/strong&gt; (but your model can be better).&lt;br&gt;
The rule of thumb is to set it to 10% of your num_iterations.&lt;/p&gt;
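&lt;p&gt;The idea behind early stopping can be sketched in a few lines of plain python. This is only an illustration of the logic, not lightgbm's actual implementation:&lt;/p&gt;

```python
def rounds_since_best(scores):
    # index of the best (highest) validation score seen so far
    best_idx = max(range(len(scores)), key=lambda i: scores[i])
    return len(scores) - 1 - best_idx

def should_stop(scores, patience):
    """Stop when the validation metric (higher is better here) has not
    improved for `patience` consecutive rounds -- the same idea that
    early_stopping_rounds implements inside lightgbm."""
    return rounds_since_best(scores) >= patience

history = [0.60, 0.70, 0.71, 0.70, 0.69, 0.68]
print(should_stop(history, 3))  # True: no improvement for 3 rounds
```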
&lt;h3&gt;
  
  
  lightgbm categorical_feature
&lt;/h3&gt;

&lt;p&gt;One of the advantages of using lightgbm is that it can handle categorical features very well. Yes, this algorithm is very powerful, but you have to be careful about how to use its parameters. lightgbm uses a special &lt;strong&gt;&lt;a href="https://lightgbm.readthedocs.io/en/latest/Features.html#optimal-split-for-categorical-features"&gt;integer-encoded&lt;/a&gt;&lt;/strong&gt; method (proposed by &lt;a href="http://www.csiss.org/SPACE/workshops/2004/SAC/files/fisher.pdf"&gt;Fisher&lt;/a&gt;) for handling categorical features.&lt;br&gt;
Experiments show that this method brings better performance than the often-used &lt;strong&gt;one-hot encoding&lt;/strong&gt;.&lt;br&gt;
The default value is "auto", which means: let lightgbm decide, i.e. lightgbm will infer which features are categorical.&lt;br&gt;
It doesn't always work well (some experiments show why &lt;a href="https://www.kaggle.com/mlisovyi/beware-of-categorical-features-in-lgbm"&gt;here&lt;/a&gt; and &lt;a href="https://www.kaggle.com/c/home-credit-default-risk/discussion/58950"&gt;here&lt;/a&gt;), and I highly recommend setting categorical features manually, simply with this code:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;cat_col = dataset_name.select_dtypes('object').columns.tolist()&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;But what happens behind the scenes, and how does lightgbm deal with categorical features?&lt;br&gt;
According to the &lt;a href="https://lightgbm.readthedocs.io/en/latest/Features.html#optimal-split-for-categorical-features"&gt;documentation&lt;/a&gt; of lightgbm, we know that tree learners cannot work well with the one-hot encoding method because they grow deep, unbalanced trees. In the proposed alternative method, splits over categorical features are constructed optimally. For example, for one feature with k different categories there are 2^(k-1) - 1 possible partitions, and with the Fisher method that can improve to &lt;strong&gt;k * log(k)&lt;/strong&gt; by finding the best split on the sorted histogram of values in the categorical feature.&lt;/p&gt;
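&lt;p&gt;That arithmetic is easy to check. The sketch below only illustrates the counts involved; it is not lightgbm's actual implementation:&lt;/p&gt;

```python
import math

def onehot_partitions(k):
    # number of ways to split k categories into two non-empty groups
    return 2 ** (k - 1) - 1

def fisher_scan_cost(k):
    # Fisher's approach scans a sorted histogram of the k categories,
    # which is roughly k * log(k) work instead of exponential
    return k * math.log2(k)

print(onehot_partitions(20))  # 524287 candidate partitions
print(fisher_scan_cost(20))   # roughly 86 -- vastly cheaper
```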
&lt;h3&gt;
  
  
  lightgbm is_unbalance vs scale_pos_weight
&lt;/h3&gt;

&lt;p&gt;One of the problems you may face in binary classification is how to deal with unbalanced datasets. Obviously, you need to balance positive/negative samples, but how exactly can you do that in lightgbm?&lt;br&gt;
There are two parameters in lightgbm that allow you to deal with this issue, &lt;strong&gt;is_unbalance and scale_pos_weight&lt;/strong&gt;, but what is the difference between them and how should you use them?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;When you set is_unbalance: True, the algorithm will try to automatically balance the weight of the dominated label (using the pos/neg fraction in the train set)&lt;/li&gt;
&lt;li&gt;If you want to change &lt;strong&gt;scale_pos_weight&lt;/strong&gt; (by default 1, which means both positive and negative labels are assumed equal) in the case of an unbalanced dataset, you can use the following formula (based on this issue on the lightgbm repository) to set it correctly&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;scale_pos_weight = number of negative samples / number of positive samples&lt;/strong&gt;&lt;/p&gt;
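&lt;p&gt;As a sketch, computing that ratio from a binary 0/1 label vector looks like this (the function name is mine):&lt;/p&gt;

```python
def suggested_scale_pos_weight(labels):
    # scale_pos_weight = number of negative samples / number of positive samples
    pos = sum(1 for v in labels if v == 1)
    neg = len(labels) - pos
    return neg / pos

print(suggested_scale_pos_weight([0] * 90 + [1] * 10))  # 9.0
```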
&lt;h3&gt;
  
  
  lgbm feval
&lt;/h3&gt;

&lt;p&gt;Sometimes you want to define a custom evaluation function to measure the performance of your model; for that, you need to create a "feval" function.&lt;br&gt;
A &lt;strong&gt;feval function&lt;/strong&gt; should accept two parameters:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;preds&lt;/li&gt;
&lt;li&gt;train_data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;and return&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;eval_name&lt;/li&gt;
&lt;li&gt;eval_result&lt;/li&gt;
&lt;li&gt;is_higher_better&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let's create a custom metric function step by step.&lt;br&gt;
First, define a separate python function:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;feval_func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;preds&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;train_data&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
   &lt;span class="c1"&gt;# Define a formula that evaluates the results
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'feval_func_name'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;eval_result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Use this function as a parameter:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'Start training...'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;lgb_train&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;lgb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;train&lt;/span&gt;&lt;span class="p"&gt;(...,&lt;/span&gt; 
                      &lt;span class="n"&gt;metric&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
                      &lt;span class="n"&gt;feval&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;feval_func&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: to use a feval function instead of a metric, you should set the metric parameter to "None".&lt;/p&gt;
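&lt;p&gt;For instance, a feval that reports RMSE could look like the sketch below. The name feval_rmse is my own; the only lightgbm API it relies on is train_data.get_label(), which returns the label array of a Dataset:&lt;/p&gt;

```python
import math

def feval_rmse(preds, train_data):
    # train_data.get_label() is the lightgbm Dataset accessor for labels
    labels = train_data.get_label()
    mse = sum((p - y) ** 2 for p, y in zip(preds, labels)) / len(labels)
    # return (eval_name, eval_result, is_higher_better);
    # RMSE is better when lower, hence False
    return 'rmse', math.sqrt(mse), False
```

You would then pass feval=feval_rmse to lgb.train, remembering to set metric to "None".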

&lt;h3&gt;
  
  
  classification params vs regression params
&lt;/h3&gt;

&lt;p&gt;Most of the things I mentioned before are true both for classification and regression, but there are things that need to be adjusted.&lt;br&gt;
Specifically, you should:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--LB2IaRhS--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/600/1%2A5oqCyXngI_R7Sv45dTz8mg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--LB2IaRhS--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/600/1%2A5oqCyXngI_R7Sv45dTz8mg.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h1&gt;
  
  
  The most important lightgbm parameters
&lt;/h1&gt;

&lt;p&gt;We have reviewed and learned a bit about lightgbm parameters in the previous sections, but no boosted trees article would be complete without mentioning the incredible benchmarks from Laurae 🙂&lt;br&gt;
There you can learn about the best default parameters for many problems, both for lightGBM and XGBoost.&lt;br&gt;
You can check it out here, but some of the most important takeaways are:&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--cWu7ZYjs--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/600/1%2A336N5hnBXXJjgIkEqixiVw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--cWu7ZYjs--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/600/1%2A336N5hnBXXJjgIkEqixiVw.png" alt="image"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--owgXkYhK--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/600/1%2AUCIwYfqRrFjqNirO8wF6vw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--owgXkYhK--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/600/1%2AUCIwYfqRrFjqNirO8wF6vw.png" alt="image"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--n2aRF2oq--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/600/1%2AFiuB05sQADOiyke8w6GhQQ.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--n2aRF2oq--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/600/1%2AFiuB05sQADOiyke8w6GhQQ.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: You should never take any parameter value for granted; adjust it based on your problem. That said, these parameters are a great starting point for your hyperparameter tuning algorithms.&lt;/p&gt;
&lt;h1&gt;
  
  
  Lightgbm parameter tuning example in python (lightgbm tuning)
&lt;/h1&gt;

&lt;p&gt;Finally, after the explanation of all important parameters, it is time to perform some experiments!&lt;/p&gt;

&lt;p&gt;I will use one of the popular Kaggle competitions: &lt;a href="https://www.kaggle.com/c/santander-customer-transaction-prediction/data"&gt;Santander Customer Transaction Prediction&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;I will use this article which explains &lt;a href="https://neptune.ai/blog/hyperparameter-tuning-on-any-python-script"&gt;how to run hyperparameter tuning in Python&lt;/a&gt; on any script.&lt;/p&gt;

&lt;p&gt;Worth a read!&lt;/p&gt;

&lt;p&gt;Before we start, one important question! &lt;strong&gt;What parameters should we tune?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pay attention to the problem you want to solve; for instance, the Santander dataset is &lt;strong&gt;highly imbalanced&lt;/strong&gt;, and you should consider that in your tuning! &lt;a href="https://github.com/Laurae2"&gt;Laurae2&lt;/a&gt;, one of the contributors to lightgbm, explained this well here.&lt;/li&gt;
&lt;li&gt;Some parameters are interdependent and must be adjusted together or tuned one by one. For instance, min_data_in_leaf depends on the number of training samples and num_leaves.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: It's a good idea to create two dictionaries of hyperparameters: one containing the parameters and values that you don't want to tune, the other containing the parameters and value ranges that you do want to tune.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;SEARCH_PARAMS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;'learning_rate'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                 &lt;span class="s"&gt;'max_depth'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                 &lt;span class="s"&gt;'num_leaves'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                 &lt;span class="s"&gt;'feature_fraction'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                 &lt;span class="s"&gt;'subsample'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;FIXED_PARAMS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;'objective'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;'binary'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
              &lt;span class="s"&gt;'metric'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;'auc'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
              &lt;span class="s"&gt;'is_unbalance'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
              &lt;span class="s"&gt;'boosting'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s"&gt;'gbdt'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
              &lt;span class="s"&gt;'num_boost_round'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
              &lt;span class="s"&gt;'early_stopping_rounds'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;By doing that you keep your baseline values separated from the search space!&lt;br&gt;
Now, here's what we'll do.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;First, we generate the code in the &lt;a href="https://ui.neptune.ai/mjbahmani/LightGBM-hyperparameters/experiments?viewId=standard-view&amp;amp;utm_source=devto&amp;amp;utm_medium=crosspost&amp;amp;utm_campaign=blog-lightgbm-parameters-guide"&gt;Notebook&lt;/a&gt;. It is public and you can download it.&lt;/li&gt;
&lt;li&gt;Second, we track the result of each experiment on &lt;a href="https://ui.neptune.ai/mjbahmani/LightGBM-hyperparameters/e/LGB-7/logs?utm_source=devto&amp;amp;utm_medium=crosspost&amp;amp;utm_campaign=blog-lightgbm-parameters-guide"&gt;Neptune.ai&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--4TPihQb7--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/600/1%2AkzhhCnNtRMhzMmZa3N_jCA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--4TPihQb7--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/600/1%2AkzhhCnNtRMhzMmZa3N_jCA.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  analysis of results
&lt;/h3&gt;

&lt;p&gt;If you have checked the previous section, you've noticed that I've done more than 14 different experiments on the dataset. Here I explain how to tune the values of the hyperparameters step by step.&lt;br&gt;
Create the baseline training code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;sklearn.metrics&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;roc_auc_score&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;roc_curve&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;sklearn.model_selection&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;train_test_split&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;neptunecontrib.monitoring.skopt&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;sk_utils&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;lightgbm&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;lgb&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;neptune&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;skopt&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;sys&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;os&lt;/span&gt;

&lt;span class="n"&gt;SEARCH_PARAMS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;'learning_rate'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="s"&gt;'max_depth'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="s"&gt;'num_leaves'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="s"&gt;'feature_fraction'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="s"&gt;'subsample'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;FIXED_PARAMS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;'objective'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;'binary'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
             &lt;span class="s"&gt;'metric'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;'auc'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
             &lt;span class="s"&gt;'is_unbalance'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
             &lt;span class="s"&gt;'bagging_freq'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
             &lt;span class="s"&gt;'boosting'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s"&gt;'dart'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
             &lt;span class="s"&gt;'num_boost_round'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
             &lt;span class="s"&gt;'early_stopping_rounds'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;train_evaluate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;search_params&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
   &lt;span class="c1"&gt;# you can download the dataset from this link(https://www.kaggle.com/c/santander-customer-transaction-prediction/data)
&lt;/span&gt;   &lt;span class="c1"&gt;# import Dataset to play with it
&lt;/span&gt;   &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"sample_train.csv"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
   &lt;span class="n"&gt;X&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;drop&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="s"&gt;'ID_code'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'target'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
   &lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'target'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
   &lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;X_valid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_valid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;train_test_split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;test_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;random_state&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1234&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
   &lt;span class="n"&gt;train_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;lgb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Dataset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
   &lt;span class="n"&gt;valid_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;lgb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Dataset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_valid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;y_valid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;reference&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;train_data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

   &lt;span class="n"&gt;params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;'metric'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;FIXED_PARAMS&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'metric'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
             &lt;span class="s"&gt;'objective'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;FIXED_PARAMS&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'objective'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
             &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;search_params&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

   &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;lgb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;train&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;train_data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;                     
                     &lt;span class="n"&gt;valid_sets&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;valid_data&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                     &lt;span class="n"&gt;num_boost_round&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;FIXED_PARAMS&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'num_boost_round'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                     &lt;span class="n"&gt;early_stopping_rounds&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;FIXED_PARAMS&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'early_stopping_rounds'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                     &lt;span class="n"&gt;valid_names&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'valid'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
   &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;best_score&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'valid'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="s"&gt;'auc'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
   &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Use the hyperparameter optimization library of your choice (for example, scikit-optimize):&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;neptune&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;init&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'mjbahmani/LightGBM-hyperparameters'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;neptune&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;create_experiment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'lgb-tuning_final'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;upload_source_files&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'*.*'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                              &lt;span class="n"&gt;tags&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'lgb-tuning'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'dart'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;SEARCH_PARAMS&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;SPACE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
   &lt;span class="n"&gt;skopt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;space&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Real&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.01&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'learning_rate'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prior&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'log-uniform'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
   &lt;span class="n"&gt;skopt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;space&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Integer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'max_depth'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
   &lt;span class="n"&gt;skopt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;space&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Integer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'num_leaves'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
   &lt;span class="n"&gt;skopt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;space&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Real&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'feature_fraction'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prior&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'uniform'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
   &lt;span class="n"&gt;skopt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;space&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Real&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'subsample'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prior&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'uniform'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="n"&gt;skopt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;utils&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;use_named_args&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;SPACE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;objective&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
   &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;1.0&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;train_evaluate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;monitor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sk_utils&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NeptuneMonitor&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;skopt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;forest_minimize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;objective&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;SPACE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
                                &lt;span class="n"&gt;n_calls&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n_random_starts&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
                                &lt;span class="n"&gt;callback&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;monitor&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;sk_utils&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;log_results&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;neptune&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stop&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
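&lt;p&gt;Once &lt;em&gt;forest_minimize&lt;/em&gt; returns, the best score and parameters can be read back from the result object. A minimal helper sketch (the &lt;em&gt;best_from_results&lt;/em&gt; name is mine, not from the original post; it assumes skopt's standard result layout, where &lt;em&gt;results.fun&lt;/em&gt; is the minimized value and &lt;em&gt;results.x&lt;/em&gt; the best point, and that each dimension in SPACE carries a &lt;em&gt;.name&lt;/em&gt;):&lt;/p&gt;

```python
# Sketch: pair skopt's best point back with the search-space names.
def best_from_results(results, space):
    # The objective above returns -AUC (skopt minimizes), so negate back.
    best_score = -results.fun
    best_params = {dim.name: value for dim, value in zip(space, results.x)}
    return best_score, best_params

# Usage, continuing the snippet above:
# best_auc, best_params = best_from_results(results, SPACE)
```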



&lt;p&gt;Try different configurations and track your results in &lt;a href="https://ui.neptune.ai/mjbahmani/LightGBM-hyperparameters/experiments?viewId=standard-view&amp;amp;utm_source=devto&amp;amp;utm_medium=crosspost&amp;amp;utm_campaign=blog-lightgbm-parameters-guide"&gt;Neptune&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--8mzZuL9j--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/600/1%2APesKsNfCJS62g_em1tHpKw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--8mzZuL9j--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/600/1%2APesKsNfCJS62g_em1tHpKw.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Finally, the following table shows how the parameter values changed during tuning.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--H5K1LeOZ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/600/1%2AGy7-P7XDNaOB5zEIHKY3Iw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--H5K1LeOZ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/600/1%2AGy7-P7XDNaOB5zEIHKY3Iw.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Final Thoughts
&lt;/h1&gt;

&lt;p&gt;Long story short, you learned:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;what the main LightGBM parameters are,&lt;/li&gt;
&lt;li&gt;how to create custom metrics with the feval function,&lt;/li&gt;
&lt;li&gt;what good default values are for the major parameters,&lt;/li&gt;
&lt;li&gt;and saw an example of how to tune LightGBM parameters to improve model performance.&lt;/li&gt;
&lt;/ul&gt;
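&lt;p&gt;The custom-metric bullet above can be sketched as follows. This is an illustrative example (the &lt;em&gt;mae_feval&lt;/em&gt; name and the toy MAE metric are mine, not from the original post): a feval callable receives the raw predictions and the Dataset, and must return a &lt;em&gt;(name, value, is_higher_better)&lt;/em&gt; tuple.&lt;/p&gt;

```python
import numpy as np

# Illustrative custom metric for LightGBM's `feval` hook: the callable
# gets (preds, dataset) and returns (name, value, is_higher_better).
def mae_feval(preds, dataset):
    labels = dataset.get_label()
    return 'custom_mae', float(np.mean(np.abs(labels - preds))), False

# It would then be passed to training as, e.g.:
# model = lgb.train(params, train_data, valid_sets=[valid_data], feval=mae_feval)
```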

&lt;p&gt;And some other things 🙂 For more detailed information, please refer to the resources.&lt;/p&gt;

&lt;h1&gt;
  
  
  Resources
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;Laurae's extensive guide with good defaults, etc.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/microsoft/LightGBM/tree/master/python-package"&gt;https://github.com/microsoft/LightGBM/tree/master/python-package&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://lightgbm.readthedocs.io/en/latest/index.html"&gt;https://lightgbm.readthedocs.io/en/latest/index.html&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://papers.nips.cc/paper/6907-lightgbm-a-highly-efficient-gradient-boosting-decision-tree.pdf"&gt;https://papers.nips.cc/paper/6907-lightgbm-a-highly-efficient-gradient-boosting-decision-tree.pdf&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://statweb.stanford.edu/%7Ejhf/ftp/trebst.pdf"&gt;https://statweb.stanford.edu/~jhf/ftp/trebst.pdf&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;This article was originally written by &lt;a href="https://www.linkedin.com/in/mjbahmani/"&gt;MJ Bahmani&lt;/a&gt; and posted on the &lt;a href="https://neptune.ai/blog/lightgbm-parameters-guide?utm_source=devto&amp;amp;utm_medium=crosspost&amp;amp;utm_campaign=blog-lightgbm-parameters-guide"&gt;Neptune blog&lt;/a&gt;. You can find there more in-depth articles for machine learning practitioners.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>datascience</category>
      <category>python</category>
    </item>
  </channel>
</rss>
