<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Joel Buenrostro</title>
    <description>The latest articles on DEV Community by Joel Buenrostro (@joelbuenrostro).</description>
    <link>https://dev.to/joelbuenrostro</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F571623%2F1c6b8707-9bb0-44a1-b503-7e513d234a9b.png</url>
      <title>DEV Community: Joel Buenrostro</title>
      <link>https://dev.to/joelbuenrostro</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/joelbuenrostro"/>
    <language>en</language>
    <item>
      <title>Digital scholarships for Talent Land 2022</title>
      <dc:creator>Joel Buenrostro</dc:creator>
      <pubDate>Mon, 04 Jul 2022 03:07:54 +0000</pubDate>
      <link>https://dev.to/joelbuenrostro/becas-digitales-para-talent-land-2022-3ako</link>
      <guid>https://dev.to/joelbuenrostro/becas-digitales-para-talent-land-2022-3ako</guid>
      <description>&lt;p&gt;Hi everyone. As a &lt;strong&gt;Talent Network&lt;/strong&gt; ambassador, I'd like to extend a warm invitation to &lt;strong&gt;Talent Land 2022 Digital&lt;/strong&gt; and a one-year scholarship to &lt;strong&gt;Talent World&lt;/strong&gt;, with plenty of on-demand content and many more events.&lt;/p&gt;

&lt;p&gt;Talent Network is a 100% Mexican company. Our backbone carries the DNA of teamwork, social responsibility, commitment to our talent ecosystem, the demand to deliver effective solutions, the imperative to help, the will to boost young talent, and the drive to experiment on a large scale.&lt;/p&gt;

&lt;p&gt;In an effort to encourage talent development, IBM and Fundación &lt;strong&gt;Talent Land&lt;/strong&gt; are offering scholarships to students, teachers, and educational staff for access to Mexico's largest innovation and technology event, Jalisco Talent Land, where they can access a variety of content, network, and take part in various competitions.&lt;/p&gt;

&lt;h2&gt;
  
  
  What does it include?
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;One year of access to Talent World.&lt;/li&gt;
&lt;li&gt;Access to Jalisco Talent Land Digital 2022, live and on demand.&lt;/li&gt;
&lt;li&gt;Access to Blockchain Land Nuevo León Digital 2022, live and on demand.&lt;/li&gt;
&lt;li&gt;Access to Talent Land Latinoamérica 2022, live and on demand.&lt;/li&gt;
&lt;li&gt;Exclusive content.&lt;/li&gt;
&lt;li&gt;Training and mentoring.&lt;/li&gt;
&lt;li&gt;Exclusive giveaways for the community.&lt;/li&gt;
&lt;li&gt;Early access to our events.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/uPc3jmLsALI"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  How do I apply?
&lt;/h2&gt;

&lt;p&gt;First, enter your name and email address, select "EMBAJADORES", and submit the form.&lt;/p&gt;

&lt;p&gt;Second, fill in the topics that interest you and, in the last question, enter the name of the person who invited you, in this case me.&lt;/p&gt;

&lt;p&gt;I hope you enjoy the event and the learning on the platform.&lt;/p&gt;

&lt;p&gt;Link to apply for the free-access scholarship:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.talent-land.mx/beca-te" rel="noopener noreferrer"&gt;https://www.talent-land.mx/beca-te&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The event will include a hackathon with 5 different tracks and prizes.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is Talent Hackathon?
&lt;/h2&gt;

&lt;p&gt;A race against the clock in which participants work in groups to find solutions to social, technological, and/or sustainability problems, with advice and support from expert mentors from both the private and public sectors.&lt;/p&gt;

&lt;p&gt;Talent Hackathon emerges as the call and challenge par excellence within our ecosystem. With the UN Sustainable Development Goals (SDGs) in view, we want both participants and the actors of the quadruple helix (industry, government, academia, and society) to propose mechanisms and projects that help reduce inequalities in society and in quality of life.&lt;/p&gt;

&lt;p&gt;Participants in Talent Hackathon have the opportunity to receive mentoring and, of course, to interact with other talented people in a competition where innovation and knowledge are indispensable among multidisciplinary groups seeking solutions and proposals for the given adversities and problems.&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/F813321pM3o"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;I hope you enjoy the on-demand content and the digital event. I'm at your service for any questions about the event.&lt;/p&gt;

</description>
      <category>spanish</category>
      <category>conference</category>
      <category>mexico</category>
      <category>online</category>
    </item>
    <item>
      <title>HTML-Starter-Templates</title>
      <dc:creator>Joel Buenrostro</dc:creator>
      <pubDate>Mon, 22 Mar 2021 04:49:35 +0000</pubDate>
      <link>https://dev.to/joelbuenrostro/html-starter-templates-5ebp</link>
      <guid>https://dev.to/joelbuenrostro/html-starter-templates-5ebp</guid>
      <description>&lt;p&gt;Hi all HTML and CSS users, I am starting a GitHub repository to collect some basic open-source templates for starting web projects. &lt;br&gt;
I'd like to ask for your advice on good practices for web pages and layout. You are welcome to collaborate in the repository; it is intended to be a good first commit if you are starting with GitHub and know some tips about HTML and CSS.&lt;/p&gt;

&lt;p&gt;Here is a small description of the project.&lt;/p&gt;
&lt;h2&gt;
  
  
  Scope
&lt;/h2&gt;

&lt;p&gt;HTML starter templates seek to collect various templates with different approaches to starting web projects quickly, easily, and efficiently.&lt;/p&gt;
&lt;h2&gt;
  
  
  Structure
&lt;/h2&gt;

&lt;p&gt;Each folder within this repository contains a template design with its respective HTML and CSS files.&lt;/p&gt;
&lt;h2&gt;
  
  
  Contributing to HTML Starter Templates
&lt;/h2&gt;

&lt;p&gt;First off, thanks for taking the time to contribute!&lt;/p&gt;
&lt;h2&gt;
  
  
  Submitting changes
&lt;/h2&gt;

&lt;p&gt;Please send a GitHub Pull Request with a clear list of what you've done and make sure all of your commits are atomic (one feature per commit).&lt;/p&gt;

&lt;p&gt;Always write a clear log message for your commits. One-line messages are fine for small changes, but bigger changes should have clearer descriptions.&lt;/p&gt;
&lt;h2&gt;
  
  
  Coding conventions
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;We indent using two spaces (soft tabs)&lt;/li&gt;
&lt;li&gt;We use HTML for all views&lt;/li&gt;
&lt;li&gt;We ALWAYS put spaces after list items and method parameters ([1, 2, 3], not [1,2,3]), around operators (x += 1, not x+=1), and around hash arrows.&lt;/li&gt;
&lt;li&gt;This is open-source software. Consider the people who will read your code, and make it look nice for them.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This repo is under the MIT License and Contributor Covenant Code of Conduct.&lt;/p&gt;


&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fassets.dev.to%2Fassets%2Fgithub-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/JoelBuenrostro" rel="noopener noreferrer"&gt;
        JoelBuenrostro
      &lt;/a&gt; / &lt;a href="https://github.com/JoelBuenrostro/HTML-Starter-Templates" rel="noopener noreferrer"&gt;
        HTML-Starter-Templates
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
      HTML starter templates seek to collect various templates with different approaches to starting web projects quickly, easily, and efficiently.
    &lt;/h3&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;
&lt;div class="markdown-heading"&gt;
&lt;h1 class="heading-element"&gt;HTML-Starter-Templates&lt;/h1&gt;

&lt;/div&gt;
&lt;p&gt;HTML is the standard markup language for creating Web pages.&lt;/p&gt;
&lt;p&gt;CSS is the language we use to style an HTML document.&lt;/p&gt;
&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;Table of contents&lt;/h2&gt;

&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/JoelBuenrostro/HTML-Starter-Templates#scope" rel="noopener noreferrer"&gt;Scope&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/JoelBuenrostro/HTML-Starter-Templates#structure" rel="noopener noreferrer"&gt;Structure&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/JoelBuenrostro/HTML-Starter-Templates#acknowledgments" rel="noopener noreferrer"&gt;Acknowledgments&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;Scope&lt;/h2&gt;

&lt;/div&gt;
&lt;p&gt;HTML starter templates seek to collect various templates with different approaches to starting web projects quickly, easily and efficiently.&lt;/p&gt;
&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;Structure&lt;/h2&gt;

&lt;/div&gt;
&lt;p&gt;Each folder within this repository contains a template design with its respective HTML and CSS files.&lt;/p&gt;
&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;Acknowledgments&lt;/h2&gt;

&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.w3schools.com/" rel="nofollow noopener noreferrer"&gt;W3schools&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://frontendchecklist.io/" rel="nofollow noopener noreferrer"&gt;Front-end checklist&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;



&lt;/div&gt;
&lt;br&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/JoelBuenrostro/HTML-Starter-Templates" rel="noopener noreferrer"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;br&gt;
&lt;/div&gt;
&lt;br&gt;


</description>
      <category>html</category>
      <category>css</category>
      <category>githunt</category>
      <category>beginners</category>
    </item>
    <item>
      <title>Train and evaluate regression models - Part 2</title>
      <dc:creator>Joel Buenrostro</dc:creator>
      <pubDate>Sun, 21 Feb 2021 20:37:37 +0000</pubDate>
      <link>https://dev.to/joelbuenrostro/train-and-evaluate-regression-models-part-2-3a3j</link>
      <guid>https://dev.to/joelbuenrostro/train-and-evaluate-regression-models-part-2-3a3j</guid>
      <description>&lt;p&gt;There are lots of machine learning algorithms for supervised learning, and they can be broadly divided into two types:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Regression algorithms: Algorithms that predict a y value that is a numeric value, such as the price of a house or the number of sales transactions.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Classification algorithms: Algorithms that predict to which category, or class, an observation belongs. The y value in a classification model is a vector of probability values between 0 and 1, one for each class, indicating the probability of the observation belonging to each class.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Calculating a regression line for a simple two-variable function (one feature x and one label y) from first principles is possible, but involves some mathematical effort. When you consider a real-world dataset in which x is not a single feature value such as temperature, but a vector of multiple variables such as temperature, day of the week, month, rainfall, and so on, the calculations become more complex.&lt;/p&gt;
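
&lt;p&gt;To give a feel for the two-variable case, here is a minimal sketch of an ordinary least-squares line computed by hand with NumPy. The x and y values are made up for illustration and are not taken from the bike-share data; the variable names are assumptions as well.&lt;/p&gt;

```python
import numpy as np

# Made-up sample points (illustrative only, not from the bike-share dataset)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])

# Ordinary least squares for a single feature:
# slope = sum of co-deviations of x and y / sum of squared deviations of x
x_dev = x - x.mean()
y_dev = y - y.mean()
slope = np.sum(x_dev * y_dev) / np.sum(x_dev ** 2)

# The fitted line passes through the point (mean of x, mean of y)
intercept = y.mean() - slope * x.mean()

print(f"y = {slope:.2f} * x + {intercept:.2f}")
```

&lt;p&gt;With several features the same idea turns into a matrix problem, which is one reason a library is usually preferred.&lt;/p&gt;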

&lt;p&gt;For this reason, data scientists generally use specialized machine learning frameworks to perform model training and evaluation. Such frameworks encapsulate common algorithms and provide useful functions for preparing data, fitting data to a model, and calculating model evaluation metrics.&lt;/p&gt;

&lt;p&gt;One of the most commonly used machine learning frameworks for Python is scikit-learn, and in this hands-on exercise, you'll use scikit-learn to train and evaluate a regression model.&lt;/p&gt;
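
&lt;p&gt;As a rough preview of that workflow, the sketch below splits data, fits a model, and computes evaluation metrics with scikit-learn. It runs on synthetic data generated on the spot, and the choice of LinearRegression, the 70/30 split, and the metrics are assumptions made for illustration rather than the exact steps of this exercise.&lt;/p&gt;

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for illustration): two features and a noisy linear label
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, 2))
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(0, 0.1, size=200)

# Hold out 30% of the rows for evaluation
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Fit the model on the training rows only
model = LinearRegression().fit(X_train, y_train)

# Evaluate on the held-out rows
predictions = model.predict(X_test)
rmse = np.sqrt(mean_squared_error(y_test, predictions))
r2 = r2_score(y_test, predictions)

print(f"RMSE: {rmse:.3f}, R2: {r2:.3f}")
```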

&lt;p&gt;The data used in this exercise is derived from &lt;a href="https://www.capitalbikeshare.com/system-data" rel="noopener noreferrer"&gt;Capital Bikeshare&lt;/a&gt; and is used under the published &lt;a href="https://www.capitalbikeshare.com/data-license-agreement" rel="noopener noreferrer"&gt;license agreement&lt;/a&gt;.&lt;/p&gt;

&lt;h1&gt;
  
  
  Explore the data
&lt;/h1&gt;

&lt;p&gt;The first step in any machine learning project is to explore the data that you will use to train a model. The goal of this exploration is to try to understand the relationships between its attributes; in particular, any apparent correlation between the features and the label your model will try to predict. This may require some work to detect and fix issues in the data (such as dealing with missing values, errors, or outlier values), deriving new feature columns by transforming or combining existing features (a process known as feature engineering), normalizing numeric features (values you can measure or count) so they're on a similar scale, and encoding categorical features (values that represent discrete categories) as numeric indicators.&lt;/p&gt;
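
&lt;p&gt;The preparation steps listed above can be sketched with pandas on a tiny frame. The column names mirror the bike-share dataset, but the values are made up, and filling with the median and min-max scaling are only one possible choice for each step.&lt;/p&gt;

```python
import pandas as pd

# Hypothetical toy data: column names mirror the dataset, values are made up
df = pd.DataFrame({
    'temp': [0.30, 0.50, None, 0.70],
    'season': [1, 2, 2, 3],
    'rentals': [100, 250, 300, 400],
})

# Detect missing values: count of nulls per column
missing = df.isnull().sum()

# Fix them, here by filling with the column median (one option among several)
df['temp'] = df['temp'].fillna(df['temp'].median())

# Normalize a numeric feature to the 0-1 range (min-max scaling)
temp_range = df['temp'].max() - df['temp'].min()
temp_scaled = (df['temp'] - df['temp'].min()) / temp_range

# Encode a categorical feature as numeric indicator columns (one-hot encoding)
encoded = pd.get_dummies(df, columns=['season'])

print(missing)
print(encoded.columns.tolist())
```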

&lt;p&gt;Let's start by loading the bicycle sharing data as a Pandas DataFrame and viewing the first few rows.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;

&lt;span class="c1"&gt;# load the training dataset
&lt;/span&gt;&lt;span class="n"&gt;bike_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;daily-bike-share.csv&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;bike_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;head&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this dataset, rentals represent the label (the y value) our model must be trained to predict. The other columns are potential features (x values).&lt;/p&gt;

&lt;p&gt;As mentioned previously, you can perform some feature engineering to combine or derive new features. For example, let's add a new column named day to the dataframe by extracting the day component from the existing dteday column. The new column represents the day of the month from 1 to 31.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;bike_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;day&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DatetimeIndex&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bike_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;dteday&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]).&lt;/span&gt;&lt;span class="n"&gt;day&lt;/span&gt;
&lt;span class="n"&gt;bike_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;head&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
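
&lt;p&gt;An equivalent way to derive the same column is to parse the dates with pd.to_datetime and read the day through the .dt accessor. The sketch below uses made-up sample rows and assumes the dates are written as month/day/year; adjust the format string if the file differs.&lt;/p&gt;

```python
import pandas as pd

# Made-up sample rows; only the dteday column name comes from the dataset
sample = pd.DataFrame({'dteday': ['1/1/2011', '1/2/2011', '2/15/2011']})

# Parse the strings into datetimes (the month/day/year format is an assumption)
dates = pd.to_datetime(sample['dteday'], format='%m/%d/%Y')

# The .dt accessor exposes date components such as day, month, and year
sample['day'] = dates.dt.day

print(sample)
```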



&lt;p&gt;OK, let's start our analysis of the data by examining a few key descriptive statistics. We can use the dataframe's describe method to generate these for the numeric features as well as the rentals label column.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;numeric_features&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;temp&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;atemp&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;hum&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;windspeed&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;bike_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;numeric_features&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;rentals&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]].&lt;/span&gt;&lt;span class="nf"&gt;describe&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The statistics reveal some information about the distribution of the data in each of the numeric fields, including the number of observations (there are 731 records), the mean, standard deviation, minimum and maximum values, and the quartile values (the threshold values for 25%, 50% - which is also the median, and 75% of the data). From this, we can see that the mean number of daily rentals is around 848; but there's a comparatively large standard deviation, indicating a lot of variance in the number of rentals per day.&lt;/p&gt;
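
&lt;p&gt;To make the link between these statistics concrete, the short sketch below runs describe on a made-up series (not the real rentals column), checks that the 50% row is the median, and expresses the spread as the ratio of standard deviation to mean.&lt;/p&gt;

```python
import pandas as pd

# Made-up values standing in for a rentals column
rentals = pd.Series([120, 300, 450, 800, 2500])

# describe() returns count, mean, std, min, quartiles, and max
summary = rentals.describe()

# The 50% quartile is the median
print(summary['50%'], rentals.median())

# Ratio of standard deviation to mean; a value near or above 1 signals a wide spread
spread_ratio = rentals.std() / rentals.mean()
print(round(spread_ratio, 2))
```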

&lt;p&gt;We might get a clearer idea of the distribution of rentals values by visualizing the data. Common plot types for visualizing numeric data distributions are histograms and box plots, so let's use Python's matplotlib library to create one of each of these for the rentals column.&lt;/p&gt;

&lt;h1&gt;
  
  
  Visualize the data
&lt;/h1&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;matplotlib.pyplot&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;

&lt;span class="c1"&gt;# This ensures plots are displayed inline in the Jupyter notebook
&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="n"&gt;matplotlib&lt;/span&gt; &lt;span class="n"&gt;inline&lt;/span&gt;

&lt;span class="c1"&gt;# Get the label column
&lt;/span&gt;&lt;span class="n"&gt;label&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bike_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;rentals&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;


&lt;span class="c1"&gt;# Create a figure for 2 subplots (2 rows, 1 column)
&lt;/span&gt;&lt;span class="n"&gt;fig&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ax&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;subplots&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;figsize&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="c1"&gt;# Plot the histogram   
&lt;/span&gt;&lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;hist&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bins&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;set_ylabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Frequency&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Add lines for the mean and median
&lt;/span&gt;&lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;axvline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;color&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;magenta&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;linestyle&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;dashed&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;linewidth&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;axvline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;median&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;color&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;cyan&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;linestyle&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;dashed&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;linewidth&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Plot the boxplot   
&lt;/span&gt;&lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;boxplot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;vert&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;set_xlabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Rentals&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Add a title to the Figure
&lt;/span&gt;&lt;span class="n"&gt;fig&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;suptitle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Rental Distribution&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Show the figure
&lt;/span&gt;&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;show&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhwjed0ahjvdass2yixdg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhwjed0ahjvdass2yixdg.png" alt="Alt Text" width="605" height="805"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The plots show that the number of daily rentals ranges from 0 to just over 3,400. However, the mean (and median) number of daily rentals is closer to the low end of that range, with most of the data between 0 and around 2,200 rentals. The few values above this are shown in the box plot as small circles, indicating that they are outliers - in other words, unusually high or low values beyond the typical range of most of the data.&lt;/p&gt;
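
&lt;p&gt;By default, matplotlib draws box plot whiskers at 1.5 times the interquartile range (IQR) beyond the quartiles and marks anything outside them as an outlier. The sketch below reproduces that rule by hand on made-up values; the numbers are illustrative and are not the real rentals data.&lt;/p&gt;

```python
import pandas as pd

# Made-up values with one unusually high entry
rentals = pd.Series([50, 200, 400, 600, 800, 1000, 3400])

# Quartiles and interquartile range
q1 = rentals.quantile(0.25)
q3 = rentals.quantile(0.75)
iqr = q3 - q1

# Whisker limits under the 1.5 * IQR rule
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# Values outside the limits are the points drawn as circles in the box plot
outliers = rentals[~rentals.between(lower, upper)]

print(lower, upper)
print(outliers.tolist())
```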

&lt;p&gt;We can do the same kind of visual exploration of the numeric features. Let's create a histogram for each of these.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;col&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;numeric_features&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;fig&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;figure&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;figsize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;ax&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;fig&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;gca&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;feature&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bike_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;col&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;feature&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;hist&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bins&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ax&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;axvline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;feature&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;color&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;magenta&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;linestyle&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;dashed&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;linewidth&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;axvline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;feature&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;median&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;color&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;cyan&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;linestyle&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;dashed&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;linewidth&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_title&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;col&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;show&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flte9avhj79rztsapfu4v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flte9avhj79rztsapfu4v.png" alt="Alt Text" width="589" height="767"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Floswqt0sck8edcgyh78n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Floswqt0sck8edcgyh78n.png" alt="Alt Text" width="572" height="777"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We've explored the distribution of the numeric values in the dataset, but what about the categorical features? These aren't continuous numbers on a scale, so we can't use histograms, but we can plot a bar chart showing the count of each discrete value for each category.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;

&lt;span class="c1"&gt;# plot a bar plot for each categorical feature count
&lt;/span&gt;&lt;span class="n"&gt;categorical_features&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;season&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;mnth&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;holiday&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;weekday&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;workingday&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;weathersit&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;day&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;col&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;categorical_features&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;counts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bike_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;col&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;value_counts&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;sort_index&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;fig&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;figure&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;figsize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;ax&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;fig&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;gca&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;counts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;plot&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;bar&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ax&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;color&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;steelblue&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_title&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;col&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; counts&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_xlabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;col&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
    &lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_ylabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Frequency&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;show&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv25cyoe2paq1gg3j74nz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv25cyoe2paq1gg3j74nz.png" alt="Alt Text" width="586" height="788"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now that we know something about the distribution of the data in our columns, we can start to look for relationships between the features and the rentals label we want to be able to predict.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;col&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;numeric_features&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;fig&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;figure&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;figsize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;ax&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;fig&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;gca&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;feature&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bike_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;col&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;label&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bike_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;rentals&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;correlation&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;feature&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;corr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;scatter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;feature&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;xlabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;col&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ylabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Bike Rentals&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_title&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;rentals vs &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;col&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;- correlation: &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;correlation&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;show&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3rmg5mk401xjiz2ad2ts.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3rmg5mk401xjiz2ad2ts.png" alt="Alt Text" width="603" height="792"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The results aren't conclusive, but if you look closely at the scatter plots for temp and atemp, you can see a vague diagonal trend showing that higher rental counts tend to coincide with higher temperatures; and a correlation value of just over 0.5 for both of these features supports this observation. Conversely, the plots for hum and windspeed show a slightly negative correlation, indicating that there are fewer rentals on days with high humidity or wind speed.&lt;br&gt;
&lt;/p&gt;
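&lt;p&gt;To make the correlation value in those plot titles less of a black box, here is a minimal sketch of the Pearson coefficient, which is the default method used by pandas when one column's &lt;code&gt;corr&lt;/code&gt; is called with another. The function name &lt;code&gt;pearson_correlation&lt;/code&gt; and the sample values are assumptions made up for illustration; they are not taken from the bike rental dataset.&lt;/p&gt;

```python
from math import sqrt


def pearson_correlation(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances, each without the shared 1/n factor (it cancels out).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical values for illustration only; they are not taken from bike_data.
temp = [0.20, 0.35, 0.50, 0.65, 0.80]
rentals = [150, 420, 380, 700, 640]
humidity = [0.90, 0.60, 0.75, 0.40, 0.55]

r_temp = pearson_correlation(temp, rentals)
r_hum = pearson_correlation(humidity, rentals)
print(f"temp vs rentals: {r_temp:.2f}")
print(f"hum vs rentals:  {r_hum:.2f}")
```

&lt;p&gt;A value near 1 means the two columns rise together, a value near -1 means one falls as the other rises, and a value near 0 means there is no clear linear relationship.&lt;/p&gt;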

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# plot a boxplot for the label by each categorical feature
&lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;col&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;categorical_features&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;fig&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;figure&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;figsize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;ax&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;fig&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;gca&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;bike_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;boxplot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;column&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;rentals&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;by&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;col&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ax&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_title&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Label by &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;col&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_ylabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bike Rentals&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;show&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7vc7whdisaozkx60bk4r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7vc7whdisaozkx60bk4r.png" alt="Alt Text" width="637" height="821"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx6mk65ttera0di9pzd5v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx6mk65ttera0di9pzd5v.png" alt="Alt Text" width="616" height="431"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The plots show some variance in the relationship between some category values and rentals. For example, there's a clear difference in the distribution of rentals on weekends (weekday 0 or 6) and those during the working week (weekday 1 to 5). Similarly, there are notable differences between the holiday and workingday categories. There's also a noticeable trend that shows different rental distributions in the summer and fall months compared to spring and winter months. The weathersit category also seems to make a difference in rental distribution. The day feature we created for the day of the month shows little variation, indicating that it's probably not predictive of the number of rentals.&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>python</category>
      <category>programming</category>
      <category>datascience</category>
    </item>
    <item>
      <title>Train and evaluate regression models - Part 1</title>
      <dc:creator>Joel Buenrostro</dc:creator>
      <pubDate>Thu, 18 Feb 2021 03:16:08 +0000</pubDate>
      <link>https://dev.to/joelbuenrostro/train-and-evaluate-regression-models-42hm</link>
      <guid>https://dev.to/joelbuenrostro/train-and-evaluate-regression-models-42hm</guid>
      <description>&lt;p&gt;Regression is a commonly used kind of machine learning for predicting numeric values.&lt;/p&gt;

&lt;p&gt;Machine learning is based on statistics and math, and it's important to be aware of specific terms that statisticians and mathematicians (and therefore data scientists) use. You can think of the difference between a predicted label value and the actual label value as a measure of error. However, in practice, the "actual" values are based on sample observations (which themselves may be subject to some random variance). To make it clear that we're comparing a predicted value (ŷ) with an observed value (y) we refer to the difference between them as the residuals.&lt;/p&gt;

&lt;p&gt;We can summarize the residuals for all of the validation data predictions to calculate the overall loss in the model as a measure of its predictive performance.&lt;/p&gt;

&lt;h1&gt;
  
  
  Introduction
&lt;/h1&gt;

&lt;p&gt;Regression is a form of machine learning in which the goal is to create a model that can predict a numeric, quantifiable value, such as a price, amount, size, or another scalar number.&lt;/p&gt;

&lt;p&gt;For example, a company that rents bicycles might want to predict the expected number of rentals in a given day, based on the season, day of the week, weather conditions, and so on.&lt;/p&gt;

&lt;p&gt;Bike-sharing is very popular albeit still new and experimental. Utilizing a mobile phone, a rider can sign up online, download a phone application, locate bicycles, and rent one. This model creates an entire ecosystem where nobody needs to talk or meet in person to start enjoying this service.&lt;/p&gt;

&lt;h1&gt;
  
  
  Train and evaluate a regression model
&lt;/h1&gt;

&lt;p&gt;Regression works by establishing a relationship between variables in the data that describe characteristics (known as the features) of the thing being observed, and the variable we're trying to predict (known as the label). In this case, we're seeing information about days, so the features include things like the day of the week, month, temperature, rainfall, and so on, and the label is the number of bicycle rentals.&lt;/p&gt;

&lt;p&gt;To train the model, we start with a data sample containing the features as well as known values for the label - so in this case, we need historic data that includes dates, weather conditions, and the number of bicycle rentals. We'll then split this data sample into two subsets.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;A training dataset to which we'll apply an algorithm that determines a function encapsulating the relationship between the feature values and the known label values.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A validation or test dataset that we can use to evaluate the model by using it to generate predictions for the label and comparing them to the actual known label values.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The use of historic data with known label values to train a model makes regression an example of supervised machine learning.&lt;/p&gt;
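&lt;p&gt;The split into training and validation subsets can be sketched in a few lines. This is a minimal illustration using only the standard library; the function name &lt;code&gt;train_validation_split&lt;/code&gt;, the 30% validation fraction, the seed, and the sample observations are all assumptions chosen for the example. In a real project you would more likely reach for a library helper such as scikit-learn's &lt;code&gt;train_test_split&lt;/code&gt;.&lt;/p&gt;

```python
import random


def train_validation_split(rows, validation_fraction=0.3, seed=42):
    """Shuffle a copy of rows and split it into (training, validation) lists."""
    shuffled = list(rows)
    # A seeded generator keeps the random split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_validation = round(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]


# Hypothetical (temperature, rentals) observations, for illustration only.
observations = [(50, 100), (55, 112), (60, 121), (65, 133), (70, 140),
                (75, 149), (80, 161), (85, 170), (90, 178), (95, 190)]

training, validation = train_validation_split(observations)
print(len(training), "training rows,", len(validation), "validation rows")
```

&lt;p&gt;Shuffling before slicing is what makes the split random, so that neither subset is biased by the order in which the observations were recorded.&lt;/p&gt;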

&lt;h1&gt;
  
  
  A simple example
&lt;/h1&gt;

&lt;p&gt;Let's take a simple example to see how the training and evaluation process works in principle. Suppose we simplify the scenario so that we use a single feature, average daily temperature, to predict the bicycle rentals label.&lt;/p&gt;

&lt;p&gt;We start with some data that includes known values for the average daily temperature feature and the bicycle rentals label.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Temperature&lt;/th&gt;
&lt;th&gt;Rentals&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;56&lt;/td&gt;
&lt;td&gt;115&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;61&lt;/td&gt;
&lt;td&gt;126&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;67&lt;/td&gt;
&lt;td&gt;137&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;72&lt;/td&gt;
&lt;td&gt;140&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;76&lt;/td&gt;
&lt;td&gt;152&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;82&lt;/td&gt;
&lt;td&gt;156&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;54&lt;/td&gt;
&lt;td&gt;114&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;62&lt;/td&gt;
&lt;td&gt;129&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Now we'll take the first five of these observations and use them to train a regression model (in reality, you'd randomly split the data into training and validation datasets - the split needs to be random to ensure that each subset is statistically similar).&lt;/p&gt;

&lt;p&gt;Our goal in training the model is to find a function (let's call it f) that we can apply to the temperature feature (which we'll call x) to calculate the rentals label (which we'll call y). In other words, we need to define the following function:&lt;/p&gt;

&lt;p&gt;f(x) = y&lt;/p&gt;

&lt;p&gt;Our training dataset looks like this:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;x&lt;/th&gt;
&lt;th&gt;y&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;56&lt;/td&gt;
&lt;td&gt;115&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;61&lt;/td&gt;
&lt;td&gt;126&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;67&lt;/td&gt;
&lt;td&gt;137&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;72&lt;/td&gt;
&lt;td&gt;140&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;76&lt;/td&gt;
&lt;td&gt;152&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Let's start by plotting the training values for x and y on a chart:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5zcrufijugakvm4zvylf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5zcrufijugakvm4zvylf.png" alt="Temperature vs Rentals" width="523" height="315"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now we need to fit these values to a function, allowing for some random variation. You can probably see that the plotted points form an almost straight diagonal line - in other words, there's an apparent linear relationship between x and y, so we need to find a linear function that's the best fit for the data sample. There are various algorithms we can use to determine this function, which will ultimately find a straight line with minimal overall variance from the plotted points; like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fezshmwvwy0i6qjru48pi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fezshmwvwy0i6qjru48pi.png" alt="Temperature vs Rentals lineal" width="523" height="315"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The line represents a linear function that can be used with any value of x to apply the slope of the line and its intercept (where the line crosses the y axis when x is 0) to calculate y. In this case, if we extended the line to the left we'd find that when x is 0, y is around 20, and the slope of the line is such that for each unit of x you move along to the right, y increases by around 1.7. Our f function therefore can be calculated as 20 + 1.7x.&lt;/p&gt;
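&lt;p&gt;One of the algorithms that can find such a line is ordinary least squares. The sketch below applies its closed-form solution for a single feature to the five training rows; the variable names are my own, and this is only one possible way to fit the line, not necessarily the one used to draw the chart.&lt;/p&gt;

```python
# Training data from the table above: x is temperature, y is rentals.
x = [56, 61, 67, 72, 76]
y = [115, 126, 137, 140, 152]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Ordinary least squares for a single feature:
# slope = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x) ** 2)
slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
# The fitted line always passes through the point (mean_x, mean_y).
intercept = mean_y - slope * mean_x

print(f"slope: {slope:.2f}, intercept: {intercept:.2f}")
```

&lt;p&gt;This gives a slope of roughly 1.72 and an intercept of roughly 19.86, which agrees with the rounded values of 1.7 and 20 read off the chart.&lt;/p&gt;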

&lt;p&gt;Now that we've defined our predictive function, we can use it to predict labels for the validation data we held back and compare the predicted values (which we typically indicate with the symbol ŷ, or "y-hat") with the actual known y values.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;x&lt;/th&gt;
&lt;th&gt;y&lt;/th&gt;
&lt;th&gt;ŷ&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;82&lt;/td&gt;
&lt;td&gt;156&lt;/td&gt;
&lt;td&gt;159.4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;54&lt;/td&gt;
&lt;td&gt;114&lt;/td&gt;
&lt;td&gt;111.8&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;62&lt;/td&gt;
&lt;td&gt;129&lt;/td&gt;
&lt;td&gt;125.4&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
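&lt;p&gt;The predictions in this table can be reproduced by applying the function 20 + 1.7x to each held-back temperature. A minimal sketch, with variable names chosen for illustration:&lt;/p&gt;

```python
def f(x):
    """Predictive function read off the fitted line: intercept 20, slope 1.7."""
    return 20 + 1.7 * x


# Validation observations held back from training: (temperature, actual rentals).
validation = [(82, 156), (54, 114), (62, 129)]

for temperature, actual in validation:
    predicted = f(temperature)
    print(f"x={temperature}  y={actual}  y_hat={predicted:.1f}")
```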

&lt;p&gt;Let's see how the y and ŷ values compare in a plot:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzk8yfvkoglbu143ipj9t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzk8yfvkoglbu143ipj9t.png" alt="Plotted Function" width="523" height="315"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The plotted points that are on the function line are the predicted ŷ values calculated by the function and the other plotted points are the actual y values.&lt;/p&gt;

&lt;p&gt;There are various ways we can measure the variance between the predicted and actual values, and we can use these metrics to evaluate how well the model predicts.&lt;/p&gt;

&lt;p&gt;One of the most common ways to measure the loss is to square the individual residuals, sum the squares, and calculate the mean. Squaring the residuals has the effect of basing the calculation on absolute values (ignoring whether the difference is negative or positive) and giving more weight to larger differences. This metric is called the Mean Squared Error.&lt;/p&gt;

&lt;p&gt;For our validation data, the calculation looks like this:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;y&lt;/th&gt;
&lt;th&gt;ŷ&lt;/th&gt;
&lt;th&gt;y - ŷ&lt;/th&gt;
&lt;th&gt;(y - ŷ)&lt;sup&gt;2&lt;/sup&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;156&lt;/td&gt;
&lt;td&gt;159.4&lt;/td&gt;
&lt;td&gt;-3.4&lt;/td&gt;
&lt;td&gt;11.56&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;114&lt;/td&gt;
&lt;td&gt;111.8&lt;/td&gt;
&lt;td&gt;2.2&lt;/td&gt;
&lt;td&gt;4.84&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;129&lt;/td&gt;
&lt;td&gt;125.4&lt;/td&gt;
&lt;td&gt;3.6&lt;/td&gt;
&lt;td&gt;12.96&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;Sum&lt;/td&gt;
&lt;td&gt;∑&lt;/td&gt;
&lt;td&gt;29.36&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;Mean&lt;/td&gt;
&lt;td&gt;x̄&lt;/td&gt;
&lt;td&gt;9.79&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
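&lt;p&gt;The same arithmetic can be written as a short script. This is a sketch of the Mean Squared Error calculation for the three validation rows; the variable names are illustrative.&lt;/p&gt;

```python
y = [156, 114, 129]            # actual rentals
y_hat = [159.4, 111.8, 125.4]  # predicted rentals

# Residual = actual value minus predicted value, one per validation row.
residuals = [actual - predicted for actual, predicted in zip(y, y_hat)]
squared = [r ** 2 for r in residuals]
mse = sum(squared) / len(squared)

print(f"sum of squares: {sum(squared):.2f}")
print(f"MSE: {mse:.2f}")
```

&lt;p&gt;The script prints a sum of squares of 29.36 and an MSE of 9.79, matching the table.&lt;/p&gt;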

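&lt;p&gt;So the loss for our model, based on the MSE metric, is 9.79. On its own that number is hard to judge, because it is expressed in squared rentals rather than in a meaningful unit of measurement. We do know that the lower the value is, the less loss there is in the model and therefore the better it is predicting. This makes it a useful metric for comparing two models and finding the one that performs best.&lt;/p&gt;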

&lt;p&gt;Sometimes, it's more useful to express the loss in the same unit of measurement as the predicted label value itself - in this case, the number of rentals. It's possible to do this by calculating the square root of the MSE, which produces a metric known, unsurprisingly, as the Root Mean Squared Error (RMSE).&lt;/p&gt;

&lt;p&gt;√9.79 ≈ 3.13&lt;/p&gt;

&lt;p&gt;So our model's RMSE indicates that the loss is just over 3, which you can interpret loosely as meaning that on average, incorrect predictions are wrong by around 3 rentals.&lt;/p&gt;
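&lt;p&gt;As a quick check, the RMSE can be computed from the squared residuals in the MSE table. A minimal sketch, with illustrative variable names:&lt;/p&gt;

```python
from math import sqrt

squared_residuals = [11.56, 4.84, 12.96]  # from the MSE table above
mse = sum(squared_residuals) / len(squared_residuals)
# Taking the square root returns the loss to the unit of the label (rentals).
rmse = sqrt(mse)

print(f"RMSE: {rmse:.2f}")
```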

&lt;p&gt;Many other metrics can be used to measure loss in a regression. For example, R&lt;sup&gt;2&lt;/sup&gt; (R-squared, sometimes known as the coefficient of determination) is, for a simple linear model like this one, the square of the correlation between x and y. This produces a value between 0 and 1 that measures the proportion of the variance in the label that the model explains. Generally, the closer this value is to 1, the better the model predicts.&lt;/p&gt;
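&lt;p&gt;A common way to compute the coefficient of determination is as 1 minus the ratio of the residual sum of squares to the total sum of squares. The sketch below applies that formula to the three validation rows. Treat it as one possible calculation under that assumption: evaluated on held-back data, it will not necessarily equal the squared correlation exactly.&lt;/p&gt;

```python
y = [156, 114, 129]            # actual rentals
y_hat = [159.4, 111.8, 125.4]  # predicted rentals

mean_y = sum(y) / len(y)
ss_res = sum((a - p) ** 2 for a, p in zip(y, y_hat))  # unexplained variation
ss_tot = sum((a - mean_y) ** 2 for a in y)            # total variation around the mean
r_squared = 1 - ss_res / ss_tot

print(f"R-squared: {r_squared:.3f}")
```

&lt;p&gt;For these rows the result is about 0.968, which suggests that the line explains most of the variation in the validation rentals.&lt;/p&gt;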

</description>
      <category>machinelearning</category>
      <category>python</category>
      <category>programming</category>
      <category>datascience</category>
    </item>
    <item>
      <title>What does data exploration mean in data science?</title>
      <dc:creator>Joel Buenrostro</dc:creator>
      <pubDate>Sun, 14 Feb 2021 22:34:47 +0000</pubDate>
      <link>https://dev.to/joelbuenrostro/what-does-data-exploration-mean-in-data-science-1gj3</link>
      <guid>https://dev.to/joelbuenrostro/what-does-data-exploration-mean-in-data-science-1gj3</guid>
      <description>&lt;p&gt;Data exploration and analysis is at the core of data science. Data scientists require skills in languages like Python to explore, visualize, and manipulate data.&lt;/p&gt;

&lt;p&gt;Unsurprisingly, the role of a data scientist primarily involves exploring and analyzing data. The results of this analysis might form the basis of a report or a machine learning model, but it all begins with data.&lt;/p&gt;

&lt;p&gt;Usually, a data analysis project is designed to establish insights around a particular scenario or test a hypothesis. For example, suppose a university professor collects data from data science students, including the number of lectures attended, the hours spent studying, and the final grade achieved on the end of term exam. The professor could then take a sample of the data and analyze it to determine if there is a relationship between the amount of study a student undertakes and the final grade they achieve. &lt;/p&gt;

&lt;p&gt;They might use the data to test a hypothesis that only students who study for a minimum number of hours can expect to achieve a passing grade or even prepare the data to train a machine learning model that predicts a student's grade based on their study habits.&lt;/p&gt;

&lt;p&gt;This was one of the problems presented to participants in IBM Behind the code 2020, in one of its eight weekly challenges, using data from the Anahuac University of Mexico.&lt;/p&gt;

&lt;h1&gt;
  
  
  Explore data
&lt;/h1&gt;

&lt;p&gt;Data exploration and analysis is typically an iterative process, in which the data scientist takes a sample of data and performs the following kinds of tasks to analyze it and test hypotheses:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Clean data to handle errors, missing values, and other issues.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Apply statistical techniques to better understand the data and how the sample might be expected to represent the real-world population of data, allowing for random variation.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Visualize data to determine relationships between variables, and in the case of a machine learning project, identify features that are potentially predictive of the label.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Derive new features from existing ones that might better encapsulate relationships within the data.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Revise the hypothesis and repeat the process.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
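&lt;p&gt;To make a few of these steps concrete, here is a minimal sketch of cleaning, deriving a feature, and summarizing. It uses only the standard library so that it runs anywhere; the student records, field names, and the choice to fill missing values with the mean are all assumptions made up for illustration. In practice you would probably use pandas, for example its &lt;code&gt;fillna&lt;/code&gt; or &lt;code&gt;dropna&lt;/code&gt; methods, for the same job.&lt;/p&gt;

```python
from statistics import mean

# Hypothetical student records for illustration; None marks a missing value.
students = [
    {"name": "A", "lectures": 10, "study_hours": 12.0, "grade": 55},
    {"name": "B", "lectures": 14, "study_hours": None, "grade": 62},
    {"name": "C", "lectures": 8, "study_hours": 6.5, "grade": 40},
    {"name": "D", "lectures": 16, "study_hours": 15.0, "grade": 78},
]

# Clean: replace missing study hours with the mean of the known values.
known_hours = [s["study_hours"] for s in students if s["study_hours"] is not None]
fill_value = mean(known_hours)
for s in students:
    if s["study_hours"] is None:
        s["study_hours"] = fill_value

# Derive a new feature: study hours per lecture attended.
for s in students:
    s["hours_per_lecture"] = s["study_hours"] / s["lectures"]

# Summarize: simple statistics across the cleaned sample.
print(f"mean study hours: {mean(s['study_hours'] for s in students):.2f}")
print(f"mean grade: {mean(s['grade'] for s in students):.2f}")
```

&lt;p&gt;Filling with the mean is only one option. Dropping incomplete rows or using the median may suit other datasets better, which is part of why the process is iterative.&lt;/p&gt;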

&lt;p&gt;Data scientists can use a variety of tools and techniques to explore, visualize, and manipulate data. One of the most common ways in which data scientists work with data is to use the Python language and some specific packages for data processing.&lt;/p&gt;

&lt;p&gt;In the following Jupyter notebook you can see an example of the analysis of an emergency call data set and how information was extracted from the data.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fgjkzblspsaw7rifji73z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fgjkzblspsaw7rifji73z.png" alt="Emergency Calls" width="800" height="438"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://nbviewer.jupyter.org/github/JoelBuenrostro/100-Days-of-ML-Code/blob/master/Jupyter%20Notebooks/Emergency%20911%20calls.ipynb" rel="noopener noreferrer"&gt;Emergency Calls&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I hope you liked this content. Thanks for reading, and happy coding!&lt;/p&gt;

</description>
      <category>datascience</category>
      <category>analytics</category>
      <category>jupyter</category>
      <category>python</category>
    </item>
    <item>
      <title>The Process of learning</title>
      <dc:creator>Joel Buenrostro</dc:creator>
      <pubDate>Wed, 10 Feb 2021 05:28:43 +0000</pubDate>
      <link>https://dev.to/joelbuenrostro/the-process-of-learning-5daf</link>
      <guid>https://dev.to/joelbuenrostro/the-process-of-learning-5daf</guid>
      <description>&lt;h1&gt;
  
  
  So much to learn, so little time
&lt;/h1&gt;

&lt;p&gt;One of the main characteristics of the information age is that information is widely available. If you want to learn something, the challenge is no longer finding information; it's filtering it.&lt;/p&gt;

&lt;p&gt;Even today, a university education is one of the best ways to get a solid grounding in technology fundamentals. However, this focus on fundamentals makes universities weak when it comes to teaching the latest information, and because they may not always work with the latest technology, they may also be weak when it comes to practical skills.&lt;/p&gt;

&lt;p&gt;While schools, in many cases, do remain a one-stop-shop for learning technology, there are now alternatives, and many of them are considerably less expensive and faster.&lt;/p&gt;

&lt;h1&gt;
  
  
  Content curation
&lt;/h1&gt;

&lt;p&gt;In a world where information is available in vast quantities and at low cost, curation has become increasingly valuable.&lt;/p&gt;

&lt;p&gt;In some cases, you simply can't understand the content because you haven't yet learned the prerequisites: the concepts and terminology on which it is based. In others, you struggle at great length to understand something that you would have absorbed easily had you first taken the time to learn some fundamental concept.&lt;/p&gt;

&lt;p&gt;Here are some approaches that you can use to help define your own curriculum:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;With so much information available for free, many people in technology are reluctant to spend money on books, but books still excel in one area: they are a source of curation. A book presents content in a logical order in which the concepts build on each other, and the author decides which content is important and which is not. That curation alone is what justifies the cost of a book.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Different people prefer different types of content. Some learn by reading, some by watching videos or listening to audio content. You should consider the type of content that works best for you when creating your own learning plan.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;People will rarely pay you for what you know. They pay you for what you can do, for your ability to use knowledge and information to solve problems. You can read all of the programming books and watch all of the programming videos in the world, but you're not a programmer until you've written code.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;One of the biggest challenges to learning on your own is finding the necessary motivation and discipline. There are so many distractions in our lives that can get in the way, so it is very important to stay focused.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A certificate on a single topic may also prove valuable and may be the deciding factor in whether or not you get a job. The value of a certificate also depends on who issues it.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;I know not everyone may agree with this, but I believe that learning and working in technology shouldn't just be about a paycheck; it should also be fun. Even if you are passionate about your work, take the time, if you can, to play around with some other technologies.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And the question now is: where do you go for fun?&lt;/p&gt;

&lt;p&gt;This post was inspired by Dan Appleman's course, Learning Technology in the Information Age, on Pluralsight.&lt;/p&gt;

&lt;p&gt;I hope you liked this content. Thanks for reading, and happy coding!!!&lt;/p&gt;

</description>
      <category>career</category>
      <category>productivity</category>
      <category>devjournal</category>
      <category>devlive</category>
    </item>
    <item>
      <title>Install coding tools for Python development</title>
      <dc:creator>Joel Buenrostro</dc:creator>
      <pubDate>Sat, 06 Feb 2021 22:37:59 +0000</pubDate>
      <link>https://dev.to/joelbuenrostro/install-coding-tools-for-python-development-5eif</link>
      <guid>https://dev.to/joelbuenrostro/install-coding-tools-for-python-development-5eif</guid>
      <description>&lt;p&gt;In this post, you'll be introduced to Visual Studio Code, Python, and Jupyter Notebooks. You'll learn how to install all the software and packages you'll need to begin writing code.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;A Windows, Mac, or Linux computer&lt;/li&gt;
&lt;li&gt;Knowledge of how to download programs from the Internet&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What is code?
&lt;/h2&gt;

&lt;p&gt;Lines of code are instructions that humans give to computers to make them do things. While you may hear about how smart and amazing computers are, on their own, they are only good at one thing: following explicit instructions.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is Visual Studio Code?
&lt;/h2&gt;

&lt;p&gt;Visual Studio Code (often referred to as VS Code) is a free, open-source, extensible code editor. We can break this description down a little further to get a better understanding:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Code editor: A code editor is made specifically for writing, running, and debugging code. Code editors can be compared to an application like Microsoft Word, but with additional functionality such as code autocompletion and the ability to execute, or run, code.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Open-source: Open Source Software (OSS) has its code available for anyone to explore, modify, and enhance. The main takeaway is that anyone, even you, can build a new feature in software such as Visual Studio Code and contribute it back to the source code for others to use. You can find the open-source Visual Studio Code project on GitHub.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Extensible: Extensible means that something can be extended and expanded. In the context of Visual Studio Code, being extensible means that you can download, or even create, extensions to make Visual Studio Code exactly right for your work style. Think of mods or customizations in a game. If the color scheme is hurting your eyes or an obscure programming language isn't supported, you can customize Visual Studio Code to add new colors or support for the language. You can find the &lt;a href="https://marketplace.visualstudio.com/VSCode" rel="noopener noreferrer"&gt;Visual Studio Code extensions on the marketplace&lt;/a&gt; or discover how to build your own extension in the Visual Studio Code documentation.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Install Python extension
&lt;/h2&gt;

&lt;p&gt;Click the Extensions tab in Visual Studio Code to find the open-source Python tools that will be helpful for this tutorial. The Extensions tab looks like three blocks in the shape of an "L" with another block floating to the right.&lt;br&gt;
You can explore the extensions marketplace and install any extensions you'd like to have, but for this tutorial, we'll install the Python extension. In the extension marketplace, go to the search bar and type "Python". Look for the extension named simply Python and published by Microsoft; it should be the first result. Click on the extension and then click "Install".&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw2n6gfqd4p33b552yhfu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw2n6gfqd4p33b552yhfu.png" alt="Python Extension" width="800" height="465"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Install IntelliCode extension
&lt;/h2&gt;

&lt;p&gt;While you're in the Extensions tab, go back to the search bar and type "intellicode". Select the IntelliCode extension, which should be the first result, and then select Install. IntelliCode recommends code completions while you're writing programs, which makes it a great tool for beginners who might not know the exact syntax.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3uc00n8ezbbalkb5szbw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3uc00n8ezbbalkb5szbw.png" alt="Intellicode Extension" width="800" height="465"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Information about Jupyter Notebooks
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://code.visualstudio.com/docs/python/jupyter-support" rel="noopener noreferrer"&gt;Visual Studio Code Jupyter Notebooks&lt;/a&gt; has good documentation about how to use the environment. In Jupyter notebooks, you write code in cells. Click on the plus button to the left of a cell to create a new cell below the current cell. Click on the garbage can to the right to delete the selected cell and use the arrow buttons to move the cell up or down in relation to the cells around it.&lt;br&gt;
Use the green play button in each cell to run that cell. After you run a cell, a number will appear, surrounded by square brackets. This number is to help you keep track of which cells you run. This is important because, as you'll remember, you can re-run cells within a Jupyter Notebook, which might change variables or program state.&lt;br&gt;
If you look at the top of the file, you will see that you can run all cells above or below the current cell by using the play button. Finally, you can click the red pause button to force stop the program at any time.&lt;/p&gt;
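
&lt;p&gt;To see why re-running a cell can change program state, here is a small sketch in plain Python. It only imitates a notebook: the dictionary notebook_state and the function run_cell are hypothetical names standing in for the variables of a notebook and for pressing the play button on one cell.&lt;/p&gt;

```python
# Shared state, standing in for the variables that live in a notebook's memory.
notebook_state = {"total": 0}


def run_cell():
    """Imitate running a cell whose code is: total = total + 10."""
    notebook_state["total"] += 10
    return notebook_state["total"]


# Running the same "cell" twice gives different results,
# because the second run starts from the state left by the first.
first_run = run_cell()
second_run = run_cell()

print(first_run, second_run)
```

&lt;p&gt;The code of the cell is identical on both runs, yet the value differs, which is why the execution numbers next to each cell are worth paying attention to.&lt;/p&gt;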

&lt;p&gt;Jupyter notebooks have been widely adopted by data science practitioners to test hypotheses and visualize data in a format that enables rapid prototyping of their applications.&lt;/p&gt;

&lt;p&gt;As always, thanks for reading and happy coding!!!&lt;/p&gt;

</description>
      <category>beginners</category>
      <category>python</category>
      <category>jupyter</category>
      <category>vscode</category>
    </item>
    <item>
      <title>What is Python?</title>
      <dc:creator>Joel Buenrostro</dc:creator>
      <pubDate>Thu, 04 Feb 2021 04:54:11 +0000</pubDate>
      <link>https://dev.to/joelbuenrostro/what-is-python-473o</link>
      <guid>https://dev.to/joelbuenrostro/what-is-python-473o</guid>
      <description>&lt;p&gt;Python is one of the most popular and fastest-growing programming languages in the world. It's used for all sorts of tasks, including web programming and data analysis, and it has emerged as the language to learn for machine learning. That popularity means that Python developers are in demand and Python programming jobs can be lucrative.&lt;/p&gt;

&lt;p&gt;Created in the early 1990s, it enjoys a wide range of uses, from automating repetitive tasks and writing web apps to building machine learning models and implementing neural networks. Researchers, mathematicians, and data scientists in particular like Python because of its rich and easy-to-understand syntax and the wide range of open-source packages available. Packages are commonly used, shared code libraries that are freely available for anyone to use.&lt;/p&gt;

&lt;p&gt;Python has a simple, easy-to-learn syntax that emphasizes readability. Applications written in Python can run on almost any computer, including those running Windows, macOS, and popular distributions of Linux. Furthermore, the ecosystem contains a rich set of development tools for writing, debugging, and publishing Python applications.&lt;/p&gt;

&lt;p&gt;Finally, Python is supported by an active user community that is eager to help new programmers learn the &lt;em&gt;Pythonic way&lt;/em&gt; where you don't just get the syntax right, but use the language the way it was intended.&lt;/p&gt;
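
&lt;p&gt;As a small illustration of what "Pythonic" can mean in practice, the sketch below computes the same list twice. Both versions are valid Python; the second is the style most of the community would consider idiomatic. The variable names are made up for this example.&lt;/p&gt;

```python
numbers = [1, 2, 3, 4, 5]

# Correct syntax, but written the way one might in another language:
# loop over indexes and append one item at a time.
squares_with_indexes = []
for i in range(len(numbers)):
    squares_with_indexes.append(numbers[i] ** 2)

# The Pythonic version: a list comprehension that iterates over the items directly.
squares_pythonic = [n ** 2 for n in numbers]

print(squares_with_indexes)
print(squares_pythonic)
```

&lt;p&gt;Both lists hold the same values; the comprehension simply states the intent in one readable line.&lt;/p&gt;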

&lt;h2&gt;
  
  
  The PSF
&lt;/h2&gt;

&lt;p&gt;The Python Software Foundation is an organization devoted to advancing open-source technology related to the Python programming language. It also produces and underwrites the &lt;a href="https://us.pycon.org/" rel="noopener noreferrer"&gt;PyCon US Conference&lt;/a&gt;, the largest annual gathering for the Python community.&lt;/p&gt;

&lt;h2&gt;
  
  
  Installing
&lt;/h2&gt;

&lt;p&gt;Installing Python is generally easy, and nowadays many Linux and UNIX distributions include a recent Python. Even some Windows computers (notably those from HP) now come with Python already installed. If you do need to install Python and aren't confident about the task you can find a few notes on the &lt;a href="http://wiki.python.org/moin/BeginnersGuide/Download" rel="noopener noreferrer"&gt;BeginnersGuide/Download&lt;/a&gt; wiki page, but installation is unremarkable on most platforms.&lt;/p&gt;

&lt;h2&gt;
  
  
  Learning
&lt;/h2&gt;

&lt;p&gt;Before getting started, you may want to find out which &lt;a href="http://wiki.python.org/moin/IntegratedDevelopmentEnvironments" rel="noopener noreferrer"&gt;IDEs&lt;/a&gt; and &lt;a href="http://wiki.python.org/moin/PythonEditors" rel="noopener noreferrer"&gt;text editors&lt;/a&gt; are tailored to make Python editing easy, browse the list of &lt;a href="http://wiki.python.org/moin/IntroductoryBooks" rel="noopener noreferrer"&gt;introductory books&lt;/a&gt;, or look at &lt;a href="http://wiki.python.org/moin/BeginnersGuide/Examples" rel="noopener noreferrer"&gt;code samples&lt;/a&gt; that you might find helpful.&lt;/p&gt;

&lt;p&gt;There is a list of tutorials suitable for experienced programmers on the &lt;a href="http://wiki.python.org/moin/BeginnersGuide/Programmers" rel="noopener noreferrer"&gt;BeginnersGuide/Programmers&lt;/a&gt; page. There is also a list of &lt;a href="https://www.python.org/doc/nonenglish/" rel="noopener noreferrer"&gt;resources in other languages&lt;/a&gt; that might be useful if English is not your first language.&lt;/p&gt;

&lt;p&gt;The &lt;a href="http://docs.python.org/" rel="noopener noreferrer"&gt;online documentation&lt;/a&gt; is your first port of call for definitive information. There is a fairly brief tutorial that gives you basic information about the language and gets you started. You can follow this by looking at the library reference for a full description of Python's many libraries and the language reference for a complete (though somewhat dry) explanation of Python's syntax. If you are looking for common Python recipes and patterns, you can browse the &lt;a href="http://code.activestate.com/recipes/langs/python/" rel="noopener noreferrer"&gt;ActiveState Python Cookbook&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Looking to Help?
&lt;/h2&gt;

&lt;p&gt;If you want to help to develop Python, take a look at the &lt;a href="https://www.python.org/dev/" rel="noopener noreferrer"&gt;developer area&lt;/a&gt; for further information. Please note that you don't have to be an expert programmer to help. The documentation is just as important as the compiler, and still needs plenty of work!&lt;/p&gt;

&lt;p&gt;That's all for this quick introduction to the Python world. With this information, you can now take your first steps in the Python community.&lt;/p&gt;

&lt;p&gt;Enjoy and happy coding!!!&lt;/p&gt;

</description>
      <category>python</category>
      <category>beginners</category>
      <category>firstpost</category>
      <category>programming</category>
    </item>
  </channel>
</rss>
