<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Jeremy Barton</title>
    <description>The latest articles on DEV Community by Jeremy Barton (@jsbwxyz).</description>
    <link>https://dev.to/jsbwxyz</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F407710%2Fa7af48ae-b6a8-4893-a601-f9156bbecc72.png</url>
      <title>DEV Community: Jeremy Barton</title>
      <link>https://dev.to/jsbwxyz</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/jsbwxyz"/>
    <language>en</language>
    <item>
      <title>IBM Data Science Capstone: Car Accident Severity Report</title>
      <dc:creator>Jeremy Barton</dc:creator>
      <pubDate>Fri, 28 Aug 2020 16:59:44 +0000</pubDate>
      <link>https://dev.to/jsbwxyz/ibm-data-science-capstone-car-accident-severity-report-3g6j</link>
      <guid>https://dev.to/jsbwxyz/ibm-data-science-capstone-car-accident-severity-report-3g6j</guid>
      <description>&lt;h2&gt;
  
  
  Introduction | Business Understanding
&lt;/h2&gt;

&lt;p&gt;In an effort to reduce the frequency of car collisions in a community, an algorithm must be developed to predict the severity of an accident given the current weather, road and visibility conditions. When conditions are bad, this model will alert drivers to remind them to be more careful.&lt;/p&gt;

&lt;h2&gt;
  
  
  Data Understanding
&lt;/h2&gt;

&lt;p&gt;Our target variable will be 'SEVERITYCODE' because it is used to measure the severity of an accident from 0 to 5 within the dataset. The attributes used to weigh the severity of an accident are 'WEATHER', 'ROADCOND' and 'LIGHTCOND'.&lt;/p&gt;

&lt;p&gt;Severity codes are as follows:&lt;/p&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;* 0 : Little to no Probability (Clear Conditions)

&lt;ul&gt;
&lt;li&gt;1 : Very Low Probablility - Chance or Property Damage&lt;/li&gt;
&lt;li&gt;2 : Low Probability - Chance of Injury&lt;/li&gt;
&lt;li&gt;3 : Mild Probability - Chance of Serious Injury&lt;/li&gt;
&lt;li&gt;4 : High Probability - Chance of Fatality
&lt;/li&gt;
&lt;/ul&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;


Extract Dataset &amp;amp; Convert
&lt;/h3&gt;


&lt;p&gt;In its original form, this data is not fit for analysis. For one, there are many columns that we will not use for this model. Also, most of the features are of type object, when they should be numeric.&lt;/p&gt;

&lt;p&gt;We must use label encoding to convert the features to our desired data type.&lt;/p&gt;
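&lt;p&gt;The label-encoding step could be sketched as follows. The column names match the dataset, but the toy values and the &lt;code&gt;.cat.codes&lt;/code&gt; approach are illustrative assumptions rather than the notebook's exact code.&lt;/p&gt;

```python
import pandas as pd

# Toy stand-in for the collision dataset (the real one has far more rows)
df = pd.DataFrame({
    "WEATHER":   ["Clear", "Raining", "Overcast", "Clear"],
    "ROADCOND":  ["Dry", "Wet", "Dry", "Dry"],
    "LIGHTCOND": ["Daylight", "Dark - Street Lights On", "Daylight", "Daylight"],
})

# Label-encode each object column into a numeric int8 companion column
for col in ["WEATHER", "ROADCOND", "LIGHTCOND"]:
    df[col + "_CAT"] = df[col].astype("category").cat.codes.astype("int8")
```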

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Ficadxir8nc4agrh2mv49.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Ficadxir8nc4agrh2mv49.png" alt="Alt Text" width="631" height="174"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With the new columns, we can now use this data in our analysis and ML models!&lt;/p&gt;

&lt;p&gt;Now let's check the data types of the new columns in our dataframe. Moving forward, we will only use the new columns for our analysis.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fho4kty0dz8gyqkvx170w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fho4kty0dz8gyqkvx170w.png" alt="Alt Text" width="365" height="134"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Balancing the Dataset
&lt;/h4&gt;

&lt;p&gt;Our target variable, SEVERITYCODE, is only 42% balanced: class 1 is nearly three times the size of class 2.&lt;/p&gt;

&lt;p&gt;We can fix this by downsampling the majority class.&lt;/p&gt;
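&lt;p&gt;The downsampling described here (sklearn's &lt;code&gt;resample&lt;/code&gt;, as the Discussion section notes) could be sketched like this; the toy frame and random seed are illustrative, not the notebook's exact values.&lt;/p&gt;

```python
import pandas as pd
from sklearn.utils import resample

# Toy imbalanced frame: class 1 outnumbers class 2 three to one
df = pd.DataFrame({"SEVERITYCODE": [1] * 9 + [2] * 3, "WEATHER_CAT": range(12)})

majority = df[df.SEVERITYCODE == 1]
minority = df[df.SEVERITYCODE == 2]

# Downsample the majority class to the minority class size, without replacement
majority_down = resample(majority, replace=False,
                         n_samples=len(minority), random_state=42)
balanced = pd.concat([majority_down, minority])
```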

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fowt8a0ptc68nzkntehue.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fowt8a0ptc68nzkntehue.png" alt="Alt Text" width="283" height="55"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Perfectly balanced.&lt;/p&gt;

&lt;h2&gt;
  
  
  Methodology
&lt;/h2&gt;

&lt;p&gt;Our data is now ready to be fed into machine learning models.&lt;/p&gt;

&lt;p&gt;We will use the following models:&lt;/p&gt;

&lt;h5&gt;
  
  
  K-Nearest Neighbor (KNN)
&lt;/h5&gt;

&lt;p&gt;KNN will help us predict the severity code of an outcome by finding its k most similar data points and taking a majority vote among their labels.&lt;/p&gt;
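&lt;p&gt;A minimal KNN sketch with sklearn, assuming toy encoded features in place of the real dataset:&lt;/p&gt;

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Rows are encoded [WEATHER_CAT, ROADCOND_CAT, LIGHTCOND_CAT] triples
X = np.array([[0, 0, 0], [2, 2, 1], [1, 0, 0], [2, 1, 1]])
y = np.array([1, 2, 1, 2])  # severity codes

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X, y)
pred = knn.predict([[2, 2, 1]])  # majority vote among the 3 nearest points
```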

&lt;h5&gt;
  
  
  Decision Tree
&lt;/h5&gt;

&lt;p&gt;A decision tree model gives us a layout of all possible outcomes so we can fully analyze the consequences of a decision. In this context, the decision tree observes all possible outcomes of different weather conditions.&lt;/p&gt;
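&lt;p&gt;A decision tree sketch under the same toy-data assumption; the &lt;code&gt;max_depth&lt;/code&gt; value is illustrative (the Discussion notes that depth was tuned):&lt;/p&gt;

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

X = np.array([[0, 0, 0], [2, 2, 1], [1, 0, 0], [2, 1, 1]])
y = np.array([1, 2, 1, 2])

# max_depth caps how many condition splits the tree may chain together
tree = DecisionTreeClassifier(max_depth=7, random_state=42)
tree.fit(X, y)
```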

&lt;h5&gt;
  
  
  Logistic Regression
&lt;/h5&gt;

&lt;p&gt;Because our dataset only provides us with two severity code outcomes, our model will only predict one of those two classes. This makes our problem binary, which is a natural fit for logistic regression.&lt;/p&gt;
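&lt;p&gt;A logistic regression sketch on the same toy data; the &lt;code&gt;C&lt;/code&gt; value and solver are illustrative assumptions, not the notebook's tuned settings:&lt;/p&gt;

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[0, 0, 0], [2, 2, 1], [1, 0, 0], [2, 1, 1]])
y = np.array([1, 2, 1, 2])

# C controls regularization strength; liblinear suits small binary problems
lr = LogisticRegression(C=6.0, solver="liblinear")
lr.fit(X, y)
proba = lr.predict_proba([[2, 2, 1]])  # one probability per class
```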

&lt;p&gt;Let's get started!&lt;/p&gt;

&lt;h3&gt;
  
  
  Initialization
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Define X and y
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F1a2o7uz8w0y6smfsgz1q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F1a2o7uz8w0y6smfsgz1q.png" alt="Alt Text" width="657" height="255"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Normalize the dataset
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Femcsck5jnf7o29dyqznl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Femcsck5jnf7o29dyqznl.png" alt="Alt Text" width="726" height="282"&gt;&lt;/a&gt;&lt;/p&gt;
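&lt;p&gt;The normalization shown in the screenshot is presumably sklearn's &lt;code&gt;StandardScaler&lt;/code&gt;; a minimal sketch under that assumption, with toy feature rows:&lt;/p&gt;

```python
import numpy as np
from sklearn import preprocessing

X = np.array([[0.0, 0.0, 0.0], [2.0, 2.0, 1.0], [1.0, 0.0, 0.0], [2.0, 1.0, 1.0]])

# Rescale each feature to zero mean and unit variance
X_norm = preprocessing.StandardScaler().fit(X).transform(X)
```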

&lt;h4&gt;
  
  
  Train/Test Split
&lt;/h4&gt;

&lt;p&gt;We will use 30% of our data for testing and 70% for training.&lt;/p&gt;
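&lt;p&gt;The 70/30 split could be sketched as follows; the stand-in arrays and random seed are illustrative:&lt;/p&gt;

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(40).reshape(20, 2)  # stand-in feature matrix
y = np.array([1, 2] * 10)         # stand-in severity labels

# Hold out 30% of the rows for testing, train on the remaining 70%
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=4)
```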

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fj1218bmsdj4hwcbfajhj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fj1218bmsdj4hwcbfajhj.png" alt="Alt Text" width="620" height="159"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here we will begin our modeling and predictions...&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F5hrdmvlbrfh98tzyn0am.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F5hrdmvlbrfh98tzyn0am.png" alt="Alt Text" width="654" height="254"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fvm860lwm2xlwxgmtx0og.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fvm860lwm2xlwxgmtx0og.png" alt="Alt Text" width="654" height="388"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fksxnemhvsa6xteob1dpt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fksxnemhvsa6xteob1dpt.png" alt="Alt Text" width="679" height="489"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Results &amp;amp; Evaluation
&lt;/h2&gt;

&lt;p&gt;Now we will check the accuracy of our models.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F6cp9rajstv6hwewyfp0h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F6cp9rajstv6hwewyfp0h.png" alt="Alt Text" width="452" height="830"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Discussion
&lt;/h2&gt;

&lt;p&gt;In the beginning of this notebook, we had categorical data that was of type 'object'. This is not a data type that we could have fed through an algorithm, so label encoding was used to create new classes of type int8, a numerical data type.&lt;/p&gt;

&lt;p&gt;After solving that issue we were presented with another: imbalanced data. As mentioned earlier, class 1 was nearly three times larger than class 2. The solution was downsampling the majority class with sklearn's resample tool, matching the minority class exactly at 58,188 values each.&lt;/p&gt;

&lt;p&gt;Once we analyzed and cleaned the data, it was fed through three ML models: K-Nearest Neighbor, Decision Tree and Logistic Regression. Although the first two are suitable for this project, logistic regression made the most sense because of the binary nature of the problem.&lt;/p&gt;

&lt;p&gt;The evaluation metrics used to test the accuracy of our models were the Jaccard index, the F1 score and, for logistic regression, log loss. Tuning k, the max depth and the hyperparameter C helped achieve the best possible accuracy.&lt;/p&gt;
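&lt;p&gt;Those three metrics could be computed as in this sketch; the label vectors and probabilities are invented toy values, not the notebook's results:&lt;/p&gt;

```python
from sklearn.metrics import jaccard_score, f1_score, log_loss

y_true = [1, 1, 2, 2, 1, 2]
y_pred = [1, 2, 2, 2, 1, 2]

jac = jaccard_score(y_true, y_pred, pos_label=2)
f1 = f1_score(y_true, y_pred, average="weighted")

# log loss scores predicted probabilities, not hard labels
probs = [[0.8, 0.2], [0.4, 0.6], [0.3, 0.7],
         [0.2, 0.8], [0.9, 0.1], [0.1, 0.9]]
ll = log_loss(y_true, probs)
```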

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Based on historical data from weather conditions pointing to certain classes, we can conclude that particular weather conditions have some impact on whether or not travel results in property damage (class 1) or injury (class 2).&lt;/p&gt;

&lt;p&gt;Thank you for reading!&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Web Dev &amp; Data Science: Finding My Niche</title>
      <dc:creator>Jeremy Barton</dc:creator>
      <pubDate>Sun, 26 Jul 2020 00:23:12 +0000</pubDate>
      <link>https://dev.to/jsbwxyz/web-dev-data-science-finding-my-niche-3k6b</link>
      <guid>https://dev.to/jsbwxyz/web-dev-data-science-finding-my-niche-3k6b</guid>
      <description>&lt;h3&gt;
  
  
  &lt;strong&gt;Introduction&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;I've always been one to get to the bottom of things. Solving problems and explaining complex information is something I very much enjoy doing. I decided to pursue web development because I am fascinated by the limitless potential of the web. Copious amounts of data pass through it every single day, which prompted my interest in data science. My hope is to build scalable web applications that provide data insights to help businesses make decisions. This is how I see my career concentrations relating to one another.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Coming around 'full circle'&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Since the beginning of this year, I've had a vision for my professional future that involves web development, databases and data science. It's the perfect trio; it just makes sense.&lt;/p&gt;

&lt;h4&gt;
  
  
  The Web
&lt;/h4&gt;

&lt;p&gt;Building for the web is one of my passions. I aspire to create beautiful sites and user interfaces that feel as natural as the fluent motion of touching a screen. At my current skill level, I am starting to apply my knowledge to larger projects. As I grow in my web dev journey, I will learn to create more powerful, high-quality works.&lt;/p&gt;

&lt;p&gt;The other thing I admire about the web is the vast amount of information collected about people and things all over the globe.  All of this collected data is turned into information that is used to help us make decisions in the &lt;strong&gt;real world&lt;/strong&gt;.&lt;/p&gt;

&lt;h4&gt;
  
  
  Database
&lt;/h4&gt;

&lt;p&gt;Data is all around us. Everything is an entity that can be broken down into attributes and turned into a dataset. This very concept blows my mind. &lt;/p&gt;

&lt;p&gt;Where there is data, there are questions. &lt;br&gt;
Where there are questions, there are always answers.&lt;/p&gt;

&lt;p&gt;There are many methods of processing this data, hence the many Database Management System (DBMS) options that are available. We use this software to define, manipulate, retrieve and manage data in a database, and each kind of DBMS is better suited to processing data in certain ways than another.&lt;/p&gt;

&lt;h4&gt;
  
  
  Data Science
&lt;/h4&gt;

&lt;p&gt;How we get the data is database, but how we &lt;strong&gt;use&lt;/strong&gt; it is data science.&lt;/p&gt;

&lt;p&gt;This is where I feel these concentrations come 'full circle'. The final stage in the practical use of data is making decisions based on our findings and insights. The methodology of data science involves understanding what kinds of data are required to prepare a model and evaluate a solution to the question at hand.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Conclusion&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Limitless potential... that is the product of these fields of study.&lt;/p&gt;

&lt;p&gt;Web design, the looks. Data science, the brains. &lt;br&gt;
Database, the glue that holds them together.&lt;/p&gt;

&lt;p&gt;I am looking forward to what I will learn in the coming years.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>database</category>
      <category>datascience</category>
    </item>
  </channel>
</rss>
