<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Tangram</title>
    <description>The latest articles on DEV Community by Tangram (@tangram).</description>
    <link>https://dev.to/tangram</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Forganization%2Fprofile_image%2F4918%2F4fd7c3dc-cc65-4d3c-a37c-28bf8beacd26.png</url>
      <title>DEV Community: Tangram</title>
      <link>https://dev.to/tangram</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/tangram"/>
    <language>en</language>
    <item>
      <title>Train a Machine Learning Model to Predict the Programming Language in a Code Snippet</title>
      <dc:creator>Isabella Tromba</dc:creator>
      <pubDate>Tue, 15 Feb 2022 19:35:06 +0000</pubDate>
      <link>https://dev.to/tangram/train-a-machine-learning-model-to-predict-the-programming-language-in-a-code-snippet-153d</link>
      <guid>https://dev.to/tangram/train-a-machine-learning-model-to-predict-the-programming-language-in-a-code-snippet-153d</guid>
      <description>&lt;p&gt;We are going to build a web application that has a code editor that automatically predicts the programming language of the code contained in it. This is similar to &lt;a href="https://visualstudiomagazine.com/articles/2021/09/07/vs-code-aug21.aspx"&gt;VSCode's language detection feature&lt;/a&gt; that predicts the programming language and performs automatic syntax highlighting. &lt;/p&gt;

&lt;p&gt;As a programmer, I know that the following code is Python:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;foo&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
  &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="err"&gt;“&lt;/span&gt;&lt;span class="n"&gt;hello&lt;/span&gt; &lt;span class="n"&gt;world&lt;/span&gt;&lt;span class="err"&gt;”&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is Ruby:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;say_hello&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
   &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="err"&gt;“&lt;/span&gt;&lt;span class="no"&gt;Hello&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="err"&gt;”&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nb"&gt;name&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And this is JavaScript:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nx"&gt;myFunction&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="err"&gt;“&lt;/span&gt;&lt;span class="nx"&gt;hello&lt;/span&gt; &lt;span class="nx"&gt;world&lt;/span&gt;&lt;span class="err"&gt;”&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We have a training dataset that we curated called &lt;code&gt;languages.csv&lt;/code&gt;. The CSV file contains two columns: the first contains the code snippet and the second contains the programming language of that snippet. &lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;code&lt;/th&gt;
&lt;th&gt;language&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;def foo(): print("hello world")&lt;/td&gt;
&lt;td&gt;python&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;function myFunction() { console.log("hello world") }&lt;/td&gt;
&lt;td&gt;javascript&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;def say_hello(name) return "Hello, " + name end&lt;/td&gt;
&lt;td&gt;ruby&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;We can train a machine learning model to predict the programming language contained in the code snippet by running the following command: &lt;/p&gt;

&lt;p&gt;&lt;code&gt;tangram train --file languages.csv --target language&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;The CSV file &lt;a href="https://docs.google.com/spreadsheets/d/1hUwPLxbexL6BMMrAdmp_a8ARWILr6P4YQ4BJUDWXf7Y/edit?usp=sharing"&gt;&lt;code&gt;languages.csv&lt;/code&gt;&lt;/a&gt; is a small dataset of programming language snippets and their corresponding language labels. You can download the full dataset &lt;a href="https://docs.google.com/spreadsheets/d/1hUwPLxbexL6BMMrAdmp_a8ARWILr6P4YQ4BJUDWXf7Y/edit?usp=sharing"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Under the hood, Tangram takes care of feature engineering, splits the data into training and testing sets, trains a number of linear and gradient boosted decision tree models across a range of hyperparameter settings, and finally evaluates all of the models and writes the best one to the current directory as &lt;code&gt;languages.tangram&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Now, we can use this file &lt;code&gt;languages.tangram&lt;/code&gt; to make predictions in our apps. &lt;/p&gt;

&lt;p&gt;To make a prediction in JavaScript, all we have to do is import the tangram library, load the model file we just trained, and call the &lt;code&gt;predict&lt;/code&gt; function on the model. &lt;/p&gt;

&lt;p&gt;Here is the code to load the model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;tangram&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@tangramdotdev/tangram&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;modelUrl&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;./languages.tangram&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;// Download the model.&lt;/span&gt;
&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;modelResponse&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;modelUrl&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;modelData&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;modelResponse&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;arrayBuffer&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="c1"&gt;// Load the model.&lt;/span&gt;
&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;tangram&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;modelData&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then, we can just call the &lt;code&gt;predict&lt;/code&gt; function, passing in the code snippet:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;code&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;def foo(): print("hello world")&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
&lt;span class="c1"&gt;// Make a prediction&lt;/span&gt;
&lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;code&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We said we wanted to make this a React component that renders a code editor. Here is the full example code, which uses the &lt;a href="https://ace.c9.io"&gt;Ace code editor&lt;/a&gt;. Every time the code in the editor changes, we call &lt;code&gt;model.predict&lt;/code&gt;, passing in the new code string from the editor.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;tangram&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@tangramdotdev/tangram&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;modelUrl&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;./languages.tangram&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nx"&gt;App&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// Download the model.&lt;/span&gt;
  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;modelResponse&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;modelUrl&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;modelData&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;modelResponse&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;arrayBuffer&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="c1"&gt;// Load the model.&lt;/span&gt;
  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;tangram&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;modelData&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;code&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;setCode&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;useState&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;language&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;setLanguage&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;useState&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;onChange&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;newCode&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;setLanguage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;code&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;newCode&lt;/span&gt; &lt;span class="p"&gt;}).&lt;/span&gt;&lt;span class="nx"&gt;className&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nx"&gt;setCode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;newCode&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="o"&gt;&amp;lt;&amp;gt;&lt;/span&gt;
    &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;p&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;`Detected language: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;language&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="sr"&gt;/p&lt;/span&gt;&lt;span class="err"&gt;&amp;gt;
&lt;/span&gt;    &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;AceEditor&lt;/span&gt; &lt;span class="nx"&gt;value&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;code&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="nx"&gt;mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;language&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="nx"&gt;onChange&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;onChange&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="err"&gt;&amp;gt;
&lt;/span&gt;    &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="err"&gt;&amp;gt;
&lt;/span&gt;  &lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;root&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;createElement&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;div&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;appendChild&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;root&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="nx"&gt;ReactDOM&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;render&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;App&lt;/span&gt; &lt;span class="o"&gt;/&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;root&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;Under the Hood&lt;/h2&gt;

&lt;p&gt;With Tangram, we were able to train a model with just a single command on the command line. In the following section, we will learn more about what Tangram is actually doing under the hood. &lt;/p&gt;

&lt;h3&gt;Tokenization&lt;/h3&gt;

&lt;p&gt;The first step in turning the code into features is called tokenization: splitting the code into individual tokens. One strategy for splitting a stream of characters into chunks of characters, called &lt;code&gt;tokens&lt;/code&gt;, is to use whitespace. &lt;/p&gt;

&lt;p&gt;Here is our Python code tokenized using whitespace as the token delimiter:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;token 1&lt;/th&gt;
&lt;th&gt;token 2&lt;/th&gt;
&lt;th&gt;token 3&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;def&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;foo():&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;print("hello world")&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This isn't so great, because the string being printed ends up in the same token as the &lt;code&gt;print&lt;/code&gt; function call.&lt;/p&gt;

&lt;p&gt;Another strategy for splitting characters into tokens is to use every non-alphanumeric character as a token boundary. Here is our Python code tokenized using this strategy:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;token 1&lt;/th&gt;
&lt;th&gt;token 2&lt;/th&gt;
&lt;th&gt;token 3&lt;/th&gt;
&lt;th&gt;token 4&lt;/th&gt;
&lt;th&gt;token 5&lt;/th&gt;
&lt;th&gt;token 6&lt;/th&gt;
&lt;th&gt;token 7&lt;/th&gt;
&lt;th&gt;token 8&lt;/th&gt;
&lt;th&gt;token 9&lt;/th&gt;
&lt;th&gt;token 10&lt;/th&gt;
&lt;th&gt;token 11&lt;/th&gt;
&lt;th&gt;token 12&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;def&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;foo&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;(&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;)&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;:&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;print&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;(&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;"&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;hello&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;world&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;"&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;)&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For code, splitting on punctuation is better because now the &lt;code&gt;print&lt;/code&gt; function name is no longer in the same token as the string we want to print. So our machine learning model can learn that the word &lt;code&gt;print&lt;/code&gt; is associated with the python language. (Of course, the string &lt;code&gt;print&lt;/code&gt; can and will appear in other programming languages as well.)&lt;/p&gt;
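To make the two strategies concrete, here is a minimal sketch in Python (an illustration, not the tokenizer Tangram actually uses): whitespace splitting versus treating every non-alphanumeric character as a token boundary.

```python
import re

code = 'def foo(): print("hello world")'

# Strategy 1: split on whitespace.
whitespace_tokens = code.split()

# Strategy 2: every non-alphanumeric character is both a token boundary
# and a token of its own. The capture group keeps the punctuation.
punctuation_tokens = [t for t in re.split(r"([^A-Za-z0-9])", code) if t.strip()]

print(whitespace_tokens)   # ['def', 'foo():', 'print("hello', 'world")']
print(punctuation_tokens)  # ['def', 'foo', '(', ')', ':', 'print', '(', '"', 'hello', 'world', '"', ')']
```

The second list matches the twelve tokens in the table above.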

&lt;h3&gt;Feature Engineering&lt;/h3&gt;

&lt;p&gt;This is a great first step, but we still don’t have something we can pass to a machine learning model. Machine learning models take numbers (integers and floats) as input, and what we have so far is strings. &lt;/p&gt;

&lt;p&gt;What we can do is turn every token into its own feature. For each token, we ask: does our input code contain this token? If the answer is yes, we assign a feature value of 1. If the answer is no, we assign a feature value of 0. This is called "Bag of Words" encoding, because after tokenization we treat everything as a bag of words, completely ignoring the structure and order in which those words appeared in the original code snippet. &lt;/p&gt;

&lt;p&gt;To illustrate this better, the following two code snippets produce the exact same features:&lt;/p&gt;

&lt;p&gt;Jumbled python code snippet:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"hello)def:world"&lt;/span&gt;
&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="k"&gt;print&lt;/span&gt; &lt;span class="n"&gt;foo&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Regular python code snippet:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;foo&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
  &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"hello world"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
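A quick way to check this claim is to tokenize both snippets and compare the resulting bags of words. This is a sketch using the split-on-non-alphanumeric tokenization described above, not Tangram's actual feature engineering code:

```python
import re

def tokenize(code):
    # Treat every non-alphanumeric character as a token and a boundary.
    return [t for t in re.split(r"([^A-Za-z0-9])", code) if t.strip()]

jumbled = '("hello)def:world"\n()print foo'
regular = 'def foo():\n  print("hello world")'

# Bag-of-words ignores order and structure, so the jumbled and regular
# snippets produce exactly the same set of tokens, hence the same features.
print(set(tokenize(jumbled)) == set(tokenize(regular)))  # True
```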



&lt;p&gt;One way to make the machine learning model aware of the structure of the code is through n-grams. Commonly used n-grams are bigrams and trigrams. To make bigrams from our token stream, we combine each pair of adjacent unigrams.&lt;/p&gt;

&lt;p&gt;Unigram token features:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;token 1&lt;/th&gt;
&lt;th&gt;token 2&lt;/th&gt;
&lt;th&gt;token 3&lt;/th&gt;
&lt;th&gt;token 4&lt;/th&gt;
&lt;th&gt;token 5&lt;/th&gt;
&lt;th&gt;token 6&lt;/th&gt;
&lt;th&gt;token 7&lt;/th&gt;
&lt;th&gt;token 8&lt;/th&gt;
&lt;th&gt;token 9&lt;/th&gt;
&lt;th&gt;token 10&lt;/th&gt;
&lt;th&gt;token 11&lt;/th&gt;
&lt;th&gt;token 12&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;def&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;foo&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;(&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;)&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;:&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;print&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;(&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;"&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;hello&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;world&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;"&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;)&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Bigram token features: &lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;token 1&lt;/th&gt;
&lt;th&gt;token 2&lt;/th&gt;
&lt;th&gt;token 3&lt;/th&gt;
&lt;th&gt;token 4&lt;/th&gt;
&lt;th&gt;token 5&lt;/th&gt;
&lt;th&gt;token 6&lt;/th&gt;
&lt;th&gt;token 7&lt;/th&gt;
&lt;th&gt;token 8&lt;/th&gt;
&lt;th&gt;token 9&lt;/th&gt;
&lt;th&gt;token 10&lt;/th&gt;
&lt;th&gt;token 11&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;def foo&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;foo (&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;( )&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;) :&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;: print&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;print (&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;( "&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;" hello&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;hello world&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;world "&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;" )&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
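Making bigrams is just pairing each token with its right-hand neighbor. A minimal sketch:

```python
def bigrams(tokens):
    # Join each pair of adjacent unigrams into a single bigram token.
    return [a + " " + b for a, b in zip(tokens, tokens[1:])]

unigrams = ["def", "foo", "(", ")", ":", "print", "(", '"', "hello", "world", '"', ")"]
print(bigrams(unigrams)[:3])  # ['def foo', 'foo (', '( )']
```

Twelve unigrams yield eleven bigrams, as in the table above.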

&lt;p&gt;You can see that we now have features that capture some of the structure of our code. If you really want machine learning to capture structure, you can use deep learning techniques, but that is out of scope for this tutorial.&lt;/p&gt;

&lt;p&gt;So far, in our bag of words encoding, we are using a binary count method: if the token is present in the string, we assign a feature value of 1, and 0 otherwise. There are other feature weighting strategies we can use. For instance, we can use a counting strategy, where we count the number of times each token appears in the text. We can also use a strategy called &lt;a href="https://en.wikipedia.org/wiki/Tf%E2%80%93idf"&gt;tf-idf&lt;/a&gt; that downweights frequently occurring tokens.&lt;/p&gt;
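The three weighting strategies can be sketched as follows. This uses one common simplified tf-idf variant for illustration; Tangram's exact formula may differ.

```python
import math

def binary(tokens, vocab):
    # 1 if the token appears anywhere in the document, else 0.
    return [1 if t in tokens else 0 for t in vocab]

def counts(tokens, vocab):
    # Number of times each vocabulary token appears in the document.
    return [tokens.count(t) for t in vocab]

def tf_idf(tokens, vocab, corpus):
    # Term frequency, downweighted by how many corpus documents contain the token.
    def idf(t):
        df = sum(1 for doc in corpus if t in doc)
        return math.log(len(corpus) / (1 + df))
    return [tokens.count(t) * idf(t) for t in vocab]
```

For example, with a corpus where `print` appears in most snippets but `def` in few, `print` receives a much lower tf-idf weight than `def`.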

&lt;p&gt;By default, Tangram chooses a feature engineering strategy based on the input data. But you can completely configure which strategy you want to use by passing a config file to the command line:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;tangram train --file languages.csv --target language --config config.json&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;To learn about all of the options for customizing training, check out the Tangram docs on custom configuration: &lt;a href="https://www.tangram.dev/docs/guides/train_with_custom_configuration"&gt;https://www.tangram.dev/docs/guides/train_with_custom_configuration&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;Training a Hyperparameter Grid&lt;/h2&gt;

&lt;p&gt;Finally, Tangram trains a number of machine learning models including linear models and gradient boosted decision trees and chooses the best model based on a hold-out comparison dataset. Since we are training a multiclass classifier, the metric we use to choose the best model is &lt;code&gt;accuracy&lt;/code&gt;. &lt;/p&gt;
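Accuracy here is simply the fraction of held-out examples whose predicted class matches the true label; a minimal sketch:

```python
def accuracy(predictions, labels):
    # Fraction of predictions that exactly match the true labels.
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)

# Two of three held-out snippets classified correctly.
print(accuracy(["python", "ruby", "javascript"], ["python", "ruby", "python"]))
```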

&lt;p&gt;And that's it!&lt;/p&gt;

&lt;p&gt;In this tutorial, we showed how to train a machine learning model to predict the programming language of a code snippet and then use that model in a React app to predict the language of the code in a code editor. &lt;/p&gt;

&lt;p&gt;Tangram makes it easy for programmers to train, deploy, and monitor machine learning models.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Run &lt;code&gt;tangram train&lt;/code&gt; to train a model from a CSV file on the command line.&lt;/li&gt;
&lt;li&gt;Make predictions with libraries for &lt;a href="https://hex.pm/packages/tangram"&gt;Elixir&lt;/a&gt;, &lt;a href="https://pkg.go.dev/github.com/tangramdotdev/tangram-go"&gt;Go&lt;/a&gt;, &lt;a href="https://www.npmjs.com/package/@tangramdotdev/tangram"&gt;JavaScript&lt;/a&gt;, &lt;a href="https://packagist.org/packages/tangram/tangram"&gt;PHP&lt;/a&gt;, &lt;a href="https://pypi.org/project/tangram"&gt;Python&lt;/a&gt;, &lt;a href="https://rubygems.org/gems/tangram"&gt;Ruby&lt;/a&gt;, and &lt;a href="//lib.rs/tangram"&gt;Rust&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Run &lt;code&gt;tangram app&lt;/code&gt; to learn more about your models and monitor them in production.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Head over to &lt;a href="https://www.tangram.dev"&gt;https://www.tangram.dev&lt;/a&gt; and give it a try!&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>react</category>
      <category>javascript</category>
      <category>node</category>
    </item>
    <item>
      <title>Writing the fastest GBDT library in Rust</title>
      <dc:creator>Isabella Tromba</dc:creator>
      <pubDate>Tue, 11 Jan 2022 19:30:05 +0000</pubDate>
      <link>https://dev.to/tangram/writing-the-fastest-gbdt-libary-in-rust-197k</link>
      <guid>https://dev.to/tangram/writing-the-fastest-gbdt-libary-in-rust-197k</guid>
      <description>&lt;p&gt;In this post, we will go over how we optimized our Gradient Boosted Decision Tree library. This is based on a talk that we gave at RustConf 2021: &lt;a href="https://www.youtube.com/watch?v=D1NAREuicNs" rel="noopener noreferrer"&gt;Writing the Fastest Gradient Boosted Decision Tree Library in Rust&lt;/a&gt;. The code is available on &lt;a href="https://github.com/tangramdotdev/tangram/tree/main/crates/tree" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The content of this post is organized into the following sections:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;What are GBDTs?&lt;/li&gt;
&lt;li&gt;Use Rayon to parallelize.&lt;/li&gt;
&lt;li&gt;Use cargo-flamegraph to find bottlenecks.&lt;/li&gt;
&lt;li&gt;Use cargo-asm to identify suboptimal code generation.&lt;/li&gt;
&lt;li&gt;Use intrinsics to optimize for specific CPUs.&lt;/li&gt;
&lt;li&gt;Use unsafe code to eliminate unnecessary bounds checks.&lt;/li&gt;
&lt;li&gt;Use unsafe code to parallelize non-overlapping memory access.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;What are GBDTs?&lt;/h2&gt;

&lt;p&gt;GBDT stands for Gradient Boosted Decision Tree. GBDTs are a type of machine learning model that performs incredibly well on tabular data, the kind of data you would normally find in a spreadsheet or CSV file.&lt;/p&gt;

&lt;p&gt;To get a feel for how GBDTs work, let’s go through an example of making a prediction with a single decision tree. Let’s say you want to predict the price of a house based on features like the number of bedrooms, bathrooms, and square footage. Here is a table with 3 features, &lt;code&gt;num_bedrooms&lt;/code&gt;, &lt;code&gt;num_bathrooms&lt;/code&gt;, and &lt;code&gt;sqft&lt;/code&gt;. The final column, &lt;code&gt;price&lt;/code&gt;, is what we are trying to predict.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;num_bedrooms&lt;/th&gt;
&lt;th&gt;num_bathrooms&lt;/th&gt;
&lt;th&gt;sqft&lt;/th&gt;
&lt;th&gt;price&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;1200&lt;/td&gt;
&lt;td&gt;$300k&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;3.5&lt;/td&gt;
&lt;td&gt;2300&lt;/td&gt;
&lt;td&gt;$550k&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;2200&lt;/td&gt;
&lt;td&gt;$450k&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;7000&lt;/td&gt;
&lt;td&gt;$990k&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;To make a prediction with a decision tree, you start at the top of the tree, and at each branch you ask how one of the features compares with a threshold. If the value is less than or equal to the threshold, you go to the left child. If the value is greater, you go to the right child. When you reach a leaf, you have the prediction.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5rmpcxip4db9g7369l9p.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5rmpcxip4db9g7369l9p.png" alt="Example Decision Tree"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let’s make an example prediction. We have a house with 3 bedrooms, 3 bathrooms, and 2500 square feet. Let’s see what price our decision tree predicts.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;num_bedrooms&lt;/th&gt;
&lt;th&gt;num_bathrooms&lt;/th&gt;
&lt;th&gt;sqft&lt;/th&gt;
&lt;th&gt;price&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;2500&lt;/td&gt;
&lt;td&gt;?&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Starting at the top, the number of bedrooms is 3, which is less than or equal to 3, so we go left. The square footage is 2500, which is greater than 2400, so we go right and arrive at the prediction: $512K.&lt;/p&gt;
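The walk down the tree is just a couple of nested comparisons. In this sketch, the $512K leaf matches the figure; the other two leaf values are made up for illustration.

```python
def predict_tree(house):
    # Root split: number of bedrooms, threshold 3 (greater goes right).
    if house["num_bedrooms"] > 3:
        return 700_000  # hypothetical right-subtree leaf
    # Left-child split: square footage, threshold 2400.
    if house["sqft"] > 2400:
        return 512_000  # the leaf from the example
    return 250_000  # hypothetical leaf

print(predict_tree({"num_bedrooms": 3, "num_bathrooms": 3, "sqft": 2500}))  # 512000
```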

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6itoqd91ndlmjj42bvvc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6itoqd91ndlmjj42bvvc.png" alt="Example Prediction Single Decision Tree"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A single decision tree isn’t very good at making predictions on its own, so we train many trees, one at a time, where each new tree predicts the error in the sum of the outputs of the trees before it. This is called gradient boosting over decision trees!&lt;/p&gt;

&lt;p&gt;Making a prediction with gradient boosted decision trees is easy. We start with a baseline prediction which in the case of regression (predicting a continuous value like the price of a home) is just the average price of houses in our dataset. Then, we run the process we described for getting a prediction out of a single decision tree for each tree and sum up the outputs. In this example, the prediction is $340K.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fusk70j5i1g7tqozliml1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fusk70j5i1g7tqozliml1.png" alt="Gradient Boosted Decision Tree"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To learn more about GBDTs, check out the &lt;a href="https://en.wikipedia.org/wiki/Gradient_boosting" rel="noopener noreferrer"&gt;Wikipedia article on gradient boosting&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Use Rayon to Parallelize
&lt;/h2&gt;

&lt;p&gt;So now that we know a little about GBDTs, let's talk about how we made our code fast. The first thing we did was parallelize it. &lt;a href="https://github.com/rayon-rs/rayon" rel="noopener noreferrer"&gt;Rayon&lt;/a&gt; is a data-parallelism library for Rust that makes converting sequential operations into parallel ones extremely easy.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwqjlktm8ncvqkn5k5w2a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwqjlktm8ncvqkn5k5w2a.png" alt="Matrix"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The process of training trees takes in a matrix of training data which is n_rows by n_features.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fovc2s3df93fqy3rfgdki.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fovc2s3df93fqy3rfgdki.png" alt="Matrix with column-wise parallelization"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To decide which feature to use at each node, we need to compute a score for each feature. We do this by iterating over each column in the matrix. The following is a sequential iteration over the columns.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="n"&gt;dataset&lt;/span&gt;&lt;span class="nf"&gt;.columns&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="nf"&gt;.iter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="nf"&gt;.map&lt;/span&gt;&lt;span class="p"&gt;(|&lt;/span&gt;&lt;span class="n"&gt;column&lt;/span&gt;&lt;span class="p"&gt;|&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// compute the score of branching on this feature&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can parallelize this code with Rayon. All we have to do is change the call to &lt;code&gt;iter&lt;/code&gt; to &lt;code&gt;par_iter&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="n"&gt;dataset&lt;/span&gt;&lt;span class="nf"&gt;.columns&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="nf"&gt;.par_iter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="nf"&gt;.map&lt;/span&gt;&lt;span class="p"&gt;(|&lt;/span&gt;&lt;span class="n"&gt;column&lt;/span&gt;&lt;span class="p"&gt;|&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// compute the score of branching on this feature&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Rayon will keep a thread pool around and schedule items from your iterator to be processed in parallel. Parallelizing over the features works well when the number of features is larger than the number of logical cores on your computer. When the number of features is smaller than the number of logical cores, parallelizing over the features is less efficient: some cores sit idle, so we are not using all of the compute power available to us. You can see this clearly in the image below. Cores 1 through 4 have work to do because they have features 1 through 4 assigned to them, while cores 5 through 8 sit idle.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvvsdihtt824l4bzedh15.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvvsdihtt824l4bzedh15.png" alt="Core utilization with column-wise parallelization"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this situation, we can parallelize over chunks of rows instead and make sure we have enough chunks so that each core has some work to do.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzc25shmhs0dk3p55xqj9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzc25shmhs0dk3p55xqj9.png" alt="Matrix with row-wise parallelization"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Each core now has some rows assigned to it, and no core is sitting idle.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frsrl0p4ri8ygrg37dtlg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frsrl0p4ri8ygrg37dtlg.png" alt="Core utilization with row-wise parallelization"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Distributing the work across rows is super easy with Rayon as well. We just use the combinator &lt;code&gt;par_chunks&lt;/code&gt;!&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="n"&gt;dataset&lt;/span&gt;&lt;span class="nf"&gt;.rows&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="nf"&gt;.par_chunks&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;N&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="nf"&gt;.map&lt;/span&gt;&lt;span class="p"&gt;(|&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;|&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// process the chunk&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These are just a couple of the combinators available in Rayon. There are a lot of other high-level combinators that make it easy to express complex parallel computations. Check out &lt;a href="https://github.com/rayon-rs/rayon" rel="noopener noreferrer"&gt;Rayon&lt;/a&gt; on GitHub to learn more.&lt;/p&gt;

&lt;h2&gt;
  
  
  Use cargo-flamegraph to find bottlenecks
&lt;/h2&gt;

&lt;p&gt;Next, we used &lt;a href="https://github.com/flamegraph-rs/flamegraph" rel="noopener noreferrer"&gt;cargo-flamegraph&lt;/a&gt; to find where most of the time was being spent. &lt;code&gt;cargo-flamegraph&lt;/code&gt; makes it easy to generate flamegraphs and integrates elegantly with cargo. You can install it with &lt;code&gt;cargo install&lt;/code&gt;, then run &lt;code&gt;cargo flamegraph&lt;/code&gt; to run your program and generate a flamegraph.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo apt install -y linux-perf
cargo install flamegraph
cargo flamegraph
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here is a simple example with a program that calls two subroutines, &lt;code&gt;foo&lt;/code&gt; and &lt;code&gt;bar&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;foo&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="nf"&gt;bar&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When we run &lt;code&gt;cargo flamegraph&lt;/code&gt; we get an output that looks like this.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpqpexb982pcvecfpy0w1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpqpexb982pcvecfpy0w1.png" alt="Flamegraph"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It contains a lot of extra functions that you have to sort through, but it boils down to something like this.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foe2o79c0aody6s3qbtb1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foe2o79c0aody6s3qbtb1.png" alt="Simplified Flamegraph"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The y-axis of the graph is the call stack, and the x-axis is duration. The bottom of the graph shows that the entire duration of the program was spent in the &lt;code&gt;main&lt;/code&gt; function. Above that, you see that the main function’s time is split between calls to &lt;code&gt;foo&lt;/code&gt; and &lt;code&gt;bar&lt;/code&gt;: about two thirds of the time was spent in &lt;code&gt;foo&lt;/code&gt; and its subroutines, and about one third in &lt;code&gt;bar&lt;/code&gt; and its subroutines.&lt;/p&gt;

&lt;p&gt;In our code for training decision trees, the flamegraph showed one function where the majority of the time was spent. In this function, we maintain an array of the numbers &lt;code&gt;0&lt;/code&gt; to &lt;code&gt;n&lt;/code&gt; that we call &lt;code&gt;indexes&lt;/code&gt;, and at each iteration of training we rearrange it. Then, we access an array of the same length, called &lt;code&gt;values&lt;/code&gt;, but in the order of the indexes in the &lt;code&gt;indexes&lt;/code&gt; array.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;indexes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;..&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="nf"&gt;.collect&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="c1"&gt;// rearrange indexes&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="n"&gt;indexes&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="k"&gt;mut&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;values&lt;/span&gt;&lt;span class="nf"&gt;.get_mut&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="c1"&gt;// mutate the value&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This results in accessing each item in the &lt;code&gt;values&lt;/code&gt; array out of order. We will refer back to this function throughout the rest of this post.&lt;/p&gt;

&lt;h2&gt;
  
  
  Use cargo-asm to identify suboptimal code generation
&lt;/h2&gt;

&lt;p&gt;From the flamegraph, we knew which function was taking the majority of the time, which we briefly described above. We started by looking at the assembly the compiler generated for it to see if there were any opportunities to make it faster. We did this with &lt;a href="https://github.com/gnzlbg/cargo-asm" rel="noopener noreferrer"&gt;cargo-asm&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cargo &lt;span class="nb"&gt;install &lt;/span&gt;cargo-asm
cargo asm &lt;span class="nt"&gt;--rust&lt;/span&gt; path::to::a::function
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Like &lt;code&gt;cargo-flamegraph&lt;/code&gt;, &lt;code&gt;cargo-asm&lt;/code&gt; integrates nicely with cargo: you can install it with &lt;code&gt;cargo install&lt;/code&gt; and run it as a cargo subcommand.&lt;/p&gt;

&lt;p&gt;Here is a simple example with a function that adds two numbers and multiplies the result by two.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;sum_times_two&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;i32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;i32&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;i32&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;sum&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;sum_times_two&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sum&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="n"&gt;sum_times_two&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When we run &lt;code&gt;cargo asm&lt;/code&gt;, we get an output that looks like this. It shows the assembly instructions alongside the Rust code that generated them.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="c1"&gt;// pub fn sum_times_two(x: i32, y: i32) -&amp;gt; i32 {&lt;/span&gt;
&lt;span class="c1"&gt;// let sum = x + y;&lt;/span&gt;
&lt;span class="n"&gt;add&lt;/span&gt;        &lt;span class="n"&gt;edi&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;esi&lt;/span&gt;
&lt;span class="c1"&gt;// let sum_times_two = sum * 2;&lt;/span&gt;
&lt;span class="n"&gt;lea&lt;/span&gt;        &lt;span class="n"&gt;eax&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;rdi&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rdi&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="c1"&gt;//}&lt;/span&gt;
&lt;span class="n"&gt;ret&lt;/span&gt;
&lt;span class="c1"&gt;//}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note that due to all the optimizations the compiler does, there is often not a perfect correspondence between the Rust code and the assembly. Here is the hot loop from our training code again:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="n"&gt;indexes&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="k"&gt;mut&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;values&lt;/span&gt;&lt;span class="nf"&gt;.get_mut&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="c1"&gt;// mutate the value&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When we looked at the assembly for this loop, we were surprised to find an &lt;a href="https://www.felixcloutier.com/x86/imul" rel="noopener noreferrer"&gt;&lt;code&gt;imul&lt;/code&gt;&lt;/a&gt; instruction, which is an integer multiplication.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;imul     rax, qword, ptr, [r8, +, 16]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What is that doing in our code? We are just indexing into an array of &lt;code&gt;f32&lt;/code&gt;s, which are 4 bytes each, so the compiler should be able to get the address of the ith item by multiplying i by 4. Multiplying by four is the same as shifting i left by two, and a shift is much cheaper than a general integer multiplication.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="n"&gt;indexes&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="k"&gt;mut&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;values&lt;/span&gt;&lt;span class="nf"&gt;.get_mut&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="c1"&gt;// mutate the value&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Why would the compiler not be able to produce a shift-left instruction? The &lt;code&gt;values&lt;/code&gt; array is a column in a matrix, and a matrix can be stored in either row-major or column-major order. This means that indexing into the column might require multiplying by the number of columns in the matrix, which is unknown at compile time. Since we were storing our matrix in column-major order, though, the column is contiguous, so the multiplication could be eliminated; we just had to convince the compiler of this.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;values&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;values&lt;/span&gt;&lt;span class="nf"&gt;.as_slice&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="c1"&gt;// for index in indexes {&lt;/span&gt;
  &lt;span class="c1"&gt;// let mut value = valuues.get_mut(index);&lt;/span&gt;
  &lt;span class="c1"&gt;// mutate the value&lt;/span&gt;
&lt;span class="c1"&gt;// }&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We did this by casting the &lt;code&gt;values&lt;/code&gt; array to a slice. This convinced the compiler that the &lt;code&gt;values&lt;/code&gt; array was contiguous, so it could access items using the shift left instruction, instead of integer multiplication.&lt;/p&gt;
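&lt;p&gt;The arithmetic the compiler exploits is easy to check directly. This standalone snippet, not Tangram code, confirms that the byte offset of the ith &lt;code&gt;f32&lt;/code&gt; in a contiguous slice equals i shifted left by two.&lt;/p&gt;

```rust
// For a contiguous slice of f32s, the byte offset of element i is
// i * size_of::<f32>() = i * 4, which equals i << 2, so the compiler
// can use a shift (or scaled addressing) instead of a general multiply.
fn main() {
    let values: Vec<f32> = (0..8).map(|i| i as f32).collect();
    let slice: &[f32] = values.as_slice();
    for i in 0..slice.len() {
        // multiplying by the element size equals shifting left by two
        assert_eq!(i * std::mem::size_of::<f32>(), i << 2);
    }
    // indexing the contiguous slice reaches the expected element
    println!("{}", slice[5]);
}
```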

&lt;h2&gt;
  
  
  Use intrinsics to optimize for specific CPUs
&lt;/h2&gt;

&lt;p&gt;Next, we used compiler intrinsics to optimize for specific CPUs. Intrinsics are special functions that hint to the compiler to generate specific assembly code.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="n"&gt;indexes&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="k"&gt;mut&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;values&lt;/span&gt;&lt;span class="nf"&gt;.get_mut&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="c1"&gt;// mutate the value&lt;/span&gt;
 &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd3p0bzmaso7dvo8vhgem.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd3p0bzmaso7dvo8vhgem.png" alt="Values array out of order access"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Remember how we noticed that this code results in accessing the &lt;code&gt;values&lt;/code&gt; array out of order? This is really bad for cache performance, because CPUs prefetch memory on the assumption that you will access it roughly in order. If a value isn’t in cache, the CPU has to wait until it is loaded from main memory, making your program slower.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffbo9bqiv8bwn2ybuz4o1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffbo9bqiv8bwn2ybuz4o1.png" alt="Cache Hierarchy"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;However, we know which values we are going to access a few iterations of the loop in the future, because the order is given by the &lt;code&gt;indexes&lt;/code&gt; array: 10 iterations in the future, we will be accessing &lt;code&gt;values[indexes[current_index + 10]]&lt;/code&gt;. We can hint to x86_64 CPUs to prefetch those values into cache using the &lt;a href="https://doc.rust-lang.org/beta/core/arch/x86_64/fn._mm_prefetch.html" rel="noopener noreferrer"&gt;&lt;code&gt;_mm_prefetch&lt;/code&gt;&lt;/a&gt; intrinsic. We experimented with different values of the &lt;code&gt;OFFSET&lt;/code&gt; until we got the best performance. If the &lt;code&gt;OFFSET&lt;/code&gt; is too small, the CPU will still have to wait for the data; if the &lt;code&gt;OFFSET&lt;/code&gt; is too large, data that the CPU needs might be evicted, and by the time the CPU gets to the iteration that needs it, it will no longer be there. The best offset depends on your computer's hardware, so tuning it can be more of an art than a science.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="n"&gt;indexes&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;mm_prefetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;&lt;span class="nf"&gt;.as_ptr&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="nf"&gt;.offset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;OFFSET&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
  &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;values&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
  &lt;span class="c1"&gt;// do something with value&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you are interested in cache performance and writing performant code, check out &lt;a href="https://www.youtube.com/watch?v=rX0ItVEVjHc" rel="noopener noreferrer"&gt;Mike Acton's talk on Data-Oriented Design&lt;/a&gt; and &lt;a href="https://media.handmade-seattle.com/practical-data-oriented-design/" rel="noopener noreferrer"&gt;Andrew Kelly's recent talk at Handmade Seattle&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Use unsafe code to eliminate unnecessary bounds checks
&lt;/h2&gt;

&lt;p&gt;Next, we used a touch of unsafe to remove some unnecessary bounds checks.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="c1"&gt;// for index in indexes {&lt;/span&gt;
  &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="k"&gt;mut&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;values&lt;/span&gt;&lt;span class="nf"&gt;.get_mut&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="c1"&gt;// mutate the value&lt;/span&gt;
&lt;span class="c1"&gt;//}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Most of the time, the compiler can eliminate bounds checks when looping over values in an array. However, in this code, it has to check that &lt;code&gt;index&lt;/code&gt; is within the bounds of the values array.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="c1"&gt;// for index in indexes {&lt;/span&gt;
  &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="k"&gt;mut&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;values&lt;/span&gt;&lt;span class="nf"&gt;.get_unchecked_mut&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="c1"&gt;// mutate the value&lt;/span&gt;
&lt;span class="c1"&gt;// }&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But we said at the beginning that the &lt;code&gt;indexes&lt;/code&gt; array is a permutation of the values 0 to n, so the bounds check is unnecessary. We can remove it by replacing &lt;code&gt;get_mut&lt;/code&gt; with &lt;code&gt;get_unchecked_mut&lt;/code&gt;. We have to use unsafe code here, because Rust provides no way to communicate to the compiler that the values in the &lt;code&gt;indexes&lt;/code&gt; array are always in bounds of the &lt;code&gt;values&lt;/code&gt; array.&lt;/p&gt;
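&lt;p&gt;Here is a self-contained sketch of the pattern with made-up data. The &lt;code&gt;unsafe&lt;/code&gt; block is sound only because &lt;code&gt;indexes&lt;/code&gt; really is a permutation of 0 to n.&lt;/p&gt;

```rust
// Skipping the bounds check: `indexes` is a permutation of 0..n, so every
// lookup into `values` is in bounds by construction.
fn main() {
    let mut values: Vec<f32> = vec![1.0, 2.0, 3.0, 4.0];
    let indexes: Vec<usize> = vec![2, 0, 3, 1]; // a permutation of 0..n
    for &index in &indexes {
        // SAFETY: `index` comes from a permutation of 0..values.len(),
        // so it is always within bounds.
        let value = unsafe { values.get_unchecked_mut(index) };
        *value *= 2.0;
    }
    println!("{:?}", values);
}
```

&lt;p&gt;This prints [2.0, 4.0, 6.0, 8.0]; each element is visited exactly once, just in a shuffled order.&lt;/p&gt;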

&lt;h2&gt;
  
  
  Use unsafe code to parallelize non-overlapping memory access
&lt;/h2&gt;

&lt;p&gt;Finally, we parallelized the following code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="n"&gt;indexes&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="k"&gt;mut&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;values&lt;/span&gt;&lt;span class="nf"&gt;.get_mut&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="c1"&gt;// mutate the value&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But is it even possible to parallelize? At first glance, it seems the answer is no, because we are accessing the &lt;code&gt;values&lt;/code&gt; array mutably in the body of the loop.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="n"&gt;indexes&lt;/span&gt;&lt;span class="nf"&gt;.par_iter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="nf"&gt;.for_each&lt;/span&gt;&lt;span class="p"&gt;(|&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;|&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="k"&gt;mut&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;values&lt;/span&gt;&lt;span class="nf"&gt;.get_mut&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If we try it, the compiler will give us an error indicating overlapping borrows. However, the &lt;code&gt;indexes&lt;/code&gt; array is a permutation of the values 0 to n, so we know that the access into the &lt;code&gt;values&lt;/code&gt; array is never overlapping.&lt;/p&gt;

&lt;p&gt;We can parallelize our code using unsafe Rust, wrapping a pointer to the values in a struct and unsafely marking it as Send and Sync.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="nf"&gt;ValuesPtr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="k"&gt;mut&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;f32&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
&lt;span class="k"&gt;unsafe&lt;/span&gt; &lt;span class="k"&gt;impl&lt;/span&gt; &lt;span class="nb"&gt;Send&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;ValuesPtr&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
&lt;span class="k"&gt;unsafe&lt;/span&gt; &lt;span class="k"&gt;impl&lt;/span&gt; &lt;span class="nb"&gt;Sync&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;ValuesPtr&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;values&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;ValuesPtr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;indexes&lt;/span&gt;&lt;span class="nf"&gt;.par_iter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="nf"&gt;.for_each&lt;/span&gt;&lt;span class="p"&gt;(|&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;|&lt;/span&gt; &lt;span class="k"&gt;unsafe&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="k"&gt;mut&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;values&lt;/span&gt;&lt;span class="na"&gt;.0&lt;/span&gt;&lt;span class="nf"&gt;.get_unchecked_mut&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So, going back to the code we started out with...&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="n"&gt;indexes&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="k"&gt;mut&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;values&lt;/span&gt;&lt;span class="nf"&gt;.get_mut&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="c1"&gt;// mutate the value&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Finally, when we combine the four optimizations:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Making sure that the &lt;code&gt;values&lt;/code&gt; array is a contiguous slice.&lt;/li&gt;
&lt;li&gt;Prefetching values so they are in cache.&lt;/li&gt;
&lt;li&gt;Removing bounds checks because we know the indexes are always in bounds.&lt;/li&gt;
&lt;li&gt;Parallelizing over the indexes because we know they never overlap.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;this is the code we get:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="nf"&gt;ValuesPtr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="k"&gt;mut&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;f32&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
&lt;span class="k"&gt;unsafe&lt;/span&gt; &lt;span class="k"&gt;impl&lt;/span&gt; &lt;span class="nb"&gt;Send&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;ValuesPtr&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
&lt;span class="k"&gt;unsafe&lt;/span&gt; &lt;span class="k"&gt;impl&lt;/span&gt; &lt;span class="nb"&gt;Sync&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;ValuesPtr&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;values&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;ValuesPtr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;&lt;span class="nf"&gt;.as_slice&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
&lt;span class="n"&gt;indexes&lt;/span&gt;&lt;span class="nf"&gt;.par_iter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="nf"&gt;.for_each&lt;/span&gt;&lt;span class="p"&gt;(|&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;|&lt;/span&gt; &lt;span class="k"&gt;unsafe&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;mm_prefetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;&lt;span class="nf"&gt;.as_ptr&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="nf"&gt;.offset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;OFFSET&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
  &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="k"&gt;mut&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;values&lt;/span&gt;&lt;span class="na"&gt;.0&lt;/span&gt;&lt;span class="nf"&gt;.get_unchecked_mut&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="c1"&gt;// mutate the value&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
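&lt;p&gt;The unsafe parallel pattern above can be written out as a complete, compilable program. This sketch uses &lt;code&gt;std::thread::scope&lt;/code&gt; from the standard library in place of rayon's &lt;code&gt;par_iter&lt;/code&gt; so it needs no external crates, and it leaves out the x86-specific prefetch intrinsic; &lt;code&gt;ValuesPtr&lt;/code&gt; and &lt;code&gt;parallel_update&lt;/code&gt; are illustrative names, not part of the library.&lt;/p&gt;

```rust
use std::thread;

// Wrap a raw pointer to the values so it can be shared across threads.
struct ValuesPtr(*mut f32);

// SAFETY: `indexes` is a permutation of 0..values.len(), so no two
// threads ever write to the same element.
unsafe impl Send for ValuesPtr {}
unsafe impl Sync for ValuesPtr {}

/// Mutate `values[index]` for every `index`, in parallel.
/// The caller must guarantee `indexes` is a permutation of 0..values.len().
fn parallel_update(values: &mut [f32], indexes: &[usize], num_threads: usize) {
    let ptr = ValuesPtr(values.as_mut_ptr());
    // Split the indexes into one contiguous chunk per thread.
    let chunk_size = ((indexes.len() + num_threads - 1) / num_threads).max(1);
    thread::scope(|s| {
        for part in indexes.chunks(chunk_size) {
            let ptr = &ptr;
            s.spawn(move || {
                for &index in part {
                    // SAFETY: each index is in bounds and is visited
                    // exactly once, so no bounds check and no overlap.
                    unsafe { *ptr.0.add(index) += 1.0 };
                }
            });
        }
    });
}

fn main() {
    let mut values = vec![0.0f32; 8];
    // A permutation of 0..8 (reversed, for simplicity).
    let indexes: Vec<usize> = (0..8).rev().collect();
    parallel_update(&mut values, &indexes, 2);
    assert!(values.iter().all(|&v| v == 1.0));
}
```

&lt;p&gt;The safe equivalent with &lt;code&gt;get_mut&lt;/code&gt; cannot be spawned across threads, because each closure would need a mutable borrow of the whole slice; the pointer wrapper is what moves that proof obligation from the compiler to us.&lt;/p&gt;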



&lt;p&gt;Here are our training-time benchmarks comparing Tangram's gradient boosted decision tree library to &lt;a href="https://lightgbm.readthedocs.io/en/latest/" rel="noopener noreferrer"&gt;LightGBM&lt;/a&gt;, &lt;a href="https://xgboost.readthedocs.io/en/latest/" rel="noopener noreferrer"&gt;XGBoost&lt;/a&gt;, &lt;a href="https://catboost.ai/" rel="noopener noreferrer"&gt;CatBoost&lt;/a&gt;, and &lt;a href="https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.HistGradientBoostingClassifier.html" rel="noopener noreferrer"&gt;sklearn&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmj5zejlprw56hohygd8w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmj5zejlprw56hohygd8w.png" alt="Training Time Benchmarks"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To see all the benchmarks, head over to &lt;a href="https://tangram.dev/benchmarks" rel="noopener noreferrer"&gt;https://tangram.dev/benchmarks&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you are interested in reading the code or giving us a star, the project is available on &lt;a href="https://github.com/tangramdotdev/tangram" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>rust</category>
      <category>machinelearning</category>
      <category>performance</category>
      <category>algorithms</category>
    </item>
    <item>
      <title>What machine learning can learn from Ruby on Rails</title>
      <dc:creator>Isabella Tromba</dc:creator>
      <pubDate>Mon, 10 Jan 2022 21:47:52 +0000</pubDate>
      <link>https://dev.to/tangram/what-machine-learning-can-learn-from-ruby-on-rails-4epg</link>
      <guid>https://dev.to/tangram/what-machine-learning-can-learn-from-ruby-on-rails-4epg</guid>
      <description>&lt;p&gt;I wrote my first end-to-end functioning web application using Ruby on Rails in &lt;a href="https://stellar.mit.edu/S/course/6/sp13/6.170/index.html"&gt;a class at MIT (6.170)&lt;/a&gt; in 2013. There were things that Rails automatically handled for me that I didn’t even realize were hard to do. Running &lt;code&gt;rails new&lt;/code&gt; just set up a completely functioning application. I never had to consider all of the components I would need to string together. Database migrations, routing, run and deploy scripts, tests, handling static assets, and more worked out of the box and the documentation clearly described how to build every part of my application. In fact, I assumed that writing web applications should always be this easy because I had never tried to write one from scratch. I was the beginner benefiting from my own ignorance that DHH talks about in &lt;a href="https://rubyonrails.org/doctrine/"&gt;The Rails Doctrine&lt;/a&gt;!&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;But beyond the productivity gains for experts, conventions also lower the barriers of entry for beginners. There are so many conventions in Rails that a beginner doesn’t even need to know about, but can just benefit from in ignorance. It’s possible to create great applications without knowing why everything is the way it is.&lt;/p&gt;

&lt;p&gt;That’s not possible if your framework is merely a thick textbook and your new application a blank piece of paper. It takes immense effort to even figure out where and how to start. Half the battle of getting going is finding a thread to pull.&lt;/p&gt;

&lt;p&gt;- DHH, The Rails Doctrine&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;A couple of years later, when I was a machine learning engineer at Slack, getting machine learning into production felt a lot like the situation DHH describes in The Rails Doctrine: the framework as "merely a thick textbook" and my new application as "a blank piece of paper".&lt;/p&gt;

&lt;p&gt;To make things even worse, try googling “how to learn machine learning”. The recommended steps start to look like the curriculum for a PhD in Statistics, Math, and Computer Science.&lt;/p&gt;

&lt;p&gt;The problems don’t end once you have successfully trained a model. You still have to figure out how to get your model into production. The code you wrote in your Jupyter notebook needs to be translated into code that can be deployed. An entirely new job title, “Machine Learning Engineer”, was created just to solve this problem.&lt;/p&gt;

&lt;p&gt;In the Rails Doctrine, there is a section on “Value Integrated Systems”. DHH says that Rails is “A whole system that addresses an entire problem.”&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Rails can be used in many contexts, but its first love is the making of integrated systems: Majestic monoliths! A whole system that addresses an entire problem. This means Rails is concerned with everything from the front-end JavaScript needed to make live updates to how the database is migrated from one version to another in production.&lt;/p&gt;

&lt;p&gt;That’s a very broad scope, as we’ve discussed, but no broader than to be realistic to understand for a single person. Rails specifically seeks to equip generalist individuals to make these full systems. Its purpose is not to segregate specialists into small niches and then require whole teams of such in order to build anything of enduring value.&lt;/p&gt;

&lt;p&gt;- DHH, The Rails Doctrine&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;One sentence in that section really stood out to me: "Its [Rails'] purpose is not to segregate specialists into small niches and then require whole teams of such in order to build anything of enduring value". Today, this is exactly what companies are doing to get machine learning into production. They are required to assemble a team of specialists including Data Scientists, Machine Learning Engineers, Backend Engineers, and Ops teams.&lt;/p&gt;

&lt;p&gt;It would be great if we had something like Ruby on Rails for machine learning: a single system that provides the tools you need to go from data to a deployed machine learning model. Just as DHH says "Rails specifically seeks to equip generalist individuals to make these full systems", we need tools that equip generalist programmers, like front-end JavaScript engineers or back-end Ruby programmers, to build full machine learning systems.&lt;/p&gt;

&lt;h2&gt;
  Introducing Tangram
&lt;/h2&gt;

&lt;p&gt;Tangram is an all-in-one automated machine learning framework that makes it easy to add machine learning to your applications. Predictions happen directly in your existing applications, so there are no network requests and no need to set up a separate service to serve your models.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Run &lt;code&gt;tangram train&lt;/code&gt; to train a model from a CSV file on the command line.&lt;/li&gt;
&lt;li&gt;Make predictions with bindings for &lt;a href="https://rubygems.org/gems/tangram"&gt;Ruby&lt;/a&gt;, &lt;a href="https://pypi.org/project/tangram"&gt;Python&lt;/a&gt;, &lt;a href="https://pkg.go.dev/github.com/tangramdotdev/tangram-go"&gt;Golang&lt;/a&gt;, &lt;a href="https://hex.pm/packages/tangram"&gt;Elixir&lt;/a&gt;, &lt;a href="https://www.npmjs.com/package/@tangramdotdev/tangram"&gt;JavaScript&lt;/a&gt;, &lt;a href="https://packagist.org/packages/tangram/tangram"&gt;PHP&lt;/a&gt;, or &lt;a href="https://lib.rs/tangram"&gt;Rust&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Run &lt;code&gt;tangram app&lt;/code&gt; to start a web application where you can learn more about your models and monitor them in production.&lt;/li&gt;
&lt;/ul&gt;
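&lt;p&gt;In Ruby, the workflow above looks roughly like this. This is a sketch, not a definitive example: it assumes you have already run &lt;code&gt;tangram train&lt;/code&gt; on a hypothetical &lt;code&gt;heart_disease.csv&lt;/code&gt; with a &lt;code&gt;diagnosis&lt;/code&gt; target column, the input fields are illustrative, and you should check the gem's documentation for the exact method names and CLI flags.&lt;/p&gt;

```ruby
require 'tangram'

# Load the model file produced by a command like:
#   tangram train --file heart_disease.csv --target diagnosis
# The file name and columns here are illustrative.
model = Tangram::Model.from_path('heart_disease.tangram')

# Predict directly in-process: no network request, no model server.
input = {
  age: 63,
  gender: 'male',
  chest_pain: 'typical angina',
}
output = model.predict(input)
```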

&lt;p&gt;You can check out the &lt;a href="https://rubygems.org/gems/tangram"&gt;Tangram Ruby Gem&lt;/a&gt;. We built it using Ruby FFI and the source is available on our &lt;a href="https://github.com/tangramdotdev/tangram/tree/main/languages/ruby"&gt;GitHub repo&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Tangram is a new project and there is a lot of work ahead. We’d love to get your feedback. Check out the project on &lt;a href="https://github.com/tangramdotdev/tangram"&gt;GitHub&lt;/a&gt;, and let us know what you think! If you like what we are working on, &lt;a href="https://github.com/tangramdotdev/tangram"&gt;give us a star&lt;/a&gt;!&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>ruby</category>
      <category>rails</category>
    </item>
  </channel>
</rss>
