<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Darth Espressius</title>
    <description>The latest articles on DEV Community by Darth Espressius (@_aadidev).</description>
    <link>https://dev.to/_aadidev</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F407894%2Fdadcb23f-f41f-4de3-a6e7-9064ed51f251.jpg</url>
      <title>DEV Community: Darth Espressius</title>
      <link>https://dev.to/_aadidev</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/_aadidev"/>
    <language>en</language>
    <item>
      <title>Graph Features for Graph Machine Learning</title>
      <dc:creator>Darth Espressius</dc:creator>
      <pubDate>Tue, 17 May 2022 22:58:31 +0000</pubDate>
      <link>https://dev.to/_aadidev/graph-features-for-graph-machine-learning-ajh</link>
      <guid>https://dev.to/_aadidev/graph-features-for-graph-machine-learning-ajh</guid>
      <description>&lt;p&gt;In my &lt;a href="https://dev.to/_aadidev/node-features-for-graph-machine-learning-30el"&gt;previous post&lt;/a&gt;, we went through what a Graph is in the context of math, as well as node-based features used for machine-learning on graphs.&lt;br&gt;
&lt;strong&gt;TLDR:&lt;/strong&gt; a graph is a set of nodes connected by edges, both of with can contain features. Graph ML is concerned with using graph-based representations to infer these features on new graphs, or in some cases to learn structures existing within a graph. &lt;/p&gt;

&lt;p&gt;The previous post covered features which assumed that we wanted to perform inference on the nodes or edges of a graph. Some applications, however, warrant whole-graph operations; for example, predicting whether a new molecule is toxic, or whether a new protein is compatible with particular enzymes. &lt;/p&gt;

&lt;p&gt;To this end, there are three common approaches:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Bag of nodes&lt;/li&gt;
&lt;li&gt;Weisfeiler-Lehman Kernel&lt;/li&gt;
&lt;li&gt;Graphlets&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Bag of Nodes
&lt;/h2&gt;

&lt;p&gt;This is the simplest method. Summary statistics from node-level operations, such as histograms of node degree or centrality measures, can be aggregated and used as a graph-level representation. However, this is based entirely on node-level data, which means that larger, structural features of the graph may be missed.&lt;/p&gt;
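&lt;p&gt;As a minimal sketch (pure Python; the adjacency-list representation and the five-bucket degree histogram are my own assumptions, not a standard), a bag-of-nodes feature vector might look like:&lt;/p&gt;

```python
from statistics import mean

def bag_of_nodes_features(adj):
    """Aggregate node-level statistics into one graph-level feature vector.

    adj: dict mapping each node to a list of its neighbours.
    Returns [min, max, mean] of the degree sequence plus a fixed-size
    degree histogram, so graphs of different sizes stay comparable.
    """
    degrees = [len(neighbours) for neighbours in adj.values()]
    hist = [0] * 5  # buckets for degrees 0..3, plus one for "4 or more"
    for d in degrees:
        hist[min(d, 4)] += 1
    return [min(degrees), max(degrees), mean(degrees)] + hist
```

&lt;p&gt;Any classical model (SVM, random forest) can then consume this fixed-length vector, at the cost of discarding all structural information.&lt;/p&gt;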

&lt;h2&gt;
  
  
  Weisfeiler-Lehman Kernel
&lt;/h2&gt;

&lt;p&gt;The idea behind this method is to iteratively aggregate node-level information, so that each node's representation comes to contain data extending past its local &lt;em&gt;ego&lt;/em&gt; graph (its immediate neighbourhood). This method can get mathsy quite quickly.&lt;/p&gt;

&lt;p&gt;A label is assigned to each node, such as the node degree. A hash function then iteratively assigns each node a new label using the multi-set of current labels within the current node's neighbourhood (a multi-set, since some neighbours may have the same degree). This is run a fixed number of times (depending on graph size and how much data we wish to capture). Each node then encodes the structure of its neighbourhood, which can be summarized for further processing.&lt;/p&gt;
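&lt;p&gt;As a rough sketch (pure Python, adjacency-list representation assumed; a real WL kernel would use a compressed label dictionary rather than Python's built-in &lt;code&gt;hash&lt;/code&gt;), one refinement loop might look like:&lt;/p&gt;

```python
from collections import Counter

def wl_relabel(adj, labels, iterations=2):
    """Iteratively refine node labels with the Weisfeiler-Lehman scheme.

    adj: dict mapping each node to a list of its neighbours.
    labels: dict mapping each node to an initial label (e.g. its degree).
    """
    for _ in range(iterations):
        new_labels = {}
        for node, neighbours in adj.items():
            # Multiset of neighbour labels, sorted so the hash is
            # independent of neighbour ordering
            multiset = tuple(sorted(labels[n] for n in neighbours))
            new_labels[node] = hash((labels[node], multiset))
        labels = new_labels
    # Graph-level feature: a "bag" (histogram) of the final labels
    return Counter(labels.values())
```

&lt;p&gt;Two graphs can then be compared by how much their label histograms overlap, which is the essence of the WL kernel.&lt;/p&gt;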

&lt;h2&gt;
  
  
  Graphlets
&lt;/h2&gt;

&lt;p&gt;This is a typically combinatorially difficult problem, since it analyses the different possible subgraph structures (called graphlets) existing in a graph. A graphlet kernel encodes the number of times graphlets of each type occur in a graph, typically as a column vector. A similar approach looks at paths that occur in a graph, and encodes the number of times particular degree-sequences occur. A slight variation of this approach encodes the same data using only the shortest paths between nodes.&lt;/p&gt;
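&lt;p&gt;As an illustrative sketch (pure Python, brute force over all node triples, so only viable for tiny graphs), counting the two connected 3-node graphlets gives a minimal graphlet feature vector:&lt;/p&gt;

```python
from itertools import combinations

def three_node_graphlet_counts(adj):
    """Count the two connected 3-node graphlets: open wedges (a path of
    two edges) and closed triangles. adj maps node -> list of neighbours."""
    wedges = triangles = 0
    for a, b, c in combinations(adj, 3):
        edges = (b in adj[a]) + (c in adj[a]) + (c in adj[b])
        if edges == 2:
            wedges += 1
        elif edges == 3:
            triangles += 1
    return [wedges, triangles]
```

&lt;p&gt;A graphlet kernel compares these count vectors (usually normalised) between graphs; the combinatorial cost is why practical implementations restrict graphlet size and use clever sampling.&lt;/p&gt;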

</description>
      <category>machinelearning</category>
      <category>datascience</category>
      <category>beginners</category>
    </item>
    <item>
      <title>Node Features for Graph Machine Learning</title>
      <dc:creator>Darth Espressius</dc:creator>
      <pubDate>Sat, 07 May 2022 18:05:19 +0000</pubDate>
      <link>https://dev.to/_aadidev/node-features-for-graph-machine-learning-30el</link>
      <guid>https://dev.to/_aadidev/node-features-for-graph-machine-learning-30el</guid>
      <description>&lt;p&gt;A graph in the context of mathematics is was almost every other field refers to as a &lt;em&gt;network&lt;/em&gt;. It consists a series of nodes connected by edges, both of which can contain meta- information, or what we refer to as &lt;em&gt;features&lt;/em&gt;. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--cR3zZOwC--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/nf1r7okcm0t7qbhva70t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--cR3zZOwC--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/nf1r7okcm0t7qbhva70t.png" alt="A graph of nodes and edges" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For example, a graph can describe power-stations across some geographic area; Each station could have features describing its maximum output, current output and current demand as its &lt;em&gt;node&lt;/em&gt; features. The edges could describe the energy flowing to distribution stations, or other power-stations in times of high demand. An interesting application may be to forecast the required power output for any station or for the consumption of any distribution station at a given point in time. The benefit of using a graph-based approach in this case is an implicit way to capture the dependencies between stations that deliver power to similar geographic areas, where each node is "aware" of other nodes at given time-steps, and therefore can adjust itself in relation to other nodes for more efficient energy use.&lt;/p&gt;

&lt;p&gt;See &lt;a href="https://arxiv.org/pdf/2105.13399.pdf"&gt;this paper&lt;/a&gt; for an interesting application similar to what was described above.&lt;/p&gt;

&lt;p&gt;Before we get ahead of ourselves, we should take a step back and think about how we go about making predictions on a graph. Supervised machine learning uses features constructed from our data to establish (hopefully) pronounced similarities and differences in the input data that may not have been immediately obvious. In order to do this, we take our data (graph with node and edge features) and construct more appropriate features that can more easily be digested by a traditional machine-learning model. &lt;/p&gt;

&lt;p&gt;There are three (traditional) ways of going about this: we can construct features on the nodes, on the edges, or on the entire graph at a time. In this post, I will go through node features, where they are used, and some high-level intuition about each.&lt;/p&gt;




&lt;h1&gt;
  
  
  Node Importance
&lt;/h1&gt;

&lt;p&gt;Identifying &lt;em&gt;important&lt;/em&gt; vertices in a graph is an interesting problem that usually requires the development of some node-level embedding.&lt;/p&gt;

&lt;h2&gt;
  
  
  Node Degree
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;Caveat: this is technically a type of centrality, but I keep it separate as it's sometimes referred to as a node "feature"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The most obvious way to create a smart feature describing a node u in a graph with vertex set V is to take the node's &lt;em&gt;degree&lt;/em&gt;: the number of edges leaving or entering the node (or simply the number of incident edges if the graph is undirected).&lt;br&gt;


&lt;/p&gt;
&lt;div class="katex-element"&gt;
  &lt;span class="katex-display"&gt;&lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;du=∑u∈VA[u,v]
d_u = \sum_{u\in V}\bold{A}[u, v]
&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;d&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;u&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mrel"&gt;=&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mop op-limits"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;u&lt;/span&gt;&lt;span class="mrel mtight"&gt;∈&lt;/span&gt;&lt;span class="mord mathnormal mtight"&gt;V&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="mop op-symbol large-op"&gt;∑&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mord mathbf"&gt;A&lt;/span&gt;&lt;span class="mopen"&gt;[&lt;/span&gt;&lt;span class="mord mathnormal"&gt;u&lt;/span&gt;&lt;span class="mpunct"&gt;,&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mord 
mathnormal"&gt;v&lt;/span&gt;&lt;span class="mclose"&gt;]&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/div&gt;


&lt;p&gt;As intuitive as this is, there are some drawbacks. Counting the number of edges of a given node only accounts for the node's immediate local neighborhood, and it ignores other node features (such as how important a neighbor may be) that could contribute informative statistics about the graph. Lastly, the importance of a given node should arguably depend on the importance of its neighbors. For example, in a given social network, if you had a direct edge to the President of the US, your "importance" should be more heavily weighted. &lt;/p&gt;
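&lt;p&gt;As a quick sketch (a made-up 4-node undirected graph, with plain Python lists standing in for the adjacency matrix &lt;strong&gt;A&lt;/strong&gt;), the degree feature is just a row sum:&lt;/p&gt;

```python
# Adjacency matrix of a toy undirected 4-node graph:
# node 0 connects to 1 and 2; node 2 also connects to 3.
A = [[0, 1, 1, 0],
     [1, 0, 0, 0],
     [1, 0, 0, 1],
     [0, 0, 1, 0]]

# d_u = sum over v of A[u][v]: one degree value per node
degrees = [sum(row) for row in A]
```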

&lt;h2&gt;
  
  
  Node Centrality
&lt;/h2&gt;

&lt;p&gt;Node centrality aims to improve upon vanilla node degree by addressing its main shortcoming: not accounting for neighbors' importance. Additionally, the idea of "centrality" encompasses a range of different methods, a few of which I go through here.&lt;/p&gt;

&lt;h3&gt;
  
  
  Betweenness Centrality
&lt;/h3&gt;

&lt;p&gt;This measure accounts for the number of shortest paths that pass through a given node. Going back to the social-media example: if lots of my friends are directly connected to an important person, then that makes them more important, and therefore (assuming it's actually a &lt;em&gt;lot&lt;/em&gt; of friends) I may then be weighted as a more "important" person. &lt;/p&gt;

&lt;p&gt;This measure counts the number of shortest paths which go through a given node, therefore making it more "central" to the graph. Another way is to think of junctions in a city as nodes, and edges as roads; if you have to pass through the main junction to reach most of the other junctions, it's highly likely that it is indeed important.&lt;/p&gt;

&lt;p&gt;For completeness, here's how we calculate betweenness centrality of a node v:&lt;br&gt;

&lt;/p&gt;
&lt;div class="katex-element"&gt;
  &lt;span class="katex-display"&gt;&lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;C(v)=∑all node pairs in graphNumber of shortest paths between two nodes that pass through node vNumber of shortest path between two nodes
C(v) = \sum_{\text{all node pairs in graph}}\frac{\text{Number of shortest paths between two nodes that pass through node v}}{\text{Number of shortest path between two nodes}}
&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;C&lt;/span&gt;&lt;span class="mopen"&gt;(&lt;/span&gt;&lt;span class="mord mathnormal"&gt;v&lt;/span&gt;&lt;span class="mclose"&gt;)&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mrel"&gt;=&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mop op-limits"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord text mtight"&gt;&lt;span class="mord mtight"&gt;all node pairs in graph&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="mop op-symbol large-op"&gt;∑&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mopen nulldelimiter"&gt;&lt;/span&gt;&lt;span class="mfrac"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord text"&gt;&lt;span class="mord"&gt;Number of shortest path between two nodes&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="frac-line"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord text"&gt;&lt;span class="mord"&gt;Number of shortest paths between two nodes that 
pass through node v&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mclose nulldelimiter"&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/div&gt;
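&lt;p&gt;As a brute-force sketch (pure Python BFS over an unweighted adjacency list; real libraries use Brandes' algorithm, which is far more efficient), the ratio above can be computed pair by pair:&lt;/p&gt;

```python
from collections import deque

def shortest_path_count(adj, s, t):
    """BFS returning (number of shortest s-t paths, their length)."""
    dist, count, queue = {s: 0}, {s: 1}, deque([s])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w], count[w] = dist[u] + 1, count[u]
                queue.append(w)
            elif dist[w] == dist[u] + 1:
                count[w] += count[u]
    return count.get(t, 0), dist.get(t)

def betweenness(adj, v):
    """Sum, over node pairs, of the fraction of shortest paths through v."""
    others = [n for n in adj if n != v]
    score = 0.0
    for i, s in enumerate(others):
        for t in others[i + 1:]:
            total, d_st = shortest_path_count(adj, s, t)
            if total == 0:
                continue
            n_sv, d_sv = shortest_path_count(adj, s, v)
            n_vt, d_vt = shortest_path_count(adj, v, t)
            # v lies on a shortest s-t path iff the distances add up
            if d_sv is not None and d_vt is not None and d_sv + d_vt == d_st:
                score += n_sv * n_vt / total
    return score
```

&lt;p&gt;On a path graph 0–1–2, the middle node scores 1.0 (every shortest path between the endpoints passes through it), while the endpoints score 0.&lt;/p&gt;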


&lt;h3&gt;
  
  
  Closeness Centrality
&lt;/h3&gt;

&lt;p&gt;This measure captures how far away a node is from every other node. The idea is: if a node is further from every other node, it is less important (note that the concept of "important" here can be flipped if we rank importance by rarity, or if we want to find outlier nodes).&lt;/p&gt;

&lt;p&gt;For a given node v, this measure is taken as the reciprocal of the sum, over every other node, of the number of edges in the shortest path to that node.&lt;br&gt;

&lt;/p&gt;
&lt;div class="katex-element"&gt;
  &lt;span class="katex-display"&gt;&lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;C(v)=1∑every other node(Number of edges in shortest path)
C(v) = \frac{1}{\sum_{\text{every other node}}\text{(Number of edges in shortest path)}} 
&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;C&lt;/span&gt;&lt;span class="mopen"&gt;(&lt;/span&gt;&lt;span class="mord mathnormal"&gt;v&lt;/span&gt;&lt;span class="mclose"&gt;)&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mrel"&gt;=&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mopen nulldelimiter"&gt;&lt;/span&gt;&lt;span class="mfrac"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mop"&gt;&lt;span class="mop op-symbol small-op"&gt;∑&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord text mtight"&gt;&lt;span class="mord mtight"&gt;every other node&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mord text"&gt;&lt;span class="mord"&gt;(Number of edges in shortest path)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="frac-line"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord"&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span 
class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mclose nulldelimiter"&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/div&gt;
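&lt;p&gt;A minimal sketch (unweighted adjacency list, a single BFS from v; this assumes the graph is connected, so every node is reachable):&lt;/p&gt;

```python
from collections import deque

def closeness(adj, v):
    """Reciprocal of the summed shortest-path lengths from v to all others."""
    dist, queue = {v: 0}, deque([v])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return 1.0 / sum(dist.values())
```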


&lt;h2&gt;
  
  
  Eigenvector Centrality
&lt;/h2&gt;

&lt;p&gt;This is an interesting way of updating node importance based on neighbor importance, and requires a bit more background. We represent a graph as an adjacency matrix (other representations include an edge list or adjacency set), which is square, with the number of rows/columns equal to the number of nodes in the graph. A non-zero entry indicates a connection between two nodes. A node's &lt;em&gt;eigenvector centrality&lt;/em&gt; is defined as follows:&lt;br&gt;

&lt;/p&gt;
&lt;div class="katex-element"&gt;
  &lt;span class="katex-display"&gt;&lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;eu=1λ∑u∈VA[u,v]ev
e_u = \frac{1}{\lambda}\sum_{u\in V}\bold{A}[u, v]e_v
&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;e&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;u&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mrel"&gt;=&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mopen nulldelimiter"&gt;&lt;/span&gt;&lt;span class="mfrac"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;λ&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="frac-line"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord"&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mclose nulldelimiter"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mop op-limits"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span 
class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;u&lt;/span&gt;&lt;span class="mrel mtight"&gt;∈&lt;/span&gt;&lt;span class="mord mathnormal mtight"&gt;V&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="mop op-symbol large-op"&gt;∑&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mord mathbf"&gt;A&lt;/span&gt;&lt;span class="mopen"&gt;[&lt;/span&gt;&lt;span class="mord mathnormal"&gt;u&lt;/span&gt;&lt;span class="mpunct"&gt;,&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;v&lt;/span&gt;&lt;span class="mclose"&gt;]&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;e&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;v&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/div&gt;


&lt;p&gt;If we rewrite the above in vector notation, with &lt;em&gt;e&lt;/em&gt; as a vector of node centralities:&lt;br&gt;

&lt;/p&gt;
&lt;div class="katex-element"&gt;
  &lt;span class="katex-display"&gt;&lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;λe=Ae
\lambda\bold{e} = \bold{Ae}
&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;λ&lt;/span&gt;&lt;span class="mord mathbf"&gt;e&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mrel"&gt;=&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathbf"&gt;Ae&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/div&gt;


&lt;p&gt;The above is of the form of the eigenvector-eigenvalue decomposition (see Section 4.4 of &lt;a href="https://mml-book.github.io/book/mml-book.pdf"&gt;this free online book&lt;/a&gt; for a great breakdown).&lt;/p&gt;

&lt;p&gt;One view of this measure is that it ranks nodes by the likelihood that they are visited on a random walk of infinite length over the graph.&lt;/p&gt;

&lt;p&gt;The part to note here is that, assuming we require positive centrality values, there are &lt;a href="https://people.math.harvard.edu/~knill/teaching/math19b_2011/handouts/lecture34.pdf"&gt;theorems&lt;/a&gt; which allow us to solve this iteratively and computationally.&lt;/p&gt;
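&lt;p&gt;A minimal sketch of that iterative approach (power iteration on a plain list-of-lists adjacency matrix; this assumes a connected, non-bipartite graph so the iteration settles on the dominant eigenvector):&lt;/p&gt;

```python
def eigenvector_centrality(A, iterations=100):
    """Repeatedly apply A to a vector and renormalise; the result
    converges toward the dominant eigenvector e satisfying lambda*e = A*e."""
    n = len(A)
    e = [1.0] * n
    for _ in range(iterations):
        e = [sum(A[u][v] * e[v] for v in range(n)) for u in range(n)]
        scale = max(e)  # renormalise so values stay bounded
        e = [x / scale for x in e]
    return e

# Toy graph: a triangle (0-1-2) with a pendant node 3 hanging off node 0
A = [[0, 1, 1, 1],
     [1, 0, 1, 0],
     [1, 1, 0, 0],
     [1, 0, 0, 0]]
centrality = eigenvector_centrality(A)
```

&lt;p&gt;The hub of the toy graph ends up most central, its symmetric neighbours tie, and the pendant node scores lowest, matching the "importance flows from neighbours" intuition.&lt;/p&gt;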




&lt;h1&gt;
  
  
  Structure Based Features
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Clustering Coefficient
&lt;/h2&gt;

&lt;p&gt;This is taken as the number of existing connections among a node's neighbors divided by the total number of possible connections between them.&lt;/p&gt;

&lt;p&gt;It corresponds to the probability that two nearest neighbors of a node are themselves connected. In another view: the clustering coefficient measures the proportion of closed triangles in a node's local neighborhood, giving an idea of how tightly knit that neighborhood may be.&lt;/p&gt;

&lt;p&gt;There are many variations of this metric, of which a popular version, the &lt;em&gt;local&lt;/em&gt; clustering coefficient, is computed as follows:&lt;br&gt;

&lt;/p&gt;
&lt;div class="katex-element"&gt;
  &lt;span class="katex-display"&gt;&lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;cv=Number of edges between neighbors of node vNumber of pairs of nodes in neighborhood
c_v = \frac{\text{Number of edges between neighbors of node }v}{\text{Number of pairs of nodes in neighborhood}}
&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;c&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;v&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mrel"&gt;=&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mopen nulldelimiter"&gt;&lt;/span&gt;&lt;span class="mfrac"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord text"&gt;&lt;span class="mord"&gt;Number of pairs of nodes in neighborhood&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="frac-line"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord text"&gt;&lt;span class="mord"&gt;Number of edges between neighbors of node &lt;/span&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;v&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mclose 
nulldelimiter"&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/div&gt;
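&lt;p&gt;A minimal sketch (adjacency-list representation; neighbour lookups use &lt;code&gt;in&lt;/code&gt;, so lists or sets both work):&lt;/p&gt;

```python
def local_clustering(adj, v):
    """Edges among v's neighbours divided by the possible pairs k*(k-1)/2."""
    neighbours = adj[v]
    k = len(neighbours)
    if k < 2:
        return 0.0  # convention: nodes with under two neighbours score zero
    links = sum(1 for i, a in enumerate(neighbours)
                for b in neighbours[i + 1:] if b in adj[a])
    return links / (k * (k - 1) / 2)
```

&lt;p&gt;A node in a triangle scores 1.0 (its neighbourhood is fully connected), while a node in a 4-cycle scores 0.0 (its two neighbours never link to each other).&lt;/p&gt;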


&lt;h2&gt;
  
  
  Graphlet Degree Vector
&lt;/h2&gt;

&lt;p&gt;A graphlet is a small collection of nodes that forms a &lt;em&gt;subgraph&lt;/em&gt; of a given network. This method counts the number of graphlets &lt;em&gt;rooted&lt;/em&gt; at each given node (up to a given size, or of a given type). The graphlet degree vector is a column vector recording how many graphlets of each type appear rooted at a given node. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--mZvteFPb--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/l68sh641qwgtux56ry61.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--mZvteFPb--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/l68sh641qwgtux56ry61.png" alt="From Pruzi et. al. 2004" width="752" height="441"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;From Pržulj et al., 2004&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;For example, to get the graphlet degree vector of a node for graphlets of up to five nodes, every node would have a vector of 73 values representing the various graphlet orbits, where each number counts how many times a given graphlet appears rooted at that node. An interesting note is that the graphlet-frequency vector is fairly robust to random node addition/deletion and rewiring. This may be beneficial for classification tasks, but may not be favorable for outlier-analysis tasks. Additionally, this measure has been used as a basis for graph comparison in fairly recent papers such as &lt;a href="https://www.nature.com/articles/srep35098#Sec5"&gt;Graphlet-based Characterization of Directed Networks&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;See &lt;a href="https://academic.oup.com/bioinformatics/article/23/2/e177/202080?login=true"&gt;here&lt;/a&gt; for a more thorough application of the graphlet-degree vector.&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>programming</category>
      <category>datascience</category>
      <category>beginners</category>
    </item>
    <item>
      <title>Automating Data Validation using Kedro Hooks</title>
      <dc:creator>Darth Espressius</dc:creator>
      <pubDate>Sat, 26 Mar 2022 10:41:03 +0000</pubDate>
      <link>https://dev.to/_aadidev/automating-data-validation-using-kedro-hooks-3gd6</link>
      <guid>https://dev.to/_aadidev/automating-data-validation-using-kedro-hooks-3gd6</guid>
      <description>&lt;h2&gt;
  
  
  30 Second Intro to Kedro
&lt;/h2&gt;

&lt;p&gt;Kedro is a fairly un-opinionated Python framework for running data pipelines. At a high level, Kedro is a DAG-solver, consisting of a series of discrete steps abstracted as &lt;strong&gt;Nodes&lt;/strong&gt;, connected by datasets abstracted as &lt;strong&gt;Catalog&lt;/strong&gt; entries. Nodes are grouped into higher-order constructs called &lt;strong&gt;Pipelines&lt;/strong&gt;, and the order in which nodes run is determined by the data dependencies between each node's inputs and outputs.&lt;/p&gt;
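&lt;p&gt;To make the "order from data dependencies" idea concrete, here is a toy, framework-free sketch (plain Python; the node and dataset names are made up, and real Kedro does considerably more than this):&lt;/p&gt;

```python
def topological_order(nodes):
    """Toy Kedro-style pipeline resolution: each node declares the dataset
    names it consumes and produces, and run order falls out of those data
    dependencies rather than declaration order.

    nodes: list of (node_name, input_datasets, output_datasets) tuples.
    """
    all_outputs = {out for _, _, outs in nodes for out in outs}
    # Datasets nobody produces are "raw" catalog entries, available up front
    produced = {i for _, ins, _ in nodes for i in ins} - all_outputs
    order, remaining = [], list(nodes)
    while remaining:
        ready = [n for n in remaining if set(n[1]) <= produced]
        if not ready:
            raise ValueError("cycle or missing dataset in pipeline")
        for name, _, outs in ready:
            order.append(name)
            produced |= set(outs)
        remaining = [n for n in remaining if n[0] not in order]
    return order
```

&lt;p&gt;Even if "train" is declared first, it runs after "preprocess", because it consumes the dataset that "preprocess" produces.&lt;/p&gt;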

&lt;h2&gt;
  
  
  Hooks
&lt;/h2&gt;

&lt;p&gt;Hooks, according to the Kedro &lt;a href="https://kedro.readthedocs.io/en/stable/07_extend_kedro/02_hooks.html?highlight=Hooks"&gt;documentation&lt;/a&gt;, allow you to extend the behaviour of Kedro's main execution in an easy and consistent manner. A Hook is built from a specification and an implementation, and lives in the standard project structure created by running &lt;code&gt;kedro new&lt;/code&gt;. &lt;/p&gt;

&lt;p&gt;Hooks are &lt;em&gt;implemented&lt;/em&gt; in &lt;code&gt;hooks.py&lt;/code&gt; and should consist of a set of related functions grouped into a class (one class for each set of related hooks). A hook is then &lt;em&gt;specified&lt;/em&gt; in &lt;code&gt;src/&amp;lt;project_name&amp;gt;/settings.py&lt;/code&gt; by registering the hook class. This is done by importing your newly created hook class and adding it to the &lt;code&gt;HOOKS&lt;/code&gt; key. &lt;/p&gt;

&lt;p&gt;There are several types of hooks, depending on what type of event your hook should follow and when it should execute. In this post, I will be focusing on one specific hook, used to validate data after it has been loaded.&lt;/p&gt;

&lt;h2&gt;
  
  
  Using Hooks to Validate Data
&lt;/h2&gt;

&lt;p&gt;One Kedro hook, &lt;code&gt;after_dataset_loaded&lt;/code&gt;, allows you to consistently execute a user-defined function every time an entry in your data catalog is loaded. This is helpful in, for example, ensuring the distribution of your data source is as expected. This can be a common issue in building machine-learning pipelines, where monitoring data &lt;em&gt;drift&lt;/em&gt; is crucial to maintaining the performance and trustworthiness of your model. In this post, we will be writing a hook to monitor data drift using the &lt;a href="https://scholarworks.wmich.edu/cgi/viewcontent.cgi?article=4249&amp;amp;context=dissertations"&gt;Population Stability Index&lt;/a&gt;. &lt;/p&gt;
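&lt;p&gt;Before wiring it into a hook, here is one way the check itself might look (a hedged sketch: the ten-bucket equal-width binning and the small epsilon for empty buckets are my own assumptions, not part of the Kedro API or the linked reference):&lt;/p&gt;

```python
import math

def population_stability_index(expected, actual, buckets=10):
    """PSI between a reference sample and a new sample of a numeric column.

    Bins are equal-width over the reference sample's range; each term is
    (p_i - q_i) * ln(p_i / q_i), summed over buckets. As a common rule of
    thumb, values below ~0.1 are read as "no significant drift".
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / buckets or 1.0  # guard against a constant column

    def proportions(sample):
        counts = [0] * buckets
        for x in sample:
            index = min(max(int((x - lo) / width), 0), buckets - 1)
            counts[index] += 1
        # epsilon keeps empty buckets from producing log(0) or div-by-zero
        return [(c + 1e-6) / (len(sample) + 1e-6 * buckets) for c in counts]

    p, q = proportions(expected), proportions(actual)
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))
```

&lt;p&gt;Inside an &lt;code&gt;after_dataset_loaded&lt;/code&gt; implementation, you would call something like this against a stored reference sample and log or raise when the index crosses a threshold.&lt;/p&gt;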

&lt;h3&gt;
  
  
  Hook Definition
&lt;/h3&gt;

&lt;p&gt;We will be using the &lt;code&gt;after_dataset_loaded&lt;/code&gt; Hook to ensure our data for a (potential) machine-learning model is consistent. If we look at the definition of the &lt;code&gt;after_dataset_loaded&lt;/code&gt; Hook:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;    &lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="n"&gt;hook_spec&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;after_dataset_loaded&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dataset_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="s"&gt;"""Hook to be invoked after a dataset is loaded from the catalog.
        Args:
            dataset_name: name of the dataset that was loaded from the catalog.
            data: the actual data that was loaded from the catalog.
        """&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;we see that the hook definition requires a dataset name and the data that was loaded from the catalog. (Don't worry, we don't need to actually supply those ourselves; Kedro handles it. We simply need to implement the above-defined interface in our &lt;code&gt;hooks.py&lt;/code&gt; file and add the &lt;code&gt;hook_impl&lt;/code&gt; decorator to the correctly-named function.) &lt;/p&gt;

&lt;p&gt;For example, let us create a new class in &lt;code&gt;hooks.py&lt;/code&gt;, called &lt;code&gt;PSIHooks&lt;/code&gt; and create the required hook:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;kedro.framework.hooks&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;hook_impl&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;PSIHooks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;

&lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="n"&gt;hook_impl&lt;/span&gt; 
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;after_dataset_loaded&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
  &lt;span class="n"&gt;dataset_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's also assume that our data is a DataFrame with numeric columns, and that we would like to validate each column against a reference series of values stored as arrays. Using &lt;a href="https://github.com/mwburke/population-stability-index/blob/master/psi.py"&gt;this implementation&lt;/a&gt; (not mine) of PSI, we can add the following to the body of our hook:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# convert dataframe to numpy matrix 
&lt;/span&gt;&lt;span class="n"&gt;actual_values&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;

&lt;span class="n"&gt;psi_values&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;calculate_psi&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;expected_values&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;actual_values&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'f Dataset Name: {dataset_name}'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'PSI Values'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;psi_values&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Additionally, you could add conditional logic to determine which reference data is needed to validate each individual dataset, and to skip monitoring for datasets that are the output of some intermediate data operation.&lt;/p&gt;
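&lt;p&gt;As a sketch of that conditional logic (the naming convention and catalog entries here are hypothetical), you could key off the dataset name that Kedro passes to the hook:&lt;/p&gt;

```python
# Hypothetical convention: only raw catalog entries are monitored, and
# anything produced by an intermediate node is prefixed "intermediate_"
MONITORED_DATASETS = {"companies", "shuttles"}  # hypothetical entries

def should_monitor(dataset_name: str) -> bool:
    """Decide inside after_dataset_loaded whether to compute the PSI."""
    if dataset_name.startswith("intermediate_"):
        return False
    return dataset_name in MONITORED_DATASETS
```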

&lt;p&gt;While the above establishes the general PSI calculation, there is no way to keep track of what our PSI is, or how it changes over time. In this case, we can use an experiment-tracking framework such as MLflow, Neptune.ai or wandb.ai in the body of our hook to log how our PSI changes over time.&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>python</category>
      <category>datascience</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Protocol Buffers, Neural Networks and Python Generators</title>
      <dc:creator>Darth Espressius</dc:creator>
      <pubDate>Mon, 07 Feb 2022 12:15:23 +0000</pubDate>
      <link>https://dev.to/_aadidev/protocol-buffers-neural-networks-and-python-generators-2o5i</link>
      <guid>https://dev.to/_aadidev/protocol-buffers-neural-networks-and-python-generators-2o5i</guid>
      <description>&lt;p&gt;&lt;em&gt;NB: There is an interactive, Google-Collab-style version of this post available &lt;a href="https://colab.research.google.com/github/aadi350/Blogs/blob/main/Protocol_Buffers%2C_Neural_Networks_and_Python_Generators.ipynb#scrollTo=zaIWCsNhidfj"&gt;here&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;TLDR: So I was working on my thesis, and wanted to implement a particular paper that I would be able to iterate upon. Long story short: &lt;a href="https://ieeexplore.ieee.org/document/8451652"&gt;this paper&lt;/a&gt; presented a Fully-Convolutional Siamese Neural Network for Change Detection. And me, being me, was not satisfied with simply cloning their model &lt;a href="https://github.com/rcdaudt/fully_convolutional_change_detection"&gt;from GitHub&lt;/a&gt; and using it as-is. I had to implement it, using TensorFlow (instead of PyTorch), so that I could &lt;em&gt;really&lt;/em&gt; experience the intricacies of their model. (So I did, and you can find it &lt;a href="https://gist.github.com/aadi350/d17e6ddecd51845738ce5d506a25a10c"&gt;here&lt;/a&gt;, but that's beside the point of this post.)&lt;/p&gt;

&lt;p&gt;12 hours and two days later, I was ready to train my model. A &lt;a href="https://ieeexplore.ieee.org/document/9467555"&gt;recent 2022 paper&lt;/a&gt; released a dataset of 20,000 image pairs, with painstakingly labelled masks, for the purposes of training the very type of network I had written. So there I was, ready with data, my training loop &lt;a href="https://www.tensorflow.org/guide/keras/writing_a_training_loop_from_scratch"&gt;written from scratch&lt;/a&gt; and a freshly brewed cup of coffee, ready to type the all-so-crucial command&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python src/train.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But then, after about 15 seconds or so, the stacktrace in my terminal immediately gave me the sense that all was not right....&lt;br&gt;
Garbled, nearly unintelligible collections of words, all hinting that I was running out of memory (somehow 64 gigabytes of system RAM and an 8GB GPU wasn't enough?!), and then the magic error message brought my model training to a screeching halt, indicating something about my "protos" not allowing for such large graph nodes (or something along those lines).&lt;/p&gt;

&lt;p&gt;A quick side-quest: TensorFlow 2.x's default mode of operation is &lt;em&gt;eager&lt;/em&gt; mode: when I hit run, the function runs as-is, with no low-level awareness of the commands that came before or after. However, using special decorators, there is the possibility of a performance enhancement through &lt;em&gt;Graph&lt;/em&gt; execution, where a really smart piece of code optimally chooses how to execute my hand-written code as an execution graph. To get a better understanding of this, see the &lt;a href="https://www.tensorflow.org/guide/intro_to_graphs"&gt;documentation&lt;/a&gt;.&lt;/p&gt;
&lt;h1&gt;
  
  
  "proto"?
&lt;/h1&gt;

&lt;p&gt;Now that you have an idea of what Graph execution is, and a general idea of the error I was facing, there remains one vital gap in information: what the hell is a "proto"?! According to &lt;a href="https://stackoverflow.com/questions/34128872/google-protobuf-maximum-size"&gt;this stackoverflow post&lt;/a&gt;, Protobuf has a hard limit of 2GB, since the arithmetic used is typically 32-bit signed. As &lt;a href="https://medium.com/@ouwenhuang/tensorflow-graphs-are-just-protobufs-9df51fc7d08d"&gt;this medium post explained&lt;/a&gt;, TF graphs are simply protobufs. Each operation in TensorFlow is a symbolic handle for a graph-based operation, and the graph is stored as a &lt;a href="https://developers.google.com/protocol-buffers"&gt;Protocol Buffer&lt;/a&gt;. A Protocol Buffer (proto for short) is Google's language-neutral, extensible mechanism for serializing structured data. The specially generated code is used to easily read and write structured data (in this case a TensorFlow graph) regardless of data stream and programming language.&lt;/p&gt;
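&lt;p&gt;That 2GB figure falls straight out of the 32-bit signed arithmetic; as a quick sanity check:&lt;/p&gt;

```python
# The largest value a 32-bit signed integer can hold caps the size of
# a serialized protobuf message at just under 2 GiB
PROTO_SIZE_LIMIT = 2**31 - 1  # bytes
print(PROTO_SIZE_LIMIT)  # 2147483647
```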

&lt;p&gt;To the best of my understanding, my gigantic dataset was causing individual operations in the execution graph to exceed the proto hard limit of 2GB, since I was using the &lt;code&gt;tf.data&lt;/code&gt; API and the &lt;code&gt;from_tensor_slices&lt;/code&gt; function to keep my entire dataset in memory and perform operations from there. Now, the dataset is about 8GB large, wayyyyy smaller than my 64GB of RAM, however performing multiple layers of convolutions (not to mention, in &lt;em&gt;parallel&lt;/em&gt;) quickly caused the entire training pipeline to shut down.&lt;/p&gt;

&lt;p&gt;So I needed to somehow use this large dataset, but without having to keep all the images in memory, and for this, we now move to Python generators.&lt;/p&gt;
&lt;h1&gt;
  
  
  &lt;code&gt;yield&lt;/code&gt;
&lt;/h1&gt;

&lt;p&gt;A &lt;em&gt;generator&lt;/em&gt; function allows you to declare a function that behaves like an iterator. For example, in order to read lines of a text file, I could do the following, which loads the entire file first, then returns it as a list. The downside of this is that the entire file must be kept in memory&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;csv_reader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_name&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;lines&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;lines&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;lines&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If instead, I do the following&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;csv_reader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_name&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I could then call &lt;code&gt;csv_reader&lt;/code&gt; to obtain a &lt;em&gt;generator&lt;/em&gt; object, where the next row is loaded &lt;em&gt;only when it is requested&lt;/em&gt; (via &lt;code&gt;next&lt;/code&gt; or a loop) and the previous row (possibly already processed) can be discarded.&lt;/p&gt;

&lt;p&gt;So something along the lines of&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nb"&gt;next&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;csv_reader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_name&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
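&lt;p&gt;One subtlety worth noting: calling the generator function returns a fresh generator object, and it is that object which holds the iteration state between &lt;code&gt;next&lt;/code&gt; calls. A tiny self-contained illustration (with an in-memory list standing in for the file):&lt;/p&gt;

```python
def rows():
    # stand-in for lazily reading lines from a file
    for row in ["a", "b", "c"]:
        yield row

reader = rows()
print(next(reader))  # a
print(next(reader))  # b -- state lives on the generator object

# calling rows() again starts a brand-new generator from the top
print(next(rows()))  # a
```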



&lt;h1&gt;
  
  
  Generators and &lt;code&gt;tf.Data&lt;/code&gt;
&lt;/h1&gt;

&lt;p&gt;TensorFlow's &lt;code&gt;tf.data&lt;/code&gt; API is extremely powerful, and the ability to define a Dataset &lt;em&gt;from a generator&lt;/em&gt; is all the more powerful. This is how I solved my issue from above: first, I defined a generator for both the train and validation sets: &lt;/p&gt;

&lt;p&gt;(the &lt;a href="https://gist.github.com/aadi350/b6f5ef46359f20f2cb4a2856a692067a"&gt;preprocessing functions&lt;/a&gt; simply load each image from its file path, convert it to floats and normalize it)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;tensorflow&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;tf&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;train_gen&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;split&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'train'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data_path&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'data/'&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data_path&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;split&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;t1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;t2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;zip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;listdir&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="s"&gt;'/time1'&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt; &lt;span class="nb"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;listdir&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="s"&gt;'/time2'&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt; &lt;span class="nb"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;listdir&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="s"&gt;'/label'&lt;/span&gt;&lt;span class="p"&gt;))):&lt;/span&gt;
        &lt;span class="c1"&gt;# get full paths
&lt;/span&gt;
        &lt;span class="n"&gt;t1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;process_path_rgb&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s"&gt;'data/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;split&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/time1/'&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;t1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;t2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;process_path_rgb&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s"&gt;'data/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;split&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/time2/'&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;t2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;l&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;process_path_grey&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s"&gt;'data/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;split&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/label/'&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;t2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;val_gen&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;split&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'val'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data_path&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'data/'&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data_path&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;split&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;t1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;t2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;zip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;listdir&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="s"&gt;'/time1'&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt; &lt;span class="nb"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;listdir&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="s"&gt;'/time2'&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt; &lt;span class="nb"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;listdir&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="s"&gt;'/label'&lt;/span&gt;&lt;span class="p"&gt;))):&lt;/span&gt;
        &lt;span class="c1"&gt;# get full paths
&lt;/span&gt;
        &lt;span class="n"&gt;t1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;process_path_rgb&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s"&gt;'data/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;split&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/time1/'&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;t1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;t2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;process_path_rgb&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s"&gt;'data/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;split&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/time2/'&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;t2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;l&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;process_path_grey&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s"&gt;'data/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;split&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/label/'&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;t2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note that since my model is a &lt;em&gt;Siamese&lt;/em&gt; neural network, it has two &lt;em&gt;heads&lt;/em&gt; and therefore requires &lt;strong&gt;two&lt;/strong&gt; inputs (t1 and t2 above refer to time-1 and time-2, or before-and-after, while l is the label mask indicating the areas that actually underwent change). Finally, I passed these generators to the &lt;code&gt;tf.data&lt;/code&gt; API calls as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;train_ds&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Dataset&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;from_generator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;train_gen&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;output_types&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;float32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;float32&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;uint8&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;val_ds&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Dataset&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;from_generator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;val_gen&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;output_types&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;float32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;float32&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;uint8&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The following section is more for performance and batching, which further reduces how much data is actually held in memory at any given point in time. The &lt;code&gt;from_generator&lt;/code&gt; call achieves exactly what I wanted: data is loaded on an as-needed basis, and this (thus far) has avoided my headache with Protocol Buffers&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;buffer_size&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;
&lt;span class="n"&gt;batch_size&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;

&lt;span class="n"&gt;train_batches&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;train_ds&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cache&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shuffle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;buffer_size&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;batch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;batch_size&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;repeat&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;prefetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;buffer_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AUTOTUNE&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="n"&gt;val_batches&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;val_ds&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cache&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shuffle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;buffer_size&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;batch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;batch_size&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;repeat&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;prefetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;buffer_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AUTOTUNE&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
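&lt;p&gt;Conceptually, &lt;code&gt;.batch()&lt;/code&gt; just groups consecutive elements; a pure-Python analogue (shown only for intuition, not part of the training code) looks like this:&lt;/p&gt;

```python
from itertools import islice

def batched(iterable, batch_size):
    """Yield lists of up to batch_size consecutive items, lazily."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

print(list(batched(range(5), 2)))  # [[0, 1], [2, 3], [4]]
```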



&lt;p&gt;This is a very, &lt;strong&gt;very&lt;/strong&gt; problem-specific post, however it does cover some key aspects of dealing with large sets of image data, TensorFlow and Python generators. I hope that you learnt something!&lt;/p&gt;

&lt;p&gt;For any changes, suggestions or overall comments, feel free to reach out to me &lt;a href="https://www.linkedin.com/in/aadidev-sooknanan/"&gt;on LinkedIn&lt;/a&gt; or on Twitter &lt;a href="https://twitter.com/__aadiDev__"&gt;@__aadiDev&lt;/a&gt;&lt;/p&gt;

</description>
      <category>deeplearning</category>
      <category>python</category>
      <category>tensorflow</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>3 Common Loss Functions for Image Segmentation</title>
      <dc:creator>Darth Espressius</dc:creator>
      <pubDate>Sun, 30 Jan 2022 00:12:18 +0000</pubDate>
      <link>https://dev.to/_aadidev/3-common-loss-functions-for-image-segmentation-545o</link>
      <guid>https://dev.to/_aadidev/3-common-loss-functions-for-image-segmentation-545o</guid>
      <description>&lt;p&gt;Image segmentation has a wide range of applications, from &lt;a href="https://paperswithcode.com/paper/spleeter-a-fast-and-state-of-the-art-music" rel="noopener noreferrer"&gt;music spectrum separation&lt;/a&gt; and &lt;a href="https://deepmind.com/blog/article/how-evolutionary-selection-can-train-more-capable-self-driving-cars" rel="noopener noreferrer"&gt;self-driving-cars&lt;/a&gt; to &lt;a href="https://paperswithcode.com/paper/u-net-convolutional-networks-for-biomedical" rel="noopener noreferrer"&gt;biomedical imaging&lt;/a&gt; and &lt;a href="https://paperswithcode.com/paper/optimized-u-net-for-brain-tumor-segmentation" rel="noopener noreferrer"&gt;brain-tumor segmentation&lt;/a&gt;. The aim of image segmentation is to visually separate &lt;em&gt;(segment)&lt;/em&gt; parts of an image (or image-sequence) into separate objects. For example in the image below from the &lt;a href="https://arxiv.org/pdf/1909.11065v6.pdf" rel="noopener noreferrer"&gt;OCR: Transformer Segmentation paper&lt;/a&gt;, the car at the center of the image was "detected" on a pixel-wise basis. Whilst object detection would simply return the coordinates of say, a bounding box around the car, segmentation aims to return an image mask (1 for "is car", 0 for "is not car") for a given image.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fup2iz5ys3esazeesczuf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fup2iz5ys3esazeesczuf.png" alt="Car Segmented from OCR Paper"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Deep learning has affected (in my opinion) the area of computer vision more so than any other field. There have been multiple innovations across various fields using a &lt;a href="https://paperswithcode.com/method/u-net" rel="noopener noreferrer"&gt;variety of techniques&lt;/a&gt; over the past five years. Image segmentation can be thought of as a classification task on the pixel level, and the choice of &lt;em&gt;loss function&lt;/em&gt; for the task of segmentation is key in determining both the speed at which a machine-learning model converges and, to some extent, the accuracy of the model. &lt;/p&gt;

&lt;p&gt;A &lt;em&gt;loss function&lt;/em&gt; gives the model feedback during supervised training (learning from already-labelled data) on how well it is &lt;em&gt;converging&lt;/em&gt; towards the optimal model parameters. It guides the model in its search for the "ideal" approximation mapping input data to output data (images to masks, in the case of image segmentation). &lt;/p&gt;

&lt;p&gt;&lt;a href="https://arxiv.org/pdf/2006.14822.pdf" rel="noopener noreferrer"&gt;This review paper from Shruti Jadon (IEEE Member)&lt;/a&gt; bucketed loss functions into four main groupings: Distribution-based, region-based, boundary-based and compounded loss. In this blog post, I will focus on three of the more commonly-used loss functions for semantic image segmentation: Binary Cross-Entropy Loss, Dice Loss and the Shape-Aware Loss.&lt;/p&gt;

&lt;h2&gt;
  
  
  Binary Cross-Entropy
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;Cross-entropy&lt;/em&gt; is used to measure the difference between two probability distributions. It serves as a similarity metric telling how close one distribution of random events is to another, and is used both for classification (in the more general sense) and for segmentation. &lt;/p&gt;

&lt;p&gt;The binary cross-entropy (BCE) loss therefore attempts to measure the difference in information content between the actual and predicted image masks. It is based on the Bernoulli distribution and works best when the data is equally distributed amongst classes. In other words, image masks with very heavy class imbalance (such as in finding very small, rare tumors in X-ray images) may not be adequately evaluated by BCE. &lt;/p&gt;

&lt;p&gt;This is because BCE treats positive (1) and negative (0) samples in the image mask equally. Since the pixels representing a given object (say, the car from the first image above) may be heavily outnumbered by the rest of the image, the BCE loss may not effectively represent the performance of the deep-learning model.&lt;/p&gt;

&lt;p&gt;Binary Cross Entropy is defined as:&lt;br&gt;


&lt;/p&gt;
&lt;div class="katex-element"&gt;
  &lt;span class="katex-display"&gt;&lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;L(y,y^)=−ylog⁡(y^)−(1−y)log(1−y^)
 L(y,\hat{y}) = -y\log(\hat{y}) - (1-y)log(1-\hat{y})
&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;L&lt;/span&gt;&lt;span class="mopen"&gt;(&lt;/span&gt;&lt;span class="mord mathnormal"&gt;y&lt;/span&gt;&lt;span class="mpunct"&gt;,&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mord accent"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;y&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="accent-body"&gt;&lt;span class="mord"&gt;^&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mclose"&gt;)&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mrel"&gt;=&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;−&lt;/span&gt;&lt;span class="mord mathnormal"&gt;y&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mop"&gt;lo&lt;span&gt;g&lt;/span&gt;&lt;/span&gt;&lt;span class="mopen"&gt;(&lt;/span&gt;&lt;span class="mord accent"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;y&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="accent-body"&gt;&lt;span class="mord"&gt;^&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span 
class="mclose"&gt;)&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mbin"&gt;−&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mopen"&gt;(&lt;/span&gt;&lt;span class="mord"&gt;1&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mbin"&gt;−&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;y&lt;/span&gt;&lt;span class="mclose"&gt;)&lt;/span&gt;&lt;span class="mord mathnormal"&gt;l&lt;/span&gt;&lt;span class="mord mathnormal"&gt;o&lt;/span&gt;&lt;span class="mord mathnormal"&gt;g&lt;/span&gt;&lt;span class="mopen"&gt;(&lt;/span&gt;&lt;span class="mord"&gt;1&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mbin"&gt;−&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord accent"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;y&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="accent-body"&gt;&lt;span class="mord"&gt;^&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mclose"&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/div&gt;
&lt;br&gt;
&lt;em&gt;Quick primer on mathematical notation: if 
&lt;span class="katex-element"&gt;
  &lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;yy&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;y&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/span&gt;
 is our target image-segmentation mask, and 
&lt;span class="katex-element"&gt;
  &lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;y^\hat{y}&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord accent"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;y&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="accent-body"&gt;&lt;span class="mord"&gt;^&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/span&gt;
 is our predicted mask from our deep-learning model, the loss measures the difference between what we want (
&lt;span class="katex-element"&gt;
  &lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;yy&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;y&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/span&gt;
) and what the model gave us (
&lt;span class="katex-element"&gt;
  &lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;y^\hat{y}&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord accent"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;y&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="accent-body"&gt;&lt;span class="mord"&gt;^&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/span&gt;
)&lt;/em&gt;

&lt;p&gt;BCE is implemented in TensorFlow's &lt;code&gt;keras.losses&lt;/code&gt; package and can be used as-is in your image segmentation models.&lt;/p&gt;
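&lt;p&gt;As a minimal NumPy sketch of the formula above (an illustration of the math only, not the Keras implementation itself), per-pixel BCE can be computed directly:&lt;/p&gt;

```python
import numpy as np

# Toy sketch of per-pixel binary cross-entropy from the formula above.
# Illustrative only; in practice use tf.keras.losses.BinaryCrossentropy.
def bce(y, y_hat, eps=1e-7):
    y_hat = np.clip(y_hat, eps, 1 - eps)  # avoid log(0)
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

y_true = np.array([1.0, 1.0, 0.0, 0.0])  # ground-truth mask pixels
y_pred = np.array([0.9, 0.6, 0.2, 0.1])  # predicted probabilities

mean_loss = bce(y_true, y_pred).mean()   # confident, correct pixels give low loss
```

Note that confident but wrong predictions are penalised heavily, and every pixel contributes equally, which is exactly what drives the class-imbalance issue described above.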

&lt;p&gt;A common adaptation of vanilla BCE is weighted BCE, which scales the positive-pixel term by a coefficient. It is heavily used in medical imaging (and other areas with highly skewed datasets) and is defined as follows:&lt;br&gt;

&lt;/p&gt;
&lt;div class="katex-element"&gt;
  &lt;span class="katex-display"&gt;&lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;L(y,y^)=−βylog⁡(y^)−(1−y)log(1−y^)
 L(y,\hat{y}) = -\beta y\log(\hat{y}) - (1-y)log(1-\hat{y})
&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;L&lt;/span&gt;&lt;span class="mopen"&gt;(&lt;/span&gt;&lt;span class="mord mathnormal"&gt;y&lt;/span&gt;&lt;span class="mpunct"&gt;,&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mord accent"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;y&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="accent-body"&gt;&lt;span class="mord"&gt;^&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mclose"&gt;)&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mrel"&gt;=&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;−&lt;/span&gt;&lt;span class="mord mathnormal"&gt;β&lt;/span&gt;&lt;span class="mord mathnormal"&gt;y&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mop"&gt;lo&lt;span&gt;g&lt;/span&gt;&lt;/span&gt;&lt;span class="mopen"&gt;(&lt;/span&gt;&lt;span class="mord accent"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;y&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="accent-body"&gt;&lt;span class="mord"&gt;^&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span 
class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mclose"&gt;)&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mbin"&gt;−&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mopen"&gt;(&lt;/span&gt;&lt;span class="mord"&gt;1&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mbin"&gt;−&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;y&lt;/span&gt;&lt;span class="mclose"&gt;)&lt;/span&gt;&lt;span class="mord mathnormal"&gt;l&lt;/span&gt;&lt;span class="mord mathnormal"&gt;o&lt;/span&gt;&lt;span class="mord mathnormal"&gt;g&lt;/span&gt;&lt;span class="mopen"&gt;(&lt;/span&gt;&lt;span class="mord"&gt;1&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mbin"&gt;−&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord accent"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;y&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="accent-body"&gt;&lt;span class="mord"&gt;^&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mclose"&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/div&gt;


&lt;p&gt;The 
&lt;span class="katex-element"&gt;
  &lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;β\beta&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;β&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/span&gt;
 parameter can be tuned: for example, to reduce the number of false-negative pixels, set 
&lt;span class="katex-element"&gt;
  &lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;β&amp;gt;1\beta &amp;gt; 1&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;β&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mrel"&gt;&amp;gt;&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/span&gt;
; to reduce the number of false positives, set 
&lt;span class="katex-element"&gt;
  &lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;β&amp;lt;1\beta &amp;lt; 1&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;β&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mrel"&gt;&amp;lt;&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/span&gt;
 &lt;/p&gt;
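&lt;p&gt;A minimal NumPy sketch of weighted BCE (illustrative only, following the notation above) shows how the coefficient changes the penalty on missed positives:&lt;/p&gt;

```python
import numpy as np

# Sketch of weighted BCE from the formula above: beta scales only the
# positive-class term, so missed positives can be penalised more heavily.
def weighted_bce(y, y_hat, beta=1.0, eps=1e-7):
    y_hat = np.clip(y_hat, eps, 1 - eps)
    return -(beta * y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

# A positive pixel predicted with low confidence (a near false negative):
plain = weighted_bce(1.0, 0.2)               # beta = 1: vanilla BCE
weighted = weighted_bce(1.0, 0.2, beta=2.0)  # beta > 1: larger penalty
```

Negative (0) pixels are unaffected by the coefficient, so only the positive class is re-weighted.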
&lt;h2&gt;
  
  
  Dice Coefficient
&lt;/h2&gt;

&lt;p&gt;The Dice Coefficient is widely used to calculate the similarity between images and is closely related to the &lt;em&gt;Intersection-over-Union&lt;/em&gt; heuristic. It has, as such, been adapted into a loss function, the Dice Loss:&lt;br&gt;

&lt;/p&gt;
&lt;div class="katex-element"&gt;
  &lt;span class="katex-display"&gt;&lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;DL(y,y^)=1−2yy^+1y+y^+1
DL(y, \hat{y}) = 1 - \frac{2y\hat{y}+1}{y+\hat{y}+1}
&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;D&lt;/span&gt;&lt;span class="mord mathnormal"&gt;L&lt;/span&gt;&lt;span class="mopen"&gt;(&lt;/span&gt;&lt;span class="mord mathnormal"&gt;y&lt;/span&gt;&lt;span class="mpunct"&gt;,&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mord accent"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;y&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="accent-body"&gt;&lt;span class="mord"&gt;^&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mclose"&gt;)&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mrel"&gt;=&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;1&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mbin"&gt;−&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mopen nulldelimiter"&gt;&lt;/span&gt;&lt;span class="mfrac"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;y&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mbin"&gt;+&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mord accent"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span 
class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;y&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="accent-body"&gt;&lt;span class="mord"&gt;^&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mbin"&gt;+&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mord"&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="frac-line"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord"&gt;2&lt;/span&gt;&lt;span class="mord mathnormal"&gt;y&lt;/span&gt;&lt;span class="mord accent"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;y&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="accent-body"&gt;&lt;span class="mord"&gt;^&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mbin"&gt;+&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mord"&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mclose 
nulldelimiter"&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/div&gt;
 

&lt;p&gt;A common criticism is that its resulting search space is non-convex; several modifications have been made to make the Dice Loss more tractable for methods such as L-BFGS and Stochastic Gradient Descent. The Dice Loss can be implemented in TensorFlow by subclassing &lt;code&gt;tf.keras.losses.Loss&lt;/code&gt; as follows:&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;DiceLoss&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;keras&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;losses&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Loss&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;smooth&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;1e-6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;gama&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nf"&gt;super&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;DiceLoss&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;NDL&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;smooth&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;smooth&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;gama&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;gama&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_pred&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;y_true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_pred&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cast&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;y_true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;float32&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cast&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_pred&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;float32&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;nominator&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; \
            &lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reduce_sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;multiply&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_pred&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_true&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;smooth&lt;/span&gt;
        &lt;span class="n"&gt;denominator&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reduce_sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;y_pred&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;gama&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reduce_sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_true&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;gama&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;smooth&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;divide&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;nominator&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;denominator&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
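&lt;p&gt;A NumPy mirror of the same logic (a sketch with the same smoothing and exponent defaults, not the TensorFlow class itself) is handy for sanity-checking on toy masks: identical masks should give a loss near 0, disjoint masks a loss near 1.&lt;/p&gt;

```python
import numpy as np

# NumPy sketch of the Dice Loss logic above, for quick checks outside TensorFlow.
def dice_loss(y_true, y_pred, smooth=1e-6, gamma=2):
    numerator = 2 * np.sum(y_true * y_pred) + smooth
    denominator = np.sum(y_pred ** gamma) + np.sum(y_true ** gamma) + smooth
    return 1 - numerator / denominator

mask = np.array([[0.0, 1.0], [1.0, 1.0]])
perfect = dice_loss(mask, mask)       # identical masks: loss ~ 0
disjoint = dice_loss(mask, 1 - mask)  # no overlap: loss ~ 1
```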
&lt;h2&gt;
  
  
  Shape-Aware Loss
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://arxiv.org/pdf/1505.04597.pdf" rel="noopener noreferrer"&gt;The U-Net paper&lt;/a&gt; forced their fully-connected convolutional network to learn small separation borders by using a pre-computed weight map for each ground truth pixel. This was aimed at compensating for the different frequency of pixels from certain classes in the training data set, and is computed using morphological operations. This weight map was computed as:&lt;br&gt;

&lt;/p&gt;
&lt;div class="katex-element"&gt;
  &lt;span class="katex-display"&gt;&lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;w(x)=wc(x)+w0e−(d1(x)+d2(x))22σ2
w(\bold{x}) = w_c(\bold{x}) + w_0 e^{-\frac{
(d_1(\bold{x}) + d_2(\bold{x}))^2}{2\sigma^2}}
&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;w&lt;/span&gt;&lt;span class="mopen"&gt;(&lt;/span&gt;&lt;span class="mord mathbf"&gt;x&lt;/span&gt;&lt;span class="mclose"&gt;)&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mrel"&gt;=&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;w&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;c&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mopen"&gt;(&lt;/span&gt;&lt;span class="mord mathbf"&gt;x&lt;/span&gt;&lt;span class="mclose"&gt;)&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mbin"&gt;+&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;w&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;0&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span 
class="mord"&gt;&lt;span class="mord mathnormal"&gt;e&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mtight"&gt;−&lt;/span&gt;&lt;span class="mord mtight"&gt;&lt;span class="mopen nulldelimiter sizing reset-size3 size6"&gt;&lt;/span&gt;&lt;span class="mfrac"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size3 size1 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mtight"&gt;2&lt;/span&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;σ&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord mtight"&gt;2&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="frac-line mtight"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size3 size1 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mopen mtight"&gt;(&lt;/span&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;d&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord mtight"&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span 
class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mopen mtight"&gt;(&lt;/span&gt;&lt;span class="mord mathbf mtight"&gt;x&lt;/span&gt;&lt;span class="mclose mtight"&gt;)&lt;/span&gt;&lt;span class="mbin mtight"&gt;+&lt;/span&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;d&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord mtight"&gt;2&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mopen mtight"&gt;(&lt;/span&gt;&lt;span class="mord mathbf mtight"&gt;x&lt;/span&gt;&lt;span class="mclose mtight"&gt;)&lt;/span&gt;&lt;span class="mclose mtight"&gt;&lt;span class="mclose mtight"&gt;)&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord mtight"&gt;2&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mclose nulldelimiter sizing reset-size3 size6"&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/div&gt;


&lt;p&gt;The 
&lt;span class="katex-element"&gt;
  &lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;d1d_1&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;d&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/span&gt;
 and 
&lt;span class="katex-element"&gt;
  &lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;d2d_2&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;d&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;2&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/span&gt;
 functions give distances to the nearest and second nearest cells. 
&lt;span class="katex-element"&gt;
  &lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;wcw_c&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;w&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;c&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/span&gt;
 is a weight map, manually tuned to weight classes of object instances within an image according to the class distribution. &lt;/p&gt;

&lt;p&gt;This weight term is then used in the typical cross-entropy loss, which results in the following loss function:&lt;br&gt;

&lt;/p&gt;
&lt;div class="katex-element"&gt;
  &lt;span class="katex-display"&gt;&lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;L(y,y^)=−w(x)×[ylog⁡(y^)+(1−y)log⁡(1−p^)]
L(y, \hat{y}) = -w(\bold{x})\times \left[ y\log(\hat{y}) + (1-y)\log(1-\hat{p})\right]
&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;L&lt;/span&gt;&lt;span class="mopen"&gt;(&lt;/span&gt;&lt;span class="mord mathnormal"&gt;y&lt;/span&gt;&lt;span class="mpunct"&gt;,&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mord accent"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;y&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="accent-body"&gt;&lt;span class="mord"&gt;^&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mclose"&gt;)&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mrel"&gt;=&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;−&lt;/span&gt;&lt;span class="mord mathnormal"&gt;w&lt;/span&gt;&lt;span class="mopen"&gt;(&lt;/span&gt;&lt;span class="mord mathbf"&gt;x&lt;/span&gt;&lt;span class="mclose"&gt;)&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mbin"&gt;×&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="minner"&gt;&lt;span class="mopen delimcenter"&gt;[&lt;/span&gt;&lt;span class="mord mathnormal"&gt;y&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mop"&gt;lo&lt;span&gt;g&lt;/span&gt;&lt;/span&gt;&lt;span class="mopen"&gt;(&lt;/span&gt;&lt;span class="mord accent"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span 
class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;y&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="accent-body"&gt;&lt;span class="mord"&gt;^&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mclose"&gt;)&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mbin"&gt;+&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mopen"&gt;(&lt;/span&gt;&lt;span class="mord"&gt;1&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mbin"&gt;−&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;y&lt;/span&gt;&lt;span class="mclose"&gt;)&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mop"&gt;lo&lt;span&gt;g&lt;/span&gt;&lt;/span&gt;&lt;span class="mopen"&gt;(&lt;/span&gt;&lt;span class="mord"&gt;1&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mbin"&gt;−&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mord accent"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;p&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="accent-body"&gt;&lt;span class="mord"&gt;^&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mclose"&gt;)&lt;/span&gt;&lt;span class="mclose delimcenter"&gt;]&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/div&gt;
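&lt;p&gt;As a quick sanity check, the weighted loss above can be sketched in plain Python for a single scalar prediction (the scalar &lt;code&gt;w&lt;/code&gt; here stands in for the per-pixel weight map, rather than full tensors):&lt;/p&gt;

```python
import math

def weighted_bce(y, y_hat, w):
    """Weighted binary cross-entropy for a single pixel/example."""
    return -w * (y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))

# a confident, correct prediction is penalised only slightly...
print(weighted_bce(1.0, 0.9, 1.0))
# ...while a confident wrong one is penalised heavily, scaled further by the weight
print(weighted_bce(1.0, 0.1, 2.0))
```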


</description>
      <category>tensorflow</category>
      <category>computervision</category>
      <category>deeplearning</category>
      <category>imageprocessing</category>
    </item>
    <item>
      <title>3 Ways to Handle non UTF-8 Characters in Pandas</title>
      <dc:creator>Darth Espressius</dc:creator>
      <pubDate>Thu, 20 Jan 2022 18:13:44 +0000</pubDate>
      <link>https://dev.to/_aadidev/3-ways-to-handle-non-utf-8-characters-in-pandas-242</link>
      <guid>https://dev.to/_aadidev/3-ways-to-handle-non-utf-8-characters-in-pandas-242</guid>
      <description>&lt;p&gt;So we've all gotten that error, you download a CSV from the web or get emailed it from your manager, who wants analysis done ASAP, and you find a card in your Kanban labelled &lt;em&gt;URGENT AFF&lt;/em&gt;,so you open up VSCode, import Pandas and then type the following: &lt;code&gt;pd.read_csv('some_important_file.csv')&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Now, instead of the actual import happening, you get the following, near un-interpretable stacktrace:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgq8ywh0oiu8t1feo0eoo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgq8ywh0oiu8t1feo0eoo.png" alt="Unintelligible stacktrace"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;What does that even mean?! And what the heck is &lt;code&gt;utf-8&lt;/code&gt;? As a brief primer/crash course: your computer (like all computers) stores &lt;em&gt;everything&lt;/em&gt; as &lt;em&gt;bits&lt;/em&gt; (series of ones and zeros). In order to represent human-readable things (think letters) with ones and zeros, early standards efforts produced the &lt;a href="https://en.wikipedia.org/wiki/ASCII" rel="noopener noreferrer"&gt;ASCII&lt;/a&gt; mappings (character-set names are nowadays registered with the &lt;a href="https://en.wikipedia.org/wiki/Internet_Assigned_Numbers_Authority" rel="noopener noreferrer"&gt;Internet Assigned Numbers Authority&lt;/a&gt;). These map bytes to numeric &lt;em&gt;codes&lt;/em&gt; (in base-10) which represent various characters. For example, &lt;code&gt;00111111&lt;/code&gt; is the binary for &lt;code&gt;63&lt;/code&gt;, which is the code for &lt;code&gt;?&lt;/code&gt;. &lt;/p&gt;

&lt;p&gt;These letters then come together to form words, which form sentences. The number of unique characters that ASCII can handle is limited by the number of unique byte values available: 8 bits allow for only 256 unique characters, which is nowhere close to handling every single character from every single language. This is where &lt;a href="https://home.unicode.org/" rel="noopener noreferrer"&gt;Unicode&lt;/a&gt; comes in; Unicode assigns a &lt;em&gt;code point&lt;/em&gt; in &lt;a href="https://learn.sparkfun.com/tutorials/hexadecimal/all" rel="noopener noreferrer"&gt;hexadecimal&lt;/a&gt; to each character. For example, &lt;code&gt;U+1F602&lt;/code&gt; maps to 😂. This allows for over a million possible code points, far broader than the original ASCII.&lt;/p&gt;
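&lt;p&gt;These mappings are easy to poke at from Python itself; a quick sketch using only built-ins:&lt;/p&gt;

```python
# ASCII: '?' is code 63, i.e. the byte 00111111
print(ord('?'))                  # 63
print(format(ord('?'), '08b'))   # 00111111

# Unicode: code point U+1F602 is the 'face with tears of joy' emoji
print(hex(ord('😂')))            # 0x1f602
print(chr(0x1F602))              # 😂
```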

&lt;h1&gt;
  
  
  UTF-8
&lt;/h1&gt;

&lt;p&gt;UTF-8 translates Unicode characters &lt;em&gt;to a unique binary string&lt;/em&gt;, and vice versa. However, UTF-8, as its name suggests, is built from 8-bit units: each character is encoded as one to four bytes, with the most common characters (the ASCII range) taking a single byte. This is similar in spirit to &lt;a href="https://en.wikipedia.org/wiki/Huffman_coding" rel="noopener noreferrer"&gt;Huffman Coding&lt;/a&gt;, which represents the most-used characters or &lt;em&gt;tokens&lt;/em&gt; with the &lt;em&gt;shortest&lt;/em&gt; codes. This is intuitive in the sense that we can afford to assign the least-used characters to longer byte sequences, since they appear rarely. If every character were sent as 4 bytes instead, every mostly-ASCII text file you have would take up roughly four times the space. &lt;/p&gt;
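&lt;p&gt;You can see the variable-length trade-off directly in Python (standard library only):&lt;/p&gt;

```python
# common ASCII characters cost a single byte in UTF-8...
print(len('a'.encode('utf-8')))       # 1
# ...rarer characters cost more: two bytes for 'é', four for '😂'
print(len('é'.encode('utf-8')))       # 2
print(len('😂'.encode('utf-8')))      # 4
# a fixed four-byte encoding pays the maximum price for everything
print(len('a'.encode('utf-32-le')))   # 4
```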

&lt;h2&gt;
  
  
  Caveat
&lt;/h2&gt;

&lt;p&gt;However, this also means that a file produced with some &lt;em&gt;other&lt;/em&gt; encoding (such as UTF-16) will generally not decode cleanly as UTF-8. This raises a key limitation, especially in the field of data science: sometimes we don't need the non-UTF-8 characters, can't process them, or need to save on space. Therefore, here are three ways I handle non-UTF-8 characters when reading into a Pandas dataframe:&lt;/p&gt;
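&lt;p&gt;Before the fixes, here is a minimal sketch of the failure mode itself (hypothetical bytes, standard library only), showing what happens when non-UTF-8 bytes hit a UTF-8 decoder:&lt;/p&gt;

```python
# 'résumé' encoded as Latin-1 contains the byte 0xE9, which is not valid UTF-8
data = 'résumé'.encode('latin-1')

try:
    data.decode('utf-8')
except UnicodeDecodeError as err:
    print(err)  # the same class of error pandas surfaces on read_csv

# decoding with the encoding the bytes were actually written in works fine
print(data.decode('latin-1'))  # résumé
```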

&lt;h3&gt;
  
  
  Find the correct Encoding Using Python
&lt;/h3&gt;

&lt;p&gt;Pandas, by default, assumes &lt;code&gt;utf-8&lt;/code&gt; encoding every time you do &lt;code&gt;pandas.read_csv&lt;/code&gt;, and it can feel like staring into a crystal ball trying to figure out the correct encoding. Your first bet is to use vanilla Python:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;

&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;file_name.csv&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;em&gt;Most&lt;/em&gt; of the time, the output resembles the following:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;

&amp;lt;_io.TextIOWrapper &lt;span class="nv"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'file_name.csv'&lt;/span&gt; &lt;span class="nv"&gt;mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'r'&lt;/span&gt; &lt;span class="nv"&gt;encoding&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'utf16'&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;


&lt;span class="sb"&gt;```&lt;/span&gt;&lt;span class="nb"&gt;.&lt;/span&gt;
If that fails, we can move onto the second option

&lt;span class="c"&gt;### Find Using Python Chardet&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;chardet]&lt;span class="o"&gt;(&lt;/span&gt;https://github.com/chardet/chardet&lt;span class="o"&gt;)&lt;/span&gt; is a library &lt;span class="k"&gt;for &lt;/span&gt;decoding characters, once installed you can use the following to determine encoding:
&lt;span class="sb"&gt;```&lt;/span&gt;python


import chardet
with open&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'file_name.csv'&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; as f:
    chardet.detect&lt;span class="o"&gt;(&lt;/span&gt;f&lt;span class="o"&gt;)&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The output should resemble the following:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;

&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s1"&gt;'encoding'&lt;/span&gt;: &lt;span class="s1"&gt;'EUC-JP'&lt;/span&gt;, &lt;span class="s1"&gt;'confidence'&lt;/span&gt;: 0.99&lt;span class="o"&gt;}&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  Finally
&lt;/h3&gt;

&lt;p&gt;The last option is using the Linux CLI (fine, I lied when I said three methods &lt;em&gt;using Pandas&lt;/em&gt;)&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;

iconv &lt;span class="nt"&gt;-f&lt;/span&gt; utf-8 &lt;span class="nt"&gt;-t&lt;/span&gt; utf-8 &lt;span class="nt"&gt;-c&lt;/span&gt; filepath &lt;span class="nt"&gt;-o&lt;/span&gt; CLEAN_FILE


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;ol&gt;
&lt;li&gt;The first &lt;code&gt;utf-8&lt;/code&gt; (after &lt;code&gt;-f&lt;/code&gt;) defines what we think the original file's encoding is&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;-t&lt;/code&gt; is the target encoding we wish to convert to (in this case &lt;code&gt;utf-8&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;-c&lt;/code&gt; skips invalid sequences&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;-o&lt;/code&gt; outputs the cleaned file to an actual filepath (instead of the terminal)&lt;/li&gt;
&lt;/ol&gt;
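&lt;p&gt;If you'd rather stay in Python, the standard library's decode error handlers give a rough equivalent of &lt;code&gt;iconv -c&lt;/code&gt; (a sketch with hypothetical bytes, not a drop-in replacement):&lt;/p&gt;

```python
raw = b'caf\xe9 ok'  # hypothetical file contents; 0xE9 is not valid UTF-8

# errors='ignore' drops invalid sequences, much like iconv -c
print(raw.decode('utf-8', errors='ignore'))   # caf ok

# errors='replace' substitutes U+FFFD instead of silently dropping bytes
print(raw.decode('utf-8', errors='replace'))  # caf� ok
```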

&lt;p&gt;Now that you have your encoding, you can go on to read your CSV file successfully by specifying it in your &lt;code&gt;read_csv&lt;/code&gt; command (substituting whichever encoding you actually detected):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;

&lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;some_csv.txt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;encoding&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;not utf-8&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

</description>
      <category>pandas</category>
      <category>datascience</category>
      <category>python</category>
      <category>linux</category>
    </item>
    <item>
      <title>Why I despise IPython Notebooks</title>
      <dc:creator>Darth Espressius</dc:creator>
      <pubDate>Wed, 12 Jan 2022 00:07:36 +0000</pubDate>
      <link>https://dev.to/_aadidev/why-i-despise-ipython-notebooks-4eok</link>
      <guid>https://dev.to/_aadidev/why-i-despise-ipython-notebooks-4eok</guid>
      <description>&lt;h1&gt;
  
  
  Reason 1: Esc twice ain’t it
&lt;/h1&gt;

&lt;p&gt;I’m keyboard-driven, I use a tiling window manager, my mouse is seen as a luxury and I find it a pain to lift my hands from my keyboard and break my stream of thought to shift what feels like a 16 hour flight with 2 connections all the way to my mouse. (And it isn’t that I have a bad mouse, I like my mouse! But I like my keyboard more...). &lt;/p&gt;

&lt;p&gt;For those who don’t know what a tiling window manager is, it’s essentially a way of interacting with your open windows, grouped into workspaces (which are switched by key-combinations). I like the idea of auto-aligning open windows (trust me, I’ve been through Alt-Tab hell and back), and having my applications open by keystrokes and automatically fill exactly a given part of the screen is a Godsend!&lt;/p&gt;

&lt;p&gt;Everything you're seeing happens with TWO keystrokes&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--17lz4fbz--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ffcj6qq7gwbry5il9im6.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--17lz4fbz--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ffcj6qq7gwbry5il9im6.gif" alt="TWM" width="880" height="495"&gt;&lt;/a&gt;&lt;br&gt;
However, my biggest gripe with the entire notebook environment isn’t necessarily related directly to notebooks: it has to do with actual, WORKING vim-shortcut support. &lt;/p&gt;

&lt;p&gt;I love shortcuts, and my entire being is wired to hit &lt;code&gt;j&lt;/code&gt; and &lt;code&gt;k&lt;/code&gt; as soon as I see any sort of code-related text to navigate up and down. My &lt;code&gt;caps-lock&lt;/code&gt; key is remapped to &lt;code&gt;Esc&lt;/code&gt; so that I have more control switching back to ‘Normal’ mode in vim. Usually this wouldn't be an issue, since I use the vim-extension in VSCode and evil-mode in emacs (I use &lt;a href="https://github.com/hlissner/doom-emacs"&gt;doom-emacs&lt;/a&gt;, which uses vim-keybindings by default). &lt;/p&gt;

&lt;p&gt;However, God forbid that you happen to hit the &lt;code&gt;esc&lt;/code&gt; key twice whilst in a Jupyter/IPython cell with the Vim extension loaded (either via the web interface or via VSCode), and you're taken OUT of the entire editor. This means that I now have to reach (what feels like) halfway across the room to my mouse to re-click the cell I was working on. This is just terrible for ergonomics.&lt;/p&gt;

&lt;p&gt;Now, this may not be an IPython-specific issue, it's just that I haven't found an extension which works, since technically speaking all the extensions work as they are supposed to! Hitting &lt;code&gt;esc&lt;/code&gt; takes you to "normal" mode in Vim; however, this now breaks the ability to shift between cells as you normally would in a notebook, because you technically aren't in the actual cell.&lt;/p&gt;
&lt;h1&gt;
  
  
  Reason 2: &lt;code&gt;print(type(df))&lt;/code&gt; is NOT debugging
&lt;/h1&gt;

&lt;p&gt;An IPython/Jupyter notebook is meant to be run sequentially, which I may or may not have an issue with. My MAIN gripe around notebooks, however, comes in the form of debugging. Typically, a break-point is set at a particular line in code, a debugger temporarily halts code execution at that point, and we can set certain variables to be "watched"; i.e. the debugger can keep track of these variables &lt;em&gt;during&lt;/em&gt; execution. For example, in the following code&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;output&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Conv2D&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;some_image&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I'm applying a &lt;a href="https://learnopencv.com/image-filtering-using-convolution-in-opencv/"&gt;convolution&lt;/a&gt; to what is presumably some sort of image. Don't worry about the actual operation, the important part is that I'm taking an image (or a matrix of floating-point values) and applying some operation to it, which may change the actual values, change the &lt;em&gt;type&lt;/em&gt; of values (some may go to &lt;code&gt;NaN&lt;/code&gt;) and possibly change the actual shape of the original image depending on convolutions. Now, in order to actually check what the operation is doing, I could potentially do the following set of abominations:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;# get shape
&lt;/span&gt;&lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="c1"&gt;# make sure I actually get a return Tensor
&lt;/span&gt;&lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;isna&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt; &lt;span class="c1"&gt;# make sure no NaNs popped up
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;and probably the WORST of them all:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# I'm PRINTING an IMAGE as its 
#   RAW values, how in the 
#   actual heck is this helpful..?
&lt;/span&gt;&lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Also to the above, remember that IPython notebooks grow in length as cell outputs are populated. Imagine doing the &lt;code&gt;print(output)&lt;/code&gt; MULTIPLE times in one notebook just to validate pre-processing steps (which is very common in computer vision). This is just highly unproductive! &lt;/p&gt;

&lt;p&gt;Compare this to the below, which requires &lt;strong&gt;none&lt;/strong&gt; of the print statements (which usually have to be removed or re-added every time you need to find the values or properties of an object: tiring, unnecessary and overall unproductive). EVERYTHING I could ever want to know about any variable is visible, which is much more useful and cleaner coding practice:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This is from using &lt;code&gt;watch&lt;/code&gt; in VSCode's debugger on the output variable, the shape, type, etc etc are all visible without additional code&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--jJBiiKJe--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/87wtwlc0hf5y0y18eioa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--jJBiiKJe--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/87wtwlc0hf5y0y18eioa.png" alt="Watch Window" width="676" height="461"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Finally: Becoming familiar with how an Engineer might look at your code to deploy
&lt;/h1&gt;

&lt;p&gt;This may be the least "ranty" reason. Deployable code does NOT contain random print statements throughout, and typically forms a directed acyclic graph between functions! This structure is broken in a Jupyter notebook, since changing a cell above another cell does NOT automatically update the output of the cell below it. Let me show you what I mean. If I do the following:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--NzIVqVbY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/xomiworr6lium5svwavc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--NzIVqVbY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/xomiworr6lium5svwavc.png" alt="Simple math" width="709" height="275"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The output is quite clearly correct. However, if I NOW change the value of &lt;code&gt;a&lt;/code&gt; (or &lt;code&gt;b&lt;/code&gt;) and re-run only the upper cell, clearly the following does not remain true:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--suelmrA4--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/6sapw8aq1rejqjcfmu20.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--suelmrA4--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/6sapw8aq1rejqjcfmu20.png" alt="Definitely not" width="807" height="295"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Hence the entire notebook needs to be re-run (which is not a big deal, there's a convenient drop-down, but then I have to use my mouse again AND it breaks the natural flow of functions throughout my pipeline). But the biggest issue I have here is modularity: I can't easily swap in one notebook for another without redoing ALL the data pre-processing steps in separate cells. You can't import one notebook into another the same way you'd import packages.&lt;/p&gt;

&lt;p&gt;And deployment?! YIKES, that's a data engineer's job right....? WRONG!! How about we actually think about how our code is to be deployed, and follow some sort of paradigm where our resulting code can be modularly swapped in and out, and, most importantly, actually follows some format so it can be easily tested. There are frameworks that assist in setting up your entire pipeline as a directed-acyclic-graph (such as Kedro), but it's still a chore, and it's highly inefficient not to consider that a data-science pipeline, outside of purely academic research, is ultimately headed for deployment in some setting.&lt;/p&gt;

&lt;p&gt;In conclusion, there are benefits to IPython notebooks, ease of demonstration, etc etc. However, this article covers my &lt;strong&gt;opinion&lt;/strong&gt; of Jupyter/IPython notebooks, and why I try my best to steer clear of them for data-science/machine-learning related tasks.&lt;/p&gt;

&lt;p&gt;If you like rants like these, feel free to follow me on &lt;a href="https://twitter.com/_aadiDev"&gt;twitter&lt;/a&gt;&lt;/p&gt;

</description>
      <category>jupyter</category>
      <category>datascience</category>
      <category>programming</category>
      <category>linux</category>
    </item>
    <item>
      <title>Replacing terms using ^ in the Linux Terminal</title>
      <dc:creator>Darth Espressius</dc:creator>
      <pubDate>Wed, 05 Jan 2022 23:05:11 +0000</pubDate>
      <link>https://dev.to/_aadidev/replacing-terms-using-in-the-linux-terminal-405c</link>
      <guid>https://dev.to/_aadidev/replacing-terms-using-in-the-linux-terminal-405c</guid>
      <description>&lt;p&gt;I use the terminal quite a lot in my day-to-day activities. This includes, but is not limited to: copying files, installing packages, running updates, searching for folders, etc. Sometimes, the commands are simple &lt;code&gt;ls -la&lt;/code&gt; or &lt;code&gt;grep&lt;/code&gt; to show files and search for text respectively. &lt;/p&gt;

&lt;p&gt;But sometimes these commands are longer, MUCH longer, well past the point where I should have written them in a script file and run them that way. AND, God forbid I make a mishap typing some unnecessarily convoluted command and hit enter, OR I need to use a similar command again; then I have to go through the trouble of either typing it from scratch or hunting through terminal history to find, modify and re-execute the command. &lt;/p&gt;

&lt;p&gt;Until I discovered the &lt;code&gt;^&lt;/code&gt; operator....&lt;/p&gt;

&lt;p&gt;Let's start with an example: say I wanted to use &lt;code&gt;conda&lt;/code&gt; (a package manager commonly used for data-science development) to create a new environment. In this case I'm using the following code from &lt;a href="https://rapids.ai/start.html#get-rapids" rel="noopener noreferrer"&gt;NVIDIA's RAPIDS install instructions&lt;/a&gt; for ease of demonstration.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;conda create &lt;span class="nt"&gt;-n&lt;/span&gt; rapids-21.12 &lt;span class="nt"&gt;-c&lt;/span&gt; rapidsai &lt;span class="nt"&gt;-c&lt;/span&gt; nvidia &lt;span class="nt"&gt;-c&lt;/span&gt; conda-forge &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nv"&gt;rapids&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;21.12 &lt;span class="nv"&gt;python&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;3.8 &lt;span class="nv"&gt;cudatoolkit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;11.5 dask-sql
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;However, let's say that I realised that I wanted to actually create the environment using a different version of python (again, for demonstration purposes, although there are packages that work with 3.7 but not 3.8).&lt;/p&gt;

&lt;p&gt;In order to replace the &lt;code&gt;3.8&lt;/code&gt; from the above command, here's the code I can use for in-place substitution:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;^3.8^3.7
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The above code will replace the first occurrence of &lt;code&gt;3.8&lt;/code&gt; from my previous command with &lt;code&gt;3.7&lt;/code&gt; and proceed to re-execute the command. &lt;/p&gt;

&lt;p&gt;I realise the above is REALLY application-specific, so let's break this down:&lt;br&gt;
If I type:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;echo &lt;/span&gt;star wars
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;the &lt;em&gt;output&lt;/em&gt; is&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;star wars
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If I type the following after the above&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;^wars^trek
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The output becomes&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;star trek
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let me show that in-terminal so you get an idea of exactly what we're doing&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg50bb2ndaeivih6op8ka.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg50bb2ndaeivih6op8ka.png" alt="Basic Replacement"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;^&lt;/code&gt; operator is a shorthand way of using the &lt;code&gt;gs&lt;/code&gt; command for global substitution. In other terms, &lt;code&gt;^wars^trek&lt;/code&gt; is equivalent to&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="o"&gt;!!&lt;/span&gt;:gs/wars/trek
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now let's say I wanted to replace &lt;strong&gt;all&lt;/strong&gt; instances of a term from the most-recently run command, so if I ran:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;echo &lt;/span&gt;luke luke
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;followed by&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;^luke^leia^:&amp;amp;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;the result is the equivalent of running:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;echo &lt;/span&gt;leia leia
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Again, to see the continuity of how these commands work, we need to see them in-shell:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdot9byqy4l555s8ij59g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdot9byqy4l555s8ij59g.png" alt="Full replace"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And now back to the original example (ignore the &lt;code&gt;CondaError&lt;/code&gt; as this was necessary to cancel the original command)&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj77bpb38tu1a0ftex0te.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj77bpb38tu1a0ftex0te.png" alt="conda command"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>datascience</category>
      <category>tutorial</category>
      <category>linux</category>
    </item>
    <item>
      <title>Anomaly Detection I - Distance-Based Methods</title>
      <dc:creator>Darth Espressius</dc:creator>
      <pubDate>Sun, 21 Nov 2021 14:16:01 +0000</pubDate>
      <link>https://dev.to/_aadidev/anomaly-detection-i-distance-based-methods-278g</link>
      <guid>https://dev.to/_aadidev/anomaly-detection-i-distance-based-methods-278g</guid>
      <description>&lt;p&gt;In my &lt;a href=""&gt;previous post&lt;/a&gt;, I went through the basics of what anomaly detection is, why it is important and current challenges in the field. To give a TLDR&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;An outlier is generally considered a data point which is significantly different from other data points or which does not conform to the expected normal pattern of the phenomenon it represents&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--HYJvTTSA--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/6nqdnjndw6c9g7y4ujv3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--HYJvTTSA--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/6nqdnjndw6c9g7y4ujv3.png" alt="Anomalous in Cartesian Plane" width="629" height="631"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Distance-Based Approaches
&lt;/h1&gt;

&lt;p&gt;A distance-based approach to anomaly detection is one which relies on some measure of distance, or distance-derived metric between points and sets of points. This results in multiple concepts of distance-based anomaly detection:&lt;/p&gt;

&lt;h2&gt;
  
  
  Less Than &lt;em&gt;p&lt;/em&gt; Samples
&lt;/h2&gt;

&lt;p&gt;In this approach, points with fewer than &lt;em&gt;p&lt;/em&gt; neighbouring points are classified as anomalous or outliers. For example, in the image below, the test point at the bottom left may be classified as anomalous based on the neighbour distance (represented as the dashed circle) and the number &lt;em&gt;p&lt;/em&gt;, which is chosen depending on how sensitive the algorithm must be to anomalies. Very high values of &lt;em&gt;p&lt;/em&gt; will result in high numbers of anomalies, since few points will have the necessary number of neighbours within their radius of consideration.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--68to4Z7I--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/247rr0joaq9qxn4d6i4f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--68to4Z7I--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/247rr0joaq9qxn4d6i4f.png" alt="p-nearest" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  kNN Methods
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Distance to All Points
&lt;/h3&gt;

&lt;p&gt;This is the simplest possible method: an algorithm evaluates a single point against every other point, and the sum of the distances may be used as the anomaly score. However, computing the scores of all data points is computationally intensive, since all pairwise distances between points must be calculated. This becomes very expensive when the number of data points is large.&lt;/p&gt;
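
&lt;p&gt;As a sketch (the NumPy array below is purely illustrative), the sum-of-distances score can be read straight off the pairwise distance matrix:&lt;/p&gt;

```python
import numpy as np

def sum_distance_scores(points):
    # Anomaly score for each point: sum of its distances to every other point
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    return dists.sum(axis=1)

points = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0]])
scores = sum_distance_scores(points)
# The isolated point accumulates by far the largest total distance
assert scores.argmax() == 3
```

&lt;p&gt;Note that materialising the full N&amp;times;N distance matrix is exactly the quadratic cost the paragraph above warns about.&lt;/p&gt;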

&lt;h3&gt;
  
  
  Distance to Nearest Neighbour
&lt;/h3&gt;

&lt;p&gt;This is simple but can be misleading at times. A new point is considered anomalous if the distance to its nearest point is greater than some threshold. &lt;/p&gt;
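
&lt;p&gt;A minimal sketch of the nearest-neighbour threshold rule (the points and threshold are purely illustrative):&lt;/p&gt;

```python
import numpy as np

def nearest_neighbour_flags(points, threshold):
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(dists, np.inf)  # ignore each point's zero self-distance
    # Flag points whose single nearest neighbour is further than the threshold
    return dists.min(axis=1) > threshold

points = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
flags = nearest_neighbour_flags(points, threshold=1.0)
# flags -> [False, False, True]
```

&lt;p&gt;The misleading case is easy to see here: two outliers lying close to each other would each have a small nearest-neighbour distance, and so both would escape detection.&lt;/p&gt;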

&lt;h3&gt;
  
  
  Average Distance to kNN
&lt;/h3&gt;

&lt;p&gt;In an unsupervised approach to anomaly detection, it is impossible to know the correct value of &lt;em&gt;k&lt;/em&gt; for any particular algorithm, as this is highly dataset-dependent. A range of values of &lt;em&gt;k&lt;/em&gt; may be tested instead. The average distance to a test point's &lt;em&gt;k&lt;/em&gt; nearest neighbours is less sensitive to the choice of &lt;em&gt;k&lt;/em&gt;, since it effectively averages the exact k-nearest-neighbour scores over a range of values. &lt;/p&gt;

&lt;p&gt;In the image below, depending on whether the distance to the furthest of the &lt;em&gt;k&lt;/em&gt; neighbours OR the &lt;strong&gt;average&lt;/strong&gt; distance to the &lt;em&gt;k&lt;/em&gt; nearest neighbours is used, the encircled point may or may not be considered an anomaly.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--XntdD29n--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/0bwtf3iode5v94jjylg1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--XntdD29n--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/0bwtf3iode5v94jjylg1.png" alt="kNN-possible-anomaly" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Median Distance to kNN
&lt;/h3&gt;

&lt;p&gt;This is very simple to interpret in low dimensions, and is additionally useful for building models that involve non-standard data types, such as text. However, unlike the above, there is no standard way to choose &lt;em&gt;k&lt;/em&gt; (except through cross-validation, etc.); the method is also computationally expensive and has large storage requirements.  &lt;/p&gt;

&lt;h2&gt;
  
  
  Pruning Methods
&lt;/h2&gt;

&lt;p&gt;This can be thought of as an extension to the aforementioned, with the main goal being a reduction in computational complexity. Pruning methods first partition the input space into discrete regions, each with summary statistics such as the minimum bounding rectangle, number of points, etc. During the nearest-neighbour search, a test example is compared to the bounding rectangle within which it lies, to determine first if it is possible at all for a nearby region to contain neighbours. If not, that region is eliminated. This reduces the search complexity of actually finding nearby points to run distance calculations. &lt;/p&gt;
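
&lt;p&gt;A toy sketch of the pruning idea, assuming a simple uniform grid as the partitioning scheme (real implementations use structures such as R-trees or k-d trees): cells whose bounding box cannot intersect the query radius are skipped without computing a single point-to-point distance.&lt;/p&gt;

```python
import numpy as np
from collections import defaultdict

def build_grid(points, cell):
    # Partition the plane into square cells of side `cell`
    grid = defaultdict(list)
    for x, y in points:
        grid[(int(x // cell), int(y // cell))].append((x, y))
    return grid

def neighbours_within(grid, cell, query, radius):
    # Visit only the cells that can intersect the query radius;
    # every other region is pruned outright
    qx, qy = query
    reach = int(np.ceil(radius / cell))
    cx, cy = int(qx // cell), int(qy // cell)
    found = []
    for i in range(cx - reach, cx + reach + 1):
        for j in range(cy - reach, cy + reach + 1):
            for x, y in grid.get((i, j), []):
                if 0 < np.hypot(x - qx, y - qy) <= radius:
                    found.append((x, y))
    return found

pts = [(0.0, 0.0), (0.2, 0.1), (9.0, 9.0)]
grid = build_grid(pts, cell=1.0)
# Only the nearby point is returned; the far-away cell is never scanned
assert neighbours_within(grid, 1.0, (0.0, 0.0), radius=0.5) == [(0.2, 0.1)]
```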

&lt;h1&gt;
  
  
  Metrics
&lt;/h1&gt;

&lt;p&gt;There are multiple methods to determine the "distance" in distance-based methods; the following are a few of the most common:&lt;/p&gt;

&lt;h3&gt;
  
  
  Euclidean
&lt;/h3&gt;

&lt;p&gt;This is possibly the most common, and is the easiest to work with. The Euclidean distance works well for well-clustered data, but is very sensitive to outliers. It may seem counter-intuitive to consider a metric's sensitivity to outliers when the entire point of anomaly detection is in detecting outliers, however an overly-sensitive metric may result in an unbearable false positive rate.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;In all equations, assume that p and q are two points in n dimensions, indexed by i&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;

&lt;/p&gt;
&lt;div class="katex-element"&gt;
  &lt;span class="katex-display"&gt;&lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;∑i=1n(qi−pi)2
\sqrt{\sum_{i=1}^n(q_i-p_i)^2}
&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord sqrt"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span class="svg-align"&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mop op-limits"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;i&lt;/span&gt;&lt;span class="mrel mtight"&gt;=&lt;/span&gt;&lt;span class="mord mtight"&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="mop op-symbol large-op"&gt;∑&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;n&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mopen"&gt;(&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;q&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;i&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span 
class="mbin"&gt;−&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;p&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;i&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mclose"&gt;&lt;span class="mclose"&gt;)&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;2&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="hide-tail"&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/div&gt;


&lt;h3&gt;
  
  
  Weighted Euclidean
&lt;/h3&gt;

&lt;p&gt;If the relative importance of each dimension (which represents a feature in the dataset) is known, then the weighted Euclidean distance can be used. For example, if attempting to determine whether readings from a car's engine are anomalous, the oil temperature may be far more important than the noise its engine makes. In other terms, more weight must be placed on the oil temperature, since less variability is to be tolerated, whilst the sound in decibels can be down-weighted since it may have a wider operating range.&lt;/p&gt;


&lt;div class="katex-element"&gt;
  &lt;span class="katex-display"&gt;&lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;∑i=1nwi(qi−pi)2
\sqrt{\sum_{i=1}^{n}w_i(q_i-p_i)^2}
&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord sqrt"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span class="svg-align"&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mop op-limits"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;i&lt;/span&gt;&lt;span class="mrel mtight"&gt;=&lt;/span&gt;&lt;span class="mord mtight"&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="mop op-symbol large-op"&gt;∑&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;n&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;w&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;i&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span 
class="mopen"&gt;(&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;q&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;i&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mbin"&gt;−&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;p&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;i&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mclose"&gt;&lt;span class="mclose"&gt;)&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;2&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="hide-tail"&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span 
class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/div&gt;


&lt;h3&gt;
  
  
  Minkowski
&lt;/h3&gt;

&lt;p&gt;This is a generalization of the Euclidean distance, and similarly performs well for isolated, well-clustered data. However, large-scale attributes will dominate smaller-scale attributes (owing to the index term), hence careful feature scaling is advised.&lt;br&gt;

&lt;/p&gt;
&lt;div class="katex-element"&gt;
  &lt;span class="katex-display"&gt;&lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;(qi−pi)nn
 \sqrt[n]{(q_i-p_i)^n}
&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord sqrt"&gt;&lt;span class="root"&gt;&lt;span class="vlist-t"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size1 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;n&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span class="svg-align"&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mopen"&gt;(&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;q&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;i&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mbin"&gt;−&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;p&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;i&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span 
class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mclose"&gt;&lt;span class="mclose"&gt;)&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;n&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="hide-tail"&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/div&gt;


&lt;h3&gt;
  
  
  Manhattan
&lt;/h3&gt;

&lt;p&gt;This is less sensitive to outliers than the Euclidean distance, since differences are not squared. The Manhattan distance results in a radius surrounding points which is hyper-rectangular (rectangular in high dimensions). The caveat in using the Manhattan distance is that anomalies may be a function of both orientation and distance, which is not typically desired.&lt;br&gt;

&lt;/p&gt;
&lt;div class="katex-element"&gt;
  &lt;span class="katex-display"&gt;&lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;∑i=1n∣pi−qi∣
\sum_{i=1}^n|p_i-q_i|
&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mop op-limits"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;i&lt;/span&gt;&lt;span class="mrel mtight"&gt;=&lt;/span&gt;&lt;span class="mord mtight"&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="mop op-symbol large-op"&gt;∑&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;n&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mord"&gt;∣&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;p&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;i&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mbin"&gt;−&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span 
class="mord mathnormal"&gt;q&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;i&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mord"&gt;∣&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/div&gt;


&lt;h3&gt;
  
  
  Mahalanobis
&lt;/h3&gt;

&lt;p&gt;This is most applicable when the scales of the different axes are wildly non-comparable. It was originally derived to define regions that are hyper-ellipsoidal, and can alleviate distortion caused by linear correlation amongst features (via what is known as a &lt;em&gt;whitening&lt;/em&gt; transformation). However, it is &lt;strong&gt;incredibly&lt;/strong&gt; computationally expensive, and should be used only when absolutely necessary.&lt;br&gt;

&lt;/p&gt;
&lt;div class="katex-element"&gt;
  &lt;span class="katex-display"&gt;&lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;(p−q)S−1(p−q)T
\sqrt{(p-q)S^{-1}(p-q)^T}
&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord sqrt"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span class="svg-align"&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mopen"&gt;(&lt;/span&gt;&lt;span class="mord mathnormal"&gt;p&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mbin"&gt;−&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;q&lt;/span&gt;&lt;span class="mclose"&gt;)&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;S&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mtight"&gt;−&lt;/span&gt;&lt;span class="mord mtight"&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mopen"&gt;(&lt;/span&gt;&lt;span class="mord mathnormal"&gt;p&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mbin"&gt;−&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;q&lt;/span&gt;&lt;span class="mclose"&gt;&lt;span class="mclose"&gt;)&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;T&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="hide-tail"&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span 
class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/div&gt;
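
&lt;p&gt;All five metrics can be sketched in a few lines of NumPy (the points, weights and covariance data below are purely illustrative):&lt;/p&gt;

```python
import numpy as np

p = np.array([1.0, 2.0, 3.0])
q = np.array([2.0, 4.0, 6.0])

# Euclidean: sqrt of the sum of squared per-dimension differences
euclidean = np.sqrt(((q - p) ** 2).sum())

# Weighted Euclidean: per-feature weights w_i encode relative importance
w = np.array([1.0, 0.5, 0.25])
weighted = np.sqrt((w * (q - p) ** 2).sum())

# Minkowski of order n (n = 2 recovers the Euclidean distance)
n = 3
minkowski = (np.abs(q - p) ** n).sum() ** (1.0 / n)

# Manhattan: sum of absolute per-dimension differences
manhattan = np.abs(p - q).sum()

# Mahalanobis: needs an inverse covariance matrix S^-1 estimated from data
data = np.random.default_rng(0).normal(size=(100, 3))
S_inv = np.linalg.inv(np.cov(data, rowvar=False))
mahalanobis = np.sqrt((p - q) @ S_inv @ (p - q))
```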


&lt;h2&gt;
  
  
  Advantages of Distance-Based Methods
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;These methods scale well for large datasets with medium-to-high dimensionality&lt;/li&gt;
&lt;li&gt;These are also more computationally efficient than corresponding density-based statistical techniques&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Disadvantages of Distance-Based Methods
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Extremely high dimensionality drastically reduces performance due to elevated computational complexity
&lt;/li&gt;
&lt;li&gt;The search algorithms for nearest-neighbour methods can be inefficient unless a specialised indexing structure is used (such as a k-D Tree), at the cost of increased storage. &lt;/li&gt;
&lt;li&gt;Distance-based methods cannot usually deal with data streams and may not detect local outliers (such as between clusters of data points), since only global data is considered&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>machinelearning</category>
      <category>datascience</category>
      <category>beginners</category>
    </item>
    <item>
      <title>An Introduction to Anomaly Detection</title>
      <dc:creator>Darth Espressius</dc:creator>
      <pubDate>Tue, 26 Oct 2021 23:36:57 +0000</pubDate>
      <link>https://dev.to/_aadidev/an-introduction-to-anomaly-detection-4j8i</link>
      <guid>https://dev.to/_aadidev/an-introduction-to-anomaly-detection-4j8i</guid>
      <description>&lt;h1&gt;
  
  
  Outliers
&lt;/h1&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;An outlier is an observation which deviates so much from the other observations as to arouse suspicions that it was generated by a different mechanism&lt;/em&gt; ~ Hawkins 1980&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;An anomaly, also known as an &lt;em&gt;outlier&lt;/em&gt;, is a rare-event data point or pattern which does not conform to the notion of normal behaviour. An object in a data set is usually called an outlier if:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It deviates from the known/normal behaviour of the data&lt;/li&gt;
&lt;li&gt;The point is far away from the expected/average value of the data, or&lt;/li&gt;
&lt;li&gt;It is not connected/similar to any other object in terms of its characteristics&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Anomaly Detection&lt;/strong&gt; is the process of flagging unusual cases in data, and spans multiple industries across &lt;a href="https://paperswithcode.com/task/anomaly-detection"&gt;multiple types of data&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This may seem like a trivial task; however, humans have evolved to perform pattern-recognition at levels which far surpass even the most complex machine-learning model which exists today. We can differentiate between the expected variance in data and outliers after having only seen a small number of normal instances (an infant is able to differentiate its biological parents from relatives before the age of 1 after exposure to only two humans; that's one &lt;em&gt;heck&lt;/em&gt; of a cold-start performance metric).&lt;/p&gt;

&lt;p&gt;The property which defines an outlier may be attributed to various properties of the data, and each property may lead to a specific characterization of outliers.&lt;/p&gt;

&lt;h2&gt;
  
  
  Classifying Outliers
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Size-Based
&lt;/h3&gt;

&lt;p&gt;An outlier can quantitatively correspond to the size of a data neighbourhood. For example, according to network theory, the degree distribution of social networks typically follows a power law. In other terms, the number of nodes with degree 

&lt;span class="katex-element"&gt;
  &lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;kk&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;k&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/span&gt;
 (number of connections) is proportional to 
&lt;span class="katex-element"&gt;
  &lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;k−αk^{-\alpha}&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;k&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mtight"&gt;−&lt;/span&gt;&lt;span class="mord mathnormal mtight"&gt;α&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/span&gt;
. A community made up of a collection of 
&lt;span class="katex-element"&gt;
  &lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;nn&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;n&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/span&gt;
 nodes (persons) all with the same degree, for a sufficiently large number of persons, can be thought of as an outlier, or anomalous community, as it does not follow the expected size-related pattern.&lt;/p&gt;
&lt;h3&gt;
  
  
  Diversity Based
&lt;/h3&gt;

&lt;p&gt;Outliers may be classified based on how different they are, generally speaking, from other data points. For example, if a search engine optimizes its results according to how fast a website loads (yes, it's a dumb metric, but bear with me), then a page with significantly faster-loading behaviour compared to others can get a better ranking. Here, incredibly fast-loading websites, such as static pages, can adversely skew the rankings the search engine produces. &lt;/p&gt;
&lt;h1&gt;
  
  
  Applications of Anomaly Detection
&lt;/h1&gt;
&lt;h3&gt;
  
  
  Network Intrusion Detection
&lt;/h3&gt;

&lt;p&gt;This is mainly applicable to time-series and graph-based data. Network security is of paramount importance, and the rise of cyber-attacks (DDoS and the like) has continued to cement the need for robust detection of, and response to, these types of attack. This is particularly challenging, since an anomaly detection system must be able to differentiate between an actual anomalous event and some other non-anomalous, high-traffic event such as a new product release (such as when Google's online store &lt;a href="https://9to5google.com/2021/10/19/pixel-6-google-store-down/"&gt;went down&lt;/a&gt; for the launch of its Pixel 6). &lt;/p&gt;
&lt;h3&gt;
  
  
  Medical Diagnosis
&lt;/h3&gt;

&lt;p&gt;ECGs, MRIs and simpler readouts such as glucose and oximetry readouts directly or indirectly indicate individuals' health status. This can be a potentially life-or-death application of anomaly detection, and is further complicated by the low-latency need of such a system. &lt;/p&gt;
&lt;h3&gt;
  
  
  Industrial Visual Defect Classification
&lt;/h3&gt;

&lt;p&gt;This application uses anomaly detection for (in my opinion) its second most immediately-tangible application yet. Measurements from various sensors and cameras are used as input into an anomaly-detection system as a form of quality-assurance. This is another challenging area, as defects can vary from subtle changes such as thin scratches to larger structural defects like missing components. &lt;/p&gt;
&lt;h3&gt;
  
  
  The Sciences
&lt;/h3&gt;

&lt;p&gt;A black hole is an anomaly from our perspective. We only &lt;a href="https://www.nasa.gov/mission_pages/chandra/news/black-hole-image-makes-history"&gt;recently&lt;/a&gt; managed to get a decent image of what was previously strictly theorized. Even Einstein himself did not believe in its existence. Anomaly detection systems can be used to detect previously-unseen physical phenomena, such as in parsing the input from out-of-visible-light telescopes or to detect genetic mutations in DNA.&lt;/p&gt;
&lt;h2&gt;
  
  
  Feature Selection
&lt;/h2&gt;

&lt;p&gt;This is possibly the most difficult part of building an anomaly detection system, owing to its unsupervised nature. A common way of measuring the non-uniformity of a set of univariate points is the &lt;em&gt;Kurtosis measure&lt;/em&gt;. The data is standardized to zero mean and unit variance. The resultant data points are raised to the fourth power, followed by summation and normalization:&lt;/p&gt;


&lt;div class="katex-element"&gt;
  &lt;span class="katex-display"&gt;&lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;K(z)=1N∑i=1Nzi4
K(\textbf{z}) = \frac{1}{N}\sum_{i=1}^{N}z_i^4
&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;K&lt;/span&gt;&lt;span class="mopen"&gt;(&lt;/span&gt;&lt;span class="mord text"&gt;&lt;span class="mord textbf"&gt;z&lt;/span&gt;&lt;/span&gt;&lt;span class="mclose"&gt;)&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mrel"&gt;=&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mopen nulldelimiter"&gt;&lt;/span&gt;&lt;span class="mfrac"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;N&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="frac-line"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord"&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mclose nulldelimiter"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mop op-limits"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;i&lt;/span&gt;&lt;span class="mrel mtight"&gt;=&lt;/span&gt;&lt;span class="mord mtight"&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="mop op-symbol 
large-op"&gt;∑&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;N&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;z&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;i&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;4&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/div&gt;


&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--8W8OebVL--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/9mfsqb7tjq1iebielfoh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--8W8OebVL--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/9mfsqb7tjq1iebielfoh.png" alt="Types of Kurtosis" width="677" height="681"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Types of Kurtosis&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Feature distributions that are highly non-uniform exhibit a high level of &lt;em&gt;Kurtosis&lt;/em&gt;; for example, when the data contains a few extreme values, the &lt;em&gt;Kurtosis&lt;/em&gt; measure tends to increase owing to the use of the fourth power. Features may then be selected based on their level of &lt;em&gt;Kurtosis&lt;/em&gt;, as a learning algorithm may better differentiate actual hazards from non-anomalous points using such features.&lt;/p&gt;
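&lt;p&gt;As a rough sketch, the Kurtosis measure above can be computed with nothing but the standard library (the two feature vectors below are made-up illustrations, not real data):&lt;/p&gt;

```python
import statistics

def kurtosis(data):
    # Standardize to zero mean and unit variance,
    # then average the fourth powers of the standardized values
    mu = statistics.fmean(data)
    sigma = statistics.pstdev(data)
    z = [(x - mu) / sigma for x in data]
    return sum(zi ** 4 for zi in z) / len(z)

# A feature with a few extreme values scores much higher than an evenly-spread one
flat = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
spiky = [5, 5, 5, 5, 5, 5, 5, 5, 5, 50]
print(kurtosis(flat))   # roughly 1.78
print(kurtosis(spiky))  # roughly 8.11 -- the single extreme value dominates
```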
&lt;h2&gt;
  
  
  Approaches to Anomaly Detection
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Extreme-Value Analysis
&lt;/h3&gt;

&lt;p&gt;This is the most widely used knowledge-based method. A decision-tree-type structure is used to classify data as anomalous or not. This differs from the classification approach in that these rules are hand-defined and hardcoded into the algorithm. &lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Xy0uzziV--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/nzkx0tnk9vgz8za00wzg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Xy0uzziV--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/nzkx0tnk9vgz8za00wzg.png" alt="Basic Anomaly Detection" width="880" height="463"&gt;&lt;/a&gt;&lt;br&gt;
An even simpler approach would assign specific thresholds to data values and simply report anomalies when the data crosses this threshold. This system is the least flexible and is not able to learn with new data, or adapt to different data distributions. It is however, very simple to set up and highly interpretable, as the overall structure is defined beforehand. &lt;/p&gt;
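&lt;p&gt;A minimal sketch of the threshold idea, where the band limits are arbitrary, hand-picked values standing in for domain knowledge:&lt;/p&gt;

```python
def flag_out_of_band(readings, lower=10.0, upper=90.0):
    """Report the indices of readings outside a hand-set operating band."""
    return [i for i, r in enumerate(readings) if not (lower <= r <= upper)]

readings = [42.0, 95.5, 57.3, 8.1, 60.2]
print(flag_out_of_band(readings))  # [1, 3]
```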
&lt;h3&gt;
  
  
  Statistical Techniques
&lt;/h3&gt;

&lt;p&gt;This approach assumes that data follows a specific distribution. The most basic form computes the parameters of a probability density function for each known class of network traffic, and tests unknown samples to determine to which class each belongs.&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--VFzsh6u6--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/csl0e5cho3vz1z6abamx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--VFzsh6u6--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/csl0e5cho3vz1z6abamx.png" alt="Statistical Anomaly" width="595" height="551"&gt;&lt;/a&gt;&lt;br&gt;
In most network-based time-series applications, the Gaussian distribution is used to model each class of data; however, other approaches such as Association Rule mining (counting the co-occurrences of items in transactions) have been used for one-class anomaly detection by generating rules from the data in an unsupervised fashion.&lt;/p&gt;
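&lt;p&gt;In its simplest form, the Gaussian approach reduces to a z-score test against parameters fitted on 'normal' samples. A sketch, where the traffic numbers and the 3-sigma cutoff are assumptions for the example:&lt;/p&gt;

```python
import statistics

def gaussian_score(train, x):
    """Fit a Gaussian to 'normal' samples and score a new point.

    Returns the z-score: how many standard deviations x lies from the mean."""
    mu = statistics.fmean(train)
    sigma = statistics.pstdev(train)
    return abs(x - mu) / sigma

normal_traffic = [100, 102, 98, 101, 99, 100, 103, 97]
print(gaussian_score(normal_traffic, 101) < 3.0)  # well within 3 sigma
print(gaussian_score(normal_traffic, 150) > 3.0)  # far outside: flagged
```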
&lt;h3&gt;
  
  
  As a Supervised Classification Task
&lt;/h3&gt;

&lt;p&gt;An algorithm learns a function mapping input features to outputs based on example input-output pairs. The goal is to reframe anomaly detection as a binary classification task (either an anomaly or not). However, owing to the incredibly skewed data distribution (remember, by definition an anomaly is rare), each anomaly is potentially highly underrepresented. Additionally, there may be many types of anomalies (an aircraft engine can under-perform by either spinning too slowly or by &lt;a href="https://en.wikipedia.org/wiki/Qantas_Flight_32"&gt;catastrophically exploding&lt;/a&gt;, both of which can potentially lead to hazardous situations for vastly different reasons). This further leads to sparsity and intense skews in the data used to train these models.&lt;/p&gt;
&lt;h3&gt;
  
  
  Unsupervised Proximity-Based Learning
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ERYw4gze--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/xknmjllccbs0qrby13io.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ERYw4gze--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/xknmjllccbs0qrby13io.png" alt="Clustering" width="675" height="682"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;In the above, the group, or cluster, of points represents normal operation, whilst the three to the bottom-left can be considered anomalous&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Since labeled anomalous data is rare, unsupervised approaches tend to be more popular than supervised ones in anomaly detection. Here, no input-output pairs are readily available for training; instead, the algorithm learns what is 'normal' over time, and reports anything which deviates to some degree from the usual data distribution. The caveat is that many anomalies may correspond to noise, or may not be of interest to the task at hand. The actual approach used for this sort of unsupervised learning may vary wildly. For example, in detecting anomalous events from video, &lt;a href="https://arxiv.org/abs/1511.05440"&gt;an algorithm&lt;/a&gt; may predict the next video frame and compare the actual frame to the predicted one. The deviation between predicted and actual may then be thresholded in order to classify frames as anomalous or not.&lt;/p&gt;
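&lt;p&gt;One simple proximity-based scheme scores each point by its average distance to its k nearest neighbours: the point whose neighbourhood is emptiest is the most anomalous. A toy sketch with made-up 2-D points:&lt;/p&gt;

```python
def knn_outlier_scores(points, k=2):
    """Score each point by its mean Euclidean distance to its k nearest neighbours."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(
            sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5
            for j, q in enumerate(points) if j != i
        )
        scores.append(sum(dists[:k]) / k)
    return scores

cluster = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.1), (1.0, 0.9)]
outlier = [(8.0, 8.0)]
scores = knn_outlier_scores(cluster + outlier)
print(scores.index(max(scores)))  # 4 -- the far-away point gets the top score
```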
&lt;h3&gt;
  
  
  Information Theoretic Models
&lt;/h3&gt;

&lt;p&gt;The idea behind this approach is that outliers increase the &lt;em&gt;minimum description length&lt;/em&gt; (MDL) required to describe the data set, because they represent deviations from natural attempts to summarize the data. The following example, taken from &lt;a href="https://www.amazon.com/Outlier-Analysis-Charu-C-Aggarwal/dp/3319475770/ref=sr_1_1?crid=39KDCHD95BEL&amp;amp;dchild=1&amp;amp;keywords=outlier+analysis&amp;amp;qid=1634982671&amp;amp;sprefix=outlier+analysi%2Caps%2C138&amp;amp;sr=8-1"&gt;Outlier Analysis&lt;/a&gt;, describes this idea. &lt;/p&gt;

&lt;p&gt;Consider the following two strings:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ABABABABABABABABABABABABABABABABAB
ABABACABABABABABABABABABABABABABAB
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;The second string is the same length as the first, differing only at a single position containing the unique symbol 'C'. The first string can be concisely described as "AB 17 times", whilst the second string cannot be described in the same manner (an additional description is needed to account for 'C'). These models are closely related to conventional models, in that they learn a concise representation of the data as a baseline for comparison.&lt;/p&gt;
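&lt;p&gt;This intuition can be sketched with a toy summarizer that only knows how to say "unit x count" for pure repetitions and otherwise falls back to spelling the string out; it is a crude stand-in for a real MDL coder, but it shows how one outlier symbol inflates the description length:&lt;/p&gt;

```python
def describe(s):
    """Toy summarizer: 'AB x 17' if s is a pure repetition, else the raw string."""
    for size in range(1, len(s) // 2 + 1):
        unit = s[:size]
        if len(s) % size == 0 and unit * (len(s) // size) == s:
            return f"{unit} x {len(s) // size}"
    return s

first = "AB" * 17
second = "ABABAC" + "AB" * 14  # the single 'C' breaks the pattern

print(describe(first))        # AB x 17
print(len(describe(first)))   # 7
print(len(describe(second)))  # 34 -- the outlier forces a verbatim description
```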
&lt;h3&gt;
  
  
  Semi-Supervised Learning
&lt;/h3&gt;

&lt;p&gt;This is a hybrid approach using both labelled and unlabelled data. It is well suited to applications like network intrusion detection, where one may have multiple examples of the normal class and some examples of intrusion classes, but new kinds of intrusions may arise over time. &lt;br&gt;
Another idea is based on initializing a neural network with pre-trained weights and then improving it by adaptation to training data. This is a relatively new concept, and has not yet seen much advanced research. This approach might include training an auto-encoder on normal data, and then using the encoder on previously unseen data. The difference between the encoded features and the usual data distribution can then be used to indicate the presence of an anomaly.&lt;/p&gt;
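&lt;p&gt;The reconstruction-error idea can be caricatured without any neural network at all: below, a per-feature mean learned from normal samples stands in for the auto-encoder, and the distance between a new sample and that learned profile stands in for the reconstruction error. All numbers are made up for illustration:&lt;/p&gt;

```python
def fit_profile(normal_samples):
    """Learn a per-feature mean from normal data (a crude stand-in for an
    auto-encoder's learned representation of 'normal')."""
    n = len(normal_samples)
    return [sum(s[i] for s in normal_samples) / n
            for i in range(len(normal_samples[0]))]

def reconstruction_error(profile, sample):
    # Euclidean distance between the sample and the learned profile
    return sum((a - b) ** 2 for a, b in zip(profile, sample)) ** 0.5

normal = [[1.0, 2.0, 3.0], [1.1, 1.9, 3.1], [0.9, 2.1, 2.9]]
profile = fit_profile(normal)
print(reconstruction_error(profile, [1.0, 2.0, 3.0]) < 1.0)  # close to normal
print(reconstruction_error(profile, [9.0, 9.0, 9.0]) > 1.0)  # flagged as anomalous
```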
&lt;h1&gt;
  
  
  Evaluation Criteria
&lt;/h1&gt;

&lt;p&gt;By definition, anomaly detection expects that the distribution between normal and abnormal data classes may be highly skewed; this is known as the &lt;em&gt;class imbalance&lt;/em&gt; problem. Models which learn from this type of data may not be robust, as they tend to perform poorly when attempting to classify anomalous examples. For example, imagine trying to classify 1000 time-series snippets representing web traffic for your website, 950 of which are typical, everyday usage patterns. &lt;em&gt;Any&lt;/em&gt; algorithm which classifies all of your data samples as normal (non-anomalous) immediately achieves 
&lt;span class="katex-element"&gt;
  &lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;95%95\%&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;95%&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/span&gt;
 accuracy! In other terms, a simple rule, 'return negative', appears to perform remarkably well. The issue here is that the anomalous class is under-represented, and the accuracy metric does not account for this.&lt;/p&gt;


&lt;div class="katex-element"&gt;
  &lt;span class="katex-display"&gt;&lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;Accuracy=Number of Correct Predictions MadeTotal Number of Predictions
Accuracy = \frac{\text{Number of Correct Predictions Made}}{\text{Total Number of Predictions}}
&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;A&lt;/span&gt;&lt;span class="mord mathnormal"&gt;cc&lt;/span&gt;&lt;span class="mord mathnormal"&gt;u&lt;/span&gt;&lt;span class="mord mathnormal"&gt;r&lt;/span&gt;&lt;span class="mord mathnormal"&gt;a&lt;/span&gt;&lt;span class="mord mathnormal"&gt;cy&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mrel"&gt;=&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mopen nulldelimiter"&gt;&lt;/span&gt;&lt;span class="mfrac"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord text"&gt;&lt;span class="mord"&gt;Total Number of Predictions&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="frac-line"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord text"&gt;&lt;span class="mord"&gt;Number of Correct Predictions Made&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mclose nulldelimiter"&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/div&gt;



&lt;p&gt;Based on the prior argument, we must conclude that Accuracy alone may not be suitable for adequately evaluating an anomaly-detection system. This is where we turn to more informative measures.&lt;/p&gt;
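&lt;p&gt;The 95%-accuracy trap above is easy to reproduce directly:&lt;/p&gt;

```python
def accuracy(y_true, y_pred):
    # Fraction of predictions that match the ground truth
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# 950 normal (0) snippets and 50 anomalous (1), as in the traffic example
y_true = [0] * 950 + [1] * 50
y_pred = [0] * 1000  # the trivial 'return negative' rule
print(accuracy(y_true, y_pred))  # 0.95
```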

&lt;h2&gt;
  
  
  Precision and Recall
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Precision&lt;/strong&gt; gives an idea of the ratio between correct positive predictions (&lt;em&gt;true positives&lt;/em&gt;) and the sum of all positive predictions (&lt;em&gt;true positives plus false positives&lt;/em&gt;). An algorithm which optimizes strictly for precision is less concerned about false negatives, and instead optimizes for extreme confidence in its positive predictions. &lt;/p&gt;


&lt;div class="katex-element"&gt;
  &lt;span class="katex-display"&gt;&lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;Precision=TPTP+FP
Precision = \frac{TP}{TP+FP}
&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;P&lt;/span&gt;&lt;span class="mord mathnormal"&gt;rec&lt;/span&gt;&lt;span class="mord mathnormal"&gt;i&lt;/span&gt;&lt;span class="mord mathnormal"&gt;s&lt;/span&gt;&lt;span class="mord mathnormal"&gt;i&lt;/span&gt;&lt;span class="mord mathnormal"&gt;o&lt;/span&gt;&lt;span class="mord mathnormal"&gt;n&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mrel"&gt;=&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mopen nulldelimiter"&gt;&lt;/span&gt;&lt;span class="mfrac"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;TP&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mbin"&gt;+&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;FP&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="frac-line"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;TP&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mclose nulldelimiter"&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/div&gt;


&lt;p&gt;&lt;strong&gt;Recall&lt;/strong&gt; gives the ratio of correct positive predictions (&lt;em&gt;true positives&lt;/em&gt;) to the sum of all truly positive data points (&lt;em&gt;true positives plus false negatives&lt;/em&gt;). This metric, also known as sensitivity, gives a more balanced idea of how well an algorithm detects positive samples. For example, the recall of our 'return negative' model will be zero, as the number of &lt;em&gt;true positives&lt;/em&gt; (correct positive predictions) will also be zero. Optimizing for recall may be more appropriate when the cost of a false negative is very high, for example in an airport security system, where it is better to flag many items for human inspection than to accidentally allow dangerous items onto a flight.&lt;br&gt;

&lt;/p&gt;
&lt;div class="katex-element"&gt;
  &lt;span class="katex-display"&gt;&lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;Recall=TPTP+FN
Recall = \frac{TP}{TP+FN}
&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;R&lt;/span&gt;&lt;span class="mord mathnormal"&gt;ec&lt;/span&gt;&lt;span class="mord mathnormal"&gt;a&lt;/span&gt;&lt;span class="mord mathnormal"&gt;ll&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mrel"&gt;=&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mopen nulldelimiter"&gt;&lt;/span&gt;&lt;span class="mfrac"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;TP&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mbin"&gt;+&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;FN&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="frac-line"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;TP&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mclose nulldelimiter"&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/div&gt;
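&lt;p&gt;Putting both formulas into code makes the contrast with accuracy concrete; the trivial 'return negative' rule from the traffic example gets zero on both counts:&lt;/p&gt;

```python
def precision_recall(y_true, y_pred):
    # Count the confusion-matrix cells relevant to the two formulas
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# 950 normal (0) snippets, 50 anomalous (1), and a model that predicts all-normal
y_true = [0] * 950 + [1] * 50
y_pred = [0] * 1000
print(precision_recall(y_true, y_pred))  # (0.0, 0.0) -- recall exposes the failure
```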


&lt;p&gt;Hopefully you now have a general understanding of what anomaly detection is, why it's useful, a few challenges in the field, and a few ways of framing anomaly-detection problems. In my next few posts, I'll be delving into anomaly detection models in detail, and walking through some Python code implementing a few of them.&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>ai</category>
      <category>modeling</category>
      <category>beginners</category>
    </item>
    <item>
      <title>Five thousand processing cores?</title>
      <dc:creator>Darth Espressius</dc:creator>
      <pubDate>Mon, 11 Oct 2021 00:46:46 +0000</pubDate>
      <link>https://dev.to/_aadidev/five-thousand-processing-cores-2035</link>
      <guid>https://dev.to/_aadidev/five-thousand-processing-cores-2035</guid>
      <description>&lt;p&gt;Even if you're not in the 'tech' industry, in today's computing age you may have heard the term 'CPU' tossed around. A CPU or &lt;em&gt;Central Processing Unit&lt;/em&gt; is a general term for the 'brain' of today's computers. (I use the term computer here very loosely to refer to any sort of desktop, laptop, server, etc without attempting to fully encapsulate the infinite array of microprocessing units in our fridges, watches, and elevator controls). &lt;/p&gt;

&lt;h1&gt;
  
  
  How a CPU Works
&lt;/h1&gt;

&lt;p&gt;A CPU may divide a series of tasks by time, where any given slot (or series of slots) may be dedicated to a given task or series of tasks. These tasks are assigned to a single computational unit (also known as a core) at any given point in time, and the single core is freed to move on to its next task once the previously-running task has completed. &lt;/p&gt;

&lt;p&gt;The issue with this model is readily apparent: what if I want two tasks to happen at the same time? &lt;/p&gt;

&lt;h2&gt;
  
  
  RTOS
&lt;/h2&gt;

&lt;p&gt;Since the 1980s, the &lt;em&gt;real-time operating system&lt;/em&gt; or &lt;strong&gt;RTOS&lt;/strong&gt; was the only way by which a single CPU could achieve, or at least appear to achieve, some sort of concurrent operation. However, this "appear to achieve" is a bit of a gotcha, since the &lt;strong&gt;real-time&lt;/strong&gt; in RTOS translates to &lt;em&gt;finishing within a predetermined time-interval&lt;/em&gt;. This is achieved by some sort of scheduling algorithm, and a series of programming constructs for holding resources (mutexes), signalling (semaphores) and a host of other methods by which some sort of deterministic behaviour is effected.&lt;/p&gt;

&lt;p&gt;This is &lt;strong&gt;not&lt;/strong&gt; purely concurrent: there is no way to actually carry out simultaneous operations, say, on a large chunk of data.&lt;/p&gt;

&lt;h1&gt;
  
  
  The Multi-Core Processor
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--aP-mPoPy--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/59uo7wmwegsx7bm6dsz4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--aP-mPoPy--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/59uo7wmwegsx7bm6dsz4.png" alt="IBM 100 Power 4"&gt;&lt;/a&gt;&lt;em&gt;IBM 100 Power 4&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://www.ibm.com/ibm/history/ibm100/us/en/icons/power4/#:~:text=In%202001%2C%20IBM%20introduced%20the,more%20than%20170%20million%20transistors"&gt;first&lt;/a&gt; multi-core processor was the POWER4; however, the first commercially available multi-core processors in familiar socket-mounted packages were the Intel Pentium D for home consumer usage and the AMD Opteron for server usage. This took the single-core idea and solved the "I want to do two things at once" problem in the most brute-force way possible: if you want to do two (or more) things at once, then you need two (or more) cores.&lt;/p&gt;

&lt;p&gt;This wasn't (and still isn't) an absurdly irrational concept, as the proliferation of &lt;a href="https://www.supermicro.com/products/motherboard/Xeon7000/7300/X7QC3.cfm"&gt;multi-socket motherboards&lt;/a&gt; prior to the multi-core era demonstrated the desire for concurrency in the enterprise space.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--6yHjVS4Q--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/26kw1euitnb822qczgpn.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--6yHjVS4Q--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/26kw1euitnb822qczgpn.jpeg" alt="Multi-Socket Motherboard"&gt;&lt;/a&gt;&lt;em&gt;Multi-Socket Motherboard&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The main limitation before the introduction of the first multi-core CPUs was (and still is) power delivery. Having to power two cores in a single package introduces complications and adds to the heat that must be dissipated. Moreover, ensuring that the overhead introduced by core synchronization and memory sharing does not outweigh the benefits of multi-core has seen many creative solutions over the years, the most recent of which is AMD's &lt;a href="https://en.wikichip.org/wiki/amd/infinity_fabric"&gt;Infinity Fabric&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;When the first multi-core processors came out, the issue was fitting enough transistors on a chip to build more than one core. The issue regarding transistor size has more or less disappeared, as we approach the opposite problem in transistor design: as transistors shrink below the 1-2nm mark, new quantum effects such as tunneling introduce an entirely new class of nondeterminism into chips' operation. &lt;/p&gt;

&lt;p&gt;Okay, so now you understand where CPUs came from, and how being able to do more than two things at once was physically achieved, but where do GPUs come in?&lt;/p&gt;

&lt;h1&gt;
  
  
  The CUDA Core!!
&lt;/h1&gt;

&lt;h4&gt;
  
  
  or streaming processor?
&lt;/h4&gt;

&lt;p&gt;Okay, I love NVIDIA and AMD, but their definition of a 'core' is a bit, err...ambitious? On the CPU side of things, a 'core' should be able to fetch instructions, load the data required by an instruction into memory, perform the operation the instruction indicates, and return the complete, processed data at the end of the operation. &lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--unx6N_oA--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/dgtslgq5a1a0h41veyzc.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--unx6N_oA--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/dgtslgq5a1a0h41veyzc.jpg" alt="Ampere Architecture"&gt;&lt;/a&gt;&lt;em&gt;Layout of the latest 3000-series GPUs&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;A &lt;strong&gt;CUDA core&lt;/strong&gt; or Stream Processor (depending on which colour flag you're currently waving; mine is currently green, so we'll stick with "CUDA Core" for now) is simply a floating-point unit. It receives data, performs some operation on it, and returns the result. It does &lt;strong&gt;not&lt;/strong&gt; independently handle fetching instructions or loading data into memory. &lt;/p&gt;

&lt;p&gt;With terminology out of the way: modern-day GPUs have &lt;em&gt;thousands&lt;/em&gt; of CUDA cores; the GA104 in my NVIDIA RTX 3060 Ti has nearly &lt;strong&gt;five thousand&lt;/strong&gt;. Heck, the measly mobile GTX 1060M in my laptop has over a thousand, and that launched &lt;em&gt;five years&lt;/em&gt; ago. A GPU is essentially a set of floating-point processors bundled into a well-powered, well-ventilated chip, which makes GPUs incredibly well-suited to huge levels of parallelism. &lt;/p&gt;

&lt;p&gt;GPUs have been used for &lt;a href="https://www.usenix.org/legacy/events/atc11/tech/final_files/atc11_proceedings.pdf#page=27"&gt;real-time scheduling&lt;/a&gt;, &lt;a href="https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.418.233&amp;amp;rep=rep1&amp;amp;type=pdf"&gt;graph algorithms&lt;/a&gt;, &lt;a href="https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.61.3825&amp;amp;rep=rep1&amp;amp;type=pdf"&gt;HPC&lt;/a&gt;, &lt;a href="https://arxiv.org/pdf/1811.05588.pdf"&gt;object detection using neural networks&lt;/a&gt;, and the list goes on. This is due in no small part to the nature of machine-learning applications, and the ability of neural networks to be split across multiple processing cores. Major Deep Learning frameworks such as &lt;a href="https://www.tensorflow.org/install/gpu"&gt;TensorFlow&lt;/a&gt; and &lt;a href="https://pytorch.org/docs/stable/notes/cuda.html"&gt;PyTorch&lt;/a&gt; now offer GPU support by default (once the CUDA toolkit and cuDNN are installed). &lt;/p&gt;

&lt;p&gt;Cost for cost, GPUs have higher instruction throughput and memory bandwidth than CPUs. Additionally, GPUs tend to have significantly higher raw arithmetic capability than CPUs, centered around a large number of fine-grained parallel processors.&lt;/p&gt;

&lt;p&gt;I could go on and on about the wonders of GPUs and where they are used, and probably do some more hand-wavy stuff in an attempt to convince you that GPUs are really cool, but I'd rather go into a bit more detail on how exactly GPUs do what they do, and the thinking that goes into developing a GPU program.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;From here forward, most of the technical details are NVIDIA-specific, however they can for the most part be ported to AMD/ATI GPUs&lt;/em&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  The Programming Model
&lt;/h1&gt;

&lt;p&gt;GPUs work on the &lt;strong&gt;SIMD&lt;/strong&gt; model, or the &lt;em&gt;single-instruction-multiple-data&lt;/em&gt; idea, where a single operation is carried out on multiple data points in parallel. These operations must be independent, as there is no data-sharing between them. This is in direct contrast to the &lt;strong&gt;MIMD&lt;/strong&gt; model of the CPU (or &lt;em&gt;multiple-instruction-multiple-data&lt;/em&gt;, where the CPU's inherent complexity lets it handle many different types of task at once). &lt;br&gt;
GPUs are still fairly general-purpose, however, as their floating-point units can be adapted to a wide range of applications by means of a programming interface (such as CUDA). &lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--QPFE0I9o--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ntmf9tmmdt937qgrys0t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--QPFE0I9o--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ntmf9tmmdt937qgrys0t.png" alt="Block of Threads"&gt;&lt;/a&gt;&lt;em&gt;How Threads are grouped into blocks which are grouped into grids&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;A &lt;strong&gt;streaming multiprocessor&lt;/strong&gt;, or SM in NVIDIA-land, can be thought of as a multithreaded CPU core: it has its own shared memory, a set of 32-bit registers (think of these as the GPU equivalent of L1 cache), and a collection of floating-point units. A group of &lt;strong&gt;threads&lt;/strong&gt; called a &lt;em&gt;block&lt;/em&gt; runs on an SM and executes a custom GPU function called a &lt;strong&gt;kernel&lt;/strong&gt;. &lt;/p&gt;

&lt;p&gt;That's a lot of terminology; the important bit to note is that current GPUs have a limit of 1024 threads per block, and this number is further limited by the memory requirements of your specific kernel. For a more in-depth explanation, see &lt;a href="https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#thread-hierarchy"&gt;here&lt;/a&gt;&lt;/p&gt;
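&lt;p&gt;To make the numbers concrete: covering N data points at 1024 threads per block means launching a grid of ceil(N / 1024) blocks. A small, hypothetical helper (plain Python, not CUDA) for that calculation:&lt;/p&gt;

```python
import math

MAX_THREADS_PER_BLOCK = 1024  # the current per-block limit noted above

def launch_config(n_elements, threads_per_block=MAX_THREADS_PER_BLOCK):
    """Return (blocks_per_grid, threads_per_block) covering n_elements."""
    blocks = math.ceil(n_elements / threads_per_block)
    return blocks, threads_per_block

# e.g. one million elements need 977 blocks of 1024 threads
assert launch_config(1_000_000) == (977, 1024)
```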

&lt;h2&gt;
  
  
  Thinking About Problems
&lt;/h2&gt;

&lt;p&gt;The GPU architecture is centered around fine-grained (or &lt;em&gt;thread-based&lt;/em&gt;) parallelism: a problem is partitioned into coarse sub-problems solved independently by blocks of threads, and each sub-problem is split into finer pieces that can be solved cooperatively, in parallel, by &lt;strong&gt;all&lt;/strong&gt; the threads in a block.&lt;/p&gt;

&lt;p&gt;There can be a few issues here, however: bad branching in your custom GPU program or &lt;em&gt;kernel&lt;/em&gt; incurs massive overhead, because of the GPU's limitation that a group of threads can only do &lt;em&gt;one&lt;/em&gt; thing at a time. For example, if your kernel needs all the even-numbered threads to do one thing, and the odd-numbered threads to do another, one set of threads will always be waiting on the other to complete its task, which effectively &lt;strong&gt;doubles&lt;/strong&gt; the processing time for your given task (or set of tasks).&lt;/p&gt;
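&lt;p&gt;A rough way to picture this divergence in ordinary Python: under lockstep execution the hardware effectively runs the two branches one after the other, masking off the threads that don't take each branch. The sketch below (plain NumPy, purely illustrative) mimics those two serialized passes:&lt;/p&gt;

```python
import numpy as np

thread_ids = np.arange(8)
even_mask = thread_ids % 2 == 0

out = np.zeros(8)
# Pass 1: odd threads sit idle while even threads run their branch
out[even_mask] = thread_ids[even_mask] * 10
# Pass 2: even threads sit idle while odd threads run theirs
out[~even_mask] = thread_ids[~even_mask] + 100
# Two serialized passes for one logical step: the divergence penalty
```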

&lt;h2&gt;
  
  
  Sharing is Caring
&lt;/h2&gt;

&lt;p&gt;Remember when I said that each thread can only work on independent data points? In theory this seems feasible, but in practice the idea falls apart. Consider the simplest case of needing to square a series of numbers, then sum the results. Every "square" operation can happen on a separate thread, but when summing the outputs, the threads need to talk to each other, or at least have some central location in which to sync their outputs. This is where &lt;strong&gt;shared memory&lt;/strong&gt; comes in. (This is one of the main types of memory available in the CUDA programming model, along with &lt;em&gt;global&lt;/em&gt;, &lt;em&gt;texture&lt;/em&gt; and &lt;em&gt;host&lt;/em&gt; memory.) &lt;/p&gt;

&lt;p&gt;Shared memory is memory that can be accessed by all threads within a block; it is orders of magnitude faster than system memory, with significantly lower latency. (It can be thought of as programmer-controlled L1 cache.) The CUDA programming model introduces a special barrier function, &lt;code&gt;__syncthreads()&lt;/code&gt;, which makes every thread in a block wait until all of them have reached it, helping to ensure no race conditions occur. &lt;/p&gt;
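&lt;p&gt;The square-then-sum example above is usually written as a tree reduction: each thread squares its own element into shared memory, then pairs of partial sums are folded together round by round, with a barrier between rounds. This plain-Python sketch (illustrative, not CUDA) mimics the pattern; the comment marks where &lt;code&gt;__syncthreads()&lt;/code&gt; would sit:&lt;/p&gt;

```python
import numpy as np

def block_square_sum(values):
    """Square each element, then tree-reduce to a single sum.

    `shared` stands in for shared memory. Each loop iteration ends where
    __syncthreads() would sit on a real GPU: all "threads" must finish the
    round before the stride halves. Assumes a power-of-two input length,
    as real block reductions typically do.
    """
    shared = np.array([v * v for v in values], dtype=float)  # per-thread work
    stride = len(shared) // 2
    while stride > 0:
        # threads 0..stride-1 each fold in one partner's partial sum
        shared[:stride] += shared[stride:2 * stride]
        stride //= 2  # barrier between rounds in real CUDA
    return shared[0]

assert block_square_sum([1, 2, 3, 4]) == 30.0  # 1 + 4 + 9 + 16
```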

&lt;p&gt;A &lt;strong&gt;race condition&lt;/strong&gt; occurs when two processes need to access a single memory location, and one of the two threads attempts to read from or write to that location before the other is done with its own operation. This can lead to failed reads and corrupt writes. &lt;/p&gt;

&lt;p&gt;This has been a very basic introduction to why GPUs are useful, and how they function at a high level. If you have any questions, feel free to contact me via the email listed in my profile, and happy reading!&lt;/p&gt;

</description>
      <category>gpu</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Gradient Boosting on the GPU </title>
      <dc:creator>Darth Espressius</dc:creator>
      <pubDate>Sun, 26 Sep 2021 23:10:05 +0000</pubDate>
      <link>https://dev.to/_aadidev/gradient-boosting-on-the-gpu-1pbf</link>
      <guid>https://dev.to/_aadidev/gradient-boosting-on-the-gpu-1pbf</guid>
<description>&lt;p&gt;Decisions, decisions; it seems like every data-centric problem simmers down to making some sort of choice. Whether it be choosing the class of object present in an image, or modeling churn by predicting a probability, solving a data-related problem typically centres around making decisions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Decision Trees
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F56xbvx1i26a6nopqry6a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F56xbvx1i26a6nopqry6a.png" alt="Decision Tree"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Source: Song et. al., 2015&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Decision and regression trees are a form of supervised learning. These algorithms infer implicit rules from structured data to predict some given outcome.&lt;br&gt;
The first regression tree algorithm was published in a &lt;a href="http://cda.psych.uiuc.edu/statistical_learning_course/morgan_sonquist.pdf" rel="noopener noreferrer"&gt;1963 paper&lt;/a&gt;, whilst the first decision tree algorithm was published in &lt;a href="https://www.proquest.com/openview/4d0c5d0c515e62cfaa5d112c6b3b3bac/1?pq-origsite=gscholar&amp;amp;cbl=40685" rel="noopener noreferrer"&gt;this 1972 paper&lt;/a&gt;. The premise of a decision tree is to recursively divide the input attributes into smaller, "purer" subsets, which can then more accurately determine a given output.&lt;/p&gt;

&lt;p&gt;In other words, say you are attempting to predict what type of personal computer someone may purchase. You have a list of previous purchases, along with each buyer's age and occupation. Naturally, you may first group your input data by age (the younger folk may make a different sort of purchasing decision, aiming for portability, performance or flashier features, whilst the elderly may prefer a larger, more tactile keyboard and included software). &lt;/p&gt;

&lt;p&gt;After you split your data by age, you end up with two groups; put another way, your decision-making process has created its first 'branch'. These two groups can then be further split by occupation. Persons working in software or AI may look to more performance-oriented options, while accountants, managers and writers may focus on ergonomics and portability. This multi-way split into various occupations then divides your original two groups further, resulting in "purer" input attributes. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fes6u4m04k4o9cmpv93fe.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fes6u4m04k4o9cmpv93fe.png" alt="Groups"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In other terms, your resulting groups or &lt;em&gt;leaves&lt;/em&gt; each contain &lt;em&gt;one&lt;/em&gt; or a few age groups and one or a few occupations. Yes, a &lt;em&gt;few&lt;/em&gt;: it may not be necessary to split on every single occupation, owing to a concept known as overfitting, where your model does not generalize well because of its incredibly high specificity. To the stats folks: this is when your model exhibits unreasonably high variance owing to unconstrained optimization.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs53fkytx937bwbdxftxi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs53fkytx937bwbdxftxi.png" alt="Leaves"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This results in an incredibly interpretable model, since the decision made at each branch is easily found. However, a decision tree may leave some room for accuracy improvement, and can require complete re-training if the data changes. The latter point is a significant consideration, as organizations may have large amounts of data, where data drift may result in models slowly becoming less effective over time.&lt;/p&gt;
&lt;h2&gt;
  
  
  Strength In Numbers
&lt;/h2&gt;

&lt;p&gt;A decision tree &lt;strong&gt;ensemble&lt;/strong&gt; is a &lt;em&gt;group&lt;/em&gt; of decision tree classifiers/regressors. A data tuple is mapped to an output leaf by each of a series of decision trees, and the &lt;em&gt;average&lt;/em&gt; of their outputs is taken. This can assist in improving accuracy. A specific implementation of the ensemble method is the &lt;em&gt;gradient-boosted&lt;/em&gt; tree. &lt;/p&gt;

&lt;p&gt;The decision tree ensemble is trained to optimize some given loss function&lt;br&gt;


&lt;/p&gt;
&lt;div class="katex-element"&gt;
  &lt;span class="katex-display"&gt;&lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;L=∑il(yi^,yi)+∑kΩ(fk)
\mathcal{L} = \sum_{i}l(\hat{y_{i}}, y_{i}) + \sum_{k}\Omega(f_{k})
&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathcal"&gt;L&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mrel"&gt;=&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mop op-limits"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;i&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="mop op-symbol large-op"&gt;∑&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;l&lt;/span&gt;&lt;span class="mopen"&gt;(&lt;/span&gt;&lt;span class="mord accent"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;y&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;i&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span 
class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="accent-body"&gt;&lt;span class="mord"&gt;^&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mpunct"&gt;,&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;y&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;i&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mclose"&gt;)&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mbin"&gt;+&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mop op-limits"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;k&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="mop op-symbol large-op"&gt;∑&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span 
class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mord"&gt;Ω&lt;/span&gt;&lt;span class="mopen"&gt;(&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;f&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;k&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mclose"&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/div&gt;


&lt;p&gt;The Ω term above penalizes the impact of any single decision tree. This may sound strange at first; however, the situation may arise where two decision trees in the ensemble over- and under-weight a given input attribute, resulting in a net-zero effect. For example, if one tree thinks that age correlates very strongly and positively with a person buying a less portable machine (i.e. the older folk buy desktops instead of tablets), another decision tree added to the model may learn an exactly opposite relationship that effectively cancels out the first tree's impact. &lt;/p&gt;

&lt;p&gt;This is a phenomenon known as parameter explosion, and it is a serious issue in statistical learning algorithms. (Regularization of this sort is not limited to decision trees: regression has its Lasso and Ridge variants, whilst deep learning employs dropout, etc.)&lt;/p&gt;
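&lt;p&gt;To make the penalty concrete: in XGBoost's formulation, the per-tree term is Ω(f) = γT + ½λ‖w‖², where T is the number of leaves and w the vector of leaf weights. A toy sketch of the regularized objective, assuming squared-error for l and purely illustrative values for γ and λ:&lt;/p&gt;

```python
import numpy as np

def regularized_objective(y_true, y_pred, leaf_weights, n_leaves,
                          gamma=1.0, lam=1.0):
    """Squared-error loss plus an XGBoost-style penalty
    Omega(f) = gamma * T + 0.5 * lambda * ||w||^2, which discourages any
    single tree from growing many large, over-confident leaves."""
    loss = np.sum((y_true - y_pred) ** 2)
    omega = gamma * n_leaves + 0.5 * lam * np.sum(np.square(leaf_weights))
    return loss + omega

# A perfect fit still pays for its leaves: 0 + (1*1 + 0.5*0.25) = 1.125
assert regularized_objective(np.array([1.0, 2.0]), np.array([1.0, 2.0]),
                             np.array([0.5]), n_leaves=1) == 1.125
```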
&lt;h3&gt;
  
  
  Gradient Boosting
&lt;/h3&gt;

&lt;p&gt;An additive function (a new tree) is added greedily to the existing decision-making function so as to minimize your objective function. In other terms, a gradient-boosting scheme trains and adds the next-best-performing tree at each iteration. &lt;br&gt;
In more mathematical terms: at each iteration, a newly trained tree which most optimally minimizes the chosen objective function is added to the overall ensemble. &lt;/p&gt;
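&lt;p&gt;For squared-error loss, "add the next-best tree" reduces to fitting each new learner to the current residuals of the ensemble. The toy sketch below uses one-variable stumps in plain NumPy to show the additive loop; it is purely illustrative, not the library's actual algorithm:&lt;/p&gt;

```python
import numpy as np

def fit_stump(x, residual):
    """Find the threshold split of x that best predicts the residual."""
    best = None
    for t in np.unique(x):
        left, right = residual[x <= t], residual[x > t]
        if len(left) == 0 or len(right) == 0:
            continue
        pred = np.where(x <= t, left.mean(), right.mean())
        sse = np.sum((residual - pred) ** 2)
        if best is None or sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    _, t, lv, rv = best
    return lambda q: np.where(q <= t, lv, rv)

def boost(x, y, n_rounds=20, lr=0.5):
    """Greedily add stumps, each fit to the residual of the ensemble so far."""
    pred = np.zeros_like(y, dtype=float)
    for _ in range(n_rounds):
        stump = fit_stump(x, y - pred)  # fit the current residual
        pred += lr * stump(x)           # additive update to the ensemble
    return pred

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 1.0, 3.0, 3.0])
# residuals shrink geometrically, so the ensemble converges on y
assert np.allclose(boost(x, y), y, atol=1e-3)
```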

&lt;p&gt;There are a few methods by which trees split input data. The following list does not claim to be exhaustive, but rather gives a general introduction to popular split-finding techniques.&lt;/p&gt;
&lt;h4&gt;
  
  
  Exact Greedy
&lt;/h4&gt;

&lt;p&gt;Every possible split of the input data (for a given attribute) is enumerated, and the split resulting in the maximal increase in &lt;em&gt;Information Gain&lt;/em&gt; is chosen. &lt;em&gt;Information Gain&lt;/em&gt; represents the difference in &lt;strong&gt;entropy&lt;/strong&gt; before and after your data is split, whilst &lt;strong&gt;entropy&lt;/strong&gt; measures how "impure" your attribute is (how many different values occur in the split). Following our example above: if we split based on occupation, and a given group of data &lt;em&gt;after&lt;/em&gt; the split contains multiple, unrelated occupations, that data is said to be high-entropy, as no one consistent theme is present for the occupation attribute. If, however, each split contains a single occupation, the data is low-entropy. &lt;/p&gt;

&lt;p&gt;This implies then, that if the difference in entropy before and after a split is high, the data has become "purified", or has experienced positive information gain. As stated in the name for this method, the next split is chosen greedily (which means the next best option is chosen out of all possible options).&lt;/p&gt;
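&lt;p&gt;Entropy and information gain as described can be computed directly. A small sketch with hypothetical helper names, using Shannon entropy in bits:&lt;/p&gt;

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy: 0 when a group is pure, higher when mixed."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(parent, splits):
    """Entropy before the split minus the size-weighted entropy after."""
    n = len(parent)
    after = sum(len(s) / n * entropy(s) for s in splits)
    return entropy(parent) - after

# Splitting a 50/50 mixed group into two pure groups recovers one full bit
occupations = ["dev", "dev", "writer", "writer"]
gain = information_gain(occupations, [["dev", "dev"], ["writer", "writer"]])
assert gain == 1.0
```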

&lt;p&gt;This technique can be computationally expensive for continuous features, however, since enumerating every split involves sorting all values for a given attribute and accumulating gradient statistics for &lt;em&gt;every possible split&lt;/em&gt;.&lt;/p&gt;
&lt;h4&gt;
  
  
  Approximate Greedy
&lt;/h4&gt;

&lt;p&gt;The exact greedy method above cannot work for data which is not held in main system memory. When dealing with tera- and petabytes of data, it is unreasonable to expect all of it to be held in memory for any sort of statistical analysis. The approximate greedy method of split-finding proposes split points based on percentiles of a given feature, computed from samples of the main database. Continuous features are then bucketed using these candidate points, and the best split is found based on aggregated statistics. &lt;/p&gt;
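&lt;p&gt;Proposing candidate splits from percentiles of a sample is straightforward to sketch in NumPy (illustrative only; the real implementation uses a weighted quantile sketch rather than plain percentiles):&lt;/p&gt;

```python
import numpy as np

def candidate_splits(feature_sample, n_candidates=4):
    """Propose split points at evenly spaced percentiles of a sample,
    so continuous values can be bucketed instead of fully sorted."""
    percentiles = np.linspace(0, 100, n_candidates + 2)[1:-1]  # skip min/max
    return np.percentile(feature_sample, percentiles)

rng = np.random.default_rng(0)
sample = rng.normal(size=10_000)       # stand-in for a sampled column
splits = candidate_splits(sample)
buckets = np.digitize(sample, splits)  # statistics are aggregated per bucket
assert len(splits) == 4 and buckets.max() == 4
```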
&lt;h4&gt;
  
  
  Sparsity Aware
&lt;/h4&gt;

&lt;p&gt;Data is never perfect; that's a fact of life. In the library we will be using, a default split direction is defined for missing data, based on the existing data. This ensures that all data within your input database is used to train the model, and provides robustness against future missing data. &lt;a href="https://arxiv.org/pdf/1603.02754.pdf" rel="noopener noreferrer"&gt;This paper&lt;/a&gt; found an exponential decrease in classification time when using sparsity-aware methods. In other words, the algorithm did not have to estimate adjusted gradients at classification time when faced with missing features, as a pre-baked option had already been selected. This can be customised depending on application needs.&lt;/p&gt;
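&lt;p&gt;The default-direction idea can be sketched as follows: when learning a split, try routing the missing-valued rows left, then right, and bake in whichever direction scores better on the training data. An illustrative squared-error version in plain NumPy (not the library's actual implementation):&lt;/p&gt;

```python
import numpy as np

def default_direction(x, y, threshold):
    """Pick where rows with missing x should go at this split by trying
    both directions on the training data and keeping the better one."""
    missing = np.isnan(x)
    best = None
    for direction in ("left", "right"):
        go_left = np.where(missing, direction == "left", x <= threshold)
        left, right = y[go_left], y[~go_left]
        pred = np.where(go_left, left.mean(), right.mean())
        sse = np.sum((y - pred) ** 2)
        if best is None or sse < best[0]:
            best = (sse, direction)
    return best[1]

x = np.array([1.0, 2.0, np.nan, 8.0, 9.0])
y = np.array([0.0, 0.0, 0.0,    1.0, 1.0])  # the missing row behaves "left"
assert default_direction(x, y, threshold=5.0) == "left"
```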
&lt;h2&gt;
  
  
  The GPU
&lt;/h2&gt;

&lt;p&gt;Or as I like to call it, a concurrency nerd's DREAM. I'm planning on writing an article completely dedicated to the wonders of GPU processing, and how to do some cool things using NVIDIA CUDA, but for now, we will be using NVIDIA's RAPIDS library. &lt;/p&gt;

&lt;p&gt;RAPIDS allows execution of end-to-end pipelines entirely on GPUs, and is built on CUDA primitives (a blend of C++ and C code). This allows highly parallelizable (single-instruction-multiple-data) workloads to be scaled outwards across multiple GPU cores. Whilst your GPU's individual cores may be much simpler than a CPU's, a GPU typically has hundreds (to thousands) of these cores; the lowest-end current NVIDIA GPU, for example, ships with upwards of EIGHT HUNDRED. The &lt;em&gt;lowest end&lt;/em&gt; GPU has orders of magnitude more cores than many high-end desktop CPUs.&lt;/p&gt;

&lt;p&gt;Additionally, RAPIDS is highly compatible with typical Pandas function calls. What this translates to is a very natural transition from typical Pandas function calls (which run on your CPU) to RAPIDS API calls, which run by default on your GPU. See &lt;a href="https://docs.rapids.ai/api/cudf/stable/api.html" rel="noopener noreferrer"&gt;here&lt;/a&gt; for a complete reference of RAPIDS' direct analogs to Pandas.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# GPU-powered API
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;cudf&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;cd&lt;/span&gt;

&lt;span class="c1"&gt;# usual import
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;

&lt;span class="c1"&gt;# does exactly the same thing from a programmer's perspective
&lt;/span&gt;&lt;span class="n"&gt;cd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Gradient Boosting on GPU
&lt;/h2&gt;

&lt;p&gt;Now the fun part. Since gradient boosting involves iteratively adding decision trees to a main model, at first it may seem completely counter-intuitive to attempt to run it on a GPU. However, we are not parallelizing tree creation; RAPIDS works to parallelize across data. The data points for each iteration are spread across your GPU's many cores in the background, essentially "unrolling" the summation expressed in the additive equation above.&lt;/p&gt;

&lt;p&gt;Enough talk, it's time to code.&lt;br&gt;
Note, this will require you to have a RAPIDS-compatible GPU, the latest version of the CUDA Toolkit and RAPIDS installed. See the following links on how to check for compatibility with RAPIDS, and how to install both RAPIDS and the CUDA toolkit:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://rapids.ai/start.html" rel="noopener noreferrer"&gt;Getting started with RAPIDS AI&lt;/a&gt;&lt;br&gt;
&lt;a href="https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html" rel="noopener noreferrer"&gt;Install the CUDA Toolkit&lt;/a&gt; &lt;br&gt;
&lt;a href="https://docs.nvidia.com/deploy/cuda-compatibility/" rel="noopener noreferrer"&gt;Check CUDA Compatibility&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;(If you have a 10-series NVIDIA GPU or newer, you should be fine)&lt;/p&gt;

&lt;p&gt;This does not claim to be a fully end-to-end tutorial; rather, it highlights some of the main features of the RAPIDS and XGBoost APIs.&lt;/p&gt;

&lt;p&gt;Let's ensure we have our libraries imported&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;cupy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;cp&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;xgboost&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;xgb&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's also say we have sorted our data set, and have numpy arrays of the training and validation data. From these arrays, we need to wrap the data in XGBoost's DMatrix format. This, according to the &lt;a href="https://xgboost.readthedocs.io/en/latest/" rel="noopener noreferrer"&gt;XGBoost Documentation&lt;/a&gt;, is a more optimized data wrapper for extreme gradient boosting.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;dtrain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;xgb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DMatrix&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;dvalidation&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;xgb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DMatrix&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_validation&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;y_validation&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Tilling the Soil
&lt;/h4&gt;

&lt;p&gt;Now we need to specify some parameters for our gradient-boosted tree. I'm ignoring some safety-checking for the purposes of clarity; if this were being deployed, the code would first check for the existence of a GPU, then proceed to create the tree.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;tree_method&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;gpu_hist&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;n_gpus&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;eval_metric&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="c1"&gt;# your choice of metric
&lt;/span&gt;    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;objective&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="c1"&gt;# some objective function
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The most important part of the above is the assignment &lt;code&gt;'tree_method': 'gpu_hist'&lt;/code&gt;. This tells XGBoost to use a CUDA-accelerated GPU-based tree construction method.&lt;/p&gt;

&lt;h4&gt;
  
  
  Light, Water and Love
&lt;/h4&gt;

&lt;p&gt;Now it's time to grow our tree. XGBoost and RAPIDS make this incredibly simple:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;num_round&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;
&lt;span class="n"&gt;bast&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;xgb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;train&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dtrain&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;num_round&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And that's it! You have successfully grown a Gradient-boosted tree on the GPU.&lt;/p&gt;

</description>
      <category>rapids</category>
      <category>machinelearning</category>
      <category>datascience</category>
    </item>
  </channel>
</rss>
