ng-conf

Posted on Mar 3, 2021

Getting Started With TensorFlow in Angular

#angular #javascript #tensorflow #webdev

Jim Armstrong | ng-conf | Nov 2020

Polynomial Regression using TensorFlow JS, Typescript, and Angular Version 10

Introduction

AI/ML (Artificial Intelligence/Machine Learning) is a hot topic and it’s only natural for Angular developers to want to ‘get in on the action,’ if only to try something new and fun. While the general concepts behind neural networks are intuitive, developers looking for an organized introduction are often overwhelmed by jargon, complex APIs, and unfamiliar math concepts after just a few web searches.

This article provides a simple introduction on how to use TensorFlow.js to solve a simple regression problem using Typescript and Angular version 10.

Regression and Classification

Regression and classification are two important types of problems that are often solved with ML techniques.

Regression is a process of ‘fitting.’ A functional relationship between independent and dependent variables is presumed. The function exposes a number of parameters whose selection uniquely determines a fit. A quality-of-fit metric and functional representation are chosen in advance. In many cases, the desire is to fit some smooth and relatively simple curve to a data set. The function is used to predict future values in lieu of making ‘guesses’ based on the original data.

Classification involves selecting the ‘best’ output among a number of pre-defined ‘classes.’ This process is often used on images and answers questions such as

Is this an image of a bird?
Does this image contain clouds?
Does this image contain grass?
Is this image the Angular logo?

ML techniques are also used to solve important problems where a set of inputs is mapped to a set of outputs and the functional relationship between the inputs and outputs is not known. In such cases, any functional relationship is likely to be discrete (or mixed discrete/continuous), nonlinear, and likely not closed-form. Ugh. That’s a fancy way of saying that we don’t even want to think about a mathematical model for the process :)

A neural network is used to create an approximation for the problem based on some sort of scoring metric, i.e. a measure of one solution being better or worse than another solution.

Two Dimensional Data Fitting By Regression

Let’s start with a simple, but common problem. We are given a collection of (x, y) data points in two dimensions. The total number of points is expected to be less than 100. Some functional relationship, i.e. y = f(x) is presumed, but an exact relationship is considered either intractable or inefficient for future use. Instead, a simpler function is used as an approximation to the original data.

The desire is to fit a small-order polynomial to this data so that the polynomial may be used as a predictor for future values, i.e. y-estimated = p(x), where p represents a k-th order polynomial,

p(x) = a0 + a1*x + a2*x² + a3*x³ + …

where a0, a1, a2, … are the polynomial coefficients (Medium does not appear to support subscripting).

A k-th order polynomial requires k+1 coefficients in order to be completely defined. For example, a line requires two coefficients. A quadratic curve requires three coefficients, and a cubic curve requires four coefficients.

The polynomial for this discussion is a cubic, which requires four coefficients for a complete definition. Four equations involving the polynomial coefficients are required to uniquely compute their value. These equations would typically be derived from four unique points through which the polynomial passes.

Instead, we are given more than four data points, possibly as many as 100. For each point, substitute the value of x into the equation

p(x) = a0 + a1*x + a2*x² + a3*x³

For N points, this process yields N equations in 4 unknowns. N is likely to be much greater than 4, so more data is provided than is needed to compute a unique set of coefficients. In general, no single cubic passes exactly through all N points, so the system has no exact solution. Such problems are often called overdetermined.

What do we do? Do we throw away data points and only choose four out of the supplied set? We could take all possible combinations of four data points and generate a single cubic polynomial for each set. Each polynomial would interpolate (pass through) the chosen four points exactly, but would appear different in terms of how well it ‘fit’ the remaining data points.

In terms of the approximating polynomial, are we interested only in interpolation or both interpolation and extrapolation?

Interpolation refers to using the polynomial to make predictions inside the domain of the original data points. For example, suppose the x-coordinates (when sorted in ascending order) all lie in the interval [-5, 10]. Using a polynomial function to interpolate data implies that all future x-coordinate values will be greater than or equal to -5 and less than or equal to 10. Extrapolation implies some future x-coordinate values less than -5 or greater than 10. The polynomial will be used to make predictions for these coordinate values.

In general, performance of a predictor outside the interval of original data values is of high interest, so we are almost always interested in extrapolation. And, if we have multiple means to ‘fit’ a simple function to a set of data points, how do we compare one fit to another? If comparison of fit is possible, is there such a thing as a best-possible fit?

Classical Least Squares (CLS)

The classical method of least squares defines the sum of squares of the residuals to be the metric by which one fit is judged to be better or worse than another. Now, what in the world does that mean to a developer?

Residual is simply a fancy name given to the difference between a predicted and an actual data value. For example, consider the set of points

(0, 0), (1, 3), (2, 1), (3,6), (4,2), (5, 8)

and the straight-line predictor y = x + 1 (a first-order or first-degree polynomial).

The x-coordinates cover the interval [0, 5] and the predicted values at each of the original x-coordinates are 1, 2, 3, 4, 5, and 6. Compute residuals as the difference between predicted and actual y-coordinate. This yields a vector,

[1–0, 2–3, 3–1, 4–6, 5–2, 6–8] or [1, -1, 2, -2, 3, -2]

As is generally the case, some residuals are positive and others are negative. The magnitude of the residual is more important than whether the predictor is higher or lower than the actual value. Absolute value, however, is not mathematically convenient. Instead, the residuals are squared in order to produce a consistent, positive value. In the above example, the vector of squared residuals is [1, 1, 4, 4, 9, 4].

Two common metrics to differentiate the quality of predictors are the sum of squared residuals and the mean-squared residual. The former simply sums all the squares of the residuals. The latter metric computes the mean value of all squared residuals, or an average error. The terms residual and error are often used interchangeably.
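As a quick check of the arithmetic, here is a minimal sketch in plain TypeScript that computes both metrics for the six example points and the predictor y = x + 1. The variable names are illustrative only and are not part of any library discussed in this article.

```typescript
// Points and straight-line predictor taken from the example above
const points: Array<[number, number]> = [[0, 0], [1, 3], [2, 1], [3, 6], [4, 2], [5, 8]];
const predictor = (x: number): number => x + 1;

// residual = predicted - actual, following the convention used in the text
const residuals: number[] = points.map(([x, y]) => predictor(x) - y);
const squared: number[]   = residuals.map((r) => r * r);

// Sum of squared residuals and mean-squared residual
const sse: number = squared.reduce((sum, r2) => sum + r2, 0);
const mse: number = sse / points.length;

console.log(residuals); // [1, -1, 2, -2, 3, -2]
console.log(squared);   // [1, 1, 4, 4, 9, 4]
console.log(sse, mse);  // 23 and 23/6 (about 3.83)
```

Because the mean-squared residual is just the sum divided by the number of points, the two metrics rank any pair of predictors on the same data set in the same order.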

The Classical Least Squares algorithm formulates a set of polynomial coefficients that minimizes the sum of the squared residuals. This results in an optimization problem that can be solved using techniques from calculus.

For those interested, this algorithm is heavily documented online, and this page is one of many good summaries. When formulated with normal equations, polynomial least squares can be solved with a symmetric linear equation solver. For small-degree polynomials, a general dense solver can also be used. Note that the terms order and degree are often used interchangeably. A fifth-degree polynomial, for example, has no term higher than x⁵.

The normal equations formulation is important as it avoids having to solve a linear system of equations with a coefficient matrix that is a Vandermonde matrix. Empirical evidence shows these matrices to be notoriously ill-conditioned (with the most notable exception being the Discrete Fourier Transform).
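For the simplest case, a straight line, the normal equations reduce to a 2x2 system that can be solved in closed form. The sketch below is plain TypeScript written for this explanation; fitLine and sumSquares are hypothetical helpers and are not the API of the Typescript libraries mentioned in this article.

```typescript
// Hypothetical helper: fit y = a0 + a1*x by solving the 2x2 normal equations
function fitLine(xs: number[], ys: number[]): { a0: number; a1: number } {
  const n = xs.length;
  let sx = 0, sy = 0, sxx = 0, sxy = 0;
  for (let i = 0; i < n; ++i) {
    sx  += xs[i];
    sy  += ys[i];
    sxx += xs[i] * xs[i];
    sxy += xs[i] * ys[i];
  }

  // Normal equations:  n*a0  + sx*a1  = sy
  //                    sx*a0 + sxx*a1 = sxy
  const det = n * sxx - sx * sx;
  if (Math.abs(det) < 1e-12) {
    throw new Error('x-coordinates must not all be equal');
  }

  const a1 = (n * sxy - sx * sy) / det;
  const a0 = (sy - a1 * sx) / n;
  return { a0, a1 };
}

// Sum of squared residuals for the line a0 + a1*x
function sumSquares(xs: number[], ys: number[], a0: number, a1: number): number {
  return xs.reduce((sum, x, i) => sum + (a0 + a1 * x - ys[i]) ** 2, 0);
}

const lineX = [0, 1, 2, 3, 4, 5];
const lineY = [0, 3, 1, 6, 2, 8];
const lineFit = fitLine(lineX, lineY);

console.log(lineFit);                                         // a0 is about 0.333, a1 is about 1.2
console.log(sumSquares(lineX, lineY, lineFit.a0, lineFit.a1)); // about 22.13
console.log(sumSquares(lineX, lineY, 1, 1));                   // 23 for the predictor y = x + 1
```

On the six example points, the least-squares line has a sum of squared residuals of about 22.13, slightly lower than the 23 produced by y = x + 1.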

In general, it is a good idea to keep the order of the polynomial small because higher-degree polynomials have more inflection points and tend to fluctuate quite a bit up and down. Personally, I have never used this technique in practice on more than a couple-hundred data points and no more than a fifth-degree polynomial.

Now, you may be wanting to experiment with CLS, but find the math pretty intimidating. Never fear, because we have a tried and true method for handling that pesky math. Here it goes …

Blah, blah … matrix … blah, blah … least squares … blah, blah … API.

There! It’s all done for you. Just click on this link and grab all the Typescript code you desire. Typescript libraries are provided for linear and polynomial least squares with multiple variants for linear least squares. This code base is suitable for fitting dozens or even hundreds of data points with small-degree polynomials. Again, I personally recommend never using more than a fifth-degree polynomial.

Classical least squares is a good technique in that it provides a proven optimal solution for the sum of the squared residuals metric. There is no other solution that produces a smaller sum of squared residuals inside the interval of the fitted data set. So, CLS is useful for interpolation, i.e. we expect to make predictions for future x-coordinates inside the interval of the original data set. It may or may not be useful for extrapolation.

This long introduction now leads up to the problem at hand, namely, can we use ML techniques for the cubic polynomial fit problem, and how does it compare to CLS? This leads us into TensorFlow and neural networks.

What Are Tensors?

Tensors are simply multi-dimensional arrays of a specified data type. In fact, if you read only one section of the massive TensorFlow documentation, then make sure it’s this one. Many of the computations in neural networks occur across dimensions of a multi-dimensional array structure, and such operations can be readily transformed to execute on a GPU. This makes the tensor structure a powerful one for ML computations.

Neural Networks 101

In a VERY simplistic sense, neural networks expose an input layer where one input is mapped to one ‘neuron.’ One or more hidden layers are defined, with one output from a single neuron to all other neurons in the subsequent layer. Each of these outputs is assigned a weight through a learning or training process. The final hidden layer is connected to an output layer, which is responsible for exposing a solution (fit, extrapolation, control action, etc) given a specific input set.

The network must be trained on a sample set of inputs, and it is generally validated on another data set that is separate from the training set. The training process involves setting weights along the paths that connect one neuron to another. Weights are adjusted based on a loss function or metric that provides a criterion to measure one candidate solution vs. another solution.

The training process also involves selection of an optimization method and a learning rate. The learning rate is important since the learning process is iterative. Imagine being at the top of a rocky mountain range with a desire to traverse to the bottom as quickly as possible. There is no direct line of sight to an optimal path to the bottom. At best, we can examine the local terrain and move a certain distance in what appears to be the best direction. After arriving at a new point, the process is repeated. There is, however, no guarantee that the selected sequence of moves will actually make it to the ground. Backtracking may be necessary since the terrain is very complex.

I experienced this in real life during a recent visit to Enchanted Rock near Fredericksburg, TX. After ascending to the top, I ignored the typical path back down and elected for a free descent down the SE side. Three backtracks and a number of ‘dead ends’ (local optima in math parlance) were encountered before I finally made it to ground level.

The optimizer attempts to move in the ‘best’ direction for a single step according to some pre-defined mathematical criteria. Gradient-based optimizers are common. The gradient of a multi-variable function is a vector that points in the direction of steepest increase at a particular point (value of all independent variables), and its magnitude gives the slope in that direction. The negative gradient therefore provides a direction in which the function decreases. A gradient descent method steps along a direction in which the loss function decreases with the hope of eventually reaching a minimum.

The learning rate defines the ‘length’ of each step in the descent (technically, it is a multiplier onto the error gradient during backpropagation). Larger learning rates allow quick moves in a particular direction at the risk of ‘jumping’ over areas that should have been examined more closely. It’s like hiking on a path that is not very well defined and missing an important turn by moving too fast.

Low learning rates take small, careful steps and can follow the terrain closely, but they require more iterations (higher execution time) and can become ‘bogged down’ in local minima.
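The effect of the learning rate is easy to see on a one-variable problem. The following sketch is plain TypeScript, not TensorFlow; the function f(w) = (w - 3)² and the helper name descend are chosen purely for illustration.

```typescript
// Minimize f(w) = (w - 3)^2 with plain gradient descent.
// The gradient is f'(w) = 2*(w - 3) and the minimum is at w = 3.
function descend(learningRate: number, steps: number, start: number = 0): number {
  let w = start;
  for (let i = 0; i < steps; ++i) {
    const gradient = 2 * (w - 3);
    w -= learningRate * gradient; // step against the gradient
  }
  return w;
}

const balanced = descend(0.1, 50);  // ends very close to 3
const tooSmall = descend(0.01, 50); // still far from 3 after the same number of steps
const tooLarge = descend(1.1, 50);  // overshoots further on every step and diverges

console.log(balanced, tooSmall, tooLarge);
```

This toy function has a single minimum, so it only shows slow convergence and overshoot. It can not demonstrate getting stuck in a local minimum, which requires a more complex loss surface.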

So, the learning process is rather involved as it requires selecting good data for training, a good loss function, a proper optimizer, and a balanced learning rate. The process is almost equal parts art and science (and a good deal of experience really helps).

These observations are one of the reasons I personally like using a UI framework such as Angular when working with ML models. The ability to present an interactive UI to someone involved with fine-tuning an ML model is highly valuable given the number of considerations required to obtain good results from that model.

TensorFlow Approach to Polynomial Regression

Polynomial regression using TensorFlow (TF) has been covered in other online tutorials, but most of these seem to copy-and-paste from one another. There is often little explanation given as to why a particular method or step was chosen, so I wanted to provide my own take on this process before discussing the specifics of an Angular implementation.

I recently created an interactive demo for a client who had spent too much time reading about CLS on the internet. The goal of the demo was to illustrate that CLS methods are quite myopic and better used for interpolation as opposed to interpolation and extrapolation.

Here is a visualization of a test dataset I created for a client many years ago. This is a subset of the complete dataset that resulted from a proprietary algorithm applied to a number of input equipment measurements. A linear CLS fit is also shown.

Sample Data set and linear least squares fit

Now, you may be wondering how the plot was created. I have multiple Angular directives in my client-only dev toolkit for plotting. This one is called QuickPlot. It’s designed to perform exactly as its name implies, generate quick graphs of multiple functions and/or data sets across a common domain and range. No grids, axes, labels or frills … just a quick plot and that’s it :)

While I cannot open-source the entire client demo, I’m pleased to announce that I’m open-sourcing the QuickPlot directive.

theAlgorithmist/QuickPlot hosted by GitHub

A quick visualization of the data seems to support using a low-degree polynomial for a fit. A cubic was chosen for this article, although the completed project supported making the degree of fit user-selectable (with a maximum of a fifth-degree polynomial).

The ultimate goal is for TensorFlow to compute the coefficients, c0, c1, c2, and c3 such that the polynomial c0 + c1*x + c2*x² + c3*x³ is a ‘best’ fit to the above data.

What criteria do we use to determine that one fit is better than another? The sum of squared residuals has already been discussed, but this is ideal for interpolation inside the domain of the supplied data. Sometimes, it is better to have a more ‘relaxed’ criterion when extrapolation is involved. For this reason, we begin the learning process using average squared residual. This is often called mean-square error or MSE. This metric allows for some larger deviations as long as they are countered by a suitable number of smaller deviations, i.e. the error is smaller ‘on average.’

The use of MSE also allows us to compare two different final fits using the SSE (sum of squared errors or residuals) metric.

The TF optimizer selected for this process is called Stochastic Gradient Descent (SGD). We briefly discussed classical gradient descent (GD) above. SGD is an approximation to GD that estimates gradients using a subset of the supplied data that is pseudo-randomly selected. It has the general qualities of faster execution time and less likelihood to ‘bog down’ in areas of local minima. This is especially true for very large (tens of thousands or higher) data sets.

SGD is not the only optimizer that could be applied to this problem, but it’s generally a good first start for regression problems. The other nice feature of this approach is that we do not have to give any consideration to network structure or architecture; just select an optimizer, loss function, and then let TensorFlow do its work!

Fortunately, we have quite a bit of experimental evidence for selecting learning rates. A relatively small rate of 0.1 was chosen for this example. One of the benefits of an interactive learning module is the ability to quickly re-optimize with new inputs. We have the option to use SSE as a final comparative metric between an ‘optimized’ and ‘re-optimized’ solution.
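To make the idea of SGD concrete without any library, here is a minimal sketch in plain TypeScript that fits a cubic by estimating the MSE gradient from a small random batch at each step. Everything in it is an assumption made for illustration: the helper names, the batch size, the seeded generator (mulberry32), and the synthetic noise-free data. TensorFlow performs the equivalent work internally and this is not its implementation.

```typescript
// Small seeded pseudo-random generator (mulberry32) so that runs are repeatable.
// This is an assumption for the sketch, not something TensorFlow exposes.
function makeRng(seed: number): () => number {
  let a = seed >>> 0;
  return (): number => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Cubic c0 + c1*x + c2*x^2 + c3*x^3 with c = [c0, c1, c2, c3]
function cubic(c: number[], x: number): number {
  return ((c[3] * x + c[2]) * x + c[1]) * x + c[0];
}

function meanSquaredError(c: number[], xs: number[], ys: number[]): number {
  let sum = 0;
  for (let i = 0; i < xs.length; ++i) {
    const r = cubic(c, xs[i]) - ys[i];
    sum += r * r;
  }
  return sum / xs.length;
}

// Each step estimates the MSE gradient from a random mini-batch of points
function sgdCubicFit(xs: number[], ys: number[], learningRate: number,
                     steps: number, batchSize: number, seed: number): number[] {
  const rng = makeRng(seed);
  const c = [0, 0, 0, 0];
  for (let s = 0; s < steps; ++s) {
    const grad = [0, 0, 0, 0];
    for (let b = 0; b < batchSize; ++b) {
      const i = Math.floor(rng() * xs.length);
      const r = cubic(c, xs[i]) - ys[i];
      let xk = 1; // x^k, starting with k = 0
      for (let k = 0; k < 4; ++k) {
        grad[k] += (2 * r * xk) / batchSize; // d(r^2)/dc_k = 2*r*x^k
        xk *= xs[i];
      }
    }
    for (let k = 0; k < 4; ++k) {
      c[k] -= learningRate * grad[k];
    }
  }
  return c;
}

// Noise-free samples of a known cubic, with x already inside [-1, 1]
const target = [0.2, -0.5, 0.3, 0.8];
const sampleX: number[] = Array.from({ length: 21 }, (_, i) => -1 + i / 10);
const sampleY: number[] = sampleX.map((x) => cubic(target, x));

const fitted = sgdCubicFit(sampleX, sampleY, 0.1, 5000, 5, 42);
console.log(fitted, meanSquaredError(fitted, sampleX, sampleY));
```

With noise-free data the fitted coefficients should approach the target values. Real data will not be fit exactly, and the result will depend on the learning rate, the batch size, and the number of steps.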

Data Selection and Pre-Processing

One final consideration is preparation of the data set to be presented to TF. It is often a good idea to normalize data because of the manner in which weights are assigned to neuron connections inside TF. With x-coordinates in the original domain, small changes to the coefficient of the x³ term can lead to artificially large reductions in loss function. As a result, that term can dominate in the final result. This can lead the optimizer down the wrong path on the mountain, so to speak, and end up in a depression that is still far up the mountain face :)

The data is first normalized so that both the x- and y-coordinates are in the interval [-1, 1]. The interval [0, 1] would also work, but since some of the data involves negative x-coordinates, [-1, 1] is a better starting interval. The advantage of this approach is that |x| is never greater than 1.0, so squaring or cubing that value never increases the magnitude beyond 1.0. This keeps the playing field more level during the learning process.

Normalization, however, now produces two scales for the data. The original data is used in plotting results and comparing with CLS. This particular data set has a minimum x-coordinate of -6.5 and a maximum x-coordinate of 9.7. The y-coordinates vary over the interval [-0.25, 4.25]. Normalized data is provided to TF for the learning process and both the x- and y-coordinates are in the interval [-1, 1].

We can’t use the normalized scale for plotting or evaluating the polynomial for future values of x since those values will be over the domain of all real numbers, not restricted to [-1, 1].

Don’t worry — resolution of this issue will be discussed later in the article.

Now that we have a plan for implementing the learning strategy inside TF, it’s time to discuss the specifics of the Angular implementation.

TensorFlowJS and Angular Version 10

TensorFlow JS can be exercised by means of its Layers API or its Core API. Either API serves the same purpose: to create models or functions with adjustable (learnable) parameters that map inputs to outputs. The exact functional or mathematical representation of a model may or may not be known in advance.

The Layers API is very powerful and appeals to those with less programming experience. The Core API is often embraced by developers and can be used with only a modest understanding of machine-learning fundamentals.

The Core API is referenced throughout this article.

Here are the two dependencies (other than Angular) that need to be installed to duplicate the results discussed in this article (presuming you choose to use the QuickPlot directive for rapid plotting).

"@tensorflow/tfjs": "^2.4.0"
.
.
.
"pixi.js": "4.8.2",

Following are my primary imports in the main app component. I should point out that I created my dev toolkit (from which this example was taken) with Nx. The multi-repo contains a Typescript library (tf-lib) designed to support TensorFlow applications in Angular.

import {
  AfterViewInit,
  Component,
  OnInit,
  ViewChild
} from '@angular/core';

import {
  TSMT$LLSQ,
  ILLSQResult,
  IBagggedLinearFit,
  TSMT$Bllsq,
  TSMT$Pllsq,
  IPolyLLSQResult,
} from '@algorithmist/lib-ts-core';

import * as tf from '@tensorflow/tfjs';

import * as fits from '../shared/misc';

import {
  GraphBounds,
  GraphFunction, 
  QuickPlotDirective
} from '../shared/quick-plot/quick-plot.directive';

import {
  mseLoss,
  sumsqLoss,
  cubicPredict,
  normalize,
  normalizeValue,
  denormalizeValue
} from '@algorithmist/tf-lib';

You can obtain the code for all the CLS libraries in my lib-ts-core library from the repo supplied above.

The line, import * as fits from '../shared/misc', simply imports some type guards used to determine the type of CLS fit:

import {
  ILLSQResult,
  IBagggedLinearFit,
  IPolyLLSQResult
} from '@algorithmist/lib-ts-core';

export function isLLSQ(fit: object): fit is ILLSQResult
{
  return fit.hasOwnProperty('chi2');
}

export function isBLLSQ(fit: object): fit is IBagggedLinearFit
{
  return fit.hasOwnProperty('fits');
}

export function isPLLSQ(fit: object): fit is IPolyLLSQResult
{
  return fit.hasOwnProperty('coef');
}

Now, let’s examine each of the library functions imported from @algorithmist/tf-lib, as this serves to introduce low-level programming with TensorFlow JS.

mseLoss: This is a loss function based on the MSE or Mean-Squared Error metric discussed above.

import * as tf from '@tensorflow/tfjs';

export function mseLoss(pred: tf.Tensor1D, label: tf.Tensor1D): tf.Scalar {
  return pred.sub(label).square().mean();
}

The first item to note is that most TF methods take tensors as an argument and the operation is performed across the entire tensor.

The mseLoss function accepts both a one-dimensional tensor of predictions and a one-dimensional tensor of labels as arguments. The term labels comes from classification or categorical learning, and is a fancy term for what the predictions are compared against.

Let’s back up for a second and review.

The learnable inputs to our ‘model’ are four coefficients of a cubic polynomial.
We are given a set of data points, i.e. (x, y) values, that we wish to fit with a cubic polynomial (which is the function or model for our example).
The predictions are an array of y-coordinates created from evaluating the cubic polynomial at each of the x-coordinates of the supplied training data.
The labels are the corresponding y-values of the original training data.

The mseLoss function subtracts the label from the prediction and then squares the difference to create a positive number. This is the squared error or residual for each data point. The TF mean() method produces the average of the squared errors, which is the definition of the MSE metric. Each of these TF methods operates on a single one-dimensional tensor at a time and each method can be chained. The final result is a scalar.

mseLoss is used to compare one set of predictions vs. another. That comparison is used to assign weights in a network that eventually predicts the value of the four cubic polynomial coefficients.

sumsqLoss: This is another loss or comparative function. Instead of mean-squared error, it computes the sum of the squared error values. This is the function that is minimized in CLS.

import * as tf from '@tensorflow/tfjs';

export function sumsqLoss(pred: tf.Tensor1D, label: tf.Tensor1D): tf.Scalar {
  return pred.sub(label).square().sum();
};

This function also takes predictions and labels (1D tensors) as arguments and produces a scalar result.

cubicPredict: This is a predictor function, i.e. it takes a 1D tensor of x-coordinates, a current estimate of four cubic polynomial coefficients, and then evaluates the cubic polynomial for each x-coordinate. The resulting 1D tensor is a ‘vector’ of predictions for the cubic polynomial.

Before providing the code, it is helpful to discuss the most efficient way to evaluate a polynomial. Most online tutorials evaluate polynomials with redundant multiplications. In pseudo-code, you might see something like

y = c3 * x * x * x;
y += c2 * x * x;
y += c1 * x;
y += c0;

to evaluate the cubic polynomial c0 + c1*x + c2*x² + c3*x³.

A better way to evaluate any polynomial is to use nested multiplication. For the cubic example above,

y = ((c3*x + c2)*x + c1)*x + c0;

The cubicPredict code implements nested multiplication with the TF Core API. The operations could be written in one line, but that’s rather confusing, so I broke the code into multiple lines to better illustrate the algorithm. You will also see a Typescript implementation later in this article.

import * as tf from '@tensorflow/tfjs';

export function cubicPredict(x: tf.Tensor1D, c0: tf.Variable, c1: tf.Variable, c2: tf.Variable, c3: tf.Variable): tf.Tensor1D
{
  // for each x-coordinate, predict a y-coordinate using nested multiplication;
  // tensor operations return new tensors, so each step must be reassigned
  let result: tf.Tensor1D = x.mul(c3).add(c2);
  result = result.mul(x).add(c1);
  result = result.mul(x).add(c0);

  return result;
}

Notice that the polynomial coefficients are not of type number as you might expect. Instead, they are TF Variables. This is how TF knows what to optimize and I will expand on Variables later in the article.
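Nested multiplication is not specific to cubics. As a point of comparison with the tensor version, here is a minimal sketch in plain TypeScript for a polynomial of any degree; hornerEval and naiveEval are hypothetical names used only for this illustration.

```typescript
// Hypothetical helper: evaluate coef[0] + coef[1]*x + ... + coef[n]*x^n
// using nested multiplication (one multiply and one add per coefficient)
function hornerEval(coef: number[], x: number): number {
  let result = 0;
  // Work from the highest-degree coefficient down to the constant term
  for (let i = coef.length - 1; i >= 0; --i) {
    result = result * x + coef[i];
  }
  return result;
}

// Term-by-term evaluation, shown for comparison only
function naiveEval(coef: number[], x: number): number {
  return coef.reduce((sum, c, i) => sum + c * Math.pow(x, i), 0);
}

console.log(hornerEval([1, 2, 3, 4], 2)); // 1 + 2*2 + 3*4 + 4*8 = 49
console.log(naiveEval([1, 2, 3, 4], 2));  // 49
```

The nested form needs only as many multiplications as the degree of the polynomial, while the term-by-term form recomputes powers of x for every term.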

normalize: This function takes an array of numerical arguments, computes the range from minimum to maximum value, and then normalizes them to the specified range. This is how arrays of x- and y-coordinates, for example, are normalized to the interval [-1, 1].

export function normalize(input: Array<number>, from: number, to: number): Array<number>
{
  const n: number = input.length;
  if (n === 0) return [];

  let min: number = input[0];
  let max: number = input[0];

  let i: number;
  for (i = 0; i < n; ++i)
  {
    min = Math.min(min, input[i]);
    max = Math.max(max, input[i]);
  }

  const range: number         = Math.abs(max - min);
  const output: Array<number> = new Array<number>();

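  if (range < 0.0000000001)
  {
    // degenerate case: all inputs are (nearly) equal, so map each one to 'from'
    input.forEach((): void => { output.push(from); });
  }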
  else
  {
    let t: number;
    input.forEach((x: number): void => {
      t = (x - min) / range;
      output.push((1-t)*from + t*to);
    })
  }

  return output;
}

The inverse process, i.e. transform data from say, [-1, 1], back to its original domain is denormalize.

export function denormalize(output: Array<number>, from: number, to: number, min: number, max: number): Array<number>
{
  const n: number = output.length;
  if (n === 0) return [];

  const range: number         = Math.abs(to - from);
  const result: Array<number> = new Array<number>();

  if (range < 0.0000000001)
  {
    let i: number;
    for (i = 0; i < n; ++i) {
      result.push(min);
    }
  }
  else
  {
    let t: number;
    output.forEach((x: number): void => {
      t = (x - from) / range;
      result.push((1-t)*min + t*max);
    })
  }

  return result;
}

Sometimes, we want to normalize or denormalize a single value instead of an entire array.

export function normalizeValue(input: number, from: number, to: number, min: number, max: number): number
{
  const range: number = Math.abs(max - min);

  if (range < 0.0000000001)
  {
    return from;
  }
  else
  {
    const t: number = (input - min) / range;
    return (1-t)*from + t*to;
  }
}

export function denormalizeValue(output: number, from: number, to: number, min: number, max: number): number
{
  const range: number = Math.abs(to - from);

  if (range < 0.0000000001)
  {
    return min;
  }
  else
  {
    const t: number = (output - from) / range;
    return (1-t)*min + t*max;
  }
}

These are just some of the functions in my TF-specific Typescript library. They will all be referenced during the course of the remaining deconstruction.
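One plausible way to use the single-value helpers is to move a new x-coordinate onto the normalized scale, evaluate the polynomial there, and then move the prediction back to the original y scale. The sketch below uses compact stand-ins fixed to the interval [-1, 1] so that it runs on its own; toUnit, fromUnit, predictOriginal, and the coefficient values are hypothetical and are not a fitted result.

```typescript
// Compact stand-ins for normalizeValue/denormalizeValue, fixed to [-1, 1].
// Both assume max > min.
function toUnit(value: number, min: number, max: number): number {
  const t = (value - min) / (max - min);
  return -1 + 2 * t;
}

function fromUnit(value: number, min: number, max: number): number {
  const t = (value + 1) / 2;
  return (1 - t) * min + t * max;
}

// Data ranges quoted earlier in the article
const X_MIN = -6.5, X_MAX = 9.7;
const Y_MIN = -0.25, Y_MAX = 4.25;

// Placeholder coefficients on the normalized scale (illustrative values only)
const unitCoef = [0.1, 0.3, 0.1, 0.8];

// Predict y in original units for an x given in original units
function predictOriginal(x: number): number {
  const xn = toUnit(x, X_MIN, X_MAX);
  const yn = ((unitCoef[3] * xn + unitCoef[2]) * xn + unitCoef[1]) * xn + unitCoef[0];
  return fromUnit(yn, Y_MIN, Y_MAX);
}

console.log(toUnit(1.6, X_MIN, X_MAX)); // the midpoint of [-6.5, 9.7] maps to (approximately) 0
console.log(predictOriginal(1.6));      // approximately 2.225
```

An x-coordinate outside [-6.5, 9.7] maps outside [-1, 1], which is exactly the extrapolation case.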

Writing the Polynomial Regression Application

This client demo was created entirely in the main app component. Layout was extremely simplistic and consisted of a plot area, some information regarding quality of fit, polynomial coefficients, and a select box to compare against various CLS fits of the same data.

Note that a later version of the application also provided an area in the UI to adjust the degree of the TF-fit polynomial (not shown here).

app.component.html

<div style="width: 600px; height: 500px;" quickPlot [bounds]="graphBounds"></div>

<div>
  <div class="controls">
    <span class="smallTxt">RMS Error: {{error$ | async | number:'1.2-3'}}</span>
  </div>

  <div class="controls">
    <span class="smallTxt padRight">Poly Coefs: </span>
    <span class="smallTxt fitText padRight" *ngFor="let coef of coef$ | async">{{coef | number: '1.2-5'}}</span>
  </div>

  <div class="controls">
    <span class="smallTxt padRight deepText">{{dlStatus$ | async}}</span>
  </div>

  <div class="controls">
    <span class="smallTxt padRight">Select Fit Type</span>
    <select (change)="fit($event)">
      <option *ngFor="let item of fitName" [value]="item.name">{{item.label}}</option>
    </select>
  </div>
</div>

Graph bounds are computed by scanning the training data x- and y-coordinates to determine min/max values and then adding a prescribed buffer (in user coordinates). They are computed in the ngOnInit() handler.

this._left   = this._trainX[0];
this._right  = this._trainX[0];
this._top    = this._trainY[0];
this._bottom = this._trainY[0];

const n: number = this._trainX.length;
let i: number;

for (i = 1; i < n; ++i)
{
  this._left  = Math.min(this._left, this._trainX[i]);
  this._right = Math.max(this._right, this._trainX[i]);

  this._top    = Math.max(this._top, this._trainY[i]);
  this._bottom = Math.min(this._bottom, this._trainY[i]);
}

this._left   -= AppComponent.GRAPH_BUFFER;
this._right  += AppComponent.GRAPH_BUFFER;
this._top    += AppComponent.GRAPH_BUFFER;
this._bottom -= AppComponent.GRAPH_BUFFER;

this.graphBounds = {
  left: this._left,
  top: this._top,
  right: this._right,
  bottom: this._bottom
};

The cubic polynomial coefficients are defined as TF Variables. Variables inform TF of the learnable parameters used to optimize the model.

protected _c0: tf.Variable;
protected _c1: tf.Variable;
protected _c2: tf.Variable;
protected _c3: tf.Variable;

Many online demos (which are often copied and pasted from one another) show Variable initialization using a pseudo-random process. The idea is that nothing is known about proper initial values for variables. Since the data is normalized to a small range, initial coefficients in the range [0,1) are ‘good enough.’ So, you will see initialization such as this in many online references,

this._c0 = tf.scalar(Math.random()).variable();
this._c1 = tf.scalar(Math.random()).variable();
this._c2 = tf.scalar(Math.random()).variable();
this._c3 = tf.scalar(Math.random()).variable();

where a native numeric variable is converted into a TF Variable.

In reality, a decision-maker often has some intuition regarding a good initial state for a model. An interactive learning application should provide a means for the decision-maker to express this knowledge. A brief glance at the original data leads one to expect that it likely has a strong linear component and at least one inflection point. So, the cubic component is likely to also be prevalent in the final result.

Just to buck the copy-paste trend, I initialized the coefficients using this intuition.

this._c0 = tf.scalar(0.1).variable();
this._c1 = tf.scalar(0.3).variable();
this._c2 = tf.scalar(0.1).variable();
this._c3 = tf.scalar(0.8).variable();

Initialization to fixed values should lead to a fixed solution, while pseudo-random initialization may lead to some variance in the final optimization.

Learning rate and TF optimizer are defined as follows:

protected _learningRate: number;
protected _optimizer: tf.SGDOptimizer;

The learning rate is initialized to 0.1. This has historically been shown to be a reasonable starting point for regression-style applications.

Recall that TF is trained on normalized data that we wish to differentiate from the original data. TF also operates on tensors, not Typescript data structures. So, TF training data is also defined.

protected _tensorTrainX: tf.Tensor1D;
protected _tensorTrainY: tf.Tensor1D;

TF has no knowledge of or respect for the Angular component lifecycle, so expect interactions with this library to be highly asynchronous and out-of-step with Angular’s lifecycle methods. Plotting occurs in a Canvas, so it can remain happily divorced from Angular’s lifecycle. Everything else in the UI is updated via async pipes. Here is the construction of the application status variable, error information, and the polynomial coefficient display. Each of these (shown in bold) is reflected in the template above.

this._statusSubject = new BehaviorSubject<string>('Training in progress ...');
this.dlStatus$      = this._statusSubject.asObservable();

this._errorSubject = new BehaviorSubject<number>(0);
this.error$        = this._errorSubject.asObservable();

this._coefSubject = new BehaviorSubject<Array<number>>([0, 0, 0, 0]);
this.coef$        = this._coefSubject.asObservable();

The remainder of the on-init handler performs the following actions:

1 — Copy the training x- and y-coordinates into separate arrays and then overwrite them with normalized data in the interval [-1, 1].

2 — Initialize the TF optimizer.

this._optimizer = tf.train.sgd(this._learningRate);

3 — Convert the normalized x- and y-coordinates to tensors,

this._tensorTrainX = tf.tensor1d(this._trainX);
this._tensorTrainY = tf.tensor1d(this._trainY);

4 — Assign graph layers to the QuickPlot directive. There is one layer for the original data (in its natural domain), one for the TF fit, and one for the CLS fit.

@ViewChild(QuickPlotDirective, {static: true})
protected _plot: QuickPlotDirective;
.
.
.
this._plot.addLayer(PLOT_LAYERS.DATA);
this._plot.addLayer(PLOT_LAYERS.TENSOR_FLOW);
this._plot.addLayer(PLOT_LAYERS.LEAST_SQUARES);
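
Step 1 above (overwriting the training coordinates with values in [-1, 1]) can be sketched in plain Typescript as follows. This is only a minimal illustration; normalizeToUnitInterval and its return shape are hypothetical and are not taken from the article's repo, and the sample x-values are made up.

```typescript
// Hypothetical helper (an assumption, not the article's code): linearly
// rescale an array of numbers into [-1, 1] and report the extents used.
interface NormalizedData {
  values: number[];
  min: number;
  max: number;
}

function normalizeToUnitInterval(data: number[]): NormalizedData {
  const min = Math.min(...data);
  const max = Math.max(...data);
  const span = max - min;

  // Guard against constant data, which would otherwise divide by zero.
  const values = data.map((v) => (span === 0 ? 0 : (2 * (v - min)) / span - 1));

  return { values, min, max };
}

// Made-up x-coordinates, purely for demonstration.
const demoTrainX = normalizeToUnitInterval([-6, -2, 0, 4, 9]);
console.log('normalized x:', demoTrainX.values);
console.log('original extents:', demoTrainX.min, demoTrainX.max);
```

Keeping the original min and max alongside the normalized values is a convenience: the same extents are needed later to map results back into the data's natural domain.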

The remainder of the work is performed in the ngAfterViewInit() lifecycle handler. First, the original data is plotted and then TF is asked to optimize the current model.

this._optimizer.minimize(() => mseLoss(cubicPredict(this._tensorTrainX, this._c0, this._c1, this._c2, this._c3), this._tensorTrainY));

Note that mseLoss is the loss function, i.e. the metric by which one solution is deemed better or worse than another. The current predictions for each x-coordinate depend on the current estimate of each of the polynomial coefficients. The cubic polynomial is evaluated (on a per-tensor basis) using the cubicPredict function. The labels, i.e. the values against which TF compares the predictions, are the original y-coordinates (normalized to [-1, 1]).

In pseudo-code, we might express the above line of code as the following steps:

1 — vector_of_predictions = evaluate cubic poly(c0, c1, c2, c3, vector_of_x_coordinates)

2 — Compute MSE of vector_of_predictions vs. normalized_y_coords

3 — Optimize model based on MSE comparison criterion.
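
To make the three pseudo-code steps concrete, here is a plain-number sketch of one gradient-descent update on a cubic with an MSE criterion. It does not use TF at all; evalCubic, meanSquaredError, and gradientStep are illustrative names (assumptions, not the article's cubicPredict or mseLoss), the gradient is written out by hand rather than computed automatically as TF does, and the sample data is made up.

```typescript
// Illustrative names only; none of these come from the article's code.
type Coefs = [number, number, number, number];

// Step 1: evaluate c0 + c1*x + c2*x^2 + c3*x^3 using nested multiplication.
function evalCubic(x: number, c: Coefs): number {
  return ((c[3] * x + c[2]) * x + c[1]) * x + c[0];
}

// Step 2: mean of the squared residuals between predictions and labels.
function meanSquaredError(xs: number[], ys: number[], c: Coefs): number {
  let sum = 0;
  for (let i = 0; i < xs.length; i++) {
    const residual = evalCubic(xs[i], c) - ys[i];
    sum += residual * residual;
  }
  return sum / xs.length;
}

// Step 3: one update c_k <- c_k - rate * dMSE/dc_k, where
// dMSE/dc_k = (2/n) * sum_i residual_i * x_i^k.
function gradientStep(xs: number[], ys: number[], c: Coefs, rate: number): Coefs {
  const grad: Coefs = [0, 0, 0, 0];
  for (let i = 0; i < xs.length; i++) {
    const residual = evalCubic(xs[i], c) - ys[i];
    let xPow = 1;
    for (let k = 0; k < 4; k++) {
      grad[k] += (2 * residual * xPow) / xs.length;
      xPow *= xs[i];
    }
  }
  return [
    c[0] - rate * grad[0],
    c[1] - rate * grad[1],
    c[2] - rate * grad[2],
    c[3] - rate * grad[3],
  ];
}

// Made-up normalized data generated from a known cubic (demonstration only).
const sampleXs = [-1, -0.5, 0, 0.5, 1];
const targetCoefs: Coefs = [0.2, 0.5, -0.3, 0.4];
const sampleYs = sampleXs.map((x) => evalCubic(x, targetCoefs));

let demoCoefs: Coefs = [0.1, 0.3, 0.1, 0.8]; // same starting values as above
console.log('MSE before:', meanSquaredError(sampleXs, sampleYs, demoCoefs));
demoCoefs = gradientStep(sampleXs, sampleYs, demoCoefs, 0.1);
console.log('MSE after one step:', meanSquaredError(sampleXs, sampleYs, demoCoefs));
```

In this sketch a single call to gradientStep applies exactly one update, so the error shrinks a little each time the step is repeated from the current coefficients.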

Once the optimization completes, the sumsqLoss function is used to compute the sum of the squares of the residuals as another measure of fit quality.

let sumSq: tf.TypedArray = sumsqLoss(cubicPredict(this._tensorTrainX, 
this._c0, this._c1, this._c2, this._c3), this._tensorTrainY).dataSync();

The TF dataSync() method synchronously downloads the requested value(s) from the specified tensor. The UI thread is blocked until completion.

The sum-of-squared-errors (SSE) value could be reflected in the UI or simply logged to the console,

console.log('initial sumSq:', sumSq[0]);

It’s also possible to re-optimize, i.e. run the optimization again using the current Variables as starting points for a new optimization. We can see if any improvement is made in the total sum of squares of the residuals.

this._optimizer.minimize(() => mseLoss(cubicPredict(this._tensorTrainX, this._c0, this._c1, this._c2, this._c3), this._tensorTrainY));

sumSq = sumsqLoss(cubicPredict(this._tensorTrainX, this._c0, this._c1, this._c2, this._c3), this._tensorTrainY).dataSync();
console.log('sumSq reopt:', sumSq[0]);

This yields the result shown below.

So, how does this result compare against traditional cubic least-squares? Here is the result.

This is really interesting — CLS (shown in blue) and TF (shown in red) seem to have different interpretations of the data (which is one reason I like to use this dataset for client demonstrations). Recall that CLS is very myopic and optimized for interpolation. There is, in fact, no better interpolator across the original domain of the data. The real question is how does the fit perform for extrapolation?

As it happens, the generated data tends downward as x decreases and upward as x increases outside the original domain. So, in some respects, TF ‘got it right,’ as the TF fit performs much better on out-of-sample data.

Dealing With Multiple Domains

The QuickPlot Angular directive plots functions across the same bounds (i.e. extent of x-coordinate and y-coordinate). The original data and CLS fits are plotted across the same bounds, i.e. x in the interval [-6.5, 9.7] and y in the interval [-0.25, 4.25]. The cubic polynomial, computed by TF, has both x and y restricted to the interval [-1, 1]. The shape of the polynomial is correct, but its data extents do not match the original data. So, how is it displayed in QuickPlot?

There are two resolutions to this problem. One is simple, but not computationally efficient. The other approach is computationally optimal, but requires some math. Code is provided for the first approach and the second is deconstructed for those wishing to delve deeper into the math behind this project.

The QuickPlot directive allows an arbitrary function to be plotted across its graph bounds. It samples x-coordinates from the leftmost extent of the graph to the rightmost extent, and evaluates the supplied function at each x-coordinate.

For each x-coordinate in the original data range, perform the following steps:

1 — Normalize the x-coordinate to the range [-1, 1].
2 — Evaluate the cubic polynomial using nested multiplication.
3 — Denormalize the result back into the original y-coordinate range.

This approach is illustrated in the following code segment.

const f: GraphFunction = (x: number): number => {
  const tempX: number = normalizeValue(x, -1, 1, this._left, this._right);
  const value: number = (((c3*tempX) + c2)*tempX + c1)*tempX + c0;
  return denormalizeValue(value, -1, 1, this._bottom, this._top);
};

this._plot.graphFunction(PLOT_LAYERS.TENSOR_FLOW, 2, '0xff0000', f);

This approach is inefficient in that a normalize/denormalize step is required to move coordinates back and forth to the proper intervals. It is, however, easier to understand and implement.
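
For readers who want to experiment with this approach outside the project, here is a self-contained sketch. The two helpers are hypothetical stand-ins whose argument order merely mirrors the calls shown above; the real normalizeValue and denormalizeValue in the repo may be implemented differently. The graph extents are the ones quoted earlier, and the coefficients are just the initial guesses, not a fitted result.

```typescript
// Hypothetical stand-ins (assumptions): the repo's helpers may differ.

// Map `value` from [inMin, inMax] into the normalized range [outMin, outMax].
function normalizeValueSketch(
  value: number, outMin: number, outMax: number, inMin: number, inMax: number
): number {
  return outMin + ((value - inMin) * (outMax - outMin)) / (inMax - inMin);
}

// Map `value` from the normalized range [inMin, inMax] back into [outMin, outMax].
function denormalizeValueSketch(
  value: number, inMin: number, inMax: number, outMin: number, outMax: number
): number {
  return outMin + ((value - inMin) * (outMax - outMin)) / (inMax - inMin);
}

interface GraphExtents {
  left: number;
  right: number;
  bottom: number;
  top: number;
}

// Build a function of x (in the original domain) from coefficients that
// were fit in normalized coordinates.
function makePlotFunction(c: number[], e: GraphExtents): (x: number) => number {
  return (x: number): number => {
    const tempX = normalizeValueSketch(x, -1, 1, e.left, e.right);
    const value = ((c[3] * tempX + c[2]) * tempX + c[1]) * tempX + c[0];
    return denormalizeValueSketch(value, -1, 1, e.bottom, e.top);
  };
}

// Extents quoted in the article; coefficients are placeholders for demonstration.
const demoExtents: GraphExtents = { left: -6.5, right: 9.7, bottom: -0.25, top: 4.25 };
const demoPlotFn = makePlotFunction([0.1, 0.3, 0.1, 0.8], demoExtents);
console.log('f(0) in original coordinates:', demoPlotFn(0));
```

Every evaluation pays for two extra linear maps, which is the inefficiency mentioned above, but the polynomial itself never has to be re-derived.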

Another approach is to compute cubic polynomial coefficients that are ‘correct’ in the original data domain. In other words, TF computes coefficients for one polynomial, P, such that P(x) accepts values of x in [-1, 1] and produces y-values in [-1, 1].

Define another cubic polynomial, Q, with coefficients a0, a1, a2, and a3 that accepts x-coordinates in the original data’s domain (all real numbers) and produces y-coordinates in the original data’s range (all real numbers).

The coefficients of P(x) are c0, c1, c2, and c3. This information is used to compute a0, a1, a2, and a3. There are four unknowns, which requires four equations to uniquely specify these values.

Take any four unique x-coordinates from the domain of P, say -1, 0, 1/2, and 1. If N denotes the function that maps a value in [-1, 1] back into the original data's extents (the x-extents for x-coordinates and the y-extents for y-values), then compute

x1 = N(-1)

x2 = N(0)

x3 = N(1/2)

x4 = N(1)

Now, evaluate

y1 = N(P(-1))

y2 = N(P(0))

y3 = N(P(1/2))

y4 = N(P(1))

where P(x) = ((c3*x + c2)*x + c1)*x + c0 in nested form. For example, P(0) = c0 and P(1) = c0 + c1 + c2 + c3.

This process produces four equations

a0 + a1*x1 + a2*x1² + a3*x1³ = y1

a0 + a1*x2 + a2*x2² + a3*x2³ = y2

a0 + a1*x3 + a2*x3² + a3*x3³ = y3

a0 + a1*x4 + a2*x4² + a3*x4³ = y4

Since x1, x2, x3, and x4 (as well as y1, y2, y3, and y4) are actual numerical values, the system of equations is linear in the unknowns a0, a1, a2, and a3. This system can be solved using the dense linear equation solver in the repo provided earlier in this article.

This approach requires some math and for some that can be pretty intimidating. However, once the new coefficients for Q are computed, the TF cubic polynomial fit can be efficiently computed for any new x-coordinate without consideration of normalization or denormalization.
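
For those who want to try the second approach, the sketch below builds the four equations and solves them. It is a rough illustration under stated assumptions: solveLinear is a small Gaussian-elimination routine written for this example (a stand-in, not the dense solver from the repo), toOriginalCoefs is a hypothetical name, the mapping from [-1, 1] to the original extents is assumed to be linear, and the coefficients passed in are placeholders rather than a fitted result.

```typescript
// Stand-in solver (assumption): Gaussian elimination with partial pivoting.
function solveLinear(A: number[][], b: number[]): number[] {
  const n = b.length;
  const M = A.map((row, i) => row.concat([b[i]])); // augmented matrix

  for (let col = 0; col < n; col++) {
    // Pick the row with the largest pivot for numerical stability.
    let pivot = col;
    for (let r = col + 1; r < n; r++) {
      if (Math.abs(M[r][col]) > Math.abs(M[pivot][col])) pivot = r;
    }
    const tmp = M[col];
    M[col] = M[pivot];
    M[pivot] = tmp;

    // Eliminate entries below the pivot.
    for (let r = col + 1; r < n; r++) {
      const factor = M[r][col] / M[col][col];
      for (let k = col; k <= n; k++) M[r][k] -= factor * M[col][k];
    }
  }

  // Back-substitution.
  const x: number[] = b.map(() => 0);
  for (let r = n - 1; r >= 0; r--) {
    let s = M[r][n];
    for (let k = r + 1; k < n; k++) s -= M[r][k] * x[k];
    x[r] = s / M[r][r];
  }
  return x;
}

// Hypothetical helper: compute a0..a3 of Q from c0..c3 of P and the extents.
function toOriginalCoefs(
  c: number[], left: number, right: number, bottom: number, topY: number
): number[] {
  const evalP = (t: number): number => ((c[3] * t + c[2]) * t + c[1]) * t + c[0];
  // Linear maps from [-1, 1] back to the original x- and y-extents.
  const mapX = (t: number): number => left + ((t + 1) * (right - left)) / 2;
  const mapY = (v: number): number => bottom + ((v + 1) * (topY - bottom)) / 2;

  const samples = [-1, 0, 0.5, 1]; // four distinct points in the domain of P
  const A = samples.map((t) => {
    const x = mapX(t);
    return [1, x, x * x, x * x * x];
  });
  const rhs = samples.map((t) => mapY(evalP(t)));
  return solveLinear(A, rhs);
}

// Placeholder coefficients and the extents quoted in the article.
const demoQ = toOriginalCoefs([0.1, 0.3, 0.1, 0.8], -6.5, 9.7, -0.25, 4.25);
console.log('a0..a3:', demoQ);
```

Because the maps are linear, Q is still a cubic, so agreement at four distinct points is enough to pin it down everywhere.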

Tidy Up Your Work

TF produces interim tensors during the course of computations that persist unless removed, so it is often a good idea to wrap primary TF computations in a call to tidy(), i.e.

const result = tf.tidy( () => {
  // Your TF code here ...
});

To check the number of tensors currently in use, use a log such as

console.log('# Tensors: ', tf.memory().numTensors);

Tensors returned by the wrapped function pass through tidy and are not disposed.

Variables are not cleaned up with tidy; use the tf.dispose() method instead.

Summary

Yes, that was a long discussion. Pat yourself on the back if you made it this far in one read :)

TensorFlow is a powerful tool and the combination of TF and Angular enables the creation of even more powerful interactive machine-learning applications. If you are not already familiar with the async pipe in Angular, then master it now; it will be your most valuable display tool moving forward with TF/Angular.

I hope you found this introduction helpful and wish you the best with all future Angular efforts!
