DEV Community: Rafael Rocha

Time Series Forecasting through Extreme Learning Machine

Rafael Rocha — Mon, 05 Sep 2022 00:46:05 +0000

Extreme Learning Machine

The most common artificial neural network architecture is the feedforward neural network. In this network, information propagates (flows) in one direction, from the input layer to the output layer.

Extreme Learning Machines (ELMs) are feedforward neural networks that can be used, for example, for regression and classification. The weights between the input layer and the hidden layer are assigned randomly, while the weights between the hidden layer and the output layer are computed (learned) in a single step. This second set of weights is computed with the Moore-Penrose inverse (pseudoinverse) of the hidden layer output matrix.

Feedforward Neural Network

The figure below shows a feedforward neural network, which demonstrates the elements in ELM.

The input layer is composed of the input matrix X of size M x N plus the bias of size M x 1, where M is the number of examples and N (equal to 3 in the image) is the number of features. Next come the randomly assigned weights W1 of size L x (N + 1), where L is the number of neurons in the hidden layer.

The hidden layer output is computed by the following equation:

H = tanh(X_{a}W_{1}^{T})

Where Xa is the concatenation of the bias and the input matrix X, and tanh is the hyperbolic tangent activation function, which limits the output of each neuron to the range between -1 and 1.

The weights W2 are obtained by multiplying the Moore-Penrose inverse of Ha (written with the symbol †) by the target y, as shown in the equation below:

W_{2}^{T} = H_{a}^{\dagger}y

Where Ha is the hidden layer output matrix H with the bias column added. Thus, we can make predictions with the following equation:

y_{pred} = H_{a}W_{2}^{T}

All steps to obtain the parameters W1 and W2 use the training data. With these parameters in hand, we can make predictions on data that was not part of the training process, in this case the test data. The function below shows the one-step learning of ELM, which returns the predictions and the parameters W1 and W2:

import numpy as np

def elm_train(X, y, L, w1=None):

  M = np.size(X, axis=0) # Number of examples
  N = np.size(X, axis=1) # Number of features

  # If w1 is not given, draw it randomly
  if w1 is None:
    w1 = np.random.uniform(low=-1, high=1, size=(L, N+1)) # Weights with bias

  bias = np.ones(M).reshape(-1, 1) # Bias column, reused for both layers
  Xa = np.concatenate((bias, X), axis=1) # Input with bias

  S = Xa.dot(w1.T) # Weighted sum of hidden layer
  H = np.tanh(S) # Activation function f(x) = tanh(x), dimension M x L

  Ha = np.concatenate((bias, H), axis=1) # Hidden layer output with bias

  w2 = (np.linalg.pinv(Ha).dot(y)).T # w2' = pinv(Ha)*y

  y_pred = Ha.dot(w2.T) # Predictions

  return y_pred, w1, w2
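To predict on data outside the training set, the same forward pass can be repeated with the stored weights. The sketch below is an assumption about how that step might look; the name elm_predict is hypothetical and does not come from the original code.

```python
import numpy as np

def elm_predict(X, w1, w2):
    """Forward pass of a trained ELM (hypothetical helper)."""
    M = X.shape[0]                           # Number of examples
    bias = np.ones((M, 1))                   # Bias column
    Xa = np.concatenate((bias, X), axis=1)   # Input with bias
    H = np.tanh(Xa.dot(w1.T))                # Hidden layer output, M x L
    Ha = np.concatenate((bias, H), axis=1)   # Hidden layer output with bias
    return Ha.dot(w2.T)                      # Predictions, M x C
```

Here w1 has shape L x (N + 1) and w2 has shape C x (L + 1), matching what the training function returns.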

Time Series Forecasting

Initially, it is necessary to transform the time series forecasting problem into a machine learning problem. To do that, we rearrange the temporal data into input and target variables so that it can be used by any regression method, including ELM.

For this, we use the concept of lag, which refers to past values of a time series. In our problem, the lag sets the number of features in the input matrix: with a lag of 3, the input matrix has size M x 3, and the three past values of the time series are used as input. The target variable is a later value in the series, which can be the value immediately after the lagged values (one step forward) or a value more steps forward.

To exemplify, consider the example below:

# Time series
series = [3.93, 4.58, 4.8, 5.07, 5.14, 4.94]

# Three lags and one-step forward
X = [[3.93, 4.58, 4.8],
[4.58, 4.8, 5.07],
[4.8, 5.07, 5.14]]
y = [[5.07],
[5.14],
[4.94]]

# Two lags and two-steps forward
X = [[3.93, 4.58],
[4.58, 4.8],
[4.8, 5.07]]
y = [[5.07],
[5.14],
[4.94]]

In this example, series is the time series, and X and y are the input matrix and the target, respectively. The first case uses three lags and one step forward: to predict the value at X(t), the values at X(t-1), X(t-2), and X(t-3) are needed. The second case uses two lags and two steps forward: the values at X(t-2) and X(t-3) are used to predict the value at X(t).

The function below adjusts the time series to a machine learning problem:

def sliding_window(serie, lag=2, step_forward=1):

  M = len(serie) # Length of time series

  X = np.zeros((M-(lag+step_forward-1), lag)) # Input definition
  y = np.zeros((M-(lag+step_forward-1), 1)) # Target definition

  cont = 0
  posinput = lag + cont
  posout = posinput + step_forward

  i = 0
  while posout<=M:

    X[i, :] = serie[cont:posinput]
    y[i] = serie[posout-1]
    cont+=1
    posinput = lag+cont
    posout = posinput + step_forward
    i+=1

  return X, y

The inputs of the function are the time series, the number of lags (lag), and the number of steps forward (step_forward). The outputs are the input matrix of size (M-(lag+step_forward-1)) x lag and the target variable of size (M-(lag+step_forward-1)) x 1.
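As a cross-check of the windowing logic, here is a compact sketch that builds the same X and y with list comprehensions. The name sliding_window_compact is hypothetical; the series values are the ones from the small example earlier in the post.

```python
import numpy as np

def sliding_window_compact(series, lag=2, step_forward=1):
    """Build lagged inputs and targets (illustrative alternative)."""
    series = np.asarray(series, dtype=float)
    n_rows = len(series) - (lag + step_forward - 1)  # Number of examples
    # Each row holds `lag` consecutive past values
    X = np.array([series[i:i + lag] for i in range(n_rows)])
    # Each target lies `step_forward` positions after the last lagged value
    y = np.array([[series[i + lag + step_forward - 1]] for i in range(n_rows)])
    return X, y

series = [3.93, 4.58, 4.8, 5.07, 5.14, 4.94]
X3, y3 = sliding_window_compact(series, lag=3, step_forward=1)
X2, y2 = sliding_window_compact(series, lag=2, step_forward=2)
```

Both calls give three examples, since 6 - (3 + 1 - 1) = 6 - (2 + 2 - 1) = 3.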

Results and Evaluation

To evaluate the predictions of ELM, the mean squared error (MSE) is used, as shown in the equation below:

MSE = \frac{1}{MC}\sum_{i=1}^{M}\sum_{j=1}^{C}(y_{ij}-y_{pred,ij})^{2}

Where M is the number of examples, C is the number of outputs, y is the true target variable and y_pred is the predicted target variable.

The function below computes the MSE:

def mse_function(Y, Y_pred):

  M, C = Y.shape[0], Y.shape[1] # Number of examples (M) and number of outputs (C)

  E = Y-Y_pred # Error between Y true and Y predicted
  mse = np.sum(E**2)/(M*C) # Mean squared error over all examples and outputs

  return mse

Before training the ELM model and obtaining the weights used to make predictions, we must normalize the time series. Min-max normalization is used here, which rescales the input data into a specific range (a, b), in this case (0, 1), as shown in the function below.

def minmax_normalization(X, a=0, b=1):

  xmin, xmax = X.min(), X.max() # Min and max values of data
  X_norm = a + ( (X-xmin)*(b-a)/(xmax-xmin) ) # Normalized data in a new range

  return X_norm, a, b, xmin, xmax

Only the training data is used to obtain the normalization parameters, and with these parameters we normalize both the training and test data. After training the model and making predictions on both sets, we reverse the normalization so that the model performance is evaluated on the original scale.
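Applying the training parameters to new data and reversing the normalization could be sketched as follows. The helper names minmax_apply and minmax_inverse and the toy values are assumptions for illustration, not part of the original code.

```python
import numpy as np

def minmax_apply(X, xmin, xmax, a=0, b=1):
    """Normalize X with parameters taken from the training data."""
    return a + (X - xmin) * (b - a) / (xmax - xmin)

def minmax_inverse(X_norm, xmin, xmax, a=0, b=1):
    """Map normalized values back to the original scale."""
    return xmin + (X_norm - a) * (xmax - xmin) / (b - a)

train = np.array([2.0, 4.0, 6.0])   # Toy training values
test = np.array([5.0, 8.0])         # Toy test values
xmin, xmax = train.min(), train.max()

train_norm = minmax_apply(train, xmin, xmax)
test_norm = minmax_apply(test, xmin, xmax)      # May fall outside (a, b)
test_back = minmax_inverse(test_norm, xmin, xmax)
```

Because the test data is scaled with the training minimum and maximum, normalized test values can lie outside (0, 1), as happens for the value 8.0 here.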

We used the time series of the Brazilian gross domestic product between 1980 and 1997, with monthly frequency (total of 256 months), available on link. The first 80% of the months are used to train the model and the last 20% to test the trained parameters. The figure below shows the time series split into training (blue line) and test (green line) data, where the black dashed line represents the split.
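A chronological split like the one described keeps the order of the observations, so the test data always comes after the training data. A minimal sketch, assuming the series is a one-dimensional array (the function name chronological_split is hypothetical):

```python
import numpy as np

def chronological_split(series, train_fraction=0.8):
    """Split a series into a first part (train) and a last part (test)."""
    series = np.asarray(series, dtype=float)
    split = int(len(series) * train_fraction)  # Index of the first test value
    return series[:split], series[split:]

train_part, test_part = chronological_split(np.arange(10))
```

No shuffling is done, which avoids using future values to predict past ones.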

First, X and y are created with a lag of 2 and one step forward. The trained model with 3 hidden neurons performs well in the time series forecasting, reaching an MSE of 30.99 and 38.74 on the training and test data, respectively. The figure below shows the predictions on the test data as the red dashed line, which is close to the real test data (green line).

When the lag is changed to 3 while keeping one step forward, the model performance worsens, reaching an MSE of 180.83 on the test data. Therefore, for the analyzed time series, increasing the number of past values used to predict one step forward is not a good idea.

Conclusion

Thus, the extreme learning machine is a good starting method for time series forecasting, provided that the time series is first transformed into an input matrix and a target. The ELM is a simple approach that nevertheless obtains good results here, with training done in a single step with the help of the Moore-Penrose inverse.

The complete code is available on Github and Colab. Follow the blog if the post is helpful for you.

Choosing the Number of Components of Principal Component Analysis: An investigation of cumulative explained variance ratio

Rafael Rocha — Sat, 28 May 2022 12:35:25 +0000

Principal Component Analysis

Principal Component Analysis (PCA) is an approach used for dimensionality reduction, which improves data visualization while preserving as much of the information in the original data as possible. The loss of information is assessed through the variance retained by the compressed (projected) data, and the aim is to maximize that variance.

In machine learning, PCA influences the speed of the learning algorithm, since it reduces a high-dimensional set of features (e.g. 10,000) to a lower-dimensional one (e.g. 1,000). So, PCA enables the algorithm to run faster.

The algorithm

PCA needs a preprocessing step before the algorithm itself. The preprocessing normalizes each feature of the data, so that each feature in the dataset has zero mean and unit variance. The code is shown below:

def featureNormalize(X):

  mu = np.mean(X, axis=0) # Mean of each feature

  sigma = np.std(X, axis=0) # Standard deviation of each feature

  X_norm = (X-mu)/sigma # Normalized data (zero mean and unit standard deviation)

  return X_norm

To reduce the data from n dimensions to k dimensions (equivalent to the number of components), the covariance matrix, denoted by Σ, is computed first:

\Sigma =\frac{1}{m}X^{T}X

Where X is the n-dimensional data, m is the number of examples, and Σ is the covariance matrix of size n x n.

Next, the eigenvectors of the covariance matrix are computed through the Singular Value Decomposition (SVD), from which we get the unitary matrix U and the vector of singular values S. Choosing the first k columns of matrix U gives the new matrix Ur, which is used to obtain the reduced data Z of k dimensions, as the equation below shows:

Z=XU_{r}

The Python code to perform the PCA is shown below:

def pca(X, k):

  m = np.size(X, axis=0) # Number of examples

  sigma = (1/m)*X.T.dot(X) # Covariance Matrix

  [U, S, V] = np.linalg.svd(sigma) # Singular Value Decomposition

  Ur = U[:, 0:k] # U reduce

  Z = X.dot(Ur) # Projected data of k-dimensions

  return Z, S

The original data X and the number of components k are the inputs of the function, and the projected data Z and the singular values vector S are its outputs.

Choosing the number of principal components

The core of PCA is the choice of the number of components k, which is investigated through the cumulative explained variance ratio. Evaluating this ratio tells us how much of the information in the original data is preserved. The variance ratio is obtained from the singular values vector S, as shown in the equation below:

\frac{\sum_{i=1}^{k}S_{i}}{\sum_{i=1}^{n}S_{i}}

Where k is the number of components (k dimensions) and n is the number of features. We choose the smallest value of k whose variance ratio is higher than a specific threshold, 99% for example. The code to calculate the cumulative explained variance ratio is shown below:

def cumulativeExplainedVariance(S, k_range):

  variance_ratio = np.zeros(k_range) # Cumulative explained variance ratio

  for i in range(k_range):
      variance_ratio[i] = np.sum(S[0:i+1])/np.sum(S)

  return variance_ratio

The function inputs are the singular value vector S and the range of components investigated, k_range. The output is a vector of the cumulative explained variance ratios.
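The selection rule (smallest k whose cumulative ratio reaches the threshold) can be sketched directly from the ratios. The function name choose_k and the singular values in the usage line are made up for illustration.

```python
import numpy as np

def choose_k(S, threshold=0.99):
    """Return the smallest k whose cumulative ratio reaches the threshold."""
    ratios = np.cumsum(S) / np.sum(S)           # Cumulative explained variance ratio
    k = int(np.argmax(ratios >= threshold)) + 1  # First index that passes, as a count
    return k, ratios

# Hypothetical singular values, not taken from any dataset in this post
k, ratios = choose_k(np.array([90.0, 9.5, 0.4, 0.1]), threshold=0.99)
```

np.argmax on a boolean array returns the position of the first True, which is why it picks the smallest qualifying k. This assumes the threshold is at most 1, so that at least one ratio passes.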

Iris dataset

The iris dataset is used to analyze the choice of the number of components in PCA. A brief description of the dataset and its features is shown in the previous blog post of Fisher’s discriminant.

Investigating the iris dataset for k values in the range 1–3, we obtain the plot of the number of components by the cumulative explained variance ratio below:

It is observed that k = 3 is the smallest value of k whose cumulative explained variance ratio, 0.9948, is higher than the threshold of 0.99 (red dashed line). So, k = 3 is the smallest value that retains enough of the variance of the original data in the projected data. The figure below shows the 3D scatter plot of the projected data from iris data with k = 3.

Nutrients of pizza dataset

The nutrient analysis of pizza dataset is investigated next. The dataset is available on Kaggle and has 300 examples and 7 features distributed in 10 classes. Some of the features are the amount of water and protein per 100 grams in the sample.

Inspecting the nutrients of pizza dataset for k values in the range 1–6, we obtain the plot of the number of components by the cumulative explained variance ratio below:

Checking the plot above, we note that k = 4 is the smallest value of k whose cumulative explained variance ratio, 0.9960, is higher than the threshold of 0.99 (red dashed line). Thus, k = 4 is the smallest value that retains enough of the variance of the original data in the projected data. The figure below shows the plot of feature 1 and feature 2 of the projected data from nutrients of pizza data with k = 4.

Conclusion

Thus, evaluating the cumulative explained variance ratio is a reliable method to choose the number of components in PCA, which performs the dimensionality reduction while preserving as much of the variance of the original data as possible in the projected data.

The complete code is available on Github and Colab. Follow the blog if the post is helpful for you.

If you are interested in dimensionality reduction with Fisher’s linear discriminant, I wrote a blog post about it. If you want to check it out: blog post.

Dimensionality Reduction with Fisher’s Linear Discriminant: An investigation of decision boundaries

Rafael Rocha — Wed, 18 May 2022 12:48:25 +0000

Introduction

In classification problems, each input vector x is assigned to one of K discrete classes Ck. The input space is divided into decision regions whose boundaries are called decision boundaries. Datasets (sets of input vectors) whose classes can be separated exactly by linear decision boundaries are said to be linearly separable.

One of the simplest approaches used in classification problems involves building discriminant functions that assign each vector x to a specific class, denoted Ck.

One way to better visualize the classification through discriminant functions is in terms of dimensionality reduction. This blog post uses Fisher's linear discriminant, which takes the D-dimensional input vector x and projects it down to a vector of smaller dimension.

Dataset

The dataset used is the Iris dataset, which contains 3 classes (Setosa, Versicolour and Virginica) of 50 examples each, where each class refers to a type of iris plant. The dataset has 4 attributes or features (in cm): sepal length, sepal width, petal length, and petal width. The figure below shows three rows of the dataset, one from each class, where Setosa, Versicolour and Virginica are assigned as class 0, 1 and 2, respectively.

Fisher's linear discriminant

Fisher's linear discriminant takes the D-dimensional input vector x and projects it down to an R-dimensional vector, using the equation below:

y=w^{T}x

Where x is the input vector, w is the weight vector and y is the projected vector of smaller dimension. Since a whole dataset is used and not just one input vector, x, w and y become the matrices X, W and Y, respectively.

The iris dataset has D = 4 dimensions since it has 4 features, thus, R must be smaller than 4.

The purpose of Fisher's linear discriminant is to find the matrix W of size D x R. First, the matrix SW, called the within-class covariance matrix, is obtained by the following equation:

S_{W}=\sum_{k=1}^{K}S_{k}

Where

S_{k}=\sum_{n\in C_{k}}\left ( x_{n}- m_{k} \right)\left ( x_{n}- m_{k} \right )^T

m_{k}=\frac{1}{N_{k}}\sum_{n\in C_{k}}x_{n}

The mean of the examples of a specific class is given by mk, and Nk is the number of examples in class Ck.

Next, the between-class covariance matrix SB is obtained by the equation below:

S_{B}=\sum_{k=1}^{K}N_{k}\left ( m_{k}-m \right)\left ( m_{k}-m \right )^T

Where m is the mean of the total dataset.

The values of matrix W are determined by the eigenvectors of (SW^-1)SB that correspond to the R largest eigenvalues.

The code

The code is written in Python, and two loops are the core of Fisher's linear discriminant. These loops are used to get both the SW and SB matrices, which later determine W through the eigenvectors of (SW^-1)SB. The first loop is shown below:

for k in Ck:
  Xc = X[labels==k]
  Sk.append(np.cov(Xc.T))
  mk.append(np.mean(Xc, 0))
  Nk[k] = len(Xc)

The initial line in the loop gets the examples of the specific class Ck. Next, the matrix Sk is obtained, which is used to compute SW later. The next two lines get mk and Nk, which are used to calculate SB in the next loop.

The second loop is shown below. The first line obtains the within-class covariance matrix SW, and the subsequent lines are used to get the between-class matrix.

for k in Ck:
  SW += Sk[k]
  temp = mk[k]-m
  temp.shape = (D, 1)
  SB += np.dot(Nk[k], temp*temp.T)

Lastly, we obtain the matrix W through the eigenvectors of (SW^-1)SB, as shown in the code below:

invSw = np.linalg.pinv(SW)
invSw_by_SB = np.dot(invSw, SB)

eigenvalues, eigenvectors = np.linalg.eig(invSw_by_SB)

sort_eigval_index = np.argsort(eigenvalues)[::-1] # Indices of eigenvalues in descending order

W = eigenvectors[:, sort_eigval_index[0:R]] # Weight matrix

Y = np.dot(W.T, X.T)
Y = Y.T # Projected data

Initially, the (SW^-1)SB is calculated, followed by obtaining eigenvalues and eigenvectors of this result. Thus, W is obtained by the first R columns of eigenvectors in descending order of the eigenvalues.
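For reference, the pieces above can be put together in one self-contained function. This is only a sketch under stated assumptions: the name fisher_lda is hypothetical, Sk is computed as the scatter sum from the equation (np.cov would additionally divide by Nk - 1), only the real part of the eigen-decomposition is kept, and the data is synthetic rather than the Iris dataset.

```python
import numpy as np

def fisher_lda(X, labels, R):
    """Project X (N x D) onto the R leading Fisher directions."""
    D = X.shape[1]
    m = X.mean(axis=0)                      # Mean of the total dataset
    SW = np.zeros((D, D))                   # Within-class matrix
    SB = np.zeros((D, D))                   # Between-class matrix
    for k in np.unique(labels):
        Xc = X[labels == k]                 # Examples of class k
        mk = Xc.mean(axis=0)                # Class mean
        centered = Xc - mk
        SW += centered.T.dot(centered)      # Sk as a sum of outer products
        diff = (mk - m).reshape(D, 1)
        SB += len(Xc) * diff.dot(diff.T)    # Nk * (mk - m)(mk - m)^T
    eigenvalues, eigenvectors = np.linalg.eig(np.linalg.pinv(SW).dot(SB))
    order = np.argsort(eigenvalues.real)[::-1]   # Descending eigenvalues
    W = eigenvectors[:, order[:R]].real          # Weight matrix, D x R
    return X.dot(W), W

# Synthetic data: 3 classes, 20 examples each, 4 features (illustrative only)
rng = np.random.default_rng(0)
X_demo = np.concatenate([rng.normal(loc=c, scale=0.5, size=(20, 4)) for c in (0.0, 2.0, 4.0)])
labels_demo = np.repeat([0, 1, 2], 20)
Y_demo, W_demo = fisher_lda(X_demo, labels_demo, R=2)
```

With equal class sizes, scaling each Sk by a constant only rescales SW, so the eigenvectors are the same as with the covariance-based version.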

Results

The figure below shows the scatter plot of petal length and petal width. Analyzing those two features, it is difficult to visualize the decision boundaries between the three classes, specifically between Versicolour and Virginica.

Applying Fisher’s linear discriminant with R = 2 to the Iris dataset, we obtain the projected data with 2 features, as shown in the figure below. With the two new features, a better distinction between the three classes is observed, which improves the visualization of the decision boundaries.

To exemplify, performing the dimensionality reduction with R = 1, we also achieve good decision boundaries between the classes.

Thus, Fisher’s linear discriminant is a good way to investigate the decision boundaries in classification tasks, like the Iris dataset, through dimensionality reduction.

The complete code is available on Github and Colab. Follow the blog if the post is helpful for you.

Prevent the Overfitting through Regularization: An example by Ridge Regression

Rafael Rocha — Sat, 07 May 2022 13:32:27 +0000

Description

My post on Polynomial Curve Fitting discussed that adding more examples is one of the possible ways to prevent overfitting, the phenomenon shown in the figure above, where there is a gap between the training (lower) and validation (higher) errors.

Another approach used to control overfitting is regularization, which involves adding a penalty term to the error function to discourage the coefficients from reaching large values, as Bishop introduces in the Pattern Recognition and Machine Learning book.

This post continues the polynomial curve fitting analysis, but with regularization, using Ridge Regression instead of Linear Regression.

Regularization

To apply regularization to the previous analysis, it is necessary to modify the Sum-of-Squares Error (SSE) function by adding a penalty term weighted by the regularization parameter λ, as shown in the equation below:

\frac{1}{2}\sum_{n=1}^{N}\left ( y_{pred}-y \right )^{2}+\frac{\lambda}{2}\left\| w \right\|^{2}

Where ||w||² is equivalent to w.T * w, and the coefficient λ controls the relative importance of the regularization term compared with the SSE term.

As before, instead of using an optimization algorithm such as gradient descent, the adapted normal equation is used to obtain the coefficients w, as shown below:

W=\left ( X^{T}X + \lambda I \right )^{-1}X^{T}y

Where λ is the regularization parameter, I is the identity matrix of size M + 1 and M is the order of the polynomial. The coefficients obtained through normal equation are given by the function below:

def normal_equation_ridge(x, y, M, L):

  # Normal equation: w = ((x'*x + L*I)^-1) *x'*y

  I = np.identity(M+1)

  xT = x.T
  w = np.dot(np.dot(np.linalg.pinv(np.dot(xT, x) + L*I), xT), y)

  return w

Choosing the regularization parameter

To exemplify regularization, we used the overfitted model of M = 9, which obtained a Root-Mean Square Error (RMSE) of 0.0173 and 6.1048 in the training and validation sets, respectively. Values of the regularization parameter were investigated on a natural-log scale, over the range -40 ≤ ln(λ) ≤ 0: for each value L in this range, λ = exp(L) is passed to the function above. The RMSE of the validation set is used to choose the parameter λ. The figure below shows the analysis done to choose the parameter λ.

For the value ln(λ) = -40, in the figure above, the RMSE is approximately the value without regularization (6.1042), since λ tends to zero (λ = exp(-40) = 4.2483e-18). The best parameter found (red dashed) is ln(λ) = -11.43 (λ = exp(-11.43) = 1.0880e-5), reaching an RMSE of 0.1218 on the validation set, while the training set got 0.0637.
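The search over ln(λ) could be sketched as below. Everything about the data here is an assumption for illustration (sample sizes, noise level, random seed, grid step), so the numbers it produces will not match the ones reported in this post.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Regularized normal equation: w = (X'X + lam*I)^-1 X'y."""
    I = np.identity(X.shape[1])
    return np.linalg.pinv(X.T.dot(X) + lam * I).dot(X.T).dot(y)

def rmse(y, y_pred):
    """Root-mean-square error."""
    return np.sqrt(np.mean((y_pred - y) ** 2))

rng = np.random.default_rng(0)
M = 9                                            # Polynomial order
x_train = np.linspace(0, 1, 10)
x_val = rng.uniform(0, 1, 10)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.1, 10)
y_val = np.sin(2 * np.pi * x_val) + rng.normal(0, 0.1, 10)

# Polynomial features with columns x^0, x^1, ..., x^M
X_train = np.vander(x_train, M + 1, increasing=True)
X_val = np.vander(x_val, M + 1, increasing=True)

ln_lambdas = np.linspace(-40, 0, 41)             # Grid over ln(lambda)
val_errors = [rmse(y_val, X_val.dot(ridge_fit(X_train, y_train, np.exp(l))))
              for l in ln_lambdas]
best_ln_lambda = ln_lambdas[int(np.argmin(val_errors))]
best_rmse = min(val_errors)
```

The grid is defined on ln(λ) rather than on λ because useful values of λ span many orders of magnitude.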

The table below compares the coefficient values for ln(λ) = -∞ and ln(λ) = -11.43. Note that ln(λ) = -∞ corresponds to the model without regularization and ln(λ) = -11.43 the model that has the smallest validation error with regularization. It is possible to notice that the coefficients of ln(λ) = -∞ are large, while the values ln(λ) = -11.43 are smaller due to the addition of the penalty term.

Polynomial order with regularization

As in the post on Polynomial Curve Fitting, the error analysis is performed on the training and validation sets by the order of the polynomial, but now with regularization.

The figure below shows the training and validation RMSE by the order of the polynomial, where the prevention of overfitting due to the use of regularization in each analyzed order (M= [0, 1, 3 and 9]) is noted.

The complete code is available on Github and Colab. Follow the blog if the post is helpful for you.

Follow me on Linkedin and Github.

Curve Fitting: An explanation of key concepts of machine learning

Rafael Rocha — Sat, 30 Apr 2022 01:37:19 +0000

Description

This post presents a simple regression problem through Polynomial Curve Fitting analysis. It also explains some key concepts of machine learning, such as generalization, overfitting, and model selection. The post is inspired by the Pattern Recognition and Machine Learning book by Christopher M. Bishop.

Data generation

Both training and validation sets are synthetically generated. The input variable X is generated from uniformly spaced random values in the range [0, 1]. The output variable (or target data) y is generated from the generating function sin(2πx) with added random noise. The cover figure presents the curves of the target data (training set) and the generating function for N = 10 training examples.
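A possible way to generate such data is sketched below. The function name generate_data, the use of evenly spaced inputs, the Gaussian noise and its standard deviation are all assumptions, since the post does not state them.

```python
import numpy as np

def generate_data(N, noise_std=0.1, seed=0):
    """Generate N noisy samples of sin(2*pi*x) on [0, 1] (illustrative)."""
    rng = np.random.default_rng(seed)
    x = np.linspace(0, 1, N)                               # Assumed: evenly spaced inputs
    y = np.sin(2 * np.pi * x) + rng.normal(0, noise_std, N)  # Assumed: Gaussian noise
    return x.reshape(-1, 1), y.reshape(-1, 1)              # Column vectors of size N x 1

x_demo, y_demo = generate_data(10)
```

Using a different seed for the validation set gives data from the same generating function but with independent noise.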

Polynomial features

First, the input variable X (which represents one single feature) is transformed into polynomial features (X_poly), according to the function below:

def poly_features(x, M):

  N = len(x)

  x_poly = np.zeros([N, M+1], dtype=np.float64)
  for i in range(N):
    for j in range(M+1):
      x_poly[i, j] = np.power(x[i], j)

  return x_poly

Thus, the column vector X of size N x 1 results in an N x (M + 1) matrix, where M is the order of the polynomial. For example, using M = 2 and x = 0.3077 results in three features: 0.3077^0 = 1, 0.3077^1 = 0.3077 and 0.3077^2 ≈ 0.0947.

The figure below presents the polynomial features of three examples of the training set.
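The same feature matrix can also be built without explicit loops. The sketch below uses np.vander with increasing=True, which orders the columns as x^0, x^1, ..., x^M; the name poly_features_vectorized is hypothetical.

```python
import numpy as np

def poly_features_vectorized(x, M):
    """Polynomial features of order M for a 1-D input (loop-free variant)."""
    x = np.asarray(x, dtype=np.float64).ravel()   # Flatten N x 1 input to length N
    return np.vander(x, M + 1, increasing=True)   # Columns: x^0, x^1, ..., x^M

features = poly_features_vectorized([0.3077], 2)
```

For x = 0.3077 and M = 2 this gives one row with the values 1, 0.3077 and 0.3077 squared.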

Training

The next step is to train the model (with the training data), that is, to find the coefficients W that, multiplied by the polynomial features, allow us to make predictions y_pred. There are optimization algorithms, like gradient descent, that minimize the cost function, in this case the error between the real value y and the predicted value y_pred. Here, the normal equation is used to obtain W, as given below:

W=\left ( X^{T}X \right )^{-1}X^{T}y

Where X is the polynomial features matrix of size N x (M+1), y is the output variable of size N x 1, and W is the vector of coefficients of size (M+1) x 1. The coefficients obtained through the normal equation are given by the function below:

def normal_equation(x, y):

    # Normal equation: w = ((x'*x)^-1) *x'*y

    xT = x.T
    w = np.dot(np.dot(np.linalg.pinv(np.dot(xT, x)), xT), y)

    return w

Measuring the error

With the coefficients W in hand, it is necessary to measure the error between the real value (output variable y) and the predicted value y_pred. The Root-Mean Square Error (RMSE), obtained through the Sum-of-Squares Error (SSE), is used to evaluate the performance of the trained model. The equations below define these errors:

SSE = \frac{1}{2}\sum_{n=1}^{N}\left ( y_{pred}-y \right )^{2}

RMSE = \sqrt{\frac{2SSE}{N}}

Where N is the number of examples. The functions below calculate these errors:

def error(y, y_pred):

    N = len(y)

    # Sum-of-squares error
    sse = np.sum(np.power(y_pred-y, 2))/2

    # Root mean square error
    rmse = np.sqrt(2*sse/N)

    return sse, rmse

Validation

The next step is to evaluate the validation set, which is data that was not used in the model training. Here it is verified whether the model fits the validation set. Most steps are the same as in training, such as getting the polynomial features, making predictions and measuring the error.

Polynomial order analysis

The figure below shows four different orders of the polynomial (M = 0, 1, 3, 9). The green curve represents the generating function, the blue dots represent the target data (training data, X and y), and the red curve shows the predicted data (y_pred). The worst predictions are given by M = 0, followed by M = 1. M = 3 and M = 9 achieve the best predictions, that is, they are the models that best fit the training set.

Error analysis

Analyzing the RMSE (in the figure below), there is a tendency for the error to decrease in both training and validation between M = 0 and M = 3. But for M = 9, one notices a much bigger error in the validation set, that is, as the order of the polynomial grows it becomes more difficult for the model to predict examples outside the training set. In short, the model lacks the generalization ability. The behavior of M = 9 can be called overfitting.

The order of the polynomial helps in model selection. The choice here is M = 3, as it is the model that has the smallest validation error.

One of the possible ways to prevent overfitting is to add more examples. In this case, generating training and validation sets with N = 50, we obtain the RMSE by order of the polynomial as shown in the figure below:

Thus, when more examples are added, it is possible to prevent overfitting of the M = 9 model.

The complete code is available on Github and Colab. Follow the blog if the post is helpful for you.
Follow me on Linkedin and Github.

Automata Equivalence: Automata without epsilon transitions, inaccessible and useless states

Rafael Rocha — Sun, 24 Apr 2022 20:02:54 +0000

Description

This post demonstrates a way to obtain a Deterministic Finite Automaton (DFA) without ε (epsilon) transitions or inaccessible and useless states, equivalent to the automaton in the cover.

The alphabet of the automaton is {a, b, ε}, q0 is the initial state, q3 and q6 are final states, and the ε transitions allow the automaton to change its state without using an input symbol.

The first step is to put the automaton in tabular notation (a transition table), as shown below. The rightwards arrow (→) indicates initial states, the leftwards arrow (←) indicates final states and the left right arrow (↔) indicates that a state is both initial and final.

For example, the state q0 has transitions to both q1 and q4 through ε. The state q5 has a transition to itself through a, and to q6 through the symbol ε.

Eliminating ε Transitions

Now, it’s necessary to analyze each state and merge it with each state reached by an ε transition, until the analyzed state does not have any ε transition.

As seen above, the first merge (analyzed state q0 and {q0, q4}) achieves transitions to states q1, q4 and {q2, q5} through a, b and ε, respectively. The second merge tries to eliminate the new ε transitions {q2, q5} but results in new ε transitions {q3, q6}. The third merge eliminates the ε transitions: state q0 is now also a final state (←), because it merges with the final state q6, and it has transitions to {q1, q3, q5} and {q2, q4, q6} through a and b, respectively.

After analyzing all the states, we have the following transition table as a result:
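The repeated merging along ε transitions is closely related to computing the ε-closure of a state, that is, the set of states reachable using only ε transitions. A minimal sketch follows; the function name and the toy transition dictionary are hypothetical and do not reproduce the automaton of this post.

```python
def epsilon_closure(state, transitions):
    """Return all states reachable from `state` through ε transitions only."""
    closure = {state}
    stack = [state]
    while stack:
        current = stack.pop()
        # transitions maps (state, symbol) to a set of target states
        for target in transitions.get((current, 'ε'), set()):
            if target not in closure:
                closure.add(target)
                stack.append(target)
    return closure

# Toy automaton for illustration only
toy_transitions = {
    ('p0', 'ε'): {'p1'},
    ('p1', 'ε'): {'p2'},
    ('p1', 'a'): {'p1'},
}
closure_p0 = epsilon_closure('p0', toy_transitions)
```

A state whose closure contains a final state becomes final itself, which is the reason q0 ends up final in the walkthrough above.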

Non-Determinism

The next step is to remove the non-determinism from the automaton. The non-determinism can be seen in the q0 transitions (table above): for example, when receiving a, three moves can be performed (to q1, q3, and q5). Thus, it is necessary to remove the non-determinism so that, when receiving an input, the automaton has only one possible path to follow.

For this, each set of transitions ({q1, q3, q5}) is marked as a new state (q1q3q5), and the transitions of this new state are given by merging the transitions of the states that form it. If one of the states in the set is final, the new state is also final. If non-determinism continues, that is, if a new set of transitions appears that is not yet a state, the process continues. The table below shows the result of removing non-determinism.

Accessible and Useless States

The next steps check whether the states are accessible and then whether they are useless. To check accessibility, the analysis starts with the initial state q0 (initially accessible, marked as ok in the accessible field). We check which states are reachable from it (q1q3q5 and q2q4q6), mark them as accessible, and finish the analysis of q0 by marking it as considered. Then we continue analyzing the states marked as accessible but not yet considered, until all accessible states have been found and analyzed (marked as ok in the considered field).

The analysis of useless states only considers accessible states. This time we start from one of the final states (marked as useful), for example q3q5. We check which states reach it, in this case only q6 (marked as useful), and we mark q3q5 as considered. The process continues with q6, which is not reached by any state other than itself, so we mark it as considered. We then check whether there is another final state, and the process continues until there is none. In the end, we find that all accessible states are useful, so no changes are made. After both analyses, the states q1, q2, q4, q5, q1q3, and q4q6 are removed.
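The accessibility check is a graph search from the initial state. A minimal sketch, assuming a deterministic transition table stored as a dictionary of dictionaries (the function name and the toy table are hypothetical, not the automaton of this post):

```python
from collections import deque

def accessible_states(initial, transitions):
    """Return the set of states reachable from the initial state."""
    accessible = {initial}        # States marked as accessible
    queue = deque([initial])      # Accessible states not yet considered
    while queue:
        state = queue.popleft()   # Mark this state as considered
        for target in transitions.get(state, {}).values():
            if target not in accessible:
                accessible.add(target)
                queue.append(target)
    return accessible

# Toy transition table for illustration only: state -> {symbol: next state}
toy_dfa = {
    's0': {'a': 's1', 'b': 's2'},
    's1': {'a': 's1'},
    's2': {},
    's3': {'a': 's0'},            # No state leads to s3, so it is inaccessible
}
reachable = accessible_states('s0', toy_dfa)
```

The queue plays the role of the "accessible but not yet considered" marks used in the manual procedure.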

Renaming the states as follows:

q1q3q5 = q1
q2q4q6 = q2
q2q6 = q3
q3q5 = q4
q3 = q5
q6 = q6

We obtain the tabular form of the automaton without ε transitions, inaccessible and useless states and equivalent to the initial automaton:

Final Automaton

Converting from tabular notation (above) to automaton in state representation we get:

This automaton recognizes the following language:

L = a*b*a* + b*a*b*

This language is the same recognized by the initial automaton.

This post presented a way to obtain an equivalent deterministic finite automaton without ε transitions or inaccessible and useless states. Share if it’s helpful for you.

Validation mask through Regex with Python

Rafael Rocha — Fri, 22 Apr 2022 15:26:39 +0000

Introduction

Regular expressions (or regex) are a robust approach used to analyze whether a string belongs to a certain language, and they are used in the validation of several fields, such as email and password, as will be seen here. The Python library re was used to validate each field.

A regex is a composition of symbols, that is, characters with special functions which, grouped among themselves and with literal characters, form the expression. The regex is interpreted as a rule, which indicates success if the string matches all of its conditions.

To build the validation masks of email and password, the following alphabets are considered: Σ = {a, b, c, …, z}, Γ = {A, B, C, …, Z} and N = {0, 1, 2, …, 9}.

Validation Masks

The strings accepted by the email field have symbols from Σ, must contain the symbol @, and must end with .br, with at least one symbol of Σ between @ and .br. Furthermore, the email must not start with the @ symbol.

The password field may contain symbols of all alphabets Σ, Γ and N. At least one symbol of Γ and one of N are required. Besides, the password must have a length of exactly 8.

The regex for the email validation mask is given as follows:

^[a-z]+@[a-z]+\.br$

In this expression, the character ^ anchors the match at the start of the string, [a-z] matches symbols from a to z, and + ensures at least one symbol of Σ both before and after @. Finally, $ ensures that the string ends with .br.
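One detail worth checking is the dot before br: in Python's re syntax an unescaped . matches any character except a newline, so a literal dot has to be written as \. to be strict. The sketch below compares the two variants; the sample addresses are made up for illustration.

```python
import re

unescaped = r"^[a-z]+@[a-z]+.br$"   # "." matches any character
escaped = r"^[a-z]+@[a-z]+\.br$"    # "\." matches only a literal dot

# Hypothetical sample strings
samples = ["gamora@mail.br", "gamora@mailxbr", "@mail.br"]

results = {}
for email in samples:
    results[email] = (bool(re.match(unescaped, email)), bool(re.match(escaped, email)))
    print(email, results[email])
# gamora@mail.br (True, True)
# gamora@mailxbr (True, False)
# @mail.br (False, False)
```

Only the escaped variant rejects gamora@mailxbr, where the x sits in the position of the dot.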

The password regex is shown below:

(?=.*\d)(?=.*[A-Z])[a-zA-Z0-9]{8}$

The expression (?=.*\d) ensures at least one symbol of N. In the same way, the expression (?=.*[A-Z]) ensures at least one symbol of Γ. These two expressions are lookaheads, so it does not matter where the symbols of N and Γ appear in the password. Lastly, the expression [a-zA-Z0-9]{8} allows only symbols of the three alphabets and, together with $, requires the string to have a length of exactly 8.
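The behavior of the password mask can be checked with a few strings. This is a minimal sketch: the function name is_valid_password and the sample passwords are illustrative, not taken from the original Colab.

```python
import re

PASSWORD = re.compile(r"(?=.*\d)(?=.*[A-Z])[a-zA-Z0-9]{8}$")

def is_valid_password(password):
    # re.match anchors the pattern at the start; "$" anchors it at the end
    return PASSWORD.match(password) is not None

# Hypothetical sample passwords
samples = ["Groot123", "groot123", "GrootTree", "Groot1234", "Groot_12"]
for password in samples:
    print(password, is_valid_password(password))
# Groot123 True     (8 symbols, one uppercase, one digit)
# groot123 False    (no symbol of Γ)
# GrootTree False   (no symbol of N, and 9 symbols)
# Groot1234 False   (9 symbols)
# Groot_12 False    ("_" is not in any of the alphabets)
```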

Example

The match function of Python's re library is used to test the regex of each field. The first argument of match is the regex, and the second is the string to be tested. An application of re is shown below.

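import re

# Raw string with an escaped dot, so that only a literal "." matches before "br"
regex = r'^[a-z]+@[a-z]+\.br$'
string = 'whoisgamora@.br'

# re.match returns a match object on success and None otherwise
if re.match(regex, string):
    print('Valid email!')
else:
    print('Invalid email!')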

This example prints the message Invalid email! because there is no symbol of Σ between @ and .br.

A complete application of the validation mask through regex can be tested in Colab. Other fields, such as first and last name and telephone number, are also tested there.

Contact Tracing through Computer Vision

Rafael Rocha — Sun, 03 Apr 2022 00:23:37 +0000

Introduction

Due to the pandemic situation experienced since the beginning of 2020, several strategies have emerged for tracing the contacts of people infected with the SARS-CoV-2 virus.

Here, a draft of contact tracing through computer vision is presented. It considers a closed work environment (an office, for example), with a limited number of people and with cameras available at specific locations to perform the analysis. This example aims to detect people through face detection and, based on the Euclidean distances (in pixels) between the faces, to evaluate and generate an alert informing that a given person was close to, or had contact with, a person who has the virus.

Step 1

The first step of the application aims to detect all faces in a given scene/image. For this, the Viola-Jones algorithm is used, together with an XML file trained with Haar features for frontal face detection.

From the bounding boxes generated by the face detection, their center points are computed and then used to obtain the Euclidean distance between the faces in the image.
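The center point computation can be sketched as follows. This assumes each bounding box is given as (x, y, w, h), with (x, y) the top-left corner in pixels, which is the format returned by OpenCV's detectMultiScale. The function name box_center and the box values are hypothetical.

```python
def box_center(box):
    """Return the center (cx, cy) of a bounding box given as (x, y, w, h)."""
    x, y, w, h = box
    return (x + w / 2, y + h / 2)

# Hypothetical bounding boxes, not taken from the post's image
boxes = [(100, 80, 60, 60), (400, 90, 50, 50)]
centers = [box_center(box) for box in boxes]
print(centers)  # [(130.0, 110.0), (425.0, 115.0)]
```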

Step 2

The second step aims to classify, through a convolutional neural network, the faces detected in the previous step. The actual goal would be to identify the people (faces) in the image, thus enabling contact tracing. The VGGFace dataset, which has labeled faces of 8631 celebrities, was used. For classification, the pre-trained ResNet50 model was used.

The classification results are shown below:

Detection 1
Edinson Cavani: 98.043%, Juliano Cazarre: 0.075%, Tino Costa: 0.075%

Detection 2
Anthony Ogogo: 99.913%, Brahim Asloum: 0.007%, Scott Quigg: 0.006%

Detection 3
Sharon Stone: 97.929%, Noelle Reno: 0.985%, Tina Maze: 0.092%

The results show that all three people were classified correctly, with probabilities above 95%.

Step 3

Finally, the last step obtains the Euclidean distances between the combinations of people (faces) detected in the image. The table below shows the Euclidean distances between the people in the image.

Considering a distance threshold of 500 pixels between people, and considering Edinson Cavani as having the virus, the person who had the closest contact with him is Sharon Stone, so this person needs to be alerted to take the appropriate measures and precautions.
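The distance and threshold logic can be sketched with the standard library only. The names, coordinates, and the function close_contacts are hypothetical, and the strict comparison against the threshold is an assumption, since the post does not say whether the boundary is included.

```python
from itertools import combinations
from math import dist

def close_contacts(centers, infected, threshold=500):
    """Return the people whose face center is closer than `threshold` pixels to `infected`."""
    return [
        name
        for name, center in centers.items()
        if name != infected and dist(center, centers[infected]) < threshold
    ]

# Hypothetical face centers in pixels, not the values from the post
face_centers = {
    "person_a": (130, 110),
    "person_b": (430, 410),
    "person_c": (1200, 150),
}

# Euclidean distance for every pair of detected people
distances = {
    pair: dist(face_centers[pair[0]], face_centers[pair[1]])
    for pair in combinations(face_centers, 2)
}
for pair, value in distances.items():
    print(pair, round(value, 1))

alerts = close_contacts(face_centers, "person_a")
print(alerts)  # ['person_b']
```

Here person_b is about 424 pixels away from person_a and would be alerted, while person_c is more than 1000 pixels away.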

The code with the draft of contact tracing through computer vision can be found at: https://github.com/rlrocha/pdi. A more detailed walkthrough of the code is available at: https://youtu.be/RJd7ufq_tZU. Follow the blog if the post is useful to you.