The Basics: Training Your First Model ✨🎨
Welcome to this tutorial, in which you will train your first Machine Learning model. We will try to keep things simple here and introduce only the basic concepts.
The problem we will solve is converting Celsius to Fahrenheit, where the formula is:
$$ F = C\times 1.8 + 32 $$
Note: it would be easy enough to write a Python function that performs this calculation directly (Traditional Software Development):
def CtoF(C):
    F = C * 1.8 + 32
    return F
But that wouldn't be machine learning. The main goal of this Notebook is to show the main difference between the two approaches: ML and Traditional Software Development (you can find more in this link).
Let's start: to build our ML model, we will give TensorFlow some sample Celsius values (-40, -10, 0, 8, 15, 22, 38) (called Input Data) and their corresponding Fahrenheit values (-40, 14, 32, 46, 59, 72, 100) (called Output Data).
Then we will train a model (with this Training Dataset) that figures out the above formula through the training process.
Import Packages
First, to keep things simple, we import TensorFlow as tf for ease of use.
Next, we import NumPy as np: NumPy helps us represent our data as highly performant arrays.
import tensorflow as tf
tf.get_logger().setLevel('ERROR')  # only show TensorFlow error messages (tf.logging was removed in TF 2)
import numpy as np
Set up training data
As we know, supervised machine learning essentially consists of finding an algorithm that maps a set of inputs to their corresponding outputs.
As the objective of this tutorial is to create a model that converts temperatures in degrees Celsius to degrees Fahrenheit, we create two arrays, celsius_q and fahrenheit_a, that we can use to train our model.
celsius_q = np.array([-40, -10, 0, 8, 15, 22, 38], dtype=float)
fahrenheit_a = np.array([-40, 14, 32, 46, 59, 72, 100], dtype=float)
# Print each training pair side by side
for i, c in enumerate(celsius_q):
    print("{} degrees Celsius = {} degrees Fahrenheit".format(c, fahrenheit_a[i]))
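As a sanity check, the plain NumPy sketch below (it redefines the two arrays so it runs on its own) compares each label with the exact formula. The labels appear to be the formula's results rounded to whole degrees, which is one reason a trained model can land close to, but not exactly on, 1.8 and 32.

```python
import numpy as np

celsius_q = np.array([-40, -10, 0, 8, 15, 22, 38], dtype=float)
fahrenheit_a = np.array([-40, 14, 32, 46, 59, 72, 100], dtype=float)

exact_fahrenheit = celsius_q * 1.8 + 32        # values given by the formula
differences = fahrenheit_a - exact_fahrenheit  # how far each label is from the formula

for c, exact, diff in zip(celsius_q, exact_fahrenheit, differences):
    print("{:6.1f} C -> exact {:6.1f} F, label differs by {:+.1f}".format(c, exact, diff))
```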
Some IMPORTANT Machine Learning terminology
Feature: The input to our model. In our case, a single value: the degrees in Celsius.
Labels: The output our model predicts. In our case, a single value: the degrees in Fahrenheit.
Example: A pair of inputs/outputs used during training. In our case, a pair of values from celsius_q and fahrenheit_a at a specific index, such as (38, 100).
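To make the three terms concrete, here is a small sketch that pairs features and labels by index into examples. The variable names features, labels and examples are illustrative only; Keras does not require them.

```python
import numpy as np

celsius_q = np.array([-40, -10, 0, 8, 15, 22, 38], dtype=float)
fahrenheit_a = np.array([-40, 14, 32, 46, 59, 72, 100], dtype=float)

features = celsius_q   # model inputs
labels = fahrenheit_a  # values the model should learn to predict
# Each example pairs the feature and the label found at the same index
examples = [(float(c), float(f)) for c, f in zip(features, labels)]

print(len(examples), "examples; last one:", examples[-1])  # 7 examples; last one: (38.0, 100.0)
```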
Create the model
Now we will create the model. We will use the simplest possible model, a Dense network. Since the problem is so simple, this network needs only a single layer with a single neuron.
Build a layer: l0
We'll call the layer l0 and create it with tf.keras.layers.Dense, using the following configuration:
input_shape=[1]: This specifies that the input to this layer is a single value. That is, the shape is a one-dimensional array with one member. Since this is the first (and only) layer, this input shape is also the input shape of the entire model. The single value is a floating-point number representing degrees Celsius.
units=1: This specifies the number of neurons in the layer. The number of neurons determines how many internal variables the layer has to try to learn in order to solve the problem (more on this later). Since this is the last layer, it is also the size of the model's output: a single float value representing degrees Fahrenheit.
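A Keras Dense layer computes activation(dot(input, kernel) + bias), where the kernel has shape (input size, units) and the bias has shape (units,). With no activation passed, as here, the layer is purely linear. The NumPy sketch below reproduces that calculation for units=1 and input_shape=[1]; the weights 1.8 and 32 are hand-picked for illustration, not learned, and dense_forward is a hypothetical helper name.

```python
import numpy as np

# Hypothetical weights for a Dense(units=1) layer with input_shape=[1]
kernel = np.array([[1.8]])  # shape (1, 1): 1 input value -> 1 neuron
bias = np.array([32.0])     # shape (1,): one bias per neuron

def dense_forward(inputs):
    """What a Dense layer with no activation computes: inputs @ kernel + bias."""
    return inputs @ kernel + bias

batch = np.array([[0.0], [100.0]])  # 2 examples, each with input_shape [1]
print(dense_forward(batch))
```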
l0 = tf.keras.layers.Dense(units=1, input_shape=[1])
Assemble layers into the model
After defining our layers, we need to assemble them into a model. The Sequential model definition takes a list of layers as its argument, specifying the order of calculation from the input to the output.
This model has just a single layer, l0.
model = tf.keras.Sequential([l0])
Note
We can also define the layers directly inside the model definition, as shown below:
model = tf.keras.Sequential([
    tf.keras.layers.Dense(units=1, input_shape=[1])
])
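The idea behind Sequential, calling each layer on the previous layer's output in list order, can be sketched in a few lines of plain Python. MiniSequential, double and add_one are hypothetical names for illustration; the real tf.keras.Sequential does much more (building weights, training, saving).

```python
import numpy as np

class MiniSequential:
    """Hypothetical stand-in for tf.keras.Sequential: calls each layer in list order."""

    def __init__(self, layers):
        self.layers = list(layers)

    def __call__(self, inputs):
        outputs = inputs
        for layer in self.layers:     # the first layer receives the model input
            outputs = layer(outputs)  # each later layer receives the previous output
        return outputs

def double(x):
    return 2.0 * x

def add_one(x):
    return x + 1.0

print(MiniSequential([double, add_one])(np.array([3.0])))  # computes (3 * 2) + 1
print(MiniSequential([add_one, double])(np.array([3.0])))  # computes (3 + 1) * 2
```

Swapping the two layers changes the result, which is why the list order matters.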
Compile the model, with loss and optimizer functions
After defining the model and before training it, we have to compile it.
When compiled for training, the model is given:
Loss function: A way to measure how far the predictions are from the desired outcome. (The measured difference is called the "loss".)
Optimizer function: A way to adjust the internal values in order to minimize the loss.
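The loss function used below, mean squared error, averages the squared differences between the desired and predicted outputs. This NumPy sketch (the helper mean_squared_error is our own, not the Keras implementation) scores two hypothetical "models" on the training data: the exact formula, and one that always predicts 0.

```python
import numpy as np

celsius_q = np.array([-40, -10, 0, 8, 15, 22, 38], dtype=float)
fahrenheit_a = np.array([-40, 14, 32, 46, 59, 72, 100], dtype=float)

def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between desired and predicted values."""
    return np.mean((y_true - y_pred) ** 2)

exact_predictions = celsius_q * 1.8 + 32      # a "model" that already knows the formula
zero_predictions = np.zeros_like(fahrenheit_a)  # a "model" that always answers 0

print(mean_squared_error(fahrenheit_a, exact_predictions))  # small, but not 0 (labels are rounded)
print(mean_squared_error(fahrenheit_a, zero_predictions))   # very large
```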
model.compile(loss='mean_squared_error',
optimizer=tf.keras.optimizers.Adam(0.1))
These are used during training (model.fit()) to first calculate the loss at each point and then reduce it.
During training, the optimizer function is used to calculate adjustments to the model's internal variables. The goal is to adjust the internal variables until the model (which is really a math function) mirrors the actual equation for converting Celsius to Fahrenheit.
Here is what is useful to know about these parameters:
The loss function (mean squared error) and the optimizer (Adam) used here are standard for simple models like this one, but many others are available. It is not important to know how these specific functions work at this point.
Very important note: One part of the optimizer you may need to think about when building your own models is the learning rate (0.1 in the code above). This is the step size taken when adjusting values in the model. If the value is too small, it will take too many iterations to train the model. If it is too large, accuracy goes down. Finding a good value often involves some trial and error, but it usually lies between 0.001 (the default) and 0.1.
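The effect of the learning rate can be seen on a toy problem. This sketch uses plain gradient descent, not Adam (Adam rescales its steps, so the exact thresholds differ), to minimize the hypothetical one-variable loss (w - 3)**2, whose minimum is at w = 3. The function name minimize is illustrative only.

```python
def minimize(learning_rate, steps=50, start=0.0):
    """Gradient descent on the toy loss (w - 3)**2, whose minimum is at w = 3."""
    w = start
    for _ in range(steps):
        gradient = 2.0 * (w - 3.0)     # derivative of (w - 3)**2
        w -= learning_rate * gradient  # step size = learning rate * gradient
    return w

for lr in (0.001, 0.1, 1.1):
    print("learning rate {:<5} -> w = {:.4f}".format(lr, minimize(lr)))
```

With 0.001 the value is still far from 3 after 50 steps (too slow), 0.1 gets essentially to 3, and 1.1 overshoots further on every step and diverges (too large).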
Train the model
Train the model by calling the fit method.
During training, the model takes the Input Data (values in degrees Celsius), performs a calculation using the current internal variables (called "weights"), and outputs values that are meant to be the Fahrenheit equivalents.
Since the weights are initially set randomly, the output will not be close to the correct value at first. The difference between the actual output and the desired output is calculated using the loss function (mean squared error), and the optimizer function determines how the weights should be adjusted.
This cycle of calculating, comparing, and adjusting is controlled by the fit method. The first argument is the input data, and the second argument is the desired output. The epochs argument specifies how many full passes over the training data to make, and the verbose argument controls how much output the method produces.
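The calculate, compare, adjust cycle can be written out by hand for a single-neuron model. This is a sketch under explicit assumptions: it uses plain full-batch gradient descent instead of Adam, and starts from zeros instead of random weights so the run is repeatable. One update per epoch mirrors what model.fit does here, because the 7 examples fit in a single batch of the default size 32. fit_linear is a hypothetical helper, and loss_history plays the role of history.history['loss'].

```python
import numpy as np

celsius_q = np.array([-40, -10, 0, 8, 15, 22, 38], dtype=float)
fahrenheit_a = np.array([-40, 14, 32, 46, 59, 72, 100], dtype=float)

def fit_linear(inputs, targets, epochs, learning_rate):
    """Calculate, compare, adjust: one full-batch gradient-descent step per epoch."""
    w, b = 0.0, 0.0  # zeros instead of random values, so every run gives the same result
    loss_history = []
    for _ in range(epochs):
        predictions = w * inputs + b                 # calculate
        errors = predictions - targets               # compare
        loss_history.append(np.mean(errors ** 2))    # mean squared error for this epoch
        grad_w = 2.0 * np.mean(errors * inputs)      # d(loss)/dw
        grad_b = 2.0 * np.mean(errors)               # d(loss)/db
        w -= learning_rate * grad_w                  # adjust
        b -= learning_rate * grad_b
    return w, b, loss_history

w, b, loss_history = fit_linear(celsius_q, fahrenheit_a, epochs=10000, learning_rate=0.001)
print("w = {:.3f}, b = {:.3f}, final loss = {:.3f}".format(w, b, loss_history[-1]))
```

Plain gradient descent needs a small learning rate to stay stable on these unscaled inputs, so it takes far more epochs than the 500 used with Adam above.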
history = model.fit(celsius_q, fahrenheit_a, epochs=500, verbose=False)
print("Finished training the model")
Display training statistics
The fit method returns a history object. We can use this object to plot how the loss of our model goes down after each training epoch.
P.S.: A high loss means that the Fahrenheit degrees the model predicts are far from the corresponding values in fahrenheit_a.
We'll use Matplotlib to visualize this.
As the plot shows, our model improves very quickly at first, then progresses slowly and steadily until it is almost "perfect" towards the end.
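import matplotlib.pyplot as plt
plt.xlabel('Epoch Number')
plt.ylabel("Loss Magnitude")
plt.plot(history.history['loss'])  # one loss value per epoch
plt.show()  # needed when running as a script; notebooks display the plot automatically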
Use the model to predict values
Now we have a model that has been trained to learn the relationship between celsius_q and fahrenheit_a.
So we can use the predict method to have it calculate Fahrenheit degrees for previously unseen Celsius degrees.
For example, if the Celsius value is 100, what do you think the Fahrenheit result will be? (Take a guess before you run the following code.)
print(model.predict(np.array([100.0])))
The correct answer is $$100 \times 1.8 + 32 = 212$$. So our model is doing really well.
To review
- We created a model with a single Dense layer.
- We trained it with 3500 examples (7 pairs, over 500 epochs).
Our model tuned the variables (weights) of the Dense layer until it was able to return a close approximation of the correct Fahrenheit value for any Celsius value. (Remember that 100 °C was not part of our training data; you can think of it as a one-example test dataset.)
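A quick worked count, assuming model.fit was left at its default batch size of 32 (the code above does not set batch_size): all 7 pairs fit in a single batch, so every epoch processes 7 examples but makes only one weight update.
$$ 7 \text{ examples} \times 500 \text{ epochs} = 3500 \text{ examples processed} $$
$$ \left\lceil \tfrac{7}{32} \right\rceil = 1 \text{ update per epoch} \;\Rightarrow\; 500 \text{ weight updates in total} $$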
Looking at the layer weights
Finally, let's print the internal variables of the Dense layer using the get_weights() method.
print("These are the layer variables: {}".format(l0.get_weights()))
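The first variable is close to 1.8 and the second is close to 32. These values (1.8 and 32) are the actual variables in the real conversion formula.
For a single neuron with a single input and a single output, the Dense layer computes $$ y = w\times x + b $$, where w is the weight (the first variable) and b is the bias (the second variable). Since this has the same form as $$ F = C\times 1.8 + 32 $$, the variables should converge on the standard values of 1.8 and 32, which is exactly what happened.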
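Why "close to" rather than exactly 1.8 and 32? The labels are rounded to whole degrees, so the straight line with the lowest mean squared error on these seven pairs sits slightly off the textbook values. This sketch swaps in NumPy's closed-form least-squares fit (np.polyfit with deg=1, which returns [slope, intercept]) to find that best line directly; a model trained on mean squared error is pulled toward these values.

```python
import numpy as np

celsius_q = np.array([-40, -10, 0, 8, 15, 22, 38], dtype=float)
fahrenheit_a = np.array([-40, 14, 32, 46, 59, 72, 100], dtype=float)

# Least-squares straight-line fit: highest power first, so [slope, intercept]
slope, intercept = np.polyfit(celsius_q, fahrenheit_a, deg=1)
print("slope = {:.4f}, intercept = {:.4f}".format(slope, intercept))
```

The result is roughly 1.798 and 31.95, close to but not exactly 1.8 and 32.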
With additional neurons, additional inputs, and additional outputs, the formula becomes much more complex, but the idea is the same.
Just for fun, what if we created more Dense layers with more units, and therefore more variables? (Shown below.)
l0 = tf.keras.layers.Dense(units=4, input_shape=[1])
l1 = tf.keras.layers.Dense(units=4)
l2 = tf.keras.layers.Dense(units=1)
model = tf.keras.Sequential([l0, l1, l2])
model.compile(loss='mean_squared_error', optimizer=tf.keras.optimizers.Adam(0.1))
model.fit(celsius_q, fahrenheit_a, epochs=500, verbose=False)
print("Finished training the model")
print(model.predict(np.array([100.0])))
print("Model predicts that 100 degrees Celsius is: {} degrees Fahrenheit".format(model.predict(np.array([100.0]))))
print("These are the l0 variables: {}".format(l0.get_weights()))
print("These are the l1 variables: {}".format(l1.get_weights()))
print("These are the l2 variables: {}".format(l2.get_weights()))
As we can see, this model is also able to predict the corresponding Fahrenheit value really well. But when you look at the variables (weights) in the l0 and l1 layers, they are nowhere near ~1.8 and ~32. The added complexity hides the "simple" form of the conversion equation.
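The simple form is hidden but not gone. Since no activation is passed to l0, l1 and l2, each layer is linear, and a chain of linear layers still computes a single straight line. The NumPy sketch below uses hypothetical random weights with the same shapes as l0 (kernel 1×4), l1 (4×4) and l2 (4×1) and shows the stack collapsing to one slope and one intercept. Applied to the trained layers' get_weights() values, the same products would be expected to come out near 1.8 and 32, since the model predicts well.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical random weights with the same shapes as l0, l1, l2 (units 4, 4, 1)
k0, b0 = rng.normal(size=(1, 4)), rng.normal(size=4)
k1, b1 = rng.normal(size=(4, 4)), rng.normal(size=4)
k2, b2 = rng.normal(size=(4, 1)), rng.normal(size=1)

def stacked(x):
    """Three Dense layers with no activation, applied in order."""
    return ((x @ k0 + b0) @ k1 + b1) @ k2 + b2

# Collapse the stack into a single slope and intercept
effective_w = (k0 @ k1 @ k2).item()
effective_b = ((b0 @ k1 + b1) @ k2 + b2).item()

x = np.array([[-40.0], [0.0], [100.0]])
print(np.allclose(stacked(x), effective_w * x + effective_b))  # True
```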
Thanks for your attention!
I hope you enjoyed this Notebook 👏✌.