In this article, you'll learn about text classification with TensorFlow, a popular Python framework for machine learning, in just a few lines of code.
Text classification is a branch of natural language processing that assigns a piece of text to one of several predefined groups based on its content, for instance classifying news articles as sports, business, music, and so on.
This tutorial briefly covers how to perform text classification using TensorFlow. You'll learn text processing concepts such as word embedding and how to build a neural network with an embedding layer.
You will learn these concepts by building a simple model that classifies text as a negative or positive review based on the data used to train it.
To follow this tutorial, you need the following Python libraries installed on your machine: TensorFlow, NumPy, Matplotlib, and Jupyter Notebook.
There are two approaches you can follow to set up an environment for machine learning and data science projects:
- Installing Anaconda
- Installing independently using pip
If this is your first time hearing about Anaconda, it is a toolkit that equips you to work with thousands of open-source packages and libraries. It saves you the time of installing each library independently and handling dependency issues.
Go to the official website at Anaconda.com and follow the guide to download and install it for the operating system you're using.
Once Anaconda and its dependencies are installed, it's time to install the TensorFlow library. Anaconda comes with its own package manager, known as conda.
conda create -n tf tensorflow
conda activate tf
If you prefer to handle every detail yourself, you can also install all the required Python libraries using pip, as shown below:
pip install tensorflow
pip install numpy
pip install matplotlib
pip install jupyter notebook
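Whichever route you take, it can help to confirm that the packages can be found before moving on. The following is a minimal sketch using only the Python standard library; the helper name check_installed is hypothetical and not part of any of these packages.

```python
import importlib.util


def check_installed(packages):
    """Return {package_name: True/False} depending on whether Python can locate it."""
    return {name: importlib.util.find_spec(name) is not None for name in packages}


if __name__ == "__main__":
    status = check_installed(["tensorflow", "numpy", "matplotlib", "notebook"])
    for name, found in status.items():
        print(f"{name}: {'installed' if found else 'missing'}")
```

If you created a conda environment, run this from inside it (after conda activate tf), since packages installed there are not visible from other environments.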
Now that everything is installed, let's start building our classification model.
This tutorial was prepared with TensorFlow 2.0, which comes with Keras already integrated. I recommend using that version or a newer one to avoid bugs.
For convenience, machine learning models are usually trained in a Jupyter notebook, so I would like you to use one too. In this article, each chunk of code corresponds to a single cell in a Jupyter notebook.
Starting a Jupyter notebook is simple and straightforward: type jupyter notebook in your terminal and it will automatically open a notebook in your default browser.
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
The list below holds the features for training our model: 4 positive and 4 negative short sentences. Their respective labels are 1 for positive and 0 for negative.
data_x = [
    'good',
    'well done',
    'nice',
    'Excellent',
    'Bad',
    'OOps I hate it deadly',
    'embrassing',
    'A piece of shit'
]
label_x = np.array([1,1,1,1, 0,0,0,0])
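One-hot encoding is a process by which categorical variables are converted into a numerical form that ML algorithms can work with. Note that the Keras one_hot function used here does not produce one-hot vectors: it hashes each word to an integer index between 1 and n - 1 (here n = 50).
Use the code below to encode the textual features above into numerical values.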
one_hot_x = [tf.keras.preprocessing.text.one_hot(d, 50) for d in data_x]
print(one_hot_x)
[[21], [9, 34], [24], [20], [28], [41, 26, 9, 17, 26], [36], [9, 41]]
As we can see, encoding our textual data has produced lists of different lengths.
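The lengths differ because the encoder emits one integer per word. A rough pure-Python sketch of that idea follows. It assumes a simple lowercase-and-split tokenizer and an MD5 hash, whereas Keras also strips punctuation and uses Python's built-in hash by default, so the numbers will not match the output above. The function name hash_encode is made up for illustration.

```python
import hashlib


def hash_encode(text, n):
    """Map each word to an integer in the range 1..n-1 (0 stays free for padding)."""
    words = text.lower().split()
    return [int(hashlib.md5(w.encode("utf-8")).hexdigest(), 16) % (n - 1) + 1
            for w in words]


for sentence in ["good", "well done", "OOps I hate it deadly"]:
    print(sentence, "->", hash_encode(sentence, 50))
```

With only 49 available indices, two different words can land on the same number. A larger n makes such collisions less likely.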
Textual data must have the same length to be fed into a machine learning model, so we have to process it again to form arrays of identical length.
You can change the length of each array through the maxlen parameter. The right value for maxlen depends on the length of most of the texts in your training data.
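One way to pick maxlen is to look at how long the encoded sentences are and choose a length that covers most of them. The sketch below is only one possible heuristic, not something the original notebook does; the helper name suggest_maxlen and the 90% coverage default are assumptions.

```python
import math


def suggest_maxlen(sequences, coverage=0.9):
    """Smallest length that is at least as long as `coverage` of the sequences."""
    lengths = sorted(len(seq) for seq in sequences)
    # Position of the sequence sitting at the requested coverage fraction.
    index = max(0, math.ceil(coverage * len(lengths)) - 1)
    return lengths[index]


# Encoded sentences from this article's example.
encoded = [[21], [9, 34], [24], [20], [28], [41, 26, 9, 17, 26], [36], [9, 41]]
print(suggest_maxlen(encoded))        # length covering 90% of the sentences
print(suggest_maxlen(encoded, 0.75))  # length covering 75% of the sentences
```

On a dataset this small a single long sentence dominates the result, so treat the suggestion as a starting point rather than a rule.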
padded_x = tf.keras.preprocessing.sequence.pad_sequences(one_hot_x, maxlen=4, padding = 'post')
print(padded_x)
array([[21,  0,  0,  0],
       [ 9, 34,  0,  0],
       [24,  0,  0,  0],
       [20,  0,  0,  0],
       [28,  0,  0,  0],
       [26,  9, 17, 26],
       [36,  0,  0,  0],
       [ 9, 41,  0,  0]], dtype=int32)
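Notice that the five-word sentence was cut down to four indices and lost its first one. That matches the pad_sequences default of truncating from the front while padding='post' adds zeros at the end. A small pure-Python sketch of that behaviour, under the assumption that maxlen is at least 1 (pad_post is an illustrative name, not a Keras function):

```python
def pad_post(sequences, maxlen, value=0):
    """Pad at the end with `value`; if too long, keep only the last `maxlen` items."""
    padded = []
    for seq in sequences:
        trimmed = list(seq)[-maxlen:]  # drops items from the front when too long
        padded.append(trimmed + [value] * (maxlen - len(trimmed)))
    return padded


print(pad_post([[9, 34], [41, 26, 9, 17, 26]], maxlen=4))
# [[9, 34, 0, 0], [26, 9, 17, 26]]
```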
Now that the training data is processed, let's create our Sequential model to fit it.
model = tf.keras.models.Sequential()
Now let's add an Embedding layer to receive the processed textual features.
model.add(tf.keras.layers.Embedding(50, 8, input_length=4))
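Add a Flatten layer to flatten the feature array:
model.add(tf.keras.layers.Flatten())
Finally, let's add a Dense layer with a sigmoid activation function to learn the relationship between the text and its label:
model.add(tf.keras.layers.Dense(1, activation='sigmoid'))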
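To make the three layers concrete, here is a rough pure-Python sketch of what they compute for one padded sentence. It is not Keras code: the weights are random stand-ins (seeded for repeatability), and the names embedding_table, dense_weights and forward are invented for illustration.

```python
import math
import random

VOCAB_SIZE, EMBED_DIM, SEQ_LEN = 50, 8, 4
random.seed(0)

# Randomly initialised parameters; training would adjust these values.
embedding_table = [[random.uniform(-0.05, 0.05) for _ in range(EMBED_DIM)]
                   for _ in range(VOCAB_SIZE)]
dense_weights = [random.uniform(-0.5, 0.5) for _ in range(SEQ_LEN * EMBED_DIM)]
dense_bias = 0.0


def forward(padded_sequence):
    """Embedding lookup -> flatten -> one dense unit with a sigmoid."""
    embedded = [embedding_table[i] for i in padded_sequence]  # 4 vectors of 8 numbers
    flat = [x for vector in embedded for x in vector]         # 32 numbers
    z = sum(w * x for w, x in zip(dense_weights, flat)) + dense_bias
    return 1.0 / (1.0 + math.exp(-z))                         # value between 0 and 1


print(forward([9, 34, 0, 0]))
```

The embedding layer is just a lookup table with one row per word index, and the sigmoid squeezes the final weighted sum into a score between 0 and 1 that can be read as "how positive".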
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
embedding (Embedding)        (None, 4, 8)              400
_________________________________________________________________
flatten (Flatten)            (None, 32)                0
_________________________________________________________________
dense (Dense)                (None, 1)                 33
=================================================================
Total params: 433
Trainable params: 433
Non-trainable params: 0
_________________________________________________________________
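The parameter counts in the summary can be checked by hand. The short calculation below is only arithmetic on the numbers used in this article (vocabulary size 50, embedding size 8, sequence length 4); the variable names are chosen for illustration.

```python
vocab_size, embed_dim, seq_len = 50, 8, 4

embedding_params = vocab_size * embed_dim  # one 8-number vector per word index
flatten_units = seq_len * embed_dim        # 4 vectors of 8 numbers laid end to end
dense_params = flatten_units * 1 + 1       # 32 weights plus 1 bias

print(embedding_params, flatten_units, dense_params, embedding_params + dense_params)
# 400 32 33 433
```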
Now let's fit the model for 1000 epochs and visualize the learning process.
history = model.fit(padded_x, label_x, epochs=1000, batch_size=2, verbose=0)
plt.plot(history.history['loss'])
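The plotted curve is the binary cross-entropy loss chosen in model.compile. As a rough illustration of what that number measures, here is a plain-Python version of the formula; the function below is a sketch written for this explanation, not the Keras implementation, and the sample predictions are invented.

```python
import math


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Average of -[y*log(p) + (1-y)*log(1-p)] over all examples."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)  # keep log() away from 0
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


labels = [1, 1, 0, 0]
print(binary_crossentropy(labels, [0.5, 0.5, 0.5, 0.5]))  # an undecided model
print(binary_crossentropy(labels, [0.9, 0.8, 0.1, 0.2]))  # a more confident, correct model
```

The second value is smaller than the first, which is the pattern to look for in the plot: the loss should fall as the model's outputs move toward the correct labels.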
Let's create a simple function to predict new text using the model we have just created. It won't be very smart, since our training data was really short.
def predict(word):
    # Encode and pad the input the same way the training data was prepared
    one_hot_word = [tf.keras.preprocessing.text.one_hot(word, 50)]
    pad_word = tf.keras.preprocessing.sequence.pad_sequences(one_hot_word, maxlen=4, padding='post')
    # model.predict returns an array holding a single sigmoid output
    result = model.predict(pad_word)
    if result > 0.1:
        print('you look positive')
    else:
        print('damn you\'re negative')
Let's test the predict function with different inputs.
>>>predict('this tutorial is cool')
you look positive
>>>predict('This tutorial is bad as me ')
damn you're negative
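The predict function labels anything scoring above 0.1 as positive. For a sigmoid output, 0.5 is a more common cutoff, and the choice changes which inputs count as positive. The sketch below uses invented scores and a hypothetical helper name, to_label, to show the effect.

```python
def to_label(probability, threshold=0.5):
    """Turn a sigmoid score into a label using the given cutoff."""
    return "positive" if probability > threshold else "negative"


for p in (0.05, 0.3, 0.7):
    print(p, to_label(p, threshold=0.1), to_label(p, threshold=0.5))
```

A low cutoff such as 0.1 leans toward calling things positive, so borderline scores like 0.3 flip to negative once the cutoff is raised to 0.5.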
Congratulations, you have successfully trained a text classifier using TensorFlow. To get the Jupyter notebook guide, download here. If you have any comments, suggestions, or difficulties, drop them in the comment box.