Machine Learning with Simple Text Messages

#machinelearning #python

You can classify text data automatically. Say you collect messages about cooking and messages about programming.

A machine learning algorithm can then decide which group (cooking, programming) a new message belongs to.

If you have a list of messages like this:

#!/usr/bin/python3
data = [ "Help me impress the girl of my dreams!", 
         "How do you measure ingredients like butter in cups?",
         "Tips on making fried rice", 
         "immutability in javascript. It has a declarative approach of programming, which means that you focus on describing what your program must accomplish", 
         "Facing a Programming Problem. Everybody has encountered it, the programming problem that makes NO sense. This problem has no fix, it just cannot be done",
         " 5 Uses for the Spread Operator. The spread operator is a favorite of JavaScript developers. It's a powerful piece of syntax that has numerous applications."]

Each message belongs to a class: 0 for cooking, 1 for programming.

target = [ 0,0,0,1,1,1 ]
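
The integer labels are easier to read next to their messages. Here is a minimal sketch of that pairing. The names in `LABEL_NAMES` and the helper `label_messages` are hypothetical and not part of the original program, which only works with 0 and 1:

```python
# Hypothetical names for the two classes; the article itself only uses 0 and 1.
LABEL_NAMES = {0: "cooking", 1: "programming"}


def label_messages(messages, labels, names=LABEL_NAMES):
    """Pair every message with the readable name of its class."""
    if len(messages) != len(labels):
        raise ValueError("messages and labels must have the same length")
    return [(names[label], message) for message, label in zip(messages, labels)]


if __name__ == "__main__":
    # Two of the article's messages, shortened for the example.
    sample_data = ["Tips on making fried rice", "immutability in javascript"]
    sample_target = [0, 1]
    for name, message in label_messages(sample_data, sample_target):
        print(f"{name}: {message}")
```

The length check catches the easy mistake of adding a message without adding its label.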

You can predict the class for a new message:

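#!/usr/bin/python3
# transfer (the fitted TfidfVectorizer) and estimator (the trained classifier)
# are created in the full program below
sentence = input("Enter some text: ")
sentence_x = transfer.transform([sentence])
y_predict = estimator.predict(sentence_x)
print("y_predict:", y_predict)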

Give it a spin:

Enter some text: im a cook
y_predict: [0]

Another run:

Enter some text: programming javascript is great
y_predict: [1]

The program

The data set defined here is extremely small (6 samples), so treat the score it prints as a rough indication only. The more labeled samples you train on, the better the classifier tends to become.
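
With only six messages, the default settings of train_test_split hold out two messages for testing and train on four, so the score can only be 0.0, 0.5 or 1.0, and it changes from run to run. The sketch below assumes you want a repeatable split that keeps both classes in each part. The `stratify` and `random_state` arguments are additions the program below does not use, and the short messages are stand-ins for the real ones:

```python
from sklearn.model_selection import train_test_split

# Stand-in messages; the labels mirror the article's target list.
data = ["butter in cups", "fried rice", "cooking dinner",
        "javascript immutability", "programming problem", "spread operator"]
target = [0, 0, 0, 1, 1, 1]

# stratify keeps both classes in each part; random_state makes the split repeatable.
x_train, x_test, y_train, y_test = train_test_split(
    data, target, stratify=target, random_state=0
)

print(len(x_train), "training messages, labels:", sorted(y_train))
print(len(x_test), "test messages, labels:", sorted(y_test))
```

A stratified split does not make six samples enough, but it does stop the training set from ending up lopsided by chance.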

#!/usr/bin/python3
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import train_test_split

def text_classify():
    data = [ "Help me impress the girl of my dreams!", 
             "How do you measure ingredients like butter in cups?",
             "Tips on making fried rice", 
             "immutability in javascript. It has a declarative approach of programming, which means that you focus on describing what your program must accomplish", 
             "Facing a Programming Problem. Everybody has encountered it, the programming problem that makes NO sense. This problem has no fix, it just cannot be done",
             " 5 Uses for the Spread Operator. The spread operator is a favorite of JavaScript developers. It's a powerful piece of syntax that has numerous applications."]
    target = [ 0,0,0,1,1,1 ]
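    # Hold out part of the data (25% by default) to test the model
    x_train, x_test, y_train, y_test = train_test_split(data, target)

    # Turn the text into TF-IDF feature vectors; fit on the training data only
    transfer = TfidfVectorizer()
    x_train = transfer.fit_transform(x_train)
    x_test = transfer.transform(x_test)

    # Train a multinomial naive Bayes classifier
    estimator = MultinomialNB()
    estimator.fit(x_train, y_train)

    # Accuracy on the held-out messages
    score = estimator.score(x_test, y_test)
    print("score:", score)

    # Classify a new message typed by the user
    sentence = input("Enter some text: ")
    sentence_x = transfer.transform([sentence])
    y_predict = estimator.predict(sentence_x)
    print("y_predict:", y_predict)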

text_classify()
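
The vectorizer and the classifier can also be chained into a single object, so raw text goes in and a class comes out. This is a sketch of an alternative and not the program above: it uses scikit-learn's `make_pipeline`, and it assumes you train on all six messages with no held-out test set, which means it reports no score.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

data = ["Help me impress the girl of my dreams!",
        "How do you measure ingredients like butter in cups?",
        "Tips on making fried rice",
        "immutability in javascript. It has a declarative approach of programming, which means that you focus on describing what your program must accomplish",
        "Facing a Programming Problem. Everybody has encountered it, the programming problem that makes NO sense. This problem has no fix, it just cannot be done",
        " 5 Uses for the Spread Operator. The spread operator is a favorite of JavaScript developers. It's a powerful piece of syntax that has numerous applications."]
target = [0, 0, 0, 1, 1, 1]

# Chain the two steps: text -> TF-IDF vectors -> naive Bayes classifier.
model = make_pipeline(TfidfVectorizer(), MultinomialNB())

# Assumption: fit on every message, so nothing is held out for scoring.
model.fit(data, target)

message = "programming javascript is great"
print("class:", model.predict([message]))
# predict_proba shows how strongly the model leans towards each class.
print("probabilities:", model.predict_proba([message]))
```

The probabilities are worth checking on such a small data set. A message made up only of words the vectorizer has never seen, such as "im a cook", turns into an all-zero vector, so the prediction rests on how often each class appeared in training and not on the text itself.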