You can classify text data automatically. Say you collect messages about cooking and messages about programming.
A machine learning algorithm can then decide which group (cooking, programming) a new message belongs to.
If you have a list of messages like this:
#!/usr/bin/python3
data = [ "Help me impress the girl of my dreams!",
"How do you measure ingredients like butter in cups?",
"Tips on making fried rice",
"immutability in javascript. It has a declarative approach of programming, which means that you focus on describing what your program must accomplish",
"Facing a Programming Problem. Everybody has encountered it, the programming problem that makes NO sense. This problem has no fix, it just cannot be done",
" 5 Uses for the Spread Operator. The spread operator is a favorite of JavaScript developers. It's a powerful piece of syntax that has numerous applications."]
Where each message belongs to a class
target = [ 0,0,0,1,1,1 ]
You can predict the class for a new message:
#!/usr/bin/python3
sentence = input("Enter some text: ")
sentence_x = transfer.transform([sentence])
y_predict = estimator.predict(sentence_x)
print("y_predict:\n", y_predict)
Give it a spin:
Enter some text: im a cook
y_predict: [0]
Another run:
Enter some text: programming javascript is great
y_predict: [1]
The program
The data set we have defined is extremely small (6 samples). The more samples you have, the better it becomes.
#!/usr/bin/python3
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import train_test_split
def text_classify():
data = [ "Help me impress the girl of my dreams!",
"How do you measure ingredients like butter in cups?",
"Tips on making fried rice",
"immutability in javascript. It has a declarative approach of programming, which means that you focus on describing what your program must accomplish",
"Facing a Programming Problem. Everybody has encountered it, the programming problem that makes NO sense. This problem has no fix, it just cannot be done",
" 5 Uses for the Spread Operator. The spread operator is a favorite of JavaScript developers. It's a powerful piece of syntax that has numerous applications."]
target = [ 0,0,0,1,1,1 ]
x_train, x_test, y_train, y_test = train_test_split(data,target)
transfer = TfidfVectorizer()
x_train = transfer.fit_transform(x_train)
x_test = transfer.transform(x_test)
estimator = MultinomialNB()
estimator.fit(x_train,y_train)
score = estimator.score(x_test, y_test)
print("score:\n", score)
sentence = input("Enter some text: ")
sentence_x = transfer.transform([sentence])
y_predict = estimator.predict(sentence_x)
print("y_predict: ", y_predict)
return None
text_classify()
Related links:
Top comments (1)
Nice! appreciate your time
thank you!