
Umberto Calice


Analysis And Generation Model ML

Analysis_And_Generation_Model_ML

BEFORE READING THIS REPOSITORY IT IS RECOMMENDED TO START FROM:

https://github.com/insidbyte/Analysis_and_processing

I decided to generate a custom vocabulary to train the model, so it is worth looking at that repository's code first.


SEE THIS REPOSITORY AT: https://github.com/insidbyte/Analysis_And_Generation_Model_ML


OPTIONS:

1)-GENERATE MODEL

2)-TEST WITH HYPERPARAMETER TUNING

3)-PLOT WITH TFIDF VECTORIZER AND SVD TRUNCATED REDUCTION





Menu

Running the ModelsGenerator.py file from the terminal displays the following:

Image description

OPTION 1:

Model generation:

I decided to use TF-IDF and a support vector machine because both are well suited to text processing,
and a support vector machine with a linear kernel is well suited to classification between two classes,
as in our case: positive and negative.
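A minimal sketch of this combination with scikit-learn is shown here. It is not the code from ModelsGenerator.py: the tiny corpus is made up for illustration, and `LinearSVC` is an assumption (the repository may use `SVC(kernel="linear")` or different settings).

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Tiny made-up corpus standing in for the IMDb reviews (illustrative only).
reviews = [
    "a wonderful film with a great cast",
    "great story and wonderful acting",
    "a terrible film with an awful script",
    "awful pacing and a terrible ending",
]
labels = ["positive", "positive", "negative", "negative"]

# TF-IDF turns each review into a sparse vector of weighted terms;
# LinearSVC then fits a linear boundary between the two classes.
model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(reviews, labels)

prediction = model.predict(["wonderful cast and a great story"])[0]
print(prediction)
```

Wrapping both steps in one pipeline means the same vocabulary and weights are applied at training and at prediction time.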

Image description

Kaggle IMDb dataset example:

Image description

I created an Angular client that sends requests to a Python server.

CLIENT:

Image description

SERVER:

Image description

RESPONSE FROM THE SERVER:

Image description

ANOTHER EXAMPLE:

Image description

Image description
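The server code is not shown in this post, so the following is only a sketch of how a Python server could answer such requests, using the standard-library `http.server`. The JSON field names `text` and `sentiment`, the port, and the placeholder `classify` function are all assumptions; a real server would call the trained model instead.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def classify(text):
    """Hypothetical placeholder; a real server would call the trained model here."""
    return "positive" if "good" in text.lower() else "negative"


def build_response(raw_body):
    """Turn a JSON request body (bytes) into a JSON response body (bytes)."""
    payload = json.loads(raw_body.decode("utf-8"))
    result = {"sentiment": classify(payload["text"])}
    return json.dumps(result).encode("utf-8")


class ReviewHandler(BaseHTTPRequestHandler):
    def _cors_headers(self):
        # Allow a browser client served from another origin to call this server.
        self.send_header("Access-Control-Allow-Origin", "*")
        self.send_header("Access-Control-Allow-Methods", "POST, OPTIONS")
        self.send_header("Access-Control-Allow-Headers", "Content-Type")

    def do_OPTIONS(self):
        # Answer the CORS preflight a browser sends before a JSON POST.
        self.send_response(204)
        self._cors_headers()
        self.end_headers()

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = build_response(self.rfile.read(length))
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self._cors_headers()
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Start the server; blocks until interrupted."""
    HTTPServer(("localhost", port), ReviewHandler).serve_forever()
```

Calling `serve()` starts listening on localhost; keeping `build_response` separate from the handler makes the request logic easy to test without opening a socket.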

OPTION 2:

Test hyperparameters with GridSearchCV and TfidfVectorizer:

A good way to automate the test phase, and to save time searching for the parameters that
produce the most accurate model possible, is to use GridSearchCV, provided by scikit-learn.
The code in ModelsGenerator.py must be customized based on the dataset to be analyzed.
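As a rough sketch of what such a search can look like, the example below tunes a TF-IDF plus linear SVM pipeline. The corpus, the parameter values, and the choice of `LinearSVC` are assumptions for illustration, not the grid used in ModelsGenerator.py.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Small made-up corpus (illustrative only).
texts = [
    "a wonderful film with a great cast",
    "great story and wonderful acting",
    "wonderful direction and a great script",
    "a great film with a wonderful ending",
    "a terrible film with an awful script",
    "awful pacing and a terrible ending",
    "terrible acting and an awful story",
    "an awful film with a terrible cast",
]
labels = ["positive"] * 4 + ["negative"] * 4

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("svm", LinearSVC()),
])

# Keys follow the "<step name>__<parameter>" convention of scikit-learn pipelines.
param_grid = {
    "tfidf__ngram_range": [(1, 1), (1, 2)],
    "svm__C": [0.1, 1.0, 10.0],
}

# cv=2 only because the toy corpus is tiny; a real dataset would use more folds.
search = GridSearchCV(pipeline, param_grid, cv=2, scoring="accuracy")
search.fit(texts, labels)

print(search.best_params_)
print(search.best_score_)
```

After fitting, `search.best_estimator_` holds the pipeline refitted on all the data with the best parameters found.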

WARNING!

If we don't study the scikit-learn documentation, we could start a search that never seems to finish, so it is always advisable to know what we are doing.
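One way to see how quickly a search can get out of hand is to count the fits before launching it. The grid below is hypothetical, chosen only to show how the combinations multiply.

```python
from sklearn.model_selection import ParameterGrid

# Hypothetical grid: each extra value multiplies the number of candidates.
param_grid = {
    "tfidf__ngram_range": [(1, 1), (1, 2), (1, 3)],
    "tfidf__min_df": [1, 2, 5],
    "tfidf__max_df": [0.5, 0.75, 1.0],
    "svm__C": [0.01, 0.1, 1.0, 10.0, 100.0],
}
cv_folds = 5

# Every candidate is trained once per cross-validation fold.
n_candidates = len(ParameterGrid(param_grid))
n_fits = n_candidates * cv_folds

print(f"{n_candidates} candidates, {n_fits} fits")  # 135 candidates, 675 fits
```

If a single fit on the full dataset takes minutes, multiplying by this number gives a rough estimate of the total running time.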

scikit-learn documentation: https://scikit-learn.org/

Input:

Image description

Output:

Image description

OPTION 3:

Input:

Image description

This option is experimental. The reduction is not applied to model training because it
generates too few components, and the RAM of my PC (8 GB) is not enough to generate
more, even though the results are interesting!
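A minimal sketch of this kind of reduction: TF-IDF vectors are projected onto two components with `TruncatedSVD` so they can be drawn as a scatter plot. The corpus and the helper names `reduce_to_2d` and `plot_points` are hypothetical, and matplotlib is an assumption about the plotting library.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

# Small made-up corpus (illustrative only).
reviews = [
    "a wonderful film with a great cast",
    "great story and wonderful acting",
    "a terrible film with an awful script",
    "awful pacing and a terrible ending",
]
labels = ["positive", "positive", "negative", "negative"]


def reduce_to_2d(texts):
    """Vectorize texts with TF-IDF and project them onto 2 SVD components."""
    tfidf = TfidfVectorizer().fit_transform(texts)
    # TruncatedSVD works directly on the sparse TF-IDF matrix.
    svd = TruncatedSVD(n_components=2, random_state=0)
    return svd.fit_transform(tfidf)


def plot_points(points, point_labels, path="svd_plot.png"):
    """Save a scatter plot of the 2D points, one color per label."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    for label in sorted(set(point_labels)):
        idx = [i for i, name in enumerate(point_labels) if name == label]
        ax.scatter(points[idx, 0], points[idx, 1], label=label)
    ax.set_xlabel("SVD component 1")
    ax.set_ylabel("SVD component 2")
    ax.legend()
    fig.savefig(path)
    plt.close(fig)


points = reduce_to_2d(reviews)
print(points.shape)
```

Calling `plot_points(points, labels)` writes the figure to disk. Two components are enough for a plot, but far too few to train on, which matches the limitation described above.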

Output:

Image description

CONCLUSION:

We got satisfactory results and generated a fairly accurate
model. This repository will be updated over time.

For info or collaborations contact me at: u.calice@studenti.poliba.it
