Analysis_And_Generation_Model_ML
BEFORE READING THIS REPOSITORY IT IS RECOMMENDED TO START FROM:
https://github.com/insidbyte/Analysis_and_processing
I decided to generate a custom vocabulary to train the model, so it is worth looking at that repository's code first.
SEE THIS REPOSITORY AT: https://github.com/insidbyte/Analysis_And_Generation_Model_ML
OPTIONS:
1)-GENERATE MODEL
2)-TEST WITH HYPERPARAMETER TUNING
3)-PLOT WITH TFIDF VECTORIZER AND TRUNCATED SVD REDUCTION
Menu
When ModelsGenerator.py is started from the terminal, a menu with the options listed above appears.
OPTION 1:
Model generation:
I decided to use TF-IDF and a support vector machine because both are well suited to text processing,
and a support vector machine with a linear kernel is well suited to two-class classification, as
in our case: positive and negative.
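As a rough illustration of the approach described above, here is a minimal sketch of a TF-IDF plus linear SVM classifier in scikit-learn. It is not the code from ModelsGenerator.py: the toy reviews, the variable names and the default parameters are assumptions made for the example, and the custom vocabulary is left out.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy reviews standing in for the IMDb data (hypothetical examples).
train_texts = [
    "a wonderful, moving film with great acting",
    "great story and wonderful characters",
    "I loved this movie, truly great",
    "a terrible, boring film with awful acting",
    "awful story and boring characters",
    "I hated this movie, truly terrible",
]
train_labels = [
    "positive", "positive", "positive",
    "negative", "negative", "negative",
]

# TF-IDF turns each review into a weighted bag-of-words vector;
# LinearSVC then fits a linear decision boundary between the two classes.
model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(train_texts, train_labels)

predictions = model.predict(["wonderful and great", "boring and awful"])
print(predictions)
```

Wrapping both steps in one pipeline means the same vectorizer is applied at training and prediction time, which is convenient when the model is later served to a client.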
Kaggle IMDb dataset example:
I created a Client in Angular to send requests to a Python Server
CLIENT:
SERVER:
RESPONSE FROM THE SERVER:
ANOTHER EXAMPLE:
OPTION 2:
Test hyperparameters with GridSearchCV and the TF-IDF vectorizer:
A good way to automate the test phase and save time searching for the parameters that
give the most accurate model possible is to use GridSearchCV, provided by scikit-learn.
The code in ModelsGenerator.py must be customized for the dataset being analyzed.
WARNING !!
Without studying the scikit-learn documentation we could start analyses that take an impractically long time,
so it is always advisable to know what we are doing.
Link scikit-learn: https://scikit-learn.org/
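The sketch below shows one way such a search could look. The parameter grid, the toy data and the names are assumptions for illustration, not the settings used in ModelsGenerator.py. The grid is kept deliberately small: the number of fits is the number of parameter combinations times the number of folds, which is why an oversized grid can run for a very long time.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Toy data (hypothetical); a real run would load the full dataset.
texts = [
    "great film, wonderful acting",
    "wonderful story, great cast",
    "loved it, great and moving",
    "a great and wonderful movie",
    "boring film, awful acting",
    "awful story, boring cast",
    "hated it, boring and dull",
    "a boring and awful movie",
]
labels = ["positive"] * 4 + ["negative"] * 4

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("svm", LinearSVC()),
])

# 2 n-gram settings x 3 values of C = 6 combinations, each fitted cv=2 times.
param_grid = {
    "tfidf__ngram_range": [(1, 1), (1, 2)],
    "svm__C": [0.1, 1.0, 10.0],
}

search = GridSearchCV(pipeline, param_grid, cv=2, scoring="accuracy")
search.fit(texts, labels)

print(search.best_params_)
print(search.best_score_)
```

The `step__parameter` naming is how scikit-learn addresses parameters of individual pipeline steps inside a grid.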
Input:
Output:
OPTION 3:
Input:
This option is experimental. The reduction is not applied to model training because it
generates too few components, and the 8 GB of RAM on my PC is not enough to generate
more components, even though the results are interesting!
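For readers who want to reproduce the idea on a small scale, the following sketch reduces TF-IDF vectors to two components with TruncatedSVD so they can be drawn on a 2D scatter plot. The toy texts, the choice of two components and the variable names are assumptions; the plotting code in this repository may differ.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy reviews (hypothetical examples).
texts = [
    "great film, wonderful acting",
    "wonderful story, great cast",
    "loved it, great and moving",
    "boring film, awful acting",
    "awful story, boring cast",
    "hated it, boring and dull",
]

# TfidfVectorizer returns a sparse matrix; TruncatedSVD works on it directly,
# without converting it to a dense array first.
tfidf = TfidfVectorizer().fit_transform(texts)

svd = TruncatedSVD(n_components=2, random_state=0)
points = svd.fit_transform(tfidf)

# Each row of `points` is one review as (x, y) coordinates,
# ready to be passed to a scatter plot.
print(points.shape)
print(svd.explained_variance_ratio_.sum())
```

The explained variance ratio gives a rough idea of how much information two components retain; low values are expected for text data.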
Output:
CONCLUSION:
We obtained satisfactory results and generated a fairly accurate
model. This repository will be updated over time.
For info or collaborations contact me at: u.calice@studenti.poliba.it










