Marine for Taipy

🙌Top 10 🐍 Python libraries for any ML project 🚀

TL;DR

In this article, I'll give you the ultimate Python libraries for any Machine Learning project:

  • the must-know libraries for each step of the machine learning cycle: EDA, data cleaning, data engineering, modeling, etc.
  • all open source
  • all Python


Full application

1. 🚀 Taipy

Let's start with something that is often overlooked: actually making your model accessible and useful.
Taipy does just that, and brings your Machine Learning model to the next level.
It is an open-source library designed to ease the development of both the front end (GUI) and your ML/data pipeline(s). No other knowledge is required (no CSS, no nothing!). It is designed to speed up application development, from initial prototypes to production-ready applications. In short, it's a simple Python app builder.

Taipy ensures your ML model can move into a full-fledged pilot and application that will impress your end-users.


Star ⭐ the Taipy repository

We're almost at 1000 stars and couldn't do this without you 🙏


EDA, Data Cleaning and Data Engineering

2. 🐼 Pandas

How can you code in Python without knowing Pandas?
This library has two core data structures, DataFrames and Series, which allow fast and flexible data cleaning and preparation. Essential functions include:

  • Loading data
  • Reshaping dataframes
  • Basic statistics

Pandas is the tool to start your data science project with. Competitors such as Dask or Polars are trying to surpass Pandas but are not as widely used. A good subject for a future article!
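
As a rough sketch of those three essentials (loading, reshaping, basic statistics), the snippet below uses a tiny made-up sales table. The column names and values are hypothetical, and the CSV is inlined so the example runs without any external file.

```python
import io

import pandas as pd

# Hypothetical CSV content, inlined so the example is self-contained
csv_text = """city,year,sales
Paris,2022,100
Paris,2023,120
Lyon,2022,80
Lyon,2023,90
"""

# Loading data
df = pd.read_csv(io.StringIO(csv_text))

# Reshaping: one row per city, one column per year
wide = df.pivot(index="city", columns="year", values="sales")

# Basic statistics: average sales per city, plus an overall summary
mean_sales = df.groupby("city")["sales"].mean()
summary = df["sales"].describe()
```

In a real project, `pd.read_csv` would simply point at a file path or URL instead of an in-memory string.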


3. 🌱 NumPy

Although lower level than Pandas, NumPy is an essential tool for scientific computing and data preprocessing.
It revolves around arrays and allows for fast data manipulation and math functions.
Like Pandas, it is a must-know Python library for data-centric tasks.
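
A common preprocessing step, standardizing features, shows what "fast data manipulation" on arrays looks like in practice. This is a minimal sketch on a made-up feature matrix; the values are hypothetical.

```python
import numpy as np

# Hypothetical feature matrix: 3 samples, 2 features on very different scales
X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 600.0]])

# Column-wise standardization, written without any Python loop:
# broadcasting applies the per-column mean and std to every row.
mean = X.mean(axis=0)
std = X.std(axis=0)
X_scaled = (X - mean) / std
```

After this step each column has zero mean and unit standard deviation, which many ML algorithms expect.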


4. 🔢 Statsmodels

True to its name, this library provides functions for statistical analysis.
The array of capabilities ranges from descriptive analysis to statistical tests; it is also a great library for handling time series data, univariate and multivariate statistics, etc.


5. 👓 YData Profiling

YData Profiling facilitates the EDA step by thoroughly analyzing your data in one line of code.
The analysis includes missing-value detection, correlations, and distribution analysis, among others.
This tool is very user-friendly and straightforward, making it an easy addition to your data science toolbox.


Machine Learning / Deep Learning Algorithms

6. 💼 Scikit-learn

This might be one of Python's top 3 most famous libraries, and rightfully so.

Sklearn is a reference in Machine Learning. It includes a wide range of models, from clustering (such as K-means) to regression and classification algorithms.
It also excels at dimensionality reduction techniques.
Sklearn also provides model selection and validation functions. It's easy to learn and use, and should be your go-to ML library during your data science journey.
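
The pieces mentioned above (a model, dimensionality reduction, validation) combine naturally in a pipeline. The sketch below is one possible combination on the built-in Iris dataset; the choice of PCA with two components and logistic regression is an assumption made for illustration, not a recommendation.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Scale the features, reduce them to 2 dimensions, then classify
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=2),
    LogisticRegression(max_iter=1000),
)

# Validation: 5-fold cross-validated accuracy
scores = cross_val_score(model, X, y, cv=5)
mean_accuracy = scores.mean()
```

Because every step shares the same `fit`/`predict` interface, swapping the classifier for another estimator is a one-line change.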


7. 🧠 Keras

Keras is a high-level API that runs on top of frameworks such as TensorFlow. If you are starting with Neural Networks, start with Keras. It simplifies the implementation process, which makes it ideal for quick prototypes and a beginner-friendly way to build Neural Networks.
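
As a minimal sketch of how short a Keras model can be, the snippet below defines, trains, and queries a tiny binary classifier. The synthetic data, layer sizes, and number of epochs are arbitrary assumptions chosen to keep the example fast.

```python
import numpy as np
import keras
from keras import layers

# Synthetic data: the label is 1 when the sum of the 4 features is positive
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 4)).astype("float32")
y = (X.sum(axis=1) > 0).astype("float32")

# Define the network as a simple stack of layers
model = keras.Sequential([
    keras.Input(shape=(4,)),
    layers.Dense(8, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])

# Compile, train, predict: three calls cover the whole workflow
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
history = model.fit(X, y, epochs=5, batch_size=32, verbose=0)
probabilities = model.predict(X[:3], verbose=0)
```

`history.history` holds the loss and accuracy per epoch, which is handy for plotting learning curves.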


8. 🧠💪 TensorFlow

This library is a must-know for Neural Network modeling. It is well suited to tasks on unstructured data, such as image classification or NLP (Natural Language Processing). TensorFlow is widely used in research and industry, as it provides a complete API for designing and manipulating Neural Networks. Keras (mentioned above) provides a higher-level, simpler API on top of TensorFlow.
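
To show what the lower-level API looks like compared with Keras, here is a sketch that fits a straight line by hand with `tf.GradientTape`. The target line (slope 2, intercept 1), the learning rate, and the step count are assumptions picked for the example.

```python
import tensorflow as tf

# Toy data lying exactly on the line y = 2x + 1
x = tf.constant([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0

# Trainable parameters, both starting at zero
w = tf.Variable(0.0)
b = tf.Variable(0.0)
learning_rate = 0.05

for _ in range(500):
    # Record the forward pass so TensorFlow can differentiate it
    with tf.GradientTape() as tape:
        loss = tf.reduce_mean((w * x + b - y) ** 2)
    grad_w, grad_b = tape.gradient(loss, [w, b])
    # Plain gradient descent update
    w.assign_sub(learning_rate * grad_w)
    b.assign_sub(learning_rate * grad_b)
```

Keras hides this loop behind `model.fit`; writing it out is mostly useful when you need custom training logic.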


9. 🌴 XGBoost

XGBoost is one of the most popular Machine Learning libraries.
This gradient-boosting library is widely used in real-life use cases, particularly for tabular data.
It is a favorite among Kaggle competition winners.
It includes regression and classification algorithms, and also provides feature importance scores that can help with feature selection.


10. 🐈 CatBoost

This library, whose name stands for Categorical Boosting, is the way to go if your dataset consists predominantly of categorical data. It handles categorical features natively, sparing you the complexity of one-hot encoding them yourself. It can provide better accuracy than XGBoost when both run with default parameters.


Hope you enjoyed this article!

I'm a rookie writer and would welcome any suggestions for improvement!

Feel free to reach out if you have any questions.

Top comments (14)

Prayson Wilfred Daniel • Edited

Awesome! I did not know the first one. My pure ML list:

[Image: ML library list]

I have not started with time series nor CI/CD in ML 😋

Marine • Edited

That's a great list, will definitely take time to look into some I don't know like Skrub or poniard. Thanks for sharing!

Jeffrey Ip

Here's a bonus one: github.com/confident-ai/deepeval

Randell Brian Knight

Thanks for providing this awesome list! 🎉

Alexey Yuzhakov

Taipy link points to CatBoost )

Marine

Updated, thank you!

chopslip

This sounds really good, thanks for sharing!

Nathan Tarbert

Nice list! Thanks for sharing

Rym

Hey, thanks Marine for this clear article :)

Nevo David

Great ML list!
Thank you for sharing!

AleaJactaEst

Love it, thank you for your article!

thaddaeustedcode

Python is great