TL;DR
Dive deep into Python with this cheat sheet featuring the libraries every Pythonista should know.
From data manipulation to Machine Learning and creating web applications, these libraries are essential in your Python coding journey.
Web applications
1. Taipy
Taipy is the new kid on the block and aims to be the simplest Python app builder.
It is designed to make it easy to build both the front end (GUI) and your ML/data pipelines.
Create the application of your dreams thanks to:
- complete customization & interactivity
- multipage & multi-user applications
- pipeline graphical editor
- and so much more!
2. Streamlit
Streamlit is a well-established library for quickly creating web applications, well suited to prototypes and pilot projects. Very easy to use!
Essentials
3. Pandas
This library brings two core data structures, the DataFrame and the Series, making data cleaning and preparation a painless process.
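To make the DataFrame and Series idea concrete, here is a minimal sketch of a cleaning step. The sample data and the `clean_sales` helper are made up for illustration; only the Pandas calls themselves are real.

```python
import pandas as pd

# Hypothetical sample data with a missing city, a missing sale,
# and inconsistent casing.
raw = pd.DataFrame(
    {
        "city": ["Paris", "paris", "Lyon", None],
        "sales": [100.0, 150.0, None, 80.0],
    }
)


def clean_sales(df):
    """Return a cleaned copy of the (hypothetical) sales table."""
    # Drop rows that have no city at all.
    cleaned = df.dropna(subset=["city"]).copy()
    # Normalize casing so "paris" and "Paris" are the same city.
    cleaned["city"] = cleaned["city"].str.title()
    # Treat a missing sale as zero.
    cleaned["sales"] = cleaned["sales"].fillna(0.0)
    return cleaned


cleaned = clean_sales(raw)

# Selecting one column of a DataFrame gives a Series.
sales_column = cleaned["sales"]

# Aggregate per city; the result is again a Series indexed by city.
totals = cleaned.groupby("city")["sales"].sum()
print(totals)
```

Whether a missing sale should really become zero depends on your data; that choice is an assumption of this sketch.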
4. NumPy
While Pandas has DataFrames, NumPy has arrays.
Arrays are known for allowing fast data manipulation, making NumPy an essential tool for scientific computing.
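A short sketch of what fast array manipulation looks like in practice: operations apply to the whole array at once, with no explicit Python loop. The temperature values are invented for the example.

```python
import numpy as np

# Hypothetical temperature readings in degrees Celsius.
temps_c = np.array([12.5, 18.0, 21.3, 9.8])

# Vectorized arithmetic: the formula is applied to every element at once.
temps_f = temps_c * 9 / 5 + 32

# Boolean masking: keep only the readings above 15 degrees.
warm = temps_c[temps_c > 15]

# Aggregations are methods on the array.
mean_c = temps_c.mean()

print(temps_f, warm, mean_c)
```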
5. Requests
This library makes dealing with HTTP requests a breeze.
Requests provides functions for interacting with web APIs and managing HTTP responses.
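Here is a minimal sketch of both sides: building a request and handling a response. The URL `https://api.example.com/users` and the `fetch_json` helper are placeholders invented for this example, and the request is only prepared, not sent, so the snippet runs without network access.

```python
import requests

# Build a GET request against a hypothetical endpoint without sending it.
request = requests.Request(
    "GET",
    "https://api.example.com/users",
    params={"page": 2, "active": "true"},
    headers={"Accept": "application/json"},
)
prepared = request.prepare()

# The query parameters are encoded into the final URL.
print(prepared.method, prepared.url)


def fetch_json(url, **params):
    """Hypothetical helper: send a GET request and return the JSON body."""
    response = requests.get(url, params=params, timeout=10)
    # Raise an exception for 4xx/5xx responses instead of failing silently.
    response.raise_for_status()
    return response.json()
```

Setting a `timeout` is a design choice worth keeping: without one, a request can wait on an unresponsive server indefinitely.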
6. SciPy
Based on NumPy, SciPy’s core functions focus on mathematical computing with features around optimization, signal processing, and interpolation.
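As a rough illustration of two of those areas, the sketch below minimizes a simple function and interpolates between a few points. The function and the sample points are arbitrary choices for the example.

```python
import numpy as np
from scipy.interpolate import interp1d
from scipy.optimize import minimize_scalar

# Optimization: find the x that minimizes (x - 2)^2 + 1.
result = minimize_scalar(lambda x: (x - 2.0) ** 2 + 1.0)
best_x = result.x

# Interpolation: estimate a value between known sample points.
xs = np.array([0.0, 1.0, 2.0, 3.0])
ys = np.array([0.0, 10.0, 20.0, 30.0])
interpolate = interp1d(xs, ys)  # linear interpolation by default
value = float(interpolate(1.5))

print(best_x, value)
```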
Date & Time
7. datetime
datetime is a module in Python’s standard library that is essential for working with dates and times.
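A quick sketch of the most common operations: parsing a string, doing date arithmetic, and formatting the result. The date used here is arbitrary.

```python
from datetime import datetime, timedelta, timezone

# Parse a string into a datetime object using an explicit format.
parsed = datetime.strptime("2024-03-15 09:30", "%Y-%m-%d %H:%M")

# Date arithmetic with timedelta.
deadline = parsed + timedelta(days=10, hours=2)

# Format a datetime back into a string.
formatted = deadline.strftime("%d/%m/%Y %H:%M")

# Attach a time zone (UTC here) and produce an ISO 8601 string.
aware = parsed.replace(tzinfo=timezone.utc)
iso = aware.isoformat()

print(formatted, iso)
```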
8. Pendulum
Pendulum adds features for more advanced date and time handling.
It has better time zone support as well as better formatting options.
Machine Learning
9. Scikit-Learn
This library doesn’t need an introduction anymore, and rightfully so.
Scikit-learn is the reference for Machine Learning, with algorithms ranging from clustering to classification.
It also includes functions for everything from data preprocessing to model selection and validation.
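To show how these pieces fit together, here is a minimal classification sketch on the iris dataset that ships with the library. The choice of logistic regression and of a 25% test split is arbitrary, picked only to keep the example short.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load a small built-in dataset: flower measurements and species labels.
X, y = load_iris(return_X_y=True)

# Hold out part of the data to evaluate the model on unseen samples.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Fit a classifier and score its predictions on the held-out data.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
accuracy = accuracy_score(y_test, model.predict(X_test))

print(f"accuracy: {accuracy:.2f}")
```

Most estimators follow the same `fit` / `predict` pattern, so swapping in another algorithm usually means changing a single line.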
10. XGBoost
This gradient boosting library is well-known for its efficient results on regression and classification tasks.
11. CatBoost
CatBoost is a Machine Learning library specifically designed to deal with datasets containing mostly categorical data.
Deep Learning
12. TensorFlow
TensorFlow is a well-established deep-learning library, widely used for tasks such as Natural Language Processing and image classification.
13. PyTorch
PyTorch or TensorFlow, that is the question.
Ultimately, you choose your team, but PyTorch stands out with its strong presence in Natural Language Processing and a more Pythonic feel, which softens the steep learning curve TensorFlow is known for.
14. Keras
Keras is a great way to start with Deep Learning as it runs on top of TensorFlow but with a simplified implementation process.
15. OpenCV
OpenCV provides a wide range of algorithms for real-time computer vision.
You can detect and recognize many kinds of content, including objects, people, and even handwriting.
Natural Language Processing
16. NLTK
NLTK is the go-to library for Natural Language Processing.
NLTK’s key features are text processing and manipulation (tokenization, stemming, etc.) and classification for NLP tasks such as sentiment analysis.
17. spaCy
spaCy is the newer kid on the block, with a focus on making NLP more accessible and user-friendly.
The library is optimized for speed and efficiency.
Testing
18. Pytest
Pytest is a framework that simplifies test writing and execution. It is user-friendly with its concise syntax.
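The concise syntax is easiest to see in an example. In this sketch, `divide` is a made-up function under test; the tests are plain functions using plain `assert` statements.

```python
import pytest


def divide(a, b):
    """Hypothetical function under test."""
    if b == 0:
        raise ValueError("b must not be zero")
    return a / b


def test_divide_returns_quotient():
    # A plain assert is all Pytest needs.
    assert divide(10, 4) == 2.5


def test_divide_rejects_zero():
    # pytest.raises checks that the expected exception is raised.
    with pytest.raises(ValueError):
        divide(1, 0)
```

Saved in a file whose name starts with `test_`, these functions are picked up automatically when you run `pytest` in that directory.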
19. Unittest
unittest is Python’s built-in testing framework.
Its key features are test discovery, support for fixtures, and effortless organization and management of test suites.
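Here is a small sketch of a fixture and a test suite. `ShoppingCart` is a hypothetical class invented for the example, and the suite is run programmatically so the snippet is self-contained.

```python
import io
import unittest


class ShoppingCart:
    """Hypothetical class under test."""

    def __init__(self):
        self.items = []

    def add(self, name, price):
        self.items.append((name, price))

    def total(self):
        return sum(price for _, price in self.items)


class ShoppingCartTest(unittest.TestCase):
    def setUp(self):
        # Fixture: runs before every test method, giving each a fresh cart.
        self.cart = ShoppingCart()

    def test_new_cart_is_empty(self):
        self.assertEqual(self.cart.total(), 0)

    def test_total_sums_prices(self):
        self.cart.add("book", 12.0)
        self.cart.add("pen", 3.0)
        self.assertEqual(self.cart.total(), 15.0)


# Build the suite and run it, writing the report to a string buffer.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ShoppingCartTest)
report = io.StringIO()
result = unittest.TextTestRunner(stream=report).run(suite)
print(report.getvalue())
```

In a real project you would more likely let test discovery do the work with `python -m unittest discover`.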
Audio
20. AudioFlux
The go-to library in Python for audio signal processing, but made easy.
AudioFlux has a plethora of features including sound analysis and can be used for deep learning training.
21. Librosa
This Python library allows for analyzing and extracting features from audio sources.
Code Analysis
22. Black
Black is an automated code formatter.
It rewrites your code in a consistent style throughout your projects.
23. Pylint
As the name implies, Pylint is a linter.
It is a static code analysis tool that checks for code quality and errors.
24. Flake8
Flake8 is another linting library that checks your code against the PEP 8 coding convention.
25. Ruff
Ruff is a much faster alternative to comparable linters.
It combines effectiveness with speed and can make linting ten times faster or more.
Distributed Computing
26. Dask
Dask is a popular Python package for distributed computing and is particularly helpful when dealing with large datasets.
It is easy to pick up, as Dask mirrors the Pandas, NumPy, and Scikit-learn APIs.
27. PySpark
As the name implies, PySpark is a Python API for Apache Spark and allows us to harness Spark’s capabilities directly in Python.
28. Polars
Polars is a DataFrame library created to handle and process large datasets.
It was inspired by Python royalty, Pandas, but with a (fast) twist: it can be 10 to 100 times faster.
Documentation
29. MkDocs
MkDocs is the most accessible library to generate straightforward documentation.
It is suitable for smaller projects and has almost no learning curve.
30. Sphinx
Sphinx is usually preferred for larger-scale projects.
It includes support for multiple formats and allows for specific customization.
31. Pydoc
Pydoc is integrated into the Python ecosystem. It directly generates your documentation from your modules.
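A minimal sketch of what generating documentation from your own code means here: Pydoc reads the docstring and signature and renders them as text. `rectangle_area` is a made-up function for the example.

```python
import pydoc


def rectangle_area(width, height):
    """Return the area of a rectangle given its width and height."""
    return width * height


# Render the documentation as plain text, without terminal bold markup.
text = pydoc.render_doc(rectangle_area, renderer=pydoc.plaintext)
print(text)
```

From the command line, `python -m pydoc <module name>` shows the same kind of output for a whole module.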
Geographical data
32. Geopy
Geopy’s key features are distance calculations, geocoding, and reverse geocoding.
33. Folium
This library allows you to create interactive maps in Python. A game-changer.
34. Geopandas
Geopandas is the way to go when you have geospatial data.
As the name suggests, Geopandas is Pandas but for geospatial data. This library has functions for easy manipulation and analysis of geo-data.
Games
35. Pygame
Pygame is the go-to, straightforward library that makes creating 2D and interactive video games in Python easy.
36. Arcade
Just like Pygame, Arcade makes creating video games a fun process in Python.
It puts a more modern twist on the classic Pygame, so choosing is really based on personal preference.
Web scraping
37. Scrapy
Scrapy is a well-established library known for web scraping.
Some key features are support for asynchronous and synchronous operations, HTTP request handling, and more.
It has an extensive array of functionalities, which may explain its steep learning curve.
38. Beautiful Soup
Beautiful Soup is all you need to deal with pulling data out of XML and HTML files.
It is appreciated by developers thanks to its Pythonic feel.
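To give a feel for that Pythonic style, here is a small sketch that pulls a heading and some links out of an HTML snippet. The HTML is invented for the example, and the parser used is the one built into Python.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML document, inlined to keep the example self-contained.
html = """
<html>
  <body>
    <h1>  Favorite libraries </h1>
    <ul>
      <li><a href="https://pandas.pydata.org">Pandas</a></li>
      <li><a href="https://numpy.org">NumPy</a></li>
    </ul>
  </body>
</html>
"""

# Parse with Python's built-in HTML parser.
soup = BeautifulSoup(html, "html.parser")

# Find a single tag and read its text, stripped of surrounding whitespace.
title = soup.find("h1").get_text(strip=True)

# Find every link and collect its text and href attribute.
links = [(a.get_text(strip=True), a["href"]) for a in soup.find_all("a")]

print(title, links)
```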
Visualizations
39. Matplotlib
Matplotlib is the main plotting library in Python and for a good reason.
Matplotlib allows the plotting of 2D graphs with a wide range of chart types and also allows for significant customization.
The fine-grain control of the elements is a real advantage of this library.
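Here is a short sketch of that fine-grained control: nearly every element of the chart (line style, markers, labels, grid) can be set explicitly. The data is invented, and the non-interactive `Agg` backend is used so the example also runs on a machine without a display.

```python
import io

import matplotlib

matplotlib.use("Agg")  # render off-screen; no window is opened
import matplotlib.pyplot as plt

# Hypothetical monthly figures.
months = [1, 2, 3, 4, 5, 6]
downloads = [120, 180, 150, 240, 310, 290]

fig, ax = plt.subplots(figsize=(6, 4))

# Each visual property of the line can be controlled individually.
ax.plot(months, downloads, color="tab:blue", linestyle="--", marker="o",
        label="downloads")
ax.set_title("Monthly downloads")
ax.set_xlabel("Month")
ax.set_ylabel("Downloads")
ax.grid(True, alpha=0.3)
ax.legend()

# Save the figure as PNG into an in-memory buffer.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
png_bytes = buffer.getvalue()
```

To write a file instead, pass a path such as `"downloads.png"` to `fig.savefig`.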
40. Bokeh
Unlike Matplotlib, Bokeh focuses on interactive charts.
41. Seaborn
Seaborn is built on top of Matplotlib.
While Matplotlib emphasizes precision and simplicity, Seaborn adds real value with its sleek visuals for complex statistical visualizations.
42. Vizzu
Vizzu found a niche in visualization and does it very well.
It puts storytelling and graphs all in one with highly animated visualizations, a great way to have more dynamic graphs.
Conclusion
Whether you’re a senior Pythonista or dabbling with Python, with this list of indispensable libraries you will be able to undertake any challenge. Have fun coding!
I’m a rookie writer and would welcome any suggestions for improvement!
Feel free to reach out if you have any questions.
Top comments (13)
I would have grouped them into utility categories e.g.
In that way, I could present a package and its alternative. For example Pandas and Polars, Unittest and Pytest, NLTK and Spacy. Here we can see that Polars, Pytest and Spacy are tools designed to solve some issues the others had.
That's a great idea actually!
Wow! This is huge list!
No surprises on this list, but a good starting point !
Should be presented as arrays with comparative pros and cons.
thanks !
Here are a few missing great ones:
Flask — for creating simple and lightweight web apps
FastAPI — for building web APIs
Tornado — for asynchronous networking
Plotly — for interactive plotting and graphing
Pillow — for image processing and manipulation
SymPy — for symbolic mathematics and algebra
PyMongo — for working with MongoDB
love this summary! <3
Thank you!
Wow, 42 libraries to know! That's an impressive list. It seems like you could have put more but decided to stop at 42
What can I say, lucky number!
Awesome.
Now that's a GOOD LIST!
wow, this is a big one! I love how you went for 42 things to cover :).
Agree, Librosa is great