I have created a list of useful Python packages for data science.

r0f1 / datascience

Curated list of Python packages and tutorials for data science.

Data Science


pandas | Data structures built on top of NumPy.
scikit-learn | Core ML library.
matplotlib | Plotting library.
seaborn | Python data visualization library based on matplotlib.
pandas_summary | Basic statistics using DataFrameSummary(df).summary().
pandas_profiling | Descriptive statistics using ProfileReport.
sklearn_pandas | DataFrameMapper class for applying scikit-learn transformations to pandas DataFrame columns.
janitor | Clean messy column names.
missingno | Missing data visualization.
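Several packages above (janitor, pandas_summary, missingno) automate a first cleanup pass over a DataFrame. As a rough illustration of that pass in plain pandas, the sketch below normalizes column names and tabulates missing values. The helpers `clean_column_names` and `missing_summary` are hypothetical and are not the APIs of those libraries; the sample data is made up.

```python
import re

import numpy as np
import pandas as pd


def clean_column_names(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: lowercase names and collapse non-alphanumeric runs to '_'."""
    cleaned = [
        re.sub(r"[^0-9a-z]+", "_", str(col).strip().lower()).strip("_")
        for col in df.columns
    ]
    out = df.copy()
    out.columns = cleaned
    return out


def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: count and fraction of missing values per column."""
    counts = df.isna().sum()
    summary = pd.DataFrame({"missing": counts, "fraction": counts / len(df)})
    # Most incomplete columns first.
    return summary.sort_values("missing", ascending=False)


raw = pd.DataFrame({
    "First Name": ["Ada", "Alan", None, "Grace"],
    "Age (years)": [36, np.nan, 41, np.nan],
    "Score %": [0.5, 0.7, 0.9, 0.1],
})

df = clean_column_names(raw)
print(list(df.columns))  # ['first_name', 'age_years', 'score']
print(missing_summary(df))
print(df.describe())  # built-in per-column statistics for numeric columns
```

The dedicated packages go further (visual matrices of missingness, HTML reports), but a check like this is often enough for a quick look.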

Pandas and Jupyter

General tricks: link
nteract | Open Jupyter notebooks with a double-click.
modin | Parallelization library for faster pandas DataFrames.
xarray | Extends pandas to n-dimensional arrays.
blackcellmagic | Code formatting for Jupyter notebooks.
pivottablejs | Drag-and-drop pivot tables and charts for Jupyter notebooks.
qgrid | Interactive sorting and filtering of pandas DataFrames in Jupyter notebooks.
nbdime | Diff two notebook files. Alternative GitHub app: ReviewNB.
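pivottablejs builds pivot tables interactively in the notebook. When the same summary needs to live in code, pandas' own `pivot_table` function produces it directly. The sketch below assumes a small, made-up sales table.

```python
import pandas as pd

# Made-up example data: units sold per region and product.
sales = pd.DataFrame({
    "region": ["North", "North", "South", "South", "South"],
    "product": ["A", "B", "A", "A", "B"],
    "units": [10, 5, 7, 3, 8],
})

# Rows = region, columns = product, cells = total units; empty cells become 0.
table = pd.pivot_table(
    sales,
    index="region",
    columns="product",
    values="units",
    aggfunc="sum",
    fill_value=0,
)
print(table)
```

Here South/A sums the two rows 7 and 3 to 10. Swapping `aggfunc` (for example to `"mean"`) changes the aggregation without touching the layout.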


textract | Extract text from any document.
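textract handles many file formats behind a single call. To illustrate the idea for just one format, the sketch below pulls paragraph text out of a .docx file using only the standard library, relying on the fact that a .docx is a ZIP archive whose main body is stored in `word/document.xml`. The helper `docx_text` is hypothetical, is not textract's API, and ignores headers, footers, and formatting. To stay runnable without a real file, the example builds a minimal .docx-like archive in memory.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML namespace used by the element tags inside document.xml.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_text(source) -> str:
    """Hypothetical helper: return the text of each <w:p> paragraph, one per line.

    `source` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(f"{W_NS}p"):
        # A paragraph's text is split across one or more <w:t> runs.
        paragraphs.append("".join(t.text or "" for t in para.iter(f"{W_NS}t")))
    return "\n".join(paragraphs)


# Minimal in-memory archive containing only word/document.xml (not a complete .docx).
DOCUMENT_XML = (
    '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
    "<w:body>"
    '<w:p><w:r><w:t>Hello</w:t></w:r><w:r><w:t xml:space="preserve"> world</w:t></w:r></w:p>'
    "<w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p>"
    "</w:body></w:document>"
)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("word/document.xml", DOCUMENT_XML)
buffer.seek(0)

print(docx_text(buffer))
```

With a real file, `docx_text("report.docx")` would work the same way (the file name is only an example). Other formats such as PDF need entirely different tooling, which is what a wrapper library saves you from writing.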

Big Data

spark | DataFrame for big data.
spark cheatsheet
dask | Pandas DataFrame for big data…
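Big-data DataFrame libraries such as dask and Spark roughly work by splitting a large table into partitions, aggregating each one, and combining the partial results. The same split-apply-combine pattern can be sketched in plain pandas by reading a CSV in chunks. This is not dask's or Spark's API, and `chunked_group_sum` is a hypothetical helper. It runs serially on one machine, so it only helps when the data does not fit in memory at once; it does not make anything faster.

```python
import io

import pandas as pd


def chunked_group_sum(csv_source, key: str, value: str, chunksize: int = 100_000) -> pd.Series:
    """Hypothetical helper: sum `value` per `key` without loading the whole CSV at once."""
    partials = []
    for chunk in pd.read_csv(csv_source, chunksize=chunksize):
        # Aggregate each chunk independently.
        partials.append(chunk.groupby(key)[value].sum())
    # A key may appear in several chunks, so the partial sums are summed again.
    return pd.concat(partials).groupby(level=0).sum()


# Tiny in-memory CSV; chunksize=2 forces three chunks to show the combine step.
csv_data = io.StringIO("city,sales\nA,1\nB,2\nA,3\nC,4\nB,5\n")
print(chunked_group_sum(csv_data, key="city", value="sales", chunksize=2))
```

Sums (and counts) combine cleanly this way; statistics such as medians cannot simply be merged from per-chunk results, which is one reason dedicated libraries are worth using.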

In some places, I have also linked to YouTube talks, other GitHub repos that contain short examples, etc.

Want to contribute? Let me know.

Short examples are great in this space. Appreciate the list.

