DEV Community

Dang Hoang Nhu Nguyen
Dang Hoang Nhu Nguyen

Posted on

[BTY] Day 2: Fancy packages to work with Dataframe

2 packages I want to mention is pandas-profiling and Mito.

pandas-profiling

It will generate profile reports from a pandas DataFrame. It's more powerful than the default df.describe(). These are statistics presented in an interactive HTML report:

  • Type inference: detect the types of columns in a dataframe.
  • Essentials: type, unique values, missing values
  • Quantile statistics like minimum value, Q1, median, Q3, maximum, range, interquartile range
  • Descriptive statistics like mean, mode, standard deviation, sum, median absolute deviation, coefficient of variation, kurtosis, skewness
  • Most frequent values
  • Histogram
  • Correlations highlighting of highly correlated variables, Spearman, Pearson and Kendall matrices
  • Missing values matrix, count, heatmap and dendrogram of missing values
  • Text analysis learn about categories (Uppercase, Space), scripts (Latin, Cyrillic) and blocks (ASCII) of text data.
  • File and Image analysis extract file sizes, creation dates, and dimensions and scan for truncated images or those containing EXIF information.

Mito

The main functionalities are exploring, transforming, and presenting your data with the ease of Excel. All without leaving Jupyter (see the video demo). It's so easy to use and it really makes "advanced data analysis accessible to all."

Check it out!

Image of Timescale

🚀 pgai Vectorizer: SQLAlchemy and LiteLLM Make Vector Search Simple

We built pgai Vectorizer to simplify embedding management for AI applications—without needing a separate database or complex infrastructure. Since launch, developers have created over 3,000 vectorizers on Timescale Cloud, with many more self-hosted.

Read more

Top comments (0)