DEV Community

FlyingBanana


PyGWalker with DuckDB is all you need for large data exploration

PyGWalker is a Python library that turns your pandas DataFrame into a Tableau-style user interface for visual analysis.

It makes it very convenient for data analysts to explore and visualize their datasets. Any DataFrame in Python can be turned into a visual exploration interface with a single line of code:

import pandas as pd
import pygwalker as pyg

df = pd.read_csv('your_dataset.csv')

# start exploring your dataset!
walker = pyg.walk(df)

Then you can start your analysis right away.

In the past, visual exploration took a lot of time and code to implement. With PyGWalker, you do not need to google how to make each visualization; just use drag-and-drop operations or natural language queries to visualize your data.

Early versions of PyGWalker still had some performance issues when handling large datasets.

However, a recent update of PyGWalker (version 0.4.2) shipped with a new computation engine based on DuckDB. It boosts performance and allows you to explore much larger datasets.
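The post does not describe how the DuckDB engine works internally. A common design for this kind of tool, and an assumption here, is to translate each drag-and-drop chart into a SQL aggregation that runs next to the data, so that only the small aggregated result reaches the chart. The minimal sketch illustrates that idea. The `chart_spec` format and the `spec_to_sql` helper are hypothetical and are not PyGWalker's API, and Python's built-in `sqlite3` stands in for DuckDB so the example runs without extra installs.

```python
import sqlite3

# Hypothetical chart spec, roughly what a drag-and-drop action might describe.
# The keys and structure are illustrative assumptions, not PyGWalker's format.
chart_spec = {"dimension": "region", "measure": "sales", "aggregate": "SUM"}

ALLOWED_AGGREGATES = {"SUM", "AVG", "COUNT", "MIN", "MAX"}


def spec_to_sql(spec, table="data"):
    """Translate the hypothetical chart spec into a GROUP BY query."""
    agg = spec["aggregate"].upper()
    if agg not in ALLOWED_AGGREGATES:
        raise ValueError(f"unsupported aggregate: {agg}")
    dim, measure = spec["dimension"], spec["measure"]
    return (
        f'SELECT "{dim}", {agg}("{measure}") AS value '
        f'FROM {table} GROUP BY "{dim}" ORDER BY "{dim}"'
    )


# sqlite3 (stdlib) stands in for DuckDB in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (region TEXT, sales REAL)")
rows = [("east", 10.0), ("west", 5.0), ("east", 2.5), ("north", 7.0), ("west", 1.0)]
conn.executemany("INSERT INTO data VALUES (?, ?)", rows)

# The engine does the aggregation; only the grouped rows come back.
sql = spec_to_sql(chart_spec)
result = conn.execute(sql).fetchall()
conn.close()

print(f"{len(rows)} input rows -> {len(result)} rows sent to the chart")
for region, value in result:
    print(region, value)
```

With five input rows, the chart receives only three aggregated rows (`east 12.5`, `north 7.0`, `west 6.0`). DuckDB's Python API offers a similar `duckdb.connect()` / `execute()` / `fetchall()` interface, and its columnar, vectorized execution is generally what lets the same kind of GROUP BY stay fast on far larger tables.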

In another test with 300M rows, a user operation took only 500 ms. In most cases, you may not need a heavy BI system backed by a large computing cluster.

PyGWalker with DuckDB on your local device is all you need.
