INTRODUCTION
For those accustomed to Python for data manipulation, Pandas is a household name. It can be used to manipulate a particular set of data until it is clean and ready for use.
A newer data manipulation library, Polars, has recently been introduced, and it might just be a saving grace for Python users. Read on and you might find yourself switching from Pandas to Polars.
WHY SWITCH TO POLARS?
Pandas is an essential library in the field of Data Science and is primarily used for data manipulation. Although Pandas is a great library, it does come with a drawback: it can be slow when processing large datasets. Polars was designed to process data much faster than Pandas, which makes it a Pandas alternative.
Let's take a look at some of the similarities and differences between the Pandas and Polars code.
- Importing the Library
Pandas
import pandas as pd
Polars
import polars as pl
- Reading a CSV file
Pandas
df = pd.read_csv(file)
Polars
df = pl.read_csv(file)
- Memory Usage
Pandas
df.memory_usage()
Polars
df.estimate_size()
- Delete Column
Pandas
df.drop(columns=["columns"])
Polars
df.drop(["columns"])
- Sort
Pandas
df.sort_values("column")
Polars
df.sort("column")
- Unique values
Pandas
df.column.unique()
Polars
df["column"].unique()
- Lazy Execution
Pandas
Not Supported
Polars
df.lazy()
- Filter Data
Pandas
df[df["column"] > 10]
Polars
df.filter(pl.col("column") > 10)
CONCLUSION
Both are great libraries to use, but Polars might just have the advantage when it comes to speed. Still, most Pandas users might be a little reluctant to shift over to Polars, since they are well accustomed to Pandas and moving to Polars means adjusting to some of the code differences.