DEV Community

Waylon Walker

Posted on • Originally published at waylonwalker.com

📄 Kedro - My Data Is Not A Table

In Python data science and engineering, most of our data takes the form of some sort of table, typically a DataFrame from a library like pandas, Spark, or Dask.

DataFrames are the heart of most pipelines

These data containers provide many convenient methods for manipulating table-like data structures. Sometimes we leverage other data types, namely vanilla
types like lists and dicts, or even NumPy data types.

If you are unfamiliar with Kedro, check out this post.

Sometimes datasets are not tables

There are times when our data doesn't fit nicely into a DataFrame. Lucky for us, Kedro has pickle support out of the box. Pickle is a way to store almost any Python
object on disk. Beware that loading a pickle file from an unknown source can run malicious code, so such files are considered unsafe. For the most part, though, when you
read and write your own pickle files, they are a good tool to consider.

See more about pickle from python.org.
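
As a rough sketch of what that round trip looks like with only the standard library (the object, the file name `example.pkl`, and the temporary directory are assumptions for illustration; this is plain pickle, not Kedro's own code):

```python
import pickle
import tempfile
from pathlib import Path

# A nested object with a set and a tuple: awkward to store as a table.
obj = {"name": "example", "tags": {"a", "b"}, "shape": (2, 3)}

# Write into a temporary directory so the example leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "example.pkl"

    # Serialize the object to disk.
    with path.open("wb") as f:
        pickle.dump(obj, f)

    # Read it back; only unpickle files you created or otherwise trust.
    with path.open("rb") as f:
        restored = pickle.load(f)

# The restored object compares equal, and the set is still a set.
print(restored == obj, type(restored["tags"]).__name__)
```

Unlike a text format such as JSON, pickle keeps Python-specific types like sets and tuples intact, which is part of why it suits non-tabular data.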

Cataloging Pickle

I may have a dictionary that describes some cars.

{
  'truck-012-abc': {
    'type': 'truck',
    'sales': [12, 2, 3, 4, 8],
    'weight': 9024,
    'accessories': ['leather', 'audio-1'],
  }
}

In the catalog, we will simply set the type as pickle.PickleDataSet and give it a filepath.

cars:
  filepath: data/cars.pkl
  type: pickle.PickleDataSet

This filepath does not have to be on the local filesystem; it can point to cloud storage, thanks to how Kedro uses fsspec for each of its datasets.
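
To illustrate the idea, here is a small sketch of how a protocol prefix on the filepath can distinguish local from remote storage. It uses only the standard library's `urlsplit`; the `describe` helper and the `my-bucket` name are hypothetical, and this is not how Kedro or fsspec implement it internally.

```python
from urllib.parse import urlsplit


def describe(filepath: str) -> tuple[str, str]:
    """Split a catalog-style filepath into (protocol, path)."""
    parts = urlsplit(filepath)
    # A path with no scheme is treated as a plain local file.
    protocol = parts.scheme or "file"
    # For URLs like s3://bucket/key the bucket lands in netloc.
    path = parts.netloc + parts.path
    return protocol, path


print(describe("data/cars.pkl"))
print(describe("s3://my-bucket/data/cars.pkl"))
```

In a real catalog entry, the only change needed would be the `filepath` value, for example an `s3://` path, provided the matching fsspec backend and credentials are available.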

Loading the dataset

The benefit of cataloging this dataset compared to leaving it as a MemoryDataSet is that, once it has been saved, you can easily load this data back into memory for further development or debugging without re-running any of the pipelines.

catalog.load('cars')
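
The idea behind loading by name can be sketched without Kedro at all. The `TinyCatalog` class below is a hypothetical toy stand-in written for this example, not Kedro's `DataCatalog`; it only maps dataset names to pickle files on disk.

```python
import pickle
import tempfile
from pathlib import Path


class TinyCatalog:
    """Toy stand-in for a data catalog: maps dataset names to pickle files."""

    def __init__(self, entries):
        self._entries = {name: Path(fp) for name, fp in entries.items()}

    def save(self, name, data):
        path = self._entries[name]
        path.parent.mkdir(parents=True, exist_ok=True)
        with path.open("wb") as f:
            pickle.dump(data, f)

    def load(self, name):
        with self._entries[name].open("rb") as f:
            return pickle.load(f)


with tempfile.TemporaryDirectory() as tmp:
    entries = {"cars": Path(tmp) / "data" / "cars.pkl"}

    # A pipeline run would normally produce and save the data once.
    TinyCatalog(entries).save("cars", {"truck-012-abc": {"type": "truck", "weight": 9024}})

    # Later, a fresh catalog object can load it by name without re-running anything.
    cars = TinyCatalog(entries).load("cars")

print(cars["truck-012-abc"]["type"])
```

Because the data lives on disk rather than only in memory, a new session only needs the dataset's name to get it back.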
