DEV Community

adriens

Posted on • Originally published at kaggle.com


🦆 From API to scheduled offline copies with DuckDB on Kaggle ♾️

❔ About

While I was working on endoflife.date integrations, the need for an offline copy arose:

Offline copy of data #2530

I really like the idea but to avoid repeated calls of the API for every product I would like data on, I would like to be able to maintain a local copy of the data and then only download updates each time I start my application (or after a particular time period, e.g. only request updates once every 24 hours).

Ideally, I would be able to get the data in JSON format which I can then manage locally.

Alternative would be to call the API for every product to get the product data for each product. But this would also require that I know all of the products in the first place which given the dynamic nature of the data isn't very attractive.

After several attempts, I finally found a Kaggle-based solution.

I wanted the data to:

  • 👍 Be easy to share
  • ✅ Rely on the official API
  • 🔁 Be up to date (without any effort)
  • 🔗 Be easy to integrate with third-party products
  • 🧑‍🔬 Be deployed on a data-centric/data-science platform
  • 🤓 Show source code (Open Source)
  • 🚀 Be easily extensible

Therefore I created a Notebook that does the following once a week:

  1. Queries the API
  2. Loads and stores the data in a DuckDB database
  3. Exports the resulting database as SQL and CSV
  4. Exports the database as Apache Parquet files

🧰 Tools

All you need is Python and DuckDB's JSON functions:

JSON - DuckDB

DuckDB is an in-process database management system focused on analytical query processing. It is designed to be easy to install and easy to use. DuckDB has no external dependencies. DuckDB has bindings for C/C++, Python and R.

duckdb.org

🎯 Result

For now, the only input is the API, while the notebook produces fresh output files on every run.

πŸ—£οΈ Conclusion

Finally, I delivered the following solution to the community:

🎁 Weekly Scheduled offline exports on Kaggle ♾️ #2633

❔ About

Getting an easy-to-use offline copy of endoflife.date would be very convenient for producing data analyses.

👉 This issue is about using the endoflife.date API to get an automated offline copy of the data.

🎁 The Notebook

The outputs are very portable: a DuckDB database plus SQL, CSV and Apache Parquet exports.

πŸ’° Benefits

Weekly:

πŸ”– Related resources
