DEV Community

adriens

Originally published at kaggle.com

🦆 From API to scheduled offline copies with DuckDB on Kaggle ♾️

❔ About

While I was working on endoflife.date integrations, the need for an offline copy started to arise:

Offline copy of data #2530

I really like the idea, but to avoid repeated API calls for every product I want data on, I would like to be able to maintain a local copy of the data and then only download updates each time I start my application (or after a particular time period, e.g. only request updates once every 24 hours).

Ideally, I would be able to get the data in JSON format which I can then manage locally.

Alternative would be to call the API for every product to get the product data for each product. But this would also require that I know all of the products in the first place which given the dynamic nature of the data isn't very attractive.

After several attempts, I finally found a Kaggle-based solution.

I wanted the data to:

  • 👍 Be easy to share
  • ✅ Rely on the official API
  • 🔁 Be up to date (without any effort)
  • 🔗 Be easy to integrate with third-party products
  • 🧑‍🔬 Be deployed on a data-centric/data-science platform
  • 🤓 Show source code (Open Source)
  • 🚀 Be easily extensible

Therefore, I created a Notebook that does the following once a week:

  1. Queries the API
  2. Loads and stores the data in a DuckDB database
  3. Exports the resulting database as SQL and CSV
  4. Exports the database as Apache Parquet files

🧰 Tools

All you need is Python and DuckDB's JSON functions:

JSON - DuckDB

DuckDB is an in-process database management system focused on analytical query processing. It is designed to be easy to install and easy to use. DuckDB has no external dependencies. DuckDB has bindings for C/C++, Python and R.

duckdb.org

🎯 Result

For now, the only input is the API:

[Screenshot of the notebook's input]

... while we have fresh output files:

[Screenshots of the notebook's output files]

🗣️ Conclusion

Finally I delivered the following solution to the community:

🎁 Weekly Scheduled offline exports on Kaggle ♾️ #2633

❔ About

Getting an easy-to-use offline copy of endoflife.date would be very convenient for producing data analysis.

👉 This issue is about using the endoflife.date API to get an automated offline copy of the data.

🎁 The Notebook

Below are the very portable outputs:

[Screenshot of the exported output files]

💰 Benefits

Weekly:

🔖 Related resources
