After the recent announcement on the Databricks blog about querying your Delta Lake natively with Python (and other languages) without Apache Spark, I got curious about what a Flask API endpoint would look like, so here it is.
import json

import flask
from deltalake import DeltaTable
from flask import jsonify

app = flask.Flask(__name__)
app.config["DEBUG"] = True

@app.route('/read-delta-table', methods=['GET'])
def home():
    # Open the Delta table directly from storage; no Spark session is needed
    dt = DeltaTable("/tmp/delta/students-delta-table/")
    # Materialize the whole table as a pandas DataFrame
    df = dt.to_pyarrow_dataset().to_table().to_pandas()
    # Serialize via pandas so NumPy and timestamp values become plain JSON
    parsed = json.loads(df.to_json(orient="records"))
    return jsonify(parsed)

app.run()
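Once the app is running, the endpoint can be called from any HTTP client. Here is a minimal sketch of a Python client using only the standard library. It assumes the app is listening on Flask's default development address (127.0.0.1, port 5000); `fetch_records` and its `opener` parameter are hypothetical names introduced for illustration, not part of Flask or delta-rs.

```python
import json
import urllib.request

# Assumption: the Flask app is running locally on Flask's default port (5000).
DEFAULT_URL = "http://127.0.0.1:5000/read-delta-table"


def fetch_records(url=DEFAULT_URL, opener=urllib.request.urlopen):
    """Call the endpoint and return the decoded list of row dictionaries.

    `opener` is a hypothetical hook so the function can be exercised
    without a live server; by default it is urllib's urlopen.
    """
    with opener(url) as response:
        # The endpoint returns a JSON array with one object per table row
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_records()` with the server up should return one dictionary per row of the Delta table, keyed by column name.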
Running the API
Pre-requisites:
- To compile the code, you need to use the nightly version of Rust
[to install]
$ rustup toolchain install nightly
[to use]
$ cd ~/projects/needs-nightly
$ rustup override set nightly
- You need to use the maturin package to build the .whl
$ pip install maturin
$ maturin build
This is still an experimental interface to Delta Lake for Rust with native bindings for Python, so proceed with caution: you wouldn't want to expose an ocean of data through an endpoint.
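One way to keep an endpoint from returning an ocean of data is to cap the number of rows per request. The sketch below is only an illustration of that idea: `parse_limit`, `cap_records`, and the two constants are hypothetical names and values, not something delta-rs or Flask provides.

```python
DEFAULT_LIMIT = 100  # hypothetical default number of rows per response
MAX_LIMIT = 1000     # hypothetical hard cap, however large the request


def parse_limit(raw, default=DEFAULT_LIMIT, maximum=MAX_LIMIT):
    """Turn a raw query-string value into a safe row limit."""
    try:
        limit = int(raw)
    except (TypeError, ValueError):
        # Missing or non-numeric input falls back to the default
        return default
    # Clamp the requested value into the range [1, maximum]
    return max(1, min(limit, maximum))


def cap_records(records, raw_limit=None):
    """Return at most the permitted number of records."""
    return records[:parse_limit(raw_limit)]
```

In the route, the raw value could come from `request.args.get("limit")`, with `cap_records` applied to the parsed list before it is passed to `jsonify`. Note that this only limits the response size; the whole table is still read into memory first.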
I'm excited about this project: being able to query Delta tables from front-end apps (not via Apache Spark) was a missing piece in the Delta Lake puzzle.
Fantastic effort by the delta-rs contributors:
Delta-rs Git repo is here