Introduction
What is Flyte?
- Kubernetes-native workflow automation platform
- Open-source
- Makes it easy to create concurrent, scalable, and maintainable workflows
- LF AI & Data Incubation Project
- Opinionated, scalable & hosted workflow automation platform
- Extensible, auditable, and observable
Integrations
Flyte supports many integrations, such as Hugging Face, Vaex, Polars, Modin, BigQuery, DuckDB, and Hive.
You can browse the full list of supported integrations in the Flyte documentation.
Trusted by Companies
Flyte is used in production at LinkedIn, Spotify, Intel and others.
Setting Up Flyte
Note: If you don't want to install Flyte on your PC, you can skip this step and use Flyte in the browser at https://sandbox.union.ai/
Requirements
- Docker
- Python
Ensure that the Docker daemon is running.
Installation
pip install flytekit flytekitplugins-deck-standard scikit-learn
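If you want to confirm the install worked before going further, the small sketch below uses only the standard library's `importlib.metadata` to report the installed version of each package from the pip command above. `REQUIRED` and `installed_versions` are illustrative names, not part of Flyte.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version

# Distribution names taken from the pip install command above.
REQUIRED = ["flytekit", "flytekitplugins-deck-standard", "scikit-learn"]


def installed_versions(packages: list[str]) -> dict[str, str | None]:
    """Map each distribution name to its installed version, or None if it is missing."""
    found: dict[str, str | None] = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions(REQUIRED).items():
        print(f"{name}: {ver or 'NOT INSTALLED'}")
```

Any package reported as NOT INSTALLED can be fixed by rerunning the pip command.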
Installing FlyteCTL
FlyteCTL is a command-line interface for Flyte
OSX
brew install flyteorg/homebrew-tap/flytectl
Other Operating Systems
curl -sL https://ctl.flyte.org/install | sudo bash -s -- -b /usr/local/bin
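A similar preflight sketch can confirm that the `docker` and `flytectl` executables are on your PATH and that the Docker daemon answers. It assumes that a successful `docker info` call means the daemon is reachable; `missing_tools` and `docker_daemon_running` are hypothetical helper names.

```python
import shutil
import subprocess

# Command-line tools this guide relies on (assumed to be on PATH once installed).
REQUIRED_TOOLS = ["docker", "flytectl"]


def missing_tools(tools):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def docker_daemon_running(timeout: float = 10.0) -> bool:
    """Return True if `docker info` exits successfully, i.e. the daemon is reachable."""
    if shutil.which("docker") is None:
        return False
    try:
        result = subprocess.run(
            ["docker", "info"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    print("Missing tools:", ", ".join(missing) if missing else "none")
    print("Docker daemon running:", docker_daemon_running())
```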
Creating an Example Flyte Script
To check that your setup works (and to have a bit of fun with Flyte),
let's create an example Flyte script that:
- Trains a model on the Wine dataset from scikit-learn
Here's the script; save it in a Python file (for example, example.py):
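```python
import pandas as pd
from flytekit import task, workflow
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression

# @task turns a function into a Flyte task; the type hints tell Flyte
# how to pass data between tasks.
@task
def get_data() -> pd.DataFrame:
    """Get the wine dataset."""
    return load_wine(as_frame=True).frame

@task
def process_data(data: pd.DataFrame) -> pd.DataFrame:
    """Simplify the task from a 3-class to a binary classification problem."""
    # Keep class 0 as-is and fold classes 1 and 2 into class 1.
    return data.assign(target=lambda x: x["target"].where(x["target"] == 0, 1))

@task
def train_model(data: pd.DataFrame) -> LogisticRegression:
    """Train a model on the wine dataset."""
    features = data.drop("target", axis="columns")
    target = data["target"]
    return LogisticRegression(max_iter=1000).fit(features, target)

# @workflow chains the tasks together; inside a workflow, tasks must be
# called with keyword arguments.
@workflow
def training_workflow() -> LogisticRegression:
    """Put all of the steps together into a single workflow."""
    data = get_data()
    processed_data = process_data(data=data)
    return train_model(data=processed_data)

if __name__ == "__main__":
    # Workflows can also be called like ordinary Python functions for a quick local run.
    print(f"Running training_workflow() {training_workflow()}")
```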
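The one-liner in `process_data` relies on pandas' `Series.where(cond, other)`, which keeps values where `cond` is True and replaces the rest with `other`. The plain-Python sketch below applies the same relabeling to a list of labels so you can see the effect without loading the dataset; `relabel_binary` is a hypothetical helper, not part of the script.

```python
def relabel_binary(labels):
    """Keep class 0 as 0 and map every other class to 1.

    Same effect as the pandas expression in process_data,
    x["target"].where(x["target"] == 0, 1): values where the condition
    holds are kept, and all others are replaced with 1.
    """
    return [label if label == 0 else 1 for label in labels]


# The wine dataset's target has three classes: 0, 1 and 2.
print(relabel_binary([0, 1, 2, 2, 0, 1]))  # → [0, 1, 1, 1, 0, 1]
```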
Running Flyte workflows
You can run the workflow in example.py in a local Python environment or on a Flyte cluster.
Running a workflow in a local Python environment
Run this command to execute your newly created workflow in your local Python environment.
NOTE: Replace example.py with the name of your Python file!
pyflyte run example.py training_workflow
Creating a Demo Flyte Cluster
Run this command to start a demo Flyte cluster on your local machine (it runs inside Docker, which is why the Docker daemon must be running):
flytectl demo start
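Once the command finishes, you can check whether the cluster's web console is accepting connections. This sketch assumes the demo cluster serves the Flyte console on localhost:30080; if `flytectl demo start` reports a different address, change `HOST` and `PORT`. `port_is_open` is a hypothetical helper.

```python
import socket

# Assumption: the local demo cluster exposes the Flyte console on localhost:30080.
HOST, PORT = "localhost", 30080


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unresolvable hosts, and timeouts.
        return False


if __name__ == "__main__":
    status = "reachable" if port_is_open(HOST, PORT) else "not reachable yet"
    print(f"Flyte demo cluster at {HOST}:{PORT} is {status}")
```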
Then run the workflow on the cluster with the following command:
pyflyte run --remote example.py training_workflow
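The local and remote runs differ only by the `--remote` flag. The hypothetical helper below builds either command as an argument list, which you could pass to `subprocess.run` to launch it from Python; it assumes `pyflyte` is on your PATH.

```python
import shlex
from typing import List


def pyflyte_run_command(script: str, workflow: str, remote: bool = False) -> List[str]:
    """Build a `pyflyte run` command line as a list of arguments.

    Hypothetical helper: a local run and a run on the demo cluster differ
    only by the `--remote` flag placed right after `run`.
    """
    command = ["pyflyte", "run"]
    if remote:
        command.append("--remote")
    command += [script, workflow]
    return command


if __name__ == "__main__":
    print(shlex.join(pyflyte_run_command("example.py", "training_workflow")))
    # → pyflyte run example.py training_workflow
    print(shlex.join(pyflyte_run_command("example.py", "training_workflow", remote=True)))
    # → pyflyte run --remote example.py training_workflow
```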
If everything is set up correctly, pyflyte prints a link to the execution, which you can open in the Flyte console to follow the run.
Great! You have successfully set up Flyte and run a workflow on your computer.
Conclusion
🎉 Congratulations! In this getting started guide, you:
- 🤓 Learned the basics of Flyte
- 💻 Set up Flyte on your computer
- 📜 Created a Flyte script
- 🛥 Created a demo Flyte cluster on your local system
- 👟 Ran a workflow locally and on a demo Flyte cluster
Flyte is a great workflow automation tool for data and machine learning processes.
Lastly, don't forget to leave a LIKE and key in your feedback in the comments!