An open source platform that delivers resilience and manageability to object-storage based data lakes
What is lakeFS
lakeFS is an open source layer that delivers resilience and manageability to object-storage based data lakes.
With lakeFS you can build repeatable, atomic and versioned data lake operations - from complex ETL jobs to data science and analytics.
lakeFS supports AWS S3 or Google Cloud Storage as its underlying storage service. It is API compatible with S3, and works seamlessly with all modern data frameworks such as Spark, Hive, AWS Athena, Presto, etc.
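To make the S3 compatibility a bit more concrete, here is a minimal sketch of how an object might be addressed from an S3-style client. It assumes the convention that the repository name takes the place of the bucket and the branch name is the first component of the object key. The helper `lakefs_s3_address`, the repository `example-repo`, and the path are all hypothetical names used only for illustration.

```python
from typing import NamedTuple


class S3Address(NamedTuple):
    bucket: str
    key: str


def lakefs_s3_address(repository: str, branch: str, path: str) -> S3Address:
    """Map (repository, branch, path) to the bucket/key pair an S3 client would use.

    Assumption: the repository acts as the bucket and the branch is the
    first component of the object key.
    """
    if not repository or not branch:
        raise ValueError("repository and branch must be non-empty")
    # Strip a leading slash so the key never contains "branch//path".
    return S3Address(bucket=repository, key=f"{branch}/{path.lstrip('/')}")


addr = lakefs_s3_address("example-repo", "main", "events/2020/10/data.parquet")
print(f"s3://{addr.bucket}/{addr.key}")
```

Under that assumption, pointing an existing Spark or Presto job at a different branch would only mean changing the first path component.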
Capabilities
Development Environment for Data
Experimentation - try tools, upgrade versions, and evaluate code changes in isolation.
Reproducibility - go back to any point in time to a consistent version of your data lake.
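The two ideas above, isolation and going back in time, can be pictured with a toy in-memory model. This is a conceptual sketch only and not how lakeFS is implemented: the `ToyLake` class and its methods are hypothetical, commits are modelled as immutable snapshots of a path-to-content mapping, and branches are just pointers to a commit.

```python
from typing import Dict, List, Optional, Union


class ToyLake:
    """In-memory model: commits are immutable snapshots, branches are pointers."""

    def __init__(self) -> None:
        self.commits: List[Dict[str, str]] = [{}]    # commit 0 is the empty lake
        self.branches: Dict[str, int] = {"main": 0}  # branch name -> commit id

    def branch(self, name: str, source: str) -> None:
        """Create a branch pointing at the same commit as `source`."""
        self.branches[name] = self.branches[source]

    def commit(self, branch: str, changes: Dict[str, Optional[str]]) -> int:
        """Apply changes (path -> content, or None to delete) as a new snapshot."""
        snapshot = dict(self.commits[self.branches[branch]])
        for path, content in changes.items():
            if content is None:
                snapshot.pop(path, None)
            else:
                snapshot[path] = content
        self.commits.append(snapshot)
        self.branches[branch] = len(self.commits) - 1
        return self.branches[branch]

    def read(self, ref: Union[str, int], path: str) -> Optional[str]:
        """Read a path from a branch name or from a commit id."""
        commit_id = self.branches[ref] if isinstance(ref, str) else ref
        return self.commits[commit_id].get(path)


lake = ToyLake()
v1 = lake.commit("main", {"users.csv": "v1"})

# Experimentation: changes on the branch do not touch main.
lake.branch("experiment", source="main")
lake.commit("experiment", {"users.csv": "v2-experimental"})
print(lake.read("main", "users.csv"))
print(lake.read("experiment", "users.csv"))

# Reproducibility: an earlier commit stays readable after main moves on.
lake.commit("main", {"users.csv": "v3"})
print(lake.read(v1, "users.csv"))
```

The toy copies the whole path mapping on every commit, which keeps the example short; a real system would avoid that cost.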
Continuous Data Integration
Ingest new data safely by enforcing best practices - make sure new data sources adhere to your lake’s conventions, such as format and schema enforcement and naming conventions.
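As a rough illustration of what such a check could look like before new data is accepted, the sketch below validates a batch of incoming files against a format rule, a naming rule, and a required schema. Everything here is an assumption made for the example: the function `validate_new_data`, the Parquet-only rule, the lowercase naming pattern, and the required columns are hypothetical and not part of lakeFS.

```python
import re
from typing import Dict, List

# Hypothetical rules, for illustration only.
ALLOWED_EXTENSIONS = (".parquet",)
NAME_PATTERN = re.compile(r"[a-z0-9_/]+\.[a-z0-9]+")
REQUIRED_COLUMNS = {"user_id": "int64", "event_time": "timestamp"}


def validate_new_data(files: Dict[str, Dict[str, str]]) -> List[str]:
    """Return a list of problems; an empty list means the batch may be accepted.

    `files` maps each incoming path to its schema (column name -> type name).
    """
    problems: List[str] = []
    for path, schema in sorted(files.items()):
        if not path.endswith(ALLOWED_EXTENSIONS):
            problems.append(f"{path}: unexpected format")
        if not NAME_PATTERN.fullmatch(path):
            problems.append(f"{path}: violates naming convention")
        for column, expected in REQUIRED_COLUMNS.items():
            actual = schema.get(column)
            if actual != expected:
                problems.append(
                    f"{path}: column {column!r} is {actual!r}, expected {expected!r}"
                )
    return problems


incoming = {
    "events/2020_10/part_0001.parquet": {"user_id": "int64", "event_time": "timestamp"},
    "Events/Part 1.csv": {"user_id": "string"},
}
problems = validate_new_data(incoming)
for problem in problems:
    print(problem)
```

A check like this would run against an isolated copy of the new data, so a failing batch never reaches the consumers of the main lake.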
lakeFS is a data lake management platform written in Go.
This is our first Hacktoberfest and we couldn't be more excited.
We created Hacktoberfest issues that new contributors can easily tackle:
github.com/treeverse/lakeFS/labels...
You're also welcome to join our Slack channel for help and more ideas.
treeverse / lakeFS
For more information see the Official Documentation.
Capabilities
Development Environment for Data
Continuous Data Integration