Data Pipeline Techniques in Action

Sohil Shah — Sun, 25 Aug 2024 17:40:11 +0000

Take a deep dive into the architectural concepts of data pipelines along with a hands-on tutorial for implementation, demonstrating the concepts in action.

The topics covered are:

Data pipeline architecture
High-scale data ingestion
Data transformation and processing
Data storage
Staging data delivery
Operational data
Hands-on exercise

Article: https://dzone.com/articles/data-pipeline-techniques-in-action

Open Source High-Scale Data Pipeline Platform for Enterprise Data, Analytics, and Machine Learning Applications

Sohil Shah — Sat, 13 Apr 2024 22:41:21 +0000

Braineous is designed for an optimal out-of-the-box experience for developers focused on ETL, ELT, Analytics and Machine Learning.

Documentation: https://bugsbunnyshah.github.io/braineous/guides/developer-guide

Get Started: https://bugsbunnyshah.github.io/braineous/get-started/

GitHub: https://github.com/bugsbunnyshah/braineous_dataplatform

License: https://github.com/bugsbunnyshah/braineous_dataplatform/blob/main/LICENSE

Roadmap: https://bugsbunnyshah.github.io/braineous/about/

Apache Kafka is the backbone for high-scale data ingestion and for maintaining the source of truth for your data. In the future, it will also be used for CDC (change data capture), for time travel to a past state of a system, and for training AI models for predictive analytics.

More details: https://bugsbunnyshah.github.io/braineous/container-first/
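
To illustrate why an append-only log can serve as a source of truth and support time travel, here is a minimal sketch in plain Python. It does not use the Kafka client API or any Braineous code; the `AppendOnlyLog` class, its methods, and the order events are all hypothetical names made up for this illustration. The idea is only that replaying events up to a chosen offset rebuilds the state as it was at that point.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class AppendOnlyLog:
    """In-memory stand-in for a single-partition topic (hypothetical, not the Kafka API)."""

    records: List[Tuple[str, Any]] = field(default_factory=list)

    def append(self, key: str, value: Any) -> int:
        """Append a change event and return its offset."""
        self.records.append((key, value))
        return len(self.records) - 1

    def replay(self, up_to_offset: int) -> Dict[str, Any]:
        """Rebuild state by applying events 0..up_to_offset (inclusive) in order."""
        state: Dict[str, Any] = {}
        for key, value in self.records[: up_to_offset + 1]:
            if value is None:
                # None acts as a delete marker for the key
                state.pop(key, None)
            else:
                state[key] = value
        return state


log = AppendOnlyLog()
log.append("order-1", {"status": "created"})   # offset 0
log.append("order-2", {"status": "created"})   # offset 1
log.append("order-1", {"status": "shipped"})   # offset 2
log.append("order-2", None)                    # offset 3, order-2 deleted

past = log.replay(1)      # state as of offset 1
present = log.replay(3)   # state after all events
print(past)
print(present)
```

Because events are never overwritten, the same log answers both "what is the state now" and "what was the state at offset 1", which is the property that CDC and time travel rely on.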

The downstream engine is Apache Flink. To use a biological analogy: if Apache Flink is the brain, then Apache Kafka is the spinal cord.

More details: https://bugsbunnyshah.github.io/braineous/about/

Braineous is built on Apache Flink as its data processing engine and supports Apache Hive based data lakes.

Future releases of Braineous will include a Data Lake Connector framework that can support custom data lakes.

More details: https://bugsbunnyshah.github.io/braineous/data-lake/

Braineous bridges unstructured datasets to structured datasets on the fly, so your data lake evolves with the dataset. Analytics and Machine Learning need structured queries for training AI models.

Braineous bridges these two worlds on the fly. Downtime is entirely unacceptable for Braineous.

More details: https://bugsbunnyshah.github.io/braineous/developer-joy/
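
As a rough illustration of bridging unstructured data to a structured form on the fly, the sketch below flattens nested JSON-like records into dotted column names and lets the set of columns grow as new fields arrive. This is plain Python written for this post, not Braineous code; `flatten`, `EvolvingTable`, and the sample records are hypothetical, and a real data lake would persist the schema rather than keep it in memory.

```python
from typing import Any, Dict, List


def flatten(record: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts into a single level with dotted column names."""
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        column = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{column}."))
        else:
            flat[column] = value
    return flat


class EvolvingTable:
    """Hypothetical in-memory table whose columns grow as new fields arrive."""

    def __init__(self) -> None:
        self.columns: List[str] = []
        self.rows: List[Dict[str, Any]] = []

    def ingest(self, record: Dict[str, Any]) -> None:
        """Flatten a record, add any unseen columns, and store the row."""
        flat = flatten(record)
        for column in flat:
            if column not in self.columns:
                # New field seen: extend the schema without stopping ingestion
                self.columns.append(column)
        self.rows.append(flat)

    def select(self, *columns: str) -> List[tuple]:
        """Return the requested columns; values missing from a row are None."""
        return [tuple(row.get(c) for c in columns) for row in self.rows]


table = EvolvingTable()
table.ingest({"id": 1, "user": {"name": "Ada"}})
table.ingest({"id": 2, "user": {"name": "Lin", "email": "lin@example.com"}})

print(table.columns)
print(table.select("id", "user.email"))
```

The second record introduces `user.email`, and the table picks it up without any schema migration step; earlier rows simply report `None` for that column when queried.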

We would love your feedback on the developer experience, ease of use, and the ability to go from 0 to 60 in 15 minutes with data processing.

Developer input would be valuable in shaping the roadmap.

Feedback: https://github.com/bugsbunnyshah/braineous_dataplatform/discussions/16

Thanks

Sohil

DEV Community: Sohil Shah
