I have been participating in the Data Engineering Accelerated Program for 3 days, and this is the overview I have taken away from it so far. I skipped the technical aspects like coding and pipelines; another article would be a better place for those :)
The program is still ongoing, and the remaining 5 days are waiting for me!
History of data engineering
Organizations want to make use of their data, for example in dashboards. To build a dashboard, an operator needs to pull data from the sources (on the left), clean it, and put it into the dashboard for analytics users (on the right).
As the data sources grow bigger, a manual operator can no longer keep up. Data engineering comes in to automate the task.
Extract, Transform, Load (ETL)
"We really know how consumers use the data."
Pull data from the source, clean and transform it, then save it in storage.
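To make the three steps concrete, here is a minimal sketch of ETL in Python using only the standard library. The CSV content, the `orders` table, and the function names are all made up for illustration; a real pipeline would read from an actual source system and write to a real warehouse rather than an in-memory SQLite database.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from a source system (made-up data).
RAW_CSV = """order_id,customer,amount
1, alice ,10.50
2,BOB,
3,alice,4.50
"""


def extract(raw_text):
    """Extract: read rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """Transform: drop rows with no amount and normalise customer names."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # we decided up front that incomplete orders are not needed
        cleaned.append(
            (int(row["order_id"]), row["customer"].strip().lower(), float(amount))
        )
    return cleaned


def load(conn, rows):
    """Load: write the cleaned rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(raw_text, conn):
    load(conn, transform(extract(raw_text)))


conn = sqlite3.connect(":memory:")  # stand-in for the real storage
run_etl(RAW_CSV, conn)
```

Note that the cleaning rules live inside the pipeline, so only already-cleaned data ever reaches storage. That only works well when we know in advance what consumers need.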
Extract, Load, Transform (ELT)
"Let the consumer choose."
Pull the data and store it in a database as-is, then let consumers choose which data they want, pull it from our storage, and transform it for their own needs.
This can speed up time to market in some use cases.
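As a rough sketch of the ELT idea, the example below loads raw records untouched and leaves the transformation to a consumer, written here as a SQL view. The table name `raw_orders`, the view name `finance_orders`, and the sample rows are assumptions for illustration, and an in-memory SQLite database stands in for the real storage.

```python
import sqlite3

# Hypothetical raw records, kept as text exactly as they arrived (made-up data).
RAW_ORDERS = [
    ("1", " alice ", "10.50"),
    ("2", "BOB", ""),
    ("3", "alice", "4.50"),
]


def extract_and_load(conn, rows):
    """Extract + Load: store the raw rows without cleaning them."""
    conn.execute(
        "CREATE TABLE raw_orders (order_id TEXT, customer TEXT, amount TEXT)"
    )
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", rows)
    conn.commit()


def create_finance_view(conn):
    """Transform: one consumer's own view on top of the raw data."""
    conn.execute(
        """
        CREATE VIEW finance_orders AS
        SELECT CAST(order_id AS INTEGER) AS order_id,
               LOWER(TRIM(customer))     AS customer,
               CAST(amount AS REAL)      AS amount
        FROM raw_orders
        WHERE TRIM(amount) <> ''
        """
    )


conn = sqlite3.connect(":memory:")
extract_and_load(conn, RAW_ORDERS)
create_finance_view(conn)

finance_rows = conn.execute(
    "SELECT * FROM finance_orders ORDER BY order_id"
).fetchall()
raw_count = conn.execute("SELECT COUNT(*) FROM raw_orders").fetchone()[0]
```

The raw table still holds every row, including the incomplete one, so another consumer could define a different view later without changing the loading step.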
The whole process of data engineering in an organization
Data engineers with ETL/ELT
Data engineers write scripts (Python, Scala) to automate:
1) pulling data from the sources,
2) cleaning it,
3) loading it into a database.
The dashboard then uses that data for its visualizations.
Data Warehouse and Data Lake
A data warehouse is a type of database that is optimized to store, search, and query huge amounts of structured, semi-structured, and (sometimes) unstructured data.
A typical dashboard can only show a fixed set of data to the user. To get deeper insights that help with decision making, we need data scientists. However, the existing pipeline often cannot give data scientists suitable data to explore.
We need a place that stores raw data, without preprocessing, where data scientists can pull it from. That place is called a Data Lake.
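One way to picture "raw data without preprocessing" is a folder of files where records are appended exactly as they arrive. The sketch below is only a toy: the `source/day/part-0000.jsonl` layout, the function names, and the click records are assumptions for illustration, and a real data lake would normally sit on object storage rather than a local temp folder.

```python
import json
import tempfile
from pathlib import Path


def write_raw(lake_root, source, day, records):
    """Append records unchanged, one JSON document per line."""
    folder = Path(lake_root) / source / day
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / "part-0000.jsonl"
    with path.open("a", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")
    return path


def read_raw(lake_root, source):
    """Read back every raw record for a source, oldest day first."""
    records = []
    for path in sorted(Path(lake_root).glob(f"{source}/*/*.jsonl")):
        with path.open(encoding="utf-8") as f:
            records.extend(json.loads(line) for line in f if line.strip())
    return records


lake = tempfile.mkdtemp()  # stand-in for the lake's storage location
write_raw(lake, "clicks", "2023-01-01", [{"user": "alice", "page": "/home"}])
# Records do not need to share a schema; nothing is cleaned or dropped.
write_raw(lake, "clicks", "2023-01-02", [{"user": "bob", "page": "/cart", "referrer": None}])

all_clicks = read_raw(lake, "clicks")
```

Because nothing is filtered on the way in, a data scientist can later decide which fields matter, including ones the dashboard pipeline would have thrown away.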
OLTP & OLAP
Online Transaction Processing
focuses on transactions in the database. Data is read, written, and updated frequently, so it relies heavily on fast processing.
Use cases:
- Banking
- Shopping
- Retail scanning
Online Analytical Processing
on the other hand, focuses on large-volume, high-dimensional data from the data warehouse.
Use cases:
- Data analytics
- Machine learning
I think the two are differentiated by database architecture.
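The difference in workload shape can be sketched with two kinds of queries. The example below uses a single in-memory SQLite database purely to show the query patterns; the `accounts` and `transfers` tables and the amounts are made up, and in practice OLTP and OLAP workloads usually run on separate systems built for each purpose.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL)")
conn.execute("CREATE TABLE transfers (src INTEGER, dst INTEGER, amount REAL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100.0), (2, 50.0)])
conn.commit()


def transfer(conn, src, dst, amount):
    """OLTP-style: a short transaction that touches only a few rows."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
        )
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
        )
        conn.execute("INSERT INTO transfers VALUES (?, ?, ?)", (src, dst, amount))


def total_sent_per_account(conn):
    """OLAP-style: scan the whole history and aggregate it."""
    return conn.execute(
        "SELECT src, SUM(amount) FROM transfers GROUP BY src ORDER BY src"
    ).fetchall()


transfer(conn, 1, 2, 30.0)
transfer(conn, 1, 2, 20.0)
transfer(conn, 2, 1, 5.0)

balances = conn.execute("SELECT id, balance FROM accounts ORDER BY id").fetchall()
report = total_sent_per_account(conn)
```

The `transfer` calls are many small reads and writes that must each finish quickly, while `total_sent_per_account` reads across all rows at once, which is the kind of query a data warehouse is tuned for.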
Culture and organization shift
Data democratization
Core concept: enable everyone in the organization to access data. This affects people's mindset, the organization's culture, and the tools everyone uses to access data.
Reference
- Content: https://data-derp.github.io/
- Image & Content: https://youtu.be/qWru-b6m030