I have been working with ADF in production for 5 years. Most irritating errors are not in Pipeline monitoring dashboards we built , not in production review calls , but raised by user . Something is wrong but no trace in the pipeline on what tripped the system to get this bug . Here are 3 of those irritating issue which i wish i caught before any user raises concern on integrity of the data we are showing.
The Decimal ID That Broke Your Fact Table Join
ID Column from the Source is coming with decimals i.e 1453.0 , 1238.0 etc because source view changed the formatting , The Same ID Column in other Dimension Data Load is still integer , Now your Fact Table Cannot Join with Dimension Table Causing Sudden Failure of the Dashboard numbers , No ADF Pipeline Notification , but now you need to answer to Management on why numbers are wrong.
ADF Does not Catch it as there is no issue in data type , because to handle nulls you have made all your landing table columns as varchar which accepts integer as well as decimal numbers
Data Quality Rules should be integrated into ETL Data Load , which will detect the column types and notify any change .
The Child Pipeline That Failed Quietly While the Parent Celebrated
A Pipeline has multiple child pipelines , each to load into different Target Tables , One of the Dimensional Load Failed but not passed back to Parent Pipeline causing the Parent Pipeline Status to be Green.
Parent Pipeline has multiple Child Pipelines , But the Child Pipelines are added as "Completion" instead of "On Success" for next activity , so the Entire Pipeline ( Parent Pipeline ) Shows Success.
Running Some of the Child Pipelines in Parallel should be fine , but if any of the failure of this pipeline should be passed back to parent efficiently . And If a Fact Load is dependent on Dimension Load child Pipeline these two should be tied up in sequence with "On Success" , Instead of "Completion"
The Invoice Date From 1900 That Confused Everyone
No ADF Failures , No Notifications , but the Dashboard Breaks because of future date or Invoice Creation Date is in 1900.
Many of the Entries in ERP is still manual , and Often we see in Production Incorrect Dates are coming from ERP . Causing the Users of your dashboard confused.
Having Data Quality Checks baked into ETL is the solution in handling these kind of issues . Any Invoice Future Dated should be flagged and verified with Source and Business Processing Team before we load it into Production Database . Often these entries are corrected in the source in the span of Days , But Data Engineering Team should not be showing wrong entries for a week .
These are 3 of the 12 production failure patterns I have documented from real ADF work. I am running a free live session covering the most dangerous ones this Tuesday 8:30 PM IST / 11 AM EST. Link here: luma.com/84fdillp
Top comments (0)