🧩 Background
When using DataHub for data lineage analysis, a common issue is that complex SQL, especially BigQuery procedural SQL, is not parsed correctly.
As mentioned in this issue:
- DataHub issue #11654
Typical symptoms include:
- Multi-statement SQL (e.g., BEGIN...END) cannot be parsed
- Missing or incomplete lineage
- Column-level lineage cannot be extracted
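To make the first symptom concrete, here is a rough sketch that flags scripts containing BigQuery scripting keywords. The keyword list, the helper names, and the sample script are illustrative assumptions of mine; this is not how DataHub or the sidecar detects procedural SQL.

```python
import re

# Keywords that typically mark BigQuery procedural (scripting) SQL.
# Illustrative assumption only, not an exhaustive grammar.
_PROCEDURAL_KEYWORDS = re.compile(r"\b(DECLARE|BEGIN|LOOP|WHILE|CALL)\b", re.IGNORECASE)


def strip_comments(sql: str) -> str:
    """Remove /* */ block comments and -- line comments (string literals are not handled)."""
    sql = re.sub(r"/\*.*?\*/", " ", sql, flags=re.DOTALL)
    return re.sub(r"--[^\n]*", " ", sql)


def looks_procedural(sql: str) -> bool:
    """Return True if the script contains a procedural keyword outside comments."""
    return bool(_PROCEDURAL_KEYWORDS.search(strip_comments(sql)))


# Hypothetical script, loosely modelled on the table names used later in this article.
sample = """
DECLARE cutoff DATE DEFAULT CURRENT_DATE();
BEGIN
  CREATE TEMP TABLE temp_table AS
  SELECT * FROM project.dataset.view_name WHERE day >= cutoff;
  INSERT INTO final_output SELECT * FROM temp_table;
END;
"""

print(looks_procedural(sample))      # True
print(looks_procedural("SELECT 1"))  # False
```

A keyword match like this is only a heuristic: a column named `begin` inside a string literal would also trigger it.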
💡 Solution
Use the sidecar tool provided by Gudu:
gsp-datahub-sidecar
GitHub repository:
https://github.com/gudusoftware/gsp-datahub-sidecar
Capabilities
- Supports BigQuery procedural SQL
- Supports column-level lineage
- No modification required to DataHub itself
🎯 Validation Objective
This article verifies whether the following pipeline works:
BigQuery procedural SQL
↓
gsp-datahub-sidecar
↓
DataHub GMS
↓
DataHub UI lineage visualization
🧪 Test Steps (Reproducible)
Prerequisite: You already have a running DataHub instance
(GMS default: http://localhost:8080)
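Before running the steps, it can help to confirm that GMS actually answers on that address. The sketch below is one way to do it with the Python standard library. The `/health` path is an assumption about the GMS HTTP interface, and the function names are hypothetical; adjust both to your deployment.

```python
import urllib.request


def gms_health_url(base_url: str) -> str:
    """Build the health-check URL; the /health path is an assumption about GMS."""
    return base_url.rstrip("/") + "/health"


def gms_is_reachable(base_url: str = "http://localhost:8080", timeout: float = 5.0) -> bool:
    """Return True if GMS answers the health check with HTTP 200."""
    try:
        with urllib.request.urlopen(gms_health_url(base_url), timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError, HTTPError, and socket timeouts are OSError subclasses
        return False


if __name__ == "__main__":
    print("GMS reachable:", gms_is_reachable())
```

If this prints `False`, fix the DataHub instance first; the ingestion step further down depends on it.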
Step 0: Install the Sidecar
pip install git+https://github.com/gudusoftware/gsp-datahub-sidecar.git
If you are using macOS with Homebrew Python, pip may refuse the install because the environment is marked as externally managed (PEP 668). In that case, use pipx instead:
brew install pipx
pipx install git+https://github.com/gudusoftware/gsp-datahub-sidecar.git
Verify installation:
gsp-datahub-sidecar --version
Step 1: Get Test SQL
git clone https://github.com/gudusoftware/gsp-datahub-sidecar.git
cd gsp-datahub-sidecar
Use the built-in example:
examples/bigquery_procedural.sql
Step 2: Validate SQL Parsing (Dry Run Mode)
gsp-datahub-sidecar \
--mode authenticated \
--user-id YOUR_USER_ID \
--secret-key YOUR_SECRET_KEY \
--sql-file examples/bigquery_procedural.sql \
--dry-run
Dry-run mode is for debugging only. It parses lineage for review but does NOT send data to the DataHub server.
✅ Example output:
Detected procedural SQL - sending as single statement
Lineage: PROJECT.DATASET.VIEW_NAME --> TEMP_TABLE (12 columns)
Lineage: TEMP_TABLE_DELTA --> FINAL_OUTPUT (5 columns)
[DRY RUN] Would emit 10 MCPs
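If you want to review the dry-run result in a script rather than by eye, the `Lineage:` lines can be parsed into source/target pairs. This is a minimal sketch that assumes the output keeps exactly the line shape shown above; the `LineageEdge` type and `parse_lineage` function are names I made up for illustration.

```python
import re
from typing import List, NamedTuple


class LineageEdge(NamedTuple):
    source: str
    target: str
    columns: int


# Matches lines such as: "Lineage: A --> B (12 columns)"
_LINEAGE_LINE = re.compile(r"^Lineage:\s+(\S+)\s+-->\s+(\S+)\s+\((\d+) columns?\)$")


def parse_lineage(output: str) -> List[LineageEdge]:
    """Extract one edge per 'Lineage:' line and ignore everything else."""
    edges = []
    for line in output.splitlines():
        match = _LINEAGE_LINE.match(line.strip())
        if match:
            edges.append(LineageEdge(match.group(1), match.group(2), int(match.group(3))))
    return edges


sample_output = """\
Lineage: PROJECT.DATASET.VIEW_NAME --> TEMP_TABLE (12 columns)
Lineage: TEMP_TABLE_DELTA --> FINAL_OUTPUT (5 columns)
[DRY RUN] Would emit 10 MCPs
"""

for edge in parse_lineage(sample_output):
    print(f"{edge.source} -> {edge.target}: {edge.columns} columns")
```

Comparing the parsed edges against the tables you expect is a cheap sanity check before anything is sent to DataHub.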
Step 3: Ingest into DataHub
Test with BigQuery SQL
gsp-datahub-sidecar \
--mode authenticated \
--user-id YOUR_USER_ID \
--secret-key YOUR_SECRET_KEY \
--sql-file examples/bigquery_procedural.sql
✅ Example output:
Detected procedural SQL - sending as single statement
Lineage: PROJECT.DATASET.VIEW_NAME --> TEMP_TABLE (12 columns)
Lineage: TEMP_TABLE_DELTA --> FINAL_OUTPUT (5 columns)
Emitted 10 MCPs
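Steps 2 and 3 differ only in the `--dry-run` flag, so both invocations can be assembled by one small helper. The sketch below uses only flags that appear in this article; the environment variable names `GSP_USER_ID` and `GSP_SECRET_KEY` and the function name are hypothetical, chosen so that credentials stay out of scripts and shell history.

```python
import os
from typing import List, Mapping, Optional


def build_sidecar_command(
    sql_file: str,
    dry_run: bool = False,
    db_vendor: Optional[str] = None,
    datahub_server: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> List[str]:
    """Assemble the argument list using only flags shown in this article."""
    cmd = [
        "gsp-datahub-sidecar",
        "--mode", "authenticated",
        # GSP_USER_ID / GSP_SECRET_KEY are hypothetical variable names.
        "--user-id", env["GSP_USER_ID"],
        "--secret-key", env["GSP_SECRET_KEY"],
        "--sql-file", sql_file,
    ]
    if db_vendor:
        cmd += ["--db-vendor", db_vendor]
    if datahub_server:
        cmd += ["--datahub-server", datahub_server]
    if dry_run:
        cmd.append("--dry-run")
    return cmd


demo_env = {"GSP_USER_ID": "YOUR_USER_ID", "GSP_SECRET_KEY": "YOUR_SECRET_KEY"}
print(" ".join(build_sidecar_command("examples/bigquery_procedural.sql", dry_run=True, env=demo_env)))
```

The returned list can be passed to `subprocess.run(cmd, check=True)`. Note that the secret is still visible in the process list while the command runs.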
Verify Table-Level Lineage
- In the DataHub UI, search for temp_table. You should see:
  project.dataset.view_name → temp_table
- Search for final_output. You should see:
  temp_table_delta → final_output
🔬 Verify Column-Level Lineage
You should observe column-level relationships, such as emailidfield mapped between temp_table_delta and final_output.
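When a search does not turn up the expected entity, it helps to know how DataHub identifies it. Datasets and columns are addressed by URNs, and the sketch below builds them by hand. The platform `bigquery`, the `PROD` environment, and the bare table names are assumptions on my part; the names the sidecar actually emits (project and dataset prefix, casing) may differ.

```python
def dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """Build a DataHub dataset URN from platform, dataset name, and environment."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


def schema_field_urn(dataset: str, field_path: str) -> str:
    """Build the URN of a single column (schema field) inside a dataset."""
    return f"urn:li:schemaField:({dataset},{field_path})"


# Assumed names for illustration; check the real ones in your DataHub instance.
upstream = dataset_urn("bigquery", "temp_table_delta")
downstream = dataset_urn("bigquery", "final_output")

print(schema_field_urn(upstream, "emailidfield"))
print(schema_field_urn(downstream, "emailidfield"))
```

Column-level lineage in DataHub links schema field URNs like these two, which is why the column only shows up once both datasets exist under matching names.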
🧪 Test with Oracle SQL
gsp-datahub-sidecar \
--mode authenticated \
--user-id YOUR_USER_ID \
--secret-key YOUR_SECRET_KEY \
--sql-file examples/oracle_create_view.sql \
--db-vendor dbvoracle \
--datahub-server http://localhost:8080
In the UI, search for vsal. You should see the correct table-level and column-level lineage relationships.
Test Using Self-Hosted Mode
gsp-datahub-sidecar --mode self_hosted \
--sqlflow-url http://localhost:8165/api/gspLive_backend/sqlflow/generation/sqlflow/exportFullLineageAsJson \
--user-id YOUR_USER_ID \
--secret-key YOUR_SECRET_KEY \
--sql-file examples/oracle_create_view.sql \
--db-vendor dbvoracle
⚠️ Note: You must install and run SQLFlow locally in advance.
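A quick way to catch a missing SQLFlow instance before invoking the sidecar is to test whether anything is listening on the port in the `--sqlflow-url` value. This is a minimal sketch with hypothetical helper names. It only checks that the TCP port accepts connections, not that SQLFlow itself is healthy.

```python
import socket
from typing import Tuple
from urllib.parse import urlsplit


def host_port(url: str) -> Tuple[str, int]:
    """Extract host and port from a URL, falling back to the scheme's default port."""
    parts = urlsplit(url)
    default = 443 if parts.scheme == "https" else 80
    return parts.hostname or "localhost", parts.port or default


def port_is_open(url: str, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds."""
    host, port = host_port(url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


SQLFLOW_URL = (
    "http://localhost:8165/api/gspLive_backend/sqlflow/generation/sqlflow/exportFullLineageAsJson"
)
print(host_port(SQLFLOW_URL))  # ('localhost', 8165)
```

Calling `port_is_open(SQLFLOW_URL)` before the sidecar gives a clearer failure than a connection error in the middle of a run.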
Validation Results
| Capability | Result |
|---|---|
| Procedural SQL Parsing | ✅ |
| Table-Level Lineage | ✅ |
| Column-Level Lineage | ✅ |
| DataHub Integration | ✅ |
| UI Visualization | ✅ |
🧨 Key Conclusion
In the tests above, gsp-datahub-sidecar worked around DataHub's limitations in parsing complex SQL.
Especially suitable for:
- 20+ mainstream databases
- Multi-statement SQL
- Column-level lineage analysis
If your problem is:
DataHub cannot correctly parse complex SQL lineage
Then:
This tool passed every check in the validation above and has been validated as a reliable solution.




