Testing data pipelines is crucial for ensuring their efficiency and reliability. When dealing with data sources like BigQuery external tables, it becomes necessary to emulate these data sources to perform thorough pipeline testing.
The idea is to create an external CSV table using the Python BigQuery library. For additional context, we treat an empty value as `""` and a null value as `\N`.
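To make that convention concrete, here is a minimal sketch of what such a source file could look like; the file name and field names are placeholders rather than values from a real pipeline.

```python
# Hypothetical sample data using the convention above:
# an empty string is written as "" and a NULL as \N.
sample_csv = (
    "field1,field2\n"
    'a,""\n'     # field2 is an empty string
    "b,\\N\n"    # field2 is meant to be NULL (stored as \N in the file)
)

with open("file_name.csv", "w", encoding="utf-8") as f:
    f.write(sample_csv)
```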
Unfortunately, this library does not let you specify `null_marker` for CSV files when creating an external CSV table via `ExternalConfig`: the `null_marker` option is not exposed in `CsvOptions`, in contrast to, for example, `skip_leading_rows`.
```python
from google.cloud.bigquery import ExternalConfig, ExternalSourceFormat

external_config = ExternalConfig(ExternalSourceFormat.CSV)
external_config.options.skip_leading_rows = 1
external_config.options.null_marker = '\\N'  # does not work: CsvOptions has no null_marker property
```
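A quick way to see the problem (my own check, with indicative rather than verbatim output) is to inspect the request payload the client would send: the assignment above is simply never serialized.

```python
# The API representation contains no nullMarker key, so the setting above
# is silently dropped.
print(external_config.to_api_repr())
# e.g. {'sourceFormat': 'CSV', 'csvOptions': {'skipLeadingRows': '1'}}
```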
At the same time, it is worth noting that a Data Definition Language (DDL) statement that includes the `null_marker` option works when executed directly in the BigQuery console, but does not behave as expected when submitted through the Python client.
```sql
CREATE OR REPLACE EXTERNAL TABLE project_name.dataset_name.table_name (
  `field1` STRING,
  `field2` STRING
) OPTIONS (
  uris = ['gs://bucket_name/prefix_name/file_name.csv'],
  format = 'CSV',
  skip_leading_rows = 1,
  null_marker = '\\N'
)
```
If you execute this query in the BigQuery console, your table will be created successfully and ready for querying. However, if you attempt to submit the same query using the Python library, you will encounter the following exception:
```python
>>> bq_client.query(query=create_table_ddl).result()
google.api_core.exceptions.BadRequest: 400 Syntax error: Illegal escape sequence: \N
```
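The likely culprit is Python's own string escaping. A plausible reconstruction of how `bq_client` and `create_table_ddl` from the snippet above might be set up is shown below (the client construction and project ID are placeholders): with a regular string, the `\\N` written in the Python source collapses to `\N` in the SQL text that BigQuery actually receives, which it then rejects as an illegal escape sequence.

```python
from google.cloud import bigquery

# Client for the project that owns the dataset (project ID is a placeholder).
bq_client = bigquery.Client(project="project_name")

# The DDL embedded in a regular Python string: "\\N" in the source becomes \N
# in the SQL text sent to BigQuery, which triggers the syntax error above.
create_table_ddl = """
CREATE OR REPLACE EXTERNAL TABLE project_name.dataset_name.table_name (
  `field1` STRING,
  `field2` STRING
) OPTIONS (
  uris = ['gs://bucket_name/prefix_name/file_name.csv'],
  format = 'CSV',
  skip_leading_rows = 1,
  null_marker = '\\N'
)
"""
```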
Even escaping the sequence with an extra backslash, so that BigQuery receives `\\N`, does not produce the expected outcome: the null values are not processed as intended and remain as the literal string `\N` rather than being interpreted as NULL.
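For concreteness, that escaping attempt might look like the sketch below (one plausible variant, not a verified fix); the query then parses, but, per the behaviour described above, the affected values still come back as literal `\N` strings instead of NULLs.

```python
# Doubling the backslash so that the SQL text contains '\\N' avoids the
# "Illegal escape sequence" error, but the resulting external table still
# returns the literal text \N instead of NULL for those values.
create_table_ddl_escaped = create_table_ddl.replace(r"'\N'", r"'\\N'")
bq_client.query(query=create_table_ddl_escaped).result()
```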
If you have insights or solutions regarding the challenge of handling null values in emulated tables, kindly share them in the comments section. This will help improve the guide and provide a comprehensive solution for others facing a similar issue.