When working with Google BigQuery, one limitation often catches engineers off guard:
A table function can return only a single table with a predefined schema.
However, in real-world data pipelines, the output is rarely just one dataset. A typical pipeline may generate several types of results, such as:
- Cleaned or transformed data
- Validation errors
- Pipeline metrics
- Execution logs
This raises an important question:
What should you do if your function logically produces multiple outputs?
Many engineers assume the only option is to use multiple queries or stored procedures to handle each output separately.
But there’s actually a clever workaround that allows you to structure multiple outputs within a single function.
Let’s take a closer look at how this can be achieved. 🚀
The Problem
Suppose you are building a data pipeline that processes raw data and produces:
- Final transformed data
- Validation errors
- Pipeline execution metrics
Ideally, a function would return multiple result sets.
However, BigQuery table functions can only return one result table.
So how do we design around this limitation?
The Multi-Output Function Pattern
The trick is to return a single STRUCT that contains multiple arrays.
Each array represents a logical dataset (or “table”).
```sql
CREATE OR REPLACE TABLE FUNCTION dataset.pipeline_function()
AS (
  SELECT
    STRUCT(
      -- Main data output
      ARRAY(
        SELECT STRUCT(id, name)
        FROM dataset.source_table
      ) AS data_rows,

      -- Validation errors
      ARRAY(
        SELECT STRUCT(id, error_message)
        FROM dataset.validation_errors
      ) AS validation_errors,

      -- Pipeline metrics
      STRUCT(
        CURRENT_TIMESTAMP() AS run_time,
        (SELECT COUNT(*) FROM dataset.source_table) AS total_rows
      ) AS metrics
    ) AS result
);
```
Now the function returns a single column (result), but that column contains multiple datasets.
Running the Function
```sql
SELECT *
FROM dataset.pipeline_function();
```
Output structure:
BigQuery returns a single result, but we now have multiple logical outputs inside it.
result
├── data_rows ARRAY<STRUCT<id INT64, name STRING>>
├── validation_errors ARRAY<STRUCT<id INT64, error_message STRING>>
└── metrics STRUCT<run_time TIMESTAMP, total_rows INT64>
Splitting the Outputs
Now, we extract each dataset using UNNEST.
Insert main data
```sql
INSERT INTO dataset.final_table
SELECT
  d.id,
  d.name
FROM dataset.pipeline_function(),
     UNNEST(result.data_rows) AS d;
```
Insert validation errors
```sql
INSERT INTO dataset.error_table
SELECT
  e.id,
  e.error_message
FROM dataset.pipeline_function(),
     UNNEST(result.validation_errors) AS e;
```
Store pipeline metrics
```sql
INSERT INTO dataset.pipeline_metrics
SELECT
  result.metrics.run_time,
  result.metrics.total_rows
FROM dataset.pipeline_function();
```
Important Performance Tip
If your function contains expensive logic, avoid calling it multiple times. Instead, store the result once:
```sql
CREATE TEMP TABLE temp_result AS
SELECT *
FROM dataset.pipeline_function();
```
Then extract outputs from temp_result.
This prevents the function from executing repeatedly.
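For example, the main-data insert shown earlier can be rewritten to read from the temp table, reusing the same table and column names:

```sql
-- Same extraction as before, but against the materialized temp table,
-- so the table function's logic runs only once
INSERT INTO dataset.final_table
SELECT
  d.id,
  d.name
FROM temp_result,
     UNNEST(result.data_rows) AS d;
```

The error and metrics inserts can be rewritten against temp_result in the same way.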
Why This Pattern Is Useful
This design enables self-monitoring pipelines.
Your function can return:
- Data
- Validation errors
- Metrics
- Logs
- Metadata
Then the pipeline splits them into separate tables.
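As a sketch, an execution-log output could be added as one more array inside the STRUCT (the dataset.log_entries table and its columns here are hypothetical):

```sql
-- Hypothetical extra output, placed alongside data_rows,
-- validation_errors, and metrics inside the STRUCT
ARRAY(
  SELECT STRUCT(log_time, severity, message)
  FROM dataset.log_entries
) AS logs
```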
JSON vs STRUCT (Important)
Some engineers try to return JSON instead.
However, for large datasets:
STRUCT + ARRAY is the better choice. Why?
- BigQuery natively optimizes nested STRUCT types
- JSON requires parsing
- STRUCT improves query performance
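For contrast, here is roughly what reading the main data would look like if the function packed everything into a JSON string instead (the pipeline_function_json name, the result_json column, and the JSON layout are illustrative): every field has to be parsed out at query time.

```sql
-- JSON variant: each field access goes through string parsing
SELECT
  JSON_VALUE(row_json, '$.id')   AS id,
  JSON_VALUE(row_json, '$.name') AS name
FROM dataset.pipeline_function_json(),
     UNNEST(JSON_QUERY_ARRAY(result_json, '$.data_rows')) AS row_json;
```

With the STRUCT + ARRAY version, the same fields come back as typed columns with no parsing step.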
Final Thoughts
BigQuery table functions may seem limited at first.
But with the STRUCT + ARRAY pattern, you can design functions that behave like they return multiple tables.
This small trick unlocks powerful pipeline patterns.
Sometimes the best engineering solutions come from working around platform constraints creatively.
💬 Question for data engineers:
How do you handle multi-output logic in BigQuery pipelines?