<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Pawan Vashishtha</title>
    <description>The latest articles on DEV Community by Pawan Vashishtha (@pawan_vashishtha_0b2d8926).</description>
    <link>https://dev.to/pawan_vashishtha_0b2d8926</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3812956%2F73a79939-218b-4835-9637-922aebe49b96.jpg</url>
      <title>DEV Community: Pawan Vashishtha</title>
      <link>https://dev.to/pawan_vashishtha_0b2d8926</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/pawan_vashishtha_0b2d8926"/>
    <language>en</language>
    <item>
      <title>Data Engineers: What If Your BigQuery Function Could Return Multiple Tables?</title>
      <dc:creator>Pawan Vashishtha</dc:creator>
      <pubDate>Sun, 08 Mar 2026 14:28:38 +0000</pubDate>
      <link>https://dev.to/pawan_vashishtha_0b2d8926/data-engineers-what-if-your-bigquery-function-could-return-multiple-tables-1004</link>
      <guid>https://dev.to/pawan_vashishtha_0b2d8926/data-engineers-what-if-your-bigquery-function-could-return-multiple-tables-1004</guid>
<description>&lt;p&gt;When working with Google BigQuery, one limitation often catches engineers off guard:&lt;/p&gt;

&lt;p&gt;A table function can return only a single table with a predefined schema.&lt;/p&gt;

&lt;p&gt;However, in real-world data pipelines, the output is rarely just one dataset. A typical pipeline may generate several types of results, such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cleaned or transformed data&lt;/li&gt;
&lt;li&gt;Validation errors&lt;/li&gt;
&lt;li&gt;Pipeline metrics&lt;/li&gt;
&lt;li&gt;Execution logs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This raises an important question:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;What should you do if your function logically produces multiple outputs?&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Many engineers assume the only option is to use multiple queries or stored procedures to handle each output separately.&lt;/p&gt;

&lt;p&gt;But there’s actually a clever workaround that allows you to structure multiple outputs within a single function.&lt;/p&gt;

&lt;p&gt;Let’s take a closer look at how this can be achieved. 🚀&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;The Problem&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Suppose you are building a data pipeline that processes raw data and produces:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Final transformed data&lt;/li&gt;
&lt;li&gt;Validation errors&lt;/li&gt;
&lt;li&gt;Pipeline execution metrics&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Ideally, a function would return multiple result sets.&lt;br&gt;
However, BigQuery table functions can only return one result table.&lt;br&gt;
So how do we design around this limitation?&lt;/p&gt;
&lt;h2&gt;
  
  
  &lt;strong&gt;The Multi-Output Function Pattern&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The trick is to return a single STRUCT that contains multiple arrays.&lt;br&gt;
Each array represents a logical dataset (or “table”).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE OR REPLACE TABLE FUNCTION dataset.pipeline_function()
AS (
  SELECT
    STRUCT(

      -- Main data output
      ARRAY(
        SELECT STRUCT(id, name)
        FROM dataset.source_table
      ) AS data_rows,

      -- Validation errors
      ARRAY(
        SELECT STRUCT(id, error_message)
        FROM dataset.validation_errors
      ) AS validation_errors,

      -- Pipeline metrics
      STRUCT(
        CURRENT_TIMESTAMP() AS run_time,
        (SELECT COUNT(*) FROM dataset.source_table) AS total_rows
      ) AS metrics

    ) AS result
);

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now the function returns a single column (result), but that column contains multiple logical datasets.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Running the Function&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT *
FROM dataset.pipeline_function();

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Output structure:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;BigQuery returns one result, but we now have multiple logical outputs.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;result
 ├── data_rows ARRAY&amp;lt;STRUCT&amp;lt;id INT64, name STRING&amp;gt;&amp;gt;
 ├── validation_errors ARRAY&amp;lt;STRUCT&amp;lt;id INT64, error_message STRING&amp;gt;&amp;gt;
 └── metrics STRUCT&amp;lt;run_time TIMESTAMP, total_rows INT64&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Splitting the Outputs&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Now, we extract each dataset using &lt;strong&gt;UNNEST&lt;/strong&gt;.&lt;br&gt;
&lt;strong&gt;Insert main data&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;INSERT INTO dataset.final_table
SELECT
  d.id,
  d.name
FROM dataset.pipeline_function(),
UNNEST(result.data_rows) AS d;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Insert validation errors&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;INSERT INTO dataset.error_table
SELECT
  e.id,
  e.error_message
FROM dataset.pipeline_function(),
UNNEST(result.validation_errors) AS e;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Store pipeline metrics&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;INSERT INTO dataset.pipeline_metrics
SELECT
  result.metrics.run_time,
  result.metrics.total_rows
FROM dataset.pipeline_function();
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Important Performance Tip
&lt;/h2&gt;

&lt;p&gt;If your function contains expensive logic, avoid calling it multiple times. Instead, store the result once:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE TEMP TABLE temp_result AS
SELECT *
FROM dataset.pipeline_function();
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then extract outputs from temp_result.&lt;br&gt;
This prevents the function from executing repeatedly.&lt;/p&gt;
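
&lt;p&gt;For example, the main-data insert shown earlier can read from the temp table instead of calling the function again (reusing the same table names from the examples above):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- Same extraction as before, but temp_result is read
-- instead of re-executing pipeline_function()
INSERT INTO dataset.final_table
SELECT
  d.id,
  d.name
FROM temp_result,
UNNEST(result.data_rows) AS d;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;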

&lt;h2&gt;
  
  
  Why This Pattern Is Useful
&lt;/h2&gt;

&lt;p&gt;This design enables self-monitoring pipelines.&lt;br&gt;
Your function can return:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Data&lt;/li&gt;
&lt;li&gt;Validation errors&lt;/li&gt;
&lt;li&gt;Metrics&lt;/li&gt;
&lt;li&gt;Logs&lt;/li&gt;
&lt;li&gt;Metadata&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Then the pipeline splits them into separate tables.&lt;/p&gt;
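
&lt;p&gt;As a sketch, extra outputs such as logs slot into the same STRUCT as additional arrays (the dataset.pipeline_logs table here is hypothetical):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- Inside the STRUCT returned by pipeline_function,
-- each new output is simply another named array
ARRAY(
  SELECT STRUCT(log_time, message)
  FROM dataset.pipeline_logs
) AS logs
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;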

&lt;h2&gt;
  
  
  JSON vs STRUCT (Important)
&lt;/h2&gt;

&lt;p&gt;Some engineers return JSON instead. However, for large datasets, &lt;strong&gt;STRUCT + ARRAY is the better choice&lt;/strong&gt;. Why?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;BigQuery natively optimizes nested STRUCT types in its columnar storage&lt;/li&gt;
&lt;li&gt;JSON values must be parsed at query time&lt;/li&gt;
&lt;li&gt;STRUCT fields are strongly typed and accessed directly, which improves query performance&lt;/li&gt;
&lt;/ul&gt;
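
&lt;p&gt;To illustrate the parsing overhead, compare field access in both approaches (the JSON table and its payload column are hypothetical):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- JSON: every field access parses the value at query time
SELECT JSON_VALUE(payload, '$.error_message') AS error_message
FROM dataset.json_output;

-- STRUCT: fields are typed columns, read directly
SELECT e.error_message
FROM dataset.pipeline_function(),
UNNEST(result.validation_errors) AS e;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;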

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;BigQuery table functions may seem limited at first.&lt;br&gt;
But with the STRUCT + ARRAY pattern, you can design functions that behave like they return multiple tables.&lt;br&gt;
This small trick unlocks powerful pipeline patterns.&lt;br&gt;
Sometimes the best engineering solutions come from working around platform constraints creatively.&lt;/p&gt;

&lt;h2&gt;
  
  
  💬 Question for data engineers:
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;How do you handle multi-output logic in BigQuery pipelines?&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>database</category>
      <category>dataengineering</category>
      <category>googlecloud</category>
      <category>sql</category>
    </item>
  </channel>
</rss>
