DEV Community

Ivan N
Ivan N

Posted on

Batch rename BigQuery tables with SQL

TLDR;
Recently I had a situation where we had 400+ wildcard tables that were not named correctly. I used a single SQL statement to programatically fetch all the table names and rename them based on a regex.

What are wildcard tables and what can you do with them

Wildcard tables are individual tables that have the same preffix e.g.: sales_20220101, sales_20220102, etc.
For more details see the official documentation.

BigQuery has a filter function _TABLE_SUFFIX that allows you to do some clever stuff with wildcard tables:

SELECT
count(*)
FROM
  `bigquery-public-data.noaa_gsod.gsod19*`
WHERE _TABLE_SUFFIX BETWEEN '29' and '35'
Enter fullscreen mode Exit fullscreen mode

This will select all the tables 1929, 1930 ... 1935.

I personally prefer partitioned tables, but wildcard have their uses. Namely when you load data from a 3rd party and you expect that they might change their schema at some point in the future.
More on that in another article.

The problem

  • Tables were annoyingly named ABC_20220101_0700_BLA which made using _TABLE_SUFFIX impossible.

  • Luckily with a simple string subset I can easily get to the correct naming schema ABC_20220101.

  • However the biggest issue was the volume of tables (over 400), so anything manual was out of the question.

The solution

My first thought was to use the BigQuery Python API, but I decided to challenge myself and use just SQL.
There are several features in BQ SQL that I used:

  • Data definition language (DDL) - you know this from CREATE TABLE or DROP TABLE, But did you know there is
    ALTER TABLE as well

  • Procedural language - Most SQL users will be familiar with DECLARE and SET keywords. But BigQuery SQL allows you add complex logic like IF and WHILE statements. In this problem I used FOR IN

  • INFORMATION_SCHEMA is a special "dataset" that has several views which give you a lot of metadata about your warehouse (datastes, tables, views, etc.).
    Read more about it here.

The complete code:

FOR t IN
  (
      SELECT 
        table_id as old_name,
        regexp_extract(table_id, r"ABC_\d{8}") as new_name
      FROM my_dataset.__TABLES__
  )
DO
EXECUTE IMMEDIATE (
    "ALTER TABLE my_dataset." || t.old_name || " RENAME TO " || t.new_name
);
END FOR;
Enter fullscreen mode Exit fullscreen mode

Explanation:

FOR t IN
  (
      -- some query that we loop over
  )
DO
  -- run some other query 
END FOR;
Enter fullscreen mode Exit fullscreen mode
  • The FOR loop allows us to itterate over each row of any query
SELECT 
  table_id as old_name,
FROM my_dataset.__TABLES__
Enter fullscreen mode Exit fullscreen mode
  • my_dataset.__TABLES__ wil return a table that has a lot of information about each table in this dataset. I only cared about the name of the tables: table_id
  regexp_extract(table_id, r"ABC_\d{8}") as new_name
Enter fullscreen mode Exit fullscreen mode
  • The regular expression only takes the literal string ABC_ and then 8 digits
EXECUTE IMMEDIATE (
    "ALTER TABLE my_dataset." || t.old_name || " RENAME TO " || t.new_name
);
Enter fullscreen mode Exit fullscreen mode
  • ALTER TABLE my_dataset.x RENAME TO y does exactly what it says on the tin - renames a table. (Don't forget to add the dataset)
  • EXECUTE IMMEDIATE allows us to run a string as a query. Think of it as eval in Python.
  • BigQuery SQL syntax will not let us to SELECT ... FROM my_dataset.some_var because it will treat some_var as literal rather than replacing it with its value.
  • That is why we need to first build a string of the query. And that's what the || chars are for. They just concatenate strings.
  • Here is what the evaluated string looks like: "ALTER TABLE my_dataset.ABC_123_BLA RENAME TO ABC_123_BLA "

Final thoughts:

I was getting some weird error:

Not found: Dataset my-project:my_dataset was not found in location US at [3:7]
Enter fullscreen mode Exit fullscreen mode

It turns out that using the bacticks "`" around my table and dataset names was causing this - so I removed them and it all worked 🤷

Top comments (0)