<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Jeongho Son</title>
    <description>The latest articles on DEV Community by Jeongho Son (@json_27).</description>
    <link>https://dev.to/json_27</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3264997%2Fab928bea-865a-4c42-996e-ec68bc599c1c.jpeg</url>
      <title>DEV Community: Jeongho Son</title>
      <link>https://dev.to/json_27</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/json_27"/>
    <language>en</language>
    <item>
      <title>Be Careful with @run_date in BigQuery Scheduled Queries – Especially in KST/JST!</title>
      <dc:creator>Jeongho Son</dc:creator>
      <pubDate>Thu, 19 Jun 2025 15:17:13 +0000</pubDate>
      <link>https://dev.to/json_27/be-careful-with-rundate-in-bigquery-scheduled-queries-especially-in-kstjst-1n7g</link>
      <guid>https://dev.to/json_27/be-careful-with-rundate-in-bigquery-scheduled-queries-especially-in-kstjst-1n7g</guid>
      <description>&lt;p&gt;In Google BigQuery, there’s a convenient function called @run_date. It automatically sets the reference date for your query to the time it is executed. Sounds great, right?&lt;/p&gt;

&lt;p&gt;Well... only if you're manually running the query from the Scheduled Queries UI. If you set it to run automatically at a scheduled time, and you haven’t explicitly set the timezone, the query will run in UTC.&lt;/p&gt;

&lt;p&gt;Here’s the catch:&lt;br&gt;
For countries like South Korea or Japan(UTC+9), this can lead to unexpected behavior, especially if your query runs across midnight UTC.&lt;br&gt;
A job scheduled for 9:00 AM KST will actually run at midnight UTC, and @run_date will evaluate to the previous date in local time. That’s a nasty, silent bug waiting to happen.&lt;/p&gt;

&lt;p&gt;To avoid this mess, always set the timezone explicitly when scheduling queries.&lt;br&gt;
And more importantly, use @run_time instead of @run_date if you need to control the exact time, with timezone.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- You don't need to wrap with DATE() unless you need a DATE type
DATE(@run_time, 'Asia/Seoul')
DATE(@run_time, 'Asia/Tokyo')
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
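
&lt;p&gt;For example, a rough sketch (hypothetical run time, not actual output) of what each parameter resolves to for a run at 11:00 PM UTC on June 19, which is 8:00 AM KST on June 20:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- run time: 2025-06-19 23:00:00 UTC (= 2025-06-20 08:00:00 KST)
SELECT
  @run_date                     AS utc_date,  -- 2025-06-19
  DATE(@run_time, 'Asia/Seoul') AS kst_date   -- 2025-06-20
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;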



&lt;p&gt;Unfortunately, many of the datamarts in our team were built using @run_date without timezone awareness, so now there’s a mountain of fixes ahead. But hey... we'll survive. Somehow. Ha!&lt;/p&gt;

&lt;p&gt;Reference: &lt;a href="https://qiita.com/sushi_edo/items/8250902ce2af778c2e8f" rel="noopener noreferrer"&gt;https://qiita.com/sushi_edo/items/8250902ce2af778c2e8f&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>About Google BigQuery's @run_date Parameter</title>
      <dc:creator>Jeongho Son</dc:creator>
      <pubDate>Thu, 19 Jun 2025 15:15:48 +0000</pubDate>
      <link>https://dev.to/json_27/gugeul-bigkweori-rundate-hamsue-gwanhayeo-3om8</link>
      <guid>https://dev.to/json_27/gugeul-bigkweori-rundate-hamsue-gwanhayeo-3om8</guid>
      <description>&lt;p&gt;구글 빅쿼리에는 &lt;code&gt;@run_date&lt;/code&gt;라는 함수가 있는데 이 함수를 이용하면 쿼리를 실행할 때 데이터 집계 기준 시간을 쿼리 실행시각으로 자동 설정할 수 있다.&lt;/p&gt;

&lt;p&gt;단, 어디까지나 구글 빅쿼리의 스케줄된쿼리 메뉴에서 사용자가 수동으로 쿼리를 실행할 때의 이야기이고 정해진 시각에 자동으로 실행하도록 설정되어 있는 쿼리에 대해선 따로 시각을 설정해두지 않으면 세계표준시(UTC)로 실행되어버린다.&lt;/p&gt;

&lt;p&gt;이 경우 한국,일본과 같이 세계표준시보다 9시간 빠른 국가의 경우에 쿼리의 실행시간이 날짜를 걸쳐서 실행되면 의도하지 않은 결과를 낳을 수 있기 때문에 주의를 요한다.&lt;/p&gt;

&lt;p&gt;이를 방지하기 위해 타임존을 따로 설정할 수 있는데, 타임존을 설정할 경우엔 &lt;code&gt;@run_date&lt;/code&gt;가 아니라 &lt;code&gt;@run_time&lt;/code&gt; 함수를 사용하여야한다.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- If you want a TIMESTAMP, you don't need to wrap it in DATE()
DATE(@run_time, 'Asia/Seoul')
DATE(@run_time, 'Asia/Tokyo')
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The data marts built within our team were all set up with @run_date and no timezone configured, so there’s a mountain of things to fix going forward... well, it’ll work out somehow. Haha...&lt;/p&gt;

&lt;p&gt;Reference: &lt;a href="https://qiita.com/sushi_edo/items/8250902ce2af778c2e8f" rel="noopener noreferrer"&gt;https://qiita.com/sushi_edo/items/8250902ce2af778c2e8f&lt;/a&gt;&lt;/p&gt;

</description>
      <category>bigquery</category>
    </item>
    <item>
      <title>How to Undo git add / git commit / git push</title>
      <dc:creator>Jeongho Son</dc:creator>
      <pubDate>Sat, 14 Jun 2025 13:38:32 +0000</pubDate>
      <link>https://dev.to/json_27/how-to-undo-git-add-git-commit-git-push-2e3j</link>
      <guid>https://dev.to/json_27/how-to-undo-git-add-git-commit-git-push-2e3j</guid>
      <description>&lt;p&gt;Goal:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Be able to undo git add.&lt;/li&gt;
&lt;li&gt;Be able to undo git commit.&lt;/li&gt;
&lt;li&gt;Be able to undo git push.&lt;/li&gt;
&lt;li&gt;Be able to delete untracked files.&lt;/li&gt;
&lt;li&gt;Be able to restore modified files back to their original state before changes.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Undo git add (Unstage Files)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;When you accidentally add files to the staging area that you didn’t intend to include:&lt;/li&gt;
&lt;li&gt;You can remove files from the staging area without deleting your actual changes.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# First, check the file status
$ git status

# To unstage a specific file
$ git reset HEAD &amp;lt;file_name&amp;gt;

# To unstage all files from the staging area
$ git reset HEAD

# Modern equivalent (Git 2.23+)
$ git restore --staged &amp;lt;file_name&amp;gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Undo git commit&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;When you accidentally committed files you didn’t intend to.&lt;/li&gt;
&lt;li&gt;When you committed too early and forgot to include certain files.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Check commit history
$ git log

# 1. Undo the commit but keep the changes staged (as if `git add` was done)
$ git reset --soft HEAD^

# 2. Undo the commit and unstage the changes (files remain in the working directory)
$ git reset --mixed HEAD^  # --mixed is the default, so you can omit it

# 3. Undo the commit and discard all changes (files will be deleted)
$ git reset --hard HEAD^

# To throw away local changes and match the latest commit on the remote branch (destructive, not recommended)
$ git reset --hard origin/&amp;lt;branch_name&amp;gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Undo git push&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If you're not working on a shared repository, it's usually fine.
But if you're collaborating with others, always discuss with your teammates before undoing a push.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Reason:&lt;br&gt;
Once you force push, any commits made after the one you reset to will be lost.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Personally, I try to avoid canceling a push.&lt;br&gt;
Instead, I prefer modifying the files and committing the changes again.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;That way, the edit history is preserved and it's easier for teammates to track what was fixed.&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# 1. Undo the latest commit
$ git reset HEAD^

# 2. Check the commit history
$ git log

# 3. Reset the working directory to a specific commit
$ git reset &amp;lt;commit_id&amp;gt;

# 4. After making necessary changes, force push to the remote repository (force push will overwrite remote history)
$ git push -f origin &amp;lt;branch_name&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Appendix 1: Deleting Untracked Files&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For some reason, untracked files tend to pile up while working...
Since I can't stand messy workspaces, I clean them up pretty often.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Preview what would be deleted (dry run)
$ git clean -n

# Delete untracked files
$ git clean -f

# Delete untracked files and directories
$ git clean -f -d

# Delete untracked files, directories, and ignored files
$ git clean -f -d -x

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Appendix 2: Changing Commit Messages&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If you wrote the wrong commit message, you can change it using the following command:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ git commit --amend
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
    </item>
    <item>
      <title>Embulk &amp; Digdag (Loading into Google BigQuery)</title>
      <dc:creator>Jeongho Son</dc:creator>
      <pubDate>Sat, 14 Jun 2025 13:23:39 +0000</pubDate>
      <link>https://dev.to/json_27/embulk-digdag-loading-into-google-bigquery-5830</link>
      <guid>https://dev.to/json_27/embulk-digdag-loading-into-google-bigquery-5830</guid>
      <description>&lt;p&gt;Recently, I've been experimenting with ETL tools while learning data engineering. Every time I hear "ETL," I can't help but think of NewJeans' ETA... It's driving me crazy.&lt;/p&gt;

&lt;p&gt;Anyway, here’s a quick summary of two open-source tools: Embulk and Digdag — especially when you're loading data into Google BigQuery.&lt;/p&gt;

&lt;p&gt;Embulk: an open-source tool for bulk data transfer (loading) between databases, storage systems, file formats, and cloud services.&lt;br&gt;
With Embulk, you can easily load local files into Google BigQuery.&lt;/p&gt;

&lt;p&gt;You write a simple .yml configuration file to define input and output:&lt;br&gt;
(e.g. which file to read, how to parse it, and where to load it.)&lt;/p&gt;
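
&lt;p&gt;A minimal sketch of such a config, with hypothetical file paths, column names, and GCP project/dataset/table names (exact output options depend on the embulk-output-bigquery plugin version):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# config.yml (hypothetical names throughout)
in:
  type: file
  path_prefix: ./data/sample_   # reads ./data/sample_*.csv
  parser:
    type: csv
    skip_header_lines: 1
    columns:
      - {name: id, type: long}
      - {name: name, type: string}
out:
  type: bigquery
  mode: append
  auth_method: service_account
  json_keyfile: ./credentials.json
  project: my-project
  dataset: my_dataset
  table: staging_table
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;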

&lt;p&gt;Execution is straightforward:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;embulk run &amp;lt;file_name.yml&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Tip: when loading JSON files, define just a single column in the parser and unpack the individual fields later with SQL.&lt;/p&gt;

&lt;p&gt;Digdag: Task Pipeline, Scheduling, and Workflow Automation&lt;br&gt;
In simple terms, Digdag is a workflow automation tool.&lt;/p&gt;

&lt;p&gt;After extracting data, you can run SQL queries on the table and load the processed results back into BigQuery.&lt;/p&gt;

&lt;p&gt;Of course, you can also specify target table names during the workflow.&lt;/p&gt;
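
&lt;p&gt;A rough sketch of a workflow definition tying the two together, with hypothetical task, file, and table names (the bq&amp;gt;: operator is one of Digdag's standard operators and needs GCP credentials configured):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# etl.dig (hypothetical names throughout)
+load:
  # Load the source file into a staging table via Embulk
  sh&amp;gt;: embulk run config.yml

+transform:
  # Run a SQL file against BigQuery and write the result to a table
  bq&amp;gt;: queries/transform.sql
  destination_table: my_dataset.final_table
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;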

&lt;p&gt;To execute:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;digdag run &amp;lt;file_name.dig&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Example Workflow&lt;br&gt;
Here’s a simple example of how Embulk and Digdag can be combined:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Load the CSV file using Embulk (input &amp;amp; output): parse the file and load it into a staging table.&lt;/li&gt;
&lt;li&gt;Transform the loaded table using SQL: use SELECT to extract and split specific fields into proper columns.&lt;/li&gt;
&lt;li&gt;Load the transformed data into a final table.&lt;/li&gt;
&lt;/ol&gt;
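
&lt;p&gt;The SQL transform step might look something like this, assuming a hypothetical staging table with a single raw JSON column:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- Unpack a single raw JSON column into proper columns (hypothetical names)
SELECT
  JSON_VALUE(raw, '$.id')   AS id,
  JSON_VALUE(raw, '$.name') AS name
FROM `my_project.my_dataset.staging_table`
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;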

&lt;p&gt;That's it. Very simple but very effective combo when you're dealing with lightweight ETL pipelines into BigQuery.&lt;/p&gt;

&lt;p&gt;Official Google BigQuery Docs&lt;/p&gt;

&lt;p&gt;JSON functions: &lt;a href="https://cloud.google.com/bigquery/docs/reference/standard-sql/json_functions" rel="noopener noreferrer"&gt;https://cloud.google.com/bigquery/docs/reference/standard-sql/json_functions&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Conversion functions: &lt;a href="https://cloud.google.com/bigquery/docs/reference/standard-sql/conversion_functions" rel="noopener noreferrer"&gt;https://cloud.google.com/bigquery/docs/reference/standard-sql/conversion_functions&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Data types: &lt;a href="https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types" rel="noopener noreferrer"&gt;https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
