In my last post, we explored how to decode GCP Billing with Antigravity and BigQuery MCP to turn an opaque GCP billing export into a granular, custom FinOps CLI. We successfully moved from scratching our heads over cost spikes to having a clear, actionable dashboard right in the terminal.
But observation is only half the FinOps battle. Once you identify the cost drivers, you need a safe, repeatable way to remediate them.
Working deeply in the BigQuery ecosystem every day, I frequently see storage costs silently accumulate from staging environments, daily snapshot dumps, or temporary processing tables.
When it comes to cleaning these up, you often don't want to completely DROP the tables. Dropping a table means destroying its schema, field descriptions, metadata, and carefully crafted IAM policies. Often, you just want to zero out the storage bytes while keeping the structure intact for the next pipeline run.
The solution? TRUNCATE TABLE.
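If you haven't used it, TRUNCATE TABLE is a standard-SQL statement that deletes every row while leaving the table definition, descriptions, and policies in place. A minimal one-off sketch via the bq CLI (the project, dataset, and table names are placeholders):

```shell
# Placeholder identifiers: swap in your own project, dataset, and table.
PROJECT="my-project"
DATASET="staging"
TABLE="stg_orders"

# Build the statement first so it can be eyeballed before anything runs.
SQL="TRUNCATE TABLE \`${PROJECT}.${DATASET}.${TABLE}\`"
echo "$SQL"

# TRUNCATE TABLE requires standard SQL, hence --use_legacy_sql=false.
bq query --use_legacy_sql=false "$SQL"
```

This is fine for one table; the script below exists because doing this by hand across dozens of staging tables does not scale.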
The Ideal State: Dataset Expiration Rules
In a perfect world, the best way to handle these temporary processing tables is to isolate them in a dedicated dataset and configure a Default Table Expiration. By setting this rule at the dataset level, BigQuery automatically drops any table created within it after a specified number of days, zero maintenance required.
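For reference, the dataset-level rule is a single bq CLI call; `--default_table_expiration` takes a value in seconds, and the project and dataset names here are placeholders:

```shell
# 7 days, expressed in seconds (the unit --default_table_expiration expects).
EXPIRATION_SECONDS=$((7 * 24 * 3600))
echo "$EXPIRATION_SECONDS"

# Any table created in my-project:scratch after this change will be
# dropped automatically once it is older than the default expiration.
bq update --default_table_expiration "$EXPIRATION_SECONDS" my-project:scratch
```

Note that the default only applies to tables created after the setting is in place; existing tables keep whatever expiration they already have.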
Unfortunately, in the real world, that’s not always possible.
Data architectures get messy. Staging tables often end up living alongside long-term reference data where a blanket expiration rule would cause chaos. Or, you might need to keep a specific temp table around for an unpredictable amount of time to debug a broken pipeline. When blunt-force dataset rules are too risky or simply not an option due to legacy architecture, you need a more surgical approach.
Here is how I leveraged Antigravity's agentic workflow to build a reusable bash script to automate this targeted cleanup safely.
The Truncation Tool: Moving from Analysis to Action
When building scripts that perform destructive actions across dozens or hundreds of tables, safety and precision are key. Just like in the exploration phase, having an AI agent that can test raw commands against your actual BigQuery environment via MCP eliminates the usual trial-and-error of writing bash utilities.
Here is the script to solve this (use at your discretion):
#!/bin/bash
# =============================================================================
# BigQuery Table Truncation Script
# =============================================================================
# Safely truncates tables in a BigQuery dataset based on a prefix.
# Defaults to DRY RUN mode.
#
# Usage: ./truncate_tables.sh --project ID --dataset NAME [--prefix PREFIX] [--execute]
# Example: ./truncate_tables.sh --project my-project --dataset mydataset --prefix stg_ --execute
# =============================================================================
set -e
# Defaults
PROJECT_ID=""
DATASET_NAME=""
TABLE_PREFIX=""
DRY_RUN=true
# Colors
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
NC='\033[0m' # No Color
BOLD='\033[1m'
# Parse arguments
while [[ $# -gt 0 ]]; do
case $1 in
--project)
PROJECT_ID="$2"
shift 2
;;
--dataset)
DATASET_NAME="$2"
shift 2
;;
--prefix)
TABLE_PREFIX="$2"
shift 2
;;
--execute)
DRY_RUN=false
shift
;;
*)
echo "Unknown option: $1"
echo "Usage: $0 --project ID --dataset NAME [--prefix PREFIX] [--execute]"
exit 1
;;
esac
done
# Validation
if [ -z "$PROJECT_ID" ] || [ -z "$DATASET_NAME" ]; then
echo -e "${RED}Error: Missing required arguments.${NC}"
echo "Usage: $0 --project ID --dataset NAME [--prefix PREFIX] [--execute]"
exit 1
fi
print_header() {
echo ""
echo -e "${BLUE}${BOLD}═══════════════════════════════════════════════════════════════${NC}"
echo -e "${BLUE}${BOLD} $1${NC}"
echo -e "${BLUE}${BOLD}═══════════════════════════════════════════════════════════════${NC}"
echo ""
}
print_header "🗑️ BigQuery Table Truncation Tool"
if [ "$DRY_RUN" = true ]; then
echo -e "${YELLOW}${BOLD}[DRY RUN MODE]${NC} No data will be deleted."
echo "Use --execute to perform the actual truncation."
else
echo -e "${RED}${BOLD}[EXECUTION MODE]${NC} Tables WILL be truncated."
fi
echo ""
if [ -z "$TABLE_PREFIX" ]; then
echo "Fetching ALL tables in ${PROJECT_ID}.${DATASET_NAME}..."
TABLES=$(bq ls --project_id="$PROJECT_ID" --max_results=1000 "$DATASET_NAME" | awk '{if(NR>2) print $1}')
else
echo "Fetching tables matching prefix '${TABLE_PREFIX}' in ${PROJECT_ID}.${DATASET_NAME}..."
TABLES=$(bq ls --project_id="$PROJECT_ID" --max_results=1000 "$DATASET_NAME" | awk -v p="$TABLE_PREFIX" 'NR > 2 && index($1, p) == 1 {print $1}')
fi
if [ -z "$TABLES" ]; then
if [ -z "$TABLE_PREFIX" ]; then
echo "No tables found in dataset '${DATASET_NAME}'."
else
echo "No tables found matching prefix '${TABLE_PREFIX}'."
fi
exit 0
fi
echo ""
echo -e "${BOLD}Found the following tables:${NC}"
COUNT=0
for table in $TABLES; do
echo " - ${DATASET_NAME}.${table}"
COUNT=$((COUNT + 1))  # Not ((COUNT++)): that returns non-zero when COUNT is 0, which aborts the script under set -e
done
echo ""
echo "Total tables to truncate: $COUNT"
echo ""
if [ "$DRY_RUN" = true ]; then
echo -e "${GREEN}Dry run complete. To truncate these tables, run:${NC}"
if [ -z "$TABLE_PREFIX" ]; then
echo "$0 --project $PROJECT_ID --dataset $DATASET_NAME --execute"
else
echo "$0 --project $PROJECT_ID --dataset $DATASET_NAME --prefix $TABLE_PREFIX --execute"
fi
exit 0
fi
# Confirmation prompt for Execution Mode
echo -e "${RED}${BOLD}WARNING: You are about to TRUNCATE (delete all data from) the $COUNT tables listed above.${NC}"
read -p "Are you absolutely sure? Type 'CONFIRM' to proceed: " CONFIRMATION
if [ "$CONFIRMATION" != "CONFIRM" ]; then
echo "Operation cancelled."
exit 0
fi
echo ""
echo "Starting truncation..."
for table in $TABLES; do
FULL_TABLE_ID="${PROJECT_ID}.${DATASET_NAME}.${table}"
echo -n "Truncating $FULL_TABLE_ID ... "
if bq query --use_legacy_sql=false --quiet "TRUNCATE TABLE \`${FULL_TABLE_ID}\`"; then
echo -e "${GREEN}DONE${NC}"
else
echo -e "${RED}FAILED${NC}"
fi
done
echo ""
echo -e "${GREEN}All operations completed.${NC}"
How It Works (and Why It’s Built This Way)
Writing this script involved piecing together the bq command-line tool, string manipulation, and standard shell logic. Here are the core design decisions that make it robust:
Defaulting to "Dry Run"
The most dangerous scripts are the ones that perform destructive actions by default. This script requires an explicit --execute flag. If you run a command like this:
./truncate_tables.sh --project my-project --dataset mDWH_pre --prefix stg_
It will simply output a neatly formatted list of the tables it would have truncated, giving you complete visibility before pulling the trigger.
Prefix Targeting via awk and grep
Piping bq ls output into grep and awk can be time consuming because of how the bq CLI formats its tables. Because Antigravity could validate these commands live via MCP, it quickly nailed the exact pattern matching and column isolation needed to cleanly extract just the table names, whether you are targeting the entire dataset or a specific prefix.

The "Human in the Loop" Failsafe
Even with the --execute flag, truncating data is a one-way door. To prevent accidental executions from someone up-arrowing in the terminal and hitting enter too quickly, the script implements a hard pause:
read -p "Are you absolutely sure? Type 'CONFIRM' to proceed: "
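Coming back to the prefix-targeting step, here is a small standalone sketch of the matching logic against a mocked-up bq ls listing (the table names and column layout are illustrative, not real output):

```shell
# Simulated `bq ls` output: a header row, a separator row, then one row per table.
MOCK_LISTING='   tableId     Type
 ------------ -------
  stg_orders   TABLE
  stg_users    TABLE
  dim_date     TABLE'

# Skip the two header rows, then keep the first column only where it
# starts with the requested prefix.
PREFIX="stg_"
MATCHES=$(echo "$MOCK_LISTING" | awk -v p="$PREFIX" 'NR > 2 && index($1, p) == 1 {print $1}')
echo "$MATCHES"
```

Anchoring the match to the start of the table name is what keeps a prefix like stg_ from accidentally picking up tables that merely contain the string elsewhere.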
The FinOps Payoff
By combining the analytical script from part one with this targeted remediation script, you close the loop on cloud waste. You can identify the exact dataset driving your BigQuery storage costs, and within seconds, safely truncate hundreds of obsolete staging tables while preserving your carefully constructed data warehouse schema.
With AI tools like Antigravity providing live context into your environment, creating these bespoke, highly effective utility scripts takes minutes instead of hours. The barrier to maintaining a lean cloud environment has never been lower.