<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Marcelo Costa</title>
    <description>The latest articles on DEV Community by Marcelo Costa (@mesmacosta).</description>
    <link>https://dev.to/mesmacosta</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F306400%2F552cc732-1ff0-4577-ae76-7ba29d72b33d.png</url>
      <title>DEV Community: Marcelo Costa</title>
      <link>https://dev.to/mesmacosta</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/mesmacosta"/>
    <language>en</language>
    <item>
      <title>Taking Action on your GCP bill: Automating BigQuery Storage Cleanup</title>
      <dc:creator>Marcelo Costa</dc:creator>
      <pubDate>Sat, 14 Mar 2026 16:21:53 +0000</pubDate>
      <link>https://dev.to/gde/taking-action-on-your-gcp-bill-automating-bigquery-storage-cleanup-4nma</link>
      <guid>https://dev.to/gde/taking-action-on-your-gcp-bill-automating-bigquery-storage-cleanup-4nma</guid>
      <description>&lt;p&gt;In my last post, we explored how to decode GCP Billing with Antigravity and BigQuery MCP to turn an opaque GCP billing export into a granular, custom FinOps CLI. We successfully moved from scratching our heads over cost spikes to having a clear, actionable dashboard right in the terminal.&lt;/p&gt;

&lt;p&gt;But observation is only half the FinOps battle. Once you identify the cost drivers, you need a safe, repeatable way to remediate them.&lt;/p&gt;

&lt;p&gt;Working deeply in the BigQuery ecosystem every day, I frequently see storage costs silently accumulate from staging environments, daily snapshot dumps, or temporary processing tables.&lt;/p&gt;

&lt;p&gt;When it comes to cleaning these up, you often don't want to completely &lt;code&gt;DROP&lt;/code&gt; the tables. Dropping a table means destroying its schema, field descriptions, metadata, and carefully crafted IAM policies. Often, you just want to zero out the storage bytes while keeping the structure intact for the next pipeline run.&lt;/p&gt;

&lt;p&gt;The solution? &lt;code&gt;TRUNCATE TABLE&lt;/code&gt;.&lt;/p&gt;
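&lt;p&gt;At its core, the remediation is one statement per table. A minimal sketch of that building block (the table ID below is a placeholder, not a real project):&lt;/p&gt;

```shell
#!/bin/bash
# Build the TRUNCATE statement for a fully qualified table ID.
# "my-project.staging.stg_events" is a made-up placeholder.
build_truncate_sql() {
  printf 'TRUNCATE TABLE `%s`' "$1"
}

SQL=$(build_truncate_sql "my-project.staging.stg_events")
echo "$SQL"
# In practice this is executed with:
#   bq query --use_legacy_sql=false "$SQL"
```

&lt;p&gt;Unlike &lt;code&gt;DROP TABLE&lt;/code&gt;, this leaves the schema, field descriptions, and IAM policies untouched.&lt;/p&gt;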

&lt;h2&gt;
  
  
  The Ideal State: Dataset Expiration Rules
&lt;/h2&gt;

&lt;p&gt;In a perfect world, the best way to handle these temporary processing tables is to isolate them in a dedicated dataset and configure a &lt;strong&gt;Default Table Expiration&lt;/strong&gt;. By setting this rule at the dataset level, BigQuery automatically drops any table created within it after a specified number of days, with zero maintenance required.&lt;/p&gt;
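&lt;p&gt;For reference, setting that rule is a one-liner. A sketch, assuming a throwaway dataset (the project and dataset names are placeholders); note that &lt;code&gt;bq update --default_table_expiration&lt;/code&gt; takes seconds, not days:&lt;/p&gt;

```shell
#!/bin/bash
# Convert the desired retention into seconds, since
# --default_table_expiration expects seconds.
DAYS=30
EXPIRATION_SECONDS=$((DAYS * 24 * 60 * 60))
echo "$EXPIRATION_SECONDS"

# Printed rather than executed here, since running it needs real
# credentials; "my-project" and "temp_staging" are placeholders.
echo "bq update --default_table_expiration ${EXPIRATION_SECONDS} my-project:temp_staging"
```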

&lt;p&gt;Unfortunately, in the real world, that’s not always possible. &lt;/p&gt;

&lt;p&gt;Data architectures get messy. Staging tables often end up living alongside long-term reference data where a blanket expiration rule would cause chaos. Or, you might need to keep a specific temp table around for an unpredictable amount of time to debug a broken pipeline. When blunt-force dataset rules are too risky or simply not an option due to legacy architecture, you need a more surgical approach.&lt;/p&gt;

&lt;p&gt;Here is how I leveraged Antigravity's agentic workflow to build a reusable bash script to automate this targeted cleanup safely.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Truncation Tool: Moving from Analysis to Action
&lt;/h2&gt;

&lt;p&gt;When building scripts that perform destructive actions across dozens or hundreds of tables, safety and precision are key. Just like in the exploration phase, having an AI agent that can test raw commands against your actual BigQuery environment via MCP eliminates the usual trial-and-error of writing bash utilities.&lt;/p&gt;

&lt;p&gt;Here is the script to solve this (use at your discretion):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;
&lt;span class="c"&gt;# =============================================================================&lt;/span&gt;
&lt;span class="c"&gt;# BigQuery Table Truncation Script&lt;/span&gt;
&lt;span class="c"&gt;# =============================================================================&lt;/span&gt;
&lt;span class="c"&gt;# Safely truncates tables in a BigQuery dataset based on a prefix.&lt;/span&gt;
&lt;span class="c"&gt;# Defaults to DRY RUN mode.&lt;/span&gt;
&lt;span class="c"&gt;#&lt;/span&gt;
&lt;span class="c"&gt;# Usage: ./truncate_tables.sh --project ID --dataset NAME [--prefix PREFIX] [--execute]&lt;/span&gt;
&lt;span class="c"&gt;# ./truncate_tables.sh --project [my-project]--dataset [mydataset] --prefix PREFIX&lt;/span&gt;
&lt;span class="c"&gt;# =============================================================================&lt;/span&gt;

&lt;span class="nb"&gt;set&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt;

&lt;span class="c"&gt;# Defaults&lt;/span&gt;
&lt;span class="nv"&gt;PROJECT_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;""&lt;/span&gt;
&lt;span class="nv"&gt;DATASET_NAME&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;""&lt;/span&gt;
&lt;span class="nv"&gt;TABLE_PREFIX&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;""&lt;/span&gt;
&lt;span class="nv"&gt;DRY_RUN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt;

&lt;span class="c"&gt;# Colors&lt;/span&gt;
&lt;span class="nv"&gt;RED&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'\033[0;31m'&lt;/span&gt;
&lt;span class="nv"&gt;GREEN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'\033[0;32m'&lt;/span&gt;
&lt;span class="nv"&gt;YELLOW&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'\033[1;33m'&lt;/span&gt;
&lt;span class="nv"&gt;BLUE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'\033[0;34m'&lt;/span&gt;
&lt;span class="nv"&gt;NC&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'\033[0m'&lt;/span&gt; &lt;span class="c"&gt;# No Color&lt;/span&gt;
&lt;span class="nv"&gt;BOLD&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'\033[1m'&lt;/span&gt;

&lt;span class="c"&gt;# Parse arguments&lt;/span&gt;
&lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="o"&gt;[[&lt;/span&gt; &lt;span class="nv"&gt;$# &lt;/span&gt;&lt;span class="nt"&gt;-gt&lt;/span&gt; 0 &lt;span class="o"&gt;]]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do
    case&lt;/span&gt; &lt;span class="nv"&gt;$1&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt;
        &lt;span class="nt"&gt;--project&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="nv"&gt;PROJECT_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$2&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
            &lt;span class="nb"&gt;shift &lt;/span&gt;2
            &lt;span class="p"&gt;;;&lt;/span&gt;
        &lt;span class="nt"&gt;--dataset&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="nv"&gt;DATASET_NAME&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$2&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
            &lt;span class="nb"&gt;shift &lt;/span&gt;2
            &lt;span class="p"&gt;;;&lt;/span&gt;
        &lt;span class="nt"&gt;--prefix&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="nv"&gt;TABLE_PREFIX&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$2&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
            &lt;span class="nb"&gt;shift &lt;/span&gt;2
            &lt;span class="p"&gt;;;&lt;/span&gt;
        &lt;span class="nt"&gt;--execute&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="nv"&gt;DRY_RUN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;false
            shift&lt;/span&gt;
            &lt;span class="p"&gt;;;&lt;/span&gt;
        &lt;span class="k"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Unknown option: &lt;/span&gt;&lt;span class="nv"&gt;$1&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
            &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Usage: &lt;/span&gt;&lt;span class="nv"&gt;$0&lt;/span&gt;&lt;span class="s2"&gt; --project ID --dataset NAME [--prefix PREFIX] [--execute]"&lt;/span&gt;
            &lt;span class="nb"&gt;exit &lt;/span&gt;1
            &lt;span class="p"&gt;;;&lt;/span&gt;
    &lt;span class="k"&gt;esac&lt;/span&gt;
&lt;span class="k"&gt;done&lt;/span&gt;

&lt;span class="c"&gt;# Validation&lt;/span&gt;
&lt;span class="c"&gt;# Validation&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="nt"&gt;-z&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$PROJECT_ID&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="nt"&gt;-z&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$DATASET_NAME&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
    &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;RED&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;Error: Missing required arguments.&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;NC&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
    &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Usage: &lt;/span&gt;&lt;span class="nv"&gt;$0&lt;/span&gt;&lt;span class="s2"&gt; --project ID --dataset NAME [--prefix PREFIX] [--execute]"&lt;/span&gt;
    &lt;span class="nb"&gt;exit &lt;/span&gt;1
&lt;span class="k"&gt;fi

&lt;/span&gt;print_header&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;""&lt;/span&gt;
    &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;BLUE&lt;/span&gt;&lt;span class="k"&gt;}${&lt;/span&gt;&lt;span class="nv"&gt;BOLD&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;═══════════════════════════════════════════════════════════════&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;NC&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
    &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;BLUE&lt;/span&gt;&lt;span class="k"&gt;}${&lt;/span&gt;&lt;span class="nv"&gt;BOLD&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;  &lt;/span&gt;&lt;span class="nv"&gt;$1&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;NC&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
    &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;BLUE&lt;/span&gt;&lt;span class="k"&gt;}${&lt;/span&gt;&lt;span class="nv"&gt;BOLD&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;═══════════════════════════════════════════════════════════════&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;NC&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
    &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;""&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;

print_header &lt;span class="s2"&gt;"🗑️  BigQuery Table Truncation Tool"&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$DRY_RUN&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;true&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
    &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;YELLOW&lt;/span&gt;&lt;span class="k"&gt;}${&lt;/span&gt;&lt;span class="nv"&gt;BOLD&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;[DRY RUN MODE]&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;NC&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; No data will be deleted."&lt;/span&gt;
    &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Use --execute to perform the actual truncation."&lt;/span&gt;
&lt;span class="k"&gt;else
    &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;RED&lt;/span&gt;&lt;span class="k"&gt;}${&lt;/span&gt;&lt;span class="nv"&gt;BOLD&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;[EXECUTION MODE]&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;NC&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; Tables WILL be truncated."&lt;/span&gt;
&lt;span class="k"&gt;fi
&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;""&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="nt"&gt;-z&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$TABLE_PREFIX&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
    &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Fetching ALL tables in &lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PROJECT_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;.&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;DATASET_NAME&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;..."&lt;/span&gt;
    &lt;span class="nv"&gt;TABLES&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;bq &lt;span class="nb"&gt;ls&lt;/span&gt; &lt;span class="nt"&gt;--project_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$PROJECT_ID&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;--max_results&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1000 &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$DATASET_NAME&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; | &lt;span class="nb"&gt;awk&lt;/span&gt; &lt;span class="s1"&gt;'{if(NR&amp;gt;2) print $1}'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;else
    &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Fetching tables matching prefix '&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;TABLE_PREFIX&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;' in &lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PROJECT_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;.&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;DATASET_NAME&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;..."&lt;/span&gt;
    &lt;span class="nv"&gt;TABLES&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;bq &lt;span class="nb"&gt;ls&lt;/span&gt; &lt;span class="nt"&gt;--project_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$PROJECT_ID&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;--max_results&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1000 &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$DATASET_NAME&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-E&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\b&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;TABLE_PREFIX&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; | &lt;span class="nb"&gt;awk&lt;/span&gt; &lt;span class="s1"&gt;'{print $1}'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;fi

if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="nt"&gt;-z&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$TABLES&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
    if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="nt"&gt;-z&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$TABLE_PREFIX&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
         &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"No tables found in dataset '&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;DATASET_NAME&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;'."&lt;/span&gt;
    &lt;span class="k"&gt;else
         &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"No tables found matching prefix '&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;TABLE_PREFIX&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;'."&lt;/span&gt;
    &lt;span class="k"&gt;fi
    &lt;/span&gt;&lt;span class="nb"&gt;exit &lt;/span&gt;0
&lt;span class="k"&gt;fi

&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;""&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;BOLD&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;Found the following tables:&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;NC&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="nv"&gt;COUNT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;0
&lt;span class="k"&gt;for &lt;/span&gt;table &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="nv"&gt;$TABLES&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do
    &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"  - &lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;DATASET_NAME&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;.&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;table&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
    &lt;span class="o"&gt;((&lt;/span&gt;COUNT++&lt;span class="o"&gt;))&lt;/span&gt;
&lt;span class="k"&gt;done
&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;""&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Total tables to truncate: &lt;/span&gt;&lt;span class="nv"&gt;$COUNT&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;""&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$DRY_RUN&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;true&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
    &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;GREEN&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;Dry run complete. To truncate these tables, run:&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;NC&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="nt"&gt;-z&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$TABLE_PREFIX&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
        &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"./scripts/truncate_tables.sh --project &lt;/span&gt;&lt;span class="nv"&gt;$PROJECT_ID&lt;/span&gt;&lt;span class="s2"&gt; --dataset &lt;/span&gt;&lt;span class="nv"&gt;$DATASET_NAME&lt;/span&gt;&lt;span class="s2"&gt; --execute"&lt;/span&gt;
    &lt;span class="k"&gt;else
        &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"./scripts/truncate_tables.sh --project &lt;/span&gt;&lt;span class="nv"&gt;$PROJECT_ID&lt;/span&gt;&lt;span class="s2"&gt; --dataset &lt;/span&gt;&lt;span class="nv"&gt;$DATASET_NAME&lt;/span&gt;&lt;span class="s2"&gt; --prefix &lt;/span&gt;&lt;span class="nv"&gt;$TABLE_PREFIX&lt;/span&gt;&lt;span class="s2"&gt; --execute"&lt;/span&gt;
    &lt;span class="k"&gt;fi
    &lt;/span&gt;&lt;span class="nb"&gt;exit &lt;/span&gt;0
&lt;span class="k"&gt;fi&lt;/span&gt;

&lt;span class="c"&gt;# Confirmation prompt for Execution Mode&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;RED&lt;/span&gt;&lt;span class="k"&gt;}${&lt;/span&gt;&lt;span class="nv"&gt;BOLD&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;WARNING: You are about to TRUNCATE (delete all data from) the &lt;/span&gt;&lt;span class="nv"&gt;$COUNT&lt;/span&gt;&lt;span class="s2"&gt; tables listed above.&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;NC&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="nb"&gt;read&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; &lt;span class="s2"&gt;"Are you absolutely sure? Type 'CONFIRM' to proceed: "&lt;/span&gt; CONFIRMATION

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$CONFIRMATION&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="s2"&gt;"CONFIRM"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
    &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Operation cancelled."&lt;/span&gt;
    &lt;span class="nb"&gt;exit &lt;/span&gt;0
&lt;span class="k"&gt;fi

&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;""&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Starting truncation..."&lt;/span&gt;

&lt;span class="k"&gt;for &lt;/span&gt;table &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="nv"&gt;$TABLES&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do
    &lt;/span&gt;&lt;span class="nv"&gt;FULL_TABLE_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PROJECT_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;.&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;DATASET_NAME&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;.&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;table&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
    &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nt"&gt;-n&lt;/span&gt; &lt;span class="s2"&gt;"Truncating &lt;/span&gt;&lt;span class="nv"&gt;$FULL_TABLE_ID&lt;/span&gt;&lt;span class="s2"&gt; ... "&lt;/span&gt;

    &lt;span class="k"&gt;if &lt;/span&gt;bq query &lt;span class="nt"&gt;--use_legacy_sql&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;false&lt;/span&gt; &lt;span class="nt"&gt;--quiet&lt;/span&gt; &lt;span class="s2"&gt;"TRUNCATE TABLE &lt;/span&gt;&lt;span class="se"&gt;\`&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;FULL_TABLE_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\`&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
        &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;GREEN&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;DONE&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;NC&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;else
        &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;RED&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;FAILED&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;NC&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;fi
done

&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;""&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;GREEN&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;All operations completed.&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;NC&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  How It Works (and Why It’s Built This Way)
&lt;/h2&gt;

&lt;p&gt;Writing this script involved piecing together the &lt;code&gt;bq&lt;/code&gt; command-line tool, string manipulation, and standard shell logic. Here are the core design decisions that make it robust:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Defaulting to "Dry Run"
The most dangerous scripts are the ones that execute destructive actions by default. This script requires an explicit &lt;code&gt;--execute&lt;/code&gt; flag. If you run a command like this:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;./truncate_tables.sh &lt;span class="nt"&gt;--project&lt;/span&gt; my-project &lt;span class="nt"&gt;--dataset&lt;/span&gt; mDWH_pre &lt;span class="nt"&gt;--prefix&lt;/span&gt; stg_
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It will simply output a neatly formatted list of the tables it would have truncated, giving you complete visibility before pulling the trigger.&lt;/p&gt;

&lt;ol start="2"&gt;
&lt;li&gt;&lt;p&gt;Prefix Targeting via awk and grep&lt;br&gt;
Piping &lt;code&gt;bq ls&lt;/code&gt; output into &lt;code&gt;grep&lt;/code&gt; and &lt;code&gt;awk&lt;/code&gt; can be time-consuming to get right because of how the bq CLI formats its tables. Because Antigravity could validate these commands live via MCP, it quickly nailed the exact regex and column isolation needed to cleanly extract just the table names, whether you are targeting the entire dataset or just a specific prefix.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The "Human in the Loop" Failsafe&lt;br&gt;
Even with the &lt;code&gt;--execute&lt;/code&gt; flag, truncating data is a one-way door. To prevent accidental executions from simply up-arrowing in the terminal and hitting enter too quickly, the script implements a hard pause:&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;read&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; &lt;span class="s2"&gt;"Are you absolutely sure? Type 'CONFIRM' to proceed: "&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
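&lt;p&gt;The prefix-filtering pipeline from point 2 can be exercised locally, without touching a real project, by standing in for &lt;code&gt;bq ls&lt;/code&gt; output (the table names below are made up):&lt;/p&gt;

```shell
#!/bin/bash
# Fake `bq ls` output: a header row, a separator row, then one row
# per table, with the table name in the first column.
bq_ls_output() {
  printf '%s\n' \
    '    tableId      Type  ' \
    ' -------------- ------- ' \
    '  stg_orders     TABLE ' \
    '  stg_users      TABLE ' \
    '  dim_customers  TABLE '
}

TABLE_PREFIX="stg_"
# Same filter the script uses: keep rows matching the prefix,
# then isolate the first column (the table name).
TABLES=$(bq_ls_output | grep -E "\b${TABLE_PREFIX}" | awk '{print $1}')
echo "$TABLES"   # stg_orders and stg_users; dim_customers is skipped
```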



&lt;h2&gt;
  
  
  The FinOps Payoff
&lt;/h2&gt;

&lt;p&gt;By combining the analytical script from part one with this targeted remediation script, you close the loop on cloud waste. You can identify the exact dataset driving your BigQuery storage costs, and within seconds, safely truncate hundreds of obsolete staging tables while preserving your carefully constructed data warehouse schema.&lt;/p&gt;

&lt;p&gt;With AI tools like Antigravity providing live context into your environment, creating these bespoke, highly effective utility scripts takes minutes instead of hours. The barrier to maintaining a lean cloud environment has never been lower.&lt;/p&gt;

</description>
      <category>automation</category>
      <category>database</category>
      <category>dataengineering</category>
      <category>googlecloud</category>
    </item>
    <item>
      <title>Profiling Memory Leaks in Rust: A Tale of Unexpected Challenges</title>
      <dc:creator>Marcelo Costa</dc:creator>
      <pubDate>Mon, 20 Jan 2025 23:49:22 +0000</pubDate>
      <link>https://dev.to/mesmacosta/profiling-memory-leaks-in-rust-a-tale-of-unexpected-challenges-1p3b</link>
      <guid>https://dev.to/mesmacosta/profiling-memory-leaks-in-rust-a-tale-of-unexpected-challenges-1p3b</guid>
      <description>&lt;p&gt;Rust, with its strong ownership and borrowing system, is well known for its ability to prevent many common programming errors, including memory leaks. However, even Rust isn't immune to these issues under specific circumstances. This blog post serves as a reminder to my past self, who had to identify and resolve a memory leak in a Rust application, and a cautionary tale for my future self, emphasizing the importance of proactive profiling.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Unlikely Culprit: A Rust Memory Leak
&lt;/h2&gt;

&lt;p&gt;Imagine a Rust application deployed to Google Cloud Run. It has been running smoothly for weeks. However, over time, the memory usage gradually increases, leading to eventual crashes due to insufficient memory. In this chart we can see how each day the memory would hit its peak and then reset due to a crash:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fti0v98cf38uqpur7cmh6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fti0v98cf38uqpur7cmh6.png" alt="cloud run metrics" width="800" height="299"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;While Rust's ownership system prevents many common memory errors, certain scenarios can still lead to leaks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Reference Cycles&lt;/strong&gt;: Circular references between objects can create a situation where objects hold onto each other, preventing them from being deallocated. This is similar to how memory leaks occur in languages with garbage collection.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Unintentional Rc or Arc Cycles&lt;/strong&gt;: Using Rc (reference counting) or Arc (atomic reference counting) can introduce cycles if not managed carefully. If objects have strong references to each other through these types, they can keep each other alive indefinitely.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Global Variables with Interior Mutability&lt;/strong&gt;: Global variables with interior mutability (RefCell, Mutex, etc.) can leak memory if the mutable references are not properly managed. If a reference is held indefinitely, the data it points to will also remain in memory.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Forgotten drop Implementations&lt;/strong&gt;: If a type owns resources that need explicit deallocation (e.g., file handles, network connections), forgetting to implement the drop trait can lead to resource leaks, which can manifest as memory leaks.&lt;/li&gt;
&lt;/ul&gt;
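&lt;p&gt;The first bullet draws an analogy to garbage-collected languages, and the mechanism is easy to demonstrate there. As a minimal illustration (in Python, purely for the analogy; in Rust, picture two &lt;code&gt;Rc&lt;/code&gt; values holding strong references to each other), reference counting alone can never free a cycle:&lt;/p&gt;

```python
import gc

class Node:
    def __init__(self):
        self.other = None

# Build a two-object cycle: a points to b, b points back to a.
a, b = Node(), Node()
a.other = b
b.other = a

# Drop the only external references. Pure reference counting
# (which is how Rc/Arc work) can never free these objects,
# because each one still holds a strong reference to the other.
del a, b

# Python only recovers the memory because it ships a separate
# cycle collector; Rust has no such collector, so an Rc cycle
# leaks until the process exits.
collected = gc.collect()
print(collected)
```

&lt;p&gt;In Rust, the usual fix is to make one direction of the relationship a &lt;code&gt;Weak&lt;/code&gt; reference, so the cycle never forms.&lt;/p&gt;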
&lt;h2&gt;
  
  
  The Challenge of Troubleshooting Memory Leaks
&lt;/h2&gt;

&lt;p&gt;Pinpointing the root cause of a memory leak can be a challenging task, even for experienced developers. Many programmers avoid diving deep into memory profiling because narrowing down the problem is a time-consuming process of elimination, akin to diagnosing a rare medical condition: you formulate hypotheses, test them, and discard them one by one until the culprit has nowhere left to hide.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhcsyqx6n0oi35sjeb9m6.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhcsyqx6n0oi35sjeb9m6.jpg" alt="house" width="272" height="367"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In my case, given the critical nature of our service, we needed to act quickly. Within minutes of identifying the memory leak, we implemented a temporary workaround: a GitHub Actions workflow was set up to automatically restart our Cloud Run service every two hours.&lt;/p&gt;

&lt;p&gt;Basically, we just forced a redeploy pointing to the latest image, using GitHub Actions' cron functionality. Here's a sample:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;name: Redeploy every 2 hours

on:
  schedule:
    - cron: '0 */2 * * *' # Runs every 2 hours
env:
  ...

jobs:
  init:
    ...
  tenant-deploys:
    needs: [ init ]
    runs-on: ubuntu-latest
    strategy:
      matrix:
        service: [ tenant-1, tenant-2, tenant-3 ]
    steps:
      ...
      - name: Deploy on Cloud Run
        uses: google-github-actions/deploy-cloudrun@v1
        with:
          service: ${{ matrix.service }}
          image: ${{needs.init.outputs.image_name}}:latest
          region: ${{ env.REGION }}
          gcloud_component: beta
          env_vars: |
            ENV=${{ needs.init.outputs.env }}
            COMMIT_ID=${{ env.COMMIT_ID }}
            RUST_BACKTRACE=full
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That was enough to prevent any downtime. Now, back to the drawing board:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frfaamsnw2ighp8lht4rx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frfaamsnw2ighp8lht4rx.png" alt="house drawing board" width="425" height="299"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Inspiration from the Rust Community
&lt;/h2&gt;

&lt;p&gt;I came across this great reference from the community: &lt;a href="https://nnethercote.github.io/perf-book/profiling.html" rel="noopener noreferrer"&gt;The Rust Performance Book&lt;/a&gt;. I started testing the options from its list until I got to Instruments:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu7ipw88dv70go3ipvlxx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu7ipw88dv70go3ipvlxx.png" alt="perf-book" width="800" height="637"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That led me to these two videos:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;a href="https://www.youtube.com/watch?v=JRMOIE_wAFk&amp;amp;t=974s" rel="noopener noreferrer"&gt;Profiling Code in Rust by Vitaly Bragilevsky&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.youtube.com/watch?v=P3dXH61Kr5U" rel="noopener noreferrer"&gt;Profiling DataFusion with Instruments&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I had used different memory profiling tools for other languages in the past; given those recommendations, I decided to explore Instruments' capabilities for profiling my Rust application.&lt;/p&gt;
&lt;h2&gt;
  
  
  The Unexpected Source of the Leak
&lt;/h2&gt;

&lt;p&gt;After looking at the Instruments profiling report:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxmiod62gn2199e95xmcr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxmiod62gn2199e95xmcr.png" alt="report" width="800" height="415"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I was able to narrow it down to the allocation of a few PyO3 objects. The leak was triggered by a complex interaction between Rust and Python that was specific to our application code: the Rust code, calling into Python, was holding onto PyO3 objects that were needed during execution but never released afterwards. Circling back to the beginning of the post, it was a bit like the &lt;strong&gt;Forgotten drop Implementations&lt;/strong&gt; scenario.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;A quick tip: if you try to build your Rust binary and use it in Instruments, you may get this error:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnnppzg7l9ogcfrulsj40.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnnppzg7l9ogcfrulsj40.png" alt="Sign Error" width="800" height="135"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You need to build the binary with debugging symbols:&lt;/p&gt;


&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[profile.release]
debug = true
&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;and sign the binary as:&lt;br&gt;
&lt;a href="https://forums.developer.apple.com/forums/thread/681687?answerId=734339022#734339022" rel="noopener noreferrer"&gt;https://forums.developer.apple.com/forums/thread/681687?answerId=734339022#734339022&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2&gt;
  
  
  The fix
&lt;/h2&gt;

&lt;p&gt;Again, this is a bit specific to our implementation, since we were loading custom objects into memory: we added a cleanup method that we call after running the Python code. A simple one-liner did it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;py03_module.call_method0(“cleanup”)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After rerunning Instruments with the fix in place, the memory stayed well behaved:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu4vjchnp5nfuh9qnkqr5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu4vjchnp5nfuh9qnkqr5.png" alt=" " width="800" height="439"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Leveraging Instruments
&lt;/h2&gt;

&lt;p&gt;Instruments proved to be an invaluable tool in identifying the memory leak, one I'd certainly recommend and use again! By analyzing the memory allocation patterns, I was able to pinpoint the exact line of Rust code responsible for the issue. Once the culprit was identified, fixing the memory leak was relatively straightforward.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Pragmatism over perfection&lt;/strong&gt;: Sometimes, a temporary workaround is the most practical approach. In our case, implementing a quick fix freed us to thoroughly investigate the memory leak without impacting users. This allowed us to dedicate the necessary time to find a permanent solution.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tool up&lt;/strong&gt;: Familiarize yourself with a great memory profiler. When you encounter a memory leak, having the right tools can significantly speed up the debugging process.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Embrace the challenge&lt;/strong&gt;: While frustrating at times, hunting down memory leaks can teach you a lot about how the language works. The satisfaction of identifying and resolving the issue is a reward in itself.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By sharing this experience, I hope to encourage other Rust developers to embrace profiling as a best practice and to watch out for unexpected memory leaks. Happy profiling!&lt;/p&gt;

</description>
      <category>rust</category>
      <category>cloud</category>
      <category>programming</category>
      <category>gcp</category>
    </item>
    <item>
      <title>Customizing Retry Predicates in Google Cloud Python Libraries</title>
      <dc:creator>Marcelo Costa</dc:creator>
      <pubDate>Sat, 16 Nov 2024 12:27:16 +0000</pubDate>
      <link>https://dev.to/mesmacosta/customizing-retry-predicates-in-google-cloud-python-libraries-feb</link>
      <guid>https://dev.to/mesmacosta/customizing-retry-predicates-in-google-cloud-python-libraries-feb</guid>
      <description>&lt;p&gt;Google Cloud's Python libraries are designed for resilience. They add strong retry mechanisms to handle transient errors effectively. However, there may be situations where the default retry behavior isn't suitable. For example, you might encounter certain errors that should not trigger a retry, or you may require more control over the retry logic.&lt;/p&gt;

&lt;p&gt;This blog post explores how Google Cloud's Python libraries interact with custom retry predicates, allowing you to customize the retry behavior to better meet your specific requirements.&lt;/p&gt;

&lt;p&gt;In this blog post, I want to highlight a specific example related to using service account impersonation within Google Cloud libraries. In an architecture I designed and am currently working on, we isolate user environments into separate Google Cloud projects. We noticed that some of our services were experiencing degraded performance in certain user flows. After investigating, we traced the issue back to the default retry behavior of the libraries mentioned earlier.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Default Retry Mechanism
&lt;/h2&gt;

&lt;p&gt;Before we go into customization, it's important to understand the default retry behavior of Google Cloud Python libraries. These libraries typically have an exponential backoff strategy with added jitter for retries. This means that when a transient error occurs, the library will retry the operation after a brief delay, with the delay increasing exponentially after each subsequent attempt. The inclusion of jitter introduces randomness to the delay, which helps prevent synchronization of retries across multiple clients.&lt;/p&gt;
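&lt;p&gt;To make that concrete, here is a minimal sketch of exponential backoff with jitter in plain Python. The constants are illustrative only; the actual defaults live in &lt;code&gt;google.api_core.retry&lt;/code&gt;:&lt;/p&gt;

```python
import random

def backoff_delays(initial=1.0, multiplier=2.0, maximum=60.0, attempts=5):
    """Yield the delay to sleep before each retry attempt:
    exponential growth, capped at a maximum, with random jitter
    so that many clients don't retry in lockstep."""
    delay = initial
    for _ in range(attempts):
        # "Full jitter": sleep a random fraction of the current delay.
        yield random.uniform(0, delay)
        delay = min(delay * multiplier, maximum)

for d in backoff_delays():
    print(round(d, 2))
```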

&lt;p&gt;While this strategy is effective in many situations, it may not be ideal for every scenario. For example, if you're using service account impersonation and encounter an authentication error, attempting to retry the operation may not be helpful. In such cases, the underlying authentication issue likely needs to be resolved before a retry can succeed.&lt;/p&gt;

&lt;h2&gt;
  
  
  Enter Custom Retry Predicates
&lt;/h2&gt;

&lt;p&gt;In Google Cloud libraries, custom retry predicates enable you to specify the precise conditions under which a retry attempt should be made. You can create a function that accepts an exception as input and returns True if the operation should be retried, and False if it should not.&lt;/p&gt;

&lt;p&gt;For example, here’s a custom retry predicate that prevents retries for certain authentication errors that occur during service account impersonation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from google.api_core.exceptions import GoogleAPICallError
from google.api_core.retry import Retry, if_transient_error

def custom_retry_predicate(exception: Exception) -&amp;gt; bool:
    if if_transient_error(exception): # exceptions which should be retried
        if isinstance(exception, GoogleAPICallError):
            if "Unable to acquire impersonated credentials" in exception.message: # look for specific impersonation error
                return False
        return True
    return False
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This predicate checks if the exception is a &lt;code&gt;GoogleAPICallError&lt;/code&gt; and specifically looks for the message "Unable to acquire impersonated credentials". If this condition is met, it returns False, preventing a retry.&lt;/p&gt;

&lt;h2&gt;
  
  
  Using Custom Predicates with Google Cloud Libraries
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Firestore&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from google.cloud import firestore

# ... your Firestore setup ...

retry = Retry(predicate=custom_retry_predicate, timeout=10)

# example of an arbitrary firestore api call, works with all
stream = collection.stream(retry=retry)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;BigQuery&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from google.cloud import bigquery

# ... your BigQuery setup ...

retry = Retry(predicate=custom_retry_predicate, timeout=10)

# example of an arbitrary bigquery api call, works with all
bq_query_job = client.get_job(job_id, retry=retry)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In both examples, we create a &lt;code&gt;Retry&lt;/code&gt; object with our custom predicate and a timeout value. This &lt;code&gt;Retry&lt;/code&gt; object is then passed as an argument to the respective API calls.&lt;/p&gt;

&lt;h2&gt;
  
  
  Benefits of Custom Retry Predicates
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Fine-grained control&lt;/strong&gt;: Define retry conditions based on specific exceptions or error messages with precision.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Improved efficiency&lt;/strong&gt;: Avoid unnecessary retries for non-transient errors, thus saving resources and time.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enhanced application stability&lt;/strong&gt;: Handle specific errors gracefully to prevent cascading failures.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Custom retry predicates offer an effective way to enhance the resilience of your Google Cloud applications. By customizing the retry behavior to suit your specific requirements, you can ensure that your applications are robust, efficient, and scalable. Take charge of your error handling and master the retry process!&lt;/p&gt;

</description>
      <category>googlecloud</category>
      <category>python</category>
      <category>programming</category>
    </item>
    <item>
      <title>BigQuery's New JSON Functions: Struct vs. JSON - Choosing the Right Structure</title>
      <dc:creator>Marcelo Costa</dc:creator>
      <pubDate>Sun, 25 Aug 2024 18:01:45 +0000</pubDate>
      <link>https://dev.to/mesmacosta/bigquerys-new-json-functions-struct-vs-json-choosing-the-right-structure-2b23</link>
      <guid>https://dev.to/mesmacosta/bigquerys-new-json-functions-struct-vs-json-choosing-the-right-structure-2b23</guid>
      <description>&lt;p&gt;BigQuery recently expanded its capabilities with new JSON helper functions, as seen on their &lt;a href="https://cloud.google.com/bigquery/docs/release-notes" rel="noopener noreferrer"&gt;release notes&lt;/a&gt;:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvjva42hq6o3r0ts5n8g5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvjva42hq6o3r0ts5n8g5.png" alt="BigQuery new JSON functions" width="800" height="170"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Combined with enhancements to &lt;a href="https://cloud.google.com/blog/products/data-analytics/moving-to-log-analytics-for-bigquery-export-users" rel="noopener noreferrer"&gt;log analytics&lt;/a&gt; (which utilizes JSON columns) and the power of &lt;a href="https://cloud.google.com/blog/products/data-analytics/pinpoint-unique-elements-with-bigquery-search-features" rel="noopener noreferrer"&gt;search functions&lt;/a&gt; across JSON data:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpqrpkdlcfjwypyg9dl50.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpqrpkdlcfjwypyg9dl50.png" alt=" " width="800" height="266"&gt;&lt;/a&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fud8tddi1tzvcqs2hf6lt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fud8tddi1tzvcqs2hf6lt.png" alt=" " width="800" height="150"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It's an exciting time to use BigQuery and get the most out of these data types. So let's dive into when to use STRUCT vs. JSON columns in BigQuery, considering their strengths and potential trade-offs.&lt;/p&gt;
&lt;h2&gt;
  
  
  STRUCT
&lt;/h2&gt;

&lt;p&gt;A simple example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE TABLE customers (
  customer_id INT64,
  customer_name STRING,
  address STRUCT&amp;lt;
    street STRING,
    city STRING,
    state STRING,
    zip_code STRING
  &amp;gt;,
  contact STRUCT&amp;lt;
    email STRING,
    phone STRING
  &amp;gt;
);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Strengths:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Schema Enforcement:&lt;/strong&gt; Enforces a clear structure, ensuring data consistency and integrity.&lt;br&gt;
No need to run a &lt;code&gt;JSON_KEYS&lt;/code&gt;-like function just to discover fields. When your data environment grows past a few tables to hundreds, that certainly makes a big difference!&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Query Performance &amp;amp; Cost Savings&lt;/strong&gt;: Optimized for querying specific nested attributes, leading to potentially faster performance and lower costs for well-structured data.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As an illustrative example, referencing a BigQuery public dataset, we can see how querying different STRUCT fields impacts the number of bytes processed:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpjgzif4g83v3qi9ddjpk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpjgzif4g83v3qi9ddjpk.png" alt=" " width="650" height="1280"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Querying the &lt;code&gt;ci&lt;/code&gt; field processes significantly fewer bytes compared to querying the &lt;code&gt;system&lt;/code&gt; field, demonstrating the potential cost savings when targeting specific Struct attributes.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;ci&lt;/code&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi0ms0ejvh5l8qin9jo4e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi0ms0ejvh5l8qin9jo4e.png" alt=" " width="800" height="53"&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;system&lt;/code&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp4fxrzz004yseqoxam2y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp4fxrzz004yseqoxam2y.png" alt=" " width="800" height="50"&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Ease of Use&lt;/strong&gt;: Simple syntax with dot notation for accessing nested fields, making queries more readable.&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT 
  customer_name, 
  address.city, 
  contact.email 
FROM customers;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  JSON
&lt;/h2&gt;

&lt;p&gt;A simple example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE TABLE products (
  product_id INT64,
  product_name STRING,
  details JSON
);

SELECT 
  product_name, 
  JSON_EXTRACT_SCALAR(details, '$.color') AS color,
  JSON_VALUE(details, '$.price') AS price
FROM products;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Strengths:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Data Exchange:&lt;/strong&gt; A widely used format for seamless integration with external systems and APIs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Flexibility&lt;/strong&gt;: Handles dynamic or evolving data structures without schema changes - perfect for unpredictable or unstructured data.&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;"With great flexibility comes great responsibility"&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fggf62gm6bxinb8w3b2kj.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fggf62gm6bxinb8w3b2kj.jpeg" alt=" " width="602" height="402"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The challenge with JSON is handling varying key values. Upstream validation using frameworks like &lt;a href="https://andrew-jones.com/categories/data-contracts/" rel="noopener noreferrer"&gt;data contracts&lt;/a&gt; or other techniques can help enforce consistency, but if that level of rigor is needed, Structs might be a better fit.&lt;/p&gt;
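&lt;p&gt;Even without a full data-contract framework, a lightweight upstream check can catch drifting keys before rows land in a JSON column. A minimal standard-library sketch (the required fields here are illustrative):&lt;/p&gt;

```python
import json

# Keys and types the downstream queries depend on (illustrative).
REQUIRED = {"color": str, "price": (int, float)}

def validate_payload(raw):
    """Parse a JSON document and flag any key that is missing or has
    an unexpected type, so bad rows are rejected before loading."""
    doc = json.loads(raw)
    errors = []
    for key, expected in REQUIRED.items():
        if key not in doc:
            errors.append(f"missing key: {key}")
        elif not isinstance(doc[key], expected):
            errors.append(f"unexpected type for {key}: {type(doc[key]).__name__}")
    return doc, errors

# A price that arrives as a string would silently break numeric queries:
doc, errors = validate_payload('{"color": "red", "price": "9.99"}')
print(errors)
```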

&lt;p&gt;For genuine JSON needs, new functions like &lt;a href="https://cloud.google.com/bigquery/docs/reference/standard-sql/json_functions#json_keys" rel="noopener noreferrer"&gt;JSON_KEYS&lt;/a&gt; and &lt;a href="https://cloud.google.com/bigquery/docs/reference/standard-sql/json_functions#JSONPath_mode" rel="noopener noreferrer"&gt;JSONPath_mode&lt;/a&gt; provide powerful tools for querying and managing your data.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjncwyr8vvmcp63sldl5o.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjncwyr8vvmcp63sldl5o.png" alt=" " width="800" height="371"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Choosing the Right Structure
&lt;/h2&gt;

&lt;p&gt;The ideal choice between STRUCT and JSON hinges on your specific data characteristics and priorities:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;STRUCT&lt;/strong&gt;: When you require strict schema enforcement, predictable query performance, and ease of use with nested data.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;JSON&lt;/strong&gt;: When you need to accommodate flexible or evolving data structures and prioritize seamless data exchange.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Whichever path you choose, BigQuery has you covered! The latest enhancements provide greater control and flexibility in managing both structured and semi-structured data.&lt;/p&gt;

</description>
      <category>bigquery</category>
      <category>googlecloud</category>
      <category>database</category>
    </item>
    <item>
      <title>Working with Files in Cloud Run Jobs: Introducing GCS Fuse</title>
      <dc:creator>Marcelo Costa</dc:creator>
      <pubDate>Sun, 21 Jul 2024 13:42:18 +0000</pubDate>
      <link>https://dev.to/mesmacosta/working-with-files-in-cloud-run-jobs-introducing-gcs-fuse-iob</link>
      <guid>https://dev.to/mesmacosta/working-with-files-in-cloud-run-jobs-introducing-gcs-fuse-iob</guid>
      <description>&lt;p&gt;When it comes to processing files within your Cloud Run Jobs, having a familiar filesystem interface can make things a whole lot easier.  That's where GCS Fuse comes in! It bridges the gap between Google Cloud Storage (GCS) and your Cloud Run Job's environment, allowing you to mount GCS buckets as if they were local directories.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why GCS Fuse?
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Simplified File Access&lt;/strong&gt;: Read, write, and list files using standard commands and libraries.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Performance&lt;/strong&gt;: GCS Fuse caches frequently accessed files, making subsequent reads faster.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Flexibility&lt;/strong&gt;: Integrate with your existing file-based workflows and tools effortlessly.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Cloud Storage Volume Mounts
&lt;/h2&gt;

&lt;p&gt;It used to be a bit of a hassle to set up GCS Fuse in either Cloud Run or Cloud Run jobs: you had to install it manually in a Docker container and start it yourself, as you can see in Google's &lt;a href="https://github.com/GoogleCloudPlatform/python-docs-samples/blob/fe75ea9941bdf30d3ad4cb5d3266c0d64abac1af/run/filesystem/gcsfuse.Dockerfile" rel="noopener noreferrer"&gt;samples repo&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;When Google announced managed support for it:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0njp7ew6qjooot0012aq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0njp7ew6qjooot0012aq.png" alt="Release Notes" width="800" height="242"&gt;&lt;/a&gt;  &lt;/p&gt;

&lt;p&gt;It was great news! It made my job, and the job of many folks who leverage "serverless" solutions across different parts of their architectures, much easier!&lt;/p&gt;

&lt;p&gt;Now, what are those cloud storage volume mounts, you may ask?&lt;/p&gt;

&lt;p&gt;The managed version of GCS Fuse leverages a Cloud Run feature called &lt;strong&gt;Cloud Storage volume mounts&lt;/strong&gt;. Essentially, this allows you to specify a GCS bucket in your Cloud Run Job's configuration, and the job will have direct access to the files within that bucket.&lt;/p&gt;
&lt;h2&gt;
  
  
  Setting it up
&lt;/h2&gt;

&lt;p&gt;All you need to do is include a volumes section defining the mount point and the GCS bucket you want to access (see the &lt;a href="https://cloud.google.com/run/docs/configuring/services/cloud-storage-volume-mounts" rel="noopener noreferrer"&gt;docs&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faw5cboncnqlmt10fd5cr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faw5cboncnqlmt10fd5cr.png" alt="Yaml File" width="800" height="511"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Python library:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    container = run_v2.Container()
    container.volume_mounts = [
        run_v2.VolumeMount(
            name=volume_name,
            mount_path=my_local_dir_path,
        ),
    ]

    job = run_v2.Job()
    job.template.template.volumes = [
        run_v2.Volume(
            name=volume_name,
            gcs=run_v2.GCSVolumeSource(
                bucket=my_bucket_path,
            ),
        ),
    ]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The beauty of it is that, to use any files that live inside the bucket, you abstract away all the GCS code and only need to deal with local files.&lt;/p&gt;

&lt;p&gt;Really simple example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;f = open(f"{my_local_dir_path}/sample-logfile.txt", "a")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Under the hood, the GCS Fuse mount performs all the necessary list, read, and write operations against the bucket for you.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tips and Considerations&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Caching&lt;/strong&gt;: Keep in mind that GCS Fuse uses caching, so changes you make to files in the mounted directory might not immediately propagate back to GCS.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Concurrency&lt;/strong&gt;: For multi-worker jobs, be aware of potential concurrency issues if multiple workers try to modify the same file simultaneously.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;File Locking&lt;/strong&gt;: GCS Fuse doesn't provide file locking, so consider how your job handles concurrent writes.&lt;/li&gt;
&lt;/ul&gt;
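Given the concurrency and file-locking caveats above, one simple mitigation for multi-worker jobs is to give each task its own file. The helper below is an illustrative sketch (the function name is mine, not part of any SDK); `CLOUD_RUN_TASK_INDEX` is the environment variable Cloud Run Jobs sets on each task:

```python
import os

def task_output_path(mount_dir: str) -> str:
    """Build a per-task file path so parallel Cloud Run tasks never
    write to the same file in the mounted bucket."""
    # CLOUD_RUN_TASK_INDEX is set automatically by Cloud Run Jobs;
    # default to "0" for local runs.
    task_index = os.environ.get("CLOUD_RUN_TASK_INDEX", "0")
    return os.path.join(mount_dir, f"logfile-task-{task_index}.txt")
```

With one file per task index, concurrent writers never touch the same GCS object, sidestepping the lack of file locking entirely.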

&lt;p&gt;&lt;strong&gt;That's it&lt;/strong&gt;!&lt;/p&gt;

&lt;p&gt;GCS Fuse, and now Cloud Storage volume mounts, provide a powerful way to handle file operations in your Cloud Run Jobs. I use this feature extensively in production; make sure you dive into the official documentation for more details and start leveraging it to enhance your cloud-based workflows.&lt;/p&gt;

</description>
      <category>googlecloud</category>
      <category>database</category>
      <category>cloudstorage</category>
    </item>
    <item>
      <title>How to programmatically backup your Firestore database with simple steps</title>
      <dc:creator>Marcelo Costa</dc:creator>
      <pubDate>Sun, 28 Apr 2024 19:40:33 +0000</pubDate>
      <link>https://dev.to/mesmacosta/how-to-programmatically-backup-your-firestore-database-with-simple-steps-1k9o</link>
      <guid>https://dev.to/mesmacosta/how-to-programmatically-backup-your-firestore-database-with-simple-steps-1k9o</guid>
      <description>&lt;p&gt;&lt;strong&gt;Why this post&lt;/strong&gt;? Recently, Google Cloud announced in preview a way to &lt;a href="https://cloud.google.com/firestore/docs/backups" rel="noopener noreferrer"&gt;automatically setup and schedule&lt;/a&gt; your Firestore backups. Prior to the announcement, the &lt;a href="https://cloud.google.com/firestore/docs/solutions/schedule-export" rel="noopener noreferrer"&gt;recommended approach&lt;/a&gt; required multiple serverless components, such as Cloud Functions and Cloud Scheduler.&lt;/p&gt;

&lt;p&gt;At the time this post was written, there was no public documentation on how to call the Google Cloud APIs behind the aforementioned feature, though it could be done &lt;a href="https://cloud.google.com/firestore/docs/backups#create_a_daily_backup_schedule" rel="noopener noreferrer"&gt;using gcloud&lt;/a&gt;:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw8lcjvpocoxuvfny3npj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw8lcjvpocoxuvfny3npj.png" alt=" " width="800" height="348"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h1&gt;
  
  
  How to do it programmatically with Python
&lt;/h1&gt;

&lt;p&gt;Many users are not aware that the newest API operations or features are sometimes not immediately available in the Google SDKs; for those cases, there is something called the &lt;a href="https://developers.google.com/apis-explorer" rel="noopener noreferrer"&gt;discovery API client&lt;/a&gt;:&lt;/p&gt;

&lt;p&gt;In summary, the Google API Discovery service simplifies the process of working with Google APIs by providing structured and standardized documentation, which under the hood is utilized by their client libraries:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8cbm2h3v8hec2lf7xcy6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8cbm2h3v8hec2lf7xcy6.png" alt=" " width="800" height="139"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Basically, it's a document that tells machines how to interact with their APIs, which can also be helpful as documentation. I recommend always using Google's per-service SDKs first, and falling back to the discovery client when an operation is unavailable in the SDK or when you want more detail on a service's available operations and models.&lt;/p&gt;
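The discovery document URL follows the same pattern for every Google API, so a small helper (illustrative, not part of any SDK) is enough to target any service:

```python
def discovery_url(service: str, version: str) -> str:
    """Build the standard discovery-document URL for a Google API."""
    return f"https://{service}.googleapis.com/$discovery/rest?version={version}"

# The Firestore v1 discovery document used later in this post:
firestore_url = discovery_url("firestore", "v1")
```

Swapping in `"run"` or `"bigquery"` with the right version gives you the discovery document for those services as well.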

&lt;p&gt;&lt;strong&gt;Then how to use it?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;First, start by installing the &lt;a href="https://pypi.org/project/google-api-python-client/" rel="noopener noreferrer"&gt;google-api-python-client&lt;/a&gt; PyPI package.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkie8ibne0qvkne3ck07b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkie8ibne0qvkne3ck07b.png" alt=" " width="800" height="307"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Next, after looking at the discovery JSON that you can get at this &lt;a href="https://developers.google.com/apis-explorer/" rel="noopener noreferrer"&gt;link&lt;/a&gt; and finding the right service and operation you need to call, you build the service object:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flcdj25uc59o4pr2h0lqg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flcdj25uc59o4pr2h0lqg.png" alt=" " width="800" height="361"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then, by inspecting what the &lt;code&gt;gcloud&lt;/code&gt; command was doing, I got to the service I needed:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzuts4o45tnvrebn8mqit.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzuts4o45tnvrebn8mqit.png" alt=" " width="800" height="366"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The full code sample is here; I hope it helps!&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import googleapiclient.discovery

# change to your project and db ids
project_id = "MY_PROJECT_ID"
database_id = "MY_FIRESTORE_DB_ID"

api_service_name = "firestore"
api_version = "v1"
discovery_url = f"https://{api_service_name}.googleapis.com/$discovery/rest?version={api_version}"
service = googleapiclient.discovery.build(
    api_service_name, api_version, discoveryServiceUrl=discovery_url
)
created_backup = (
    service.projects()
    .databases()
    .backupSchedules()
    .create(
        parent=f"projects/{project_id}/databases/{database_id}",
        body={
            "retention": "604800s",
            "dailyRecurrence": {},
        },
    )
    .execute()
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I chose 604800s, the equivalent of 7 days, for retention, and &lt;code&gt;dailyRecurrence&lt;/code&gt;, which doesn't require any payload attributes, for daily backups. If you are looking to schedule it weekly instead, change &lt;code&gt;dailyRecurrence&lt;/code&gt; to something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"weeklyRecurrence": {
  # day of week enum
  "day": "MONDAY"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
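To make the two recurrence options and the retention arithmetic concrete, here is a small illustrative helper (the function is my own sketch; the field names match the Firestore v1 request body used above):

```python
def backup_schedule_body(retention_days, weekly_day=None):
    """Build the backupSchedules.create request body.

    Daily backups take an empty dailyRecurrence object; weekly backups
    take a day-of-week enum value such as "MONDAY".
    """
    # Retention is expressed in seconds, as a string with an "s" suffix
    body = {"retention": f"{retention_days * 24 * 60 * 60}s"}
    if weekly_day:
        body["weeklyRecurrence"] = {"day": weekly_day}
    else:
        body["dailyRecurrence"] = {}
    return body

daily = backup_schedule_body(7)            # {"retention": "604800s", "dailyRecurrence": {}}
weekly = backup_schedule_body(7, "MONDAY")
```

Either dict can then be passed as the `body` argument of the `create` call shown earlier.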



</description>
      <category>firestore</category>
      <category>googlecloud</category>
      <category>database</category>
      <category>python</category>
    </item>
    <item>
      <title>How to combine BigQuery with DuckDB</title>
      <dc:creator>Marcelo Costa</dc:creator>
      <pubDate>Sat, 27 Apr 2024 14:01:28 +0000</pubDate>
      <link>https://dev.to/mesmacosta/how-to-combine-bigquery-with-duckdb-55gk</link>
      <guid>https://dev.to/mesmacosta/how-to-combine-bigquery-with-duckdb-55gk</guid>
      <description>&lt;p&gt;This blog post will discuss the benefits of integrating Google BigQuery, a leading data warehouse solution, with DuckDB, an embedded analytical database. This powerful combination can enhance your data analysis processes by offering the best of both worlds: BigQuery's massive scalability and DuckDB's agility for quick and on-the-fly queries.&lt;/p&gt;

&lt;p&gt;Before we start, here is a quick summary of the key features for each:  &lt;/p&gt;

&lt;h2&gt;
  
  
  BigQuery
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Key Features:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Serverless Architecture&lt;/strong&gt;: BigQuery manages infrastructure automatically, scaling to meet query demands without manual resource provisioning.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Storage and Computation Separation&lt;/strong&gt;: Storage scales independently of compute, which reduces costs and lets each be optimized separately.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Real-time Analytics&lt;/strong&gt;: Supports real-time analysis with the capability to stream and query data almost instantaneously.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Machine Learning Integration&lt;/strong&gt;: BigQuery ML offers machine learning capabilities inside the database, allowing SQL practitioners to build and deploy models using SQL commands.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  DuckDB
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Key Features:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;In-Process Database&lt;/strong&gt;: Designed for embedded processes, it is ideal for applications and analytics tools requiring a built-in database.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Simple Integration&lt;/strong&gt;: Easy to set up and embed directly in your application, with no separate server to run.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let's review two easy options for bringing your BigQuery data to DuckDB.&lt;/p&gt;

&lt;h2&gt;
  
  
  Export Data From BigQuery to DuckDB
&lt;/h2&gt;

&lt;p&gt;Export the data to Cloud Storage, then download it manually or with &lt;a href="https://cloud.google.com/storage/docs/gsutil" rel="noopener noreferrer"&gt;gsutil&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;EXPORT DATA
  OPTIONS (
    uri = 'gs://bq_export_demo/export/*.parquet',
    format = 'PARQUET',
    overwrite = true)
AS (
  SELECT ssn, user_name
  FROM `demo-project.bq_dataset_0024.org_extend_rich_schemas_2890`
  ORDER BY user_name
);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Using the &lt;a href="https://duckdb.org/docs/guides/network_cloud_storage/gcs_import.html" rel="noopener noreferrer"&gt;cloud storage import&lt;/a&gt; feature from DuckDB is also possible.&lt;/p&gt;

&lt;h2&gt;
  
  
  BigQuery Client Library
&lt;/h2&gt;

&lt;p&gt;Make sure your environment has the following libraries installed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install duckdb
pip install pyarrow
pip install google-cloud-bigquery
pip install google-cloud-bigquery-storage
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then, an efficient way of querying the data is to use the BigQuery Storage client and its underlying abstractions that map the rows to PyArrow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import duckdb
from google.cloud import bigquery

bqclient = bigquery.Client()
table = bigquery.TableReference.from_string(
    "demo-project.bq_dataset_0024.org_extend_rich_schemas_2890"
)
rows = bqclient.list_rows(table)
# Download via the BigQuery Storage API into a PyArrow table
org_extend_rich_schemas_2890 = rows.to_arrow(create_bqstorage_client=True)
cursor = duckdb.connect()
# DuckDB's replacement scan lets SQL reference the local Arrow table by name
print(cursor.execute('SELECT * FROM org_extend_rich_schemas_2890').fetchall())
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Push Data from DuckDB to BigQuery
&lt;/h2&gt;

&lt;p&gt;DuckDB has the advantage of allowing you to run everything on your local machine without having to worry about costs. However, it is important to keep in mind that if you are dealing with sensitive or customer-related data, you should take appropriate security measures to protect it.&lt;/p&gt;
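Since the example tables in this post contain an &lt;code&gt;ssn&lt;/code&gt; column, one simple precaution is to redact sensitive fields before they ever reach your local machine. This is a minimal, illustrative sketch in plain Python (in practice you might mask the columns in SQL before exporting):

```python
def redact(rows, sensitive_fields=("ssn",)):
    """Return a copy of the rows with sensitive fields masked out."""
    return [
        {key: ("***" if key in sensitive_fields else value) for key, value in row.items()}
        for row in rows
    ]

rows = [{"ssn": "123-45-6789", "user_name": "alice"}]
safe_rows = redact(rows)  # [{"ssn": "***", "user_name": "alice"}]
```

Anything you then load into DuckDB locally carries only the masked values.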

&lt;h4&gt;
  
  
  DuckDB: Transform Data and Export to Parquet
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- Load the Parquet file
CREATE OR REPLACE TABLE original_data AS
SELECT *
FROM read_parquet('/path/bq_export_demo/export/*.parquet');

-- Perform transformations
CREATE OR REPLACE TABLE transformed_data AS
SELECT
    column1,
    column2,
    column3 + 10 AS new_column3,
    UPPER(column4) AS new_column4
FROM original_data;

-- Export the transformed data to a new Parquet file
COPY transformed_data
TO '/path/to/output_file.parquet' (FORMAT 'parquet');
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once you have your transformed Parquet file, upload it to a Cloud Storage bucket and load it into BigQuery using a load job:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;bq load --source_format=PARQUET --autodetect \
mydataset.new_table \
'gs://your_bucket/path/to/output_file.parquet'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And that's it! Combining both is certainly something I have in my data toolkit, and it helps me with my day-to-day work.&lt;/p&gt;

&lt;p&gt;Having said that, here are some final caveats:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Don’t Overload DuckDB with Big Data Tasks&lt;/strong&gt;:&lt;br&gt;
DuckDB is not designed to handle data of the same scale as BigQuery. Avoid using DuckDB for large datasets better suited to BigQuery’s infrastructure.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Don’t Neglect the Cost Implications&lt;/strong&gt;:&lt;br&gt;
Be mindful of the costs associated with data storage and transfer, especially when moving large amounts of data between BigQuery and DuckDB.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Don’t Forget to Scale Appropriately&lt;/strong&gt;:&lt;br&gt;
As your data grows or your analytical needs change, revisit your use of BigQuery and DuckDB. Scalability is a crucial concern, and what works at one scale may not work well at another.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Don’t Overlook Security&lt;/strong&gt;:&lt;br&gt;
Moving sensitive data from a secure production warehouse to your local environment, or to any environment where DuckDB is used as an embedded database, can raise security concerns. Therefore, it is essential to handle sensitive data with care.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I hope this helps!&lt;/p&gt;

</description>
      <category>bigquery</category>
      <category>googlecloud</category>
      <category>duckdb</category>
      <category>analytics</category>
    </item>
    <item>
      <title>Sample code on Service-to-Service Authentication in Google Cloud Run for Production and Local environments</title>
      <dc:creator>Marcelo Costa</dc:creator>
      <pubDate>Sat, 13 Apr 2024 13:48:12 +0000</pubDate>
      <link>https://dev.to/mesmacosta/sample-code-on-service-to-service-authentication-in-google-cloud-run-for-production-and-local-environments-ehm</link>
      <guid>https://dev.to/mesmacosta/sample-code-on-service-to-service-authentication-in-google-cloud-run-for-production-and-local-environments-ehm</guid>
      <description>&lt;p&gt;When using Google Cloud Run, securing communications between services is crucial. If your system architecture utilizes multiple services, it's likely that these services will need to communicate with each other either synchronously or asynchronously. Some of these services may be private and require authentication credentials for access.&lt;/p&gt;

&lt;p&gt;It's often hard to find sample code that covers both production and local environments with a good developer experience. The goal of this blog post is to provide sample code for both scenarios in Python and Node/JavaScript.&lt;/p&gt;

&lt;h2&gt;
  
  
  Javascript
&lt;/h2&gt;

&lt;p&gt;set up libraries and functions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import { execSync } from "child_process";
import { GoogleAuth } from "google-auth-library";

function exec(command: string): string {
  return execSync(command).toString().trim();
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;get id token for local env:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;function getLocalIdToken(): string {
  return exec("gcloud auth print-identity-token");
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;get id token for production:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;async function getProductionIdToken(url: string) {
  const auth = new GoogleAuth();
  const targetAudience = `https://${url}`;
  const client = await auth.getIdTokenClient(targetAudience);
  return await client.idTokenProvider.fetchIdToken(targetAudience);
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;suggested approach to use an env variable to switch it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const idToken = process.env.NODE_ENV === "production"
  ? await getProductionIdToken(url)
  : getLocalIdToken();
// add your additional logic here that uses the idToken in the REST or gRPC call.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Python
&lt;/h2&gt;

&lt;p&gt;set up libraries and functions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import google.auth.transport.requests
import google.oauth2.id_token
from google import auth
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;get id token for local env:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def get_local_id_token() -&amp;gt; str:
    creds, _ = auth.default(
        scopes=["https://www.googleapis.com/auth/cloud-platform"],
    )
    request = google.auth.transport.requests.Request()
    creds.refresh(request)
    return creds.id_token
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;get id token for production:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def get_production_id_token(url: str) -&amp;gt; str:
    auth_request = google.auth.transport.requests.Request()
    audience = f"https://{url}"
    return google.oauth2.id_token.fetch_id_token(auth_request, audience=audience)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;suggested approach to use an env variable to switch it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def get_id_token(url: str, env: str) -&amp;gt; str:
    if env == "production":
        return get_production_id_token(url)

    return get_local_id_token()

# add your additional logic here that uses the id token in the REST or gRPC call.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At the time of writing this blog post, it was not yet possible to use the exact same code for both strategies. Therefore, I recommend switching the presented logic using an environment or configuration variable. I hope this helps!&lt;/p&gt;

</description>
      <category>googlecloud</category>
      <category>cloudrun</category>
      <category>javascript</category>
      <category>python</category>
    </item>
    <item>
      <title>How to Fix Cloud Run Jobs Logging</title>
      <dc:creator>Marcelo Costa</dc:creator>
      <pubDate>Fri, 29 Dec 2023 14:49:21 +0000</pubDate>
      <link>https://dev.to/mesmacosta/how-to-fix-cloud-run-jobs-logging-1ie</link>
      <guid>https://dev.to/mesmacosta/how-to-fix-cloud-run-jobs-logging-1ie</guid>
      <description>&lt;p&gt;Google Cloud Run is a great product and became even better after allowing you to use it to run background Jobs:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmzk8kyx0nzx2jdp3o8h8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmzk8kyx0nzx2jdp3o8h8.png" alt=" " width="800" height="402"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Recently it became even more flexible, allowing users to override a bunch of execution args:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F750z1ckxoo4kqrbybz4m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F750z1ckxoo4kqrbybz4m.png" alt=" " width="800" height="543"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;But at the time this blog post was written, it lacked some developer tooling, for instance automatically showing logs in &lt;a href="https://cloud.google.com/logging?hl=en" rel="noopener noreferrer"&gt;Google Cloud Logging&lt;/a&gt; under the right resource type.&lt;/p&gt;

&lt;p&gt;If you set up your application and instrument it to run in Google Cloud Run, the logs are automatically tagged with the &lt;code&gt;gce_instance&lt;/code&gt; resource type. Then, if you go to the Google Cloud Console and open the Logs tab of a Cloud Run Job, you see nothing... because the console expects the logs under the &lt;code&gt;cloud_run_job&lt;/code&gt; resource type.&lt;/p&gt;

&lt;p&gt;This most likely happens because the feature is fairly new, so I'd imagine it will be fixed; in the meantime, here's some Python sample code that fixes this behavior:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import os

import google.cloud.logging
from google.cloud.logging.handlers import CloudLoggingHandler
from google.cloud.logging_v2.handlers import setup_logging
from google.cloud.logging_v2.resource import Resource
from google.cloud.logging_v2.handlers._monitored_resources import retrieve_metadata_server, _REGION_ID, _PROJECT_NAME

client = google.cloud.logging.Client()

cloud_run_job = os.environ.get("CLOUD_RUN_JOB")
if cloud_run_job:
    region = retrieve_metadata_server(_REGION_ID)
    project = retrieve_metadata_server(_PROJECT_NAME)

    # build a manual resource object
    cr_job_resource = Resource(
        type="cloud_run_job",
        labels={
            "job_name": cloud_run_job,
            "location": region.split("/")[-1] if region else "",
            "project_id": project,
        },
    )
    labels = {"run.googleapis.com/execution_name": os.environ.get("CLOUD_RUN_EXECUTION")}
    handler = CloudLoggingHandler(client, resource=cr_job_resource, labels=labels)
    setup_logging(handler)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Hope it helps!&lt;/p&gt;

</description>
      <category>googlecloud</category>
      <category>serverless</category>
      <category>programming</category>
      <category>logging</category>
    </item>
    <item>
      <title>How to use BigQuery Query Caching with Dynamic Wildcard Tables</title>
      <dc:creator>Marcelo Costa</dc:creator>
      <pubDate>Fri, 29 Dec 2023 14:21:11 +0000</pubDate>
      <link>https://dev.to/mesmacosta/how-to-use-bigquery-query-caching-with-dynamic-wildcard-tables-4mhc</link>
      <guid>https://dev.to/mesmacosta/how-to-use-bigquery-query-caching-with-dynamic-wildcard-tables-4mhc</guid>
      <description>&lt;p&gt;&lt;strong&gt;The Problem: Caching does not work with wildcard tables&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;From BigQuery &lt;a href="https://cloud.google.com/bigquery/docs/querying-wildcard-tables" rel="noopener noreferrer"&gt;official docs&lt;/a&gt;:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyftawqstn59z5o16xllp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyftawqstn59z5o16xllp.png" alt="Wildcard Limitation" width="800" height="373"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let's say you have tables named &lt;code&gt;my_data_2023_*&lt;/code&gt;, where the asterisk represents various months, and you want to analyze data across all of them. Because BigQuery can't know whether new tables matching the wildcard have been created, it invalidates any available cache and always runs a fresh query, so the cache is never used.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Just for reference, using date-sharded tables is not a good practice:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frp6ubodygndz0ghvlf3h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frp6ubodygndz0ghvlf3h.png" alt=" " width="800" height="369"&gt;&lt;/a&gt;&lt;br&gt;
Recently I faced a scenario where tables were dynamically created based on a business-domain field; the date example is only for illustration purposes. If you are using date-sharded tables, the better solution is to migrate them to BigQuery partitions instead.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;The Solution: Union THEM ALL!&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Enter the BigQuery Information Schema:&lt;/p&gt;

&lt;p&gt;The BigQuery &lt;code&gt;INFORMATION_SCHEMA&lt;/code&gt; views are read-only, system-defined views that provide metadata information about your BigQuery objects.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flgs7bbuuapa7ofyhuxgk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flgs7bbuuapa7ofyhuxgk.png" alt=" " width="800" height="227"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We can use the &lt;code&gt;tables&lt;/code&gt; view to dynamically generate a list of all tables matching our pattern (e.g., &lt;code&gt;my_data_2023_*&lt;/code&gt;). Then, we leverage &lt;code&gt;UNION ALL&lt;/code&gt; to combine individual queries for each identified table.&lt;/p&gt;

&lt;p&gt;Here's a sample using Python:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from google.cloud import bigquery

client = bigquery.Client()

# Specify the dataset and wildcard pattern
dataset_id = "your-project.your_dataset"
wildcard_pattern = "my_data_2023_"

# Query the INFORMATION_SCHEMA to get matching table names
query = f"""
    SELECT table_name
    FROM `{dataset_id}.INFORMATION_SCHEMA.TABLES`
    WHERE table_name LIKE '{wildcard_pattern}%'
"""
rows = list(client.query(query))

if rows:
    # Combine the per-table queries with UNION ALL
    view_query = __create_sql(rows[0]["table_name"])
    for row in rows[1:]:
        view_query = f"""
        {view_query}
        UNION ALL
        {__create_sql(row["table_name"])}
        """
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I omitted the &lt;code&gt;__create_sql&lt;/code&gt; function, which simply builds the (potentially complex) SQL for a given table name. With the generated SQL, you can then create a BigQuery view:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;view = bigquery.Table(table_ref)
view.view_query = view_query
client.create_table(view, exists_ok=True)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Hope that helps, cheers!&lt;/p&gt;

</description>
      <category>bigdata</category>
      <category>googlecloud</category>
      <category>bigquery</category>
      <category>python</category>
    </item>
    <item>
      <title>How to Impersonate a Service Account Using Bigquery Client Library</title>
      <dc:creator>Marcelo Costa</dc:creator>
      <pubDate>Sat, 30 Sep 2023 19:01:47 +0000</pubDate>
      <link>https://dev.to/mesmacosta/how-to-impersonate-a-service-account-using-bigquery-client-library-3bmc</link>
      <guid>https://dev.to/mesmacosta/how-to-impersonate-a-service-account-using-bigquery-client-library-3bmc</guid>
      <description>&lt;p&gt;If you are not familiar with Service Accounts in Google Cloud, here's a short text explaining it:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;A service account is a special kind of account typically used by an application or compute workload, such as a Compute Engine instance, rather than a person. A service account is identified by its email address, which is unique to the account.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The most common way to make an application act like a service account is by connecting the service account to the resource where the application is running. For instance, you can link a service account to a Compute Engine instance so that the applications running on that instance can act as the service account. After that, you can give the service account special permissions (IAM roles) so that it, and the applications on the instance, can use Google Cloud resources.&lt;/p&gt;

&lt;p&gt;In some scenarios, such as multi-tenant deployments where you need stricter permission controls for each organisation or customer, it may make sense to tailor down the permissions. There are multiple ways of dealing with this, but recently, upon facing that scenario, I used a Google Cloud feature called Service Account impersonation to isolate each organisation's resource access controls.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;When an authenticated principal, such as a user or another service account, authenticates as a service account to gain the service account's permissions, it's called impersonating the service account. Impersonating a service account lets an authenticated principal access whatever the service account can access. Only authenticated principals with the appropriate permissions can impersonate service accounts.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It's also quite a nice feature, since it allows you to use a short-lived token flow, as described in this part of the Google Cloud documentation:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8re3fq8ske5drzsj70lh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8re3fq8ske5drzsj70lh.png" alt="Google docs" width="800" height="320"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is a quite common scenario if you don't want your engineering team downloading service account keys and potentially exposing those credentials. See &lt;a href="https://cloud.google.com/iam/docs/service-account-impersonation" rel="noopener noreferrer"&gt;Service account impersonation&lt;/a&gt; for more details.&lt;/p&gt;
&lt;h2&gt;
  
  
  How to use it within BigQuery Client Library
&lt;/h2&gt;

&lt;p&gt;There are several ways of doing Service Account impersonation and many samples out there, but at the time this post was written I couldn't find sample code showing how to do it with the BigQuery client library. So, after some digging and testing, here is a working version:&lt;/p&gt;

&lt;p&gt;Packages used:&lt;br&gt;
&lt;code&gt;pip install google-cloud-bigquery&lt;/code&gt;&lt;br&gt;
&lt;code&gt;pip install google-auth&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Sample code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from google import auth
from google.auth import impersonated_credentials
from google.cloud import bigquery


# Set scopes; the broad cloud-platform scope is usually enough, since the actual
# permissions are set at the Service Account level.
target_scopes = ["https://www.googleapis.com/auth/cloud-platform"]

source_credentials, project = auth.default()
creds = impersonated_credentials.Credentials(
    source_credentials=source_credentials,
    target_principal="[MY_SERVICE_ACCOUNT_ID]@[MYGCP_PROJECT_ID].iam.gserviceaccount.com",
    target_scopes=target_scopes,
)
# Set the location to your BigQuery region (e.g. "US" or "europe-west1").
client = bigquery.Client(credentials=creds, project=project, location="US")

# Then run any additional commands with the impersonated credentials, e.g.:
# client.query(...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
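&lt;p&gt;One prerequisite worth noting: for impersonation to work, the calling identity needs the &lt;code&gt;roles/iam.serviceAccountTokenCreator&lt;/code&gt; role on the target service account. A minimal sketch of granting it (the project, account, and user names below are placeholders):&lt;/p&gt;

```shell
# Allow a user (or a source service account) to mint short-lived tokens
# for the target service account. All names here are placeholders.
gcloud iam service-accounts add-iam-policy-binding \
  target-sa@my-gcp-project.iam.gserviceaccount.com \
  --member="user:dev@example.com" \
  --role="roles/iam.serviceAccountTokenCreator"
```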



&lt;p&gt;Hope this helps!&lt;/p&gt;

</description>
      <category>googlecloud</category>
      <category>bigquery</category>
      <category>cloudcomputing</category>
      <category>security</category>
    </item>
    <item>
      <title>How to create an SLO for Cloud Run programmatically</title>
      <dc:creator>Marcelo Costa</dc:creator>
      <pubDate>Sat, 19 Aug 2023 22:16:18 +0000</pubDate>
      <link>https://dev.to/mesmacosta/how-to-create-a-slo-for-cloud-run-programatically-hp</link>
      <guid>https://dev.to/mesmacosta/how-to-create-a-slo-for-cloud-run-programatically-hp</guid>
      <description>&lt;p&gt;The goal of this post is not to explain what Cloud Run or an SLO is, but to provide sample code showing how to set one up programmatically using Google APIs.&lt;/p&gt;

&lt;p&gt;If you want more context around SLOs and general SRE concepts, I recommend taking a look at the free &lt;a href="https://sre.google/workbook/foreword-I/" rel="noopener noreferrer"&gt;Google SRE book&lt;/a&gt; and, more specifically, the &lt;a href="https://sre.google/workbook/implementing-slos/" rel="noopener noreferrer"&gt;SRE chapter&lt;/a&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Service level objectives (SLOs) specify a target level for the reliability of your service. Because SLOs are key to making data-driven decisions about reliability, they’re at the core of SRE practices&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Why this post?&lt;/strong&gt; Recently I faced a scenario where I needed to dynamically create Cloud Run services. It's pretty straightforward to create an SLO through the Cloud Run UI with a few simple steps, but if you are creating services with the &lt;a href="https://cloud.google.com/python/docs/reference/run/latest" rel="noopener noreferrer"&gt;Python SDK&lt;/a&gt; or any other programming language's SDK, SLO operations are not available.&lt;/p&gt;

&lt;p&gt;At the time this post was written there was no public documentation around using SLOs with Cloud Run through the APIs, so I wanted to share how I did it.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to create the SLO
&lt;/h2&gt;

&lt;p&gt;Many users are not aware that the newest API operations or features are sometimes not immediately available in Google's SDKs, but there is something called the &lt;a href="https://developers.google.com/apis-explorer/" rel="noopener noreferrer"&gt;discovery API client&lt;/a&gt;:&lt;/p&gt;

&lt;p&gt;In summary, the Google API Discovery service simplifies the process of working with Google APIs by providing structured and standardised documentation, which under the hood is utilised by their own client libraries:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx92m773xk4tjint9j1xs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx92m773xk4tjint9j1xs.png" alt=" " width="800" height="192"&gt;&lt;/a&gt;   &lt;/p&gt;

&lt;p&gt;Basically, it's a document that tells machines how to interact with Google's APIs, which can also be useful as documentation. I recommend always using each Google service's SDK first, and relying on the discovery client only when the operation is not available in the SDK, or when you want more details on what is available for that service and its models.&lt;/p&gt;

&lt;p&gt;So how do you use it?&lt;/p&gt;

&lt;p&gt;First you start by installing the &lt;a href="https://pypi.org/project/google-api-python-client/" rel="noopener noreferrer"&gt;google-api-python-client&lt;/a&gt; PyPI package.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkie8ibne0qvkne3ck07b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkie8ibne0qvkne3ck07b.png" alt=" " width="800" height="307"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Next, after looking at the discovery JSON that you can get at this &lt;a href="https://developers.google.com/apis-explorer/" rel="noopener noreferrer"&gt;link&lt;/a&gt; and finding the right service and operation to call, you build the service object:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flcdj25uc59o4pr2h0lqg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flcdj25uc59o4pr2h0lqg.png" alt=" " width="800" height="361"&gt;&lt;/a&gt;&lt;/p&gt;
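&lt;p&gt;The discovery document for any Google API lives at a predictable URL, which is what gets passed to &lt;code&gt;discoveryServiceUrl&lt;/code&gt; in the full sample further down. As a minimal sketch (the helper name is mine):&lt;/p&gt;

```python
# Build the discovery-document URL that googleapiclient.discovery.build
# accepts via its discoveryServiceUrl parameter.
def discovery_url(api: str, version: str) -> str:
    return f"https://{api}.googleapis.com/$discovery/rest?version={version}"

print(discovery_url("monitoring", "v3"))
# https://monitoring.googleapis.com/$discovery/rest?version=v3
```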

&lt;p&gt;By inspecting what the Cloud Run UI was doing, I found the monitoring service and saw that I basically needed three steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;First, make sure you have created your Cloud Run service and copy its name.&lt;/li&gt;
&lt;li&gt;Call the service create operation of the Monitoring API with your Cloud Run service name.&lt;/li&gt;
&lt;li&gt;Call the create_service_level_objective API for each SLO, using the service name generated in step 2, not step 1.&lt;/li&gt;
&lt;/ol&gt;
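&lt;p&gt;The subtle part is step 3: the Monitoring API's create call returns a full resource name like &lt;code&gt;projects/PROJECT/services/SERVICE_ID&lt;/code&gt;, and the SLO calls need that trailing ID rather than the original Cloud Run service name. A minimal sketch of the extraction used in the full sample below (the helper name is mine):&lt;/p&gt;

```python
# Extract the Monitoring service ID from the resource name returned by
# services().create(); the SLO parent is then built from this ID.
def service_id_from_name(resource_name: str) -> str:
    return resource_name.split("/")[-1]

print(service_id_from_name("projects/my-project/services/abc123"))
# abc123
```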

&lt;p&gt;I ended up creating two SLOs:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;An SLO for latency using a calendar-day config&lt;/li&gt;
&lt;li&gt;An SLO for availability using a rolling-day config&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The full code sample is below; hope it helps!&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import logging
import os

from google.cloud import monitoring_v3
import googleapiclient.discovery
from googleapiclient import errors


logger = logging.getLogger(__name__)


def run(project_id: str, location: str, service_name: str) -&amp;gt; None:
    try:
        monitoring_client = monitoring_v3.ServiceMonitoringServiceClient()

        api_service_name = 'monitoring'
        api_version = 'v3'
        # https://developers.google.com/apis-explorer/
        discovery_url = f'https://{api_service_name}.googleapis.com/$discovery/rest?version={api_version}'
        service = googleapiclient.discovery.build(api_service_name, api_version, discoveryServiceUrl=discovery_url)

        body = {
                "displayName": service_name,
                "cloudRun": {
                    "serviceName": service_name,
                    "location": location
                }
            }

        created_service = service.services().create(parent=f'projects/{project_id}', body=body).execute()

        if created_service:
            service_id = created_service['name'].split("/")[-1]
            slo_configuration = monitoring_v3.ServiceLevelObjective()
            slo_configuration.display_name = '90% - Latency - Calendar day'
            slo_configuration.goal = 0.9

            request = monitoring_v3.CreateServiceLevelObjectiveRequest()
            slo_configuration.calendar_period = "DAY"
            sli_configuration = monitoring_v3.ServiceLevelIndicator()
            sli_configuration.basic_sli = {
                "latency": {
                    "threshold": "1200s"
                }
            }
            slo_configuration.service_level_indicator = sli_configuration
            request.service_level_objective = slo_configuration
            service_name_for_slo = f'projects/{project_id}/services/{service_id}'
            request.parent = service_name_for_slo
            monitoring_client.create_service_level_objective(request)

            slo_configuration = monitoring_v3.ServiceLevelObjective()
            slo_configuration.display_name = '90% - Availability - Rolling day'
            slo_configuration.goal = 0.9

            request = monitoring_v3.CreateServiceLevelObjectiveRequest()
            slo_configuration.rolling_period = "86400s"
            sli_configuration = monitoring_v3.ServiceLevelIndicator()
            sli_configuration.basic_sli = {
                "availability": {}
            }
            slo_configuration.service_level_indicator = sli_configuration
            request.service_level_objective = slo_configuration
            service_name_for_slo = f'projects/{project_id}/services/{service_id}'
            request.parent = service_name_for_slo
            monitoring_client.create_service_level_objective(request)
    except errors.HttpError:
        logger.info("Monitoring SLOs already created, skipping")


if __name__ == '__main__':
    project_id = os.getenv('PROJECT_ID')
    location = os.getenv('LOCATION')
    service_name = os.getenv('CLOUD_RUN_SERVICE_NAME')
    run(project_id=project_id,
        location=location,
        service_name=service_name)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>googlecloud</category>
      <category>monitoring</category>
      <category>slo</category>
      <category>sre</category>
    </item>
  </channel>
</rss>
