Charlotte Towell
Using Terraform to Configure BigQuery Data Transfer Service for Google Ads & GA4

The benefit of staying within the Google ecosystem shines through when it comes to syncing data into BigQuery with the Data Transfer Service. Rather than building our own API-based integrations and handling scheduling ourselves, it is almost* completely handled for us in the case of Google Ads & GA4!

Note the *almost. For an e-commerce use case, an acceptable delay is to always have yesterday's data available. Unfortunately, when it comes to BQDTS, the automated syncs only bring data up to date to about 2 days in the past.

After a lot of pulling my hair out and trying to figure out whether a timezone conversion was failing me, I came to the discovery that:

  • Scheduled data transfers can only sync data as recent as 2 days in the past
  • Manually triggered backfills for a 'date range' have this same limitation
  • Manually triggered backfills using 'run one-time transfer' magically do pull through the latest data available, up until the current date
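To make the distinction concrete, here is a minimal sketch of the two backfill styles against the Data Transfer Service Python client. The helper names are illustrative placeholders, not code from my pipeline, and the client calls assume an existing transfer config:

```python
from datetime import datetime


def transfer_config_path(project_id: str, location: str, config_id: str) -> str:
    """Build the transferConfigs resource name that both backfill APIs expect."""
    return f"projects/{project_id}/locations/{location}/transferConfigs/{config_id}"


def backfill_date_range(parent: str, start: datetime, end: datetime):
    """'Date range' backfill: subject to the same ~2-day freshness limit."""
    from google.cloud import bigquery_datatransfer_v1  # lazy import for the sketch

    client = bigquery_datatransfer_v1.DataTransferServiceClient()
    request = bigquery_datatransfer_v1.ScheduleTransferRunsRequest(
        parent=parent, start_time=start, end_time=end
    )
    return client.schedule_transfer_runs(request=request)


def backfill_one_time(parent: str, run_time: datetime):
    """'Run one-time transfer': pulls the latest data up to the current date."""
    from google.cloud import bigquery_datatransfer_v1  # lazy import for the sketch

    client = bigquery_datatransfer_v1.DataTransferServiceClient()
    request = bigquery_datatransfer_v1.StartManualTransferRunsRequest(
        parent=parent, requested_run_time=run_time
    )
    return client.start_manual_transfer_runs(request=request)
```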

Therefore, by combining the normal scheduled sync with an automated 'manually' triggered sync, we can achieve our yesterday's-data availability goal. To do this, I used self-scheduling Cloud Tasks, which I recently posted about, to handle the daily 'one-off transfer' sync.
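The rough shape of that self-scheduling piece is below. This is a hedged sketch, not the exact code from that post: the queue name, target URL, and the `next_trigger_time` helper are all placeholder assumptions.

```python
from datetime import datetime, timedelta, timezone


def next_trigger_time(now: datetime, hour: int = 5) -> datetime:
    """Next 5AM AEST (UTC+10) after `now`, matching the transfer schedule."""
    aest = timezone(timedelta(hours=10))
    local = now.astimezone(aest)
    target = local.replace(hour=hour, minute=0, second=0, microsecond=0)
    if target <= local:
        target += timedelta(days=1)
    return target


def enqueue_next_sync(project: str, location: str, queue: str, url: str):
    """Create a Cloud Task that hits our sync endpoint at the next trigger time."""
    from google.cloud import tasks_v2  # lazy import for the sketch

    client = tasks_v2.CloudTasksClient()
    task = tasks_v2.Task(
        http_request=tasks_v2.HttpRequest(
            http_method=tasks_v2.HttpMethod.POST,
            url=url,  # endpoint that calls the manual transfer run
        ),
        schedule_time=next_trigger_time(datetime.now(timezone.utc)),
    )
    return client.create_task(
        parent=client.queue_path(project, location, queue), task=task
    )
```

The handler behind that URL runs the one-off transfer and then enqueues the next day's task, which is what makes the loop self-scheduling.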

BQDTS doesn't just sync the single latest day per scheduled run but rather refreshes up to 7 days in the past each time, so we can be assured that our partial data will always be updated in full the following day.

This system has been running well for me for a little while now. A slight hassle? Absolutely. But I still much prefer the ease of using Google's transfer service to handle all my data processing over writing the integration myself from scratch.

Without further ado, here are the code examples used to make this run:


Create the destination dataset

resource "google_bigquery_dataset" "skipper_au_google_analytics_ga4" {
  dataset_id                 = "skipper_au_google_analytics_ga4"
  friendly_name             = "Skipper Dataset for Google Analytics GA4 - AU"
  description               = "Dataset for Skipper Google Analytics GA4 BigQuery Data Transfer Service Integration"
  location                  = "australia-southeast1"
  project                   = var.project_id
  default_table_expiration_ms = null
}

Configure BigQuery Data Transfer Service with an 'every day' frequency

resource "google_bigquery_data_transfer_config" "google_analytics__ga4_transfer" {
  project                = var.project_id
  display_name           = "Google Analytics GA4 Transfer"
  location               = var.region
  data_source_id         = "ga4"
  destination_dataset_id = google_bigquery_dataset.skipper_au_google_analytics_ga4.dataset_id
  schedule               = "every day 18:00" #5AM AEST
  service_account_name   = var.bq_data_transfer_sa_email

  params = {
    property_id  = var.google_analytics_property_id
    table_filter = "Audiences,DemographicDetails,EcommercePurchases,Events,LandingPage,PagesAndScreens,Promotions,TechDetails,TrafficAcquisition,UserAcquisition"
  }

  depends_on = [google_bigquery_dataset.skipper_au_google_analytics_ga4]
}
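The title also promises Google Ads, and the transfer config for it follows the same pattern. A hedged sketch, assuming the `google_ads` data source with a `customer_id` parameter, and a hypothetical destination dataset and customer ID variable that aren't shown above:

```hcl
resource "google_bigquery_data_transfer_config" "google_ads_transfer" {
  project                = var.project_id
  display_name           = "Google Ads Transfer"
  location               = var.region
  data_source_id         = "google_ads"
  destination_dataset_id = google_bigquery_dataset.skipper_au_google_ads.dataset_id
  schedule               = "every day 18:00" # 5AM AEST
  service_account_name   = var.bq_data_transfer_sa_email

  params = {
    customer_id = var.google_ads_customer_id # dashless Ads account ID
  }
}
```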

And now, the trick to enable our latest day stats coming through:

Call the manual transfer run via API

from datetime import datetime, timedelta, timezone

from google.cloud import bigquery_datatransfer_v1

GCP_PROJECT_ID = "<project-id>"


def manual_bigquery_data_transfer_run(transfer_config_id: str):

    client = bigquery_datatransfer_v1.DataTransferServiceClient()

    LOCATION = "<location>"
    parent = f"projects/{GCP_PROJECT_ID}/locations/{LOCATION}/transferConfigs/{transfer_config_id}"

    # Use a proper UTC+10 (AEST) timezone object, and request a run time
    # slightly in the past so the run is immediately valid
    tz_utc_plus_10 = timezone(timedelta(hours=10))
    now_utc10 = datetime.now(tz_utc_plus_10)
    run_time = now_utc10 - timedelta(minutes=30)

    request = bigquery_datatransfer_v1.types.StartManualTransferRunsRequest(
        parent=parent,
        requested_run_time=run_time.isoformat(),
    )

    response = client.start_manual_transfer_runs(request=request)
    return response
