Suhas Mallesh

Vertex AI Feature Store with Terraform: BigQuery Offline + Bigtable Online Serving πŸ—ƒοΈ

Feature Store on GCP uses BigQuery as the offline store and Bigtable for low-latency online serving. Feature groups register your data; feature views sync it to the online store. Here's how to provision the full stack with Terraform.

In the previous posts, we set up Workbench for development and deployed endpoints for inference. But the features feeding those models need a home. Training uses historical features from BigQuery. Inference needs the latest values at low latency. When these two sources diverge, you get training-serving skew.

Vertex AI Feature Store bridges this gap. BigQuery is the offline store - your features live in tables you already manage. Bigtable is the online store - an auto-scaling, low-latency serving layer that syncs from BigQuery on a schedule. You don't copy data to a separate system. Feature Store reads directly from BigQuery and syncs to Bigtable for serving. 🎯

πŸ—οΈ Feature Store Architecture

| Component | What It Does |
| --- | --- |
| Feature Group | Registers a BigQuery table as a feature source |
| Feature | Individual column within a feature group |
| Feature Online Store | Bigtable instance for real-time serving |
| Feature View | Defines which features sync to the online store |
| Data Sync | Scheduled or continuous sync from BigQuery to Bigtable |

The key insight: BigQuery is already your offline store. You don't move data. Feature Store registers your existing BigQuery tables, then syncs selected features to Bigtable for online serving.

πŸ”§ Terraform: Create the Feature Online Store

APIs

# feature_store/apis.tf

resource "google_project_service" "required" {
  for_each = toset([
    "aiplatform.googleapis.com",
    "bigtable.googleapis.com",
    "bigtableadmin.googleapis.com",
    "bigquery.googleapis.com",
  ])
  project = var.project_id
  service = each.value
}

Feature Online Store (Bigtable-backed)

# feature_store/online_store.tf

resource "google_vertex_ai_feature_online_store" "this" {
  name     = "${var.environment}-feature-store"
  region   = var.region
  project  = var.project_id

  bigtable {
    auto_scaling {
      min_node_count         = var.bigtable_min_nodes
      max_node_count         = var.bigtable_max_nodes
      cpu_utilization_target = var.bigtable_cpu_target
    }
  }

  labels = {
    environment = var.environment
    managed_by  = "terraform"
  }
}

Bigtable autoscaling adjusts nodes based on CPU utilization. Set cpu_utilization_target to 50-60% for production workloads. The store scales up automatically during traffic spikes and scales down during quiet periods.
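To build intuition for what that `cpu_utilization_target` does, here's a rough Python sketch of CPU-target-based scaling. This is an illustration of the general approach, not Bigtable's actual algorithm; the function name and signature are made up for this post:

```python
import math

def recommended_nodes(current_nodes: int, cpu_utilization: float,
                      cpu_target: float, min_nodes: int, max_nodes: int) -> int:
    """Sketch of CPU-target autoscaling: pick a node count that would bring
    projected utilization near the target, clamped to the configured range."""
    ideal = math.ceil(current_nodes * cpu_utilization / cpu_target)
    return max(min_nodes, min(max_nodes, ideal))

# A traffic spike pushes 2 nodes to 90% CPU against a 50% target:
print(recommended_nodes(2, 90, 50, min_nodes=1, max_nodes=5))  # 4
```

A lower target leaves more headroom for spikes, which is why prod uses 50 while dev can tolerate 80.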

Feature Group (Register BigQuery Source)

# feature_store/feature_group.tf

resource "google_vertex_ai_feature_group" "customer_features" {
  name     = "${var.environment}-customer-features"
  region   = var.region
  project  = var.project_id

  big_query {
    big_query_source {
      input_uri = "bq://${var.project_id}.${var.dataset_id}.${var.customer_features_table}"
    }
    entity_id_columns = ["customer_id"]
  }

  labels = {
    domain = "customer"
  }
}

entity_id_columns defines the primary key for feature lookups. This is what you use to retrieve features for a specific customer during inference.

Register Individual Features

# feature_store/features.tf

resource "google_vertex_ai_feature_group_feature" "total_purchases" {
  name           = "total_purchases"
  region         = var.region
  feature_group  = google_vertex_ai_feature_group.customer_features.name
  project        = var.project_id
}

resource "google_vertex_ai_feature_group_feature" "avg_order_value" {
  name           = "avg_order_value"
  region         = var.region
  feature_group  = google_vertex_ai_feature_group.customer_features.name
  project        = var.project_id
}

resource "google_vertex_ai_feature_group_feature" "days_since_last_purchase" {
  name           = "days_since_last_purchase"
  region         = var.region
  feature_group  = google_vertex_ai_feature_group.customer_features.name
  project        = var.project_id
}

Each feature maps to a column in your BigQuery table. Registering features enables metadata tracking, drift monitoring, and controlled syncing to the online store.
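Because each registered feature must match a BigQuery column, a name drift between Terraform and your table schema fails only at sync time. A small pre-apply sanity check can catch it earlier; this is a hypothetical helper (in practice you'd pull the column list from the BigQuery API or `bq show`):

```python
def missing_features(registered: list[str], table_columns: list[str]) -> list[str]:
    """Return registered feature names with no matching BigQuery column."""
    cols = set(table_columns)
    return [f for f in registered if f not in cols]

registered = ["total_purchases", "avg_order_value", "days_since_last_purchase"]
columns = ["customer_id", "feature_timestamp", "total_purchases",
           "avg_order_value", "days_since_last_purchase", "account_age_days"]
print(missing_features(registered, columns))  # []
```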

Feature View (Sync to Online Store)

# feature_store/feature_view.tf

resource "google_vertex_ai_feature_online_store_featureview" "customer_view" {
  name                 = "${var.environment}-customer-view"
  region               = var.region
  feature_online_store = google_vertex_ai_feature_online_store.this.name
  project              = var.project_id

  sync_config {
    cron = var.sync_schedule
  }

  feature_registry_source {
    feature_groups {
      feature_group_id = google_vertex_ai_feature_group.customer_features.name
      feature_ids      = [
        google_vertex_ai_feature_group_feature.total_purchases.name,
        google_vertex_ai_feature_group_feature.avg_order_value.name,
        google_vertex_ai_feature_group_feature.days_since_last_purchase.name,
      ]
    }
  }
}

The feature view selects which features from which groups sync to the online store. The cron schedule controls how frequently BigQuery data is synced to Bigtable.
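The cron schedule directly determines worst-case feature staleness. A minimal sketch that makes this concrete for the two schedule forms used in this post (the helper is hypothetical and handles only these simple hour-field patterns):

```python
def max_staleness_hours(cron: str) -> int:
    """Worst-case online-feature staleness for simple hourly cron schedules.
    Only handles the `0 * ...` and `0 */N ...` forms used in this post."""
    hour_field = cron.split()[1]
    if hour_field == "*":
        return 1
    if hour_field.startswith("*/"):
        return int(hour_field[2:])
    raise ValueError(f"unsupported hour field: {hour_field}")

print(max_staleness_hours("0 */6 * * *"))  # 6 -> dev features up to 6h stale
print(max_staleness_hours("0 * * * *"))    # 1 -> prod features up to 1h stale
```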

πŸ“ BigQuery Source Table Structure

Your BigQuery table needs an entity ID column and a feature timestamp:

CREATE TABLE `project.ml_features.customer_features` (
  customer_id STRING NOT NULL,
  feature_timestamp TIMESTAMP NOT NULL,
  total_purchases INT64,
  avg_order_value FLOAT64,
  days_since_last_purchase INT64,
  account_age_days INT64,
  is_premium BOOL
);

Feature Store reads this table directly. The feature_timestamp column enables point-in-time queries for training. The online store always serves the latest snapshot.
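Point-in-time means: for each entity, take the most recent feature row at or before a cutoff timestamp, never a later one. A hedged sketch of building such a query with BigQuery's `QUALIFY` clause (the helper function is made up for this post; interpolating values into SQL like this is for illustration only, use query parameters in real code):

```python
def as_of_query(table: str, entity_col: str, ts_col: str,
                features: list[str], as_of: str) -> str:
    """Build a point-in-time query: each entity's newest feature row
    at or before `as_of`, via ROW_NUMBER + QUALIFY (BigQuery SQL)."""
    cols = ", ".join(features)
    return f"""
SELECT {entity_col}, {cols}
FROM `{table}`
WHERE {ts_col} <= '{as_of}'
QUALIFY ROW_NUMBER() OVER (
  PARTITION BY {entity_col} ORDER BY {ts_col} DESC) = 1
""".strip()

sql = as_of_query("project.ml_features.customer_features",
                  "customer_id", "feature_timestamp",
                  ["total_purchases", "avg_order_value"],
                  "2025-06-30 23:59:59 UTC")
print(sql)
```

Using features newer than the label's timestamp is a leakage bug; this cutoff is what prevents it.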

🐍 Read Features (SDK)

Online Store (Real-Time Inference)

from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

feature_online_store = aiplatform.FeatureOnlineStore("prod-feature-store")
feature_view = feature_online_store.get_feature_view("prod-customer-view")

# Fetch features for a specific customer
response = feature_view.fetch_feature_values(
    entity_ids=["cust-12345"],
)

for entity in response:
    print(entity.to_dict())
# {'customer_id': 'cust-12345', 'total_purchases': 47, 'avg_order_value': 89.5, ...}
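Whichever client you fetch with, the response usually gets flattened into an ordered numeric vector matching the column order the model was trained on. A minimal, hypothetical post-processing sketch:

```python
def to_feature_vector(entity: dict, feature_order: list[str]) -> list[float]:
    """Flatten a fetched feature dict into an ordered numeric vector.
    Order must match the training columns, or predictions silently degrade."""
    return [float(entity[name]) for name in feature_order]

entity = {"customer_id": "cust-12345", "total_purchases": 47,
          "avg_order_value": 89.5, "days_since_last_purchase": 3}
order = ["total_purchases", "avg_order_value", "days_since_last_purchase"]
print(to_feature_vector(entity, order))  # [47.0, 89.5, 3.0]
```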

Offline Store (Training via BigQuery)

from google.cloud import bigquery

client = bigquery.Client()

query = """
SELECT customer_id, total_purchases, avg_order_value, is_premium
FROM `project.ml_features.customer_features`
WHERE feature_timestamp BETWEEN '2025-01-01' AND '2025-12-31'
"""

training_df = client.query(query).to_dataframe()
print(f"Training data: {len(training_df)} rows")

No separate offline store to manage. Query BigQuery directly with standard SQL.

πŸ“ Environment Configuration

# environments/dev.tfvars
environment          = "dev"
bigtable_min_nodes   = 1
bigtable_max_nodes   = 1
bigtable_cpu_target  = 80
sync_schedule        = "0 */6 * * *"    # Every 6 hours

# environments/prod.tfvars
environment          = "prod"
bigtable_min_nodes   = 1
bigtable_max_nodes   = 5
bigtable_cpu_target  = 50
sync_schedule        = "0 * * * *"      # Every hour

Sync frequency vs freshness: Hourly sync means online features can be up to 1 hour stale. For near-real-time features, use continuous data sync (requires Bigtable online serving and BigQuery source in specific regions).

⚠️ Gotchas and Tips

BigQuery is the source of truth. Unlike other feature stores where you ingest data into a proprietary system, Vertex AI Feature Store reads from BigQuery. Your existing ETL pipelines that write to BigQuery already feed the feature store.

Bigtable minimum cost. Even at 1 node, Bigtable costs roughly $0.65/hour (~$470/month). For dev environments, consider whether you need online serving at all, or if BigQuery direct queries suffice.
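The back-of-envelope arithmetic behind that number, as a sketch (node price is the post's approximate figure, not a quote; storage and network are excluded):

```python
def monthly_bigtable_floor_cost(node_hourly_usd: float, min_nodes: int,
                                hours_per_month: int = 730) -> float:
    """Minimum monthly spend for an always-on Bigtable-backed online store:
    node count never drops below min_nodes, so this cost accrues 24/7."""
    return node_hourly_usd * min_nodes * hours_per_month

print(f"${monthly_bigtable_floor_cost(0.65, 1):,.2f} per month at 1 node")
```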

Optimized online serving is deprecated. As of May 2026, only Bigtable online serving is supported. Don't use optimized {} in new deployments. Migrate existing optimized stores to Bigtable.

Sync latency. Scheduled sync has an inherent delay based on your cron schedule. Continuous sync is near-real-time but only available in specific regions (us, eu, us-central1).

Feature monitoring. Register features through feature groups to enable drift detection and anomaly monitoring. Without registration, you lose this observability.

Bigtable serving latency. Expect ~30ms server-side latency at moderate load (~100 QPS). Client-side latency adds 5ms+. This is fast enough for most inference use cases but not sub-millisecond.

⏭️ What's Next

This is Post 3 of the GCP ML Pipelines & MLOps with Terraform series.


Your features have a home. BigQuery for offline training, Bigtable for online serving, automatic sync between them. No data duplication. No training-serving skew. Your existing BigQuery tables are the source of truth, all provisioned with Terraform. πŸ—ƒοΈ

Found this helpful? Follow for the full ML Pipelines & MLOps with Terraform series! πŸ’¬
