Vertex AI Feature Store uses BigQuery as the offline store and Bigtable for low-latency online serving. Feature groups register your data; feature views sync it to the online store. Here's how to provision the full stack with Terraform.
In the previous posts, we set up Workbench for development and deployed endpoints for inference. But the features feeding those models need a home. Training uses historical features from BigQuery. Inference needs the latest values at millisecond-scale latency. When these two sources diverge, you get training-serving skew.
Vertex AI Feature Store bridges this gap. BigQuery is the offline store - your features live in tables you already manage. Bigtable is the online store - an auto-scaling, low-latency serving layer that syncs from BigQuery on a schedule. You don't copy data to a separate system. Feature Store reads directly from BigQuery and syncs to Bigtable for serving.
Feature Store Architecture
| Component | What It Does |
|---|---|
| Feature Group | Registers a BigQuery table as a feature source |
| Feature | Individual column within a feature group |
| Feature Online Store | Bigtable instance for real-time serving |
| Feature View | Defines which features sync to the online store |
| Data Sync | Scheduled or continuous sync from BigQuery to Bigtable |
The key insight: BigQuery is already your offline store. You don't move data. Feature Store registers your existing BigQuery tables, then syncs selected features to Bigtable for online serving.
Terraform: Create the Feature Online Store
APIs
# feature_store/apis.tf
resource "google_project_service" "required" {
for_each = toset([
"aiplatform.googleapis.com",
"bigtable.googleapis.com",
"bigtableadmin.googleapis.com",
"bigquery.googleapis.com",
])
project = var.project_id
service = each.value
}
Feature Online Store (Bigtable-backed)
# feature_store/online_store.tf
resource "google_vertex_ai_feature_online_store" "this" {
name = "${var.environment}-feature-store"
region = var.region
project = var.project_id
bigtable {
auto_scaling {
min_node_count = var.bigtable_min_nodes
max_node_count = var.bigtable_max_nodes
cpu_utilization_target = var.bigtable_cpu_target
}
}
labels = {
environment = var.environment
managed_by = "terraform"
}
}
Bigtable autoscaling adjusts nodes based on CPU utilization. Set cpu_utilization_target to 50-60% for production workloads. The store scales up automatically during traffic spikes and scales down during quiet periods.
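These files reference several input variables. Here's a plausible variables.tf to back them (a sketch - the names match the resources in this post, but the types and defaults are illustrative):

```hcl
# feature_store/variables.tf -- declarations inferred from the resources
# in this post; adjust defaults for your project
variable "project_id" {
  type = string
}

variable "region" {
  type    = string
  default = "us-central1"
}

variable "environment" {
  type = string
}

variable "bigtable_min_nodes" {
  type    = number
  default = 1
}

variable "bigtable_max_nodes" {
  type    = number
  default = 1
}

variable "bigtable_cpu_target" {
  type    = number
  default = 60
}

variable "sync_schedule" {
  type    = string
  default = "0 */6 * * *" # every 6 hours
}

variable "dataset_id" {
  type = string
}

variable "customer_features_table" {
  type = string
}
```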
Feature Group (Register BigQuery Source)
# feature_store/feature_group.tf
resource "google_vertex_ai_feature_group" "customer_features" {
name = "${var.environment}-customer-features"
region = var.region
project = var.project_id
big_query {
big_query_source {
input_uri = "bq://${var.project_id}.${var.dataset_id}.${var.customer_features_table}"
}
entity_id_columns = ["customer_id"]
}
labels = {
domain = "customer"
}
}
entity_id_columns defines the primary key for feature lookups. This is what you use to retrieve features for a specific customer during inference.
Register Individual Features
# feature_store/features.tf
resource "google_vertex_ai_feature_group_feature" "total_purchases" {
name = "total_purchases"
region = var.region
feature_group = google_vertex_ai_feature_group.customer_features.name
project = var.project_id
}
resource "google_vertex_ai_feature_group_feature" "avg_order_value" {
name = "avg_order_value"
region = var.region
feature_group = google_vertex_ai_feature_group.customer_features.name
project = var.project_id
}
resource "google_vertex_ai_feature_group_feature" "days_since_last_purchase" {
name = "days_since_last_purchase"
region = var.region
feature_group = google_vertex_ai_feature_group.customer_features.name
project = var.project_id
}
Each feature maps to a column in your BigQuery table. Registering features enables metadata tracking, drift monitoring, and controlled syncing to the online store.
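If the feature list grows beyond a handful, the three near-identical resources above invite a for_each. A sketch of the same registrations driven by one list (note: this changes Terraform resource addresses, so it suits new stacks rather than in-place refactors):

```hcl
# feature_store/features.tf -- alternative: register features from a
# single list instead of one resource block per feature
locals {
  customer_feature_names = [
    "total_purchases",
    "avg_order_value",
    "days_since_last_purchase",
  ]
}

resource "google_vertex_ai_feature_group_feature" "customer" {
  for_each      = toset(local.customer_feature_names)
  name          = each.value
  region        = var.region
  feature_group = google_vertex_ai_feature_group.customer_features.name
  project       = var.project_id
}
```

The feature view's feature_ids can then reference local.customer_feature_names directly, so adding a feature is a one-line change.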
Feature View (Sync to Online Store)
# feature_store/feature_view.tf
resource "google_vertex_ai_feature_online_store_featureview" "customer_view" {
name = "${var.environment}-customer-view"
region = var.region
feature_online_store = google_vertex_ai_feature_online_store.this.name
project = var.project_id
sync_config {
cron = var.sync_schedule
}
feature_registry_source {
feature_groups {
feature_group_id = google_vertex_ai_feature_group.customer_features.name
feature_ids = [
google_vertex_ai_feature_group_feature.total_purchases.name,
google_vertex_ai_feature_group_feature.avg_order_value.name,
google_vertex_ai_feature_group_feature.days_since_last_purchase.name,
]
}
}
}
The feature view selects which features from which groups sync to the online store. The cron schedule controls how frequently BigQuery data is synced to Bigtable.
BigQuery Source Table Structure
Your BigQuery table needs an entity ID column and a feature timestamp:
CREATE TABLE `project.ml_features.customer_features` (
customer_id STRING NOT NULL,
feature_timestamp TIMESTAMP NOT NULL,
total_purchases INT64,
avg_order_value FLOAT64,
days_since_last_purchase INT64,
account_age_days INT64,
is_premium BOOL
);
Feature Store reads this table directly. The feature_timestamp column enables point-in-time queries for training. The online store always serves the latest snapshot.
Read Features (SDK)
Online Store (Real-Time Inference)
from google.cloud import aiplatform
aiplatform.init(project="my-project", location="us-central1")
feature_online_store = aiplatform.FeatureOnlineStore("prod-feature-store")
feature_view = feature_online_store.get_feature_view("prod-customer-view")
# Fetch features for a specific customer
response = feature_view.fetch_feature_values(
entity_ids=["cust-12345"],
)
for entity in response:
print(entity.to_dict())
# {'customer_id': 'cust-12345', 'total_purchases': 47, 'avg_order_value': 89.5, ...}
Offline Store (Training via BigQuery)
from google.cloud import bigquery
client = bigquery.Client()
query = """
SELECT customer_id, total_purchases, avg_order_value, is_premium
FROM `project.ml_features.customer_features`
WHERE feature_timestamp BETWEEN '2025-01-01' AND '2025-12-31'
"""
training_df = client.query(query).to_dataframe()
print(f"Training data: {len(training_df)} rows")
No separate offline store to manage. Query BigQuery directly with standard SQL.
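One caveat with the query above: a date-range filter returns every snapshot in the window, so a customer whose features refresh weekly appears many times. If training needs one row per customer as of a cutoff date, dedupe to the latest snapshot at or before that date. A sketch using BigQuery's QUALIFY clause (table and column names follow the schema above; in practice you'd run the result through the same bigquery.Client):

```python
def point_in_time_query(table: str, cutoff: str) -> str:
    """Build a point-in-time feature query: the latest snapshot per
    customer at or before `cutoff`, so each training row only sees
    feature values that existed at that moment (no future leakage)."""
    return f"""
    SELECT customer_id, total_purchases, avg_order_value, is_premium
    FROM `{table}`
    WHERE feature_timestamp <= '{cutoff}'
    -- keep only the newest snapshot per customer
    QUALIFY ROW_NUMBER() OVER (
        PARTITION BY customer_id
        ORDER BY feature_timestamp DESC
    ) = 1
    """

sql = point_in_time_query("project.ml_features.customer_features", "2025-06-30")
print(sql)
```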
Environment Configuration
# environments/dev.tfvars
environment = "dev"
bigtable_min_nodes = 1
bigtable_max_nodes = 1
bigtable_cpu_target = 80
sync_schedule = "0 */6 * * *" # Every 6 hours
# environments/prod.tfvars
environment = "prod"
bigtable_min_nodes = 1
bigtable_max_nodes = 5
bigtable_cpu_target = 50
sync_schedule = "0 * * * *" # Every hour
Sync frequency vs freshness: Hourly sync means online features can be up to 1 hour stale. For near-real-time features, use continuous data sync (requires Bigtable online serving and BigQuery source in specific regions).
Gotchas and Tips
BigQuery is the source of truth. Unlike other feature stores where you ingest data into a proprietary system, Vertex AI Feature Store reads from BigQuery. Your existing ETL pipelines that write to BigQuery already feed the feature store.
Bigtable minimum cost. Even at 1 node, Bigtable costs roughly $0.65/hour (~$470/month). For dev environments, consider whether you need online serving at all, or if BigQuery direct queries suffice.
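To put numbers on that, a back-of-envelope estimator (the $0.65/node-hour rate is the approximation quoted above, not a quote from the pricing page - check current Bigtable pricing for your region):

```python
# Rough monthly Bigtable serving cost. NODE_HOUR_USD is the approximate
# figure cited above -- verify against current regional pricing.
NODE_HOUR_USD = 0.65
HOURS_PER_MONTH = 730  # average month

def monthly_bigtable_cost(avg_nodes: float) -> float:
    """Estimated monthly cost for an average node count under autoscaling."""
    return avg_nodes * NODE_HOUR_USD * HOURS_PER_MONTH

print(monthly_bigtable_cost(1))    # one always-on dev node, roughly $470+
print(monthly_bigtable_cost(2.5))  # prod averaging 2.5 nodes across the day
```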
Optimized online serving is deprecated. As of May 2026, only Bigtable online serving is supported. Don't use optimized {} in new deployments. Migrate existing optimized stores to Bigtable.
Sync latency. Scheduled sync has an inherent delay based on your cron schedule. Continuous sync is near-real-time but only available in specific regions (us, eu, us-central1).
Feature monitoring. Register features through feature groups to enable drift detection and anomaly monitoring. Without registration, you lose this observability.
Bigtable serving latency. Expect ~30ms server-side latency at moderate load (~100 QPS). Client-side latency adds 5ms+. This is fast enough for most inference use cases but not sub-millisecond.
What's Next
This is Post 3 of the GCP ML Pipelines & MLOps with Terraform series.
- Post 1: Vertex AI Workbench
- Post 2: Vertex AI Endpoints - Deploy to Prod
- Post 3: Vertex AI Feature Store (you are here)
- Post 4: Vertex AI Pipelines + Cloud Build
Your features have a home. BigQuery for offline training, Bigtable for online serving, automatic sync between them. No data duplication. No training-serving skew. Your existing BigQuery tables are the source of truth, all provisioned with Terraform.
Found this helpful? Follow for the full ML Pipelines & MLOps with Terraform series!