Suhas Mallesh

Azure ML Feature Store with Terraform: Managed Feature Materialization for Training and Inference πŸ—ƒοΈ

Azure ML Feature Store is a specialized workspace that manages feature engineering, offline materialization to storage, and online serving with Redis. Terraform provisions the infrastructure; the SDK defines the feature sets. Here's how to build it.

In the previous posts, we set up the ML workspace and deployed endpoints. Now we need consistent features feeding those endpoints. Training uses historical features from batch sources. Inference needs the latest values in real time. When these diverge, your model's accuracy degrades silently.

Azure ML Feature Store is implemented as a special type of Azure ML workspace (kind = "FeatureStore"). It manages feature transformation pipelines, materializes features to offline storage (ADLS/Blob) and an online store (Redis), and provides point-in-time feature retrieval for training. Terraform provisions the infrastructure; the SDK defines entities, feature sets, and materialization schedules. 🎯

πŸ—οΈ Feature Store Architecture

| Component | What It Does |
| --- | --- |
| Feature Store | Specialized ML workspace with kind = "FeatureStore" |
| Entity | Logical key (e.g., customer_id, account_id) shared across feature sets |
| Feature Set | Collection of features with transformation code and source definition |
| Offline Store | ADLS/Blob storage for materialized historical features |
| Online Store | Redis cache for low-latency inference lookups |
| Materialization | Spark jobs that compute and sync features on a schedule |
The key concept: feature sets include transformation code. Raw data goes in, computed features come out. The same transformation runs for both offline materialization (training) and online materialization (inference), eliminating training-serving skew.
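The idea can be sketched in plain Python (an illustrative stand-in only — in the feature store, the shared code is the Spark transformer class shown later):

```python
# One transformation function, two consumers. Because both materialization
# paths call the same code, the computed values cannot diverge.

def compute_features(transactions):
    """Aggregate raw transactions into model features."""
    amounts = [t["amount"] for t in transactions]
    return {
        "transaction_count": len(amounts),
        "avg_transaction_amount": sum(amounts) / len(amounts) if amounts else 0.0,
    }

history = [{"amount": 10.0}, {"amount": 30.0}]

offline_row = compute_features(history)  # batch materialization -> training set
online_row = compute_features(history)   # online materialization -> Redis

assert offline_row == online_row  # same code path, so no training-serving skew
```

The moment training and inference maintain separate feature code, this guarantee disappears — which is exactly the skew the feature store is designed to prevent.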

πŸ”§ Terraform: Provision Feature Store Infrastructure

Feature Store Workspace

# feature_store/workspace.tf

resource "azurerm_machine_learning_workspace" "feature_store" {
  name                    = "${var.environment}-feature-store"
  location                = azurerm_resource_group.ml.location
  resource_group_name     = azurerm_resource_group.ml.name
  application_insights_id = azurerm_application_insights.ml.id
  key_vault_id            = azurerm_key_vault.ml.id
  storage_account_id      = azurerm_storage_account.ml.id

  kind = "FeatureStore"

  identity {
    type = "SystemAssigned"
  }

  tags = var.tags
}

kind = "FeatureStore" is the critical setting. This creates a workspace optimized for feature management rather than general ML development.

Offline Materialization Store

# feature_store/offline_store.tf

resource "azurerm_storage_account" "offline_store" {
  name                     = "${var.environment}fsoffline${random_string.suffix.result}"
  location                 = azurerm_resource_group.ml.location
  resource_group_name      = azurerm_resource_group.ml.name
  account_tier             = "Standard"
  account_replication_type = var.storage_replication
  is_hns_enabled           = true   # ADLS Gen2

  tags = var.tags
}

resource "azurerm_storage_container" "features" {
  name                  = "features"
  storage_account_id    = azurerm_storage_account.offline_store.id
  container_access_type = "private"
}

is_hns_enabled = true enables ADLS Gen2 hierarchical namespace, which is required for efficient feature materialization with Parquet files.

Online Store (Redis Cache)

# feature_store/online_store.tf

resource "azurerm_redis_cache" "online_store" {
  count               = var.enable_online_store ? 1 : 0
  name                = "${var.environment}-fs-redis"
  location            = azurerm_resource_group.ml.location
  resource_group_name = azurerm_resource_group.ml.name
  capacity            = var.redis_capacity
  family              = var.redis_family
  sku_name            = var.redis_sku
  minimum_tls_version = "1.2"

  redis_configuration {
    maxmemory_policy = "allkeys-lru"
  }

  tags = var.tags
}

The online store is optional. Enable it when you need low-latency feature lookups during inference. Skip it in dev if you only need offline features for training.
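Terraform creates the storage account and Redis cache, but they still need to be registered with the feature store as its materialization stores. A sketch of that wiring with the azure-ai-ml SDK — the store types and placeholder ARM IDs below are assumptions; check your SDK version and Terraform outputs for the exact values:

```python
from azure.ai.ml import MLClient
from azure.ai.ml.entities import FeatureStore, MaterializationStore
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="...",
    resource_group_name="...",
)

fs = FeatureStore(
    name="prod-feature-store",
    # ARM resource IDs of the Terraform-provisioned stores; exposing them
    # as Terraform outputs makes them easy to feed into this script
    offline_store=MaterializationStore(
        type="azure_data_lake_gen2",
        target="/subscriptions/.../containers/features",
    ),
    online_store=MaterializationStore(
        type="redis",
        target="/subscriptions/.../Redis/prod-fs-redis",
    ),
)

ml_client.feature_stores.begin_create_or_update(fs).result()
```

This runs against Azure, so it is not something you can execute offline; treat it as a template for a post-`terraform apply` bootstrap step.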

Compute for Materialization

# feature_store/compute.tf

resource "azurerm_machine_learning_compute_cluster" "materialization" {
  name                          = "${var.environment}-materialization"
  machine_learning_workspace_id = azurerm_machine_learning_workspace.feature_store.id
  location                      = azurerm_resource_group.ml.location
  vm_size                       = var.materialization_vm_size
  vm_priority                   = "LowPriority"

  identity {
    type = "SystemAssigned"
  }

  scale_settings {
    min_node_count                       = 0
    max_node_count                       = var.materialization_max_nodes
    scale_down_nodes_after_idle_duration  = "PT5M"
  }

  tags = var.tags
}

Materialization jobs run as Spark pipelines on this compute cluster. min_node_count = 0 means you pay nothing when no materialization is running.

🐍 Define Entities and Feature Sets (SDK)

Terraform provisions infrastructure. The SDK defines the feature engineering logic:

Create an Entity

from azure.ai.ml import MLClient
from azure.ai.ml.entities import FeatureStoreEntity, DataColumn
from azure.identity import DefaultAzureCredential

fs_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="...",
    resource_group_name="...",
    workspace_name="prod-feature-store",
)

account_entity = FeatureStoreEntity(
    name="account",
    version="1",
    index_columns=[DataColumn(name="accountID", type="string")],
    description="Account entity for transaction features",
)

fs_client.feature_store_entities.begin_create_or_update(account_entity).result()

Entities define shared join keys. Multiple feature sets can reference the same entity, ensuring consistent joins.

Define Feature Set with Transformation Code

Feature set specification (YAML):

# featuresets/transactions/spec/FeaturesetSpec.yaml
$schema: https://azuremlschemas.azureedge.net/latest/featureSetSpec.schema.json

source:
  type: parquet
  path: abfss://data@storage.dfs.core.windows.net/transactions/
  timestamp_column:
    name: timestamp

feature_transformation_code:
  path: ./transformation_code
  transformer_class: transaction_transform.TransactionFeatureTransformer

features:
  - name: transaction_count_7d
    type: integer
  - name: avg_transaction_amount_7d
    type: float
  - name: total_spend_3d
    type: float
  - name: max_transaction_amount
    type: float

index_columns:
  - name: accountID
    type: string

Transformation code (Spark):

# transformation_code/transaction_transform.py
from pyspark.sql import DataFrame
from pyspark.sql import functions as F
from pyspark.sql.window import Window

class TransactionFeatureTransformer:
    def transform(self, raw_data: DataFrame) -> DataFrame:
        # Range frames need a numeric ordering column, so cast the
        # timestamp to epoch seconds; the window offsets are in seconds.
        ts = F.col("timestamp").cast("long")
        window_7d = Window.partitionBy("accountID").orderBy(ts).rangeBetween(-7 * 86400, 0)
        window_3d = Window.partitionBy("accountID").orderBy(ts).rangeBetween(-3 * 86400, 0)

        return raw_data.select(
            "accountID",
            "timestamp",
            F.count("*").over(window_7d).alias("transaction_count_7d"),
            F.avg("amount").over(window_7d).alias("avg_transaction_amount_7d"),
            F.sum("amount").over(window_3d).alias("total_spend_3d"),
            F.max("amount").over(window_7d).alias("max_transaction_amount"),
        )
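Before registering, it's worth sanity-checking the window logic. A plain-Python equivalent of the two range windows (an illustrative stand-in for the Spark code, with timestamps as epoch seconds and hypothetical sample rows):

```python
# Each window looks back N days (inclusive) from the row's own timestamp,
# mirroring rangeBetween(-N * 86400, 0) partitioned by accountID.
DAY = 86400

def rolling_features(rows):
    """rows: list of dicts with accountID, timestamp (epoch secs), amount."""
    out = []
    for r in rows:
        in_7d = [x for x in rows
                 if x["accountID"] == r["accountID"]
                 and r["timestamp"] - 7 * DAY <= x["timestamp"] <= r["timestamp"]]
        in_3d = [x for x in in_7d if x["timestamp"] >= r["timestamp"] - 3 * DAY]
        out.append({
            "accountID": r["accountID"],
            "timestamp": r["timestamp"],
            "transaction_count_7d": len(in_7d),
            "avg_transaction_amount_7d": sum(x["amount"] for x in in_7d) / len(in_7d),
            "total_spend_3d": sum(x["amount"] for x in in_3d),
            "max_transaction_amount": max(x["amount"] for x in in_7d),
        })
    return out

rows = [
    {"accountID": "a1", "timestamp": 0 * DAY, "amount": 100.0},
    {"accountID": "a1", "timestamp": 5 * DAY, "amount": 50.0},
]
feats = rolling_features(rows)
# Second row: both transactions fall inside its 7-day window,
# but only the day-5 transaction inside its 3-day window.
```

Comparing expectations like these against the Spark output on the same sample data catches off-by-one window bugs before a materialization run ever costs you compute.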

Register and Materialize

from azure.ai.ml.entities import FeatureSet, FeatureSetSpecification

transaction_fset = FeatureSet(
    name="transactions",
    version="1",
    description="7-day and 3-day rolling transaction aggregations",
    entities=["azureml:account:1"],
    specification=FeatureSetSpecification(
        path="./featuresets/transactions/spec"
    ),
    tags={"data_type": "nonPII"},
)

fs_client.feature_sets.begin_create_or_update(transaction_fset).result()

Configure Materialization Schedule

from azure.ai.ml.entities import (
    MaterializationSettings,
    MaterializationComputeResource,
    RecurrenceTrigger,
)

materialization = MaterializationSettings(
    resource=MaterializationComputeResource(instance_type="Standard_E8s_v3"),
    schedule=RecurrenceTrigger(frequency="Hour", interval=6),
    offline_enabled=True,
    online_enabled=True,
)

fset = fs_client.feature_sets.get(name="transactions", version="1")
fset.materialization_settings = materialization
fs_client.feature_sets.begin_create_or_update(fset).result()
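With materialization running, training jobs can retrieve point-in-time correct features. A sketch using the azureml-featurestore package — the function and parameter names are from that SDK, but verify them against your installed version; `observation_df` is an assumed Spark DataFrame of training observations (accountID, timestamp, label):

```python
from azureml.featurestore import FeatureStoreClient, get_offline_features
from azure.identity import DefaultAzureCredential

featurestore = FeatureStoreClient(
    credential=DefaultAzureCredential(),
    subscription_id="...",
    resource_group_name="...",
    name="prod-feature-store",
)

fset = featurestore.feature_sets.get(name="transactions", version="1")
features = [
    fset.get_feature("transaction_count_7d"),
    fset.get_feature("avg_transaction_amount_7d"),
]

# Joins each feature as-of the observation's timestamp, so the
# training set never leaks values computed from future data.
training_df = get_offline_features(
    features=features,
    observation_data=observation_df,
    timestamp_column="timestamp",
)
```

The as-of join is the payoff of materialization: the same feature values a model would have seen at serving time are what it trains on.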

πŸ“ Environment Configuration

# environments/dev.tfvars
environment               = "dev"
enable_online_store       = false # No Redis in dev
storage_replication       = "LRS"
materialization_vm_size   = "Standard_E4s_v3"
materialization_max_nodes = 2

# environments/prod.tfvars
environment               = "prod"
enable_online_store       = true
redis_sku                 = "Standard"
redis_capacity            = 1
redis_family              = "C"
storage_replication       = "GRS"
materialization_vm_size   = "Standard_E8s_v3"
materialization_max_nodes = 8

⚠️ Gotchas and Tips

Feature store is a workspace. It's implemented as kind = "FeatureStore" on azurerm_machine_learning_workspace. It needs the same dependencies (storage, KV, App Insights) as a regular workspace.

Transformation code runs as Spark. Feature transformations execute on the materialization compute cluster using PySpark. Test your transformations locally with a Spark session before registering.

Entities enforce consistent joins. Define entities once (e.g., "account" with key "accountID") and reuse across feature sets. This prevents mismatched join keys between teams.

Materialization costs. Each scheduled run spins up the compute cluster, runs the Spark job, and writes to storage. LowPriority VMs reduce cost. min_node_count = 0 ensures you pay nothing between runs.

Redis cost for online store. Standard Redis starts at ~$40/month. Premium with replication is ~$200/month. Skip online store in dev unless you're testing real-time inference.

Feature set versioning. Feature sets are versioned. Changing the transformation logic? Create version "2". This maintains backward compatibility for models still using version "1".

⏭️ What's Next

This is Post 3 of the Azure ML Pipelines & MLOps with Terraform series.


Your features have a home. ADLS for offline training, Redis for online inference, Spark transformations that run the same code for both. No training-serving skew. Versioned feature sets with scheduled materialization, all provisioned with Terraform. πŸ—ƒοΈ

Found this helpful? Follow for the full ML Pipelines & MLOps with Terraform series! πŸ’¬
