
Thesius Code

Originally published at datanest-stores.pages.dev

Feature Store Bootstrap


Production-ready Feast feature store setup with offline and online serving. Stop re-computing features across teams — define once, serve everywhere. This kit gives you feature definitions, materialization pipelines, and serving infrastructure that works from day one.

Key Features

  • Feast project scaffolding — complete repository structure with registry, store config, and feature definitions
  • Offline feature serving — batch retrieval from data warehouses for training dataset generation
  • Online feature serving — low-latency Redis-backed serving for real-time inference
  • Feature engineering pipelines — reusable transformations with point-in-time correctness
  • Data quality validation — Great Expectations integration for feature value monitoring
  • Entity management — pre-built entity definitions for common domains (user, product, transaction)
  • Materialization automation — scheduled jobs to push features from offline to online stores
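The data quality bullet mentions Great Expectations. The two most common expectations on a feature column can be sketched without any library. The stdlib-only function below is a hypothetical illustration of a null-fraction check and a value-range check, not Great Expectations' API. The function name and thresholds are assumptions.

```python
from typing import Optional, Sequence


def check_feature(
    name: str,
    values: Sequence[Optional[float]],
    min_value: float,
    max_value: float,
    max_null_fraction: float = 0.0,
) -> list[str]:
    """Return human-readable failures for one feature column (empty list = pass)."""
    if not values:
        return [f"{name}: no values"]
    failures: list[str] = []
    # Null-rate expectation: too many missing values usually means a broken upstream join.
    nulls = sum(v is None for v in values)
    if nulls / len(values) > max_null_fraction:
        failures.append(f"{name}: {nulls}/{len(values)} null values")
    # Range expectation: catches unit mix-ups and sentinel values like 999.
    out_of_range = [v for v in values if v is not None and not (min_value <= v <= max_value)]
    if out_of_range:
        failures.append(f"{name}: {len(out_of_range)} value(s) outside [{min_value}, {max_value}]")
    return failures


report = check_feature("age", [34, 51, None, 150], min_value=0, max_value=120, max_null_fraction=0.25)
print(report)  # ['age: 1 value(s) outside [0, 120]']
```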

Quick Start

# 1. Copy the config
cp config.example.yaml config.yaml

# 2. Initialize the Feast project
feast init my_feature_store
cp templates/feature_repo/* my_feature_store/

# 3. Apply feature definitions
cd my_feature_store && feast apply

# 4. Materialize features to online store
feast materialize-incremental $(date -u +"%Y-%m-%dT%H:%M:%S")
Then pull a point-in-time-correct training set from Python:
"""Retrieve features for model training."""
from feast import FeatureStore
import pandas as pd

store = FeatureStore(repo_path="my_feature_store/")

# Entity dataframe — what you want features for
entity_df = pd.DataFrame({
    "user_id": [1001, 1002, 1003],
    "event_timestamp": pd.to_datetime(["2026-01-15"] * 3),
})

# Fetch training data with point-in-time join
training_df = store.get_historical_features(
    entity_df=entity_df,
    features=[
        "user_profile:age",
        "user_profile:signup_days",
        "user_activity:purchase_count_7d",
        "user_activity:avg_session_minutes",
    ],
).to_df()

print(training_df.head())
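To make the point-in-time semantics concrete, here is a minimal pure-Python sketch of what the join computes for each entity row. It picks the newest feature value whose timestamp is at or before the row's event_timestamp and still inside the feature view's TTL. It uses simplifying assumptions (one feature, in-memory lists, a hypothetical `point_in_time_lookup` helper) and is an illustration, not Feast's implementation.

```python
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical feature log row: (entity_id, feature_timestamp, value)
FeatureRow = tuple[int, datetime, float]


def point_in_time_lookup(
    feature_rows: list[FeatureRow],
    entity_id: int,
    event_ts: datetime,
    ttl: timedelta,
) -> Optional[float]:
    """Newest value for entity_id with event_ts - ttl <= ts <= event_ts, else None."""
    best_ts: Optional[datetime] = None
    best_val: Optional[float] = None
    for eid, ts, val in feature_rows:
        if eid != entity_id:
            continue
        # Never read values from after the event (leakage) or older than the TTL.
        if event_ts - ttl <= ts <= event_ts and (best_ts is None or ts > best_ts):
            best_ts, best_val = ts, val
    return best_val  # None plays the role of NaN in the joined training frame


rows = [
    (1001, datetime(2026, 1, 10), 3.0),
    (1001, datetime(2026, 1, 14), 5.0),
    (1001, datetime(2026, 1, 20), 9.0),  # after the event: must be ignored
    (1002, datetime(2025, 11, 1), 7.0),  # 75 days before the event
]
event = datetime(2026, 1, 15)
print(point_in_time_lookup(rows, 1001, event, timedelta(days=30)))  # 5.0
print(point_in_time_lookup(rows, 1002, event, timedelta(days=30)))  # None
```

The second lookup shows the "Point-in-time join returns NaN" case from Troubleshooting: the only value for user 1002 is older than the 30-day TTL, so nothing is joined.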

Architecture

feature-store-bootstrap/
├── config.example.yaml              # Feast project + infra configuration
├── templates/
│   ├── feature_repo/
│   │   ├── feature_store.yaml       # Feast store config (provider, registry, online store)
│   │   ├── entities.py              # Entity definitions (user, product, etc.)
│   │   ├── feature_views.py         # Feature view definitions with schemas
│   │   ├── on_demand_features.py    # Real-time computed features
│   │   └── data_sources.py          # FileSource, BigQuerySource, etc.
│   ├── pipelines/
│   │   ├── materialization.py       # Offline → online materialization job
│   │   └── feature_engineering.py   # Raw data → feature transformations
│   └── validation/
│       └── feature_quality.py       # Data quality checks
├── docs/
│   └── overview.md
└── examples/
    ├── training_retrieval.py
    └── online_serving.py

Data flows from raw sources → feature engineering → offline store (warehouse) → materialization → online store (Redis). Training reads from offline; inference reads from online.
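The offline-to-online hop can be pictured as an incremental copy driven by a watermark. The sketch below is a hypothetical, in-memory stand-in for what `feast materialize-incremental` does conceptually. It only scans rows newer than the previous run's end time and writes the freshest value per entity into a map that plays the role of Redis. The `InMemoryOnlineStore` class and its method are invented for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class InMemoryOnlineStore:
    """Hypothetical stand-in for Redis: entity_id -> (feature_ts, value)."""

    rows: dict[int, tuple[datetime, float]] = field(default_factory=dict)
    watermark: Optional[datetime] = None  # end time of the last materialization run

    def materialize_incremental(
        self, offline_rows: list[tuple[int, datetime, float]], end: datetime
    ) -> int:
        """Copy rows in (watermark, end] to the online map; return the number of writes."""
        writes = 0
        for entity_id, ts, value in offline_rows:
            # Skip rows covered by a previous run or newer than this run's end.
            if (self.watermark is not None and ts <= self.watermark) or ts > end:
                continue
            current = self.rows.get(entity_id)
            # An online store only keeps the latest value per entity.
            if current is None or ts > current[0]:
                self.rows[entity_id] = (ts, value)
                writes += 1
        self.watermark = end
        return writes


offline = [
    (1001, datetime(2026, 1, 15, 1), 2.0),
    (1001, datetime(2026, 1, 15, 3), 4.0),
    (1002, datetime(2026, 1, 15, 5), 1.0),
]
online = InMemoryOnlineStore()
first = online.materialize_incremental(offline, end=datetime(2026, 1, 15, 4))
second = online.materialize_incremental(offline, end=datetime(2026, 1, 15, 8))
print(first, second, online.rows)
```

The second run touches only user 1002, because everything at or before 04:00 was already covered. That is the property that keeps scheduled materialization cheap.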

Usage Examples

Online Feature Retrieval (Inference)

from feast import FeatureStore

store = FeatureStore(repo_path="my_feature_store/")

# Low-latency lookup for real-time inference
features = store.get_online_features(
    features=[
        "user_profile:age",
        "user_activity:purchase_count_7d",
    ],
    entity_rows=[{"user_id": 1001}],
).to_dict()

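# Feed directly to the model. `model` is assumed to be a trained
# scikit-learn-style estimator loaded elsewhere; predict() expects a
# 2D input, so wrap the single entity's features in an outer list.
prediction = model.predict([[
    features["age"][0],
    features["purchase_count_7d"][0],
]])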
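An online lookup returns None for a feature that has not been materialized for an entity (see Troubleshooting), and passing None straight into a model usually fails. A small guard that fixes the column order and fills gaps with explicit fallbacks keeps the serving path predictable. The helper name and fallback values are hypothetical. The input shape (feature name mapped to a list with one value per entity row) matches how the `.to_dict()` result is indexed above.

```python
from typing import Any


def to_feature_rows(
    features: dict[str, list[Any]],
    order: list[str],
    defaults: dict[str, float],
) -> tuple[list[list[float]], list[str]]:
    """Build one row per entity in a fixed column order, replacing None with defaults."""
    n_rows = len(features[order[0]])
    rows: list[list[float]] = []
    filled: list[str] = []  # features that needed a fallback; worth logging or counting
    for i in range(n_rows):
        row: list[float] = []
        for name in order:
            value = features[name][i]
            if value is None:
                value = defaults[name]
                filled.append(name)
            row.append(float(value))
        rows.append(row)
    return rows, filled


features = {"user_id": [1001, 1002], "age": [34, None], "purchase_count_7d": [5, 2]}
rows, filled = to_feature_rows(
    features,
    order=["age", "purchase_count_7d"],
    defaults={"age": 35.0, "purchase_count_7d": 0.0},  # hypothetical fallbacks
)
print(rows, filled)  # [[34.0, 5.0], [35.0, 2.0]] ['age']
```

`rows` is already two-dimensional, so it can go straight into a scikit-learn-style `model.predict(rows)`, and `filled` gives you a cheap signal that materialization is lagging.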

On-Demand Feature Transforms

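from feast import on_demand_feature_view, Field
from feast.types import Float32
import pandas as pd

# `sources` expects FeatureView or RequestSource objects, not view names.
# Assumes feature_views.py in the repo defines the `user_activity` view.
from feature_views import user_activity

@on_demand_feature_view(
    sources=[user_activity],
    schema=[Field(name="activity_score", dtype=Float32)],
)
def user_activity_score(inputs: pd.DataFrame) -> pd.DataFrame:
    """Compute activity score at request time."""
    df = pd.DataFrame()
    df["activity_score"] = (
        inputs["purchase_count_7d"] * 0.6
        + inputs["avg_session_minutes"] * 0.4
    ).astype("float32")  # match the Float32 schema declared above
    return df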
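Because an on-demand transform is a pure function of its inputs, its arithmetic can be unit-tested without a running feature store. The stdlib-only sketch below is a hypothetical plain-Python twin of `user_activity_score` with the same 0.6/0.4 weights. It checks the formula, for example in CI, but does not exercise Feast itself.

```python
import math

# Weights copied from the on-demand feature view above; keep the two in sync.
PURCHASE_WEIGHT = 0.6
SESSION_WEIGHT = 0.4


def activity_score(purchase_count_7d: float, avg_session_minutes: float) -> float:
    """Weighted sum used by user_activity_score, for a single entity row."""
    return purchase_count_7d * PURCHASE_WEIGHT + avg_session_minutes * SESSION_WEIGHT


rows = [
    {"purchase_count_7d": 5, "avg_session_minutes": 10.0},
    {"purchase_count_7d": 0, "avg_session_minutes": 0.0},
]
scores = [activity_score(r["purchase_count_7d"], r["avg_session_minutes"]) for r in rows]
assert math.isclose(scores[0], 7.0)  # 5 * 0.6 + 10 * 0.4
```

Note that the score adds a count to a duration in minutes, so the weights also absorb a scale difference. A test like this is a natural place to pin down that behavior before anyone changes the weights.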

Configuration

# config.example.yaml
project: my_feature_store
provider: local                    # local | gcp | aws

registry:
  path: data/registry.db           # Feature metadata store

online_store:
  type: redis                      # redis | sqlite | dynamodb
  connection_string: "localhost:6379"

offline_store:
  type: file                       # file | bigquery | redshift

entity_key_serialization_version: 2

materialization:
  schedule: "0 */4 * * *"          # Every 4 hours
  incremental: true                # Only process new data
  ttl_days: 30                     # Drop features older than 30 days
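The `materialization.schedule` value is a standard five-field cron expression. `0 */4 * * *` fires at minute 0 of every fourth hour (00:00, 04:00, 08:00, and so on, in the scheduler's time zone). A minimal sketch that derives the step and the next run time follows. It assumes only the simple `0 */N * * *` form is used and rejects anything else rather than guessing. Both function names are hypothetical.

```python
from datetime import datetime, timedelta


def hourly_step(schedule: str) -> int:
    """Return N for a cron string of the form '0 */N * * *'; raise otherwise."""
    fields = schedule.split()
    if (
        len(fields) != 5
        or fields[0] != "0"
        or not fields[1].startswith("*/")
        or fields[2:] != ["*", "*", "*"]
    ):
        raise ValueError(f"unsupported schedule: {schedule!r}")
    step = int(fields[1][2:])
    if not 1 <= step <= 23:
        raise ValueError(f"hour step out of range: {step}")
    return step


def next_run(schedule: str, now: datetime) -> datetime:
    """First firing time strictly after `now` (cron '*/N' hours: 0, N, 2N, ...)."""
    step = hourly_step(schedule)
    candidate = now.replace(minute=0, second=0, microsecond=0)
    # Walk forward hour by hour until we pass `now` on an hour divisible by the step.
    while candidate <= now or candidate.hour % step != 0:
        candidate += timedelta(hours=1)
    return candidate


print(next_run("0 */4 * * *", datetime(2026, 1, 15, 5, 30)))  # 2026-01-15 08:00:00
```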

Best Practices

  1. Use point-in-time joins for training data — avoid data leakage by always specifying event_timestamp in entity dataframes
  2. Keep feature views narrow — group related features; don't create one giant view with 200 columns
  3. Set TTLs on all feature views — prevent stale data from being served in production
  4. Monitor materialization lag — alert if online store data is more than 2x your materialization interval behind
  5. Version feature definitions — treat feature_views.py like a schema migration; review changes in PRs
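Practice 4 can be turned into a concrete check. The sketch assumes you can obtain the timestamp of the last successful materialization, for example from your scheduler or job logs; how you fetch it depends on your setup. It compares the lag against twice the interval. The function name and its inputs are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def materialization_lag_alert(
    last_materialized: datetime,
    interval: timedelta,
    now: datetime,
    factor: float = 2.0,
) -> bool:
    """True if the online store is more than `factor` x `interval` behind `now`."""
    # Mixing naive and aware datetimes is a classic source of phantom lag.
    if last_materialized.tzinfo is None or now.tzinfo is None:
        raise ValueError("use timezone-aware datetimes")
    lag = now - last_materialized
    return lag > interval * factor


now = datetime(2026, 1, 15, 12, tzinfo=timezone.utc)
interval = timedelta(hours=4)  # matches the "0 */4 * * *" schedule in the config
print(materialization_lag_alert(now - timedelta(hours=7), interval, now))  # False
print(materialization_lag_alert(now - timedelta(hours=9), interval, now))  # True
```

With a 4-hour schedule one missed run is tolerated (up to 8 hours of lag), and two missed runs trigger the alert.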

Troubleshooting

  • feast apply fails with a registry error
    Cause: corrupt or missing registry DB. Fix: delete data/registry.db and re-run feast apply.
  • Online features return None
    Cause: features have not been materialized yet. Fix: run feast materialize-incremental and verify that the data source has records.
  • Point-in-time join returns NaN
    Cause: entity timestamps fall outside the feature view's TTL. Fix: increase the ttl in the feature view or check timestamp alignment.
  • Redis connection refused
    Cause: the online store is not running. Fix: start Redis with redis-server or check connection_string in the config.
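For the "Redis connection refused" case, a quick TCP reachability check separates networking problems from Feast problems. This stdlib-only sketch parses a `host:port` string like the `connection_string` in the config above, using only the first entry if the string carries comma-separated options. Both helper names are hypothetical. The check only confirms that something accepts TCP connections on that port. It does not confirm that the service is Redis or that authentication succeeds.

```python
import socket


def parse_host_port(connection_string: str) -> tuple[str, int]:
    """Extract (host, port) from 'host:port[,options...]'; host defaults to localhost."""
    first = connection_string.split(",")[0].strip()
    host, _, port = first.rpartition(":")
    return (host or "localhost", int(port))


def can_reach(connection_string: str, timeout: float = 1.0) -> bool:
    """True if a TCP connection to the parsed host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection(parse_host_port(connection_string), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, DNS failure, or timeout
        return False


print(can_reach("localhost:6379"))
```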

This is 1 of 10 resources in the ML Starter Kit toolkit. Get the complete Feature Store Bootstrap kit with all files, templates, and documentation for $39.


Or grab the entire ML Starter Kit bundle (10 products) for $149 — save 30%.


