# Feature Store Bootstrap
Production-ready Feast feature store setup with offline and online serving. Stop re-computing features across teams — define once, serve everywhere. This kit gives you feature definitions, materialization pipelines, and serving infrastructure that works from day one.
## Key Features
- Feast project scaffolding — complete repository structure with registry, store config, and feature definitions
- Offline feature serving — batch retrieval from data warehouses for training dataset generation
- Online feature serving — low-latency Redis-backed serving for real-time inference
- Feature engineering pipelines — reusable transformations with point-in-time correctness
- Data quality validation — Great Expectations integration for feature value monitoring
- Entity management — pre-built entity definitions for common domains (user, product, transaction)
- Materialization automation — scheduled jobs to push features from offline to online stores
## Quick Start
```bash
# 1. Copy the config
cp config.example.yaml config.yaml

# 2. Initialize the Feast project
feast init my_feature_store
cp templates/feature_repo/* my_feature_store/

# 3. Apply feature definitions
cd my_feature_store && feast apply

# 4. Materialize features to online store
feast materialize-incremental $(date -u +"%Y-%m-%dT%H:%M:%S")
```
"""Retrieve features for model training."""
from feast import FeatureStore
import pandas as pd
store = FeatureStore(repo_path="my_feature_store/")
# Entity dataframe — what you want features for
entity_df = pd.DataFrame({
"user_id": [1001, 1002, 1003],
"event_timestamp": pd.to_datetime(["2026-01-15"] * 3),
})
# Fetch training data with point-in-time join
training_df = store.get_historical_features(
entity_df=entity_df,
features=[
"user_profile:age",
"user_profile:signup_days",
"user_activity:purchase_count_7d",
"user_activity:avg_session_minutes",
],
).to_df()
print(training_df.head())
## Architecture
```
feature-store-bootstrap/
├── config.example.yaml            # Feast project + infra configuration
├── templates/
│   ├── feature_repo/
│   │   ├── feature_store.yaml     # Feast store config (provider, registry, online store)
│   │   ├── entities.py            # Entity definitions (user, product, etc.)
│   │   ├── feature_views.py       # Feature view definitions with schemas
│   │   ├── on_demand_features.py  # Real-time computed features
│   │   └── data_sources.py        # FileSource, BigQuerySource, etc.
│   ├── pipelines/
│   │   ├── materialization.py     # Offline → online materialization job
│   │   └── feature_engineering.py # Raw data → feature transformations
│   └── validation/
│       └── feature_quality.py     # Data quality checks
├── docs/
│   └── overview.md
└── examples/
    ├── training_retrieval.py
    └── online_serving.py
```
Data flows from raw sources → feature engineering → offline store (warehouse) → materialization → online store (Redis). Training reads from offline; inference reads from online.
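The point-in-time join that `get_historical_features` performs can be sketched with plain pandas: for each entity row, take the latest feature value at or before its `event_timestamp`, within the TTL window. A minimal sketch with hypothetical data (Feast's offline store does the equivalent at warehouse scale):

```python
import pandas as pd

# Hypothetical feature snapshots, as stored in the offline store
feature_rows = pd.DataFrame({
    "user_id": [1001, 1001, 1002],
    "event_timestamp": pd.to_datetime(["2026-01-10", "2026-01-14", "2026-01-12"]),
    "purchase_count_7d": [2, 5, 1],
}).sort_values("event_timestamp")

# Entity dataframe: the rows we want features for, as of each timestamp
entity_df = pd.DataFrame({
    "user_id": [1001, 1002],
    "event_timestamp": pd.to_datetime(["2026-01-15", "2026-01-15"]),
}).sort_values("event_timestamp")

# Backward as-of join per user, limited to a 30-day TTL window —
# this is what makes the join "point-in-time correct": no feature
# value from after the entity timestamp can leak into training data.
training_df = pd.merge_asof(
    entity_df,
    feature_rows,
    on="event_timestamp",
    by="user_id",
    direction="backward",
    tolerance=pd.Timedelta(days=30),
)
print(training_df)
```

User 1001 gets the 2026-01-14 snapshot (the latest at or before 2026-01-15), not the older 2026-01-10 one.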
## Usage Examples

### Online Feature Retrieval (Inference)
```python
from feast import FeatureStore

store = FeatureStore(repo_path="my_feature_store/")

# Low-latency lookup for real-time inference
features = store.get_online_features(
    features=[
        "user_profile:age",
        "user_activity:purchase_count_7d",
    ],
    entity_rows=[{"user_id": 1001}],
).to_dict()

# Feed directly to model (assumes a trained `model` is already loaded)
prediction = model.predict([
    features["age"][0],
    features["purchase_count_7d"][0],
])
```
### On-Demand Feature Transforms
```python
from feast import Field, on_demand_feature_view
from feast.types import Float32
import pandas as pd

from feature_views import user_activity  # sources take the feature view object, not a string

@on_demand_feature_view(
    sources=[user_activity],
    schema=[Field(name="activity_score", dtype=Float32)],
)
def user_activity_score(inputs: pd.DataFrame) -> pd.DataFrame:
    """Compute activity score at request time."""
    df = pd.DataFrame()
    df["activity_score"] = (
        inputs["purchase_count_7d"] * 0.6
        + inputs["avg_session_minutes"] * 0.4
    )
    return df
```
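Because the transform body is plain pandas, the scoring formula can be sanity-checked outside Feast. A quick standalone check with hypothetical inputs (same columns the on-demand view receives):

```python
import pandas as pd

# Hypothetical request-time inputs
inputs = pd.DataFrame({
    "purchase_count_7d": [3, 0],
    "avg_session_minutes": [12.5, 4.0],
})

# Same weighted sum as user_activity_score, applied directly
activity_score = inputs["purchase_count_7d"] * 0.6 + inputs["avg_session_minutes"] * 0.4
print(activity_score.tolist())
```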
## Configuration
```yaml
# config.example.yaml
project: my_feature_store
provider: local              # local | gcp | aws
registry:
  path: data/registry.db     # Feature metadata store
online_store:
  type: redis                # redis | sqlite | dynamodb
  connection_string: "localhost:6379"
offline_store:
  type: file                 # file | bigquery | redshift
entity_key_serialization_version: 2
materialization:
  schedule: "0 */4 * * *"    # Every 4 hours
  incremental: true          # Only process new data
  ttl_days: 30               # Drop features older than 30 days
```
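Note that Feast itself reads `feature_store.yaml` inside the feature repo; the kit's `config.yaml` maps onto the standard Feast fields roughly like this (a sketch, assuming the Redis online store configured above):

```yaml
# my_feature_store/feature_store.yaml
project: my_feature_store
provider: local
registry: data/registry.db
online_store:
  type: redis
  connection_string: "localhost:6379"
offline_store:
  type: file
entity_key_serialization_version: 2
```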
## Best Practices
- Use point-in-time joins for training data — avoid data leakage by always specifying `event_timestamp` in entity dataframes
- Keep feature views narrow — group related features; don't create one giant view with 200 columns
- Set TTLs on all feature views — prevent stale data from being served in production
- Monitor materialization lag — alert if online store data is more than 2x your materialization interval behind
- Version feature definitions — treat `feature_views.py` like a schema migration; review changes in PRs
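The materialization-lag check above can be sketched as a small stdlib helper (`materialization_lag_alert` is a hypothetical name; the 4-hour interval matches the config's schedule):

```python
from datetime import datetime, timedelta, timezone

# Interval from config.yaml's "0 */4 * * *" schedule
MATERIALIZATION_INTERVAL = timedelta(hours=4)

def materialization_lag_alert(last_materialized: datetime, now: datetime) -> bool:
    """Return True when the online store is more than 2x the interval behind."""
    return (now - last_materialized) > 2 * MATERIALIZATION_INTERVAL

now = datetime(2026, 1, 15, 12, 0, tzinfo=timezone.utc)
print(materialization_lag_alert(now - timedelta(hours=3), now))  # fresh: within one interval
print(materialization_lag_alert(now - timedelta(hours=9), now))  # stale: page someone
```

Wire this to your scheduler's last-success timestamp (e.g. from Airflow or cron job metadata) rather than wall-clock guesses.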
## Troubleshooting
| Problem | Cause | Fix |
|---|---|---|
| `feast apply` fails with registry error | Corrupt or missing registry DB | Delete `data/registry.db` and re-run `feast apply` |
| Online features returning `None` | Features not materialized yet | Run `feast materialize-incremental` and verify data source has records |
| Point-in-time join returns NaN | Entity timestamps outside feature TTL | Increase `ttl` in feature view or check timestamp alignment |
| Redis connection refused | Online store not running | Start Redis with `redis-server` or check `connection_string` in config |
This is 1 of 10 resources in the ML Starter Kit toolkit. Get the complete Feature Store Bootstrap with all files, templates, and documentation for $39.
Or grab the entire ML Starter Kit bundle (10 products) for $149 — save 30%.