Part 5 of CrisisCore Build Log — Start here: https://dev.to/crisiscoresystems/testing-privacy-preserving-analytics-verifying-that-insights-dont-leak-identity-e37
How do you know your simulated distress signals actually correlate with real distress?
We've built elaborate crisis simulations. We generate panic-attack navigation patterns. We simulate dissociation-style inactivity. We create sensory-overload preference churning. Our test suites pass beautifully.
If you want privacy-first, offline health tech to exist without surveillance funding it: sponsor the build → https://paintracker.ca/sponser
But here's the uncomfortable question: are we testing against reality, or against our own assumptions about reality?
If our simulations are wrong—if real panic attacks don't actually look like our simulated panic attacks—then our passing tests give us false confidence. We're testing that our system detects what we think crises look like, not what crises actually look like.
This is the meta-problem of simulation testing: validating the validity of our validations.
The Ground Truth Problem
In most testing, ground truth is clear. A button either works or it doesn't. A calculation is either correct or incorrect.
For crisis detection, ground truth is murky:
- We can't ethically induce real crises for testing
- Self-reported crisis states are unreliable (people in crisis aren't filing accurate reports)
- Even clinical assessments happen after the fact, not during the crisis
So how do we validate our simulations against reality we can't directly observe?
interface ValidationChallenge {
// The core epistemological problem
whatWeWant: 'Simulations that match real crisis patterns';
whatWeHave: 'Simulations based on our assumptions about crisis patterns';
theGap: 'No direct observation of real-time crisis states';
// Indirect validation strategies
strategies: [
'Retrospective pattern matching',
'Anonymized aggregate analysis',
'Clinical correlation studies',
'User feedback loops',
'Signal fidelity metrics'
];
}
Retrospective Pattern Matching
We can't watch crises happen, but we can look at sessions that ended with crisis-related outcomes:
interface RetrospectiveValidation {
// Sessions where we KNOW crisis occurred (from user-reported outcomes)
knownCrisisSessions: SessionData[];
// Our simulated crisis sessions
simulatedCrisisSessions: SessionData[];
// How well do they match?
signalOverlap: number; // What % of signals appear in both?
signalDistribution: number; // How similar are the distributions?
temporalPattern: number; // Does timing match?
missingSignals: string[]; // What real signals are we not simulating?
extraSignals: string[]; // What are we simulating that doesn't appear in reality?
}
class RetrospectiveValidator {
async validateSimulations(): Promise<ValidationReport> {
// Get sessions that ended with crisis resource access or app abandonment
// during known difficult times (with user consent for research)
const knownDifficultSessions = await this.getConsentedDifficultSessions();
// Extract signal patterns
const realPatterns = this.extractSignalPatterns(knownDifficultSessions);
// Generate equivalent simulated sessions
const simulatedSessions = this.generateSimulatedSessions(
knownDifficultSessions.length
);
const simulatedPatterns = this.extractSignalPatterns(simulatedSessions);
// Compare
return {
signalOverlap: this.calculateOverlap(realPatterns, simulatedPatterns),
distributionMatch: this.compareDistributions(realPatterns, simulatedPatterns),
missingSignals: this.findMissingSignals(realPatterns, simulatedPatterns),
extraSignals: this.findExtraSignals(realPatterns, simulatedPatterns),
recommendations: this.generateRecommendations(realPatterns, simulatedPatterns)
};
}
private extractSignalPatterns(sessions: SessionData[]): SignalPatternProfile {
const profile: SignalPatternProfile = {
totalSessions: sessions.length, // used later to normalize signal frequencies
navigationPatterns: {},
inputPatterns: {},
temporalPatterns: {},
featureUsage: {},
exitPatterns: {}
};
for (const session of sessions) {
// Extract navigation entropy
const navEntropy = this.calculateNavigationEntropy(session.navigationEvents);
this.addToDistribution(profile.navigationPatterns, 'entropy', navEntropy);
// Extract input patterns
const inputChaos = this.calculateInputChaos(session.inputEvents);
this.addToDistribution(profile.inputPatterns, 'chaos', inputChaos);
// Extract timing patterns
const avgTimeBetweenActions = this.calculateAvgActionInterval(session);
this.addToDistribution(profile.temporalPatterns, 'interval', avgTimeBetweenActions);
// Extract feature usage
for (const feature of session.featuresAccessed) {
profile.featureUsage[feature] = (profile.featureUsage[feature] || 0) + 1;
}
// Extract exit patterns
this.addToDistribution(profile.exitPatterns, session.exitType, 1);
}
return profile;
}
private findMissingSignals(
real: SignalPatternProfile,
simulated: SignalPatternProfile
): string[] {
const missing: string[] = [];
// Find signals that appear in real data but not in simulations
for (const [signal, frequency] of Object.entries(real.featureUsage)) {
if (!simulated.featureUsage[signal] && frequency > real.totalSessions * 0.1) {
missing.push(`feature:${signal}`);
}
}
// Check for navigation patterns we're not simulating
for (const [pattern, distribution] of Object.entries(real.navigationPatterns)) {
if (!simulated.navigationPatterns[pattern]) {
missing.push(`navigation:${pattern}`);
}
}
return missing;
}
}
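The validator above leans on helpers we haven't shown, such as `calculateNavigationEntropy`. As a minimal sketch of how that metric could work (the `NavigationEvent` shape here is an assumption, not our actual type, and it's written as a standalone function for clarity): Shannon entropy over the session's route-visit distribution, normalized so 0 means the user stayed on one screen and 1 means visits were spread evenly across every screen they touched.

```typescript
// Hypothetical event shape; the real NavigationEvent type isn't shown in this post.
interface NavigationEvent {
  route: string;
  timestamp: number;
}

// Shannon entropy of route visits, normalized to 0-1 so sessions of different
// lengths are comparable.
function calculateNavigationEntropy(events: NavigationEvent[]): number {
  if (events.length === 0) return 0;

  // Count visits per route
  const counts = new Map<string, number>();
  for (const e of events) {
    counts.set(e.route, (counts.get(e.route) ?? 0) + 1);
  }
  if (counts.size <= 1) return 0; // a single route carries no entropy

  // Shannon entropy over the visit distribution
  let entropy = 0;
  for (const count of counts.values()) {
    const p = count / events.length;
    entropy -= p * Math.log2(p);
  }

  // Normalize by the maximum possible entropy for this many distinct routes
  return entropy / Math.log2(counts.size);
}
```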
Signal Fidelity Scoring
We quantify how well our simulations match reality:
interface SignalFidelityScore {
overall: number; // 0-1 overall match quality
dimensions: {
signalPresence: number; // Do we simulate the right signals?
signalAbsence: number; // Do we correctly NOT simulate false signals?
distributionMatch: number; // Do signal intensities match?
temporalMatch: number; // Does timing match?
sequenceMatch: number; // Do signals occur in realistic order?
};
problematicAreas: {
signal: string;
issue: 'missing' | 'over-represented' | 'wrong-timing' | 'wrong-intensity';
realValue: number;
simulatedValue: number;
impact: 'low' | 'medium' | 'high';
}[];
}
class SignalFidelityAnalyzer {
calculateFidelity(
realPatterns: SignalPatternProfile,
simulatedPatterns: SignalPatternProfile
): SignalFidelityScore {
const presence = this.scoreSignalPresence(realPatterns, simulatedPatterns);
const absence = this.scoreSignalAbsence(realPatterns, simulatedPatterns);
const distribution = this.scoreDistributionMatch(realPatterns, simulatedPatterns);
const temporal = this.scoreTemporalMatch(realPatterns, simulatedPatterns);
const sequence = this.scoreSequenceMatch(realPatterns, simulatedPatterns);
const overall = (
presence * 0.25 +
absence * 0.15 +
distribution * 0.25 +
temporal * 0.20 +
sequence * 0.15
);
return {
overall,
dimensions: {
signalPresence: presence,
signalAbsence: absence,
distributionMatch: distribution,
temporalMatch: temporal,
sequenceMatch: sequence
},
problematicAreas: this.identifyProblems(realPatterns, simulatedPatterns)
};
}
private scoreDistributionMatch(
real: SignalPatternProfile,
simulated: SignalPatternProfile
): number {
// Use Kolmogorov-Smirnov test or similar to compare distributions
const scores: number[] = [];
for (const dimension of ['navigationPatterns', 'inputPatterns', 'temporalPatterns']) {
const realDist = this.normalizeDistribution(real[dimension]);
const simDist = this.normalizeDistribution(simulated[dimension]);
const ksScore = this.kolmogorovSmirnovTest(realDist, simDist);
scores.push(1 - ksScore); // Lower KS = better match
}
return scores.reduce((a, b) => a + b, 0) / scores.length;
}
private identifyProblems(
real: SignalPatternProfile,
simulated: SignalPatternProfile
): SignalFidelityScore['problematicAreas'] {
const problems: SignalFidelityScore['problematicAreas'] = [];
// Check each signal type
const allSignals = new Set([
...Object.keys(real.featureUsage),
...Object.keys(simulated.featureUsage)
]);
for (const signal of allSignals) {
const realFreq = real.featureUsage[signal] || 0;
const simFreq = simulated.featureUsage[signal] || 0;
const realNorm = realFreq / real.totalSessions;
const simNorm = simFreq / simulated.totalSessions;
if (realFreq > 0 && simFreq === 0) {
problems.push({
signal,
issue: 'missing',
realValue: realNorm,
simulatedValue: 0,
impact: realNorm > 0.3 ? 'high' : realNorm > 0.1 ? 'medium' : 'low'
});
} else if (Math.abs(realNorm - simNorm) > 0.2) {
problems.push({
signal,
issue: simNorm > realNorm ? 'over-represented' : 'wrong-intensity',
realValue: realNorm,
simulatedValue: simNorm,
impact: Math.abs(realNorm - simNorm) > 0.4 ? 'high' : 'medium'
});
}
}
return problems;
}
}
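`scoreDistributionMatch` defers to a `kolmogorovSmirnovTest` helper we haven't shown. One plausible sketch, assuming each normalized distribution arrives as a plain array of sample values (a statistics library would also give you a p-value, which this doesn't):

```typescript
// Two-sample Kolmogorov-Smirnov statistic: the largest gap between the two
// empirical CDFs. 0 = the samples look identical, 1 = completely disjoint.
function kolmogorovSmirnovTest(sampleA: number[], sampleB: number[]): number {
  if (sampleA.length === 0 || sampleB.length === 0) return 1;

  const a = [...sampleA].sort((x, y) => x - y);
  const b = [...sampleB].sort((x, y) => x - y);

  let i = 0;
  let j = 0;
  let maxGap = 0;

  // Walk both sorted samples together, advancing past ties in one step,
  // and track the gap between the two empirical CDFs at each distinct value.
  while (i < a.length && j < b.length) {
    const x = Math.min(a[i], b[j]);
    while (i < a.length && a[i] === x) i++;
    while (j < b.length && b[j] === x) j++;
    maxGap = Math.max(maxGap, Math.abs(i / a.length - j / b.length));
  }

  return maxGap;
}
```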
False Positive Calibration
Our simulations should produce false positive rates that match reality:
describe('False Positive Calibration', () => {
it('simulated false positive rate matches real-world rate', async () => {
// Get real-world false positive data (from user feedback)
const realWorldData = await getFalsePositiveFeedback();
const realFPR = realWorldData.falsePositives / realWorldData.totalDetections;
// Run simulations designed to NOT trigger crisis
const nonCrisisSimulations = generateNonCrisisSimulations(1000);
const simulatedDetections = nonCrisisSimulations.filter(
sim => crisisDetector.analyze(sim).detectedCrisis !== null
);
const simulatedFPR = simulatedDetections.length / 1000;
// Simulated FPR should be within 20% of real FPR
expect(Math.abs(simulatedFPR - realFPR) / realFPR).toBeLessThan(0.2);
});
it('false positive scenarios match real-world false positive patterns', async () => {
// Get patterns that caused real false positives
const realFalsePositives = await getRealFalsePositivePatterns();
// Check if our simulations include these patterns
const coveredPatterns: string[] = [];
const uncoveredPatterns: string[] = [];
for (const pattern of realFalsePositives) {
const simulation = findMatchingSimulation(pattern);
if (simulation) {
coveredPatterns.push(pattern.type);
} else {
uncoveredPatterns.push(pattern.type);
}
}
// Should cover at least 80% of real false positive patterns
const coverageRate = coveredPatterns.length / realFalsePositives.length;
expect(coverageRate).toBeGreaterThan(0.8);
// Log uncovered patterns for future simulation development
if (uncoveredPatterns.length > 0) {
console.log('Uncovered false positive patterns:', uncoveredPatterns);
}
});
});
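These tests assume a `generateNonCrisisSimulations` helper that fabricates ordinary, calm sessions. Here's a sketch of what that might look like; the `SimulatedSession` shape and the specific value ranges are illustrative and should be tuned to whatever baselines your real non-crisis sessions actually show:

```typescript
// Illustrative session shape -- adapt to whatever your detector actually consumes.
interface SimulatedSession {
  navigationEntropy: number;   // low: the user follows a predictable path
  inputChaos: number;          // low: steady, deliberate input
  avgActionIntervalMs: number; // unhurried pacing between actions
  preferenceChanges: number;   // little or no settings churn
  exitType: 'normal' | 'abandoned';
}

function generateNonCrisisSimulations(count: number): SimulatedSession[] {
  const sessions: SimulatedSession[] = [];
  for (let i = 0; i < count; i++) {
    sessions.push({
      navigationEntropy: 0.1 + Math.random() * 0.3,     // well below crisis-like chaos
      inputChaos: 0.05 + Math.random() * 0.2,
      avgActionIntervalMs: 3000 + Math.random() * 7000, // 3-10s between actions
      preferenceChanges: Math.random() < 0.2 ? 1 : 0,   // occasional single tweak
      exitType: 'normal'
    });
  }
  return sessions;
}
```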
class FalsePositiveCalibrator {
async calibrate(): Promise<CalibrationResult> {
const realData = await this.getRealWorldFalsePositiveData();
// Categorize real false positives
const categories = this.categorizePatterns(realData.falsePositives);
// Check simulation coverage for each category
const coverage: Map<string, CoverageReport> = new Map();
for (const [category, patterns] of categories) {
const simulations = this.getSimulationsForCategory(category);
const matchRate = this.calculateMatchRate(patterns, simulations);
coverage.set(category, {
realPatternCount: patterns.length,
simulationCount: simulations.length,
matchRate,
missingPatterns: this.findMissingPatterns(patterns, simulations),
recommendations: this.generateRecommendations(category, matchRate)
});
}
return {
overallCalibration: this.calculateOverallCalibration(coverage),
categoryReports: coverage,
prioritizedActions: this.prioritizeActions(coverage)
};
}
private categorizePatterns(falsePositives: FalsePositiveRecord[]): Map<string, Pattern[]> {
const categories = new Map<string, Pattern[]>();
for (const fp of falsePositives) {
const category = this.inferCategory(fp);
if (!categories.has(category)) {
categories.set(category, []);
}
categories.get(category)!.push(fp.pattern);
}
return categories;
}
private inferCategory(fp: FalsePositiveRecord): string {
// Categorize based on what caused the false positive
if (fp.userExplanation?.includes('new user')) return 'onboarding_exploration';
if (fp.userExplanation?.includes('accident')) return 'accidental_input';
if (fp.pattern.navigationEntropy > 0.7) return 'legitimate_fast_navigation';
if (fp.pattern.inactivityDuration > 300000) return 'intentional_pause';
if (fp.pattern.preferenceChanges > 5) return 'customization_session';
return 'unknown';
}
}
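The calibrator's `calculateMatchRate` is doing the real comparison work. One way to sketch it: a real false-positive pattern counts as covered if at least one existing simulation lands within a small distance of it on the dimensions `inferCategory` already cares about. The `Pattern` shape and the threshold here are assumptions for illustration:

```typescript
// Illustrative pattern shape, limited to the fields inferCategory inspects.
interface Pattern {
  navigationEntropy: number;
  inactivityDuration: number; // ms
  preferenceChanges: number;
}

// Share of real false-positive patterns for which at least one simulation
// is "close enough" across the normalized dimensions.
function calculateMatchRate(
  realPatterns: Pattern[],
  simulations: Pattern[],
  threshold = 0.15
): number {
  if (realPatterns.length === 0) return 1;

  const isClose = (a: Pattern, b: Pattern): boolean => {
    const entropyGap = Math.abs(a.navigationEntropy - b.navigationEntropy);
    // Scale inactivity to a 10-minute cap so it's comparable to the 0-1 dimensions
    const inactivityGap =
      Math.abs(a.inactivityDuration - b.inactivityDuration) / 600_000;
    const prefGap = Math.abs(a.preferenceChanges - b.preferenceChanges) / 10;
    return (entropyGap + inactivityGap + prefGap) / 3 < threshold;
  };

  const matched = realPatterns.filter(p => simulations.some(s => isClose(p, s)));
  return matched.length / realPatterns.length;
}
```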
Missing Signal Detection
The scariest gap: signals we're not even looking for.
interface MissingSignalDetector {
// What signals appear in real crisis sessions that we don't simulate?
findUnknownSignals(realSessions: SessionData[]): UnknownSignal[];
// What patterns correlate with crisis outcomes that we haven't modeled?
discoverNewPatterns(realSessions: SessionData[]): DiscoveredPattern[];
}
class SignalDiscovery {
async discoverMissingSignals(): Promise<DiscoveryReport> {
const realCrisisSessions = await this.getConsentedCrisisSessions();
const realNonCrisisSessions = await this.getConsentedNonCrisisSessions();
// Extract ALL possible features from sessions
const allFeatures = this.extractAllFeatures(realCrisisSessions);
// Find features that differentiate crisis from non-crisis
const discriminativeFeatures = this.findDiscriminativeFeatures(
realCrisisSessions,
realNonCrisisSessions,
allFeatures
);
// Check which discriminative features we're already modeling
const modeledFeatures = this.getModeledFeatures();
const missingFeatures = discriminativeFeatures.filter(
f => !modeledFeatures.includes(f.name)
);
return {
discoveredSignals: missingFeatures,
signalImportance: this.rankByImportance(missingFeatures),
implementationPriority: this.prioritizeImplementation(missingFeatures),
exampleSessions: this.findExampleSessions(missingFeatures, realCrisisSessions)
};
}
private extractAllFeatures(sessions: SessionData[]): string[] {
const features = new Set<string>();
for (const session of sessions) {
// Standard features
features.add(`nav_entropy_${this.bucketize(session.navigationEntropy)}`);
features.add(`session_duration_${this.bucketize(session.duration)}`);
features.add(`input_rate_${this.bucketize(session.inputRate)}`);
// Derive new potential features
features.add(`time_of_day_${this.getTimeOfDayBucket(session.startTime)}`);
features.add(`day_of_week_${new Date(session.startTime).getDay()}`);
features.add(`session_since_last_${this.bucketize(session.timeSinceLastSession)}`);
features.add(`consecutive_days_${session.consecutiveDaysActive}`);
// Behavioral sequences
const sequences = this.extractActionSequences(session.actions);
for (const seq of sequences) {
features.add(`seq_${seq}`);
}
// Feature co-occurrence
for (const f1 of session.featuresUsed) {
for (const f2 of session.featuresUsed) {
if (f1 < f2) {
features.add(`cooccur_${f1}_${f2}`);
}
}
}
}
return Array.from(features);
}
private findDiscriminativeFeatures(
crisisSessions: SessionData[],
nonCrisisSessions: SessionData[],
allFeatures: string[]
): DiscriminativeFeature[] {
const results: DiscriminativeFeature[] = [];
for (const feature of allFeatures) {
const crisisRate = this.calculateFeatureRate(feature, crisisSessions);
const nonCrisisRate = this.calculateFeatureRate(feature, nonCrisisSessions);
// Calculate discrimination power
const rateRatio = crisisRate / (nonCrisisRate + 0.001);
const absoluteDiff = Math.abs(crisisRate - nonCrisisRate);
// Feature is discriminative if it appears much more (or less) in crisis sessions
if (rateRatio > 2 || rateRatio < 0.5 || absoluteDiff > 0.2) {
results.push({
name: feature,
crisisRate,
nonCrisisRate,
discriminationPower: Math.log(rateRatio),
statisticalSignificance: this.calculateSignificance(
crisisRate, crisisSessions.length,
nonCrisisRate, nonCrisisSessions.length
)
});
}
}
return results.sort((a, b) =>
Math.abs(b.discriminationPower) - Math.abs(a.discriminationPower)
);
}
}
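`calculateSignificance` above is also left abstract. A minimal version is a two-proportion z-test on the feature's occurrence rate in crisis versus non-crisis sessions; a production pipeline would likely convert the z-score to a p-value and correct for the large number of features being tested:

```typescript
// Two-proportion z-test: how surprising is the gap between the crisis and
// non-crisis occurrence rates, given the sample sizes? Returns the z-score.
function calculateSignificance(
  crisisRate: number,
  crisisN: number,
  nonCrisisRate: number,
  nonCrisisN: number
): number {
  // Pooled proportion under the null hypothesis that both rates are equal
  const pooled =
    (crisisRate * crisisN + nonCrisisRate * nonCrisisN) / (crisisN + nonCrisisN);
  const standardError = Math.sqrt(
    pooled * (1 - pooled) * (1 / crisisN + 1 / nonCrisisN)
  );
  if (standardError === 0) return 0;
  return (crisisRate - nonCrisisRate) / standardError;
}
```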
Continuous Validation Loop
We build an ongoing process of simulation improvement:
interface ContinuousValidationLoop {
// Daily: Check simulation fidelity against recent real data
dailyFidelityCheck(): SignalFidelityScore;
// Weekly: Run false positive calibration
weeklyCalibration(): CalibrationResult;
// Monthly: Discover new signals from accumulated data
monthlySignalDiscovery(): DiscoveryReport;
// Quarterly: Full retrospective validation
quarterlyValidation(): ValidationReport;
}
class ValidationOrchestrator {
async runContinuousValidation(): Promise<void> {
// Schedule recurring validations
schedule.daily(async () => {
const fidelity = await this.checkFidelity();
if (fidelity.overall < 0.7) {
await this.alertTeam('Simulation fidelity dropped below threshold', fidelity);
}
await this.logMetric('simulation_fidelity', fidelity.overall);
});
schedule.weekly(async () => {
const calibration = await this.calibrateFalsePositives();
// Auto-generate test cases for uncovered patterns
for (const uncovered of calibration.uncoveredPatterns) {
await this.generateTestCase(uncovered);
await this.createSimulationTicket(uncovered);
}
});
schedule.monthly(async () => {
const discovery = await this.discoverNewSignals();
// Prioritize implementation of discovered signals
for (const signal of discovery.prioritizedSignals.slice(0, 5)) {
await this.createImplementationTicket(signal);
}
});
}
private async generateTestCase(pattern: UncoveredPattern): Promise<void> {
const testCase = `
it('detects ${pattern.name} pattern', () => {
const session = createSessionWithPattern({
${pattern.features.map(f => `${f.name}: ${f.exampleValue}`).join(',\n ')}
});
const result = crisisDetector.analyze(session);
// TODO: Determine expected behavior
// Pattern was associated with: ${pattern.associatedOutcome}
expect(result).toMatchExpectedBehavior();
});
`;
await this.addToTestBacklog(testCase, pattern);
}
}
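The `schedule.daily/weekly/monthly` helper is assumed here; in production this would more likely be a cron job, a job queue, or a CI pipeline. For local runs and tests, a tiny in-process stand-in is enough (note that `setInterval` caps out near 25 days, so long intervals are counted in day-sized ticks):

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// Run a task every `days` days by ticking once per day.
function everyNDays(
  days: number,
  task: () => Promise<void>
): ReturnType<typeof setInterval> {
  let elapsed = 0;
  return setInterval(() => {
    elapsed += 1;
    if (elapsed >= days) {
      elapsed = 0;
      task().catch(err => console.error('Scheduled validation failed', err));
    }
  }, DAY_MS);
}

const schedule = {
  daily: (task: () => Promise<void>) => everyNDays(1, task),
  weekly: (task: () => Promise<void>) => everyNDays(7, task),
  monthly: (task: () => Promise<void>) => everyNDays(30, task)
};
```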
The Validation Dashboard
We surface validation metrics for continuous monitoring:
function SimulationValidationDashboard() {
const [fidelityHistory, setFidelityHistory] = useState<FidelityDataPoint[]>([]);
const [latestDiscovery, setLatestDiscovery] = useState<DiscoveryReport | null>(null);
const [calibrationStatus, setCalibrationStatus] = useState<CalibrationResult | null>(null);
return (
<div className="p-6 space-y-8">
<h1 className="text-2xl font-bold">Simulation Validation Status</h1>
{/* Fidelity Trend */}
<section>
<h2 className="text-lg font-semibold mb-4">Simulation Fidelity Over Time</h2>
<FidelityTrendChart data={fidelityHistory} />
<div className="grid grid-cols-5 gap-4 mt-4">
{['signalPresence', 'signalAbsence', 'distributionMatch', 'temporalMatch', 'sequenceMatch'].map(dim => (
<MetricCard
key={dim}
label={formatDimensionLabel(dim)}
value={fidelityHistory[fidelityHistory.length - 1]?.dimensions[dim]}
trend={calculateTrend(fidelityHistory, dim)}
/>
))}
</div>
</section>
{/* Missing Signals Alert */}
{latestDiscovery && latestDiscovery.discoveredSignals.length > 0 && (
<section className="bg-yellow-50 border border-yellow-200 rounded-lg p-4">
<h2 className="text-lg font-semibold text-yellow-800 mb-2">
⚠️ Discovered Unmodeled Signals
</h2>
<p className="text-yellow-700 mb-4">
The following signals appear in real crisis sessions but aren't in our simulations:
</p>
<ul className="space-y-2">
{latestDiscovery.discoveredSignals.slice(0, 5).map(signal => (
<li key={signal.name} className="flex justify-between">
<span className="font-mono text-sm">{signal.name}</span>
<span className="text-yellow-600">
{(signal.crisisRate * 100).toFixed(1)}% of crisis sessions
</span>
</li>
))}
</ul>
</section>
)}
{/* Calibration Status */}
<section>
<h2 className="text-lg font-semibold mb-4">False Positive Calibration</h2>
<div className="grid grid-cols-3 gap-4">
<MetricCard
label="Real-World FPR"
value={`${((calibrationStatus?.realWorldFPR ?? 0) * 100).toFixed(1)}%`}
/>
<MetricCard
label="Simulated FPR"
value={`${((calibrationStatus?.simulatedFPR ?? 0) * 100).toFixed(1)}%`}
/>
<MetricCard
label="Calibration Delta"
value={`${(Math.abs(calibrationStatus?.delta ?? 0) * 100).toFixed(1)}%`}
status={Math.abs(calibrationStatus?.delta ?? 1) < 0.05 ? 'good' : 'warning'}
/>
</div>
</section>
{/* Action Items */}
<section>
<h2 className="text-lg font-semibold mb-4">Validation Action Items</h2>
<ul className="space-y-2">
{generateActionItems(fidelityHistory, latestDiscovery, calibrationStatus).map((item, i) => (
<li key={i} className="flex items-center gap-2">
<PriorityBadge priority={item.priority} />
<span>{item.description}</span>
</li>
))}
</ul>
</section>
</div>
);
}
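The dashboard references a few small helpers (`calculateTrend`, `formatDimensionLabel`, `generateActionItems`) that aren't shown. As one example, `calculateTrend` can be as simple as comparing the mean of the last week's scores against the week before; the `FidelityDataPoint` shape here is an assumption:

```typescript
// Assumed shape: one entry per daily fidelity check, with per-dimension scores.
interface FidelityDataPoint {
  date: string;
  overall: number;
  dimensions: Record<string, number>;
}

type Trend = 'up' | 'down' | 'flat';

function calculateTrend(
  history: FidelityDataPoint[],
  dimension: string,
  window = 7
): Trend {
  if (history.length < window * 2) return 'flat'; // not enough data to call it

  const mean = (points: FidelityDataPoint[]) =>
    points.reduce((sum, p) => sum + (p.dimensions[dimension] ?? 0), 0) / points.length;

  const recent = mean(history.slice(-window));
  const previous = mean(history.slice(-window * 2, -window));
  const delta = recent - previous;

  if (Math.abs(delta) < 0.02) return 'flat'; // ignore noise-level movement
  return delta > 0 ? 'up' : 'down';
}
```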
Testing the Tests
We write tests for our validation framework itself:
describe('Validation Framework Tests', () => {
describe('Signal Fidelity Calculator', () => {
it('correctly identifies missing signals', () => {
const realPatterns: SignalPatternProfile = {
featureUsage: {
'help_page': 50,
'crisis_button': 30,
'unknown_pattern': 25
},
totalSessions: 100
};
const simulatedPatterns: SignalPatternProfile = {
featureUsage: {
'help_page': 45,
'crisis_button': 28
// missing 'unknown_pattern'
},
totalSessions: 100
};
const analyzer = new SignalFidelityAnalyzer();
const result = analyzer.calculateFidelity(realPatterns, simulatedPatterns);
expect(result.problematicAreas).toContainEqual(
expect.objectContaining({
signal: 'unknown_pattern',
issue: 'missing'
})
);
});
});
describe('False Positive Calibrator', () => {
it('detects when simulation FPR diverges from reality', async () => {
// Mock real-world FPR of 5%
const mockRealWorldData = {
totalDetections: 1000,
falsePositives: 50
};
// Simulations producing 15% FPR (3x too high)
const mockSimulations = Array(1000).fill(null).map((_, i) => ({
triggeredCrisis: i < 150 // 15% trigger
}));
const calibrator = new FalsePositiveCalibrator();
const result = await calibrator.compare(mockRealWorldData, mockSimulations);
expect(result.isCalibrated).toBe(false);
expect(result.recommendation).toContain('reduce simulation sensitivity');
});
});
describe('Signal Discovery', () => {
it('finds discriminative features', () => {
const crisisSessions = generateSessionsWithFeature('late_night_usage', 0.7);
const nonCrisisSessions = generateSessionsWithFeature('late_night_usage', 0.1);
const discovery = new SignalDiscovery();
const result = discovery.findDiscriminativeFeatures(
crisisSessions,
nonCrisisSessions,
['late_night_usage', 'normal_feature']
);
expect(result[0].name).toBe('late_night_usage');
expect(result[0].discriminationPower).toBeGreaterThan(1);
});
});
});
Conclusion
Testing crisis simulations is meta-work that feels removed from "real" development. But it's some of the most important work we do. Every simulation that doesn't match reality is a crisis we might miss or a false alarm that erodes user trust.
Key takeaways:
- Ground truth is indirect: we can't observe crises directly, so we validate through retrospective analysis and outcome correlation.
- Quantify fidelity: signal fidelity scores give us a measurable target for simulation quality.
- Calibrate false positives: our simulated FPR should match the real-world FPR.
- Hunt for missing signals: the signals we're NOT simulating are the scariest gaps.
- Build continuous loops: validation isn't a one-time activity; it's an ongoing process of refinement.
- Test the tests: our validation framework itself needs tests.
The goal isn't perfect simulation—that's impossible. The goal is knowable imperfection: understanding exactly where our simulations diverge from reality so we can make informed decisions about acceptable risk.
This is Part 10 of our series on building trauma-informed healthcare applications. Previous posts covered crisis detection, testing strategies, recovery testing, cultural contexts, and cross-crisis calibration.
Coming Next: "Testing for Co-Occurrence: When Multiple Crises Happen Simultaneously"
Tags: #validation #simulation #testing #healthcare #trauma-informed #quality-assurance #machine-learning #ground-truth
Support this work
- Sponsor the project (primary): https://paintracker.ca/sponser
- Star the repo (secondary): https://github.com/CrisisCore-Systems/pain-tracker
- Read the full series from the start: https://dev.to/crisiscoresystems/testing-privacy-preserving-analytics-verifying-that-insights-dont-leak-identity-e37