Turning 120+ Data Points Into One AI-Written Portrait — How the Engine Works

#ai #machinelearning #webdev

Most personality tools give you a type label and a paragraph. MBTI says you're INFJ, Big Five gives you a percentile, and that's it. We wanted something different with Origin Of You — a single written portrait that reads like it was written by someone who actually knows you, synthesized from five separate systems and over 120 data points.

This post walks through the engine architecture: how we collect, normalize, and orchestrate five frameworks into one coherent output.

The Five Systems and Their Data Shapes

Each framework produces a fundamentally different data structure:

MBTI — 4 binary axes (E/I, S/N, T/F, J/P) plus a strength score per axis. 8 values total. Good for cognitive function stack ordering, weak on emotional granularity.

Big Five (OCEAN) — 5 continuous scores from 0–100. Openness, Conscientiousness, Extraversion, Agreeableness, Neuroticism. Statistically robust but cold — nobody reads "Agreeableness: 72" and feels understood.

Enneagram — primary type (1–9) + wing + instinctual variant (sp/sx/so). Categorical, not continuous. Strong on motivation patterns, completely silent on cognitive style.

Human Design — type, authority, profile, defined/undefined centers. This one is the most structurally complex: the chart encodes 9 centers with 36 channels and 64 gates. We extract roughly 40 data points from a single chart.

Astrology (natal chart) — sun, moon, rising + planetary positions across 12 houses. Another 30+ data points. Adds a temporal-archetypal layer the other systems ignore entirely.

The total lands between 120 and 140 data points depending on the chart configuration. The challenge isn't collecting them — it's making them talk to each other.
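To make the difference in data shapes concrete, here is a minimal Python sketch. The class names, field names, and sample values are illustrative assumptions, not the engine's actual schema, and the Human Design and natal chart records are heavily abbreviated compared with the 40 and 30+ data points described above.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass
from typing import Literal


@dataclass(frozen=True)
class MBTIResult:
    letters: str                 # one letter per axis, e.g. "INFJ"
    strengths: dict[str, float]  # strength score per axis: "EI", "SN", "TF", "JP"


@dataclass(frozen=True)
class BigFiveResult:
    # Five continuous scores, each 0 to 100.
    openness: float
    conscientiousness: float
    extraversion: float
    agreeableness: float
    neuroticism: float


@dataclass(frozen=True)
class EnneagramResult:
    primary: int                          # categorical, 1 to 9
    wing: int
    instinct: Literal["sp", "sx", "so"]


@dataclass(frozen=True)
class HumanDesignResult:
    type: str
    authority: str
    profile: str
    defined_centers: frozenset[str]       # subset of the 9 centers


@dataclass(frozen=True)
class NatalChartResult:
    sun: str
    moon: str
    rising: str
    planet_houses: dict[str, int]         # planet name -> house number (1 to 12)


@dataclass(frozen=True)
class Profile:
    mbti: MBTIResult
    big_five: BigFiveResult
    enneagram: EnneagramResult
    human_design: HumanDesignResult
    natal_chart: NatalChartResult


def flatten(profile: Profile) -> dict[str, object]:
    """Flatten a profile into 'system.field' keys for downstream mapping."""
    flat: dict[str, object] = {}
    for system, result in asdict(profile).items():
        for key, value in result.items():
            flat[f"{system}.{key}"] = value
    return flat


# Invented sample values, for illustration only.
sample = Profile(
    mbti=MBTIResult("INFJ", {"EI": 0.7, "SN": 0.6, "TF": 0.55, "JP": 0.8}),
    big_five=BigFiveResult(81, 64, 35, 72, 40),
    enneagram=EnneagramResult(primary=4, wing=5, instinct="sx"),
    human_design=HumanDesignResult("Generator", "Sacral", "2/4", frozenset({"Sacral", "Throat"})),
    natal_chart=NatalChartResult("Pisces", "Virgo", "Scorpio", {"Mars": 10, "Venus": 4}),
)
flat = flatten(sample)
```

Keeping each system in its own typed record preserves the categorical versus continuous distinction until the normalization step decides how to combine them.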

The Normalization Layer

Raw outputs from five systems are incommensurable. You can't average a Big Five percentile with an Enneagram type. So we built a normalization layer that maps everything onto shared semantic dimensions.

We defined 12 internal dimensions: things like relational orientation, decision-making style, energy management, conflict response, creative expression. Each framework contributes to a subset of these dimensions through mapping functions we wrote by hand.

Example: the "conflict response" dimension pulls from Big Five Agreeableness + Neuroticism (weighted), Enneagram type (types 8, 9, and 1 map differently), MBTI's T/F axis, and Human Design authority type. The mapping isn't a formula — it's a lookup table with interpolation, built from cross-referencing the source literature of each framework.
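The "lookup table with interpolation" idea for the conflict response dimension could be sketched roughly as follows. Every anchor point, category value, and weight here is an invented placeholder, the 0 to 1 "avoid to confront" scale is an assumption, and the function names are hypothetical; the real tables come from the source literature of each framework.

```python
from bisect import bisect_right


def interpolate(table, x):
    """Piecewise-linear interpolation over sorted (x, y) anchors, clamped at both ends."""
    xs = [point[0] for point in table]
    if x <= xs[0]:
        return table[0][1]
    if x >= xs[-1]:
        return table[-1][1]
    i = bisect_right(xs, x)
    (x0, y0), (x1, y1) = table[i - 1], table[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


# Placeholder tables: 0.0 = avoids conflict, 1.0 = confronts directly.
AGREEABLENESS_TABLE = [(0, 0.9), (50, 0.5), (100, 0.1)]
NEUROTICISM_TABLE = [(0, 0.4), (50, 0.5), (100, 0.6)]
ENNEAGRAM_TABLE = {8: 0.9, 1: 0.6, 9: 0.1}            # other types fall back to 0.5
TF_TABLE = {"T": 0.65, "F": 0.35}
AUTHORITY_TABLE = {"Emotional": 0.3, "Sacral": 0.5, "Splenic": 0.7}

# Placeholder weights that sum to 1.0, so the result stays within 0 to 1.
WEIGHTS = {
    "agreeableness": 0.30,
    "neuroticism": 0.10,
    "enneagram": 0.30,
    "mbti_tf": 0.15,
    "hd_authority": 0.15,
}


def conflict_response(agreeableness, neuroticism, enneagram_type, tf_letter, hd_authority):
    """Return (weighted score, per-source contributions) for one dimension."""
    parts = {
        "agreeableness": interpolate(AGREEABLENESS_TABLE, agreeableness),
        "neuroticism": interpolate(NEUROTICISM_TABLE, neuroticism),
        "enneagram": ENNEAGRAM_TABLE.get(enneagram_type, 0.5),
        "mbti_tf": TF_TABLE[tf_letter],
        "hd_authority": AUTHORITY_TABLE.get(hd_authority, 0.5),
    }
    score = sum(WEIGHTS[name] * value for name, value in parts.items())
    return score, parts


# High Agreeableness combined with Enneagram type 8: the sources pull in opposite directions.
score, parts = conflict_response(72, 40, 8, "F", "Emotional")
```

Returning the per-source contributions alongside the score keeps the disagreement between frameworks visible instead of averaging it away.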

This gives us a 12-dimensional vector that captures what all five systems agree on, where they diverge, and where the interesting tensions sit.
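One way to get both the vector and the tensions out of the same data is to keep each framework's contribution per dimension and measure how far apart they sit. The sketch below assumes contributions are already on a shared 0 to 1 scale, uses an unweighted mean, and picks an arbitrary spread threshold; the function name, threshold, and sample numbers are all hypothetical, and only two of the 12 dimensions are shown.

```python
from statistics import mean


def summarize(contributions, tension_threshold=0.4):
    """Collapse per-framework contributions into a vector and a ranked list of tensions.

    contributions: {dimension: {framework: value in [0, 1]}}
    Returns (vector, tensions), where tensions holds
    (dimension, lowest_framework, highest_framework, spread) tuples, widest spread first.
    """
    vector = {}
    tensions = []
    for dimension, by_framework in contributions.items():
        values = list(by_framework.values())
        vector[dimension] = mean(values)
        spread = max(values) - min(values)
        if spread >= tension_threshold:
            lowest = min(by_framework, key=by_framework.get)
            highest = max(by_framework, key=by_framework.get)
            tensions.append((dimension, lowest, highest, spread))
    tensions.sort(key=lambda item: item[3], reverse=True)
    return vector, tensions


# Invented sample values, for illustration only.
example = {
    "conflict_response": {"big_five": 0.3, "enneagram": 0.9, "mbti": 0.35},
    "energy_management": {"big_five": 0.6, "human_design": 0.55},
}
vector, tensions = summarize(example)
```

In this sample only the conflict response dimension crosses the threshold, with Big Five and Enneagram at opposite ends, which is the kind of disagreement worth surfacing later in the portrait.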

Prompt Orchestration: Not One Prompt, Three

We don't send 120 data points into a single prompt and pray. The portrait generation runs in three sequential stages:

Stage 1 — Synthesis memo. The normalized 12-dimension vector goes in, along with the raw framework results. The prompt asks the model to identify the 3–5 most defining patterns and any contradictions between systems. Output: a structured memo, roughly 400 tokens. This is never shown to the user.

Stage 2 — Portrait draft. The synthesis memo goes in as context, along with a style guide and the user's name. The prompt generates the actual portrait text: ~800–1200 words, second person ("you tend to..."), organized around the patterns identified in stage 1. No bullet points, no type labels in the output. The goal is prose that reads like a letter, not a report.

Stage 3 — Coherence check. The draft goes back through a review prompt that checks for: internal contradictions, unsupported claims (statements not grounded in any of the five systems), and tonal consistency. If the check flags issues, stage 2 reruns with the flags as constraints.
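The control flow of the three stages can be sketched as below. This is a hedged outline, not the production code: `call_model` stands in for whatever LLM client is used, the prompt wording is invented, the "one flag per line, or OK" review format is an assumption, and so is the `max_attempts` cap on reruns (the description above does not say how many reruns are allowed).

```python
import json
from dataclasses import dataclass
from typing import Callable

ModelFn = Callable[[str], str]  # hypothetical: takes a prompt, returns the model's text


@dataclass
class PortraitResult:
    portrait: str
    attempts: int
    unresolved_flags: list


def build_memo_prompt(vector, raw_results):
    return (
        "STAGE 1: Identify the 3-5 most defining patterns and any contradictions "
        "between systems. Reply with a structured memo.\n"
        f"Normalized vector: {json.dumps(vector)}\n"
        f"Raw framework results: {json.dumps(raw_results)}"
    )


def build_draft_prompt(memo, style_guide, name, flags):
    constraints = "\n".join(f"- Fix: {flag}" for flag in flags) or "- none"
    return (
        f"STAGE 2: Write a second-person portrait for {name} as flowing prose, "
        "with no bullet points and no type labels.\n"
        f"Style guide: {style_guide}\nSynthesis memo: {memo}\nConstraints:\n{constraints}"
    )


def build_review_prompt(draft, raw_results):
    return (
        "STAGE 3: Check the draft for internal contradictions, unsupported claims, "
        "and tonal inconsistency. Reply with one issue per line, or OK if clean.\n"
        f"Raw framework results: {json.dumps(raw_results)}\nDraft: {draft}"
    )


def parse_flags(review_text):
    """Assumed review format: one flag per line, or the single line 'OK'."""
    lines = [line.strip() for line in review_text.splitlines() if line.strip()]
    return [] if lines == ["OK"] else lines


def generate_portrait(call_model: ModelFn, vector, raw_results, style_guide, name,
                      max_attempts=3):
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    # Stage 1: synthesis memo (internal only, never returned to the user).
    memo = call_model(build_memo_prompt(vector, raw_results))
    flags, draft, attempt = [], "", 0
    for attempt in range(1, max_attempts + 1):
        # Stage 2: draft, constrained by any flags from the previous review.
        draft = call_model(build_draft_prompt(memo, style_guide, name, flags))
        # Stage 3: coherence check.
        flags = parse_flags(call_model(build_review_prompt(draft, raw_results)))
        if not flags:
            break
    return PortraitResult(draft, attempt, flags)
```

Passing the model in as a plain callable keeps the orchestration testable with a stub, and the returned `unresolved_flags` make it explicit when the rerun budget ran out before the review came back clean.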

The three-stage approach costs more tokens than a single prompt, but the output quality difference is substantial. Single-prompt portraits tend to list framework results sequentially — "Your MBTI indicates... Your Enneagram suggests..." — instead of weaving them into a unified picture. The synthesis memo forces integration before writing begins.

Why Not Fine-Tune?

We considered fine-tuning on example portraits but decided against it for two reasons.

First, the input space is enormous. 120+ data points across five frameworks means the number of meaningfully distinct personality configurations is astronomical. A fine-tuned model would need thousands of high-quality example portraits to generalize well, and we don't have them.

Second, prompt orchestration lets us update the style and structure without retraining. When we adjusted the portrait tone from clinical to conversational (based on user feedback in the first two weeks), it was a prompt edit, not a training run. Same when we added a section on "where systems disagree about you" — that was a stage 1 prompt change that cascaded through stages 2 and 3 automatically.

What We Learned

Cross-system contradictions are the most interesting part. When Big Five says high Agreeableness but Enneagram says type 8, that tension is where the real insight lives. Early versions of the engine smoothed over contradictions. The current version highlights them explicitly, and users consistently flag these sections as the most accurate.

Astrology data is noisy but users care about it. From a pure signal perspective, MBTI + Big Five + Enneagram would cover most of the personality space. But users who complete the astrology inputs engage with the portrait 2.4x longer (measured by scroll depth and time on page). The data may be noisy, but the framing resonates.

The normalization layer is the bottleneck, not the LLM. Getting the mapping functions right took longer than any other part of the build. Each framework has its own theoretical tradition, and the mapping has to respect those traditions while finding common ground. We're still iterating on this — it's the part of the system where domain expertise matters most and automation helps least.

You can try the full flow at originofyou.com. The portrait takes about two minutes to generate after you complete all five inputs. We're curious whether the technical community finds the cross-system synthesis approach useful, or whether framework purists will object to mixing paradigms.