DEV Community

Dusty Mumphrey


I built a phenotype generator for crested gecko genetics. Here's how I modeled a hobby that can't agree on its own rules.

Crested gecko morphs are one of the most commercially significant trait systems in the reptile hobby. Serious breeders run pairings worth thousands of dollars based on genetic predictions. And the community still actively debates how many of those traits actually work.

Breeders are manually constructing phenotype strings, getting them wrong, listing animals inaccurately, and making pairing decisions on bad information. Not because they're careless. Because the species is young, the documentation is inconsistent, and for some traits, scientific consensus simply doesn't exist yet.

I'm an active crested gecko breeder. I also built Geckistry, a full breeding management platform that runs my own operation. When I got to the genetics features, I had to solve a problem most developers never encounter: how do you build a rule engine when the domain experts disagree on the rules?

Here's what I built and how it works.

Code reference for this article: github.com/Dusttoo/reptile-genetics-engine


Why This Is Harder Than It Looks

Most people who've taken a biology class know the basics of Mendelian genetics. Dominant traits show in one copy. Recessive traits need two. Plug in the parents, predict the offspring. Done.

Crested geckos are not pea plants.

The system I built has to handle six distinct dominance patterns simultaneously, plus an explicit UNKNOWN:

export type DominancePattern =
  | 'DOMINANT'
  | 'RECESSIVE'
  | 'INCOMPLETE_DOMINANT'
  | 'CO_DOMINANT'
  | 'POLYGENIC'
  | 'FIXED'
  | 'UNKNOWN'

That last one is worth stopping on. UNKNOWN is not a missing field or an error state. It's a first-class value in the enum. The system explicitly says: we don't know how this trait inherits, so we'll treat it conservatively, like a recessive, until evidence says otherwise.

That design decision came directly from the hobby. There are traits breeders have been working with for years where the inheritance mechanism is still genuinely unclear. Pretending the system knows something it doesn't would produce confident wrong answers. UNKNOWN produces honest uncertain ones.
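A minimal sketch of how that conservative fallback might look in practice. The helper names effectivePattern and isVisuallyExpressed are hypothetical, not taken from the reference repo, and the sketch only covers the patterns where copy count alone decides visibility:

```typescript
type DominancePattern =
  | 'DOMINANT'
  | 'RECESSIVE'
  | 'INCOMPLETE_DOMINANT'
  | 'CO_DOMINANT'
  | 'POLYGENIC'
  | 'FIXED'
  | 'UNKNOWN'

// Hypothetical helper: UNKNOWN is calculated as RECESSIVE until evidence says otherwise.
function effectivePattern(pattern: DominancePattern): DominancePattern {
  return pattern === 'UNKNOWN' ? 'RECESSIVE' : pattern
}

// Hypothetical helper: is the trait visible with this many copies?
// Only the copy-count-driven patterns are modeled; anything else throws.
function isVisuallyExpressed(pattern: DominancePattern, copies: 0 | 1 | 2): boolean {
  switch (effectivePattern(pattern)) {
    case 'RECESSIVE':
      return copies === 2
    case 'DOMINANT':
    case 'INCOMPLETE_DOMINANT':
    case 'CO_DOMINANT':
      return copies >= 1
    default:
      throw new Error(`${pattern} is not modeled by copy count in this sketch`)
  }
}

// A single copy of an UNKNOWN trait is treated as hidden, exactly like a recessive het.
console.log(isVisuallyExpressed('UNKNOWN', 1)) // false
console.log(isVisuallyExpressed('UNKNOWN', 2)) // true
```

Routing UNKNOWN through the recessive branch means a single copy is never reported as visible when the engine can't back that up; the animal is described as a carrier instead.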

Polygenic traits go further. For those, Mendelian math gets thrown out entirely and replaced with probability estimates grounded in observation:

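type InheritanceRates = { hom: number; het: number; absent: number }

// Illustrative wrapper (function name is hypothetical): observation-based priors, not Mendelian ratios
function polygenicRates(parentAExpresses: boolean, parentBExpresses: boolean): InheritanceRates {
  // Both parents express the trait
  if (parentAExpresses && parentBExpresses) return { hom: 0.30, het: 0.50, absent: 0.20 }

  // One parent expresses the trait
  if (parentAExpresses || parentBExpresses) return { hom: 0.10, het: 0.40, absent: 0.50 }

  // Neither parent expresses the trait: still possible
  return { hom: 0.02, het: 0.08, absent: 0.90 }
}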

These numbers are not derived from theory. They're priors built from observation, designed to be corrected as real breeding data accumulates. More on that in a moment.


The Data Model

The genetic rule system lives in a set of database tables, not in application code. This was a deliberate architectural choice. The hobby's understanding of crested gecko genetics is still evolving. Hardcoding rules means touching code every time consensus shifts. Keeping rules in the database means most changes are a data update, and the rarer structural change is a small schema migration rather than a code change.

The core tables:

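-- Enum types (allele_trait_category's values are not listed in this article)
CREATE TYPE dominance_pattern AS ENUM (
  'DOMINANT', 'RECESSIVE', 'INCOMPLETE_DOMINANT', 'CO_DOMINANT', 'POLYGENIC', 'FIXED', 'UNKNOWN'
);
CREATE TYPE allele_relationship_type AS ENUM (
  'SUPPRESSES', 'REQUIRES', 'ENHANCES', 'LETHAL_HOMOZYGOUS', 'INTERACTS_WITH'
);

CREATE TABLE alleles (
  id                UUID PRIMARY KEY DEFAULT gen_random_uuid(),  -- referenced by the tables below
  gene_locus_code   TEXT NOT NULL UNIQUE,  -- e.g., "LW", "PH", "y"
  common_name       TEXT NOT NULL,          -- e.g., "Lilly White", "Phantom", "Yellow"
  trait_category    allele_trait_category[] NOT NULL,  -- enum type defined elsewhere
  dominance_pattern dominance_pattern NOT NULL,
  allele_notations  TEXT[],                -- e.g., ["LW+", "lw"]
  notes_evidence    TEXT,
  identification_tips TEXT,
  historical_notes  TEXT
);

CREATE TABLE allele_relationships (
  source_allele_id  UUID REFERENCES alleles(id),
  target_allele_id  UUID REFERENCES alleles(id),
  relationship_type allele_relationship_type NOT NULL,
  notes             TEXT,
  UNIQUE(source_allele_id, target_allele_id, relationship_type)
);

CREATE TABLE gecko_alleles (
  gecko_id      UUID REFERENCES geckos(id),  -- geckos table not shown
  allele_id     UUID REFERENCES alleles(id),
  is_homozygous BOOLEAN DEFAULT false,
  visibility    TEXT DEFAULT 'visual',   -- breeder override: possible_het, 100%_het, not_visual, unknown
  UNIQUE(gecko_id, allele_id)
);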

The relationship_type enum is where the interaction rules live. It currently supports SUPPRESSES, REQUIRES, ENHANCES, LETHAL_HOMOZYGOUS, and INTERACTS_WITH. Adding a new relationship type as community consensus forms is a single ALTER TYPE migration, not a code change.

The visibility field on gecko_alleles is the pragmatic escape hatch. When a gecko's visual expression doesn't match what the genetics predict, a breeder can set it to possible_het, 100%_het, not_visual, or unknown. This bypasses the genetic classification logic entirely for that gecko-allele pair:

if (visibility === 'not_visual') {
  suppressedTraits.push({ name, suppressedBy: 'manual override', ... })
  continue  // Skip genetics logic entirely
}

The system knows when to get out of the way.
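A sketch of how the override could sit in front of the genetic classification, assuming 'visual' (the column default) means "let the genetics decide". The article only shows the not_visual branch; mapping possible_het and 100%_het to carrier categories, and every name below, are illustrative assumptions:

```typescript
type Visibility = 'visual' | 'possible_het' | '100%_het' | 'not_visual' | 'unknown'
type Pattern = 'DOMINANT' | 'RECESSIVE'
type Classification = 'visible' | 'carried' | 'possibly_carried' | 'suppressed' | 'unknown'

interface GeckoAlleleRow {
  name: string
  pattern: Pattern
  isHomozygous: boolean
  visibility: Visibility // 'visual' is the default and defers to genetics
}

// Hypothetical resolver: any breeder override short-circuits the genetic logic.
function classify(row: GeckoAlleleRow): Classification {
  switch (row.visibility) {
    case 'not_visual':
      return 'suppressed' // manual override, as in the snippet above
    case 'possible_het':
      return 'possibly_carried'
    case '100%_het':
      return 'carried'
    case 'unknown':
      return 'unknown'
    case 'visual':
      break // no override: defer to the genetics below
  }
  if (row.pattern === 'DOMINANT') return 'visible'
  return row.isHomozygous ? 'visible' : 'carried'
}

console.log(classify({ name: 'Phantom', pattern: 'RECESSIVE', isHomozygous: false, visibility: 'visual' }))
console.log(classify({ name: 'Harlequin', pattern: 'DOMINANT', isHomozygous: true, visibility: 'not_visual' }))
```

The first call goes through the genetics and reports a carrier; the second never reaches the genetics because the breeder has overridden it.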


The Learning Layer

The static rule tables handle what we know. A separate set of tables handles what we're learning.

Every time a breeder records what an offspring actually turned out to be, a PostgreSQL trigger fires and updates allele_cross_statistics. The confidence score on any given prediction is calculated as:

-- Confidence: 1 - 1/(1 + n/10)
-- Reaches ~0.5 at 10 samples, ~0.75 at 30 samples
confidence_score = 1.0 - (1.0 / (1.0 + total_offspring::DECIMAL / 10.0))

At zero samples, the system uses pure Mendelian math. As breeders record real offspring, observed ratios gradually displace theoretical ones. The confidence score reflects how much the prediction is grounded in actual data versus theory. The curve is a simple saturating function, algebraically n / (n + 10), rather than a sigmoid: it reaches exactly 0.5 at 10 samples and 0.75 at 30, which is a reasonable shape for a species where most breeders are working with small sample sizes.
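The article doesn't spell out exactly how observed and theoretical rates are combined, so this sketch assumes a straight confidence-weighted average. The confidence function reproduces the trigger's formula; blendRates and the sample rates are illustrative, not the engine's code:

```typescript
interface Rates { hom: number; het: number; absent: number }

// Same formula as the trigger: 1 - 1/(1 + n/10), i.e. n / (n + 10).
function confidence(totalOffspring: number): number {
  return 1 - 1 / (1 + totalOffspring / 10)
}

// Assumed blending rule: observed ratios weighted by confidence, theory by the remainder.
function blendRates(theoretical: Rates, observed: Rates, totalOffspring: number): Rates {
  const c = confidence(totalOffspring)
  return {
    hom: c * observed.hom + (1 - c) * theoretical.hom,
    het: c * observed.het + (1 - c) * theoretical.het,
    absent: c * observed.absent + (1 - c) * theoretical.absent,
  }
}

const mendelian: Rates = { hom: 0.25, het: 0.5, absent: 0.25 } // het x het cross
const observed: Rates = { hom: 0.2, het: 0.6, absent: 0.2 }     // illustrative recorded ratios

console.log(confidence(0))  // 0: pure theory
console.log(confidence(10)) // 0.5
console.log(confidence(30)) // 0.75
console.log(blendRates(mendelian, observed, 10)) // halfway between theory and observation
```

Because both inputs sum to 1 and the weights sum to 1, the blended rates still sum to 1 at every sample size.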


Three Cases That Show the Depth

Fatal Combinations: Lilly White

Lilly White is one of the most popular crested gecko morphs. It's also one of the most important to handle correctly: pairing two Lilly Whites is expected to produce homozygous offspring about a quarter of the time, and the homozygous form is lethal. Double Lilly White animals do not survive.

The rule is encoded as a LETHAL_HOMOZYGOUS relationship in allele_relationships. When the prediction engine encounters a pairing where both parents carry Lilly White, it checks for this relationship and surfaces a warning:

if (blendedRates.hom > 0) {
  const lethalRels = relationships.filter(
    r => r.relationship_type === 'LETHAL_HOMOZYGOUS' &&
         (r.source_allele_id === alleleId || r.target_allele_id === alleleId)
  )
  for (const rel of lethalRels) {
    warnings.push(
      `Homozygous ${alleleName} is lethal.${rel.notes ? ` ${rel.notes}` : ''} ` +
      `Offspring showing HOM for this allele will not survive.`
    )
  }
}

The warning is non-blocking by design. It lands in a warnings: string[] array in the result rather than throwing an error. The system does not block a pairing that can produce lethal offspring, or even an animal record that couldn't survive. It states the consequence clearly and lets the breeder decide. That's the right call for a tool used by people who know the domain.
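To make the `hom > 0` condition concrete, here is a self-contained single-locus sketch. It is not the engine's code: crossLocus and its parameters are hypothetical, and it computes plain Mendelian rates with no observed-data blending. A Lilly White × Lilly White pairing yields a 25% homozygous rate, which is what trips the warning:

```typescript
type Genotype = 0 | 1 | 2 // copies of the allele: absent, het, hom

interface CrossResult { hom: number; het: number; absent: number; warnings: string[] }

// Hypothetical single-locus Punnett square. Each parent passes the allele on
// with probability copies / 2; offspring rates are products of the two gametes.
function crossLocus(a: Genotype, b: Genotype, alleleName: string, lethalHomozygous: boolean): CrossResult {
  const pA = a / 2
  const pB = b / 2
  const hom = pA * pB
  const het = pA * (1 - pB) + (1 - pA) * pB
  const absent = (1 - pA) * (1 - pB)
  const warnings: string[] = []
  // Non-blocking: the pairing is still evaluated; the consequence is reported.
  if (lethalHomozygous && hom > 0) {
    warnings.push(`Homozygous ${alleleName} is lethal; about ${hom * 100}% of embryos from this pairing are expected to be homozygous.`)
  }
  return { hom, het, absent, warnings }
}

const lwPairing = crossLocus(1, 1, 'Lilly White', true)
console.log(lwPairing.hom)      // 0.25
console.log(lwPairing.warnings) // one lethal warning

const lwToNormal = crossLocus(1, 0, 'Lilly White', true)
console.log(lwToNormal.warnings) // []
```

Pairing a Lilly White to a non-Lilly White gives a homozygous rate of zero, so no warning is raised, which is why that is the usual way to work with the morph.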

True Allelic Traits: Sable and Cappuccino

Sable and Cappuccino are two distinct crested gecko morphs that occupy the same gene locus. An animal can carry one or the other, but not both. In classical genetics, these are called allelic variants of the same locus.

The current data model doesn't enforce a hard shared-locus constraint. Each allele has its own row. The mechanism for "can't have both" is a SUPPRESSES relationship in allele_relationships. The name engine checks for this relationship when both alleles are present on the same gecko:

const presentAlleleIds = new Set(safeAlleles.map(a => a.allele_id))

for (const rel of relationships) {
  if (!presentAlleleIds.has(rel.source_allele_id)) continue
  if (!presentAlleleIds.has(rel.target_allele_id)) continue
  if (rel.relationship_type === 'SUPPRESSES') {
    suppressionMap.set(rel.target_allele_id, sourceName)
  }
}

The suppressed allele moves to suppressedTraits rather than appearing in the phenotype name. This is a reasonable approximation for "one masks the other" and it correctly handles the practical outcome for breeders, even without a hard locus constraint at the data layer. A future schema iteration could enforce this more strictly. The current model gets the phenotype output right, which is what matters for now.
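For readers who want to run the idea end to end, here is a self-contained version of that check. It also resolves sourceName, which the excerpt above uses without showing where it comes from. The type and function names are hypothetical, and which of Sable and Cappuccino suppresses the other is an arbitrary choice for the example:

```typescript
interface AlleleOnGecko { alleleId: string; name: string }

interface Relationship {
  source_allele_id: string
  target_allele_id: string
  relationship_type: 'SUPPRESSES' | 'REQUIRES' | 'ENHANCES' | 'LETHAL_HOMOZYGOUS' | 'INTERACTS_WITH'
}

// Hypothetical resolver: split a gecko's alleles into named vs suppressed traits.
function resolveSuppression(alleles: AlleleOnGecko[], relationships: Relationship[]) {
  const present = new Map(alleles.map(a => [a.alleleId, a.name] as const))
  const suppressedBy = new Map<string, string>() // target allele id -> suppressor name

  for (const rel of relationships) {
    if (rel.relationship_type !== 'SUPPRESSES') continue
    const sourceName = present.get(rel.source_allele_id)
    // The rule only applies when both alleles are on this gecko
    if (sourceName === undefined || !present.has(rel.target_allele_id)) continue
    suppressedBy.set(rel.target_allele_id, sourceName)
  }

  const named = alleles.filter(a => !suppressedBy.has(a.alleleId)).map(a => a.name)
  const suppressedTraits = alleles
    .filter(a => suppressedBy.has(a.alleleId))
    .map(a => ({ name: a.name, suppressedBy: suppressedBy.get(a.alleleId)! }))
  return { named, suppressedTraits }
}

// Direction of suppression chosen arbitrarily for illustration
const rules: Relationship[] = [
  { source_allele_id: 'sable', target_allele_id: 'cappuccino', relationship_type: 'SUPPRESSES' },
]
const result = resolveSuppression(
  [
    { alleleId: 'sable', name: 'Sable' },
    { alleleId: 'cappuccino', name: 'Cappuccino' },
    { alleleId: 'harlequin', name: 'Harlequin' },
  ],
  rules,
)
console.log(result.named)            // [ 'Sable', 'Harlequin' ]
console.log(result.suppressedTraits) // Cappuccino, suppressed by Sable
```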

Polygenic Traits: Where the Hobby Runs Out of Answers

Polygenic traits are the genuinely messy case, and the system handles them by not pretending otherwise.

When dominancePattern === 'POLYGENIC', the inheritance rate calculator switches to a completely different function. And in the phenotype name, polygenic traits get soft language that reflects the actual state of knowledge:

case 'POLYGENIC':
  return {
    type: 'visible',
    trait: {
      inheritanceLabel: isHom ? 'Strong' : 'Present',
      genotypeNotation: null,  // No HOM/HET notation for polygenic traits
    }
  }

There's no PH/PH-style genotype notation for polygenic traits. The output describes how strongly the trait expresses, not how many copies the animal carries, because copy count isn't a meaningful measure for polygenic inheritance given where the hobby's understanding currently sits. The system outputs "Strong" or "Present" rather than inventing precision it doesn't have.


Input and Output

Here's a concrete example of what the engine produces. The input is a set of gecko_alleles records:

| Allele | Code | Pattern | Homozygous |
|---|---|---|---|
| Yellow | y | DOMINANT | false |
| Harlequin | HR | DOMINANT | true |
| Empty Back | EB | RECESSIVE | false |
| Phantom | PH | RECESSIVE | false |

What each allele resolves to:

  • y/+ is dominant and visible. It becomes "Yellow Based" in the phenotype name.
  • HR/HR is dominant and homozygous. It becomes "Harlequin." Dominant traits don't use HOM/HET labels.
  • EB/+ is recessive and heterozygous. It's not visible but gets carried as "Het Empty Back."
  • PH/+ is recessive and heterozygous. Same treatment: "Het Phantom."

Output:

phenotypeName:    "Yellow Based Harlequin"
genotypeNotation: "EB/+ PH/+"
carriedTraits:    ["Het Empty Back (EB/+)", "Het Phantom (PH/+)"]
visibleTraits:    ["Yellow (BASE_COLOR)", "Harlequin (PATTERN_MODIFIER, Homozygous)"]
warnings:         []

Now add Lilly White as a heterozygote (LW/+, INCOMPLETE_DOMINANT):

phenotypeName:    "Yellow Based Lilly White Harlequin"
genotypeNotation: "LW/+ EB/+ PH/+"
visibleTraits:    [..., "Lilly White (SPECIAL_TRAIT, Het)"]

Change Lilly White to homozygous (LW/LW):

phenotypeName:    "Yellow Based Super Lilly White Harlequin"
inheritanceLabel: "Super"
warnings:         ["Lilly White is lethal when homozygous (Super form)"]
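The three outputs above can be reproduced with a deliberately simplified sketch. It is not the engine's implementation: buildPhenotype, the AlleleInput shape, and the name-segment order (base color, then special traits, then pattern modifiers) are assumptions inferred from these examples, and it ignores suppression, polygenic traits, and breeder overrides:

```typescript
type Pattern = 'DOMINANT' | 'RECESSIVE' | 'INCOMPLETE_DOMINANT'
type Category = 'BASE_COLOR' | 'SPECIAL_TRAIT' | 'PATTERN_MODIFIER'

interface AlleleInput {
  name: string
  code: string
  pattern: Pattern
  isHomozygous: boolean
  category?: Category        // only needed for visible traits
  lethalHomozygous?: boolean // stands in for a LETHAL_HOMOZYGOUS relationship
}

// Assumed order of name segments, inferred from the examples above.
const CATEGORY_ORDER: Category[] = ['BASE_COLOR', 'SPECIAL_TRAIT', 'PATTERN_MODIFIER']

function buildPhenotype(alleles: AlleleInput[]) {
  const isVisible = (a: AlleleInput) => a.pattern !== 'RECESSIVE' || a.isHomozygous
  const rank = (a: AlleleInput) =>
    a.category === undefined ? CATEGORY_ORDER.length : CATEGORY_ORDER.indexOf(a.category)
  const visible = alleles.filter(isVisible).sort((x, y) => rank(x) - rank(y))
  const carried = alleles.filter(a => !isVisible(a))

  const nameParts: string[] = []
  const notation: string[] = []
  const warnings: string[] = []
  for (const a of visible) {
    if (a.pattern === 'INCOMPLETE_DOMINANT') {
      // Incomplete dominants show zygosity in both the name and the notation
      nameParts.push(a.isHomozygous ? `Super ${a.name}` : a.name)
      notation.push(a.isHomozygous ? `${a.code}/${a.code}` : `${a.code}/+`)
      if (a.isHomozygous && a.lethalHomozygous) {
        warnings.push(`${a.name} is lethal when homozygous (Super form)`)
      }
    } else {
      // Dominant (and homozygous recessive) traits are named without HOM/HET labels
      nameParts.push(a.category === 'BASE_COLOR' ? `${a.name} Based` : a.name)
    }
  }
  // Hidden recessive hets are carried: notation only, not part of the name
  for (const a of carried) notation.push(`${a.code}/+`)

  return {
    phenotypeName: nameParts.join(' '),
    genotypeNotation: notation.join(' '),
    carriedTraits: carried.map(a => `Het ${a.name} (${a.code}/+)`),
    warnings,
  }
}

const base: AlleleInput[] = [
  { name: 'Yellow', code: 'y', pattern: 'DOMINANT', isHomozygous: false, category: 'BASE_COLOR' },
  { name: 'Harlequin', code: 'HR', pattern: 'DOMINANT', isHomozygous: true, category: 'PATTERN_MODIFIER' },
  { name: 'Empty Back', code: 'EB', pattern: 'RECESSIVE', isHomozygous: false },
  { name: 'Phantom', code: 'PH', pattern: 'RECESSIVE', isHomozygous: false },
]
const lilly = (isHomozygous: boolean): AlleleInput => ({
  name: 'Lilly White', code: 'LW', pattern: 'INCOMPLETE_DOMINANT',
  isHomozygous, category: 'SPECIAL_TRAIT', lethalHomozygous: true,
})

console.log(buildPhenotype(base).phenotypeName)                    // Yellow Based Harlequin
console.log(buildPhenotype([...base, lilly(false)]).phenotypeName) // Yellow Based Lilly White Harlequin
console.log(buildPhenotype([...base, lilly(true)]).warnings)
```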

The System Also Runs in Reverse

The phenotype name engine goes from genotype to phenotype string. There's a companion system that goes the other direction.

phenotype-inference.ts takes what a breeder observes and works backward to infer genotype. It generates questions structured by dominance pattern: if a recessive trait is visible, the animal must be homozygous and the system says so with high confidence. If a dominant trait is absent, the animal does not carry it, and the system can say so with certainty. If a polygenic trait is visible, zygosity is genuinely hard to determine and the system says that too.
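A rough sketch of those three rules as a lookup. The confidence levels are coarse labels chosen for illustration, not the inference module's actual scoring, and inferFromObservation is a hypothetical name:

```typescript
type Pattern = 'DOMINANT' | 'RECESSIVE' | 'POLYGENIC'
type Confidence = 'certain' | 'high' | 'low'

interface Inference { conclusion: string; confidence: Confidence }

// Hypothetical reverse rules, one per case described above.
function inferFromObservation(pattern: Pattern, visible: boolean): Inference {
  if (pattern === 'RECESSIVE' && visible) {
    // A visible recessive needs two copies
    return { conclusion: 'homozygous', confidence: 'high' }
  }
  if (pattern === 'DOMINANT' && !visible) {
    // A single copy of a dominant trait would show, so none are present
    return { conclusion: 'does not carry the allele', confidence: 'certain' }
  }
  if (pattern === 'POLYGENIC' && visible) {
    return { conclusion: 'zygosity undetermined', confidence: 'low' }
  }
  // A visible dominant could be het or hom; an absent recessive could be het or absent.
  // These need pedigree data or test breeding.
  return { conclusion: 'needs more information', confidence: 'low' }
}

console.log(inferFromObservation('RECESSIVE', true))
console.log(inferFromObservation('DOMINANT', false))
```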

It also maintains a list of het markers: visual characteristics that indicate carrier status even when the underlying trait isn't expressed. In crested geckos, only one has been proven:

const KNOWN_HET_MARKERS = [
  {
    characteristicName: 'Blush',
    description: 'Red/rosy colored cheeks: proven indicator of het red base (r/+). ' +
                 'One of the few reliable het markers in the species.',
    associatedAlleleCodes: ['r'],
    isProvenMarker: true,
  }
]

Keeping the list intentionally small, and using the isProvenMarker flag to separate proven markers from suspected ones, are design decisions worth naming. The system is honest about what the hobby doesn't know yet.
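One way the proven/suspected distinction could carry through to a breeder-facing suggestion, as a sketch: HetSuggestion and hetSuggestions are hypothetical, and the marker entry is trimmed to the fields used here. A suspected marker would come out labeled as such rather than being dropped or promoted to proof:

```typescript
interface HetMarker {
  characteristicName: string
  associatedAlleleCodes: string[]
  isProvenMarker: boolean
}

interface HetSuggestion {
  notation: string
  basis: string
  strength: 'proven' | 'suspected'
}

const KNOWN_HET_MARKERS: HetMarker[] = [
  { characteristicName: 'Blush', associatedAlleleCodes: ['r'], isProvenMarker: true },
]

// Hypothetical: turn observed characteristics into het suggestions,
// keeping the proven/suspected label attached to each one.
function hetSuggestions(observed: string[], markers: HetMarker[]): HetSuggestion[] {
  const seen = new Set(observed.map(s => s.toLowerCase()))
  const out: HetSuggestion[] = []
  for (const m of markers) {
    if (!seen.has(m.characteristicName.toLowerCase())) continue
    for (const code of m.associatedAlleleCodes) {
      out.push({
        notation: `${code}/+`,
        basis: m.characteristicName,
        strength: m.isProvenMarker ? 'proven' : 'suspected',
      })
    }
  }
  return out
}

// Observations with no matching marker are simply ignored
console.log(hetSuggestions(['blush', 'dalmatian spots'], KNOWN_HET_MARKERS)) // one proven r/+ suggestion
```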


What It Unlocked

Before this system, constructing an accurate phenotype string meant holding the full rule set in your head. You'd have to remember the notation conventions, know which traits are dominant versus recessive, correctly apply every interaction, and not make a mistake across 50 animals with a dozen trait combinations each. I built this for my own collection first, and it immediately changed how I manage and list animals in Geckistry.

Right now the generator is live in Geckistry for my personal use. I'm in the process of bringing the same logic into ReptiDex, where it will be available to the broader keeper community. I'm also expanding it to more species with help from experts in other communities. Crested geckos have an unusually complex and well-documented morph library, which made them the right place to build and prove the system. But the architecture was designed from the start to support any species where the genetics are well enough understood to encode.

The deeper payoff as it scales is the learning layer. Every clutch outcome a breeder records makes the system's predictions more accurate. Over time it becomes a live dataset of observed genetics across species, correcting its own priors as the hobby's understanding grows.

The full code reference for this system, including the schema, resolution logic, and confidence formula, is available on GitHub at reptile-genetics-engine.

I run Built By Dusty, a software studio that builds custom applications for small businesses and animal breeders. If you're working on a domain with complex, contested rule systems, or you're a breeder who wants tooling like this for your own operation, I'd like to hear from you.
