Hallmark: Stop AI-Generated UI Slop in One Command in 2026
TL;DR Summary
- AI coding agents default to the same predictable UI (Inter font, purple gradient, nested cards) because they were trained on the same templates
- Hallmark is a 1.8k-star, MIT-licensed design skill that gives AI agents actual design taste through 4 verbs (Build, Audit, Redesign, Study) and 22 unique themes
- Install in one line: npx skills add nutlope/hallmark (works with Claude Code, Cursor, and Codex)
- Every output runs through 65 slop-test gates plus a pre-emit self-critique; if an anti-pattern is detected, it regenerates
- Made by Together AI. Two pages for two different briefs feel like different sites, not color-swaps of the same template
Direct Answer Block
Hallmark is an open-source design skill (MIT license, 1.8k stars) that teaches AI coding agents to avoid the generic "AI slop" aesthetic — Inter font, purple gradients, nested cards. It provides four verbs: Build (generate unique UI from a brief), Audit (score existing code against 65 anti-patterns), Redesign (rebuild visual structure while keeping content), and Study (extract design DNA from screenshots or URLs).
Introduction
Every AI coding agent produces the same website. Inter font. Purple gradient hero. Three nested feature cards. Cookie-cutter testimonial section. It's not the model's fault — it's the training data. LLMs learned design from the same templates, the same Tailwind examples, the same "modern SaaS landing page" boilerplate. Hallmark breaks that default. It's a skill file — a behavioral constraint, not a library — that forces the agent through design-quality gates before output. One command installs it. Four verbs control it. Twenty-two themes style it. The result is genuine visual variety from the same AI.
Why do all AI-generated UIs look the same — and what is "AI slop" in design?
"AI slop" in UI design is not about quality — it's about uniformity. The generated interfaces are technically competent but visually indistinguishable. The Hallmark README identifies the source of this problem precisely: LLMs were trained on the same templates, the same component libraries, the same Tailwind examples. The "on-distribution defaults" produce a convergent aesthetic — every generation gravitates toward the same patterns.
The specific anti-patterns are consistent across agents:
- Inter font as the default typeface (overwhelmingly represented in training data)
- Purple-to-blue gradients in hero sections (the most common SaaS template trope)
- Nested card layouts with icons on top, heading, description (the default component pattern)
- AI-default box-shadows and spacing values (consistent CSS defaults)
- Predictable information architecture (hero → features → testimonials → CTA)
The Hallmark approach: "Hallmark picks a macrostructure for the brief, dresses it in one of twenty-two themes, runs sixty-five slop-test gates plus a pre-emit self-critique, and refuses the on-distribution defaults every LLM was trained into."
Hallmark was created by Together AI and is explicitly described as an "anti-AI-slop design skill." It's not a UI library or a component system — it's a behavioral instruction file (SKILL.md) that constrains the agent's design choices.
How do you install Hallmark with one command — and how does it work across Claude Code, Cursor, and Codex?
Installation is a single command:
npx skills add nutlope/hallmark
Re-run to update. The skill installs as a behavioral rule that your coding agent references during design generation. For manual installation or non-npx environments, the README provides paths for each agent:
| Agent | Install Path |
|---|---|
| Claude Code | ~/.claude/skills/hallmark/ (copy SKILL.md + references/) |
| Cursor | .cursor/rules/hallmark.mdc (body of SKILL.md, no frontmatter) |
| Codex | ~/.codex/skills/hallmark/ (personal) or .codex/skills/hallmark/ (project) |
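The Cursor path asks for the body of SKILL.md without its frontmatter. A minimal Python sketch of that manual step follows. It assumes the frontmatter is a standard block delimited by `---` lines at the top of the file; the helper names `strip_frontmatter` and `install_for_cursor` are hypothetical and not part of Hallmark.

```python
from pathlib import Path


def strip_frontmatter(text: str) -> str:
    """Return text without a leading '---' delimited frontmatter block, if present."""
    lines = text.splitlines(keepends=True)
    if not lines or lines[0].strip() != "---":
        return text  # no frontmatter: leave the body untouched
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            # Drop everything up to and including the closing delimiter.
            return "".join(lines[i + 1:]).lstrip("\n")
    return text  # unterminated frontmatter: be conservative and change nothing


def install_for_cursor(skill_md: Path, project_root: Path) -> Path:
    """Write the body of SKILL.md to .cursor/rules/hallmark.mdc under project_root."""
    target = project_root / ".cursor" / "rules" / "hallmark.mdc"
    target.parent.mkdir(parents=True, exist_ok=True)
    body = strip_frontmatter(skill_md.read_text(encoding="utf-8"))
    target.write_text(body, encoding="utf-8")
    return target
```

Calling `install_for_cursor(Path("SKILL.md"), Path("."))` from a project root would write the rule file in place. The Claude Code and Codex paths need no stripping, only a copy of SKILL.md and references/.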
The mechanism is a SKILL.md file containing behavioral directives and anti-pattern rules. When the agent generates UI code, Hallmark's constraints are active in the agent's context — the agent sees the design rules alongside your prompt and adjusts its output accordingly.
Unlike a component library (which provides pre-built components) or a CSS framework (which provides utility classes), Hallmark works at the instruction level. It doesn't add code to your project; it changes what code the agent generates. This makes it compatible with any stack or framework, and with any agent that can load a skill or rules file.
How do the four verbs (Build, Audit, Redesign, Study) solve different stages of the design problem?
Hallmark's four verbs map to distinct stages of the design process:
Build (default)
Generates new UI from a brief. Picks a macrostructure appropriate for the content type, applies one of 22 themes, runs the 65 slop-test gates, and returns validated output. This is the default verb — no prefix needed.
Example briefs from the README's gallery: "SaaS product page" (gets modern-minimal theme), "Travel booking site" (gets atmospheric theme), "Coffee subscription" (gets bold, earthy theme). Same brief structure, different visual DNA.
Audit (hallmark audit <target>)
Scores existing code against the 65 anti-patterns. Produces a punch list and makes no edits. Use it on AI-generated UIs you've already built to check them for generic design patterns.
The audit output flags specific violations: "Purple gradient detected (pattern #12)", "Inter font — try pairing (#3)", "Nested cards — generic AI pattern (#18)". Each flag includes severity and the specific anti-pattern it matches.
Redesign (hallmark redesign <target>)
Throws out the visual structure but preserves the content (copy, information architecture, brand elements). Rebuilds with a different macrostructure and theme while keeping the semantic elements intact. This is for refreshing an existing UI without rewriting content.
Study (hallmark study <screenshot | URL>)
Extracts the design DNA from a source you admire. It identifies three elements: macrostructure (the page's layout pattern), type-pairing (font combinations), and color anchor (the dominant color scheme). It refuses pixel-clones and paid templates, and can optionally emit a portable design.md file to hand to any AI tool, so the result is a description of the design rather than a copy of it.
"Study extracts the DNA from a design you admire — macrostructure, type-pairing, colour anchor. Refuses pixel-clones and paid templates. Optionally emits a portable design.md for handoff to other AI tools." — Hallmark README
How do the 22 themes and 65 slop-test gates prevent generic output without sacrificing speed?
The 22 themes provide structural variety. From the README's gallery: "modern-minimal" (Tally SaaS), "atmospheric" (Wayfare travel), "playful" (BananaStudio), "editorial" (Anya Reis portfolio), "fashion-brand" (NAJM), "ceramics-studio" (Søroe), "dev-infrastructure" (Hyperlane). Each theme implies different typography pairings, color palettes, spacing rhythms, and component treatments.
The 65 slop-test gates are specific anti-pattern checks that run before output. The README describes these as quality assurance: "runs sixty-five slop-test gates plus a pre-emit self-critique." If a generated design triggers an anti-pattern (purple gradient, Inter-only fonts, cookie-cutter card layout), Hallmark regenerates that portion before the user sees it.
The self-critique is the final layer: before handing back output, the agent reviews its own work against Hallmark's rules and identifies anything that looks generic. This catches patterns the gate system might miss — edge cases, novel anti-patterns, or combinations of otherwise-acceptable elements that together produce a generic result.
The README emphasizes that this process doesn't meaningfully slow down generation: the gates are binary checks (pattern matches, not LLM calls), and the self-critique is a single review pass.
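To make the "binary check, then regenerate" idea concrete, here is a small Python sketch. It is an illustration only: Hallmark is an instruction file, so this is not its code. The gate names, the regexes, and the `generate` callback are all hypothetical stand-ins.

```python
import re
from typing import Callable

# Hypothetical gates: each name maps to a regex that signals an anti-pattern.
GATES = {
    "inter-font": re.compile(r"font-family\s*:\s*['\"]?Inter\b", re.IGNORECASE),
    "purple-gradient": re.compile(
        r"linear-gradient\([^)]*(purple|violet|#8b5cf6|#7c3aed)", re.IGNORECASE
    ),
}


def failed_gates(css: str) -> list[str]:
    """Return the names of all gates whose pattern appears in the CSS."""
    return [name for name, pattern in GATES.items() if pattern.search(css)]


def generate_until_clean(
    generate: Callable[[list[str]], str], max_attempts: int = 3
) -> str:
    """Call generate(), feeding back failed gate names, until every gate passes."""
    failures: list[str] = []
    for _ in range(max_attempts):
        css = generate(failures)
        failures = failed_gates(css)
        if not failures:
            return css
    raise RuntimeError(f"still failing after {max_attempts} attempts: {failures}")
```

Each gate here is a plain pattern match, which is why checks of this kind add little overhead compared with the generation step itself.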
How does Study mode extract design DNA from a screenshot or URL — and produce a portable design.md?
Study mode is Hallmark's most innovative verb. It addresses a specific problem: "I like how that site looks. Make mine look like that." Without Study, the agent either copies the design (pixel-clone, which Hallmark refuses) or produces something unrelated.
The Study workflow extracts three dimensions of design DNA:
Macrostructure: The page's layout pattern — hero layout, content flow, section ordering, navigation style. This is not the visual styling but the structural skeleton.
Type-pairing: Font combinations used on the source design — heading font, body font, accent font, and the typographic hierarchy (sizes, weights, spacing).
Color anchor: The dominant color scheme — primary, secondary, accent, background, and text colors extracted from the source, not color-picked exactly but analyzed for the palette's intent (warm/cool, saturated/muted, high/low contrast).
Study can optionally write this out as a portable design.md file: plain markdown describing the extracted design DNA, tool-agnostic and handoff-ready. The file can be dropped into any project and used by any AI coding agent, not just Hallmark-enabled ones.
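The README does not specify the layout of design.md, so the following Python sketch shows only one plausible shape for such a file. The `DesignDNA` class, the `render_design_md` function, and the heading names are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class DesignDNA:
    """Hypothetical container for the three dimensions Study extracts."""
    macrostructure: str
    type_pairing: str
    color_anchor: str


def render_design_md(source: str, dna: DesignDNA) -> str:
    """Render the extracted DNA as plain markdown (illustrative layout only)."""
    return "\n".join([
        f"# Design DNA: {source}",
        "",
        "## Macrostructure",
        dna.macrostructure,
        "",
        "## Type-pairing",
        dna.type_pairing,
        "",
        "## Color anchor",
        dna.color_anchor,
        "",
    ])
```

Because the result is ordinary markdown, any agent that can read a project file could consume it without knowing about Hallmark.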
The README's key constraint: Hallmark "refuses pixel-clones and paid templates." If the source is a commercial template or the extraction would produce a too-close copy, Hallmark declines and suggests alternative approaches.
How does Audit mode score existing AI-generated UI against 65 anti-patterns and produce a punch list?
Audit mode (hallmark audit <target>) is Hallmark's code review verb for design. It reads existing HTML/CSS/JSX code and runs it through the same 65 slop-test gates used during generation.
The output is a punch list — a structured report of detected anti-patterns with:
- Pattern ID: which of the 65 anti-patterns was triggered
- Severity: how much the violation impacts the generic appearance
- Location: where in the code the violation occurs
- Suggestion: what to use instead (e.g., "Inter font → try pairing with a display font for headings")
The README sums up Audit as: "Score existing code against the anti-patterns. Punch list, no edits." It is a read-only analysis: it tells you what's wrong without changing anything.
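The four fields above map naturally onto a small record type. This Python sketch shows one way a punch list could be modeled and printed. The `Finding` class, the three-level severity scale, and the output format are assumptions, not Hallmark's actual report format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """One punch-list entry (hypothetical shape)."""
    pattern_id: int   # which of the 65 anti-patterns was triggered
    severity: str     # assumed scale: "high", "medium", "low"
    location: str     # where in the code the violation occurs
    suggestion: str   # what to use instead


SEVERITY_ORDER = {"high": 0, "medium": 1, "low": 2}


def format_punch_list(findings: list[Finding]) -> str:
    """Return one line per finding, most severe first, then by pattern ID."""
    ordered = sorted(
        findings, key=lambda f: (SEVERITY_ORDER[f.severity], f.pattern_id)
    )
    return "\n".join(
        f"[{f.severity}] #{f.pattern_id} at {f.location}: {f.suggestion}"
        for f in ordered
    )
```

Sorting by severity first keeps the report read-only while still pointing at what to fix first.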
This is valuable for teams that have already generated UI code with AI agents and want to check for design uniformity before shipping. It's also useful for evaluating different AI agents' design output — run the same brief through Claude Code, Cursor, and Codex, then run Hallmark Audit on each to see which agent produces the most varied output.
Frequently Asked Questions
Q: Does Hallmark work with any tech stack?
Yes. Hallmark is a behavioral skill file, not a library. It doesn't add code to your project — it changes what the AI agent generates. It works with any stack (React, Vue, vanilla HTML, Next.js, etc.) because the agent generates stack-appropriate code that follows Hallmark's design constraints.
Q: Can I create my own theme?
The 22 themes are defined in Hallmark's references/ directory. Since the project is MIT-licensed and open source, you can fork it and add your own themes. A theme defines typography pairings, color palettes, spacing rhythms, and component preferences — all in plain skill-instruction format.
Q: Does Hallmark slow down code generation?
Minimally. The 65 slop-test gates are pattern-matching checks, not additional LLM calls. The pre-emit self-critique is a single review pass. The README doesn't report specific latency numbers but indicates the process is designed to be fast.
Q: How is Hallmark different from using a design system or component library?
Design systems and component libraries provide pre-built components with consistent styling. Hallmark changes what the AI agent generates by constraining its behavior at the instruction level. You can use Hallmark alongside a design system — the design system handles consistency, Hallmark handles distinctiveness.
Q: Can I use Hallmark for non-web UIs (mobile, desktop)?
Hallmark's 65 anti-patterns and themes are designed for web UIs. The Study verb extracts web-agnostic design DNA (type-pairing, color anchor) that could inform any platform, but the generation and audit verbs target HTML/CSS output.
Q: Who made Hallmark?
Hallmark was created by Together AI, the company behind the inference platform of the same name and a series of open-source model releases. It's maintained on GitHub in the nutlope/hallmark repository, which has 115 commits and is under active development.
Glossary
- AI slop: Uniform, generic output from AI models that conforms to the most common patterns in training data — in UI design, characterized by Inter font, purple gradients, and nested card layouts
- Skill file: A behavioral instruction file (typically SKILL.md) that AI coding agents read to constrain their behavior; it tells the agent how to do something, not what to build
- Macrostructure: A page's layout skeleton, meaning the structural pattern of sections (hero, features, testimonials, CTA) independent of visual styling
- Slop-test gate: A binary anti-pattern check that runs before output — if a pattern is detected (e.g., purple gradient), the output regenerates
- Design DNA: The extracted essence of a design's visual identity — macrostructure, type-pairing, and color anchor — abstracted from a specific implementation
- Pre-emit self-critique: A final review pass where the AI agent evaluates its own output against design rules before presenting it to the user
Author
Ramsis Hammadi, AI/ML engineer specializing in GenAI, LLM engineering, and automation.