Fernanda Graciolli

Rubric: Exploring a DSL that plays nice with AI

I've been building prototypes with AI for a while now (think GPT-3 days, not vibe coding days). AI is powerful at spinning up functional, aesthetically pleasing apps really fast. On a good day, I can write my first prompt at 9am and have a one- or two-feature web app deployed by dinner.

Sounds amazing, right? Well...

Now, let's say I add a third or fourth feature, or modify some UI components, or better yet, introduce a third-party API — all hell breaks loose.

What I had to admit was that just because an application looks nice and works in the browser doesn't mean the underlying code is any good. In fact, the underlying code, when written mostly by LLMs, is reliably unmaintainable. This becomes painfully evident when you try to extend, modify, debug, or refactor it.

So, what's the solution?

Structure.

LLMs have entirely too many options and degrees of freedom when implementing a solution.

People have proposed fixes: detailed PRDs, extensive AI rules, feeding documentation beforehand, etc (and I do mean many etcs).

My issue with these approaches is that they feel like duct-tape. And everyone seems to be finding their own "best practices." Which is fine, to an extent.

Personally, I want a unified solution that obviates all the one-off, ad-hoc quick-fixes that require a ton of upfront work.

Rubric (an LLM-native language)

A disclaimer

Before I go on, I want to caveat that Rubric is in its earliest days. I'm writing about it to get feedback, ideas, and criticism.

Written for AI consumption, readable by the average human

The idea behind Rubric is that it's written for AI consumption, and can be easily understood and modified by humans (even not-so-technical humans).

To start experimenting with it, I constrained it to UI component generation (a personal frustration of mine and a low-hanging, easily testable fruit).


The Technical Details

The rubric system uses four main files. The AI reads them as documentation and applies their requirements during generation. No parsing, compilation, or training needed - it's just structured text the AI can understand and humans can easily modify.


Experiments (Full Repo)

I ran a simple experiment using Cursor with Claude-4-Sonnet.

Generate the same product card component using two approaches:

Approach 1: Just a prompt

Create a vanilla html/css/js (3 separate files) product card with image, description, and button. Use accessibility, performance, and security best practices.

Approach 2: Prompt + Rubric

Create a vanilla html/css/js (3 separate files) product card with image, description, and button. use /.rubric/*

For each approach, I let the AI generate 5 iterations to see what patterns emerged.

You can find the full results in the repo, but here's the gist:

Code Quality Patterns

Looking at the generated code, some clear patterns emerged:

Accessibility: Both approaches included accessibility features, but the rubric version was more systematic. The prompt-only version had good accessibility but inconsistent/spotty implementation. The rubric version had standardized ARIA patterns, consistent focus management, and proper semantic structure across all iterations.

CSS & Classes: Rubric asks for a tokens.css file and BEM methodology, which can then (theoretically) be used to keep the design consistent and maintainable across the entire app.
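For context, tokens.css holds shared design values as CSS custom properties that every component pulls from. Here is a minimal sketch of the idea; the token names and values are my own illustration, not the (much larger) file the AI actually generated:

/* tokens.css - illustrative sketch only */
:root {
  /* Color tokens */
  --color-surface: #ffffff;
  --color-text: #1a1a1a;
  --color-accent: #2563eb;

  /* Spacing tokens */
  --space-sm: 0.5rem;
  --space-md: 1rem;

  /* Typography tokens */
  --font-size-body: 1rem;
  --font-size-heading: 1.25rem;

  /* Shape and elevation tokens */
  --radius-md: 8px;
  --shadow-card: 0 1px 3px rgba(0, 0, 0, 0.1);
}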

Prompt-only

  • Hyper-specific classes: product-card or product-title
  • No room for variants

Where it gets interesting is when I requested a "user card" component. The AI generated a similarly styled card (credit to it), but with an entirely new set of classes, user-card and user-title, repeating the same style values. (This isn't included in the experiment files yet.)

Prompt + Rubric

  • Base .card component
  • Variants: card--product or card--user
  • Elements: card__image or card__title

When I asked it to create a "user card," it simply created a variant, card--user, maintaining all of the base card styling and only modifying or adding what was specific to a user card.
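In CSS terms, the pattern looks roughly like the sketch below. The class names (card, card--product, card--user, card__image, card__title) come straight from the generated output; the property values are placeholders of mine, reusing the hypothetical tokens above:

/* Base block: styling shared by every card type */
.card {
  background: var(--color-surface);
  border-radius: var(--radius-md);
  box-shadow: var(--shadow-card);
  padding: var(--space-md);
}

/* Modifiers: only what differs per card type */
.card--product {
  border-top: 2px solid var(--color-accent);
}
.card--user {
  text-align: center;
}

/* Elements: the inner parts, shared by every variant */
.card__image {
  width: 100%;
  border-radius: var(--radius-md);
}
.card__title {
  font-size: var(--font-size-heading);
  color: var(--color-text);
}

Adding another card type then means adding one more modifier rule instead of a whole parallel set of classes.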

As someone who has spent way too much of my life tracking down which class is applied where, to what, and when, I found this a nice little moment of delight.

JavaScript Architecture: Both approaches produced functional code, but the rubric version had more consistent error handling, security practices, and accessibility announcements. The prompt-only version was more varied in approach - sometimes object-oriented, sometimes functional, sometimes mixed.

Comments and Documentation: The rubric version included validation comments showing compliance:

<!-- RUX COMPLIANCE: Using semantic HTML structure -->
<!-- Required by: card.rux > ProductCard > product name as heading -->

Honestly, it added too many comments. This is something I will need to refine.

But, in principle, this made the generated code self-documenting and, in a way, self-checking. Once, I caught it commenting that it did not meet a RUX requirement, and it seemed to go back and rewrite the code so that it did. (Definitely worth exploring further.)

What I Learned

The Good Stuff

  1. Consistency: Unsurprisingly, the rubric approach produced much more consistent results across iterations. While the prompt-only version varied in implementation details, the rubric version followed the same Rubric-defined patterns every time.

  2. Systematic Thinking: The AI seemed to "think" more systematically with Rubric. Instead of just adding accessibility features, it implemented comprehensive accessibility patterns.

  3. Design System Maintainability: Establishing a design system early on (via tokens.css and BEM methodology) allows for much easier modification and extension, as well as much cleaner, easier-to-follow code.

  4. Better Error Handling: The Rubric version had more robust error handling and edge case management.

The Challenges

  1. Over-Engineering Risk: Sometimes the Rubric version generated more complexity than needed for simple use cases. I think that for quick prototypes and explorations, a prompt-only approach might be preferable. But, if working with a code base that is expected to grow in features, require team collaboration, and meet some security and accessibility standards, introducing Rubric might reduce a lot of headaches later on.

  2. AI Model Dependency: This experiment was done with Claude-4-Sonnet within Cursor (along with some standard .cursorrules). Results might vary with different AI models, and in different environments.

  3. Design System Complexity: The tokens.css file is currently over-engineered. With Rubric, AI created hundreds of tokens, most of which are not used.

Is This Actually Useful?

A lot more experimentation and development are needed to answer this question.

So far, my conclusion is:

For one-off components or quick prototypes, it's probably not worth it. The prompt-only approach works fine and is much faster.

For code bases that will need to scale, Rubric shows promise.

What's Next?

There is a lot I can do from here, but in the immediate term, this is what I'll explore:

  • Multi-Component Analysis: Test Rubric effectiveness/consistency across different component types (forms, navigation, data tables)
  • Modification Impact: Evaluate how user-led modifications affect Rubric adherence and component quality
  • Refine Design System: Tweak the design system requirements to not introduce more complexity than necessary

Try It Yourself

The full experiment is on GitHub if you want to poke around or contribute. The Rubric files are designed to be readable, so you can see exactly what instructions the AI was following.

The most interesting part to me isn't the specific syntax or approach - it's the idea that we can guide AI toward more systematic thinking by providing structured context. Whether that's through a DSL like this or some other approach, I think there's something here worth exploring.

What do you think? Have you experimented with non-prompt structured approaches to AI guidance? I'd love to hear about other experiments in this space.


Follow Along & Contribute

You can follow/connect with me on LinkedIn or GitHub.

Contribute: Run your own experiments, or write your own Rubric files and share them!

I'll be posting updates here to start. If this gains traction, at some point I'll publish a real blog/docs.
