DevUnionX
5 Things AI Can't Do, Even in Tailwind CSS

This analysis examines five critical areas where AI appears capable but proves unreliable in practice, specifically within Tailwind, a styling technology where AI tools can generate output particularly quickly thanks to the utility-first approach. Recent Tailwind versions like 4.2.1, listed as latest on npm, offer fast compilation, CSS-first configuration, automatic content detection, and the ability to use design tokens as CSS variables. These features make prototyping easier for AI, but when full accuracy, scaled architectural fit, brand language, accessibility, and quality assurance come into play, human oversight remains decisive.

The five main findings summarize as follows. First, because Tailwind classes are atomic, interfaces emerge from combining dozens of utilities. While AI often produces class sequences that look correct, the visual output can still diverge from the target due to actual content, component hierarchy, variant interaction, and conflicting utilities. Moreover, since Tailwind resolves conflicts through stylesheet production order rather than class attribute order, AI's intuitive assumption that the last-written class wins frequently breaks down. Second, while Tailwind's strength lies in embedding design systems as constraints in code, this requires project-specific tokens, semantic naming, and component variant decisions. AI typically leans on default palettes like sky and blue, cannot consistently use brand tokens defined through @theme (such as --color-brand-*), or pushes teams toward semi-functional solutions with wrong version or configuration advice. Third, regarding aesthetics and product voice, AI outputs frequently remain generic. Nielsen Norman Group field observations emphasize that without detailed visual specifications, AI prototyping tools tend toward uniform, generic, minimal outputs and often default to widespread frameworks like Tailwind or popular component libraries.

Fourth, Tailwind's compilation model of source scanning and generation penalizes dynamic class name generation. Since Tailwind detects classes through plain-text scanning, classes produced through string concatenation or templating often never enter the generated CSS, and AI frequently makes exactly this mistake. Though v4 allows source registration and safelisting through the @source and @source inline() directives, this can balloon bundles and create maintenance burden when the scope is large, and some scenarios, such as class strings injected from a server, can make practical safelisting impossible. Fifth, accessibility and cross-device quality assurance cannot be solved through mechanical generation of Tailwind classes. The CodeA11y study published at CHI 2025, working with developers lacking accessibility training, reports that AI assistants proved insufficiently helpful when not specifically prompted for accessibility, that critical manual steps like replacing placeholder alt attributes with real content got skipped, and that compliance couldn't be verified. The W4A 2025 study explicitly states that current AI tools are inadequate for generating fully accessible code and that human expertise is required.

This analysis provides technical explanation, AI's concrete failure modes, real-world examples, short Tailwind code snippets, and mitigation strategies for developers under each topic. The final section includes a comparison table contrasting typical AI output with human-designed, design-system-driven output.

The research approach uses Tailwind documentation as primary sources, especially on class detection, responsive variants, theme tokens, and directives, blog announcements from Tailwind Labs about releases like v4.0 and v4.1, GitHub issues and discussions in the Tailwind repository, plus WCAG 2.2 standard text and Understanding guides published by W3C for accessibility. On the AI side, two levels of sources were used: primary research measuring AI code assistant productivity and limitations, like arXiv studies evaluating Copilot's performance on real projects, and field research on AI UI/UX prototyping output nature, like Nielsen Norman Group observations. Additionally, academic publications like CHI and W4A focusing on accessibility-centered AI assistants proved especially important for discussing Tailwind output accessibility risks.

The technical features making Tailwind attractive for AI form the foundation for the rest of the discussion. Tailwind fundamentally scans your project's source files for the utility classes in use and generates only the needed CSS. This approach reduces file size and enables flexible features like arbitrary values. With v4, the configuration model became largely CSS-first: you can start with @import "tailwindcss" and define tokens with the @theme directive, and v4 leverages modern CSS features and ships a faster engine. Consequently, v4's browser target is more modern too: the official upgrade guide explicitly states v4 targets Safari 16.4+, Chrome 111+, and Firefox 128+ instead of older browsers.

In summary, Tailwind's scanning-and-generation model and token/variant system make it easy for AI to produce plausible-looking UI quickly. But this very model enlarges AI's error margin regarding dynamic class generation, version mismatches, plugin-specific utilities, and accessibility. The first major issue concerns contextual design nuance and the absence of visual validation. In Tailwind, styling mostly doesn't follow a one-component-equals-one-class logic but rather composes many small utilities: flex, items-center, gap-4, px-6, text-sm, bg-*, hover:*, md:*, dark:*, and so on. Tailwind's own documentation describes this approach as styling by combining many single-purpose utility classes directly in markup.

This composite structure produces two critical consequences. First, visual accuracy isn't a single correct state. As content, language (long or short text), data density, user interaction, and screen size change, the same utility set behaves differently. Tailwind's responsive documentation supports this variety through arbitrary breakpoint variants like min-[...] and max-[...] plus container query variants; the system itself is designed not around a single-screen assumption but around context-dependent variants. Second, conflicts and precedence can be unintuitive. Tailwind states that when two utilities targeting the same CSS property are used together, the rule appearing later in the stylesheet wins, and shows through examples that this can be independent of class attribute order.

Tailwind appears easy because class names follow relatively memorable patterns, and this is precisely where AI's typical failure modes enter. AI produces a class sequence it believes is done without seeing the actual render output, or without testing all project variants if it does see output. This proves especially fragile with combinations of responsive variants like md:*, @container, and min-[...] and state variants like hover:, focus-visible:, and disabled:. Tailwind's state variant documentation shows variants can stack on top of each other, enabling behavior definition under multiple conditions like dark plus md plus hover. This power simultaneously enlarges the error surface.

Additionally, AI tends to generate conflicting utilities: both grid and flex, both w-full and w-64, both p-2 and p-6 in the same component. Human developers notice and fix such conflicts; AI mostly tries to suppress the problem by adding extra classes, because the actual result of a class conflict depends not on class attribute order but on the order of Tailwind's generated CSS. An example in the Tailwind documentation concretizes a mistake AI frequently makes: seeing someone write class="grid flex", one might think the last class, flex, wins. But Tailwind breaks this intuition: the rule appearing later in the stylesheet wins.

The following example shows the style-conflict pattern AI often falls into when generating a card list:

```html
<!-- AI's typical mistake: utilities that cancel each other out -->
<div class="grid flex grid-cols-3 flex-col gap-4">
  ...
</div>
```

Though this code appears to work, its behavior isn't deterministic: during development the component can be perceived sometimes as grid, sometimes as flex, while in reality which display rule wins depends on Tailwind's generation order. The most effective way to reduce errors of this class is to position AI not as a one-time UI writer but as a draft-generating assistant, and to automate validation. First, standardize class sequence readability: Tailwind's official Prettier plugin automatically sorts classes according to the recommended order, and Tailwind Labs presents it as the official solution to end class-ordering debates.

Second, use merge tools like tailwind-merge to manage conflict situations in code. These tools merge conflicting classes according to rules, generally with the expectation that the last conflicting class wins, making output deterministic. Third, adopt visual regression tests like Storybook/Chromatic, or at least a critical-screenshots-per-PR approach, for visual validation. Since AI's failures are mostly visual and contextual, they cannot be caught through lint or typecheck alone. This recommendation rests on a general limitation of AI: studies report it struggles with complex, multi-file contexts.
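The "last conflicting class wins" idea behind tools like tailwind-merge can be sketched in a few lines. This is a toy illustration, not the real library: the conflict groups below are a tiny hypothetical subset, while tailwind-merge covers the full utility set and variant combinations.

```javascript
// Minimal sketch of last-conflicting-class-wins merging.
// The conflict groups are a hypothetical subset for illustration only.
const conflictGroups = [
  ["grid", "flex", "block", "hidden"], // display
  ["flex-col", "flex-row"],            // flex-direction
  ["p-2", "p-4", "p-6"],               // padding (subset)
];

function mergeClasses(classString) {
  const groupOf = new Map();
  conflictGroups.forEach((group, i) =>
    group.forEach((cls) => groupOf.set(cls, i))
  );
  const kept = [];
  const seenGroups = new Set();
  // Walk right-to-left so the last conflicting class wins.
  for (const cls of classString.trim().split(/\s+/).reverse()) {
    const g = groupOf.get(cls);
    if (g !== undefined) {
      if (seenGroups.has(g)) continue; // earlier class in the same group loses
      seenGroups.add(g);
    }
    kept.unshift(cls);
  }
  return kept.join(" ");
}

mergeClasses("grid flex grid-cols-3 flex-col gap-4");
// → "flex grid-cols-3 flex-col gap-4"
```

Applied to the earlier card example, the conflicting grid is dropped in favor of the later flex, so the rendered result no longer depends on Tailwind's stylesheet generation order.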

The second major limitation involves semantic intent, design tokens, and naming problems. Tailwind's utility-first approach doesn't need BEM or semantic class naming from classic CSS, but this doesn't mean semantic needs disappear. In large projects, semantics get provided through design tokens, component variants, and design systems. Tailwind v4 documentation defines the theme variables concept as custom CSS variables determining what values utility classes can have. For instance, defining a token like color-mint-500 makes classes like bg-mint-500 and text-mint-500 available. The v4 blog additionally emphasizes that design tokens are also usable as CSS custom properties by default, facilitating scenarios like runtime theme switching through var.

In this architecture, the fundamental work humans do is translating design and product decisions ("what is this product's primary color?", "what contrast should danger buttons have?", "what is our spacing scale?") into tokens that become the team's shared language. What AI must learn isn't the Tailwind class list but your project's design vocabulary. When AI doesn't fully see the existing theme tokens, team-internal semantic naming decisions, Figma/design-ops processes, and brand guidelines, it produces the most statistically likely general solution.

Another difficult point: some JavaScript configuration options in Tailwind v4 are no longer supported. The official functions and directives documentation notes that though a legacy JS config can be loaded, options like corePlugins, safelist, and separator aren't supported in v4.0, and @source inline() must be used for safelisting. AI suggestions derived from old blog posts, v3 examples, or training data ("let's add a safelist to tailwind.config.js") can directly mislead v4 projects.

AI's typical output resembles this:

```html
<!-- AI's typical output: leans on default palette -->
<button class="bg-sky-600 text-white px-4 py-2 rounded-lg hover:bg-sky-700">
  Continue
</button>
```

This works but is mostly wrong for brand identity: the corporate color might not be sky, your button radius scale might be rounded-md rather than rounded-lg, and the hover darkening might differ. A human-maintained design system targets this directly, using Tailwind v4's token logic:

```css
@import "tailwindcss";

@theme {
  --color-brand-500: oklch(0.62 0.20 265);
  --color-brand-600: oklch(0.56 0.22 265);
  --radius-component: 0.75rem;
}
```

```html
<button class="bg-brand-600 text-white rounded-[var(--radius-component)] hover:bg-brand-500 px-4 py-2">
  Continue
</button>
```

This approach preserves brand semantics through tokens and aligns with the product language. Tailwind's theme variable documentation explicitly explains that when tokens are defined, the related utilities become available. The most common practical error in this category involves AI confusing configuration differences between Tailwind v3 and v4. For instance, although content detection became automatic in v4, as the v4 blog emphasizes, AI might still issue instructions like "add a content array". This directive is sometimes harmless and sometimes misdirects your project layout.

Success in this class comes from making AI not a class generator but a token-compliant generator. First, manage tokens from a single source: Tailwind v4's @theme-based token approach is designed to meet this single-source need, and because v4 exposes tokens as CSS variables, brand colors can be managed at runtime too. Second, consciously establish the semantic layer in your team. Evil Martians recommends practical principles for preventing chaos in Tailwind projects, such as grouping and semantically naming design tokens and defining variant sets instead of randomly overriding classes. This also keeps AI from generating a brand-new class list in every example.

Third, subject AI output to design system testing: create a design policy enforced through linting that catches violations such as forbidden palettes (for example the default sky), forbidden radii, and forbidden arbitrary values. The goal here isn't trusting AI's good intentions but building guardrails that automatically reject wrong output. Studies reporting that AI assistants struggle with complex, project-specific contexts support this need for guardrails.
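Such a guardrail can be as simple as a list of forbidden patterns checked in CI. The rules below are a hypothetical policy invented for illustration (banning the default sky palette and rounded-lg); a real setup might use a Tailwind-aware lint plugin instead of this standalone check.

```javascript
// Hypothetical design policy: reject off-brand utilities in generated markup.
const FORBIDDEN = [
  /\bbg-sky-\d+\b/,   // default palette instead of brand tokens
  /\btext-sky-\d+\b/,
  /\brounded-lg\b/,   // wrong radius scale for this (hypothetical) brand
];

// Returns the rules a class string violates, as readable strings.
function violations(classString) {
  return FORBIDDEN.filter((rule) => rule.test(classString)).map(String);
}

// A typical AI-generated button fails the policy on two counts:
violations("bg-sky-600 text-white px-4 py-2 rounded-lg");
```

In CI, a non-empty result would fail the build, forcing AI-generated markup back onto the project's tokens before merge.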

The third limitation concerns creative and aesthetic boundaries. Tailwind aims to keep design decisions within a constrained scale and token set. This is good for consistency, but good design isn't just consistency. Elements like typographic rhythm, whitespace, information hierarchy, micro-interactions, visual metaphors, and brand tone don't automate through pure utility combination. Therefore, a common pathology exists when using Tailwind with AI: rapid prototype plus insufficient differentiation. AI produces, the screen looks modern, but the product's distinctive voice doesn't form.

Nielsen Norman Group notes that without detailed visual specification, AI prototyping tools slide toward similar, generic outputs, related to tendency to lean on common patterns in training data. They also emphasize these tools frequently select frameworks like Tailwind or popular component libraries as default solutions. For Tailwind specifically, this means two things. AI easily produces patterns like hero plus card plus CTA because much of the internet is full of these patterns. But your product's unique value proposition, like more serious, warmer, more trustworthy, doesn't automatically transfer with these patterns. AI can copy design language with arbitrary values or eyeball spacing, but this copy often conflicts with brand identity. The power of Tailwind's responsive/variant system simultaneously encourages AI's behavior of adding extra classes to every problem.

Developer blogs frequently share experiences like "I generated Tailwind classes with ChatGPT and quickly built a UI." Such posts show AI works especially well for tasks like "arrange three cards in a row" or "make a login form". But these same texts generally focus on quickly working UI rather than on building a design system or brand language. The difference matters: success at the prototype level isn't success at production quality. The following is the type of modern-but-looks-like-everywhere hero block AI frequently produces:

```html
<section class="bg-white py-16">
  <div class="mx-auto max-w-5xl px-6">
    <h1 class="text-4xl font-bold tracking-tight text-gray-900">
      A faster flow for your product
    </h1>
    <p class="mt-4 text-lg text-gray-600">
      A short description text.
    </p>
    <div class="mt-8 flex gap-3">
      <a class="rounded-lg bg-sky-600 px-5 py-3 text-white hover:bg-sky-700" href="#">
        Get Started
      </a>
      <a class="rounded-lg border border-gray-200 px-5 py-3 text-gray-900 hover:bg-gray-50" href="#">
        Learn More
      </a>
    </div>
  </div>
</section>
```

This code's problem isn't being wrong; it's that thousands of products could use the same code. Aesthetic differentiation comes with a token and component language, and AI doesn't invent this spontaneously. NNG's generic-visual-style observation explains this result. Three applicable mitigation strategies stand out in this section. First, give AI a design framework. NNG's "promptframes" approach argues that prompts given to AI should contain target, context, and requirements, like a wireframe: not "make a hero" but requirements such as H1 hierarchy, CTA priority, content goal, and visual tone should be part of the prompt.

Second, tie aesthetic decisions to tokens: give the color/typography/spacing system defined with @theme to AI as a dictionary and make output conform to it. The purpose of Tailwind v4's token system is exactly to carry the design system at the code level. Third, try measuring visual quality automatically but leave the final decision to humans. NNG's observation implies AI prototypes can look good from afar and weak up close, so removing the final check by a human designer or developer from the process is risky.

The fourth limitation concerns scaled architecture and modularity: JIT, scanning, safelisting, and the plugin ecosystem. Tailwind's generation model, especially with v4, places the compile-time scanning idea even more centrally. Core documentation explains that Tailwind scans source files and generates CSS according to the utilities used. The direct consequence: since Tailwind does class detection through plain-text scanning, it cannot understand language features like string concatenation or interpolation. The documentation specifically emphasizes not constructing class names dynamically and writing classes as complete, unsplit strings; otherwise the relevant CSS never gets generated.
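The limitation is easy to see with a toy extractor. This sketch is loosely modeled on the idea of plain-text candidate scanning, not Tailwind's actual implementation: it just pulls word-like tokens out of source text, which is exactly why an interpolated class never appears.

```javascript
// Toy illustration of plain-text class scanning (not Tailwind's real scanner).
// It extracts word-like tokens; interpolation syntax splits a class apart.
function extractCandidates(source) {
  return new Set(source.match(/[a-zA-Z0-9:_\/\[\]-]+/g) ?? []);
}

// Single-quoted, so the backticks and ${} are literal source text:
const src = 'className={`bg-${color}-600 p-4`}';
const candidates = extractCandidates(src);

candidates.has("p-4");        // complete string: detected
candidates.has("bg-red-600"); // never detected — only "bg-" and "-600" appear
```

No matter what value color takes at runtime, bg-red-600 is not present in the source as a complete string, so its CSS is never generated.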

Tailwind v4 additionally offers the @source directive for source registration, answering needs like setting a base path in monorepo structures, excluding specific folders, or scanning external libraries. The safelist need exists in v4 too, but its form has changed: the v4 docs show @source inline() can be used to generate classes not appearing in content, even allowing many classes to be generated with variants and ranges. AI's most dramatic Tailwind failures include producing solutions that build class strings at runtime, because AI imitates the className={`...`} template-literal pattern from the React/Vue world well but doesn't account for Tailwind's scanning model.

The second failure mode involves file scanning limits in scaled structures like monorepos, external UI packages, or CMS-originating class fields. Though Tailwind v4 eases content detection, explicit registration via @source might be needed for external sources or for packages excluded by .gitignore. AI can skip this requirement without seeing the project's build chain and source paths. The third failure mode is the plugin ecosystem. Through Tailwind's plugin API, methods like addComponents, addUtilities, and addVariant allow adding custom utilities, components, and variants. Without knowing these plugins, AI either hallucinates project-specific classes or, conversely, never uses the existing custom utilities. The layer and precedence behaviors of the plugin API, such as plugin utility order in the output, also increase this complexity.

A current discussion in the Tailwind repository describes a situation where class strings arriving from the server get written to the DOM but the related CSS doesn't apply: a component receives values like class="my-4 sm:my-5", but because these classes aren't found as complete strings during source scanning, they might not enter the production output. Moreover, the values are user-configurable and unpredictable, making safelisting impractical. This represents a class of failure where AI, looking at the code, thinks it will work, yet it explodes in production: you see the class in the DOM but no corresponding entry in the CSS file. Similarly, Tailwind documentation explicitly flags this anti-pattern:

```jsx
function Button({ color, children }) {
  return (
    <button className={`bg-${color}-600 hover:bg-${color}-500 ...`}>
      {children}
    </button>
  );
}
```

The documentation instead recommends using complete class names. AI's typical fragile output:

```jsx
const Badge = ({ tone }) => (
  <span className={`bg-${tone}-100 text-${tone}-800 px-2 py-1 rounded`}>
    Label
  </span>
);
```

A solution that fits Tailwind's scanning model uses a lookup table:

```jsx
const toneMap = {
  success: "bg-emerald-100 text-emerald-800",
  warning: "bg-amber-100 text-amber-800",
  danger:  "bg-red-100 text-red-800",
};

const Badge = ({ tone }) => (
  <span className={`${toneMap[tone]} px-2 py-1 rounded`}>
    Label
  </span>
);
```

This approach makes the class names appear in source as complete strings, matching Tailwind's rationale that plain-text scanning can't understand string concatenation. The v4 documentation shows how to safelist with @source inline() and how brace expansion can generate many classes at once:

```css
@import "tailwindcss";

/* safelist underline with hover and focus variants */
@source inline("{hover:,focus:,}underline");

/* bg-red scale with hover variants */
@source inline("{hover:,}bg-red-{50,{100..900..100},950}");
```

This escape hatch works, but practitioners warn that safelisting can increase bundle size and should only be used when truly dynamic or external configuration demands it; Hyvä's documentation, for instance, emphasizes that safelisting grows the CSS output and that other methods should be preferred when possible. The most practical strategies to reduce errors in this category: first, adopt a dynamic-value rather than dynamic-class approach. Since Tailwind v4 makes tokens usable as CSS variables, a pattern like bg-[var(--brand)] is often safer than bg-${color}-500 for runtime theming. The v4 blog specifically emphasizes that tokens are accessible as CSS variables.
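The dynamic-value pattern can be sketched in a few lines. The --accent variable name below is a hypothetical token chosen for illustration; the point is that the class list stays a complete, scannable string while only the custom property's value varies at runtime.

```javascript
// "Dynamic value, not dynamic class": the class bg-[var(--accent)] is a
// complete string, so Tailwind's plain-text scanner finds it; only the
// custom property's value changes at runtime. --accent is hypothetical.
function badgeHtml(accentColor) {
  // Static class list; the runtime-varying part lives in the style attribute.
  return `<span class="bg-[var(--accent)] px-2 py-1 rounded" style="--accent: ${accentColor}">Label</span>`;
}

badgeHtml("oklch(0.62 0.2 265)");
```

Unlike the bg-${color}-500 anti-pattern, this produces CSS for every value of accentColor without any safelisting.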

Second, in monorepo or external-package scenarios, handle source registration with @source; the Tailwind documentation explains scanning external libraries, setting a base path, and excluding paths in detail. Third, make the plugin layer visible: maintain single-source-of-truth documentation and examples for custom utilities and variants, and give this dictionary to AI when prompting. Otherwise AI won't know the project-specific classes; the plugin API itself is explained clearly in the official docs. Fourth, monitor bundle size. Tailwind's class scanning approach aims to keep CSS small, and safelists can reverse this goal, so keeping the safelist exceptional prevents AI from reaching for it as an easy escape hatch.

The fifth limitation concerns accessibility and cross-device concerns. Accessibility is broader than Tailwind class generation: it encompasses semantic HTML, ARIA, correct focus management, adequate contrast, keyboard usability, touch target sizes, and error message clarity. W3C's WCAG 2.2 standard normatively defines criteria like keyboard focus visibility and focus appearance adequacy, and the Understanding pages explain these criteria's purposes and implementation logic. On the Tailwind side, accessibility doesn't come automatically; Tailwind only makes the necessary CSS tools easy, such as focus-visible:*, ring-*, outline-*, and forced-colors:*. Tailwind documentation shows state variants like hover: and focus: can be used and that multiple variants can be stacked on top of each other.

Two critical technical facts exist in the cross-device dimension. First, hover behavior differs on touch devices. The CSS produced by the @source inline() examples in the Tailwind v4 documentation shows hover variants are generated under @media (hover: hover), meaning they are limited to devices with real hover capability. This strengthens the idea that a design shouldn't lean on hover alone. Second, Tailwind v4 targets modern browsers: the official upgrade guide notes v4 won't work on old browsers due to its reliance on modern CSS features and recommends staying on v3.4 if old-browser support is needed. AI's assumption that v4 classes work everywhere breaks here.
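The same @media (hover: hover) condition can be queried at runtime when behavior, not just styling, depends on hover capability. A minimal sketch, written with an injectable matchMedia-like parameter so the logic stays testable outside a browser; in a real page you would pass window.matchMedia.bind(window).

```javascript
// Sketch: v4 compiles hover: variants inside @media (hover: hover), so
// hover styling only applies on devices with a real pointer. This helper
// mirrors that check in JS; matchMedia is injected to keep it testable.
function hoverStylesApply(matchMedia) {
  return matchMedia("(hover: hover)").matches;
}

// Fake matchMedia standing in for a touch-only device:
const touchOnly = () => ({ matches: false });
hoverStylesApply(touchOnly); // false — the design must not rely on hover alone
```

When this returns false, any affordance expressed only through hover:* utilities is invisible to the user, which is why critical actions also need focus-visible and active states.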

AI's struggle with accessibility is two-layered. First, AI assistants don't treat accessibility as a default goal. The CodeA11y study published at CHI 2025, in formative work with 16 developers lacking accessibility training, reports three obstacles to AI assistants improving accessibility: developers not specifically prompting AI for accessibility, critical manual steps AI suggests (like converting placeholder alt text to real alt text) getting skipped, and an inability to verify compliance. Second, AI produces correct-looking solutions that fail user needs. The W4A 2025 article states very explicitly that current AI tools are inadequate for generating fully accessible code and that human expertise is required.

A third, Tailwind-specific dimension also exists: AI can misuse utilities critical for accessibility. For instance, spreading a class like outline-none everywhere to make things look clean can cause keyboard users to lose focus visibility. Moreover, breakage of some focus/outline override patterns was reported in the Tailwind v4 betas: one issue discusses the pattern of overriding the outline-none base style with focus-visible:outline-* breaking in the v4 betas and notes that a specific UI kit relies on it. The difference between the following two buttons shows that in Tailwind, accessibility is not about class count but about the correct state strategy.

AI's frequently produced risky pattern:

```html
<button class="outline-none bg-sky-600 text-white px-4 py-2 rounded">
  Save
</button>
```

This can make focus invisible: it removes the browser's default outline and conflicts with WCAG's intent that the focus indicator be visible and adequately perceivable. A human-crafted, safer pattern:

```html
<button
  class="bg-brand-600 px-4 py-2 text-white
         focus-visible:outline-none focus-visible:ring-2 focus-visible:ring-brand-500
         rounded-md"
>
  Save
</button>
```

Here, even though the outline gets removed, a distinct ring replaces it as the focus indicator, using design-system-appropriate tokens. The point is not a blanket "don't use outline-none" rule but managing the risk of losing the focus indicator. The most effective mitigation in this area isn't writing better prompts to AI but building accessibility into the process design. First, automate accessibility testing and make it a gate: CodeA11y's finding that developers struggle to verify compliance increases the value of automatic checks like axe-based tests, Lighthouse, and design review.
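One CodeA11y finding, placeholder alt text that never got replaced, lends itself to a simple automated gate. A real pipeline would use axe-core or Lighthouse; this regex-based sketch, with a hypothetical placeholder list, only illustrates the make-accessibility-a-CI-gate idea.

```javascript
// Sketch of a CI gate for placeholder alt text. The placeholder list is
// hypothetical; a real pipeline would use axe-core or Lighthouse instead.
const PLACEHOLDER_ALTS = new Set(["", "image", "photo", "todo", "alt text"]);

function findSuspiciousAlts(html) {
  const issues = [];
  for (const match of html.matchAll(/<img\b[^>]*>/gi)) {
    const tag = match[0];
    const alt = /alt="([^"]*)"/i.exec(tag);
    if (!alt) {
      issues.push({ tag, problem: "missing alt attribute" });
    } else if (PLACEHOLDER_ALTS.has(alt[1].trim().toLowerCase())) {
      issues.push({ tag, problem: `placeholder alt: "${alt[1]}"` });
    }
  }
  return issues;
}

// A left-in placeholder gets flagged; a descriptive alt passes:
findSuspiciousAlts('<img src="a.png" alt="image">');
```

Failing the build on a non-empty result turns the manual step CodeA11y found developers skipping into an enforced checkpoint.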

Second, teach AI accessibility defaults but keep control with humans. W4A 2025's finding that human expertise is required shows accessibility responsibility cannot be delegated in regulated sectors like banking, government, and health, despite the speed of Tailwind plus AI. Third, test across devices: instead of leaning on hover, target interaction design covering states like focus-visible, active, and disabled. Tailwind's variant system facilitates these states, but which state is critical gets determined not by AI but by product and user research. Fourth, manage the browser matrix. Tailwind v4 targets modern browsers, and staying on v3.4 might be necessary depending on your customer, country, or device profile. If AI's solution comes with v4's modern-CSS assumptions, the interface breaks on older devices.

Tailwind makes rapid UI production easy for AI because class names are self-describing and the utility-first system is very widespread on the internet. But the five themes in this report (contextual design nuance, semantic intent and token discipline, aesthetic differentiation, scaled build and architecture fit, and accessibility plus cross-device quality assurance) by their nature require human decision-making and process control. This finding aligns both with Tailwind's own technical constraints (scanning, conflict resolution, version differences) and with research on AI assistants that emphasizes their struggles in complex contexts and multi-file work.

The following table compares the same UI problem from two perspectives: quick AI output versus output written by a human working within a design system. The values aren't quantitative measurements; they show how the failure modes and mitigations explained throughout the report map onto practical quality dimensions.

| Output type | Correctness (target behavior) | Accessibility (WCAG risk) | Maintainability (readability/reuse) | Bundle size (scanning/safelist effect) | Brand fidelity (token alignment) |
| --- | --- | --- | --- | --- | --- |
| AI-generated (typical) | Medium: visually good, but responsive/state coverage can be incomplete | Low-Medium: patterns like outline-none can lose focus visibility; risk grows if manual verification is skipped | Low: long class lists, conflicts, inconsistent styles across the team; without Prettier sorting, PR chaos increases | Medium: unnecessary arbitrary-value/safelist suggestions can bloat output | Low: sticking to the default palette is widespread |
| Human-designed (design system) | High: states and breakpoints defined per product requirements | High: focus/contrast/labeling gated in the process; testing and governance exist | High: token-first, variant sets, standard class sorting | High in the good sense: scanning model preserved; safelist kept minimal | High: based on theme tokens; brand language preserved |
