CPDForge

We tried to generate a compliance course with AI. It didn’t go well.

We started off trying to build a compliance course.

We ended up building the system required to trust one.

Turns out they’re not the same thing.

That’s when everything changed.


🧪 The First Version (Looked Fine… Until It Didn’t)

The initial idea was simple:

Use AI to generate a compliance training course.

Pick a topic like:

  • risk assessment
  • workplace safety
  • ESG fundamentals

Feed it into a model, get a structured course out.

And technically — that worked.

We got:

  • modules
  • lessons
  • headings
  • even quizzes

On the surface, it looked decent.

But once you actually read it…


❌ What Was Broken

Shallow Content

It explained things, but didn’t really teach anything.

No depth. No real-world context. No edge cases.


Inconsistent Structure

Some lessons were detailed. Others felt like placeholders.

No consistency across the course.


No Instructional Flow

It wasn’t designed — it was assembled.

Content chunks, not a learning journey.


And the Big One: Reliability

In compliance training, “almost correct” isn’t acceptable.

It’s a risk.


⚠️ The Realisation

We assumed the problem was:

“How do we generate better content?”

It wasn’t.

The real problem was:

“How do we make that content consistent, reliable, and safe to use?”

AI was doing exactly what it’s good at:

  • producing plausible output
  • filling gaps convincingly
  • sounding right

But that’s not the same as being trustworthy.


🔧 What Broke First

Our original pipeline looked something like:

Prompt → LLM → Output course
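
In code, that first version was barely more than a single call. A minimal sketch of the idea (the client and model name here are illustrative, not necessarily what we ran):

```python
# The naive first version: one prompt in, one whole course out.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def generate_course(topic: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative model choice
        messages=[{
            "role": "user",
            "content": f"Write a complete compliance training course on {topic}, "
                       "with modules, lessons, and quizzes.",
        }],
    )
    # Whatever comes back IS the course. No checks, no schema, no second pass.
    return response.choices[0].message.content
```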

And for a moment, that felt like enough.

Until we started testing it properly.

  • Sections contradicted each other
  • Concepts repeated in different ways
  • Terminology drifted across lessons
  • Some parts were strong, others clearly weak

You could generate a course.

You just couldn’t rely on it.


🧱 What We Had to Build Instead

Things changed the moment we stopped treating this as a generation problem.

We started treating it as a system problem.

The pipeline evolved into something more like:

Input
→ Structured Generation
→ Validation Layer
→ Targeted Rewriting
→ Enrichment (quizzes, scenarios, examples)
→ Compliance Checks
→ Output

Each layer existed for a reason.

Because every time we skipped one — something failed.
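
As a skeleton, that flow looks something like the sketch below. The stage bodies are stubs and the names are illustrative; the point is that each stage is a separate, testable step rather than one giant prompt:

```python
# Skeleton of the staged pipeline. Stage bodies are stubs standing in for
# the real implementations, each of which wraps its own prompts and rules.

def structured_generation(brief: dict) -> dict:
    # Generate lessons against a fixed schema, not free-form text.
    return {"topic": brief["topic"], "lessons": []}

def validate(course: dict) -> list[str]:
    # Return the ids of sections that look weak or inconsistent.
    return []

def rewrite_targets(course: dict, weak_ids: list[str]) -> dict:
    # Rewrite only the flagged sections; leave the rest untouched.
    return course

def enrich(course: dict) -> dict:
    # Add quizzes, scenarios, and worked examples to each lesson.
    return course

def compliance_check(course: dict) -> list[str]:
    # Flag risky, outdated, or unsupported content; empty means pass.
    return []

def build_course(brief: dict) -> dict:
    course = structured_generation(brief)
    course = rewrite_targets(course, validate(course))
    course = enrich(course)
    problems = compliance_check(course)
    if problems:
        raise ValueError(f"Blocked by compliance checks: {problems}")
    return course
```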


🧩 The Hard Parts (That Don’t Show Up in Demos)

Structure Enforcement

We had to stop the model from improvising.

That meant:

  • fixed lesson frameworks
  • defined section types
  • controlled outputs
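
One way to pin that down is a hard schema the model has to fill, rather than free-form text it gets to shape. A sketch using dataclasses (the section types and rules are examples, not our exact framework):

```python
# Fixed lesson framework: the model fills slots, it doesn't invent structure.
from dataclasses import dataclass, field

ALLOWED_SECTION_TYPES = {"concept", "example", "edge_case", "summary"}

@dataclass
class Section:
    kind: str  # must be one of ALLOWED_SECTION_TYPES
    body: str

@dataclass
class Lesson:
    title: str
    sections: list[Section] = field(default_factory=list)

    def check(self) -> list[str]:
        problems = []
        kinds = [s.kind for s in self.sections]
        for k in kinds:
            if k not in ALLOWED_SECTION_TYPES:
                problems.append(f"unknown section type: {k}")
        if "summary" not in kinds:
            problems.append("lesson is missing a summary section")
        return problems
```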

Targeted Improvement (Not Regeneration)

Regenerating everything just moved the problem around.

Instead:

  • identify weak sections
  • rewrite only those
  • preserve what already works
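
In practice that's a loop over scored sections, not one big regenerate call. A rough sketch (both helpers are crude stand-ins for model-backed calls):

```python
# Rewrite only the weak sections; keep everything that already works.

QUALITY_THRESHOLD = 0.7  # illustrative cut-off

def score_section(section: dict) -> float:
    # Stand-in: the real version scores depth, specificity, and coverage.
    words = len(section["body"].split())
    return min(words / 200, 1.0)  # crude proxy: very short sections score low

def rewrite_section(section: dict, context: dict) -> str:
    # Stand-in: the real version re-prompts the model with the rest of the
    # lesson passed in as fixed context, so the rewrite stays consistent.
    return section["body"]

def improve(course: dict) -> dict:
    for lesson in course["lessons"]:
        for section in lesson["sections"]:
            if score_section(section) < QUALITY_THRESHOLD:
                section["body"] = rewrite_section(section, context=lesson)
    return course
```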

Cross-Course Consistency

This was harder than expected.

We needed to deal with:

  • duplicated concepts
  • mismatched terminology
  • uneven difficulty

Which meant introducing:

  • internal rules
  • pattern checks
  • consistency constraints
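
Terminology drift is the most mechanical of these to catch: pick one canonical term per concept and flag every variant. A minimal sketch (the glossary entries are made up):

```python
# Flag terminology drift: one canonical term per concept, everything
# else gets reported. Glossary entries are examples, not our real list.
import re

GLOSSARY = {
    "risk assessment": ["risk analysis", "risk review"],
    "near miss": ["close call", "near-hit"],
}

def find_term_drift(text: str) -> list[str]:
    findings = []
    for canonical, variants in GLOSSARY.items():
        for variant in variants:
            if re.search(rf"\b{re.escape(variant)}\b", text, re.IGNORECASE):
                findings.append(f"use '{canonical}' instead of '{variant}'")
    return findings
```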

Compliance Awareness

This is where most tools fall down.

We needed:

  • alignment with recognised frameworks
  • the ability to adapt as guidance evolves
  • detection of weak or risky content
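
Part of that last point can be mechanised as a first pass: pattern rules that flag risky phrasing for human review. A small sketch (the rules are illustrative; this supplements review against the actual frameworks, it doesn't replace it):

```python
# First line of defence: flag risky phrasing for a human to review.
import re

RISK_PATTERNS = [
    (r"\bguarantee[sd]?\b", "absolute claims rarely survive legal review"),
    (r"\balways\b|\bnever\b", "absolutes invite edge-case exceptions"),
    (r"\bno need to report\b", "reporting duties are jurisdiction-specific"),
]

def flag_risky_content(text: str) -> list[str]:
    flags = []
    for pattern, why in RISK_PATTERNS:
        for match in re.finditer(pattern, text, re.IGNORECASE):
            flags.append(f"'{match.group(0)}': {why}")
    return flags
```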

🧠 The Shift

At some point, we stopped thinking in prompts.

We started thinking in systems.

AI became one part of the process — not the solution.


🛠️ If You’re Building with AI

It’s very easy to focus on:

  • better prompts
  • better outputs

But the real leverage is in:

  • constraints
  • validation
  • iteration
  • control

Because generation is easy.

Making it usable is not.


🚀 Where This Landed

What started as “generate a course” became:

  • structure
  • validation
  • rewriting
  • enrichment
  • compliance
  • delivery

Not because we wanted more features —

but because without them, none of it worked.


That was the real lesson.

AI doesn’t remove complexity.

It just hides it — until it matters.

Top comments (1)

CPDForge

Curious how others are handling this.

If you're using AI to generate content (courses, docs, etc.) — how are you dealing with:

  • consistency across outputs
  • reliability / “almost correct” risk
  • maintaining structure at scale

Feels like most tools focus on generation… but not what happens after.

Would be great to hear how others are solving it.