Let me walk you through something I have been experimenting with lately: running a local AI agent that detects hate speech, powered by Google's Gemma 4 model and the CrewAI framework, all without calling any paid API. No keys, no credits, just a local model and Python. Let me explain how I put this together.
Why Gemma + CrewAI?
Most CrewAI tutorials you will find online default to GPT-4 or Claude. That is fine, but what if you want to run everything locally, maybe for privacy or cost reasons, or just to understand the stack at a deeper level?
Well, I could have gone with Meta's Llama 3.2 3B model, which is a good open-source alternative as well, but Gemma 4 is new and I just wanted to try it. You can use any open-source model available on Hugging Face.
The model I used is google/gemma-4-E2B-it, a 2-billion-parameter instruction-tuned version of Google's Gemma 4. It is light enough to run on a Colab GPU without any resource crunch, and it loads via the Hugging Face transformers library.
The Problem: CrewAI Doesn't Talk to Transformers
The CrewAI Agent class expects an LLM object, but out of the box it only works well with OpenAI-compatible APIs. So I had to build a custom LLM class by inheriting from CrewAI's BaseLLM.
The Gemma4CrewAILLM class, which extends BaseLLM, overrides the call() method to handle message formatting and generation manually:
class Gemma4CrewAILLM(BaseLLM):
    def call(self, messages, tools=None, ...):
        # Convert messages to Gemma's format
        # Apply chat template via the processor
        # Run model.generate()
        # Strip the prompt from the output and return clean text
The key insight here is that Gemma uses AutoProcessor for both tokenization and chat templating, and I had to be careful to set enable_thinking=False to skip the "thinking tokens" the model might otherwise generate.
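To make the pseudocode above concrete, here is a minimal sketch of the two pure-Python helpers call() needs. The helper names (to_gemma_messages, strip_prompt) are my own, and folding system messages into the first user turn is an assumption about how the chat template handles roles; adjust it for your template version.

```python
def to_gemma_messages(messages):
    """Map OpenAI-style message dicts to the structure Gemma's chat template expects.

    Gemma's template has no separate 'system' role, so system content is folded
    into the next user turn (an assumption; adjust for your template version).
    Gemma uses 'model' rather than 'assistant' for its own turns.
    """
    out, system = [], ""
    for m in messages:
        if m["role"] == "system":
            system += m["content"] + "\n"
        elif m["role"] == "user":
            content = (system + m["content"]) if system else m["content"]
            system = ""
            out.append({"role": "user", "content": content})
        else:  # assistant turns become "model" turns
            out.append({"role": "model", "content": m["content"]})
    return out


def strip_prompt(full_decoded, prompt_decoded):
    """model.generate() echoes the prompt; keep only the newly generated text."""
    if full_decoded.startswith(prompt_decoded):
        return full_decoded[len(prompt_decoded):].strip()
    return full_decoded.strip()
```

In the real call() you would run these around processor.apply_chat_template() and model.generate(), but the conversion logic itself is plain Python.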
Defining the Agent and Task
Once the Gemma4CrewAILLM was ready, plugging it into CrewAI was straightforward. I defined a Hate Speech Detection Specialist agent with a detailed role, goal, and backstory, because CrewAI uses these as system-prompt context to guide how the agent reasons.
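For a sense of why the role, goal, and backstory matter, here is roughly how CrewAI assembles them into system-prompt context. The wording below is approximated from CrewAI's prompt templates, and build_system_prompt is my own illustrative helper, not a CrewAI API.

```python
def build_system_prompt(role, goal, backstory):
    # Approximation of CrewAI's agent prompt template: the role and backstory
    # establish the persona, then the goal steers every task the agent runs.
    return (
        f"You are {role}. {backstory}\n"
        f"Your personal goal is: {goal}"
    )


prompt = build_system_prompt(
    role="Hate Speech Detection Specialist",
    goal="Classify text and produce a structured moderation report",
    backstory="You are an expert content moderator with years of experience.",
)
```

Because all three fields end up in the system prompt, a lazy one-line backstory gives the model far less to anchor on than a detailed one.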
The task I gave it was to analyze a piece of text and produce a structured 3-bullet report:
- Verdict: Yes / No / Uncertain with a reason
- Detected Content: Specific phrases flagged, plus the targeted group
- Severity and Recommendation: A numeric rating out of 10, plus a recommended action (Allow / Flag for Review / Remove)
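Downstream code usually wants this report as structured data rather than free text. Here is a hypothetical parser for the 3-bullet format; parse_report is my own helper, and the model's actual bullet layout may vary, so treat it as a sketch.

```python
import re

def parse_report(text):
    """Pull the three labeled bullets out of the agent's report into a dict.

    Assumes each field appears as '- <Label>: <value>' on its own line,
    matching the task spec; real model output may need looser matching.
    """
    fields = {}
    for key in ("Verdict", "Detected Content", "Severity and Recommendation"):
        m = re.search(rf"-\s*{re.escape(key)}:\s*(.+)", text)
        if m:
            fields[key] = m.group(1).strip()
    return fields


sample = (
    "- Verdict: Yes, contains an insult targeting a protected group\n"
    "- Detected Content: 'example phrase', targeting group X\n"
    "- Severity and Recommendation: 8/10, Remove"
)
```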
This structure forces the model to commit to a specific verdict instead of producing a vague summary, which is exactly what you want in a content moderation pipeline.
Here is my gist for reference: Link.
What Should I Do Next?
This setup is good for experimentation and development while sparing your wallet from expensive tokens. But here are some improvements and extensions that could be done:
- Add more agents: a second agent for sentiment or intent classification working in parallel
- Try function calling / tools: Gemma 4 and Llama 3.2 support structured outputs, which could make the crew much more powerful
- Wrap it in a FastAPI endpoint: so you can POST text and get back a structured moderation report over HTTP
The whole point for me was to prove that we can build a capable, agentic AI pipeline with zero API costs, running entirely on open-weights models. And honestly, for a 2B-parameter model, Gemma 4 handles this task well. Larger models, such as the bigger Llama 3.2 or Gemma 4 variants, should give even better results.
If you have been trying local LLMs for agent workflows, I hope you find this post helpful. Drop your questions or thoughts in the comments; I would love to hear what use cases you are considering for agents.