DEV Community

Aman Kr Pandey

Crew with Gemma-4 in Colab

Let me walk you through something I have been experimenting with lately: running a local AI agent that detects hate speech, powered by Google's Gemma 4 model and the CrewAI framework, all without calling any paid API. No keys, no credits. Just a local model and Python. Let me explain how I put this together.

Why Gemma + CrewAI?

Most CrewAI tutorials you will find online default to GPT-4 or Claude. That is fine, but what if you want to run everything locally, maybe for privacy or cost reasons, or just to understand the stack at a deeper level?

Well, I could have gone with Meta's Llama 3.2 3B model, which is a good open-source alternative as well, but Gemma 4 is new and I just wanted to try it. You can use any open-source model available on Hugging Face.

The model I used is google/gemma-4-E2B-it, a 2-billion-parameter instruct-tuned version of Google's Gemma 4. It is light enough to run on a Colab GPU without any resource crunch, and it loads via the Hugging Face transformers library.
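For reference, loading the model and processor might look like the sketch below. `load_gemma` is just my own wrapper name, and I am assuming `AutoModelForCausalLM` is the right auto class for this checkpoint — multimodal checkpoints sometimes need a different class, so check the model card.

```python
MODEL_ID = "google/gemma-4-E2B-it"

def load_gemma(model_id: str = MODEL_ID):
    """Load the processor and model onto the Colab GPU (a sketch)."""
    # Imports stay inside the function so the sketch is cheap to import
    import torch
    from transformers import AutoModelForCausalLM, AutoProcessor

    processor = AutoProcessor.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # half-precision to fit in Colab GPU memory
        device_map="auto",
    )
    return processor, model
```

On the free Colab T4 tier, bfloat16 plus `device_map="auto"` is usually what keeps a ~2B model from hitting memory limits.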

The Problem: CrewAI Doesn't Talk to Transformers

CrewAI's Agent class expects an LLM object, but out of the box it only works well with OpenAI-compatible APIs. So I had to build a custom LLM class by inheriting from CrewAI's BaseLLM.

The Gemma4CrewAILLM class extends BaseLLM and overrides the call() method to handle message formatting and generation manually.

```python
class Gemma4CrewAILLM(BaseLLM):
    def call(self, messages, tools=None, callbacks=None, available_functions=None):
        # Convert messages to Gemma's format and apply the chat template
        # via the processor; enable_thinking=False skips the "thinking" tokens
        inputs = self.processor.apply_chat_template(
            messages, add_generation_prompt=True, enable_thinking=False,
            return_dict=True, return_tensors="pt",
        ).to(self.model.device)
        output = self.model.generate(**inputs, max_new_tokens=512)
        # Strip the prompt tokens from the output and return clean text
        new_tokens = output[0][inputs["input_ids"].shape[-1]:]
        return self.processor.decode(new_tokens, skip_special_tokens=True)
```

The key insight here is that Gemma uses AutoProcessor for both tokenization and chat templating, and I had to be careful to set enable_thinking=False to skip the "thinking tokens" that the model might generate.
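To make the formatting step concrete, here is roughly the kind of conversion call() performs before applying the chat template. `to_gemma_messages` is my own illustrative name, not part of CrewAI or transformers:

```python
def to_gemma_messages(messages):
    """Map OpenAI-style messages into Gemma's content-list chat format."""
    # CrewAI sometimes passes a bare prompt string instead of a message list
    if isinstance(messages, str):
        messages = [{"role": "user", "content": messages}]
    gemma_messages = []
    for msg in messages:
        gemma_messages.append({
            "role": msg["role"],
            # Gemma's processor expects content as a list of typed parts
            "content": [{"type": "text", "text": msg["content"]}],
        })
    return gemma_messages
```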

Defining the Agent and Task

Once the Gemma4CrewAILLM was ready, plugging it into CrewAI was straightforward. I defined a Hate Speech Detection Specialist agent with a detailed role, goal, and backstory, because CrewAI uses these as system-prompt context to guide how the agent reasons.

The task I gave it was to analyze a piece of text and produce a structured 3-bullet report:

  • Verdict: Yes / No / Uncertain with a reason
  • Detected Content: Specific phrases flagged, plus the targeted group
  • Severity and Recommendation: A numeric rating out of 10, plus a recommended action (Allow / Flag for Review / Remove)
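Wired together, the agent and task look roughly like this. It is a sketch with abbreviated role, goal, and backstory text (the full prompts are in my gist), and `build_crew` is just my own wrapper name:

```python
def build_crew(llm):
    """Wire the Gemma-backed LLM into a one-agent moderation crew (a sketch)."""
    # Import locally so this sketch doesn't require crewai at import time
    from crewai import Agent, Crew, Task

    moderator = Agent(
        role="Hate Speech Detection Specialist",
        goal="Accurately classify text for hate speech and justify every verdict.",
        backstory="A content-moderation analyst trained to spot targeted abuse.",
        llm=llm,  # the Gemma4CrewAILLM instance built above
    )
    review = Task(
        description="Analyze the following text for hate speech: {text}",
        expected_output=(
            "A 3-bullet report: Verdict (Yes/No/Uncertain with a reason), "
            "Detected Content (flagged phrases plus the targeted group), "
            "Severity and Recommendation (a rating out of 10 plus "
            "Allow / Flag for Review / Remove)."
        ),
        agent=moderator,
    )
    return Crew(agents=[moderator], tasks=[review])
```

You would then run `build_crew(llm).kickoff(inputs={"text": "..."})` to get the report back.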

This structure forces the model to be precise rather than producing vague summaries, which is exactly what you want in a content moderation pipeline.
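Downstream, a pipeline would parse that report into fields it can act on. Here is a minimal sketch, assuming the model emits the three bullets with those exact section labels; `parse_report` is my own illustrative helper, not something CrewAI provides:

```python
def parse_report(report: str) -> dict:
    """Split the 3-bullet moderation report into a {section: text} dict."""
    sections = {}
    current = None
    for line in report.splitlines():
        line = line.strip().lstrip("-*• ").strip()
        if not line:
            continue
        # Each bullet starts with a known section name followed by a colon
        for key in ("Verdict", "Detected Content", "Severity and Recommendation"):
            if line.startswith(key + ":"):
                current = key
                sections[key] = line[len(key) + 1:].strip()
                break
        else:
            if current:  # continuation line of the previous bullet
                sections[current] += " " + line
    return sections
```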

Here is my gist for your reference Link.

What Should I Do Next?

This setup is good for trying out new ideas or for development while saving your pocket from expensive tokens. But here are some improvements and extensions that could be made:

  • Add more agents: a second agent for sentiment or intent classification working in parallel
  • Try function calling / tools: Gemma 4 and Llama 3.2 support structured outputs, which could make the crew much more powerful
  • Wrap it in a FastAPI endpoint: so you can POST text and get back a structured moderation report over HTTP
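As a sketch of that last idea, the crew could be wrapped like this. `create_app`, the `/moderate` route, and the request shape are all my own assumptions, not an existing API:

```python
def create_app(crew_factory):
    """Hypothetical FastAPI wrapper: POST text, get a moderation report back."""
    # Imports stay local so the sketch doesn't require fastapi at import time
    from fastapi import FastAPI
    from pydantic import BaseModel

    app = FastAPI()

    class ModerationRequest(BaseModel):
        text: str

    @app.post("/moderate")
    def moderate(req: ModerationRequest):
        # Run the crew on the submitted text and return its structured report
        result = crew_factory().kickoff(inputs={"text": req.text})
        return {"report": str(result)}

    return app
```

You could then serve it with uvicorn and POST `{"text": "..."}` to `/moderate`.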

The whole point for me was to prove that we can build a capable, agentic AI pipeline with zero API costs, running entirely on open-weights models. And honestly, for a 2B-parameter model, Gemma 4 handles this task well. With models like Llama 3.2 7B or Gemma 4 4B, we could get even better results.

If you have been trying local LLMs for agent workflows, I hope you find this post helpful. Drop your questions or thoughts in the comments; I would love to hear what use cases you are thinking about for agents.
