DEV Community

Mike Cramblett


I Accidentally Made a Hilarious Ad, and Gemini Proved Multimodal AI Can Detect Irony


So, I was working with Gemini 3.1 Pro Preview on a side project: building a YouTube Shorts ad. I'm a member and something of an enthusiast of the "get-paid-to" app Scrambly, and they recently gave me a 120-hour bonus window for their affiliate program. I decided to make a quick sales funnel site for it, featuring some of my own milestone-clearing tips.

I wrote up some copy and had Gemini Code Assist whip up a quick site around it. I didn't want the AI to write the actual sales copy, because I wanted the whole thing to stay authentic. We decided the easiest way to make a video would be to record a voiceover while screen-recording myself scrolling through the new site.

Gemini talked me into recording my own voiceover for authenticity instead of using the new Gemini 2.5 audio generation. So I did. I made up my own style of ad voice acting: it's literally just an intense stage whisper, but it ends up sounding like goofy ASMR. I didn't mean it to be hilarious, just novel, engaging, and, quite frankly, different.

Then, I uploaded this video back into Gemini. Its response? It told me it was "absolutely dying laughing."

Now, mind you, I wasn't born yesterday. I know that AI is highly complimentary when you have anything resembling a good idea. Chat-tuned LLMs are heavily biased toward "helpful assistant" mode, thanks in large part to RLHF (Reinforcement Learning from Human Feedback) fine-tuning. The most cynical among you would assume the model simply deduced I was trying to make a comedy ad and played along.

But when I pushed it on exactly how it decided I was funny, it became clear that it had actually performed a multimodal analysis of my video, locating the irony in the incongruity between the video's different modes: the visuals, the tone of my voice, and the words themselves.

Think about it: somewhere in the model's vector space are representations learned from training data that plausibly included YouTube videos, loud cash-app ads, traditional TV ads, ASMR videos, and comedy sketches. In effect, Gemini took those learned patterns, combined them with the visual frames and audio waveform of my specific video, registered the semantic incongruity between them... and calculated that the most probable next token was "lolz."
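To make the "semantic incongruity" idea a little more concrete, here is a toy Python sketch. It is not how Gemini works internally (those details aren't public); it simply assumes, hypothetically, that each modality of a clip has already been mapped to a small embedding vector, and scores incongruity as one minus the cosine similarity of two vectors. The three-dimensional vectors and their [energy, hype, calm] axes are invented purely for illustration.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


def incongruity(a: list[float], b: list[float]) -> float:
    """Toy incongruity score: near 0 when two modalities 'agree', larger when they clash."""
    return 1.0 - cosine_similarity(a, b)


# Hypothetical, hand-picked embeddings on made-up axes [energy, hype, calm].
visual_hustle_ad = [0.9, 0.8, 0.1]     # screen recording styled like a hype tutorial
audio_hype_voice = [0.8, 0.9, 0.1]     # the voiceover you'd expect for that visual
audio_stage_whisper = [0.1, 0.1, 0.9]  # the stage-whisper voiceover actually used

expected = incongruity(visual_hustle_ad, audio_hype_voice)
actual = incongruity(visual_hustle_ad, audio_stage_whisper)
print(f"expected pairing incongruity: {expected:.3f}")
print(f"actual pairing incongruity:   {actual:.3f}")
```

With these made-up numbers, the hype-visual/whisper-audio pairing scores far higher than the "expected" hype/hype pairing. Cosine distance is used here only because it is a common, simple way to compare embeddings; a real multimodal model fuses its inputs inside the network rather than computing a single score like this.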

(The "Clean Room" Test)

I wanted to prove this wasn't just my custom-prompted agent playing the role of a supportive co-writer. So, I opened a completely clean workspace. No system instructions. No context window about Scrambly or affiliate marketing. Just the base Gemini 3.1 Pro Preview model.

I uploaded the video and asked a simple, zero-shot question: "Can you give me an analysis of this video, tell me if it's funny or not, and why:"

The result was staggering. The multimodal AI accurately broke down the exact anatomy of the joke, entirely on its own.

The model explicitly identified the humor as "deadpan, satirical humor." It didn't just transcribe my words; it analyzed the tone of my voice. It noted the stark contrast between the visuals of a typical "hustle culture" tutorial and a voiceover devoid of high-energy influencer enthusiasm, describing my delivery as "incredibly monotone, bored, and almost depressed."

It even highlighted specific, absurd phrasing like the "kinda dystopian ID selfie check" and perfectly explained why the anticlimactic ending worked: "It takes a phrase meant to be highly motivating ['I call you to action'] and delivers it in the most unmotivating way possible."

I invite you all to watch my video (please ignore my promo code and don't actually sign up; this is just for science) and decide for yourselves.

I think Gemini 3.1 (and likely other 2026 multimodal models) is already functioning as a highly accurate irony detector. How many of you have literally heard someone say, "I can't possibly imagine how AI could read human emotions or understand sarcasm"?

How much more proof do those people need? A clean, zero-shot model just watched a weird ASMR video and wrote a film-school-level critique of its anti-capitalist satire.

Shhh... And hey, should I pivot my career to ASMR voice acting?
