I Tested Every Major AI Tool. Here's What I Actually Found

Over the past few months I have been using pretty much every major AI tool available right now. Not for research and not for a project, but just as part of my daily student life. Assignments, coding, building apps, and asking questions. Normal usage.
Across all of them, I noticed patterns. Things each one does well, and things each one consistently gets wrong. This is just what I found.
Gemini — Strong at Brainstorming, Weak at Memory
Gemini is genuinely good at exploring ideas. If you need to think through a concept, map something out, or generate options, it does that well. It makes connections quickly and keeps up with creative thinking.
The issue is memory. As a conversation gets longer, Gemini starts losing context. You find yourself re-explaining things you already covered. It's not a dealbreaker for short sessions, but for anything that requires building on earlier parts of the conversation, it becomes a problem.
There's also a context-switching issue I noticed. I shared two separate HTML files in the same conversation: first a fuel tracking app, then a completely different marketplace app. Gemini reviewed the marketplace app and described it as a great fuel app. It had locked onto the first file and didn't update its understanding when I introduced something entirely different. It wasn't reading what was in front of it anymore, just what it had already decided.
There is also a practical issue with long conversations. If you leave a lengthy chat and come back later to continue, Gemini sometimes fails to load it entirely. The conversation just doesn't open. So not only does it lose context within a session, it can also lose the session itself.
The takeaway: Good for brainstorming at the start of a fresh session. Less reliable for anything that requires continuity.
Grok — Useful for Research, Wrong Tone for Academic Work
Grok has solid web access and does reasonably well when you need current information or want to pull together what's being said about a topic online. For research purposes it's a decent starting point.
Where it falls short is tone. When you use it for academic work like assignments, structured writing, or formal documents, the output comes out too casual. It writes in a relaxed, conversational style that doesn't fit an academic context. You'd need to heavily rewrite anything it produces before submitting it.
I also tested something out of curiosity. I asked Grok to communicate directly with Gemini. It cannot do this: a chat assistant like this has no built-in way to contact another company's model from inside a conversation. But Grok produced a detailed account of a conversation it claimed to have had with Gemini, including specific things Gemini supposedly said. All of it was fabricated. It presented made-up information as if it had actually happened.
That's worth knowing before you rely on anything it tells you about external sources.
Claude — Good at Code, Needs Work on Visual Design
Claude handles coding tasks well. Logic, debugging, explaining technical concepts, and building out structured systems are all tasks where it's reliable. It stays consistent even on complex problems.
One feature worth mentioning is its memory system. Claude remembers details across conversations, which is genuinely useful and more reliable than what most other tools offer. However, there is a pattern I noticed. If you have been working on a project or learning something specific, such as a coding assignment or a deep technical topic, and you later ask something that even slightly relates to it, Claude tends to pull that previous context back in. Sometimes that's helpful. Other times you're asking something new and it keeps connecting it back to what you were doing before, which isn't always what you need.
The gap I've noticed is in visual design. When you ask Claude to build something that needs to look professional, like a proper web interface or a clean layout, the first result usually looks functional but not polished. It has more of a developer tool or coding app feel to it rather than something that looks finished and professional.
Google's AI Studio is a good example of what a clean, professional AI interface looks like, but Claude's HTML outputs don't naturally land there on the first attempt. It usually gets there after a few rounds of feedback, but the starting point for anything visual is something to be aware of.
Summary: What I Actually Took Away From This
Each of these tools has something it does better than the others, and something it consistently gets wrong. None of them are the complete solution they're presented as. What I found works better is using them for specific things:
Gemini for early brainstorming before the conversation gets too long.
Grok for pulling together research on current topics.
Claude for writing code and working through technical problems.
The tools are still developing. But right now, knowing what each one is actually good at, based on real usage and not marketing, makes a bigger difference than picking one and hoping it handles everything.
That's just what I found.
Based on personal day-to-day usage as a CS student.

#ai #programming #productivity #beginners
