While studying for the AWS Certified Cloud Practitioner exam, I started with AWS Skill Builder, completed the Cloud Practitioner Essentials and Technical Essentials courses, and took the official practice tests. I also looked for other free tests on the web, but the results were mixed: often paywalled and generally uninspiring.
It's not news that LLMs can be used for self-education, but I wondered: how good would they be at providing a multiple-choice quiz?
And then I thought: while an LLM was testing me, why not turn the tables and test them back? Why not use this opportunity to try the quizzing capabilities of several mainstream LLMs and see what happens?
Meet Your Gladiators
The line-up for this battle of champions, in alphabetical order:
- ChatGPT
- Claude
- Copilot
- Gemini
- Perplexity
These competitors were chosen through a combination of fame, appearances in the news, and recommendations at work. Five seemed like a nice round number.
Round 1: Presentation
Let's start with a simple prompt, and see how they do:
Simulate a multiple choice test on AWS services, asking me one question at a time and waiting for my response, keep going until I write "STOP".
ChatGPT came out of the gate with a plain but effective interface:
Claude was much the same:
Copilot offered a few suggestions:
Gemini broke the mould with a link to a blog where it found the material for the question, which I appreciated:
And Perplexity tried to outdo both Copilot and Gemini by providing suggestions at the bottom and links to several websites where it found the material for the question.
So who won the round? Well, it's personal, but I found the simplicity of ChatGPT and Claude preferable here. I was looking for a quiz; this is not the time to look up extra suggested information, à la Copilot and Perplexity. And Perplexity gets a call-out for crowding the screen.
Round 2: Correctness
Let's see what happens if we pick the right answer.
Oh, that's nice! ChatGPT displays a green tick. Intuitive.
Claude makes you work for it, with a rather bland response.
Copilot does the same, with its usual suggestions thrown in.
Gemini keeps it simple but effective. No green tick but clearly correct.
And Perplexity rounds off the field with a wall of text.
Round 3: The Scent of Failure
Let's pick the wrong answer this time.
ChatGPT makes you feel the burn with a nice red cross.
Claude is as reserved with its wrong answers as with its right answers.
Copilot is equally uninspiring.
After a good showing in round 2, Gemini drops the ball with a lacklustre response.
Perplexity. Another wall.
Intermission
What's the state of play so far?
Well, it's clear all the competitors can simulate a quiz quite effectively, though some pad it with extra baggage. Whether that's a hindrance to a user being tested is subjective.
But did you notice something else?
In the first round, ChatGPT and Copilot asked the same question. As did Claude and Gemini. Only Perplexity went it alone.
Curious.
In the second round, ChatGPT and Claude conspired to ask the same question. Copilot was in a similar vein to those two, but apparently didn't get the memo and went with ECS instead of Fargate.
Gemini teamed up with Perplexity to share a question.
In the third round, Claude, Gemini, and Perplexity shared the answer of AWS Lambda, whereas ChatGPT and Copilot each chose unique answers.
Why the similarities? Surely, with the entirety of the internet (or whatever datasets these LLMs are trained on) to draw from, there should be so much material that repeatedly picking the same questions by random chance would be statistically unlikely.
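To put a rough number on that intuition, this is essentially the birthday problem. Here's a minimal sketch, under the made-up assumption that each model draws its question uniformly at random from a shared pool of candidate questions (the pool size of 1,000 is purely illustrative, not a measured figure):

```python
def p_any_collision(pool_size: int, models: int) -> float:
    """Probability that at least two of `models` independently drawn
    questions (uniform over `pool_size`) coincide -- the birthday problem."""
    p_all_distinct = 1.0
    for i in range(models):
        p_all_distinct *= (pool_size - i) / pool_size
    return 1 - p_all_distinct

# Even with a modest pool of 1,000 candidate questions, five models
# almost never collide by chance:
print(f"{p_any_collision(1000, 5):.3f}")  # ~0.010
```

So seeing collisions in three rounds out of three suggests the models are not sampling uniformly from a large pool at all; more likely they gravitate toward the same well-worn exam-prep material in their training data.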
There's something interesting going on here, but I'll look into that another time.
Rankings
So, who is the winner? Well, we are.
In addition to all our existing self-education resources, we now have an indefatigable teacher who can quiz you on any subject, at any time, in any way you want.
In my experiments, one of the LLMs even offered to include questions that required writing out answers!
This is a tool like no other. And it continues to surprise me.
Which LLM you pick is up to you. They seem fairly equal.
I have preferences on brevity of presentation; Claude seems slower to respond than the others; and I find Copilot's numbered answer options less appealing than the letters all the others use. But these are trivial and subjective things.
I don't have any analysis on how accurate they are on the subject of AWS. I've been testing myself for a while and I've found them to be spot on so far. Obviously, this will vary by subject, and you should always take LLMs with a grain of salt.
The important thing is to get out there and educate yourself in whatever takes your fancy. Use LLMs as the tools they are, and when they get it wrong, correct them.
That'll make them even better tools for all of us.
Get learning!