If you live on the internet (which is practically everyone these days), you likely come across a CAPTCHA often - those somewhat irritating puzzles asking you to pick traffic lights to prove you’re not a robot. Have you ever wondered how these things work and why they’re needed? Let’s take a deep dive into it. (P.S. There's a bonus section containing bad CAPTCHA examples at the end).
What's a CAPTCHA
CAPTCHA (which stands for “Completely Automated Public Turing test to tell Computers and Humans Apart”) is a tool used by sites to identify whether an internet user is a human or a bot.
Around 1997, AltaVista (a primitive search engine of that decade) was having a tough time fixing the high number of auto URL assets that were hampering its website ranking process severely. To solve this issue, the then-chief scientist of AltaVista, Andrei Broder, created an algorithm that later became famous as CAPTCHA.
Because the test is administered by a computer, in contrast to the standard Turing test that is administered by a human, CAPTCHAs are sometimes described as reverse Turing tests.
Why is CAPTCHA needed?
Tl;dr - To make sure some sensitive resource is not exploited using bots
CAPTCHAs are used by any website that wishes to restrict usage by bots. For instance, if you’re running a hotel reservation site, there’s a chance that someone might try to take all the bookings using a bot and legit customers will not be left with any slots. Or you might be running an online poll and want to make sure that votes are not manipulated using bots. CAPTCHA comes in handy in these scenarios making sure only human users can perform some actions blocking access to bot users.
How it works
There are different types of CAPTCHAs available in the market these days, each a little different in its workings than the last one. But all of them work on the same principle - ask users to perform some actions that are trivial for a human to do but almost impossible to automate.
Here are some common CAPTCHA flavours and how they work -
Classic CAPTCHA (Text-based CAPTCHA)
These are the oldest variants of CAPTCHAs, as the name suggests. Classic CAPTCHAs work by asking a user to identify words. These words are shown in a distorted blurry manner with different fonts and handwriting which makes it very difficult for bots to identify them using OCR but it’s still trivial for human users to decipher.
ReCAPTCHA
Google came up with a new way to identify human users which doesn’t require users to enter anything. It just asks you to click on a checkbox.
How can you figure out if a user is a human or a bot by just looking at a checkbox click, you may wonder. The answer is not in the click but in what happens before you click the checkbox. Google has a risk analysis engine that looks at things like how the cursor moved on its way to the checkbox (organic random path/acceleration vs cursor moving in a straight line), which part of the checkbox was clicked (random places, or dead on center every time), browser fingerprint, Google cookies & contents, click location history tied to your fingerprint or account if it detects one, etc. to differentiate between organic and automated clicks.
Image-recognition CAPTCHA
For an image-recognition CAPTCHA, users are presented with square images of common objects like bikes, buses, traffic lights, etc. The images may all be from the same large image, or they may each be different. A user has to identify the images that contain certain objects. If their response matches the responses from most other users who have submitted the same test, the answer is considered "correct" and the user passes the test.
Bonus: 5 examples of terrible CAPTCHAs that will make you pull out your hair
_You need a physics degree to solve CAPTCHAs now?_
_Good luck deciphering this_
_If "you shall not pass" was an image_
_Speechless_
_Maybe the robots should take over_
Top comments (0)