What is a CAPTCHA? More importantly, why is it? What does it stand for? Who invented it? Who’s working on this tech? So many questions. If you looked, you probably found an endless rabbit hole of information. How far did you go? How many questions did you ask?
So Many BOTs
These CAPTCHA solutions were created to combat internet bots. Bots (as in robots) are automated software programs that perform repetitive tasks. Web crawlers, for example, download all the content on a site.
More than half of internet traffic is from these little bastards. Bots are commonly used to scan for email addresses on networks. These email addresses are then used to engage in email spamming campaigns or maybe orchestrate social engineering hacks. Alternatively, they could be used to look for vulnerable sites to hack.
Hackers will use vulnerabilities to take control of devices and use them for their own goals, such as:
- Make a device part of their botnet a.k.a scrumping
- Use a device’s CPU to mine for bitcoin
- Perform DDoS attacks
- Run crawlers
- Steal Data
- Send spam
- Host illicit content
When it comes to Google’s robots, you can control what’s indexable by using a robots.txt file on your server.
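Here’s a minimal sketch of how a well-behaved crawler reads those rules, using Python’s standard `urllib.robotparser` module. The robots.txt contents, the `is_allowed` helper, and the example.com URLs are all made up for illustration. Keep in mind that robots.txt is only a polite request, and a malicious bot can simply ignore it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: Googlebot may crawl everything except /private/,
# every other crawler is asked to stay out entirely.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow: /
"""


def is_allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if the rules permit this crawler to fetch this URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(is_allowed("Googlebot", "https://example.com/blog/post"))
    print(is_allowed("Googlebot", "https://example.com/private/notes"))
    print(is_allowed("SomeScraper", "https://example.com/blog/post"))
```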
When Google indexes your site, that’s good because that will lead to human visitors, at least one can hope.
However, when a thief steals all your content to repurpose it on their site, that’s bad.
You’ve probably seen those annoying things on some website. Basically, you have to stop what you’re doing and enter some text before you can get to what you want.
They’ve gone through different versions. You’ve probably seen one that uses a combo of letters and numbers, probably distorted/warped a bit.
You might have seen an image divided into a grid and been asked to click on the street signs, or solve a puzzle.
Sometimes you just need to click inside a box.
It’s called a C.A.P.T.C.H.A.
Yes, it is an acronym. It stands for a Completely Automated Public Turing test to tell Computers and Humans Apart.
To learn about the inventors, visit the CAPTCHA wiki article.
These creators used this concept back in 1997. This was the start of the synthetically distorted text image that requires the user to enter it into a textbox before proceeding.
Computers started to get pretty good at figuring this out and the text had to get more and more distorted.
In 2014, Google did some testing on their state-of-the-art machines to predict what the future would be like, and they saw that their algos could defeat any synthetically distorted text 99.8% of the time.
ReCAPTCHA by Google
Fast forward to today’s tech of just clicking in a box. This is called reCAPTCHA and it’s a product by Google.
So how does this work? It’s just a click. How can that stop bots?
Well, the trick is that the JS code sends info to Google’s servers. If you want to look at the code used, you can view Google’s script in the browser. The info sent includes:
- The referrer
- The sitekey, which is created with Google when the reCAPTCHA is registered
- The cookie
- User-created info: in this case it’s just a click or, if there is a puzzle, the solution to the puzzle
Google’s servers will look at:
- Rendering of the canvas element
- Canvas rendering is a known way to fingerprint users, and Google does this by drawing something, putting it into base64, and sending that to the servers.
- Behavior of many browser-specific functions and CSS rules
- Screen resolution
- A session is automatically flagged as suspicious if the browser is outdated, if the user-agent is inaccurate (running Mozilla but saying it’s Chrome), or if the request is at all malformed.
- Execution time, timezone
- Number of click/keyboard/touch actions
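To get a feel for the canvas trick in the list above, here’s a rough sketch of the "put it into base64 and send it" step. In a real browser the drawing and encoding happen in JavaScript on a canvas element; here the pixel bytes are invented and the `canvas_fingerprint` helper is a made-up name. The only point is that tiny rendering differences between machines produce different payloads, which is what makes this usable as a fingerprint.

```python
import base64
import hashlib


def canvas_fingerprint(pixel_bytes):
    """Return (base64 payload, short fingerprint) for some rendered pixels."""
    # Encode the raw pixels as base64 text so they can travel in a request
    payload = base64.b64encode(pixel_bytes).decode("ascii")
    # Hash the payload so it can be compared cheaply on the server side
    fingerprint = hashlib.sha256(payload.encode("ascii")).hexdigest()[:16]
    return payload, fingerprint


# Pretend two machines drew the same thing, but anti-aliasing differed a bit
machine_a = bytes([255, 0, 0, 255, 254, 1, 0, 255])
machine_b = bytes([255, 0, 0, 255, 253, 2, 0, 255])

if __name__ == "__main__":
    for name, pixels in (("A", machine_a), ("B", machine_b)):
        payload, fp = canvas_fingerprint(pixels)
        print(name, payload, fp)
```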
Google’s ReCAPTCHA widget requires a call to recaptcha.anchor.Main.init and passes two base64-encoded parameters. You can easily look at this by opening your Chrome developer tools next time you see a ReCAPTCHA.
Google’s servers process this data, very quickly, and determine if entry should be granted. There are different challenge types:
- Just the checkbox
- Image based
- Text based
- Probably others in the works. I saw Binance using a draggable widget that had to be put into the correct place, which is a new type of CAPTCHA.
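Whichever challenge type shows up, the site that embeds the widget still has to ask Google whether the visitor’s response token checked out. Here’s a minimal sketch of that server-side call, assuming Google’s documented siteverify endpoint and its `secret`, `response`, and `remoteip` fields. The function names are my own, and the network call is kept separate so the rest can be tried without a real key.

```python
import json
import urllib.parse
import urllib.request

VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"


def build_verify_request(secret_key, token, remote_ip=None):
    """Build the POST request that asks Google to validate a token."""
    fields = {"secret": secret_key, "response": token}
    if remote_ip:
        fields["remoteip"] = remote_ip  # optional field
    data = urllib.parse.urlencode(fields).encode("ascii")
    # Supplying data makes urllib send this as a POST
    return urllib.request.Request(VERIFY_URL, data=data)


def token_is_valid(result):
    """Interpret the decoded JSON reply; only an explicit True counts."""
    return result.get("success") is True


def verify(secret_key, token, remote_ip=None, timeout=5):
    """Do the real round trip (needs network access and a real secret key)."""
    request = build_verify_request(secret_key, token, remote_ip)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return token_is_valid(json.load(response))
```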
Google puts data from a user’s session into their risk analysis engine. This system is designed to detect suspicious browser attributes or behavior. It uses a broader set of cues so that the challenge is still important but not the only determining factor.
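Nobody outside Google knows the real scoring rules, so treat this as a toy illustration of the idea and nothing more. It adds up a few of the cues mentioned earlier and picks an outcome. The `SessionSignals` fields, the point values, and the thresholds are all invented.

```python
from dataclasses import dataclass


@dataclass
class SessionSignals:
    browser_outdated: bool
    user_agent_mismatch: bool   # e.g. claims to be Chrome but behaves otherwise
    malformed_request: bool
    input_events: int           # clicks, key presses, touches
    seconds_on_widget: float


def risk_score(s):
    """Add up made-up penalty points; higher means more bot-like."""
    score = 0
    if s.browser_outdated:
        score += 2
    if s.user_agent_mismatch:
        score += 4
    if s.malformed_request:
        score += 4
    if s.input_events == 0:
        score += 3              # no mouse, keyboard, or touch activity at all
    if s.seconds_on_widget < 0.3:
        score += 3              # faster than a person could plausibly react
    return score


def decide(s):
    """Map the score to an outcome: let through, show a puzzle, or refuse."""
    score = risk_score(s)
    if score < 3:
        return "pass"
    if score < 7:
        return "challenge"
    return "block"
```

The middle "challenge" outcome is the interesting one: the puzzle only appears when the background cues are inconclusive, which matches the idea that the challenge matters but isn’t the only deciding factor.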
Someone took a couple of days to really delve into the code and decompiled the bytecode to figure out what’s going on under the hood. If you’re real curious, you can find out more at the Inside Google’s ReCAPTCHA GitHub.
If you wanna go Black Hat you can read the research paper on how to break the CAPTCHA.
Binance, the Chinese cryptocurrency exchange mentioned above, implements a CAPTCHA technique that I haven’t seen in other places. It involves a specific user interaction that combines cursor movement, clicking, holding, and releasing at the appropriate time. In other words, it uses multiple human skills: visual detection and motion.
After you create an account and log in, you see this.
Here it is in action, mid motion.
It’ll only pass if you release the click at just the right time.
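I have no idea how the exchange actually validates the drag, so this is purely a guess at the kind of check a server could run. It assumes the page reports where the piece was released, how long the drag took, and a few x positions sampled along the way. `DragAttempt`, `drag_passes`, and every threshold here are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DragAttempt:
    release_x: float    # where the user let go, in pixels
    duration_ms: int    # time between pressing and releasing
    samples: list       # x positions recorded while dragging


def drag_passes(attempt, target_x, tolerance_px=4.0, min_ms=150, max_ms=10000):
    """Pass only if the release lands on target and the motion looks human."""
    on_target = abs(attempt.release_x - target_x) <= tolerance_px
    plausible_time = min_ms <= attempt.duration_ms <= max_ms
    # A perfectly even glide (identical steps) looks scripted; people wobble.
    steps = [b - a for a, b in zip(attempt.samples, attempt.samples[1:])]
    varied_motion = len(set(steps)) > 1
    return on_target and plausible_time and varied_motion
```

Checking the motion as well as the end position is the design choice that matters here: a script can trivially report the right release point, but faking a believable path takes more effort.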
If someone wants to get past a copyright infringement algo, they might alter the audio and video to fall outside the parameters of the algorithm. People are constantly looking for a way to game algorithms, especially if there’s money involved, but sometimes just for the thrill of it.
The whole reason ReCAPTCHA exists is because people work hard to defeat computers. Oftentimes these activities are unethical, and often illegal, but still earn money.
People share copyrighted content, sell illegally gained info, etc… This is the world of the hacker. Now not all hackers are bad, some are innovators and help improve human life. Some are ethical hackers who detect system vulnerabilities and increase the security of governments, businesses, institutions, and private homes.
No matter what kind of hacker you look at, their culture is fascinating.
In the early 1980s, hackers invented leet, a variation on character usage meant to confuse computers and keep their communications from being scanned and matched against flagged keywords.
That way search engines wouldn’t be able to link to their forum convos, comments, etc.
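To see why swapping characters works, here’s a tiny sketch of the sort of naive keyword scanner this trick was meant to dodge. The flagged word list and the `naive_filter_hits` helper are invented for the example.

```python
# Hypothetical list of words a scanner is watching for
FLAGGED = {"hacker", "exploit"}


def naive_filter_hits(message, flagged=FLAGGED):
    """Return the flagged words found by plain word-for-word matching."""
    words = (w.strip(".,!?") for w in message.lower().split())
    return [w for w in words if w in flagged]


if __name__ == "__main__":
    print(naive_filter_hits("That hacker found an exploit!"))
    print(naive_filter_hits("That h4x0r found an 3xpl017!"))
```

The first message trips the filter on both words, while the second sails through because none of its tokens match the list exactly.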
Why the word leet? Where does that come from? Well it’s cuz when you defeat a computer, you’re part of the elite.
Here’s a list of some popular ones.
- Pwnd = owned
- Noob = Newbie
- Pr0n or n0rp = Porn
- @$$ = ass
- $#!+ = shit (please pardon my language, just thought it interesting to include)
- Haxor = Hacker
- Suxxor or suxorz = Sucks
- W00t (with zeros) = We owned the other team (or just a shout of joy)
There are some rules to doing this, and you can look at these recurring forms in the morphology section of the Leet Wiki Page.
Really the above isn’t true leet, it’s just half@$$ed leet.
The real one is called 1337, and it has 26 rules, one per letter, which mostly use numbers and non-alphanumeric chars.
- A is 4
- B is |3
- C is (
- D is |)
- E is 3
- F is |=
- G is 6
- H is |-|
- I is |
- J is 9
- K is |<
- L is 1
- M is |v|
- N is |/| (a forward slash is used since the backslash is famous for not showing up on some sites)
- O is 0
- P is |*
- Q is 0
- R is |2
- S is 5
- T is 7
- U is |_|
- V is |/
- W is |/|/
- X is ><
- Y is `/
- Z is 2
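The table above drops straight into a lookup dictionary. Here’s a small sketch of a translator built on it; the `to_1337` name is made up, and anything that isn’t a letter from A to Z is passed through untouched.

```python
# The 26 substitutions from the list above
LEET = {
    "A": "4", "B": "|3", "C": "(", "D": "|)", "E": "3", "F": "|=",
    "G": "6", "H": "|-|", "I": "|", "J": "9", "K": "|<", "L": "1",
    "M": "|v|", "N": "|/|", "O": "0", "P": "|*", "Q": "0", "R": "|2",
    "S": "5", "T": "7", "U": "|_|", "V": "|/", "W": "|/|/", "X": "><",
    "Y": "`/", "Z": "2",
}


def to_1337(text):
    """Swap each letter for its 1337 form; leave everything else alone."""
    return "".join(LEET.get(ch.upper(), ch) for ch in text)


if __name__ == "__main__":
    print(to_1337("leet"))
    print(to_1337("Hacker"))
```

Running it on "leet" gives 1337, which is where the name comes from. Going the other way is ambiguous, since O and Q both turn into 0.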
There you go, now you’re a hacker, just kidding, that’ll take years of hard work. Even then, you might not have the chops.
Not trying to sound high and mighty, I’ve never maliciously hacked any tech, not my goal.
Back to CAPTCHA.
Future of CAPTCHA
Anyway, have you thought about why this type of tech will always be necessary? Right, of course you did. But what might that look like in the future?
With the advent of cloud computing and just plain old faster processors, computers are getting better at behaving and thinking like humans. Due to this we’ll likely see new solutions coming out that address newly found flaws in existing systems.
Seen any interesting and innovative CAPTCHA tech? Please share.