Telling computers, humans apart
Dear PropellerHeads: I purchased tickets to a Salvador Dali exhibition on the Web and the order form asked me to "enter the code below." Underneath were some random letters written in a wavy font. Even for a Dali site, this seemed weird. What gives?
A: Obviously, the site was making sure you had what it takes to enjoy a Dali exhibition. Are you sure the letters were just wavy and not really melting off the side of the page?
Actually, what you ran across is called a CAPTCHA. That's a Completely Automated Public Turing Test to Tell Computers and Humans Apart, and probably the sloppiest acronym in all of computerdom. Personally, I find them so entertaining the ends of my moustache curl up every time I see one.
Named after the father of modern computer science (Alan Turing), a Turing test is a way of distinguishing programs from people by engaging them in a conversation. In other words, a Turing test by definition already separates the circuit-based from the carbon-based. Thanks to The Department of Redundancy Department for providing the acronym.
The purpose of a CAPTCHA is to ensure that a human, not an automated program, filled out the page. CAPTCHAs are used mostly on sign-up pages for free e-mail accounts, social networking sites and any place you might buy tickets on the Web. For example, Google employs them to prevent spamming software from creating accounts on its Gmail service.
Online order forms, Web-based polls and "contact us" pages have recently started using CAPTCHAs, too. Even some blog sites require users to answer simple math questions ("What is 2+2?") before allowing them to post comments. The questions change every time, so a program that memorized one answer cannot reuse it to log in and post Viagra ads.
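For the curious, here is roughly how a blog might serve up one of those math questions. This is a minimal sketch in Python, not any particular site's code: the function names new_challenge and check_answer are made up for illustration, and a real site would also need to remember the expected answer on the server between showing the question and checking the reply.

```python
import random

def new_challenge(rng=None):
    """Return (question_text, expected_answer) for a fresh addition question.

    Hypothetical helper: picks two small numbers so the sum is easy for a
    person but different on each page load.
    """
    rng = rng or random.SystemRandom()  # unpredictable by default
    a, b = rng.randint(1, 9), rng.randint(1, 9)
    return f"What is {a}+{b}?", a + b

def check_answer(expected, submitted):
    """Return True only if the visitor typed the expected number."""
    try:
        return int(submitted.strip()) == expected
    except ValueError:
        # Anything that is not a whole number fails the check.
        return False

if __name__ == "__main__":
    question, answer = new_challenge()
    print(question)
    print(check_answer(answer, input("> ")))
```

Because the numbers are drawn fresh on every call, yesterday's right answer is useless today, which is the whole point.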
Most CAPTCHAs consist of random letters or words and resemble a Dali-inspired word jumble. The letters drip down the side of the image, or words appear on top of other words. Sometimes a multi-colored spiral or gradient is used for the background, or a grid is superimposed on the image.
The idea is to make the words hard for a computer program to read, but easy for a person. To be effective, the images must change every time, or at least cycle in a pattern that a program cannot predict.
Many of the CAPTCHAs found on the Web are powered by reCAPTCHA (google.com/recaptcha), a Google-owned system that is being used to digitize over a hundred years' worth of New York Times archives. The source material is scanned into a computer, which uses Optical Character Recognition (OCR) software to convert the scanned images into searchable text.
Words that cannot be "read" by the OCR software are used as CAPTCHAs on sites like CNN and Ticketmaster, effectively farming the tough work out to us suckers when we try to complete transactions on various websites.
ReCAPTCHA usually displays two words instead of one, making it harder for automated programs to bypass the check. Type "funny captchas" into your favorite search engine to see screenshots of some absurd and amusing (and vulgar) combinations, like "flushing economy" or "professional cannibal."
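How does a two-word check tell people from programs while also getting old newspapers read? The usual explanation is that one word is a "control" whose answer the system already knows, and the other is the one the OCR software gave up on. The Python sketch below shows that idea under stated assumptions. It is not Google's code: the names check_pair and accepted_reading, the in-memory votes table and the three-vote threshold are all invented for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical store: id of an unread word image -> tally of human guesses.
votes = defaultdict(Counter)

def normalize(text):
    """Ignore case and stray spaces when comparing typed words."""
    return text.strip().lower()

def check_pair(control_answer, typed_control, unknown_id, typed_unknown):
    """Pass the visitor only if the known (control) word was typed correctly.

    When it was, count the guess for the unknown word as one vote.
    """
    if normalize(typed_control) != normalize(control_answer):
        return False
    votes[unknown_id][normalize(typed_unknown)] += 1
    return True

def accepted_reading(unknown_id, min_votes=3):
    """Return the most popular guess once enough people agree, else None.

    The threshold of 3 is an assumption, chosen only for this example.
    """
    tally = votes.get(unknown_id)
    if not tally:
        return None
    word, count = tally.most_common(1)[0]
    return word if count >= min_votes else None
```

In this sketch, three visitors who each type the control word correctly and read the mystery word as "flushing" would be enough for accepted_reading to return "flushing"; a visitor who flubs the control word is turned away and casts no vote.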
ReCAPTCHA provides a "refresh" button that fetches a new pair of words from the Web, just in case the pair it serves up the first time is too difficult to make out. Alternatively, an audio option lets you play a sound file, then type the words you hear into the box to proceed.
For more background on how spammers made your Web-surfing more surreal, check out http://bit.ly/cj5ulz. Of course, it won't be long before the Viagra-pushers figure out a way around these safeguards. Oh, The Persistence of Spammers.
When the PropellerHeads at Data Directions aren't busy with their IT projects, they love to answer questions on business or consumer technology. E-mail them to questions@askthepropellerheads.com or contact us at Data Directions Inc., 8510 Bell Creek Road, Mechanicsville, VA 23116. Visit our website at www.askthepropellerheads.com.