CAPTCHAs and digitizing old books, who woulda thunk?
Tiny URL: http://tinyurl.com/6zhyf5
If you thought that CAPTCHAs were only used to fight spam, think again. reCAPTCHA is a free CAPTCHA service that helps to digitize old print media! A CAPTCHA is a test that determines whether the user in question is a human or a computer bot. It’s common to see CAPTCHAs in forms, where they ensure that an automated bot is not filling out the form and creating fake accounts. This is what a typical CAPTCHA looks like:
About 60 million CAPTCHAs are solved by humans around the world every day. In each case, roughly ten seconds of human time are being spent. Individually, that’s not a lot of time, but in aggregate these little puzzles consume more than 150,000 hours of work each day.
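The arithmetic behind that figure is easy to check. Here is a minimal sketch that uses only the two numbers quoted above; the constant and function names are made up for illustration.

```python
# Both figures come from the paragraph above; the names are illustrative.
CAPTCHAS_PER_DAY = 60_000_000
SECONDS_PER_CAPTCHA = 10  # "roughly ten seconds" each


def hours_per_day(captchas: int, seconds_each: int) -> float:
    """Total human time spent per day, converted from seconds to hours."""
    return captchas * seconds_each / 3600


total_hours = hours_per_day(CAPTCHAS_PER_DAY, SECONDS_PER_CAPTCHA)
print(f"{total_hours:,.0f} hours per day")  # prints "166,667 hours per day"
```

That works out to about 166,667 hours, which is where "more than 150,000 hours" comes from.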
reCAPTCHA is making positive use of this human effort. To archive human knowledge and to make information more accessible to the world, multiple projects are currently digitizing physical books that were written before the computer age. The book pages are being photographically scanned, and then transformed into text using “Optical Character Recognition” (OCR). The problem is that OCR is not perfect. Lots of text in the scanned images cannot be decoded correctly. So, the words that OCR cannot read are included in reCAPTCHA images for humans to decode.
The obvious question is: if a computer can’t read such a CAPTCHA, how does the system know the correct answer to the puzzle? Here’s how: each new word that OCR cannot read correctly is shown to a user together with another word for which the answer is already known. The user is then asked to read both words. If they solve the one for which the answer is known, the system assumes their answer for the new one is correct. The system then gives the new image to a number of other people to determine, with higher confidence, whether the original answer was correct.
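The two-word check described above can be sketched in a few lines of Python. This is only an illustration of the idea, not reCAPTCHA's actual code: the function names, the case-insensitive comparison, and the thresholds (`min_votes`, `min_share`) are all assumptions made for the example.

```python
from collections import Counter
from typing import Optional


def normalize(text: str) -> str:
    """Compare answers ignoring case and surrounding spaces (an assumption)."""
    return text.strip().lower()


def check_response(known_answer: str, known_guess: str,
                   unknown_guess: str, votes: Counter) -> bool:
    """Pass the user only if they read the known word correctly.

    When they pass, their reading of the unknown word is recorded as one vote.
    """
    if normalize(known_guess) != normalize(known_answer):
        return False  # failed the control word, so ignore the other answer
    votes[normalize(unknown_guess)] += 1
    return True


def accepted_reading(votes: Counter, min_votes: int = 3,
                     min_share: float = 0.75) -> Optional[str]:
    """Return the agreed reading once enough users agree, else None.

    The thresholds are hypothetical, chosen only for this sketch.
    """
    total = sum(votes.values())
    if total < min_votes:
        return None
    word, count = votes.most_common(1)[0]
    return word if count / total >= min_share else None


# Example: "morning" is the known word, the second word is the one OCR missed.
votes = Counter()
check_response("morning", "morning", "upon", votes)    # passes, votes "upon"
check_response("morning", "mourning", "open", votes)   # fails, vote ignored
check_response("morning", "Morning ", "upon", votes)   # passes, votes "upon"
check_response("morning", "morning", "upon", votes)    # passes, votes "upon"
print(accepted_reading(votes))  # prints "upon"
```

The point of the design is that the known word does double duty: it keeps bots out, and it decides whose reading of the unknown word is trusted enough to count.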
So the next time you get annoyed at seeing a CAPTCHA, just go along with it. It’s time and effort spent towards a good cause! I for one will be viewing CAPTCHAs very differently from now on.


