Google Acquires reCAPTCHA

Google has acquired reCAPTCHA, one of the top providers of CAPTCHAs. If you don’t know, a CAPTCHA is a string of letters or words that you may have to enter while signing up for a web service. CAPTCHAs are used to block spammers from creating large numbers of accounts.

Google is not interested in reCAPTCHA’s CAPTCHAs as such. It is looking for ways to improve the optical character recognition (OCR) software that it uses to scan texts for its Google Books and Google News Archive Search projects.

According to Google, more than 100,000 websites use reCAPTCHA to prevent spam. The reCAPTCHA team, currently based at Carnegie Mellon University, will join Google shortly.

reCAPTCHA has come up with a cool way to crowdsource book transcriptions. As a source for its CAPTCHA images, it uses scans of old documents that are damaged, poorly printed, or set in obscure fonts. The CAPTCHA box shows users two words. One is the control word, i.e. a word that the OCR software is certain about. The second word is uncertain, i.e. the OCR software doesn’t know for sure what it is. Once a certain number of users have given the same answer for the uncertain word, that answer is accepted as correct and the word becomes a control word itself. This mechanism sets reCAPTCHA apart from other CAPTCHA services or scripts: every solved CAPTCHA also helps transcribe a text.
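The two-word scheme described above can be sketched in a few lines of Python. This is only an illustration, not reCAPTCHA's actual code: the class name `WordPool`, the `submit` method, the word IDs, and the threshold of three matching answers are all assumptions made for the example. The post only says that "a certain number" of users must solve the word.

```python
from collections import Counter


def normalize(text):
    """Compare answers case-insensitively and ignore surrounding spaces."""
    return text.strip().lower()


class WordPool:
    """Toy model of the control-word / uncertain-word scheme (hypothetical)."""

    def __init__(self, control_words, threshold=3):
        # word_id -> transcription the OCR software is certain about
        self.control = {k: normalize(v) for k, v in control_words.items()}
        # word_id -> Counter of answers users have typed for an uncertain word
        self.votes = {}
        # assumed number of matching answers needed to trust a transcription
        self.threshold = threshold

    def submit(self, control_id, control_answer, unknown_id, unknown_answer):
        """Return True if the user passes the CAPTCHA."""
        # Only the control word decides whether the user is let through.
        if normalize(control_answer) != self.control[control_id]:
            return False

        # A user who got the control word right is trusted with a vote
        # on the uncertain word.
        answer = normalize(unknown_answer)
        counts = self.votes.setdefault(unknown_id, Counter())
        counts[answer] += 1

        # Enough matching answers: the uncertain word becomes a control word.
        if counts[answer] >= self.threshold:
            self.control[unknown_id] = answer
            del self.votes[unknown_id]
        return True
```

In this sketch, a pool created with `WordPool({"w1": "morning"})` would treat `w1` as the control word. After three users type "morning" correctly and all enter the same answer for an uncertain word `w2`, `w2` is added to the control words and can be used to check future users.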

Google will now use this technology to improve its OCR projects. It has made 1 million out-of-copyright books available for download through its Google Books project. These books have not been edited, so they may contain many OCR mistakes.

After the acquisition of reCAPTCHA, Google could bring down the error rate and make Google Books even more useful.
