But von Ahn said OCR doesn’t always work on text that is older, faded or distorted. In those cases, often the only way to digitize the works is to manually type them into a computer.
Von Ahn is working with the Internet Archive, which runs several book-scanning projects, to use CAPTCHAs for this instead. The Internet Archive scans 12,000 books a month and sends von Ahn hundreds of thousands of image files containing text that the computer can’t recognize. Those files are downloaded onto von Ahn’s server and split into single words that can be used as CAPTCHAs at sites all over the Internet.