5 August 2015

This blog post shares some lessons learned from batch optical character recognition (OCR) on PDF documents. There were three challenges: deciding whether OCR is necessary for a document, choosing an OCR package, and assessing OCR results.

Background

In 2014 I drew the assignment of extracting text from over 20,000,000 pages in about…