Tag Archives: python

Bulk Access to OCR for 1 Million Books

The Internet Archive provides bulk access to the more than one million public domain books in its Texts Collection. The entire collection exceeds 0.5 petabytes and includes raw camera images, cropped and skewed images, PDFs, and raw OCR data. Because the Internet Archive is scanning 1000 books/day, the collection is always growing. The OCR data […]
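The figures in this excerpt give a rough sense of scale. The Python sketch below works through that arithmetic. It assumes decimal units (1 PB = 10^15 bytes) and treats "over 0.5 petabytes" and "over one million books" as exact round numbers, so the results are loose averages rather than measurements. The constant and function names are illustrative only and do not come from the Internet Archive.

```python
# Back-of-envelope figures derived from the numbers quoted in the post.
# Assumptions (illustrative, not official): decimal units, and the lower
# bounds "0.5 PB" and "1 million books" treated as exact values.

PETABYTE = 10**15                  # bytes, decimal (SI) convention
COLLECTION_BYTES = 0.5 * PETABYTE  # "over 0.5 petabytes"
BOOK_COUNT = 1_000_000             # "over one million" books
BOOKS_PER_DAY = 1_000              # quoted scanning rate


def avg_bytes_per_book(total_bytes: float, books: int) -> float:
    """Average storage per book across all derivatives (images, PDFs, OCR)."""
    return total_bytes / books


def books_added(days: int, per_day: int = BOOKS_PER_DAY) -> int:
    """Books added over a period, assuming a constant scanning rate."""
    return days * per_day


per_book = avg_bytes_per_book(COLLECTION_BYTES, BOOK_COUNT)
yearly_books = books_added(365)
# Assumes new scans are, on average, the same size as existing ones.
yearly_growth_pb = yearly_books * per_book / PETABYTE

print(f"~{per_book / 10**6:.0f} MB per book")
print(f"~{yearly_books:,} books per year")
print(f"~{yearly_growth_pb:.2f} PB of new data per year")
```

Under those assumptions the collection averages roughly 500 MB per book across all its derivatives, and a rate of 1000 books/day adds about 365,000 books (around 0.18 PB) per year.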

Posted in Bulk Access, OCR