Category Archives: OCR

Bulk Access to OCR for 1 Million Books

The Internet Archive provides bulk access to more than one million public domain books in its Texts Collection. The entire collection exceeds 0.5 petabytes and includes raw camera images, cropped and deskewed images, PDFs, and raw OCR data. The Internet Archive is scanning 1,000 books per day, so this collection is always growing. The OCR data […]

Also posted in Bulk Access | Comments closed