Time travel through millions of historic Open Library images

The BBC has an article about Kalev Leetaru’s project to extract images from millions of Open Library pages.

You can read about how it works…

The Internet Archive had used an optical character recognition (OCR) program to analyse each of its 600 million scanned pages in order to convert the image of each word into searchable text. As part of the process, the software recognised which parts of a page were pictures in order to discard them.

Mr Leetaru’s code used this information to go back to the original scans, extract the regions the OCR program had ignored, and then save each one as a separate file in the Jpeg picture format. The software also copied the caption for each image and the text from the paragraphs immediately preceding and following it in the book. Each Jpeg and its associated text was then posted to a new Flickr page, allowing the public to hunt through the vast catalogue using the site’s search tool.

“I think one of the greatest things people will do is time travel through the images,” Mr Leetaru said.

… or just check out some of the results. Images plus citations plus metadata! We couldn’t be happier. Free to use with no restrictions.

Image from page 301 of "The New England magazine" (1887)

Image from page 788 of "St. Nicholas [serial]" (1873)

Image from page 210 of "Farmington, Connecticut, the village of beautiful homes" (1906)

Image from page 1121 of "The Saturday evening post" (1839)

Image from page 368 of "New England; a human interest geographical reader" (1917)

Image from page 902 of "Canadian grocer July-December 1896" (1889)

Image from page 249 of "Gleanings in bee culture" (1874)

Image from page 411 of "The Canadian druggist" (1889)

I even found a photo of my house!

Image from page 75 of "A text book of the geography, history, constitution and civil government of Vermont; also Constitution and civil government of the U. S., a publication expressly prepared to comply with Vermont's state school laws" (1915)

Read more details at the Internet Archive’s blog or on Flickr’s “Welcome to the Commons” post.
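For the curious, here is a minimal sketch of the text-pairing step described in the BBC excerpt: each picture region is matched with its caption and the paragraphs just before and after it. It assumes the OCR output is already available as an ordered list of page blocks labeled "text", "caption", or "picture", each with a pixel bounding box. Those labels and the `Block`, `ExtractedImage`, and `extract_images` names are hypothetical. This is not Mr Leetaru's actual code or the Internet Archive's OCR format. The cropping itself is left out. With an imaging library such as Pillow, each bounding box could be passed to `Image.crop` and the result saved as a JPEG.

```python
from dataclasses import dataclass


@dataclass
class Block:
    # Hypothetical OCR block: kind is "text", "caption", or "picture"
    kind: str
    text: str = ""
    # (left, top, right, bottom) in scan pixels
    bbox: tuple = (0, 0, 0, 0)


@dataclass
class ExtractedImage:
    bbox: tuple
    caption: str
    text_before: str
    text_after: str


def extract_images(blocks):
    """Pair each picture region with its caption and neighbouring paragraphs.

    Assumes blocks are in reading order and that a caption, if present,
    is the block immediately following its picture.
    """
    results = []
    for i, block in enumerate(blocks):
        if block.kind != "picture":
            continue
        # Nearest text paragraph preceding the picture
        before = next((b.text for b in reversed(blocks[:i]) if b.kind == "text"), "")
        # Nearest text paragraph following the picture (captions are skipped)
        after = next((b.text for b in blocks[i + 1:] if b.kind == "text"), "")
        # Caption only if it sits directly after the picture
        caption = ""
        if i + 1 < len(blocks) and blocks[i + 1].kind == "caption":
            caption = blocks[i + 1].text
        results.append(ExtractedImage(block.bbox, caption, before, after))
    return results


page = [
    Block("text", "The village green in summer."),
    Block("picture", bbox=(10, 20, 300, 400)),
    Block("caption", "The old meeting house"),
    Block("text", "Built in the eighteenth century."),
]
for img in extract_images(page):
    print(img.bbox, img.caption, "|", img.text_before, "|", img.text_after)
```

Each result carries the region to crop plus the text that would go alongside it on the image's Flickr page.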
