The BBC has an article about Kalev Leetaru’s project to extract images from millions of Open Library pages.
You can read about how it works…
The Internet Archive had used an optical character recognition (OCR) program to analyse each of its 600 million scanned pages in order to convert the image of each word into searchable text. As part of the process, the software recognised which parts of a page were pictures in order to discard them.
Mr Leetaru’s code used this information to go back to the original scans, extract the regions the OCR program had ignored, and then save each one as a separate file in the Jpeg picture format. The software also copied the caption for each image and the text from the paragraphs immediately preceding and following it in the book. Each Jpeg and its associated text was then posted to a new Flickr page, allowing the public to hunt through the vast catalogue using the site’s search tool.
“I think one of the greatest things people will do is time travel through the images,” Mr Leetaru said.
… or just check out some of the results. Images plus citations plus metadata! We couldn’t be happier. Free to use with no restrictions.
I even found a photo of my house!
Read more details at the Internet Archive’s blog or on Flickr’s “Welcome to the Commons” post.
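For the curious, the extraction step described in the BBC excerpt above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Mr Leetaru's actual code: the `Block` and `ExtractedImage` types, the `extract_images` function, and the idea that the OCR output arrives as an ordered list of blocks with bounding boxes are all hypothetical. The page is modelled as a plain grid of pixel values, and saving to JPEG and posting to Flickr are left out.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical stand-in for one region of OCR output, in reading order.
@dataclass
class Block:
    kind: str                                       # "text", "caption", or "picture"
    text: str = ""
    box: Optional[Tuple[int, int, int, int]] = None  # (left, top, right, bottom)

# Hypothetical record for one extracted picture and its surrounding text.
@dataclass
class ExtractedImage:
    pixels: List[List[int]]
    caption: str
    before: str
    after: str

def crop(page: List[List[int]], box: Tuple[int, int, int, int]) -> List[List[int]]:
    """Cut the region the OCR pass skipped out of the original scan."""
    left, top, right, bottom = box
    return [row[left:right] for row in page[top:bottom]]

def nearest_text(blocks: List[Block], start: int, step: int) -> str:
    """Walk backwards (step=-1) or forwards (step=1) to the closest paragraph."""
    i = start + step
    while 0 <= i < len(blocks):
        if blocks[i].kind == "text":
            return blocks[i].text
        i += step
    return ""

def extract_images(page: List[List[int]], blocks: List[Block]) -> List[ExtractedImage]:
    """Return each picture region with its caption and neighbouring paragraphs."""
    results = []
    for i, block in enumerate(blocks):
        if block.kind != "picture" or block.box is None:
            continue
        # Assumption: a caption, if present, is the block right after the picture.
        caption = ""
        if i + 1 < len(blocks) and blocks[i + 1].kind == "caption":
            caption = blocks[i + 1].text
        results.append(ExtractedImage(
            pixels=crop(page, block.box),
            caption=caption,
            before=nearest_text(blocks, i, -1),
            after=nearest_text(blocks, i, 1),
        ))
    return results
```

In a real pipeline the cropped region would be written out as an image file and uploaded along with the caption and surrounding text, which is what makes the Flickr pages searchable.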
Open Library will be down from 6:00PM to 8:00PM SF time (PDT, UTC/GMT -7 hours) on August 19, 2014 due to scheduled hardware maintenance.
We’ll post updates here and on Twitter at @openlibrary.
Thank you for your cooperation.
UPDATE 6:45PM PDT: The hardware maintenance is complete and openlibrary.org is back online!
The Internet Archive had a booth at Wikimania in London, in the Community Village section of the conference. We hope you stopped by to say hello, grab a sticker or a handout, learn a bit more about our book scanning projects, and tell us what you were up to. If you’d like to pick up digital copies of our handouts, PDFs are here.
We also went to a lot of programs that were really worthwhile. The free/open culture vibe was palpable and exciting, with 2500+ people all getting together to find ways to share more content in more ways. We picked up a few other documents that might be interesting to other folks.
If you like working on Wikipedia but are often flustered by paywalls, you should know about the Wikipedia Library, which has a project to help editors access reliable sources. The Wikipedia Loves Libraries project is gearing up for a month of wiki-workshops and edit-a-thons at libraries around Open Access Week in October/November.
Amazon’s “Kindle Unlimited” announcement has been helping raise awareness of Open Library.
Last week, Amazon informed us that for ten dollars per month, Kindle users can have unlimited access to over six hundred thousand books in its library. But it shouldn’t cost a thing to borrow a book, Amazon, you foul, horrible, profiteering enemies of civilization. For a monthly cost of zero dollars, it is possible to read six million e-texts at the Open Library, right now. On a Kindle, or any other tablet or screen thing.
Don’t forget our easy-to-use interface, or that you can download books to the device or software of your choice!
Open Library will be down from 5:00PM to 7:00PM SF time (PDT, UTC/GMT -7 hours) on July 8, 2014 due to scheduled hardware maintenance. We’ll post updates here and on Twitter at @openlibrary. Thank you for your cooperation.
UPDATE: 5:50PM PDT – the hardware maintenance is complete and openlibrary.org is back online.