Celebrating 20 Years of the Internet Archive with an Open Library Feature Bonanza

To celebrate the Internet Archive’s 20th anniversary, the Open Library team has added pages for 200,000 new modern works and rolled out a brigade of fixes and features to improve our user experience.

Over the past year, Open Library’s digital librarian Jessamyn West and lead engineer Brenton Cheng have worked tirelessly with the engineering team and volunteer community to outline a roadmap for revitalizing Open Library and address the issues most affecting our users. We’re proud to announce progress on several fronts, including social sharing, improved book lending, a mobile-optimized book reader, full-text search, a new developer tool, and the addition of thousands of new modern works.

Towards better EPUBs at Open Library and the Internet Archive

You may have read about our recent downtime. We thought it might be a good opportunity to let you know about some of the other behind-the-scenes things going on here. We continue to answer email, keep the FAQ updated and improve our metadata. Many of you have written about the quality of some of our EPUBs. As you may know, all of our OCR (optical character recognition) is done automatically without manual corrections and, while it’s pretty good, it could be better. Specifically, we had a pernicious bug where some books’ formatting caused the first page of chapters to be left out of the OCRed EPUB. I personally had this happen to me with a series of books I was reading on Open Library and I know it’s beyond frustrating.

To address this and other scanning quality issues, we’re changing the way EPUBs work. We’ve improved our OCR algorithm and we’re shifting from stored EPUB files to on-the-fly generation. This means that further developments and improvements in our OCR capabilities will be available immediately. This is good news and has the side benefit of radically decreasing our EPUB storage needs. It also means that we have to:

  • remove all of our old EPUBs (approximately eight million items for EPUBs generated by the Archive)
  • put the new on-the-fly EPUB generation in place (now active)
  • do some testing to make sure it’s working as expected (in process)
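
For the technically curious, here is a toy Python sketch of the difference between stored and on-the-fly generation. It is not our actual code: every name in it (StoredEpubs, OnTheFlyEpubs, ocr_old, ocr_new) is made up for illustration, and a plain string stands in for a real EPUB file. It only shows why on-the-fly output picks up OCR improvements right away while stored files do not.

```python
from typing import Callable, Dict, List

# Hypothetical type: an OCR pass turns a list of page scans into book text.
OcrEngine = Callable[[List[str]], str]


def ocr_old(pages: List[str]) -> str:
    """Toy 'old' OCR pass that loses the first page, like the bug described above."""
    return "\n".join(pages[1:])


def ocr_new(pages: List[str]) -> str:
    """Toy 'improved' OCR pass that keeps every page."""
    return "\n".join(pages)


class StoredEpubs:
    """Builds each book once and keeps the file (assumed model of the old approach)."""

    def __init__(self, ocr: OcrEngine) -> None:
        self.ocr = ocr
        self._files: Dict[str, str] = {}  # one saved copy per book

    def get(self, book_id: str, pages: List[str]) -> str:
        # A saved file is returned as-is, even if the OCR pass has improved since.
        if book_id not in self._files:
            self._files[book_id] = self.ocr(pages)
        return self._files[book_id]


class OnTheFlyEpubs:
    """Builds the book on every request and stores nothing (assumed model of the new approach)."""

    def __init__(self, ocr: OcrEngine) -> None:
        self.ocr = ocr

    def get(self, book_id: str, pages: List[str]) -> str:
        # Always uses whatever OCR pass is current at request time.
        return self.ocr(pages)
```

In this sketch, swapping ocr_old for ocr_new changes what OnTheFlyEpubs returns on the very next request, while StoredEpubs keeps handing out the old file until it is removed and rebuilt. That is why removing the old EPUBs is part of the list above.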

We hope that this addresses some of the EPUB errors people have been finding. Please continue to give us feedback on how this is working for you. Coming soon: improvements to Open Library’s search features!

Not just scanning – Thoreau’s Cape Cod

It makes no odds what it is you carry, so long as you carry the truth along with you. – intro to 1893 edition

There are many good responses to “Why do we still have libraries when everything is online?” My favorite one has to do with the importance of finding people to curate and sort and sift through the enormous bulk of online material to create knowledge and wisdom from what is merely data. Small projects which do not scale. Henry David Thoreau went to Cape Cod in the mid-1800s and wrote about the experience. His writings on Cape Cod were published in 1865 and reprinted many times after that. The text can be found in any number of places, but actually flipping through the books reveals a lot more about the cultural history of this book and the text it contains. The covers alone are lovely to look at.

Cover featuring the Eastham Windmill

25,000 emails in three years

A slightly more personal note here… it’s been a little over three years since I started working at Open Library and just this past week we hit a milestone of 25,000 emails sent. That’s slightly lower than the number of emails we get because some are just saying “Thank you!” and some we forward to other departments and yes, a few are spam. But the rest–the tech support, the early book returns, the reference questions, the merge requests–have been answered by me and Michelle and Laurel.

February 1-5 is #ColorOurCollections Week

There are a lot of neat public domain images in our collections. We’ve highlighted them in the past and continue to encourage people to use, remix and share our content. This week for the #ColorOurCollections event, we’ve pulled out some especially colorable images and made them into PDFs that you can print out and color. We’ve created a few pairs of images we think you’ll like. Here are the images and links to the books where you can find and download even more. If you just want to download a zip file of all eight images, click here.
