You may have read about our recent downtime. We thought it might be a good opportunity to let you know about some of the other behind-the-scenes things going on here. We continue to answer email, keep the FAQ updated, and improve our metadata. Many of you have written about the quality of some of our EPUBs. As you may know, all of our OCR (optical character recognition) is done automatically without manual corrections, and while it's pretty good, it could be better. Specifically, we had a pernicious bug where some books' formatting caused the first page of each chapter to be left out of the OCRed EPUB. This happened to me personally with a series of books I was reading on Open Library, and I know it's beyond frustrating.
To address this and other scanning quality issues, we're changing the way EPUBs work. We've improved our OCR algorithm, and we're shifting from stored EPUB files to on-the-fly generation. This means that future improvements to our OCR will show up in EPUBs immediately. This is good news, and it has the side benefit of radically decreasing our EPUB storage needs. It also means that we have to:
- remove all of our old EPUBs (approximately eight million items for EPUBs generated by the Archive)
- put the new on-the-fly EPUB generation in place (now active)
- do some testing to make sure it's working as expected (in progress)
We hope that this addresses some of the EPUB errors people have been finding. Please continue to give us feedback on how this is working for you. Coming soon: improvements to Open Library’s search features!
It makes no odds what it is you carry, so long as you carry the truth along with you. – intro to 1893 edition
There are many good responses to "Why do we still have libraries when everything is online?" My favorite one has to do with the importance of finding people to curate, sort, and sift through the enormous bulk of online material to create knowledge and wisdom from what is merely data. Small projects that do not scale. Henry David Thoreau went to Cape Cod in the mid-1800s and wrote about the experience. His writings on Cape Cod were published in 1865 and reprinted many times after that. The text can be found in any number of places, but actually flipping through the books reveals a lot more about the cultural history of this book and the text it contains. The covers alone are lovely to look at.
Cover featuring the Eastham Windmill
Cover featuring cranberry motif
Looking through the many copies Open Library has, there’s a lot of marginalia and other interesting things to peek at. One version appears to have been purchased for a dollar while another may have cost upwards of thirty.
The book was frequently given to libraries as a gift. Sometimes by people you may have heard of.
Some of these versions have beautiful and unusual illustrations and some have photographs.
Some have illustrations nearly obliterated by low quality scanning (not ours).
And some have little mysteries. What does “By transfer The White House” mean? What did the War Department think of this book?
All of these are aspects of the book (one work, many editions) that surface through close inspection with human eyes.
The Concord, MA library has scanned, assembled, and annotated a set of images of Thoreau's surveys, another wonderfully curated set of digitized ephemera that helps us understand our world.
A slightly more personal note here… it's been a little over three years since I started working at Open Library, and just this past week we hit a milestone of 25,000 emails sent. That's slightly lower than the number of emails we get, because some just say "Thank you!", some we forward to other departments, and yes, a few are spam. But the rest (the tech support, the early book returns, the reference questions, the merge requests) have been answered by me, Michelle, and Laurel.
It's been very gratifying to help keep Open Library's ebook lending library open and thriving, and very interesting to watch the ebook environment change around us since we first opened in a much more limited fashion in 2005. Here's to ten more years of free ebook lending and a continually improving ebook reader experience!
Next week is the third annual Aaron Swartz Day (previous years: 2013, 2014), a celebration and hackathon that takes place at the Internet Archive on November 7th. Please consider joining us. More information about this year's events can be found here. We have a lot of good news on our end.
My name is Jessamyn, and I've been working for Open Library for the past few years after being inspired by Aaron Swartz Day 2013. I work with Giovanni Damiola, Michelle Krasowski, and many of the other wonderful people at the Archive to keep this valuable resource up and running.