BookReader Work Sprint at NYPL Labs

By mang

We had a really fantastic code/work sprint for the BookReader organized by the most excellent NYPL Labs.  The sprint was designed to bring together organizations that have an interest in the BookReader as a way to foster the sharing of interest, code and expertise.

New York Public Library

We started by making a list of desired features and prioritizing them.  High on the list was to make the code more modular and easier to understand, reuse and extend.  We made great progress towards that goal by creating a new plugin architecture that allows new views of the book to be added cleanly to the existing code.  For example, it will be possible to create a book view that uses the <canvas> tag or other advanced web technologies and have it automatically included in the BookReader application simply by including that plugin’s JavaScript file.

Looking down into the stacks

Another highly desired feature is making it easier for people to use their own books with the BookReader application.  Doug Reside from NYPL Labs contributed a “book loader” (our new term for the piece of code that connects the BookReader to the underlying images and metadata for a book to display) that allows you to specify the images for a book directly inside an HTML file.  This new loader provides a simple way to use the BookReader for your own books.

The new code is currently on the codesprint branch of the BookReader GitHub repository.  We plan to integrate the new plugin system once the code has been polished and tested. Updated documentation is also coming. You can subscribe to the bookreader-announce mailing list to be notified when the code is released. You can also find more information about developing and using the BookReader in our developer resources.

Mitch Brodsky with his BookReader customized for the NY Philharmonic

This work sprint hosted by NYPL Labs marks an exciting new milestone in the development of the BookReader. We’re setting the foundation for greater re-use and collaboration around the BookReader. Many thanks to Doug Reside, David Riordan and Ben Vershbow of NYPL Labs for organizing the sprint and the fantastic attendees who contributed ideas and code commits!

BookReader Sprinters

“We’re Re-Tribalising”

By George Oates

“It’s no longer one thing at a time, but everything all at once.”

Happy Birthday, Marshall McLuhan. (Via @josettemelchor on Twitter.)

Our Search Engine Was Hurting

By George Oates

Sorry to say, but our search engine is all kinds of weird this morning (San Francisco time). Lots of the pages you see around the site, like a Work page or a Subject page, or indeed the Search Results page are driven largely by search.

So, while we’re working on fixing the problem, please excuse the various gaps you might encounter around the place as we resolve it.

Update, 12:50pm PST: We’ve tried running Lucene CheckIndex on the work search index, but it came back with “No problems were detected with this index.” Hmm. Next, we’ll try restoring from backup. Apologies again for the continued oddness around the site!

Update, 2:30pm PST, 7/13/11: OK. We’ve just removed the site alert that you might have noticed across the top of every page, because we think the search is back up online. It turned out that we had to rebuild the whole index from a 2-day-old backup, and then process the last 2 days of changes made on the site into the new index. Phew! If you continue seeing weirdness or results that look less correct than usual, please let us know.

Apologies again for the road bump – and a word of advice? Be sure to keep an eye on how and where your log files are being stored and accumulated. It turns out a huge blob of log files is what ended up choking our search engine, stopping it from being updated. We’ve since modified where we store logs, and how often they are cleaned out.
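If you’d like to guard against the same kind of pile-up, here’s a minimal sketch of a cleanup job in Python. It’s an illustration only, not what Open Library actually runs: the function name, the `*.log` pattern and the age threshold are all assumptions made for the example.

```python
import time
from pathlib import Path


def prune_old_logs(log_dir, max_age_days, now=None):
    """Delete *.log files in log_dir older than max_age_days.

    Returns the names of the deleted files, sorted alphabetically.
    The "*.log" naming pattern is an assumption for this sketch.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds in a day
    deleted = []
    for path in sorted(Path(log_dir).glob("*.log")):
        # Only touch regular files whose last modification is before the cutoff.
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            deleted.append(path.name)
    return deleted
```

Run something like this on a schedule (cron, for instance) and pair it with a disk-usage alert, so a slow build-up of logs gets noticed before it blocks anything else.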

The Challenges of Getting to Mars: Landing Day, Nerves and Joy

By George Oates

I’ve been spending a bit more time on lately. Not only exploring the nearly 3 million scanned texts, but also our massive video collection. I uncovered this documentary showing earthlings landing something called “Phoenix” on the surface of Mars in May of 2008. Here’s what happened:

Go, humans! And, NASA!

Heads up! Little confirmation email glitch in play

By George Oates

Last week we made some upgrades to the way account management on Open Library works. We’ve been hearing through our contact form that some people have had trouble with their confirmation emails not working. Specifically, clicking on the link to confirm your email address from the email we send you lands you on a page that throws an error.

Just a note to let you know that if this happens to you, there are 2 ways to try to resolve it:

  1. Just try clicking through again on the verification link in the email you got from us, or
  2. Try logging in with your account again. If your verification hasn’t gone through yet, you’ll see a screen that can resend a verification email, and that new link should work.

Sorry for the glitch! It should sort itself out within a day or two.

Announcing a new Read API

By George Oates

One of the goals of Open Library is to make it easy to share bibliographic data. While we’ve had various APIs available from the very beginning and have made bulk data dumps available since forever, there is always room for improvement.

We’re working on 2 new APIs at the moment, and today, we released a tiny baby version of our new Read API. The upcoming Import API was also released for internal use only, deployed as a replacement part for the process Open Library uses to discover new books (and their accompanying MARC records) that are scanned each day by the Internet Archive. (More on the Import API later.)

The Read API
Similar to the way our existing Books API mirrors and is compatible with the Google Books Dynamic Links API, the Read API is very much inspired by, and partially compatible with, the HathiTrust Bibliographic API.

The idea is, you can hit the Read API with an identifier, a series of identifiers or an array of identifiers, and it will tell you whether there is a readable or borrowable version available through Open Library. As you render a page in your own bookish website, you can paint links into Open Library based on the response.
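To make that a little more concrete, here’s a rough Python sketch of building request URLs for one identifier or for several at once. Treat the details as assumptions rather than the definitive API: the base path, the `;` and `|` separators and the identifier type names are based on our reading of the initial docs, and the function names are made up for this example. Please check the Read API docs before relying on any of it.

```python
from urllib.parse import quote

# Assumed base path for the Read API; confirm against the docs.
READ_API_BASE = "https://openlibrary.org/api/volumes/brief"


def single_request_url(id_type, id_value, base=READ_API_BASE):
    """Build a URL asking about one identifier, e.g. an ISBN or OCLC number."""
    return f"{base}/{quote(id_type)}/{quote(str(id_value))}.json"


def multi_request_url(books, base=READ_API_BASE):
    """Build a URL asking about several books in one request.

    `books` is a list; each entry is a list of (id_type, id_value) pairs
    that all describe the same book. Pairs are joined with ";" and books
    with "|" (both separators are assumptions in this sketch).
    """
    parts = [
        ";".join(f"{id_type}:{id_value}" for id_type, id_value in book)
        for book in books
    ]
    return f"{base}/json/{'|'.join(parts)}"
```

For example, `single_request_url("oclc", "1234")` would ask about a single (made-up) OCLC number, while `multi_request_url` lets a catalog page ask about every book it is about to render in one go.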

Traversing Works and Editions
The Read API will try to match your identifiers to an OL edition record, and then look up that edition’s work. If the edition you’re looking for doesn’t have an available eBook, the API returns other editions of that work which do have readable or borrowable resources. That way, you can at least point people to a similar version of what they were looking for if the initial query doesn’t find something to read.
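On the receiving end, a catalog might prefer the exact edition and fall back to a similar one. Here’s a small Python sketch of that choice. The response shape is an assumption for illustration: we pretend each item is a dict with a `match` field (`"exact"` or `"similar"`) and an `itemURL` field, and both function names are hypothetical.

```python
def pick_best_item(items):
    """Return the item to link to, preferring an exact match.

    Each item is assumed to be a dict with a "match" key ("exact" or
    "similar") and an "itemURL" key; this shape is an assumption.
    Returns None when nothing usable is found.
    """
    for wanted in ("exact", "similar"):
        for item in items:
            if item.get("match") == wanted:
                return item
    return None


def link_text(item):
    """Describe the chosen item for display on a catalog page."""
    if item is None:
        return "No readable or borrowable copy found"
    # Flag fallbacks so readers know this is not the edition they asked for.
    note = "" if item.get("match") == "exact" else " (a similar edition)"
    return f"Read or borrow on Open Library{note}: {item['itemURL']}"
```

Labelling the fallback keeps the link honest: people can see they are being pointed at a different edition of the same work.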

I find myself wondering whether this functionality might be useful for other things, like reconciling works data across different systems, or comparing edition fidelity/duplication.

We were thrilled to bits to meet Dan Scott a little while ago when he came to visit us at 300 Funston. He’s a hacker on the Evergreen ILS system, and by day works at Laurentian University. Evergreen’s already been using the OL API for showing covers and tables of contents within their UI, but it was somewhat laborious, needing to blend two of our APIs together to get the desired output. It was great to meet Dan, and we actually ended up designing the Read API response together over the course of an afternoon, specifically to remove that double-step process. Dan has written about this too: The Wonderful New Open Library Read API and Evergreen Integration. The super thing about working with Dan is, once we’ve dotted the Is and crossed the Ts on this, it can be deployed to any and all instances of Evergreen that want it. (Hello, Koha? I’ll be in touch shortly!)

So, there are some initial Read API docs in the Developer’s area, and you can see a working demonstration that Dan & Mike hooked up in a flurry of late night emails and tweaking (which was a pleasure to observe). Head over to the Open Library skin on the Laurentian University’s library catalog to see this very young API in action!

The obvious caveat is, as Dan notes, “working code wins,” which is another way of saying we haven’t optimized or scaled anything up for a bazillion hits yet, so results will be a little slow for now. But still! Books! In your catalogs! If you are from a large system that would probably send us a bunch of requests per second, it would be nice if you could give us a heads-up if you’re going to use the Read API. A good place to do that, or to ask questions, is our ol-tech mailing list.

By the way, as you may have noticed, a few weeks ago, we mentioned that oclcBot has updated the Open Library with about 4 million OCLC IDs too, which means that if you speak OCLC, you can hit the Read API with your OCLC ID to look for things to read or borrow through Open Library on your site.

downtime (resolved)

By George Oates

Apologies for the service interruption. Our new Virtual Machine Environment hiccuped this morning, and we’re shaking out the process we need to go through to restart everything smoothly. Covers should be being served normally from using ISBN or other supported identifiers.

I’ll update here if there’s relevant news – hopefully it won’t be down much longer.

Update, 4:00PM PST: OK. We’re stable again now. Damn race conditions in Linux kernels!

Update, 10:00PM PST: The website is down again.

Update, May 17 2:00AM PST: We’re back online now. But, is offline, which unfortunately means that external sites accessing Open Library covers using ISBNs is also offline. Covers on are being served from an alternate location. (We need to wrangle some DNS settings to get this fixed, hopefully within a few hours.)

Internet Archive is launching a Physical Archive

By George Oates

[Reposted from the main Internet Archive blog.]

Everyone is welcome to the open-house and launch of the new Physical Archive of the Internet Archive in Richmond, California on Sunday June 5th from 4-8pm.

The new Internet Archive facility in Richmond, California

After 2 years of prototyping and testing a new design for sustainable long-term preservation of physical books, records and movies, we are starting with over 300,000 books and gearing up for millions.

Come if you:

– love books, records, or movies
– are concerned about the future of open access and preservation
– want to have something fun to talk about over the water cooler on Monday….

Then, invest an hour with us on a Sunday – Drinks, food, good people.

What you will see:

– A high density, modular system for storing books, video and audio
– A temperature-controlled environment for long-term preservation
– Our new logistics facility that will catalog and coordinate large collections of books, records and movies.

Who you will meet:

– The Internet Archive Board, Founder, Management Team
– Friends and supporters of the Internet Archive
– Colleagues and leaders from the Library community

Please come! Bring friends and family.

– Secure free parking
2512 Florida Avenue, Richmond, California, 30 minutes north of San Francisco and Berkeley, 415 561 6767.

RSVP to, or just come!

Library of Congress National Jukebox

By George Oates

Wonderful stuff! The Library of Congress has just released more than 10,000 digitized 78s from the Victor Talking Machine Company as a National Jukebox. The recordings are lovely and crackly, as if you’re listening on a gramophone.

Here’s “Cradle Song 1915”, recorded in Camden, New Jersey:

Nice to see pretty URLs for things too, like Enrico Caruso or things recorded on May 18. Go LC!

5/13: Interesting postscript at Public Knowledge.

The Little Bot That Could

By George Oates

Meet oclcBot. He was written by Bruce Washburn at OCLC Research to help connect Open Library records to He’s just finished updating almost 4 million Open Library editions with links! No metadata exchange at all, except these identifiers. Tiny, but powerful, because that lets systems that “speak OCLC” communicate directly with Open Library without knowing any Open Library IDs. As Anand mentioned in his recent post about Coverstore Improvements, we’ve also made the system for displaying covers externally using other types of identifiers more efficient.

There was a bit of a bumpy start to oclcBot’s updates, and Bruce and I thought it might be good to hear what it was like in the trenches. From Bruce:

This project was essentially very simple: find corresponding Open Library and OCLC WorldCat records by a shared attribute (ISBN), and update the Open Library record with the corresponding OCLC number. Once OCLC had generated a list of OCLC numbers and their corresponding ISBNs, it seemed to be a simple matter of using the very robust Open Library API to look for matching records, check to see if they already included an OCLC number, and update the record accordingly. Complications arose, related to scale. There were about 90 million ISBNs to check from the OCLC list, and checking them one at a time via the API was projected to take a very long time. So we used a data dump of all the Open Library records to identify those with ISBNs, and also built a very fast index of the OCLC list to check against. With that we were able to produce a new list of Open Library records and corresponding new OCLC numbers. And a batch update facility in the Open Library API made it possible to send API requests 1,000 records at a time. The pre-processing and the batch process both yielded some additional lists that will require more scrutiny to process (records associated with multiple ISBNs, API exceptions for individual records), but the great majority of records were updated via the oclcBot without any further effort.
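For anyone curious what that offline matching might look like, here’s a rough Python sketch of the approach Bruce describes: index the OCLC list by ISBN, scan the dump for editions that have ISBNs but no OCLC number yet, and chunk the resulting updates into batches of 1,000. This is not oclcBot’s actual code. The record field names (`key`, `isbns`, `oclc_numbers`) and the rule for setting aside ambiguous matches are assumptions made for the example.

```python
def build_isbn_index(oclc_pairs):
    """Map each ISBN to the set of OCLC numbers listed for it.

    `oclc_pairs` is an iterable of (oclc_number, isbn) tuples.
    """
    index = {}
    for oclc_number, isbn in oclc_pairs:
        index.setdefault(isbn, set()).add(oclc_number)
    return index


def plan_updates(editions, isbn_index):
    """Split editions into clean updates and ones needing a closer look.

    Each edition is a dict with hypothetical "key", "isbns" and
    "oclc_numbers" fields. Returns (updates, needs_review), where updates
    is a list of (edition_key, oclc_number) pairs.
    """
    updates, needs_review = [], []
    for edition in editions:
        if edition.get("oclc_numbers"):
            continue  # already linked, nothing to do
        matches = set()
        for isbn in edition.get("isbns", []):
            matches |= isbn_index.get(isbn, set())
        if len(matches) == 1:
            updates.append((edition["key"], next(iter(matches))))
        elif len(matches) > 1:
            # Ambiguous: several candidate OCLC numbers for one edition.
            needs_review.append(edition["key"])
    return updates, needs_review


def batches(items, size=1000):
    """Yield consecutive slices of `items`, each at most `size` long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

Each batch would then be sent as a single request to the batch update facility, and anything in `needs_review` goes on a list for a human to check.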

So, it’s still early days with our Bot operations, but we’re looking for external developers who might be interested in trying these “surgical strike” style updates to loads of Open Library records at once. If you’re curious, please visit our Writing Open Library Bots page in the Open Library Developers area.

Thank you, Bruce!

(And thanks to Solo for the CC BY-NC-SA 2.0 oclcBot photo.)