Announcing a new Read API

By George Oates

One of the goals of Open Library is to make it easy to share bibliographic data. While we’ve had various APIs available from the very beginning and have made bulk data dumps available since forever, there is always room for improvement.

We’re working on 2 new APIs at the moment, and today, we released a tiny baby version of our new Read API. The upcoming Import API was also released for internal use only, deployed as a replacement part for the process Open Library uses to discover new books (and their accompanying MARC records) that are scanned each day by the Internet Archive. (More on the Import API later.)

The Read API
Similar to the way our existing Books API mirrors and is compatible with the Google Books Dynamic Links API, the Read API is very much inspired by, and partially compatible with, the Hathi Trust Bibliographic API.

The idea is, you can hit the Read API with a single identifier or a list of identifiers, and it will tell you whether there is a readable or borrowable version available through Open Library. As you render a page in your own bookish website, you can paint links into Open Library based on the response.

Traversing Works and Editions
The Read API will try to match your identifiers to an OL edition record, and will then return its work and then other editions of that work which also have readable or borrowable resources if the one you’re looking for doesn’t have an available eBook. That way, you can at least point people to a similar version of what they were looking for if the initial query doesn’t find something to read.
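Here is a rough sketch of how a catalog page might use that behaviour. It is not taken from the Read API docs: the endpoint pattern and the response fields (`items`, `status`, `itemURL`, `match`) are assumptions made for illustration, and the sample response is invented, so please check the docs before relying on any of it.

```python
from urllib.parse import quote

# Assumed endpoint pattern; confirm it against the Read API docs.
READ_API_BASE = "http://openlibrary.org/api/volumes/brief"


def read_api_url(id_type, id_value):
    """Build a request URL for one identifier, e.g. an ISBN or OCLC number."""
    return f"{READ_API_BASE}/{quote(id_type)}/{quote(str(id_value))}.json"


def pick_link(response):
    """Return (url, match) for the best readable or borrowable item, or None.

    Assumes each item has 'status', 'itemURL' and 'match' keys, where 'match'
    is 'exact' for the edition asked for and 'similar' for another edition
    of the same work.
    """
    available = [
        item for item in response.get("items", [])
        if item.get("status") in ("full access", "lendable")
    ]
    if not available:
        return None
    # Prefer the exact edition; fall back to a similar one.
    available.sort(key=lambda item: item.get("match") != "exact")
    best = available[0]
    return best["itemURL"], best.get("match")


# Invented response, shaped the way this sketch assumes.
sample = {
    "items": [
        {"status": "checked out", "match": "exact",
         "itemURL": "http://example.org/exact-edition"},
        {"status": "lendable", "match": "similar",
         "itemURL": "http://example.org/other-edition"},
    ]
}
```

With the sample above, `pick_link(sample)` falls through to the similar edition because the exact one is checked out, which is the "at least point people to a similar version" case.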

I find myself wondering whether this functionality might be useful for other things, like reconciling works data across different systems, or comparing edition fidelity/duplication.

We were thrilled to bits to meet Dan Scott a little while ago when he came to visit us at 300 Funston. He’s a hacker on the Evergreen ILS system, and by day works at Laurentian University. Evergreen’s already been using the OL API for showing covers and tables of contents within their UI, but it was somewhat laborious, needing to blend two of our APIs together to get the desired output. It was great to meet Dan, and we actually ended up designing the Read API response together over the course of an afternoon, specifically to remove that double-step process. Dan has written about this too: The Wonderful New Open Library Read API and Evergreen Integration. The super thing about working with Dan is, once we’ve dotted the Is and crossed the Ts on this, it can be deployed to any and all instances of Evergreen that want it. (Hello, Koha? I’ll be in touch shortly!)

So, there are some initial Read API docs in the Developer’s area, and you can see a working demonstration that Dan & Mike hooked up in a flurry of late night emails and tweaking (which was a pleasure to observe). Head over to the Open Library skin on Laurentian University’s library catalog to see the very young API in action!

The obvious caveat is, as Dan notes, “working code wins,” which is another way of saying we haven’t optimized or scaled anything up for a bazillion hits yet, so results will be a little slow for now. But still! Books! In your catalogs! If you are from a large system that would probably send us a bunch of requests per second, it would be nice if you could give us a heads-up before you start using the Read API. A good place to do that, or to ask questions, is our ol-tech mailing list.

By the way, as you may have noticed, we mentioned a few weeks ago that oclcBot has updated Open Library with about 4 million OCLC IDs. That means that if you speak OCLC, you can hit the Read API with your OCLC ID to look for things to read or borrow through Open Library on your site.

openlibrary.org downtime (resolved)

By George Oates

Apologies for the service interruption. Our new Virtual Machine Environment hiccuped this morning, and we’re shaking out the process we need to go through to restart everything smoothly. Covers should be served normally from covers.openlibrary.org using ISBN or other supported identifiers.

I’ll update here if there’s relevant news – hopefully it won’t be down much longer.

Update, 4:00PM PST: OK. We’re stable again now. Damn race conditions in Linux kernels!

Update, 10:00PM PST: The website is down again.

Update, May 17 2:00AM PST: We’re back online now. But covers.openlibrary.org is offline, which unfortunately means that external sites loading Open Library covers by ISBN can’t get them either. Covers on openlibrary.org are being served from an alternate location. (We need to wrangle some DNS settings to get this fixed, hopefully within a few hours.)

Internet Archive is launching a Physical Archive

By George Oates

[Reposted from the main Internet Archive blog.]

Everyone is welcome to the open house and launch of the new Physical Archive of the Internet Archive in Richmond, California, on Sunday, June 5th, from 4-8pm.

The new Internet Archive facility in Richmond, California

After 2 years of prototyping and testing a new design for sustainable long-term preservation of physical books, records, and movies, we are starting with over 300,000 books and gearing up for millions.

Come if you:

– love books, records, or movies
– are concerned about the future of open access and preservation
– want to have something fun to talk about over the water cooler on Monday….

Then, invest an hour with us on a Sunday – Drinks, food, good people.

What you will see:

– A high density, modular system for storing books, video and audio
– A temperature-controlled environment for long-term preservation
– Our new logistics facility that will catalog and coordinate large collections of books, records and movies.

Who you will meet:

– The Internet Archive Board, Founder, Management Team
– Friends and supporters of the Internet Archive
– Colleagues and leaders from the Library community

Please come! Bring friends and family.

– Secure free parking
2512 Florida Avenue, Richmond, California, 30 minutes north of San Francisco and Berkeley, 415 561 6767.

RSVP to rsvp@archive.org, or just come!

Library of Congress National Jukebox

By George Oates

Wonderful stuff! The Library of Congress has just released more than 10,000 digitized 78s from the Victor Talking Machine Company as a National Jukebox. The recordings are lovely and crackly, as if you’re listening on a gramophone.

Here’s “Cradle Song 1915”, recorded in Camden, New Jersey:

Nice to see pretty URLs for things too, like Enrico Caruso or things recorded on May 18. Go LC!

5/13: Interesting postscript at Public Knowledge.

The Little Bot That Could

By George Oates

homebuying is lots and lots of paperwork

Meet oclcBot. He was written by Bruce Washburn at OCLC Research to help connect Open Library records to Worldcat.org. He’s just finished updating almost 4 million Open Library editions with links! No metadata exchange at all, except these identifiers. Tiny, but powerful, because that lets systems that “speak OCLC” communicate directly with Open Library without knowing any Open Library IDs. As Anand mentioned in his recent post about Coverstore Improvements, we’ve also made the system for displaying covers externally using other types of identifiers more efficient.

There was a bit of a bumpy start to oclcBot’s updates, and Bruce and I thought it might be good to hear what it was like in the trenches. From Bruce:

This project was essentially very simple: find corresponding Open Library and OCLC WorldCat records by a shared attribute (ISBN), and update the Open Library record with the corresponding OCLC number. Once OCLC had generated a list of OCLC numbers and their corresponding ISBNs, it seemed to be a simple matter of using the very robust Open Library API to look for matching records, check to see if they already included an OCLC number, and update the record accordingly. Complications arose, related to scale. There were about 90 million ISBNs to check from the OCLC list, and checking them one at a time via the API was projected to take a very long time. So we used a data dump of all the Open Library records to identify those with ISBNs, and also built a very fast index of the OCLC list to check against. With that we were able to produce a new list of Open Library records and corresponding new OCLC numbers. And a batch update facility in the Open Library API made it possible to send API requests 1,000 records at a time. The pre-processing and the batch process both yielded some additional lists that will require more scrutiny to process (records associated with multiple ISBNs, API exceptions for individual records), but the great majority of records were updated via the oclcBot without any further effort.
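To make Bruce’s description a bit more concrete, here is a rough sketch of the offline matching and batching steps. It is not oclcBot’s actual code: the field names (`key`, `isbn_10`, `isbn_13`, `oclc_numbers`), the ISBN normalization, and the rule for setting ambiguous records aside are all assumptions made for illustration.

```python
from itertools import islice


def normalize_isbn(isbn):
    """Strip hyphens and spaces so different spellings of an ISBN compare equal."""
    return isbn.replace("-", "").replace(" ", "").upper()


def plan_updates(editions, isbn_to_oclc):
    """Match editions from a data dump against an ISBN -> OCLC number index.

    Returns (updates, ambiguous): updates is a list of (edition key, OCLC
    number) pairs; ambiguous lists editions whose ISBNs point at more than
    one OCLC number and so need a closer look.
    """
    updates, ambiguous = [], []
    for edition in editions:
        if edition.get("oclc_numbers"):
            continue  # already linked, nothing to do
        isbns = edition.get("isbn_10", []) + edition.get("isbn_13", [])
        matches = {
            isbn_to_oclc[n]
            for n in map(normalize_isbn, isbns)
            if n in isbn_to_oclc
        }
        if len(matches) == 1:
            updates.append((edition["key"], matches.pop()))
        elif len(matches) > 1:
            ambiguous.append(edition["key"])
    return updates, ambiguous


def batches(items, size=1000):
    """Yield lists of at most `size` items, one list per batch request."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk
```

Doing the matching against a local index keeps the slow part off the network, so only the final updates, grouped 1,000 at a time, need to go through the API.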

So, it’s still early days with our Bot operations, but we’re looking for external developers who might be interested in trying these “surgical strike” style updates to loads of Open Library records at once. If you’re curious, please visit our Writing Open Library Bots page in the Open Library Developers area.

Thank you, Bruce!

(And thanks to Solo for the CC BY-NC-SA 2.0 oclcBot photo.)

Mike Matas: A next-generation digital book (TED)

By George Oates

Coverstore Improvements

By Anand Chitipothu

We have recently made some improvements to coverstore, the Open Library book covers service.

Now it is possible to access book covers by all the available identifiers. For example:

http://covers.openlibrary.org/b/goodreads/6383507-M.jpg
http://covers.openlibrary.org/b/librarything/8071257-M.jpg

Accessing covers by ISBNs is insensitive to hyphens now. For example, all the following URLs point to the same cover and this works even if the ISBN is specified with hyphens in the edition record.

http://covers.openlibrary.org/b/isbn/1-59286-793-6-M.jpg
http://covers.openlibrary.org/b/isbn/1592867936-M.jpg

Please refer to the Open Library Covers API for more details.
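As a small illustration, a helper like the one below could build these URLs on your side. The function name is made up for this sketch, and the URL pattern is simply read off the examples above. Removing hyphens in the client is optional, since the service now ignores them; it is done here only to show that both spellings resolve to the same URL.

```python
COVERS_BASE = "http://covers.openlibrary.org/b"


def cover_url(id_type, value, size="M"):
    """Build a cover URL from an identifier type and value.

    `size` defaults to "M" as in the examples above; see the Covers API
    documentation for the other sizes it accepts.
    """
    value = str(value)
    if id_type == "isbn":
        # Optional: the service ignores hyphens, so this only tidies the URL.
        value = value.replace("-", "")
    return f"{COVERS_BASE}/{id_type}/{value}-{size}.jpg"
```

For example, `cover_url("isbn", "1-59286-793-6")` and `cover_url("isbn", "1592867936")` produce the same URL, and `cover_url("goodreads", 6383507)` matches the first example above.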

We have built a secondary database that maps edition identifiers to cover IDs, which makes accessing covers faster. Because of this, there is some delay between adding an identifier to an edition record and being able to access the cover using the newly added identifier. The delay is usually a couple of seconds.

Please note that this API is intended for displaying covers on public-facing websites and not for bulk download. To download book covers in bulk, please refer to the Bulk Access section of the API documentation.

We have recently noticed that some bots are downloading book covers by ISBN at a very high rate, which badly affected the performance of the system. We have added rate-limiting to cap the number of requests per IP address. The current limit is 100 requests per IP address every 5 minutes. The limit applies only to cover accesses by the various identifiers; there is no limit on accessing covers by cover ID.
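For readers curious what a limit like this looks like in code, here is a minimal sliding-window sketch. It is only an illustration of the general technique and says nothing about how coverstore actually implements its limit; the class and method names are made up.

```python
import time
from collections import defaultdict, deque


class RateLimiter:
    """Allow at most `limit` requests per client within `window` seconds."""

    def __init__(self, limit=100, window=300.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so the limiter can be tested
        self.hits = defaultdict(deque)  # client IP -> recent request times

    def allow(self, ip):
        """Record a request from `ip` and report whether it is within the limit."""
        now = self.clock()
        recent = self.hits[ip]
        # Forget requests that have aged out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.limit:
            return False
        recent.append(now)
        return True
```

A site that shows covers by ISBN could use the same idea on its own side to stay comfortably under 100 requests per 5 minutes, or cache cover URLs so repeat page views don’t count against the limit.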

This limit should be good enough for linking covers on public-facing websites. Please consider using the Open Library Books API if your website demands more. Since the Books API provides book cover URLs using cover IDs, the rate limit won’t apply.

Please get in touch with us if you need any assistance in using this API to show book covers on your website.

pystatsd & 5,000 Lists!

By George Oates

We’re working hard to improve Open Library’s general stability and performance, after a few harrowing weeks moving our hardware infrastructure around. We’re beginning to measure more stuff across the site, from general activity levels (about 40,000 catalog edits every month!) to quite specific actions (like, seeing that every second, 1-3 people open up our BookReader).

We’ve begun using a super awesome, real-time stats processing package called pystatsd, a Python implementation of Etsy’s statsd server. My favourite bit is a program that sits on top of that called graphite which takes all the stats we collect with pystatsd and renders them as graphs in a browser. Suddenly, we can see the system in a new and useful way.

We’re also looking hard at improving our memcached configuration, recently introducing another 4 memcached machines into our pool. Now that we can measure memcached hits and misses using pystatsd and graphite, we’ll be able to tell when our caching stuff is actually improving. Yay!

Memcached hits & misses
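If you’d like a feel for what gets measured, here is a small sketch of counting cache hits and misses. It speaks the plain statsd counter format over UDP rather than using the pystatsd client, and the bucket names, host and port are placeholders rather than our real configuration.

```python
import socket


def counter_packet(bucket, count=1):
    """Format a statsd counter update, e.g. b"memcached.hits:1|c"."""
    return f"{bucket}:{count}|c".encode("ascii")


def send_counter(bucket, count=1, host="localhost", port=8125):
    """Fire-and-forget one counter update to a statsd server over UDP."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(counter_packet(bucket, count), (host, port))


def hit_ratio(hits, misses):
    """Fraction of cache lookups that were hits (0.0 when there were none)."""
    total = hits + misses
    return hits / total if total else 0.0
```

Calling `send_counter("memcached.hits")` on every cache hit and `send_counter("memcached.misses")` on every miss is enough for a graphing tool to plot the two lines side by side, and the ratio between them shows whether a caching change actually helped.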

Another tweak you might find interesting… it used to be that lists would only show up on the main Lists page if they contained at least 3 seeds. The other day, Raj and I upped that to at least 5 seeds, and that immediately produced a selection of arguably more interesting lists, most of which settle around a subject area. Here’s a small selection:

Have you made a great list, or found someone else’s? Let us know in the comments!

Alice: The On-Line Catalog

By George Oates

Ohio University's Alden Library Alice Catalog, 1983

So awesome. That’s 1983 in Ohio, folks.

New Titles in Lending Library!

By George Oates

Our little lending library is continuing to grow, this time with 90 new titles purchased directly from two fabulous eBook publishers: A Book Apart & Smashwords.

3 titles from A Book Apart are all must-reads for any discerning web professional…

Thanks to Mandy, Jeffrey and Jason at A Book Apart for joining in the fun. (Incidentally, Mandy’s blog, A Working Library, is a great read.)

There are also 87 ePubs from Smashwords, by authors like Amanda Hocking, Ruth Ann Nordin and Gerald M. Weinberg.

Thanks to @markcoker at Smashwords for working with us to get these new titles online.

Loans through the Open Library are exclusive to one Open Library account holder at a time, for up to two weeks. For most titles, you can access the eBooks in one of three ways: directly in your web browser (using our BookReader), as a PDF, or as an ePub (downloaded into Adobe Digital Editions). The new Smashwords titles are a bit different: they’re only available in ePub format, so they can only be downloaded and read in Adobe Digital Editions.

The Internet Archive (and Open Library) is actively seeking publishers who’d like us to buy their eBooks and make them available in the Lending Library. If you are a publisher interested in selling us your wares, please get in touch!

While that Lending Library — available to anyone with an Open Library account — is growing, we’re also working to expand the collection for our “In-Library” loans, currently at about 85,000 eBooks. This special In-Library program is a bit different, because it requires patrons to literally be inside a participating library’s network. Once that’s the case, patrons can see all the books available in the In-Library collection on Open Library, from all the libraries in the In-Library pool, currently around 150 North American libraries.