Archive for May, 2011

openlibrary.org downtime (resolved)

By George Oates

Apologies for the service interruption. Our new virtual machine environment hiccuped this morning, and we’re shaking out the process we need to go through to restart everything smoothly. Covers should now be served normally from covers.openlibrary.org using ISBNs or other supported identifiers.

I’ll update here if there’s relevant news – hopefully it won’t be down much longer.

Update, 4:00PM PST: OK. We’re stable again now. Damn race conditions in Linux kernels!

Update, 10:00PM PST: The website is down again.

Update, May 17 2:00AM PST: We’re back online now, but covers.openlibrary.org is offline, which unfortunately means that external sites can’t load Open Library covers by ISBN either. Covers on openlibrary.org itself are being served from an alternate location. (We need to wrangle some DNS settings to fix this, hopefully within a few hours.)

Internet Archive is launching a Physical Archive

By George Oates

[Reposted from the main Internet Archive blog.]

Everyone is welcome at the open house and launch of the new Physical Archive of the Internet Archive in Richmond, California, on Sunday, June 5th, from 4 to 8pm.

The new Internet Archive facility in Richmond, California.

After two years of prototyping and testing a new design for sustainable long-term preservation of physical books, records, and movies, we are starting with over 300,000 books and gearing up for millions.

Come if you:

– love books, records, or movies
– are concerned about the future of open access and preservation
– want to have something fun to talk about over the water cooler on Monday.

Then invest an hour with us on a Sunday: drinks, food, good people.

What you will see:

– A high-density, modular system for storing books, video and audio
– A temperature-controlled environment for long-term preservation
– Our new logistics facility that will catalog and coordinate large collections of books, records and movies.

Who you will meet:

– The Internet Archive Board, Founder, Management Team
– Friends and supporters of the Internet Archive
– Colleagues and leaders from the Library community

Please come! Bring friends and family.

– Secure free parking
2512 Florida Avenue, Richmond, California, 30 minutes north of San Francisco and Berkeley, 415 561 6767.

RSVP to rsvp@archive.org, or just come!

Library of Congress National Jukebox

By George Oates

Wonderful stuff! The Library of Congress has just released more than 10,000 digitized 78s from the Victor Talking Machine Company as a National Jukebox. The recordings are lovely and crackly, as if you’re listening on a gramophone.

Here’s “Cradle Song 1915”, recorded in Camden, New Jersey.

Nice to see pretty URLs for things too, like Enrico Caruso or things recorded on May 18. Go LC!

5/13: Interesting postscript at Public Knowledge.

The Little Bot That Could

By George Oates


Meet oclcBot. He was written by Bruce Washburn at OCLC Research to help connect Open Library records to WorldCat.org, and he’s just finished adding links to almost 4 million Open Library editions! No metadata was exchanged at all, apart from these identifiers. It’s a tiny change but a powerful one, because it lets systems that “speak OCLC” communicate directly with Open Library without knowing any Open Library IDs. As Anand mentioned in his recent post about Coverstore Improvements, we’ve also made the system for displaying covers externally using other types of identifiers more efficient.
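For anyone wanting to pull covers by identifier, here is a minimal Python sketch of building a cover image URL. It assumes the URL pattern described in the Open Library Covers API documentation, `https://covers.openlibrary.org/b/{key}/{value}-{size}.jpg`, where the key is one of `isbn`, `oclc`, `lccn`, `olid` or `id` and the size is `S`, `M` or `L`. The helper name `cover_url` is our own invention, and it’s worth checking the current docs before relying on the exact pattern.

```python
from urllib.parse import quote

# Identifier types and size suffixes assumed from the Covers API URL pattern.
ID_TYPES = {"isbn", "oclc", "lccn", "olid", "id"}
SIZES = {"S", "M", "L"}


def cover_url(id_type: str, value: str, size: str = "M") -> str:
    """Build a covers.openlibrary.org image URL (hypothetical helper)."""
    id_type = id_type.lower()
    size = size.upper()
    if id_type not in ID_TYPES:
        raise ValueError(f"unsupported identifier type: {id_type}")
    if size not in SIZES:
        raise ValueError(f"unsupported size: {size}")
    # ISBNs are often written with hyphens; drop them for the URL.
    if id_type == "isbn":
        value = value.replace("-", "")
    return f"https://covers.openlibrary.org/b/{id_type}/{quote(value)}-{size}.jpg"


if __name__ == "__main__":
    print(cover_url("isbn", "0-385-47257-9"))
    # "12345678" is a placeholder OCLC number, not a real record.
    print(cover_url("oclc", "12345678", size="L"))
```

Because oclcBot has added OCLC numbers to millions of editions, an `oclc` key in a URL like this is exactly what lets an “OCLC-speaking” site fetch a cover without knowing any Open Library ID.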

There was a bit of a bumpy start to oclcBot’s updates, and Bruce and I thought it might be good to hear what it was like in the trenches. From Bruce:

This project was essentially very simple: find corresponding Open Library and OCLC WorldCat records by a shared attribute (ISBN), and update the Open Library record with the corresponding OCLC number. Once OCLC had generated a list of OCLC numbers and their corresponding ISBNs, it seemed to be a simple matter of using the very robust Open Library API to look for matching records, check whether they already included an OCLC number, and update them accordingly. Complications arose because of scale. There were about 90 million ISBNs to check from the OCLC list, and checking them one at a time via the API was projected to take a very long time. So we used a data dump of all the Open Library records to identify those with ISBNs, and also built a very fast index of the OCLC list to check against. With that we were able to produce a new list of Open Library records and their corresponding new OCLC numbers. A batch update facility in the Open Library API then made it possible to send API requests 1,000 records at a time. The pre-processing and the batch process both yielded some additional lists that will require more scrutiny to process (records associated with multiple ISBNs, API exceptions for individual records), but the great majority of records were updated via the oclcBot without any further effort.
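To make Bruce’s description a little more concrete, here is a rough Python sketch of the matching step: index the OCLC list by ISBN, walk the editions from a data dump, skip any that already carry an OCLC number, and group the rest into batches of 1,000. This is not oclcBot’s actual code. The record layout (`key`, `isbns`, `oclc_numbers`), the helper names, and the rule of setting aside editions whose ISBNs point at more than one OCLC number are all assumptions for illustration; ISBNs are assumed to be normalized already, and no real Open Library API call is made, so each batch simply stands in for one batch request.

```python
from __future__ import annotations

from collections import defaultdict
from itertools import islice
from typing import Iterable, Iterator

BATCH_SIZE = 1000  # the post mentions batch requests of 1,000 records


def build_isbn_index(pairs: Iterable[tuple[str, str]]) -> dict[str, set[str]]:
    """Map each ISBN to the set of OCLC numbers listed with it."""
    index: dict[str, set[str]] = defaultdict(set)
    for oclc, isbn in pairs:
        index[isbn].add(oclc)
    return index


def plan_updates(editions: Iterable[dict], index: dict[str, set[str]]) -> tuple[list[dict], list[str]]:
    """Split editions into clean updates and keys that need a human look.

    Hypothetical record layout: {"key": ..., "isbns": [...], "oclc_numbers": [...]}.
    """
    updates: list[dict] = []
    needs_review: list[str] = []
    for ed in editions:
        if ed.get("oclc_numbers"):
            continue  # already linked to WorldCat, leave it alone
        matches: set[str] = set()
        for isbn in ed.get("isbns", []):
            matches |= index.get(isbn, set())
        if len(matches) == 1:
            updates.append({"key": ed["key"], "oclc_numbers": sorted(matches)})
        elif len(matches) > 1:
            needs_review.append(ed["key"])  # ambiguous: several OCLC numbers
        # no match at all: nothing to do for this edition
    return updates, needs_review


def batches(items: Iterable, size: int = BATCH_SIZE) -> Iterator[list]:
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk


if __name__ == "__main__":
    # Toy data only; keys and numbers are made up.
    index = build_isbn_index([("111", "isbn-a"), ("222", "isbn-b"), ("333", "isbn-b")])
    editions = [
        {"key": "ed-1", "isbns": ["isbn-a"]},
        {"key": "ed-2", "isbns": ["isbn-b"]},
        {"key": "ed-3", "isbns": ["isbn-a"], "oclc_numbers": ["111"]},
    ]
    updates, review = plan_updates(editions, index)
    for batch in batches(updates):
        print(len(batch), "record(s) in this batch")  # stand-in for one batch request
    print("needs review:", review)
```

The dictionary lookup is what replaces tens of millions of one-at-a-time API checks; at the scale Bruce describes, a disk-backed index would probably be a better fit than an in-memory one, but the logic stays the same.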

So, it’s still early days for our bot operations, but we’re looking for external developers who might be interested in trying these “surgical strike” style updates to loads of Open Library records at once. If you’re curious, please visit the Writing Open Library Bots page in the Open Library Developers area.

Thank you, Bruce!

(And thanks to Solo for the CC BY-NC-SA 2.0 oclcBot photo.)