KohaCon 2011

By Noufal

Anand and I attended the Koha Conference in Thane, Mumbai earlier in November and spoke about Open Library. The conference ran from October 31 to November 2, followed by a hackfest from November 4 to 6.

We missed the first day and presented our talk on the second day of the event. The first day had a number of interesting talks, mainly about libraries shifting to Koha and about deployment issues. We spent our free time speaking to Robin Sheat, Dobrica Pavlinušić and Ian Walls, among others, about ways to tie the Open Library data in with Koha installations. While the audience was somewhat small, it was truly international. There were folks from Kenya, Nigeria, France, the States, New Zealand, Australia, Croatia and of course various parts of India. We also met Savitra who, apart from being a Koha developer, runs a Bangalore-based company called OSS Labs that provides hosted Koha instances for libraries.

We presented on the last day. Our slides are available at http://internetarchive.github.com/kohacon2011-presentation/. It was an introduction to Open Library, the data we have and some discussions on the API. There were a few questions mainly about copyright issues and about the classification system we use on the website. The conference was attended by many librarians and two of them (The Institute of Management Studies Library at VPM Thane and the University of Zagreb Faculty of Humanities and Social Sciences Library, Croatia) have applied to join the Open Library Lending Library program.

After the presentations, November 3rd was a day off and we spent it wandering around the older parts of Mumbai. On November 4th, we went back and spent the morning brainstorming ideas to implement. We came up with a few:

The first is a simple database update that presents OL as a search option when a book is not found while searching in a Koha installation. It’s been done and signed off.

The second is a simple JavaScript change that fetches covers and borrow information from Open Library and presents them when searches are done on Koha. This has been implemented as well.

The third is the most involved: we have started work on an API for uploading covers to OL that any external program can use. We have also started work on an API that lets Koha search our records to see whether the book being added is already in our database (in which case it can autocomplete the details). The search will also return the cover if one exists. On our end, if the Koha side agrees, we can populate our database with the catalogue record being searched for, and if a cover is uploaded, we can get a copy of that as well. This means that if a Koha instance in one library has uploaded a cover, other libraries will be able to use it. On the Koha side, Robin has a private branch that contains the work in progress. Details are in the Bugzilla entry.
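To give a flavour of the cover-fetching side, here is a minimal sketch (in Python rather than the JavaScript used in Koha) of building a cover image URL from an identifier. The URL layout and the lists of supported keys and sizes reflect the public covers service as we understand it and should be treated as assumptions to check against the covers documentation; `cover_url` is a hypothetical helper name, not part of Koha or Open Library.

```python
from urllib.parse import quote

# Assumed layout of the covers service: /b/<key>/<value>-<size>.jpg
COVERS_HOST = "https://covers.openlibrary.org"
VALID_KEYS = {"isbn", "oclc", "lccn", "olid", "id"}  # assumed identifier types
VALID_SIZES = {"S", "M", "L"}                        # small, medium, large


def cover_url(key: str, value: str, size: str = "M") -> str:
    """Return the URL of a cover image for one identifier (hypothetical helper)."""
    key = key.lower()
    size = size.upper()
    if key not in VALID_KEYS:
        raise ValueError(f"unsupported identifier type: {key!r}")
    if size not in VALID_SIZES:
        raise ValueError(f"unsupported size: {size!r}")
    if key == "isbn":
        # ISBNs are often written with hyphens or spaces; strip them.
        value = value.replace("-", "").replace(" ", "")
    return f"{COVERS_HOST}/b/{key}/{quote(value, safe='')}-{size}.jpg"


if __name__ == "__main__":
    print(cover_url("isbn", "0-385-47257-9"))
```

A catalogue page could drop the resulting URL straight into an image tag next to each search result.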

We’re following up on the bugs and on the requests to join the lending library. Overall, it was a wonderful event and one that benefited Open Library as well as Koha.

BookReader Work Sprint at NYPL Labs

By mang

We had a really fantastic code/work sprint for the BookReader organized by the most excellent NYPL Labs.  The sprint was designed to bring together organizations that have an interest in the BookReader as a way to foster the sharing of interest, code and expertise.

New York Public Library

We started by making a list of desired features and prioritizing them.  High on the list was to make the code more modular and easier to understand, reuse and extend.  We made great progress towards that goal by creating a new plugin architecture that allows new views of the book to be added cleanly to the existing code.  For example, it will be possible to create a book view that uses the <canvas> tag or other advanced web technologies and have it automatically included in the BookReader application simply by including that plugin’s JavaScript file.

Looking down into the stacks

Another highly desired feature is making it easier for people to use their own books with the BookReader application.  Doug Reside from NYPL Labs contributed a “book loader” (our new term for the piece of code that connects the BookReader to the underlying images and metadata for a book to display) that allows you to specify the images for a book directly inside an HTML file.  This new loader provides a simple way to use the BookReader for your own books.

The new code is currently on the codesprint branch of the BookReader GitHub repository.  We plan to integrate the new plugin system once the code has been polished and tested. Updated documentation is also coming. You can subscribe to the bookreader-announce mailing list to be notified when the code is released. You can also find more information about developing and using the BookReader in our developer resources.

Mitch Brodsky with his BookReader customized for the NY Philharmonic

This work sprint hosted by NYPL Labs marks an exciting new milestone in the development of the BookReader. We’re setting the foundation for greater re-use and collaboration around the BookReader. Many thanks to Doug Reside, David Riordan and Ben Vershbow of NYPL Labs for organizing the sprint and to the fantastic attendees who contributed ideas and code commits!

BookReader Sprinters

“We’re Re-Tribalising”

By George Oates

“It’s no longer one thing at a time, but everything all at once.”

Happy Birthday, Marshall McLuhan. (Via @josettemelchor on Twitter.)

Our Search Engine Was Hurting

By George Oates

Sorry to say, but our search engine is all kinds of weird this morning (San Francisco time). Lots of the pages you see around the site, like a Work page or a Subject page, or indeed the Search Results page are driven largely by search.

So, while we’re working on fixing the problem, please excuse the various gaps you might encounter around the place as we resolve it.

Update, 12:50pm PST: We’ve tried running Lucene CheckIndex on the work search index, but it came back with “No problems were detected with this index.” Hmm. Next, we’ll try restoring from backup. Apologies again for the continued oddness around the site!

Update, 2:30pm PST, 7/13/11: OK. We’ve just removed the site alert that you might have noticed across the top of every page, because we think the search is back up online. It turned out that we had to rebuild the whole index from a 2-day-old backup, and then process the last 2 days of changes made on the site into the new index. Phew! If you continue seeing weirdness or results that look less correct than usual, please let us know.

Apologies again for the road bump – and a word of advice? Be sure to keep an eye on how and where your log files are being stored and accumulated. It turns out that a huge blob of log files is what ended up choking our search engine, stopping it from being updated. We’ve since modified where we store logs, and how often they are cleaned out.
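If you want to keep the same kind of eye on your own logs, something like the following sketch may help. It is not the script we run: the function names, the seven-day default and the idea of reporting before deleting are all assumptions made for illustration.

```python
import os
import time


def log_report(log_dir, max_age_days=7.0, now=None):
    """Return (total_bytes, stale_paths) for every file under log_dir.

    A file counts as stale when it was last modified more than
    max_age_days ago. Nothing is deleted here.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    total = 0
    stale = []
    for root, _dirs, files in os.walk(log_dir):
        for name in files:
            path = os.path.join(root, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # file vanished between listing and stat
            total += st.st_size
            if st.st_mtime < cutoff:
                stale.append(path)
    return total, sorted(stale)


def prune(paths):
    """Delete the given files and return how many were actually removed."""
    removed = 0
    for path in paths:
        try:
            os.remove(path)
            removed += 1
        except OSError:
            pass  # already gone or not permitted; skip it
    return removed
```

Running `log_report` from a scheduled job and alerting when the total crosses a threshold gives you a warning before the disk fills; `prune` is kept separate so that deleting is always a deliberate step.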

The Challenges of Getting to Mars: Landing Day, Nerves and Joy

By George Oates

I’ve been spending a bit more time on archive.org lately. Not only exploring the nearly 3 million scanned texts, but also our massive video collection. I uncovered this documentary showing earthlings landing something called “Phoenix” on the surface of Mars in May of 2008. Here’s what happened:

Go, humans! And, NASA!

Heads up! Little confirmation email glitch in play

By George Oates

Last week we made some upgrades to the way account management on Open Library works. We’ve been hearing through our contact form that some people have had trouble with their confirmation emails not working. Specifically, clicking on the link to confirm your email address from the email we send you lands you on a page that throws an error.

Just a note to let you know that if this happens to you, there are 2 ways to try to resolve it:

  1. Just try clicking through again on the verification link in the email you got from us, or
  2. Try logging in with your account again. If your verification hasn’t gone through yet, you’ll see a screen that can resend a verification email, and that new link should work.

Sorry for the glitch! It should sort itself out within a day or two.

Announcing a new Read API

By George Oates

One of the goals of Open Library is to make it easy to share bibliographic data. While we’ve had various APIs available from the very beginning and have made bulk data dumps available since forever, there is always room for improvement.

We’re working on 2 new APIs at the moment, and today, we released a tiny baby version of our new Read API. The upcoming Import API was also released for internal use only, deployed as a replacement part for the process Open Library uses to discover new books (and their accompanying MARC records) that are scanned each day by the Internet Archive. (More on the Import API later.)

The Read API
Similar to the way our existing Books API mirrors and is compatible with the Google Books Dynamic Links API, the Read API is very much inspired by, and partially compatible with, the Hathi Trust Bibliographic API.

The idea is, you can hit the Read API with a single identifier or a list of identifiers, and it will tell you whether there is a readable or borrowable version available through Open Library. As you render a page in your own bookish website, you can paint links into Open Library based on the response.
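As a rough illustration, here is a small Python sketch that builds request URLs for one identifier or for several books at once. The paths (`/api/volumes/brief/<type>/<value>.json` for a single lookup, and `/api/volumes/brief/json/` followed by a `|`-separated list of `type:value` groups joined with `;` for a batch) are our reading of the initial docs and should be treated as assumptions; the function names are hypothetical.

```python
from urllib.parse import quote

# Assumed base path of the Read API; check the developer docs before relying on it.
READ_API_BASE = "https://openlibrary.org/api/volumes/brief"


def single_request_url(id_type: str, id_value: str) -> str:
    """URL asking about one identifier, e.g. ("oclc", "1234")."""
    return f"{READ_API_BASE}/{quote(id_type, safe='')}/{quote(id_value, safe='')}.json"


def multi_request_url(requests) -> str:
    """URL asking about several books in one call.

    requests is a list of books; each book is a list of (id_type, id_value)
    pairs that all describe that same book.
    """
    if not requests:
        raise ValueError("at least one request is needed")
    parts = []
    for ids in requests:
        if not ids:
            raise ValueError("each request needs at least one identifier")
        parts.append(";".join(f"{t}:{v}" for t, v in ids))
    # Keep the separators readable; escape anything else that is unsafe in a path.
    return f"{READ_API_BASE}/json/" + quote("|".join(parts), safe=":;|")


if __name__ == "__main__":
    print(single_request_url("isbn", "0385472579"))
    print(multi_request_url([[("isbn", "0385472579")],
                             [("lccn", "50006784"), ("oclc", "1234")]]))
```

Batching is the friendlier option when a results page lists many books, since it means one request per page rather than one per title.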

Traversing Works and Editions
The Read API will try to match your identifiers to an OL edition record, and will then return its work and then other editions of that work which also have readable or borrowable resources if the one you’re looking for doesn’t have an available eBook. That way, you can at least point people to a similar version of what they were looking for if the initial query doesn’t find something to read.
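On the consuming side, that fallback could be handled along these lines. This is only a sketch: it assumes the response carries an `items` list whose entries have `match` ("exact" or "similar"), `status` and `itemURL` fields, and that "full access" and "lendable" are the status values meaning there is something to read or borrow. Check those names against the Read API docs; `pick_item` is a hypothetical helper.

```python
from typing import Optional

# Assumed status values that mean "there is something to read or borrow".
READABLE_STATUSES = ("full access", "lendable")


def pick_item(response: dict) -> Optional[dict]:
    """Choose the item to link to from one Read API response.

    Prefers an item for the exact edition asked about, falls back to a
    similar edition of the same work, and returns None if nothing is
    readable or borrowable.
    """
    usable = [item for item in response.get("items", [])
              if item.get("status") in READABLE_STATUSES]
    for wanted in ("exact", "similar"):
        for item in usable:
            if item.get("match") == wanted:
                return item
    return None


if __name__ == "__main__":
    sample = {"items": [
        {"match": "similar", "status": "full access", "itemURL": "https://example.org/a"},
        {"match": "exact", "status": "checked out", "itemURL": "https://example.org/b"},
    ]}
    chosen = pick_item(sample)
    print(chosen["itemURL"] if chosen else "nothing to read")
```

In the sample, the exact edition is checked out, so the similar edition is the one that gets linked.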

I find myself wondering whether this functionality might be useful for other things, like reconciling works data across different systems, or comparing edition fidelity/duplication.

We were thrilled to bits to meet Dan Scott a little while ago when he came to visit us at 300 Funston. He’s a hacker on the Evergreen ILS system, and by day works at Laurentian University. Evergreen’s already been using the OL API for showing covers and tables of contents within their UI, but it was somewhat laborious, needing to blend two of our APIs together to get the desired output. It was great to meet Dan, and we actually ended up designing the Read API response together over the course of an afternoon, specifically to remove that double-step process. Dan has written about this too: The Wonderful New Open Library Read API and Evergreen Integration. The super thing about working with Dan is, once we’ve dotted the Is and crossed the Ts on this, it can be deployed to any and all instances of Evergreen that want it. (Hello, Koha? I’ll be in touch shortly!)

So, there are some initial Read API docs in the Developer’s area, and you can see a working demonstration of it that Dan & Mike hooked up in a flurry of late night emails and tweaking (which was a pleasure to observe). Head over to the Open Library skin on the Laurentian University library catalog to see the very young API in action!

The obvious caveat is, as Dan notes, “working code wins,” which is another way of saying we haven’t optimized or scaled anything up for a bazillion hits yet, so results will be a little slow for now. But still! Books! In your catalogs! If you are from a large system that would probably send us a bunch of requests per second, it would be nice if you could give us a heads-up before you start using the Read API. A good place to do that, or to ask questions, is our ol-tech mailing list.

By the way, as you may have noticed, a few weeks ago, we mentioned that oclcBot has updated the Open Library with about 4 million OCLC IDs too, which means that if you speak OCLC, you can hit the Read API with your OCLC ID to look for things to read or borrow through Open Library on your site.

openlibrary.org downtime (resolved)

By George Oates

Apologies for the service interruption. Our new Virtual Machine Environment hiccuped this morning, and we’re shaking out the process we need to go through to restart everything smoothly. Covers should be being served normally from covers.openlibrary.org using ISBN or other supported identifiers.

I’ll update here if there’s relevant news – hopefully it won’t be down much longer.

Update, 4:00PM PST: OK. We’re stable again now. Damn race conditions in Linux kernels!

Update, 10:00PM PST: The website is down again.

Update, May 17 2:00AM PST: We’re back online now. But covers.openlibrary.org is offline, which unfortunately means that Open Library covers accessed by external sites using ISBNs are also unavailable. Covers on openlibrary.org are being served from an alternate location. (We need to wrangle some DNS settings to get this fixed, hopefully within a few hours.)

Internet Archive is launching a Physical Archive

By George Oates

[Reposted from the main Internet Archive blog.]

Everyone is welcome to the open-house and launch of the new Physical Archive of the Internet Archive in Richmond, California on Sunday June 5th from 4-8pm.

The new Internet Archive facility in Richmond, California, click to enlarge

After 2 years of prototyping and testing a new design for sustainable long-term preservation of physical books, records and movies, we are starting with over 300,000 books and gearing up for millions.

Come if you:

– love books, records, or movies
– are concerned about the future of open access and preservation
– want to have something fun to talk about over the water cooler on Monday….

Then, invest an hour with us on a Sunday – Drinks, food, good people.

What you will see:

– A high density, modular system for storing books, video and audio
– A temperature-controlled environment for long-term preservation
– Our new logistics facility that will catalog and coordinate large collections of books, records and movies.

Who you will meet:

– The Internet Archive Board, Founder, Management Team
– Friends and supporters of the Internet Archive
– Colleagues and leaders from the Library community

Please come! Bring friends and family.

– Secure free parking
2512 Florida Avenue, Richmond, California, 30 minutes north of San Francisco and Berkeley, 415 561 6767.

RSVP to rsvp@archive.org, or just come!

Library of Congress National Jukebox

By George Oates

Wonderful stuff! The Library of Congress has just released more than 10,000 digitized 78s from the Victor Talking Machine Company as a National Jukebox. The recordings are lovely and crackly, as if you’re listening on a gramophone.

Here’s “Cradle Song 1915”, recorded in Camden, New Jersey:

Nice to see pretty URLs for things too, like Enrico Caruso or things recorded on May 18. Go LC!

5/13: Interesting postscript at Public Knowledge.