The Little Bot That Could


Meet oclcBot. He was written by Bruce Washburn at OCLC Research to help connect Open Library records to their corresponding OCLC WorldCat records. He’s just finished updating almost 4 million Open Library editions with links! No metadata was exchanged at all, except these identifiers. Tiny, but powerful, because that lets systems that “speak OCLC” communicate directly with Open Library without knowing any Open Library IDs. As Anand mentioned in his recent post about Coverstore Improvements, we’ve also made the system for displaying covers externally using other types of identifiers more efficient.

There was a bit of a bumpy start to oclcBot’s updates, and Bruce and I thought it might be good to hear what it was like in the trenches. From Bruce:

This project was essentially very simple: find corresponding Open Library and OCLC WorldCat records by a shared attribute (ISBN), and update the Open Library record with the corresponding OCLC number. Once OCLC had generated a list of OCLC numbers and their corresponding ISBNs, it seemed to be a simple matter of using the very robust Open Library API to look for matching records, check to see if they already included an OCLC number, and update the record accordingly. Complications arose, related to scale. There were about 90 million ISBNs to check from the OCLC list, and checking them one at a time via the API was projected to take a very long time. So we used a data dump of all the Open Library records to identify those with ISBNs, and also built a very fast index of the OCLC list to check against. With that we were able to produce a new list of Open Library records and corresponding new OCLC numbers. And a batch update facility in the Open Library API made it possible to send API requests 1,000 records at a time. The pre-processing and the batch process both yielded some additional lists that will require more scrutiny to process (records associated with multiple ISBNs, API exceptions for individual records), but the great majority of records were updated via the oclcBot without any further effort.
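For the curious, the offline matching step Bruce describes might look roughly like the sketch below. This is not oclcBot's actual code. It assumes edition records are dicts with `key`, `isbn_10`, `isbn_13` and `oclc_numbers` fields, and the function names (`build_oclc_index`, `find_updates`, `batches`) are made up for illustration. Treating an edition whose ISBNs point at more than one OCLC number as needing review is also just a guess at the kind of case that would need more scrutiny. The sketch stops at grouping updates into batches of 1,000 and does not call the Open Library API.

```python
from itertools import islice


def normalize_isbn(isbn):
    """Strip hyphens and spaces so ISBNs from both sources compare equal."""
    return isbn.replace("-", "").replace(" ", "").upper()


def build_oclc_index(pairs):
    """Build an in-memory lookup from ISBN to the set of OCLC numbers.

    `pairs` is an iterable of (isbn, oclc_number) tuples, standing in for
    the list generated on the OCLC side.
    """
    index = {}
    for isbn, oclc_number in pairs:
        index.setdefault(normalize_isbn(isbn), set()).add(oclc_number)
    return index


def find_updates(editions, index):
    """Return (updates, needs_review) for editions read from a data dump.

    Editions that already carry an OCLC number are skipped. Editions whose
    ISBNs map to more than one OCLC number are set aside for a human.
    """
    updates, needs_review = [], []
    for edition in editions:
        if edition.get("oclc_numbers"):
            continue  # already linked, nothing to do
        isbns = edition.get("isbn_10", []) + edition.get("isbn_13", [])
        matches = set()
        for isbn in isbns:
            matches |= index.get(normalize_isbn(isbn), set())
        if len(matches) == 1:
            updates.append({"key": edition["key"], "oclc_numbers": sorted(matches)})
        elif len(matches) > 1:
            needs_review.append(edition["key"])
    return updates, needs_review


def batches(items, size=1000):
    """Yield lists of at most `size` items, ready for a batch request."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk
```

Each list yielded by `batches(updates)` would then be the payload for one batch request, which keeps the number of API calls in the thousands rather than the millions.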

So, it’s still early days for our Bot operations, but we’re looking for external developers who might be interested in making these “surgical strike” style updates to loads of Open Library records at once. If you’re curious, please visit the Writing Open Library Bots page in the Open Library Developers area.

Thank you, Bruce!

(And thanks to Solo for the CC BY-NC-SA 2.0 oclcBot photo.)



  1. Posted May 3, 2011 at 7:45 pm | Permalink

    Just curious, would you be able to make the isbn/oclc mapping available?

  2. Posted May 4, 2011 at 12:02 am | Permalink

    That’s really a question for Bruce, Ed. He’s the man who made the various mapping files. I’d love to see that in the wild though, for what it’s worth.

  3. Posted May 4, 2011 at 9:46 pm | Permalink

    Bruce is an amazing and resourceful colleague and I’m pleased you’ve had the opportunity to work with him!

  4. Posted May 10, 2011 at 1:09 am | Permalink

    Hmm, Open Library data is bulk-downloadable, isn’t it? So for the portion of OCLCnums that were successfully attached to OL records… the OCLCnum to ISBN mapping is already available, derivable from available OL records, yeah? Is this true?

  5. Posted May 10, 2011 at 5:15 pm | Permalink

    Jonathan: I’m not certain that the dumps are produced regularly yet. It may be that the latest oclcBot updates aren’t represented yet. Besides, working with the bulk dumps is far too unwieldy for lots of people… Seems to me it might be useful to produce a “minimum viable record” set of the system. A dataset that contained (something like) Title, Author(s), Subject(s), Date, Identifier(s) for everything…
