Archive for April, 2011
Coverstore Improvements
We have recently made some improvements to coverstore, the Open Library book covers service.
It is now possible to access book covers by any of the available identifiers. For example:
http://covers.openlibrary.org/b/goodreads/6383507-M.jpg
http://covers.openlibrary.org/b/librarything/8071257-M.jpg
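The URLs above all follow one pattern: /b/<key>/<value>-<size>.jpg. Here is a small Python sketch that builds them. The S/M/L size letters come from the Covers API documentation. The function name cover_url and the "id" key for cover-ID lookups are named here for illustration; please check the Covers API page for the full list of supported keys.

```python
from urllib.parse import quote

BASE = "http://covers.openlibrary.org/b"
SIZES = ("S", "M", "L")  # small, medium, large


def cover_url(key: str, value, size: str = "M") -> str:
    """Build a coverstore URL such as .../b/isbn/1592867936-M.jpg.

    `key` is the identifier type used in the path ("isbn", "goodreads",
    "librarything", ... or "id" for a cover ID); `value` is the identifier.
    """
    if size not in SIZES:
        raise ValueError(f"size must be one of {SIZES}, got {size!r}")
    # Percent-encode the value so unusual characters can't break the path.
    return f"{BASE}/{key}/{quote(str(value), safe='')}-{size}.jpg"


print(cover_url("goodreads", "6383507"))
# http://covers.openlibrary.org/b/goodreads/6383507-M.jpg
print(cover_url("librarything", 8071257, size="L"))
# http://covers.openlibrary.org/b/librarything/8071257-L.jpg
```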
Lookups by ISBN now ignore hyphens. For example, both of the following URLs point to the same cover, and this works even if the ISBN is stored with hyphens in the edition record.
http://covers.openlibrary.org/b/isbn/1-59286-793-6-M.jpg
http://covers.openlibrary.org/b/isbn/1592867936-M.jpg
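"Insensitive to hyphens" simply means both the requested ISBN and the stored one are normalized before they are compared. The sketch below illustrates the idea. It is not coverstore's actual code, and the helper names are hypothetical.

```python
def normalize_isbn(isbn: str) -> str:
    """Drop hyphens and spaces, and upper-case an ISBN-10 check digit 'x'."""
    return "".join(ch for ch in isbn if ch not in "- ").upper()


def same_isbn(a: str, b: str) -> bool:
    """True if two ISBN strings differ only in hyphenation, spacing or case."""
    return normalize_isbn(a) == normalize_isbn(b)


print(normalize_isbn("1-59286-793-6"))           # 1592867936
print(same_isbn("1-59286-793-6", "1592867936"))  # True
```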
Please refer to the Open Library Covers API for more details.
We have built a secondary database that maps edition identifiers to cover IDs, to make accessing covers faster. Because of this, there is some delay between adding an identifier to an edition record and being able to access the cover using the newly added identifier. The delay is usually a couple of seconds.
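Here is a toy model of why a separate identifier-to-cover-ID index causes that short delay. It assumes that edits are queued and copied into the index by a periodic sync job. The post doesn't describe coverstore's actual mechanism, and the class and method names here are hypothetical.

```python
from collections import deque


class IdentifierIndex:
    """Hypothetical (identifier type, value) -> cover ID index, synced in batches."""

    def __init__(self):
        self._index = {}         # (id_type, value without hyphens) -> cover ID
        self._pending = deque()  # edits recorded but not yet applied

    def record_edit(self, id_type, value, cover_id):
        """Called when an edition gains an identifier; invisible until sync()."""
        self._pending.append(((id_type, value.replace("-", "")), cover_id))

    def sync(self):
        """Apply queued edits; a real system might run this every few seconds."""
        while self._pending:
            key, cover_id = self._pending.popleft()
            self._index[key] = cover_id

    def lookup(self, id_type, value):
        return self._index.get((id_type, value.replace("-", "")))


idx = IdentifierIndex()
idx.record_edit("isbn", "1-59286-793-6", 12345)  # 12345 is a made-up cover ID
print(idx.lookup("isbn", "1592867936"))  # None: the edit hasn't been synced yet
idx.sync()
print(idx.lookup("isbn", "1592867936"))  # 12345
```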
Please note that this API is intended for displaying covers on public-facing websites, not for bulk download. To download book covers in bulk, please refer to the Bulk Access section of the API documentation.
We recently noticed that some bots were downloading book covers by ISBN at a very high rate, which badly affected the performance of the system. We have added rate-limiting to cap the number of requests per IP address. The current limit is 100 requests per IP every 5 minutes. The limit applies only to cover accesses by the various identifiers; there is no limit on accessing covers by cover ID.
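To make the rule concrete, here is a sliding-window sketch of "100 requests per IP every 5 minutes, except lookups by cover ID". The post doesn't say which algorithm coverstore uses, so treat this as one plausible way to do it. The class name and the injectable clock are hypothetical.

```python
import time
from collections import defaultdict, deque

LIMIT = 100        # requests allowed per IP ...
WINDOW = 5 * 60    # ... within any 5-minute window (seconds)


class CoverRateLimiter:
    def __init__(self, limit=LIMIT, window=WINDOW, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self._hits = defaultdict(deque)  # ip -> timestamps of counted requests

    def allow(self, ip: str, key: str) -> bool:
        """Return True if a request for /b/<key>/... from `ip` may proceed."""
        if key == "id":
            return True  # lookups by cover ID are not rate-limited
        now = self.clock()
        hits = self._hits[ip]
        # Forget requests that have slid out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # over the limit; rejected requests are not counted
        hits.append(now)
        return True
```

Rejected requests are deliberately left uncounted, so a client that backs off gets its full allowance back once the window has moved on.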
This limit should be good enough for linking covers on public-facing websites. Please consider using the Open Library Books API if your website needs more. Since the Books API provides book cover URLs based on cover IDs, the rate limit won’t apply.
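Here is a rough sketch of the Books API route. It assumes the jscmd=data response shape, in which each requested bibkey maps to a book object whose "cover" field holds small/medium/large URLs. Please confirm those field names against the Books API documentation before relying on them. The function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BOOKS_API = "http://openlibrary.org/api/books"


def books_api_url(bibkeys):
    """Build one Books API request for several bibkeys, e.g. ["ISBN:1592867936"]."""
    params = {"bibkeys": ",".join(bibkeys), "format": "json", "jscmd": "data"}
    return f"{BOOKS_API}?{urlencode(params)}"


def medium_cover_urls(response):
    """Map each bibkey to its medium cover URL (assumed response shape).

    Books with no "cover" entry are skipped.
    """
    return {
        bibkey: book["cover"]["medium"]
        for bibkey, book in response.items()
        if "medium" in book.get("cover", {})
    }


def fetch_cover_urls(bibkeys):
    """Network call: fetch the Books API response and extract cover URLs."""
    with urlopen(books_api_url(bibkeys), timeout=10) as resp:
        return medium_cover_urls(json.load(resp))
```

Because the returned URLs point at /b/id/..., a page that embeds them is not subject to the per-IP identifier limit.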
Please get in touch with us if you need any assistance in using this API to show book covers on your website.
pystatsd & 5,000 Lists!
We’re working hard to improve Open Library’s general stability and performance, after a few harrowing weeks moving our hardware infrastructure around. We’re beginning to measure more stuff across the site, from general activity levels (about 40,000 catalog edits every month!) to quite specific actions (like, seeing that every second, 1-3 people open up our BookReader).
We’ve begun using a super awesome, real-time stats processing package called pystatsd, a Python implementation of Etsy’s statsd server. My favourite bit is graphite, a program that sits on top of it and renders all the stats we collect with pystatsd as graphs in a browser. Suddenly, we can see the system in a new and useful way.
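For the curious, here is what feeding a statsd server involves. Rather than guess at pystatsd's own client API, this sketch speaks the statsd plain-text protocol directly: `name:value|c` for counters and `name:value|ms` for timers, sent over UDP to port 8125 (statsd's default). The metric names are hypothetical.

```python
import socket
import time
from contextlib import contextmanager


class StatsdClient:
    """Minimal statsd client: fire-and-forget plain-text metrics over UDP."""

    def __init__(self, host="localhost", port=8125):
        self.addr = (host, port)
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def _send(self, line: str):
        try:
            self.sock.sendto(line.encode("ascii"), self.addr)
        except OSError:
            pass  # a stats hiccup must never break the page being served

    def incr(self, name: str, count: int = 1):
        self._send(f"{name}:{count}|c")      # counter

    def timing(self, name: str, ms: float):
        self._send(f"{name}:{ms:.0f}|ms")    # timer, in milliseconds

    @contextmanager
    def timer(self, name: str):
        """Time the enclosed block and report it as a statsd timer."""
        start = time.perf_counter()
        try:
            yield
        finally:
            self.timing(name, (time.perf_counter() - start) * 1000)


stats = StatsdClient()
stats.incr("bookreader.open")          # hypothetical metric name
with stats.timer("page.render.work"):  # hypothetical metric name
    pass  # ... render the page ...
```

UDP keeps this cheap: the web process never waits for the stats server, and lost packets only cost a little precision.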
We’re also looking hard at improving our memcached configuration, and recently added another 4 memcached machines to our pool. Now that we can measure memcached hits and misses using pystatsd and graphite, we’ll be able to tell when our caching stuff is actually improving. Yay!
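Counting hits and misses can be as simple as wrapping the cache client. This sketch assumes a memcached-style client with get/set that returns None on a miss, and reports each outcome through any callable that bumps a counter, such as a statsd client's increment method. The class names and metric names are hypothetical, and DictCache is just an in-process stand-in for trying it out.

```python
class DictCache:
    """In-process stand-in for a memcached client, for local experiments."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value, expire=0):
        self._data[key] = value  # expiry ignored in this stand-in
        return True


class CountingCache:
    """Wrap a get/set cache client and report every get as a hit or a miss."""

    def __init__(self, cache, incr):
        self.cache = cache
        self.incr = incr  # any callable taking a metric name

    def get(self, key):
        value = self.cache.get(key)
        # Assumes None means "not cached", as with typical memcached clients.
        self.incr("memcache.hit" if value is not None else "memcache.miss")
        return value

    def set(self, key, value, expire=0):
        return self.cache.set(key, value, expire)
```

Plotting memcache.hit next to memcache.miss makes it easy to see whether a configuration change actually raised the hit rate.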
Another tweak you might find interesting… it used to be that lists would only show up on the main Lists page if they contained at least 3 seeds. The other day, Raj and I upped that to at least 5 seeds, and that immediately produced a selection of arguably more interesting lists, most of which settle around a subject area. Here’s a small selection:
- Victorian Illustrations by Old Book Illustrations
- Computer Technology by James Buckingham
- French psychology, 1880-1930 by John Carson
- The Best Books for Writers by Mary Gannon
- Bees by Iona Stewart
- And, not to blow my own horn too loudly, but I stumbled across what looks like a pretty good list on The (UK) Independent site, so I “transcribed” that into Open Library: The 50 Books Every Child Should Read
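The threshold change amounts to a one-line filter. Here is a sketch of it, with a hypothetical ReaderList type standing in for Open Library's real list records.

```python
from dataclasses import dataclass, field

MIN_SEEDS = 5  # previously 3


@dataclass
class ReaderList:
    """Hypothetical stand-in for a list record; seeds are e.g. works or editions."""
    name: str
    owner: str
    seeds: list = field(default_factory=list)


def lists_for_main_page(lists, min_seeds=MIN_SEEDS):
    """Keep only lists with at least `min_seeds` seeds, in their original order."""
    return [lst for lst in lists if len(lst.seeds) >= min_seeds]
```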
Have you made a great list, or found someone else’s? Let us know in the comments!
New Titles in Lending Library!
Our little lending library is continuing to grow, this time with 90 new titles purchased directly from two fabulous eBook publishers: A Book Apart & Smashwords.
3 titles from A Book Apart are all must-reads for any discerning web professional…
Thanks to Mandy, Jeffrey and Jason at A Book Apart for joining in the fun. (Incidentally, Mandy’s blog, A Working Library, is a great read.)
There are also 87 ePubs from Smashwords, by authors like Amanda Hocking, Ruth Ann Nordin and Gerald M. Weinberg…
Thanks to @markcoker at Smashwords for working with us to get these new titles online.
Loans through the Open Library are exclusive to one Open Library account holder at a time, for up to two weeks. For most titles, you can access the eBooks in one of three ways: directly in your web browser (using our BookReader), or as a PDF or ePub downloaded into Adobe Digital Editions. The new Smashwords titles are a bit different: they’re only available in ePub format, so they can only be downloaded and read in Adobe Digital Editions.
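The lending rule above (one borrower per copy at a time, for up to two weeks) can be modelled in a few lines. This is a toy model that assumes one copy per title. It is not how the Internet Archive's lending system is implemented, and the names and book keys are hypothetical.

```python
from datetime import datetime, timedelta, timezone

LOAN_PERIOD = timedelta(weeks=2)


class LoanDesk:
    """One copy per title: on loan to at most one account at a time."""

    def __init__(self):
        self._loans = {}  # book key -> (account, expiry)

    def borrow(self, book, account, now=None):
        """Return the loan's expiry time, or None if the book is already out."""
        now = now if now is not None else datetime.now(timezone.utc)
        current = self._loans.get(book)
        if current is not None and current[1] > now:
            return None  # still on loan, to this or another account
        expiry = now + LOAN_PERIOD  # an expired loan simply lapses
        self._loans[book] = (account, expiry)
        return expiry

    def return_book(self, book, account):
        """Early return: frees the copy if `account` is the current borrower."""
        if self._loans.get(book, (None, None))[0] == account:
            del self._loans[book]
```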
The Internet Archive (and Open Library) is actively seeking publishers who’d like us to buy their eBooks and make them available in the Lending Library. If you are a publisher interested in selling us your wares, please get in touch!
While the Lending Library, which is available to anyone with an Open Library account, is growing, we’re also working to expand the collection for our “In-Library” loans, currently at about 85,000 eBooks. This special In-Library program is a bit different, because it requires patrons to literally be inside a participating library’s network. Once that’s the case, patrons can see all the books available in the In-Library collection on Open Library, from all the libraries in the In-Library pool, currently around 150 North American libraries.
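Being “inside a participating library’s network” usually comes down to checking the patron’s IP address against each library’s registered address ranges. Here is a sketch under that assumption. The post doesn't say how the check is actually done, and the libraries and ranges below are hypothetical (they are reserved documentation addresses).

```python
import ipaddress

# Hypothetical participating libraries and their network ranges.
LIBRARY_NETWORKS = {
    "Example Public Library": ["192.0.2.0/24"],
    "Example University Library": ["198.51.100.0/25", "2001:db8::/32"],
}

_NETWORKS = [
    (name, ipaddress.ip_network(cidr))
    for name, cidrs in LIBRARY_NETWORKS.items()
    for cidr in cidrs
]


def in_library_patron(client_ip: str):
    """Return the library whose network contains client_ip, or None."""
    addr = ipaddress.ip_address(client_ip)
    for name, net in _NETWORKS:
        if addr in net:  # IPv4 vs IPv6 mismatches simply don't match
            return name
    return None
```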
Public Library: An American Commons
Public Library: An American Commons is a photography exhibition on at the San Francisco Public Library’s Jewett Gallery, running from April 9 to June 12. The photographer, Robert Dawson, has captured the American relationship with public libraries across the country in a series of intimate portraits. From the Design Observer review:
What’s at stake here is more than access to a room full of books. The modern American public library is reading room, book lender, video rental outlet, internet café, town hall, concert venue, youth activity center, research archive, history museum, art gallery, homeless day shelter, office suite, coffeeshop, seniors’ clubhouse and romantic hideaway rolled into one.
Minimum Viable Record?
Having worked more closely with bibliographic data over the last couple of years than I had ever expected to, I still can’t quite believe how complicated it can be. I keep holding on to something Karen Coyle told me when I first started at Open Library, that “library metadata is diabolically rational.” Now that I’ve seen cataloging from lots of different sources and am more familiar with the level of detail that’s possible in a library catalog, I have a new fondness for these intensely variegated information systems; at times devilishly detailed, at others wildly incomplete or arcanely abbreviated. Everyone likes to arrange things and classify them into groups. It’s when you try to get people to put things into groups that someone else has come up with that it starts getting messy.
At Open Library, we’re attempting to ingest catalog data from, well, everywhere. Every “dialect” of cataloging practice makes this mass consumption harder. In spite of the rational goal of standardized data entry, there is an intense diffusion of practice. (Have a look at Seeing Standards: A Visualization of the Metadata Universe by Jenn Riley and Devin Becker if you haven’t already.)
A challenge I think we face today is a metastasized level of complexity, particularly as we begin to catalog works that have no physical form and exist only electronically. Any challenge presents opportunity, and the opportunity here is to radically simplify the way things are represented in catalogs.
In February, I gave a presentation at the API Workshop held at the Maryland Institute for Technology in the Humanities (MITH). I talked about Open Library and paid particular attention to the resources we’re trying to put in place for developers to hook into the system.
Part of the presentation was an impromptu survey of the audience: I handed everyone an index card and asked them to write down the 5 fields they thought were adequate to describe a book. I framed the survey as a search for a “minimum viable record,” and it was fascinating to watch the audience squirm a bit as they asked for more guidance on the challenge. Can fields repeat? What’s the audience for this description? And so on.
I’ve collated the results of the forty or so respondents into an ugly spreadsheet. There are 4 sheets, linked in the green strip at the bottom of the page:
- Book Raw – unfiltered results, in the order they were written
- Book Cooked V1 – all results blended, sorted alphabetically
- Book Merged – all results grouped
- Summary – with counts and a graph!
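The path from raw cards to a summary (normalize, sort, merge synonyms, count) is easy to script. Here is a sketch that loosely mirrors those steps. The synonym table and the sample cards are hypothetical, not the actual survey data.

```python
from collections import Counter

# Hypothetical synonyms used to merge variant answers into one field name.
SYNONYMS = {
    "authors": "author", "creator": "author",
    "publication date": "date", "year": "date",
    "isbn": "identifier",
    "subjects": "subject", "topic": "subject",
}


def summarize(cards):
    """cards: one list of free-text field names per respondent."""
    # Blend every answer, tidy whitespace and case, sort alphabetically.
    cooked = sorted(f.strip().lower() for card in cards for f in card)
    # Group variant spellings under one canonical field name.
    merged = [SYNONYMS.get(f, f) for f in cooked]
    # Count how many times each field was named, most popular first.
    return Counter(merged).most_common()
```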
Here’s the final result:
So, standing on the shoulders of the “minimum viable product”, a way for web application developers to get working code deployed quickly and effectively, I wonder if it’s time to put a “minimum viable record” in place for bibliographic systems: enough detail for a computer to match, correlate and compare, but not so much that processing each record stops everything in its tracks.
You might have heard of the Open Publication Distribution System (OPDS) Catalog specification, which is a syndication format for electronic publications. Certainly, this new standard is a great step towards simpler representations of books — in this case, OPDS was initially designed to represent eBooks specifically — but I find myself wondering if it could be reduced further still, to pave the way for even easier exchange between systems. (Please note that all our edition records are now available in OPDS format, as well as RDF and JSON.)
Something like Title, Author, Date, Subject[s] and Identifier[s] might just do the trick, though it is of course irresistibly debatable. It’s an idea we’re going to look to as we work on our new Write API for Open Library. This minimum viable record will play gatekeeper for any new records we ingest (or that you export).
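To make the gatekeeper idea concrete, here is a sketch of a five-field record and a check that reports what is missing. The field names, the rule that every field needs at least one value, and the JSON shape are all assumptions for illustration; none of this is the Write API's actual design.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class MinimumViableRecord:
    title: str
    authors: list
    date: str  # kept as free text, e.g. "2011" or "April 2011"
    subjects: list = field(default_factory=list)
    identifiers: dict = field(default_factory=dict)  # e.g. {"isbn": ["1592867936"]}


def problems(record: MinimumViableRecord):
    """Return reasons the record fails the gate; an empty list means accept."""
    issues = []
    if not record.title.strip():
        issues.append("missing title")
    if not any(a.strip() for a in record.authors):
        issues.append("missing author")
    if not record.date.strip():
        issues.append("missing date")
    if not record.subjects:
        issues.append("no subjects")
    if not any(record.identifiers.values()):
        issues.append("no identifiers")
    return issues


def to_json(record: MinimumViableRecord) -> str:
    """Serialize a record for exchange with another system."""
    return json.dumps(asdict(record), ensure_ascii=False)
```

Returning a list of reasons, rather than a bare yes/no, lets an importer tell a contributor exactly what a rejected record is missing.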
What do you think of this minimum viable blog post?