Coverstore Improvements

We have recently made some improvements to coverstore, the Open Library book covers service.

Book covers can now be accessed by any of the available identifiers. For example:

http://covers.openlibrary.org/b/goodreads/6383507-M.jpg
http://covers.openlibrary.org/b/librarything/8071257-M.jpg

Accessing covers by ISBN is now insensitive to hyphens. For example, the following URLs both point to the same cover, and this works even if the ISBN is stored with hyphens in the edition record.

http://covers.openlibrary.org/b/isbn/1-59286-793-6-M.jpg
http://covers.openlibrary.org/b/isbn/1592867936-M.jpg
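
To illustrate what hyphen-insensitive lookup means, here is a minimal sketch of how a client might build cover URLs. It is not how coverstore itself is implemented; the helper names `normalize_isbn` and `cover_url` are hypothetical, and the URL pattern is simply inferred from the examples above.

```python
import re

# Base URL taken from the examples in this post.
COVERS_BASE = "http://covers.openlibrary.org/b"


def normalize_isbn(isbn: str) -> str:
    """Drop hyphens and whitespace so both spellings compare equal."""
    return re.sub(r"[\s-]", "", isbn).upper()


def cover_url(key: str, value: str, size: str = "M") -> str:
    """Build a cover URL for an identifier type such as 'isbn' or 'goodreads'.

    Hypothetical helper: the <key>/<value>-<size>.jpg pattern is inferred
    from the example URLs, not from an official client library.
    """
    if key == "isbn":
        value = normalize_isbn(value)
    return f"{COVERS_BASE}/{key}/{value}-{size}.jpg"


print(cover_url("isbn", "1-59286-793-6"))
print(cover_url("goodreads", "6383507"))
```

Since the service now ignores hyphens on its side, normalizing on the client is optional; it mainly helps if you cache covers locally and want one cache key per ISBN.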

Please refer to the Open Library Covers API for more details.

We have built a secondary database that maps edition identifiers to cover IDs, which makes accessing covers faster. Because of this, there is a short delay between adding an identifier to an edition record and being able to access the cover using the newly added identifier. The delay is usually a couple of seconds.

Please note that this API is intended for displaying covers on public-facing websites and not for bulk download. To download book covers in bulk, please refer to the Bulk Access section of the API documentation.

We recently noticed that some bots were downloading book covers by ISBN at a very high rate, which badly affected the performance of the system. We have added rate-limiting to restrict the number of requests per IP address. The current limit is 100 requests per IP address every 5 minutes. The limit applies only to cover requests made by identifier (ISBN, Goodreads ID and so on); there is no limit on accessing covers by cover ID.
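
If your own server fetches covers by identifier, one way to stay under the limit is to throttle on the client side. The sketch below is only an illustration: it assumes a sliding five-minute window, which may not match how the server actually counts requests, and the class name `SlidingWindowLimiter` is hypothetical.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_requests calls in any window_seconds span.

    Client-side sketch only; the server's exact counting method is an
    assumption and may differ (for example, fixed windows).
    """

    def __init__(self, max_requests=100, window_seconds=300.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock
        self._stamps = deque()  # timestamps of recent allowed requests

    def try_acquire(self) -> bool:
        """Return True and record the request if it fits in the window."""
        now = self.clock()
        # Forget requests that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self.window_seconds:
            self._stamps.popleft()
        if len(self._stamps) < self.max_requests:
            self._stamps.append(now)
            return True
        return False


limiter = SlidingWindowLimiter()
if limiter.try_acquire():
    print("OK to request a cover by identifier")
else:
    print("Over budget: wait, or serve a cached or placeholder cover")
```

Call `try_acquire()` before each identifier-based request and skip or defer the request when it returns False.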

This limit should be sufficient for linking to covers from public-facing websites. If your website needs more, please consider using the Open Library Books API. Since the Books API provides book cover URLs that use cover IDs, the rate limit won’t apply to them.
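
As a rough sketch of that approach, a site could remember the cover ID for each ISBN once it has learned it (for instance from the Books API) and link by cover ID from then on. The class `CoverLinker` is hypothetical, the cover ID in the usage lines is a made-up placeholder, and the `/b/id/` path is an assumption that should be checked against the Covers API documentation.

```python
from typing import Dict

# Base URL taken from the examples in this post.
COVERS_BASE = "http://covers.openlibrary.org/b"


class CoverLinker:
    """Prefer cover-ID URLs (not rate-limited) when the ID is known."""

    def __init__(self) -> None:
        self._cover_ids: Dict[str, int] = {}  # hyphen-free ISBN -> cover ID

    def remember(self, isbn: str, cover_id: int) -> None:
        """Store a cover ID learned elsewhere, e.g. from the Books API."""
        self._cover_ids[isbn.replace("-", "")] = cover_id

    def url_for(self, isbn: str, size: str = "M") -> str:
        key = isbn.replace("-", "")
        cover_id = self._cover_ids.get(key)
        if cover_id is not None:
            # Assumed path for cover-ID access; verify against the API docs.
            return f"{COVERS_BASE}/id/{cover_id}-{size}.jpg"
        # Fall back to the rate-limited identifier lookup.
        return f"{COVERS_BASE}/isbn/{key}-{size}.jpg"


linker = CoverLinker()
print(linker.url_for("1-59286-793-6"))   # identifier URL, counts against the limit
linker.remember("1592867936", 12345)     # 12345 is a placeholder cover ID
print(linker.url_for("1-59286-793-6"))   # cover-ID URL from now on
```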

Please get in touch with us if you need any assistance in using this API to show book covers on your website.


6 Comments

  1. Posted April 28, 2011 at 1:11 am | Permalink

    I’m sorry to hear about the rate-limiting; 100 requests in 5 minutes represents a total of only 10 searches (assuming 10 results per page). That will kill Evergreen’s current usage of the covers API; we currently issue cover requests from the Evergreen server for new ISBNs, cache the results (because we’re conscientious!), and serve the results up to our users from the Evergreen server. For reasonably busy Evergreen sites, 100 requests in 5 minutes is a drop in the bucket.

    I’ve started cutting the Evergreen code over to use the Books API to retrieve the cover image URLs instead; hopefully that isn’t just pushing the problem around, though, as we’ll be making a similar number of requests to the Books API.

  2. Posted April 28, 2011 at 5:03 am | Permalink

    Dan, sorry to hear that the rate limit is affecting Evergreen sites. Is there any way to identify the Evergreen bot so that I can whitelist it?

  3. Posted May 10, 2011 at 1:21 am | Permalink

    Yes, indeed, 100 in 5 minutes is definitely NOT enough for most public-facing websites. The rate limit is going to be a problem for my Umlaut software too. So if I change the software to use the Books API to do the exact same operation (first look up by identifier to get the OL ID/cover URL), that is preferable to you and will not be rate-limited? Not entirely sure why that’s kinder on your servers, but whatever works.

    Also, please make sure that when you DO deny someone because of rate-limiting, you issue a proper HTTP response code, so that if I were looking through my logs for errors without having seen this blog post, I’d at least have some clue that all my requests are failing because of rate-limiting. Except it looks like there isn’t necessarily a status code for this, hmm. Well, ideally at least return SOME error-type status code (503 perhaps?), possibly with a message indicating rate-limiting. Please don’t return a 200 with an empty body; that is terrible for client code.

    On a different topic related to covers: with the addition of OCLC numbers to many OL records, can covers be looked up by OCLC number now too?

  4. Posted May 10, 2011 at 1:22 am | Permalink

    Also, please add documentation of the rate limit to http://openlibrary.org/dev/docs/api/covers

  5. Posted May 10, 2011 at 1:27 am | Permalink

    And ah, I see all my questions are answered at /docs/api/covers, and it does mention the rate limiting, sweet.

    Dan, I wonder if it would make sense for Evergreen to download the complete mapping file at http://www.archive.org/download/ol_dump_2011-03-31/ol_dump_coverids_2011-03-31.txt.gz on a regular basis, index it internally, and do the ID lookup internally, only going to OL with a cover ID already in hand, which will not be rate-limited? I’m considering that for Umlaut; it’s certainly more work on my end, and I’m not sure Umlaut’s use of OL covers is important enough to its use cases at present to justify it, but the cost-benefit calculation may be different for Evergreen.

    Anand, that approach, which would certainly be kinder on your servers, would be a LOT easier if you could provide a persistent, stable URL for the “latest mapping file”, instead of the date-stamped URL that you link to in the docs. It would be better still if such a URL responded to a HEAD request with a real Last-Modified header, so client software could periodically check whether there’s a new one and, if so, download and index it locally.

  6. Posted May 10, 2011 at 3:48 pm | Permalink

    Jonathan, we are planning to integrate the generation of the mapping file into the monthly OL dump generation. Once that is done, there will be a stable URL for the latest mapping file.
