It’s been some months since we’ve updated you about what the Open Library is up to. Sorry about that. Thought it might be nice to produce a novella/brain dump to let you know where we’re at.
The short answer is: all sorts of things! I’ve been leading the project now for about 6 months, and have finally settled down enough to tell you what we’re up to. We’d love to hear what you think of our ideas, either in the comments on this post or on our general discussion mailing list.
The Open Library project began in February of 2007, and launched in November that year, so it’s approaching 3 years old. During that time, we’ve amassed one of the biggest virtual library catalogs online, at some 23 million edition entries and some 6 million or so author records. We also have a ton of book covers. Our catalog is entirely open and free to use. You can download everything if you wish, or use our API to either link to our records, or to display Open Library data on your website.
When I started, we didn’t have much insight into what was happening on Open Library. We knew people were using it, but didn’t really know how much, who they were, or what they were doing. The majority of edits to the catalog are made by our bots, running updates across the system and creating new stub records. While this work is essential, I couldn’t see any of the humans using the catalog. So, we started trying to get some insight into site usage with tools like Alexa, amongst others.
Here’s what we’ve uncovered so far, with more to come:
- We have an average of about 400 concurrent visitors at any time, peaking at up to 900 people
- We’ve increased daily unique IPs from about 100,000 back in June to somewhere around 250,000 today
- Our site uptime has steadied to a very healthy 100% on a good week
- Our bounce rate is high. Too high. It’s a concern for us that people lob into Open Library thanks to our high search engine ranking, but bounce straight out again
- There are over 3,000 sites that link directly into Open Library. Wonderful! We’re working on understanding what those links are, and from where.
- Our membership is fairly small, but growing every day. You can edit and use Open Library without creating an account, which probably accounts for the modest membership.
I think the story that numbers like those tell is that we have an excellent foundation for growth. This is precisely what we’re banking on as we announce that we’ll be releasing a redesign of Open Library in the next few months. We’ll stay in touch about the actual dates, and it’s very likely we’ll do a soft release before we make the final transition live. Please, watch the blog for updates on timing.
There are a number of enhancements to Open Library that we’re planning to make in the upcoming redesign (or, “realignment” as Cameron Moll has written about). That’s not to say we won’t be taking the opportunity to update the site’s look and overall usability, but, the core of the release will be about the catalog and how you see it.
Having researched a lot of the historical documentation surrounding the project, I saw tons of ideas that sounded great and which it’s time to create, like the ability to tag records or authors, tools to upload small collections from special interest or rural libraries, and ways to push what bibliographic data means on the web. We’re looking forward to making some of these ideas a reality in 2010.
Key Components of the Redesign
Works
Open Library deals with books at the edition level. This makes finding “War and Peace” really tricky, because all we currently display are the hundreds of editions in a big unordered list. Tricky to find what you’re searching for… Luckily, the cataloging standards initiative called Functional Requirements for Bibliographic Records (FRBR) describes a “super-level” of book called “the Work”, which represents the abstract idea of a book rather than its constituent editions, probably making it easier to get started with research and the like.
We’ve been toiling for the past several months to roll up all our editions into logical Works. This is incredibly tricky for all sorts of reasons, and as much as we would like it to be bulletproof on the first go, it’s likely people will see an edition that should be in a certain Work, or Work records that are really the same book. Providing tools for fixing dupes like that is next on the list. That said, we’ve been testing our brand new Work search lately, and it’s given me (at least) an entirely different and exciting view on the Open Library. We can suddenly see things like the books in our catalog with the most editions, or all the Works by Mark Twain (instead of a massive list of all the editions he’s supposed to have written) and more. Truly, it’s invigorating after being stuck in the edition “mud” for so long. Not that edition data is bad, of course, just that the aggregate is extremely useful.
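To give a feel for why this is tricky, here is a deliberately naive sketch in Python of rolling editions up into Works. It is not how Open Library actually does it: the function names, the record fields (`author`, `title`, `year`) and the matching rule (same author plus a normalized title) are all assumptions made up for illustration.

```python
import re
from collections import defaultdict


def work_key(author, title):
    """Build a crude grouping key from an author name and an edition title."""
    # Lowercase, turn punctuation into spaces, and split on whitespace.
    words = re.sub(r"[^\w\s]", " ", title.lower()).split()
    # Drop a leading English article so "The Adventures..." matches "Adventures...".
    if words and words[0] in {"a", "an", "the"}:
        words = words[1:]
    return (author.strip().lower(), " ".join(words))


def group_into_works(editions):
    """Group edition records (hypothetical dicts) under a shared Work key."""
    works = defaultdict(list)
    for edition in editions:
        works[work_key(edition["author"], edition["title"])].append(edition)
    return dict(works)


# Invented sample records, not real catalog data.
editions = [
    {"author": "Leo Tolstoy", "title": "War and Peace", "year": 1904},
    {"author": "Leo Tolstoy", "title": "War and peace.", "year": 1957},
    {"author": "Mark Twain", "title": "The Adventures of Tom Sawyer", "year": 1876},
    {"author": "Mark Twain", "title": "Adventures of Tom Sawyer", "year": 1920},
]

works = group_into_works(editions)
for key, grouped in works.items():
    print(key, len(grouped))
```

A rule this simple would miss translations, subtitles and variant author names, and it would wrongly merge different books that happen to share a title and author string, which is exactly the kind of dupe and mismatch described above.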
Subjects
As a non-librarian, I have been both shocked and awed by the degree of classification that’s possible using library practices. Catalogers have worked hard to put books into very specific descriptive boxes and hierarchies. Being a fan of messy data and classification, I have stumbled upon lots of classifications for books whose “order” seems quite nonsensical.
For example, many of the science fiction books listed on Open Library have several very similar, convoluted subject classifications, separated by all manner of different characters. To the human eye, it seems like duplication of effort. One book might have the following subjects assigned to it:
Science Fiction - General, Fiction / Science Fiction / General, Fiction, Fiction - Science Fiction, Science Fiction
We could just show a list of concepts, like:
Science Fiction, General and Fiction
…instead, and turn each of those terms into links that take you through to a page showing all books with the same subject.
Similarly…
Probability & statistics, Probabilities, Mathematics, Science/Mathematics, Probability & Statistics - General, Mathematics / Statistics
… could be consolidated into Probability, Statistics, Probabilities, Mathematics, Science and that pesky “General” subject. People are good at reading collections of words in a list and understanding the concepts of the list, we think. It’s almost more difficult to parse the variants as you see above, with all their repeats and the use of characters to indicate some sort of hierarchy.
So, we’re going to try that (but not delete the LCSHs, of course).
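As a rough illustration of that consolidation, here is a minimal Python sketch. It assumes the subject strings arrive as a list and that “/”, “&”, “,” and a spaced “ - ” are the only hierarchy separators; the function name `consolidate_subjects` is made up for this example and is not part of Open Library’s code.

```python
import re

# Separators assumed to indicate hierarchy: "/", "&", ",", or a hyphen
# with whitespace on both sides (so hyphenated words stay intact).
SEPARATORS = re.compile(r"\s*[/&,]\s*|\s+-\s+")


def consolidate_subjects(subjects):
    """Flatten subject strings into unique terms, keeping first-seen order."""
    seen = set()
    terms = []
    for subject in subjects:
        for term in SEPARATORS.split(subject):
            term = term.strip()
            if not term:
                continue
            key = term.lower()  # compare case-insensitively
            if key not in seen:
                seen.add(key)
                # Capitalize only the first letter for display.
                terms.append(term[0].upper() + term[1:])
    return terms


sci_fi = [
    "Science Fiction - General",
    "Fiction / Science Fiction / General",
    "Fiction",
    "Fiction - Science Fiction",
    "Science Fiction",
]
print(consolidate_subjects(sci_fi))

stats = [
    "Probability & statistics",
    "Probabilities",
    "Mathematics",
    "Science/Mathematics",
    "Probability & Statistics - General",
    "Mathematics / Statistics",
]
print(consolidate_subjects(stats))
```

Because the original strings are only read and never modified, the source classifications stay available alongside the simplified list.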
Links, links, links
The key interface into the current catalog is a search box: essential if you know what you’re looking for, but useless for browsing. We’re going to introduce new navigation elements into the site that will help people dive into the catalog and bounce around. Certainly, we’ll still have search (much improved, upgraded to SOLR 1.4), but, as we think about that high bounce rate, we want to help people hop around the catalog instead of coming and going so quickly. To borrow a phrase from Tom Coates, we are constructing a new view of the catalog to represent it as a web of data instead of discrete records. The more connections we can create between records, the richer that browsing experience can be.
From a linked data perspective, we also want to introduce the ability for people to connect our records with many more systems online. Right now, you can assign up to 6 identifiers to Open Library edition records: ISBN (10 & 13), Library of Congress Control Number (LCCN), Library of Congress Classification System (LC), Internet Archive and OCLC. These IDs are certainly valuable, and in deep circulation in library catalogs around the world, buuuut… there are loads of other bookish sites out there on the web that also have wonderful, rich information about books that we’d like to connect to. Examples include Goodreads, LibraryThing and Zotero, amongst others: really, any resources that people think are useful! The idea is to stop worrying about a canonical identifier and simply try to accumulate as many identifiers as we can. This idea will take a while to bear fruit, but it works on the premise that we have a new opportunity in cataloging now: to place books in a network instead of on a shelf.
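One way to picture “accumulate as many identifiers as we can” is a record that holds an open-ended map of identifier schemes instead of a fixed set of fields. The Python below is only a sketch under that assumption: the `EditionRecord` class, its methods, the scheme names and every identifier value are invented placeholders, not Open Library’s actual data model or real IDs.

```python
from collections import defaultdict


class EditionRecord:
    """Hypothetical edition record with an open-ended set of identifiers."""

    def __init__(self, title):
        self.title = title
        # Maps a scheme name (e.g. "isbn_10", "goodreads") to a set of values.
        self.identifiers = defaultdict(set)

    def add_identifier(self, scheme, value):
        """Record one more identifier; any scheme name is accepted."""
        self.identifiers[scheme.strip().lower()].add(value.strip())

    def shares_identifier_with(self, other):
        """True if both records carry the same value under the same scheme."""
        return any(
            values & other.identifiers[scheme]
            for scheme, values in self.identifiers.items()
            if scheme in other.identifiers
        )


# All identifier values below are made-up placeholders.
ours = EditionRecord("War and Peace")
ours.add_identifier("isbn_10", "0000000001")
ours.add_identifier("goodreads", "example-123")

theirs = EditionRecord("War and peace")
theirs.add_identifier("Goodreads", "example-123")
theirs.add_identifier("librarything", "example-456")

print(ours.shares_identifier_with(theirs))
```

With no canonical identifier, any single shared value is enough to link two records, and each new scheme adds another possible edge in the network.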
Similarly, we would like to collect links to other sites that are relevant to a certain book or author. Did you know Alain de Botton has a Twitter account? Sites like The Guardian, Flashlight Worthy or the New York Review of Books have incredibly rich information about books and authors that would be wonderful to connect with from the Open Library catalog.
We’re also excited about the role Open Library can play in the new Book Server initiative that was launched by the Internet Archive in October this year:
The BookServer is a growing open architecture for vending and lending digital books over the Internet. Built on open catalog and open book formats, the BookServer model allows a wide network of publishers, booksellers, libraries, and even authors to make their catalogs of books available directly to readers through their laptops, phones, netbooks, or dedicated reading devices.
The basic idea is that publishers can publish a list of any/all epubs in their catalog to be aggregated by other services online. Open Library could be one of those aggregators. We hope to show a real time representation of whatever we can aggregate, so when you look for individual books, you see a live list of where you can get your hands on the document, whether for purchase or download. After all, isn’t the job of a library to get people to books?
Librarianship as the Foundation of Open Library
Open Library’s mission has always been to build a page on the web for every book ever published. We have only been able to start achieving that mission on the shoulders of the work of librarians. While it’s possible (and encouraged) for people to add new records for books we don’t know about yet, the vast majority of our records come directly from library catalogs.
The opportunity we have now is to help interested contributors enrich these records. People who love a particular book, or who have some knowledge in a particular subject area, or who enjoy correcting typos, or who like to make sure all the boxes are filled in, or who have a photo of a book they’ve read, or who found a great review of a book on another site can all contribute information to the Open Library. As Tim Spalding, founder of LibraryThing, noted in his Social Cataloging talk presented in New Zealand this October, nobody’s quite sure where this “social cataloging” might go, or when it might become useful to librarians in a cataloging sense. What we do know is that there’s a lot of knowledge out there on the web about books, and we want to make Open Library a place where people can contribute any amount, no matter how small, to make the catalog more useful.
The Open Library is an amazing resource, and now it’s time to take it to the next level. Yeah!
By the way, we’re looking for at least one senior web developer to join the team too, so if you’d like to join a small team doing interesting things with library catalogs, APIs and SOLR on an extensible wiki-editable platform built in Python, and you live close to San Francisco or would move here, please drop a line to info@.
Note: I did a small copy edit Dec 4. Pretty sure I didn’t remove anything of substance.