We’re doing some work on improving our search engine at the moment. As we release the new code, search may be intermittently unavailable. Apologies for the interruption, but search will be much faster when everything settles down. We’ll drop a note in here when it’s back online.
Update, 6PM PST: We didn’t quite get as much done today as we’d planned, but search should be stable. More tomorrow!
Update, 9AM PST, 8/28: Holy search, Batman!! Before… searching on Open Library was a slog. But now! It’s a breeze! Our search guy, Paul, has been tightening knobs and flipping switches (aka making good use of SOLR stored fields), and our chief data munger, Edward, helped push out the new code this morning. Just see how fast our 24,781 bacon records show up! Then, there’s the “collection” of digitized books about cheese… Please let us know if you come across anything untoward.
We tried again, with better results this time. Search should be much faster now.
In this tweet, Edsu asked what fields we are storing in Solr. The answer: because of memory pressure on the search server, we store only the fields needed to populate the search result page: book identifier (/b/OL12345M or the like), title, authors, author IDs, publishers, publication date, and one or two others, plus the facet fields. Previously we stored only the book ID plus the facet fields, and populated the result page with database retrievals. Using the database like that is conceptually clean, but it became unbearably slow as our traffic increased.
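To make the difference concrete, here is a minimal Python sketch of the two ways of building a result page. It is an illustration only: the record, the field names, and every function name are made up for the example and are not Open Library's actual schema or code, and plain dictionaries stand in for both the search index and the database.

```python
# Illustrative only: the records and field names below are made up,
# not Open Library's actual schema or code.

# Stand-in for the main database: full book records keyed by book ID.
DATABASE = {
    "/b/OL12345M": {
        "title": "A Book About Bacon",
        "authors": ["Example Author"],
        "publishers": ["Example Press"],
        "publish_date": "1999",
        "description": "A long field that the result page never shows.",
    },
}

# Old scheme: each search hit carries only the book ID.
HITS_ID_ONLY = [{"key": "/b/OL12345M"}]

# New scheme: each hit also carries the fields the result page displays.
HITS_STORED = [
    {
        "key": "/b/OL12345M",
        "title": "A Book About Bacon",
        "authors": ["Example Author"],
        "publishers": ["Example Press"],
        "publish_date": "1999",
    },
]


def format_row(key, fields):
    """Build one line of the result page from the display fields."""
    return "{} | {} | {} | {} | {}".format(
        key,
        fields["title"],
        ", ".join(fields["authors"]),
        ", ".join(fields["publishers"]),
        fields["publish_date"],
    )


def render_with_database(hits, database):
    """Old scheme: one database retrieval per hit."""
    rows, lookups = [], 0
    for hit in hits:
        record = database[hit["key"]]
        lookups += 1
        rows.append(format_row(hit["key"], record))
    return rows, lookups


def render_from_stored_fields(hits):
    """New scheme: the hit already holds everything the page needs."""
    rows = [format_row(hit["key"], hit) for hit in hits]
    return rows, 0


old_rows, old_lookups = render_with_database(HITS_ID_ONLY, DATABASE)
new_rows, new_lookups = render_from_stored_fields(HITS_STORED)
```

Both paths produce the same rows, but the first pays one database retrieval per hit while the second pays none. The cost of the second is that the index holds a copy of every display field, which is why only the fields the result page needs are stored.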
With the stored field scheme, if you update a book record (say, by changing the publisher name) and then search for the book, there can be a delay of several minutes before the updated info shows up in the search results. We feel that the increased responsiveness makes it worth accepting this slight anomaly. Another consequence, though, is that if we change the fields we display in search results, we will have to go through a somewhat tedious reindex and configuration change. However, the search result template is fairly stable these days, so we don’t expect to change it very often.
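The delay comes from the index holding its own copy of the display fields, which is refreshed only when the record is reindexed. The toy model below sketches that behaviour in Python. Everything in it is hypothetical (the `Catalog` class, the `run_indexer` method, the `STORED_FIELDS` tuple), and it says nothing about how or when our real indexer runs.

```python
# Toy model of the staleness window. All names here are hypothetical
# and chosen for illustration; this is not Open Library's code.

STORED_FIELDS = ("title", "publishers")


class Catalog:
    def __init__(self):
        self.database = {}    # authoritative records
        self.index = {}       # copies of the stored fields, used by search
        self.pending = set()  # keys edited since the last indexer run

    def save(self, key, record):
        """Write to the database right away; indexing happens later."""
        self.database[key] = dict(record)
        self.pending.add(key)

    def run_indexer(self):
        """Copy the stored fields of every edited record into the index."""
        for key in self.pending:
            record = self.database[key]
            self.index[key] = {name: record[name] for name in STORED_FIELDS}
        self.pending.clear()

    def search_result(self, key):
        """Return what the result page would display for this book."""
        return self.index[key]


catalog = Catalog()
key = "/b/OL12345M"
catalog.save(key, {"title": "A Book About Cheese", "publishers": ["Old Press"]})
catalog.run_indexer()

# Edit the publisher; the database changes immediately.
catalog.save(key, {"title": "A Book About Cheese", "publishers": ["New Press"]})
stale = catalog.search_result(key)["publishers"]  # still the old copy

catalog.run_indexer()
fresh = catalog.search_result(key)["publishers"]  # now updated
```

In this sketch, adding a name to `STORED_FIELDS` changes nothing for entries already in the index until every record has been reindexed, which mirrors the second consequence described above.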
We are working on a further scheme that will address the above problems. We’ll post more about that when it happens 😉