Today’s challenge is to find “The Secret of Secrets”, by Dan Brown, using Open Library search. It’s not impossible, but it is not easy… And it’s not just because the Я is backwards on the cover.

If you search for “the secret of secrets”, you won’t find the right match on the first two pages of results.
If you search for “the secret of secrets dan brown”, the correct result is still 7th in the list.
In this example, our current search algorithm is biased too heavily toward returning books that have lots of editions to vouch for them, along with other boosting factors (like star ratings) that don’t always produce the desired result.
What search algorithm would perform better? And how do we know whether one approach is better than another?
These are key questions Drini Cami — core maintainer of Open Library search — has been investigating this month.
Is Search Improving?
In order to know whether we’re making changes that improve the quality of our search results, we can’t just change the algorithm, type in a search, and see if the result is better in that one case. We need to apply some consistent framework across a collection of challenging queries and measure how the system — as a whole — performed before versus after.
In Open Library’s case, Drini maintains a Search Evaluation Spreadsheet that measures 100 common searches—everything from “Harry Potter” and “Little Prince” to “The Secret Garden” and “Narnia”. These searches come from our server logs (i.e. popular searches from patrons). It also contains challenging cases that we’ve seen underperform in the past.
For each search query we’ve collected, we define what we expect the “correct” search result to be and then check how often the correct result appears in the top 3 search results (across the search algorithms we’re considering).
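The evaluation loop described above can be sketched in a few lines of Python. This is a hypothetical illustration, not the actual spreadsheet tooling: `run_search` stands in for a call to Open Library’s search, and the work-ID scheme is assumed.

```python
def top3_hit_rate(cases, run_search):
    """Fraction of test queries whose expected work appears in the top 3 results.

    `cases` is a list of (query, expected_work_id) pairs; `run_search`
    returns a ranked list of work IDs for a query. Both are assumptions
    for illustration, not Open Library's real evaluation code.
    """
    hits = 0
    for query, expected_work_id in cases:
        results = run_search(query)      # ranked list of work IDs
        if expected_work_id in results[:3]:
            hits += 1
    return hits / len(cases)
```

Comparing two algorithms then reduces to running the same test cases through each and comparing the resulting hit rates.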

Multiplicative Instead of Additive
For the technical crowd, this change (PR #12357) adjusts Work Search’s Solr eDisMax tuning to use multiplicative boosting (via `boost`) instead of additive boosting (via `bf`) to reduce over-weighting of popularity signals relative to textual relevance:
- Replaces `bf`-based additive boosts with an eDisMax `boost` function expression.
- Expands/adjusts `qf` and phrase-boost parameters (`pf`, `pf2`) to change match weighting and proximity scoring.
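The shape of the change can be illustrated with before/after eDisMax request parameters. The field names, weights, and boost expression below are invented for illustration — the actual values live in PR #12357:

```
# Before — additive (bf): popularity is ADDED to the text-relevance
# score, so a popular book with a weak text match can outrank a
# strong text match.
defType=edismax
qf=title^10 author_name^5
bf=log(edition_count)

# After — multiplicative (boost): popularity SCALES the text score,
# so a result still needs a strong textual match to rank well.
defType=edismax
qf=title^10 author_name^5
pf=title^20
pf2=title^10
boost=log(sum(edition_count,1))
```

With `bf`, a large popularity term can swamp textual relevance; with `boost`, a document that barely matches the query text gets a small base score, and no amount of popularity multiplication makes up for it.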
Overall, the change has improved the relevance of our test results by roughly 10%:

Early Anecdotes
In our testing playground…

- “The secret of secrets” now appears as the 3rd result.
- “My Life”, by Bill Clinton, went from 101st to 19th
- “laws field guide” went from 14th to 1st
Expect these improvements to be live on the main site early next week. Happy reading!
Future Opportunities: Exact Match versus Browsing
Since publishing this blog post, we received a great question internally from Sawood Alam, of the Wayback Machine team, who asks:
Have you measured how this change affects the discovery use-case, where a patron doesn’t have one specific document in mind but wants to find out what options are out there (as opposed to the lookup use-case, where they already know what they are looking for)?
And this is indeed something the Open Library team has been considering. Sawood is pointing out that there are [at least] two modes of searching:
- Exact match
- Browsing
One may browse in a variety of different ways, but for simplicity I’d like to refer to this as “searching by proxy”. That is to say, instead of searching for an exact book by title, a patron may endeavor to discover a suitable book by any number or combination of proximal qualities, like author, topic, format, or genre. An example is a search for nonfiction books about UFOs that are advertised as textbooks and published before 1950.
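One way such a browse query might be expressed is through fielded search, combining facets rather than naming a title. The specific fields and values below are illustrative assumptions (using a Solr-style range for the date), not a tested Open Library query:

```
subject:UFOs subject:Textbooks first_publish_year:[* TO 1950]
```

Notice there is no single “correct” book here — any well-regarded match on these facets could reasonably satisfy the patron, which is precisely what makes browse queries hard to score with a top-3 expected-result test.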
For browsing queries of this flavor, it’s difficult to know (in advance) what book(s) should appear in the top-three position in search results as the answer will often be subjective to the searcher.
As a result, an additional approach will need to be instrumented and added to our existing process that (a) accurately identifies when a search term targets a proximal quality rather than an exact (e.g. title) match and (b) introduces secondary evaluation metrics, such as:
- Success Rate: How often any result is clicked — perhaps called Query Success Rate (QSR)
- Relevance: When a result is clicked… how often is this click for a record in a top 3 position — something like Mean Reciprocal Rank (MRR)
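The two candidate metrics above can be sketched over a simple click log. The log shape is an assumption for illustration: each search session records `click_rank`, the 1-based position of the clicked result, or `None` if nothing was clicked.

```python
def query_success_rate(sessions):
    """QSR: fraction of searches where the patron clicked any result."""
    clicked = sum(1 for s in sessions if s["click_rank"] is not None)
    return clicked / len(sessions)

def mean_reciprocal_rank(sessions):
    """MRR: average of 1/rank over clicked searches (rank is 1-based).

    A click on the 1st result contributes 1.0, on the 3rd result 1/3,
    so higher MRR means clicks are landing nearer the top.
    """
    ranks = [s["click_rank"] for s in sessions if s["click_rank"] is not None]
    return sum(1 / r for r in ranks) / len(ranks)
```

For example, a log of four searches with clicks at ranks 1, 3, and 2 (and one abandoned search) yields a QSR of 0.75 and an MRR of 11/18.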
Further research is required to understand how these metrics may be combined in a recipe that results in the best experience for patrons. For instance, [when] is it more important to improve relevance versus the general distribution of clicks? Maybe “better” means increasing the ratio of searches-to-clicks by 20% rather than increasing the number of clicks in the top-3 position by 25% (if search-to-clicks were to drop by 5%).
Suffice it to say, as policy changes make it more challenging for some readers to find the exact book(s) they are looking for, it becomes increasingly meaningful to be able to suggest suitable alternatives — and to measure how effective we are at making relevant recommendations. Measuring browse cases is something we expect to work towards in the coming months.
