Improving Open Library’s Translation Pipeline

A foreword by Drini Cami
Drini Cami here, Open Library staff developer. It’s my pleasure to introduce Rebecca Shoptaw, a 2024 Open Library Engineering Fellow, to the Open Library blog in her first blog post. Rebecca began volunteering with us a few months ago and has already made many great improvements to Open Library. I’ve had the honour of mentoring her during her fellowship, and I’ve been incredibly impressed by her work and her trajectory. Combining her technical competence, work ethic, always-ready positive attitude, and her organization and attention to detail, Rebecca has been an invaluable and rare contributor. I can rely on her to take a project, break it down, learn anything she needs to learn (and fast), and then run it to completion. All while staying positive and providing clear communication of what she’s working on and not dropping any details along the way.

In her short time here, she has also already taken a guidance role with other new contributors, improving our documentation and helping others get started. I don’t know how you found us, Rebecca, but I’m very glad you did!

And with that, I’ll pass it to Rebecca to speak about one of her first projects on Open Library: improving our translation/internationalization pipeline.

Improving Open Library’s Translation Pipeline

Picture this: you’re browsing around on a site, not a care in the world, and suddenly out of nowhere you are told you can “cliquez ici pour en savoir plus.” 

Maybe you know enough French to figure it out, maybe you throw it into Google Translate, maybe you can infer from the context, or maybe you just give up. In any of these cases, your experience of using the site just became that much less straightforward.

This is what the Open Library experience has been here and there for many non-English-speaking readers. All our translation is done by volunteers, so with over 300 site contributors and an average of 40 commits added to the codebase each week, there has typically been some delay between new text getting added to the site and that text being translated.

One major source of this delay was on the developer side of the translation process. To make translation of the site possible, the developers need to provide every translator with a list of all the phrases that will be visible to readers on-screen, such as the names of buttons (“Submit,” “Cancel,” “Log In”), the links in the site menu (“My Books,” “Collections,” “Advanced Search”), and the instructions for adding and editing books, covers, and authors. While updates to the visible text occur very frequently, the translation “template” which lists all the site’s visible phrases was previously only updated manually, a process that would usually happen every 3-6 months.

This meant that new text could sit on the site for weeks or months before our volunteer translators were able to access it for translation. There had to be a better way.

And there was! I’m happy to report that the Open Library codebase now automatically generates that template file every time a change is made, so translators no longer have to wait. But how does it work, and how did it all happen? Let’s get into some technical details.

How It Began

Back in February, one of the site’s translators requested an update to the template file so as to begin translating some of the new text. I’d done a little developer-side translation work on the site already, so I was assigned to the issue. 

I ran the script to generate the new file, and right away noticed two things:

  1. The process was very simple to run (a single command), and it ran very quickly.
  2. The update resulted in a 2,132-line change to the template file, which meant it had fallen very, very out of date.

I pointed this out to the issue’s lead, Drini, and he mentioned that there had been talk of finding a way to automate the process, but they hadn’t settled on the best way to do so.

I signed off and went to make some lunch, then ran back and suggested that the most efficient way to automate it would be to check whether each incoming change includes new/changed site text, and to run the script automatically if so. He liked the idea, so I wrote up a proposal for it, but nothing really came of it until:

The Hook

In March, Drini reached back out to me with an idea about a potentially simple way to do the automation. Whenever a developer submits a new change they would like to make to the code, we run a set of automatic tests, called “pre-commit hooks,” mostly to make sure that their submission does not contain any typos and would not cause any problems if integrated into the site. 

Since my automation idea had been to update the translation template each time a relevant change was made, Drini suggested that the most natural way to do that would be to add a quick template re-generation to the series of automated tests we already have.

The method seemed shockingly simple, so I went ahead and drafted an implementation of it. I tested it a few times on my own computer, found that it worked like a charm, and then submitted it, only to encounter:

The Infinite Loop of Failure

Here’s where things got interesting. The first version of the script simply generated a new template file whether or not the site’s text had actually been changed – this made the most sense since the process was so fast and if nothing actually had changed in the template, the developer wouldn’t notice a difference.

But strangely enough, even though my changes to the code didn’t include any new text, I was failing the check that I wrote! I delved into the code, did some more research into how these hooks work, and soon discovered the culprit. 

The process for a simple check and auto-fix usually works as follows:

  1. When the change comes in, the automated checks run; if the program notices that something is wrong (e.g. extra whitespace), it fixes any problems automatically if possible.
  2. If it doesn’t notice anything wrong and/or doesn’t make any changes, it will report a success and stop there. If it notices a problem, even if it already auto-fixed it, it will report a failure and run again to make sure its fix was successful.
  3. On the second run, if the automatic fix was successful, the program should not have to make any further changes, and will report a success. If the program does have to make further changes, or notices that there is still a problem, it will fail again and require human intervention to fix the problem.
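
The three steps above can be sketched in a few lines of Python. This is only an illustration of the general check-and-fix pattern, not the code Open Library actually runs: the function name and the sample lines are made up for the example, and the "exit code" is simply 1 for "I changed something" and 0 for "nothing to do."

```python
def fix_trailing_whitespace(lines: list[str]) -> tuple[list[str], int]:
    """Strip trailing whitespace and report whether anything changed.

    Hypothetical stand-in for an auto-fixing check: the returned exit code
    is 1 (failure) if the check had to modify anything, 0 (success) if not.
    """
    fixed = [line.rstrip() for line in lines]
    return fixed, int(fixed != lines)


# First line has trailing spaces, so the first run must fix it.
source = ["title = 'Open Library'   ", "print(title)"]

# Run 1: a problem is found and auto-fixed, so the check reports a failure.
fixed, first_run = fix_trailing_whitespace(source)

# Run 2: the fix held, nothing changes, so the check reports a success.
_, second_run = fix_trailing_whitespace(fixed)

print(first_run, second_run)
```

The important property is that the second run is a no-op: a fix that keeps producing new changes can never pass.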

This is the typical process for fixing small formatting errors that can easily be handled by an automation tool. But in this case, the script was running twice and reporting a failure both times.

By comparing the versions of the template, I discovered that the problem was very simple: the hook is designed, as described above, to report a failure and re-run if it has made any changes to the code. The template includes a timestamp that records when it was last updated, down to the second. More pre-commit checks run online than locally, so online the full set takes long enough that the timestamp has moved on by the time the hook runs again. The hook then notices a one-line difference between the current and previous templates (the timestamp itself) and fails again. That is:

  1. The changes come in, and the program auto-updates the translation template, including the timestamp.
  2. It notices that it has made a change (the timestamp and any new/changed phrases), so it reports a failure and runs again.
  3. The program auto-updates the translation template again, including the timestamp.
  4. It notices that it has made a change (the timestamp has changed), and reports a second failure.

And so on. An infinite loop of failure!
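
Here is a small simulation of that loop. It is a sketch under assumptions rather than the real script: the template is modelled as a list of lines, the timestamp line borrows the gettext-style `POT-Creation-Date` header name, and the clock is faked by adding 30 seconds per run so the result is repeatable.

```python
from datetime import datetime, timedelta


def generate_template(phrases: set[str], now: datetime) -> list[str]:
    """Build a toy template: one timestamp line, then one line per phrase."""
    header = f"POT-Creation-Date: {now:%Y-%m-%d %H:%M:%S}"
    return [header, *(f'msgid "{phrase}"' for phrase in sorted(phrases))]


def naive_hook(current: list[str], phrases: set[str], now: datetime):
    """Always regenerate; fail (exit code 1) if the file differs at all."""
    regenerated = generate_template(phrases, now)
    return regenerated, int(regenerated != current)


phrases = {"Submit", "Cancel", "Log In"}  # the site's text never changes here
start = datetime(2024, 5, 1, 12, 0, 0)
template = generate_template(phrases, start)

exit_codes = []
for run in range(1, 4):
    # Each run happens a little later, so only the timestamp line differs.
    later = start + timedelta(seconds=30 * run)
    template, code = naive_hook(template, phrases, later)
    exit_codes.append(code)

print(exit_codes)  # every run reports a failure
```

Even though no phrase changes between runs, the whole-file comparison never settles, because the timestamp line is different every time.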

We could find no way to simply remove the timestamp from the template, so to get out of the infinite loop of failure, I ended up modifying the script so that it actually checks whether the incoming changes would affect the template before updating it. Basically, the script gathers up all the phrases in the current template and compares them to all the incoming phrases. If there is no difference, it does nothing and reports a success. If there is a difference, i.e. if the changes have added or changed the site’s text, it updates the template and reports a failure, so that now:

  1. The changes come in, and the program checks whether an auto-update of the template would have any effect on the phrases. 
  2. If there are no phrase changes, it decides not to update the template and reports a success. If there are phrase changes, it auto-updates the template, reports a failure and runs again.
  3. The program checks again whether an auto-update would have any effect, and this time it will not (since all the new phrases have been added), so it does not update the template or timestamp, and reports a success.
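
The revised flow can be sketched like this. As before, it is a simplified model and not the actual Open Library script: the helper names, the line-based template format, and the hard-coded timestamps are all assumptions made for the example, and phrases containing quotation marks are not handled.

```python
MSGID_PREFIX = 'msgid "'


def extract_phrases(template: list[str]) -> set[str]:
    """Collect only the phrases, ignoring the timestamp line."""
    return {
        line[len(MSGID_PREFIX):-1]
        for line in template
        if line.startswith(MSGID_PREFIX)
    }


def write_template(phrases: set[str], timestamp: str) -> list[str]:
    """Build a toy template: one timestamp line, then one line per phrase."""
    body = (f'{MSGID_PREFIX}{phrase}"' for phrase in sorted(phrases))
    return [f"POT-Creation-Date: {timestamp}", *body]


def phrase_aware_hook(template: list[str], incoming: set[str], timestamp: str):
    """Rewrite the template only when the set of phrases has changed."""
    if extract_phrases(template) == incoming:
        return template, 0  # nothing to translate: leave file and timestamp alone
    return write_template(incoming, timestamp), 1  # updated: report a failure


old = write_template({"Delete Account", "Log In"}, "2024-05-01 12:00:00")
incoming = {"Delete Your Account", "Log In"}  # a developer added the word "Your"

# Run 1: the phrases differ, so the template is rewritten and the check fails.
template_after_first, first_code = phrase_aware_hook(old, incoming, "2024-05-02 09:00:00")

# Run 2, thirty seconds later: the phrases match, so nothing is touched.
template_after_second, second_code = phrase_aware_hook(
    template_after_first, incoming, "2024-05-02 09:00:30"
)

print(first_code, second_code)
```

Because the comparison looks only at phrases, the later timestamp on the second run is never written, and the loop ends after one retry.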

What it looks like locally:

A screen recording of the new translation script in action. A developer adds the word "Your" to the phrase "Delete Your Account" and submits the change. The automated tests run; the translation test fails, and updates the template. The developer submits the updated template change, and the automated tests run again and pass.

I also added a few other options to the script so that developers could run it manually if they chose, and could decide whether or not to see a list of all the files that the script found translatable phrases in.

The Rollout

To ensure we were getting as much of the site’s text translated as possible, I also proposed and oversaw a bulk formatting of a lot of the onscreen text which had previously not been findable by the template-updating function. The project was heroically taken on by Meredith (@merwhite11), who successfully updated the formatting for text across almost 100 separate files. I then did a full rewrite of the instructions for how to format text for translation, using the lessons we learned along the way.

When the translation automation project went live, I also wrote a new guide for developers so they would understand what to expect when the template-updating check ran, and answered various questions from newer developers re: how the process worked.

The next phase of the translation project reused the automated process we had figured out for updating the template, this time to notify developers if their changes include text that isn’t correctly formatted for translation. Stef (@pidgezero-one) did a fantastic job making that a reality, and it has allowed us to properly internationalize upwards of 500 previously untranslatable phrases, which will make internationalization much easier to keep track of for future developers.

When I first updated the template file back in February of this year, it had not been updated since March of the previous year, about 11 months. The automation has now been live since May 1, and since then the template has already been auto-updated 35 times, or approximately every two to three days. 

While the Open Library translation process will never be perfect, I think we can be very hopeful that this automation project will make une grosse différence.

Follow each other on Open Library

By Nick Norman, Mek, et al

Subscribe to readers with complementary tastes to receive book recommendations.

Over the past few months, we’ve been rolling out the basic building blocks of the “Follow” feature: a way for readers to follow those with similar tastes and to tap into a personalized feed of book recommendations.

How does the “Follow” feature work?

Similar to following people on platforms like Facebook, Open Library’s “Follow” feature enables patrons to connect with fellow readers whose Reading Logs are set to public. When you follow other readers, their recent public reading activity will show up in your feed and hopefully help you discover interesting books to read next.

You can get to your feed from the My Books page, using the My Feed item in the left navigation menu.

What’s next?

Most of the functionality for following readers is live, but we’re still designing mechanisms for discovering readers to follow. Interested in shaping the development of this new feature? Take a look at these open GitHub issues relating to the Follow feature.

Your feedback is appreciated

Have other comments or thoughts? Please share them in the comments section below, connect with us on Twitter, and send us your feedback about the new “Follow” feature.

Let Readers Read

Mek here, program lead for OpenLibrary.org at the Internet Archive with important updates and a way for library lovers to help protect an Internet that champions library values.

Over the last several months, Open Library readers have felt the devastating impact of more than 500,000 books being removed from the Internet Archive’s lending library, as a result of Hachette v. Internet Archive.

In less than two weeks, on June 28th, the courts will hear the oral argument for the Internet Archive’s appeal.

What’s at stake is the very ability for library patrons to continue borrowing and reading the books the Internet Archive owns, like any other library.

Consider signing this open letter to urge publishers to restore access to the 500,000 books they’ve caused to be removed from the Internet Archive’s lending library and let readers read.

Listening to Learners and Educators

Over the course of 2023, the Open Library team conducted design research and video-interviewed nine volunteers to determine how learners and educators make use of the OpenLibrary.org platform, what challenges get in their way, and how we can help them succeed. Participants of the study included a mix of students, teachers, and researchers from around the globe, spanning a variety of disciplines.

About the Participants

At the earliest stages of this research, a screener survey involving 466 participants helped us understand the wide range of patrons who use the Open Library. Excluding 141 responses that didn’t match the criteria of this research, the remaining 325 respondents identified as:

  • 126 high school, university, or graduate students
  • 64 self learners
  • 44 researchers
  • 41 K-12 teachers
  • 29 professors
  • 12 parents of K-12 students
  • 9 librarians

Participants reported affiliations with institutions spanning a diverse variety of geographies, including: Colombia, Romania, France, Uganda, Indonesia, China, India, Botswana, Nigeria, and Ireland.

Findings

A screenshot of the Findings section of the collaborative Mural canvas, filled with digital sticky notes

Here are the top 7 learnings we discovered from this research:

  1. The fact that the Open Library is free and accessible online is paramount to patrons’ success. During interviews, several participants told us that the Open Library helps them locate hard-to-find books. At least two participants didn’t have access to a nearby library or a book vendor that could provide the book they needed. In a recent Internet Archive blog post, several patrons corroborated these challenges. In addition to finding materials, one or more of our participants are affected by disabilities, or have worked with people who have limited mobility or difficulty commuting and who benefit from online access. Research use cases also drove the necessity for online access: at least two interviewed participants used texts primarily as references to cite or verify certain passages. The ability to quickly search for and access specific, relevant passages online was essential to helping them succeed at their research objectives, which might otherwise have been prohibitively expensive or technologically intractable.
  2. Participants voiced the importance of internationalization and having books in multiple languages. Nearly every participant we interviewed advocated for the website and books to be made available in more languages. One participant had to manually translate sections of English books into Arabic for their students. Another participant who studied classic literature manually translated editions so they could be compared. A teacher whom we interviewed relayed to us that it was common for their ESL (English as a Second Language) students to ask for help translating books from English into their primary language.
  3. The interests of learners and educators who use the Open Library vary greatly. We expected to find themes in the types of books learners and educators are searching for. However, the foci of participants we interviewed spanned a variety of topics, from Buddhism, to roller coaster history, therapy, technical books, language learning materials, and classic literature. Almost none of our candidates were looking for the same thing, and most had different goals and learning objectives. One need they all have in common is searching for books.
  4. The Open Library has many educational applications we hadn’t intended. One or more participants reported they had used the read aloud feature in their classroom to help students with phonics and language learning. While they found it useful, participants also told us the feature sometimes glitches and that robotic-sounding voices are a turnoff. We also learned that several educators and researchers link to Open Library when creating course syllabi for their students.
  5. Many of Open Library’s subject and collection pages weren’t sufficient for one or more of our learners’ and educators’ use cases. At least two interviewees ended up compiling their own collections using their own personal web pages and linking back to the Open Library. One participant tried to use Open Library to search for K-12, age-appropriate books pertaining to the “US Constitution Day” and was grateful for but underwhelmed by the results.
  6. The Open Library service & its features are difficult to discover.
    • Several interviewees were unaware of Open Library’s full-text search, read aloud, or note-taking capabilities, yet expressed interest in these features.
    • Many respondents of the screener survey believed their affiliated institutions were unaware of the Open Library platform.
  7. Open Library’s community is generous, active, and eager to participate in research to help us improve. Overall, 450+ individuals from 100+ institutions participated in this process. Previously, more than 2k individuals helped us learn how our patrons prefer to read and more than 750 participants helped us redesign our book pages.

We had already predicted some of our learnings, and seeing these predictions confirmed by data has given us conviction in pursuing next steps. Other learnings were genuinely surprising to us, such as many teachers preferring the online web-based book reader because it doesn’t require them to install any extra software on school computers.

Proposals

After reviewing this list of findings and challenges, we’ve identified 10 areas where the Open Library may be improved for learners and educators around the globe:

  1. Continuing to make more books available online by expanding our Trusted Book Providers program and adding Web Books to the catalog.
  2. Participating in outreach to promote platform discovery & adoption. Connect with institutions and educators to surface opportunities for partnership, integration, and to establish clear patterns for using Open Library within learning environments.
  3. Adding an onboarding flow after registration to promote feature discovery. Conduct followup surveys to learn more about patrons, their challenges, and their needs. Add an onboarding experience which helps newly registered patrons become familiar with new services.
  4. Making books available in more languages by prototyping the capability to translate currently open pages within bookreader to any language, on-the-fly.
  5. Creating better subject and collection page experiences by giving librarians tools to create custom collection pages and tag and organize books in bulk.
  6. Improving Read Aloud by using AI to generate more natural/human voices, making the feature more discoverable in the bookreader interface, and fixing read aloud navigation so it works more predictably.
  7. Allowing educators to easily convert their syllabi to lists on Open Library using a Bulk Search & List creator feature.
  8. Moving to a smarter, simpler omni-search experience that doesn’t require patrons to switch modes in order to get the search results they want.
  9. Importing missing metadata and improving incomplete records by giving more librarians better tools for adding importers and identifying records with missing fields.
  10. Improving the performance and speed of the service so it works better for more patrons in more areas.

About the Process

The Open Library team was fortunate to benefit from the guidance and leadership of Abbey Ripstra, a Human-Centered Design Researcher who formerly led Design Research efforts at the Wikimedia organization. This research effort wouldn’t have been achievable without her help and we’re grateful.

In preparation for our research, Abbey helped us outline a design process using an online collaboration and presentation tool called Mural.

During the planning process, we clarified five questions:

  1. What are the objectives of our research?
  2. How will we reach and identify the right people to speak with?
  3. What questions will we ask interview participants?
  4. How will we collate results across participants and synthesize useful output?
  5. How can we make participants feel comfortable and appreciated?

To answer these questions, we:

  • We developed a Screener Survey, which we showed to patrons who were logged in to OpenLibrary.org. In total, 466 patrons from all over the world completed the survey, from which we identified 9 promising candidates to interview. Candidates were selected in such a way as to maximize the diversity of academic topics, coverage of geographies, and roles within academia.
  • We followed up with each candidate over email to confirm their interest in participating and suggest meeting times, and then sent amenable participants our Welcome Letter and details of our Consent Process.
  • We coordinated interview times, delegated interview and note taking responsibilities, and kept track of the state of each candidate in the process using an Operations Tracker we built using Google Sheets.
  • During each interview, the elected interviewer followed our Interviewer’s Guide while the note taker took notes. At the end of each interview, we tidied up our notes and debriefed by adding the most salient notes to a shared Mural board.
  • When the interviewing stage had concluded, we sent thank you notes and small thank you gifts to participants. Our design team then convened to cluster insights across interviews and surface noteworthy learnings.

The new Open Library Team Page

By Nick Norman, Elizabeth Mays, & Mek

More than just a ‘thank you’, Open Library’s new Team Page shines a spotlight, beyond staff, on the invaluable efforts of leads, fellows, and contributors – spanning engineering, design, librarianship, and communications – who make openlibrary.org possible.

The Open Library website is an open source effort, powered by an extensive network of volunteer contributors from across the globe. Some contributors swim by to nibble on a specific issue or check out our weekly community calls. Other contributors plant roots and collaborate with staff, as appointed Fellows, to make progress on involved projects that may entail weeks or months of thoughtful preparation. A select few contributors become intimately familiar with our systems, choose to mentor others in the community, and volunteer to manage and lead specific, discrete parts of the project, like our design system, our JavaScript practices, or internationalization.

In the past, the website had a stale list of contributors and we didn’t have an established framework for spotlighting the generous humans behind Open Library and keeping this list up to date. With the skillful touch of fellows from our design team (Debbie San and Jaye Lasseigne) and mentorship from Scott Barnes on staff, we now have a beautiful, filterable, and maintainable way of showcasing the achievements of Open Library’s diverse community of contributors: https://openlibrary.org/about/team

We had an opportunity to interview Debbie San, who is responsible for the new Team Page design, to learn more about the design process for this project, and Jaye Lasseigne, who led the new page’s implementation.

An Interview with the Designer & Developer

Speaking with Debbie about the Team Page’s Design Process:

Q.) What led to the decision to create a new team page? 

A.) Debbie’s Insight: I have always believed that it is crucial to recognize individuals for their work. Open Library has many unique and talented individuals, volunteers and staff alike. Our team page is an opportunity to recognize them.

Q.) What was the inspiration behind the team page design?

A.) Debbie’s Response: There were many different websites used as inspirations. We looked at team pages from universities, smaller and bigger projects, and anything else that could help the vision of redesigning our team page.

Q.) How do you incorporate collective input and diverse perspectives into the design process?

A.) Debbie’s Advice: Design is a creative process, but it doesn’t mean it’s a solo process. I believe in the power of collective input and collaboration. Even when I wasn’t 100% sold on the feedback, I valued the diverse perspectives that shaped our collective vision. In the realm of design, embracing a variety of viewpoints is important when it comes to refining and enhancing the end result.

Q.) How do you approach the iterative process in design, particularly when creating different mock-ups?

A.) Debbie’s Thoughts: Even though the implementation may seem simple, challenges may appear, and it is up to everyone, designers and developers alike, to dialogue, to grow together and to find the best solutions. 

I am super thankful to have worked with Jaye and Scott here and how hard they worked to bring this design to life. Now we have a team page that celebrates all staff and contributors who empower Open Library.

Speaking with Jaye about the Team Page’s Technical Implementation:

Q.) Can you share some insights into how your team worked together to bring this page to life? 

A.) Jaye’s Thoughts: Debbie and I worked really well together! I got Debbie’s Figma designs and immediately started working to put them in code. I also received help from Scott Barnes and Mek (Program Lead) to hook up my CSS file, and Jim Champ showed me how to hook up a JavaScript file. I remember checking in with Debbie a few times to get feedback on how the design looked in the browser.

Q.) Can you elaborate on challenges you encountered and how you overcame them during the coding process?

A.) Jaye’s Response: Most of my personal challenges came from my limited knowledge of the codebase and where files were located. To help me understand the codebase, I watched some of the videos in the ‘Getting Started’ guide on the Open Library GitHub.

After that, I found I still had questions so I reached out to Mek for help on the CSS. He was able to show me where the CSS files are located, and from there, I was able to figure out how to hook my CSS up to my HTML page. When I got to the JavaScript portion, I reached out to Jim Champ who explained the flow of the JavaScript files and where everything needed to go for it to work.

Q.) What advice would you give to other organizations who are looking to create a team page?

A.) Jaye’s Advice: “Do lots of research on other team pages you may find online. Find examples you like – you don’t need to reinvent the wheel.”

Just in Time for Growth

Debbie and Jaye’s hard work comes at an important time, given the recent growth of Open Library’s community of contributors.

A Snapshot of Open Library contributors representing 15+ Nations

In 2023, the Open Library project registered interest from 443 volunteer applicants, while cultivating a community of over 1,000 members on Slack. The project also benefited from 2,500 survey respondents, 20+ active developers, and 5 fellows across our 4 programs: Design, Communications, Engineering, and Librarianship. We celebrated the achievements of our community members during our 2023 Open Library Community Celebration.

Join In or Follow Along

Whether you’re a patron, a community contributor, or someone discovering Open Library for the first time, we invite you to explore our new Team Page to meet some of the people who power Open Library.  Or, you can follow us on Twitter for our latest updates. If you’re inspired by our mission and want to contribute, let us know at openlibrary.org/volunteer.