Category Archives: Community

Sandy Chu: My Internship at the Internet Archive

This summer, continuing a years-long tradition, Open Library and the Internet Archive took part in Google Summer of Code (GSoC), a Google initiative focused on bringing new contributors into open source software development. This year, I was lucky enough to mentor Sandy, a long-time Open Library volunteer, on an exciting project to increase the accessibility of our books with real-time translations. We have invited Sandy to speak about her experience here as we reach the culmination of the GSoC period. It was a pleasure getting to work on this exciting project with you, Sandy! – Drini

My name is Sandy Chu and I am a 2025 Google Summer of Code (GSoC) candidate who had the opportunity to work with the amazing Internet Archive engineering team. Prior to participating in the GSoC program, I had contributed as a volunteer software engineer for the Open Library open source repo. As someone who grew up using local libraries as a place to supplement my education and read books that my school could not afford, I was drawn to the Open Library’s mission to empower book lovers and provide a free, valuable resource to all. You can view my initial proposal here.

Coming soon in September, the Open Library will be better able to help its global audience access books that were previously out of reach due to a lack of localization. With the help of open source projects such as the Mozilla Firefox Translation Models and the Bergamot Translator library, a new BookReader plugin will leverage a user’s browser and hardware resources to toggle between a book’s original language and a translation in the user’s language. The translated text will also work with the ReadAloud feature, so books can be read aloud in the translated language.

The “Real-Time In-Browser Book Translation w/ Read Aloud (TTS)” project closely aligns with the Open Library’s 2025 goal of providing more with less. Although the Internet Archive hosts and provides its patrons with hundreds of thousands of publicly available works, patrons are limited to the subset of works that were published in their native language. Because of the unique image-based implementation of the BookReader application, default browser translator options are not viable for many readers, so this project presents an opportunity to make a big impact for international audiences.

Currently in internal beta, the translation plugin allows patrons to quickly initiate a local translator on their device and translate the book’s text in just a few seconds per page. With nine distinct languages available for translation from English (and potentially over 40 as we update to Mozilla’s latest models), this project will make countless works more accessible for patrons.

The primary goals of this project were:

  • Translating a book’s original text content to the patron’s desired language with minimal delay or disruption
  • Creating a visually seamless experience to maintain the immersive experience of reading a book without having to go back and forth between a translator and the book
  • Redirecting the existing TTS plugin to use the translated text when the BookReader is in translation mode

| Language | Total Readable Books on OL | % of All Readable / Borrowable Books (out of 4,526,060) | Native Speakers Globally (in millions) |
|---|---|---|---|
| English | 3,034,445 | 67.04% | 390 |
| French | 332,052 | 7.33% | 74 |
| German | 180,341 | 3.98% | 76 |
| Spanish | 120,516 | 2.66% | 484 |
| Chinese | 90,531 | 2.00% | 1,158 |
| Korean | 5,384 | 0.11% | 81 |
| Arabic | 2,415 | 0.05% | 142 |

Retrieved from Wikipedia, which references Ethnologue as its source. Chinese and Arabic dialects are grouped together since they both have a unified written system.
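
As a quick check on the shares in the table, each percentage is simply a language’s readable-book count divided by the 4,526,060 total. The snippet below is a small Python sketch of that arithmetic using the counts listed above; the variable and function names are illustrative only, and results may differ from the table in the last digit depending on whether values are rounded or truncated.

```python
# Total readable / borrowable books, as stated in the table header.
TOTAL_READABLE = 4_526_060

# Readable-book counts per language, copied from the table.
readable_books = {
    "English": 3_034_445,
    "French": 332_052,
    "German": 180_341,
    "Spanish": 120_516,
    "Chinese": 90_531,
    "Korean": 5_384,
    "Arabic": 2_415,
}


def share_percent(count, total=TOTAL_READABLE):
    """Return count as a percentage of total."""
    return 100 * count / total


for language, count in readable_books.items():
    print(f"{language}: {share_percent(count):.2f}%")
```

For Arabic, 2,415 out of 4,526,060 works out to roughly 0.05% of readable books.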

Translations

At the center of the translation plugin are the Neural Machine Translation (NMT) models provided by Mozilla Foundation’s Firefox Translation Models project. These files contain the lexical and vocabulary conversions from the original language to the target language; these compact models are essential to the real-time, browser-side aspect of this project. Since we are currently using an older subset of models, the translation feature is still considered in the “alpha” stage of maturity and accuracy.

When the translation plugin is enabled by the user, the language registry and model files are fetched from a server within the Internet Archive. After the models have successfully loaded into the user’s browser, we are able to use the Bergamot Translator project scripts to create a dedicated Web Worker, which is initialized to handle the translation tasks in a separate background thread. The Web Worker immediately retrieves the text content within the text selection layer for the currently visible page(s) for translation. Pages that have been rendered but are not visible in the BookReader are given a lower priority and translated after the queue of visible content is completed.
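
The visible-first ordering described above can be pictured as a small priority queue. The sketch below is written in Python purely for illustration (the plugin itself runs as JavaScript in the browser), and the class name, method names, and priority constants are hypothetical rather than taken from the BookReader code.

```python
import heapq
from itertools import count

# Lower numbers are served first: visible pages before rendered-only pages.
VISIBLE = 0
RENDERED_ONLY = 1


class TranslationQueue:
    """Hypothetical queue that hands out visible pages before hidden ones."""

    def __init__(self):
        self._heap = []
        self._order = count()  # tie-breaker keeps insertion order stable

    def add(self, page_index, text, visible):
        priority = VISIBLE if visible else RENDERED_ONLY
        heapq.heappush(self._heap, (priority, next(self._order), page_index, text))

    def pop(self):
        """Return the next (page_index, text) pair to translate."""
        _, _, page_index, text = heapq.heappop(self._heap)
        return page_index, text

    def __len__(self):
        return len(self._heap)


queue = TranslationQueue()
queue.add(12, "text of a pre-rendered page", visible=False)
queue.add(10, "text of the left visible page", visible=True)
queue.add(11, "text of the right visible page", visible=True)

while queue:
    page_index, _ = queue.pop()
    print("translating page", page_index)
```

In this sketch, pages 10 and 11 are translated before page 12 even though page 12 was queued first.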

An unmodified page in the BookReader.

The translation plugin script feeds the text within the text selection layer into the model for processing and prepares the stage for the translated output by covering the original image and text selection layer with a beige background. [Pull Request #1410]

The translation plugin has initialized and is providing the original text to the language model.

Once the translation is completed by the dedicated Web Worker, the output is then used as the text content for its respective paragraph chunk and appears as if the work is actually written in the target translation language.

The translation has completed and is now on the page!

Images with captions are also carefully handled so that the translated text box occupies nearly the same space as the text selection layer itself. 

Each translated paragraph is stored in a cache with a unique key to prevent the browser from re-translating recently viewed content [Pull Request #1410/commit]. To prevent readers from having to wait for the translation when “flipping” to the previous/next page, the translation plugin targets the visible pages on the screen then works to complete the translations for the non-visible but loaded pages. If a user decides to flip far from their current page in the work, the translation plugin will detect the newly rendered page and translate / populate the translated text layer while adjusting to a new page on the fly.
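
To make the caching idea concrete, here is a minimal Python sketch of a translation cache. It is not the plugin’s actual code: the post does not say how the unique key is built, so the key used here (page index, paragraph index, target language) is an assumption, as are the class and the stand-in translator function.

```python
class TranslationCache:
    """Hypothetical cache that translates each paragraph at most once."""

    def __init__(self, translate):
        self._translate = translate  # callable: (text, target_lang) -> str
        self._store = {}

    def get(self, page_index, paragraph_index, target_lang, text):
        # Assumed key shape; the real plugin may build its key differently.
        key = (page_index, paragraph_index, target_lang)
        if key not in self._store:
            self._store[key] = self._translate(text, target_lang)
        return self._store[key]


calls = []


def fake_translate(text, target_lang):
    """Stand-in for the real model call; records how often it is invoked."""
    calls.append(text)
    return f"[{target_lang}] {text}"


cache = TranslationCache(fake_translate)
first = cache.get(10, 0, "fr", "Call me Ishmael.")
second = cache.get(10, 0, "fr", "Call me Ishmael.")  # served from the cache
print(first == second, len(calls))
```

Flipping back to a recently viewed page then costs a dictionary lookup rather than another pass through the model.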

The text selection layer has been adjusted to appear in red.
The translation layer occupies roughly the same height and width by copying the text selection layer’s properties.

Fine-tuning the visual presentation and the behind-the-scenes functionality of the plugin were the main challenges for this portion of the project. Early on, we identified that each text chunk had to be translated without depending on a previous chunk. Both asynchronous and synchronous behaviors are implemented within the code so that users do not have to wait longer than needed for paragraphs to complete their translations. The translation plugin uses event listeners within the BookReader to detect when a new text layer is rendered, which then triggers a translation call for the text content of the upcoming page.

Styling the translated text layer also proved to be difficult. Although it is possible to reuse the style properties on the existing (and invisible) selection text layer, additional adjustments were needed to ensure that the visible translation text would not overlap or go beyond the bounds of the original paragraph. In the early phases of the translation plugin development, there were many instances of text chunks exceeding the boundaries set for the translation layer, which resulted in scrollbars appearing within paragraph elements or in translated text not aligning properly with the text on the page.

A screen capture from an earlier version of the translate plugin. A scrollbar can be seen in the 2nd paragraph element of the left page.

Another styling issue that caught us off guard was a pre-existing bug that was only visible in the Chrome browser. Because Drini and I were both using Firefox as our default browser, we only learned during a demo that there was an element scaling issue that became immediately visible when the translation plugin was activated. [Pull Request #1421/commit]

ReadAloud

The next major piece of this project was to connect the translation plugin to the ReadAloud feature and allow users to hear the translated text read aloud. 

The normal flow of the TTS (Text-to-Speech) plugin calls a server-side API to retrieve chunks of text and bounding rectangles based on the page and paragraph index. However, since we have the translated text available within the BookReader locally, the extra network calls to the server were dropped in favor of feeding the translated text lines into the TTS engine directly. Tweaking the pre-existing functionality of the TTS plugin to interact with the content generated by the translate plugin required a substantial amount of investigation to figure out where the adjustments needed to be made for the translation plugin to gracefully take over. 

When the TTS plugin is activated, it checks whether or not the translation plugin is enabled within the BookReader. If the translation plugin is active, the TTS plugin retrieves the translated text on the page to use as its text input for the voice engine. 

To streamline the reading process, the voice is also adjusted automatically as soon as TTS begins. The plugin checks the source language in the work’s metadata and switches the TTS reader’s default voice to the target language that was set within the translate plugin. The voice menu is also re-rendered so that users can more easily switch between the source, target, or other languages for the TTS reader. [Pull Request #1430/commit]

ReadAloud Menu
The ReadAloud voices menu as it is seen without the translation plugin activated.
ReadAloud Menu With Translation
Voices are categorized by the source language, target language, and other languages detected on a user’s system.

Visual parity between the original TTS and translated TTS was also maintained by highlighting the entire translated paragraph. Since the network response containing the bounding rectangles for a text chunk was no longer available, I used a paragraph element’s offset properties to highlight the text being actively read by the TTS reader [Pull Request #1431/commit].

The BookReader highlights chunks of text that are actively being dictated by the ReadAloud plugin.
ReadAloud highlight with translation active
With a few tweaks, the ReadAloud highlighting feature can also be used to highlight the translated text being dictated by the voice over.

Although this stage of the project did not require as much new code, we encountered a relatively complex issue that caused the TTS reader to stop progressing when the translation plugin was activated in a part of a book that contained one or more blank pages. The translation-adjusted implementation of the TTS plugin would wait for a new page to be loaded and rendered within the browser, but remained stuck on a page due to a synchronization issue. After two weeks of extensive investigation and testing, we resolved the issue by using an existing method that returns all pages that have been loaded but not yet rendered in the DOM [Pull Request #1431/commit] and by consolidating the asynchronous translation calls with Promise.all().
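
The general pattern of waiting on a batch of asynchronous translations at once, including pages with nothing to translate, can be sketched as follows. This is an illustration only and does not reproduce the actual fix: it uses Python’s asyncio.gather as a rough analogue of JavaScript’s Promise.all(), and the function names and the uppercase "translation" are placeholders.

```python
import asyncio


async def translate_paragraph(text):
    """Placeholder for an asynchronous round trip to the translation worker."""
    await asyncio.sleep(0)
    return text.upper()


async def translate_page(paragraphs):
    # A blank page has no paragraphs; resolve immediately so callers never stall.
    if not paragraphs:
        return []
    return await asyncio.gather(*(translate_paragraph(p) for p in paragraphs))


async def translate_pages(pages):
    """Translate every page concurrently and wait until all have finished."""
    results = await asyncio.gather(*(translate_page(p) for p in pages.values()))
    return dict(zip(pages.keys(), results))


loaded_pages = {4: ["hello", "world"], 5: [], 6: ["again"]}  # page 5 is blank
print(asyncio.run(translate_pages(loaded_pages)))
```

Because the combined await only completes once every page has resolved, a reader built on top of it can move on from a blank page instead of waiting for text that will never arrive.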

Next Steps

This feature is currently scheduled for a round of internal testing before being released for full public use. While the majority of the goals were completed within the project timespan, many additional improvements and expansions are planned as the BookReader’s translate plugin matures. The next major steps involve expanding the number of available translation pairs by integrating the latest models from Mozilla’s Translation Model project, gathering and implementing feedback from internal testing, and continually improving the UI of the plugin. Unit tests and offline testing environments are also among the project’s future goals, to help improve the troubleshooting process for developers.

Conclusion

I would like to express my thanks once again to my GSoC mentor Drini for his guidance. The first few weeks of this project felt especially daunting, but the patience and advice that I was given throughout this program helped me realize that this big, intimidating project was easier to manage when broken into a number of small tasks and taken step by step. I am very glad that I had the chance to be challenged in new ways while being able to leverage my existing JavaScript skills.

I am extremely grateful that I was able to participate in the GSoC program and to help contribute to a high-impact feature for both the Internet Archive and Open Library. Though my time as a GSoC contributor has officially ended, I intend to continue my work as a contributor with the Open Library team to expand on the functionality of this feature and help increase the availability of published works to a wider global community.

The new Open Library Team Page

By Nick Norman, Elizabeth Mays, & Mek

More than just a ‘thank you’, Open Library’s new Team Page shines a spotlight, beyond staff, on the invaluable efforts of leads, fellows, and contributors – spanning engineering, design, librarianship, and communications – who make openlibrary.org possible.

The Open Library website is an open source effort, powered by an extensive network of volunteer contributors from across the globe. Some contributors swim by to nibble on a specific issue or check out our weekly community calls. Other contributors plant roots and collaborate with staff, as appointed Fellows, to make progress on involved projects that may entail weeks or months of thoughtful preparation. A select few contributors become intimately familiar with our systems, choose to mentor others in the community, and volunteer to manage and lead specific, discrete parts of the project, like our design system, our JavaScript practices, or internationalization.

In the past, the website had a stale list of contributors, and we didn’t have an established framework for spotlighting the generous humans behind Open Library and keeping this list up to date. With the skillful touch of fellows from our design team, Debbie San and Jaye Lasseigne, and mentorship from Scott Barnes on staff, we now have a beautiful, filterable, and maintainable way of showcasing the achievements of Open Library’s diverse community of contributors: https://openlibrary.org/about/team

We had an opportunity to interview Debbie San, who is responsible for the new Team Page design, to learn more about the design process for this project, and Jaye Lasseigne, who led the new page’s implementation.

An Interview with the Designer & Developer

Speaking with Debbie about the Team Page’s Design Process:

Q.) What led to the decision to create a new team page? 

A.) Debbie’s Insight: I have always believed that it is crucial to recognize individuals for their work. Open Library has many unique and talented individuals, volunteers and staff alike. Our team page is an opportunity to recognize them.

Q.) What was the inspiration behind the team page design?

A.) Debbie’s Response: There were many different websites used as inspirations. We looked at team pages from universities, smaller and bigger projects, and anything else that could help the vision of redesigning our team page.

Q.) How do you incorporate collective input and diverse perspectives into the design process?

A.) Debbie’s Advice: Design is a creative process, but it doesn’t mean it’s a solo process. I believe in the power of collective input and collaboration. Even when I wasn’t 100% sold on the feedback, I valued the diverse perspectives that shaped our collective vision. In the realm of design, embracing a variety of viewpoints is important when it comes to refining and enhancing the end result.

Q.) How do you approach the iterative process in design, particularly when creating different mock-ups?

A.) Debbie’s Thoughts: Even though the implementation may seem simple, challenges may appear, and it is up to everyone, designers and developers alike, to dialogue, to grow together and to find the best solutions. 

I am super thankful to have worked with Jaye and Scott here, and for how hard they worked to bring this design to life. Now we have a team page that celebrates all staff and contributors who empower Open Library.

Speaking with Jaye about the Team Page’s Technical Implementation:

Q.) Can you share some insights into how your team worked together to bring this page to life? 

A.) Jaye’s Thoughts: Debbie and I worked really well together! I got Debbie’s Figma designs and immediately started working to put them into code. I also received help from Scott Barnes and Mek (Program Lead) to hook up my CSS file, and Jim Champ showed me how to hook up a JavaScript file. I remember checking in with Debbie a few times to get feedback on how the design looked in the browser.

Q.) Can you elaborate on challenges you encountered and how you overcame them during the coding process?

A.) Jaye’s Response: Most of my personal challenges came from my limited knowledge of the codebase and where files were located. To help me understand the codebase, I watched some of the videos in the ‘Getting Started’ guide on the Open Library GitHub.

After that, I found I still had questions, so I reached out to Mek for help on the CSS. He was able to show me where the CSS files are located, and from there, I was able to figure out how to hook my CSS up to my HTML page. When I got to the JavaScript portion, I reached out to Jim Champ, who explained the flow of the JavaScript files and where everything needed to go for it to work.

Q.) What advice would you give to other organizations who are looking to create a team page?

A.) Jaye’s Advice: “Do lots of research on other team pages you may find online. Find examples you like – you don’t need to reinvent the wheel.”

Just in Time for Growth

Debbie and Jaye’s hard work comes at an important time, given the recent growth of Open Library’s community of contributors.

A Snapshot of Open Library contributors representing 15+ Nations

In 2023, the Open Library project registered interest from 443 volunteer applicants, while cultivating a community of over 1,000 members on Slack. The project also benefited from 2,500 survey respondents, 20+ active developers, and 5 fellows across our 4 programs: Design, Communications, Engineering, and Librarianship. We celebrated the achievements of our community members during our 2023 Open Library Community Celebration.

Join In or Follow Along

Whether you’re a patron, a community contributor, or someone discovering Open Library for the first time, we invite you to explore our new Team Page to meet some of the people who power Open Library.  Or, you can follow us on Twitter for our latest updates. If you’re inspired by our mission and want to contribute, let us know at openlibrary.org/volunteer.

It takes a Classroom to build an Open Library

On most days, the Open Library is hard at work improving the experience it offers to students and teachers in classrooms. But for the past few months, Open Library has had the privilege of enjoying contributions from 7 students around the globe who had been assigned by their universities to participate in open source software projects.

First and foremost, the entire Open Library community extends our deep gratitude to AUEB / Athens University of Economics and Business’s Dr. Diomidis Spinellis (professor of Software Engineering, who taught the course Software Engineering in Practice) and NYU’s Dr. Joanna Klukowska (Clinical Associate Professor of Computer Science, who taught the course CSCI-UA.0480-061, Open Source Software Development) for incorporating open source contributions into the curriculum of their classrooms. As we hope you’ll see, the decision to promote hands-on development has an outsized impact on supporting open source projects like ours.

In the spring semester of 2022, four students from Greece’s AUEB (Constantina Z., Vassilis B., Dimitris B., and Philippos P. / Φιλιππος Π.) and three students from NYU (Michelle T., Crystal C., Chloe Q.) spent time participating in community calls, problem solving, and improving the Open Library service for the public. In return they received mentorship and first-hand experience learning how to contribute to a platform trusted by millions of international readers.

This year, the focus of Open Library’s roadmap is improving core experiences for patrons. Toward this goal, each of these students exceeded our expectations by contributing meaningful improvements like: Chinese internationalization of the website, Google Analytics to help inform us on meaningful ways to improve the organization of the website, fixes for broken mobile navigation on our Books Page, UI improvements for sharing books on social media, APIs for Trending Books, and much more. We’re extremely proud of and grateful for the work these students were able to contribute.

In the past, Open Library has reserved a special honorary title of “Open Library Fellow” for exemplary contributors who have demonstrated exceptional commitment, leadership, and impact with the Open Library project. Our list of previous Fellows includes Sabreen Parveen (who designed our onboarding experience), Yash Saravgi (who developed our mobile Progressive Web App), and Bharat Kalluri (who helped standardize our import pipelines). Each dedicated several months to implementing features which redefined core behaviors and experiences of the Open Library.

This year, we believe one student in particular, Constantina Zouni, stands out as being especially deserving of this special Fellowship distinction, for her initiative, participation in engineering and design processes with stakeholders, and outstanding work ethic.

Please join us in celebrating the work of this 2022 international student cohort, sharing our gratitude, and congratulating Constantina on her inspiring example.

Improving Experiences for Open Library Patrons

By Constantina Zouni

As this semester of my studies comes to an end, I want to reflect on my experience with the Open Library project.

My Journey with Open Library

At the beginning of the semester, my professor Dr. Diomidis Spinellis announced that, as an assignment for the course “Software Engineering in Practice”, we had to choose an open source project and contribute to it throughout the semester. I started searching for a project and was lucky to quickly find Open Library’s repository. Some of the main reasons I chose the project were that the community was very friendly and really open to contributors. The documentation was detailed, and there were videos that helped me understand how the project works. The issues were also well organized with labels, and their descriptions were clear enough to work from. Moreover, the project seemed very active, with quick responses in the comments and pull requests merged almost every day.

After my first communication with the team, everything went very smoothly. I was welcomed in the Slack channel and invited to participate in the weekly meetings. Mek quickly stepped in and helped me get started. Because that period was busy and contributors from other universities had also chosen to contribute to Open Library, the project’s team made an effort to create a GitHub project and assign issues to everyone. I started by solving minor issues: removing text that appeared when it was not needed, adding the subtitle to the search results, and making some UI improvements. Ultimately, in collaboration with Vassilis Bubis, another student from my university, we created the Twitter social card that enables users to share their book lists. Throughout the whole period that I contributed to the project, I was impressed that Mek and the other members of the Open Library team dedicated time to answering our messages and even jumping on short Zoom meetings.

Book page header in mobile

One of the issues that I think had a big impact on Open Library users is the improvement of the book page header on mobile. When users visited a book page from a mobile device, the experience wasn’t that pleasant. The book title and other important information, such as the author and the subtitle, didn’t fit on the phone screen, and the user had to scroll down to see them. The issue was more significant when the book cover was ambiguous, which made it difficult for users to tell whether they were on the correct page. Jim Champ recommended a specific layout for the book page on mobile so that all the important information would fit on the screen. The challenge was that the layout had to differ depending on the user’s device. My first implementation involved some JavaScript code that changed the order of the elements and an event listener that was activated when the screen had a specific size. The Open Library team quickly informed me that this implementation was delaying the loading of the page, and they recommended that I use HTML and CSS instead. With a new implementation and the help of Jim Champ, who was reviewing my pull request, I managed to solve the issue using an HTML file that included only the title summary and some CSS rules.

Book header in mobile before and after

Dynamic book list preview for sharing

This new feature was a little more challenging than the previous one. This time I collaborated with Vassilis Bubis to create a dynamic preview of a user’s book list that displays the first 5 books of the list. This preview is then passed to the Twitter social card, so that every time a user shares a list by its URL, the preview image appears. This is a more interactive way for users to show their book lists to others, and it makes Open Library more recognizable among Twitter users. The first challenge was to create a mock-up of the preview. To achieve that, I used a design tool called Figma to create prototypes with different colour combinations and let the Open Library team decide which one they liked more. For the design I used colours from Open Library’s webpage, and I added a twist to the preview: a shelf on which the books are placed. Alongside the mock-ups, Vassilis worked on retrieving the book covers we needed and placing them over the background with the help of a Python library called Pillow. Then I stepped in and made sure that every book cover was resized in proportion to its original dimensions. We noticed that some covers were being stretched, so it was important that every time we changed the width of a cover, the height was adjusted accordingly. Another challenge was the text that we wanted to add to the preview. The text had to change dynamically, and we had to break the line every time the characters exceeded a specific number to achieve an aesthetically pleasing result. One issue we faced was that the cover coordinates we had worked out in Figma had to change, because in Python coordinates are measured from the upper left corner of an image, whereas in Figma they referred to its center. After solving that, Vassilis and I proceeded to store the image in an in-memory binary array for better performance and finally created the API for the list page.
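
The three layout calculations mentioned above (proportional resizing, converting center-based coordinates to upper-left ones, and breaking text after a character limit) can be sketched with the Python standard library alone. This is an illustration rather than the code we merged: the function names and sample numbers are made up, and in a real Pillow pipeline the results would typically be handed to calls such as Image.resize and Image.paste, with the finished image saved into an io.BytesIO buffer.

```python
import textwrap


def scale_to_width(original_size, target_width):
    """Return (width, height) scaled to target_width, keeping the aspect ratio."""
    width, height = original_size
    scale = target_width / width
    return target_width, round(height * scale)


def center_to_top_left(center, size):
    """Convert a center-based position into an upper-left position."""
    center_x, center_y = center
    width, height = size
    return round(center_x - width / 2), round(center_y - height / 2)


def wrap_title(title, max_chars):
    """Split a title into lines of at most max_chars characters, on word breaks."""
    return textwrap.wrap(title, width=max_chars)


cover_size = scale_to_width((300, 450), target_width=100)
print(cover_size)                                  # (100, 150)
print(center_to_top_left((200, 150), cover_size))  # (150, 75)
print(wrap_title("The Hitchhiker's Guide to the Galaxy", 20))
```

Scaling the height by the same factor as the width is what keeps a cover from looking stretched.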

Twitter social card for book lists sharing

Book page editing improvement

While working on some issues on the book page, I realised that, unlike other library websites, Open Library gives users the ability to edit the details and information of a book. That feature is very valuable because users can add important details that were missing when the book was added, update that information, and add descriptions and subjects that might be useful for other users. Although this feature is really important, the editing user interface is not that pleasant. When users click on the edit button, they are directed to another page. My recommendation is to use a modal that pops up when the button is clicked. That way users will feel they have more control, because they won’t be directed to another page and can still see the book page behind the modal. Another issue with the existing editing form is that users can discard their changes with the cancel button, but they can’t undo a single change without deleting all of them. In the mock-up that I created, I added an arrow in the upper right corner that symbolizes the undo action. I noticed that the examples for every field were placed next to the field title, and I opted to move them inside the text box for a cleaner look. Finally, I added info symbols beside every field that provide details on how to fill out that specific field. Overall, the purpose of these recommendations is to make book editing simpler, more compact, and more user friendly.

Book editing page now
Edit book page with modal created with Figma

Introducing Trusted Book Providers

Building the Internet’s library is no easy task, and it can’t be done alone. Thankfully, we’re not alone in wanting to provide access to knowledge, books, and reading — which is why we’re excited to introduce Trusted Book Providers into Open Library. This feature allows us to provide direct “Read” links to a number of carefully selected, reputable sources of books online. Integrations with Project Gutenberg and LibriVox are up and running, and integrations with Standard Ebooks, OpenStax, and Wikisource are in progress. By linking to these outstanding organizations, we’re excited to help promote their wonderful work as well as give Open Library patrons easy access to more trusted sources for digital books. We see this as a step in helping the world of open access books flourish.

Viewing LibriVox and Gutenberg works in Open Library

For more than ten years, Open Library has allowed patrons from across the globe to read, borrow, and listen to digital books from the Internet Archive’s prodigious lending library and public domain collection. Since then, the Internet Archive has partnered closely with more than 1,000 US libraries to accession books, ensure their digital preservation, and make them useful to select audiences, such as those with print disabilities, through controlled library practices.

Open Library is now excited to expand its “Read” buttons to include not only the millions of books made available by the Internet Archive, but also works from other trusted digital collections. What does this mean for patrons? It means more books and more reading options — such as LibriVox’s human-read public domain audiobooks, Standard Ebooks’ lovingly formatted modern epubs, or Project Gutenberg’s reflowable-text books. We hope this will result in a more inclusive ecosystem and shine more light on the amazing work done by these other mission-aligned non-profit organizations.

Choosing the First Trusted Book Providers

We selected the first group of Trusted Book Providers based on several factors. First, we prioritized non-profit organizations that are reputable, well-established, and have a similar focus on serving public good. Second, we looked for providers whose holdings increased the diversity of book formats Open Library may link to. Third, we looked for providers who focus on open & permissive licensing, or public domain material.

Project Gutenberg

Project Gutenberg is the oldest digital library online. Founded in 1971 (was the internet even around then?), the volunteer-driven organization is dedicated to creating free, open, long-lasting eBooks that are easily accessible from many devices. The Internet Archive already proudly preserves most of Project Gutenberg’s over 60,000 titles, and Open Library is excited to be able to have users read from Project Gutenberg directly. For patrons, the human-curated, reflowable-text formats made available by Project Gutenberg are ideal for reading on small screens, e-readers, and also for powerful accessibility customization, like dyslexic fonts and screen readers.

Browse on Open Library

LibriVox

Founded in 2005, LibriVox’s stated mission is “to make all books in the public domain available, narrated by real people and distributed for free, in audio format on the internet.” And with over 15,000 editions in over 80 languages, they’re making great headway! The Internet Archive also works with LibriVox, and provides storage for their mass of audio files. For patrons, LibriVox integration means they will now have access to human-spoken audiobooks for many public domain works.

Browse on Open Library

Standard Ebooks

Standard Ebooks is a volunteer-driven project dedicated to producing new editions of public domain ebooks that are lovingly formatted, open source, free of copyright restrictions, and free of cost. Founded in 2015, the project carefully standardizes and normalizes its books to work great as reflowable-text HTML, as well as modern epubs with all the trimmings: table of contents, typographical attention to detail, beautiful public domain cover art, and more. For patrons, Standard Ebooks’ over 500 titles are perfect for reading on web browsers, phones, or e-readers due to their reflowable text and modern epub features specifically optimized for every e-reader platform.

In Progress… | Browse at Standard Ebooks

OpenStax

OpenStax is a non-profit dedicated to creating original, free, open-access high school and college textbooks. Part of Rice University, a non-profit corporation, OpenStax has created over 60 high-quality, peer-reviewed textbooks since its launch in 2012, with some titles available in English, Spanish, and Polish. Open Library will include OpenStax read links so our patrons can find and access these digital-only materials online or as PDF or ePub downloads.

In Progress… | Browse at OpenStax

Wikisource

Launched in 2003, Wikisource is an online digital library of free-content textual sources on a wiki, operated by the Wikimedia Foundation (the folks who run Wikipedia). Wikisource has a huge community of editors dedicated to converting scans of classic books to error-free, proofread digital books. And improving their records is as easy as editing a Wikipedia page! Wikisource offers reading options online or offline (PDF, ePub, mobi, etc.) for millions of records, and its catalog, spanning over 30 languages, is unparalleled. And soon, you’ll be able to find these works right in Open Library!

In Progress… | Browse at Wikisource

How Trusted Book Providers Work

As a patron, you shouldn’t have to do anything special to access titles from our Trusted Book Providers.

When designing support for Trusted Providers, we wanted to find the right balance between convenience and trust. We didn’t want patrons to get confused by a button taking them to a new website without warning. But we also didn’t want to introduce unnecessary friction and multiple clicks that would prevent patrons from easily accessing books. As a result, our team converged on two strategies:

  1. When a Read button is for a Trusted Provider, the button will display an external link icon.
  2. When you click a Trusted Provider button, a message will appear on Open Library providing context about the Trusted Provider. The Trusted Provider link will open in a new browser tab.
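
The two strategies above can be illustrated with a minimal sketch. This is not Open Library's actual implementation: the `ReadLink` class, the `render_read_button` function, the CSS class names, and the `data-notice` attribute are all hypothetical names invented for this example. It only shows the idea that links leaving Open Library get an icon, a context message, and a new tab, while in-house links stay unchanged.

```python
from dataclasses import dataclass
from html import escape


@dataclass(frozen=True)
class ReadLink:
    """Hypothetical read option for a book (not Open Library's real data model)."""
    provider: str           # e.g. "Internet Archive" or "Project Gutenberg"
    url: str
    trusted_external: bool  # True when the link leaves Open Library


def render_read_button(link: ReadLink) -> str:
    """Return illustrative HTML for a Read button."""
    href = escape(link.url, quote=True)
    if not link.trusted_external:
        # In-house books: a plain button, no icon, no notice, same tab.
        return f'<a class="read-btn" href="{href}">Read</a>'

    # Strategy 2 (context message): text a page script could show on click.
    notice = escape(
        f"You are leaving Open Library to read this book at {link.provider}.",
        quote=True,
    )
    # Strategy 1 (external link icon) plus opening in a new browser tab.
    return (
        f'<a class="read-btn read-btn--external" href="{href}" '
        f'target="_blank" rel="noopener noreferrer" data-notice="{notice}">'
        f'Read <span class="external-icon" aria-hidden="true">↗</span></a>'
    )


if __name__ == "__main__":
    # The URL below is a placeholder used only for illustration.
    print(render_read_button(
        ReadLink("Project Gutenberg", "https://www.gutenberg.org/ebooks/1", True)
    ))
```

Keeping the decision in a single `trusted_external` flag means the convenience-versus-trust trade-off lives in one place: no extra clicks are added for any button, and only buttons that lead off-site carry the warning.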

Recommend a Trusted Book Provider

Are you a book service, library, or publisher which would like to integrate with the Open Library’s catalog? Or is there a service you’d like to recommend?

Please recommend or apply to become a Trusted Book Provider using this form.

The Open Book Genome Project

We’ve all heard the advice, don’t judge a book by its cover. But then how should we go about identifying books which are good for us? The secret depends on understanding two things:

  1. What is a book?
  2. What are our preferences?

We can’t easily answer the second question without understanding the first one. But we can help by being good library listeners and trying to provide tools, such as the Reading Log and Lists, to help patrons record and discover books they like. Since everyone is different, the second question is key to understanding why patrons like these books and making Open Library as useful as possible to patrons.

What is a book?

As we’ve explored before, determining whether something is a book is a deceptively difficult task, even for librarians. It’s a bound thing made of paper, right? But what about audiobooks and ebooks? Ok, books have ISBNs right? But many formats can have ISBNs and books published before 1967 won’t have one. And what about yearbooks? Is a yearbook a book? Is a dictionary a book? What about a phonebook? A price guide? An atlas? There are entire organizations, like the San Francisco Center for the Book, dedicated to exploring and pushing the limits of the book format.

In some ways, it’s easier to answer this question about humans than books because every human is built according to a specific genetic blueprint called DNA. We all have DNA; what makes us unique are the variations in the more than 20,000 genes that make up our DNA, which help encode characteristics like hair and eye color. In 1990, an international research group called the Human Genome Project (HGP) began sequencing the human genome to definitively uncover “nature’s complete genetic blueprint for building a human being”. The result, completed in 2003, was a compelling answer to the question, “what is a human?”.

Nine years later, Will Glaser & Tim Westergren drew inspiration from HGP and launched a similar effort called the Music Genome Project, using trained experts to classify and label music according to a taxonomy of characteristics, like genre and tempo. This system became the engine which powers song recommendations for Pandora Radio.

Circa 2003, Aaron Stanton, Matt Monroe, Sidian Jones, and Dan Bowen adapted the idea of Pandora to books, creating a book recommendation service called BookLamp. Under the hood, they devised a Book Genome Project which combined computers and crowds to “identify, track, measure, and study the multitude of features that make up a book”.

Their system analyzed books and surfaced insights about their structure, themes, age-appropriateness, and even pace, bringing us within grasping distance of the answer to our question: What is a book?

[Image: BookLamp’s theme currents for Carrie]

Sadly, the project did not release its data, was acquired by Apple in 2014, and was subsequently discontinued. But it left an exciting treasure map for others to follow.

And follow, others did. In 2006, a project called the Open Music Genome Project attempted to create a public, open, community alternative to Pandora’s Music Genome Project. We thought this was a beautiful gesture and a great opportunity for Open Library; perhaps we could facilitate public book insights which any project in the ecosystem could use to create their own answer to “what is a book?”. We also found inspiration in complementary projects like StoryGraph, which elegantly crowdsources book tags from patrons to help you “choose your next book based on your mood and your favorite topics and themes”; HathiTrust Research Center (HTRC), which has led the way in making book data available to researchers; and the Open Syllabus Project, which is surfacing useful academic books based on their usage across college curricula.

Introducing the Open Book Genome Project

Over the last several months, we’ve been talking to communities, conducting research, speaking with some of the teams behind these innovative projects, and building experiments to shape a non-profit adaptation of these approaches called the Open Book Genome Project (OBGP).

Our hope is that this Open Book Genome Project will help responsibly make book data more useful and accessible to the public: to power book recommendations, to compare books based on their similarities and differences, to produce more accurate summaries, to calculate reading levels to match audiences to books, to surface citations and urls mentioned within books, and more.

OBGP hopes to achieve these things by employing a two-pronged approach, which readers may continue learning about in the following two blog posts:

  1. The Sequencer – a community-engineered bot which reads millions of Internet Archive books and extracts key insights for public consumption.
  2. Community Reviews – a new crowd-sourced book tagging system which empowers readers to collaboratively classify & share structured reviews of books.

Or hear an overview of the OBGP in this half-hour tech talk: