Category Archives: Uncategorized

Improvements to the Main Navigation

Dana, UX/Design Fellow on Open Library

Forward by Mek

Few aspects of a website have greater impact and receive less recognition than good navigation. If done well, a site’s main navigation is almost invisible: it’s there when patrons need it, out of the way when they don’t, and it zips patrons to the right place without making them overthink. Over the years, we’ve attempted improvements to our navigation but we haven’t had the design bandwidth to conduct user research and verify that our changes solved our patrons problem. It turns out, we still had several opportunities for improvement. That’s why this year we were incredible lucky to have collaborated with Open Library UX Design Fellow Dana Fein-Schaffer, who recently transitioned into design from a previous role as a neuropsychological researcher. Dana’s formal education and experience links a trifecta of complimentary fields — computer science, psychology, and design — and has resulted in unique perspective as we’ve endeavored to redesign several of Open Library’s core experiences: the website’s main navigation and the desktop version of our My Books page.

As an Open Library UX Design Fellow, Dana has been in charge of design direction, conducting user interviews, figma mockups, feedback sessions, and communicating decisions to stakeholders. In addition to her skill prototyping and notable problem solving capabilities, Dana’s warmth with the community, affinity for collaboration, and enthusiasm for the project has made teaming up with her a gift. While we’d rather the fellowship not end, we endorse her work with great enthusiasm and highly recommend organizations which share our values to view Dana’s portfolio and engage her for future design opportunities.

The Design Process

IMG_7214.JPG

Hello, I’m Dana Fein-Schaffer, and I’ve been working as a UX Design Fellow with Open Library over the past several months. I’m currently transitioning into UX Design because after gaining professional experience in both psychology research and software engineering, I’ve realized that UX is the perfect blend of my skills and interests. I’ve been enjoying growing my UX skillset, and working with Open Library has been a perfect opportunity for me to gain some formal experience because I love reading and was hoping to work on a book-related project. Moving forward, I’m particularly excited about working as a UX designer for an organization that focuses on social good, especially in the literary, education, or healthcare spaces! If you have a role that may be a good fit, or if you work in one of those industries and are interested in connecting, please feel free to reach out. You can also learn about me from my portfolio.

Problem

At the beginning of my project, the current navigation bar and hamburger menu looked like this: 

I met with members of the Open Library team to identify three key areas of concern: 

  1. The label “more” was not descriptive, and it was unclear to patrons what this meant
  2. The hamburger menu was not consistent with the navigation bar items
  3. The hamburger menu was confusing for patrons, especially new patrons, to navigate

Approach

Before I began my project, the Open Library team decided to implement an interim solution to the first point above. To address the concern with the “more” label, the navigation menu was changed to instead include “My Books” and “Browse.” The website analytics showed that patrons frequent their Loans page most, so for now, the “My Books” page brings patrons to the Loans page. 

To begin redesigning the main site navigation, I first created a prototype in Figma with some potential solutions that built off of this updated navigation menu:

  1. I created a dropdown menu for “My Books” that would allow patrons to select the specific page they would like to go to, rather than automatically going to the Loans page
  1. I reorganized the hamburger menu to be consistent with the navigation menu and to use the subheadings of “Contribute” and “Resources” instead of “More.” I felt that these changes would make the hamburger menu easier to navigate for both new and long-time patrons. 

After creating the prototype, my next goal was to get feedback from patrons, so I scheduled user interviews with volunteers.

User Interviews

I conducted user interviews via Zoom with four patrons to answer the following questions: 

  1. How do users feel about the My Books dropdown? 
  2. Are users using My Books and Browse from the navigation menu or hamburger menu?
  3. Are users able to effectively use the hamburger? Do they find it easier or harder to find what they’re looking for using the reorganized hamburger?

Results & Findings

  1. Three out of the four patrons preferred the dropdown, and the fourth user didn’t have a preference between the versions. The patrons enjoyed having the control to navigate to a specific section of My Books. 
  2. All users used the navigation menu at the top of the page to navigate, rather than the hamburger menu, which supported the switch to “My Books” instead of “More.” This finding also highlighted the need to make sure patrons could access the pages they wanted to access from the top navigation menu. 
  3. Finally, all four patrons found the existing hamburger menu confusing and preferred the reorganized hamburger. Some patrons specifically mentioned that the reorganized hamburger was more compact and that they felt that the headings “contribute” and “resources” were more clear than “more.”

Synthesis and a New Direction

After learning from the user interviews that patrons preferred increased granularity for accessing My Books and a more concise hamburger menu, the Open Library team began discussing the exact implementation. We wanted to keep the navigation and hamburger menus consistent; however, we also wanted to provide many options in the My Books dropdown, which made the hamburger menu less concise. 

At the same time that we were debating the pros and cons of various solutions, another Open Library UX Design fellow, Samuel G, was working on designing a My Books landing page for the mobile site. Inspired by his designs, I created mockups for a desktop version of the My Books page. Having a My Books page that summarizes patrons’ loans, holds, and reading log at a glance allows patrons to still have increased control over My Books navigation while keeping the hamburger menu concise, since all of the individual My Books items can now be condensed into one link. 

Furthermore, having a My Books landing page opens the door for more ways for patrons to interact with their reading through Open Library. For instance, I’ve created a mockup that includes a summary of a patron’s reading stats and yearly reading goal at the top of the page. 

As we work towards implementing this design, I’m looking forward to getting feedback from patrons and brainstorming even more ways to maximize the use of this page. 

Reflections

Working with the Open Library team has been an amazing experience. I’m so grateful that I got to lead a UX project from start to end, beginning with user research and ending with final designs that are ready to be implemented. Working with such a supportive team has allowed me to learn more about the iterative design process, get comfortable with sharing and critiquing my designs, and gain more experience with design tools, such as Figma. It was also a great learning experience that sometimes your projects will take an unexpected turn, but those turns help you eventually come to the best possible design solution. Thank you to everyone who provided feedback and helped me along the way, especially Mek, who was a wonderful mentor, and Sam, who was a great collaborator on our My Books mobile and desktop project!

About the Open Library Fellowship Program

The Internet Archive’s Open Library Fellowship is a flexible, self-designed independent study which pairs volunteers with mentors to lead development of a high impact feature for OpenLibrary.org. Most fellowship programs last one to two months and are flexible, according to the preferences of contributors and availability of mentors. We typically choose fellows based on their exemplary and active participation, conduct, and performance within the Open Library community. The Open Library staff typically only accepts 1 or 2 fellows at a time to ensure participants receive plenty of support and mentor time. Occasionally, funding for fellowships is made possible through Google Summer of Code or Internet Archive Summer of Code & Design. If you’re interested in contributing as an Open Library Fellow and receiving mentorship, you can apply using this form or email openlibrary@archive.org for more information.

It takes a Classroom to build an Open Library

On most days, the Open Library is hard at work improving the experience it offers to students and teachers in classrooms. But for the past few months, Open Library has had the privilege of enjoying contributions from 7 students around the globe who had been assigned by their universities to participate on open source software projects.

First and foremost, the entire Open Library community extends our deep gratitude to AUEB / Athens University of Economics and Business‘s Dr. Diomidis Spinellis (professor of Software Engineering, who taught the course Software Engineering in Practice) and NYU‘s Dr. Joanna Klukowska (Clinical Associate Professor of Computer Science, who taught the course CSCI-UA.0480-061, Open Source Software Development) for incorporating open source contributions into the curriculum of their classrooms. As we hope you’ll see, the decision to promote hands-on development has an outsized impact on supporting open source projects like ours.

In the spring semester of 2022, four students from Greece’s AUEB (Constantina Z., Vassilis B., Dimitris B., and Philippos P. / Φιλιππος Π.) and three students from NYU (Michelle T., Crystal C., Chloe Q.) spent time participating in community calls, problem solving, and improving the Open Library service for the public. In return they received mentorship and first-hand experience learning how to contribute to a platform trusted by millions of international readers.

This year, the foci of Open Library’s roadmap is improving core experiences for patrons. Towards this goal, each of these students exceeded our expectations by contributing meaningful improvements like: Chinese internationalization of the website, google analytics to help inform us on meaningful ways to improve the organization of the website, fixing broken mobile navigation for our Books Page, UI improvements for sharing books on social media, adding APIs for Trending Books, and much more. We’re extremely proud of and grateful for the work these students were able to contribute.

In the past, Open Library has reserved a special honorary title of “Open Library Fellow” for exemplary contributors who have demonstrated exceptional commitment, leadership, and impact with the Open Library project. Our list of previous Fellows include Sabreen Parveen (who designed our onboarding experience), Yash Saravgi (who developed our mobile Progressive Web App), and Bharat Kalluri (who helped standardize our import pipelines). Each dedicated several months implementing features which redefined core behaviors and experiences of the Open Library.

This year, we believe one student in particular, Constantina Zouni, stands out as being especially deserving of this special Fellowship distinction, for her initiative, participation in engineering and design process with stakeholders, and outstanding work ethic.

Please join us in celebrating the work of this 2022 international student cohort, sharing our gratitude, and congratulating Constantina on her inspiring example.

Improving Experiences for Open Library Patrons

By Constantina Zouni

As this semester of my studies is coming to an end, I want to do a retrospect about my experience with the open library project.

My Journey with Open Library

In the beginning of the semester my professor Dr. Diomidis Spinellis for the course “Software Engineering in Practice” announced that in the context of an assignment we had to choose an open source project to make contributions thought out the semester. As a result, I started searching for a project and I was lucky to quickly find open library’s repository. Some of the main reason that made me to choose that project is that the community was very friendly and really open to contributors. The documentation of the project was really detailed and there were videos that helped me understand how the project works. Also, another good thing was that the issues of the project were well organized with labels and the context was explanatory enough. Moreover, the project seemed to be very active with quick responses in the comments section and pull request merges almost every day. After the first communication with the team everything went very smoothly. I was welcomed in the slack channel, and I was invited to participate in the weekly meetings. Mek quickly stepped in and helped me to get started. Because that period was busy and contributors from other universities also chose to contribute to open library the project’s team made effort to create a GitHub project and assign issues to everyone. I started solving minor issues related with text appearing when not needed, adding the subtitle to the search results and some UI improvements. Ultimately, in collaboration with another student from my university Vassilis Bubis we created the twitter social card that enables users to share their book lists. Through out the whole period that I contributed to the project I was impressed that Mek and the other members of the open library team dedicated time answering our messages and even jumping on small zoom meetings.

Book page header in mobile

One of the issues that I think had a big impact in the open library users is the improvement of the book page header in the mobile environment. When users visited a book page from mobile the experience wasn’t that pleasant. The book title and other important information like the author, the subtitle etc didn’t fit in the phone screen and the user had to scroll down to see them. The issue was more significant in the cases where the book covers were ambiguous, and it made difficult for the user to understand if they were in the correct page. Jim Champ recommended to follow a specific layout for the book page in mobile in order to fit all the important information in the mobile page. The challenge was the layout had to be different depending on the device of the user. My first implementation involved some java script code that change the order of the elements and an event listener that was activated when the screen had a specific size. The open library team quickly informed me that this implementation was causing a delay in the loading of the page, and they recommended me to use HTML and CSS. This time with a new implementation and the help of Jim Champ who was reviewing my pull request I managed to solve the issue using an HTML file that included only the title summary and some CSS commands.

Book header in mobile before and after

Dynamic book list preview for sharing

This new feature was a little more challenging than the previous one. This time I collaborated with Vassilis Bubis in order to create a dynamic preview for the book lists of the users that displays the first 5 books of the list. Then this preview is passed to the twitter social card and every time a user wants to share a list with the URL the preview image appears. This is a more interactive way for users to show their book lists to others and makes open library more recognizable across twitter users. The first challenge was to create a mock-up of the preview. To achieve that I used a design tool called Figma to create prototypes with different colour combinations and I let the open library team to decide which on they like more. For the design I used colours from the open library’s webpage, and I added a twist in the preview that represents a self where the books are placed. Alongside with the mock-ups Vassilis worked on retrieving the book covers that we need and place them above the background with the help of a Python library called Pillow. Then I stepped in, and I made sure that every book cover was resized in a way proportional to the original dimensions that it had. We noticed that some covers were stretching so it was important that every time we changed the width of a cover the height was adjusted properly. Another challenge was the text that we wanted to add in the preview. The text had to change dynamically, and we had to change line every time the characters exceeded a specific number to achieve an aesthetically pleasant result. One issue that we faced was that the coordinates of the covers that we had figured out with Figma had to change because in python the coordinates are applied from upper left corner compared to Figma that apply to the center of an image. After solving that Vassilis and I proceeded on storing the image in an in-memory binary array for better performance and finally creating the API for the list page.

Twitter social card for book lists sharing

Book page editing improvement

While working on some issues in the book page I realised that compared to other library webpages open library gives users the ability to edit the details and the information of a book. That feature is very valuable because users can add important details for a book that were missing when it was added, they can update that information and they can add descriptions and subjects that might be useful for other users. Although this feature is really important the editing user interface is not that pleasant. When users click on the edit button, they are directed to another page. My recommendation regarding that is to use a modal that pops up when the button is clicked. In that way users will feel like they have more control because they won’t be directed to another page, and they can still see the book page behind the modal. Another issue with the existing editing form is that users can discard the changes with the cancel button, but they can’t undo a change without deleting all the changes. In the mock-up that I created I added an arrow in the right upper side that symbolizes the undo action. I noticed that the examples for every field were placed next to the field title, and I opted to move them inside the text box for a clearer look. Finally, I added the info symbols beside every field that provides details on how you should fill out that specific field. Overall, the purpose of those recommendations is to make the booking editing more simple, compact and user friendly.

Book editing page now
Edit book page with modal created with Figma

2021 End-of-Year Community Updates

Hi Open Library Community! This is going to be a less formal post detailing some of our recent community meetings and exciting Q3 (quarter 3) opportunities to learn, celebrate, and participate with the Open Library project.

Earlier this Month

Upcoming Events

  1. 📙 Library Leader’s Forum 2021-10-13 & 2021-10-20
  2. 🎉 Open Library Community Celebration (RSVP) 2021-10-26
  3. 📅 2022 Roadmap Community Planning (join) 2021-11-02 @ 10am PT

Open Library Community Celebration 2021

Last year we started the tradition of doing an Open Library Community Celebration to honor the contributions & impact of those in our community. On October 26, 2021 @ 10am Pacific we will be hosting our 2nd annual community celebration. We hope you can join us!

During this online event, you’ll hear from members of the community as we:

  • Announce our latest developments and their impacts
  • Raise awareness about opportunities to participate
  • Show a sneak-peek into our future: 2022

EDIT: The Community Celebration happened and you may watch it here!


5-Year Vision

End of September on 2021-09-28 @ 10am PT, the Open Library community came together to brainstorm Open Library’s possible long-term directions. Anyone in the community is welcome to comment and add their notes and thoughts:

https://docs.google.com/document/d/1q_jAcdEc705H3gsZv_Yt_08c8YFmefdSvMbiljc2O8g/edit#


2021 Year-End Review

On 2021-10-12 @ 10am PT the community met to review what we had accomplished (see review doc) on our 2021 roadmap.


2022 Community Planning

First week of November on 2021-11-02 @ 10am PT the community will meet to brainstorm goals for Open Library’s 2022 roadmap. This community planning call will be open to the public here.

EDIT: Community Planning happened and you can see the results or leave comments using the 2022 roadmap link.

EDIT: Our 2022 Priorities can now be seen here: https://docs.google.com/document/d/1edU3lCTHAjFr1mXUilh8l1_rNek33pRTzFHltBU–fM/edit#

Extending a Warm Welcome to New Patrons

By Sabreen Parveen with Ray Berger & Mek

A Forward from the Mentors

For book lovers who use openlibrary.org every day, it may be easy to forget what it felt like to visit the website for the first time. Some features which some were able to learn the hard way — through trial and error — may not be as easy or intuitive for others to understand. We feel like we’ve failed each time a patron leaves the library, frustrated, and before even having the chance to understand the value it may provide to them.

At Open Library, we strive to design a service which is accessible and easy for anyone to use and understand. We understand that everyone has different experiences and usability needs. Our mission is to make books as accessible and useful to the public as possible, and we’re unable to do this if patrons aren’t given the opportunity and resources to learn how our services work.

After polling dozens of patrons on video calls and through surveys, we started to get a good idea about which aspects of the website are most confusing to new patrons. The most common question was, “what is Open Library and what does it let you do?”. We tried to search for a clear explanation on our homepage, but there wasn’t one — just rows of books we assumed patrons would click on and somehow understand how it all worked. We also received useful questions concerning which books on Open Library are readable, borrowable, or what is meant when a book shows as unavailable or not in the library. We also received questions about how the Reading Log works. We decided to address some of these frequently asked questions at the earliest possible entry point: on our home page with a new Onboarding Carousel. Leading this project was 2021 Open Library Fellow, Sabreen, with the mentorship of Ray & Mek. We’re so excited and proud to showcase Sabreen’s hard work to you!

Designing a Simple-to-use Onboarding Experience

By Sabreen Parveen

This summer I got this amazing opportunity to work with the Internet Archive as an Open Library Fellow where I contributed to the Onboarding Project.

My Journey with Open Library

I decided to join the Open Library community in 2020 because I was interested in contributing to an open source project and improving my abilities as a programmer and designer. Several things about Open Library stuck out to me while I was browsing projects on github. Firstly, I had the knowledge of the languages and frameworks it used. Secondly the documentation was very clear and easy to understand. Thirdly, the issue tracker contained many exciting ways for me to help. Most importantly the project had an active community and hosted calls every week where I could work with others and ask questions. Once I had familiarized myself with the project, I joined Open Library’s public gitter chatroom and asked questions about getting started. Shortly after, I attended my first community call, received a Slack invite, and later that week submitted my first contribution! I have joined almost all the community calls since. Gradually I started solving more and more issues, many of them related to web accessibility and SEO. I also started creating graphics for Open Library’s “monthly reads” pages. The community must have been excited about my contributions, because this year I was invited to be a 2021 Open Library Fellow and to team up with a mentor to lead a flexible, high-impact project to completion.

Selecting a Project: Onboarding Flow

The project I chose for my 2021 Open Library Fellowship was to add a new user onboarding experience to Openlibrary.org homepage to help new patrons get an overview of the website and how to use its features.

The problem

First time visitors to OpenLibrary.org often report getting confused because they don’t know how to use the service. We had several indicators this was the case:

  • From my own experience, I had been confused when I first started using the website. I didn’t know what the “Want to read button” does? I came to know about the list feature while solving an issue.
  • Bounce Rate: Open Library has a fairly high bounce rate, which is a measure of percentage of people who visit a website and leave without continuing to the other pages. We wondered if this is because patrons were confused about how to use the website and so we wanted to test this.
  • Feedback: We received this feedback from patrons emailing us about their experience

So by adding onboarding flow many of the users will get an insight of what the website actually does.

Implementation

While designing user onboarding, we wanted to create a system that was interactive, contextual, and easy to use and understand. As a result, we decided to start by adding an onboarding carousel to the homepage, the most common place patrons would land on when visiting the website for the first time. We designed the carousel to feature five cards: Read Free Library Books Online, Keep Track of Your Favourite Books, Try the virtual library explorer, Be an Open Librarian and Feedback form to receive feedback from the visitors. 

We  decided on a carousel as the format because they’re

  • non-interruptive.
  • persistent, unlike other onboarding design patterns that only show up upon signup and are never seen again.
  • easy to explore.

When clicked, each card redirects patrons to a FAQs page. In an upcoming version, the “keep track of your favourite books” card will instead trigger an onboarding modal with a step-by-step tutorial containing several slides explaining how we can add a book to our reading log, create a new list and view your reading log. Each feature is explained using a GIF, which is short and descriptive. You can close the modal at any step and any time. The modal creation was a long process of discussions and feedback, but finally we came up with a simple and attractive modal.

During implementation we kept following things in our mind:

  • The icons for the home page cards. Their resemblance with the text.
  • Eye catchy and easy to understand captions
  • Links the card will redirect people to (currently FAQs page)
  • GIFs should be contextual.
  • Modal design should be such that the main focus should be on the GIF and not the modal itself. Also easy navigation between the slides was necessary.

Design Process

To make this project successful, we had weekly meetings and discussions in the community channel to get everyone’s opinion. Designs were mocked up using Figma. I also had the chance to present my ideas before the Internet Archive’s product team. We used feedback from these meetings to review our previous decisions, our progress, and inform next steps. 

Results

  • Alexa: The bounce rate is now reduced to 38.2%.
  • Google Analytics: More than 5000 engagements with these cards.
  • Infrastructure to continue building from which we can re-use in other situations. 

Next Steps

  • Doodles to bring more character to the homepage cards
  • Include pop-up tutorials for more of the cards (other than just Reading Log + Lists)
  • Ability to hide / show the carousel (for patrons who have already received the information) 

My experience

I had a pretty good time working with experienced mentors Mek and Raymond Berger. They were very supportive during the entire program. Sometimes we spent our meeting time finding solutions to some problems together. Additionally, I learned more about project management and clarifying a plan by breaking issues into manageable steps. I got to spend time learning about new industry tools like Figma, which we used for presenting designs and Google Analytics for tracking key metrics. I also gained a deeper understanding of user experience. I learned to design by thinking as a patron of Open Library, what would she or he want? Will it be useful or easy to understand? I appreciated the flexibility of the Open Library Fellowship program, there was no pressure on me so that I could focus on my studies also. We tried to have clear next steps and homeworks at the end of each of our calls. The calls helped clarify what we were hoping to accomplish and provided direction and feedback. Finally, having the community available for regular feedback was really useful for tuning our designs.

About the OpenLibrary Fellowship Program

The Internet Archive’s Open Library Fellowship is a flexible, self-designed independent study which pairs volunteers with mentors to lead develop of a high impact feature for OpenLibrary.org. Most fellowship programs last one to two months and are flexible, according to the availability of contributors. We typically choose fellows based on their exemplary and active participation, conduct, and performance working within the Open Library community. The Open Library staff typically only accepts 1 or 2 fellows at a time to ensure participants receive plenty of support and mentor time. If you’re interested in volunteering as an Open Library Fellow and receiving mentorship, you can apply using this form or email openlibrary@archive.org for more information.

GSoC 2021: Making Books Lendable with the Open Book Genome Project

By: Nolan Windham & Mek

I’m Nolan Windham, an incoming freshman at Claremont McKenna College. This summer I participated in my first Google Summer of Code with the Internet Archive. I’ll be sharing the achievements I’ve made with the Open Book Genome Project sequencer, an open source tool which extracts structured data from the contents of the Internet Archive’s massive digitized book collection.

The purpose of the Open Book Genome Project to create “A Literary Fingerprint for Every Book” using the Internet Archive’s 5 million book digital library. A book’s fingerprint currently consists of 1gram (single word) and 2gram (two word) term frequency, Flesch–Kincaid readability level, referenced URLs, and ISBNs found within the book.

Try it out!

Anyone can try running the OBGP Sequencer on an Internet Archive open access book using the new OBGP Sequencer™ Google Colab Notebook. This interactive notebook runs directly within the browser, no installation required. If you have any questions, please email us.

If you are interested in seeing the source code or contributing check out the GitHub. If this project sounds fascinating to you and you’d like to learn more or keep the project going, please talk to us!

How I got involved

I first found the Internet Archive in high school where I used the Wayback Machine for research and Open Library for borrowing books. As I found out more about the Archive’s services and history, I became more and more interested in its operation and its mission: to provide “Universal Access to All Knowledge”. Once I heard this mission, I was hooked and knew I wanted to help. During a school trip to San Francisco, I joined one the Archive’s Friday physical tours (which I highly recommend). The tour guide was impressed with the amount of information this high-schooler knew about the Archive’s operation and took me aside after the tour and showed me Book Reader’s read aloud feature and answered some questions about the book derive process. The tour guide then invited me to join the Open Library community chat where developers, librarians, and patrons discuss all things Open Library. This tour guide turns out to be Mek, my project mentor, Open Library Program Lead, and Citizen of The World.

I started attending the weekly Open Library community calls to learn more about how Open Library works, the issues the project faced, and how I could help. After months of showing up to calls, learning about open source, and developing my programming skills, Mek showed me an interesting prototype called the Open Book Genome Project.

Background

The Open Book Genome Project (OBGP) is a public good, community-run effort whose mission is to produce, “open standards, data, and services to enable deeper, faster and more holistic understanding of a book’s unique characteristics.” It was based on a previous effort led by a group in 2003 called the Book Genome Project, to “identify, track, measure, and study the multitude of features that make up a book.” Think of it as Pandora’s Music Genome Project but for books. Apple acquired and discontinued the Book Genome Project in 2014, leaving a gap in the book ecosystem which the Open Book Genome Project community now hopes to help fill for the public benefit.

The Open Book Genome Project is one of many efforts facilitated by members of the Internet Archive’s Open Library community. Their flagship service, OpenLibrary.org is a non-profit, open-source, public online library catalog founded by the late Aaron Swartz, which allows book lovers around the world to access millions of the Internet Archive’s digital books using Controlled Digital Lending (CDL). Open Library hopes the Open Book Genome Project may help patrons discover and learn more about books in some of the ways the Book Genome Project originally aimed to accomplish.

You can learn more about the history of the Open Book Genome Project in an upcoming blog post. You can also learn more about the other half of the Open Book Genome Project called Community Reviews in this blog post.

Here’s where we started

When I began working on the OBGP Sequencer, the general code structure and a few features were in place. The sequencer could extract a book’s N-gram term frequency and identify its copyright page number. There were many features in the product development pipeline, but no one dedicated to implement them. Over the past few months, I led development to add and improve the Sequencer’s functionality, created an automated pipeline to process books in volume, and deployed this pipeline to production on the Archive’s corpus of books.

One challenging part of the development process was getting ISBN extraction working accurately. The ISBN extractor works by first finding what it thinks is the book’s copyright page and then checking for a valid ISBN checksum in every number sequence. Although this approach works, there are often a lot of strange edge cases usually having to do with poor optical character recognition. To address this, I was  manually spot checking books for ISBN’s that were detected and missed, and investigating why to iteratively improve the extraction process. Here is a screenshot of my process.

Another challenge later on in the development process was getting books processed at scale. With a collection as large as the Archive’s, parallelization of processing is an essential component of scaling the sequencer up. I taught myself to use some of Python’s parallelization libraries and implemented them. Another challenge was getting parallelization working with the database. I addressed this by making the file system and directory layout database because modern file systems are built to work well with parallel I/O.

Here’s what we were able to accomplish with OBGP

  1. Make more books borrowable to patrons
  2. Add reading levels for thousands of books
  3. Identify & save urls found within books
  4. Produce a large public dataset of book insights

Making Books Lendable

Nearly 200,000 books digitized by the Internet Archive were missing key metadata like ISBN. The ISBN is used to look up all sorts of book information which is helpful for determining whether a book is eligible for the Internet Archive’s lending program. The absence of this key information was thus preventing tens of thousands of eligible books from.

As of writing, the Open Book Genome Project sequencer has extracted ISBN’s for 25,705 books that were previously unknown. 12,700 of those are newly lendable to patrons. Take a look at them here!

These books now have identifying information and are linked to Open Library Records. Open Library pages that  had no books available now have borrowable books. Here is a before and after screenshot.

Before

After

Adding reading levels

It’s often difficult to identify age-appropriate materials for students and children. By adding reading level information to Internet Archive’s book catalog, we’re able to make age-appropriate books more accessible.

The Sequencer now performs a Flesch–Kincaid readability test on each book on which it is run. This resulting Flesch–Kincaid grade level estimation allows students, parents, and teachers to filter their searches for books which include appropriate reading levels.

Preserving URLs

Open Library is aware of more than 1M books containing urls. These mentions by credible authors are like a vote of confidence of their relevance and usefulness. These websites are at risk of link rot and without preservation could be lost forever. But given the average webpage only lasts 100 days, it’s only a matter of time before millions of URLs found in millions of books will be preserved for future generations.

As of writing, URLs have been successfully extracted from more than 13,000 books, which will soon be preserved on the Wayback Machine. Many of the high quality references found in published books have not yet been preserved and now will be.

Producing public datasets

The original goal of OBGP was to produce an open, public data set of book insights capable of powering the open web. As of writing, the Open Book Genome Project sequencer has uploaded genomes for 180,642 books. For every book sequenced, a book genome is made publicly accessible that provides insights into the book without needing to borrow it. The goal of this is to increase the quantity and quality of publicly available descriptive information available for every book, so that readers and researchers can make better informed decisions and glean deeper insights about books. This supports readers, researchers, book sellers, libraries, and beyond.

Personal Development

I really enjoyed participating in GSoC with the Internet Archive because I was able to build programming foundations and gain industry experience that will prove invaluable in my future. I developed my project management skills, became more comfortable programming in Python and using new software libraries, and advanced my knowledge of dev-ops tools like Docker.

The future of the project

If you may be interested in contributing or learning more about the Open Book Genome Project sequencer, please send us an email.

Although we made a lot of progress with this projects development, there is still a lot more to be done. Here is a quick list of possible future features to get you excited about the possibilities of this project:

  • Make URL’s clickable
  • Identifying meaningful semantic elements in books, like Entities and Citations
  • increase the number of previewed pages & volume of previewable content.
  • Clickable Chapters in Table of Contents
  • Library of Congress Catalog Number extraction
  • Copyright information (Publisher, copyright date) extraction
  • Book and chapter Summarization and Topic Classification