Category Archives: Google Summer of Code (GSoC)

Google Summer of Code 2023: Supercharging Subject Pages

Hello, I am Jayden Teoh, a student from Singapore, and this year I participated as a 2023 Google Summer of Code contributor with the Internet Archive’s Open Library project to improve the site’s performance and supercharge subject pages.

If you are an Open Library patron, you have likely encountered times where certain pages seem to take and eternity to load. The Open Library team understands the importance of a smooth browsing experience and empathizes with how degraded site performance affects patrons. This is why we prioritized site performance as a key focus for our 2023 GSoC roadmap. As strongly as we felt about improving the core performance of the current website, we also wanted to push the boundaries of Open Library’s capabilities by releasing community-powered subject pages we hope will help patrons more easily showcase and discover books they’ll love. I’m excited to share more about what we accomplished and next steps in our plans.

Improving Site Performance

According to Browserstack,”40% of visitors will leave a website if it takes longer than three seconds to load”. But how do we measure which pages are slow or fast? How do we determine if a slow load time is an anomaly or a systemic pattern? Do we care about improving the average load time for a page or eliminating the most egregious case where pages load especially slowly?

Identifying site performance issues can be a challenging task. In order to effectively address this issue, Mek, a GSoC mentor for the project, suggested the use of performance tracking tools such as Sentry, as well as considering Google’s Core Web Vitals metrics, using Google’s PageSpeed Insights (PSI) reports and running Lighthouse audits.

Sentry, a visual dashboard often used for error monitoring, has a “Performance” mode we were able to use to identify and rank pages according to metrics called P50 and P95 — the upper bound number of seconds at which 50% (P50) and 5% (P95) of transaction took to complete. For example, a P95 score of 5 seconds tells us that 95% of such requests completed within 5 seconds (and perhaps 5% were slower). Once we ranked pages in consideration of these metric, it became clearer just how bad certain pages could be in worst case scenarios. We coupled this information with our own domain expertise about which pages are most important to the average patron’s experience and then embarked on a journey with the aspiration of reducing the average load time of key pages by at least half.

For each row in Sentry’s performance dashboard, one can “drill in” to the page to see stack tracebacks and detailed breakdowns about which functions were participating most to the slow response.

Our research revealed 2 opportunities:

  1. The “Search Inside” page was taking more than 11 seconds and an average of more than 2 seconds because the response was making redundant archive.org metadata request on each search result match on the page to determine each book’s availability, rather than computing the availability of all the books in a single request.
  2. Several of the slow pages had a common slow component — the LoanStatus borrow button — which we could speed up by caching and thus “feed two birds with one scone”.

By the end of the 12 weeks of this program, we manage to reduce the load times of several key pages significantly. One of my proudest achievements was the reduction of the ‘search/inside’ page by over 500%. This feature is important to patrons because it allows them to search for content within books, rather than just searching based on the author and title so I am glad we were able to make this feature faster and thus more accessible.

Editor’s note: We are still collecting metrics and plan to add before-and-after graphs of the search inside page speeds. Our changes to the borrow button are in the process of being staged and tested and we’re excited to update this blog post with metrics in the future. Hopefully you have noticed the improvements since it was launched a few weeks ago!

Unleashing the Power of Subject Tags

Empowering Librarians and Expanding Book Categorization at Open Library

For almost a decade, the Open Library has had basic subject pages that give readers a way to browse or search for books on a given topic, see books with similar subjects, and discover prolific authors of a genre. It may surprise you to learn that the whole page experience is generated based on the name of the subject. For instance, when one visits the “Magic” subject page, one may notice a carousel of books that is populated using a query based on its name: “subject:magic“. This approach gives us a simple formula for creating millions of subject pages on-the-fly, but it also has significant shortcomings.

Namely, subject pages are incapable of storing additional metadata about a given subject and the current subject pages is limited to showcasing a single carousel of books. If the subject is overly vague, like “textbooks“, the reader may often not be shown a useful set of books and there’s no affordance provided that helps the reader narrow their search further, e.g. to design textbooks. If we search for a subject called “design textbooks“, we are informed no matching subjects exist. However, if we do an intersecting search for books that are subject:textbooks AND subject:”industrial design”, there are a few interesting results! The problem is, there’s currently no mechanism which allows librarians to extend Open Library subjects and specify which book collections should show up.

My primary objective through GSoC was to give librarians the ability to enrich and edit any subject page on Open Library so each page may be as beautiful and thoughtfully curated as a library or bookstore showcase. Our solution was to give librarians the ability to create a new “Tag” document for any subject page and load it with custom logic to extend how that subject page should be rendered. Tags serve as a catalyst for librarians to provide more precise categorizations within broad subjects. By leveraging Subject tags, librarians can dive deeper into specific areas of interest, allowing readers to discover a rich array of sub-subjects. For instance, librarians might choose to add new rules into the Tag document for the Cooking subject featuring carousels for vegan and budget cooking, in order to make it more useful for readers. This granularity opens up a world of possibilities, enabling readers to explore their preferred niches and discover hidden gems within subjects they cherish. Just like how a physical library may rotate their bookshelves with new categories every month, Subject Tags grant librarians more freedom to curate interesting subject topics that may suit patrons, allowing for a more personal and humane touch to the book discovery process.

By now, I hope you are able to understand just how pertinent Subject Tags will be to our Open Library and why it is a privilege for me to be working on such an important feature. Although the idea is clear, the implementation certainly is not. Open Library’s database is built using our own niche and complex Wiki engine called Infogami. To create a new class of data, we would have to create a new Infogami type. Here’s the catch: there has not been a new Infogami type created in the last 13 years and there is no existing documentation for doing so. Navigating any new code architecture can be a tedious task for any programmer and now I had to miraculously work with an arcane technology that no one knows how to use? What could go wrong?

Thankfully, I had the support of a wonderful community and amazing mentors like Mek, Jim, and Drini. They provided me with a lot of guidance throughout the process of reverse engineering the creation of an Infogami type. And after months of work, I was able to successfully incorporate a new Subject Tags Infogami type into the Open Library architecture. Especially since Open Library is an open-source project, I decided to write a tutorial and document the unintuitive technical aspects of implementing a new Infogami type, as a gift to help future developers who may wish to extend the functionality of the platform in similar ways. The tutorial can be found here.

Now, let me show you the power of Subject Tags and how they can be used to enrich the Open Library’s Subject pages. Let’s use the ‘Magic’ subject page as an example. This is how it looks right now.

As you can see, currently the subject page is plain with no description about what the subject is about. That’s not very informative is it? Prior to Subject Tags, we are unable to store more information about subjects because they are just strings with no capabilities to store other metadata. However, now with the Subject Tags, we can do that! Let me show you how. First, let’s add a new Subject Tag into the Open Library for the ‘Magic’ subject.

The Subject Tag creation form allows us to store metadata about the ‘Magic’ subject, including its description. After we’ve created the Subject Tag, let’s head back to the ‘Magic’ subject page. Tada, we can now see the newly added description in the subject page.

You are probably still not convinced of the utility of Subject Tags. Let me give you a deeper glimpse into the realm of possibilities that Subject Tags offer. Currently on the Magic page, we are only able to display a carousel with books that have the subject ‘Magic’. What if we want to include a carousel displaying books about ‘Magic Tricks for Kids’? Well, with Subject Tags, now we can! As a librarian, we can edit the ‘Magic’ Subject Tag and use the experimental Plugin to define a new carousel. Right now, the interface is quite advanced, is still being prototyped, and is intended for expert librarians who know how to compose queries, but in the future we aim to make it easy for any librarian to extend the functionality of subject pages using Tags.

Plugins allow subject pages to load custom templates within our system and utilizes them to enrich the subject page. For example, in the Plugins field of the Subject Tag edit form above, we included a new QueryCarousel Plugin that allows the ‘Magic’ subject page to search for all books with the “magic tricks juvenile literature” subject and display them in a template carousel. Let’s take a look at the ‘Magic’ subject page again. 

Fascinating isn’t it? Subject Tags have enabled the enhancement of the previously one-dimensional subject pages. Through Subject Tags, librarians are now equipped to curate and display information that can enrich the book discovery experience of patrons.

What happens when librarians want to add a new carousel of books to a subject page but the books haven’t been labeled with subjects? When we developed the Tag feature, adding a subject to books had to be done one book at a time. To aid librarians in the process of curating books and subjects, I also implemented a Bulk Tagging tool that enables librarians to add subjects to multiple books simultaneously. 

Subject Tags are still in beta so we can time our time understanding the needs of our patrons and the librarians who will use these new tools. As a next step, we have decided to do research on where this feature can have the most impact and will focus our efforts on enriching a small handful of specific subject pages using Tags. One subject we’re excited to prototype with is ‘Cooking’. The team has been curating the best cooking-related information to showcase using Subject Tags and testing new features to launch alongside Subject Tags. Here is a mockup by Roya, a fellow in our design community, showing one possible UI we have in mind:

When Subject Tags are launched, we hope you can visit the ‘Cooking’ page and provide us with input on what we can improve on and what you would like to explore in a Subject page. With your feedback, librarians will have a better understanding on how to enhance your book exploration process with personally curated topics. Moving forward, we will utilize Subject Tags to enrich other subject pages on the Open Library and slowly phase out the mundane subject pages we have currently.

Ending note

Thank you to the incredible Open Library community for their unwavering support over these past months. A special shout-out goes to Mek, whose mentorship has been nothing short of exceptional. Not only has Mek dedicatedly guided me through the program, but also gone the extra mile to make sure I’ve had the most enriching learning journey. Lastly, my deepest thanks to the Internet Archive and Google Summer of Code for making it possible for me to be a part of this life-changing experience. This is an experience that I’ll never forget.

Internet Archive Summer of Design 2022

Forward by Mek

For several years and with much gratitude, the Internet Archive has participated in Google’s Summer of Code (GSoC). GSoC is a program run by Google that supports select open source projects, like the Open Library, by generously funding students to intern with them for a summer. Participation in GSoC is as selective of organizations as it is for students and so in years when GSoC is not available to the Internet Archive, we try to fund our own Internet Archive Summer of Code (IASoC) paid fellowship opportunity.

GSoC and IASoC are traditionally limited to software engineering candidates which has meant that engineering contributions on Open Library have often outpaced its design. This year, to help us take steps towards righting this balance, an exceedingly generous donor (who wishes to remain anonymous but who is no less greatly appreciated) funded our first ever Internet Archive Summer of Design fellowship which was awarded to Hayoon Choi, a senior design student at CMU. In this post, we’re so excited to introduce you to Hayoon, showcase the impact she’s made with the Open Library team through her design work this summer, and show how her contributions are helping lay the groundwork to enable future designers to make impact on the Open Library project!

Introducing Hayoon Choi

Profile photo for hayoonc

Hello, my name is Hayoon Choi and this summer I worked as a UX designer with Open Library as part of the Internet Archive Summer of Code & Design fellowship program. I am a senior attending Carnegie Mellon University, majoring in Communication Design and minoring in HCI. I’m interested in learning more about creative storytelling and finding ways to incorporate motion design and interactions into digital designs. 

Problem

When I first joined the Open Library team, the team was facing three design challenges:

  1. There was no precedent or environment for rapidly prototyping designs
  2. There wasn’t a living design system, just an outdated & static design pattern library
  3. The website didn’t display well on mobile devices, which represents and important contingency of patrons.

Approach

In order to solve these challenges, I was asked to lead two important tasks:

  1. Create a digital mockup of the existing book page (deskop and mobile) to enable rapid prototyping
  2. A propose a redesign of the book page optimized for mobile.

To achieve the first task, I studied the current design of the Open Library Book Page and prototyped the current layout for both mobile and desktop using Figma. In the process, I made sure every element of that Figma file is easily editable so that in the future, designers and developers can explore with the design without having to code.

For the second task, we first scoped our work by setting our focus to be the set of content which appears above the fold — that is, the content which first loads and appears within the limited viewport of a mobile device. We wanted to make sure that when the page initially loads, our patrons are satisfied with the experience they receive.

Even before conducting interviews with patrons, there were easily identifiable design issues with the current mobile presentation:

  • Information hierarchy: some texts were too big; certain information took up too much space; placement of the book information were hard to discover
  • Not mobile friendly: Some images were shown too small on images; it was hard to scroll through the related books; one feature included hovering, which is not available on mobile devices

To address these concerns, I worked with the Open Library community to receive feedback and designed dozens of iterations of the mobile book page using Figma. Based on feedback I learned about the most necessary information to be presented above-the-fold, I choose to experiment with 6 elements:

  1. The primary Call To Action (CTA) buttons: how do I make them more highlighted?
  2. The Navigation Bar: which placement and styling are most convenient and effective?
  3. The Editions Table: how might we make it easier for patrons to discover which other book editions and languages may be available?
  4. Ratings & reviews: how do I encourage users to rate more and help them understand the book effectively with the review system?
  5. Sharing: how do I make it easier for users to share the book?
  6. The Information Hierarchy: how can we reorder content to better meet the diverse needs of our audience?

From these questions and feedback from the Open Library team, I was able to settle on five designs which seemed like effective possibilities for showcasing differences in book cover size, sharing buttons, information display, and rating and reviewing system which we wanted to test:

User Interviews & Mazes

With these five designs selected, I planned on running multivariate user testings to get feedback from actual users and to understand how I can more effectively make improvements to the design. 

I believed that I would gather more participants if the user testing was done remotely since it would put less pressure on them. However, I wasn’t sure how I would do this until I discovered a tool called Maze.

Maze provides a way for patrons to interact with Figma mockups, complete certain tasks, answer questions, and leave feedback. While this is happening, Maze can record video sessions, keep track of where patrons are clicking, and provide valuable data about success rates on different tasks. I felt this service could be extremely useful and fitting for this project; therefore I went ahead and introduced Maze to the Open Library’s team. Thanks to a generous 3-month free partner coupon offered by Maze, I was able to create six Maze projects — one for each of our five new designs, as well as our current design as a control for our experiment. Each of these six links were connected to a banner that appeared across the Open Library website for a week. Each time the website was reloaded, the banner randomized the presented link so participants would be evenly distributed among the six Maze projects.

Although the Maze projects showed patrons different mobile screens, they enabled comparisons of functionality by asking patrons to answer the same pool of 7 questions and tasks:

  1. What was the color of the borrow button (after showing them the screen for five seconds)
  2. What key information is missing from this screen (while showing the above-the-fold screen)
  3. Share and rate this book
  4. Borrow the Spanish edition for this book
  5. Try to open a Spanish edition
  6. Review this book
  7. Try to open the preview of this book

In between these tasks, the participants were asked to rate how challenging these tasks were and to write their feelings or opinions.

In addition to Maze, which we hoped would help us scale our survey to reach a high volume of diverse participants, we also conducted two digital person-to-person user interviews over Zoom to get more in depth understanding about how patrons approach challenging tasks. Because Maze can only encode flows we program directly, these “in person” interviews gave us the ability to intervene and learn more when patrons became confused.

Results & Findings

After around a week of releasing the Maze links on the website, we were able to get a total of 760 participants providing feedback on our existing and proposed designs. Maze provided us with useful synthesis about how long it took participants to complete tasks and showed a heat map of where patrons were clicking (correctly or incorrectly) on their screens. These features were helpful when evaluating which designs would better serve our patrons. Here’s a list of findings I gathered from Maze:

The Sharing Feature:

Results suggest that the V1 design was most clear to patrons for the task of sharing the book. It was surprising to learn patrons, on average, spent the most time completing the task on this same design. Some patrons provided feedback which challenged our initial expectations about what they wanted to accomplish, reporting that they were opposed to sharing a book or that their preferred social network was not included in the list of options.

Giving a book a Star Rating:

One common reaction for all designs was that people expected that clicking on the book’s star ratings summary would take them to a screen where they could rate the book. It was surprising and revealing to learn that many patrons didn’t know how to rate books on our current book page design!

Leaving a Community Review

When participants were asked to leave a community review, some scrolled all the way down the screen instead of using the review navigation link which was placed above the fold. In design V4, using a Tag 🏷️ icon for a review button confused many people who didn’t understand the relationship between book reviews and topic tags. In addition, the designs which tested combining community review tags and star ratings under a single “review” button were not effective at supporting patrons in the tasks of rating or reviewing books.

Borrowing Other Editions

Many of our new designs featured a new Read button with a not-yet-implemented drop down button. While it was not our intention, we found many people clicked the unimplemented borrow drop down with the expectation that this would let them switch between other available book editions, such as those in different languages. This task also taught us that a book page navigation bar at the top of the design was most effective at supporting patrons through this task. However, after successfully clicking the correct navigation button, patrons had a difficult time using the provided experience to find an borrow a Spanish edition within the editions table. Some patrons expected more obvious visual cues or a filtering system to more easily distinguish between available editions in different languages.

Synthesis

By synthesizing feedback across internal stakeholders, user interviews, and results from our six mazes, we arrived at a design proposal which provides patrons with several advantages over today’s existing design:

  • First and foremost, redesigned navigation at the very top of the book page
  • A prominent title & author section which showcases the book’s star ratings and invites the patron to share the book.
  • A large, clear book cover to orient patrons.
  • An actionable section which features a primary call to action of “Borrow”, a “Preview” link, and a visually de-emphasized “Want to Read” button. Tertiary options are provided for reviewing the book and jotting notes.
  • Below the fold, proposals for a re-designed experience for leaving reviews and browsing other editions.
(Before)
(After)

Reflections

I had a great time working with Open Library and learning more about the UX field. I enjoyed the process of identifying problems, iterating, and familiarizing myself with new tools. Throughout my fellowship, I got great feedback and support from everyone from the team, especially my mentor Mek. He helped me plan an efficient schedule while creating a comfortable working environment. Overall, I truly enjoyed my working experience here and I hope my design works will get to help patrons in the future!

About the Open Library Fellowship Program

The Internet Archive’s Open Library Fellowship is a flexible, self-designed independent study which pairs volunteers with mentors to lead development of a high impact feature for OpenLibrary.org. Most fellowship programs last one to two months and are flexible, according to the preferences of contributors and availability of mentors. We typically choose fellows based on their exemplary and active participation, conduct, and performance within the Open Library community. The Open Library staff typically only accepts 1 or 2 fellows at a time to ensure participants receive plenty of support and mentor time. Occasionally, funding for fellowships is made possible through Google Summer of Code or Internet Archive Summer of Code & Design. If you’re interested in contributing as an Open Library Fellow and receiving mentorship, you can apply using this form or email openlibrary@archive.org for more information.

GSoC 2021: Making Books Lendable with the Open Book Genome Project

By: Nolan Windham & Mek

I’m Nolan Windham, an incoming freshman at Claremont McKenna College. This summer I participated in my first Google Summer of Code with the Internet Archive. I’ll be sharing the achievements I’ve made with the Open Book Genome Project sequencer, an open source tool which extracts structured data from the contents of the Internet Archive’s massive digitized book collection.

The purpose of the Open Book Genome Project to create “A Literary Fingerprint for Every Book” using the Internet Archive’s 5 million book digital library. A book’s fingerprint currently consists of 1gram (single word) and 2gram (two word) term frequency, Flesch–Kincaid readability level, referenced URLs, and ISBNs found within the book.

Try it out!

Anyone can try running the OBGP Sequencer on an Internet Archive open access book using the new OBGP Sequencer™ Google Colab Notebook. This interactive notebook runs directly within the browser, no installation required. If you have any questions, please email us.

If you are interested in seeing the source code or contributing check out the GitHub. If this project sounds fascinating to you and you’d like to learn more or keep the project going, please talk to us!

How I got involved

I first found the Internet Archive in high school where I used the Wayback Machine for research and Open Library for borrowing books. As I found out more about the Archive’s services and history, I became more and more interested in its operation and its mission: to provide “Universal Access to All Knowledge”. Once I heard this mission, I was hooked and knew I wanted to help. During a school trip to San Francisco, I joined one the Archive’s Friday physical tours (which I highly recommend). The tour guide was impressed with the amount of information this high-schooler knew about the Archive’s operation and took me aside after the tour and showed me Book Reader’s read aloud feature and answered some questions about the book derive process. The tour guide then invited me to join the Open Library community chat where developers, librarians, and patrons discuss all things Open Library. This tour guide turns out to be Mek, my project mentor, Open Library Program Lead, and Citizen of The World.

I started attending the weekly Open Library community calls to learn more about how Open Library works, the issues the project faced, and how I could help. After months of showing up to calls, learning about open source, and developing my programming skills, Mek showed me an interesting prototype called the Open Book Genome Project.

Background

The Open Book Genome Project (OBGP) is a public good, community-run effort whose mission is to produce, “open standards, data, and services to enable deeper, faster and more holistic understanding of a book’s unique characteristics.” It was based on a previous effort led by a group in 2003 called the Book Genome Project, to “identify, track, measure, and study the multitude of features that make up a book.” Think of it as Pandora’s Music Genome Project but for books. Apple acquired and discontinued the Book Genome Project in 2014, leaving a gap in the book ecosystem which the Open Book Genome Project community now hopes to help fill for the public benefit.

The Open Book Genome Project is one of many efforts facilitated by members of the Internet Archive’s Open Library community. Their flagship service, OpenLibrary.org is a non-profit, open-source, public online library catalog founded by the late Aaron Swartz, which allows book lovers around the world to access millions of the Internet Archive’s digital books using Controlled Digital Lending (CDL). Open Library hopes the Open Book Genome Project may help patrons discover and learn more about books in some of the ways the Book Genome Project originally aimed to accomplish.

You can learn more about the history of the Open Book Genome Project in an upcoming blog post. You can also learn more about the other half of the Open Book Genome Project called Community Reviews in this blog post.

Here’s where we started

When I began working on the OBGP Sequencer, the general code structure and a few features were in place. The sequencer could extract a book’s N-gram term frequency and identify its copyright page number. There were many features in the product development pipeline, but no one dedicated to implement them. Over the past few months, I led development to add and improve the Sequencer’s functionality, created an automated pipeline to process books in volume, and deployed this pipeline to production on the Archive’s corpus of books.

One challenging part of the development process was getting ISBN extraction working accurately. The ISBN extractor works by first finding what it thinks is the book’s copyright page and then checking for a valid ISBN checksum in every number sequence. Although this approach works, there are often a lot of strange edge cases usually having to do with poor optical character recognition. To address this, I was  manually spot checking books for ISBN’s that were detected and missed, and investigating why to iteratively improve the extraction process. Here is a screenshot of my process.

Another challenge later on in the development process was getting books processed at scale. With a collection as large as the Archive’s, parallelization of processing is an essential component of scaling the sequencer up. I taught myself to use some of Python’s parallelization libraries and implemented them. Another challenge was getting parallelization working with the database. I addressed this by making the file system and directory layout database because modern file systems are built to work well with parallel I/O.

Here’s what we were able to accomplish with OBGP

  1. Make more books borrowable to patrons
  2. Add reading levels for thousands of books
  3. Identify & save urls found within books
  4. Produce a large public dataset of book insights

Making Books Lendable

Nearly 200,000 books digitized by the Internet Archive were missing key metadata like ISBN. The ISBN is used to look up all sorts of book information which is helpful for determining whether a book is eligible for the Internet Archive’s lending program. The absence of this key information was thus preventing tens of thousands of eligible books from.

As of writing, the Open Book Genome Project sequencer has extracted ISBN’s for 25,705 books that were previously unknown. 12,700 of those are newly lendable to patrons. Take a look at them here!

These books now have identifying information and are linked to Open Library Records. Open Library pages that  had no books available now have borrowable books. Here is a before and after screenshot.

Before

After

Adding reading levels

It’s often difficult to identify age-appropriate materials for students and children. By adding reading level information to Internet Archive’s book catalog, we’re able to make age-appropriate books more accessible.

The Sequencer now performs a Flesch–Kincaid readability test on each book on which it is run. This resulting Flesch–Kincaid grade level estimation allows students, parents, and teachers to filter their searches for books which include appropriate reading levels.

Preserving URLs

Open Library is aware of more than 1M books containing urls. These mentions by credible authors are like a vote of confidence of their relevance and usefulness. These websites are at risk of link rot and without preservation could be lost forever. But given the average webpage only lasts 100 days, it’s only a matter of time before millions of URLs found in millions of books will be preserved for future generations.

As of writing, URLs have been successfully extracted from more than 13,000 books, which will soon be preserved on the Wayback Machine. Many of the high quality references found in published books have not yet been preserved and now will be.

Producing public datasets

The original goal of OBGP was to produce an open, public data set of book insights capable of powering the open web. As of writing, the Open Book Genome Project sequencer has uploaded genomes for 180,642 books. For every book sequenced, a book genome is made publicly accessible that provides insights into the book without needing to borrow it. The goal of this is to increase the quantity and quality of publicly available descriptive information available for every book, so that readers and researchers can make better informed decisions and glean deeper insights about books. This supports readers, researchers, book sellers, libraries, and beyond.

Personal Development

I really enjoyed participating in GSoC with the Internet Archive because I was able to build programming foundations and gain industry experience that will prove invaluable in my future. I developed my project management skills, became more comfortable programming in Python and using new software libraries, and advanced my knowledge of dev-ops tools like Docker.

The future of the project

If you may be interested in contributing or learning more about the Open Book Genome Project sequencer, please send us an email.

Although we made a lot of progress with this projects development, there is still a lot more to be done. Here is a quick list of possible future features to get you excited about the possibilities of this project:

  • Make URL’s clickable
  • Identifying meaningful semantic elements in books, like Entities and Citations
  • increase the number of previewed pages & volume of previewable content.
  • Clickable Chapters in Table of Contents
  • Library of Congress Catalog Number extraction
  • Copyright information (Publisher, copyright date) extraction
  • Book and chapter Summarization and Topic Classification

Giacomo Cignoni: My Internship at the Internet Archive

This summer, Open Library and the Internet Archive took part in Google Summer of Code (GSoC), a Google initiative to help students gain coding experience by contributing to open source projects. I was lucky enough to mentor Giacomo while he worked on improving our BookReader experience and infrastructure. We have invited Giacomo to write a blog post to share some of the wonderful work he has done and his learnings. It was a pleasure working with you Giacomo, and we all wish you the best of luck with the rest of your studies! – Drini


Hi, I am Giacomo Cignoni, a 2nd year computer science student from Italy. I submitted my 2020 Google Summer of Code (GSoC) project to work with the Internet Archive and I was selected for it. In this blogpost, I want to tell you about my experience and my accomplishments working this summer on BookReader, Internet Archive’s open source book reading web application.

Continue reading

Google Summer of Code 2020: Adoption by Book Lovers

by Tabish Shaikh & Mek

OpenLibrary.org,the world’s best-kept library secret: Let’s make it easier for book lovers to discover and get started with Open Library.

Hi, my name is Tabish Shaikh and this summer I participated in the Google Summer of Code program with Open Library to develop improvements which will help book lovers discover and use OpenLibrary.org.

Continue reading