Author Archives: mek

About mek

Citizen of the World, Open Librarian @ openlibrary.org

Extending a Warm Welcome to New Patrons

By Sabreen Parveen with Ray Berger & Mek

A Foreword from the Mentors

For book lovers who use openlibrary.org every day, it may be easy to forget what it felt like to visit the website for the first time. Features that some of us learned the hard way, through trial and error, may not be as easy or intuitive for others to understand. We feel like we’ve failed each time a patron leaves the library frustrated, before even having the chance to understand the value it may provide to them.

At Open Library, we strive to design a service which is accessible and easy for anyone to use and understand. We understand that everyone has different experiences and usability needs. Our mission is to make books as accessible and useful to the public as possible, and we’re unable to do this if patrons aren’t given the opportunity and resources to learn how our services work.

After polling dozens of patrons on video calls and through surveys, we started to get a good idea of which aspects of the website are most confusing to new patrons. The most common question was, “What is Open Library and what does it let you do?” We searched for a clear explanation on our homepage, but there wasn’t one, just rows of books we assumed patrons would click on and somehow understand how it all worked. We also received useful questions about which books on Open Library are readable or borrowable, what it means when a book shows as unavailable or not in the library, and how the Reading Log works. We decided to address some of these frequently asked questions at the earliest possible entry point: on our homepage, with a new Onboarding Carousel. Leading this project was 2021 Open Library Fellow Sabreen, with the mentorship of Ray & Mek. We’re so excited and proud to showcase Sabreen’s hard work to you!

Designing a Simple-to-use Onboarding Experience

By Sabreen Parveen

This summer I had the amazing opportunity to work with the Internet Archive as an Open Library Fellow, where I contributed to the Onboarding Project.

My Journey with Open Library

I decided to join the Open Library community in 2020 because I was interested in contributing to an open source project and improving my abilities as a programmer and designer. Several things about Open Library stood out to me while I was browsing projects on GitHub. Firstly, I already knew the languages and frameworks it used. Secondly, the documentation was very clear and easy to understand. Thirdly, the issue tracker contained many exciting ways for me to help. Most importantly, the project had an active community and hosted calls every week where I could work with others and ask questions. Once I had familiarized myself with the project, I joined Open Library’s public Gitter chatroom and asked questions about getting started. Shortly after, I attended my first community call, received a Slack invite, and later that week submitted my first contribution! I have joined almost all the community calls since. Gradually I started solving more and more issues, many of them related to web accessibility and SEO. I also started creating graphics for Open Library’s “monthly reads” pages. The community must have been excited about my contributions, because this year I was invited to be a 2021 Open Library Fellow and to team up with a mentor to lead a flexible, high-impact project to completion.

Selecting a Project: Onboarding Flow

The project I chose for my 2021 Open Library Fellowship was to add a new user onboarding experience to the OpenLibrary.org homepage to help new patrons get an overview of the website and learn how to use its features.

The problem

First time visitors to OpenLibrary.org often report getting confused because they don’t know how to use the service. We had several indicators this was the case:

  • From my own experience, I had been confused when I first started using the website. I didn’t know what the “Want to Read” button did, and I only learned about the list feature while solving an issue.
  • Bounce Rate: Open Library has a fairly high bounce rate, which is the percentage of people who visit a website and leave without continuing to other pages. We wondered if this was because patrons were confused about how to use the website, so we wanted to test this.
  • Feedback: We received feedback from patrons emailing us about their experience.

By adding an onboarding flow, we hoped many more visitors would gain insight into what the website actually does.

Implementation

While designing user onboarding, we wanted to create a system that was interactive, contextual, and easy to use and understand. As a result, we decided to start by adding an onboarding carousel to the homepage, the most common place patrons land when visiting the website for the first time. We designed the carousel to feature five cards: Read Free Library Books Online, Keep Track of Your Favourite Books, Try the virtual library explorer, Be an Open Librarian, and a Feedback form for collecting feedback from visitors.

We decided on a carousel as the format because carousels are:

  • non-interruptive.
  • persistent, unlike other onboarding design patterns that only show up upon signup and are never seen again.
  • easy to explore.

When clicked, each card redirects patrons to a FAQs page. In an upcoming version, the “Keep Track of Your Favourite Books” card will instead trigger an onboarding modal with a step-by-step tutorial: several slides explaining how to add a book to your reading log, create a new list, and view your reading log. Each feature is explained using a short, descriptive GIF. You can close the modal at any step. Designing the modal was a long process of discussion and feedback, but in the end we arrived at a simple and attractive design.

During implementation, we kept the following in mind:

  • Icons for the homepage cards should clearly match their text.
  • Captions should be eye-catching and easy to understand.
  • Each card needs a link to redirect people to (currently the FAQs page).
  • GIFs should be contextual.
  • The modal design should keep the main focus on the GIF and not on the modal itself. Easy navigation between the slides is also necessary.

Design Process

To make this project successful, we had weekly meetings and discussions in the community channel to get everyone’s opinion. Designs were mocked up using Figma. I also had the chance to present my ideas before the Internet Archive’s product team. We used feedback from these meetings to review our previous decisions and our progress, and to inform next steps.

Results

  • Alexa: The bounce rate is now reduced to 38.2%.
  • Google Analytics: More than 5000 engagements with these cards.
  • Infrastructure we can keep building on and re-use in other situations.

Next Steps

  • Doodles to bring more character to the homepage cards
  • Include pop-up tutorials for more of the cards (other than just Reading Log + Lists)
  • Ability to hide / show the carousel (for patrons who have already received the information) 

My experience

I had a really good time working with my experienced mentors, Mek and Raymond Berger. They were very supportive during the entire program. Sometimes we spent our meeting time finding solutions to problems together. I learned more about project management and about clarifying a plan by breaking issues into manageable steps. I got to spend time learning new industry tools like Figma, which we used for presenting designs, and Google Analytics, which we used for tracking key metrics. I also gained a deeper understanding of user experience. I learned to design by thinking as a patron of Open Library: What would she or he want? Will it be useful and easy to understand? I appreciated the flexibility of the Open Library Fellowship program; there was no pressure on me, so I could also focus on my studies. We tried to have clear next steps and homework at the end of each of our calls. The calls helped clarify what we were hoping to accomplish and provided direction and feedback. Finally, having the community available for regular feedback was really useful for tuning our designs.

About the Open Library Fellowship Program

The Internet Archive’s Open Library Fellowship is a flexible, self-designed independent study which pairs volunteers with mentors to lead development of a high-impact feature for OpenLibrary.org. Most fellowship programs last one to two months and are flexible, according to the availability of contributors. We typically choose fellows based on their exemplary and active participation, conduct, and performance working within the Open Library community. The Open Library staff typically only accepts 1 or 2 fellows at a time to ensure participants receive plenty of support and mentor time. If you’re interested in volunteering as an Open Library Fellow and receiving mentorship, you can apply using this form or email openlibrary@archive.org for more information.

The Open Book Genome Project

We’ve all heard the advice, don’t judge a book by its cover. But then how should we go about identifying books which are good for us? The secret depends on understanding two things:

  1. What is a book?
  2. What are our preferences?

We can’t easily answer the second question without understanding the first one. But we can help by being good library listeners and trying to provide tools, such as the Reading Log and Lists, to help patrons record and discover books they like. Since everyone is different, the second question is key to understanding why patrons like these books and making Open Library as useful as possible to patrons.

What is a book?

As we’ve explored before, determining whether something is a book is a deceptively difficult task, even for librarians. It’s a bound thing made of paper, right? But what about audiobooks and ebooks? OK, books have ISBNs, right? But many formats can have ISBNs, and books published before 1967 won’t have one. And what about yearbooks? Is a yearbook a book? Is a dictionary a book? What about a phonebook? A price guide? An atlas? There are entire organizations, like the San Francisco Center for the Book, dedicated to exploring and pushing the limits of the book format.

In some ways, it’s easier to answer this question about humans than books because every human is built according to a specific genetic blueprint called DNA. We all have DNA; what makes us unique are the variations in the more than 20,000 genes that our DNA is made of, which help encode for characteristics like hair and eye color. In 1990, an international research group called the Human Genome Project (HGP) began sequencing the human genome to definitively uncover “nature’s complete genetic blueprint for building a human being”. The result, completed in 2003, was a compelling answer to the question, “What is a human?”

Nine years after HGP began, Will Glaser & Tim Westergren drew inspiration from it and launched a similar effort called the Music Genome Project, using trained experts to classify and label music according to a taxonomy of characteristics, like genre and tempo. This system became the engine which powers song recommendations for Pandora Radio.

Circa 2003, Aaron Stanton, Matt Monroe, Sidian Jones, and Dan Bowen adapted the idea of Pandora to books, creating a book recommendation service called BookLamp. Under the hood, they devised a Book Genome Project which combined computers and crowds to “identify, track, measure, and study the multitude of features that make up a book”.

Their system analyzed books and surfaced insights about their structure, themes, age-appropriateness, and even pace, bringing us within grasping distance of the answer to our question: What is a book?

[Image: BookLamp’s theme currents for Carrie]

Sadly, the project did not release their data, was acquired by Apple in 2014, and subsequently discontinued. But they left an exciting treasure map for others to follow.

And follow, others did. In 2006, a project called the Open Music Genome Project attempted to create a public, open, community alternative to Pandora’s Music Genome Project. We thought this was a beautiful gesture and a great opportunity for Open Library; perhaps we could facilitate public book insights which any project in the ecosystem could use to create their own answer to “what is a book?” We also found inspiration in complementary projects like StoryGraph, which elegantly crowdsources book tags from patrons to help you “choose your next book based on your mood and your favorite topics and themes”; the HathiTrust Research Center (HTRC), which has led the way in making book data available to researchers; and the Open Syllabus Project, which is surfacing useful academic books based on their usage across college curricula.

Introducing the Open Book Genome Project

Over the last several months, we’ve been talking to communities, conducting research, speaking with some of the teams behind these innovative projects, and building experiments to shape a non-profit adaptation of these approaches called the Open Book Genome Project (OBGP).

Our hope is that this Open Book Genome Project will help responsibly make book data more useful and accessible to the public: to power book recommendations, to compare books based on their similarities and differences, to produce more accurate summaries, to calculate reading levels to match audiences to books, to surface citations and URLs mentioned within books, and more.

OBGP hopes to achieve these things by employing a two-pronged approach, which readers may learn more about in the following two blog posts:

  1. The Sequencer – a community-engineered bot which reads millions of Internet Archive books and extracts key insights for public consumption.
  2. Community Reviews – a new crowd-sourced book tagging system which empowers readers to collaboratively classify & share structured reviews of books.

Or hear an overview of the OBGP in this half-hour tech talk:

GSoC 2021: Making Books Lendable with the Open Book Genome Project

By: Nolan Windham & Mek

I’m Nolan Windham, an incoming freshman at Claremont McKenna College. This summer I participated in my first Google Summer of Code with the Internet Archive. I’ll be sharing the achievements I’ve made with the Open Book Genome Project sequencer, an open source tool which extracts structured data from the contents of the Internet Archive’s massive digitized book collection.

The purpose of the Open Book Genome Project is to create “A Literary Fingerprint for Every Book” using the Internet Archive’s 5 million book digital library. A book’s fingerprint currently consists of 1-gram (single word) and 2-gram (two word) term frequencies, a Flesch–Kincaid readability level, referenced URLs, and ISBNs found within the book.
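To make the term frequency part of the fingerprint concrete, here is a minimal sketch of counting 1-grams and 2-grams. The function names and the tokenization rule are assumptions for illustration, not the Sequencer's actual code.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens (assumed rule)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def ngram_frequencies(text, n):
    """Count how often each run of n consecutive words appears in the text.

    Note: this sketch lets n-grams span sentence boundaries.
    """
    words = tokenize(text)
    # zip over n staggered copies of the word list to form the n-grams
    ngrams = zip(*(words[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in ngrams)
```

Calling `ngram_frequencies(book_text, 1)` and `ngram_frequencies(book_text, 2)` would give the two frequency tables described above.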

Try it out!

Anyone can try running the OBGP Sequencer on an Internet Archive open access book using the new OBGP Sequencer™ Google Colab Notebook. This interactive notebook runs directly within the browser, no installation required. If you have any questions, please email us.

If you are interested in seeing the source code or contributing, check out the GitHub repository. If this project sounds fascinating to you and you’d like to learn more or keep the project going, please talk to us!

How I got involved

I first found the Internet Archive in high school, where I used the Wayback Machine for research and Open Library for borrowing books. As I found out more about the Archive’s services and history, I became more and more interested in its operation and its mission: to provide “Universal Access to All Knowledge”. Once I heard this mission, I was hooked and knew I wanted to help. During a school trip to San Francisco, I joined one of the Archive’s Friday physical tours (which I highly recommend). The tour guide was impressed with how much this high-schooler knew about the Archive’s operation, took me aside after the tour, showed me Book Reader’s read aloud feature, and answered some questions about the book derive process. The tour guide then invited me to join the Open Library community chat, where developers, librarians, and patrons discuss all things Open Library. This tour guide turned out to be Mek, my project mentor, Open Library Program Lead, and Citizen of The World.

I started attending the weekly Open Library community calls to learn more about how Open Library works, the issues the project faced, and how I could help. After months of showing up to calls, learning about open source, and developing my programming skills, Mek showed me an interesting prototype called the Open Book Genome Project.

Background

The Open Book Genome Project (OBGP) is a public good, community-run effort whose mission is to produce, “open standards, data, and services to enable deeper, faster and more holistic understanding of a book’s unique characteristics.” It was based on a previous effort led by a group in 2003 called the Book Genome Project, to “identify, track, measure, and study the multitude of features that make up a book.” Think of it as Pandora’s Music Genome Project but for books. Apple acquired and discontinued the Book Genome Project in 2014, leaving a gap in the book ecosystem which the Open Book Genome Project community now hopes to help fill for the public benefit.

The Open Book Genome Project is one of many efforts facilitated by members of the Internet Archive’s Open Library community. Their flagship service, OpenLibrary.org is a non-profit, open-source, public online library catalog founded by the late Aaron Swartz, which allows book lovers around the world to access millions of the Internet Archive’s digital books using Controlled Digital Lending (CDL). Open Library hopes the Open Book Genome Project may help patrons discover and learn more about books in some of the ways the Book Genome Project originally aimed to accomplish.

You can learn more about the history of the Open Book Genome Project in an upcoming blog post. You can also learn more about the other half of the Open Book Genome Project called Community Reviews in this blog post.

Here’s where we started

When I began working on the OBGP Sequencer, the general code structure and a few features were in place. The sequencer could extract a book’s N-gram term frequency and identify its copyright page number. There were many features in the product development pipeline, but no one dedicated to implementing them. Over the past few months, I led development to add and improve the Sequencer’s functionality, created an automated pipeline to process books in volume, and deployed this pipeline to production on the Archive’s corpus of books.

One challenging part of the development process was getting ISBN extraction working accurately. The ISBN extractor works by first finding what it thinks is the book’s copyright page and then checking every number sequence for a valid ISBN checksum. Although this approach works, there are often a lot of strange edge cases, usually having to do with poor optical character recognition. To address this, I manually spot-checked books for ISBNs that were detected and ISBNs that were missed, and investigated the causes to iteratively improve the extraction process. Here is a screenshot of my process.
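The checksum step can be sketched as follows. This is a minimal illustration of standard ISBN-10 and ISBN-13 validation, not the Sequencer's actual extractor; the candidate pattern and function names are assumptions, and the pattern only handles hyphen-separated digits, so ISBNs printed with spaces or damaged by OCR would need extra handling.

```python
import re

# Assumed candidate pattern: a digit, then digits or hyphens, ending in a digit or X.
CANDIDATE = re.compile(r"\d[\d-]{8,15}[\dXx]")


def is_valid_isbn10(isbn):
    """ISBN-10: weights 10..1, final character may be X (worth 10), sum divisible by 11."""
    if not re.fullmatch(r"\d{9}[\dX]", isbn):
        return False
    values = [int(ch) for ch in isbn[:9]] + [10 if isbn[9] == "X" else int(isbn[9])]
    return sum((10 - i) * v for i, v in enumerate(values)) % 11 == 0


def is_valid_isbn13(isbn):
    """ISBN-13: alternating weights 1 and 3, sum divisible by 10."""
    if not re.fullmatch(r"\d{13}", isbn):
        return False
    total = sum(int(ch) * (3 if i % 2 else 1) for i, ch in enumerate(isbn))
    return total % 10 == 0


def find_isbns(page_text):
    """Return the unique, checksum-valid ISBNs found in a page of text, in order."""
    found = []
    for candidate in CANDIDATE.findall(page_text):
        isbn = candidate.replace("-", "").upper()
        if (is_valid_isbn10(isbn) or is_valid_isbn13(isbn)) and isbn not in found:
            found.append(isbn)
    return found
```

Because the checksum rejects most random digit runs, a loose candidate pattern is acceptable here: false candidates such as phone numbers are usually filtered out by validation rather than by the pattern itself.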

Another challenge later on in the development process was getting books processed at scale. With a collection as large as the Archive’s, parallel processing is essential to scaling the sequencer up. I taught myself to use some of Python’s parallelization libraries and implemented them. A related challenge was getting parallelization working with the database. I addressed this by using the file system and its directory layout as the database, because modern file systems are built to work well with parallel I/O.
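The idea of using the file system as the database can be sketched like this. It is a hypothetical illustration: the post does not say which parallelization libraries the Sequencer uses, so `concurrent.futures` with threads is an assumption here, and `sequence_book` is a made-up stand-in for the real per-book analysis.

```python
import json
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def sequence_book(book_id):
    """Hypothetical stand-in for the real per-book analysis."""
    return {"id": book_id, "id_length": len(book_id)}


def process_book(book_id, results_dir):
    """Write one result file per book; an existing file means the book is done."""
    out_path = Path(results_dir) / f"{book_id}.json"
    if out_path.exists():
        return False  # already processed, nothing to do
    # Write to a temporary name first, then rename, so readers never see a partial file.
    tmp_path = out_path.with_suffix(".tmp")
    tmp_path.write_text(json.dumps(sequence_book(book_id)))
    tmp_path.replace(out_path)
    return True


def process_all(book_ids, results_dir, workers=4):
    """Process books in parallel; returns True for each book that was newly processed."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda book_id: process_book(book_id, results_dir), book_ids))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as results_dir:
        print(process_all(["book-a", "book-b"], results_dir))
```

Since each worker only touches its own file, no locking or shared database connection is needed, and re-running the job skips books that already have a result file.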

Here’s what we were able to accomplish with OBGP

  1. Make more books borrowable to patrons
  2. Add reading levels for thousands of books
  3. Identify & save URLs found within books
  4. Produce a large public dataset of book insights

Making Books Lendable

Nearly 200,000 books digitized by the Internet Archive were missing key metadata like ISBN. The ISBN is used to look up all sorts of book information which is helpful for determining whether a book is eligible for the Internet Archive’s lending program. The absence of this key information was thus preventing tens of thousands of eligible books from being lent to patrons.

As of writing, the Open Book Genome Project sequencer has extracted previously unknown ISBNs for 25,705 books. 12,700 of those books are newly lendable to patrons. Take a look at them here!

These books now have identifying information and are linked to Open Library records. Open Library pages that had no books available now have borrowable books. Here is a before and after screenshot.

Before

After

Adding reading levels

It’s often difficult to identify age-appropriate materials for students and children. By adding reading level information to Internet Archive’s book catalog, we’re able to make age-appropriate books more accessible.

The Sequencer now performs a Flesch–Kincaid readability test on each book it processes. The resulting Flesch–Kincaid grade level estimate allows students, parents, and teachers to filter their searches for books at an appropriate reading level.
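The Flesch–Kincaid grade level is computed from the average sentence length and the average number of syllables per word. Here is a minimal sketch of the standard formula; the vowel-group syllable counter and the sentence splitting rule are rough assumptions for illustration and may differ from what the Sequencer does.

```python
import re


def count_syllables(word):
    """Rough heuristic (assumption): one syllable per group of consecutive vowels."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_kincaid_grade(text):
    """Estimate the U.S. school grade level needed to read the text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(word) for word in words)
    # Standard Flesch-Kincaid grade level formula
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59
```

Short sentences made of short words score low (even below zero), while long sentences full of polysyllabic words score high.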

Preserving URLs

Open Library is aware of more than 1M books containing URLs. These mentions by credible authors are like a vote of confidence in the websites’ relevance and usefulness. But given that the average webpage only lasts 100 days, these websites are at risk of link rot and without preservation could be lost forever. By extracting these links, we hope that it’s only a matter of time before millions of URLs found in millions of books are preserved for future generations.

As of writing, URLs have been successfully extracted from more than 13,000 books, which will soon be preserved on the Wayback Machine. Many of the high quality references found in published books have not yet been preserved and now will be.
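As a rough illustration of URL extraction from a book's text, here is a minimal sketch using a regular expression. The pattern and function name are assumptions; real OCR text often breaks URLs across lines or inserts stray spaces, which this sketch does not handle.

```python
import re

# Assumed pattern: anything starting with http://, https://, or www. up to
# the next whitespace, angle bracket, parenthesis, or square bracket.
URL_PATTERN = re.compile(r"(?:https?://|www\.)[^\s<>()\[\]]+", re.IGNORECASE)


def extract_urls(text):
    """Return the unique URLs mentioned in the text, in order of first appearance."""
    urls = []
    for match in URL_PATTERN.findall(text):
        url = match.rstrip(".,;:!?")  # drop sentence punctuation stuck to the end
        if url not in urls:
            urls.append(url)
    return urls
```

The resulting list could then be handed to a separate preservation step, which is outside the scope of this sketch.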

Producing public datasets

The original goal of OBGP was to produce an open, public data set of book insights capable of powering the open web. As of writing, the Open Book Genome Project sequencer has uploaded genomes for 180,642 books. For every book sequenced, a book genome is made publicly accessible that provides insights into the book without needing to borrow it. The goal is to increase the quantity and quality of publicly available descriptive information for every book, so that readers and researchers can make better informed decisions and glean deeper insights about books. This supports readers, researchers, book sellers, libraries, and beyond.

Personal Development

I really enjoyed participating in GSoC with the Internet Archive because I was able to build programming foundations and gain industry experience that will prove invaluable in my future. I developed my project management skills, became more comfortable programming in Python and using new software libraries, and advanced my knowledge of dev-ops tools like Docker.

The future of the project

If you are interested in contributing or learning more about the Open Book Genome Project sequencer, please send us an email.

Although we made a lot of progress with this project’s development, there is still a lot more to be done. Here is a quick list of possible future features to get you excited about the possibilities of this project:

  • Make URLs clickable
  • Identify meaningful semantic elements in books, like Entities and Citations
  • Increase the number of previewed pages & volume of previewable content
  • Clickable Chapters in Table of Contents
  • Library of Congress Catalog Number extraction
  • Copyright information (Publisher, copyright date) extraction
  • Book and chapter Summarization and Topic Classification

Introducing: Community Reviews

You can now publicly review books using structured book #tags on Open Library with Community Reviews. Take a look, try it out, and send us feedback!


Many social book websites including Goodreads & LibraryThing feature text reviews from the community. Why hasn’t Open Library?

As a non-profit library service with a small staff, there are three reasons we’ve resisted the urge to add text reviews to Open Library. First and foremost, we feel strongly about preserving Open Library as an inclusive, safe, neutral place where readers can trust the information they receive. Some opinionated reviews, even though valid, may contend with this goal. Secondly, we’re cautious about adding features which may require a large time investment to moderate well. We’d rather spend our time making it easier for people across the globe to find books in their native languages than sink all of our time reviewing spam. Finally, there are indeed already several websites which feature text reviews. We’re excited to link patrons to these resources and think our time may be better served exploring new ways of adding unique value back to the book ecosystem.

This all said, reviews are one of the most requested features by book lovers on Open Library and we feel it’s important for readers to have their voices heard. So what are our options?

A review of reviews

One super-power of text reviews is that they are unstructured. Their open-ended format allows reviewers to express very nuanced and deep thoughts, such as how impressively the male author Arthur Golden was able to portray the emotional turmoil of the female characters in Memoirs of a Geisha. This super-power does come with a trade-off. It can be challenging to compare reviews and know which should be trusted; two reviews may have completely diverging styles or focus. One reviewer may be reacting to the story line while another may be critiquing the book’s pace. Reviews are often not easily digestible. A lot of information is lost when one tries to compress a review into a single star rating. Because of these challenges with “digestibility”, it’s also challenging to summarize text reviews as data which may be used to help people discover new books. Amazon has some techniques which we considered:

A collaborative approach

How can Open Library empower readers to share their impressions about books in a new way, facilitate useful reviews which are structured and easily digestible, while maintaining a safe and neutral library landscape?

Open Library’s collaborative approach, which we’re calling Community Reviews, borrows from an old (now defunct) project called BookLamp and a more recent project called StoryGraph, which let participants use tags to vote on & review various aspects of books like pace, genre, mood, and more:

StoryGraph crowdsources tags like genre and mood from the community and uses this information to help readers find the right book for them.
BookLamp used a hybrid of robots and crowdsourcing to identify themes and topics within books.

The more participants who vote using review tags, the more accurate and meaningful the review becomes for the community. Instead of sifting through dozens of text reviews, Community Reviews gives readers a bird’s-eye view across many publicly listed dimensions they might care about, like Pace, Enjoyability, Clarity, Difficulty, Breadth, Genre, Mood, Impressions, Length, Credibility, Text Features, Content Warnings, Terminology, and Purpose.
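One way to picture how tag votes could roll up into a summary is sketched below. This is a hypothetical illustration only: the data shape (one dimension and tag pair per vote) and the function name are assumptions, not Open Library's actual schema or code.

```python
from collections import Counter, defaultdict


def summarize_reviews(votes):
    """Turn (dimension, tag) votes into each tag's share of votes per dimension.

    `votes` is an iterable of pairs such as ("Pace", "fast"); this shape is assumed.
    """
    tallies = defaultdict(Counter)
    for dimension, tag in votes:
        tallies[dimension][tag] += 1

    summary = {}
    for dimension, counts in tallies.items():
        total = sum(counts.values())
        # most_common() lists the most popular tags first
        summary[dimension] = {tag: n / total for tag, n in counts.most_common()}
    return summary
```

With only a handful of votes, a single reader can swing the shares; as more readers vote, the shares settle and become a more reliable picture of the book.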

Here’s what Open Library Community Reviews looks like:

By clicking “+ Add your community review”, any logged in reader may submit their own public, anonymous reviews:

Building Together

Community Reviews features a public schema which anyone may reference or propose changes to. It’s a work in progress and will undoubtedly need the community’s feedback to become useful over time.

Feedback

Community Reviews is a beta work in progress and we expect it to change drastically over the coming weeks based on feedback from our community. We also anticipate issues and bugs may emerge — you can help by reporting bugs and issues here.

We do have every intention for Community Reviews to be included (in an anonymized form) in our public monthly data dumps for the benefit of our community and via our APIs, though this may take some time to implement.

As the number of Community Reviews increases, our plan is to include them in our search engine so you have ever more ways to identify the best books for you.

We know many patrons would still love to see text reviews on Open Library and that Community Reviews isn’t a replacement for every use case. We sincerely appreciate this and still, we hope that readers will find this new feature valuable and provide us with feedback to improve it over time.

Thanks

We’d like to sincerely thank Jim Champ, who recently joined as a staff member on Open Library and whose leadership was indispensable in bringing this feature to life. Thank you to Drini Cami, also staff at Open Library, for his contributions to improving the user experience. If you hate the idea or execution, blame Mek, but do give us feedback to improve.

Library Explorer at Library Leaders Forum

Introducing the Open Library Explorer

Try it here! If you like it, share it.

Bringing 100 Years of Librarian-Knowledge to Life

By Nick Norman with Drini Cami & Mek

At the Library Leaders Forum 2020 (demo), Open Library unveiled the beta for what it’s calling the Library Explorer: an immersive interface which powerfully recreates and enhances the experience of navigating a physical library. If the tagline doesn’t grab your attention, wait until you see it in action:

Drini showcasing Library Explorer at the Library Leaders Forum

Get Ready to Explore

In this article, we’ll give you a tour of the Open Library Explorer and teach you how one may take full advantage of its features. You’ll also get a crash course on the 100+ years of library history which led to its innovation and an opportunity to test-drive it for yourself. So let’s get started!  
