Introducing the Open Library Explorer

Try it here! If you like it, share it.

Bringing 100 Years of Librarian-Knowledge to Life

By Nick Norman with Drini Cami & Mek

At the Library Leaders Forum 2020 (demo), Open Library unveiled the beta for what it’s calling the Library Explorer: an immersive interface which powerfully recreates and enhances the experience of navigating a physical library. If the tagline doesn’t grab your attention, wait until you see it in action:

Drini showcasing Library Explorer at the Library Leaders Forum

Get Ready to Explore

In this article, we’ll give you a tour of the Open Library Explorer and show you how to take full advantage of its features. You’ll also get a crash course on the 100+ years of library history that led to it and an opportunity to test-drive it for yourself. So let’s get started!

What better way to set the stage than by taking a trip down memory lane to the last time you were able to visit your local public library. As you pass the front desk, a friendly librarian scribbles some numbers on a piece of paper, hands it to you, and points you towards a relevant section. With the list of library call numbers in your hand as your compass, you eagerly make your way through waves of towering bookshelves. Suddenly, you depart from reality and find yourself navigating through a sea of books, discovering treasures you didn’t even know existed.

Library photo courtesy of pixy.org/5775865/

Before you know it, one book gets stuffed under one arm, two more books go under your other arm, and a few more books get positioned securely between your knees. You’re doing the math to see how close you are to your check-out limit. Remember those days?

What if you could replicate that same library experience and access it every single day, from the convenience of your web browser? Well, thanks to the new Open Library Explorer, you can experience the joys of a physical library right in your web browser, as well as leverage superpowers which enable you to explore in ways which may have previously been impossible.

Before we dive into the bells-and-whistles of the Library Explorer, it’s worth learning how and why such innovations came to be.

Who needs Library Explorer?

This year we’ve seen systems stressed to their max due to the COVID-19 pandemic. With libraries and schools closing their doors globally and stay-at-home orders hampering our access, there has been a paradigm shift in the needs of researchers, educators, students, and families to access fundamental resources online. Getting this information online is a challenge in and of itself. Making it easy to discover and use materials online is another entirely. How does one faithfully compress the entire experience of a reliable, unbiased, expansive public library and its helpful, friendly staff into a 14” computer screen?

Some sites, like Netflix or YouTube, solve this problem with recommendation engines that populate information based on what people have previously seen or searched. Consequently, readers may unknowingly find themselves caught in a sort of “algorithmic bubble.”

An algorithmic bubble (or “filter bubble”) is a state of intellectual or informational isolation that’s perpetuated by personalized content. Algorithmic bubbles can make it difficult for users to access information beyond their own opinions—effectively isolating them in their own cultural or ideological silos. 

Drini Cami, the creator of Library Explorer, says that users caught inside these algorithmic bubbles “won’t be exposed to information that is completely foreign to [them]. There is no way to systematically and feasibly explore.” The Library Explorer grew out of this need to discover information without the constraints of algorithmic bubbles.

As readers are exposed to more information, the question becomes, how can readers fully explore swaths of new information and still enjoy the experience?

Let’s take a look at how the Library Explorer tackles that half of the problem.

Humanity’s Knowledge Brought to Life

Earlier this year, Open Library added the ability to search materials by both Dewey Decimal Classification and Library of Congress Classification. These systems embed over 100 years of librarian experience and provide a systematic way to sort through the entirety of humanity’s knowledge captured in books.

It is important to note that systematizing knowledge does not by itself make it easily discoverable. This is what makes the Library Explorer so special. Its digital interface opens the door for readers to seamlessly navigate centuries of books anywhere online.

Thanks to innovations such as the Library Explorer, readers can explore more books and access more knowledge with a better experience.

A tour of Library Explorer’s features

If you’re pulling up a chair for the first time, the Library Explorer presents you with tall, clickable bookshelves situated across your screen. Each shelf has its own identity that can morph into new classes of books and subject categories with a single click. And that’s only the beginning of what it offers.

In addition to those smart filters, the Library Explorer wants you to steer the ship… not the other way around. In other words, you can personalize single rows of books, expand entire shelves, or construct an entire library experience that revolves around your exact interests. You can custom-tailor your own personal library from the comfort of your device, wherever you may be.

Quick question: as a kid, did you ever lay out your newly checked-out library books on your bed to admire them? Well, the creators behind the Library Explorer found a way to mimic that same experience. If you so choose, you can zoom out of the Library Explorer interface to get a complete view of the library you’ve constructed.

Let’s explore one more set of cool features the Library Explorer offers by clicking on the “Filter” icon at the bottom of the page.

By selecting “Juvenile,” you can instantly transform your entire library into a children’s library, but keep all the useful organization and structure provided by the bookshelves. It’s as if your own personal librarian ran in at lightning speed and removed every book from each shelf that didn’t meet your criteria. Or you may type in “subject:biography” and suddenly your entire library shows you a tailored collection of just biographies on every subject. The sky is your limit.

If you click on the Settings tab, you’re given several options to customize the look and feel of your personal Library Explorer. You can switch between using Library of Congress or Dewey Decimal classification to organize your shelves. You can also choose from a variety of delightful options to see your books in 3D. Each book has the correct thickness determined by its actual number of pages. To see your favorite book in 3D, click the settings icon at the bottom of the screen and then press the 3D button.

Maybe you’ve experienced a time where you had limited space in your book bag. Perhaps because of that, you chose to wait on checking out heavier books. Or, maybe you judged a book’s strength of knowledge based on its thickness. If that’s you, guess what? The Open Library Explorer lets you do that. 

It gets personal…

The primary goal of the Library Explorer was to create an experimental interface that ‘opens the door’ for readers to locate new books and engage with their favorite books. The Library Explorer is one of many steps that both the Internet Archive and the Open Library have made towards making knowledge easy to discover.

As you know, such innovation couldn’t be possible without people who believe in the necessity of reading. Here is a list of the names of those who contributed to the creation of the Library Explorer:

  • Drini Cami, Open Library Developer and Library Explorer Creator
  • Mek Karpeles, Open Library Program Lead
  • Jim Shelton, UX Designer, Internet Archive
  • Ziyad Basheer, Product Designer
  • Tinnei Pang, Illustrator and Product Designer
  • James Hill-Khurana, Product Designer
  • Nick Norman, Open Library Storyteller & Volunteer Communications Lead 

Well, this is the moment you’ve been waiting for. Go here and give the Library Explorer a beta test-run. Also, follow @OpenLibrary on Twitter to learn about other features as soon as they’re released.

But before you go… in the comments below, tell us your favorite library experience. We’d love to hear!

Posted in Uncategorized | 14 Responses

Importing your Goodreads & Accessing them with Open Library’s APIs

by Mek

Today Joe Alcorn, founder of readng, published an article (https://joealcorn.co.uk/blog/2020/goodreads-retiring-API) sharing news with readers that Amazon’s Goodreads service is in the process of retiring their developer APIs, with an effective start date of last Tuesday, December 8th, 2020.

Deprecation notice on Goodreads API documentation
A screenshot taken from Joe Alcorn’s post

The topic stirred discussion among developers and book lovers alike, making the front-page of the popular Hacker News website.

Hacker News at 2020-12-13 1:30pm Pacific.

The Importance of APIs

For those who are new to the term, an API is a way of accessing data that is designed for computers, rather than people, to consume. APIs often allow computers to subscribe to (i.e. listen for) events and then take actions. For example, let’s say you wanted to tweet every time your favorite author published a new book. One could sit on Goodreads and refresh the website every fifteen minutes. Or, one might write a Twitter bot which automatically connects to Goodreads and checks real-time data using its API. In fact, the reason Twitter bots work is that they use Twitter’s API, a mechanism which lets specially designed computer programs submit tweets to the platform.

As one of the more popular book services online today, Amazon’s Goodreads has tens of thousands of readers and organizations relying on its APIs to look up information about books and to power their book-related applications across the web. Authors rely on the data to showcase their works on their personal homepages; online bookstores use it to promote their inventory; innovative new services like thestorygraph use it to help readers discover new insights; and even librarians and scholastic websites rely on book data APIs to keep their catalog information as up to date and accurate as possible for their patrons.

For years, the Open Library team has been enthusiastic to share the book space with friends like Goodreads who have historically shown great commitment by enabling patrons to control (download and export) their own data and enabling developers to create flourishing ecosystems which promote books and readership through their APIs. When it comes to serving an audience of book lovers, there is no “one size fits all” and we’re glad so many different platforms and APIs exist to provide experiences which meet the needs of different communities. And we’d like to do our part to keep the landscape flourishing.

“The sad thing is it [retiring their APIs] really only hurts the hobbyist projects and Goodreads users themselves.” — Joe Alcorn

Picture of Aaron Swartz by Noah Berger/Landov from thedailybeast

At Open Library, our top priority is pursuing Aaron Swartz’s original mission: to serve as an open book catalog for the public (one page for every book ever published) and ensure our community always has free, open data to unlock a world of possibilities, a world which believes in the power of reading to preserve our cultural heritage and empower education and understanding. We sincerely hope that Amazon will decide it’s in Goodreads’ best interests to reinstate their APIs. But either way, Open Library is committed to helping readers, developers, and all book lovers have autonomy over their data and direct access to the data they rely on.

One reason patrons appreciate Open Library is that it aligns with their values

Imports & Exports

In August 2020, one of our Google Summer of Code contributors Tabish Shaikh helped us implement an export option for Open Library Reading Logs to help everyone retain full control of their book data. We also created a Goodreads import feature to help patrons who may want an easy way to check which Goodreads titles may be available to borrow from the Internet Archive’s Controlled Digital Lending program via openlibrary.org and to help patrons organize all their books in one place. We didn’t make a fuss about this feature at the time, because we knew patrons have a lot of options. But things can change quickly and we want patrons to be able to make that decision for themselves.

For those who may not have known, Amazon’s Goodreads website provides an option for downloading/exporting a list of books from one’s bookshelves. You may find instructions on this Goodreads export process here. Open Library’s Goodreads importer enables patrons to take this exported dump of their Goodreads bookshelves and automatically add matching titles to their Open Library Reading Logs.

The Goodreads import feature from https://openlibrary.org/account/import

Known issues. Currently, Open Library’s Goodreads Importer only works for (a) titles that are in the Open Library catalog and (b) which are new enough to have ISBNs. Our staff and community are committed to continuing to improve our catalog to include more titles (we added more than 1M titles this year) and we plan to improve our importer to support other ID types like OCLC and LOC.
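To make the ISBN limitation concrete, here is a minimal Python sketch of how one might check an exported file before importing it. It is not Open Library’s importer. It assumes the export is a CSV with `Title`, `ISBN` and `ISBN13` columns, and that some ISBN values may be wrapped like `="0439023483"`; those column names and the wrapper format are assumptions, and `split_by_isbn` and `clean_isbn` are hypothetical helper names.

```python
import csv
import io


def clean_isbn(value):
    """Strip an optional ="..." wrapper and hyphens from an ISBN cell."""
    value = (value or "").strip()
    if value.startswith("="):
        value = value[1:]
    return value.strip('"').replace("-", "")


def split_by_isbn(export_csv):
    """Split exported rows into (title, isbn) pairs and titles lacking an ISBN.

    Assumes columns named Title, ISBN and ISBN13 (an assumption about the
    export format, not something documented in this post).
    """
    with_isbn, without_isbn = [], []
    for row in csv.DictReader(io.StringIO(export_csv)):
        # Prefer the 13-digit ISBN, fall back to the 10-digit one.
        isbn = clean_isbn(row.get("ISBN13")) or clean_isbn(row.get("ISBN"))
        if isbn:
            with_isbn.append((row["Title"], isbn))
        else:
            without_isbn.append(row["Title"])
    return with_isbn, without_isbn


# Made-up sample rows, for illustration only.
SAMPLE = (
    "Title,Author,ISBN,ISBN13\n"
    'Sample Book One,Author A,="0439023483",="9780439023481"\n'
    'Old Pamphlet,Author B,="",=""\n'
    "Sample Book Two,Author C,,9780140449136\n"
)

importable, skipped = split_by_isbn(SAMPLE)
```

Titles that end up in `skipped` are the ones an ISBN-based importer could not match, which is the situation described under known issue (b).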

APIs & Data

Developers and book lovers who have been relying on Amazon’s Goodreads APIs are not out of luck. There are several wonderful services, many of them open-source, including Open Library, which offer free APIs:

  1. Wikidata.org (by the same group who brought us Wikipedia) is a treasure trove of metadata on Authors and Books. Open Library gratefully leverages this powerful resource to enrich our pages.
  2. Inventaire.io is a wonderful service which uses Wikidata and Open Library data (API: api.inventaire.io)
  3. Bookbrainz.org (by the group who runs Musicbrainz) is an up-and-coming catalog of books
  4. WorldCat by OCLC offers various metadata APIs

Did we miss any? Please let us know! We’d love to work together, build stronger integrations with, and support other book-loving services.

Open Library’s APIs. And of course, Open Library has a free, open, Book API which spans nearly 30 million books.

Bulk Data. If you need access to all our data, Open Library releases a free monthly bulk data dump of Authors, Books, and more.

Spoiler: Everything on Open Library is an API!

One of my favorite parts of Open Library is that practically every page is an API. All that is required is adding “.json” to the end. Here are some examples:

Search
https://openlibrary.org/search?q=lord+of+the+rings is our search page for humans…
https://openlibrary.org/search.json?q=lord+of+the+rings is our Search API!

Books
https://openlibrary.org/books/OL25929351M/Harry_Potter_and_the_Methods_of_Rationality is the human page for Harry Potter and the Methods of Rationality…
https://openlibrary.org/books/OL25929351M.json is its API!

Authors
https://openlibrary.org/authors/OL2965893A/Rik_Roots is a human readable author page…
https://openlibrary.org/authors/OL2965893A.json and here is the API!
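The pattern in the examples above can be sketched as a small helper. This is a minimal illustration in Python, not an official client: the names `to_api_url` and `fetch_json` are hypothetical, and the rule of dropping the title slug from book and author pages is inferred only from the example URLs shown here.

```python
import json
from urllib.parse import urlsplit, urlunsplit
from urllib.request import urlopen


def to_api_url(page_url):
    """Turn a human-facing Open Library URL into its ".json" counterpart."""
    parts = urlsplit(page_url)
    segments = [s for s in parts.path.split("/") if s]
    # Book and author pages look like /books/<id>/<title-slug>;
    # the examples above keep only /books/<id> before adding ".json".
    if len(segments) >= 2 and segments[0] in {"books", "authors"}:
        segments = segments[:2]
    path = "/" + "/".join(segments)
    if not path.endswith(".json"):
        path += ".json"
    # The query string (e.g. ?q=...) is carried over untouched.
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


def fetch_json(page_url, timeout=10):
    """Fetch and decode the JSON version of a page (performs a network request)."""
    with urlopen(to_api_url(page_url), timeout=timeout) as response:
        return json.load(response)
```

For instance, `to_api_url("https://openlibrary.org/authors/OL2965893A/Rik_Roots")` yields the author API URL listed above, and `fetch_json` would then download it.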

Did We Mention: Full-text Search over 4M Books?

Major hat tip to the Internet Archive’s Giovanni Damiola for this one: Folks may also appreciate the ability to full-text search across 4M of the Internet Archive’s books (https://blog.openlibrary.org/2018/07/14/search-full-text-within-4m-books) on Open Library:

You can try it directly here:
http://openlibrary.org/search/inside?q=thanks%20for%20all%20the%20fish

As per usual, nearly all Open Library urls are themselves APIs, e.g.:
http://openlibrary.org/search/inside.json?q=thanks%20for%20all%20the%20fish
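If you would rather start from a plain phrase than a hand-encoded URL, the two search endpoints shown in this post can be built as follows. This is a small sketch using only Python’s standard library; the function names are hypothetical, and only the `q` parameter seen in the URLs above is used.

```python
from urllib.parse import quote, quote_plus

BASE_URL = "http://openlibrary.org"


def catalog_search_url(phrase):
    """Build a Search API URL; spaces are encoded as "+" as in the example above."""
    return f"{BASE_URL}/search.json?q={quote_plus(phrase)}"


def full_text_search_url(phrase):
    """Build a full-text search API URL; spaces are encoded as "%20"."""
    return f"{BASE_URL}/search/inside.json?q={quote(phrase)}"


example_url = full_text_search_url("thanks for all the fish")
```

Either URL can then be fetched with any HTTP client and decoded as JSON.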

Get Involved

Questions? Open Library is a free, open-source, nonprofit project run by the Internet Archive. We do our development transparently in public (here’s our code) and our community spanning more than 40 volunteers meets every week, Tuesday @ 11:30am Pacific. Please contact us to join our call and participate in the process.

Bugs? If something isn’t working as expected, please let us know by opening an issue or joining our weekly community calls.

Want to share thanks? Please follow up on Twitter: https://twitter.com/openlibrary and let us know how you’re using our APIs!

Thank you

A special thank you to our lead developers Drini Cami, Chris Clauss, and one of our lead volunteer engineers, Aaron, for spending their weekend helping fix a Python 3 bug which was temporarily preventing Goodreads imports from succeeding.

A Decentralized Future

The Internet Archive has a history cultivating and supporting the decentralized web. We operate a decentralized version of archive.org and host regular meetups and summits to galvanize the distributed web community.

In the future, we can imagine a world where no single website controls all of your data, but rather patrons can participate in a decentralized, distributed network. You may be interested to try Bookwyrm, an open-source decentralized project by Mouse, former engineer on the Internet Archive’s Archive-It team.

Posted in Uncategorized | 1 Response

On Bookstores, Libraries & Archives in the Digital Age

The following was a guest post by Brewster Kahle on Against The Grain (ATG) – Linking Publishers, Vendors, & Librarians

See the original article here on ATG’s website

By: Brewster Kahle, Founder & Digital Librarian, Internet Archive

Back in 2006, I was honored to give a keynote at the meeting of the Society of American Archivists, when the president of the Society presented me with a framed blown-up letter “S.” This was an inside joke about the Internet Archive being named in the singular, Archive, rather than the plural Archives. Of course, he was right, as I should have known all along. The Internet Archive had long since grown out of being an “archive of the Internet” (a singular collection, say of web pages) to being “archives on the Internet,” plural. My evolving understanding of these different names might help focus a discussion that has become blurry in our digital times: the difference between the roles of publishers, bookstores, libraries, archives, and museums. These organizations and institutions have evolved with different success criteria, not just because of the shifting physical manifestation of knowledge over time, but because of the different roles each group plays in a functioning society. For the moment, let’s take the concepts of Library and Archive.

The traditional definition of a library is that it is made up of published materials, while an archive is made up of unpublished materials. Archives play an important function that must be maintained—we give frightfully little attention to collections of unpublished works in the digital age. Think of all the drafts of books that have disappeared once we started to write with word processors and kept the files on fragile computer floppies and disks. Think of all the videotapes of lectures that are thrown out or were never recorded in the first place. 

Bookstores: The Thrill of the Hunt

Let’s try another approach to understanding distinctions between bookstores, libraries and archives. When I was in my 20’s living in Boston—before Amazon.com and before the World Wide Web (but during the early Internet)—new and used bookstores were everywhere. I thought of them as catering to the specialized interests of their customers: small, selective, and only offering books that might sell and be taken away, with enough profit margin to keep the store in business. I loved them. I especially liked the used bookstore owners—they could peer into my soul (and into my wallet!) to find the right book for me. The most enjoyable aspect of the bookstore was the hunt—I arrived with a tiny sheet of paper in my wallet with a list of the books I wanted, would bring it out and ask the used bookstore owners if I might go home with a bargain. I rarely had the money to buy new books for myself, but I would give new books as gifts. While I knew it was okay to stay for awhile in the bookstore just reading, I always knew the game.

Libraries: Offering Conversations not Answers

The libraries that I used in Boston (MIT Libraries, Harvard Libraries, the Boston Public Library) were very different. I knew of the private Boston Athenæum but I was not a member, so I could not enter. Libraries for me seemed infinite, but still tailored to individual interests. They had what was needed for you to explore and if they did not have it, the reference librarian would proudly proclaim: “We can get it for you!” I loved interlibrary loans, not so much in practice, because they were slow, but because they gave you a glimpse of a network of institutions sharing what they treasured with anyone curious enough to want to know more. It was a dream straight out of Borges’ imagination (if you have not read Borges’ short stories, they are not to be missed, and they are short. I recommend you write them on the little slip of paper you keep in your wallet.) I couldn’t afford to own many of the books I wanted, so it turned off that acquisitive impulse in me. But the libraries allowed me to read anything, old and new. I found I consumed library books very differently. I rarely even brought a book from the shelf to a table; I would stand, browse, read, learn and search in the aisles. Dipping in here and there. The card catalog got me to the right section and from there I learned as I explored.

Libraries were there to spark my own ideas. The library did not set out to tell a story as a museum would. It was for me to find stories, to create connections, have my own ideas by putting things together. I would come to the library with a question and end up with ideas.  Rarely were these facts or statistics—but rather new points of view. Old books, historical newspapers, even the collection of reference books all illustrated points of view that were important to the times and subject matter. I was able to learn from others who may have been far away or long deceased. Libraries presented me with a conversation, not an answer. Good libraries cause conversations in your head with many writers. These writers, those librarians, challenged me to be different, to be better. 

Staying for hours in a library was not an annoyance for the librarians—it was the point. Yes, you could check books out of the library, and I would, but mostly I did my work in the library—a few pages here, a few pages there—a stack of books in a carrel with index cards tucked into them and with lots of handwritten notes (uh, no laptops yet).

But libraries were still specialized. To learn about draft resisters during the Vietnam War, I needed access to a law library. MIT did not have a law collection and this was before Lexis/Nexis and Westlaw. I needed to get to the volumes of case law of the United States.  Harvard, up the road, had one of the great law libraries, but as an MIT student, I could not get in. My MIT professor lent me his ID that fortunately did not include a photo, so I could sneak in with that. I spent hours in the basement of Harvard’s Law Library reading about the cases of conscientious objectors and others. 

But why was this library of law books not available to everyone? It stung me. It did not seem right. 

A few years later I would apply to library school at Simmons College to figure out how to build a digital library system that would be closer to the carved words over the Boston Public Library’s door in Copley Square:  “Free to All.”  

Archives: A Wonderful Place for Singular Obsessions

When I quizzed the archivist at MIT, she explained what she did and how the MIT Archives worked. I loved the idea, but did not spend any time there—it was not organized for the busy undergraduate. The MIT Library was organized for easy access; the MIT Archives included complete collections of papers, notes, ephemera from others, often professors. It struck me that the archives were collections of collections. Each collection faithfully preserved and annotated.  I think of them as having advertisements on them, beckoning the researcher who wants to dive into the materials in the archive and the mindset of the collector.

So in this formulation, an archive is a collection, archives are collections of collections.  Archivists are presented with collections, usually donations, but sometimes there is some money involved to preserve and catalog another’s life work. Personally, I appreciate almost any evidence of obsession—it can drive toward singular accomplishments. Archives often reveal such singular obsessions. But not all collections are archived, as it is an expensive process.

The cost of archiving collections is changing, especially with digital materials, as is cataloging and searching those collections. But it is still expensive. When the Internet Archive takes on a physical collection, say of records, or old repair manuals, or materials from an art group, we have to weigh the costs and the potential benefits to researchers in the future. 

Archives take the long view. One hundred years from now is not an endpoint, it may be the first time a collection really comes back to light.

Digital Libraries: A Memex Dream, a Global Brain

So when I helped start the Internet Archive, we wanted to build a digital library: a “complete enough” collection, and “organized enough” that everything would be there and findable. A Universal Library. A Library of Alexandria for the digital age. Fulfilling the memex dream of Vannevar Bush (do read “As We May Think”), of Ted Nelson’s Xanadu, of Tim Berners-Lee’s World Wide Web, of Danny Hillis’ Thinking Machine, Raj Reddy’s Universal Access to All Knowledge, and Peter Russell’s Global Brain.

Could we be smarter by having people, the library, networks, and computers all work together?  That is the dream I signed on to.  I dreamed of starting with a collection—an Archive, an Internet Archive. This grew to be  a collection of collections: Archives. Then a critical mass of knowledge complete enough to inform citizens worldwide: a Digital Library. A library accessible by anyone connected to the Internet, “Free to All.”

About the Author: Brewster Kahle, Founder & Digital Librarian, Internet Archive

Brewster Kahle

A passionate advocate for public Internet access and a successful entrepreneur, Brewster Kahle has spent his career intent on a singular focus: providing Universal Access to All Knowledge. He is the founder and Digital Librarian of the Internet Archive, one of the largest digital libraries in the world, which serves more than a million patrons each day. Creator of the Wayback Machine and lender of millions of digitized books, the Internet Archive works with more than 800 library and university partners to create a free digital library, accessible to all.

Soon after graduating from the Massachusetts Institute of Technology where he studied artificial intelligence, Kahle helped found the company Thinking Machines, a parallel supercomputer maker. He is an Internet pioneer, creating the Internet’s first publishing system called Wide Area Information Server (WAIS). In 1996, Kahle co-founded Alexa Internet, with technology that helps catalog the Web, selling it to Amazon.com in 1999.  Elected to the Internet Hall of Fame, Kahle is also a Fellow of the American Academy of Arts and Sciences, a member of the National Academy of Engineering, and holds honorary library doctorates from Simmons College and University of Alberta.

Posted in Discussion, Librarianship, Uncategorized | Comments closed

Amplifying the voices behind books

Exploring how Open Library uses author data to help readers move from imagination to impact

By Nick Norman, Edited by Mek & Drini

Image Source: Pexels / Pixabay from popsugar

According to René Descartes, a creative mathematician, “The reading of all good books is like a conversation with the finest [people] of past centuries.” If that’s true, then who are some of the people you’re talking to?

If you’re not sure how to answer that question, you’ll definitely appreciate the ‘Author Stats’ feature developed by Open Library.

A deep dive into author stats

Author stats give readers clear insights about their favorite authors that go much deeper than the front cover: such as birthplace, gender, works by time, ethnicity, and country of citizenship. These bits and pieces of knowledge about authors can empower readers in some dynamic ways. But how exactly?

To answer that question, consider a reader who’s passionate about the topic of cultural diversity. However, after the reader examines their personalized author stats, they realize that their reading history lacks diversity. This doesn’t mean the reader isn’t passionate about cultural diversity; rather, author stats empower the reader to pinpoint specific stats that can be diversified.

Take a moment … or a day, and think about all the books you’ve read — just in the last year or as far back as you can. What if you could align the pages of each of those books with something meaningful … something that matters? What if each time you cracked open a book, the voices inside could point you to places filled with hope and opportunity?

According to Drini Cami, Open Library’s lead developer behind Author Stats, “These stats let readers determine where the voices they read are coming from.” Drini continues, “A book can be both like a conversation as well as a journey.” He also says, “Statistics related to the authors might help provide readers with feedback as to where the voices they are listening to are coming from, and hopefully encourage the reading of books from a wider variety of perspectives.” Take a moment to let that sink in.

Data with the power to change

While Open Library’s author stats can show author-related demographics, those same stats can do a lot more than that. Drini Cami went on to say that “Author stats can help readers intelligently alter their behavior (if they wish to).” It’s a profound statement that Mark Twain, one of the best writers in American history, might even shout from the rooftops.

Broad, wholesome, charitable views of [people] … cannot be acquired by vegetating in one little corner of the earth all one’s lifetime. — Mark Twain

In the eyes of Drini Cami and Mark Twain, books are like miniature time machines that have the power to launch readers into new spaces while changing their behaviors at the same time. For it is only when a reader steps out of their corner of the earth that they can step forward towards becoming a better person — for the entire world.

Connecting two worlds of data

Open Library has gone the extra mile to provide author demographic data that some readers may not realize is available. It started with Open Library’s commitment to providing its readers with what Drini Cami describes as “clean, organized, structured, queryable data.” Simply put, readers can trust that Open Library’s data can be used to provide its audiences with maximum value. That raises the question: where is all that ‘value’ coming from?

Drini Cami calls it “linked data”. In not so complex terms, you may think of linked data as being two or more storage sheds packed with data. When these storage sheds are connected, well… that’s when the magic happens. For Open Library, that magic starts at the link between Wikidata and Open Library knowledge bases.

Wikidata, a non-profit community-powered project run by Wikimedia, the same team which brought us Wikipedia, is a “free and open knowledge base that can be read and edited by both humans and machines”. It’s like Wikipedia except for storing bite-sized encyclopedic data and facts instead of articles. If you look closely, you may even find some of Wikidata’s data being leveraged within Wikipedia articles.

Wikidata is where Open Library gets its author demographic data from. This is possible because the entries on Wikidata often include links to source material such as books, authors, learning materials, e-journals, and even to other knowledge bases like Open Library’s. Because of these links, Open Library is able to share its data with Wikidata and often get back detailed information and structured data, such as author demographics, in return.
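As a rough illustration of the linking idea, the sketch below joins two small in-memory tables on a shared identifier and then counts authors per country. Everything in it is an assumption made for illustration: the records, the field names, and the identifiers are invented placeholders, not the actual schema of Open Library or Wikidata.

```python
from collections import Counter

# Hypothetical Open Library author records, keyed by an author identifier.
open_library_authors = {
    "OL_EXAMPLE_1A": {"name": "Author One"},
    "OL_EXAMPLE_2A": {"name": "Author Two"},
    "OL_EXAMPLE_3A": {"name": "Author Three"},
}

# Hypothetical Wikidata-style entries. Each one carries a link back to an
# Open Library identifier, which is what makes the join possible.
wikidata_entries = [
    {"id": "Q_EXAMPLE_1", "open_library_id": "OL_EXAMPLE_1A", "country": "Italy"},
    {"id": "Q_EXAMPLE_2", "open_library_id": "OL_EXAMPLE_2A", "country": "Ghana"},
]


def link_authors(authors, entries):
    """Attach the country field to every author that has a linked entry."""
    by_ol_id = {entry["open_library_id"]: entry for entry in entries}
    linked = {}
    for ol_id, author in authors.items():
        entry = by_ol_id.get(ol_id)
        # Authors without a link keep their record, with no country attached.
        linked[ol_id] = {**author, "country": entry["country"] if entry else None}
    return linked


def country_stats(linked):
    """Count authors per country, grouping unlinked authors as 'unknown'."""
    return Counter(author["country"] or "unknown" for author in linked.values())


linked = link_authors(open_library_authors, wikidata_entries)
stats = country_stats(linked)
```

The third author has no matching entry, so it ends up in the "unknown" bucket; the more links exist between the two knowledge bases, the smaller that bucket becomes.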

Wrangling in the Data

Linking up services like Wikidata and Open Library doesn’t happen automatically. It requires the hard work of “Metadata Wranglers”. That’s where Charles Horn, the lead Data Engineer at Open Library, comes in. Without his work, author stats would not be possible.

Charles Horn works closely with Drini Cami and also the team at Wikidata to connect book and author resources on Open Library with the data kept inside Wikidata. By writing clever bots and scripts, Charles and Drini are able to make tens of thousands of connections at scale. To put it simply, as both Open Library and Wikidata grow, their resources and data will become better connected and more accurate. 

Thanks to the help of “Metadata Wranglers”, Open Library users will always have the smartest results — right at their fingertips. 

It’s in a book …

Once upon a time, ten-time Grammy Award winner Chaka Khan greeted television viewers with her bright voice on the once-popular book reading program, Reading Rainbow. In her words, she sang … “Friends to know, and ways to grow, a Reading Rainbow. I can be anything. Take a look, it’s in a book …”

Thanks to Open Library’s author stats, not only do readers have the power to “take a look” into books, they can see further, and truly change what they see.

Try browsing your author stats and consider following Open Library on Twitter.

The “My Reading Stats” option may be found under the “My Books” drop down menu within the main site’s top navigation.

What did you learn about your favorite authors? Please share in the comments below.

Posted in Community, Cultural Resources, Data

Giacomo Cignoni: My Internship at the Internet Archive

This summer, Open Library and the Internet Archive took part in Google Summer of Code (GSoC), a Google initiative to help students gain coding experience by contributing to open source projects. I was lucky enough to mentor Giacomo while he worked on improving our BookReader experience and infrastructure. We have invited Giacomo to write a blog post to share some of the wonderful work he has done and what he learned. It was a pleasure working with you, Giacomo, and we all wish you the best of luck with the rest of your studies! – Drini


Hi, I am Giacomo Cignoni, a second-year computer science student from Italy. I submitted my 2020 Google Summer of Code (GSoC) project to work with the Internet Archive and I was selected for it. In this blog post, I want to tell you about my experience and my accomplishments working this summer on BookReader, Internet Archive’s open source book reading web application.

The BookReader features I most enjoyed working on are page filters (which include “dark mode”) and the text selection layer for certain public domain books. Both were challenging, and both had a great impact on the BookReader user experience. The first permits turning white-background, black-text pages into black-background, white-text ones, and the second allows text to be selected and copied directly from the page images (currently in internal testing).

Short summary of implemented features:

  • End-to-end testing (search, autoplay, right-to-left books)
  • Generic book from Internet Archive demo
  • Mobile BookReader table of contents
  • Checkbox for filters on book pages (including dark mode)
  • Text selection layer plugin for public domain books
  • Bug fixes for page flipping
  • Bug fix for using high resolution book images

First approach to GSoC experience

Once I received the news that I had been selected for GSoC with Internet Archive for my BookReader project, I was really excited, as it was the beginning of a new experience for me. For the same reason, I will not hide that I was a little bit nervous because it was my first internship-like experience. Fortunately, even from the start, my mentor Drini and also Mek were supportive and also ready to offer help. Moreover, the fact that I was already familiar with BookReader was helpful, as I had already used it (and even modified it a little bit) for a personal project.

For most of May, starting on the 6th, the day of the GSoC selection, I mainly focused on getting to know the other members of the UX team at Internet Archive, whom I would be working with for the rest of the summer, and on defining a more precise roadmap of my future work with my mentor, as my proposed project was open to any improvements for BookReader.

End-to-end testing

The first tasks I worked on, as stated in the project, were about end-to-end testing for BookReader. I learned about TestCafe, the tool that was to be used, and my first real task was to explore and remove some old QUnit tests (#308). Then I started writing end-to-end tests for the search feature in BookReader, both for desktop (#314) and mobile (#322). Lastly, I fixed the existing autoplay end-to-end test (#344), which was causing problems. I also prepared end-to-end tests for right-to-left books (#350), but they were not merged immediately because they needed a feature that I would implement later: a system to choose different books from the IA servers to be displayed by specifying the book id in the URL.

This work on testing (which lasted until around the 20th of June) was really helpful at the beginning, as it allowed me to gain confidence with the codebase and with JavaScript ES6 without immediately attempting harder tasks. The frequent meetings with my mentor and other members of the team made me really feel part of the workplace.

Working on the source code

The table of contents panel in BookReader mobile

My first experience working on core BookReader source code was during the Internet Archive hackathon on May 30th when, with the help of my mentor, I created the first draft of the table of contents panel for mobile BookReader. I resumed work on this feature in July, refining it until it was released (#351). I then worked on a checkbox to apply different filters to the book page images, still on mobile BookReader (#342), which includes a sort of “dark mode”. This feature was probably the one I most enjoyed working on: it was challenging but not too difficult, it involved some planning rather than being purely technical, and it received great appreciation from users.

Page filters for BookReader mobile let you read in a “dark mode”
https://twitter.com/openlibrary/status/1280184861957828608
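To make the “dark mode” idea concrete, here is a minimal sketch of inverting 8-bit grayscale pixels, so that white (255) becomes black (0) and vice versa. It only illustrates the arithmetic. Treating the filter as a plain inversion is an assumption, and the function name is hypothetical; this is not BookReader’s code.

```python
def invert_page(pixels):
    """Invert 8-bit grayscale pixels, given as rows of ints in 0..255."""
    return [[255 - value for value in row] for row in pixels]


# A tiny 2x3 "page": white background (255) with one black stroke (0).
page = [
    [255, 0, 255],
    [255, 0, 255],
]
dark_page = invert_page(page)
```

Applying the same function twice returns the original page, which is why a filter like this can be toggled with a simple checkbox.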

Then I worked on the generic demo feature: a demo for BookReader which allows you to choose a book from the Internet Archive servers to be displayed, simply by adding the book id to the URL as a parameter (#356). This allowed the right-to-left e2e test to be merged and proved useful for manually testing the text selection plugin. In this period I also fixed two page flipping issues. One was more critical: when flipping pages in quick succession, the pages started turning back and forth randomly (#386). The other was less urgent, but a user had specifically pointed it out: in an old BookReader demo it was impossible to turn pages at all (#383). Another issue I solved was BookReader not correctly displaying high resolution images on high resolution displays (#378).
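The generic demo’s core idea, reading a book id from a URL parameter, can be sketched as follows. This is only an illustration under stated assumptions: the parameter name `ocaid`, the fallback identifier, and the example URLs are all hypothetical, and the real demo runs in the browser rather than in Python.

```python
from urllib.parse import parse_qs, urlparse

# Hypothetical fallback used when the URL names no book.
DEFAULT_BOOK_ID = "example-default-book"


def book_id_from_url(url, param="ocaid"):
    """Return the book id given in the URL's query string, or the default."""
    query = parse_qs(urlparse(url).query)
    values = query.get(param)
    return values[0] if values else DEFAULT_BOOK_ID


chosen = book_id_from_url("https://example.org/demo.html?ocaid=some-book-id")
fallback = book_id_from_url("https://example.org/demo.html")
```

Because the book is chosen by the URL alone, a test can point the same demo page at a right-to-left book or any other special case without needing a dedicated demo for each one.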

Open source project experience

One aspect of my GSoC that I really enjoyed is the all-around experience of working on an open source project. This includes leaving more approachable tasks for occasional community contributors to take on, and helping them out. I also found it interesting to work with other members of the team aside from my mentor, both for technical matters and for help with UI design and feedback about the user experience: I always liked having more points of view on my work. Moreover, direct feedback from users, who showed appreciation for the newly implemented features (such as BookReader “dark mode”), was very motivating and pushed me to do better in the following tasks.

Text selection layer

The normally invisible text layer shown red here for debugging

The biggest feature of my GSoC was implementing the ability to select text directly on the page image from BookReader for public domain books, in order to copy and paste it elsewhere (#367). This was made possible because Internet Archive books have information about each word and its placement on the page, which is collected by doing OCR. To implement this feature we decided to use an invisible text layer placed on top of the page image, with words correctly positioned and scaled. This made it possible to use the browser’s text selection system instead of creating a new one. The text layer on top of the page was implemented using an SVG element, with subelements for each paragraph and word on the page. Using SVG instead of normal HTML text elements made it a lot easier to overcome most of the problems we expected to find regarding the correct placement and scaling of words in the layer.
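The layer described above can be sketched roughly as follows: given OCR word boxes, emit an SVG overlay whose coordinate system matches the page image. This is a minimal sketch, not the plugin’s implementation. The input format, the function name, and the choice of one `<text>` per paragraph with one `<tspan>` per word are assumptions made for illustration; `textLength` and `lengthAdjust` are standard SVG attributes.

```python
import xml.etree.ElementTree as ET


def build_text_layer(page_width, page_height, paragraphs):
    """Build an SVG overlay from OCR word boxes.

    `paragraphs` is a list of paragraphs, each a list of word dicts with the
    word's text and its box (left, top, right, bottom) in page-image pixels.
    """
    svg = ET.Element("svg", {
        "xmlns": "http://www.w3.org/2000/svg",
        # The viewBox uses page coordinates, so the layer scales with the image.
        "viewBox": f"0 0 {page_width} {page_height}",
    })
    for words in paragraphs:
        # Transparent fill, so the page image underneath stays visible.
        text = ET.SubElement(svg, "text", {"fill-opacity": "0"})
        for word in words:
            span = ET.SubElement(text, "tspan", {
                "x": str(word["left"]),
                "y": str(word["bottom"]),
                # Stretch or squeeze the glyphs to fill the OCR box width.
                "textLength": str(word["right"] - word["left"]),
                "lengthAdjust": "spacingAndGlyphs",
                "font-size": str(word["bottom"] - word["top"]),
            })
            span.text = word["text"]
    return ET.tostring(svg, encoding="unicode")


sample_paragraphs = [[
    {"text": "Hello", "left": 10, "top": 20, "right": 50, "bottom": 32},
    {"text": "world", "left": 56, "top": 20, "right": 98, "bottom": 32},
]]
svg_markup = build_text_layer(100, 200, sample_paragraphs)
```

Letting SVG stretch each word to its box width sidesteps the need to match the scanned book’s font exactly, which is the kind of placement and scaling problem the paragraph above mentions.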

I worked sporadically on this feature from the start of July, which led to a workable demo by the first day of August. The rest of August was spent refining the feature to make it production-ready. This included refining word placement in the layer, adding unit tests, adding support for more browsers, refactoring some functions, making the experience more fluid, and making the copied text accurate with respect to newlines and spaces. The most challenging part was probably integrating the text selection actions into the two-page view of BookReader without disrupting click-to-flip-page and other functionality tied to mouse-click events.
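The point about newlines and spaces on copy can be illustrated with a small sketch: words on the same line are joined with single spaces, and lines are joined with newlines. The input shape (a list of lines, each a list of words) and the function name are assumptions for illustration only.

```python
def selection_to_text(lines):
    """Join selected words: spaces between words, newlines between lines."""
    return "\n".join(" ".join(words) for words in lines)


copied = selection_to_text([["It", "was", "the"], ["best", "of", "times"]])
```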

This feature is currently in internal testing, and scheduled for release in the next few weeks.

The text selection experience

Conclusions

Overall, I was extremely satisfied with my GSoC at the Internet Archive. It was a great opportunity for me to learn new things. I became much more fluent in JavaScript and CSS, thanks both to my mentor and to using these languages in practice while coding. I learnt a lot about working on an open source project, but the part I probably found most interesting was attending and participating in the decision making processes, even about projects I was not involved in. It was also interesting for me to apply concepts I had studied on a more theoretical level at university in a real workplace environment.

To sum things up, the ability to work on something I liked that had an impact on users and the ability to learn useful things for my personal development really made this experience worthwhile for me. I would 100% recommend doing a GSoC at the Internet Archive!

Posted in BookReader, Community, Google Summer of Code (GSoC), Open Source