Raising Crypto for the Greater Good

Open Library is raising 50 Ethereum (ETH) to get books our readers love! Chip in and help us democratize our bookshelves for all.


If you donate now, WeTrust Spring will match your individual ETH donation 100% (until they’ve hit $100k), through Giving Tuesday, Nov. 27!


In 2006, Aaron Swartz founded Open Library with the vision of creating “one web page for every book ever published”. Over the last twelve years, a lot has changed. Open Library has matured not only into a book catalog spanning 25M editions and 16M unique works, but into a library initiative recognized by the state of California, under the auspices of the Internet Archive. Today, Open Library makes over 3M of Internet Archive’s digital books (2.3M public access, 800k modern borrowable) readable directly from your browser. Last year, over 1.3M books were lent to readers from openlibrary.org.

And we’re just getting started. The dream of an Open Library doesn’t end at cataloging the world’s books. Together, we have the opportunity to create a new type of library which works for its readers. To be a library of the people, by the people, and for the people. A library which democratizes the books on its shelves and empowers its readers to pursue knowledge and fuel their imaginations. But how do we get there?

As a first step, we needed some way for patrons to let us know what books they wanted in their library. In January of this year, Open Library announced a new Reading Log feature which allows readers to keep track of which books they’re reading and which books they wish we had available. Over the last 8 months, a quarter million users have been anonymously helping us identify over 400k books most desired by our community. Next comes the hard part: how can we get all these books for our readers? An answer came to us directly from one of Aaron’s early presentations on Open Library: crowdfunding and direct democracy.

What if our patrons could help us purchase a collection of books for their library and make them available to the world through our lending library? What if, for starters, we crowdfunded just a single pallet of some of our most requested books, to be purchased and shipped in bulk, and then made lendable to an international audience on openlibrary.org? Something like a global, digital book-drive. And what better way for Open Library to accept donations than with cryptocurrency — decentralized digital currency?

Thanks to the help of a partner, we now have this chance. Starting in November, Open Library is fortunate to be one of a select group of nonprofits to be listed on WeTrust Spring, a platform whose motto is, “Raising Crypto for the Greater Good” and which helps nonprofits accept donations for their causes in cryptocurrency. Through this initiative, Open Library aims to raise 50 ETH (~$10,000 USD) which it can use to unlock a combination of books from Internet Archive’s wishlist and Open Library’s most requested works. We plan to release a blog post about our progress each month in 2019.

Book lovers, help us democratize Open Library for all:

Donate ETH now* or Learn more

*Have your individual ETH donation doubled by WeTrust Spring (until they’ve hit $100k), through Giving Tuesday, Nov. 27!

Don’t have Ethereum? You can also donate using credit card.

Posted in Fundraising, News | Leave a comment

Google Summer of Code 2018

This is Internet Archive’s second year participating in Google Summer of Code, but for Open Library, it’s an exciting first. Open Library’s mission is to create “a web page for every book”, and this summer we’re fortunate to team up with Salman Shah to advance this mission. Salman’s Google Summer of Code roadmap targets two core needs of openlibrary.org: modernizing and increasing the coverage of its book catalog, and improving website reliability.

Bots & Open Library

Every day, users contribute thousands of edits and improvements to Open Library’s book catalog. Anyone with an Open Library account can add a book record to the catalog if it doesn’t already exist. There’s also a great walkthrough on adding or editing data for existing book pages. Making edits manually can be tedious and so the majority of new book pages on Open Library are automatically created by Bots which have been programmed to perform specific tasks by our amazing community of developers and digital librarians. This month, Salman programmed two new bots. The first one is called ia-wishlist-bot. It makes sure an Open Library catalog record exists for each of the 1M books on the Internet Archive’s Wishlist, compiled by Chris Freeland and Matt Miller. The second bot, named onix-bot, takes book feeds (in a special format called ONIX) from our partners (e.g. Cory McCloud at Bibliometa), and makes sure the books exist in our catalog.

Importing Internet Archive Wishlist

Earlier this year, as part of the Open Libraries initiative, Chris Freeland, with the help of Matt Miller and others, compiled a Wishlist of hundreds of thousands of book recommendations for the Internet Archive to digitize:

“Our goal is to bring 4 million more books online, so that all digital learners have access to a great digital library on par with a major metropolitan public library system. We know we won’t be able to make this vision a reality alone, which is why we’re working with libraries, authors, and publishers to build a collaborative digital collection accessible to any library in the country.”

In support of this mission, the Open Library team decided it would be helpful if the metadata for these books were imported into the openlibrary.org catalog. 

Importing thousands of books in bulk into Open Library’s catalog presents several challenges. First, many precautions have to be taken to avoid adding duplicate book and author records to the database. To avoid the creation of duplicate records, Salman used the Open Library Book API to check for existing works by ISBN10, ISBN13, and OCLC identifiers. For this project, we were specifically interested in books which had no other editions on Open Library, so any time we noticed an existing edition for the same work, we skipped it. A second check used the Open Library Search API to check for any existing editions with a similar title and author. If there’s a plausible match, we don’t add it to Open Library. This process leaves us with a much shorter list of presumably unique works to add to Open Library.
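The two checks described above can be sketched roughly as follows. This is a minimal illustration, not the bot's actual code: the record keys (`isbn10`, `isbn13`, `oclc`, `title`, `author`) and the injected `fetch_json` callable are assumptions made for the example, and treating any search hit as a "plausible match" is a deliberate simplification.

```python
from urllib.parse import urlencode

BOOKS_API = "https://openlibrary.org/api/books"
SEARCH_API = "https://openlibrary.org/search.json"


def books_api_url(isbn10=None, isbn13=None, oclc=None):
    """Build a Books API URL that looks up every identifier we have."""
    bibkeys = [f"ISBN:{value}" for value in (isbn10, isbn13) if value]
    if oclc:
        bibkeys.append(f"OCLC:{oclc}")
    query = urlencode({"bibkeys": ",".join(bibkeys), "format": "json", "jscmd": "data"})
    return f"{BOOKS_API}?{query}"


def search_api_url(title, author):
    """Build a Search API URL for a title and author query."""
    return f"{SEARCH_API}?{urlencode({'title': title, 'author': author})}"


def is_probable_duplicate(record, fetch_json):
    """Return True if the record appears to exist on Open Library already.

    record: dict with optional isbn10, isbn13, oclc, title, author keys
            (hypothetical field names chosen for this sketch).
    fetch_json: callable taking a URL and returning parsed JSON, injected
                so the logic can be exercised without network access.
    """
    ids = {key: record.get(key) for key in ("isbn10", "isbn13", "oclc")}
    # First check: identifier lookup. A non-empty response means a match.
    if any(ids.values()) and fetch_json(books_api_url(**ids)):
        return True
    # Second check: title and author search. Any hit counts as plausible.
    if record.get("title") and record.get("author"):
        results = fetch_json(search_api_url(record["title"], record["author"]))
        if results.get("numFound", 0) > 0:
            return True
    return False
```

Passing the fetch function in as a parameter keeps the decision logic separate from the HTTP layer, which makes it easy to run against canned responses before touching the live catalog.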

Finding book covers for this new shortlist was the next challenge to overcome. These book covers typically come from an Open Library partner like Better World Books. Because Better World Books doesn’t have book covers for every book in our list, we had to be mindful that sometimes their service returns a default fallback image (which we had to detect). We wouldn’t want to add these placeholder images into Open Library’s catalog.
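The post doesn't say how the fallback image was detected. One possible approach, shown here purely as an assumption, is to hash the bytes of a known placeholder once and compare every downloaded cover against that hash. The function names below are hypothetical.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the image bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def is_placeholder_cover(image_bytes, placeholder_hashes):
    """Return True if the image is empty or matches a known placeholder.

    placeholder_hashes: a set of hex digests computed ahead of time from
    the partner's default fallback image(s).
    """
    if not image_bytes:
        return True
    return sha256_hex(image_bytes) in placeholder_hashes
```

An exact hash comparison only works if the partner serves byte-identical placeholders; if the fallback image varies, a different signal (such as image dimensions) would be needed.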

The last step is to make sure we’re not accidentally creating new Author records when we add our shortlist of books to Open Library. Even when we’ve confirmed that no book with the same identifiers, title, and author exists, the author may still be registered in our database. If they are, duplicating the author record would result in a negative and confusing user experience for readers searching for this author. We check to see if an author already exists on Open Library by using the Author search API and faceting on their name, as well as birth and death dates (where available in our shortlist).
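As a rough sketch of the author check, assume the search results have already been fetched and reduced to dicts with `name`, `birth_date`, and `death_date` keys (these key names and both function names are assumptions for illustration). A candidate matches when the names agree and no available date contradicts it:

```python
def author_matches(candidate, existing):
    """Compare two author dicts by name and, where both have them, by dates."""
    if candidate["name"].casefold().strip() != existing["name"].casefold().strip():
        return False
    for key in ("birth_date", "death_date"):
        ours, theirs = candidate.get(key), existing.get(key)
        # A date only disqualifies a match when both sides have one and they differ.
        if ours and theirs and ours != theirs:
            return False
    return True


def find_existing_author(candidate, search_results):
    """Return the single matching author, or None if there is no match
    or if the match is ambiguous (more than one author fits)."""
    matches = [author for author in search_results if author_matches(candidate, author)]
    return matches[0] if len(matches) == 1 else None
```

Returning `None` for ambiguous matches is a conservative design choice: it leaves the decision to a human rather than attaching a book to the wrong author.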

In summary:

  • The project started with the 1 million books on the Wishlist as candidates to be added to Open Library.
  • Many of these works were duplicates of works which already existed on Open Library and were merged there. After this round, 255,276 works remained.
  • Records were matched on ISBN, title, and author name. We started by adding the top 1,000 works to Open Library. An example of one of the books that were added can be found here.

An important output from this step was the standardization and generalization of our bot creation process.

Importing ONIX Records

In late 2017, one of our partners, Cory McCloud from Bibliometa, gifted Open Library access to tens of thousands of book metadata records in ONIX format:

“ONIX for Books is an XML format for sharing bibliographic data pertaining to both traditional books and eBooks. It is the oldest of the three ONIX standards, and is widely implemented in the book trade in North America, Europe and increasingly in the Asia-Pacific region. It allows book and ebook publishers to create and manage a corpus of rich metadata about their products, and to exchange it with their customers (distributors and retailers) in a coherent, unambiguous, and largely automated manner.”

Many publishers use ONIX feeds to disseminate the metadata and prices of their books to partner vendors. Cory and his team thought Bibliometa’s ONIX records could be a great opportunity for synergy: publishers and authors would get increased exposure and recognition, and the completeness and quality of Open Library’s catalog would improve.

The steps for processing Bibliometa’s ONIX records are similar to those for importing books from the Internet Archive Wishlist, especially the steps for ensuring we weren’t creating duplicate records in Open Library. At the same time, determining which authors already exist and which need to be created in the catalog was made harder by the fact that fewer birth and death dates were available, greatly reducing our confidence in author searching and matching. In other ways, creating an ONIX import pipeline was simplified by our earlier efforts, which had established key conventions for how new bots may be created using the openlibrary-bots repository. Additionally, our ONIX feeds have the advantage of coming with book covers, whereas we had to manually source book covers for items in the wishlist.

The first step towards adding these records to Open Library was to write a parser to convert these ONIX feeds into a format which Open Library can understand. Open Library did have an ONIX parser and import script, written by the co-founder of Open Library, Aaron Swartz, to parse ONIX records and add them to the Open Library database. Like many of Open Library’s scripts, this code was in Python 2.7, encoded a much earlier version of the ONIX specification, and made use of a very old XML parser which was difficult to extend. Unfortunately, we couldn’t find any drop-in Python replacements for the ONIX parser on GitHub. These factors motivated rolling our own new ONIX parser.
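To give a feel for what such a parser does, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. It is not the onix-bot parser. The sample is a heavily simplified, un-namespaced fragment using ONIX 3.0 reference tag names, real feeds carry far more fields (and usually an XML namespace), and the output dict layout is an assumption made for this example.

```python
import xml.etree.ElementTree as ET

# Simplified sample record; the title, name, and ISBN are made up.
SAMPLE = """<ONIXMessage>
  <Product>
    <ProductIdentifier>
      <ProductIDType>15</ProductIDType>
      <IDValue>9780000000002</IDValue>
    </ProductIdentifier>
    <DescriptiveDetail>
      <TitleDetail>
        <TitleElement>
          <TitleText>An Example Title</TitleText>
        </TitleElement>
      </TitleDetail>
      <Contributor>
        <PersonName>Jane Example</PersonName>
      </Contributor>
    </DescriptiveDetail>
  </Product>
</ONIXMessage>"""

# ONIX product identifier type codes: 02 is ISBN-10, 15 is ISBN-13.
ID_TYPES = {"02": "isbn_10", "15": "isbn_13"}


def parse_onix(xml_text):
    """Convert an ONIX message into a list of simple book dicts."""
    root = ET.fromstring(xml_text)
    books = []
    for product in root.iter("Product"):
        identifiers = {}
        for ident in product.iter("ProductIdentifier"):
            key = ID_TYPES.get(ident.findtext("ProductIDType"))
            if key:  # ignore identifier types we don't import
                identifiers[key] = ident.findtext("IDValue")
        books.append({
            "title": product.findtext(".//TitleText"),
            "authors": [n.text for n in product.iter("PersonName") if n.text],
            "identifiers": identifiers,
        })
    return books


if __name__ == "__main__":
    print(parse_onix(SAMPLE))
```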

To start with, Salman received a dump of ~70,000 ONIX records from Bibliometa to be evaluated for import into Open Library. Two checks were implemented for this procedure:

  1. Checking whether an ISBN-10 or ISBN-13 already exists for that particular work on Open Library, using the Open Library Client.
  2. Matching on title or author via an API call to see whether the record already exists on Open Library.

While much of the ONIX parser is complete, the ONIX Bot project is still in development.

A Guide on Writing Bots

Interested in writing your own Open Library Bot? For more information on how to make an Open Library Bot and their capabilities, please consult our documentation. The basic steps are:

  1. Apply for a bot account on Open Library by contacting the Open Library maintainer. A good way to do this is to respond to this issue on GitHub.
  2. After registering a bot account and having it approved, you can write a bot by extending the openlibrary-client to accomplish tasks like adding new works to Open Library. You can refer to the openlibrary-client examples.
  3. All bots that add works to Open Library have to be added to the Open Library Bots repository on GitHub. Every bot has its own directory with a README containing instructions on how to reproducibly run the bot. Each bot should also link to a corresponding directory within the openlibrary-bots archive.org item where the outputs of the bot may be stored for provenance.
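The overall shape of a bot that follows these steps might look like the sketch below. It deliberately does not use the openlibrary-client API: `exists` and `create` are hypothetical callables that a real bot would implement on top of the client, and the JSON-lines log format is an assumption about what a provenance record could look like.

```python
import json


def run_bot(records, exists, create, log_file, dry_run=True):
    """Process records one by one, logging every decision for provenance.

    exists: hypothetical callable(record) -> bool, True if already cataloged.
    create: hypothetical callable(record) that adds the record to the catalog.
    log_file: any writable text file object; one JSON line is written per record.
    dry_run: when True, decisions are logged but nothing is created.
    """
    summary = {"create": 0, "skip": 0}
    for record in records:
        action = "skip" if exists(record) else "create"
        if action == "create" and not dry_run:
            create(record)
        summary[action] += 1
        entry = {"action": action, "dry_run": dry_run, "record": record}
        log_file.write(json.dumps(entry) + "\n")
    return summary
```

Defaulting to a dry run means a first pass over new data produces only a log, which can be reviewed before anything is written to the catalog.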

Next Steps: Provisioning

Unfortunately, there wasn’t enough time during the GSoC program to complete all three phases of our roadmap (Wishlist, ONIX, and Provisioning). The objective of the third phase of our plan was to make Open Library deployment more robust and reliable using Docker and Ansible. Docker has been a discussion point of several Open Library Community Calls and has catalyzed the creation of a docker branch on the Open Library GitHub repository which addresses some of the basic use cases outlined in the GSoC proposal. One important outcome is the identification of concrete steps and recommendations which the community can implement to improve Open Library’s provisioning process:

  • Switch from Docker to Docker Compose: currently the docker branch uses individual Dockerfiles to manage dependencies. The goal is to use a single docker-compose file which will manage all of the services being used.
  • Switch Open Library to use Ansible (software that automates software provisioning, configuration management, and application deployment). Have a Production as well as a Development Playbook. Playbooks are Ansible’s configuration, deployment, and orchestration language. They can describe a policy you want your remote systems to enforce, or a set of steps in a general IT process.
  • Use Ansible Vault, a feature of Ansible that allows keeping sensitive data such as passwords or keys in encrypted files. This will replace the current system of having an olsystem.

Retrospective

In retrospect, Google Summer of Code 2018 has resulted in thousands of new books being added to the Open Library catalog. Conventions were established to make it easier for others to create new bots in the future and to continue and extend this summer’s work.

Some of the key points that we overlooked while drafting the proposal were as follows:

  1. Checking whether a book already exists on Open Library is hard. We started with a simple title match and ended up normalizing titles, normalizing author names to ensure no new author objects were created, and changing the code so that it doesn’t break when a work in our data has no authors.
  2. The openlibrary-client needed to be improved and documented extensively, so that future developers don’t have to read the code to understand what a particular function does and how it can be used.
  3. The openlibrary-bots directory needed a structure that lets future developers easily find the code they need when writing their own bot.
  4. We assumed the data would be perfect and that importing was a matter of copy-pasting. In reality, Salman and Mek had to go through the data to understand where the code broke, for reasons such as a ‘,’ (comma) appearing in a string.
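The kind of defensive formatting described in the first and fourth points can be illustrated with a short sketch. The exact rules the bots apply aren't given in this post, so the choices below (stripping accents and punctuation, flipping a single "Last, First" comma, treating missing authors as an empty list) and the record layout are assumptions for illustration only.

```python
import re
import unicodedata


def normalize_title(title):
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    text = unicodedata.normalize("NFKD", title or "")
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^\w\s]", " ", text.casefold())
    return " ".join(text.split())


def normalize_author(name):
    """Turn 'Last, First' into 'first last' before normalizing."""
    name = (name or "").strip()
    if name.count(",") == 1:
        last, first = (part.strip() for part in name.split(","))
        name = f"{first} {last}"
    return normalize_title(name)


def normalized_authors(record):
    """Return normalized author names, tolerating records with no authors."""
    return [normalize_author(author) for author in record.get("authors") or []]
```

With these helpers, "Adams, Douglas" and "Douglas Adams" compare equal after normalization, and a record whose `authors` field is missing or `None` yields an empty list instead of raising an error.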

One lesson we learned from participating in GSoC for the first time is that we may have been better off focusing on two work deliverables instead of three. By the end of the program, we didn’t have enough time for our third phase, even though we were proud of the progress we made. On the flip side, because of discussions catalyzed during our community calls and suggestions outlined in our GSoC proposal, there is now ongoing community progress on this final phase, the dockerization of Open Library, which can be found here.

A major win of this GSoC project is that its complexity led Salman to write test cases for the first time, which provided first-hand experience of the importance of a test harness when developing an end-to-end data processing pipeline.

Three of our biggest key results during this program were:

  1. Quality assuring and updating the documentation of the openlibrary-client tool to support future developers.
  2. Creating a new `openlibrary-bots` repository with documentation and processes that provide a standard way to add future bots, and making sure our Wishlist and ONIX bot processes are well documented with reproducible results.
  3. Adding thousands of new modern books to the Open Library catalog.

Project Links

  1. Open Library Client – https://github.com/internetarchive/openlibrary-client
  2. Open Library Bots (IA Wishlist Bot) – https://github.com/internetarchive/openlibrary-bots/tree/master/ia-wishlist-bot
  3. Open Library Bots (ONIX Bot) – https://github.com/internetarchive/openlibrary-bots/tree/master/onix-bot
  4. Docker (In Progress) – https://github.com/internetarchive/openlibrary/tree/docker
Posted in Uncategorized | Comments closed

Search Full-Text within 4M+ Books

Open Library now lets you search inside the text contents of over 4M books!

A Full-Text Search for “thanks for all the fish” on openlibrary.org

What’s Full-Text Search?

Many book websites, like Amazon and Goodreads, give you the ability to search for books by title and author, but they don’t make it easy to find books based on their contents. This type of searching is called “Full-Text Search”.

Try searching for “brewster kahle alexa internet” on Goodreads or Amazon:

A search for “brewster kahle alexa internet” on goodreads

A search for “brewster kahle alexa internet” on amazon books

Have you ever heard a quote and wished you could figure out which book it came from? Open Library full-text search gives readers the ability to locate books which reference any snippet of text like, “Let every thing have its place”:

A full-text search on openlibrary.org of “let every thing have its place”

Full-Text Search on Archive.org

I’ve been surprised to learn how many people didn’t know that Archive.org has had full-text search for several years, and it’s really powerful! In 2016, Giovanni Damiola (@giovannidamiola) led a major overhaul of Internet Archive’s full-text search system and unlocked the ability for users to perform full-text searches across almost 40M unique text documents, from patents, to yearbooks, to open-access research papers.

How to activate Full-Text Search mode on Archive.org

 

Full-Text Search of the quote “let every thing have its place” on Archive.org

Open Library Full-Text Search

When you search across 40M documents, it can be a challenge to find the one you’re looking for. One feature which Open Library has been missing is a way to limit Internet Archive’s full-text search to only include results from books on Open Library. So for the last two years, Open Library has patiently waited to take full advantage of full-text search for its users.

Earlier this week, Gio released an improvement to our full-text search engine which lets us get around this historical limitation — and so we jumped on this opportunity to improve our search on openlibrary.org! With the help of Razzi Abuissa, Open Library volunteer, and Mek, Open Library’s project lead, you can now search inside more than 4M Open Library books.

Try a Full-Text Search

Thanks for all the fish! …Wait, what book was that from again?

 

Posted in Search | Comments closed

Star Ratings are Here!

Over the last six months, more than 145,000 of you have tracked which books you want-to-read. Now you can record how you feel about the books you’ve finished reading using star ratings!

Next time you’re on a book page, you’ll see five stars beneath the book cover. By clicking one of the five stars, you can select the corresponding rating for this book. Your ratings are private by default, though we do intend to offer an option for making your ratings public. Also, while it’s not finished yet, we are working on adding average star ratings to our book pages so you can learn how the community feels about different titles.

We hope you enjoy this new feature as much as we do!

Have ideas or feedback for us? Let us know on Twitter!

Posted in Uncategorized | Comments closed

Turn Your Website into a Library

Openlibrary.org has over 3M books lining its digital shelves, but nothing quite beats being able to embed your favorite book directly on your personal site. Last week, with the help of volunteer Galen Mancino, we launched an embed tool which lets you add any Open Library book to your website or blog. Next time you write a book review, you can place its Open Library book right next to it and, if it’s available, enable your audience to read it with a single click.

What does it look like?

Here’s a version of a webpage which has been modified from its original form to include an Open Library book embed side by side with its book review.

Want to add a book to your site?

Here’s how! First, find your favorite book on openlibrary.org and click on the embed button (see figure 1). A message box will pop up containing a line of HTML code you can add to your website (see figure 2).

Figure 1

Figure 2

Looking Forward

In the future, we’re considering extending the book embed feature to support reading lists. If you’re interested in this feature, please let us know on our Twitter or GitHub.

Volunteer Spotlight

Galen Mancino is a volunteer for the Open Library project. He is passionate about sustainable and local economic growth, revitalization, and how technology can bring us there. He is currently pursuing his Master’s in Interdisciplinary Computer Science. You can learn more about what Galen is working on by going to galenmancino.com.

Posted in Uncategorized | Comments closed