2018 A Year of Victories!

Happy holidays & Happy New Year, readers! We are thrilled to announce 2018 has been an unprecedented year for openlibrary.org and a great time to be a book-lover. Without skipping a beat, we can honestly say we owe our progress to you, our dedicated community of volunteer developers, designers, and librarians. We hope you’ll join us in celebrating as we recap our 2018 achievements:

Highlighted Victories

New Features

Teamwork Makes the Dream Work

In 2018, 45 members of our community helped fix over 300 issues, contributing over 100,000 lines of code improvements to openlibrary.org and eliminating 95,000 lines of old code.

October was an especially monumental month for our community. Thanks to the organizational efforts of Salman Shah and Tabish Shaikh, Open Library participated in the Hacktoberfest challenge, attracting attention and interest from all around the globe. During this period, 22 members of our community submitted 125 bug fixes and improvements.

The Faces of Open Library

Of the many deserving, we’re proud to feature Charles Horn for his contributions to our Open Library. Charles dedicated three years volunteering as a core developer on openlibrary.org before enthusiastically joining Internet Archive as a full-time staff member this year. Charles has written bots responsible for correcting catalog data for millions of books and tens of thousands of authors. Not only has Charles been a foundational member of the community, running stand-ups and performing code reviews, he’s also designed technology which allows us to fight spam and has designed plumbing which allows millions of new book records to flow into our catalog.

Drini Cami sprang into action at a time when Open Library’s future was most uncertain, and he has left an enormous impact. Drini has written mission-critical code to improve our search systems and to merge catalog records, fixed thousands of records, worked on linking Open Library records to Wikidata, repaired our Docker build on countless occasions, and has been a critical adviser in making sure we make the right decisions for our users. We can’t speak highly enough of Drini or of our gratitude for the positive energy he’s brought to Open Library.

Jon Robson has nearly single-handedly brought order to Open Library’s once sprawling front-end. In just a handful of weeks, Jon has re-organized over 20,000 lines of code and eliminated 1,000 unneeded lines in the process! He is the author and maintainer of Open Library’s Design Pattern Library, the one-stop resource for understanding Open Library’s front-end components. Jon brings with him a wealth of experience in nurturing communities and designing front-end systems, earned while leading mobile design efforts at Wikipedia. We all feel extremely lucky and grateful that Jon is on team Open Access!

Tabish Shaikh is one of Open Library’s most dedicated contributors, attending community calls at 12am. He’s brought an infectious enthusiasm and passion to the project and has made major contributions, including leading a redesign of our website footer, designing a mobile login experience, making numerous front-end fixes with Jon, and helping with Hacktoberfest coordination.

Salman Shah was Open Library’s 2018 resident Google Summer of Coder and community evangelist. In addition to importing thousands of new book records into Open Library, he also has been a driving force in organizing Hacktoberfest and improving our documentation. He’s a key reason so many volunteers have flocked to Open Library to help make a lasting difference.

 

“On the internet nobody knows you’re a dog”. For several years, LeadSongDog has anonymously championed better experiences for our users, opening more than 40 issues and participating in discussion for twice that number. Few people have consistently poured their energy into improving Open Library — we’re so grateful and lucky for LeadSongDog’s librarian expertise and conviction.

 

Lisa Seaberg (@seabelis) is not only an amazingly prolific Open Librarian, but also one of our trusted designers for the openlibrary.org website. Lisa has fixed hundreds of Open Library book records, redesigned our logo, and actively participates in design conversations within our GitHub issues.

 

 

Tom Morris is one of Open Library’s longest-serving contributors. He serves as a champion for high-quality metadata, linked data standards, and better search for our readers. Tom has been instrumental during our Community Calls, advising us so that we make the right decisions for our patrons.

 

 

Christian Clauss is leading the initiative to migrate Open Library to Python 3 by the end of 2019. He’s already made incredible progress towards this goal. Because of his work, Open Library will be more secure, faster, and easier to develop.

 

 

Gerard Meijssen, one of our liaisons from the Wikidata community, has coordinated efforts which have helped Open Library merge over 90,000 duplicate authors in our openlibrary.org catalog. He has also been a champion for internationalization (i18n).

 

 

James Ford paved the way for further design progress on Open Library by consolidating tens of colors in our palette to a manageable handful, and converting them to Less CSS.

 

 

 

You can thank Maura Church for adding average star ratings and reading log summary statistics to all of our books.

 

 

 

Galen Mancino collaborated with the Open Library team on the Book Widget feature which you can read more about here! In addition to his love for books, Galen is passionate about sustainable and local economic growth, revitalization, and how technology can bring us there.

 

 

Oh hi, I’m mek.fyi. I feel extremely privileged to serve as a Citizen of the World for the Internet Archive’s Open Library community. In 2018, I contributed thousands of high-fives and hundreds of code reviews to support our amazing community. I’m proud to work with such a capable and passionate group of champions of open access. I’m hopeful, together, we can create a universal library, run by and for the people.

… And over 40 others, including Num170r, html5cat, thefifthisa, linkel, GLBW, Alexis Rossi, Jessamyn West, et al., who have worked no less tirelessly to make Open Library an inclusive, safe, useful place where readers can thrive!

Thank you and here’s to a wonderful 2019!

Posted in Uncategorized | Leave a comment

Raising Crypto for the Greater Good

Open Library is raising 50 Ethereum (ETH) to get books our readers love! Chip in and help us democratize our bookshelves for all.


If you donate now, WeTrust Spring will match your individual ETH donation 100% (until they’ve hit $100k), through Giving Tuesday, Nov. 27!


In 2006, Aaron Swartz founded Open Library with the vision of creating “one web page for every book ever published”. Over the last twelve years, a lot has changed. Open Library has matured not only into a book catalog spanning 25M editions and 16M unique works, but into a library initiative recognized by the state of California, under the auspices of the Internet Archive. Today, Open Library makes over 3M of Internet Archive’s digital books (2.3M public access, 800k modern borrowable) readable directly from your browser. Last year, over 1.3M books were lent to readers from openlibrary.org.

And we’re just getting started. The dream of an Open Library doesn’t end at cataloging the world’s books. Together, we have the opportunity to create a new type of library which works for its readers. To be a library of the people, by the people, and for the people. A library which democratizes the books on its shelves and empowers its readers to pursue knowledge and fuel their imaginations. But how do we get there?

As a first step, we needed some way for patrons to let us know what books they wanted in their library. In January of this year, Open Library announced a new Reading Log feature which allows readers to keep track of which books they’re reading and which books they wish we had available. Over the last 8 months, a quarter million users have been anonymously helping us identify over 400k books most desired by our community. Next comes the hard part: how can we get all these books for our readers? An answer came to us directly from one of Aaron’s early presentations on Open Library: crowdfunding and direct democracy.

What if our patrons could help us purchase a collection of books for their library and make them available to the world through our lending library? What if, for starters, we crowdfunded just a single pallet of some of our most requested books, to be purchased and shipped in bulk, and then made lendable to an international audience on openlibrary.org? Something like a global, digital book-drive. And what better way for Open Library to accept donations than with cryptocurrency — decentralized digital currency?

Thanks to the help of a partner, we now have this chance. Starting in November, Open Library is fortunate to be one of a select group of nonprofits to be listed on WeTrust Spring, a platform whose motto is, “Raising Crypto for the Greater Good” and which helps nonprofits accept donations for their causes in cryptocurrency. Through this initiative, Open Library aims to raise 50 ETH (~$10,000 USD) which it can use to unlock a combination of books from Internet Archive’s wishlist and Open Library’s most requested works. We plan to release a blog post about our progress each month in 2019.

Book lovers, help us democratize Open Library for all:

Donate ETH now* or Learn more

*Have your individual ETH donation doubled by WeTrust Spring (until they’ve hit $100k), through Giving Tuesday, Nov. 27!

Don’t have Ethereum? You can also donate using a credit card.

Posted in Fundraising, News

Google Summer of Code 2018

This is Internet Archive’s second year participating in Google Summer of Code, but for Open Library, it’s an exciting first. Open Library’s mission is to create “a web page for every book” and this summer, we’re fortunate to team with Salman Shah to advance this mission. Salman’s Google Summer of Code roadmap targets two core needs of openlibrary.org: modernizing and increasing the coverage of its book catalog and improving website reliability.

Bots & Open Library

Every day, users contribute thousands of edits and improvements to Open Library’s book catalog. Anyone with an Open Library account can add a book record to the catalog if it doesn’t already exist. There’s also a great walkthrough on adding or editing data for existing book pages. Making edits manually can be tedious and so the majority of new book pages on Open Library are automatically created by Bots which have been programmed to perform specific tasks by our amazing community of developers and digital librarians. This month, Salman programmed two new bots. The first one is called ia-wishlist-bot. It makes sure an Open Library catalog record exists for each of the 1M books on the Internet Archive’s Wishlist, compiled by Chris Freeland and Matt Miller. The second bot, named onix-bot, takes book feeds (in a special format called ONIX) from our partners (e.g. Cory McCloud at Bibliometa), and makes sure the books exist in our catalog.

Importing Internet Archive Wishlist

Earlier this year, as part of the Open Libraries initiative, Chris Freeland, with the help of Matt Miller and others, compiled a Wishlist of hundreds of thousands of book recommendations for the Internet Archive to digitize:

“Our goal is to bring 4 million more books online, so that all digital learners have access to a great digital library on par with a major metropolitan public library system. We know we won’t be able to make this vision a reality alone, which is why we’re working with libraries, authors, and publishers to build a collaborative digital collection accessible to any library in the country.”

In support of this mission, the Open Library team decided it would be helpful if the metadata for these books were imported into the openlibrary.org catalog. 

Importing thousands of books in bulk into Open Library’s catalog presents several challenges. First, many precautions have to be taken to avoid adding duplicate book and author records to the database. To avoid the creation of duplicate records, Salman used the Open Library Book API to check for existing works by ISBN10, ISBN13, and OCLC identifiers. For this project, we were specifically interested in books which had no other editions on Open Library, so any time we noticed an existing edition for the same work, we skipped it. A second check used the Open Library Search API to check for any existing editions with a similar title and author. If there’s a plausible match, we don’t add it to Open Library. This process leaves us with a much shorter list of presumably unique works to add to Open Library.
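As a rough illustration of the two-stage check described above, here is a minimal Python sketch. It is not the bot’s actual code: the record fields (`isbn_10`, `isbn_13`, `oclc`, `title`, `author`) and the two injected callables `lookup_ids` and `search_title_author` are hypothetical stand-ins for calls to the Books API and Search API, and the matching rule (exact match after normalization) is an assumption.

```python
import re
from urllib.parse import urlencode

BOOKS_API = "https://openlibrary.org/api/books"


def bibkeys_for(record):
    """Build Books API bibkeys such as 'ISBN:...' or 'OCLC:...' for one record."""
    keys = []
    for field, prefix in (("isbn_10", "ISBN"), ("isbn_13", "ISBN"), ("oclc", "OCLC")):
        value = record.get(field)
        if value:
            keys.append(f"{prefix}:{value}")
    return keys


def books_api_url(bibkeys):
    """URL that asks the Books API about several identifiers at once."""
    query = urlencode({"bibkeys": ",".join(bibkeys), "format": "json"}, safe=":,")
    return f"{BOOKS_API}?{query}"


def normalize(text):
    """Lowercase, drop punctuation, and collapse whitespace before comparing."""
    return " ".join(re.sub(r"[^\w\s]", " ", text.lower()).split())


def is_probable_duplicate(record, lookup_ids, search_title_author):
    """Return True when the record should be skipped as a likely duplicate.

    lookup_ids(bibkeys) -> truthy if any identifier is already known (hypothetical).
    search_title_author(title, author) -> list of {'title', 'author'} hits (hypothetical).
    """
    # Stage 1: identifier match (ISBN-10, ISBN-13, OCLC).
    keys = bibkeys_for(record)
    if keys and lookup_ids(keys):
        return True
    # Stage 2: title and author match after normalization.
    wanted = (normalize(record["title"]), normalize(record["author"]))
    for hit in search_title_author(record["title"], record["author"]):
        if (normalize(hit["title"]), normalize(hit["author"])) == wanted:
            return True
    return False
```

Passing the lookups in as functions keeps the decision logic testable without network access; a real bot would wrap its API calls in those two callables.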

Finding book covers for this new shortlist was the next challenge to overcome. These book covers typically come from an Open Library partner like Better World Books. Because Better World Books doesn’t have book covers for every book in our list, we had to be mindful that sometimes their service returns a default fallback image (which we had to detect). We wouldn’t want to add these placeholder images into Open Library’s catalog.
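One way to detect a fallback image is to compare a hash of the downloaded bytes against the hash of a known placeholder. The sketch below assumes the placeholder has been downloaded once ahead of time; the function names and the `placeholder_hashes` set are hypothetical, and this is not necessarily how the bot performed the check.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex digest used as a fingerprint for an image file."""
    return hashlib.sha256(data).hexdigest()


def is_placeholder_cover(image_bytes: bytes, placeholder_hashes: set) -> bool:
    """Treat empty responses and known fallback images as 'no cover'."""
    if not image_bytes:
        return True
    return sha256_hex(image_bytes) in placeholder_hashes
```

A byte-for-byte hash only catches identical placeholder files; if a service re-encodes its fallback image, a perceptual comparison would be needed instead.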

The last step is to make sure we’re not accidentally creating new Author records when we add our shortlist of books to Open Library. Taking precautions to ensure that a book with the same identifiers, title, and author doesn’t already exist doesn’t guarantee that the author isn’t already registered in our database. If they are, duplicating the author record would result in a negative and confusing user experience for readers searching for this author. We check to see if an author already exists on Open Library by using the Author search API and faceting on their name, as well as birth and death dates (where available in our shortlist).
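The comparison of a candidate author against a search hit might look something like the following sketch. The field names (`name`, `birth_date`, `death_date`) and the rule that a missing date never contradicts a match are assumptions made for illustration, not the bot’s documented behavior.

```python
import re
import unicodedata


def normalize_name(name):
    """Fold case and accents, turn 'Last, First' into 'First Last', drop punctuation."""
    if "," in name:
        last, _, first = name.partition(",")
        name = f"{first} {last}"
    folded = unicodedata.normalize("NFKD", name)
    folded = "".join(ch for ch in folded if not unicodedata.combining(ch))
    return " ".join(re.sub(r"[^\w\s]", " ", folded.lower()).split())


def dates_compatible(a, b):
    """A missing date never contradicts; two present dates must agree."""
    return a is None or b is None or a == b


def same_author(candidate, existing):
    """True when names match after normalization and no known dates conflict."""
    return (
        normalize_name(candidate["name"]) == normalize_name(existing["name"])
        and dates_compatible(candidate.get("birth_date"), existing.get("birth_date"))
        and dates_compatible(candidate.get("death_date"), existing.get("death_date"))
    )
```

Note the trade-off: when dates are missing, the match rests on the name alone, which is exactly the situation where confidence drops.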

In summary:

  • The project started with a Wishlist of 1 million books to be added to Open Library.
  • Many of these works were duplicates that already existed on Open Library and were merged. After this round, 255,276 works were left.
  • Records were matched on ISBN, title, and author name, and we started with the top 1,000 works, which were added to Open Library. An example of one of the books that was added can be found here.

An important output from this step was the standardization and generalization of our bot creation process.

Importing ONIX Records

In late 2017, one of our partners, Cory McCloud from Bibliometa, gifted Open Library access to tens of thousands of book metadata records in ONIX format:

“ONIX for Books is an XML format for sharing bibliographic data pertaining to both traditional books and eBooks. It is the oldest of the three ONIX standards, and is widely implemented in the book trade in North America, Europe and increasingly in the Asia-Pacific region. It allows book and ebook publishers to create and manage a corpus of rich metadata about their products, and to exchange it with their customers (distributors and retailers) in a coherent, unambiguous, and largely automated manner.”

Many publishers use ONIX feeds to disseminate the metadata and prices of their books to partner vendors. Cory and his team thought Bibliometa’s ONIX records could be a great opportunity for synergy; to get publishers and authors increased exposure and recognition, and to improve the completeness and quality of Open Library’s catalog.

The steps for processing Bibliometa’s ONIX records are similar to those for importing books from the Internet Archive Wishlist, especially the steps for ensuring we weren’t creating duplicate records in Open Library. At the same time, the task of determining which authors already exist and which need to be created in the catalog was complicated by the fact that fewer birth and death dates were available, greatly reducing our confidence in author searching and matching. In other ways, creating an ONIX import pipeline was simplified by our earlier efforts, which had established key conventions for how new bots may be created using the openlibrary-bots repository. Additionally, our ONIX feeds have the advantage of coming with book covers, whereas we had to manually source book covers for items in the Wishlist.

The first step towards adding these records to Open Library was to write a parser to convert these ONIX feeds into a format which Open Library can understand. Open Library did have an ONIX parser and import script, written by Open Library’s co-founder Aaron Swartz, to parse ONIX records and add them to the Open Library database. Like many of Open Library’s scripts, this code was in Python 2.7, encoded a much earlier version of the ONIX specification, and made use of a very old XML parser which was difficult to extend. Unfortunately, we couldn’t find any drop-in Python replacements for the ONIX parser on GitHub. These factors motivated rolling our own new ONIX parser.
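To give a feel for what such a parser does, here is a minimal sketch using Python’s standard `xml.etree.ElementTree`. It is not the onix-bot parser. It assumes ONIX 2.1-style reference tag names without an XML namespace, it extracts only ISBNs, the title, and contributor names, and the sample record is invented for illustration.

```python
import xml.etree.ElementTree as ET

# ONIX ProductIDType codes for the two ISBN forms.
ISBN_TYPES = {"02": "isbn_10", "15": "isbn_13"}

SAMPLE_ONIX = """<ONIXMessage>
  <Product>
    <ProductIdentifier>
      <ProductIDType>15</ProductIDType>
      <IDValue>9780000000002</IDValue>
    </ProductIdentifier>
    <Title><TitleType>01</TitleType><TitleText>An Example Book</TitleText></Title>
    <Contributor><ContributorRole>A01</ContributorRole><PersonName>Jane Doe</PersonName></Contributor>
  </Product>
</ONIXMessage>"""


def parse_onix(xml_text):
    """Convert each <Product> into a plain dict of catalog-friendly fields."""
    root = ET.fromstring(xml_text)
    records = []
    for product in root.iter("Product"):
        record = {
            "title": product.findtext("Title/TitleText"),
            "authors": [
                name
                for name in (c.findtext("PersonName") for c in product.findall("Contributor"))
                if name
            ],
        }
        # Keep only identifier types we know how to map.
        for ident in product.findall("ProductIdentifier"):
            field = ISBN_TYPES.get(ident.findtext("ProductIDType", default=""))
            if field:
                record[field] = ident.findtext("IDValue")
        records.append(record)
    return records
```

Calling `parse_onix(SAMPLE_ONIX)` yields one dict per product, which can then be passed through the same duplicate checks used for the Wishlist import. Real feeds may use short tags, namespaces, or ONIX 3.0 structures, which this sketch does not handle.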

To start with, Salman received a dump of ~70,000 ONIX records from Bibliometa to be evaluated for import into Open Library. Two checks were implemented for this procedure:

  1. Checking whether an ISBN-10 or ISBN-13 for that particular work already exists on Open Library, using the Open Library Client.
  2. Matching on title or author via an API call to see whether the record already exists on Open Library.

While much of the ONIX parser is complete, the ONIX Bot project is still in development.

A Guide on Writing Bots

Interested in writing your own Open Library Bot? For more information on how to make an Open Library Bot and their capabilities, please consult our documentation. The basic steps are:

  1. Apply for a bot account by contacting the Open Library maintainer. A good way to do this is to respond to this issue on GitHub.
  2. Once your bot account is registered and approved, you can write a bot by extending the openlibrary-client to accomplish tasks like adding new works to Open Library. You can refer to the openlibrary-client examples.
  3. All bots that add works to Open Library must be added to the Open Library Bots repository on GitHub. Every bot has its own directory with a README containing instructions on how to reproducibly run the bot. Each bot should also link to a corresponding directory within the openlibrary-bots archive.org item where the outputs of the bot may be stored for provenance.
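The overall shape of a bot that follows these conventions can be sketched as below. This deliberately avoids the real openlibrary-client API: `is_duplicate` and `add_record` are hypothetical callables you would implement on top of the client, and the JSON-lines log format is only one possible way to keep outputs for provenance.

```python
import json


def run_bot(records, is_duplicate, add_record, log_path, dry_run=True):
    """Process records, skip duplicates, and log every decision as one JSON line."""
    counts = {}
    with open(log_path, "w", encoding="utf-8") as log:
        for record in records:
            if is_duplicate(record):
                action = "skipped"
            elif dry_run:
                action = "would_add"  # nothing is written to the catalog
            else:
                add_record(record)
                action = "added"
            counts[action] = counts.get(action, 0) + 1
            log.write(json.dumps({"action": action, "record": record}) + "\n")
    return counts
```

Defaulting to `dry_run=True` means a first run only produces the log, which can be reviewed before any records are written.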

Next Steps: Provisioning

Unfortunately, there wasn’t enough time during the GSoC program to complete all three phases of our roadmap (Wishlist, ONIX, and Provisioning). The objective of the third phase of our plan was to make Open Library deployment more robust and reliable using Docker and Ansible. Docker has been a discussion point of several Open Library Community Calls and has catalyzed the creation of a docker branch on the Open Library GitHub repository which addresses some of the basic use cases outlined in the GSoC proposal. One important outcome is the identification of concrete steps and recommendations which the community can implement to improve Open Library’s provisioning process:

  • Switch from Docker to Docker Compose: Currently the docker branch uses individual Dockerfiles to manage its dependencies. The goal is to use a single docker-compose file which will manage all services being used.
  • Switch Open Library to use Ansible (software that automates software provisioning, configuration management, and application deployment). Have a Production as well as a Development Playbook. Playbooks are Ansible’s configuration, deployment, and orchestration language. They can describe a policy you want your remote systems to enforce, or a set of steps in a general IT process.
  • Use Ansible Vault, a feature of Ansible that allows keeping sensitive data such as passwords or keys in encrypted files. This will replace the current system of having an olsystem.

Retrospective

In retrospect, Google Summer of Code 2018 has resulted in thousands of new books being added to the Open Library catalog. Conventions were established to make it easier for others to create new bots in the future and to continue and extend this summer’s work.

Some of the key points that we overlooked while drafting the proposal were as follows:

  1. Checking whether a book exists on Open Library or not is hard. We started with a simple title match and ended up formatting the title, formatting the authors to ensure no new author objects are created, and changing the code to ensure it doesn’t break when a work in our data has no authors.
  2. Improving the openlibrary-client and documenting it extensively, so that future developers don’t have to read through the code to understand what a particular function does and how it can be used.
  3. Setting up a structure for the openlibrary-bots directory, so that future developers can easily find the code they need when writing their own bot.
  4. Assuming that the data would be perfect and that importing was a matter of copy-pasting. In reality, Salman and Mek had to go through the data to understand where the code broke, for reasons such as a string containing a ‘,’ (comma).
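The comma problem in the last point is a common one with delimited text. As a small illustration (assuming CSV-style input, which may not match the exact format of the original data), splitting a line on commas breaks a quoted title, while Python’s standard `csv` module handles it:

```python
import csv
import io

SAMPLE_CSV = 'title,author\n"Eats, Shoots & Leaves",Lynne Truss\n'


def read_rows(text):
    """Parse delimited text with the csv module, which respects quoted commas."""
    return list(csv.DictReader(io.StringIO(text)))


# Naive splitting treats the comma inside the quoted title as a separator.
naive_fields = SAMPLE_CSV.splitlines()[1].split(",")

# The csv module keeps the quoted title together as one field.
rows = read_rows(SAMPLE_CSV)
```

Here `naive_fields` ends up with three pieces for a two-column row, while `rows[0]["title"]` is the complete title.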

One lesson we learned from participating in GSoC for the first time is that we may have been better off focusing on two work deliverables instead of three. By the end of the program, we didn’t have enough time for our third phase, even though we were proud of the progress we made. On the flip side, because of discussions catalyzed during our community calls and suggestions outlined in our GSoC proposal, there is now ongoing community progress on this final phase, the dockerization of Open Library, which can be found here.

A major win of this GSoC project is that its complexity required Salman to explore writing test cases for the first time, and gave him first-hand experience of the importance of a test harness in developing an end-to-end data processing pipeline.

Three of our biggest key results during this program were:

  1. Quality assuring and updating the documentation of the openlibrary-client tool to support future developers.
  2. Creating a new `openlibrary-bots` repository with documentation and processes to ensure that there is a standard way to add future bots moving forward, and making sure our Wishlist and ONIX bot processes are well documented with reproducible results.
  3. Adding thousands of new modern books to the Open Library catalog.

Project Links

  1. Open Library Client – https://github.com/internetarchive/openlibrary-client
  2. Open Library Bots (IA Wishlist Bot) – https://github.com/internetarchive/openlibrary-bots/tree/master/ia-wishlist-bot
  3. Open Library Bots (ONIX Bot) – https://github.com/internetarchive/openlibrary-bots/tree/master/onix-bot
  4. Docker (In Progress) – https://github.com/internetarchive/openlibrary/tree/docker

Search Full-Text within 4M+ Books

Open Library now lets you search inside the text contents of over 4M books!

A Full-Text Search for “thanks for all the fish” on openlibrary.org

What’s Full-Text Search?

Many book websites, like Amazon and Goodreads, give you the ability to search for books by title and author, but they don’t make it easy to find books based on their contents. Searching within the contents of books is called “Full-Text Search”.

Try searching for “brewster kahle alexa internet” on Goodreads or Amazon:

A search for “brewster kahle alexa internet” on goodreads

A search for “brewster kahle alexa internet” on amazon books

Have you ever heard a quote and wished you could figure out which book it came from? Open Library full-text search gives readers the ability to locate books which reference any snippet of text like, “Let every thing have its place”:

A full-text search on openlibrary.org of “let every thing have its place”

Full-Text Search on Archive.org

I’ve been surprised to learn how many people didn’t know that Archive.org has had full-text search for several years, and it’s really powerful! In 2016, Giovanni Damiola (@giovannidamiola) led a major overhaul of Internet Archive’s full-text search system and unlocked the ability for users to perform full-text searches across almost 40M unique text documents, from patents, to yearbooks, to open-access research papers.

How to activate Full-Text Search mode on Archive.org

 

Full-Text Search of the quote “let every thing have its place” on Archive.org

Open Library Full-Text Search

When you search across 40M documents, it can be a challenge to find the one you’re looking for. One feature which Open Library has been missing is a way to limit Internet Archive’s full-text search to only include results from books on Open Library. So for the last two years, Open Library has patiently waited to take full advantage of full-text search for its users.

Earlier this week, Gio released an improvement to our full-text search engine which lets us get around this historical limitation — and so we jumped on this opportunity to improve our search on openlibrary.org! With the help of Razzi Abuissa, Open Library volunteer, and Mek, Open Library’s project lead, you can now search inside more than 4M Open Library books.
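If you’d like to link to a full-text search from your own code, a search URL can be built with a few lines of Python. This is only a sketch: it assumes the search-inside page lives at `/search/inside` and takes the phrase in a `q` parameter, and the helper name is hypothetical.

```python
from urllib.parse import urlencode

SEARCH_INSIDE = "https://openlibrary.org/search/inside"


def search_inside_url(phrase):
    """Build a link to a full-text search for the given phrase."""
    return f"{SEARCH_INSIDE}?{urlencode({'q': phrase})}"
```

For example, `search_inside_url("thanks for all the fish")` produces a link you can open in a browser.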

Try a Full-Text Search

Thanks for all the fish! …Wait, what book was that from again?

 

Posted in Search

Star Ratings are Here!

Over the last six months, more than 145,000 of you have tracked which books you want-to-read. Now you can record how you feel about the books you’ve finished reading using star ratings!

Next time you’re on a book page, you’ll see five stars beneath the book cover. By clicking one of the five stars, you can select the corresponding rating for this book. Your ratings are private by default, though we do intend to offer an option for making your ratings public. Also, while it’s not finished yet, we are working on adding average star ratings to our book pages so you can learn how the community feels about different titles.

We hope you enjoy this new feature as much as we do!

Have ideas or feedback for us? Let us know on Twitter!
