To the World: Introducing Brad Rubenstein

by Mek & Pallavi Devaraj

This is the first installment of an interview series called “To the World,” which goes behind the scenes to explore what inspires authors to write and share their work with the world.


In this interview, we receive a master class on effective project management by Brad Rubenstein, co-author of Risk Up Front.

Introducing Brad Rubenstein

Brad Rubenstein has been, at times, a software developer and systems architect, a classical musician, a project manager, and a theatre and indie film producer. Most recently, he is co-author, with Adam Josephs, of the project management bible, Risk Up Front: Managing Projects in a Complex World (RUF). RUF is a treasure trove of resources Brad and Adam produced as a reference guide for their clients at Celerity Consulting, a boutique project management consulting firm. In his free cycles, Brad volunteers as an advisor for Open Library, where his book has been put to use countless times.

Having grown up in Southeast Asia, Brad describes himself as a third culture kid.

“My childhood was in Thailand and Indonesia”, Brad says, “I came back to the United States to study computer science, just as Silicon Valley was taking off.”

After finishing a PhD at U.C. Berkeley, Brad cut his teeth in Silicon Valley as an early employee at Sun Microsystems. In the ’90s, he moved to New York to design and develop the distributed trading and risk management infrastructure employed by Goldman Sachs as they automated their operations. After leaving Goldman in 2000, Brad transitioned into a project coaching and team building role as a co-founder of Celerity Consulting, where he has spent nearly two decades sharing his expertise on how to get things done with a wide variety of for-profit and non-profit organizations.

“Nothing teaches you how to run complex projects better than actually running complex projects.”

So, you’ve done projects all your life and for the first time you are being asked to do a project which is bigger than a single person can do. Maybe the solution is not obvious. It’s too complex to keep in your head or it requires you to make difficult choices.

One of the most important assets a project manager can bring to the table is wisdom and experience from having gone through many projects and seeing where things went wrong or right. But someone who is new to project management may not have these firsthand experiences to call upon. The first question many of us have is, “Where do I start?”

“Teams are much more comfortable solving the problems they know how to solve.”

Many of us instinctively start by tackling the problems we know how to solve and trying to build momentum. Instead of letting ourselves get blocked, there is an impulse to move fast and break things; to follow the path of least resistance and get something working. Maybe this means working on the easy stuff first, or getting the difficult stuff out of the way, or maybe even ramping up on the most interesting stuff.

“It’s unnatural, but critical,” Brad advises, “for project teams to talk about how their project might get derailed and proactively mitigate those risks early, rather than letting them strike later when changes are most expensive. Although projects seem to have a logical order in which it makes sense to get the pieces done, we encourage teams to focus on doing the risky parts first, which sometimes takes a bit of creativity (and scaffolding).”

“You can’t do a risk-reward trade-off if the risks are hidden in your blind spot.”

Many teams, especially startups in Silicon Valley, feel pressure to move quickly and are more open to taking risks. Brad suggests this may be a fallacy:

“When people talk about risk and reward, there is a presupposition that one understands what the risks are in order to take them. The Risk Up Front method focuses on helping teams become aware of the risks in the first place, so they can make sensible decisions about how to prioritize and mitigate them. As you come up with a list of risks articulated in a way that moves the team to take action, you might choose to leave them and place your bets. The trouble is when teams are blindsided by an expensive risk late in the game that they wouldn’t have taken had they known about it earlier.”

“The fact that an individual knows something is different than a team knowing something.”

We asked Brad how someone who is new to project management is supposed to magically be able to anticipate what could go wrong on a project.

“There’s a better question a project manager in this situation might ask”, Brad says, “How do I create the space in my team to reliably surface the things that might go wrong? This space does not happen by accident. You have to make it happen.”

Whether you’re new to project management or a veteran, Brad emphasizes the value of leveraging the experience and expertise of one’s teammates. The wisdom of the whole team is greater than the sum of its parts. So what does Risk Up Front look like and how does it happen?

Brad explains, “Many engineers may be familiar with the concept of an Agile stand-up. In Risk Up Front we have Weekly Accountability Meetings (WAM) where the project’s full cross-functional team accounts for their progress, commits to what they’ll do next week, and matches this with what they’ve finished, so the team can ensure commitments are being kept and forward progress is being made.”

A key element of the WAM is a Risk Review where a crucial question is posed to each member of the team in turn:

“What is the most likely reason this project will fail?”

“Going around on a regular basis and asking this one question creates the space to uncover hidden issues and moves the team to act,” Brad says. “A central part of RUF is making sure the right questions get asked.”

“If your project has too many risks, it’s time to evaluate your Project Definition.”

We wondered what advice Brad would give to a team if they discover their project has too many high risks:

“In Risk Up Front, projects begin with a series of Definition Meetings that force the team to arrive at a mutual agreement as to how they will trade off the 5 W’s — who, what, why, when, and why-not (i.e. the risks) of the project, in order to settle on a project that the team can commit to deliver. A lot bends during these discussions. You might change what you decide to build, change who you build it for. All of these choices will have consequences and risk profiles. Perhaps building for one audience means building fewer components which might mean less risk. The outcome of such a meeting is a project definition statement whose risks your team is comfortable accepting.”

“Are you afraid something is going to go wrong?”

Something we imagine Brad gets asked frequently is, what are early warning signs, before a disaster occurs, that might clue someone in that they need his help? Brad tells us that Celerity is commonly brought into a project after a big disaster.

“People who are new to managing big projects may not know they need something like Risk Up Front and thus don’t call us at the beginning. Looking on the bright side, one advantage of this is that teams in this situation are often malleable and willing to put new processes in place after something has gone awry.”

Brad mentions that small process failures recognized early on, such as not rigorously following through on commitments, falling behind schedule, or running into issues when integrating teammates’ work, are great opportunities to use techniques from the book to course correct.

“4 principles, 4 pieces of paper, and 2 meetings.”

Risk Up Front includes 4 documents with instructions, including a Risk Action Plan, to help teams methodically surface risks out of blind spots through targeted conversation.

Brad notes, “It is the process of negotiating the elements of these documents within the team that is valuable, not the documents themselves. In a sense, the paperwork is the booby prize. We want to keep it as simple as possible.”

“Successful projects depend more on your team’s behavior than on their project tools.”

There are a lot of books about project management out there. We wondered what motivated Brad and Adam to write Risk Up Front.

Although certain tools and methods are more appropriate for certain types of projects and teams, we repeatedly see projects with first-class tools and methods get into trouble. On the flip side, we’ve seen complex projects succeed with the barest of project management tools — often no more than a few spreadsheets, lists, and documents.
p. xvi Risk Up Front

“The prosaic reason is that it makes teaching our method to teams a lot easier (they can use the book as a reference), and my favorite work is with the teams themselves. But we do get calls from individuals in far-off places telling us they’re using RUF for projects that have never crossed our paths before, which is very gratifying.”

Before writing “Risk Up Front”, we wondered if there was a book Brad turned to for answers.

“When I was in college, I had to facilitate large and boisterous board meetings run according to Robert’s Rules of Order, and that was an education. At the time, I read How to Make Meetings Work (Michael Doyle, David Strauss, 1976), and that made a big impression. The skills I learned about facilitating meetings, helping people in the meeting actually get something done, have served me well.”

“Every project is different in the details, but people are people”

We ask Brad whether, if he were to invest time in another guide, he would go deeper into risk mitigation or focus on demystifying another complementary area of project management:

“The book [RUF] is very good at laying out the philosophy and why it hangs together. There’s a difference between presenting theory and facilitating practice. For example, we describe in detail how to do a weekly accountability meeting and yet it’s not easy for people having read the book to run a weekly accountability meeting because it takes practice and thought to line everything up and turn it into a routine. I think simplified checklists of instructions and suggestions would make it easier for readers to use our documents and put the theory of Risk Up Front into practice.”

The simple but powerful idea of formalizing a project management methodology into easily applicable checklists is celebrated in Atul Gawande’s The Checklist Manifesto, and it is a prospect that is very exciting to the Open Library team.

Risk Up Front & Open Library

We ask Brad how he became an advisor to the Open Library project:

“I met Mek Karpeles, who heads up the Open Library project, earlier in 2019, at a Grand Re-opening of the Public Domain celebration hosted by the Internet Archive in San Francisco. He was one of the presenters. I had just finished my book, and was exploring what was next for me. He was looking for someone to help get the process of wrangling their open source development workflow, along with a group of enthusiastic volunteers, running more smoothly. Mek had big plans for new features at Open Library, including a Book Sponsorship project, that would allow Open Library users to help underwrite the acquisition of new books that others could borrow directly from their web site. I was fascinated by it, and it seemed like a fun and interesting place to plug in. Mek offered to scan in a copy of Risk Up Front himself (in return for my autograph on the frontispiece), and add it to the library.”

You can read more about Open Library’s Book Sponsorship program on Open Library’s blog https://blog.openlibrary.org/2019/10/23/scan-on-demand-building-the-worlds-open-library-together/ or learn how to participate here: https://openlibrary.org/sponsorship.

“I am a big fan of libraries, and openlibrary.org is the biggest library.  So I’m thrilled [for my work to be available on the website].”

Brad remarks that he is drawn to the importance of Open Library’s long term vision, “[…] to make all the published works of humankind available to everyone in the world.” It is a vision he hopes will long outlast his work, and even that of the current core team. In order for this to be true, Brad believes the right architecture and culture need to be in place to keep patrons and contributors excited about their progress toward this vision. Open Library is also a great case study of “Risk Up Front” and an opportunity for Brad to apply the lessons of his book directly to help a team faced with big decisions and challenges, such as the difficulty of coordinating a geographically distributed team like Open Library’s.

We asked Mek Karpeles, Open Library’s program lead and Brad’s mentee, if he’s found Brad’s book applicable to problems Open Library faces:

For me and Open Library, I have to admit, it was as simple as, “how do I run a meeting?”. How many times have I taken notes at a meeting and never opened them up again? This was a clear indicator that my notes or meetings are not structured correctly, but I had never asked why. As our advisor, Brad directed us to ask fundamental questions we had taken for granted. Like, “Who are these meeting minutes for and what purpose do they serve?”.

Each of our community calls now starts with short updates from each teammate. As soon as a point seems too long, we add it to a list of Open Mic topics we’ll discuss after the updates are finished. And we have a section in our notes dedicated to decisions and action items. Brad’s book has a great template for understanding what an action item is: what will be delivered, who owns it, and by when (explicitly not how). We take our notes in Google Docs and use the comment feature to assign each decision to a member of our team. Sometimes, these involve a next step of creating issues on our GitHub bug tracker. Brad’s book helped us keep our weekly meeting useful.

The Risk Up Front book includes a handful of Project Documents and templates which represented hidden opportunities for us. The “Team List” document, an inventory of stakeholders and the projects they lead, was a particularly big win for us. The exercise helped us realize the benefits of delegation by encouraging members of the community to publicly step up and commit to owning parts of the project, and at the same time helping community members know where to direct their questions. We have our Team List posted publicly on our wiki.

Beyond the Book

Outside of being an expert project manager and an author, we wondered how Brad would like to be remembered:

“I don’t want to bill myself these days as an expert project manager (any more than the coach of the basketball team is necessarily the best basketball player). I have fallen into the coaching role, which helps me develop my skills in listening and noticing what stands in the way of people being great, along with some strategies for helping people get that stuff out of their way. Many folks do that with individuals – I’m focused on doing it for groups or teams. My passion is for getting things that are stuck, unstuck. In the unlikely event that I’m going to become famous, I guess I’d hope to be famous for that.”

As a final question, we asked Brad, what is one thing you wish more of the world knew or thought about more? And why is this important to you or us?

“Tough question. Groups of people can accomplish great things that no single person can accomplish, and they can also spin horribly out of control in ways an individual never would. In this particular moment, it seems valuable to keep both these things in mind.”

Thank you

Open Library offers a deep thank you to Brad Rubenstein for his contributions to Open Library, his commitment to sharing knowledge, and for his patience answering our questions.

If you liked these tips and are interested in putting Brad and Adam’s process into practice, pick up a copy of Risk Up Front and let us know what you think by tweeting @RiskUpFront.

Want to hear more from Brad? Check out this other interview on Green Planet Blue Planet:

Posted in Uncategorized | Leave a comment

Scan On Demand: Building the World’s Open Library, Together

By Omar Rafik El-Sabrout & Mek

Earlier this week, Open Library’s Mek Karpeles, Internet Archive Summer of Code fellow Tabish Shaikh, and members of the community announced the launch of a new Book Sponsorship program which, boingboing.net explains, “lets you direct a cash donation to pay for the purchase and scanning of any books. In return, you are first in line to check that book out when it is available, and then anyone who holds an Open Library library card can check it out.”

So far, the program has been met with enthusiasm by readers and authors who are eager to play a role in shaping the world’s largest online digital library.

One generous reader, Tom in Yokohama, Japan, explains why he chose to sponsor a book:

“I saw the blog post about sponsoring books and I thought it was a wonderful idea. The book I sponsored is one I enjoyed as a child. I’m not likely to read it again, but I am happy to make it available to others who might want to read it. (Several other books in the same series are checked out, so there must be interest!)”

Author VM (Vicky) Brasseur went so far as to make sure we received a signed copy of her work with a heartwarming message to her readers.

Other authors were quick to join in, going so far as to offer to sponsor their own works for posterity.

The news was even touted by one of our favorite popular science fiction authors, Cory Doctorow:

Calculating the True Value of A Library that is Free

From an article posted by Omar Rafik El-Sabrout at http://blog.archive.org/2019/10/22/calculating-the-true-value-of-a-library-that-is-free/

We live in the era of Venmo and CashApp, when after a nice meal with friends, you no longer have to argue over who will pick up the bill. On the surface, this is an extremely promising way to keep people from accidentally going into debt with each other. But it also reinforces interactions that are extremely transactional. The old idea of “I’ll get you back next time” is part of the give and take that members of a close community engage in. In our transactional present, people don’t have to rely on the idea of trust–trusting the butcher at the farmer’s market won’t price gouge me, trusting my friend will pay me back. People aren’t learning that you can vote by caring, by putting your money behind something that matters to you. At a moment when “you get what you pay for” is the capitalist norm, enter the Internet Archive, which today is asking you to make an investment in community-wide sharing.


A new program at OpenLibrary.org encourages you to “put your money behind something that matters to you:” sponsoring a book so everyone can read and borrow it online for free.

The Internet Archive, which runs the Open Library project, is working to create a vast network of online book lending in order to make all books accessible to all people. Open Library cares about the input of its readers. As Open librarian and Internet Archive Software Engineer Mek Karpeles describes, “Open Library’s theory is that readers deserve a say in what’s on their bookshelves,” which is why he and his team have created a new Book Sponsorship feature.

A blue box on the book page lets you know that this is a book you can sponsor. With your donation, we will buy the book, digitize it, store it, and make the ebook available for borrowing–first by you.

Founded on the idea that a library ought to have books that “reflect [a] community’s needs and values,” Book Sponsorship allows any of the more than two and a half million users of Open Library to #saveabook. This is a natural follow-up to the long-standing “Want to Read” functionality whereby a reader can indicate that a book they wish to read is missing from the Archive.

You can contribute just $11.32 to make sure this book from Marley Dias’ #1000BlackGirlBooks list is available for all.

With our new book sponsorship program, readers are given the option to put money towards directly sponsoring the acquisition of a particular book, after which the Internet Archive will digitize, store, and make the ebook available for lending–for free. Among other possibilities, this would allow people to combat the lack of representation of young black protagonists that Marley Dias, creator of the #1000BlackGirlBooks, found at her school and local library. We currently feature almost 400 of the #1000BlackGirlBooks on archive.org and with your support, we can buy and digitize all of them.

When people are given the opportunity to be generous in an obligation-free way, we find that typically brings out their desire to do good.

When people are given a say and feel represented, they become more invested. The care that comes from the investment of individuals is what eventually creates a community, and our hope is that the Open Library community will use this feature to help disenfranchised patrons gain access to materials that would enrich their education. When people are given the opportunity to be generous in an obligation-free way, we find that typically brings out their desire to do good. It’s relatively easy to put a price on a book, to calculate printing costs and publishing costs, but what’s harder to determine is the value of giving a gift. If you’re interested in sponsoring a book, either for yourself or for someone else, just click on a Sponsor an eBook button or visit https://openlibrary.org/sponsorship to learn more.

Go to https://openlibrary.org/sponsorship to learn more about how to #saveabook

2018 A Year of Victories!

Happy holidays & Happy New Year, readers! We are thrilled to announce 2018 has been an unprecedented year for openlibrary.org and a great time to be a book-lover. Without skipping a beat, we can honestly say we owe our progress to you, our dedicated community of volunteer developers, designers, and librarians. We hope you’ll join us in celebrating as we recap our 2018 achievements:

Highlighted Victories

New Features

Teamwork Makes the Dream Work

In 2018, 45 members of our community helped fix over 300 issues, contributing over 100,000 lines of code improvements to openlibrary.org and eliminating 95,000 lines of old code.

October was an especially monumental month for our community. Thanks to the organizational efforts of Salman Shah and Tabish Shaikh, Open Library participated in the Hacktoberfest challenge, attracting attention and interest from all around the globe. During this period, 22 members of our community submitted 125 bug fixes and improvements.

The Faces of Open Library

Of the many deserving, we’re proud to feature Charles Horn for his contributions to our Open Library. Charles dedicated three years volunteering as a core developer on openlibrary.org before enthusiastically joining Internet Archive as a full-time staff member this year. Charles has written bots responsible for correcting catalog data for millions of books and tens of thousands of authors. Not only has Charles been a foundational member of the community, running stand-ups and performing code reviews, he’s also designed technology which allows us to fight spam and has designed plumbing which allows millions of new book records to flow into our catalog.

Drini Cami sprang into action during a time when Open Library’s future was most uncertain, and he has left an enormous impact. Drini has written mission-critical code to improve our search systems, written code to merge catalog records, fixed thousands of records, worked on linking Open Library records to Wikidata, repaired our Docker build on countless occasions, and been a critical adviser in making sure we make the right decisions for our users. We can’t speak highly enough about Drini and our gratitude for the positive energy he’s brought to our Open Library.

Jon Robson has nearly single-handedly brought order to Open Library’s once sprawling front-end. In just a handful of weeks, Jon has re-organized over 20,000 lines of code and eliminated 1,000 unneeded lines in the process! He is the author and maintainer of Open Library’s Design Pattern Library, the one-stop resource for understanding Open Library’s front-end components. Jon brings with him a wealth of experience in nurturing communities and designing front-end systems that he earned while leading mobile design efforts at Wikipedia. We all feel extremely lucky and grateful Jon is on team Open Access!

Tabish Shaikh is one of Open Library’s most dedicated contributors, attending community calls at 12am. He’s brought an infectious enthusiasm and passion to the project and has made major contributions, including leading a redesign of our website footer, designing a mobile login experience, making numerous front-end fixes with Jon, and helping with Hacktoberfest coordination.

Salman Shah was Open Library’s 2018 resident Google Summer of Coder and community evangelist. In addition to importing thousands of new book records into Open Library, he also has been a driving force in organizing Hacktoberfest and improving our documentation. He’s a key reason so many volunteers have flocked to Open Library to help make a lasting difference.

 

“On the internet, nobody knows you’re a dog.” For several years, LeadSongDog has anonymously championed better experiences for our users, opening more than 40 issues and participating in discussion for twice that number. Few people have so consistently poured their energy into improving Open Library; we’re so grateful and lucky for LeadSongDog’s librarian expertise and conviction.

Lisa Seaberg (@seabelis) is not only an amazingly prolific Open Librarian but also one of our trusted designers for the openlibrary.org website. Lisa has fixed hundreds of Open Library book records, redesigned our logo, and actively participates in design conversations within our GitHub issues.

Tom Morris is one of Open Library’s longest-standing contributors. He serves as a champion for high-quality metadata, linked data standards, and better search for our readers. Tom has been instrumental during our Community Calls, advising us on making the right decisions for our patrons.

Christian Clauss is leading the initiative to migrate Open Library to Python 3 by the end of 2019. He’s already made incredible progress towards this goal. Because of his work, Open Library will be more secure, faster, and easier to develop.

 

 

Gerard Meijssen, one of our liaisons from the Wikidata community, has coordinated efforts which have helped Open Library merge over 90,000 duplicate authors in our openlibrary.org catalog. He has also been a champion for internationalization (i18n).

James Ford paved the way for further design progress on Open Library by consolidating tens of colors in our palette to a manageable handful, and converting them to Less CSS.

You can thank Maura Church for adding average star ratings and reading log summary statistics to all of our books:

 

 

 

Galen Mancino collaborated with the Open Library team on the Book Widget feature which you can read more about here! In addition to his love for books, Galen is passionate about sustainable and local economic growth, revitalization, and how technology can bring us there.

 

 

Oh hi, I’m mek.fyi. I feel extremely privileged to serve as a Citizen of the World for the Internet Archive’s Open Library community. In 2018, I contributed thousands of high-fives and hundreds of code reviews to support our amazing community. I’m proud to work with such a capable and passionate group of champions of open access. I’m hopeful, together, we can create a universal library, run by and for the people.

… And over 40 others, including Num170r, html5cat, thefifthisa, linkel, GLBW, Alexis Rossi, Jessamyn West, et al., who have worked no less tirelessly to make Open Library an inclusive, safe, useful place where readers can thrive!

Thank you and here’s to a wonderful 2019!


Raising Crypto for the Greater Good

Open Library is raising 50 Ethereum (ETH) to get books our readers love! Chip in and help us democratize our bookshelves for all.


If you donate now, WeTrust Spring will match your individual ETH donation 100% (until they’ve hit $100k), through Giving Tuesday, Nov. 27!


In 2006, Aaron Swartz founded Open Library with the vision of creating “one web page for every book ever published”. Over the last twelve years, a lot has changed. Open Library has matured not only into a book catalog spanning 25M editions and 16M unique works, but into a library initiative recognized by the state of California, under the auspices of the Internet Archive. Today, Open Library makes over 3M of Internet Archive’s digital books (2.3M public access, 800k modern borrowable) readable directly from your browser. Last year, over 1.3M books were lent to readers from openlibrary.org.

And we’re just getting started. The dream of an Open Library doesn’t end at cataloging the world’s books. Together, we have the opportunity to create a new type of library which works for its readers. To be a library of the people, by the people, and for the people. A library which democratizes the books on its shelves and empowers its readers to pursue knowledge and fuel their imaginations. But how do we get there?

As a first step, we needed some way for patrons to let us know what books they wanted in their library. In January of this year, Open Library announced a new Reading Log feature which allows readers to keep track of which books they’re reading and which books they wish we had available. Over the last 8 months, a quarter million users have been anonymously helping us identify over 400k books most desired by our community. Next comes the hard part: how can we get all these books for our readers? An answer came to us directly from one of Aaron’s early presentations on Open Library: crowdfunding and direct democracy.

What if our patrons could help us purchase a collection of books for their library and make them available to the world through our lending library? What if, for starters, we crowdfunded just a single pallet of some of our most requested books, to be purchased and shipped in bulk, and then made lendable to an international audience on openlibrary.org? Something like a global, digital book-drive. And what better way for Open Library to accept donations than with cryptocurrency — decentralized digital currency?

Thanks to the help of a partner, we now have this chance. Starting in November, Open Library is fortunate to be one of a select group of nonprofits to be listed on WeTrust Spring, a platform whose motto is, “Raising Crypto for the Greater Good” and which helps nonprofits accept donations for their causes in cryptocurrency. Through this initiative, Open Library aims to raise 50 ETH (~$10,000 USD) which it can use to unlock a combination of books from Internet Archive’s wishlist and Open Library’s most requested works. We plan to release a blog post about our progress each month in 2019.

Book lovers, help us democratize Open Library for all:

Donate ETH now* or Learn more

*Have your individual ETH donation doubled by WeTrust Spring (until they’ve hit $100k), through Giving Tuesday, Nov. 27!

Don’t have Ethereum? You can also donate using credit card.

Posted in Fundraising, News | Comments closed

Google Summer of Code 2018

This is Internet Archive’s second year participating in Google Summer of Code, but for Open Library, it’s an exciting first. Open Library’s mission is to create “a web page for every book,” and this summer, we’re fortunate to team with Salman Shah to advance this mission. Salman’s Google Summer of Code roadmap aims to target two core needs of openlibrary.org: modernizing and increasing the coverage of its book catalog and improving website reliability.

Bots & Open Library

Every day, users contribute thousands of edits and improvements to Open Library’s book catalog. Anyone with an Open Library account can add a book record to the catalog if it doesn’t already exist. There’s also a great walkthrough on adding or editing data for existing book pages. Making edits manually can be tedious, so the majority of new book pages on Open Library are created automatically by bots, which our amazing community of developers and digital librarians has programmed to perform specific tasks. This month, Salman programmed two new bots. The first one is called ia-wishlist-bot. It makes sure an Open Library catalog record exists for each of the 1M books on the Internet Archive’s Wishlist, compiled by Chris Freeland and Matt Miller. The second bot, named onix-bot, takes book feeds (in a special format called ONIX) from our partners (e.g. Cory McCloud at Bibliometa) and makes sure the books exist in our catalog.

Importing Internet Archive Wishlist

Earlier this year, as part of the Open Libraries initiative, Chris Freeland, with the help of Matt Miller and others, compiled a Wishlist of hundreds of thousands of book recommendations for the Internet Archive to digitize:

“Our goal is to bring 4 million more books online, so that all digital learners have access to a great digital library on par with a major metropolitan public library system. We know we won’t be able to make this vision a reality alone, which is why we’re working with libraries, authors, and publishers to build a collaborative digital collection accessible to any library in the country.”

In support of this mission, the Open Library team decided it would be helpful if the metadata for these books were imported into the openlibrary.org catalog. 

Importing thousands of books in bulk into Open Library’s catalog presents several challenges. First, many precautions have to be taken to avoid adding duplicate book and author records to the database. To avoid the creation of duplicate records, Salman used the Open Library Book API to check for existing works by ISBN10, ISBN13, and OCLC identifiers. For this project, we were specifically interested in books which had no other editions on Open Library, so any time we noticed an existing edition for the same work, we skipped it. A second check used the Open Library Search API to check for any existing editions with a similar title and author. If there’s a plausible match, we don’t add it to Open Library. This process leaves us with a much shorter list of presumably unique works to add to Open Library.
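The identifier check can be sketched in Python as follows. This is an illustration rather than the bot’s actual code: the record field names (`isbn10`, `isbn13`, `oclc`) and the helper names are hypothetical, and the decoded response is assumed to be a JSON object keyed by whichever `bibkeys` were found.

```python
from urllib.parse import urlencode

BOOKS_API = "https://openlibrary.org/api/books"

# Hypothetical field names for a wishlist record, mapped to lookup-key prefixes.
ID_PREFIXES = (("isbn10", "ISBN"), ("isbn13", "ISBN"), ("oclc", "OCLC"))


def bibkeys_for(record):
    """Return lookup keys such as 'ISBN:0140328726' for every identifier present."""
    return [f"{prefix}:{record[field]}" for field, prefix in ID_PREFIXES if record.get(field)]


def lookup_url(record):
    """Build the request URL for one record (the request itself is left to the caller)."""
    query = urlencode({"bibkeys": ",".join(bibkeys_for(record)), "format": "json"})
    return f"{BOOKS_API}?{query}"


def already_cataloged(record, api_response):
    """True if any of the record's identifiers appears in the decoded JSON response."""
    return any(key in api_response for key in bibkeys_for(record))
```

Keeping URL construction separate from the decision logic means the skip rule can be tested against canned responses without touching the network.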

Finding book covers for this new shortlist was the next challenge to overcome. These book covers typically come from an Open Library partner like Better World Books. Because Better World Books doesn’t have book covers for every book in our list, we had to be mindful that sometimes their service returns a default fallback image (which we had to detect). We wouldn’t want to add these placeholder images into Open Library’s catalog.
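One way to detect such a placeholder is to fingerprint it once and compare every downloaded cover against that fingerprint. The sketch below assumes the fallback image is byte-for-byte identical on every response, which is not confirmed for any particular cover service; the function names are illustrative.

```python
import hashlib


def fingerprint(image_bytes):
    """SHA-256 hex digest of the raw image bytes."""
    return hashlib.sha256(image_bytes).hexdigest()


def is_placeholder(image_bytes, placeholder_fingerprints):
    """True if the image is empty or matches a known fallback image."""
    return not image_bytes or fingerprint(image_bytes) in placeholder_fingerprints


def usable_covers(covers, placeholder_fingerprints):
    """Keep only (identifier, bytes) pairs whose image is a real cover."""
    return [(ident, data) for ident, data in covers
            if not is_placeholder(data, placeholder_fingerprints)]
```

If the service re-encodes its fallback image between requests, an exact hash would miss it, and a check on image dimensions or a perceptual hash would be needed instead.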

The last step is to make sure we’re not accidentally creating new Author records when we add our shortlist of books to Open Library. Even after confirming that no book with the same identifiers, title, and author already exists, we have no guarantee that the author isn’t already registered in our database. If they are, duplicating the author record would result in a negative and confusing user experience for readers searching for this author. We check to see if an author already exists on Open Library by using the Author search API and faceting on their name, as well as birth and death dates (where available in our shortlist).
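The matching rule can be sketched as a pure function over candidates that have already been fetched from an author search. The dictionary keys (`name`, `birth_date`, `death_date`, `key`) and function names are assumptions made for this example, and the rule that a missing date never disqualifies a candidate is one possible policy, not necessarily the one the bot used.

```python
def normalize_name(name):
    """Lowercase, treat periods as spaces, and collapse whitespace."""
    return " ".join(name.lower().replace(".", " ").split())


def dates_compatible(ours, theirs):
    """Dates conflict only when both sides are present and differ."""
    return not ours or not theirs or ours == theirs


def find_existing_author(author, candidates):
    """Return the first candidate whose name matches and whose dates don't conflict."""
    for candidate in candidates:
        if normalize_name(candidate.get("name", "")) != normalize_name(author["name"]):
            continue
        if not dates_compatible(author.get("birth_date"), candidate.get("birth_date")):
            continue
        if not dates_compatible(author.get("death_date"), candidate.get("death_date")):
            continue
        return candidate
    return None
```

When neither side has dates, the name alone decides the match, which is why fewer dates mean lower confidence.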

In summary:

  • The project started with a wishlist of 1 million books to be evaluated for addition to Open Library.
  • Many of these works were duplicates that already existed on Open Library and were merged. After this round, 255,276 works remained.
  • Records were matched on ISBN, title, and author name. We started by adding the top 1,000 works to Open Library. An example of one of the books that were added can be found here

An important output from this step was the standardization and generalization of our bot creation process.

Importing ONIX Records

In late 2017, one of our partners, Cory McCloud from Bibliometa, gifted Open Library access to tens of thousands of book metadata records in ONIX format:

“ONIX for Books is an XML format for sharing bibliographic data pertaining to both traditional books and eBooks. It is the oldest of the three ONIX standards, and is widely implemented in the book trade in North America, Europe and increasingly in the Asia-Pacific region. It allows book and ebook publishers to create and manage a corpus of rich metadata about their products, and to exchange it with their customers (distributors and retailers) in a coherent, unambiguous, and largely automated manner.”

Many publishers use ONIX feeds to disseminate the metadata and prices of their books to partner vendors. Cory and his team thought Bibliometa’s ONIX records could be a great opportunity for synergy: to get publishers and authors increased exposure and recognition, and to improve the completeness and quality of Open Library’s catalog.

The steps for processing Bibliometa’s ONIX records are similar to those for importing books from the Internet Archive Wishlist, especially the steps for ensuring we don’t create duplicate records in Open Library. At the same time, determining which authors already exist and which need to be created in the catalog was made harder by the fact that fewer birth and death dates were available, greatly reducing our confidence in author searching and matching. In other ways, creating an ONIX import pipeline was simplified by our earlier efforts, which had established key conventions for how new bots may be created using the openlibrary-bots repository. Additionally, our ONIX feeds have the advantage of coming with book covers, whereas we had to manually source book covers for items in the wishlist.

The first step towards adding these records to Open Library was to write a parser to convert these ONIX feeds into a format which Open Library can understand. Open Library already had an ONIX parser and import script, written by Open Library co-founder Aaron Swartz, to parse ONIX records and add them to the Open Library database. Like many of Open Library’s scripts, this code was written in Python 2.7, targeted a much earlier version of the ONIX specification, and made use of a very old XML parser which was difficult to extend. Unfortunately, we couldn’t find any drop-in Python replacements for the ONIX parser on GitHub. These factors motivated rolling our own new ONIX parser.
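To give a flavor of what such a parser involves, here is a minimal sketch built on Python’s standard `xml.etree.ElementTree`. It is not the onix-bot parser. It assumes ONIX 2.1-style reference tag names with no XML namespace, whereas real feeds may use short tags, namespaces, or a different ONIX version. The output field names and the sample record are made up for illustration.

```python
import xml.etree.ElementTree as ET

# ONIX code list values for ProductIDType: "02" is ISBN-10, "15" is ISBN-13.
ID_FIELDS = {"02": "isbn_10", "15": "isbn_13"}

# Invented sample message, used only to exercise the parser.
SAMPLE = """<ONIXMessage>
  <Product>
    <ProductIdentifier><ProductIDType>15</ProductIDType><IDValue>9780000000002</IDValue></ProductIdentifier>
    <Title><TitleText>An Example Title</TitleText></Title>
    <Contributor><PersonName>Jane Doe</PersonName></Contributor>
  </Product>
</ONIXMessage>"""


def parse_onix(xml_text):
    """Turn an ONIX message into a list of flat dicts, one per <Product>."""
    records = []
    for product in ET.fromstring(xml_text).iter("Product"):
        record = {
            "title": product.findtext("Title/TitleText"),
            "authors": [el.text for el in product.findall("Contributor/PersonName") if el.text],
        }
        # Copy across only the identifier types we know how to map.
        for ident in product.findall("ProductIdentifier"):
            field = ID_FIELDS.get(ident.findtext("ProductIDType"))
            if field:
                record[field] = ident.findtext("IDValue")
        records.append(record)
    return records
```

Calling `parse_onix(SAMPLE)` yields one dictionary with the title, a one-element author list, and an `isbn_13` entry.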

To start, Salman received a dump of ~70,000 ONIX records from Bibliometa to be evaluated for import into Open Library. Two checks were implemented for this procedure:

  1. Checking whether an ISBN-10 or ISBN-13 for that particular work already exists on Open Library, using the Open Library Client.
  2. Matching on title or author via an API call to see whether the record already exists on Open Library.
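The two checks combine naturally into a filter. In the sketch below, `isbn_exists` and `title_author_exists` are hypothetical callables standing in for the real lookups (the openlibrary-client calls are not reproduced here), and the record keys are assumptions.

```python
def select_new_records(records, isbn_exists, title_author_exists):
    """Return the records that pass both checks, i.e. look absent from the catalog."""
    new_records = []
    for record in records:
        isbns = [record.get("isbn_10"), record.get("isbn_13")]
        # Check 1: any known ISBN means the work is already cataloged.
        if any(isbn and isbn_exists(isbn) for isbn in isbns):
            continue
        # Check 2: fall back to a title/author lookup.
        if title_author_exists(record.get("title"), record.get("authors", [])):
            continue
        new_records.append(record)
    return new_records
```

Running the cheap, exact identifier check first keeps the fuzzier title and author lookup for the records that actually need it.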

While much of the ONIX parser is complete, the ONIX Bot project is still in development.

A Guide on Writing Bots

Interested in writing your own Open Library Bot? For more information on how to make an Open Library Bot and their capabilities, please consult our documentation. The basic steps are:

  1. Apply for a bot account on Open Library by contacting the Open Library maintainer. A good way to do this is to respond to this issue on GitHub.
  2. After registering a bot account and having it approved, you can write a bot by extending the openlibrary-client to accomplish tasks like adding new works to Open Library. You can refer to the openlibrary-client examples.
  3. All bots that add works to Open Library must be added to the Open Library Bots repository on GitHub. Every bot has its own directory with a README containing instructions on how to reproducibly run the bot. Each bot should also link to a corresponding directory within the openlibrary-bots archive.org item where the outputs of the bot may be stored for provenance.

Next Steps: Provisioning

Unfortunately, there wasn’t enough time during the GSoC program to complete all three phases of our roadmap (Wishlist, ONIX, and Provisioning). The objective of the third phase of our plan was to make Open Library deployment more robust and reliable using Docker and Ansible. Docker has been a discussion point of several Open Library Community Calls and has catalyzed the creation of a docker branch on the Open Library GitHub repository which addresses some of the basic use cases outlined in the GSoC proposal. One important outcome is the identification of concrete steps and recommendations which the community can implement to improve Open Library’s provisioning process:

  • Switch from Docker to Docker Compose: Currently the docker branch uses separate Dockerfiles to manage dependencies. The goal is to use a single docker-compose file which will manage all services being used.
  • Switch Open Library to use Ansible (software that automates software provisioning, configuration management, and application deployment). Have a Production as well as a Development Playbook. Playbooks are Ansible’s configuration, deployment, and orchestration language. They can describe a policy you want your remote systems to enforce, or a set of steps in a general IT process.
  • Use Ansible Vault, an Ansible feature for keeping sensitive data such as passwords or keys in encrypted files. This will replace the current system of having an olsystem.

Retrospective

In retrospect, Google Summer of Code 2018 resulted in thousands of new books being added to the Open Library catalog. We also established conventions that make it easier for others to create new bots in the future and to continue and extend this summer’s work.

Some of the key points that we overlooked while drafting the proposal were as follows:

  1. Checking whether a book exists on Open Library is hard. We started with a simple title match and ended up formatting the title, formatting the authors to ensure no new author objects are created, and changing the code to ensure it doesn’t break when our data lists no authors for a work.
  2. Improving the openlibrary-client and documenting it extensively, so that future developers don’t have to read through the code to understand what a particular function does and how it can be used.
  3. Setting up a structure for the openlibrary-bots directory, so that future developers can easily find the code they need when writing their own bot.
  4. Assuming that the data would be perfect and that importing it was a matter of copy-pasting. In reality, Salman and Mek had to go through the data to understand where the code broke, for reasons such as a ‘,’ (comma) appearing in a string.
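The kind of title and author formatting described in the first point might look like the sketch below. It is an illustration under assumptions, not the code the bots use: the normalization rules, the `0.9` similarity threshold, and the record keys are all invented for the example, and `difflib.SequenceMatcher` stands in for whatever comparison was actually used.

```python
import re
import unicodedata
from difflib import SequenceMatcher

LEADING_ARTICLES = {"a", "an", "the"}


def clean(text):
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    text = unicodedata.normalize("NFKD", text.lower())
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    return " ".join(re.sub(r"[^a-z0-9]+", " ", text).split())


def normalize_title(title):
    """Clean a title and drop a leading article so 'The Hobbit' equals 'Hobbit'."""
    words = clean(title).split()
    if words and words[0] in LEADING_ARTICLES:
        words = words[1:]
    return " ".join(words)


def plausible_match(record, candidate, threshold=0.9):
    """True if titles are near-identical and the authors, when known, overlap."""
    ratio = SequenceMatcher(
        None, normalize_title(record["title"]), normalize_title(candidate["title"])
    ).ratio()
    if ratio < threshold:
        return False
    ours = {clean(name) for name in record.get("authors") or []}
    theirs = {clean(name) for name in candidate.get("authors") or []}
    # A work with no authors in either source is judged on its title alone.
    if not ours or not theirs:
        return True
    return bool(ours & theirs)
```

Treating a missing author list as “unknown” rather than as an error keeps the comparison from breaking on records that have no authors.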

One lesson we learned from participating in GSoC for the first time is that we may have been better off focusing on two work deliverables instead of three. By the end of the program, we didn’t have enough time for our third phase, even though we were proud of the progress we made. On the flip side, because of discussions catalyzed during our community calls and suggestions outlined in our GSoC proposal, there is now ongoing community progress on this final phase, the dockerization of Open Library, which can be found here.

A major win of this GSoC project is that its complexity required Salman to explore writing test cases for the first time, which provided first-hand experience of how important a test harness is when developing an end-to-end data processing pipeline.

Three of our biggest key results during this program were:

  1. Quality assuring and updating the documentation of the openlibrary-client tool to support future developers.
  2. Creating a new `openlibrary-bots` repository with documentation and processes to ensure that there is a standard way to add future bots, and making sure our Wishlist and ONIX bot processes are well documented, with reproducible results.
  3. Adding thousands of new modern books to the Open Library catalog.

Project Links

  1. Open Library Client – https://github.com/internetarchive/openlibrary-client
  2. Open Library Bots (IA Wishlist Bot) – https://github.com/internetarchive/openlibrary-bots/tree/master/ia-wishlist-bot
  3. Open Library Bots (ONIX Bot) – https://github.com/internetarchive/openlibrary-bots/tree/master/onix-bot
  4. Docker (In Progress) – https://github.com/internetarchive/openlibrary/tree/docker
Posted in Uncategorized | Comments closed