Posts Tagged ‘newfeature’

Duplicate Authors? Wave your Magic Wand!

By George Oates

In your wanderings around Open Library, you may occasionally have seen two records for a person you know to be a single author, like “Brooks, Terry” and “Terry Brooks.”

Look for the Magic Wand around the site to start merging!

Today, we’re releasing a new feature to help you merge those two separate Terry entries into one. This, in turn, will update all the Works listed under each Terry and try to reconcile them into a tighter list of Works for the newly merged Terry. Magic!

Try a search for your favourite author now, browse recent author merges, or read on…

A few things bear explaining:

  • The merge feature works on the idea of a Master author and its Duplicates. As you do the merge, it will be up to you to elect the most suitable Master. We select the author record with the most Works as the default, but you can change that.
  • Only people with an Open Library account can merge authors.
  • Updating the search engine after a merge takes a little while at the moment, up to about 10 minutes, so you won’t see the list of the new Master’s Works updated immediately. We’re looking to speed this up, but are very happy to release this as a “minimum viable product.” Merging an author with lots of works, lots of editions, or both takes longer still to update, so please be patient.
  • Duplicate authors’ names will be saved as an alternate on the Master record. For example, the (new) Master record for H. P. Lovecraft now lists alternates like Howard Philips Lovecraft, H. P Lovecraft, Howard P. Lovecraft and H.P Lovecraft. These alternates are often just subtle differences in spacing or capitalization, and we’re hoping they might prove useful later if we begin to stockpile them now.
  • If you’re in any doubt about whether or not to merge an author, don’t. It’s possible you might come across an odd-looking author name like August (re: H. P. Lovecraft) Derleth or H. P. (introduction by Lin Carter) (with Harry Houdini on Pharoahs) Lovecraft in a search for H. P. Lovecraft… these are trickier, because they’re noting contributors in the author name. Ideally, those contributors would be siphoned out into the contributors field per edition, and not merged into the H. P. Lovecraft Master. That would be a loss of information. So, it’s probably easier to just leave those long, odd “authors” alone for now.
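To make the Master/Duplicate idea concrete, here is a minimal Python sketch of what a merge does to the records. It assumes a simplified author record with just a key, a name, a list of Work keys and a list of alternate names. The `Author` class, `default_master`, `merge_authors` and the record keys are hypothetical illustrations, not Open Library's actual code, and the sketch skips the harder step of reconciling duplicate Works.

```python
from dataclasses import dataclass, field


@dataclass
class Author:
    # Hypothetical, simplified author record (not Open Library's real schema).
    key: str
    name: str
    work_keys: list = field(default_factory=list)
    alternate_names: list = field(default_factory=list)


def default_master(authors):
    """Default choice: the record with the most Works (ties go to the first one)."""
    return max(authors, key=lambda a: len(a.work_keys))


def merge_authors(master, duplicates):
    """Fold each duplicate into the master, keeping its names as alternates."""
    for dup in duplicates:
        if dup is master:
            continue
        # Save the duplicate's name(s) as alternates on the Master record.
        for name in [dup.name, *dup.alternate_names]:
            if name != master.name and name not in master.alternate_names:
                master.alternate_names.append(name)
        # Move the duplicate's Works over, skipping any the master already lists.
        for work in dup.work_keys:
            if work not in master.work_keys:
                master.work_keys.append(work)
        dup.work_keys.clear()
    return master


# Placeholder keys, made up for this example.
terry = Author("/authors/A1", "Terry Brooks", ["/works/W1", "/works/W2"])
brooks = Author("/authors/A2", "Brooks, Terry", ["/works/W2", "/works/W3"])
master = merge_authors(default_master([terry, brooks]), [terry, brooks])
print(master.alternate_names, master.work_keys)
```

Both records here have two Works, so the first one wins the default. In the real feature you can elect a different Master before merging.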

I’ve actually found it really fun to test this new feature. I found a useful Yahoo directory listing a ton of authors, and began merging them in Open Library. By working from an external list like this, I could just move from one author to the next, rather than trying to come up with authors to search for.

We’ve also bundled another enhancement into this release, Recent Changes V2. There’s a new bit of navigation on the Recent Changes page, so you can see things like all the authors merged on 8/16/2010, or all the bot edits made in June 2010. We’re looking forward to adding other bits and pieces to these new filtered views, for example, all the new ebooks made available on a certain day, or all the new covers uploaded in a certain month. Perhaps these could have feeds available too, so you could subscribe to a feed of changes to keep your version of the Open Library dataset up-to-date.

As well as Recent Changes V2, we’ve introduced the concept of “save_many” for transactions that contain lots of little updates. This is a performance improvement, and each save_many transaction is entered as a single line in Recent Changes; look for the little “expand” link to open up its contents.
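The post doesn’t describe how save_many is implemented. What follows is only a toy Python model of the two ideas above: a batch of small updates recorded as a single Recent Changes entry, and filtered views by kind and date. The class names, the `kind` labels and the record keys are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Change:
    kind: str   # hypothetical labels such as "merge-authors" or "bot-edit"
    day: date
    keys: list  # every record touched by this change


class RecentChanges:
    """Toy in-memory change log; not Open Library's real storage."""

    def __init__(self):
        self.entries = []

    def save(self, kind, day, key):
        self.entries.append(Change(kind, day, [key]))

    def save_many(self, kind, day, keys):
        # A whole batch becomes ONE entry; its keys are what "expand" would reveal.
        self.entries.append(Change(kind, day, list(keys)))

    def filtered(self, kind=None, on=None, year=None, month=None):
        """Filtered view, e.g. all author merges on one day or all bot edits in a month."""
        result = []
        for c in self.entries:
            if kind is not None and c.kind != kind:
                continue
            if on is not None and c.day != on:
                continue
            if year is not None and c.day.year != year:
                continue
            if month is not None and c.day.month != month:
                continue
            result.append(c)
        return result


log = RecentChanges()
log.save("merge-authors", date(2010, 8, 16), "/authors/A1")
log.save_many("bot-edit", date(2010, 6, 3), ["/books/E1", "/books/E2", "/books/E3"])
june_bots = log.filtered(kind="bot-edit", year=2010, month=6)
print(len(june_bots), len(june_bots[0].keys))  # one line, three records
```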

So, why not have a shot at merging two duplicate authors? The best place to start is the Author search page.

Anyhoo, we’re excited to show you the first major feature we’ve rolled out since the launch of the redesign back in May, and we can’t wait to see what you make of it. Go forth and merge!

Easy permanent links to book page images

By mang

We just launched a new image permalinks feature for downloading and linking to page images of books hosted on the Internet Archive. Using a page image permalink makes it easier to reference the contents of a book hosted on the Archive without having to know the details of how or where the book is stored. A book’s data could be moved around within the multiple petabytes of data in the Archive at any time, so the permalinks provide a consistent and stable way to access the page images.

Here are a few quick examples. For each of these paths, add http://www.archive.org/download/{item identifier} to the beginning to get the full URL.

Referencing the cover image for a book at thumbnail size:
/page/cover_thumb.jpg

You can also request other sizes and rotations (in 90-degree increments):
/page/n194_rotate90_medium.jpg
(Example image from Vertical Migration of Plankton.)

The full list of options is given in the Downloading / Linking Page Images section of the Book URLs developer documentation.
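If you’re generating these links from a script, a small Python helper can assemble them. This sketch only reproduces the pattern visible in the two examples above: the page, then an optional rotation, then the size. The identifier `exampleitem00` is made up. The Book URLs documentation remains the authority on which options are actually supported.

```python
BASE = "http://www.archive.org/download"


def page_image_url(identifier, page="cover", size=None, rotate=0):
    """Build a page-image permalink following the pattern of the two examples above.

    identifier: the Archive item identifier (the one used below is made up).
    page: "cover", or a page number such as 194, which becomes "n194".
    size: e.g. "thumb" or "medium", the two sizes shown in this post.
    rotate: degrees, in 90-degree increments.
    """
    if rotate % 90 != 0:
        raise ValueError("rotation must be a multiple of 90 degrees")
    parts = [page if isinstance(page, str) else f"n{page}"]
    if rotate % 360:
        parts.append(f"rotate{rotate % 360}")
    if size:
        parts.append(size)
    return f"{BASE}/{identifier}/page/{'_'.join(parts)}.jpg"


print(page_image_url("exampleitem00", size="thumb"))
print(page_image_url("exampleitem00", page=194, size="medium", rotate=90))
```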

We’re hoping that the image permalinks will make it easier for people to access the wealth of books hosted on the Archive and stimulate new uses of the images. Let us know if you do something cool!

Bundle o' Updates

By George Oates

In addition to our Lending Launch last week, we’re adding new bits and pieces to the site almost every day. Here’s a bunch of things we’ve been working on:

  • Over 10,000 human edits last week!
    Wow! So great to see so many people making large and small contributions to the catalog. Some standout editors include Fiction Addiction, menolly42, Anonymous (no, really), and a truly anonymous editor who has taken an interest in books written in Rhaeto-Romance languages. We’re also seeing new books being added in a variety of languages, for example by R. Knoop.
  • Data Out
    In an effort to make it easier to extract records from Open Library, we’ve begun sprinkling links to RDF and JSON versions of Editions, Works and Authors around the site, as well as improving the way you can access MARC for editions where it’s available. There’s a link to the MARC source in the history list for editions.
  • Revised Author RDF
    There was some great discussion on our ol-tech mailing list as we worked toward a consensus on how to approach our Author RDF offering. Thank you very much to all the contributors, and to Karen Coyle for handling the final construction. We’ll be releasing updated Work and Edition RDF in the coming weeks.
  • Send to Kindle
    There are new Send to Kindle links alongside all the ebooks you can read via Open Library. That’s in addition to viewing the gorgeous hi-res scans in the BookReader and downloading PDF, plain text, ePub, DAISY and MOBI files.
  • Bot + Write API = Magic!
    Our superstar intern, Daniel, has created a new bot whose sole task in life is writing some 4 million LibraryThing IDs to our edition records. He’s also been working on refining the general process of writing a bot for Open Library in the hope that this might be functionality we could offer more broadly to developers outside the core team.
  • Working with Librivox?
    For the audiophiles out there, we’ve begun initial conversations with Hugh at Librivox to get audio editions from them into the Open Library catalog. Edward has also been poking into the CERN biblio data that was released a few months ago. We’d like to get that online!
  • Two stellar enquiries
    I was probably a little over-excited to get 2 awesome emails within a couple of days of each other. The first was from the Tolkien Librarian asking when he’ll be able to merge duplicate Works, and the next was from the Marylebone Cricket Club, wondering if we’d be interested in helping to blend cricket-related bibliographies from 3 of the premier cricketing libraries around the world. The answers were, of course, Yay! Woo! Soon! And YES!
  • Ariel Backenroth has joined the Open Library team
    We have been looking for an additional Python developer for a while, and I’m very pleased to say that Ariel has come on board. He’s a long-time Freebase developer, and we’re looking forward to seeing how we can use Freebase as a comparative dataset to learn from and exchange data with.
  • Dumps of the entire Open Library dataset
    Anand has been working on making regular dumps available. As it stands, we’re generating monthly dumps that you can download via the data page in the developer section of the site.
  • Planning out the next 4 months
    We’re thinking through the next batch of work we’re going to try to roll out: tools for merging duplicates, operational stability, lists v1, annotations, and bringing full text search back online.
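For anyone scripting against the Data Out links, here’s a minimal Python sketch. It assumes the JSON and RDF versions live at the record’s path with a `.json` or `.rdf` extension, so please check the developer documentation before relying on that. It reads the `name` and `alternate_names` fields of an author record, which is also an assumption about the schema. The sample record and its key in the usage lines are made up.

```python
import json
from urllib.request import urlopen

OL = "https://openlibrary.org"
FORMATS = ("json", "rdf")


def export_url(key, fmt="json"):
    """Assumed pattern: append the format extension to a record's path."""
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    if not key.startswith(("/authors/", "/works/", "/books/")):
        raise ValueError(f"not an author, work or edition key: {key}")
    return f"{OL}{key}.{fmt}"


def fetch_json(key):
    """Download one record as JSON (needs network access)."""
    with urlopen(export_url(key, "json")) as resp:
        return json.load(resp)


def author_names(record):
    """Primary name plus any alternates, from a parsed author record."""
    return [record.get("name", ""), *record.get("alternate_names", [])]


# Offline usage with a made-up sample record (placeholder key).
sample = json.loads(
    '{"key": "/authors/OL0A", "name": "H. P. Lovecraft",'
    ' "alternate_names": ["Howard P. Lovecraft"]}'
)
print(export_url(sample["key"], "rdf"))
print(author_names(sample))
```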

Whew! Onward!

Small Moves: Open Library Integrates Digital Lending

By George Oates

Today, the Internet Archive is pleased to announce 2 new borrowing options through Open Library, alongside the WorldCat links we already offer:

  1. Borrowing ebooks through OverDrive – an ebook through your local library
    We have worked with the team at OverDrive to import about 70,000 new ebook editions into Open Library. All loans via OverDrive are managed through the OverDrive system. Once you click on borrow for these titles, you’ll need to tell OverDrive where you are so it can find your local library.

    Want to try a search through the OverDrive titles?

  2. Borrowing Scanned Books through participating libraries – an ebook to you, anywhere in the world
    Three long-time Internet Archive library partners are now offering scanned books from their collections for loan through Open Library. Boston Public Library, the Biblioteca Ludwig von Mises at the Universidad Francisco Marroquin in Guatemala and the Marine Biological Laboratory in Woods Hole, as well as the Internet Archive itself, are proud to make around 200 titles available for loan as ebooks through Open Library.

    You’ll need to download the free Adobe® Digital Editions software to manage borrowing scanned books.

  3. Borrowing Physical Books through WorldCat – from your local library
    Since Open Library was launched back in 2007, we’ve added links wherever possible into the WorldCat catalog, which you can search using your location to find a copy of the book near you.



As you may have seen in our recent 1 million accessible books announcement, we used an Open Library subject to group together the works that have an accessible edition. We’re doing the same thing with an OverDrive subject page and a Lending Library subject page, to help you browse what’s available or click through to a search for something specific.

There are some classic technology titles in the Scanned Collection, from The Media Lab: inventing the future at MIT by Stewart Brand to that 1986 gem, Voice/data telecommunications systems by Michael Gurrie.


If you need any help trying to borrow a book, please be sure to review the borrowing a book through Open Library FAQ. Check out the official announcement over on archive.org!

Over 1 Million Digital Books Now Available Free to the Print-Disabled

By George Oates

More than doubling the number of books available to print-disabled people of all ages, the Internet Archive today launched a new service that brings free access to more than 1 million books, from classic 19th Century fiction and current novels to technical guides and research materials, in a specially designed format that supports people who are blind, visually impaired or dyslexic.

Read the full press release on archive.org

Part of this announcement was to tell people about the brand new, rebuilt Open Library, which came online yesterday. Knowing that the site would be a new front door to books for print-disabled and visually impaired visitors, we had to make sure that it was accessible, both at the code level, and on all sorts of browsers and devices.

Luckily, one of the team, Lance Arthur, is a total markup rockstar who has been working to web standards throughout the reconstruction of the Open Library site. But, we wanted to make sure, so we’ve worked with several people, tools and organizations over the last month or so to test our accessibility from a variety of angles.

Photo (left to right): Mike McCabe, me, Jessie (and Nacho)

This is a picture of Mike McCabe (the Internet Archive staffer responsible for generating these new accessible DAISY files), me and the fabulous Jessie Lorenz, who showed us what it was like to use Open Library as a blind person. It was amazing to watch her jump around the site using the keyboard and screen-reading software called Jaws, an entirely different experience from what I’m used to. Jessie suggested a number of straightforward tweaks we could make to help make the site more navigable. For example:

  • Make sure we have title/alt attributes on everything.
  • Add the total result count for a search results page into the <h1> so she doesn’t have to go looking for it.
  • Describe the graph on the Subject pages in plain English, before all the numbers: “There is a graph on this page that displays the publishing history for this subject. On the x-axis is time, and on the y-axis is the count of editions published.”
  • Label the top and bottom search fields differently, even though they do the same thing.
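Two of Jessie’s suggestions are easy to picture in markup. The Python sketch below shows one way to generate them. The function names and HTML are hypothetical, not Open Library’s templates, and only the graph description text comes from the list above.

```python
from html import escape

GRAPH_SUMMARY = (
    "There is a graph on this page that displays the publishing history for this "
    "subject. On the x-axis is time, and on the y-axis is the count of editions published."
)


def results_heading(query, total):
    """Put the total result count inside the <h1> so it's the first thing read out."""
    noun = "result" if total == 1 else "results"
    return f'<h1>{total:,} {noun} for "{escape(query)}"</h1>'


def graph_block(editions_by_year):
    """Plain-English summary first, then the numbers as a simple list."""
    rows = "".join(
        f"<li>{year}: {count}</li>" for year, count in sorted(editions_by_year.items())
    )
    return f"<p>{GRAPH_SUMMARY}</p>\n<ul>{rows}</ul>"


print(results_heading("Lovecraft", 1234))
print(graph_block({2001: 3, 1999: 5}))
```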

Thank you, Jessie!

Photos: Jessie & Lance; Jessie and her lovely pup, Nacho

We also used the W3C Markup Validation Service to check our pages at the code level, and virtually every page on the site passed with flying colours. We also made sure we all listened to, and understood, what Open Library sounds like, using a simulator for the Jaws screen reader software (Jaws itself is unfortunately very expensive).

So, back to the announcement we made this morning. The easiest way to access the more than 1 million accessible books on Open Library is to head for our new “Accessible Books” subject page, which displays them all and lets you search within them. (There’s a link to that subject page on the new Open Library home page.)

There’s more information about the special DAISY accessible eBook format in the Open Library FAQ, and Open XML has a useful video that explains more about being visually impaired and DAISY.

We’re really excited, not just about the new Open Library site finally being revealed to all, but also about being part of delivering a massive, new, free resource to so many people. It was also a pleasure to collaborate with lots of other Internet Archive staff, so thank you to Brewster Kahle, Jon Hornstein, Mike McCabe, Hank Bromley, Alexis Rossi, Raj Kumar, Sam Stoller, Laura Milvy, Jeff Kaplan, Karen Coyle, Mary Murrell, Calvin Yee, Ralf Muehlen and Michael Ang for all your awesome support.

Thumbnail View in BookReader!

By mang

We’re pleased to introduce a new thumbnail view for the Internet Archive BookReader. The thumbnail view gives you a quick visual impression of a book by showing thumbnails of many pages at once. It’s a great way to scan through a book quickly.

Try it out on a book about the painter Goya.

The thumbnail view also makes it easy to pick out particular pages of interest, for example if you were trying to find the Burrowing Owl in Bird life in an Arctic Spring.

You might also try looking at Old English colour prints or some of the other books about color prints.

This feature was submitted by Stephanie Collett of the California Digital Library via our BookReader GitHub account. It’s great to have this feature come in from the open source community building around the BookReader!

Announcing the Open Library redesign

By George Oates


Hooray! And yay! We’re very excited to announce the “soft launch” of our brand new Open Library site! This is version 1 of a reconstructed Open Library, and we’re going to keep it “soft” at a special URL until we’re sure it’s stable enough to make the final transition to openlibrary.org. We’re hoping that will happen soon.

As we mentioned in two previous blog posts [1][2], the main features of the new design are:

1. Works
The previous version of Open Library was only aware of editions of books, or “manifestations” in FRBR-speak. We’re excited to release Works, which gather all editions of the same book under one umbrella. Each work also has its own URI; we’re hoping these propagate.

Note that our representation of Works is imperfect. We’re the first to acknowledge that there are lots of duplicate edition records in Open Library, and these dupes clog up our ability to derive or create works from editions. That means we might have 25 Jane Eyres for a while, and that the next logical feature to release is a way for people to help merge things.
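To see why duplicate editions get in the way, here’s a deliberately naive Python sketch that groups editions into works by a normalized title and author. It is not how Open Library derives Works. The normalization rules, record fields and edition IDs are all assumptions for illustration.

```python
import re
from collections import defaultdict


def normalize(text):
    """Lowercase, drop punctuation and collapse whitespace."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def work_key(title, author):
    # Naive grouping key: normalized title (minus a leading article) plus author.
    title = re.sub(r"^(the|a|an) ", "", normalize(title))
    return (title, normalize(author))


def group_into_works(editions):
    works = defaultdict(list)
    for ed in editions:
        works[work_key(ed["title"], ed["author"])].append(ed["id"])
    return dict(works)


# Made-up edition records.
editions = [
    {"id": "E1", "title": "Jane Eyre", "author": "Charlotte Brontë"},
    {"id": "E2", "title": "Jane Eyre.", "author": "Charlotte Brontë"},
    {"id": "E3", "title": "Jane Eyre", "author": "Brontë, Charlotte"},
]
for key, ids in group_into_works(editions).items():
    print(key, ids)
```

The inverted author name alone splits one book into two works. This is the kind of duplication that merging authors and editions helps clean up.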

2. Subject pages
We wanted to find a way to help people browse the catalog rather than having to know what they’re looking for before they start. So, we’ve gone through a process of breaking down and reconstructing the subject headings on our records, giving each heading a URL, and displaying a whole bunch of data about each heading: works about that subject, publishing history, related subjects, authors who write about it, and publishers who publish in that subject area.
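Here’s a rough Python illustration of the steps described above: splitting a heading into its parts, giving each part a URL, and tallying a publishing history. The `--` separator reflects how Library of Congress-style headings are commonly written. The `/subjects/...` slug format and the function names are assumptions, not a description of Open Library’s actual processing.

```python
import re
from collections import Counter


def split_heading(heading):
    """'Science fiction -- History and criticism' -> ['Science fiction', 'History and criticism']"""
    return [part.strip() for part in heading.split("--") if part.strip()]


def subject_url(subject):
    """Assumed slug format: lowercase, runs of non-word characters become '_'."""
    slug = re.sub(r"\W+", "_", subject.lower()).strip("_")
    return f"/subjects/{slug}"


def publishing_history(publish_years):
    """(year, editions published that year) pairs, oldest first."""
    return sorted(Counter(publish_years).items())


for part in split_heading("Science fiction -- History and criticism"):
    print(part, subject_url(part))
print(publishing_history([1999, 2001, 1999]))
```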

3. Revamped search
We’ve rewritten search from scratch and upgraded to Solr 1.4. Our ranking is very basic for now, so “relevance” doesn’t mean a lot yet. We can’t wait to improve on it. In the meantime, you can also sort your searches by the number of editions or by when things were published, or filter using facets.
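Sorting and faceting map onto standard Solr request parameters (`q`, `sort`, `facet`, `facet.field`, `fq`). The Python sketch below only builds such a query string. The field names `edition_count`, `first_publish_year` and `author_name` are assumptions about the index, not a documented Open Library interface.

```python
from urllib.parse import urlencode


def solr_query(q, sort=None, facet_fields=(), filters=(), rows=20):
    """Build a Solr select query string with optional sort, facets and filter queries."""
    params = [("q", q), ("rows", str(rows)), ("wt", "json")]
    if sort:
        params.append(("sort", sort))          # e.g. "edition_count desc"
    if facet_fields:
        params.append(("facet", "true"))
        params.extend(("facet.field", f) for f in facet_fields)
    params.extend(("fq", f) for f in filters)  # e.g. "first_publish_year:[1900 TO 1950]"
    return urlencode(params)


print(solr_query("lovecraft", sort="edition_count desc", facet_fields=["author_name"]))
```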

4. UI Improvements
The whole site’s had an overhaul in terms of the user interface. All the major operations (editing, searching, adding covers etc.) have been redesigned. Even the Edit button has a new size and position, which will hopefully make it clearer that these records are open to correction. We’ll be blogging over the coming weeks with specifics about the user interface enhancements.

5. Links, links, links
Another major component of the redesign is to begin the process of connecting our records to other references out there on the interwebs. If you get to an Edit Edition page, you’ll notice that you can add different identifiers from a variety of systems to the Edition record, and even add a new type of identifier to the system. The more IDs we can collect, the more connections there’ll be into and out of Open Library.

Caveats!
The redesign is just out of the oven, so it’s important to be clear that there are still things missing, unclear, coming soon, or potentially even broken:

1. The API

A lot of the revisions we’ve made to the API are undocumented. We’re looking forward to changing that, and will update you as we do. We’d also like to expand the range of ways you can write to Open Library via the API.

2. The Data
Now that we’ve improved the ways to browse the Open Library catalog, we’ve exposed a lot of corners and content that may never have seen the light of day, or that are just plain wrong.

It might be odd to say, but we sympathize with Google’s recent position on metadata quality[3]. Trying to merge records from lots of different catalogs means there will be duplicates, and that any errors in those catalogs are imported as well. That’s not to say we’re not happy with what we’ve got at this first stage. Edward has done a fantastic job getting us this far, and we’re looking forward to continual improvement of the dataset.

The fun thing — the best thing? — about Open Library is that you can correct any errors you come across, and those corrections can be propagated.

3. Under construction
This is a “soft launch,” our very first release of a new take on the Open Library system. There will be things that seem a bit weird, particularly if you’ve used the previous version.

We’re fairly sure that all the major operations work though, so if you find something that’s broken, or would like to suggest an improvement or discuss something, we’re all ears!

So, please go and explore the new Open Library. This is just the beginning!

http://www.openlibrary.org

Enjoy!