There can be more than one way to say the same thing, for example gramophone record, phonograph record and vinyl record. When libraries write catalog records, they pick one of these terms and stick to it; they use what is known as a ‘controlled vocabulary’. This makes it easier to browse library catalogs.
Traditionally it has been thought that patrons want to browse by author and subject headings, so these fields have been controlled. The data in these fields can be used in other ways; for example, Ross Singer has been experimenting with geographic subject headings.
Publisher is an uncontrolled field. Penguin and Penguin Books are the same publisher, but the name has been entered differently in catalog records, making it difficult to browse by publisher.
A workaround is to use the ISBN field in the catalog record. Almost every book published since 1970 has an ISBN. ISBNs for English-language books start with a 0 or 1, followed by a variable-length publisher code, an item number and finally a checksum digit.
For example: 0-14-043531-X
0 = English language
14 = Publisher code
043531 = Item number
X = checksum
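The breakdown above can be sketched in a few lines of Python. This is only an illustration, not the code used to build the lists: the function names `split_isbn10` and `isbn10_is_valid` are made up for this example, and the splitter assumes the ISBN is already hyphenated, since the publisher code is variable-length and cannot be found by position alone. The checksum rule is the standard ISBN-10 one: multiply the ten characters by weights 10 down to 1 (with X counting as 10) and check that the total is divisible by 11.

```python
def split_isbn10(isbn):
    """Split a hyphenated ISBN-10 into its four labelled parts.

    Assumes the hyphens are present; an unhyphenated ISBN cannot be split
    this way because the publisher code has no fixed length.
    """
    parts = isbn.split("-")
    if len(parts) != 4:
        raise ValueError("expected four hyphen-separated parts: %r" % isbn)
    group, publisher, item, check = parts
    return {"group": group, "publisher": publisher, "item": item, "check": check}


def isbn10_is_valid(isbn):
    """Return True if the ISBN-10 checksum is correct."""
    chars = isbn.replace("-", "").upper()
    if len(chars) != 10:
        return False
    total = 0
    for position, char in enumerate(chars):
        if char == "X" and position == 9:
            value = 10  # X is only allowed as the final check character
        elif char in "0123456789":
            value = int(char)
        else:
            return False
        total += (10 - position) * value  # weights run 10, 9, ..., 1
    return total % 11 == 0


print(split_isbn10("0-14-043531-X"))
print(isbn10_is_valid("0-14-043531-X"))
```

For the example ISBN the weighted sum is 121, which is 11 × 11, so the check character X is consistent with the other nine digits.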
We are able to build a list of ISBN publisher codes by picking the most popular publisher name, as it appears in library records, for each code. Using ISBN we can start the process of making publisher a controlled field.
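A minimal sketch of the "most popular name wins" idea, assuming the catalog records have already been reduced to (publisher code, publisher name) pairs. The function name `most_popular_names` and the sample records are invented for illustration; they are not the real data or the code that produced the lists.

```python
from collections import Counter, defaultdict


def most_popular_names(records):
    """Map each ISBN publisher code to the name seen most often with it.

    `records` is an iterable of (code, name) pairs, one per catalog record.
    """
    counts = defaultdict(Counter)
    for code, name in records:
        counts[code][name.strip()] += 1
    # most_common(1) returns a list holding the single (name, count) winner
    return {code: names.most_common(1)[0][0] for code, names in counts.items()}


# Hypothetical sample records, made up for this example.
sample = [
    ("0-14", "Penguin"),
    ("0-14", "Penguin Books"),
    ("0-14", "Penguin"),
    ("0-02", "Macmillan"),
    ("0-02", "Free Press"),
    ("0-02", "Macmillan"),
]

print(most_popular_names(sample))
```

When two names are tied, `Counter.most_common` keeps whichever was seen first, so a real run would probably want an explicit tie-breaking rule.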
The results:
- To browse: list of 2- and 3-digit publisher codes for ISBNs that start with a 0. Includes details for each code.
- To download: complete list of publisher codes for ISBNs that start with a 0 or 1.
I noticed that openlibrary.org has been giving a 503/500 since 1:30 PM PDT today (7/20). Since there isn’t any other contact info, I thought I’d let you know via this blog.
I find it interesting to look at the list of publishers in order by frequency. The top ones are:
0-19 Oxford University Press 251368
0-16 U.S. G.P.O. 245442
0-521 Cambridge University Press 175340
0-415 Routledge 117731
0-13 Prentice-Hall 116635
0-471 Wiley 112967
0-06 Harper & Row 109797
0-07 McGraw-Hill 98202
0-312 St. Martin’s Press 91149
0-02 Macmillan 75461
And it’s interesting how many of the top ones are university presses.
Also, it looks like there are either typo’d ISBNs in the data (I’m sure there are some) or some publishers have shared ISBNs. For example, I find MacGibbon & Kee under a couple of different numbers assigned to others, as well as their own. Anyway, fascinating data here.
You did a fascinating job here.
Could you also build a publisher list for language group 3 (German)?
regards
When I worked in a bookshop, we had a book that listed all the different ISBN publisher codes – came in handy on occasion.
How are you managing “imprints” of publishers?
“Imprint” can mean a variety of things, from a particular publisher’s series, like “Vintage Classics” to once independent publishing houses that have been purchased by a larger publishing company (increasingly common).
What we have to work with, however, is simply the publisher name that we receive in the metadata, and that is generally the name that appeared on the title page of the book. There is nothing to link that name to an actual corporate entity (either the one owning the imprint at the time, or the one that may own it today). I think that our data is closer to “imprint” than it is to “publisher”, and that bringing the two together will require some external data. The ISBN prefixes may provide some help in that area. For example, code 0-02, which is generally listed as belonging to Macmillan, has appeared in metadata with these listed in the publisher field:
Macmillian
Collier Macmillan
Free Press
Maxwell Macmillan International
Maxwell Macmillan Canada
Collier Books
Macmillian Reference USA
Collier-Macmillan
If these are what you mean by “imprint”, then imprint is generally what we have in the data.