ASSOCIATES (vol. 10, no. 3, March 2004) - associates.ucr.edu
*Paradise Lost is Found: Typographical Errors in Online Catalogs*
University of California, Riverside
Typographical errors in online catalogs are a problem shared by all libraries. At the University of California, Riverside (UCR), in the Cataloging Department in the Science Library, the cataloger reviews the contents of the bibliographic record as each book is cataloged and routinely corrects, verifies, and modifies each bibliographic record. Also, all headings, such as names, subjects, and traced series, are sent to MARCIVE to be verified against the Library of Congress Authority File, and MARCIVE provides cross references and updates to changed headings. In addition, one staff member reviews every first-time use of a heading and continually updates and corrects indexed heading fields.
UCR has been cataloging material since the early 1950s. The individually typed catalog card was abandoned by the mid-1970s and replaced by printed cards provided by OCLC. Finally, in 1996, the generation of the typed catalog card was replaced by UCR's online catalog, Scotty (INNOPAC). One of the major, initial loads of online records into Scotty was dated "5-20-96." The majority of the initial loads consisted of machine-readable cataloging records from many sources: our own OCLC records from tape; retrospective conversion records from vendors, such as Carrollton Press, Utlas, and Auto-Graphics; and various other tape-loaded records, including government publications.
Back in 1991, Terry Ballard, currently an Associate Professor and Automation Librarian at Quinnipiac University, in Hamden, Connecticut, did a keyword inspection of the Adelphi University online catalog. He looked through the entire keyword database, found the words that were typos, fixed them, and maintained a list of these words. He confirmed that most libraries have a typo problem because libraries receive their cataloging records from similar sources, such as OCLC or retrospective conversion vendors. One cataloging record could be used by hundreds or even thousands of libraries. If the source record contained a typo, many libraries probably retained the typo unknowingly. Since the original study in 1991, Terry has added more words to his list, has published two articles about the topic, and has integrated material from his list with a confirming study performed at the University of North Florida by Bob Jones. More recently, the whole subject was given a boost thanks to Phalbe Henriksen, library director of the Bradford County Public Library in Florida. Terry and Phalbe started an online discussion at Yahoo Groups (Libtypos-L) and found others were interested in this work. Tina Gunther of Biola University maintains the master list. (Online addresses for the web site and more information are listed at the end of this article.) At this time, the majority of the list is devoted to typos in American English-language words. OCLC is also being notified of these typos so that the base cataloging record for most libraries can be corrected as well.
I came across Terry Ballard's original list in 1998 and made over 1,300 corrections to Scotty bibliographic records by calling up the typos in the keyword ("Word") option in Scotty. The "Word" option provides a screen with a list of 8 words, each with the number of entries in which that word is found. Over the years, I would look for more misspellings and correct those records. In January 2004, I again became active in my quest to find and eliminate typos in Scotty. Terry Ballard's newly revised typo list now has about 4,000 words. I've checked all of them and corrected 4,500 Scotty records to date. Often the same word would be misspelled in several records. About 60% of the typos on the 4,000-word list were found in Scotty, but more typos turned up while I was checking those. For example, the list had "representaton" as a typo to correct. However, when I checked the root word "represent" and forwarded through the Scotty Word list, this led to 19 other words that needed correction, such as "repesentation," "reprensentation," "reprensetative," "reprentative," etc.
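The root-word check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Scotty or INNOPAC works: the sample index and its counts, the function name find_variants, the three-letter prefix, and the 0.8 similarity cutoff are all invented for the example. It uses the standard library's difflib to compare each indexed word against the correct spelling.

```python
from difflib import SequenceMatcher


def find_variants(index, correct_word, prefix_len=3, threshold=0.8):
    """Return (word, count) pairs from a keyword index that look like
    misspellings of correct_word.

    index        -- dict mapping each indexed word to its number of records
    prefix_len   -- only words sharing this many leading letters are compared
                    (an assumption that mimics browsing forward from the root)
    threshold    -- minimum similarity ratio (0.0 to 1.0); an arbitrary cutoff
    """
    prefix = correct_word[:prefix_len]
    hits = []
    for word, count in sorted(index.items()):
        if word == correct_word or not word.startswith(prefix):
            continue
        similarity = SequenceMatcher(None, word, correct_word).ratio()
        if similarity >= threshold:
            hits.append((word, count))
    return hits


# Hypothetical slice of a keyword index: word -> number of records.
index = {
    "representation": 412,
    "representaton": 2,
    "repesentation": 1,
    "reprensentation": 1,
    "represent": 57,
    "reptile": 9,
}

for word, count in find_variants(index, "representation"):
    print(word, count)
```

With this sample data the three misspellings are reported, while "represent" and "reptile" fall below the cutoff. A list like this is only a set of candidates; each one still needs to be looked at in its record before anything is changed.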
Generally, 80% of the typos I have identified are from the initial load of 5-20-96, mostly from vendor-generated records. The typos are in headings, subjects, notes and title fields. The remaining 20% are from "current" cataloging after 1996, titles not yet cataloged, and various tape-loaded records that are added to Scotty directly from the source without cataloger review.
Typos are often found in lengthy fields: in long, complex titles and, particularly, in long contents note fields. The typos hardest to see are words that are spelled correctly but are incorrect in context, such as "Untied States." Many times what appears to be a typo is in fact the correct usage of the word. For example, UCR has a rich collection of old English and American titles on microform. A word such as "publick" is spelled according to the convention of the time, and is not considered a typo in that context.
Some apparent typos are actually authors' names. While correcting the typo "touher" for "tougher," I found an actual author, Patrick Touher; while correcting the typo "versino" for "version," I found Cristina Versino as an author in a contents note. The typo "workship" was changed to either "worship" or "workshop," depending on the record. The use of initialisms for workshops, conferences, or organizations causes some time-consuming checking. Seeing "ecafe" or "ecai" in the Scotty Word listing lures me to check whether these words are misspelled (both are correct as is). The fairly recent appearance of computer, programming, and online-resource terms in bibliographic records has complicated the serendipitous identification of typos. "Wordperfect" and "Dotcom" are examples. Medical and scientific terms are a nightmare! "Psychiatry" was misspelled as "pysch…," "phych…," etc. Is "lacz" a typo for "lazy" or "lace," or used correctly as a term?
Non-English words also complicate the typo problem. The word "trafic" is correct in French, but a typo in English. Authors freely spell, misspell, and even make up words for their works. What is a typo in one title may be the intended spelling in another work. Sound recordings have titles such as "Spacin' home" and "Con-soul and sax"; science fiction and fantasy records have words such as "Chthon," "Soma," and "Beegu."
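Because the same string can be a typo in one record and a name or a foreign word in another, any automated pass can only suggest candidates for a person to review. The following Python sketch shows one way that triage might look. Everything in it is hypothetical: the record IDs, the sample texts, the three-letter language codes, the function name candidates_for_review, and the idea of an "approved" set recording cases a reviewer has already judged to be correct.

```python
import re

# A few entries from a typo list: misspelling -> suggested correction.
TYPOS = {"touher": "tougher", "versino": "version", "trafic": "traffic"}


def candidates_for_review(records, typos, approved):
    """Return (record_id, word, suggestion) for typo-list hits that still
    need a human look.

    Records not coded as English are skipped, since the list covers English
    typos only. (record_id, word) pairs in `approved` are skipped because a
    reviewer has already confirmed the spelling is correct in that record.
    """
    flagged = []
    for rec in records:
        if rec["lang"] != "eng":
            continue
        for word in re.findall(r"[a-z]+", rec["text"].lower()):
            if word in typos and (rec["id"], word) not in approved:
                flagged.append((rec["id"], word, typos[word]))
    return flagged


# Made-up sample records for illustration only.
records = [
    {"id": "b1", "lang": "eng", "text": "Getting touher on crime"},
    {"id": "b2", "lang": "eng", "text": "A memoir / Patrick Touher"},
    {"id": "b3", "lang": "fre", "text": "Le trafic routier"},
    {"id": "b4", "lang": "eng", "text": "Air trafic control"},
]

# The reviewer has confirmed that "Touher" in record b2 is an author's name.
approved = {("b2", "touher")}

for record_id, word, suggestion in candidates_for_review(records, TYPOS, approved):
    print(record_id, word, "->", suggestion)
```

In this sample, records b1 and b4 are flagged, the author's name in b2 is left alone, and the French record b3 is never examined. Cases such as "publick" in an old title or an invented word in a fantasy novel would be handled the same way, by adding them to the approved set once a person has checked them.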
Scotty is the UCR finding tool for cataloged works. A simple typo can mislead the user into thinking UCR does not own the work. Finding and correcting Milton's "Pradise lost" gives the same satisfaction as adding a new title to UCR's collection.
"Typographic Errors in Library Databases" web site, by Terry Ballard, Revised January, 2004 by Tina Gunther and Terry Ballard:
PDF PowerPoint presentation by Terry Ballard, given at the American Library Association convention in San Diego in 2004:
"More Typographical Errors in Library Databases" web site written and maintained by Phalbe Henriksen: