ASSOCIATES (2006, March, v. 12, no. 3) - associates.ucr.edu
*LibTypos: Playground for a Compulsive Proofreader*
by
Tina Gunther
Cataloging Technician I,
Biola University Library,
La Mirada CA USA
tina.gunther@biola.edu
Have you ever searched for a title that you knew full well was in a database, only to find it missing from the results of what you knew was an accurate title or keyword search? When you finally tracked down the record, you found that the initial search had failed because one of the key words in the record was mistyped.
Do you wince when you see "Intorduction to bibliograpical refrences" or "Offfice manger preformance reprot"? Do you find your eyes involuntarily drawn to typographical errors in text you are reading? Do you wish for a way to counter the typos that seem to be ever more common around you? If you said "Yes!" to those questions, then you understand why I was instantly attracted to an announcement I read back in June of 2000 about a new discussion group called "LibTypos-L".
Terry Ballard (currently an Associate Professor and Automation Librarian at The Arnold Bernhard Library, Quinnipiac University, Hamden, Conn.) had set up a website with a list of typographical errors that were common to large online library catalogs. Phalbe Henriksen (Library Director, Bradford County Public Library, Starke, Florida) gave him the idea of starting an online discussion list for librarians and other database managers who were interested in sharing information about common typographical errors. (Online addresses for pertinent websites are listed at the end of this article.)
The focus of LibTypos-L is typos made in the transcription of descriptive cataloging. If an item being cataloged actually has a typo on it, that typo would not count. (In fact, if a title page includes a typo and the cataloger transcribing the title page fixes the word instead of copying it as-is, then _that_ would be a transcription typo.)
As a self-confessed compulsive proofreader, I found the lure of the topic impossible to resist. I had already found that cataloging was one of a minority of professions where compulsive proofreading was actually a marketable skill instead of an annoying personality quirk. I quickly found that my peculiar talents fit well with the pursuit of the goal of LibTypos-L, "to maintain an effective file of the most likely problem words." Before long, I found myself not just contributing to the problem-word collection, but helping Terry to update the website as more and more entries were added to the list of problem words. I also found that pursuing typos was fun and relaxing for me.
Terry's original list has grown from around 500 entries to over 5,700. Of those, 5,686 are currently on what the LibTypos group refers to as our "MainList" at:
http://faculty.quinnipiac.edu/libraries/tballard/typoscomplete.html
The entries are color-coded to indicate the oldest and newest entries:
Terry's Adelphi List is Brown (445 entries)
Added in 2000-2004, Black (440+1308+701+86+275=2,810 entries)
Added in January 2005, Blue (1,165 entries)
Added in February 2006, Red (1,266 entries) (List available in a PDF document at the website.)
Already this year, 22 more entries have been confirmed and posted to the discussion list. They will go into the next update.
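For readers who like to check the arithmetic, the color-coded counts above can be added up in a few lines of Python. This is only a sketch: the numbers are copied from this article, and the variable names are made up for illustration.

```python
# Counts copied from the article; the names are illustrative only.
black_batches = [440, 1308, 701, 86, 275]   # the five figures listed for 2000-2004

color_counts = {
    "Brown": 445,                 # Terry's Adelphi List
    "Black": sum(black_batches),  # added 2000-2004
    "Blue": 1165,                 # added January 2005
    "Red": 1266,                  # added February 2006
}

main_list_total = sum(color_counts.values())
pending = 22                      # confirmed this year, not yet on the MainList

print(color_counts["Black"])      # 2810
print(main_list_total)            # 5686
print(main_list_total + pending)  # 5708, i.e. "over 5,700"
```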
Some of the original entries have been truncated and added as new entries.
Truncation is indicated with an asterisk (*). For example, "Chirst*" covers: Chirst, Chirstian, Chirstianity, Chirstians, Chirstmas, Chirstoph, etc. If we tried to list every variation of every truncated entry, the list would be too massive to be practical. It is already almost 200 screens long in its one-column format.
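For anyone who wants to test a truncated entry against local text, the asterisk convention can be approximated with a regular expression. The sketch below is not how OhioLINK or any catalog system handles truncation; the function names are hypothetical, and it assumes that a trailing asterisk means "this stem followed by any further letters" and that matching should ignore case.

```python
import re

def truncation_to_regex(entry: str) -> re.Pattern:
    """Turn a list entry such as 'Chirst*' into a whole-word regex.

    Assumption: a trailing '*' means the stem plus any further word
    characters; an entry without '*' must match as a complete word.
    """
    if entry.endswith("*"):
        stem = re.escape(entry[:-1])
        pattern = rf"\b{stem}\w*"
    else:
        pattern = rf"\b{re.escape(entry)}\b"
    return re.compile(pattern, re.IGNORECASE)

def find_matches(entry: str, text: str) -> list[str]:
    """Return every word in text that the entry covers, in order."""
    return [m.group(0) for m in truncation_to_regex(entry).finditer(text)]

sample = "Chirstmas carols and Chirstian hymns; Christ Church choir"
print(find_matches("Chirst*", sample))  # ['Chirstmas', 'Chirstian']
```

The correctly spelled "Christ" is left alone because the stem has to match letter for letter. Whether any particular hit is really a typo is still a question for a human reader.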
We use OhioLINK as our "hit score" source because it is very large, indexes a large number of fields, and is easily available online for free. Since this is a volunteer project, there is no funding for any search fees. My own efforts are done almost entirely off the clock, seldom coinciding with my official duties as a Cataloging Technician at Biola University Library.
The "MainList" is divided into five groups to facilitate dealing with the most urgent typos first.
Section A -- Highest Probability (84 terms with 100+ hits in OhioLINK)
Section B -- High Probability (962 terms with 16-99 hits in OhioLINK)
Section C -- Moderate Probability (780 terms with 8-15 hits in OhioLINK)
Section D -- Low Probability (2,271 terms with 2-7 hits in OhioLINK)
Section E -- Lowest Probability (1,589 terms with only 1 hit in OhioLINK)
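The section boundaries above amount to a simple lookup on the hit count. Here is a minimal Python sketch of that rule. The function name and the sample entries labeled "example" are hypothetical; the two real entries and their scores come from the top-dozen list later in this article. Rejecting a count below 1 is my own assumption, on the reasoning that every listed term has at least one hit.

```python
def section_for(hits: int) -> str:
    """Return the MainList section letter for an OhioLINK hit count."""
    if hits < 1:
        # Assumption: an entry needs at least one hit to be listed at all.
        raise ValueError("hit count must be 1 or more")
    if hits >= 100:
        return "A"  # Highest Probability
    if hits >= 16:
        return "B"  # High Probability
    if hits >= 8:
        return "C"  # Moderate Probability
    if hits >= 2:
        return "D"  # Low Probability
    return "E"      # Lowest Probability (exactly 1 hit)

# The first two scores are from the article; the rest are made-up
# values chosen to land in each of the remaining sections.
samples = {"L97*": 1114, "Unives*": 298, "example-b*": 16,
           "example-c*": 15, "example-d": 7, "example-e": 1}
for entry, hits in samples.items():
    print(entry, section_for(hits))

# Section sizes as stated above; they add up to the MainList total.
section_sizes = {"A": 84, "B": 962, "C": 780, "D": 2271, "E": 1589}
print(sum(section_sizes.values()))  # 5686
```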
For those who want to print out the full five-section MainList in a compact form, we have a multiple-column PDF version available for download at the website. There is also a PDF version that presents the whole list as a single alphabetical sequence, also in multiple columns.
A companion webpage, maintained by Phalbe Henriksen, lists typos that result in real words that are incorrect in context, such as "Los Angels, Calf." or "Office Manger" or "Untied States." All of the words on what LibTypos calls the "MoreList" have been found as typos, but also can be found used correctly as-is in OhioLINK.
It cannot be stressed enough that caution needs to be used when making corrections for any of the entries in either of the lists supported by LibTypos-L. This may be even truer for the "MoreList" than it is for the "MainList," since _all_ of the words listed there are correct in some places, but are easily missed typos in other places.
Correcting a typo requires assessment of the context of the word, not just the sequence of its letters. Blind search-and-replace routines can easily turn into search-and-destroy operations, or at least search-and-mangle.
What is wrong with a fast automated fix-it blitz? Here are several points to consider:
1. The LibTypos List focuses on Anglo-American English typographical errors. Some of the entries are valid non-English words (profesional, recource).
2. Anything can be used as a name or a pseudonym (Dogg, Versino).
3. English spelling was inconsistent until 1800 or so. Even now, what looks like a typographical error in a bibliographic record may be what really _is_ on the item being described. Not every error or spelling variant is marked with "[sic]" or an equivalent marker.
4. Some typos need different fixes in different records. Is "workship" supposed to be "worship" or "workshop" or "workslip" or is it correct as-is in that record?
5. What initially looks like a typo may instead be a word the reader has not encountered before. Here are some examples that had to be moved from the MainList to the MoreList: Eduction, Exodos, Insolation, Mexica.
All of these factors argue against trying to cut corners and do heedless global replacements with most of the entries in the MainList or the MoreList. Although the lists have already proved valuable to many people, they need to be used in a prudent manner.
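One prudent alternative to a global replace is to have a script find the suspects and leave every decision to a person. The Python sketch below illustrates that idea only. The record texts, record IDs, and names such as flag_candidates are invented for the example, and skipping a word that is immediately followed by "[sic]" is a simplifying assumption (as noted above, not every deliberate spelling is marked that way).

```python
import re
from dataclasses import dataclass

@dataclass
class Candidate:
    record_id: str
    matched: str
    context: str   # nearby text, so a person can judge the word in place

def flag_candidates(records, patterns, width=20):
    """List possible typos with context. The records are never changed."""
    flagged = []
    for record_id, text in records.items():
        for pattern in patterns:
            for m in pattern.finditer(text):
                # Assumption: a word marked [sic] was transcribed as-is on purpose.
                if text[m.end():m.end() + 8].lstrip().startswith("[sic]"):
                    continue
                start = max(0, m.start() - width)
                end = min(len(text), m.end() + width)
                flagged.append(Candidate(record_id, m.group(0), text[start:end]))
    return flagged

# Hypothetical records, invented for illustration.
records = {
    "rec1": "Annual workship on cataloging practice",
    "rec2": "The workship [sic] of saints",
    "rec3": "Office manger handbook",
}
patterns = [
    re.compile(r"\bworkship\b", re.IGNORECASE),
    re.compile(r"\boffice manger\b", re.IGNORECASE),
]

flagged = flag_candidates(records, patterns)
for c in flagged:
    print(c.record_id, repr(c.matched), "in:", c.context)
```

The script reports rec1 and rec3 and passes over rec2. Whether "workship" in rec1 should become "workshop" or "worship" is still left for the cataloger to decide after looking at the record.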
The LibTypos-L group has never been very large. The activity level varies as reports are made of new entries and questions are asked and answered.
If you are the type who enjoys fixing typos and chasing errors, not to mention increasing your vocabulary, then come join us. You can visit the group at:
http://groups.yahoo.com/group/Libtypos-L/
You do need to be a member to access the group's message archives.
Typo Trivia:
Have you checked your catalog lately for geographical typos? If it were a contest, Mississippi, with nine different recorded typos in OhioLINK, would be trailing California and Cambridge (a dozen each). Massachusetts and Cincinnati, with 17 and 18 respectively, would be neck and neck, and Philadelphia would be way out in front, having been found misspelled 40 different ways in OhioLINK.
Typos that tickle my funny bone?
"Untied States" -- "poopular" -- "office manger"
The Dozen Top-scoring typos in the MainList:
12. Unives* (0298)
11. Commeric* (0331)
10. L95* (0335)
9. Univerist* (0363)
8. MacMillian* (0367)
7. L98* (0459)
6. Universt* (0461)
5. Amd (0488)
4. Accomodat* (0518)
3. Repons* (0605)
2. L96* (0692)
1. L97* (1114)
Pertinent Web Sites
History of the LibTypos project:
"Paradise Lost is Found: Typographical Errors in Online Catalogs," by Wendee Eyler
http://associates.ucr.edu/feyl304.htm
LibTypos-L homepage
http://groups.yahoo.com/group/Libtypos-L/
Terry Ballard's Web Page:
http://faculty.quinnipiac.edu/libraries/tballard
LibTypos MainList
"Typographic Errors in Library Databases" by Terry Ballard, Revised February, 2006 by Tina Gunther and Terry Ballard:
http://faculty.quinnipiac.edu/libraries/tballard/typoscomplete.html
LibTypos MoreList
"More Typographical Errors in Library Databases"
http://bradford.newriver.lib.fl.us/moretypos/moretypos.htm
OhioLINK keyword search URL
http://olc1.ohiolink.edu/search/X
OneLook website for checking words
http://www.onelook.com
Tina Gunther tina.gunther@biola.edu
"Keeper of the MainList"