The information explosion that followed the creation of the Internet has
presented libraries with both problems and opportunities as they apply and
augment traditional methods of cataloging. This research paper covers three
major topics in an effort to explain some of the issues. The first topic
provides an overview of how the process of cataloging developed, to
establish an understanding of current systems. The second topic
explains the difficulties in applying classification systems to the
information available on the Internet. Finally, the third topic
shows the possibilities and plans for libraries to use cataloging to
improve research on the Internet.
Since the advent of the printing press in the mid-15th century,
mass-produced books have contained "conventions for representing
information in published texts. Principal among these was the convention
of the title page, which named the author and the title of the work
contained therein, and also acknowledged the printing source (Tillett
2)." The key data of title, author, and source were then used to create
the first bibliographic records.
Libraries began to place those bibliographic records into what was called
a catalog. To catalog is to make a systematized list, and so the list of
bibliographic records for the material housed in the library was called
the catalog. Barbara B. Tillett explains that libraries first recorded
lists in books. By the late 1800s, the American Library Association had
adopted cataloging rules, the forerunners of the Anglo-American
Cataloguing Rules, whose second edition (AACR2) is in use today. In 1901,
the Library of Congress began selling printed cards to other libraries.
Unlike book catalogs, card catalogs enabled the user to find the complete
bibliographic description under many access points through the use of the
newly termed "main entries" and "added entries." Main entries served as
collating and arranging devices (Tillett 5). The ability to provide
multiple access points developed the concept of indexing. Indexing used
keywords or phrases to describe the content while pointing to the main
entry or the bibliographic record.
In the 1800s, the Library of Congress classification system and the Dewey
Decimal system were developed. Each system used letters and numbers to
make up call numbers which represented the specific subject of a book.
That allowed books to be organized on the shelf by subject matter
("Classification" 1). Because decimal numbers were used, the
subject areas could easily be expanded using fractions of the whole
numbers. In 1967, with the arrival of electronic databases, the Library of
Congress converted bibliographic records into machine-readable cataloging
records, or MARC. The MARC formats cover five types of data:
bibliographic, holdings, authority, classification, and community
information. MARC records encode
the data elements to help describe, retrieve, and control the
information.
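
The idea of encoding data elements so they can be described, retrieved, and
controlled can be sketched in a few lines of Python. This is a simplified
illustration, not the actual MARC format: real records also carry indicators
and subfield codes. The field tags shown (100, 245, 260, 650) are common
bibliographic tags, while the function names and the sample book are
hypothetical.

```python
# Simplified, MARC-like bibliographic record: a list of (tag, value) pairs.
# Real MARC fields also have indicators and subfields, omitted in this sketch.
FIELD_LABELS = {
    "100": "main entry (personal name)",
    "245": "title statement",
    "260": "publication information",
    "650": "subject added entry",
}


def make_record(author, title, publisher, subjects):
    """Build a record from the key data found on a title page plus subjects."""
    record = [("100", author), ("245", title), ("260", publisher)]
    record.extend(("650", subject) for subject in subjects)
    return record


def access_points(record):
    """Return the values a catalog could index: main entry, title, subjects."""
    return [value for tag, value in record if tag in ("100", "245", "650")]


def describe(record):
    """Return one human-readable line per field."""
    return ["{} {}: {}".format(tag, FIELD_LABELS[tag], value)
            for tag, value in record]


# Hypothetical sample data, used only for illustration.
sample = make_record("Example, Jane", "A History of Cataloging",
                     "Sample Press", ["Cataloging", "Libraries"])
```

Calling access_points(sample) returns the author, the title, and both
subjects, which mirrors the way a card catalog filed one description under
several entries.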
Another impact on the development of cataloging occurred in
1967, when a consortium called OCLC (Ohio College Library Center) formed
a network of 54 Ohio colleges using MARC records. In 1977, that network
was opened to all libraries. In 1981, the legal name of the corporation
became OCLC Online Computer Library Center, Inc. Today more than 30,000
libraries in the U.S. and other countries participate in the shared
system ("History").
The ability to operate as a collective requires consistent standards for
precise communication. An example is the word movie. When referring to a
book about the movie "Gone with the Wind", does a cataloger use
moving picture, motion picture, cinema, film, or movie? Consistent
indexing requires an authority list, which may also be called
a controlled vocabulary. The vocabulary list includes each term, but
designates Motion Picture as the authority to be used in the record created.
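
The lookup a cataloger performs against an authority list can be sketched
as a simple mapping from variant terms to the authorized heading. The
variants below are the ones named in the example above; the dictionary,
the function names, and the sample title are assumptions made for
illustration, not part of any published authority file.

```python
# Hypothetical authority list: variant term -> authorized heading.
AUTHORITY_LIST = {
    "moving picture": "Motion Picture",
    "motion picture": "Motion Picture",
    "cinema": "Motion Picture",
    "film": "Motion Picture",
    "movie": "Motion Picture",
}


def authorized_heading(term):
    """Return the authorized heading for a term, or None if it is not listed."""
    # Normalize case and whitespace so "  Moving   Picture " still matches.
    key = " ".join(term.lower().split())
    return AUTHORITY_LIST.get(key)


def index_record(title, keywords):
    """Build an index entry that keeps only authorized headings, without repeats."""
    headings = []
    for keyword in keywords:
        heading = authorized_heading(keyword)
        if heading is not None and heading not in headings:
            headings.append(heading)
    return {"title": title, "subjects": headings}


entry = index_record("The Making of a Classic", ["film", "cinema", "movie"])
```

Whichever variant the cataloger starts from, the record ends up under the
single heading Motion Picture, which is what makes searches across many
libraries consistent.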
The Library of Congress publishes a volume entitled the LC
Subject Headings, which is accepted and used by most libraries. The
volume lists the subject headings that are accepted for use in
cataloging. Problems arise, though, when specialties require more
precise categories. Some organizations publish their own list of terms to
provide the exact vocabulary for a narrower subject area. One such
organization is Engineering Information, Incorporated, which has created
a list called the Ei Thesaurus (Milstead).
This evolution has resulted in a system of collective consistency:
each library classifies a book using the same key data, assigns
keywords based on a controlled vocabulary, and places the records in a
common database. That consistency has enabled users to obtain quality
results in the search for information.
With the advent of the Internet and the capability of sharing information
electronically, the library world continues to evolve. The information
explosion has increased the number of users, the amount of information
available, and the speed of retrieval. This new direction causes problems
for library staff attempting to apply traditional methods of
cataloging. The search engines available on the Internet look for words
in the title, the first few lines, or the full text of the files.
Searching can take too long and can produce results that contain too many
records, include irrelevant records, or omit relevant ones.
Cataloging web sites requires consistent field entries similar to those
in a MARC record. The markup language of the Web already provides fields
that make such cataloging a viable idea. Within
Hypertext Markup Language (HTML) coding there is the ability to insert a
field called a meta tag. Metadata inserted into the meta tag is similar to
the information within a MARC record. Search engines can look
specifically for matching terms in the meta tags very quickly, but the
terms entered in the tags must be accurate. Today, web sites are thrown in
the middle of the Internet without cataloging. It would be the same as
just piling books in the center of a library with no system of indexing.
The Internet lacks the structure of the library cataloging system.
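
How a search tool might read the metadata fields of a page can be sketched
with Python's standard html.parser module. This is a minimal illustration
under stated assumptions: the sample page, its author name, and the helper
names are hypothetical, and real search engines are far more elaborate.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collect name/content pairs from <meta> tags in an HTML page."""

    def __init__(self):
        super().__init__()
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            self.metadata[name.lower()] = content


def extract_metadata(html_text):
    """Return a dictionary of the metadata fields found in the page."""
    parser = MetaTagParser()
    parser.feed(html_text)
    parser.close()
    return parser.metadata


def matches(html_text, field, term):
    """Report whether a search term appears in one metadata field."""
    value = extract_metadata(html_text).get(field.lower(), "")
    return term.lower() in value.lower()


# Hypothetical page used only for illustration.
SAMPLE_PAGE = """<html><head>
<title>Gone with the Wind: A Study</title>
<meta name="author" content="Jane Example">
<meta name="keywords" content="Motion Picture, American South">
</head><body><p>Text.</p></body></html>"""
```

A search on the keywords field for "motion picture" matches this page,
while a search for "movie" does not, which shows why the terms entered in
the tags need to come from an agreed vocabulary.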
This brings us to the first problem, which is controlled vocabulary.
There is no source accepted by web creators that gives authority to the
vocabulary words assigned to a site. Asking a web author to tag a site is
like asking a book author to make his own MARC record after writing his
book. This has always been the function of skilled librarians, using the
common tools of authority lists, classification systems, or shared
databases.
Other problems arise when the information changes. If a book changes, it
becomes a new edition with a new bibliographic record. Serials, also
known as magazines, change frequently, but the change is predictable. In
other words, the change could happen daily, monthly, or yearly, depending
on the frequency of publication. Web sites on the Internet, by contrast, change
erratically. Cataloging with a system using a main entry and added
entries would not work because there is no main entry. "David
Seaman, director of the Electronic Text Center at the University of
Virginia/Charlottesville, pointed out, 'It's difficult to justify the
time and expense of doing MARC cataloging of Internet materials on a
large scale because what you have to catalog is so fluid. You go to the
Web on a certain day and the item is there. Return in six months and it's
not there. Or it's still there but has changed so dramatically that the
record doesn't match anymore.' (Chepesiuk)."
The final problem is quality standards. Authors approach a publisher who
has a legal obligation and a professional reputation to produce a quality
product. Librarians rely on consistent quality from reputable publishers
to set the standards. One thing books have that resources on the Internet
do not have is the accountability of a publisher. Publishers have a legal
obligation to print the verifiable truth. They edit the content,
structure, and grammar of their publications. They also verify the
sources mentioned. This raises the question of whether the
Internet is even worth the time to catalog, given its varied
quality.
In summary, there are three major problems in cataloging the Internet:
the lack of a universally accepted controlled vocabulary; the lack of
stability due to frequent changes to the data; and the lack of quality
standards.
Many people are developing projects with the goal of
establishing standards for all to use. The sheer number of
efforts is itself an obstacle to consistency. Three projects, however,
seem to be getting the most attention, partly because of the
institutions from which they started, their sponsorship, and their
members: the Dublin Core, OCLC's CORC project, and the
Coalition for Networked Information (CNI).
"In March 1995, fifty-two librarians, archivists, and scholars
attended an OCLC-sponsored workshop to reach some agreement on what the
core of a descriptive record for items on the Internet might include. The
result was thirteen elements that they named the Dublin Core Metadata
Element Set (Chepesiuk 60)." The Dublin Core has become a
prominent candidate for cataloging electronic material. The workshop's
goal was to create a set of metadata elements that, once defined, could be
easily understood by web developers. Beyond that basic ability, the
elements can be refined to describe material more precisely for
specialized subject communities. The data elements, since expanded to
fifteen, include:
title; author; subject; description; publisher; other contributor; date;
resource type; format; identifier; source; language; relation; coverage;
and rights management.
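
The way a web author might record these elements in a page can be sketched
as a small Python function that turns a description into HTML meta tags.
The element names come from the list above; the "DC." prefix, the short
tag names, the function name, and the sample record are assumptions chosen
for this illustration rather than a statement of the official encoding.

```python
from html import escape

# Element names from the text, mapped to assumed short tag names.
DUBLIN_CORE_ELEMENTS = {
    "title": "DC.title",
    "author": "DC.creator",
    "subject": "DC.subject",
    "description": "DC.description",
    "publisher": "DC.publisher",
    "other contributor": "DC.contributor",
    "date": "DC.date",
    "resource type": "DC.type",
    "format": "DC.format",
    "identifier": "DC.identifier",
    "source": "DC.source",
    "language": "DC.language",
    "relation": "DC.relation",
    "coverage": "DC.coverage",
    "rights management": "DC.rights",
}


def dublin_core_meta_tags(record):
    """Return one HTML meta tag per element that has a value in the record."""
    unknown = sorted(set(record) - set(DUBLIN_CORE_ELEMENTS))
    if unknown:
        raise ValueError("not a listed element: " + ", ".join(unknown))
    tags = []
    for element, tag_name in DUBLIN_CORE_ELEMENTS.items():
        value = record.get(element)
        if value:
            # Escape quotes and angle brackets so the value is safe in HTML.
            tags.append('<meta name="{}" content="{}">'.format(
                tag_name, escape(value, quote=True)))
    return tags


# Hypothetical description of a web page, used only for illustration.
sample_tags = dublin_core_meta_tags({
    "title": 'Cataloging "the" Web',
    "author": "Jane Example",
    "language": "en",
})
```

Every element is optional in this sketch, so an author can supply only the
few fields that are known and still produce a usable description.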
Another OCLC effort is the Cooperative Online Resource Cataloging (CORC)
Project. CORC is a research project exploring the cooperative creation
and sharing of metadata by libraries. The goal is to allow libraries to
integrate material available on the Internet with current library
resources. According to Dorman, OCLC will build on the prior activities
of NetFirst and InterCat, by seeding the initial CORC database with
145,000 records using full MARC and Dublin Core metadata (66).
The Coalition for Networked Information (CNI) is another effort.
The goal of the coalition is to advance scholarship and
intellectual productivity. It was founded in 1990 by the Association of
Research Libraries, Educom, and CAUSE. The members, who represent over
two hundred institutions and organizations, meet biannually ("Coalition"
1). Bernbom reports that the coalition has created the Institution Wide
Information Strategies project. Since each individual representative is
gathering, delivering, and storing electronic information, the strategic
plan allows "networked information resource and service development
practices applicable to all" (88).
Historically, the process of cataloging has proven a very effective
method of organizing material for those seeking information. As the
evolution of the electronic world continues, libraries have the
opportunity to provide new ways of applying cataloging methods. As with
all change, the transition can present problems, but the end result can
be, hopefully, more than ever imagined.
Bibliography
Bernbom, Gerald. "Institution wide information strategies: a CNI
initiative." Information Technology and Libraries June 1998: 87-92.
Chepesiuk, Ron. "Organizing the Internet: The 'Core' of the Challenge."
American Libraries Jan. 1999: 60-63.
Milstead, Jessica, ed. Ei Thesaurus. 2nd ed. Hoboken: Engineering
Information Inc., 1995.
Tillett, Barbara B. "Cataloging Rules and Conceptual Models." OCLC
Distinguished Seminar Series 9 Jan. 1996: 1-14. Online. Internet.
25 Jan. 1999. Available
http://www.oclc.org:5046/~emiller/misc/tillett.html.