Modern Information Retrieval. Ricardo Baeza-Yates & Berthier Ribeiro-Neto.
New York, NY: ACM Press; 1999; 513 pp. Price: $??.?? (ISBN: 0-201-39829-X.)
This book is a comprehensive
presentation of information retrieval from a computer science point of
view. It presents the algorithms, formulae
and operational details of information retrieval models, query languages, indexes,
user interfaces and visualization. The
two principal authors use the first nine chapters to give a straightforward
exposition of the major aspects of algorithmic information retrieval. The remaining six chapters are authored by
leading researchers such as Edward Fox, Christos Faloutsos and Edie
Rasmussen. These ancillary chapters
stand alone as “state of the art” contributions that enhance the core text.
The
treatment throughout is expository, setting out the main themes and discussing
the major aspects of every topic. The
book can be used as a textbook at various levels of readership from
undergraduate to graduate. There are
schemas for the navigation among topics and chapters for various classes of
readers. Each chapter includes a bibliographic
discussion and there is an extensive bibliography. Happily, the authors have a web page for elaborations, updates
and corrections.
This
is a useful book that works successfully at several levels. There is, of course, the surface expository
level that is an encyclopedic treatment of information retrieval. At a deeper level, however, the book works
as a snapshot of the changing discipline of information retrieval. Perhaps the authors’ greatest success is the
thorough integration of the Internet into the presentation of all aspects of information
retrieval. It is apparent that the web
has shifted the paradigm of information retrieval: “some web search engines are
opting for avoiding text operations altogether” (p. 167). A whole series of traditional ideas is
challenged: stemming and stopwords are less useful in the web environment; structured
retrieval models are promoted; the metaphor of navigating directed graphs
becomes important; nonsequential organization of text replaces traditional linear
text; text markup eclipses record structures; and retrieval by classification
proves more useful than keyword indexing.
The profound implications of these Internet changes are so exciting that
the classic information retrieval material seems dated and becomes one of the
least interesting parts of the book.
The
authors distinguish their algorithmic approach from the user-centered
perspective. In fact, human judgment is
never far from the surface of the discussion.
Relevance assessment is claimed to be central to information retrieval
as early as page 2. Many of the traditional methods represent certain values
and assumptions about the nature of text, as well as arbitrary threshold
settings and so on. Text processing
itself stands on assumptions about how to tokenize and normalize text into “words.” No matter how impressive the formulae, it
appears that information retrieval is a very human process.
The
book suffers from a certain amount of compartmentalization: the assumptions of one
algorithm or model may directly conflict with those of another. So, in one spot we read that there are
fundamental lexical problems in processing text, while in another spot the
book presents a technique that assumes text processing is trivial. Apparently, information retrieval still
awaits a single, evaluative exposition. To some degree this conceptual choppiness drives a wedge between
the core text and the ancillary chapters.
For example, the core text
echoes the standard complaint that commercial vendors continue to rely on Boolean
approaches while ignoring superior weighted term methods. Only Rasmussen, in an ancillary chapter, mentions
the weighted term tools introduced by commercial vendors years ago.
In
general the ancillary chapters are well done.
Special mention should go to the human-computer interface chapter by
Marti A. Hearst that comprehensively covers user interfaces and
visualization. The chapter on digital
libraries by Edward A. Fox and Ohm Sornil is authoritatively written and
includes architectural issues and multilingual documents.
Overall,
the authors have done an admirable job in surveying a rapidly changing
field. The book has very good prospects as a
textbook. It serves as an indicator of how
the web is changing everything.
Terrence A. Brooks
School of Library and Information Science
University of Washington
tabrooks@u.washington.edu