Modern Information Retrieval

Modern Information Retrieval. Ricardo Baeza-Yates & Berthier Ribeiro-Neto. New York, NY: ACM Press; 1999; 513 pp. Price: $??.?? (ISBN: 0-201-39829-X.)

This book is a comprehensive presentation of information retrieval from a computer science point of view. It presents the algorithms, formulae and operational details of information retrieval models, query languages, indexes, user interfaces and visualization. The two principal authors use the first nine chapters to give a straightforward exposition of the major aspects of algorithmic information retrieval. The remaining six chapters are authored by leading researchers such as Edward Fox, Christos Faloutsos and Edie Rasmussen. These ancillary chapters stand alone as “state of the art” contributions that enhance the core text.

The treatment throughout is expository, setting out the main themes and discussing the major aspects of every topic. The book can be used as a textbook at levels ranging from undergraduate to graduate. The authors provide schemas for navigating among topics and chapters for various classes of readers. Each chapter includes a bibliographic discussion and there is an extensive bibliography. Happily, the authors have a web page for elaborations, updates and corrections.

This is a useful book that works successfully at several levels. There is, of course, the surface expository level that is an encyclopedic treatment of information retrieval. At a deeper level, however, the book works as a snapshot of the changing discipline of information retrieval. Perhaps the authors’ greatest success is the thorough integration of the Internet into the presentation of all aspects of information retrieval. It is apparent that the web has shifted the paradigm of information retrieval: “some web search engines are opting for avoiding text operations altogether” (p. 167). A whole series of traditional ideas is challenged: stemming and stopwords are less useful in the web environment; structured retrieval models are promoted; the metaphor of navigating directed graphs becomes important; nonsequential organization of text replaces traditional linear text; text markup eclipses record structures; and retrieval by classification proves more useful than keyword indexing. The profound implications of these Internet changes are so exciting that the classic information retrieval material seems dated and becomes some of the least interesting parts of the book.

The authors distinguish their algorithmic approach from the user-centered perspective. In fact, human judgment is never far from the surface of the discussion. Relevance assessment is claimed to be central to information retrieval as early as page 2. Many of the traditional methods represent certain values and assumptions about the nature of text, as well as arbitrary threshold settings and so on. Text processing itself stands on assumptions about how to tokenize and normalize text into “words.” No matter how impressive the formulae, it appears that information retrieval is a very human process.

The book suffers a certain amount of compartmentalization: The assumptions of one algorithm or model may directly conflict with another. So, in one spot we read that there are fundamental lexical problems in processing text, while in another spot the book presents a technique that assumes text processing is trivial. Apparently, information retrieval still awaits a single, evaluative exposition. To some degree this conceptual choppiness drives a wedge between the core text and the ancillary chapters. For example, the core text echoes the standard complaint that commercial vendors continue to rely on Boolean approaches while ignoring superior weighted term methods. Only Rasmussen in an ancillary chapter mentions the weighted term tools introduced by commercial vendors years ago.

In general, the ancillary chapters are well done. Special mention should go to the human-computer interface chapter by Marti A. Hearst, which comprehensively covers user interfaces and visualization. The chapter on digital libraries by Edward A. Fox and Ohm Sornil is authoritatively written and includes architectural issues and multilingual documents.

Overall, the authors have done an admirable job in surveying a rapidly changing field. The book has very good prospects as a textbook. It also serves as an indicator of how the web is changing everything.

Terrence A. Brooks

School of Library and Information Science

University of Washington

tabrooks@u.washington.edu