LIS 528

Autumn 2000

Literature Searching

 

Course Goals

“Finding and retrieving information is central to libraries, and searching for specific information in large collections of text—known as information retrieval—has long been of interest to computer scientists.  Until the development of the web, browsing received less research effort, despite its importance.  Digital libraries bring information retrieval and browsing together as the general problem of information discovery—how to find information.”  W.Y. Arms, Digital Libraries, p. 66

 

 

The focus of this course is on information retrieval in both structured (e.g., databases on Dialog) and semi-structured (e.g., web pages) information environments. We will pursue a two-fold strategy: (1) how to maximize "relevant" retrieval in the mature, structured information environment of Dialog, and (2) how to apply similar strategies to the rapidly maturing information environment of the web.

Specific strategies include ranking, duplicate detection, finder databases and natural language searching with Dialog's Target and Lexis/Nexis FreeStyle. There is a Recall/Precision Database Searching Contest that challenges students to employ clever searching strategies, and an analysis of a Gold Standard Search that challenges students to match skills with the experts. Each student will analyze a web tool.

Multi-media event: Film excerpt from Saracevic, Mokros and Su: "Nature of Interaction Between Users and Intermediaries in Online Searching."


Assignments 

 

Please establish (if you haven’t already) a personal website.  On this website, please place a link to your 528 assignments.  The work for this course consists of two graded assignments and two homework assignments.

 

 

 

            Graded assignments [Grade weight in brackets]

Homework [This work is not graded, but it must be completed.]

  • Comparison of Dialog's Target and Nexis' FreeStyle
  • WWW Assignment

There is no final exam or midterm exam, and there are no quizzes.


Pick a Search Tool
A sophisticated information intermediary (read: Librarian) must be familiar with a large number of web tools. Our class will be enriched by student presentations of web tools. Search Engine Watch is a good source of information about web tools, as are the listings of search engines on Yahoo.
Every student should choose one search tool and (1) present an analysis for the class (see schedule below) and (2) write a thorough analysis of the web tool (this should be an HTML document).
Elements of a critique of a web tool:

Publicize your choice of web tool and select a presentation date:

 


Gold Standard Search


Meta-Search Tools

  • DIALINDEX 411

Some Discussion

  • Cooper, W.S. (1971) "A Definition of Relevance for Information Retrieval" Information Storage & Retrieval, 7, 19-37.
  • Froehlich, T. J. (1994) "Relevance Reconsidered--Towards an Agenda for the 21st Century: Introduction to Special Topic Issue on Relevance Research" JASIS, 45(3), April 1994
  • Foskett, D.J. (1972) "A Note on the Concept of 'Relevance'" Information Storage & Retrieval, 8, 77-78.
  • Kemp, D.A. (1974). "Relevance, Pertinence and Information System Development" Information Storage & Retrieval, 10, 37-47
  • Salton, G. & McGill, M. (1983) Introduction to Modern Information Retrieval.
  • Swanson, D.R. (1988) "Historical note: Information Retrieval and the Future of an Illusion" JASIS, 39, 92-98.
  • Harter, S. (1996) "Variations in Relevance Assessments and the Measurement of Retrieval Effectiveness" JASIS 47(1):37-49
  • Belew, R. K. Finding out about: Search engine technology from a cognitive perspective.

Target and Free-Style Searching

Interesting reading: "Measuring Search-Engine Quality and Query Difficulty: Ranking with Target and Freestyle" by Robert M. Losee and Lee Anne H. Paris. Journal of the American Society for Information Science, 50(10):882-889, 1999. The results suggest that slightly better subject-based retrieval performance is obtained with best-case Boolean searching or the ranking engine used by Freestyle when compared to the ranking engine used by Target....there is little difference between the two commercial search engines in terms of performance....The research discussed here has been based on tests using the CF dataset...However, fulltext systems containing entire documents...can be expected to perform somewhat differently, and this study provides only an approximation of the performance that would be obtained with retrieving full documents using these particular commercial search engines.


Recall/Precision Database Searching Contest

Some Discussion:

  • Most Specific Facet First
  • Building Block Approach
  • Citation Pearl Growing Strategy
  • Vigil, P.J. (1988) "Search Strategy" [Chapter 5] from his book Online Retrieval, Wiley 1988. See his Closed Loop Relevance Clustering Algorithm, p. 103
  • Saracevic, T., Kantor, P. (1988). "A Study of Information Seeking and Retrieving. III. Searchers, Searches, and Overlap." JASIS, 39, 197-215.
  • Saracevic, T., Mokros, H., Su, L. (1990). "Nature of Interaction Between Users and Intermediaries in Online Searching: A Qualitative Analysis." Proceedings of the 53rd Annual Meeting of ASIS, (27) 47-54
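
For reference when scoring contest searches, recall and precision can be computed from two sets of record identifiers: what a search retrieved, and what is judged relevant. The sketch below uses the standard definitions; the function name and the record IDs are hypothetical and are not taken from the contest rules or from any Dialog file.

```python
def recall_precision(retrieved, relevant):
    """Return (recall, precision) for one search.

    recall    = relevant records retrieved / all relevant records
    precision = relevant records retrieved / all records retrieved
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved & relevant  # records that are both retrieved and relevant
    recall = len(hits) / len(relevant) if relevant else 0.0
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    return recall, precision


# Hypothetical record IDs, for illustration only.
retrieved = ["r1", "r2", "r3", "r4"]
relevant = ["r2", "r4", "r7", "r8", "r9"]

recall, precision = recall_precision(retrieved, relevant)
print(f"recall={recall:.2f} precision={precision:.2f}")
# prints: recall=0.40 precision=0.50
```

Broadening a search (for example, adding synonyms with OR) tends to raise recall and lower precision, so a contest search usually has to trade one against the other.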

Rank Command
Removing Duplicates
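
The general ideas behind ranking and duplicate removal can be sketched in a few lines. This is an illustration only, not a description of how Dialog implements either feature: the matching key (normalized title plus year), the function names, and the record layout are all assumptions made for the example. The sample records reuse citations from the reading lists in this syllabus.

```python
from collections import Counter


def normalize_title(title):
    """Lowercase the title and replace punctuation with spaces."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in title)
    return " ".join(cleaned.split())


def remove_duplicates(records):
    """Keep the first record seen for each (normalized title, year) key."""
    seen = set()
    unique = []
    for rec in records:
        key = (normalize_title(rec["title"]), rec["year"])
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


def rank_field(records, field):
    """Return (value, count) pairs for one field, most frequent first."""
    return Counter(rec[field] for rec in records).most_common()


# The same article as it might appear in two different databases,
# plus two other records.
records = [
    {"title": "Information Search Tactics", "year": 1979, "journal": "JASIS"},
    {"title": "Information search tactics.", "year": 1979, "journal": "JASIS"},
    {"title": "Relevance Reconsidered", "year": 1994, "journal": "JASIS"},
    {"title": "A Note on the Concept of 'Relevance'", "year": 1972,
     "journal": "Information Storage & Retrieval"},
]

unique = remove_duplicates(records)
print(len(unique), "unique records")
for journal, count in rank_field(unique, "journal"):
    print(count, journal)
```

Removing duplicates before ranking matters: otherwise a journal or author covered by several databases would be counted more than once.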


Some Discussion:

  • Bates, M.J. (1979). Information Search Tactics. JASIS, 205-214.
  • Fox, E., et al. (1993) Users, User Interfaces and Objects: Envision, a Digital Library. JASIS, 44(8), 480-491
  • Harter, S. P. (1990). Search Term Combinations and Retrieval Overlap: A Proposed Methodology and Case Study. JASIS, 41, 132-146.
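
When comparing the results of two searches on the same topic, one simple way to quantify retrieval overlap is the share of distinct records that both searches retrieved. The measure below (intersection divided by union) is an assumption chosen for illustration; the readings above may define overlap differently. The function name and record IDs are hypothetical.

```python
def overlap(results_a, results_b):
    """Shared records divided by all distinct records in either result set."""
    a, b = set(results_a), set(results_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical record IDs retrieved by two different search formulations.
search_1 = ["d1", "d2", "d3", "d4"]
search_2 = ["d3", "d4", "d5", "d6"]

print(f"overlap={overlap(search_1, search_2):.2f}")
# prints: overlap=0.33
```

A low value means the two formulations found largely different records, which is one argument for combining several search formulations on the same topic.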