Orthographic Survey of a Database
"Orthography is the linguistic study of written language: elements of text such as letters, punctuation marks and spelling. Information retrieval systems operate in the orthographic realm matching some text strings (i.e., index entries) from documents with other text strings (i.e., query terms) from patrons. During the early history of information retrieval, it has been convenient to assume the rationality and uniformity of orthography in order to concentrate effort building information retrieval systems. Fundamental orthographic problems have persisted into modern information retrieval systems, however, where white-space normalization and the arbitrary treatment of punctuation have exacerbated the orthographic impediment to information retrieval."
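To make the quoted point concrete, here is a minimal sketch of how white-space normalization and a blanket punctuation rule can make distinct index entries collide while keeping variant spellings of the same word apart. The `normalize` and `build_index` functions and the sample entries are hypothetical illustrations, not the behavior of any particular database; real systems apply their own rules, which is what the survey is meant to uncover.

```python
import re
import string

# Assumption: a hypothetical normalizer that turns every ASCII punctuation
# mark into a space, collapses runs of white space, and lowercases.
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))

def normalize(term: str) -> str:
    """Return the normalized form used as an index key in this sketch."""
    no_punct = term.translate(_PUNCT_TO_SPACE)
    return re.sub(r"\s+", " ", no_punct).strip().lower()

def build_index(entries):
    """Group the original strings under their normalized keys."""
    index = {}
    for entry in entries:
        index.setdefault(normalize(entry), []).append(entry)
    return index

# Sample entries chosen for illustration only.
entries = ["C++", "C", "AT&T", "A T & T", "on-line", "on line", "online"]
index = build_index(entries)

for key, originals in index.items():
    print(repr(key), "<-", originals)
```

Under these assumed rules, "C++" and "C" land on the same key, while "on-line" and "online" land on different keys, so a patron searching for one spelling would miss the other.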
Your assignment is to conduct an "orthographic" survey of a group of databases, that is, a survey of how they treat written language. Your survey should be presented (1) as an HTML document that I can add to the homepage for LIS503, and (2) as a demonstration ("Here are some goodies that I found in this database") to our class. These demonstrations will be scheduled for the last week of the quarter.
The following are suggested avenues of attack for an orthographic survey: