Everyone undoubtedly has seen announcements of the new, non-Boolean, natural language search techniques from West, DIALOG, and Mead Data Central. Some of you may already be experimenting with or using Westlaw's WIN, DIALOG's TARGET, or Mead's FREESTYLE. All are based on the assumption that our standard command-driven online systems coupled with Boolean logic searching are not only difficult to learn, but may sometimes miss relevant documents. This assumption is based not only on searchers' experiences, but on years of controlled tests in the information retrieval laboratories.

There is no doubt that this new way to search old systems is gaining a lot of attention. Westlaw Is Natural (WIN) was introduced in the fall of 1992 and on its first anniversary won the ONLINE Product of the Year award at ONLINE/CD-ROM '93. It continues to garner favorable reviews and is now the search method that Westlaw trainers teach first to new searchers. DIALOG's TARGET and Mead's FREESTYLE were announced at ONLINE/CD-ROM '93 to great fanfare, and became publicly available in late 1993 and early 1994, respectively.

 

THE NATURAL ALTERNATIVE

                          Although  each  product  works  somewhat  differently,  all  three offer an alternative  to  searching  with  command  interfaces and Boolean/proximity operators.  (Proximity  operators  such as specifying within a paragraph or within  a  specified  number of words are an extension of the Boolean AND.) They  offer (somewhat) natural language input, with no need for commands or logical   operators.   This   input   method   is  coupled  with  so-called "associative,"  "probabilistic"  or "statistical" retrieval techniques that provide relevance ranking of search results.

Unlike exact-match Boolean logic systems, where all concepts or terms linked with AND or a proximity operator must be present, relevance retrieval techniques are partial-match systems. They retrieve all documents that contain any words used to represent a concept (as if all words were ORed together). These documents are then run against a mathematical algorithm that weights and ranks the documents.

Statistical methods compare, for example, how many times the search words appear in each record with how many times they appear in the database as a whole. Documents that contain many of the search words are given higher weights. If those terms appear relatively less frequently in the database as a whole, the documents that contain them are weighted even more heavily. Relative lengths of each document are taken into account as well. The documents are then sorted by the assigned weights to display first those documents that best match the query. Pritchard-Schoch provides a clear explanation of the history and methods of these techniques, which have been tested for decades [1]. They have been available on smaller online or CD-ROM systems and in software for in-house databases for several years [2].
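The weighting just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the formula used by WIN, TARGET, or FREESTYLE, none of which is given here. The function name rank, the whitespace tokenizer, the logarithmic rarity weight, and the sample documents are all invented for the example.

```python
import math
from collections import Counter

def rank(query_terms, documents):
    """Return (document index, score) pairs, best match first.

    Partial match: a document is scored if it contains ANY query term,
    as if the terms had been ORed together.
    """
    tokenized = [doc.lower().split() for doc in documents]
    # How many times each word appears in the database as a whole.
    collection_counts = Counter(tok for tokens in tokenized for tok in tokens)
    total_tokens = sum(len(tokens) for tokens in tokenized)

    scored = []
    for idx, tokens in enumerate(tokenized):
        counts = Counter(tokens)
        score = 0.0
        for term in query_terms:
            occurrences = counts[term]
            if occurrences == 0:
                continue
            # Assumed rarity weight: words that are rare in the whole
            # database count for more than common ones.
            rarity = math.log(1 + total_tokens / collection_counts[term])
            # Dividing by document length keeps long documents from
            # winning on raw counts alone.
            score += (occurrences / len(tokens)) * rarity
        if score > 0:
            scored.append((idx, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical three-document "database".
DOCS = [
    "radiation exposure of military personnel radiation",
    "the city council met to discuss the budget and mentioned radiation",
    "sports scores from the weekend",
]
```

Calling rank(["radiation", "exposure"], DOCS) places the first document ahead of the second, which mentions radiation only in passing, and leaves out the third, which contains neither word.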

                          The  most  important  question  for  experienced  searchers  is  what  will relevance  search  systems  retrieve compared to the tried-and-true Boolean search engines? Should experienced searchers use the new methods? Will your search  results  be  better  with  one technique or the other? Which method should  we  teach our end-users? Do all of the new systems achieve the same results?

 

TARGET

                          DIALOG  officially  announced  TARGET in October 1993 and made it available for  all  users in December 1993. Although it was under development off and on  for several years, it wasn't until WIN's success that DIALOG decided to get  TARGET  ready for release. The DIALOG development staff looked at many different  relevance  methods  and  tested  a  variety of algorithms before programming what we see now. TARGET  works  on all DIALOG databases, but it is best suited for full-text databases  or  those  with  lengthy abstracts. These are the databases that rely  on  free-text searching and often retrieve excessive false drops with conventional  searching.  Since relevance retrieval compares how many times words  occur  in  a  document  in  relation to the length of each document, entire documents about a topic can be differentiated from those with only a single  paragraph  or a mention in passing of the desired subject. The most relevant documents should be placed at the top of the set for display first in relevance-ranked retrieval.


 

TARGET FOR SUBJECT SEARCHING

TARGET works best for text searching. Just as with DIALOG's Boolean system, TARGET defaults to the basic index. Those of you who are regular DIALOG searchers are aware of this distinction in its indexes. Unlike NEXIS and many other online systems, DIALOG maintains separate indexes for subject and non-subject searching. The "basic index" typically includes only words from titles, words from abstracts, words from full text, and words or phrases from descriptors or identifiers--fields which are all considered to represent the subjects of documents. The basic index is searched by default if a searcher doesn't specify any particular field.

To search for an author, journal name, corporate source, or other non-subject field in Boolean DIALOG, the searcher must explicitly name that field (e.g., SELECT AU=Asimov, Isaac or SELECT JN=Library Journal). This separation helps avoid false drops in the regular Boolean system, because you will not, for example, retrieve articles authored by Mr. Carpenter when searching for the subject carpenter. TARGET provides two ways of searching these non-subject fields:

1)  Put the prefix search in single quotes (e.g., target 'au=asimov,isaac')

2)  Add an author set created in a Boolean search to a TARGET search (e.g., s au=asimov, isaac; target *s1 'life science' biology zoology)

 

HOW TO SEARCH WITH TARGET

TARGET can be used in a single database or in multiple databases. Searchers can use a predefined OneSearch grouping or BEGIN in whichever databases they desire. Databases are searched with the CURRENT option by default (current calendar year plus one year) in databases that support the CURRENT feature, but searches can be modified to include other date ranges. If the database does not support CURRENT, TARGET will search the entire database.

After beginning in a database or database group, a searcher inputs the word TARGET to get into the TARGET menu search mode. TARGET menu mode provides help, prompts, and some menu choices to guide the novice user through the search process. Figure 1 shows the beginning of a TARGET menu mode search session.

 

NOT NATURAL LANGUAGE

Even in the novice TARGET mode, TARGET does not claim to support natural language. It does replace the need for Boolean or proximity connectors, but only the actual words or phrases to be searched should be entered. This differs from Westlaw's WIN, since WIN allows a user to enter a natural language statement directly. WIN's natural language interface supports a search statement such as "what is the government's obligation to warn military personnel about their exposure to radiation?" The system then strips out common phrases (e.g., "what is the"), identifies legal phrases matched from a phrase thesaurus, and eliminates stopwords.

TARGET requires formalized input of major terms, phrases, and synonyms and does little automatic processing. A TARGET statement might look like this: government? obligation warn? ('military personnel' soldier? sailor?) expos? radiation. Just as with DIALOG Boolean searching, understanding the required syntax is necessary.

 

                TARGET  does  not  have  a  thesaurus,  so  the  burden  of identifying and inputting  synonyms  is  completely  on the user, just as it is in DIALOG's Boolean  system. Creating a thesaurus that would serve all of the databases on  a  supermarket  system such as DIALOG would be a daunting task. Westlaw has  an easier time of it, building a thesaurus of legal terms and phrases. FREESTYLE  has  a  general synonym-type thesaurus. To make natural language search techniques truly useful for novices, databases and systems will have to  spend  the  time and effort to develop and maintain complete multitopic thesauri.

 

TARGET MODIFICATIONS AND DISPLAY

                Search  statements  in the TARGET menu mode can be modified by choosing the Modify  option  (but  only  after a search is run and after the first three items  are displayed). Modifications can be made to add or delete terms, to change the designation of a term as a required term, or to change the dates being  searched.  TARGET statements build a set which can then be used in a Boolean search.

TARGET examines all of the records that contain any of the input words and calculates the likely relevance of each. The formula goes beyond just counting word frequency by comparing how many of the search terms appear in each record with how many times each word appears in the database as a whole. Uncommon words that appear frequently in a document are given more weight. Unequal document lengths are taken into account, as is the proximity of search words.

                The  resulting  document ranking is used as the basis for order of display. Unlike  Boolean's reverse chronological display or a user-specified sorting order  such  as  alphabetically by author, relevance ranking displays first documents  that  are  most  likely to answer the user's query. This is good output  for  browsing until an information need is satisfied, and for those questions  where  the user doesn't need a comprehensive search. "Relevance" is  always  ultimately  subjective of course, so there is no guarantee that the  fiftieth  item displayed will be of less interest in a particular case than the fortieth, or even the first, item.

 

FREESTYLE

                             Mead's  FREESTYLE  is  available for both the LEXIS legal service and NEXIS news  service.  FREESTYLE's  performance  in LEXIS is best compared to WIN, since  LEXIS and Westlaw share many of the same legal databases and compete head-to-head  in  the  legal  research  market. We chose instead to examine FREESTYLE only in NEXIS, specifically in full-text newspapers. FREESTYLE  works on all NEXIS files, either selected individually, selected as  NEXIS  pre-specified group files, or mixed together in ad hoc groupings by  the  searcher.  After selecting a filename, searchers enter the command .FR to get to FREESTYLE mode. To return to Boolean mode, enter .BOOL.

 

PLAIN ENGLISH

FREESTYLE is closer to plain English than is TARGET, because it will automatically strip stopwords from an input query. Singulars and plurals are automatically searched (but other word form variations such as past tense and gerunds are not). As with the full NEXIS system, common abbreviations, British/American spelling, and equivalencies (e.g., 4 and four) are also automatic.

As with WIN, a FREESTYLE search could be entered directly as "what is the government's obligation to warn military personnel of exposure to radiation?" or as a shorter, more formalized statement as in TARGET. If entered in the former way, "what," "is," "the," "to," and "of" will all be discarded as stop (noise) words. (In plain English searches we tested, effect, services, and information were not discarded as stopwords.) Government's will be searched as government, governments, or government's. The other words will be searched as singulars or plurals, but other word forms will not be picked up: obligation will not retrieve oblige, warn will not retrieve warning, exposure will not retrieve expose or exposed, etc.

In the first version of FREESTYLE (February 15-May 30, 1994), these variations must be entered explicitly by the searcher (oblige obliged obligation) because truncation, other than automatic plurals, is not supported by FREESTYLE. The NEXIS symbols for user-specified truncation (! and *) did not work in the version of the software we tested.
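The query handling described above can be sketched as follows. This is a rough illustration, not Mead's implementation: the stopword set holds only the five words the article names, the plural rule simply appends "s" (so it produces some non-words), and the function name preprocess is invented for the example.

```python
import re

# Only the stopwords named in the article; the real list is not given.
STOPWORDS = {"what", "is", "the", "to", "of"}

def preprocess(query):
    """Map each surviving query word to the variants that would be searched.

    Stopwords are dropped. Each remaining word is expanded to singular,
    plural, and possessive forms only; no other truncation is attempted.
    """
    searched = {}
    for word in re.findall(r"[a-z0-9']+", query.lower()):
        if word in STOPWORDS:
            continue
        # Reduce a possessive such as "government's" to its base form.
        base = word[:-2] if word.endswith("'s") else word
        # Naive assumption: the plural is always base + "s".
        searched[base] = [base, base + "s", base + "'s"]
    return searched

QUERY = ("What is the government's obligation to warn military "
         "personnel of exposure to radiation?")
```

Running preprocess(QUERY) keeps seven words, from government through radiation, and none of the generated variants is oblige, warning, or exposed, which is why the searcher had to type those forms by hand.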

 

FREESTYLE THESAURUS

                          Unlike TARGET,  FREESTYLE  does  have  an  accompanying  thesaurus  where searchers  can  look  for  synonyms  or  variant word forms to add to their search.  The  thesaurus is not invoked automatically; searchers must select the  thesaurus  option  from  a  Search Options screen and specify which of their  search terms they want to check for synonyms [4].

 

SEARCH OPTIONS/RESULTS

After inputting a search statement but before FREESTYLE runs the search, a Search Options screen is displayed. Search Options include viewing the thesaurus, editing the search statement, or running the search as is. Edit choices include adding or deleting search terms or phrases, designating terms as mandatory, or adding restrictions such as date, byline, etc. (If date restrictions are not selected, FREESTYLE defaults to searching the full file. Date edits allow users to specify a single date or a date range.) If more than one edit is desired, the process can take a while: the Search Options screen must be entered for each modification, and each must be done individually. Command stacking provides a shortcut through the restrictions and allows users to enter more than one option at a time.

                             Like the asterisk (*) in TARGET, designating a term as mandatory means that the  term  must  be  present  in  any  documents  retrieved  and  ranked by FREESTYLE. It adds more precision to the search by combining a Boolean-like search technique  with  relevance  ranking.  However,  in  FREESTYLE  the mandatory  designation  must  be  made after an initial search statement is entered, and the desired term must be retyped after the mandatory option is selected.
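The combination of a mandatory term with relevance ranking can be sketched in Python. This is an assumption-laden illustration rather than either vendor's method: the score is just the share of a document's words that are search terms, and the function name search and the sample documents are hypothetical.

```python
def search(terms, mandatory, documents):
    """Return document indexes ranked best-first.

    Documents missing any mandatory term are excluded outright (the
    Boolean-like step); the rest are ranked by a simple density score.
    """
    wanted = set(terms) | set(mandatory)
    hits = []
    for idx, doc in enumerate(documents):
        tokens = doc.lower().split()
        # Exact-match filter: every mandatory term must be present.
        if not all(term in tokens for term in mandatory):
            continue
        matches = sum(tokens.count(term) for term in wanted)
        if matches:
            # Assumed score: matching words as a share of document length.
            hits.append((idx, matches / len(tokens)))
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return [idx for idx, _ in hits]

# Hypothetical documents.
DOCS = [
    "pcbs harm fish in the river",
    "fish recipes for summer",
    "pcbs found in old transformers",
]
```

With no mandatory terms, search(["pcbs", "fish"], [], DOCS) retrieves all three documents. Making pcbs mandatory removes the recipe article while leaving the remaining documents in ranked order.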

Since NEXIS has one large inverted index, rather than a subject-related basic index and non-subject field additional indexes like DIALOG, authors (bylines) and publication years will be searched if they are entered as part of the initial search statement. Searching for isaac asimov as a byline in FREESTYLE can be done just by entering his name, but documents that include mentions of Isaac Asimov in the text or as a subject will be retrieved in addition to articles written by him. To gain more precision by searching for him only as a byline, use the Restrictions choice on the Search Options screen, followed by selecting byline. This can only be done if you have already entered a basic search query, however. You cannot select an author alone.

When the search is run, a Search Results screen is displayed. The screen reports any stopwords that were input in the search statement and any phrases found in the phrase dictionary. It summarizes which terms were designated mandatory and any restrictions applied.

 

WHERE AND WHY

                While  DIALOG  has  included information about the occurrences of words and relevance  ranking  score  as  a  display option with each record, Mead has chosen  to  make this diagnostic information part of two separate commands. The WHERE and WHY commands are unique to Mead. WHERE shows which documents contain each of the search terms, and WHY shows the  level  of  importance assigned to each term by the system. If you have changed  the  display  to  more  than 25 documents, WHERE will only display information about the first 25 documents retrieved in any FREESTYLE search. This will be changed in the new release expected in June 1994. WHERE  and  WHY  have  been  favorably  received, especially by experienced searchers  [5].  WHERE  helps searchers determine which documents to browse according  to  their  own  idiosyncratic  view  of  relevance. WHY helps an experienced  searcher  determine  if  a  new  strategy should be used, if a Boolean  search  might get better results, or even if they are in the wrong database.

 

COMPARING TARGET AND FREESTYLE

                The  main  purpose  of  this article is to compare DIALOG Boolean searching with  DIALOG  TARGET and NEXIS Boolean with NEXIS FREESTYLE. We did not set out  to compare TARGET and FREESTYLE head-to-head, although some comparison is  obvious.  Most  of  the  differences in the approaches taken by the two systems reflect their differing basic philosophies.

                TARGET   puts   the   searcher   more  in  control  and  does  very  little automatically. FREESTYLE, on the other hand, does some things automatically and  attempts  to  lead  the  searcher  by  the  hand  a  bit more. This is consistent   with  the  different  focuses  of  these  systems--experienced searchers  for  DIALOG  and novice end-users for NEXIS.

 

COMPARING RELEVANCE AND BOOLEAN

                An  in-depth  comparison  of  these  Boolean  search engines with relevance search techniques requires testing real questions and searches. This should be done over time by many searchers--we have just scratched the surface. We  gathered  questions  from  reference  librarians  in four libraries and selected  six  questions  to  test.

 

 

                 We did all of the searches in the same newspapers, in an ad hoc grouping of the  Los  Angeles  Times,  Boston  Globe,  and  Washington  Post papers for 1993-1994.

 

 

QUESTION #1. What can you find out about EMFs? [Note: EMF = electromagnetic field]

 

QUESTION #2. Find any mention of Hopis.

 

QUESTION #3. What is the effect of PCBs on fish? [Note: PCB = polychlorinated biphenyl]

 

QUESTION #4.   Is   there  abusive  behavior  and  battering  in  lesbian relationships?

 

QUESTION #5. Should the state provide emergency medical services for illegal immigrants?

 

TEST SEARCH RESULTS

On DIALOG, an average precision ratio (relevant retrieved/all retrieved) of 56% was achieved by TARGET, compared to 61% by Boolean. NEXIS results were similar, with 53% for FREESTYLE and 64% for Boolean.

                The  better  overall  precision  with Boolean should be contrasted with the greater  number  of total documents retrieved and, at times, greater number of relevant documents, obtained by relevance searching.
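The precision ratio and the trade-off just noted can be worked through in a short Python sketch. The counts below are HYPOTHETICAL, invented only to show the arithmetic; they are not the article's data. Averaging the per-question ratios is one reasonable reading of "average precision ratio," not a confirmed description of how the figures above were computed.

```python
def precision(relevant_retrieved, total_retrieved):
    """Precision ratio: relevant retrieved / all retrieved."""
    if total_retrieved == 0:
        raise ValueError("precision is undefined when nothing is retrieved")
    return relevant_retrieved / total_retrieved

def mean_precision(results):
    """Average the per-question precision ratios.

    `results` is a list of (relevant_retrieved, total_retrieved) pairs,
    one pair per test question.
    """
    ratios = [precision(relevant, total) for relevant, total in results]
    return sum(ratios) / len(ratios)

# HYPOTHETICAL counts for a single question, for illustration only.
boolean_run = (11, 18)    # 11 relevant documents out of 18 retrieved
relevance_run = (28, 50)  # 28 relevant documents out of 50 retrieved

print(f"Boolean precision:   {precision(*boolean_run):.0%}")
print(f"Relevance precision: {precision(*relevance_run):.0%}")
```

In this made-up case the Boolean search has the higher precision, yet the relevance search hands the user more than twice as many relevant documents, which is the contrast drawn above.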

 

REFERENCES

[1] Pritchard-Schoch, Teresa. "Natural Language Comes of Age." ONLINE 17, No. 3 (May 1993): pp. 33-43.

[2] Tenopir, Carol. "The New Generation of Online Search Software." Library Journal 117 (October 1, 1993): pp. 67-68.

[3]  WIN is not the first commercially available online system to go beyond Boolean.   That   honor   probably  belongs  to  Congressional  Quarterly's Washington Alert, which has used the Personal Librarian search engine since 1989.

[4] The FREESTYLE thesaurus is a synonym list thesaurus such as Roget's, not the kind of thesaurus defined in the ANSI (American National Standards Institute) or ISO (International Organization for Standardization) standards for use with indexing. It lists only synonyms and word form variants, and does not specify term hierarchies, such as broader terms, narrower terms, etc.

[5] Bjorner, Susanne N. "Output Options: The .WHERE and .WHY of FREESTYLE." ONLINE 18, No. 2 (March 1994): pp. 88-91.

 

>>>>>>>>>>>>  Your Assignment   <<<<<<<<<<<<<<

 

                Suppose your boss reads the foregoing excerpt from the Tenopir and Cahn article and shouts "Why didn't they compare them head to head!?!  Here in the Pacific Northwest, we need to know which works best with The Seattle Times !!"

 

            Before you know it, you have been assigned the job of comparing Target and FreeStyle in a "head to head" competition with The Seattle Times.  You are to write a short report detailing how you compared them and your recommendations as regards their use with The Seattle Times.

 

Target Strategy:

            1. Choose one of the five questions used by Tenopir and Cahn.

            2. Think about what terms you will use in your "natural language" search and, perhaps more importantly, which terms you will require to be present (i.e., the mandatory terms).

            3. Search Dialog's file of The Seattle Times first.  Begin file 707 and issue the command "target" to get into the target menu mode:

                        ? b 707

                        ? target

            [Note the time frame of Dialog's CURRENT feature.  This is crucial because you will want to restrict the FreeStyle search to the same time frame.]

            4. Do your search.  Browse the results and capture the results electronically so that you can compare the Target results with the FreeStyle results.

           

FreeStyle Strategy:

            1. Do the Dialog Target Search first!

            2. Log on to Nexis.

            3. Choose the library

                        REGNWS

            4. Choose the file for the Seattle Times

                        SEATTM

            5. Set the searching mode to FreeStyle

                        .fr

            6. Enter the same search as you did for the Dialog Target search.  You will have to "translate" from Target to FreeStyle to make sure that the two searches are equivalent.

 

            Note that the Nexis interface strategy is to have you enter your search terms and after you press the Enter key, to present you with the following menu:

                        <=1> Edit Search Description

                        <=2> Enter/edit Mandatory Terms

                        <=3> Enter/edit Restrictions (e.g. date)

                        <=4> Thesaurus

                        <=5> Change number of documents

 

            7. Before you command Nexis to do the search, set two restrictions:

                        1. Set the document restriction to 50.  [Dialog's target automatically sets a document restriction to 50].

                        2. Set the date restriction to the same date range used by the Dialog search.  For example, suppose that you noted that Dialog's Current feature used the time frame of 1994 - 1995.  Then set the date restriction on FreeStyle to

                        AFT 01/01/94

            8. Do the search and capture the necessary information electronically.

 

Report

 

            Write a short report detailing your search on both Target and FreeStyle, and compare the search results.  Decide which system is "better" for your search.  If your results are inconclusive, consider how your results would have changed if you had altered the search terms (e.g., changed the mandatory nature of some terms).  You may also want to broaden your comparison by including the results of more than one topic.
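            One possible way to tabulate the two captured result sets for the report is sketched below in Python. Everything in it is an assumption made for illustration: the function name compare, the idea of identifying each article by a key you assign yourself (for example, headline plus date), and the sample keys. The relevance judgments must still be your own.

```python
def compare(target_hits, freestyle_hits, relevant):
    """Summarize two result lists for the same question.

    Each list holds article keys that you assign yourself; `relevant`
    is the set of keys you judged relevant after browsing both lists.
    """
    target, freestyle = set(target_hits), set(freestyle_hits)

    def precision(retrieved):
        # Relevant retrieved / all retrieved; None if nothing came back.
        if not retrieved:
            return None
        return len(retrieved & relevant) / len(retrieved)

    return {
        "overlap": len(target & freestyle),
        "only_target": len(target - freestyle),
        "only_freestyle": len(freestyle - target),
        "target_precision": precision(target),
        "freestyle_precision": precision(freestyle),
    }

# HYPOTHETICAL article keys and relevance judgments.
summary = compare(
    target_hits=["a", "b", "c", "d"],
    freestyle_hits=["c", "d", "e"],
    relevant={"a", "c", "e"},
)
```

            The overlap and the two "only" counts show how far the systems agree, and the two precision figures can be set side by side in the same way the article compares relevance and Boolean searching.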