Precedence Errors
September 14, 1995
[In this document, incorrect query syntax is italicized.]
Query-oriented information retrieval systems like Dialog, DataStar and EPIC feature both Boolean operators and proximity operators. Boolean operators such as And, Or, and Not are meant to be used at the document level. Search arguments connected by these operators may occur in several fields of a document, and the exact placement of a search argument in a certain field is not important. Examples are:
[Dialog] ? s dog and cat
[DataStar] 1_: dog and cat
[EPIC] 1=> f dog and cat
These queries will find records that include both "dog" and "cat". The exact location of these search arguments in the records is not important.
Proximity operators permit an inquirer to specify where a search argument must occur relative to another: next to it, in the same sentence, or in the same field of a document. Word-level operators permit a searcher to require two or more words in a specific sequence, for example one word immediately before another. Examples are:
[Dialog] ? s dog () food
[DataStar] 1_: dog adj food
[EPIC] 1=> f dog food
These examples require that the word "dog" immediately precede the word "food".
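The difference between a document-level operator and a word-level operator can be illustrated with a small Python sketch. The three records, the whitespace tokenizer, and the function names `match_and` and `match_adj` are assumptions made for illustration; they do not reproduce how Dialog, DataStar or EPIC actually index or search records.

```python
# Toy records, each treated as a single text field (an assumption).
records = {
    1: "dog food with meat",
    2: "food for the dog and the cat",
    3: "cat toys",
}

def words(text):
    """Split a record into lowercase, whitespace-delimited words."""
    return text.lower().split()

def match_and(text, a, b):
    """Document-level 'and': both words occur anywhere in the record."""
    w = words(text)
    return a in w and b in w

def match_adj(text, a, b):
    """Word-level adjacency: word a immediately precedes word b."""
    w = words(text)
    return any(x == a and y == b for x, y in zip(w, w[1:]))

and_hits = sorted(k for k, t in records.items() if match_and(t, "dog", "cat"))
adj_hits = sorted(k for k, t in records.items() if match_adj(t, "dog", "food"))

print(and_hits)  # [2]
print(adj_hits)  # [1]
```

Record 2 contains both "dog" and "food" but does not satisfy the adjacency test, because the words are not in the required sequence.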
Online vendors use a variety of different proximity operators. Most of them follow the natural units of text: word-level, sentence-level, and paragraph-level operators.
The Order of Precedence
The most efficient method for a computer to perform a combination of Boolean and proximity operators is to do the most specific ones first, and then progressively do broader ones. This means that the order of operator precedence is the following:
First: 1. Word level operators
2. Sentence level operators
3. Paragraph level operators
Last: 4. Document level operators
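The ordering above can be sketched as a lookup table and a sort. This is only an illustration: the operator names `same_sentence` and `same_paragraph` are made-up placeholders rather than the syntax of any vendor, and `evaluation_order` is a hypothetical helper. The "or" operator is left out of the table because the text treats it as a special case.

```python
from enum import IntEnum

class Level(IntEnum):
    WORD = 1
    SENTENCE = 2
    PARAGRAPH = 3
    DOCUMENT = 4

# Hypothetical operator table; only "adj", "and" and "not" appear in the text.
OPERATOR_LEVEL = {
    "adj": Level.WORD,
    "same_sentence": Level.SENTENCE,    # placeholder name
    "same_paragraph": Level.PARAGRAPH,  # placeholder name
    "and": Level.DOCUMENT,
    "not": Level.DOCUMENT,
}

def evaluation_order(operators):
    """Return the operators narrowest first (stable for equal levels)."""
    return sorted(operators, key=lambda op: OPERATOR_LEVEL[op])

print(evaluation_order(["and", "adj", "same_paragraph"]))
# ['adj', 'same_paragraph', 'and']
```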
The "or" operator is the exception to this precedence order. It is a document level operator that has the effect of a word level operator. One may use the "or" operator with impunity, because its role is to split a query into parts. The parts are then processed separately. For example:
(dog or cat) adj food
will be processed as if the query were
dog adj food or cat adj food
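The splitting role of "or" can be sketched as a rewrite on a small query tree. The tuple representation `(operator, left, right)` and the functions `distribute_or` and `render` are assumptions for illustration, not a description of how any of these systems parse queries.

```python
def distribute_or(node):
    """Rewrite (A or B) op C as (A op C) or (B op C), recursively.

    A node is either a word (str) or a tuple (operator, left, right).
    """
    if isinstance(node, str):
        return node
    op, left, right = node
    left, right = distribute_or(left), distribute_or(right)
    if op != "or":
        if isinstance(left, tuple) and left[0] == "or":
            return ("or",
                    distribute_or((op, left[1], right)),
                    distribute_or((op, left[2], right)))
        if isinstance(right, tuple) and right[0] == "or":
            return ("or",
                    distribute_or((op, left, right[1])),
                    distribute_or((op, left, right[2])))
    return (op, left, right)

def render(node):
    """Print a tree without parentheses, relying on 'or' binding last."""
    if isinstance(node, str):
        return node
    op, left, right = node
    return f"{render(left)} {op} {render(right)}"

query = ("adj", ("or", "dog", "cat"), "food")
print(render(distribute_or(query)))  # dog adj food or cat adj food
```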
The operator that causes the greatest difficulty is the "and" operator. The "and" operator demands that satisfactory records contain both of the arguments. Many of the examples below illustrate the compromises made to accommodate the misuse of the "and" operator.
The natural order of precedence is to do the narrow operators before the broad operators, and this is what will occur if you do not force some elements to be processed before other elements. Examples of queries with operators at more than one level are the following:
[Dialog] ? s dog () food and meat
[DataStar] 1_: dog adj food and meat
[EPIC] 1=> f dog food and meat
These queries find records that contain both the phrase "dog food" and the word "meat".
Precedence Errors
Precedence errors occur when you force a query system to process some parts of a query before others, and in doing so make it perform broad operators before narrow ones. There is therefore really only one kind of precedence error that one can make: forcing the query system to perform a broad operator before a narrower one. One can make this error in two ways, which leads to two types of precedence error: (1) parenthetical errors and (2) back reference errors. Both types are illustrated in the remainder of this document.
Parenthetical Precedence Errors
A parenthetical precedence error is committed by using parentheses to force a query system to do a broader operator before a narrower one. Query language systems commonly give priority to parenthetical elements, and the most deeply embedded parenthetical element in a query is done first. Depending on the system, various strategies are used to create a more acceptable query and thereby nullify the effect of the parenthetical precedence error. Sometimes the results of these strategies make the characteristics of the retrieved documents unpredictable.
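A check for this kind of error can be sketched on a query tree: an operator is in error if one of its operands contains a broader operator. The tuple representation, the numeric levels, and the function names below are assumptions for illustration; "or" is treated as adding no level of its own, and field qualifiers are not modeled.

```python
# Word level = 1, document level = 4 (numbers chosen for illustration).
LEVEL = {"adj": 1, "and": 4, "not": 4}

def broadest_level(node):
    """Broadest operator level in a subtree; bare words count as 0."""
    if isinstance(node, str):
        return 0
    op, left, right = node
    own = 0 if op == "or" else LEVEL[op]
    return max(own, broadest_level(left), broadest_level(right))

def has_precedence_error(node):
    """True if any operator is applied to an operand built with a broader one."""
    if isinstance(node, str):
        return False
    op, left, right = node
    if op != "or":
        if max(broadest_level(left), broadest_level(right)) > LEVEL[op]:
            return True
    return has_precedence_error(left) or has_precedence_error(right)

print(has_precedence_error(("adj", ("and", "cat", "dog"), "food")))  # True
print(has_precedence_error(("and", ("adj", "cat", "food"), "dog")))  # False
print(has_precedence_error(("adj", ("or", "cat", "dog"), "food")))   # False
```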
DataStar
1_: (cat and dog).ti.
The parenthetical portion of this query specifies that acceptable records have both "cat" and "dog", regardless of their location in one or more paragraphs. It is unclear how the suffix portion is to qualify the parenthetical elements: either one or both elements might be desired to be in the title. Since the query is ambiguous, DataStar objects with the diagnostic that a document level "and" operator cannot precede the title field qualification.
Less ambiguous queries are any of the following. Each clearly specifies which word(s) is to be in the title field:
2_: cat.ti. and dog
3_: cat and dog.ti.
4_: cat.ti. and dog.ti.
5_: (cat and dog) adj food
The parenthetical portion of this query specifies that acceptable records have both "cat" and "dog", regardless of their location in one or more paragraphs. It is unclear how the suffix portion should qualify the parenthetical arguments. The inquirer may be looking for just "dog food", or just "cat food", or perhaps both "cat food" and "dog food". Since the query is ambiguous, DataStar objects that the document level "and" operator cannot precede the more narrow word level operator "adj".
Less ambiguous queries are any of the following. Each clearly specifies how the "food" is to qualify the parenthetical arguments:
6_: cat adj food and dog
7_: dog adj food and cat
8_: cat adj food and dog adj food
EPIC
1=> f (cat and dog) w food
The parenthetical portion is at the document level, and EPIC objects to the word level suffix. Less ambiguous queries would be:
2=> f cat w food and dog
3=> f dog w food and cat
4=> f cat w food and dog w food
5=> f ti (cat and dog)
While this is theoretically a parenthetical precedence error, EPIC is programmed to accept parentheses after an index label and to apply the index label to each term in the parenthetical expression. In effect, EPIC translates the given expression into the more acceptable expression below, which is not a precedence error:
6=> f ti cat and ti dog
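The translation described above can be imitated with a few lines of string handling. This is a rough sketch, not EPIC's actual parser: the function `distribute_label` is hypothetical, and it assumes a single label followed by one flat parenthetical expression of single-word terms.

```python
import re

BOOLEAN_OPERATORS = {"and", "or", "not"}

def distribute_label(query):
    """Rewrite 'LABEL (a and b)' as 'LABEL a and LABEL b'.

    Queries that do not have the form 'LABEL (...)' are returned unchanged.
    """
    m = re.fullmatch(r"(\w+) \((.+)\)", query)
    if not m:
        return query
    label, inner = m.groups()
    parts = [
        token if token in BOOLEAN_OPERATORS else f"{label} {token}"
        for token in inner.split()
    ]
    return " ".join(parts)

print(distribute_label("ti (cat and dog)"))  # ti cat and ti dog
```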
Dialog
? s (cat and dog)/ti
Theoretically this is a parenthetical precedence error. Dialog makes the assumption that each parenthetical term is to be qualified and searches for the following:
cat/ti
dog/ti
? s (cat and dog) () food
Theoretically this is a parenthetical precedence error. The strategy that Dialog uses is to process the "(cat and dog)" records looking for "cat food" or "dog food". This is equivalent to the following query:
? s ((cat and dog) and cat () food) or ((cat and dog) and dog () food)
The essence of this search is that satisfactory records will include both "cat" and "dog" and may include either "cat food" or "dog food". Note that not every record will have both "cat food" and "dog food", hence there is some unpredictability about the results.
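The unpredictability can be seen by running the equivalent query over a few toy records. The records and helper functions below are assumptions for illustration, and `dialog_like` only mimics the expanded query shown above; it is not Dialog's implementation.

```python
records = {
    1: "cat food and dog food compared",
    2: "my cat ate the dog food",
    3: "the cat and the dog share food",
}

def words(text):
    return text.lower().split()

def has(text, word):
    """Document-level test: the word occurs anywhere in the record."""
    return word in words(text)

def adj(text, a, b):
    """Word-level test: word a immediately precedes word b."""
    w = words(text)
    return any(x == a and y == b for x, y in zip(w, w[1:]))

def dialog_like(text):
    """((cat and dog) and cat () food) or ((cat and dog) and dog () food)"""
    both = has(text, "cat") and has(text, "dog")
    return (both and adj(text, "cat", "food")) or (both and adj(text, "dog", "food"))

hits = [k for k, t in sorted(records.items()) if dialog_like(t)]
both_phrases = [k for k in hits
                if adj(records[k], "cat", "food") and adj(records[k], "dog", "food")]

print(hits)          # [1, 2]
print(both_phrases)  # [1]
```

Record 2 is retrieved even though it contains only "dog food", while record 3, which has both words but neither phrase, is not.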
Back Reference Precedence Errors
A back reference is a query that includes the set number of one or more previously created sets (these previously created sets may themselves include references to prior sets, and so on). A back reference error is committed by qualifying an already existing set with an operator that is narrower than the one(s) used to create the set.
DataStar
3_: cat and dog
4_: 3.ti.
5_: 3 adj food
Set three was created with the document level "and" operator. All subsequent manipulations of this set must be at the document level. Statements 4 and 5 are back reference precedence errors: they attempt to qualify set 3 with a paragraph qualifier and with a word level operator, respectively.
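The strict rule described here can be sketched by remembering, for each set, the broadest operator level used to build it. The class `SetHistory`, the numeric levels, and the document-level statement 6 are hypothetical and only illustrate the rule; the special treatment of "or" is not modeled.

```python
LEVEL = {"word": 1, "sentence": 2, "paragraph": 3, "document": 4}

class SetHistory:
    """Tracks the broadest operator level used to build each numbered set."""

    def __init__(self):
        self.broadest = {}

    def create(self, number, level):
        self.broadest[number] = LEVEL[level]

    def qualify(self, source, number, level):
        """Create set `number` by applying an operator of `level` to set `source`."""
        if LEVEL[level] < self.broadest[source]:
            raise ValueError(
                f"set {source} was built at a broader level than '{level}'"
            )
        self.broadest[number] = max(LEVEL[level], self.broadest[source])

history = SetHistory()
history.create(3, "document")  # 3_: cat and dog

results = {}
# 4: title qualifier, 5: adjacency, 6: a hypothetical document-level statement
for number, level in [(4, "paragraph"), (5, "word"), (6, "document")]:
    try:
        history.qualify(3, number, level)
        results[number] = "accepted"
    except ValueError:
        results[number] = "rejected"

print(results)  # {4: 'rejected', 5: 'rejected', 6: 'accepted'}
```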
EPIC
3=> f dog and cat
4=> f ti S3
EPIC is programmed to reject this back reference. An index label may only qualify a search argument. Statement 4 is also a back reference precedence error.
5=> f S3 w food
This is not a back reference error. EPIC takes the records of S3 and then reprocesses them looking for "dog food" or "cat food". It is equivalent to the following command:
f ((dog and cat) and dog w food) or ((dog and cat) and cat w food)
The essence of this operation is that all the records will have both "dog" and "cat"; some of the records may have "dog food", some may have "cat food", and some may have both of these expressions.
Dialog
? s cat and dog
s3 cat and dog
? s s3/ti
Theoretically this is a back reference error. Dialog cautions users about the unpredictable nature of the results when a back reference is made to a set created with "and" or "not". Records may have "cat" in the title, or "dog" in the title, or both. In other words, making this back reference is equivalent to the following command:
? s ((cat and dog) and cat/ti) or ((cat and dog) and dog/ti)
The essence of this query is that a record may have "cat" in the title, or may have "dog" in the title, or may have both of these words in its title.
? s s3 () food
This is not a back reference error. Dialog translates this request into the following:
? s ((cat and dog) and cat () food) or ((cat and dog) and dog () food)
The essence of this query is that a record may contain "cat food" or it may contain "dog food" or it may contain both of these expressions.