
Query Difficulty and Correlates with Other Performance Characteristics

The difficulty associated with an individual query, A, may be compared with other query-specific performance figures in an effort to validate the A measure. A strong correlation between A and an established measure would support the validity of the proposed measure. A correlation between A and a measure M may show a relationship, but it does not necessarily imply that A measures the same phenomenon as M.

Paris and Tibbo [PT98] report a set of E values, which we correlated with the A values in our study. The E values were obtained at the highest recall available for each query from the CF database. No E value was reported for query number 2, which had no relevant documents; we conservatively used the worst-case E value (1) for this query. The Spearman rank correlation between the A values and the E values is .523, and the Pearson product moment correlation is .407. We may interpret these moderate correlations as indicating the degree to which the value of a traditional performance measure such as E is due to the difficulty of the individual queries.

There is a weak positive correlation between the A values and the number of natural language terms in the query, with the Pearson correlation being .172 and the Spearman rank correlation being .126. Because A measures query difficulty, this suggests that shorter queries produce somewhat better results than longer queries, which is contrary to the idea that the increased richness obtained with longer queries makes up for the additional noise created by adding terms. Several factors may be at work here. Some of the longer queries include details about what the searcher wants, for example, the clause at the end of query 34, "... what are their relative advantages and disadvantages?" Query 37 adds a second question, "... and what factors contribute to erroneous results of these tests?" These longer queries express information needs that are inherently more abstract and less topical. The additional clauses add little to the performance of a term-matching or weighting search engine, although they are certainly helpful to human searchers in developing queries and evaluating documents.

The correlation between the number of query terms drawn from a public domain medical dictionary and the A values was negligible, suggesting that query difficulty is not simply a matter of adding or deleting sublanguage terms from natural language queries. The unnamed machine-readable medical dictionary was obtained from the PC-SIG library of public domain software (Disk 4160, 13th edition, CD-ROM version) and was manually supplemented to include most of the specialized medical terms found in the CF database.
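The two correlation statistics used above can be illustrated with a minimal sketch. The function names and the sample values are hypothetical and are not the study's data or code; the sketch only shows how a Pearson product moment correlation and a Spearman rank correlation (Pearson applied to ranks, with tied values given their average rank) might be computed, and how a missing E value could be replaced by the worst-case value of 1 before correlating.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product moment correlation of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """Rank values from 1 upward; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-query values, for illustration only.
a_values = [0.2, 0.5, 0.1, 0.9, 0.4]
e_values = [0.3, None, 0.2, 0.8, 0.6]  # None marks an unreported E value

# Assumption mirroring the text: substitute the worst-case E value (1).
e_filled = [1.0 if e is None else e for e in e_values]

print("Pearson:", pearson(a_values, e_filled))
print("Spearman:", spearman(a_values, e_filled))
```

Because the Spearman statistic depends only on rank order, it is less sensitive than the Pearson statistic to a single extreme value such as the substituted worst-case E.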
Bob Losee
1999-07-29