
Discussion and Recommendations

The work here has addressed the relative performance of several different retrieval engines and systems, as well as the difficulty associated with retrieving documents for specific queries or topics. We have developed a technique for estimating the probability of optimal ranking for a retrieval engine, allowing us to separate this measure of search engine quality from the query-specific retrieval difficulty associated with a particular query and the documents relevant to it. These query-specific A values correlate with other performance measures, such as E, providing empirical support for the usefulness of A.
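
The kind of empirical check mentioned above can be illustrated with a short computation. The sketch below uses hypothetical per-query values rather than figures from this study; it simply correlates query-specific A values with the corresponding E values:

    import numpy as np

    # Hypothetical per-query values for six queries; these numbers are
    # illustrative only and are not results from the study.
    A = np.array([0.91, 0.78, 0.85, 0.66, 0.94, 0.72])  # estimated probability of optimal ranking
    E = np.array([0.35, 0.62, 0.48, 0.71, 0.30, 0.64])  # E measure (lower values indicate better retrieval)

    # Pearson correlation between the two query-specific measures.
    r = np.corrcoef(A, E)[0, 1]
    print(f"Correlation between A and E: {r:+.2f}")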

The results suggest that slightly better subject-based retrieval performance is obtained with best-case Boolean searching or with the ranking engine used by Freestyle than with the ranking engine used by Target.

Sembok and Van Rijsbergen [SV90] noted, before the introduction of Target or Freestyle, that "the keyword approach with statistical techniques has reached its theoretical limit and further attempts for improvement are considered a waste of time." While this statement may be a bit strong, there is little difference in performance between the two commercial search engines, despite commercial pressures to develop a better search engine. This level of performance may be about the best obtainable without much more sophisticated techniques and knowledge, that is, without revolutionary changes in retrieval theory or practice.

The research discussed here has been based on tests using the CF dataset, described in Shaw et al. [SWWT91]. This dataset has exhaustive relevance judgments and is thus an excellent database for many research purposes. While the CF database can be used in experimental systems, the same set of documents can also be retrieved from existing commercial systems, making the dataset invaluable for the study of commercial system performance. However, full-text systems containing entire documents, rather than just titles, abstracts, and descriptors, can be expected to perform somewhat differently, and this study provides only an approximation of the performance that would be obtained when retrieving full documents with these particular commercial search engines.

Future research might address further aspects of the query and how its characteristics affect performance. By computing the correlations between A and other factors, we can examine measures of query-specific retrieval difficulty and other factors that may make a query effective or ineffective at separating the documents the user considers relevant from those the user considers non-relevant. We may also consider more elaborate analytic multivariate approaches to the study of retrieval performance. Given much larger sets of documents, multivariate techniques [Los98] can be used to more accurately estimate both the performance of different search engines and the query difficulty due to specific characteristics of a query.
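
As a minimal sketch of how such a separation might be carried out, assuming a complete engine-by-query table of performance scores (the engine labels and numbers below are hypothetical, not data from this study), a simple additive decomposition attributes part of each score to the engine and part to the query:

    import numpy as np

    # Hypothetical performance scores (rows = search engines, columns = queries);
    # the labels and numbers are illustrative only, not results from this study.
    engines = ["best-case Boolean", "Freestyle", "Target"]
    scores = np.array([
        [0.82, 0.55, 0.71, 0.64],
        [0.80, 0.58, 0.69, 0.66],
        [0.76, 0.51, 0.65, 0.60],
    ])

    grand_mean = scores.mean()
    engine_effect = scores.mean(axis=1) - grand_mean  # engine quality, net of query difficulty
    query_effect = scores.mean(axis=0) - grand_mean   # query difficulty, net of engine quality

    for name, effect in zip(engines, engine_effect):
        print(f"{name:>18}: engine effect {effect:+.3f}")
    print("Query effects:", np.round(query_effect, 3))

With larger document sets, more refined multivariate models could replace this additive decomposition, but the same idea applies: engine quality and query difficulty are estimated jointly rather than confounded in a single score.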

