
System Quality and Query or Subject Difficulty

Given the analytic model of retrieval, we may compute the A value for each query ($A_i$ denotes the A value for the $i^{\mathit{th}}$ query) and the Q value for each retrieval engine, where $Q_j$ denotes the quality (probability of optimal ranking) of search engine $j$. The A values may be interpreted as the level of difficulty associated with retrieving the relevant documents on the topic represented by various formulations of the query. The Q values may be interpreted as the quality of each search mechanism. We compute these values by performing a rather lengthy regression. Our goal is to solve for $A_i$ and $Q_j$ for every query and every search engine, finding the set of A and Q values that minimizes the error in estimating the ASL values. This is a complex problem, and there is no simple standard procedure for solving it. We treat it as a non-linear regression of the form
\begin{multline*}ASL = N \Bigl[
\left(x_1 A_1 + x_2 A_2 + \cdots + x_{100} A_{100}\right) ...
... \left(y_1 Q_1 + y_2 Q_2 + \cdots + y_6 Q_6\right)\right)
\Bigr].
\end{multline*}
Here the ASL is the dependent variable, and $Q_1, Q_2, \ldots, Q_6$ and $A_1, A_2, \ldots, A_{100}$ are the parameters to be estimated by the regression package. The variable $x_i$ is an indicator variable that has the value 1 when the query in question is query $i$, and 0 otherwise. Similarly, the variable $y_j$ is an indicator variable that has the value 1 when the retrieval engine being used is retrieval engine number $j$, and 0 otherwise. The data set contains 600 document rankings, one for each combination of the six search techniques and the 100 queries. The N values are set to the correct number of documents for each database.

The numbers obtained from these regressions are inexact: they are estimates, and a larger sample of queries and documents would improve them. The standard errors for the estimated Q values are all approximately 0.014, while the standard errors for the estimated A values are approximately 0.056. The Q values reflect the database from which they are derived. The A values are query specific and reflect the nature of the relevance judgments and the documents available. The Q values are computed so as to mathematically complement the A values, so that the regression formula produces ASL values with minimal error. While Q values clearly will vary with the characteristics of a specific database, the variation should be relatively small compared with that of other measures of retrieval performance, such as precision. In the following section we examine the Q values and their robustness.
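The fitting procedure described above can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from this section: the model form ASL = N[QA + (1-Q)(1-A)] + 1/2 is only a stand-in for the analytic model the text refers to, alternating least squares replaces the regression package, and the function names (`predicted_asl`, `fit_a_and_q`, `max_abs_residual`) and the small data set are hypothetical. Indexing `a[query]` and `q[engine]` plays the role of the indicator variables, since multiplying by a one-hot vector simply selects one parameter.

```python
def predicted_asl(n_docs, a, q):
    # ASSUMPTION: illustrative model form, not reproduced from this section.
    return n_docs * (q * a + (1.0 - q) * (1.0 - a)) + 0.5


def fit_a_and_q(observations, n_queries, n_engines, n_docs, sweeps=50):
    """Alternating least squares over (query, engine, observed ASL) triples."""
    a = [0.3] * n_queries    # hypothetical starting guesses
    q = [0.75] * n_engines
    for _ in range(sweeps):
        # With every Q held fixed, the assumed model is linear in each A_i.
        for i in range(n_queries):
            num = den = 0.0
            for query, engine, asl in observations:
                if query == i:
                    slope = 2.0 * q[engine] - 1.0
                    target = (asl - 0.5) / n_docs - (1.0 - q[engine])
                    num += target * slope
                    den += slope * slope
            if den > 0.0:
                a[i] = num / den
        # With every A held fixed, the assumed model is linear in each Q_j.
        for j in range(n_engines):
            num = den = 0.0
            for query, engine, asl in observations:
                if engine == j:
                    slope = 2.0 * a[query] - 1.0
                    target = (asl - 0.5) / n_docs - (1.0 - a[query])
                    num += target * slope
                    den += slope * slope
            if den > 0.0:
                q[j] = num / den
    return a, q


def max_abs_residual(observations, a, q, n_docs):
    # Largest absolute error between observed and predicted ASL.
    return max(abs(asl - predicted_asl(n_docs, a[i], q[j]))
               for i, j, asl in observations)


# Made-up, noiseless data: 4 queries and 3 engines instead of 100 and 6.
N_DOCS = 1000
TRUE_A = [0.1, 0.2, 0.35, 0.4]
TRUE_Q = [0.9, 0.8, 0.6]
observations = [(i, j, predicted_asl(N_DOCS, TRUE_A[i], TRUE_Q[j]))
                for i in range(len(TRUE_A)) for j in range(len(TRUE_Q))]

a_hat, q_hat = fit_a_and_q(observations, len(TRUE_A), len(TRUE_Q), N_DOCS)
print("largest ASL residual:", max_abs_residual(observations, a_hat, q_hat, N_DOCS))
```

Under this assumed form only the product (2A-1)(2Q-1) enters the prediction, so the sketch is judged by its ASL residuals; the individual A and Q values it returns depend on the starting guesses and should not be read as estimates comparable to those reported in the text.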
Bob Losee
1999-07-29