Abstract:
The performance of probabilistic information retrieval systems and search engines is studied under differing statistical dependence assumptions used when estimating the probabilities inherent in the retrieval model. Experimental results using the Bahadur-Lazarsfeld expansion suggest that the greatest performance increase is achieved by incorporating term dependence information when estimating Pr(d|rel). The results suggest incorporating dependence in Pr(d|rel) up to degree 3; incorporating more dependence information yields relatively little further increase in performance. Further experiments examine the span of dependence in natural language text, that is, the window of terms within which dependencies are computed, and its effect on information retrieval performance. The results provide additional support for the notion of a window 3 to 5 terms wide; terms within this window may be the most useful when computing dependence.
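To make the notion of "degree" concrete, the following is a minimal sketch of a Bahadur-Lazarsfeld expansion truncated at a chosen degree, estimated from a sample of binary term vectors (for example, relevant documents when estimating Pr(d|rel)). It is an illustration under stated assumptions, not the estimation procedure used in the experiments: the function name `ble_probability`, its parameters, and the toy data are hypothetical, and every term is assumed to occur in some but not all sample vectors.

```python
from itertools import combinations
from math import prod, sqrt


def ble_probability(x, sample, degree):
    """Estimate Pr(x) for a binary term vector x with a truncated
    Bahadur-Lazarsfeld expansion fitted to `sample`.

    degree=1 is the term-independence model; degree=k adds correlation
    terms among all groups of up to k terms.
    Assumes each term has 0 < p_i < 1 in the sample (hypothetical setup).
    """
    n = len(x)
    m = len(sample)
    # Marginal probability that each term occurs.
    p = [sum(row[i] for row in sample) / m for i in range(n)]

    def z(v, i):
        # Standardized value of term i in vector v.
        return (v[i] - p[i]) / sqrt(p[i] * (1 - p[i]))

    # Probability of x if all terms were independent.
    independent = prod(p[i] if x[i] else 1 - p[i] for i in range(n))

    # Correction factor: 1 + sum over term groups of rho * product of z's.
    correction = 1.0
    for k in range(2, degree + 1):
        for group in combinations(range(n), k):
            rho = sum(prod(z(row, i) for i in group) for row in sample) / m
            correction += rho * prod(z(x, i) for i in group)
    return independent * correction


# Toy sample of two-term document vectors (illustrative data only).
sample = [(1, 1), (1, 1), (0, 0), (0, 1)]
p_independent = ble_probability((1, 1), sample, degree=1)
p_pairwise = ble_probability((1, 1), sample, degree=2)
```

With the expansion carried to the full number of terms, the estimate reproduces the sample's empirical distribution; truncating at a lower degree trades accuracy for far fewer correlation parameters, which is the trade-off the degree-3 recommendation addresses. A truncated expansion is not guaranteed to stay within [0, 1], so a practical implementation would need to handle out-of-range values.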
Return to Losee home page at http://www.ils.unc.edu/~losee