WIT Press

On The Relationship Between Click Rate And Relevance For Search Engines


Free (open access)














K. Ali & C. C. Chang


Evaluation of search engine result relevance has traditionally been an expensive process done by human judges. Researchers have sought cheap automated proxies for such judgments. This paper examines the relationship between relative click rates (of two engines) and relative human judgments of result sets returned by those engines. Previous work has indicated that human judgments are more consistent if provided in a relative form. We additionally observe that clicks are a function not only of the clicked result, but also of its competing neighborhood. These observations force an experimental design where we collect relative judgments of sets of results, rather than judgments on individual results. We conduct a large empirical study using forty judges, thousands of live users and hundreds of queries. Our results, comparing Yahoo with another search engine in October 2003, show that, in aggregate, a higher click rate is indicative of higher relevance, but the strength of the association is only moderate (40%). Qualitative analysis suggests the association is not stronger because users click for reasons other than relevance, such as curiosity and confusion. However, there are classes of queries (such as navigational queries) for which click rates are good indicators of relevance.

Keywords: information retrieval, evaluation, relevance, modeling, statistical tests, Bootstrap, Wilcoxon, correlation, association.

1 Introduction

The predominant methodology for evaluating the quality of information retrieval systems is based on per-document relevance judgments. Given a set of topics, documents for each topic, and per-document judgments, metrics for precision are computed and compared across different systems [1]. For search engines, implicit user behavior in the form of click data has been assumed to be a key proxy for


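The comparison described in the abstract, relative click rates set against relative human judgments, can be illustrated with a small sketch. It is not the paper's method: this excerpt does not define how the 40% association was computed, so the sketch assumes a simple sign-agreement rate with a percentile bootstrap interval. The function names, the tie handling, and the data are all hypothetical.

```python
import random


def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def agreement_rate(pairs):
    """Fraction of queries where clicks and judges prefer the same engine.

    Each pair is (click_rate_difference, judgment_difference) for one query,
    both taken as engine A minus engine B. Queries with a tie on either
    measure are skipped (an assumption made for this sketch).
    """
    decided = [(c, j) for c, j in pairs if c != 0 and j != 0]
    if not decided:
        raise ValueError("no queries with a clear preference on both measures")
    agree = sum(1 for c, j in decided if sign(c) == sign(j))
    return agree / len(decided)


def bootstrap_interval(pairs, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the agreement rate."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_resamples):
        # Resample queries with replacement, keeping each pair intact.
        sample = [rng.choice(pairs) for _ in pairs]
        try:
            stats.append(agreement_rate(sample))
        except ValueError:
            continue  # resample contained only ties; skip it
    stats.sort()
    lower = stats[int((alpha / 2) * len(stats))]
    upper = stats[min(len(stats) - 1, int((1 - alpha / 2) * len(stats)))]
    return lower, upper


# Hypothetical per-query data, invented for illustration only.
pairs = [
    (0.04, 1), (-0.02, -1), (0.01, -1), (0.03, 1),
    (-0.05, 1), (0.00, 1), (0.02, 1), (-0.01, -1),
]

rate = agreement_rate(pairs)
low, high = bootstrap_interval(pairs)
print(f"agreement rate: {rate:.3f}, 95% interval: [{low:.3f}, {high:.3f}]")
```

Resampling whole queries keeps each click difference paired with its judgment difference, which matters because the abstract treats the result set, not the individual result, as the unit of comparison.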