Mining The Web To Validate Answers To Natural Language Questions
Free (open access)
B Magnini, M Negri, R Prevete & H Tanev
Answer validation is the ability to automatically judge the relevance of a candidate answer with respect to a given question. This paper investigates answer validation following a data-driven approach that considers textual passages (i.e. snippets) retrieved from the Web as the main source of information. Snippets are then analyzed in order to maximize the density of relevant keywords belonging both to the question and to the answer. Results obtained on a corpus of human-judged factoid question-answer pairs submitted by participants at TREC-2001 show a satisfactory success rate (i.e. 86%). In addition, the efficiency of the methodology (documents are not downloaded) makes the approach suitable for integration as a module in the architecture of a question answering system.

1 Introduction

Textual Question Answering (QA) aims at identifying the answer to a question in large collections of documents. QA systems are presented with natural language questions, and the expected output is either the actual answer identified in a text or small text fragments containing the answer. A common QA architecture includes three basic components: a question processing component, generally based on some kind of linguistic analysis; a search component, based on information retrieval techniques; and an answer processing component, which exploits the similarity between the question and the documents to identify the correct answer.

Answer validation is a recent issue in QA, where open domain systems are often required to rank large numbers of candidate answers. More precisely, given a question q and a candidate answer a, the answer validation task is defined as the capability to assess the relevance of a with respect to q.
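
As an illustration only, the task definition above (assess the relevance of a with respect to q) can be combined with the keyword-density idea mentioned in the abstract in a few lines of Python. This is a minimal sketch under stated assumptions, not the scoring used in the paper: the function names, the density formula (keyword tokens divided by snippet length) and the threshold of 0.3 are all hypothetical, and the snippets are assumed to have been retrieved already.

```python
import re


def tokenize(text):
    """Lowercase a string and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def snippet_density(snippet, question_keywords, answer_keywords):
    """Hypothetical density score for one snippet.

    Returns the fraction of snippet tokens that are question or answer
    keywords, or 0.0 unless at least one keyword of each kind occurs.
    """
    tokens = tokenize(snippet)
    if not tokens:
        return 0.0
    q_hits = sum(1 for t in tokens if t in question_keywords)
    a_hits = sum(1 for t in tokens if t in answer_keywords)
    if q_hits == 0 or a_hits == 0:
        return 0.0
    return (q_hits + a_hits) / len(tokens)


def validate(snippets, question_keywords, answer_keywords, threshold=0.3):
    """Accept the answer if the densest snippet reaches the threshold.

    The threshold is an assumed value chosen for illustration.
    Returns a pair (accepted, best_score).
    """
    q = {k.lower() for k in question_keywords}
    a = {k.lower() for k in answer_keywords}
    best = max((snippet_density(s, q, a) for s in snippets), default=0.0)
    return best >= threshold, best
```

For example, calling validate with the snippets "Mozart was born in Salzburg in 1756." and "Salzburg is a city in Austria.", the question keywords {"mozart", "born"} and the answer keyword {"salzburg"} accepts the answer, because the first snippet contains keywords from both sets close together. Replacing the answer keyword with {"vienna"} rejects it, since no snippet mentions both the question terms and the candidate answer.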