Log-Based Evaluation Resources for Question Answering
Thomas Mandl, Julia Maria Schulz
LREC 2010, Web Logs & QA, 22.05.2010
Information Retrieval Logs and Question Answering
 Users are not always aware that different kinds of systems exist (information retrieval versus question answering)
 Short queries are the preferred way of asking for information, but phrases or complete sentences are sometimes entered as well
 Demand for query-specific treatment (Mandl & Womser-Hacker 2005)
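As a rough illustration of query-specific treatment, the sketch below separates question-style queries from keyword queries with a simple heuristic. The function names, the word list, and the rule itself are assumptions made for illustration; they are not a method taken from the cited work.

```python
# Hypothetical heuristic: a query counts as "question style" if it ends
# with a question mark or starts with an English question word.
QUESTION_WORDS = {"who", "what", "when", "where", "why", "which", "how"}


def is_question_style(query: str) -> bool:
    """Return True if the query looks like a natural-language question."""
    text = query.strip().lower()
    if not text:
        return False
    if text.endswith("?"):
        return True
    first_token = text.split()[0]
    return first_token in QUESTION_WORDS


def route(query: str) -> str:
    """Pick a (hypothetical) backend label for the query."""
    if is_question_style(query):
        return "question_answering"
    return "keyword_retrieval"
```

A real system would need language-specific word lists and a more careful treatment of phrases; the point here is only that the two query types can be told apart before retrieval.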
Logfile resources at CLEF
Information Retrieval Evaluation Resources
 GeoCLEF 2007:
   Investigated and provided evaluation resources for geographic information retrieval (Mandl et al. 2008)
   The query identification task was based on a query set from MSN, which is no longer distributed by Microsoft
 LogCLEF 2009:
   “Action logs” from The European Library (TEL) portal, covering 1 January 2007 to 30 June 2008
   Web search engine query log from the Tumba! search engine
 LogCLEF 2010:
   Extended TEL query and action logs
   DIPF query logs: a raw server log covering three months of activity on the portal is made available; the files total 5 GB
TEL
 The most significant columns of the table are:
   A numeric id identifying a registered user, or “guest” otherwise;
   The user’s IP address;
   An automatically generated alphanumeric id identifying sequential actions of the same user (a session);
   The query contents;
   The name of the action that the user performed;
   The alphanumeric id of the corresponding collection;
   The date and time of the action.
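The column list above can be sketched as a small record type. This is a minimal sketch under stated assumptions: the field names, the comma-separated layout, the column order, and the sample rows are all invented for illustration and do not describe the actual TEL file format.

```python
import csv
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass
class ActionRecord:
    user_id: Optional[int]  # None stands for a "guest" user
    ip_address: str
    session_id: str
    query: str
    action: str
    collection_id: str
    timestamp: str          # kept as text; the date format is not assumed


def parse_action_log(lines: Iterable[str]) -> Iterator[ActionRecord]:
    """Yield one ActionRecord per row with the assumed 7 CSV fields."""
    for row in csv.reader(lines):
        if len(row) != 7:
            continue  # skip rows that do not match the assumed layout
        user, ip, session, query, action, collection, when = row
        yield ActionRecord(
            user_id=None if user == "guest" else int(user),
            ip_address=ip,
            session_id=session,
            query=query,
            action=action,
            collection_id=collection,
            timestamp=when,
        )


# Invented sample rows, not taken from the real TEL log.
SAMPLE_ROWS = [
    'guest,192.0.2.10,s01abc,"where is budapest",search,c001,2007-01-01 10:15:00',
    '42,192.0.2.11,s02def,mozart letters,view_record,c002,2007-01-01 10:16:30',
]

records = list(parse_action_log(SAMPLE_ROWS))
```

Grouping the resulting records by `session_id` would give the sequences of actions per session that the log is meant to expose.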
Question Style Queries in Query Logs I
Examples of queries from the MSN query logfile.
Question Style Queries in Query Logs II
 Examples of queries from the TEL logfile.
Stop Words in Query Reformulations
 Over a quarter of all reformulations in the TEL log are additions or deletions of stop words (Ghorab et al. 2009).
 Question words like “where” or “when” are also common stop words in information retrieval systems.
 Prepositions are typical in the reformulation set, too.
 Prepositions are used frequently in the Tumba! search engine log.
 Prepositions are among the most frequent terms in the MSN log.
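To make the notion of a stop-word reformulation concrete, the sketch below compares two consecutive queries and labels the change. The stop list, the labels, and the set-based comparison are assumptions for illustration; this is not the procedure used by Ghorab et al.

```python
# Tiny illustrative stop list (an assumption, not a standard resource).
STOP_WORDS = {
    "a", "an", "the", "of", "in", "on", "at", "for", "to", "by",
    "with", "from", "where", "when",
}


def classify_reformulation(previous: str, current: str) -> str:
    """Label how `current` differs from `previous` at the term level.

    Terms are compared as sets, so word order and repeated terms
    are ignored in this sketch.
    """
    prev_terms = set(previous.lower().split())
    curr_terms = set(current.lower().split())
    added = curr_terms - prev_terms
    removed = prev_terms - curr_terms
    if not added and not removed:
        return "unchanged"
    if (added | removed) <= STOP_WORDS:
        if added and not removed:
            return "stop_word_addition"
        if removed and not added:
            return "stop_word_deletion"
        return "stop_word_substitution"
    return "other"
```

For example, going from "history europe" to "history of europe" is labeled as a stop-word addition, while going from "mozart" to "mozart letters" is labeled "other".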
Outlook
 CLEF has created evaluation resources for logfile analysis which can be used for comparative system evaluation.
 The available files contain queries that could be of interest to question answering systems.
 These include questions phrased as full sentences and phrases that a “bag of words” approach cannot process appropriately.
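A small sketch of why a “bag of words” representation falls short for such queries: two queries with opposite meanings can map to the same bag. The stop list and the example queries are invented for illustration.

```python
from collections import Counter

# Tiny illustrative stop list (an assumption, not a standard resource).
STOP_WORDS = {"from", "to", "in", "of", "the", "a"}


def bag_of_words(query: str, remove_stop_words: bool = True) -> Counter:
    """Return term counts, ignoring word order (and optionally stop words)."""
    terms = query.lower().split()
    if remove_stop_words:
        terms = [t for t in terms if t not in STOP_WORDS]
    return Counter(terms)


q1 = "flights from london to paris"
q2 = "flights from paris to london"

# Both queries yield the same bag, although they ask for opposite routes.
same_bag = bag_of_words(q1) == bag_of_words(q2)
```

Here the prepositions and the word order carry the distinction, which is exactly the information a bag-of-words model with stop-word removal discards.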
References
Ghorab, M.R.; Leveling, J.; Zhou, D.; Jones, G.; Wade, V.: TCD-DCU at LogCLEF 2009: An Analysis of Queries, Actions, and Interface Languages. In: Peters, C.; Di Nunzio, G.; Kurimo, M.; Mandl, T.; Mostefa, D.; Peñas, A.; Roda, G. (Eds.): Multilingual Information Access Evaluation Vol. I Text Retrieval Experiments: Proceedings 10th Workshop of the Cross-Language Evaluation Forum, CLEF 2009, Corfu, Greece. Revised Selected Papers. Berlin et al.: Springer [Lecture Notes in Computer Science] to appear. Preprint in Working Notes: http://www.clef-campaign.org/2009/working_notes/
Li, Z.; Wang, C.; Xie, X.; Ma, W.-Y. (2008). Query Parsing Task for GeoCLEF2007 Report. In: Working Notes 8th Workshop of the Cross-Language Evaluation Forum, CLEF 2007, Budapest, Hungary, http://www.clef-campaign.org/2007/working_notes/LI_OverviewCLEF2007.pdf
Mandl, T.; Gey, F.; Di Nunzio, G.; Ferro, N.; Larson, R.; Sanderson, M.; Santos, D.; Womser-Hacker, C.; Xing, X. (2008). GeoCLEF 2007: the CLEF 2007 Cross-Language Geographic Information Retrieval Track Overview. In: Peters, C.; Jijkoun, V.; Mandl, T.; Müller, H.; Oard, D.; Peñas, A.; Petras, V.; Santos, D. (Eds.): Advances in Multilingual and Multimodal Information Retrieval: 8th Workshop of the Cross-Language Evaluation Forum. CLEF 2007, Budapest, Hungary, Revised Selected Papers. Berlin et al.: Springer [Lecture Notes in Computer Science 5152] pp. 745-772.
Mandl, T.; Womser-Hacker, C. (2005). The Effect of Named Entities on Effectiveness in Cross-Language Information Retrieval Evaluation. In: Proceedings of 2005 ACM SAC Symposium on Applied Computing (SAC). Santa Fe, New Mexico, USA. March 13-17. pp. 1059-1064.
Mandl, T.; Agosti, M.; Di Nunzio, G.; Yeh, A.; Mani, I.; Doran, C.; Schulz, J.M. (2010). LogCLEF 2009: the CLEF 2009 Cross-Language Logfile Analysis Track Overview. In: Peters, C.; Di Nunzio, G.; Kurimo, M.; Mandl, T.; Mostefa, D.; Peñas, A.; Roda, G. (Eds.): Multilingual Information Access Evaluation Vol. I Text Retrieval Experiments: Proceedings 10th Workshop of the Cross-Language Evaluation Forum, CLEF 2009, Corfu, Greece. Revised Selected Papers. Berlin et al.: Springer [Lecture Notes in Computer Science] to appear. Preprint in Working Notes: http://www.clef-campaign.org/2009/working_notes/LogCLEF-2009-Overview-Working-Notes-2009-09-14.pdf