Towards a better understanding of language model information retrieval
Publication year
2008
Publisher
Glasgow : University of Glasgow
In
Proceedings of the 2nd BCS IRSG Symposium on Future Directions in Information Access 2008, pp. 30-37
Publication type
Article in monograph or in proceedings
Organization
SW OZ DCC AI
Former Organization
SW OZ NICI KI
Book title
Proceedings of the 2nd BCS IRSG Symposium on Future Directions in Information Access 2008
Page start
p. 30
Page end
p. 37
Subject
Cognitive artificial intelligence; DI-BCB_DCC_Theme 4: Brain Networks and Neuronal Communication
Abstract
Language models form a class of successful probabilistic models in information retrieval. However, knowledge of why some methods perform better than others in a particular situation remains limited. In this study we analyze which language model factors influence information retrieval performance. Starting from popular smoothing methods, we review which data features have been used. Document length and a measure of document word distribution turn out to be the important factors, in addition to a distinction between estimating the probability of seen and unseen words. We propose a class of parameter-free smoothing methods, of which multiple specific instances are possible. Instead of parameter tuning, however, an analysis of data features should be used to decide on a specific method. Finally, we discuss some initial experiments.
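The abstract itself gives no formulas. As an illustration of the kind of smoothing method it starts from, the sketch below shows Dirichlet-prior smoothing, a popular parameterized method in language-model retrieval; note how document length enters the formula directly, and how seen and unseen words are handled differently (an unseen word falls back entirely on the background collection model). The function names and the `mu` default are illustrative, not taken from the paper.

```python
from collections import Counter

def dirichlet_smoothed_prob(word, doc_tokens, collection_probs, mu=2000.0):
    """Dirichlet-prior smoothed P(word | document).

    Document length |d| appears directly in the denominator, one reason
    it is a key data feature for smoothing. `mu` is the tuning parameter
    that a parameter-free method would replace with a data-derived value.
    """
    counts = Counter(doc_tokens)
    doc_len = len(doc_tokens)
    p_bg = collection_probs.get(word, 0.0)  # background model P(word | collection)
    return (counts[word] + mu * p_bg) / (doc_len + mu)

# Toy example: a seen word keeps more probability mass than an unseen one,
# which still receives nonzero mass from the background model.
doc = "the cat sat on the mat".split()
background = {"the": 0.05, "cat": 0.001, "dog": 0.001}
p_seen = dirichlet_smoothed_prob("cat", doc, background)
p_unseen = dirichlet_smoothed_prob("dog", doc, background)
```

With `mu = 2000` and a six-token document, the background model dominates; shorter documents are smoothed more heavily, which is exactly the dependence on document length the abstract points to.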