Improving cross-domain n-gram language modelling with skipgrams
[S.l.] : Association for Computational Linguistics
In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pp. 137-142
54th Annual Meeting of the Association for Computational Linguistics, 7 August 2016
Article in monograph or in proceedings
Communication and Information Sciences
Subject: Language & Speech Technology; Language in Society; Nederlab; What's in the bag for latent variable language modelling
In this paper we improve over the hierarchical Pitman-Yor process language model in a cross-domain setting by adding skipgrams as features. We find that adding skipgram features reduces perplexity, and that the reduction is substantial when models are trained on a generic corpus and tested on domain-specific corpora. We also find that within-domain and cross-domain testing require different backoff strategies. We observe a 30-40% reduction in perplexity on a cross-domain language modelling task, and up to a 6% reduction in a within-domain experiment, for both English and Flemish-Dutch.
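The skipgram features mentioned in the abstract are n-gram patterns in which interior positions may be skipped, so that the model can condition on non-contiguous context. As an illustration only, here is a minimal Python sketch of how such patterns can be enumerated; the function name, the '{*}' gap marker, and the fixed-window scheme are assumptions made for this sketch, not the authors' implementation.

    from itertools import product

    def extract_skipgrams(tokens, n):
        # Enumerate every length-n pattern over the token stream in
        # which any interior position may be replaced by a gap marker.
        # First and last positions stay lexical; the all-words mask
        # reproduces the ordinary n-gram.
        for i in range(len(tokens) - n + 1):
            window = tokens[i:i + n]
            for mask in product((False, True), repeat=n - 2):
                yield (window[0],) + tuple(
                    '{*}' if gap else tok
                    for tok, gap in zip(window[1:-1], mask)
                ) + (window[-1],)

    # The trigram window "the dog barks" yields ('the', 'dog', 'barks')
    # and ('the', '{*}', 'barks'); the latter lets "barks" be predicted
    # from "the" regardless of the intervening word.
    for sg in extract_skipgrams("the dog barks loudly".split(), n=3):
        print(sg)

In the model described, such patterns supplement ordinary n-grams as conditioning contexts, and the backoff strategies the abstract refers to concern how the model falls back across these contexts.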