Creating and Using Large Monolingual Parallel Corpora for Sentential Paraphrase Generation
Publication year
2014
Publisher
Paris : European Language Resources Association (ELRA)
ISBN
9782951740884
In
Calzolari, N; Choukri, K; Declerck, T (ed.), Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pp. 4295-4299
Annotation
Ninth International Conference on Language Resources and Evaluation (LREC'14), 26 May 2014
Publication type
Article in monograph or in proceedings
Editor(s)
Calzolari, N
Choukri, K
Declerck, T
Loftsson, H
Maegaard, B
Mariani, J
Moreno, A
Odijk, J
Piperidis, S
Organization
CLST - Centre for Language and Speech Technology
Communicatie- en informatiewetenschappen (Communication and Information Sciences)
Languages used
English (eng)
Book title
Calzolari, N; Choukri, K; Declerck, T (ed.), Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Page start
p. 4295
Page end
p. 4299
Subject
ADNEXT (Adaptive Information Extraction over Time); Language & Speech Technology; Language in Society; Nederlab
Abstract
In this paper we investigate the automatic generation of paraphrases using machine translation techniques. We make three contributions: the construction of a large paraphrase corpus for English and Dutch, a re-ranking heuristic for using machine translation for paraphrase generation, and a proper evaluation methodology. The large parallel corpus is constructed by aligning clustered headlines scraped from a news aggregator site. To generate sentential paraphrases, we use a standard phrase-based machine translation (PBMT) framework extended with a re-ranking component (henceforth PBMT-R). We demonstrate this approach for Dutch and English and evaluate it using human judgments collected from 76 participants. These judgments are compared with two automatic machine translation evaluation metrics. We observe that as the paraphrases deviate further from the source sentence, the performance of the PBMT-R system degrades less than that of the word-substitution baseline system.
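The abstract names two technical steps without detailing them: turning clusters of same-story headlines into a monolingual parallel corpus, and re-ranking the PBMT n-best output. The sketch below illustrates one way each step could work. It is not the authors' implementation: the helper names (`headline_pairs`, `word_edit_distance`, `rerank`), the choice to pair every distinct headline in a cluster in both directions, and the rule of selecting the n-best candidate with the largest word-level edit distance from the source are all assumptions made for illustration.

```python
from itertools import combinations


def word_edit_distance(a: list[str], b: list[str]) -> int:
    """Levenshtein distance over word tokens (insert, delete, substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, start=1):
        curr = [i]
        for j, wb in enumerate(b, start=1):
            cost = 0 if wa == wb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def headline_pairs(clusters: list[list[str]]) -> list[tuple[str, str]]:
    """Hypothetical alignment: every distinct headline pair in a cluster,
    kept in both directions, becomes a (source, target) training pair."""
    pairs = []
    for cluster in clusters:
        # Drop exact duplicates while preserving the original order.
        unique = list(dict.fromkeys(h.strip() for h in cluster))
        for a, b in combinations(unique, 2):
            pairs.append((a, b))
            pairs.append((b, a))
    return pairs


def rerank(source: str, nbest: list[str]) -> str:
    """Hypothetical re-ranking: from the n-best list, return the candidate most
    dissimilar to the source (word-level edit distance), ignoring exact copies.
    Ties keep the earlier (higher-ranked) candidate because max() returns the
    first maximal element."""
    src = source.lower().split()
    candidates = [c for c in nbest if c.lower().split() != src]
    if not candidates:
        return source  # nothing but copies of the input
    return max(candidates, key=lambda c: word_edit_distance(src, c.lower().split()))


if __name__ == "__main__":
    clusters = [["Storm hits coast", "Coast battered by storm", "Storm hits coast"]]
    print(headline_pairs(clusters))
    print(rerank("storm hits coast",
                 ["storm hits coast", "storm hits the coast", "coast battered by storm"]))
```

Preferring dissimilar candidates is meant to counter the tendency of a monolingual translation system to copy its input; a practical heuristic would likely also need to guard fluency and meaning, which this sketch does not attempt.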