DutchSemCor: Building a semantically annotated corpus for Dutch
Publication year
2011
Publisher
Ljubljana : Trojina, Institute for Applied Slovene Studies
ISBN
9789619298336
In
Kosem, I.; Kosem, K. (ed.), Electronic lexicography in the 21st century: New applications for new users, pp. 286-296
Annotation
eLex 2011
Publication type
Article in monograph or in proceedings
Editor(s)
Kosem, I.
Kosem, K.
Organization
Communicatie- en informatiewetenschappen
Former Organization
Bedrijfscommunicatie
Languages used
English (eng)
Book title
Kosem, I.; Kosem, K. (ed.), Electronic lexicography in the 21st century: New applications for new users
Page start
p. 286
Page end
p. 296
Subject
Aligned constructions in machine translation; Professional Communication
Abstract
State-of-the-art Word Sense Disambiguation (WSD) systems require large sense-tagged corpora along with lexical databases to reach satisfactory results. The number of English language resources for developing WSD systems has increased in recent years, while most other languages remain under-resourced. The situation is no different for Dutch. To overcome this data bottleneck, the DutchSemCor project will deliver a Dutch corpus that is sense-tagged with senses from the Cornetto lexical database. Part of this corpus (circa 300K examples) is manually tagged. The remainder is automatically tagged using different WSD systems and validated by human annotators. The project uses existing corpora compiled in other projects; these are extended with Internet examples for word senses that are less frequent and do not (sufficiently) appear in the corpora. We report on the status of the project and the evaluations of the WSD systems with the current training data.
This item appears in the following Collection(s)
- Academic publications [238430]
- Electronic publications [122512]
- Faculty of Arts [29387]
- Open Access publications [97507]