
Fulltext:
127062.pdf
Embargo:
until further notice
Size:
847.1Kb
Format:
PDF
Description:
Publisher’s version
Publication year
2014
Number of pages
14 p.
Source
Information Processing & Management, 50, 4, (2014), pp. 554-567
ISSN
Publication type
Article / Letter to editor

Organization
CLST - Centre for Language and Speech Technology
Communicatie- en informatiewetenschappen
Journal title
Information Processing & Management
Volume
vol. 50
Issue
iss. 4
Languages used
English (eng)
Page start
p. 554
Page end
p. 567
Subject
ADNEXT (Adaptive Information Extraction over Time); Data Science; Hybrid Dependency Parsing of Technical Texts (PHASAR/IP); Language & Speech Technology; Language in Society; Nederlab
Abstract
We digitized three years of Dutch election manifestos annotated by the Dutch political scientist Isaac Lipschits. We used these data to train a classifier that can automatically label new, unseen election manifestos with themes. Having the manifestos in a uniform XML format with all paragraphs annotated with their themes has advantages for both electronic publishing of the data and diachronic comparative data analysis. The data that we created will be disclosed to the public through a search interface. This means that it will be possible to query the data and filter them on themes and parties. We optimized the Lipschits classifier on the task of classifying election manifestos using models trained on earlier years. We built a classifier that is suited for classifying election manifestos from 2002 onwards using the data from the 1980s and 1990s. We evaluated the results by having a domain expert manually assess a sample of the classified data. We found that our automatic classifier obtains the same precision as a human classifier on unseen data. Its recall could be improved by extending the set of themes with newly emerged themes. Thus when using old political texts to classify new texts, work is needed to link and expand the set of themes to newer topics.
This item appears in the following Collection(s)
- Academic publications [227244]
- Electronic publications [108520]
- Faculty of Arts [28658]