Subject:
Data Science; Language & Speech Technology; Language in Society
Organization:
CLST - Centre for Language and Speech Technology; Data Science; Communicatie- en informatiewetenschappen
Book title:
Proceedings of the 27th Benelux Conference on Artificial Intelligence (BNAIC 2015)
Abstract:
In the DISCOSUMO project, we aim to develop a computational toolkit to automatically summarize
discussion forum threads. In this paper, we present the initial design of the toolkit, the data that
we work with and the challenges we face. Discussion threads on a single topic can easily consist of
hundreds or even thousands of individual contributions, with no obvious way to gain a quick overview
of what kind of information is contained within the thread. We address the summarization of forum
threads with a domain-independent and language-independent methodology. We evaluate our system
on data from four different web forums, covering different domains, languages and user communities.
Our approach is largely unsupervised, using recurrent neural networks. Evaluation of the first version
should indicate where in the pipeline supervised techniques and/or heuristics are needed to improve
our summarization toolkit. If successful, the automatic summarization of discussion forum threads
will play an important role in facilitating easy participation in online discussions.