Subject:
|
Language & Speech Technology; Language in Society; Nederlab |
Organization:
|
Communicatie- en informatiewetenschappen |
Journal title:
|
Traitement Automatique des Langues
|
Abstract:
|
Predicting liaison in French is a non-trivial modeling problem. We compare a memory-based machine-learning algorithm with a rule-based baseline. The memory-based learner is trained to predict whether liaison occurs between two words on the basis of lexical, orthographic, morphosyntactic, and sociolinguistic features. The best overall performance, a precision of .80 with a recall of .85, is obtained using only a selection of lexical and syntactic features. Counter to our expectations, including sociolinguistic features lowered both the precision and the recall of our predictions. The F-scores of the memory-based algorithm are higher than those of a simple baseline and three other state-of-the-art machine-learning algorithms. Based on the results on optional liaison, it appears that predicting liaison benefits from being able to generalize from specific examples in context.
|