Phonetic Transcriptions of Large Speech Corpora
[S.l.] : [S.n.]
Number of pages
X, 158 p.
Radboud Universiteit Nijmegen, 7 April 2006
Promotor : Boves, L.W.J. Co-promotor : Cucchiarini, C.
Subject
Automatic Phonetic Transcriptions; Linguistic Information Processing
Each time a word is uttered, even by one and the same speaker, its pronunciation can differ, and it may also deviate considerably from the canonical transcription. For research on pronunciation phenomena, numerous samples of real-life speech need to be collected. Such large collections of speech are called speech corpora. Speech corpora constitute a rich resource for empirical investigations of spoken language. However, to be useful as a resource for pronunciation research, they must be accompanied by phonetic transcriptions. The research reported in this thesis focuses, first, on the generation of phonetic transcriptions of large speech corpora; second, and in relation to this, on gathering new phonological knowledge; and finally, on the evaluation of the quality of phonetic transcriptions. Since a complete manual phonetic transcription of a large speech corpus is practically impossible, recourse to automatic techniques is inevitable. We successfully developed and tested both data-driven and knowledge-based procedures for generating automatic phonetic transcriptions that not only yielded more accurate transcriptions but also increased phonological knowledge of pronunciation phenomena in real-life Dutch speech. Furthermore, a quality measure was developed for more objective assessments of both automatically and manually generated phonetic transcriptions. In general, it can be concluded that good-quality broad phonetic transcriptions of read speech can be obtained fully automatically using relatively simple techniques. By dispensing with human-made transcriptions of read speech, a great deal of time and money can be saved and reallocated to the phonetic transcription of speech styles for which larger deviations from a canonical representation are to be expected.
For spontaneous speech, human transcriptions are still the best option, although improved automatic techniques, together with a better understanding of the phonological processes underlying spontaneous speech, are likely to approximate human transcription quality for spontaneous speech styles in the future.