Estimating feature reliability and doing speech decoding at once: a first (failed) attempt.
Subject: Speech Technology and Information Processing; Technologie en informatieverwerking
The acoustic environment in which speech is recorded strongly influences the statistical distributions of the observed acoustic features. To make automatic speech recognition (ASR) insensitive to noise, it is crucial that these distributions are similar in the training and test conditions. Most approaches try to compensate for the impact of noise by estimating the noise characteristics from the signal. We explored the feasibility of a new method to increase noise robustness that instead exploits the a priori knowledge stored in clean speech models. Using Mel bank log-energy features, we monitored recognition accuracy while ignoring an increasing number of model components (chosen differently for each state). This strategy aims at recognition results that are determined more strongly by the match in the high-energy model components than by the mismatch in the low-energy ones. Applying the new method to clean speech data confirms that discarding components below a certain energy threshold does not degrade recognition performance. Experiments with noisy data, however, show that the performance gains are relatively small. An analysis of the poor results is presented and used to distill questions for future research.
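The idea of ignoring low-energy model components can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions that the abstract does not confirm: each state is modelled by a single diagonal-covariance Gaussian over log-energy bands, "energy" is read off the state's mean vector, and the function name `masked_log_likelihood` and all numbers are hypothetical.

```python
import math


def masked_log_likelihood(obs, mean, var, n_ignore):
    """Diagonal-Gaussian log-likelihood over the highest-energy bands only.

    Assumption: the n_ignore bands with the lowest model mean (log-energy)
    are skipped, so the ignored bands are chosen per state from the clean
    speech model rather than from the observed signal.
    """
    # Rank band indices by the model's mean log-energy, lowest first.
    order = sorted(range(len(mean)), key=lambda d: mean[d])
    kept = order[n_ignore:]

    total = 0.0
    for d in kept:
        diff = obs[d] - mean[d]
        total += -0.5 * (math.log(2.0 * math.pi * var[d]) + diff * diff / var[d])
    return total


# Hypothetical state: bands 0 and 2 carry high energy, bands 1 and 3 low energy.
state_mean = [10.0, 2.0, 8.0, 1.0]
state_var = [1.0, 1.0, 1.0, 1.0]

# Hypothetical noisy frame: noise raises the low-energy bands only.
noisy_obs = [10.0, 6.0, 8.0, 5.0]

full_score = masked_log_likelihood(noisy_obs, state_mean, state_var, n_ignore=0)
masked_score = masked_log_likelihood(noisy_obs, state_mean, state_var, n_ignore=2)
```

In this toy frame the mismatch sits entirely in the two low-energy bands, so `masked_score` is higher than `full_score`. Whether that advantage carries over to recognition accuracy on real noisy data is exactly what the experiments described above call into question.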