It is crucial that automatic speech recognition (ASR) systems can handle the wide range of variation in speech: across speakers from different demographic groups, across speaking styles, and for speakers with (dis)abilities. A potential quality-of-service harm arises when ASR systems do not perform equally well for everyone. ASR systems may exhibit bias against certain types of speech, such as non-native accented speech, or against speakers of particular age groups or genders. In this study, we evaluate two widely used neural network-based architectures, Wav2vec2 and Whisper, for potential biases across speakers of Dutch. We use the Dutch speech corpus JASMIN as a test set; it contains read speech and conversational speech recorded in a human-machine interaction setting. The results reveal a significant bias against non-native speakers, children, elderly speakers, and speakers of some regional dialects. The ASR systems generally perform slightly better for women than for men.
Published: 2023
Conference: 2023 International Conference on Speech Technology and Human-Computer Dialogue (SpeD), National University of Science and Technology POLITEHNICA Bucharest, Bucharest, Romania
Duration: 25 Oct 2023 → 27 Oct 2023