Uncovering Bias in ASR Systems: Evaluating Wav2vec2 and Whisper for Dutch speakers

Research output: Contribution to conference > Paper > Academic

Abstract

It is crucial that ASR systems can handle the wide range of variation in the speech of speakers from different demographic groups, with different speaking styles, and with (dis)abilities. A potential quality-of-service harm arises when ASR systems do not perform equally well for everyone. ASR systems may exhibit bias against certain types of speech, such as speech with a non-native accent or speech from particular age groups or genders. In this study, we evaluate two widely used neural network-based architectures, Wav2vec2 and Whisper, for potential biases among Dutch speakers. As a test set we used the Dutch speech corpus JASMIN, which contains read and conversational speech in a human-machine interaction setting. The results reveal a significant bias against non-native speakers, children, elderly speakers, and speakers of some regional dialects. The ASR systems generally perform slightly better for women than for men.
Original language: English
Number of pages: 6
Publication status: Published - 2023
Event: 2023 International Conference on Speech Technology and Human-Computer Dialogue (SpeD) - National University of Science and Technology POLITEHNICA Bucharest, Bucharest, Romania
Duration: 25 Oct 2023 - 27 Oct 2023
https://sped.pub.ro/

Conference

Conference: 2023 International Conference on Speech Technology and Human-Computer Dialogue (SpeD)
Abbreviated title: SpeD
Country/Territory: Romania
City: Bucharest
Period: 25/10/23 - 27/10/23
