Increasingly, phonetic research uses data collected from participants who record themselves on readily available devices. Though such recordings are convenient, their suitability for acoustic analysis remains an open question, especially regarding how recording methods affect acoustic measures over time. We used Quantile Generalized Additive Mixed Models (QGAMMs) to analyze measures of F0, intensity, and the first and second formants, comparing files recorded using a laboratory-standard recording method (Zoom H6 recorder with an external microphone) with files from three remote recording methods: (1) the Awesome Voice Recorder application on a smartphone (AVR), (2) the Zoom meeting application with default settings (Zoom-default), and (3) the Zoom meeting application with the “Turn on Original Sound” setting (Zoom-raw). A linear temporal alignment issue was observed for the Zoom methods over the course of the long recording-session files; however, the difference was not significant for utterance-length files. F0 was reliably measured using all methods. Intensity and formants presented non-linear differences across methods that could not be corrected for simply. Overall, the AVR files were most similar to the H6’s, and so AVR is deemed to be a more reliable recording method than either Zoom-default or Zoom-raw.
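As a rough illustration of what a linear temporal alignment issue means in practice, the sketch below fits a straight line to the timing offsets between matched landmarks in a reference recording and a test recording. It is not the QGAMM analysis used in the paper; the function names and the landmark times are hypothetical and chosen only for illustration.

```python
def fit_linear_drift(ref_times, test_times):
    """Least-squares fit of (test - ref) = intercept + slope * ref.

    ref_times and test_times are matched landmark times in seconds
    (hypothetical input; the paper does not specify this procedure).
    """
    n = len(ref_times)
    if n < 2 or n != len(test_times):
        raise ValueError("need at least two matched landmark pairs")
    offsets = [t - r for r, t in zip(ref_times, test_times)]
    mean_x = sum(ref_times) / n
    mean_y = sum(offsets) / n
    sxx = sum((x - mean_x) ** 2 for x in ref_times)
    if sxx == 0:
        raise ValueError("reference times must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ref_times, offsets))
    slope = sxy / sxx                      # seconds of drift per second
    intercept = mean_y - slope * mean_x    # constant start offset in seconds
    return slope, intercept


def accumulated_drift(slope, duration_s):
    """Drift accumulated over a file of the given duration, in seconds."""
    return slope * duration_s


# Hypothetical landmarks: a 30-minute session with a made-up drift rate.
ref = [0.0, 600.0, 1200.0, 1800.0]
test = [0.05 + 1.0001 * r for r in ref]
slope, intercept = fit_linear_drift(ref, test)
print(accumulated_drift(slope, 1800.0))  # drift over the whole session
print(accumulated_drift(slope, 2.0))     # drift over a 2-second utterance
```

With a constant drift rate, the accumulated misalignment scales with file duration, which is consistent with the abstract's observation that the issue matters for long session files but not for utterance-length files.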
JASA
Comparing acoustic analyses of speech data collected remotely
Cong Zhang, Kathleen Jepson, Georg Lohfink, and Amalia Arvaniti
The Journal of the Acoustical Society of America, 2021
Face-to-face speech data collection has been next to impossible globally due to COVID-19 restrictions. To address this problem, simultaneous recordings of three repetitions of the cardinal vowels were made using a Zoom H6 Handy Recorder with an external microphone (henceforth H6) and compared with two alternatives accessible to potential participants at home: the Zoom meeting application (henceforth Zoom) and two lossless mobile phone applications (Awesome Voice Recorder, and Recorder; henceforth Phone). F0 was tracked accurately by all devices; however, for formant analysis (F1, F2, F3) Phone performed better than Zoom, that is, its measurements were closer to those of H6, though the choice of data extraction tool (VoiceSauce or Praat) also produced differences. In addition, Zoom recordings exhibited unexpected drops in intensity. The results suggest that lossless-format phone recordings present a viable option for at least some phonetic studies.
Related talks:
Speech data collection at a distance: Comparing the reliability of acoustic cues across homemade recordings.
Cong Zhang, Kathleen Jepson, Georg Lohfink, and Amalia Arvaniti
179th Annual Meeting of the Acoustical Society of America, USA [online], 7-11 Dec 2020
Speech production data collection has been significantly impacted by COVID-19 restrictions. Sound-treated recording spaces and high-quality recording devices are inaccessible, and face-to-face interactions are limited. We investigated alternative recording methods that produce data suitable for phonetic analysis and are accessible to people in their homes. We examined simultaneous recordings of pure tones at seven frequencies (50 Hz, and every 100 Hz from 100 Hz to 600 Hz), and three repetitions of the primary cardinal vowels elicited from five trained speakers. Recordings were made using the Zoom meeting application and lossless-format smartphone applications (Awesome Voice Recorder, Recorder), and were compared with Zoom H6N reference recordings. F0, F1-5, and duration based on manual segmentation were measured. F0 is highly correlated across the three devices for vowels and tones. Lower formants are also significantly correlated, though not as robustly. The upper formants showed more variation, as reported in the literature. Both phone and Zoom performed better for vowels than for tones. Phone segmentation generated reliable duration values, differing from H6N segmentation by ∼18 ms. However, irregular waveforms and filtering algorithm artefacts caused considerable differences for Zoom (∼119 ms). Our preliminary study suggests phone recordings are a viable option for some phonetic studies (e.g., prosody). Future analysis of natural speech data will prove insightful.
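The kind of device comparison summarized in this abstract can be sketched as follows, assuming paired measurements of the same tokens from a reference recorder and a test device. The tone frequencies come from the abstract; the function names and all measurement values are hypothetical placeholders, not data from the study, and the study's own statistical procedure is not specified here.

```python
import math

# Pure-tone frequencies named in the abstract:
# 50 Hz, then every 100 Hz from 100 Hz to 600 Hz.
TONE_FREQUENCIES_HZ = [50] + list(range(100, 601, 100))


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length measurement lists."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two equal-length lists with at least two items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def mean_abs_diff(xs, ys):
    """Mean absolute difference between paired measurements."""
    if not xs or len(xs) != len(ys):
        raise ValueError("need two non-empty lists of equal length")
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)


# Hypothetical F0 readings (Hz) for the seven tones on two devices.
reference_f0 = [float(f) for f in TONE_FREQUENCIES_HZ]
phone_f0 = [f + 0.5 for f in reference_f0]  # made-up constant offset
print(pearson_r(reference_f0, phone_f0))

# Hypothetical segment durations (s) from two manual segmentations.
reference_dur = [0.100, 0.200]
phone_dur = [0.118, 0.182]
print(mean_abs_diff(reference_dur, phone_dur))
```

Correlation captures whether two devices track the same pattern, while the mean absolute difference expresses disagreement in the original units, which is how the duration differences in the abstract are reported.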