Recording
Remote recording methods comparison
Due to COVID-19, we could not collect production data face to face. Therefore, we investigated a few popular and convenient recording methods.
Related articles:
- [remote] Investigating differences in lab-quality and remote recording methods with dynamic acoustic measures. Cong Zhang, Kathleen Jepson, and Yu-Ying Chuang. Laboratory Phonology, 2024.
Increasingly, phonetic research uses data collected from participants who record themselves on readily available devices. Though such recordings are convenient, their suitability for acoustic analysis remains an open question, especially regarding how recording methods affect acoustic measures over time. We used Quantile Generalized Additive Mixed Models (QGAMMs) to analyze measures of F0, intensity, and the first and second formants, comparing files recorded using a laboratory-standard recording method (Zoom H6 recorder with an external microphone), to three remote recording methods: (1) the Awesome Voice Recorder application on a smartphone (AVR), (2) the Zoom meeting application with default settings (Zoom-default), and (3) the Zoom meeting application with the “Turn on Original Sound” setting (Zoom-raw). A linear temporal alignment issue was observed for the Zoom methods over the course of the long, recording session files; however, the difference was not significant for utterance-length files. F0 was reliably measured using all methods. Intensity and formants presented non-linear differences across methods that could not be corrected for simply. Overall, the AVR files were most similar to the H6’s, and so AVR is deemed to be a more reliable recording method than either Zoom-default or Zoom-raw.
@article{zhang2024investigating, title = {Investigating differences in lab-quality and remote recording methods with dynamic acoustic measures}, author = {Zhang, Cong and Jepson, Kathleen and Chuang, Yu-Ying}, year = {2024}, journal = {Laboratory Phonology}, volume = {24}, issue = {1}, doi = {10.16995/labphon.10492} }
- [JASA] Comparing acoustic analyses of speech data collected remotely. Cong Zhang, Kathleen Jepson, Georg Lohfink, and Amalia Arvaniti. The Journal of the Acoustical Society of America, 2021.
Face-to-face speech data collection has been next to impossible globally due to COVID-19 restrictions. To address this problem, simultaneous recordings of three repetitions of the cardinal vowels were made using a Zoom H6 Handy Recorder with external microphone (henceforth H6) and compared with two alternatives accessible to potential participants at home: the Zoom meeting application (henceforth Zoom) and two lossless mobile phone applications (Awesome Voice Recorder, and Recorder; henceforth Phone). F0 was tracked accurately by all devices; however, for formant analysis (F1, F2, F3) Phone performed better than Zoom, i.e. more similarly to H6, though data extraction method (VoiceSauce, Praat) also resulted in differences. In addition, Zoom recordings exhibited unexpected drops in intensity. The results suggest that lossless format phone recordings present a viable option for at least some phonetic studies.
@article{zhang2021comparing, author = {Zhang, Cong and Jepson, Kathleen and Lohfink, Georg and Arvaniti, Amalia}, year = {2021}, title = {{Comparing acoustic analyses of speech data collected remotely}}, doi = {10.1121/10.0005132}, issn = {0001-4966}, journal = {The Journal of the Acoustical Society of America}, number = {6}, pages = {3910--3916}, publisher = {Acoustical Society of America}, volume = {149} }
Related talks:
- [recording] Speech data collection at a distance: Comparing the reliability of acoustic cues across homemade recordings. Cong Zhang, Kathleen Jepson, Georg Lohfink, and Amalia Arvaniti. 179th Annual Meeting of the Acoustical Society of America, USA [online], 7-11 Dec 2020.
Speech production data collection has been significantly impacted by COVID-19 restrictions. Sound-treated recording spaces and high-quality recording devices are inaccessible, and face-to-face interactions are limited. We investigated alternative recording methods that produce data suitable for phonetic analysis, and are accessible to people in their homes. We examined simultaneous recordings of pure tones at seven frequencies (50 Hz, every 100 Hz between 100 Hz and 600 Hz), and three repetitions of the primary cardinal vowels elicited from five trained speakers. Recordings were made using the ZOOM meeting application and non-lossy format smartphone applications (Awesome Voice Recorder, Recorder), comparing these with Zoom H6N reference recordings. F0, F1-5, and duration based on manual segmentation were measured. F0 is highly correlated between the three devices for vowels and tones. Lower formants are also significantly correlated though not as robustly. The upper formants showed more variation as reported in the literature. Both phone and ZOOM performed better for vowels than tones. Phone segmentation generated reliable duration values differing from H6N segmentation by ∼18 ms. However, irregular waveforms and filtering algorithm artefacts caused considerable differences for ZOOM (∼119 ms). Our preliminary study suggests phone recordings are a viable option for some phonetic studies (e.g., prosody). Future analysis of natural speech data will prove insightful.
@conference{zhang2020speech-c, author = {Zhang, Cong and Jepson, Kathleen and Lohfink, Georg and Arvaniti, Amalia}, title = {Speech data collection at a distance: Comparing the reliability of acoustic cues across homemade recordings.}, booktitle = {179th Annual Meeting of the Acoustical Society of America}, year = {2020}, month = {7-11 Dec}, location = {USA [online]} }