Speech Prosody in Interaction - The form and function of intonation in human communication
SPRINT (Speech Prosody in Interaction: The form and function of intonation in human communication) is a project funded by the European Research Council (ERC-ADG-835263). The PI is Professor Amalia Arvaniti. Please visit https://www.sprintproject.io/ for full information.
Related articles:
remote
Investigating differences in lab-quality and remote recording methods with dynamic acoustic measures
Increasingly, phonetic research uses data collected from participants who record themselves on readily available devices. Though such recordings are convenient, their suitability for acoustic analysis remains an open question, especially regarding how recording methods affect acoustic measures over time. We used Quantile Generalized Additive Mixed Models (QGAMMs) to analyze measures of F0, intensity, and the first and second formants, comparing files recorded using a laboratory-standard method (Zoom H6 recorder with an external microphone) with files recorded using three remote methods: (1) the Awesome Voice Recorder application on a smartphone (AVR), (2) the Zoom meeting application with default settings (Zoom-default), and (3) the Zoom meeting application with the “Turn on Original Sound” setting (Zoom-raw). A linear temporal alignment issue was observed for the Zoom methods over the course of the long recording-session files; however, the difference was not significant for utterance-length files. F0 was reliably measured using all methods. Intensity and formants presented non-linear differences across methods that could not be corrected in any simple way. Overall, the AVR files were most similar to the H6 files, and so AVR is deemed to be a more reliable recording method than either Zoom-default or Zoom-raw.
Speech Prosody
The many shapes of H*
Stella Gryllia, Amalia Arvaniti, Cong Zhang, and Katherine Marcoux
We examined individual and task-related variability in the realization of Greek nuclear H* followed by L-L% edge tones. The accents (N = 748) were elicited from native speakers of Greek, producing scripted and unscripted speech, and examined using functional Principal Components Analysis. The accented vowel onset was used for landmark registration to capture accent shape and the alignment of the fall. The resulting PCs were analysed using LMEMs (fixed factors: speaker; task type (scripted, unscripted); accented syllable distance from the analysis window offset, to examine the effects of tonal crowding). Tonal scaling and the steepness of the fall (reflected in PC1 and PC2 respectively) changed by task in ways that differed across speakers. PC3, which captured accent shape, also varied by speaker, reflecting shape differences between a rise-fall and (the expected) plateau-plus-fall realization. Tonal crowding did not have consistent effects. In short, the overall accent shape and the alignment of the accentual fall varied by speaker and task. These results hint at substantial variability in tonal realization. At the same time, they indicate that tonal alignment is not as consistent as is sometimes portrayed and thus it should not be the sole criterion for tone categorization.
JASA
Comparing acoustic analyses of speech data collected remotely
Cong Zhang, Kathleen Jepson, Georg Lohfink, and Amalia Arvaniti
The Journal of the Acoustical Society of America, 2021
Face-to-face speech data collection has been next to impossible globally due to COVID-19 restrictions. To address this problem, simultaneous recordings of three repetitions of the cardinal vowels were made using a Zoom H6 Handy Recorder with external microphone (henceforth H6) and compared with two alternatives accessible to potential participants at home: the Zoom meeting application (henceforth Zoom) and two lossless mobile phone applications (Awesome Voice Recorder, and Recorder; henceforth Phone). F0 was tracked accurately by all devices; however, for formant analysis (F1, F2, F3) Phone performed better than Zoom, i.e. more similarly to H6, though data extraction method (VoiceSauce, Praat) also resulted in differences. In addition, Zoom recordings exhibited unexpected drops in intensity. The results suggest that lossless format phone recordings present a viable option for at least some phonetic studies.
Related talks:
prosody
The many shapes of H*
Stella Gryllia, Amalia Arvaniti, Cong Zhang, and Katherine Marcoux
prosody
Disentangling emphasis from pragmatic contrastivity in the English H* ~ L+H* contrast.
Amalia Arvaniti, Stella Gryllia, Cong Zhang, and Katherine Marcoux
English H* and L+H* indicate new and contrastive information respectively, though some argue the difference between them is solely one of phonetic emphasis. We used (modified) Rapid Prosody Transcription to test these views. Forty-seven speakers of Standard Southern British English (SSBE) listened to 86 SSBE utterances and marked the words they considered prominent or emphatic. Accents (N = 281) were independently coded as H* or L+H* using phonetic criteria, and as contrastive or non-contrastive using pragmatic criteria. If L+H* is an emphatic H*, all L+H*s should be more prominent than H*s. If the accents mark pragmatic information, contrastivity should drive responses. Contrastive accents and L+H*s were considered more prominent than non-contrastive accents and H*s respectively. Individual responses showed different strategies: for some participants, all L+H*s were more prominent than H*s, for others, contrastive accents were more prominent than non-contrastive accents, and for still others, there was no difference between categories. These results indicate that a reason for the continuing debate about English H* and L+H* may be that the two accents form a weak contrast which some speakers acquire and attend to while others do not.
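As an illustration of how responses of this kind can be summarised, the following is a minimal sketch, assuming each listener's responses are stored as one 0/1 value per word (1 = marked as prominent). It computes, for each word, the proportion of listeners who marked it, and then averages those proportions by accent label. The function names, the data layout, and the toy data are hypothetical and are not taken from the study, whose own statistical analysis is not described in the abstract.

```python
from collections import defaultdict


def prominence_scores(marks):
    """Return, for each word, the proportion of listeners who marked it.

    marks: list of per-listener lists of 0/1 values, one value per word.
    """
    n_listeners = len(marks)
    n_words = len(marks[0])
    if any(len(row) != n_words for row in marks):
        raise ValueError("every listener must have one response per word")
    return [sum(row[i] for row in marks) / n_listeners for i in range(n_words)]


def mean_score_by_label(scores, labels):
    """Average the per-word scores within each label; None means unaccented."""
    grouped = defaultdict(list)
    for score, label in zip(scores, labels):
        if label is not None:
            grouped[label].append(score)
    return {label: sum(vals) / len(vals) for label, vals in grouped.items()}


# Toy data (hypothetical): 4 listeners, one 5-word utterance.
marks = [
    [0, 1, 0, 1, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 0, 0, 0],
]
# Hypothetical accent coding for the same 5 words.
accents = [None, "L+H*", None, "H*", None]

scores = prominence_scores(marks)
by_accent = mean_score_by_label(scores, accents)
print(scores)     # [0.0, 1.0, 0.0, 0.5, 0.0]
print(by_accent)  # {'L+H*': 1.0, 'H*': 0.5}
```

The same grouping function could be applied to a contrastive versus non-contrastive coding of the words, and computing the scores separately for each listener would be one way to look at the individual strategies mentioned in the abstract.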
prosody
Focus and accent in English.
Amalia Arvaniti, Stella Gryllia, Cong Zhang, and Katherine Marcoux
Dutch Phonetics Day (Dag van de Fonetiek), Netherlands [online], 17 Dec 2021
Contrastive focus in English is marked with a rising accent (autosegmentally L+H*) and broad (all new) focus with a high accent (H*). However, inconclusive production and perception evidence supports the idea that L+H* is simply an emphatic version of H*, not phonologically distinct from it. We used Rapid Prosody Transcription to test these two views. Forty-seven speakers of Standard Southern British English (SSBE) listened to 86 SSBE utterances and marked the words they considered prominent or emphatic. Accents (N = 281) were independently coded as H* or L+H* using phonetic criteria, and as contrastive or non-contrastive using pragmatic criteria. If L+H* is an emphatic H*, L+H*s should be rated more prominent than H*s; if the accents encode a pragmatic distinction, contrastive accents should be rated more prominent than non-contrastive ones. The results showed effects of both accent and pragmatics (L+H* > H*; contrastive > non-contrastive) and no interaction. Contrastive L+H*s were rated most prominent, non-contrastive H*s least prominent, while non-contrastive L+H*s and contrastive H*s had average and almost identical ratings. Participants used different strategies: some focused on accent type, others on pragmatics, and still others made neither distinction. These results suggest that a reason for the continuing debate about H* and L+H* may be that the accents form a weak contrast which some speakers acquire and attend to while others do not. Similarly, researchers who focus on contrastive L+H* and non-contrastive H* see distinct categories, while those who focus on non-contrastive L+H*s and contrastive H*s tend to see a continuum.
prosody
Comparing phonetic and pragmatic classifications of English H* and L+H*.
Cong Zhang, Kathleen Jepson, Katherine Marcoux, and Amalia Arvaniti
1st International Conference on Tone and Intonation 2021, Sonderborg, Denmark [online], 6-9 Dec 2021