Singing from a Linguistics Perspective -- segment proportion; beat alignment
Background
This project grew out of my industry experience developing a text-to-sing product. As a linguist, I hoped to give my engineering colleagues additional information, from a linguistics perspective, to help solve the bad cases. The project covers two themes, segment proportion and beat alignment, across two languages, Mandarin and English, and was partially funded by a small grant I received (Postdoctoral Research Associate Summer Research Prize, University of Kent, UK, 2020). I presented some preliminary results at a number of conferences, together with my student assistants and colleagues.
Studies
This project includes two major themes, segment proportion and beat alignment, in Mandarin and English.
Segment proportion
Paper at Speech Prosody 2020:
“Segment Duration and Proportion in Mandarin Singing”. A video presentation on this topic: https://osf.io/ybdup/
Beat alignment
Video lecture presented at the 179th Annual Meeting of the Acoustical Society of America: “Where does the beat fall? Speech-beat alignment in Mandarin and English singing” https://osf.io/nzm9d/
Paper presented at Interspeech 2021 with Jian Zhu: “Synchronising speech segments with musical beats in Mandarin and English singing”
Related articles:
Interspeech
Synchronising Speech Segments with Musical Beats in Mandarin and English Singing
Generating synthesised singing voice with models trained on speech data has many advantages due to the models’ flexibility and controllability. However, since information about the temporal relationship between segments and beats is lacking in speech training data, the synthesised singing may sound off-beat at times. Therefore, the availability of information on the temporal relationship between speech segments and music beats is crucial. The current study investigated segment-beat synchronisation in singing data, with hypotheses formed based on the linguistic theories of P-centre and sonority hierarchy. A Mandarin corpus and an English corpus of professional singing data were manually annotated and analysed. The results showed that the presence of musical beats was more dependent on segment duration than sonority. However, the sonority hierarchy and the P-centre theory were highly related to the location of beats. Mandarin and English demonstrated cross-linguistic variations despite exhibiting common patterns.
Speech Prosody
Segment Duration and Proportion in Mandarin Singing
Speech-based singing synthesis has various merits, but it also has unsolved issues. One of the most noticeable is the segment duration and proportion in synthesised singing, which arises from the difference between the short syllables of speech and the lengthened syllables of singing. This study therefore investigates how syllables are lengthened in Mandarin singing data. A total of 20 songs from the MIREX singing corpus were segmented and analysed. The results showed that (1) the segment proportions in Mandarin syllables are different in speech and in singing; (2) the lengthening is influenced more by the slots in the syllable structure than by the types of segments; (3) in the syllable structure of CGVX in Mandarin, the nuclear V lengthens the most and X follows. The durations of C and G also increase but their proportions in a syllable decrease.
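The distinction in finding (3) between a segment's duration and its proportion of the syllable can be illustrated with a small sketch. This is not the analysis code from the paper: the durations below are invented purely for illustration, and the function name `slot_proportions` is hypothetical. It assumes each syllable is annotated as a list of (slot, duration) pairs following the CGVX template.

```python
from collections import defaultdict

# Invented example durations (in seconds) for one CGVX syllable,
# once as spoken and once as sung. These are NOT measured values.
spoken = [("C", 0.06), ("G", 0.04), ("V", 0.10), ("X", 0.05)]
sung = [("C", 0.08), ("G", 0.05), ("V", 0.45), ("X", 0.12)]


def slot_proportions(syllable):
    """Return each slot's share of the total syllable duration."""
    total = sum(duration for _, duration in syllable)
    if total <= 0:
        raise ValueError("syllable must have a positive total duration")
    proportions = defaultdict(float)
    for slot, duration in syllable:
        proportions[slot] += duration / total
    return dict(proportions)


if __name__ == "__main__":
    spoken_props = slot_proportions(spoken)
    sung_props = slot_proportions(sung)
    for slot in "CGVX":
        print(f"{slot}: spoken {spoken_props[slot]:.2f}, sung {sung_props[slot]:.2f}")
```

In this invented example, C and G are longer in the sung syllable than in the spoken one, yet their proportions shrink because V takes up most of the added duration.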
Related talks:
Synchronising speech segments with musical beats in Mandarin and English singing.
Where does the beat fall? Speech-beat alignment in Mandarin and English singing.
Cong Zhang and Charlotte A. Slocombe
179th Annual Meeting of the Acoustical Society of America, USA [online], 7-11 Dec 2020
Text-to-sing generates singing from text input (i.e., music score with lyrics), from which only syllable-level speech-music alignment can be acquired. To enhance text-to-sing models, more fine-grained phoneme-level information is needed. We therefore investigate the acoustic measurements of segments and their temporal relationship with music beats as an answer from a linguistics perspective. Two research questions are addressed: (1) Do beats align with syllable onsets or nuclear vowel onsets? (2) Do different types of consonants present different speech-beat alignment results? Unaccompanied singing by professional singers in two rhythmically dissimilar languages, English (15 songs) and Mandarin (25 songs), was analysed. Data were segmented manually into phonemes by a trained annotator; a music scholar independently labelled the beats. Preliminary results suggest that Mandarin songs strongly favour vowels as anchors for beats (66.7%) while only 52.9% of beats fall on vowels in English. Both languages show that the beats have a strong preference for the end of consonants and the beginning of vowels. Phoneme types also play a significant role in the speech-beat alignment distribution. Future modelling of the speech-beat alignment in singing and comparison with speech rhythm data will also contribute to linguistic rhythm theories.
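The kind of measurement described above (which segment a beat falls in, and how far into that segment) can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in the study: the segment boundaries and beat times are invented, and the helper names `locate_beat` and `vowel_share` are hypothetical. It assumes segments are sorted by start time and annotated as (label, kind, start, end) in seconds.

```python
from bisect import bisect_right

# Invented annotation of one sung phrase: (label, kind, start, end) in seconds.
segments = [
    ("n", "consonant", 0.00, 0.08),
    ("i", "vowel", 0.08, 0.50),
    ("h", "consonant", 0.50, 0.58),
    ("a", "vowel", 0.58, 0.90),
    ("u", "vowel", 0.90, 1.20),
]
# Invented beat times in seconds.
beats = [0.10, 0.56, 0.60, 1.00]


def locate_beat(beat, segments):
    """Return (label, kind, relative position) of the segment containing the beat.

    Relative position is 0.0 at the segment start and approaches 1.0 at its end.
    Returns None if the beat falls outside every annotated segment.
    """
    starts = [start for _, _, start, _ in segments]
    index = bisect_right(starts, beat) - 1
    if index < 0:
        return None
    label, kind, start, end = segments[index]
    if beat >= end:
        return None
    return label, kind, (beat - start) / (end - start)


def vowel_share(beats, segments):
    """Return the fraction of located beats that fall within a vowel."""
    located = [hit for hit in (locate_beat(b, segments) for b in beats) if hit]
    if not located:
        return 0.0
    return sum(1 for _, kind, _ in located if kind == "vowel") / len(located)


if __name__ == "__main__":
    for beat in beats:
        print(beat, locate_beat(beat, segments))
    print("share of beats on vowels:", vowel_share(beats, segments))
```

A relative position near 1.0 in a consonant or near 0.0 in a vowel corresponds to the "end of consonants, beginning of vowels" pattern reported in the abstract; the numbers printed here reflect only the invented data.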
Segment Duration and Proportion in Mandarin Singing.
Cong Zhang and Xinrong Wang
Speech Prosody 2020, Tokyo, Japan [online], 25-28 May 2020