This is a tutorial on installing and using MFA 2.0 on Windows. The official MFA website does not provide instructions for Windows installation. The process is fairly complicated because it requires running Linux on Windows, and you need admin access to install everything.
@misc{zhang2023reaper,author={Zhang, Cong},title={{MFA 2.0 Installation and Usage}},year={2022},category={tutorial},doi={10.31219/osf.io/yu48g}}
CharsiuG2P is a transformer-based tool for grapheme-to-phoneme conversion in 100 languages. Given an orthographic word, CharsiuG2P predicts its pronunciation through a neural G2P model.
@misc{zhu2022charsiug2p,author={Zhu, Jian and Zhang, Cong and Jurgens, David},title={{CharsiuG2P}},year={2022},category={tool}}
Data collected for the CharsiuG2P project. This is a collection of pronunciation dictionaries for over 100 languages.
@misc{zhu2022charsiug2p-dict,author={Zhu, Jian and Zhang, Cong and Jurgens, David},title={CharsiuG2P: pronunciation dictionaries},year={2022},category={tool}}
Data collected for the CharsiuG2P project. This is a collection of pronunciation dictionaries for over 100 languages.
@misc{zhu2022charsiug2p-resources,author={Zhu, Jian and Zhang, Cong and Jurgens, David},title={CharsiuG2P-Data: A Multi-lingual G2P Corpora},year={2022},category={tool}}
The rhythm.metrics package is designed for calculating and visualising speech rhythm metrics. This package provides the calculation of Delta C / Delta V, VarcoC / VarcoV, %V, rPVI_C, nPVI_V.
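As a rough illustration of what these metrics measure, here is a minimal Python sketch using their commonly cited definitions. It is not the rhythm.metrics package's API: the function names, the example durations, and the choice of the population standard deviation are all assumptions made for illustration. Inputs are lists of consonantal or vocalic interval durations in seconds.

```python
from statistics import mean, pstdev


def percent_v(vowels, consonants):
    """%V: share of total duration taken up by vocalic intervals."""
    total = sum(vowels) + sum(consonants)
    return 100.0 * sum(vowels) / total


def delta(intervals):
    """Delta C / Delta V: standard deviation of interval durations.

    Assumption: population SD; some implementations use the sample SD.
    """
    return pstdev(intervals)


def varco(intervals):
    """VarcoC / VarcoV: Delta normalised by the mean duration, times 100."""
    return 100.0 * pstdev(intervals) / mean(intervals)


def rpvi(intervals):
    """Raw PVI: mean absolute difference between successive intervals."""
    diffs = [abs(a - b) for a, b in zip(intervals, intervals[1:])]
    return mean(diffs)


def npvi(intervals):
    """Normalised PVI: each difference is divided by the pair's mean."""
    terms = [abs(a - b) / ((a + b) / 2.0)
             for a, b in zip(intervals, intervals[1:])]
    return 100.0 * mean(terms)


# Hypothetical interval durations (seconds), for illustration only.
consonants = [0.10, 0.20, 0.10]
vowels = [0.20, 0.20, 0.20]

print("%V     :", percent_v(vowels, consonants))
print("DeltaC :", delta(consonants))
print("VarcoC :", varco(consonants))
print("rPVI_C :", rpvi(consonants))
print("nPVI_V :", npvi(vowels))
```

The raw PVI is conventionally applied to consonantal intervals and the normalised PVI to vocalic intervals, which is why the example pairs them that way.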
Word & phone alignments for 2000 hours of English from Common Voice (https://github.com/lingjzhu/charsiu/blob/main/misc/data.md#alignments-for-english-datasets). Some data come with demographic annotations. Great for studying speech styles, accents & variations.
@misc{zhu2022aligned-en,author={Zhu, Jian and Zhang, Cong},title={{Phoneme and word level forced aligned data: Common Voice - English (860,000 utterances)}},year={2022},category={dataset}}
dataset
Phoneme and word level forced aligned data: multiple datasets - Mandarin (over 1 million utterances)
Phone & word alignments for 1300 hours of open-source Mandarin speech datasets. Automatically aligned with our own Charsiu Forced Aligner.
@misc{zhu2022aligned-cn,author={Zhu, Jian and Zhang, Cong},title={{Phoneme and word level forced aligned data: multiple datasets - Mandarin (over 1 million utterances)}},year={2022},category={dataset}}
@misc{zhang2021TTS,author={Zhang, Cong and Zeng, Huinan},title={{Phonological feature mapping for FeatureTTS}},year={2021},category={dataset},doi={10.5281/ZENODO.5553685}}
aligner
Phone-to-audio alignment without text: A Semi-supervised Approach
Charsiu is a phonetic alignment tool, which can: (1) force-align given speech audio + text transcription to phone level; and/or (2) automatically recognise the text in speech audio without the need for any transcription. It is currently available in both Mandarin Chinese and English (mainly American English).
@misc{zhu2022phone-charsiu,title={Phone-to-audio alignment without text: A Semi-supervised Approach},author={Zhu, Jian and Zhang, Cong and Jurgens, David}}
tutorial
Compiling REAPER (Robust Epoch And Pitch EstimatoR)
This tutorial walks you through installing P2FA for Mandarin Chinese on a Windows computer.
@misc{zhang2021reapes,author={Zhang, Cong},title={{Installing and Using the Penn Forced Aligner (P2FA) Chinese}},year={2018},category={tutorial},doi={10.31219/osf.io/542qj}}