Chung-Ming Chien (簡仲明)

Chicago, Illinois, United States


Santorini, Greece (May 30, 2023)

I am a 2nd-year Ph.D. student at Toyota Technological Institute at Chicago (TTIC), where I am fortunate to work with Karen Livescu. My research interests encompass speech and natural language processing. Here are some topics I have been focusing on recently:

  • Speech-Text Joint Learning
    Can speech models learn better/faster with the aid of text? How should we integrate speech and audio information into pre-trained text models?
  • Speech Generation
    Controlling and modeling non-lexical information in generated speech in a more efficient and intuitive way.
  • Self-Supervised Speech Representations
    Analyzing the information encoded in self-supervised speech representations and exploring various applications for the learned representations and units.
  • Multi-Modal Learning
    Text-guided image generation and video-guided speech generation.

Prior to joining TTIC, I earned my Master’s degree in Computer Science from National Taiwan University (NTU), where I had the privilege of working with Lin-shan Lee and Hung-yi Lee at the Speech Processing Lab. Outside of school, I also gained valuable experience through summer internships with Amazon Alexa TTS Research and FAIR (AI at Meta).

Beyond my academic pursuits, I am a sports enthusiast and amateur athlete. I captained NTU's varsity baseball team during my undergraduate years. I am also broadly interested in tennis, hiking, scuba diving, swimming, badminton, and training. In 2022, I achieved a personal milestone by completing my first marathon, and I have been dedicated to improving my PB with the goal of breaking the 3:10 mark!


Jan 13, 2024 My open-source FastSpeech 2 project has earned over 1.5k stars on GitHub :sparkles:
Dec 20, 2023 I share the honor of the Best Student Paper Award of ASRU 2023 with Mingjiamei, Ju-Chieh, and Karen. Check out our work “Few-shot SLU via Joint Speech-Text Models” for more details :trophy:
Oct 7, 2023 “Toward Joint Language Modeling for Speech Units and Text” is accepted to Findings of EMNLP 2023!
Sep 22, 2023 Our work “Few-shot SLU via Joint Speech-Text Models” is accepted at ASRU 2023, and I’ll surely go back to Taiwan to present it in person!
Sep 14, 2023 “What do self-supervised speech models know about words?” and AV2Wav are both available on arXiv!

selected publications

  1. Few-Shot Spoken Language Understanding via Joint Speech-Text Models
    Chung-Ming Chien, Mingjiamei Zhang, Ju-Chieh Chou, and 1 more author
    Best Student Paper Award
    In 2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)
  2. Investigating on Incorporating Pretrained and Learnable Speaker Representations for Multi-Speaker Multi-Style Text-to-Speech
    Chung-Ming Chien, Jheng-Hao Lin, Chien-yu Huang, and 2 more authors
    In ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  3. FragmentVC: Any-To-Any Voice Conversion by End-To-End Extracting and Fusing Fine-Grained Voice Fragments with Attention
    Chung-Ming Chien*, Yist Y. Lin*, Jheng-Hao Lin, and 2 more authors
    *equal contribution
    In ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  4. Hierarchical Prosody Modeling for Non-Autoregressive Speech Synthesis
    Chung-Ming Chien, and Hung-yi Lee
    In 2021 IEEE Spoken Language Technology Workshop (SLT)