Chung-Ming Chien (簡仲明)

Chicago, Illinois, United States

Photo: Santorini, Greece (May 30, 2023)

I am a 4th-year Ph.D. student at the Toyota Technological Institute at Chicago (TTIC), advised by Karen Livescu. My overarching ambition is to create Trustworthy and User-Friendly Human-Like Speech AI. To achieve this goal, my current research focuses on:

  • Factual Conversational Speech AI
    How can we make full-duplex models more intelligent while preserving their interactivity? How can we best preserve the knowledge learned from large-scale text pre-training?
  • Aligning Speech Language Models (SLMs) with Human Preferences
    How can we use human preferences to improve various aspects of SLMs, such as content fidelity, natural turn-taking, and emotional expressiveness? How can we enhance the overall user experience?

Beyond these directions, my past research in Controllable Speech Generation, Efficient Fine-Tuning of SLMs, Flow-Matching & Diffusion Models, and Self-Supervised Speech Representations has given me crucial experience and a valuable foundation for pursuing this goal.

Prior to joining TTIC, I completed my Master’s degree in Computer Science at National Taiwan University (NTU), where I had the good fortune to work with Lin-shan Lee and Hung-yi Lee at the Speech Processing Lab. I also gained valuable practical experience through summer internships at Amazon Alexa TTS Research, FAIR (AI at Meta), the NVIDIA NeMo Team, and Kyutai.

Beyond my academic and research pursuits, I am an avid sports enthusiast and amateur athlete. During my undergraduate studies, I captained the varsity baseball team at NTU. I maintain a broad interest in tennis, hiking, snowboarding, scuba diving, and badminton. In 2022, I reached a personal milestone by completing my first marathon, and I am currently training to break the 3-hour mark.

news

Sep 5, 2025 It was a true pleasure reconnecting with old friends and meeting new colleagues at the Workshop on Foundations of Speech and Audio Foundation Models at TTIC. I hope everyone found the event valuable and enjoyable.
Jul 17, 2025 I will be co-hosting the Workshop on Foundations of Speech and Audio Foundation Models this September. Join us at TTIC to explore the newest advancements in Speech Language Models.
Apr 18, 2025 This summer, I am heading to France to join Kyutai and work on moshi, the world’s first full-duplex speech assistant. This is a new adventure, and I can’t wait to see what we achieve! Bonjour Paris :fr:
Apr 11, 2025 The long wait is over. Our review paper on SLMs, “On the Landscape of Spoken Language Models: A Comprehensive Survey”, is officially on arXiv. Check it out: it was a significant, collective effort and is packed with insights!
Jun 4, 2024 “Learning Fine-Grained Controllability on Speech Generation via Efficient Fine-Tuning” is accepted to Interspeech 2024!
May 16, 2024 “On the Evaluation of Speech Foundation Models for Spoken Language Understanding” is accepted to Findings of ACL 2024!
Apr 16, 2024 I gave a talk at Midwest Speech and Language Days in Ann Arbor, Michigan :microphone:
Apr 9, 2024 I passed the TTIC qualifying exam and will soon become a Ph.D. candidate :mortar_board:
Jan 23, 2024 “What Do Self-Supervised Speech Models Know about Words” is accepted to TACL 2024!
Jan 18, 2024 I will join the NVIDIA NeMo Team for my 2024 summer internship, where I will work on speech language models!

selected publications

  1. On the Landscape of Spoken Language Models: A Comprehensive Survey
    Chung-Ming Chien*, Siddhant Arora*, Kai-Wei Chang*, and 7 more authors
    *equal contribution
    Transactions on Machine Learning Research
  2. Learning Fine-Grained Controllability on Speech Generation via Efficient Fine-Tuning
    Chung-Ming Chien, Andros Tjandra, Apoorv Vyas, and 3 more authors
    In Interspeech 2024
  3. Few-Shot Spoken Language Understanding via Joint Speech-Text Models
    Chung-Ming Chien, Mingjiamei Zhang, Ju-Chieh Chou, and 1 more author
    Best Student Paper Award
    In 2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)
  4. FragmentVC: Any-To-Any Voice Conversion by End-To-End Extracting and Fusing Fine-Grained Voice Fragments with Attention
    Chung-Ming Chien*, Yist Y. Lin*, Jheng-Hao Lin, and 2 more authors
    *equal contribution
    In 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)