experience | Chung-Ming Chien (簡仲明)

education

Toyota Technological Institute at Chicago (TTIC)

2022 - Present
Ph.D. in Computer Science
- Advisor: Karen Livescu

National Taiwan University (NTU)

2019 - 2021
M.S. in Computer Science & Information Engineering
- Advisors: Lin‐shan Lee and Hung‐yi Lee
2015 - 2019
B.S.E. in Electrical Engineering
- Ranked 25/256 (top 10%) with two Dean’s List Awards

research

Speech and Language Group, TTIC

2023 - Present
Speech Language Models
- Conducted a comprehensive comparison of SpeechLLMs’ capabilities across various speech tasks
- Built a duplex speech conversation system composed of collaborative text and speech LMs
2022 - 2024
Joint Speech-Text Representation Learning
- Discovered text‑to‑speech transferability in joint speech‑text models, which enables zero‑shot spoken language understanding
2022 - 2024
Speech Representation Learning
- Revealed word‑level language structures intrinsically encoded in self‑supervised speech representations
- Benchmarked speech foundation models on spoken language understanding tasks under various resource considerations

Kyutai

2025
Factual Speech AI with Duplex Speech Language Models
- Worked with Alexandre Défossez and Neil Zeghidour
- Built moshi-RAG, the world's first full-duplex speech assistant with Retrieval-Augmented Generation (RAG), which significantly improves factual accuracy and generalization to unseen tasks without compromising interactivity

NVIDIA NeMo

2024
Speech Language Models
- Worked with Zhehuai Chen and Jason Li
- Augmented NeMo Canary LLMs with speech generation capabilities for speech‑to‑speech translation and speech question answering

FAIR (Fundamental AI Research) at Meta

2023
Controllability of Flow-Matching Speech Generation
- Worked with Andros Tjandra and Wei‐Ning Hsu
- Worked on the Voicebox project, enhancing fine-grained controllability of flow-matching speech generation models under resource-limited scenarios

Amazon Alexa, Cambridge, UK

2021
Speaker‐Adaptive Text‐to‐Speech (TTS)
- Worked with Adam Gabryś and Jaime Lorenzo‐Trueba
- Proposed Voice Filter, which improved extremely low‐resource speaker‐adaptive text‐to‐speech (TTS) by modeling content and speaker information separately
- Reduced the gap between synthesized and real speech by over 30%

Speech Processing Laboratory, NTU

2020 - 2021
Self-Supervised Speech Representations for Generation
- Disentangled speaker and phonetic information in self‐supervised speech representations for the task of voice conversion (VC)
- Proposed a state‐of‐the‐art zero‐shot any‐to‐any VC model that learns sub‐phoneme alignments between utterances via Transformer attention
2020 - 2021
Speaker Representations
- Proposed generative speaker embedding pre‐training for speech synthesis
- Won 2nd prize in the IEEE ICASSP M2VoC Challenge on low‐resource voice cloning
2019 - 2020
Prosody in Speech Generation
- Developed hierarchical prosody modeling in TTS

honors

Scholarship
- Government Scholarship to Study Abroad, Ministry of Education of Taiwan ($32,000) (2023 - 2025)
- Advanced Speech Technologies Scholarship, NTU EECS ($17,000) (2021)
- NTUEE60 Scholarship, NTU EE ($3,500) (2016)
Awards
- Best Student Paper Award, ASRU (2023)
- 2nd Place, ICASSP M2VoC Challenge (2021)
- Top 20 Finalist, Trans Action Award (2020)
- Cathay United Bank Special Award, Make NTU (2019)
- Dean’s List Awards (Two‐Time), NTU EE (2016 & 2017)
Leadership
- Captain of the NTU Baseball Varsity Team (2019 - 2020)
Non-academic
- 1st Place among UChicago‑Affiliated Athletes (Three Straight Years), J.P. Morgan Corporate Challenge 3.5‑Mile Road Race (2023 - 2025)
- 5th Place (Two‐Time), University Baseball League of Taiwan (equivalent to NCAA Division III) (2019 & 2021)
- Gold Medal, Men’s Half‐Iron Relay, Yilan National Triathlon Championships (2019)

service

Reviewers

IEEE JSTSP, ICLR, ICASSP, Interspeech

Workshop organizers

2025 TTIC Summer Workshop on Foundations of Speech and Audio Foundation Models
2024 TTIC Student Workshop

talks

Joint Speech‑Text Generation with Collaborative Spoken and Written Language Models
- TTIC Student Workshop (Chicago, IL, US, May 2025)
Few‑Shot Spoken Language Understanding via Joint Speech‑Text Models
- Midwest Speech and Language Days (Ann Arbor, MI, US, Apr. 2024)
Self‐Supervised Pre‐Trained Voice Conversion
- TTIC Student Workshop (Chicago, IL, US, Nov. 2022)
Speech Synthesis in the Deep Learning Era
- AI Summer School 2020, NTU (Taipei, Taiwan, Aug. 2020)
