Chung-Ming Chien (簡仲明)

Santorini, Greece

May 30, 2023

I am a 3rd-year Ph.D. student at Toyota Technological Institute at Chicago (TTIC), where I am fortunate to work with Karen Livescu. My research interests encompass the fields of speech and natural language processing technologies. Here are some topics I have been focusing on recently:

Speech Language Models
How can knowledge learned from text by pre-trained large language models (LLMs) be used to facilitate speech applications? How can LLMs be fine-tuned to take speech inputs and produce speech outputs, enabling general-purpose speech conversational AI?
Speech Generation
Control and model non-lexical information in generated speech in a more efficient and intuitive way.
Self-Supervised Speech Representations
Analyze the information encoded in self-supervised speech representations and explore various applications for the learned representations and units.

Prior to joining TTIC, I earned my Master’s degree in Computer Science from National Taiwan University (NTU), where I had the privilege of working with Lin-shan Lee and Hung-yi Lee at the Speech Processing Lab. Outside of school, I also gained valuable experience through summer internships with Amazon Alexa TTS Research, FAIR (AI at Meta), and NVIDIA.

Beyond my academic pursuits, I am a sports enthusiast and amateur athlete. I captained NTU’s varsity baseball team during my undergraduate years. I am also broadly interested in tennis, hiking, scuba diving, swimming, badminton, and training. In 2022, I achieved a personal milestone by completing my first marathon, and I have been dedicated to improving my PB with the goal of breaking the 3:10 mark!

news

Jun 4, 2024	“Learning Fine‑Grained Controllability on Speech Generation via Efficient Fine‑Tuning” is accepted to Interspeech 2024!
May 16, 2024	“On the Evaluation of Speech Foundation Models for Spoken Language Understanding” is accepted to Findings of ACL 2024!
Apr 16, 2024	I gave a talk at Midwest Speech and Language Days in Ann Arbor, Michigan
Apr 9, 2024	I successfully passed the qualifying exam of TTIC and will soon become a Ph.D. candidate
Jan 23, 2024	“What Do Self‑Supervised Speech Models Know about Words” is accepted to TACL 2024!
Jan 18, 2024	I will join NVIDIA NeMo Team for my 2024 summer internship and will work on speech language models!
Jan 13, 2024	My open-source FastSpeech 2 project has passed 1.5k stars on GitHub
Dec 20, 2023	I share the honor of the Best Student Paper Award of ASRU 2023 with Mingjiamei, Ju-Chieh, and Karen. Check out our work “Few-shot SLU via Joint Speech-Text Models” for more details
Oct 7, 2023	“Toward Joint Language Modeling for Speech Units and Text” is accepted to Findings of EMNLP 2023!
Sep 22, 2023	Our work “Few-shot SLU via Joint Speech-Text Models” is accepted at ASRU 2023, and I’ll surely go back to Taiwan to present it in person!

selected publications

Interspeech 2024

Learning Fine-Grained Controllability on Speech Generation via Efficient Fine-Tuning

Chung-Ming Chien, Andros Tjandra, Apoorv Vyas, and 3 more authors

In Interspeech 2024

@inproceedings{chien2024learning,
  title = {Learning Fine-Grained Controllability on Speech Generation via Efficient Fine-Tuning},
  author = {Chien, Chung-Ming and Tjandra, Andros and Vyas, Apoorv and Le, Matt and Shi, Bowen and Hsu, Wei-Ning},
  year = {2024},
  booktitle = {Interspeech 2024},
  month = sep,
  eprint = {2406.06251},
  archiveprefix = {arXiv},
  primaryclass = {eess.AS},
}

ASRU 2023

Few-Shot Spoken Language Understanding via Joint Speech-Text Models

Chung-Ming Chien, Mingjiamei Zhang, Ju-Chieh Chou, and 1 more author

Best Student Paper Award

In 2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)

@inproceedings{chien2023few,
  title = {Few-Shot Spoken Language Understanding via Joint Speech-Text Models},
  author = {Chien, Chung-Ming and Zhang, Mingjiamei and Chou, Ju-Chieh and Livescu, Karen},
  booktitle = {2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)},
  year = {2023},
  month = dec,
  eprint = {2310.05919},
  archiveprefix = {arXiv},
  primaryclass = {cs.CL},
}

ICASSP 2021

FragmentVC: Any-To-Any Voice Conversion by End-To-End Extracting and Fusing Fine-Grained Voice Fragments with Attention

Chung-Ming Chien*, Yist Y. Lin*, Jheng-Hao Lin, and 2 more authors

*equal contribution

In ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

@inproceedings{chien2020fragmentvc,
  author = {Chien, Chung-Ming and Lin, Yist Y. and Lin, Jheng-Hao and Lee, Hung-yi and Lee, Lin-shan},
  booktitle = {ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  title = {FragmentVC: Any-To-Any Voice Conversion by End-To-End Extracting and Fusing Fine-Grained Voice Fragments with Attention},
  year = {2021},
  pages = {5939--5943},
  doi = {10.1109/ICASSP39728.2021.9413699},
  month = jun,
  eprint = {2010.14150},
  archiveprefix = {arXiv},
  primaryclass = {eess.AS},
}