May 30, 2023
I am a second-year Ph.D. student at Toyota Technological Institute at Chicago (TTIC), where I am fortunate to work with Karen Livescu. My research interests span speech and natural language processing. Here are some topics I have been focusing on recently:
- Speech-Text Joint Learning
Can speech models learn better or faster with the aid of text? How should we integrate speech and audio information into pre-trained text models?
- Speech Generation
Controlling and modeling non-lexical information in generated speech in more efficient and intuitive ways.
- Self-Supervised Speech Representations
Analyzing the information encoded in self-supervised speech representations and exploring applications of the learned representations and units.
- Multi-Modal Learning
Text-guided image generation and video-guided speech generation.
Prior to joining TTIC, I earned my Master’s degree in Computer Science from National Taiwan University (NTU), where I had the privilege of working with Lin-shan Lee and Hung-yi Lee at the Speech Processing Lab. Outside of school, I also gained valuable experience through summer internships with Amazon Alexa TTS Research and FAIR (AI at Meta).
Beyond my academic pursuits, I am a sports enthusiast and amateur athlete. I captained NTU's varsity baseball team during my undergraduate years. I am also broadly interested in tennis, hiking, scuba diving, swimming, badminton, and strength training. In 2022, I achieved a personal milestone by completing my first marathon, and I have been dedicated to improving my personal best with the goal of breaking the 3:10 mark!
Jan 13, 2024
My open-source FastSpeech 2 project has surpassed 1.5k stars on GitHub!
Dec 20, 2023
I share the honor of the Best Student Paper Award at ASRU 2023 with Mingjiamei, Ju-Chieh, and Karen. Check out our work "Few-shot SLU via Joint Speech-Text Models" for more details.
Oct 7, 2023
"Toward Joint Language Modeling for Speech Units and Text" is accepted to Findings of EMNLP 2023!
Sep 22, 2023
Our work "Few-shot SLU via Joint Speech-Text Models" is accepted at ASRU 2023, and I'll be going back to Taiwan to present it in person!
Sep 14, 2023
"What do self-supervised speech models know about words?" and AV2Wav are both available on arXiv!
- Few-Shot Spoken Language Understanding via Joint Speech-Text Models (Best Student Paper Award). In 2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
- Investigating on Incorporating Pretrained and Learnable Speaker Representations for Multi-Speaker Multi-Style Text-to-Speech. In ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
- FragmentVC: Any-To-Any Voice Conversion by End-To-End Extracting and Fusing Fine-Grained Voice Fragments with Attention (*equal contribution). In ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
- Hierarchical Prosody Modeling for Non-Autoregressive Speech Synthesis. In 2021 IEEE Spoken Language Technology Workshop (SLT).