Self-Introduction
Short Bio
After graduating from high school, I majored in Statistics at the School of Economics, Nagoya University. During my undergraduate years, I interned at Human Dataware Lab and held a part-time position at Tarvo. After earning my degree, I joined IBM Japan, where I worked on system development for an insurance company. In particular, my pioneering work on generative AI, for which I served as technical lead, was widely reported in the news.
I am currently a master's student at the Language Technologies Institute, School of Computer Science, Carnegie Mellon University. I am a member of WAVLab, where I am mentored by Prof. Shinji Watanabe and conduct research in speech and language processing.
Research Interests
Accelerating Inference Without Compromising Accuracy
Recent advances in neural networks have enabled the use of Self-Supervised Learning (SSL) models in the speech domain, significantly improving the accuracy of speech-related tasks such as speech-to-text. However, these models typically contain a large number of parameters, which can slow down inference in speech processing systems. My research aims to develop new techniques that allow these large models to run efficiently in resource-constrained environments without compromising accuracy.
[SLU] Speech-to-Text Understanding: Generating Structured Text from Speech
During my time at IBM Japan, I recognized the importance of generating structured text from conversations, especially in a streaming manner. This capability could have wide-ranging applications in business, such as generating meeting minutes and to-do lists or summarizing key action points from discussions.
While existing services combine Speech-to-Text and Large Language Models to achieve similar outcomes, I am interested in developing an end-to-end streaming model that can accomplish this task more efficiently and accurately.