My research has focused on biologically inspired speech and audio signal processing: understanding and modeling the human auditory system and auditory attention mechanisms for speech analysis, speech understanding, and speech recognition. In recent years, I have also been working on deep learning for speech recognition. My research interests cover:
- Speech and Audio Processing
- Speech Recognition
- Speech Analysis
- Deep Learning
- Auditory Systems
- Human machine interaction
- Multi-modal systems
- Information Extraction
- Machine Learning and Pattern Recognition
Projects while at Sony PlayStation (2010-2017):
At PlayStation, I did research and development for current and next-generation voice-centric PlayStation products. I was part of the speech team that shipped the ASR solution for the PS4 console and released an ASR SDK for game studios in 10+ languages, covering both close-talk and far-field conditions. I also had the chance to work with game studios on voice-centric game applications. My work focused mainly on automatic speech recognition, robustness to noise and speaking styles, natural language understanding, and deep learning.
Sample Projects before 2010:
- Bio-Inspired Auditory Attention Models
During my PhD, I proposed novel biologically inspired auditory attention models and applied them successfully to speech and audio problems. For details, please see the papers on the saliency-driven bottom-up auditory attention model and the top-down task-dependent model.
- Noise Adaptive Training for Robust ASR
Worked on noise adaptive training (NAT), which normalizes environmental distortion as part of acoustic model training for robust automatic speech recognition (in collaboration with Microsoft Research). Implemented NAT in the HTK Toolkit and achieved state-of-the-art results on a noisy ASR task.
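To give a flavor of the idea of normalizing environmental distortion in the feature space, here is a much-simplified sketch using per-utterance cepstral mean and variance normalization. This is only an illustration of the principle; the actual NAT approach jointly estimates noise parameters inside the HMM training loop (e.g. via a VTS-style distortion model), which this sketch does not attempt. The function name and shapes are my own choices.

```python
import numpy as np

def cmvn(features):
    """Per-utterance cepstral mean and variance normalization.

    A much-simplified stand-in for the idea behind noise adaptive
    training: remove stationary channel/noise offsets from features
    before (and during) acoustic-model training, so the model sees a
    normalized feature space. Real NAT estimates the environmental
    distortion jointly with the acoustic model, which this does not.

    features: (num_frames, num_coeffs) array, e.g. MFCCs.
    """
    mean = features.mean(axis=0)
    std = features.std(axis=0) + 1e-8  # avoid division by zero
    return (features - mean) / std
```

Applied identically at training and decoding time, even this simple normalization removes a constant channel offset from every utterance.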
- SpeechLinks: A Speech-to-Speech Translation System, Fall 2006-Fall 2009
Assisted with data collection, processing, and training acoustic models for
large vocabulary continuous speech recognition (LVCSR) in an English-to-Farsi speech translation
system (DARPA TRANSTAC).
- Adaptation of an Automatic Speech Recognizer Front-end
Implemented new front-end functions for the Sphinx automatic speech recognition toolkit for SAIL lab research.
- IMSC-Media Streaming Project: Echo Cancellation Fall
Developed an LMS-based echo cancellation algorithm in C++ for video-conferencing in IMSC's Streaming Media project.
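The core of an LMS echo canceller fits in a few lines: an adaptive FIR filter models the echo path from the far-end (loudspeaker) signal to the microphone, and its output is subtracted from the microphone signal. Below is a minimal sketch in Python/numpy rather than the original C++; the function name, tap count, and step size are illustrative, and the update shown is actually the normalized (NLMS) variant for stability.

```python
import numpy as np

def lms_echo_canceller(far_end, mic, num_taps=64, mu=0.5):
    """Cancel the echo of far_end present in mic with an adaptive filter.

    far_end: signal sent to the loudspeaker (reference).
    mic:     microphone signal = near-end speech + echo of far_end.
    Returns the error signal, i.e. mic with the estimated echo removed.
    Uses the normalized LMS update for step-size robustness.
    """
    w = np.zeros(num_taps)        # adaptive echo-path estimate
    x_buf = np.zeros(num_taps)    # most recent far-end samples
    out = np.zeros_like(mic)
    for n in range(len(mic)):
        x_buf = np.roll(x_buf, 1)
        x_buf[0] = far_end[n]
        y = w @ x_buf             # estimated echo at time n
        e = mic[n] - y            # residual = near-end + echo error
        # NLMS weight update, normalized by input power
        w += mu * e * x_buf / (x_buf @ x_buf + 1e-8)
        out[n] = e
    return out
```

After the filter converges, the output contains the near-end speech with the echo largely removed; the per-sample normalization lets one step size work across widely varying signal levels.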
- Underdetermined Source Separation 2003-2005
Worked on underdetermined source separation at the Immersive Audio Lab at USC.
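Underdetermined separation (more sources than microphones) typically relies on sparsity: in the short-time Fourier transform domain, each time-frequency cell tends to be dominated by a single source, so assigning cells to their dominant source can separate them. The sketch below illustrates this with oracle binary masks computed from the true sources; it is an illustration of the assumption, not any particular method from my work, and blind methods must estimate the masks from the mixture alone.

```python
import numpy as np
from scipy.signal import stft, istft

def oracle_binary_mask_separation(mixture, sources, fs=16000, nperseg=512):
    """Separate a single-channel mixture with ideal binary T-F masks.

    mixture: 1-D mixture signal (sum of the sources).
    sources: list of the true source signals (used only to build the
             oracle masks; blind methods estimate masks instead).
    Returns one estimated signal per source.
    """
    _, _, X = stft(mixture, fs=fs, nperseg=nperseg)
    # magnitude spectrogram of each source, stacked on a new axis
    mags = np.array([np.abs(stft(s, fs=fs, nperseg=nperseg)[2])
                     for s in sources])
    winner = mags.argmax(axis=0)          # dominant source per T-F cell
    estimates = []
    for k in range(len(sources)):
        mask = (winner == k)              # binary mask for source k
        _, est = istft(X * mask, fs=fs, nperseg=nperseg)
        estimates.append(est[:len(mixture)])
    return estimates
```

With well-separated sources the binary mask recovers each one almost perfectly, which is why the sparsity (W-disjoint orthogonality) assumption is so effective for speech mixtures.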