RESEARCH


My research has been on biologically inspired speech and audio signal processing: understanding and modeling the human auditory system and its attention mechanisms for speech analysis, speech understanding, and speech recognition. In recent years, I have also been working on deep learning for speech recognition. My research interests cover:

  • Speech and Audio Processing
  • Speech Recognition
  • Speech Analysis and Understanding
  • Deep Learning
  • Auditory Systems
  • Human-Machine Interaction
  • Multi-Modal Systems
  • Information Extraction from Audio Signals
  • Machine Learning and Pattern Recognition

Projects while at Sony PlayStation (2010-2017):

At PlayStation, I did research and development for current and next-gen voice-centric PlayStation products. I was part of the speech team that shipped the ASR solution for the PS4 console and released an ASR SDK for game studios in 10+ languages, covering both close-talk and far-field conditions. I also worked directly with game studios on voice-centric game applications. My work focused mainly on automatic speech recognition, robustness to noise and speaking styles, natural language understanding, and deep learning.


Sample Projects before 2010:

  • Bio-Inspired Auditory Attention Models
    During my PhD, I proposed novel biologically inspired auditory attention models and successfully applied them to speech and audio problems. For details, please see the papers on the saliency-driven bottom-up auditory attention model and the top-down task-dependent model.
  • Noise Adaptive Training for Robust ASR    Summer 2008
    Worked on noise adaptive training (NAT), which normalizes environmental distortion as part of acoustic model training for robust automatic speech recognition (in collaboration with Microsoft Research). Implemented NAT in the HTK toolkit and achieved state-of-the-art results on a noisy ASR task; the distortion model underlying NAT is sketched after this list.
  • SpeechLinks: A Speech-to-Speech Translation System     Fall 2006-Fall 2009
    Assisted with data collection, processing, and training acoustic models for large vocabulary continuous speech recognition (LVCSR) in an English-to-Farsi speech translation system (DARPA TRANSTAC).
  • Adaptation of an Automatic Speech Recognizer Front-end    Summer 2006
    Implemented new front-end functions for the Sphinx automatic speech recognition toolkit for SAIL lab research.
  • IMSC-Media Streaming Project: Echo Cancellation    Fall 2005
    Developed a C++ echo cancellation algorithm based on the LMS adaptive filter for IMSC's Streaming Media video-conferencing project (http://imsc.usc.edu); a minimal LMS sketch appears after this list.
  • Underdetermined Source Separation    2003-2005
    Worked on underdetermined source separation while at the Immersive Audio Lab at USC.
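
For context on the noise adaptive training item above: NAT builds on the standard vector Taylor series (VTS) model of environmental distortion, in which additive noise n and a channel h turn clean cepstral features x into the observed features y (this relation comes from the general VTS literature, not from project-specific details):

    y = x + h + C log(1 + exp(C^{-1}(n - x - h)))

where C is the DCT matrix used in cepstral analysis. Roughly, NAT-style training estimates the distortion parameters per utterance and compensates for them inside the training loop, so that the resulting acoustic model approximates one trained on undistorted speech.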
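
The LMS idea behind the echo cancellation item is compact enough to sketch. The original work was in C++; the Python sketch below uses hypothetical names (lms_echo_cancel, filter_len, mu) and a normalized step size (NLMS), which is a common stabilization assumed here rather than a detail of the project:

    import numpy as np

    def lms_echo_cancel(far_end, mic, filter_len=128, mu=0.1):
        # far_end: reference signal sent to the loudspeaker (numpy array)
        # mic:     microphone signal = near-end speech + echoed far_end
        # Returns the echo-suppressed output (the LMS error signal).
        w = np.zeros(filter_len)                 # adaptive FIR filter taps
        out = np.zeros(len(mic))
        eps = 1e-8                               # guards the normalization
        for n in range(filter_len - 1, len(mic)):
            x = far_end[n - filter_len + 1 : n + 1][::-1]  # newest sample first
            y = w @ x                            # current echo estimate
            e = mic[n] - y                       # residual after subtraction
            w += (mu / (x @ x + eps)) * e * x    # NLMS tap update
            out[n] = e
        return out

With the normalization, the update is stable for step sizes 0 < mu < 2; plain LMS drops the 1/(x·x) factor and needs a much smaller step size tied to the input power.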