WORK


My interests are speech and audio processing, automatic speech recognition, natural language understanding, and deep learning with its applications to various problems. I enjoy the intersection of deep neural networks and neuroscience, since each field benefits as we learn more about the brain. Solving challenging real-world problems that impact people's lives keeps me motivated and inspired.

 

Facebook AI (since Sept 2020):

At Facebook, my team focuses on speech recognition.

 

Siri at Apple (2017-2020):

At Siri, I managed the acoustic modeling team. My team's focus was researching and building machine learning models to make Siri work across all Apple devices, in varied acoustic conditions, and for all users.

 

Sony PlayStation US R&D (2010-2017):

At PlayStation, I did research and development for current and next-generation voice-centric PlayStation products. I was part of the speech team that shipped the ASR solution for the PS4 console and released an ASR SDK for game studios in 10+ languages, covering both close-talk and far-field conditions. I also had the chance to work with game studios on voice-centric game applications. My work focused mainly on automatic speech recognition, robustness to noise and speaking styles, natural language understanding, and deep learning.

 

Work before 2010:

  • PhD Thesis: Biologically Inspired Auditory Attention Models    
    My PhD work was on biologically inspired speech and audio signal processing. This involved understanding and modeling the human auditory system and its attention mechanism for use in speech analysis, speech understanding, and speech recognition. During my PhD, I proposed novel biologically inspired auditory attention models and applied them to speech and audio problems. For details, please see my papers on the saliency-driven bottom-up auditory attention model and the top-down task-dependent attention model.
  • Noise Adaptive Training for Robust ASR    Summer 2008
    Worked on noise adaptive training (NAT), which normalizes environmental distortion as part of acoustic model training for robust automatic speech recognition (in collaboration with Microsoft Research). Implemented NAT in the HTK toolkit and achieved state-of-the-art results on a noisy ASR task.
  • SpeechLinks: A Speech-to-Speech Translation System     Fall 2006-Fall 2009
    Assisted with data collection, processing, and training acoustic models for large vocabulary continuous speech recognition (LVCSR) in an English-to-Farsi speech translation system (DARPA TRANSTAC).
  • Adaptation of an Automatic Speech Recognizer Front-end     Summer 2006
    Implemented new front-end functions for the Sphinx automatic speech recognition
    toolkit for SAIL lab research.
  • IMSC-Media Streaming Project: Echo Cancellation     Fall 2005
    Developed an echo cancellation algorithm based on the LMS adaptive filter, in C++,
    for video-conferencing in IMSC's Streaming Media project. http://imsc.usc.edu
  • Underdetermined Source Separation     2003-2005
    Worked on underdetermined source separation at the Immersive Audio Lab at USC.
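
The LMS-based echo cancellation mentioned above can be illustrated with a minimal sketch: an adaptive FIR filter estimates the echo of the far-end (loudspeaker) signal in the microphone signal and subtracts it. This is a generic textbook illustration in Python/NumPy, not the original C++ implementation; the function name, filter length, and step size are arbitrary choices for the example.

```python
import numpy as np

def lms_echo_canceller(far_end, mic, num_taps=16, mu=0.01):
    """Suppress the echo of far_end present in mic using an LMS adaptive FIR filter.

    far_end: reference signal sent to the loudspeaker
    mic:     microphone signal = near-end speech + echo of far_end
    Returns the error signal, i.e. the echo-suppressed microphone signal.
    """
    w = np.zeros(num_taps)       # adaptive filter weights (echo-path estimate)
    x_buf = np.zeros(num_taps)   # sliding window of recent far-end samples
    out = np.zeros(len(mic))
    for n in range(len(mic)):
        x_buf[1:] = x_buf[:-1]   # shift in the newest far-end sample
        x_buf[0] = far_end[n]
        y = w @ x_buf            # estimated echo at time n
        e = mic[n] - y           # residual after echo subtraction
        w += 2 * mu * e * x_buf  # LMS weight update toward the true echo path
        out[n] = e
    return out

# Toy usage: the echo is the far-end signal through a short simulated echo path.
rng = np.random.default_rng(0)
far = rng.standard_normal(4000)
echo_path = np.array([0.0, 0.5, 0.3, -0.2])          # hypothetical room response
mic = np.convolve(far, echo_path)[:4000]             # pure echo, no near-end talk
residual = lms_echo_canceller(far, mic)              # residual energy decays as w adapts
```

After the filter converges, the residual energy is far below the echo energy; in a real system the residual would carry the near-end speaker's voice.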