August 27-31, 2007

Antwerp, Belgium

The Modulation Spectrum and Its Application to Speech Science and Technology

Tutorial at INTERSPEECH 2007, Antwerp, Belgium

This tutorial describes the biological, mathematical and engineering bases of the modulation spectrum, which encapsulates many perceptually relevant properties of speech on time scales between 50 and 1000 ms. The modulation spectrum reflects slow energy fluctuations associated with the opening and closing of the lips, the jaw and other speech articulators. These fluctuations are distributed unevenly across the acoustic frequency spectrum. The constellation of modulation patterns across frequency and time is referred to as the Complex Modulation Spectrum (CMS), in which both magnitude and phase are important. CMS-related features are beginning to be used in a variety of applications, including automatic speech recognition, speech synthesis, audio coding and auditory prostheses, and are likely to play an important role in audio/speech technology over the coming decade.
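As a rough illustration of the idea (not the presenters' method), the sketch below estimates a magnitude-only modulation spectrum for a single band: it takes the Hilbert envelope of a signal and then the Fourier transform of that envelope. The test signal, sampling rate and function names are assumptions chosen for the example; the 4 Hz modulation rate corresponds to a 250 ms period, inside the 50 to 1000 ms range mentioned above. Phase, which matters for the CMS, is ignored here.

```python
import numpy as np

def hilbert_envelope(x):
    """Magnitude of the analytic signal, built with an FFT (NumPy only)."""
    n = len(x)
    spectrum = np.fft.fft(x)
    # Keep DC (and Nyquist for even n), double positive frequencies,
    # zero negative frequencies.
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(spectrum * weights))

def modulation_spectrum(x, fs):
    """Magnitude spectrum of the mean-removed Hilbert envelope."""
    env = hilbert_envelope(x)
    env = env - env.mean()
    mag = np.abs(np.fft.rfft(env)) / len(env)
    freqs = np.fft.rfftfreq(len(env), d=1.0 / fs)
    return freqs, mag

# Hypothetical test signal: a 1 kHz tone whose amplitude varies at 4 Hz.
fs = 8000                     # assumed sampling rate in Hz
n = 2 * fs                    # two seconds of samples
t = np.arange(n) / fs
carrier = np.sin(2 * np.pi * 1000 * t)
signal = (1.0 + 0.8 * np.cos(2 * np.pi * 4 * t)) * carrier

freqs, mag = modulation_spectrum(signal, fs)
peak_hz = freqs[np.argmax(mag)]
print(peak_hz)
```

For this synthetic input the largest modulation component should sit at the 4 Hz amplitude-modulation rate. Real speech would first be split into acoustic frequency bands, with the envelope analysed separately in each band.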

The tutorial will be structured as follows:

  1. Overview and synopsis of the tutorial
  2. Biological and Historical Foundations of the Modulation Spectrum (Steven Greenberg)
    1. Historical antecedents (e.g., VOCODER and room acoustics)
    2. Modulation processing in the auditory and other sensory (and motor) systems
    3. Information packaging in speech and other communication systems
    4. Relationship between the modulation spectrum and phonetic/linguistic units
    5. Scale-invariant representations of speech and other signals
  3. Theoretical and Mathematical Foundations of the Modulation Spectrum (Les Atlas)
    1. Original magnitude envelope formulations in the context of a speech-signal model
    2. Hilbert envelope approaches
    3. Formulation as a demodulation problem, with comparisons between incoherent and coherent approaches
    4. Coherent approaches which reduce or eliminate distortion in modulation spectral filtering
    5. Remaining open theoretical questions
  4. Technical Applications and Demonstrations (Les Atlas)
    1. Audio coding
    2. Speech synthesis and modification
    3. Single-channel sound and talker separation
  5. Speech Technology Applications (Hynek Hermansky)
    1. Filtering of temporal trajectories of spectral energy contours for automatic speech recognition (e.g., RASTA)
    2. Speech recognition features employing the modulation spectrum (e.g., TRAP-TANDEM, MRASTA, Gabor wavelets, phoneme-specific receptive fields, LP-TRAPS and PLP2)
    3. Speech quality measures based on the modulation spectrum (e.g., speech transmission index, spectro-temporal modulation index)
    4. Speech and audio coding
    5. Speech/non-speech discrimination

Presenters

Les Atlas,
Department of Electrical Engineering,
University of Washington, Seattle

Steven Greenberg,
Silicon Speech; University of California, Berkeley; Technical University of Denmark

Hynek Hermansky,
IDIAP, Martigny, Switzerland

Short Bios

Les Atlas received his M.S. and Ph.D. degrees in Electrical Engineering from Stanford University in 1979 and 1984, respectively. He joined the University of Washington in 1984, where he is currently a Professor of Electrical Engineering. His research is in digital signal processing, with specializations in acoustic analysis, time-frequency representations, and signal recognition and coding. Professor Atlas received a National Science Foundation Presidential Young Investigator Award and a 2004 Fulbright Senior Research Scholar Award. He was General Chair of the 1998 IEEE International Conference on Acoustics, Speech, and Signal Processing, Chair of the IEEE Signal Processing Society Technical Committee on Theory and Methods, and a member-at-large of the Signal Processing Society’s Board of Governors. He is a Fellow of the IEEE “for contributions to time-varying spectral analysis and acoustical signal processing.” Additional information concerning Dr. Atlas can be found at: http://www.ee.washington.edu/faculty/atlas/.

Steven Greenberg received an A.B. in Linguistics from the University of Pennsylvania and a Ph.D. in Linguistics (with a strong minor in Neuroscience) from the University of California, Los Angeles. He has been a scientist in the Department of Neurophysiology, University of Wisconsin, and at the International Computer Science Institute (Berkeley, CA). He is a Visiting Professor at the Centre for Applied Hearing Research, Technical University of Denmark, and is also affiliated with the Center for New Music and Audio Technology, University of California, Berkeley. Dr. Greenberg’s research focuses on the interface between the science and technology of spoken language. His studies of human speech perception have examined the role of the modulation spectrum in understanding spoken language. He has also published many papers pertinent to automatic speech recognition and hearing technology. Dr. Greenberg’s bibliography can be found at: http://www.silicon-speech.com/siliconspeechpub.html. A more detailed description of his research is at: http://www.silicon-speech.com/siliconspeechres.html.

Hynek Hermansky is a senior researcher and director of research at IDIAP in Martigny, Switzerland, and serves as Professor at the Swiss Federal Institute of Technology in Lausanne, Switzerland. He has worked in speech processing for over 30 years, previously at the University of Tokyo, Panasonic Technologies in Santa Barbara, California, and U S WEST Advanced Technologies, and as Professor and Director of the Center for Information Processing at OHSU in Portland, Oregon. He is a Fellow of the IEEE for “Invention and development of perceptually-based speech processing methods”, a Member of the Editorial Board of Speech Communication and of Phonetica, holds 6 US patents and has authored or co-authored over 140 papers in reviewed journals and conference proceedings. He holds a Dr.Eng. degree from the University of Tokyo and a Dipl. Ing. degree from Brno University of Technology, Czech Republic. Additional information about Dr. Hermansky can be found at: http://people.idiap.ch/hynek.

