Interspeech 2007
August 27-31, 2007

Antwerp, Belgium

A Mathematical Theory of Speech Signals – Beyond the Linear Model

Tutorial at INTERSPEECH 2007, Antwerp, Belgium

Speech signal modeling is the basis for most elementary speech processing functions, such as speech signal analysis, recognition, coding, enhancement, and synthesis. Such models are often developed from a specific point of view in the communication chain, yielding speech production models, speech transmission models, or speech perception models. In engineering applications, however, we often reach the limits of our detailed knowledge of speech production and perception, and we have to rely instead on approaches that build on a solid mathematical basis complemented by only a few truly generic properties of speech. This observation helps explain the long-term success of adaptive linear prediction in speech modeling, and it paves the way for extending our signal theoretic basis towards a more comprehensive theory of speech signals.
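Linear prediction estimates each speech sample as a weighted sum of the preceding ones. The Python sketch below illustrates its basic block (non-adaptive) form under our own assumptions: it uses the textbook autocorrelation method and the Levinson-Durbin recursion, which are standard techniques but not necessarily what the presenters use, and the helper names (`autocorr`, `levinson_durbin`, `synth_ar2`) are hypothetical. A truly adaptive predictor would update its coefficients frame by frame or sample by sample (for example with an LMS-type update); this sketch estimates them once over the whole signal.

```python
import random


def autocorr(x, max_lag):
    """Biased autocorrelation estimates r[0..max_lag] (autocorrelation method)."""
    n = len(x)
    return [sum(x[i] * x[i - k] for i in range(k, n)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the Yule-Walker equations for predictor coefficients a[1..order].

    Prediction convention: x_hat[n] = sum_k a[k] * x[n - k].
    Returns (coefficients, final prediction error energy).
    """
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i of the recursion
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        err *= 1.0 - k * k  # error energy shrinks at every stage
    return a[1:], err


def synth_ar2(n, a1, a2, seed=0):
    """Hypothetical test signal: AR(2) process driven by white Gaussian noise."""
    rng = random.Random(seed)
    x = [0.0, 0.0]
    for _ in range(n):
        x.append(a1 * x[-1] + a2 * x[-2] + rng.gauss(0.0, 1.0))
    return x[2:]


if __name__ == "__main__":
    x = synth_ar2(20000, 1.3, -0.6)
    coeffs, err = levinson_durbin(autocorr(x, 2), 2)
    print(coeffs)  # should be close to [1.3, -0.6]
```

Because the synthetic signal really is a linear AR(2) process, the estimated coefficients approach the true ones; the tutorial's point is precisely that real speech contains structure such a linear predictor cannot capture.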

The planned tutorial will introduce the speech modeling problem systematically and review signal theoretic concepts from both the deterministic (dynamical systems) perspective and the stochastic (information theory) perspective. It will show how to apply these advanced concepts to speech signals in cases where conventional linear approaches fail, and it will use them as building blocks for a full-fledged oscillator-plus-noise model of speech signals that can be adapted automatically to sustained speech sounds. Finally, it will discuss specific applications to continuous speech in which the new signal theoretic methods have led to competitive engineering solutions.

The presentation will be organized in six units of approx. 30 minutes each: 

  1. Speech modeling as a signal theoretic problem
  2. Deterministic theory – Nonlinear dynamical systems from fading-memory filters to chaotic oscillators
  3. Stochastic theory – Cyclostationarity, higher-order statistics and information theory
  4. First steps in nonlinear speech modeling – Where linear models fail and nonlinear models prevail
  5. Speech analysis and synthesis by nonlinear prediction of the speech wave – How to automatically identify oscillator-plus-noise models for sustained speech sounds
  6. Selected applications to continuous speech – Error concealment, time-scale modification, and pathological voice augmentation
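Unit 5 identifies oscillator-plus-noise models by nonlinear prediction of the speech wave. The abstract does not say which predictor the presenters use, so the Python sketch below only illustrates the idea under stated assumptions: a time-delay embedding of the signal, a simple nearest-neighbour predictor as a stand-in for a trained nonlinear predictor, and additive Gaussian noise as a crude noise component. All function names and parameters (`embed`, `nn_predict`, `synthesize`, `dim`, `delay`, `noise_std`) are hypothetical. Once trained, the predictor runs autonomously with its own output fed back as input, so it behaves as an oscillator that can generate a sustained sound of arbitrary length.

```python
import math
import random


def embed(x, dim, delay):
    """Return (state, next_sample) pairs from a time-delay embedding of x.

    The state at time n is (x[n - (dim-1)*delay], ..., x[n - delay], x[n]).
    """
    span = (dim - 1) * delay
    pairs = []
    for n in range(span, len(x) - 1):
        state = tuple(x[n - span + i * delay] for i in range(dim))
        pairs.append((state, x[n + 1]))
    return pairs


def nn_predict(pairs, state):
    """Assumed nonlinear predictor: successor of the closest training state."""
    best = min(pairs, key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], state)))
    return best[1]


def synthesize(x, n_out, dim=3, delay=1, noise_std=0.0, seed=0):
    """Run the predictor as an autonomous oscillator (output fed back as input)."""
    rng = random.Random(seed)
    pairs = embed(x, dim, delay)
    span = (dim - 1) * delay
    history = list(x[:span + 1])  # seed the loop with the first training samples
    out = []
    for _ in range(n_out):
        m = len(history) - 1
        state = tuple(history[m - span + i * delay] for i in range(dim))
        y = nn_predict(pairs, state)
        history.append(y)  # the oscillator state stays noise-free
        out.append(y + rng.gauss(0.0, noise_std))  # noise added on the output only
    return out


if __name__ == "__main__":
    period = 50
    train = [math.sin(2 * math.pi * n / period) for n in range(4 * period)]
    longer = synthesize(train, 1000, noise_std=0.01)  # 5x the training length
    print(len(longer))
```

With a periodic training signal and `noise_std=0`, the loop reproduces the waveform's periodic continuation. Since the model can generate more samples than the training excerpt contains, this also hints at how such oscillator models can support time-scale modification of sustained sounds.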

Presenters

Gernot Kubin, Erhard Rank,
Graz University of Technology

Short Bios

Over the past 25 years, Gernot Kubin has worked on speech analysis, synthesis, and coding for mobile and IP telephony; error concealment; watermarking; enhancement and augmentation; echo cancellation; resource collection; and the recognition of speech, speakers, and regional varieties, with affiliations including TU Vienna, Philips Research Eindhoven, AT&T Bell Laboratories Murray Hill, KTH Stockholm, and Global IP Sound Stockholm/San Francisco. Since 1990 he has worked in particular on nonlinear speech processing. He is currently full professor and head of the Signal Processing and Speech Communication Laboratory at Graz University of Technology, Graz, Austria; scientific director of the Christian Doppler Laboratory for Nonlinear Signal Processing and of the Competence Network for Advanced Speech Technology COAST; and a member of the boards of the Vienna Telecommunications Research Centre FTW and the Austrian Acoustics Association. He has been vice chair of COST Action 277 Nonlinear Speech Processing and is co-initiator of the new COST Action 2103 Advanced Voice Function Assessment.

Erhard Rank has worked in speech processing, synthesis, and recognition, in particular on hybrid concatenative and model-based speech synthesis, synthesis of emotional speech, nonlinear model-based synthesis of speech and musical signals, and noise reduction/speech enhancement. He was affiliated with the Austrian Research Institute for Artificial Intelligence ÖFAI and the Vienna Telecommunications Research Centre FTW, and he worked as a research and teaching assistant at TU Vienna and TU Graz. In his PhD thesis (2005) he developed an oscillator model for the automatic identification and re-synthesis of speech sounds. He is currently with the Signal Processing and Speech Communication Laboratory at Graz University of Technology, Graz, Austria, as project leader for “Robust” (signal processing for robust speech quality), a strategic project of the Competence Network for Advanced Speech Technology COAST.

