August 27-31, 2007
Antwerp, Belgium
A Mathematical Theory of Speech Signals – Beyond the Linear Model

Tutorial at INTERSPEECH 2007, Antwerp, Belgium

Speech signal modeling is the basis for the development of most elementary speech processing functions such as speech signal analysis, recognition, coding, enhancement, and synthesis. Such models are often developed from a specific point of view in the communication chain, such as speech production models, speech transmission models, or speech perception models. When it comes to engineering applications, we often reach the limits of our detailed knowledge of speech production and perception, and we have to rely on approaches that build on a solid mathematical basis, complemented by only a few truly generic speech properties. This observation helps to explain the long-term success of adaptive linear prediction in speech modeling and paves the way to extend our signal-theoretic basis towards a more comprehensive theory of speech signals.

The planned tutorial will introduce the speech modeling problem in a systematic way and review signal-theoretic concepts from both the deterministic, dynamical-systems perspective and the stochastic, information-theoretic perspective. It will show how to apply these advanced concepts to speech signals in cases where conventional linear approaches fail, and it will use these building blocks to develop a full-fledged oscillator-plus-noise model of speech signals that can be adapted automatically to sustained speech sounds. Finally, it will discuss specific applications to continuous speech where the new signal-theoretic methods have led to competitive engineering solutions.

The presentation will be organized in six units of approx. 30 minutes each.
Presenters

Gernot Kubin, Erhard Rank

Short Bios

Gernot Kubin has worked on speech analysis, synthesis, coding for mobile and IP telephony, error concealment, watermarking, enhancement and augmentation, echo cancellation, resource collection, as well as the recognition of speech, speakers, and regional varieties over the past 25 years (including affiliations with TU Vienna, Philips Research Eindhoven, AT&T Bell Laboratories Murray Hill, KTH Stockholm, Global IP Sound Stockholm/San Francisco) and, in particular, on nonlinear speech processing since 1990. He is currently full professor and head of the Signal Processing and Speech Communication Laboratory at Graz University of Technology, Graz, Austria, scientific director of the Christian Doppler Laboratory for Nonlinear Signal Processing and of the Competence Network for Advanced Speech Technology COAST, and a member of the board of the Vienna Telecommunications Research Centre FTW and the Austrian Acoustics Association. He has been vice chair of the COST Action 277 Nonlinear Speech Processing and is co-initiator of the new COST Action 2103 Advanced Voice Function Assessment.

Erhard Rank has worked in speech processing, synthesis, and recognition, in particular in hybrid concatenative and model-based speech synthesis, synthesis of emotional speech, nonlinear model-based synthesis of speech and musical signals, and noise reduction/speech enhancement. He was affiliated with the Austrian Research Institute for Artificial Intelligence ÖFAI and the Vienna Telecommunications Research Centre FTW, and worked as a research and teaching assistant at TU Vienna and TU Graz. In his PhD thesis (2005) he developed an oscillator model for the automatic identification and re-synthesis of speech sounds. He is currently with the Signal Processing and Speech Communication Laboratory at Graz University of Technology, Graz, Austria, as project leader for “Robust” (signal processing for robust speech quality), a strategic project of the Competence Network for Advanced Speech Technology COAST.