August 27-31, 2007
Antwerp, Belgium
ThB.SS - Speech Recognition by Automatic Speech Attribute Transcription

Thursday, August 30, 2007, Astrid Plaza hotel, Room Scala 1

Session chair: Chin-Hui Lee, Georgia Institute of Technology, USA

It is well known that the speech signal contains a rich set of information that facilitates human auditory perception and communication, beyond a simple linguistic interpretation of word sequences. It has long been postulated that human beings determine the linguistic identity of a sound based on detected evidence that exists at various levels of the speech knowledge hierarchy. To bridge the performance gap between automatic speech recognition (ASR) systems and human speech recognition, the narrow notion of speech-to-text in ASR has to be expanded to incorporate all related human information “hidden” in speech utterances. This collection of information includes a set of fundamental speech sounds and their linguistic interpretations, a speaker profile that encompasses gender, accent and other speaker characteristics, the speaking environment that describes the interaction between speech and acoustics, the emotional state of the speaker, etc. Collectively, we call this set of information speech attributes.

Program

10:00 – 10:20 Tutorial

An Overview on Automatic Speech Attribute Transcription (ASAT), Chin-Hui Lee, Mark Clements, Sorin Dusan, Eric Fosler-Lussier, Keith Johnson, Biing-Hwang Juang and Lawrence Rabiner, Georgia Institute of Technology, Rutgers University, Ohio State University and University of California, Berkeley (USA)

Automatic Speech Attribute Transcription (ASAT), an ITR project sponsored under NSF grant IIS-04-27113, is a cross-institute effort involving Georgia Institute of Technology, The Ohio State University, University of California at Berkeley, and Rutgers University.
This project approaches speech recognition from a more linguistic perspective: unlike traditional ASR systems, humans detect acoustic and auditory cues, weigh and combine them to form theories, and then process these cognitive hypotheses until linguistically and pragmatically consistent speech understanding is achieved. A major goal of the ASAT paradigm is to develop a detection-based approach to automatic speech recognition (ASR) based on attribute detection and knowledge integration. We report on the progress of the ASAT project, present a sharable platform for community collaboration, and highlight areas of potential interdisciplinary ASR research.

10:20 – 12:00 Oral Session

10:20 - Detection-Based ASR in the Automatic Speech Attribute Transcription Project, Ilana Bromberg, Qiang Fu, Jun Hou, Jinyu Li, Chengyuan Ma, Brett Matthews, Antonio Moreno-Daniel, Jeremy Morris, Sabato Marco Siniscalchi, Yu Tsao and Yu Wang, Ohio State University, Georgia Tech, Rutgers University (USA)

10:40 - Attribute-based Mandarin Speech Recognition using Conditional Random Fields, Chi-Yueh Lin and Hsiao-Chuan Wang, National Tsing-Hua University (Taiwan)

11:00 - Comparing classifiers for pronunciation error detection, Helmer Strik, Khiet Truong, Febe de Wet, Catia Cucchiarini, Radboud University and TNO Human Factors (The Netherlands) and Stellenbosch University (South Africa)

11:20 - Using Prosodic And Spectral Characteristics For Sleepiness Detection, Jarek Krajewski and Bernd Kroeger, Work and Organizational Psychology, University Hospital Aachen and Aachen University (Germany)

11:40 - Score Fusion for Articulatory Feature Detection, Brian Ore and Raymond Slyh, General Dynamics Advanced Information Systems Dayton and Wright-Patterson Air Force Base (USA)

Contact

Session organizer: Chin-Hui LEE, Professor
School of Electrical and Computer Engineering
Georgia Institute of Technology
Atlanta, GA 30332, USA