August 27-31, 2007

Antwerp, Belgium

Processing Morphologically-Rich Languages

Tutorial at INTERSPEECH 2007, Antwerp, Belgium

Morphologically-rich languages such as Arabic, Turkish, Finnish, and Korean present significant challenges for speech processing, natural language processing, and machine translation. These languages are characterized by highly productive morphological processes (inflection, agglutination, compounding) that can produce a very large number of word forms for a given root. Modelling each form as a separate word leads to a number of problems for speech and language processing applications, including:

  1. increase in dictionary size: word lists and dictionaries, such as the pronunciation dictionaries in speech recognizers, grow explosively, which places heavy demands on computational resources such as the memory and processing time of speech decoders
  2. poor language model (LM) probability estimation: many combinations of word forms will have been observed rarely or not at all in the language model training data, resulting in unreliable probability estimates
  3. higher out-of-vocabulary (OOV) rate: when moving from a known (training) set to an unknown (test) set, many novel word forms will occur that need to be accounted for
  4. inflection gap for machine translation: different forms of the same underlying baseform are often treated as unrelated items, with negative effects on word alignment and translation accuracy.
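
As a rough illustration of problems 1 and 3, the sketch below computes the vocabulary size and the OOV rate on a tiny toy corpus in which every inflected form counts as a separate word. The corpus, the function name `oov_rate`, and the Turkish-like word forms are assumptions chosen for illustration only; they are not taken from the tutorial material.

```python
def oov_rate(train_tokens, test_tokens):
    """Fraction of test tokens that never occur in the training tokens."""
    if not test_tokens:
        raise ValueError("test_tokens must not be empty")
    vocab = set(train_tokens)
    unseen = [t for t in test_tokens if t not in vocab]
    return len(unseen) / len(test_tokens)


# Toy data (assumed): inflected forms of "ev" (house) and "el" (hand).
train = "ev evler evde el eller".split()
test = "evlerde elde ev ellerde".split()

vocab_size = len(set(train))   # every inflected form is its own entry
rate = oov_rate(train, test)

print(vocab_size)  # 5
print(rate)        # 0.75
```

Although the test set uses only the two stems seen in training, three of its four tokens are new combinations of stem and suffixes, so a word-level vocabulary misses them.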

Large-scale speech and language processing systems require more advanced modelling techniques to address these problems:

  • automatic decomposition of complex word forms into smaller units
  • methods for optimizing the selection of units at different levels of processing
  • diacritization/vowelization (for Arabic)
  • pronunciation modelling for morphologically-rich languages
  • morphologically-rich languages in speech synthesis
  • novel probability estimation techniques that avoid data sparseness problems
  • creating data resources and annotation tools for morphologically-rich languages
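
The first item in this list, decomposition into smaller units, can be sketched with a very simple greedy suffix stripper. This is a minimal illustration under stated assumptions, not a method presented in the tutorial: the suffix inventory `SUFFIXES`, the function `segment`, and the toy word forms are all hypothetical, and greedy stripping of this kind would over-segment many real words.

```python
# Hypothetical toy suffix inventory (Turkish-like plural, locative, possessive).
SUFFIXES = ("ler", "de", "im")


def segment(word, suffixes=SUFFIXES):
    """Greedily strip known suffixes from the right; return stem plus '+suffix' units."""
    units = []
    stripped = True
    while stripped:
        stripped = False
        for s in suffixes:
            # Require a non-empty remainder so the stem is never deleted.
            if word.endswith(s) and len(word) > len(s):
                units.insert(0, "+" + s)
                word = word[:-len(s)]
                stripped = True
                break
    return [word] + units


# Toy data (assumed): training and test word forms.
train = "ev evler evde el eller".split()
test = "evlerde elde ev ellerde".split()

train_words = set(train)
train_units = {u for w in train for u in segment(w)}

unseen_words = [w for w in test if w not in train_words]
unseen_units = [u for w in test for u in segment(w) if u not in train_units]

print(segment("evlerde"))  # ['ev', '+ler', '+de']
print(unseen_words)        # ['evlerde', 'elde', 'ellerde']
print(unseen_units)        # []
```

In this toy setting three test words are unseen at the word level, yet every unit they decompose into was observed in training. Real systems need linguistically informed or data-driven segmentation rather than a fixed suffix list.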

Presenters

Katrin Kirchhoff
EE Department, University of Washington

Ruhi Sarikaya
IBM T.J. Watson Research Center

Short Bios

Dr. Kirchhoff is a Research Assistant Professor in the EE Department at the University of Washington. Her research interests are in automatic speech recognition and natural language processing, with an emphasis on multilingual applications. She has published over 50 refereed conference papers, journal papers, and book chapters in these areas and is co-editor of a recent book on “Multilingual Speech Processing”. In 2002 she led a team effort on developing Novel Models for Arabic Speech Processing at the Johns Hopkins Summer Research Workshop. Dr. Kirchhoff has served on numerous conference and workshop committees and is a member of the Editorial Board of the Speech Communication journal.

Dr. Ruhi Sarikaya is a research staff member in the Human Language Technologies Group at IBM T.J. Watson Research Center. He received the B.S. degree from Bilkent University, Turkey, in 1995, the M.S. degree from Clemson University, SC, in 1997, and the Ph.D. degree from Duke University, Durham, NC, in 2001, all in electrical and computer engineering. He has published over 40 technical papers in refereed journals and conference proceedings and holds four patents in the area of speech and natural language processing. At IBM he has received several prestigious awards for his work, including an Outstanding Technical Achievement Award and a Research Division Award. Prior to joining IBM in 2001, he was a researcher at the Center for Spoken Language Research (CSLR) at the University of Colorado at Boulder for two years. He also spent the summer of 1999 at the Panasonic Speech Technology Laboratory, Santa Barbara, CA. He served on the organizing committee of ASRU’05. His past and present research interests span speech recognition, natural language processing, machine learning, speech enhancement, speech-to-speech translation, speaker identification/verification, and digital signal processing. Dr. Sarikaya is a member of IEEE, ACL, and ISCA.