Estimation of the glottal flow from speech pressure signals: Evaluation of three variants of iterative adaptive inverse filtering using computational physical modelling of voice production

Parham Mokhtari, Brad H Story, Paavo Alku, Hiroshi Ando

Research output: Contribution to journal (Article)

1 Citation (Scopus)

Abstract

The aim of this study is to comparatively review and evaluate three variants of the glottal inverse filtering algorithm based on iterative adaptive inverse filtering (IAIF): the Standard algorithm, and two recently proposed variants that use iterative optimal preemphasis (IOP) and a glottal flow model (GFM), respectively. To enable an objective evaluation, a computational physical model of voice production is used to generate time-domain signals pertaining to both the input glottal flow and the output speech pressure, for a wide range of vowels, fundamental frequencies, and voice qualities (involving co-variation of phonation type and loudness). Furthermore, for a fair comparison, the three key parameters of IAIF are selected by an exhaustive search to minimize the root-mean-square error between the estimated and reference glottal flow derivative in each analyzed frame, and performance is assessed with two time-domain and two frequency-domain error measures. A conventional evaluation is also carried out with fixed parameter values determined by cross-validation. Results indicate that IOP tends to yield the lowest errors for nonback vowels (reducing errors by 31% on average compared with Standard), especially for not too high fundamental frequencies and not too pressed voice qualities; GFM becomes competitive for normal phonations when fixed parameter values are used; and in other cases, Standard IAIF is still recommended. In addition, the results suggest that not only the overall spectral tilt (as controlled by IOP and GFM) but also the balance between the levels of different spectral regions can be important for accurate estimation of the glottal flow.
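
The per-frame parameter selection described in the abstract (an exhaustive search that minimizes the root-mean-square error against a reference signal) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the three IAIF parameters, so the estimator below is a hypothetical toy stand-in with made-up parameters (`gain`, `offset`), and the function names `rmse` and `exhaustive_search` are assumptions introduced only for illustration.

```python
import itertools
import math


def rmse(estimate, reference):
    """Root-mean-square error between two equal-length sequences."""
    if len(estimate) != len(reference):
        raise ValueError("sequences must have the same length")
    squared = sum((e - r) ** 2 for e, r in zip(estimate, reference))
    return math.sqrt(squared / len(reference))


def exhaustive_search(estimator, frame, reference, grid):
    """Evaluate every parameter combination in `grid`.

    Returns (best_params, best_error), where best_params is the
    combination giving the lowest RMSE against `reference`.
    """
    names = list(grid)
    best_params, best_error = None, math.inf
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        error = rmse(estimator(frame, **params), reference)
        if error < best_error:
            best_params, best_error = params, error
    return best_params, best_error


# Hypothetical stand-in for an inverse filter (NOT IAIF): scale and shift.
def toy_estimator(frame, gain, offset):
    return [gain * x + offset for x in frame]


frame = [0.0, 1.0, 2.0, 3.0]
reference = [1.0, 3.0, 5.0, 7.0]  # constructed as 2 * frame + 1
grid = {"gain": [1.0, 2.0, 3.0], "offset": [0.0, 1.0]}

best_params, best_error = exhaustive_search(toy_estimator, frame, reference, grid)
print(best_params, best_error)
```

In an evaluation like the one described, the search would presumably be repeated for each analyzed frame, with the reference taken from the physical model's known glottal flow derivative; the cost grows with the product of the grid sizes, which is why a small number of key parameters keeps the search tractable.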

Original language: English (US)
Pages (from-to): 24-38
Number of pages: 15
Journal: Speech Communication
Volume: 104
DOI: 10.1016/j.specom.2018.09.005
State: Published - Nov 1 2018

Fingerprint

Physical Modeling
Computational Modeling
Filtering
Evaluation
Fundamental Frequency
Time Domain
Mean square error
Exhaustive Search
Tilt
Derivatives
Physical Model
Cross-validation
Computational Model
Frequency Domain
Voice
Speech
Computational
Physical
Modeling

Keywords

  • Glottal inverse filtering
  • IAIF
  • Physical modelling
  • Voice production

ASJC Scopus subject areas

  • Software
  • Modeling and Simulation
  • Communication
  • Language and Linguistics
  • Linguistics and Language
  • Computer Vision and Pattern Recognition
  • Computer Science Applications

Cite this

@article{878c4b3f7916432c8733c7be66211baa,
title = "Estimation of the glottal flow from speech pressure signals: Evaluation of three variants of iterative adaptive inverse filtering using computational physical modelling of voice production",
abstract = "The aim of this study is to comparatively review and evaluate three variants of the glottal inverse filtering algorithm based on iterative adaptive inverse filtering (IAIF): the Standard algorithm, and two recently proposed variants that use iterative optimal preemphasis (IOP) and a glottal flow model (GFM), respectively. To enable an objective evaluation, a computational physical model of voice production is used to generate time-domain signals pertaining to both the input glottal flow and the output speech pressure, for a wide range of vowels, fundamental frequencies, and voice qualities (involving co-variation of phonation type and loudness). Furthermore, for a fair comparison, the three key parameters of IAIF are selected by an exhaustive search to minimize the root-mean-square error between the estimated and reference glottal flow derivative in each analyzed frame and performance is assessed with two time-domain and two frequency-domain error measures. A conventional evaluation is also carried out with fixed parameter values determined by cross-validation. Results indicate that IOP tends to yield the lowest errors for nonback vowels (reducing errors by 31{\%} on average compared with Standard), especially for not too high fundamental frequencies and not too pressed voice qualities; GFM becomes competitive for normal phonations when fixed parameter values are used; and in other cases, Standard IAIF is still recommended. In addition, the results suggest that not only the overall spectral tilt (as controlled by IOP and GFM) but also the balance between the levels of different spectral regions, can be important for accurate estimation of the glottal flow.",
keywords = "Glottal inverse filtering, IAIF, Physical modelling, Voice production",
author = "Mokhtari, Parham and Story, {Brad H} and Alku, Paavo and Ando, Hiroshi",
year = "2018",
month = "11",
day = "1",
doi = "10.1016/j.specom.2018.09.005",
language = "English (US)",
volume = "104",
pages = "24--38",
journal = "Speech Communication",
issn = "0167-6393",
publisher = "Elsevier",
}

TY - JOUR

T1 - Estimation of the glottal flow from speech pressure signals

T2 - Evaluation of three variants of iterative adaptive inverse filtering using computational physical modelling of voice production

AU - Mokhtari, Parham

AU - Story, Brad H

AU - Alku, Paavo

AU - Ando, Hiroshi

PY - 2018/11/1

Y1 - 2018/11/1

N2 - The aim of this study is to comparatively review and evaluate three variants of the glottal inverse filtering algorithm based on iterative adaptive inverse filtering (IAIF): the Standard algorithm, and two recently proposed variants that use iterative optimal preemphasis (IOP) and a glottal flow model (GFM), respectively. To enable an objective evaluation, a computational physical model of voice production is used to generate time-domain signals pertaining to both the input glottal flow and the output speech pressure, for a wide range of vowels, fundamental frequencies, and voice qualities (involving co-variation of phonation type and loudness). Furthermore, for a fair comparison, the three key parameters of IAIF are selected by an exhaustive search to minimize the root-mean-square error between the estimated and reference glottal flow derivative in each analyzed frame and performance is assessed with two time-domain and two frequency-domain error measures. A conventional evaluation is also carried out with fixed parameter values determined by cross-validation. Results indicate that IOP tends to yield the lowest errors for nonback vowels (reducing errors by 31% on average compared with Standard), especially for not too high fundamental frequencies and not too pressed voice qualities; GFM becomes competitive for normal phonations when fixed parameter values are used; and in other cases, Standard IAIF is still recommended. In addition, the results suggest that not only the overall spectral tilt (as controlled by IOP and GFM) but also the balance between the levels of different spectral regions, can be important for accurate estimation of the glottal flow.

KW - Glottal inverse filtering

KW - IAIF

KW - Physical modelling

KW - Voice production

UR - http://www.scopus.com/inward/record.url?scp=85053514526&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85053514526&partnerID=8YFLogxK

U2 - 10.1016/j.specom.2018.09.005

DO - 10.1016/j.specom.2018.09.005

M3 - Article

AN - SCOPUS:85053514526

VL - 104

SP - 24

EP - 38

JO - Speech Communication

JF - Speech Communication

SN - 0167-6393

ER -