OPENGLOT – An open environment for the evaluation of glottal inverse filtering

Paavo Alku, Tiina Murtola, Jarmo Malinen, Juha Kuortti, Brad H Story, Manu Airaksinen, Mika Salmi, Erkki Vilkman, Ahmed Geneid

Research output: Contribution to journal, Article

Abstract

Glottal inverse filtering (GIF) refers to techniques for estimating the source of voiced speech, the glottal flow, from speech signals. When a new GIF algorithm is proposed, its accuracy needs to be evaluated. However, the evaluation of GIF is problematic because the ground truth, the real glottal volume velocity signal generated by the vocal folds, cannot be recorded non-invasively from natural speech. Most previous GIF studies have circumvented this absence of ground truth by using simple linear source-filter synthesis with known artificial glottal flow models and all-pole vocal tract filters. In a few previous studies, physical modeling of speech production has also been used to synthesize test data for GIF evaluation. The evaluation strategies of previous GIF studies are, however, scattered across individual investigations, and there is currently no coherent, common platform for GIF evaluation. To address this shortcoming, the current study introduces a new environment for GIF evaluation, called OPENGLOT. The key ideas of OPENGLOT are twofold: the environment is versatile (i.e., it provides different types of test signals for GIF evaluation) and open (i.e., anyone can use the system to evaluate a new GIF method and compare it objectively with previously developed benchmark techniques). OPENGLOT consists of four main parts, Repositories I–IV, which contain data and sound synthesis software. Repository I contains a large set of synthetic glottal flow waveforms and speech signals generated by using the Liljencrants–Fant (LF) waveform as an artificial excitation and a digital all-pole filter to model the vocal tract. Repository II contains glottal flow and speech pressure signals generated using physical modeling of human speech production.
Repository III contains pairs of glottal excitation and speech pressure signals generated by exciting 3D-printed plastic vocal tract replicas with LF excitations via a loudspeaker. Finally, Repository IV contains multichannel recordings (speech pressure signal, electroglottogram, high-speed video of the vocal folds) of natural speech production. After presenting these four core parts of OPENGLOT, the article demonstrates the platform with a typical use case.
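The Repository I idea described above (a known glottal excitation passed through an all-pole vocal tract filter) is what makes objective GIF evaluation possible: because the excitation is known, any estimate can be scored against it. The Python sketch below illustrates that evaluation loop under stated assumptions. It is not OPENGLOT code and uses none of its data: a Rosenberg-type pulse stands in for the LF waveform, the formant values are illustrative, the inverse filter is plain autocorrelation LPC rather than any benchmark method from the paper, and all function names and the normalized RMS error measure are hypothetical choices.

```python
# Hypothetical illustration of GIF evaluation against a known ground truth;
# not OPENGLOT code or data. A Rosenberg-type pulse replaces the LF model.
import numpy as np


def rosenberg_pulse(period: int, n_open: int, n_close: int) -> np.ndarray:
    """One period of a Rosenberg-type glottal flow pulse."""
    g = np.zeros(period)
    t_open = np.arange(n_open)
    g[:n_open] = 0.5 * (1.0 - np.cos(np.pi * t_open / n_open))  # opening phase
    t_close = np.arange(n_close)
    g[n_open:n_open + n_close] = np.cos(0.5 * np.pi * t_close / n_close)  # closing phase
    return g  # remaining samples: closed phase (zero flow)


def vocal_tract_poly(formants, bandwidths, fs: float) -> np.ndarray:
    """Denominator A(z) of an all-pole filter, one conjugate pole pair per formant."""
    poles = []
    for f, b in zip(formants, bandwidths):
        r = np.exp(-np.pi * b / fs)       # pole radius from bandwidth
        theta = 2.0 * np.pi * f / fs      # pole angle from formant frequency
        poles += [r * np.exp(1j * theta), r * np.exp(-1j * theta)]
    return np.real(np.poly(poles))        # monic: [1, a1, ..., ap]


def all_pole_filter(a: np.ndarray, x: np.ndarray) -> np.ndarray:
    """y[n] = x[n] - sum_k a[k] y[n-k], with zero initial conditions."""
    p = len(a) - 1
    y = np.zeros(len(x))
    for n in range(len(x)):
        acc = x[n]
        for k in range(1, min(p, n) + 1):
            acc -= a[k] * y[n - k]
        y[n] = acc
    return y


def fir_filter(a: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Inverse filter A(z): e[n] = sum_k a[k] x[n-k], zero initial conditions."""
    return np.convolve(x, a)[:len(x)]


def lpc(x: np.ndarray, order: int) -> np.ndarray:
    """Autocorrelation-method LPC on a Hamming-windowed frame; returns [1, a1, ..., ap]."""
    xw = x * np.hamming(len(x))
    r = np.array([np.dot(xw[:len(xw) - k], xw[k:]) for k in range(order + 1)])
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, -r[1:order + 1])
    return np.concatenate(([1.0], a))


def nrmse(est: np.ndarray, ref: np.ndarray) -> float:
    """Normalized RMS error between an estimated and a reference waveform."""
    return float(np.sqrt(np.sum((est - ref) ** 2) / np.sum(ref ** 2)))


fs = 8000                                          # sampling rate in Hz (illustrative)
flow = np.tile(rosenberg_pulse(80, 32, 16), 30)    # f0 = 100 Hz, 30 periods
dflow = np.diff(flow, prepend=0.0)                 # flow derivative used as excitation
a_true = vocal_tract_poly([730, 1090, 2440], [60, 100, 120], fs)
speech = all_pole_filter(a_true, dflow)            # synthetic signal with known source

seg = slice(400, len(speech))                      # analyze the steady-state part
oracle = fir_filter(a_true, speech)                # inverse filter with the true tract
a_est = lpc(speech[seg], order=10)                 # tract estimated from the signal
estimate = fir_filter(a_est, speech)

oracle_err = nrmse(oracle[seg], dflow[seg])
lpc_err = nrmse(estimate[seg], dflow[seg])
print(f"oracle NRMSE: {oracle_err:.2e}, plain-LPC NRMSE: {lpc_err:.3f}")
```

The oracle inverse filter recovers the excitation up to floating-point rounding, while the LPC-based estimate typically shows a clearly larger error, partly because a plain LPC model also absorbs the spectral tilt of the glottal source. Measuring such biases against a known ground truth is the kind of comparison an evaluation environment like OPENGLOT is meant to standardize.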

Original language: English (US)
Pages (from-to): 38-47
Number of pages: 10
Journal: Speech Communication
Volume: 107
DOI: https://doi.org/10.1016/j.specom.2019.01.005
State: Published, Feb 1 2019

Keywords

  • Evaluation tool
  • Glottal flow
  • Glottal inverse filtering
  • Speech production

ASJC Scopus subject areas

  • Software
  • Modeling and Simulation
  • Communication
  • Language and Linguistics
  • Linguistics and Language
  • Computer Vision and Pattern Recognition
  • Computer Science Applications

Cite this

OPENGLOT – An open environment for the evaluation of glottal inverse filtering. / Alku, Paavo; Murtola, Tiina; Malinen, Jarmo; Kuortti, Juha; Story, Brad H; Airaksinen, Manu; Salmi, Mika; Vilkman, Erkki; Geneid, Ahmed.

In: Speech Communication, Vol. 107, 01.02.2019, p. 38-47.

Alku, P, Murtola, T, Malinen, J, Kuortti, J, Story, BH, Airaksinen, M, Salmi, M, Vilkman, E & Geneid, A 2019, 'OPENGLOT – An open environment for the evaluation of glottal inverse filtering', Speech Communication, vol. 107, pp. 38-47. https://doi.org/10.1016/j.specom.2019.01.005
Alku, Paavo ; Murtola, Tiina ; Malinen, Jarmo ; Kuortti, Juha ; Story, Brad H ; Airaksinen, Manu ; Salmi, Mika ; Vilkman, Erkki ; Geneid, Ahmed. / OPENGLOT – An open environment for the evaluation of glottal inverse filtering. In: Speech Communication. 2019 ; Vol. 107. pp. 38-47.
@article{2bcb0813118f46439d18dfdee78cd214,
title = "OPENGLOT – An open environment for the evaluation of glottal inverse filtering",
keywords = "Evaluation tool, Glottal flow, Glottal inverse filtering, Speech production",
author = "Paavo Alku and Tiina Murtola and Jarmo Malinen and Juha Kuortti and Story, {Brad H} and Manu Airaksinen and Mika Salmi and Erkki Vilkman and Ahmed Geneid",
year = "2019",
month = "2",
day = "1",
doi = "10.1016/j.specom.2019.01.005",
language = "English (US)",
volume = "107",
pages = "38--47",
journal = "Speech Communication",
issn = "0167-6393",
publisher = "Elsevier",

}

TY - JOUR

T1 - OPENGLOT – An open environment for the evaluation of glottal inverse filtering

AU - Alku, Paavo

AU - Murtola, Tiina

AU - Malinen, Jarmo

AU - Kuortti, Juha

AU - Story, Brad H

AU - Airaksinen, Manu

AU - Salmi, Mika

AU - Vilkman, Erkki

AU - Geneid, Ahmed

PY - 2019/2/1

Y1 - 2019/2/1

KW - Evaluation tool

KW - Glottal flow

KW - Glottal inverse filtering

KW - Speech production

UR - http://www.scopus.com/inward/record.url?scp=85061065066&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85061065066&partnerID=8YFLogxK

U2 - 10.1016/j.specom.2019.01.005

DO - 10.1016/j.specom.2019.01.005

M3 - Article

AN - SCOPUS:85061065066

VL - 107

SP - 38

EP - 47

JO - Speech Communication

JF - Speech Communication

SN - 0167-6393

ER -