The interplay of linguistic structure and breathing in German spontaneous speech

  • INTERSPEECH 2013
  • Amélie Rochet-Capellan : GIPSA-lab, UMR 5216 CNRS/Université de Grenoble –France
  • Susanne Fuchs : Centre for General Linguistics, Berlin – Germany

Abstract

This paper investigates the relation between the linguistic structure of the breath group and breathing kinematics in spontaneous speech. 26 female speakers of German were recorded by means of an Inductance Plethysmograph. The breath group was defined as the interval of speech produced on a single exhalation. For each group several linguistic parameters (number and type of clauses, number of syllables, hesitations) were measured and the associated inhalation was characterized. The average duration of the breath group was ~3.5 s. Most of the breath groups consisted of 1-3 clauses; ~53% started with a matrix clause; ~24% with an embedded clause and ~23% with an incomplete clause (continuation, repetition, hesitation). The inhalation depth and duration varied as a function of the first clause type and with respect to the breath group length, showing some interplay between speech-planning and breathing control. Vocalized hesitations were speaker-specific and came with deeper inhalation. These results are informative for a better understanding of the interplay of speech-planning and breathing control in spontaneous speech. The findings are also relevant for applications in speech therapies and technologies.

Notes

Index Terms

spontaneous speech, breathing kinematics, breath group, inhalation pauses, syntactic clause, hesitation

Translator's notes

  • Plethysmograph: apparently an instrument for measuring respiration.
  • "Clause" here is taken to mean a syntactic clause.

1. Introduction

On a time-scale of several seconds, speech production is a sequence of short inhalation pauses followed by long exhalations with phonation. The interval of speech produced on a single exhalation is commonly defined as the breath group. It relies on linguistic, communicative and physiological constraints. The breath group is also an important unit for prosody and speech perception [1]. The present paper analyses the breath group in German spontaneous speech with respect to two main questions:

  1. What is the linguistic structure of the breath group?
  2. Is this structure anticipated during inhalation?

The relation of the inhalation depth and duration to the linguistic structure of the upcoming breath group reflects the interplay of speech planning with ventilation [2-8]. These relations have been investigated in both read and spontaneous speech. These studies involved different speech tasks (e.g. sentence and text reading, spontaneous speech with different cognitive loads) and estimated breathing parameters with different methods (detection of breath noises, e.g. [2,9]; measurement of the air flow from the mouth and nose, e.g. [10]; monitoring of the kinematics of the chest wall, e.g. [3-8, 11-16]; see also [17] for a comparison between acoustic and kinematic methods).

Several studies show an anticipation of the breath group length during the preceding inhalation for sentence and text reading. The inhalation depth and duration increase with the sentence length [6, 11-14, 16]. However, inhalations in sentence reading are not clearly related to the syntactic complexity (number of clauses) of the upcoming breath group [12-13]. In text reading, almost 100% of the inhalation pauses occur at syntactic boundaries, indicated by punctuation marks or conjunctions (e.g. and). These results show that the breath groups are syntactically structured [2-6, 8-10, 15]. In text reading, the inhalation depth and duration also differ with respect to syntactic marks (e.g. paragraph > period > comma) [5-6].

In spontaneous speech, the breathing pauses are not only governed by syntax but also by the cognitive processing required to generate the linguistic content [2,4,7-8,15]. This process introduces disfluencies in the speech flow. In spontaneous speech about 80% of the breathing pauses occur at syntactic constituents; the average amplitude and duration of inhalation are similar to text reading and reflect the length of the upcoming breath group. The average duration of breath groups is also longer than in text reading [see: 4, 7, 15, 18-19]. The ranges of variability of these parameters are larger in spontaneous speech than in text reading. Spontaneous speech is also characterized by the production of vocalized hesitations (uh, um) that have been assumed to have different functions and have been related to breathing [20, 21].

This paper evaluates the relationship between the kinematics of breathing and the linguistic structure of the breath group in German spontaneous speech. As in previous studies we consider the syntactic structure (number of clauses) and the number of syllables in the breath group. We also analyzed the type of clauses (matrix, embedded clause) and disfluencies (hesitations – uh, um, repetition, repairs...) in the breath group. The type of the first clause (matrix clause or embedded clause) in the breath group is an indicator of the location of inhalation relative to the linguistic structure. The association of breathing to disfluencies, and especially vocalized hesitations, is informative about the cognitive process involved in speech planning.

2. Experiment

2.1. Subjects

The participants were 26 female native speakers of German (age: 25 years (mean) ±3.1 (standard deviation), body mass index 21.5 ±2.1). None of the participants had a known history of speech, language or hearing disorders.

2.2. Experimental settings and procedure

Participants stood in front of a directional microphone and two loudspeakers (Figure 1.A). The spontaneous speech task was part of a larger experimental protocol. After a short recording of breathing at rest and a short reading task, participants were instructed to listen attentively to the audio recordings of ten brief texts (151±22.1 syllables), read by a male or a female native speaker of German. The tracks were played back through the loudspeakers. After listening to each text, participants briefly summarized the story. In order to limit the movements that could interfere with the monitoring of breathing kinematics, participants were instructed to keep their hands along their trunk. Vital capacity (VC) maneuvers were run at the end of the procedure to estimate the displacement of the rib cage and the abdomen induced by VC. To do so, subjects exhaled as much air as they could and then inhaled as much air as they could.

../../_images/fig14.png

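Figure 1: (A) Experimental setup. (B) Example of breathing movement with the inhalation (I) and exhalation (E) phases. (C) Labeling of the breath groups, with syllables and number of clauses.

  • H marks a hesitation. See the text for details.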

2.3. Data acquisition, processing and labeling

The rib cage and the abdominal kinematics were recorded by means of an Inductance Plethysmograph (Respitrace™). One band was positioned at the level of the axilla (rib cage) and the other band at the level of the umbilicus (abdomen, see Figure 1.A). The acoustic and the breathing signals were recorded synchronously by means of a six-channel voltage data acquisition system. The gains were the same for the thorax and the abdomen and for all the participants. All signals were sampled at 11030 Hz.

After the recording, the breathing data were sub-sampled at 200 Hz and band-pass filtered [1-40 Hz]. The contribution of the rib cage and the abdomen to speech breathing varied according to the speaker. For some speakers, breathing cycles were not clear for the abdomen. For these reasons, we analyzed the sum of the rib cage and the abdomen displacements. As Respitrace™ was not calibrated, our measures could over- or underestimate the contribution of the thorax relative to the contribution of the abdomen to lung volume and should not be considered as a direct estimation of lung volume [22-23]. To allow comparison between speakers and conditions, displacements were expressed for each subject in %MD (Maximal Displacement). MD was the displacement corresponding to the excursion of the rib cage and the abdomen during the VC maneuver. The onset and offset of inhalations were automatically detected on the breathing signal using the velocity profiles and zero crossings. The detection was then visualized and corrected when required. The breathing cycle was divided into an inhalation and an exhalation phase (Figure 1.B).
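The normalization to %MD and the velocity-based detection described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the first-difference velocity estimate and the `min_amp` threshold are assumptions made for illustration, and the input is taken to be the already filtered sum of the rib cage and abdomen signals.

```python
def to_percent_md(signal, vc_signal):
    """Express displacements in %MD.

    MD is taken as the peak-to-trough excursion of the summed signal
    during the vital capacity (VC) maneuver.
    """
    md = max(vc_signal) - min(vc_signal)
    if md <= 0:
        raise ValueError("VC maneuver shows no excursion")
    return [100.0 * x / md for x in signal]


def detect_inhalations(signal, fs=200.0, min_amp=1.0):
    """Return (onset_index, offset_index) pairs of inhalation phases.

    Velocity is approximated by the first difference. An inhalation
    starts at the sample where velocity becomes positive (trough) and
    ends at the sample where it stops being positive (peak).
    min_amp (hypothetical, in signal units) discards tiny ripples.
    """
    velocity = [(b - a) * fs for a, b in zip(signal, signal[1:])]
    inhalations = []
    onset = None
    for i, v in enumerate(velocity):
        if v > 0 and onset is None:
            onset = i  # upward zero crossing of velocity
        elif v <= 0 and onset is not None:
            if signal[i] - signal[onset] >= min_amp:
                inhalations.append((onset, i))  # downward zero crossing
            onset = None
    return inhalations


if __name__ == "__main__":
    vc = [0.0, 20.0]                      # toy VC excursion
    raw = [0, 1, 2, 1.6, 1.2, 0.8, 1.8, 2.8, 2.4]
    pct = to_percent_md(raw, vc)
    print(detect_inhalations(pct))
```

As in the study, automatic detections of this kind would still need to be inspected and corrected by hand.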

Speech productions were labeled in Praat [24] by detecting the onset and offset of vocalizations and by transcribing the spoken text for each breath group. The vocalized hesitations (e.g. uh, um) and the non-breathing pauses were distinguished (see Figure 1.C). On the basis of this transcription, the number of syllables was derived automatically from the output of the BALLOON toolkit [25]. The syntactic labeling of the breath groups was done by a trained phonetician. The clauses were marked by distinguishing between matrix and embedded clauses. German is a language where the position of the auxiliary verb (verb second or verb final) defines the type of clause. Mainly, the clauses with a verb in a second position were considered as matrix (also called main) clauses and those with a verb final position were considered as embedded clauses. For instance, m-e1-e2 characterized a breath group that included one matrix clause followed by two embedded clauses, with the first one (e1) referring to the matrix clause (m), and the second one (e2) referring to the first embedded clause (e1), see Figure 1.C. The third category, uncompleted clauses (u), included words or groups of words corresponding to hesitations, repetitions or repairs.

2.4. Data selection

Our data set included 1467 breath groups. We discarded 45 groups that were perturbed by laughter, coughing or body movements. The number of clauses ranged from 1 to 7 (2.11 (mean) ±1.13 (standard error)). The dataset was restricted to groups with 1-3 clauses. They represented 88% of the observations and were produced by all subjects. Only groups starting with m, e1, e2 or u were considered in this study (99% of the groups with 1-3 clauses).

2.5. Measures and analyses

We estimated: (1) the duration of the breath group (dur_g), as the time interval from speech onset to speech offset; (2) the amplitude (amp_I) and duration (dur_I) of inhalation; (3) the relationship between amp_I and the amplitude of exhalation (amp_IE, amp_I divided by the amplitude of exhalation). This last measure evaluates whether speakers exhale more air (amp_IE < 1), less air (amp_IE > 1) or the same amount of air (amp_IE = 1) than they have just inhaled to produce the breath group. This measure could not be taken as an indicator of the reserve volume consumption, as displacement values were not expressed relative to a zero volume.
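A small sketch of how these four measures could be computed for one breathing cycle. The function and argument names are hypothetical; times are assumed to be in seconds and displacement levels in %MD.

```python
def breath_group_measures(speech_onset, speech_offset,
                          inh_onset, inh_offset,
                          level_at_inh_onset, level_at_inh_offset,
                          level_at_exh_end):
    """Compute dur_g, dur_I, amp_I and amp_IE for one breath group.

    Levels are displacement values (%MD) at the start of inhalation,
    at the end of inhalation (peak) and at the end of the following
    exhalation.
    """
    dur_g = speech_offset - speech_onset              # breath group duration
    dur_i = inh_offset - inh_onset                    # inhalation duration
    amp_i = level_at_inh_offset - level_at_inh_onset  # inhalation amplitude
    amp_e = level_at_inh_offset - level_at_exh_end    # exhalation amplitude
    if amp_e <= 0:
        raise ValueError("exhalation amplitude must be positive")
    return {
        "dur_g": dur_g,
        "dur_I": dur_i,
        "amp_I": amp_i,
        "amp_IE": amp_i / amp_e,  # < 1: more exhaled than just inhaled
    }


if __name__ == "__main__":
    m = breath_group_measures(1.0, 4.5, 0.2, 0.9, 10.0, 28.0, 8.0)
    print(m)
```

In this toy cycle the speaker inhales 18 %MD and exhales 20 %MD, so amp_IE is below 1.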

We considered four main factors: (1) the number of clauses in the breath group (n_clauses, 1, 2, 3); (2) the number of syllables n_syll (continuous factor); (3) the type of first clause f_clause (m, e1, e2, u); (4) the type of hesitation: t_hesi (levels: none, at least one at onset: onset, at least one not at onset: elsewhere).
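The factors can be derived mechanically from the annotation. The sketch below assumes a clause label of the form m-e1-e2 (as in Section 2.3) and a token list in which "H" marks a vocalized hesitation (as in Figure 1.C). The helper names are hypothetical, and two choices are assumptions rather than rules stated in the paper: every label element counts as one clause, and a group with hesitations both at onset and later is coded as onset.

```python
ALLOWED_FIRST = {"m", "e1", "e2", "u"}


def clause_factors(label):
    """Return (n_clauses, f_clause) from a label such as 'm-e1-e2'."""
    parts = label.split("-")
    return len(parts), parts[0]


def hesitation_type(tokens, marker="H"):
    """Return the t_hesi level: 'none', 'onset' or 'elsewhere'."""
    if marker not in tokens:
        return "none"
    return "onset" if tokens[0] == marker else "elsewhere"


def is_selected(label):
    """Apply the selection of Section 2.4: 1-3 clauses, first clause in m, e1, e2, u."""
    n_clauses, f_clause = clause_factors(label)
    return 1 <= n_clauses <= 3 and f_clause in ALLOWED_FIRST


if __name__ == "__main__":
    print(clause_factors("m-e1-e2"))                 # (3, 'm')
    print(hesitation_type(["das", "H", "war"]))      # elsewhere
    print(is_selected("m-e1-e2-e3"))                 # False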

The effects of n_syll, n_clauses and f_clause on the different parameters were tested as fixed-factor effects using Linear Mixed Models (LMM), with subject as a random factor. The interactions between factors were not significant and therefore additive models were calculated. For dur_I and amp_IE the log values were used to satisfy normality. An analysis of hesitation was introduced in a second step with subject as random factor and n_syll and t_hesi as fixed factors. All effects reported as significant satisfied the criterion pMCMC < .01.
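One way to write the additive model described above is sketched below. The notation is ours, not the paper's: i indexes breath groups, j indexes speakers, and the random part is assumed to be a by-subject random intercept, which the text does not state explicitly.

```latex
% y is one of: amp_I, log(dur_I), log(amp_IE)
\begin{align}
y_{ij} &= \beta_0 + \beta_1\,\mathrm{n\_syll}_{ij}
        + \alpha_{c(i,j)} + \gamma_{f(i,j)} + u_j + \varepsilon_{ij}, \\
u_j &\sim \mathcal{N}(0, \sigma_u^2), \qquad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
\end{align}
% c(i,j): level of n_clauses (1, 2, 3); f(i,j): level of f_clause (m, e1, e2, u)
% Second step (hesitation analysis), with h(i,j) the level of t_hesi:
\begin{equation}
y_{ij} = \beta_0 + \beta_1\,\mathrm{n\_syll}_{ij} + \delta_{h(i,j)} + u_j + \varepsilon_{ij}
\end{equation}
```

No interaction terms appear because the interactions were reported as not significant.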

3. Results

Table 1. Description of the breath groups according to the number of clauses and to the type of the first clause. NB: Number of breath groups; n_syll: Average number of syllables; dur_g: average duration (± one standard error).

../../_images/tab1.png

3.1. Linguistic structure of the breath group

The average characteristics of the breath groups and their distribution according to the first clause and to the number of clauses are displayed in Table 1. Speakers produced from 13 to 99 breath groups (47.4 (mean) ±4.5 (standard error), Figure 2). Half of the breath groups (53%) started with a matrix clause (m), a quarter (24%) with an embedded clause and the last quarter (23%) with an uncompleted clause (u). On average, the breath group included ~15.9 syllables (range: 1 to 50), and lasted ~3.5 s (range: .17 to 12.1). The number of syllables and the duration of the groups significantly increased with the number of clauses (~+7.5 syllables and +1.5 s per supplementary clause), but were similar for groups starting with a matrix as compared to an embedded clause. Groups starting with an uncompleted clause were ~6 syllables and 1.1 s shorter than the other groups.

../../_images/fig24.png

Figure 2. Number of breath groups for each speaker with repartition of groups in: no hesitation (none), at least one hesitation at onset or elsewhere

The percentage of breath groups with vocalized hesitations ranged from 0 to more than 50% according to the subject (average 40%, see Figure 2). Among the breath groups with at least one hesitation (n=482), 40% started with a hesitation. Note that the groups with at least one hesitation not at the onset of the group were longer than the groups starting with a hesitation (~+3 syllables and ~+749 ms) and than the groups without hesitation (~+3 syllables and ~+1246 ms). The effect of hesitation type (t_hesi) on the number of syllables and the duration of the group was significant but did not interact with the effect of the first clause.

../../_images/fig32.png

Figure 3. Correlations between n_syll in the breath group and dur_I, amp_I and amp_IE: all values (top), averages (bottom). Correlation values for amp_IE are given for log(amp_IE); see text for details.

3.2. Breathing kinematics

On average, the duration of inhalation was 676 ms (±8.5) and the amplitude was 17.6 %MD (± 0.2). The amplitude and the duration of inhalation depended both on the length of the breath group and on the type of the first clause. These values were also positively correlated with the number of syllables (r = ~.20 for all values and r = ~.60 for average correlations, see Figure 3, first two columns). LMM showed a significant effect of n_syll on both amp_I and dur_I.
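The gap between the two correlation values (about .20 on all values, about .60 on averages) is what one would expect if the averages are taken per number of syllables, since averaging removes within-bin scatter. The sketch below illustrates this on toy numbers. It assumes this reading of "average correlations", which the text does not spell out, and the data are invented for illustration only.

```python
import math
from collections import defaultdict


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def average_by_x(xs, ys):
    """Average y within each distinct x value (e.g. per syllable count)."""
    bins = defaultdict(list)
    for x, y in zip(xs, ys):
        bins[x].append(y)
    keys = sorted(bins)
    return keys, [sum(bins[k]) / len(bins[k]) for k in keys]


if __name__ == "__main__":
    n_syll = [5, 5, 10, 10, 15, 15]      # toy data, not from the study
    amp_i = [10, 20, 12, 24, 14, 28]
    r_all = pearson_r(n_syll, amp_i)
    r_avg = pearson_r(*average_by_x(n_syll, amp_i))
    print(r_all, r_avg)
```

With these toy values the correlation on averages is much higher than on the raw points, mirroring the pattern in Figure 3.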

../../_images/fig42.png

Figure 4. Average and standard errors of dur_I, amp_I and amp_IE according to n_clauses and f_clause (white panels) and to the type of hesitation in the breath group (gray panel)

The duration of inhalation (Figure 4.A) significantly increased from 1 to 2 (+26 ms) and 2 to 3 (+36 ms) clauses. Dur_I was also longer for groups starting with a matrix clause as compared to other types of clauses (+197 ms). Inhalation (Figure 4.B) was significantly deeper when the first clause of the upcoming group was a matrix clause (+3.5 %MD) than for any other clause type. Yet, amp_I did not significantly depend on the number of clauses. The analysis of the inhalation displacement relative to the exhalation displacement (amp_IE, Figures 3 and 4) shows: (1) that amp_IE was close to 1 for groups with 2 clauses and groups with 15-18 syllables; (2) a significant linear correlation between the logarithm of amp_IE and the number of syllables (-.48 for all values, -.83 for averages, significant effect of n_syll); (3) an effect of the number of clauses (1 > 2 > 3); (4) no significant effect of the type of the first clause. Hence, on average, the inhalation displacement was similar to the exhalation displacement for groups with 2 clauses or 15-18 syllables, larger for shorter groups and smaller for longer groups. Inhalations were deeper (+2.54 %MD) and longer (+41 ms) for the breath groups with at least one hesitation as compared with no hesitation (Figure 4). The effect of t_hesi on amp_IE was not significant when the number of syllables was taken into account.

4. Discussion

The present study investigated the linguistic structure of the breath group in German spontaneous speech and evaluated if this structure is reflected in breathing kinematics. The important findings are:

    1. Inhalations occur at syntactic boundaries (before a matrix or an embedded clause) or before a disfluency (uncompleted clause, repetition, hesitation, repair);
    2. Inhalation depth and duration reflect: (2.1) the length of the breath group (number of syllables); (2.2) the type of the first clause, with deeper and longer inhalation for groups starting with a matrix clause as compared to the other groups; (2.3) vocalized hesitations, with deeper and longer inhalations for groups that include at least one vocalized hesitation as compared to none;
    3. Syntactic complexity (number of clauses) is reflected only in the duration but not in the amplitude of inhalation;
    4. On average the amplitude of exhalation is similar to the amplitude of inhalation for groups with 2 clauses or 15-18 syllables.

The observation that most of the inhalation pauses respect the syntactic organization of speech is consistent with previous work on English spontaneous speech [7,15]. The average duration (3.5 s), the number of syllables in the breath group (16 syllables) and the duration of inhalation (~.7 s) are also similar to values reported in the literature on English [7,8,15].

As described in the introduction, previous studies found deeper and longer inhalations for longer utterances. Our dataset also shows these relations. However, we also found that inhalations were deeper and longer for the breath groups starting with a matrix clause and for the groups including hesitations as compared to the other groups. To our knowledge, the relationship of the type of the first clause and of hesitation to inhalation parameters has not been investigated so far for spontaneous speech. This relation is important with respect to the understanding of speech planning. It suggests that speakers inhale more air: (1) when they are starting a matrix clause that may come with other related clauses; (2) when they produce hesitations and do not know exactly what they are going to say. In this case, they can use vocalized hesitations as fillers during the exhalation phase, which could help to preserve ventilation and speech at the same time [21]. The fact that the breath groups with a hesitation at the onset were shorter than groups with a later hesitation shows that when a hesitation came at the onset of the group, the speaker probably inhaled again soon after it.

We also found that groups with an average number of syllables (15-18) show similar exhalation and inhalation amplitudes. These breath groups correspond to 2 clauses and could be a “favored” association between linguistic structure and breathing. This hypothesis should be tested by considering inter-speaker variability and speaker-specific lung volume capacities.

The speech task used in the present study required speakers to summarize the story they had just heard. This task is cognitively demanding and could have influenced the production of hesitations and the breathing profiles. This is in line with the inter-speaker variability we found with respect to the number of breath groups and hesitations produced in the current task. To our knowledge only [8] have investigated the possible effect of cognitive load on breathing kinematics during spontaneous speech. We think it is important to distinguish between speaker-specific behaviors according to the task (e.g. variation in disfluency, hesitations).

5. Limits and perspectives

This study is a first analysis of a larger corpus of breathing kinematics in German spontaneous speech that now includes more than 50 speakers. Our global aim is to understand the interplay of speech planning and breathing in unconstrained speech. From the current study some first issues appear: (1) it is difficult to distinguish between the effect of the number of syllables and the effect of the number of clauses. Note that the quartiles of the average number of syllables (10-15-21) were close to the average number of syllables in 1, 2, and 3 clauses, respectively (10-17-25 syllables); (2) uncompleted clauses should be analyzed in more detail by splitting them into hesitations, repairs and repetitions, which could have specific effects on breathing; (3) the amplitude of inhalation anticipates the upcoming breath group, but may also rely on what happened before [9]. This may be especially true for groups starting with an embedded clause. The next step is also to characterize the breath group in spontaneous speech not only as an individual unit but as a temporal sequence that depends on the preceding and following speech.

Speaker-specific behavior and context effects should also be considered. Previous studies on read and spontaneous speech found that the properties of the breath group and their relations to inhalation parameters are speaker-specific [10,13] and vary with age [11], cognitive load [8], speech rate [3] and loudness [16,19]. A large variability has also been observed for the same subject across repetitions and according to her emotional state [6-7, 10]. The sensitivity of speakers' breathing to these multiple influences is important to understand the interplay between linguistics and respiration and may provide a fundamental tool for pathological diagnostics and speech therapy. Furthermore, implementing breathing in speech synthesis may improve the naturalness of speech synthesizers.

6. Acknowledgements

This work was funded by a grant from the BMBF (01UG0711) and the French-German University to the PILIOS project. The authors wish to thank Jörg Dreyer, Anna Sopronova and Uwe Reichel for their help with data collection and labeling.

7. References

  • [1] Lieberman, P., Intonation, Perception and Language. (1967), Cambridge MA: MIT Press.
  • [2] Henderson, A., Goldman-Eisler, F., & Skarbek, A. (1965). Temporal patterns of cognitive activity and breath control in speech. Lang Speech, 8, 236–242.
  • [3] Grosjean, F. & Collins, M. (1979). Breathing, pausing and reading. Phonetica, 36(2), 98–114.
  • [4] Conrad, B. & Schonle, P. (1979). Speech and respiration. Arch Psychiatr Nervenkr, 226, 251–268.
  • [5] Conrad, B., Thalacker, S., & Schonle, P. (1983). Speech respiration as an indicator of integrative contextual processing. Folia Phoniatr (Basel), 35, 220–225.
  • [6] Winkworth, A. L., Davis, P. J., Ellis, E., & Adams, R. D. (1994). Variability and consistency in speech breathing during reading: lung volumes, speech intensity, and linguistic factors. J Speech Hear Res, 37, 535–556.
  • [7] Winkworth, A. L., Davis, P. J., Adams, R. D., & Ellis, E. (1995). Breathing patterns during spontaneous speech. J Speech Hear Res, 38, 124–144.
  • [8] Mitchell, H. L., Hoit, J. D., & Watson, P. J. (1996). Cognitive-linguistic demands and speech breathing. J Speech Hear Res, 39, 93–104.
  • [9] Bailly, G. and Gouvernayre, C. (2001). Pauses and respiratory markers of the structure of book reading. in Interspeech. 2012. Portland, OR.
  • [10] Teston, B. and Autesserre, D. (1987). L’ aérodynamique du souffle phonatoire utilisé dans la lecture d’un texte en français. in International Congress of Phonetic Sciences (ICPhS). Estonia, University of Tallin. p. 33-36.
  • [11] Sperry, E. E. & Klich, R. J. (1992). Speech breathing in senescent and younger women during oral reading. J Speech Hear Res, 35, 1246–1255.
  • [12] Whalen, D. H. & Kinsella-Shaw, J. M. (1997). Exploring the relationship of inspiration duration to utterance duration. Phonetica, 54, 138–152.
  • [13] Fuchs, S., Petrone, C., Krivokapic, J. & Hoole, P. (2013). Acoustic and respiratory evidence for utterance planning in German. Journal of Phonetics, 41, 29–47.
  • [14] McFarland, D. H. & Smith, A. (1992). Effects of vocal task and respiratory phase on prephonatory chest wall movements. J Speech Hear Res, 35, 971–982.
  • [15] Wang, Y. T., Green, J. R., Nip, I. S., Kent, R. D., & Kent, J. F. (2010). Breath group analysis for reading and spontaneous speech in healthy adults. Folia Phoniatr Logop, 62, 297–302.
  • [16] Huber, J. E. (2008). Effects of utterance length and vocal loudness on speech breathing in older adults. Respir Physiol Neurobiol, 164, 323–330.
  • [17] Wang, Y. T., Nip, I. S. B., Green, J. R., Kent, R. D., Kent, J. F., & Ullman, C. (2012). Accuracy of perceptual and acoustic methods for the detection of inspiratory loci in spontaneous speech. Behavior Research Methods, 44(4), 1121–1128.
  • [18] McFarland, D. H. (2001). Respiratory markers of conversational interaction. J. Speech Lang. Hear. Res., 44, 128
  • [19] Huber, J. E. (2007). Effect of cues to increase sound pressure level on respiratory kinematic patterns during connected speech. J. Speech Lang. Hear. Res., 50, 621–634.
  • [20] Ferreira, F. and Bailey K. G.D. (2004). Disfluencies and human language comprehension. Trends in Cognitive Sciences, 8(5), 231–237.
  • [21] Schonle, P. W. & Conrad, B. (1985). Hesitation vowels: a motor speech respiration hypothesis. Neurosci. Lett., 55, 293–296.
  • [22] Konno, K. & Mead, J. (1967). Measurement of the separate volume changes of rib cage and abdomen during breathing. Journal of Applied Physiology 22(3), 407–422.
  • [23] Banzett, R. B., Mahan, S. T., Garner, D. M., Brughera, A. & Loring, S. H. (1995). A simple and reliable method to calibrate respiratory magnetometers and Respitrace. Journal of Applied Physiology, 79(6), 2169-2176.
  • [24] Boersma, P. and D. Weenink, Praat, a System for doing Phonetics by Computer, version 3.4, in Institute of Phonetic Sciences of the University of Amsterdam, Report 132. 182 pages. 1996.
  • [25] Reichel, U.D. (2012). PermA and Balloon: Tools for string alignment and text processing. Proceedings of Interspeech, Portland, paper 346.