Phonological theory informs the analysis of intonational exaggeration in Japanese infant-directed speech

  • Yosuke Igarashi - Graduate School of Letters, Hiroshima University, 1-2-3 Kagamiyama, Higashihiroshima-shi, Hiroshima 739-8522, Japan
  • Ken’ya Nishikawa, Kuniyoshi Tanaka, and Reiko Mazuka - Laboratory for Language Development, Brain Science Institute, RIKEN, 2-1 Hirosawa, Wako-shi, Saitama 351-0198, Japan

abstract

To date, the intonation of infant-directed speech (IDS) has been analyzed without reference to its phonological structure. Intonational phonology should, however, inform IDS research, discovering important properties that have previously been overlooked. The present study investigated “intonational exaggeration” in Japanese IDS using the intonational phonological framework. Although intonational exaggeration, which is most often measured by pitch-range expansion, is one of the best-known characteristics of IDS, Japanese has been reported to lack such exaggeration. The present results demonstrated that intonational exaggeration is in fact present and observed most notably at the location of boundary pitch movements, and that the effects of lexical pitch accents in the remainder of the utterances superficially mask the exaggeration. These results not only reveal dynamic aspects of Japanese IDS, but also in turn contribute to the theory of intonational phonology, suggesting that paralinguistic pitch-range modifications most clearly emerge where the intonation system of a language allows maximum flexibility in varying intonational contours.

I.INTRODUCTION

A.Background

During the past few decades, studies of intonation in various languages have demonstrated that the intonation patterns of a language possess a phonological organization (Pierrehumbert, 1980; Ladd, 1996). Research on typologically different languages has revealed that each language’s way of organizing intonation reflects both universal and language-specific properties (Gussenhoven, 2004; Jun, 2005). To date, the theoretical frameworks that have been presented for intonational phonology have been largely based on the analysis of speech produced in idealized conditions, such as controlled laboratory speech. In natural speech, however, which occurs in live communication among speakers under a variety of conditions and contexts, para- or extra-linguistic factors such as the emotional intent of speakers and the cognitive constraints of speakers or listeners can impact intonation. In many languages, for example, fundamental frequency (F0) is raised when the speaker is angry and lowered when s/he is sad (e.g., Williams and Stevens, 1972). Utterances spoken by individuals with autism reportedly have monotonous F0 contours (cf. McCann and Peppe, 2003). These variations in F0 contour can be seen as phonetic implementations of a phonological representation of intonation (Gussenhoven, 2004). Better understanding of these factors will increase our knowledge of intonational phonology in general. To fully capture the nature of the intonation of human language, therefore, it will be useful to further expand the scope of research beyond idealized speech and include more speech data produced under various real-life conditions.

To this end, analyses of how intonation is modified in specialized registers of speech, such as infant-directed speech (IDS), could shed new light on intonation. As discussed further below, intonation of IDS tends to be “exaggerated.” The fact that the intonation of a language can be modified systematically in a given register indicates that the system of intonation is dynamic; that is, it can shift the realization of intonation dynamically to accommodate the specific paralinguistic factors of a register. Such dynamic properties, which must be one of the ways the phonological structure of intonation is implemented phonetically, cannot be observed in idealized speech. In turn, the development of an intonational phonological framework gives IDS research (and language acquisition research broadly) a tool to capture the universal and language-specific ways intonation is modified in IDS. In the present paper, specifically, we will demonstrate that exaggeration of intonation is actually present in Japanese IDS, a fact which has been overlooked in previous studies. This discovery was made possible only when the phonological structure of Japanese intonation was taken into consideration.

B.Infant-directed speech

IDS is known to play several important roles in communication between infants and caregivers, such as capturing the infants’ attention, communicating affect, and facilitating the infants’ language development as a result of certain distinctive properties (e.g., Fernald, 1989; Kitamura et al., 2002). An extensive body of research has examined in what ways IDS differs from adult-directed speech (ADS) among the world’s languages (cf. Soderstrom, 2007, for a review). Of the various properties of IDS, modification of intonation is arguably the best known. Specifically, caregivers are reported to use “exaggerated” or “sing-songy” intonation in IDS across many languages, and this intonation is often argued to be one of the universal properties of IDS (Ferguson, 1977; Fernald et al., 1989; Grieser and Kuhl, 1988; Kitamura et al., 2002). Intonational exaggeration in IDS, which is defined in general as the expansion of the pitch range of utterances as compared to ADS, is an example of the dynamic properties of the intonation structures, as discussed above, where the realization of intonation is modified or “stretched” from a typical (ADS) intonation by the paralinguistic factor of the caregiver’s attempt to communicate with the infant.

Interestingly, however, significant cross-linguistic differences are also known to exist in the characteristics of IDS (Bernstein Ratner and Pye, 1984; Fernald et al., 1989; Grieser and Kuhl, 1988; Kitamura et al., 2002; Papousek et al., 1991). Fernald et al. (1989) compared intonational modifications in six languages/varieties (French, Italian, German, British English, American English, and Japanese), and found that all of them except Japanese showed pitch-range expansion in IDS. Fernald (1993) also reported that although English-learning infants responded appropriately to positive and negative emotional prosody in IDS in English, German, and Italian, they failed to do so with Japanese IDS. These findings suggest that intonational modifications in Japanese IDS may differ substantially from those of Germanic and Romance languages.

It might be possible for differences in intonational modifications in IDS between languages to arise from cultural differences in mother–infant interaction (Bornstein et al., 1992; Fernald and Morikawa, 1993; Ingram, 1995; Toda et al., 1990). For example, Fernald and Morikawa (1993) revealed that Japanese mothers interact with their infants quite differently from their American counterparts. The implication of this account is that intonation in Japanese IDS is not exaggerated. For Japanese adults, however, the intonational characteristics of Japanese IDS are clearly distinct from those of ADS, and to them, perceptually, it does sound exaggerated (Horie et al., 2008). For Japanese infants as well, behavioral studies show that Japanese IDS is preferred over ADS (Hayashi et al., 2001). It is possible that these Japanese adults and infants were responding to characteristics of Japanese IDS not related to its intonation (cf. Inoue et al., 2011). Nonetheless, the fact that the infants’ data are comparable to findings from among English-learning infants (Cooper et al., 1997; Newman and Hussain, 2006; Pegg et al., 1992) and that the pitch characteristics of IDS seem to be the critical factor in infants’ preferences both in English-learning infants (Fernald and Kuhl, 1987) and Japanese-learning infants (Hayashi, 2004), is sufficient to prompt us to explore the possibility that Japanese IDS is in fact exaggerated.

C.The present study

In the present paper, we will pursue an alternative account of the apparent lack of intonational exaggeration in Japanese IDS. Given that the intonation of a language has a phonological organization (Ladd, 1996; Gussenhoven, 2004; Jun, 2005), it is reasonable to assume that the phonology of a language restricts the way the realization of intonation is affected by paralinguistic factors. For example, languages differ in the way, and the degree to which, pitch changes are utilized lexically as opposed to intonationally. In previous studies, pitch-range expansion in IDS in tone languages, such as Chinese and Thai, has been demonstrated to be significantly smaller than that in English (Grieser and Kuhl, 1988; Kitamura et al., 2002; Papousek et al., 1991), arguably because lexical use of pitch in a language restricts the flexibility to exaggerate pitch range in IDS. Japanese, which is not a tone language but a pitch accent language (Pierrehumbert and Beckman, 1988), may represent a case in which utterance-level differences in intonation due to the speech register (ADS vs IDS) are superficially camouflaged by the impact of lexical pitch accent.

Previous studies have described the intonational modification of IDS without reference to the internal phonological structure of intonation: They have simply measured the mean, minimum, and maximum values of the F0 contour of the overall utterance (e.g., Ferguson, 1977; Fernald et al., 1989; Grieser and Kuhl, 1988; Papousek et al., 1991; Kitamura et al., 2002). However, if the mechanism of register-induced modification of intonation differs across languages due to language-specificity in intonational phonology, then these conventional measurements may not have been sufficient to capture the nature of intonational exaggeration for each language. In the present study, we analyze a large corpus of IDS in Japanese (Mazuka et al., 2006), and demonstrate (1) that Japanese mothers in fact expand their pitch ranges when they talk to their infants; (2) that this pitch-range expansion or intonational exaggeration in Japanese IDS is observed locally at specific structural positions; and (3) that the profile of the intonational exaggeration cannot be captured unless the phonological structure of Japanese intonation is taken into account.

The structure of this paper is as follows. In Sec. II, we will first describe the phonological structure of Japanese intonation and discuss the phonological entities that should be considered for the valid measurement of pitch modification in Japanese. Section III describes the design and characteristics of the Japanese IDS corpus we used for the analysis, and Sec. IV describes the results of the analysis. Finally, in Sec. V, we will discuss the significance of our results in light of the universal and language-specific characteristics of IDS.

II.THE JAPANESE INTONATION SYSTEM

A.The phonology of Japanese intonation

The description of the Japanese intonation system in this paper is based on the framework called X-JToBI (Maekawa et al., 2002). It is an extended version of the original Japanese Tone and Break Indices, or J_ToBI, framework (Venditti, 2005), which owes its theoretical foundation to the major study of Japanese intonation by Pierrehumbert and Beckman (1988). In this section, we will describe two aspects of Japanese intonation that are relevant to the present study: (1) Lexical pitch accent (and related phenomena) and (2) boundary pitch movement (BPM).

2.BPM

The second major element of Japanese intonation is the inventory of BPMs. BPMs are tones that can occur at the end of an AP and contribute to the pragmatic interpretation of the phrase, indicating features such as questioning, emphasis, and continuation (Venditti et al., 2008). Not all APs have BPMs; in fact, most APs are not marked by BPMs. The occurrence of BPMs is not restricted to IP-final or utterance-final APs. They can also occur at the end of IP-medial APs.

The inventory of BPMs indicated in the X-JToBI system is H% (rise), LH% (scooped rise), HL% (rise-fall), and HLH% (rise-fall-rise), as well as their variations (Maekawa et al., 2002). Figure 4 depicts these four main types of BPM. As can be seen from the figure, all types of BPM begin with a rise. In most cases, the rise starts around the onset of the AP-final mora.

Which pragmatic intentions each BPM conveys is not without controversy. Briefly speaking, H% gives prominence to the constituent to which it associates, LH% is generally exploited to signal a question, and HL% is often used in a context where a speaker is explaining some point to a listener (Venditti et al., 2008). HLH% occurs quite infrequently, and, according to Venditti et al. (2008), it gives a wheedling or cajoling quality to the utterance. In Figs. 2 and 3, the final morae of the utterances are marked by an LH% (scooped rise) BPM.

../../_images/fig13.png

FIG. 1.

Waveforms and F0 contours of unaccented accentual phrase (AP) amai ame “sweet candy” (left) and accented AP uma’i ame “tasty candy” (right). Vertical lines indicate AP boundaries. The adjective amai “sweet” and the noun ame “candy” in the phrase amai ame “sweet candy” (left) are both lexically specified as unaccented, while the adjective uma’i “tasty” in the phrase uma’i ame “tasty candy” (right) is lexically specified as accented on the second mora /ma/, which exhibits an F0 fall starting near its end.

../../_images/fig23.png

FIG. 2.

Waveforms and F0 contours of utterances without and with downstep: an utterance without downstep yubiwa-o wasure-ta onna’-wa da’re-desu-ka? “Who is the woman that left the ring behind?” (top), and an utterance with downstep on the third AP yubiwa-o era’nd-ta onna’-wa da’re-desu-ka? “Who is the woman that chose the ring?” (bottom). Dotted vertical lines stand for AP boundaries, and solid vertical lines for IP boundaries. In the utterance in the top panel, an accented AP onna’-wa “woman-TOP” follows unaccented APs yubiwa-o “ring-ACC” and wasureta “left behind,” and thus downstep does not occur. In the utterance in the bottom panel, on the other hand, the accented AP onna’-wa “woman-TOP” follows an accented AP era’nda “chose.” The F0 peak of the AP onna’-wa “woman-TOP” is therefore reduced due to downstep caused by the preceding accented AP. In both utterances, pitch range is reset at the beginning of the last AP da’re-desu-ka “who-COPULA-Q,” whose peak is as high as that in the preceding AP. We can therefore posit an IP boundary between onna’-wa “woman-TOP” and da’re-desu-ka “who-COPULA-Q” in both utterances.

../../_images/fig31.png

FIG. 3.

Waveform and F0 contour of an utterance with four successive downsteps, ao’i ya’ne-no ie’-o era’nda onna’-wa da’re-desu-ka? “Who is the woman who chose the house with a blue roof?” Dotted vertical lines indicate AP boundaries and solid vertical lines indicate IP boundaries. Five accented APs, ao’i ya’ne-no ie’-o era’nda onna’-wa “blue roof-GEN house-ACC chose woman-TOP,” constitute a single IP. The F0 peaks of the last four APs are iteratively lowered, i.e., downstepped, until pitch reset occurs at the beginning of the following IP. Also, anticipatory raising is observed in the F0 peak of the first AP, which is notably high.

../../_images/fig41.png

FIG. 4.

Waveforms and F0 contours of four BPM types (H%, LH%, HL%, and HLH%) appearing in the same utterance I’ma-ne… “Just now…”. Squares mark BPMs.

B.Possible impacts of pitch accent and BPM on pitch range

There are two elements that can impact pitch ranges in Japanese intonation: Lexical pitch accent (and associated downstep and anticipatory raising) and BPMs. As will become clearer below, each of these two factors functions to expand the pitch range for different reasons. This leads us to hypothesize (1) that the intonational exaggeration in IDS most clearly occurs in those tones that the speaker chooses for pragmatic reasons, which only occur at BPMs in Japanese, (2) that pitch ranges can also be enlarged by pitch accents, whose effect grows as the IPs become longer and the number of accents within the IP increases, and (3) that the interaction of these two factors can superficially mask the intonational exaggeration that may exist in Japanese IDS.

We begin by discussing the first part of this hypothesis. When speakers express pragmatic intent, they may do so by varying the intonation contour. Languages do not, however, typically permit speakers to vary F0 contours without limits; rather, the contours are restricted by the phonology of the language (cf. Ladd, 1996). In other words, in order to convey specific pragmatic information, speakers are free to choose from an inventory of intonational/pragmatic tones (Ward and Hirschberg, 1985; Pierrehumbert and Hirschberg, 1990), but such pragmatically chosen tones are structured according to the phonology of the particular language. It follows not only that the inventory of these pragmatic tones is language-specific, but also that the locations in the utterance in which pragmatically chosen tones appear differ across languages. Given that variability in the F0 contour is restricted by the phonology of the language, it is reasonable to assume that intonational exaggeration will be most clearly observed at those locations where the intonation system of a language allows maximum flexibility in varying F0 contours (i.e., the location where pragmatically chosen tones can appear), rather than to assume that it is uniformly distributed over the utterance.

Cross-linguistic differences in the locations of pragmatic tones (as well as in their inventory) may become clear by comparing the intonation system of Japanese to that of English. Figure 5(a) shows the finite-state grammar for English of Pierrehumbert (1980), which can generate all the intonational contours in that language. In this language, all of the tonal categories (pitch accent, phrase accent, and boundary tone) involve a choice of tones. For each type of structure speakers choose one of these alternatives in order to convey different types of pragmatic information (Ward and Hirschberg, 1985; Pierrehumbert and Hirschberg, 1990), and thus these tones are pragmatically chosen ones.

The structure of Japanese intonation (Pierrehumbert and Beckman, 1988; Maekawa et al., 2002; Venditti, 2005) is summarized in the finite-state grammar in Fig. 5(b), which shows that except for the end of the phrase, the F0 contour does not allow much variability. The only tonal options available in this part of the utterance involve the presence or absence of H*+L (a lexical pitch accent). As discussed above, however, this is an intrinsic part of the lexical representation of a word and does not vary according to the speaker’s pragmatic intent.

The contour at the end of the phrase, in contrast, can be much more variable. This is the location allocated to various types of BPM. Unlike lexical pitch accent, the choice of BPMs depends on pragmatic factors. When speakers wish to express some pragmatic information, they have a choice as to whether they assign a BPM to a given phrase and as to what type of BPM they use.
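The contrast between lexically fixed and pragmatically chosen tones can be illustrated with a minimal Python sketch. It is not the grammar of Fig. 5(b) itself: it assumes, as a simplification, a single AP whose BODY is the tone sequence %L H- (H*+L) L%, optionally followed by one of the four main BPM types. The function name `ap_tunes` is hypothetical.

```python
# Tone labels are taken from the text; the one-AP grammar below is a
# simplifying assumption, not the full finite-state grammar of the paper.
BPM_CHOICES = [None, "H%", "LH%", "HL%", "HLH%"]  # None = no BPM on this AP


def ap_tunes(is_accented):
    """List the tone strings available for one AP.

    The accent (H*+L) is fixed by the lexicon, so it is an input here,
    not a choice; only the BPM slot varies with pragmatic intent.
    """
    body = ["%L", "H-"] + (["H*+L"] if is_accented else []) + ["L%"]
    tunes = []
    for bpm in BPM_CHOICES:
        tones = body + ([bpm] if bpm else [])
        tunes.append(" ".join(tones))
    return tunes


if __name__ == "__main__":
    for accented in (False, True):
        label = "accented" if accented else "unaccented"
        print(label)
        for tune in ap_tunes(accented):
            print("  ", tune)
```

Under these assumptions, every tune for a given word shares the same BODY and differs only in the final BPM slot, which is the sense in which the BPM is the sole site of pragmatically chosen tones.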

If, as we hypothesized above, intonational exaggeration in IDS should emerge at the pragmatically chosen tones, then in English it should appear anywhere in the contour: It should be observed not only in stressed syllables where pragmatic pitch accents occur but also at the edges of phrases where phrasal accents and boundary tones appear. In Japanese, by contrast, exaggeration should be confined to a more restricted part of the contour, namely, in the BPM at the end of the AP, which is the only location where pragmatically chosen tones are realized in this language.

In Japanese, the section of the F0 contour that does not include the BPM, which we will refer to as the BODY (we treat %L, H-, H*+L, and L% collectively as the BODY), is largely determined by the lexical specifications of the words in the phrase, and thus any register-induced pitch-range expansion in the BODY is expected to be of minimal magnitude for this language. This prediction is consistent with the findings mentioned above from tone languages such as Chinese and Thai, in which lexically specified tones are densely distributed in utterances and register-induced pitch-range expansion has been shown to be significantly smaller than that in English (Grieser and Kuhl, 1988; Kitamura et al., 2002; Papousek et al., 1991). Thus, the BODY in Japanese patterns with utterances in tone languages with respect to flexibility in varying F0 contours.

We now turn to the second part of our hypothesis. It concerns the effects of pitch accents and of the accompanying downstep and anticipatory raising, which enlarge the pitch range of the BODY.

One of the effects lexical pitch accent has on pitch range is that it lowers the minimum point of the F0 contour and thus enlarges the overall pitch range of a single AP. As can be seen from Fig. 1, the pitch range of the accented AP uma’i ame (right) is larger than that of the unaccented AP amai ame (left). Downstep is also responsible for the enlargement of the pitch range. Although the primary consequence of downstep is the lowering of the F0 peaks (H- or H*+L) of APs, as shown in Fig. 3, F0 valleys (L%) are also lowered, although to a lesser degree. Because of this lowering of F0 valleys, the pitch range of the IP overall becomes larger every time the accentual fall occurs. Finally, when anticipatory raising occurs in association with downstep, it may also impact the pitch-range expansion. As the number of pitch accents in the IP increases, the F0 peak of the initial AP becomes higher and consequently the pitch range of the IP is expected to grow. It is important to note here that pitch-range expansion brought about by factors associated with lexical pitch accent occurs even in the most idealized speech, and therefore intonation is not “exaggerated” here.

../../_images/fig51.png

FIG. 5.

Finite-state grammars for English (top) and Japanese (bottom) intonational tunes.

The crucial prediction of this hypothesis is that the effects associated with pitch accents on pitch ranges should be larger in ADS than IDS. It is well documented that utterances in ADS in general contain more words and are longer in duration than those in IDS (Fernald, 1992; Fernald et al., 1989; Grieser and Kuhl, 1988; Newport et al., 1977; Snow, 1977, among others). Assuming that the proportion of accented to unaccented words is not significantly larger in IDS than in ADS (an assumption that is borne out, as discussed in Sec. IV D), ADS utterances, which in general contain more words than IDS utterances, should on average contain more pitch accents than IDS utterances. This leads to the prediction that ADS utterances should have a larger average pitch range of BODY than IDS, if there is no register-induced pitch-range expansion in the IDS utterance. Or, said differently, when we compare utterances in IDS and ADS with the same length, we should find the same pitch range if there is no intonational exaggeration in IDS. If, on the other hand, there is intonational exaggeration, IDS utterances should have larger pitch ranges than ADS utterances of equivalent length.2

Finally, we will discuss the third part of the hypothesis. The pitch-range expansion effect that lexical pitch accents have on the BODY should be larger in the long utterances characteristic of ADS than in the short utterances characteristic of IDS. This effect could thus superficially mask intonational exaggeration that should be observed in BPMs of IDS, if these two phonological entities, pitch accent and BPM, are not taken into consideration. In order to find intonational exaggeration in Japanese, it is therefore necessary first to analyze the BODY and BPMs separately, and second to compare utterances of the same length between ADS and IDS.

III.DATA: JAPANESE IDS CORPUS

A.Participants

Twenty-two mothers [age 25–43, average age 33.0, standard deviation (SD) ± 3.6] and their children participated in the recording of the RIKEN Japanese Mother-Infant Conversation Corpus (R-JMICC) (Mazuka et al., 2006). All the mothers were from Tokyo or its adjacent prefectures, and were native speakers of Standard Japanese. The children’s ages ranged from 18 to 24 months (average 20.4, SD ± 2.7 months, 10 girls). The data from one mother was excluded from the analyses of the present paper because of difficulty in F0 extraction due to her overly creaky phonation.

Igarashi et al.: Exaggerated prosody in infant-directed speech

Redistribution subject to ASA license or copyright; see http://acousticalsociety.org/content/terms. Download to IP: 133.9.93.26 On: Fri, 07 Feb 2014 07:08:56

B.Recordings

Mother-child dyads were brought into a sound-attenuated room at the Laboratory for Language Development, RIKEN Brain Science Institute, Japan. A head-mounted dynamic microphone was used to record each mother’s speech. These audio recordings were saved onto digital audio tapes (16 bits, 41 kHz). Three tasks were involved in the recordings. The mother was first asked to play with her infant with a number of picture books for approximately 15 min. After 15 min, the books were removed and replaced with a set of toys. The mother was asked to play with them for an additional 15 min. Afterwards, a female experimenter, aged 33, who was also a mother of a girl of similar age, entered the room and talked with the mother for about 15 min. The topic of the conversation was not specified in advance, but the conversation tended to focus on topics related to child-rearing. Approximately 45 min of recording per mother-child dyad were thus made. The final sample consisted of a total of three hours of ADS (approximately 24 000 words) and eight hours of IDS (47 000 words) from 21 mothers.

C.Linguistic annotations in the R-JMICC

The R-JMICC contains various linguistic annotations including segmental, morphological and intonational labels (Mazuka et al., 2006). Segmental and intonational labeling was undertaken using the PRAAT software (Boersma and Weenink, 2006). The basic coding schema for R-JMICC was adopted from the criteria used in the Corpus of Spontaneous Japanese (CSJ), a large-scale annotated speech database of Japanese spontaneous speech (Maekawa, 2003). Segmental labels, representing types and duration of vowels and consonants, were time-locked to the speech signals. Morphological labels used in the R-JMICC provide information about boundaries of words, their part of speech, and their conjugation. Consistent with the criteria of the CSJ, two types of word-sized units, short unit word (SUW) and long unit word (LUW) were identified. Each SUW is either a mono-morphemic word or else made up of two consecutive morphemes, and is identical to or close to items listed in ordinary Japanese dictionaries. LUWs, on the other hand, are compounds. The SUW and LUW categories are both defined independently of prosodic cues. The analyses in this study exploited SUW but not LUW in measuring the length of various prosodic units, since LUW does not always constitute a hierarchical structure with the other prosodic units. Intonational labeling was based on the X-JToBI scheme (Maekawa et al., 2002), which provides, among other things, information on two levels of prosodic phrasing (AP and IP), lexical pitch accents, and BPMs, as discussed in Sec. II. The labeling was done by three trained phoneticians, including the first author (a highly experienced X-JToBI labeler). To ensure reliability, the labeling of the entire corpus was double-checked by the first author.3

J. Acoust. Soc. Am., Vol. 134, No. 2, August 2013

D.Measurements

For the purpose of the present study, an utterance is defined as an IP or a sequence of IPs followed by a pause longer than 200 ms, following the coding scheme developed for the CSJ (Maekawa et al., 2002). Henceforth, we will refer to this operationally defined unit as an Utterance, with the first letter capitalized. The utterance in a general sense, which may be described as a “stretch of speech preceded and followed by silence or a change of speaker” (Crystal, 2008: 505), is written without a capital letter. Note that in the coding schema of X-JToBI, the highest prosodic unit is IP, and the Utterance is not independently defined. For the measurement of duration of various prosodic units such as AP and IP, we used the segmental labels. For the second analysis (Sec. IV B), we first identified where the BPM occurs and then divided the F0 contours into BODY and BPM parts. The occurrence/non-occurrence of a BPM and its temporal location was detected by means of the intonation labels and segmental labels of X-JToBI. In R-JMICC, the intonation labels for BPMs, such as H%, LH%, and HL%, were aligned exactly with the end of the AP in which the BPM occurred. The ending time of the BPM was thus identified as the offset of the AP accompanying a BPM label. The starting time of the BPM, on the other hand, was identified generally as the onset of the final mora of the AP containing the BPM label, because BPMs usually start from the onset of the AP-final mora. A BPM can also start at the penultimate mora. When this occurred, it was coded using X-JToBI labels, and the beginning of the penultimate mora of the AP was identified as the onset of the BPM. Once the starting and ending times of each BPM were determined, we measured the mean, maximum, minimum, and range of F0 between the two temporal points. In order to examine pitch modification during the BODY section, we first deleted the F0 points between the starting and ending times of the BPM as defined above. 
We then measured the mean, maximum, minimum, and range of the modified F0 contours of both Utterance and IP.
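The BODY/BPM split described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the scripts used for the corpus: the F0 track is assumed to be a list of (time in s, F0 in Hz) samples for voiced frames only, each BPM is assumed to be given as a (start, end) interval already derived from the X-JToBI and segmental labels, and the range is converted to semitones with the standard formula 12·log2(max/min). The names `f0_stats` and `split_body_bpm` are hypothetical.

```python
import math


def f0_stats(values_hz):
    """Mean, max, min (Hz) and range (semitones) of a list of F0 values."""
    if not values_hz:
        return None  # nothing voiced in this stretch
    lo, hi = min(values_hz), max(values_hz)
    return {
        "mean_hz": sum(values_hz) / len(values_hz),
        "max_hz": hi,
        "min_hz": lo,
        "range_st": 12.0 * math.log2(hi / lo),
    }


def split_body_bpm(f0_track, bpm_intervals):
    """Separate F0 samples into the BODY and one list per BPM.

    f0_track      -- list of (time_s, f0_hz) samples
    bpm_intervals -- list of (start_s, end_s); start is the onset of the
                     AP-final (or penultimate) mora, end is the AP offset
    """
    body = []
    bpms = [[] for _ in bpm_intervals]
    for t, f0 in f0_track:
        for i, (start, end) in enumerate(bpm_intervals):
            if start <= t <= end:
                bpms[i].append(f0)
                break
        else:
            # Sample lies outside every BPM, so it belongs to the BODY.
            body.append(f0)
    return body, bpms


if __name__ == "__main__":
    # Invented toy values, for illustration only.
    track = [(0.00, 200.0), (0.10, 220.0), (0.20, 180.0),
             (0.30, 190.0), (0.35, 250.0), (0.40, 300.0)]
    body, bpms = split_body_bpm(track, [(0.30, 0.40)])
    print("BODY:", f0_stats(body))
    print("BPM :", f0_stats(bpms[0]))
```

Keeping one list per BPM mirrors the per-BPM measurement between its starting and ending times, while the BODY statistics are computed on whatever remains of the IP or Utterance.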

IV.ANALYSIS

A.Introduction

The analysis was carried out in three steps. First, in analysis 1, we examined pitch-range modification using the same methodology as in Fernald et al. (1989). Second, in analysis 2, we examined the pitch ranges of the BPMs. Third, in analysis 3, we investigated the pitch ranges of the BODY, controlling for its length. As discussed above, the IP is the relevant domain for intonational phenomena such as downstep, pitch reset, and pitch-range expansion. In the previous studies that investigated IDS prosody, however, utterances have been used as the unit for measurements. In the present paper, the calculations are carried out using both Utterances and IPs as units of reference, and the same pattern of results is obtained for both analyses. For analysis 1, we report the results based on Utterances, so that our results can be compared directly with previous studies. For other analyses, we report the results based on IPs. In all the statistical analyses, differences were considered significant when p < 0.05. Unless otherwise stated, all of the post hoc tests reported in this paper used the Bonferroni method to control for multiple comparisons, and they are significant at least at the p < 0.05 level.

TABLE I. Pitch modification in BPM. Standard deviation (SD) in parentheses.

Measure       BPM     ADS                IDS
Mean (Hz)     H%      210.04 (17.02)     263.06 (23.93)
              LH%     209.30 (34.26)     237.67 (23.23)
              HL%     206.33 (19.00)     234.88 (27.49)
Max (Hz)      H%      224.15 (18.61)     289.87 (26.23)
              LH%     242.65 (41.90)     311.78 (37.78)
              HL%     224.77 (21.27)     261.62 (32.62)
Min (Hz)      H%      196.05 (15.84)     237.60 (21.90)
              LH%     191.33 (28.28)     202.17 (19.86)
              HL%     181.60 (17.17)     195.43 (22.75)
Range (st)    H%      2.25 (0.57)        3.46 (0.60)
              LH%     3.91 (1.01)        7.37 (1.62)
              HL%     3.54 (0.76)        4.92 (1.42)

ADS < IDS: in the post hoc comparisons of means with Bonferroni correction, IDS was significantly higher (larger) than ADS at less than 0.1% in nine of the twelve comparisons and at less than 1% in two; one comparison was not significant (n.s.).
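As a reminder of what the Bonferroni method does, here is a minimal Python sketch. It assumes the common formulation in which each raw p value is multiplied by the number of comparisons in the family (capped at 1) and then compared with the 0.05 criterion; the function name `bonferroni` and the example p values are invented for illustration and are not taken from the paper.

```python
def bonferroni(p_values, alpha=0.05):
    """Return (adjusted_p, is_significant) for each test in one family."""
    m = len(p_values)  # number of comparisons being controlled for
    results = []
    for p in p_values:
        adjusted = min(1.0, p * m)
        results.append((adjusted, adjusted < alpha))
    return results


if __name__ == "__main__":
    # Hypothetical raw p values from three post hoc comparisons.
    for adjusted, significant in bonferroni([0.0004, 0.02, 0.03]):
        print(f"adjusted p = {adjusted:.4f}, significant: {significant}")
```

The design trades power for safety: a comparison that would pass at p < 0.05 on its own can fail once the family size is taken into account.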

B.Analysis 1: Replication of Fernald et al. (1989)

The first analysis measured the F0 maximum, mean, minimum, and range of each Utterance as a whole. Following the convention in IDS pitch studies (cf. Fernald et al., 1989; Kitamura et al., 2002), we report mean, maximum, and minimum F0 in Hz and pitch ranges in semitones (st). Paired t-tests revealed that although IDS showed a significantly higher mean F0 [t(20) = 7.60, p < 0.001], maximum F0 [t(20) = 5.46, p < 0.001], and minimum F0 [t(20) = 5.46, p < 0.001] for overall Utterances, the F0 range did not differ significantly between the two registers [t(20) = 0.27, not significant (n.s.)]. These results replicate the findings of Fernald et al. (1989); that is, mothers used a higher-pitched voice when talking to infants than when talking to adults, but did not alter their overall pitch range. Moreover, all averages were comparable to those reported in Fernald et al. (1989).
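The utterance-level measures used in this analysis can be sketched in Python as follows. The conversion to semitones uses the standard formula 12 × log2(maximum/minimum), which the text does not spell out. The function names, the representation of an F0 track as a list of Hz values, and the treatment of unvoiced frames are assumptions made for illustration only.

```python
import math
import statistics


def pitch_range_semitones(f0_max_hz, f0_min_hz):
    """Pitch range in semitones between a maximum and a minimum F0 (both in Hz)."""
    if f0_max_hz <= 0 or f0_min_hz <= 0:
        raise ValueError("F0 values must be positive")
    return 12.0 * math.log2(f0_max_hz / f0_min_hz)


def utterance_measures(f0_track_hz):
    """Mean, maximum, minimum (Hz) and range (st) over the voiced frames of one utterance.

    Unvoiced frames are assumed to be coded as None or 0 (a hypothetical convention).
    """
    voiced = [f for f in f0_track_hz if f is not None and f > 0]
    if not voiced:
        raise ValueError("no voiced frames")
    f0_max, f0_min = max(voiced), min(voiced)
    return {
        "mean_hz": sum(voiced) / len(voiced),
        "max_hz": f0_max,
        "min_hz": f0_min,
        "range_st": pitch_range_semitones(f0_max, f0_min),
    }


def paired_t(ads_values, ids_values):
    """Paired t statistic (IDS minus ADS) and degrees of freedom, one value pair per mother."""
    if len(ads_values) != len(ids_values):
        raise ValueError("each mother needs one ADS and one IDS value")
    diffs = [i - a for a, i in zip(ads_values, ids_values)]
    n = len(diffs)
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1
```

Because the range is computed from a ratio, an utterance spanning 200 to 400 Hz has the same range (12 st) as one spanning 100 to 200 Hz, which is why ranges are reported in semitones rather than in Hz.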

C.Analysis 2: Measurement of BPM

In this analysis, we examined the pitch modification during the BPM sections.⁴ Table I shows the means of the F0 mean, maximum, and minimum (Hz) and of the F0 range (st) for each BPM type. The F0 mean, maximum, minimum (Hz), and range (st) data were separately submitted to 2 × 3 repeated-measures analyses of variance (ANOVAs), with register (ADS and IDS) and BPM type (H%, LH%, and HL%) as within-subjects factors. As shown in Table II, there was a significant main effect of register, with IDS averages generally higher than ADS averages for each acoustic dimension. Importantly, the pitch ranges in IDS were significantly larger than those in ADS.⁵ Note that the pitch-range expansion here cannot be accounted for in terms of a slower speech rate in IDS, as the durations of syllables with BPMs in IDS were no longer than those in ADS.⁶ An example of pitch-range expansion at a BPM (LH%) is shown in Fig. 6.
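The idea of measuring the BPM section separately from the rest of the IP can be sketched as follows. This is a minimal illustration, not the procedure actually used: in the study the BPM sections come from the prosodic labels, whereas here a single hypothetical onset time (`bpm_start_s`) and an invented F0 track stand in for them.

```python
import math


def range_st(f0_values_hz):
    """Pitch range in semitones over a list of positive F0 samples (Hz)."""
    return 12.0 * math.log2(max(f0_values_hz) / min(f0_values_hz))


def split_ranges(samples, bpm_start_s):
    """Split (time_s, f0_hz) samples of one IP at the BPM onset and return
    the pitch range (st) of the BODY, of the BPM, and of the whole IP."""
    voiced = [(t, f) for t, f in samples if f > 0]  # drop unvoiced frames (coded as 0)
    body = [f for t, f in voiced if t < bpm_start_s]
    bpm = [f for t, f in voiced if t >= bpm_start_s]
    if not body or not bpm:
        raise ValueError("both BODY and BPM need at least one voiced sample")
    return {
        "body_st": range_st(body),
        "bpm_st": range_st(bpm),
        "whole_st": range_st(body + bpm),
    }


# Hypothetical IP: a falling BODY followed by a final rise (all values invented).
ip = [(0.0, 300.0), (0.2, 250.0), (0.4, 200.0), (0.6, 0.0),
      (0.8, 220.0), (0.9, 260.0), (1.0, 330.0)]
result = split_ranges(ip, bpm_start_s=0.8)
```

Keeping the three values apart makes it possible to compare registers on the BPM alone, which a single whole-utterance range cannot do.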

D.Analysis 3: Measurement of the BODY as a function of its length

Utterances and IPs were longer in ADS than in IDS no matter how we measured them.⁷ We therefore examined the pitch range of the ADS and IDS BODY while controlling for the length of the IP. Of the various length measures, the number of pitch accents per IP was chosen for this analysis because the IP is the domain of pitch-range specification, i.e., the domain of downstep (see Table III). Since very few of the IPs in IDS contained four or more accents, the analyses were restricted to IPs with three or fewer accents.⁸ We carried out a series of four two-way repeated-measures ANOVAs using F0 mean, maximum, minimum, and range as dependent variables. Register (ADS and IDS) and the number of accents within an IP (0, 1, 2, or 3) were the within-subject variables. First, the ANOVA using mean F0 as the dependent variable revealed a significant main effect of register [F(1, 20) = 43.43, p < 0.001], a significant main effect of the number of accents [F(3, 60) = 255.04, p < 0.001], and a significant interaction [F(3, 60) = 14.04, p < 0.001]. Post hoc tests revealed that the F0 mean was significantly higher in IDS than in ADS in every condition except three-accent IPs. This shows that the mean F0 tends to decrease as the number of accents within the IP increases, and also that the mean F0 is in general higher in IDS than in ADS. Second, for maximum F0, the results revealed a significant main effect of register [F(1, 20) = 22.50, p < 0.001]

TABLE II. Results of ANOVAs for pitch modification in BPMs. “Register * BPM” means an interaction between Register and BPM.

                Register                       BPM                            Register * BPM
Mean (Hz)       F(1, 18) = 71.60, p < 0.001    F(2, 18) = 6.46, p < 0.01      F(2, 36) = 0.00, p < 0.001
Maximum (Hz)    F(1, 18) = 89.91, p < 0.001    F(2, 18) = 15.22, p < 0.001    F(2, 36) = 7.01, p < 0.001
Minimum (Hz)    F(1, 18) = 47.74, p < 0.001    F(2, 18) = 26.84, p < 0.001    F(2, 36) = 12.55, p < 0.001
Range (st)      F(1, 18) = 69.18, p < 0.001    F(2, 18) = 74.84, p < 0.001    F(2, 36) = 30.25, p < 0.001

J. Acoust. Soc. Am., Vol. 134, No. 2, August 2013

Igarashi et al.: Exaggerated prosody in infant-directed speech


FIG. 6. Waveforms and F0 contours of utterances with LH% BPMs. ADS Bu’ranko to ka? “Things like a seat swing?” (left) and IDS A’nyo syuru no? “Will you walk?” (right), produced by the same mother. The BPM is marked by a rectangle. Apostrophes indicate the locations of pitch accents.

and a significant main effect of the number of accents [F(3, 60) = 78.86, p < 0.001], but no significant interaction [F(3, 60) = 1.09, n.s.]. The results showed that as the number of accents increased, the maximum F0 also increased. This confirms the presence of an anticipatory raising effect of downstep (Laniran and Clements, 2003) in Japanese. Third, the ANOVA with minimum F0 as the dependent variable showed a significant main effect of register [F(1, 20) = 43.43, p < 0.001], a significant main effect of the number of accents [F(3, 60) = 255.04, p < 0.001], and a significant interaction between the two [F(3, 60) = 14.04, p < 0.001]. Post hoc tests revealed that the F0 minimum was significantly higher in IDS than in ADS in every condition except three-accent IPs. This shows that the minimum F0 is higher in IDS than in ADS in almost all conditions, and that it decreases as the number of accents increases.
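The grouping that underlies these length-controlled comparisons can be sketched as follows: IPs are sorted by register and number of accents, IPs with four or more accents are dropped, and values are averaged first within each mother and then across mothers. The record layout (the keys 'speaker', 'register', 'n_accents', and 'range_st'), the function names, and the example values are hypothetical. Averaging within each mother first is an assumption about how the data were prepared, not a detail stated in the text.

```python
from collections import defaultdict
from statistics import mean

MAX_ACCENTS = 3  # IPs with four or more accents are excluded, as described in the text


def cell_means(ips):
    """Average the F0 range (st) per (speaker, register, accent count) cell."""
    cells = defaultdict(list)
    for ip in ips:
        if ip["n_accents"] <= MAX_ACCENTS:
            key = (ip["speaker"], ip["register"], ip["n_accents"])
            cells[key].append(ip["range_st"])
    return {key: mean(values) for key, values in cells.items()}


def condition_means(ips):
    """Average the per-speaker cell means within each (register, accent count) condition."""
    by_condition = defaultdict(list)
    for (speaker, register, n_accents), value in cell_means(ips).items():
        by_condition[(register, n_accents)].append(value)
    return {key: mean(values) for key, values in by_condition.items()}


# Invented example records (not data from the study).
example_ips = [
    {"speaker": "m01", "register": "ADS", "n_accents": 1, "range_st": 5.0},
    {"speaker": "m01", "register": "ADS", "n_accents": 1, "range_st": 6.0},
    {"speaker": "m02", "register": "ADS", "n_accents": 1, "range_st": 4.5},
    {"speaker": "m01", "register": "IDS", "n_accents": 0, "range_st": 3.0},
    {"speaker": "m01", "register": "ADS", "n_accents": 4, "range_st": 12.0},
]
means = condition_means(example_ips)
```

Comparing ADS and IDS only within the same accent-count condition is what keeps the length of the IP from being confounded with register.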

Fourth, the ANOVA with F0 range as the dependent variable revealed no significant main effect of register [F(1, 20) = 1.31, n.s.]. There was, however, a significant main effect of the number of accents [F(3, 60) = 255.04, p < 0.001]. The interaction between the two factors was not significant [F(3, 60) = 14.04, n.s.]. Post hoc tests revealed that the F0 range in IDS was significantly larger than in ADS for one-, two-, and three-accent IPs. However, the difference was not significant when the IP contained no accent. The results showed that the F0 range of the IP BODY is determined predominantly by the number of accents within it, while the contribution of register was not significant. The larger pitch range in the ADS BODY is not attributable to a more frequent occurrence of accented words in ADS. In fact, a significantly smaller proportion of ADS words than of IDS words were accented.⁹ In summary, the results of these analyses showed that the pitch range of the BODY becomes larger as the number of accents within an IP increases. This is true in both ADS and IDS. When the effects of this length-induced pitch-range expansion were factored out, we found that the pitch range of the BODY in ADS is not larger than in IDS.

TABLE III. Pitch modifications of IPs sorted by the number of pitch accents within the IP. SD in parentheses.

             No. of accents
             within an IP      ADS               IDS               ADS < IDS
Mean (Hz)    0                 230.06 (19.57)    256.30 (24.08)    a
             1                 225.34 (16.71)    254.73 (27.23)    a
             2                 221.50 (16.95)    245.34 (22.83)    a
             3                 227.89 (24.64)    236.22 (24.01)    n.s.
Max (Hz)     0                 247.67 (21.45)    277.47 (27.90)    a
             1                 259.74 (19.32)    295.10 (35.54)    a
             2                 288.18 (26.59)    321.87 (36.09)    a
             3                 317.14 (41.92)    338.47 (49.43)    n.s.
Min (Hz)     0                 209.04 (17.41)    231.73 (19.83)    a
             1                 187.54 (15.32)    209.63 (18.39)    a
             2                 170.95 (14.26)    183.76 (15.42)    a
             3                 167.25 (17.66)    170.23 (18.81)    n.s.
Range (st)   0                 2.86 (0.54)       3.01 (0.68)       n.s.
             1                 5.47 (0.69)       5.75 (1.06)       n.s.
             2                 8.69 (1.24)       9.48 (1.50)       n.s.
             3                 10.68 (2.09)      11.41 (2.72)      n.s.

a IDS was significantly higher (larger) than ADS at less than 0.1% in the post hoc comparisons of means with Bonferroni correction.

V.DISCUSSION AND CONCLUSION

The present study investigated the dynamic aspects of a language’s intonation by examining pitch-range expansion, or intonational exaggeration, in Japanese IDS, a phenomenon that has been reported to be absent in this language (Fernald et al., 1989). We found that (1) when measured as the difference between the maximum and minimum F0 of whole utterances, Japanese IDS showed no pitch-range expansion, replicating the findings of Fernald et al. (1989); (2) while pitch ranges at the locations of BPMs were significantly expanded in IDS, pitch ranges for the rest of the utterance, which we call the BODY, were larger in ADS than in IDS; and (3) the pitch range of the BODY is most strongly determined by its length (i.e., the number of pitch accents it contains), and once length is accounted for, the pitch range of the BODY in ADS is in fact no larger than that in IDS. On the basis of these findings, we argue that Japanese IDS does show register-induced pitch-range expansion.

First, we found robust pitch-range expansion at the locations of BPMs, which occur more frequently in IDS than in ADS. As discussed in Sec. II, BPMs are tones that are associated with pragmatic interpretations (Venditti et al., 2008), and IDS utterances containing BPMs often involve mothers’ attempts to engage infants by addressing questions to them or by seeking agreement with the sentence-final particle ne (cf. Fernald and Morikawa, 1993). Crucially, when BPMs occur in IDS, each of the tones is produced with an expanded pitch range. This type of pitch-range expansion is likely to be heard as “exaggerated” by listeners and may account for the results of previous studies showing that the intonation of Japanese IDS is perceived to be exaggerated by adults (Horie et al., 2008) and that Japanese IDS is preferred over ADS by infants (Hayashi et al., 2001).

Second, as shown in the third analysis (Sec. IV D), the length-induced pitch-range expansion in the ADS BODY is not an “exaggeration” of the intonation. The pitch range of an intonation phrase (IP) is determined primarily by the number of accents the IP contains: the longer the IP (and thus the more accented words it contains), the larger the pitch range. This is independent of register-induced pitch-range modification and occurs regardless of the difference between ADS and IDS. When a speaker produces a long utterance with a pitch range that is normal for its length, she/he is under no pressure to “exaggerate” the intonation, nor is the utterance likely to be heard as “exaggerated” by a listener. In fact, when ADS and IDS utterances of equal length were compared, there was no difference in pitch ranges between the two registers. These results highlight the usefulness of an intonational phonological framework for describing how intonation is modified in a specialized speech register.
Our analysis has shown that pitch-range expansion in Japanese IDS has previously been overlooked because prior studies made no reference to phonological events specific to Japanese, namely BPMs and lexical pitch accents, or to utterance length and the interactions among these factors. We do not mean to argue that pitch-range expansion in IDS is not universal. On the contrary, our findings provide additional support for the view that it is. Our findings are novel in that they demonstrate that not all phonological tones are subject to the paralinguistic modification characteristic of a specialized speech register. Specifically, our analysis suggests that pitch-range expansions in IDS are not realized in the same way in every language, but are instead implemented within a language-specific system of intonation. When there is a desire or pressure to exaggerate the intonation, speakers seem to do so by expanding the pitch range at the location where flexibility in varying contours is most tolerated. In phonological terms, this is the location where pragmatically chosen tones are realized. In the case of Japanese, these are BPMs at the boundaries of prosodic phrases, while in the case of English, they are not only phrase accents and boundary tones at the phrasal boundaries, but also pitch accents at the locations of stressed syllables. It has been commonly assumed that paralinguistic pitch-range modifications (which should include intonational exaggeration in IDS) can occur globally, irrespective of what tones are present in the utterance [cf. Ladd (1996), Chap. 7]. The present results, however, showed that only certain tones can undergo paralinguistic modifications. Our study, therefore, promises to shed light on the phonetics of pitch-range variation.

One implication of the present study is that cross-linguistic differences in IDS intonation may be better captured by re-examining them with reference to the intonation system of each language. The intonational exaggeration in Japanese is camouflaged by the pitch range of the BODY, which increases linearly as the number of pitch accents in an IP increases. In English, by contrast, such an increase is not expected, and thus the register-induced exaggeration can be captured straightforwardly by the conventional, phonologically uninformed method of measuring the pitch range: the maximum minus the minimum F0 of an entire utterance. The same method, however, is not sufficient to capture the two competing forms of pitch-range expansion in Japanese. This leads us to speculate that the magnitude of intonational exaggeration in some tone languages (Thai and Chinese) is in fact larger than what has previously been reported. In these languages, the pitch ranges in IDS are generally reported to be smaller (e.g., 6 to 7 semitones; Grieser and Kuhl, 1988; Kitamura et al., 2002; Papousek et al., 1991) than in English and other Germanic and Romance languages (10 to 12 semitones; Fernald et al., 1989; Kitamura et al., 2002). Examining the intonation of these languages with reference to their intonation systems may allow us to better understand the specific ways in which these languages modulate their intonation. It might also provide a clue as to why the intonation of Quiche Mayan-speaking mothers does not show the typical pitch characteristics of IDS (Bernstein Ratner and Pye, 1984; Ingram, 1995).

At the same time, our data robustly showed that Japanese IDS is produced with higher pitch than ADS. This is consistent with many previous studies of English and other Germanic/Romance languages (Fernald, 1989; Fernald et al., 1989), Japanese (Fernald et al., 1989), and tone languages (Papousek et al., 1991; Liu et al., 2007; Kitamura et al., 2002). It has sometimes been argued that the speech modulation seen in IDS may be driven by biological factors common across species, and that both adults’ tendency to produce higher-pitched speech when addressing infants and infants’ preference for higher-pitched voices may be part of a biologically driven phenomenon (Morton, 1977; Trainer and Zacharias, 1998). Thus, the results of the present study help show that a complex interplay of universal and language-specific factors contributes to the pitch modulation that is characteristic of IDS intonation.

ACKNOWLEDGMENTS

We thank the children and mothers for their participation in the recordings of IDS and ADS used in the present study. We also thank Akira Utsugi, Kikuo Maekawa, and Andrew Martin for their helpful comments. The study reported in this paper was supported in part by a Japanese government Grant-in-Aid for Young Scientists B #23720207 to Y.I. and a Grant-in-Aid for Scientific Research A #21610028 to R.M.

¹ Various linguistic factors bring about pitch reset at the IP boundary; these include syntactic constituency and focus (Pierrehumbert and Beckman, 1988; Selkirk and Tateishi, 1991; Kubozono, 1993). In the case of the utterances in Figs. 2 and 3, pitch reset was induced by the focus indicated by the wh-element da’re “who.”

² We do not mean to argue that the pitch range of English (as well as arguably those of many other languages) is not influenced by downstep and anticipatory raising. English utterances can have staircase-like contours, such as one having multiple H*+L pitch accents. The contour of such an utterance would resemble the Japanese one in Fig. 3. In this case, a large pitch range would be expected due to the progressive lowering of the F0 bottoms and the anticipatory raising of the peaks. The difference is that these contours constitute only a small part of the large inventory of contours in English, and the majority of utterances, which have other types of contours, are not subject to downstep or downstep-induced pitch-range expansion. Consequently, simply because utterances are longer in ADS in English and other languages, their pitch ranges are not expected to increase to the same degree as those in Japanese.

³ See Supplementary Material 1 at [http://lang-dev-lab.brain.riken.jp/igarashi-jasa-suppl.html] for inter-labeler reliability with X-JToBI.

⁴ See Supplementary Material 2 at [http://lang-dev-lab.brain.riken.jp/igarashi-jasa-suppl.html] for the frequency of each BPM type.

⁵ See Supplementary Material 3 at [http://lang-dev-lab.brain.riken.jp/igarashi-jasa-suppl.html] for post hoc comparisons of ANOVAs.

⁶ See Supplementary Material 4 at [http://lang-dev-lab.brain.riken.jp/igarashi-jasa-suppl.html] for the duration of morae which bear BPMs.

⁷ We also analyzed the pitch characteristics of the BODY regardless of its length. The results revealed that although the pitch of IDS is higher than that of ADS, the pitch ranges of the BODY are larger in ADS than in IDS. See Supplementary Material 5 at [http://lang-dev-lab.brain.riken.jp/igarashi-jasa-suppl.html] for details. The larger pitch range in ADS is accounted for by the effect of the length of the utterances, as shown in Sec. IV D.

⁸ See Supplementary Material 5 at [http://lang-dev-lab.brain.riken.jp/igarashi-jasa-suppl.html] for analyses of the average length of Utterances and Intonation Phrases based on several measures.

⁹ See Supplementary Material 6 at [http://lang-dev-lab.brain.riken.jp/igarashi-jasa-suppl.html] for the proportion of accented versus unaccented words.

  • Bernstein Ratner, N., and Pye, C. (1984). “Higher pitch in BT is not universal: Acoustic evidence from Quiche Mayan,” J. Child Lang. 11(3), 515–522.
  • Boersma, P., and Weenink, D. (2006). “Praat, a system for doing phonetics by computer,” Glot International, Vol. 5, pp. 341–345.
  • Bornstein, M. H., Tamis-LeMonda, C. S., Tal, J., Ludemann, P., Toda, S., Rahn, C. W., Pêcheux, M. G., Azuma, H., and Vardi, D. (1992). “Maternal responsiveness to infants in three societies: the United States, France, and Japan,” Child Dev. 63(4), 808–821.
  • Cooper, R. P., Abraham, J., Berman, S., and Staska, M. (1997). “The development of infants’ preference for motherese,” Infant Behav. Dev. 20(4), 477–488.
  • Crystal, D. (2008). A Dictionary of Linguistics and Phonetics, 6th ed. (Blackwell Publishing, Malden, MA).
  • Ferguson, C. A. (1977). “Baby talk as a simplified register,” in Talking to Children: Language Input and Acquisition, edited by C. E. Snow and C. A. Ferguson (Cambridge University Press, London), pp. 209–235.
  • Fernald, A. (1989). “Intonation and communicative intent in mothers’ speech to infants: Is the melody the message?,” Child Dev. 60, 1497–1510.
  • Fernald, A. (1992). “Meaningful melodies in mothers’ speech to infants,” in Nonverbal Vocal Communication: Comparative and Developmental Approaches, edited by H. Papousek, U. Jurgens, and M. Papousek (Cambridge University Press, Cambridge), pp. 262–282.
  • Fernald, A. (1993). “Approval and disapproval: Infant responsiveness to vocal affect in familiar and unfamiliar languages,” Child Dev. 64, 657–674.
  • Fernald, A., and Kuhl, P. K. (1987). “Acoustic determinants of infant preference for motherese speech,” Infant Behav. Dev. 10, 279–293.
  • Fernald, A., and Morikawa, H. (1993). “Common themes and cultural variations in Japanese and American mothers’ speech to infants,” Child Dev. 64, 637–656.
  • Fernald, A., Taeschner, T., Dunn, J., Papousek, M., de Boysson-Bardies, B., and Fukui, I. (1989). “A cross-language study of prosodic modifications in mothers’ and fathers’ speech to preverbal infants,” J. Child Lang. 16, 477–501.
  • Grieser, D. L., and Kuhl, P. K. (1988). “Maternal speech to infants in a tonal language: Support for universal prosodic features in motherese,” Dev. Psychol. 24(1), 14–20.
  • Gussenhoven, C. (2004). The Phonology of Tone and Intonation (Cambridge University Press, Cambridge).
  • Hayashi, A. (2004). “Zen-gengoki no onsei chikaku hattatsu to gengoshuutoku ni kansuru jikken-teki kenkyuu” (“An experimental study of preverbal infants’ speech perception and language acquisition”), Doctoral Dissertation, Tokyo Gakugei University, Tokyo, Japan.
  • Hayashi, A., Tamekawa, Y., and Kiritani, S. (2001). “Developmental change in auditory preferences for speech stimuli in Japanese infants,” J. Speech Lang. Hear. Res. 44, 1189–1200.
  • Horie, R., Hayashi, A., Shirasawa, K., and Mazuka, R. (2008). “Mother, I don’t really like the high-pitched, slow speech of Motherese: Crosslinguistic differences in infants’ reliance on different acoustic cues in infant directed speech,” XVIth Biennial International Conference on Infant Studies, Vancouver, Canada.
  • Ingram, D. (1995). “The cultural basis of prosodic modifications to infants and children: A response to Fernald’s universalist theory,” J. Child Lang. 22(1), 223–233.
  • Inoue, T., Nakagawa, R., Kondou, M., Koga, T., and Shinohara, K. (2011). “Discrimination between mothers’ infant- and adult-directed speech using hidden Markov models,” Neurosci. Res. 70, 62–70.
  • Jun, S.-A. (2005). Prosodic Typology: The Phonology of Intonation and Phrasing (Oxford University Press, New York).
  • Kitamura, C., Thanavishuth, C., Burnham, D., and Luksaneeyanawin, S. (2002). “Universality and specificity in infant-directed speech: Pitch modifications as a function of infant age and sex in a tonal and non-tonal language,” Infant Behav. Dev. 24, 372–392.
  • Kubozono, H. (1993). The Organization of Japanese Prosody (Kurosio Publishers, Tokyo).
  • Ladd, D. R. (1996). Intonational Phonology (Cambridge University Press, Cambridge).
  • Laniran, Y. O., and Clements, G. N. (2003). “Downstep and high raising: Interacting factors in Yoruba tone production,” J. Phonetics 31, 203–250.
  • Liu, H.-M., Tsao, F.-M., and Kuhl, P. K. (2007). “Acoustic analysis of lexical tone in Mandarin infant-directed speech,” Dev. Psychol. 43(4), 912–917.
  • Maekawa, K. (2003). “Corpus of Spontaneous Japanese: Its design and evaluation,” in Proceedings of ISCA and IEEE Workshop on Spontaneous Speech Processing and Recognition, Tokyo, pp. 7–12.
  • Maekawa, K., Kikuchi, H., Igarashi, Y., and Venditti, J. (2002). “X-JToBI: An extended J_ToBI for spontaneous speech,” in Proceedings of the 7th International Conference on Spoken Language Processing, Denver, CO, pp. 1545–1548.
  • Mazuka, R., Igarashi, Y., and Nishikawa, K. (2006). “Input for learning Japanese: RIKEN Japanese Mother-Infant Conversation Corpus,” Technical report of IEICE, TL2006-16 106(165), pp. 11–15.
  • McCann, J., and Peppe, S. (2003). “Prosody in autism spectrum disorders: A critical review,” Int. J. Lang. Commun. Disord. 38, 325–350.
  • Morton, E. S. (1977). “On the occurrence and significance of motivation-structural rules in some bird and mammal sounds,” Am. Nat. 111(981), 855–869.
  • Newman, R. S., and Hussain, I. (2006). “Changes in preference for infant-directed speech in low and moderate noise by 5- to 13-month-olds,” Infancy 10(1), 61–76.
  • Newport, E., Gleitman, H., and Gleitman, L. (1977). “Mother, I’d rather do it myself: Some effects and non-effects of maternal speech style,” in Talking to Children: Language Input and Acquisition, edited by C. E. Snow and C. A. Ferguson (Cambridge University Press, London), pp. 109–150.
  • Papousek, M., Papousek, H., and Symmes, D. (1991). “The meanings of melodies in motherese in tone and stress languages,” Infant Behav. Dev. 14(4), 415–440.
  • Pegg, J. E., Werker, J. F., and McLeod, P. J. (1992). “Preference for infant-directed over adult-directed speech: Evidence from 7-week-old infants,” Infant Behav. Dev. 15, 325–345.
  • Pierrehumbert, J. (1980). “The phonology and phonetics of English intonation,” Ph.D. dissertation, MIT.
  • Pierrehumbert, J., and Beckman, M. E. (1988). Japanese Tone Structure (MIT Press, Cambridge).
  • Pierrehumbert, J., and Hirschberg, J. (1990). “The meaning of intonational contours in the interpretation of discourse,” in Intentions in Communication, edited by P. Cohen, J. Morgan, and M. Pollack (MIT Press, Cambridge, MA), pp. 271–311.
  • Selkirk, E. O., and Tateishi, K. (1991). “Syntax and downstep in Japanese,” in Interdisciplinary Approaches to Language: Essays in Honor of S.-Y. Kuroda, edited by C. Georgopoulos and R. Ishihara (Kluwer Academic, Dordrecht), pp. 519–543.
  • Snow, C. E. (1977). “Mothers’ speech research: From input to interaction,” in Talking to Children: Language Input and Acquisition, edited by C. E. Snow and C. A. Ferguson (Cambridge University Press, London), pp. 31–49.
  • Soderstrom, M. (2007). “Beyond babytalk: Re-evaluating the nature and content of speech input to preverbal infants,” Dev. Rev. 27, 501–532.
  • Toda, S., Fogel, A., and Kawai, M. (1990). “Maternal speech to three-month-old infants in the United States and Japan,” J. Child Lang. 17(2), 279–294.
  • Trainer, L. J., and Zacharias, C. A. (1998). “Infants prefer higher-pitched singing,” Infant Behav. Dev. 21(4), 799–806.
  • Venditti, J. (2005). “The J_ToBI model of Japanese intonation,” in Prosodic Typology: The Phonology of Intonation and Phrasing, edited by S. A. Jun (Oxford University Press, New York), pp. 172–200.
  • Venditti, J., Maekawa, K., and Beckman, M. E. (2008). “Prominence marking in the Japanese intonation system,” in Handbook of Japanese Linguistics, edited by S. Miyagawa and M. Saito (Oxford University Press, New York), pp. 456–512.
  • Ward, G., and Hirschberg, J. (1985). “Implicating uncertainty: The pragmatics of fall-rise,” Language 61, 747–776.
  • Williams, C. E., and Stevens, K. N. (1972). “Emotions and speech: Some acoustical correlates,” J. Acoust. Soc. Am. 52, 1238–1250.