Lesley M. Carmichael

Situation-Based Intonation Pattern Distribution in a Corpus of American English
Ph.D. dissertation, 2005

Abstract
   Intonation varies considerably in spoken language, making it difficult to characterize consistently and thoroughly. It is equally difficult to generate contextually appropriate intonation for synthetic speech. This dissertation examines intonational variation in different speech situations and demonstrates that the distributions of intonation features and patterns vary systematically with the situational context, or register domain.
   More than 9,000 utterances were annotated with ToBI labels indicating pitch accents, intermediate phrases, and boundary tones. The distributional characteristics of eight intonation variables were analyzed for systematic variation corresponding to register domain.

  1. Boundary tone
  2. Initial pitch accent tone
  3. Simple vs. complex initial pitch accent
  4. Pitch frame
  5. Phrase offset contour
  6. Pitch accent quantity
  7. Intermediate phrase quantity
  8. Tone contour type and token
The register domains were evaluated as a group and in pairs on each intonational measure. A significant effect for register was found for every measure at the group level and in more than half of the register domain pairs. These results confirm for intonation what has already been demonstrated through analyses of lexical and grammatical characteristics of other aspects of language: a register is distinguished by a constellation of features and their relative distributions.
   One important result is the lack of persistent similarities or differences between register domains. Some register domains systematically differed along several or all dimensions while others behaved similarly along some dimensions and diverged on others. There appears to be no single continuum along which registers can be arranged to explain their complex interrelationships. This last result has implications for orienting the speaking situations themselves and suggests that they are as multidimensional as the linguistic features that characterize them. Another noteworthy finding was evidence of tone selection dependencies at different levels of the corpus. Some dependencies were observed throughout the corpus while others appeared to operate within particular register domains.
   The results of this study overall suggest that a general model of intonation probably glosses over a range of significant situation-based intonational behavior. Fortifying existing (text-based) multidimensional analyses with prosodic features will sharpen our understanding of the relationship between linguistic variability and situational factors.

Microsoft Word document (2.1 MB)
PDF (1.1 MB)
For bibliographical reference, use:
Carmichael, Lesley. (2005). Situation-Based Intonation Pattern Distribution in a Corpus of American English. Ph.D. dissertation, University of Washington.

Prosodic Fortification in Error Resolution
Abstract
   People make changes in their speech along various dimensions when attempting to resolve speech or information recognition errors by a computing device. The prosody of an utterance can be manipulated in a manner analogous to the hyperarticulation of speech previously reported in spoken error resolution (Oviatt et al., 1998). Prosodic fortification is defined as categorically changing or augmenting the prosody of an utterance. Prosodic fortification strategies include making words prominent at the phrase level, using phrasal boundary elements to group words together, and changing the tonal features associated with a pitch accent or phrase boundary. The present research investigates speakers' use of prosodic fortification strategies to resolve recognition errors occurring when working with a multimodal computing interface. A total of 103 utterance pairs with identical lexical content, produced before and after the speaker received an error message, were subjected to phonological prosodic analysis (using the Tones and Break Indices markup system [Silverman et al., 1992]). For each of these verbatim matched pairs, the original utterance provided a baseline for evaluating the prosodic events and features of the repeated utterance. The results show that speakers fortify prosodic structure in error resolution using various strategies, including adding new pitch accents and breaks, augmenting the complexity of pitch accents, and increasing the strength of breaks. Each verified strategy emerges as an independently robust method for differentiating repeated speech from original input. Future work may investigate a possible weighting of prosodic fortification strategies to explore their dominance and interaction. Phonetic data-driven methods should also be implemented to better understand prosodic fortification in speech. Implications for speech technologies are discussed, including using prosodic cues to signal a speech change to a hyperarticulated register.

PowerPoint presentation
For bibliographical reference, use:
Carmichael, Lesley. (2004). Prosodic fortification in error resolution. Presented at the Symposium in Computational Linguistics sponsored by the University of Washington Dept. of Linguistics, the UW Dept. of Germanics, and UW alumni at Microsoft, January 23, 2004.

Intonation: Categories and Continua
Abstract
   Prosody pervades all aspects of a speech signal, both in terms of raw acoustic outcomes and linguistically meaningful units, from the phoneme to the discourse unit. It is carried in the suprasegmental features of fundamental frequency, loudness, and duration. Several models have been developed to account for the way prosody organizes speech, and they vary widely in terms of their theoretical assumptions, organizational primitives, actual procedures of application to speech, and intended use. In many cases, these models disagree with regard to their fundamental premises or their identification of the perceptible objects of linguistic prosody. One fundamental division among models is whether they evaluate intonation events phonetically or phonologically. Phonetic models deal with continuous acoustic information whereas phonological models view intonation behavior as the outcome of discrete intonation events. Phonetic models thus quantitatively evaluate the movement and transitions of intonation features while phonological models use qualitative descriptions of level tone targets or tonal shapes of intonation events. Another critical distinction among intonation theories and models is whether they understand intonation events as a linear sequence of exclusive components or as the result of layered, potentially overlapping elements. Both phonetic and phonological models can be either linear or layered. While intonation models differ in many ways, the assumptions and application methods of each model implicitly (if not explicitly) specify whether it is phonetic or phonological, linear or layered. Axes representing opposing theoretical foundation pairs (phonetic-phonological and linear-layered) can be positioned in a two-dimensional space, creating a grid on which intonation models can be located in relation to one another. In this paper, models of each type (phonetic-linear, phonetic-layered, phonological-linear, phonological-layered) are directly compared.
Each model is applied to the same speech samples. These parallel analyses allow for an inspection of each model type and its efficacy in assessing the suprasegmental behavior of the speech. The analyses illustrate how different approaches are better equipped to account for different aspects of prosody. Viewing the models and their successes from an objective perspective allows for creative possibilities in terms of combining strengths from models which might otherwise be considered fundamentally incompatible.

PDF
For bibliographical reference, use:
Carmichael, Lesley. (2003). Intonation: categories and continua. Paper presented at the 19th Northwest Linguistics Conference, March 1-2, 2003, Victoria, BC, Canada.

Developing a Corpus of Spoken Language Variability
Abstract
We are developing a novel, searchable corpus as a research tool for investigating phonetic and phonological phenomena across various speech styles. Five speech styles have been well studied independently in previous work: reduced (casual), corrective (hyperarticulated), careful (word list in carrier), Lombard effect (speech in noise), and Motherese (child-directed speech). Few studies to date have collected a wide range of styles from a single set of speakers, and fewer yet have provided publicly available corpora. The pilot corpus includes recordings of (1) a set of speakers participating in a variety of tasks designed to elicit the five speech styles, and (2) casual peer conversations and wordlists to illustrate regional vowels. The data include high-quality recordings and time-aligned transcriptions linked to text files that can be queried. Initial measures drawn from the database provide comparison across speech styles along the following acoustic dimensions: MLU (changes in unit duration); relative intra-speaker intensity changes (mean and dynamic range); and intra-speaker pitch values (maximum, minimum, mean, range). The corpus design will allow for a variety of analyses requiring control of demographic and stylistic factors, including hyperarticulation variety, disfluencies, intonation, discourse analysis, and detailed spectral measures.
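
As a rough illustration of the intra-speaker pitch measures listed above (maximum, minimum, mean, range), the sketch below summarizes fundamental-frequency samples for one speaker in two speech styles. The function name `pitch_summary`, the convention that 0 marks an unvoiced frame, and all sample values are assumptions made for this example; they are not drawn from the corpus or its tools.

```python
from statistics import mean


def pitch_summary(f0_hz):
    """Summarize F0 samples (Hz) for one speaker in one speech style."""
    # Assumption: a value of 0 marks an unvoiced frame and is skipped.
    voiced = [f for f in f0_hz if f > 0]
    if not voiced:
        raise ValueError("no voiced frames to summarize")
    lowest, highest = min(voiced), max(voiced)
    return {
        "max": highest,
        "min": lowest,
        "mean": mean(voiced),
        "range": highest - lowest,
    }


# Invented values for illustration only, not data from the corpus.
casual = [0, 110, 120, 130, 0, 120]
corrective = [0, 140, 180, 220, 0, 180]

summaries = {
    "casual": pitch_summary(casual),
    "corrective": pitch_summary(corrective),
}
```

Comparing the entries of `summaries` for the same speaker gives one simple view of how pitch range and mean shift from one style to another.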

Poster PDF
For bibliographical reference, use:
Carmichael, Lesley, Richard Wright, and Alicia Beckford Wassink. (2003). Developing a corpus of spoken language variability. Poster presented at the 146th Acoustical Society of America conference, November 10-14, 2003, Austin, TX.

Second Language Acquisition of Suprasegmental Phonology
Abstract
New questions need to be asked in second language acquisition (L2A) research: Can L2ers have differential success in acquiring the segmental vs. suprasegmental phonology of an L2? That is, are segmental and suprasegmental phonologies independent aspects of phonological acquisition? If so, are segmental and suprasegmental acquisition necessarily constrained according to the same age-based schedule? Existing work in the L2A literature indicates a natural division between segmental and suprasegmental phonology in acquisition. I propose that L2ers do achieve differential pronunciation success in the segmental and suprasegmental components of speech as a function of the age-related characteristics of their L2A experience. Specifically, suprasegmental acquisition success in an L2 requires an earlier start in life than segmental acquisition success.

Microsoft Word document
For bibliographical reference, use:
Carmichael, Lesley. (2002). Second Language Acquisition of Suprasegmental Phonology. Unpublished ms., University of Washington Department of Linguistics. http://students.washington.edu/lesley/projects.html.

Measurable Degrees of Foreign Accent: A Correlational Study of Perception, Production, and Acquisition.
Abstract
Most people who learn a second language (L2) in late adolescence or adulthood retain some degree of foreign accent. This study investigates whether measurable intra-factor variability in the speech signal correlates with subjective ratings of degree of foreign accentedness, and whether there is an allowable deviation from native speaker norms corresponding to ratings of little or no accent. Native Korean speakers with varying degrees of English L2 proficiency and native English controls provided the stimuli. Tokens each contained one of three differences between Korean and English phonologies, potentially creating opportunities for accented speech: English contrasts between [l] and [ɹ] (American English "r"); between tense and lax vowels; and the presence of consonant clusters. Monolingual native English respondents indicated the degree of foreign accent they perceived. The stimulus recordings were subjected to acoustic analysis. Intra-factor variability measurements correlated with subjective perceptions of degree of foreign accent, implying that listeners are sensitive to specific, measurable variances in the acoustic signal. Consonant cluster avoidance correlated strongly with perceptions of foreign accent, followed by [l]-[ɹ] distance, and then vowel quality. Together with the measurement data, the ratings indicate that native listeners accept ranges of acoustic performance for different degrees of accentedness.

Microsoft Word document
For bibliographical reference, use:
Carmichael, Lesley. (2000). Measurable Degrees of Foreign Accent: A Correlational Study of Perception, Production, and Acquisition. Unpublished M.A. Thesis, University of Washington.

This document may also be referred to as:
Carmichael, Lesley. (2000). Acoustic variability and perceived foreign accent. Poster presented at the 140th meeting of the Acoustical Society of America, Orange County, CA, December, 2000.

The Influence of Paralanguage on Intonational Complexity
Abstract
   Suprasegmental features such as pitch, duration and loudness are used to signal grammatical units and indicate the relationship between linguistic objects and their relative importance. These features (pitch in particular) can be organized to form categorical expressions of linguistic structure. Paralanguage uses these same suprasegmental features to express the situational context, interpersonal interaction among interlocutors, and the affective state of the speaker. Language and paralanguage therefore yield amalgamated acoustic-phonetic outcomes. Intonational taxonomies attempt to organize suprasegmental feature behavior into meaningful linguistic categories, often including contextual references. While it has long been observed that acoustic outcomes such as expanded pitch range and increased loudness signal emotional states such as anger and surprise, a connection between pitch fluctuation and paralanguage has not yet been concretely investigated. A survey of seven languages shows an interesting relationship between tonal shapes (and their temporal distribution) and paralinguistic influence. Simple tones and tunes are typically used for more basic communicative objectives, while complex tone shapes and tunes are typically reserved for more pragmatically and/or paralinguistically complicated messages.

Microsoft Word document
For bibliographical reference, use:
Carmichael, Lesley. (2001). The Influence of Paralanguage on Intonational Complexity. Unpublished ms., University of Washington Department of Linguistics. http://students.washington.edu/lesley/projects.html.

Spontaneous Speech and the Rhythm Rule.
Abstract
   The Rhythm Rule in English, a phonological process to alleviate stress clash (Hayes, 1984), has been primarily investigated in controlled speech. This study examines the Rhythm Rule in spontaneous speech to determine its robustness in a natural, spontaneous intonation environment. Two hypotheses about the effect of intonational phonology on the realization of the Rhythm Rule are considered. Hypothesis 1: The Rhythm Rule is a prominent phonological process and is always indicated by acoustically salient features on the target phrase. Hypothesis 2: The Rhythm Rule is subject to intonation forces and its realization is dependent upon its placement in the intonation contour. Pitch, duration, and amplitude were measured on all syllables potentially involved in the stress clash. Duration proved to be the most consistent acoustic cue to stress placement when evaluating the target phrases for evidence of stress clash resolution via the Rhythm Rule. When the target phrases were of some topical prominence in spontaneous utterances, clash resolution was most commonly achieved by reversing the stress of the first and third syllables of the first word in the target phrase. When the target phrase occurred late in an utterance, the acoustic features of pitch and duration provided some evidence for the accent deletion analysis of the Rhythm Rule. Crucially, this study shows that even when the lexically stressed syllables are not particularly prominent at the utterance level, stress clash is resolved. It is clear that the intonation patterns of English have some effect on the realization of the Rhythm Rule, because the rule is not invariantly applied in all intonational contexts; however, these results provide support for Hypothesis 1.

Microsoft Word document
For bibliographical reference, use:
Carmichael, Lesley. (2001). Spontaneous speech and the rhythm rule. Poster presented at the 141st meeting of the Acoustical Society of America, Chicago, IL, June, 2001.
