Situation-Based Intonation Pattern Distribution in a Corpus of American English
Ph.D. dissertation, 2005
Abstract
Intonation varies considerably in spoken language,
making it difficult to characterize consistently and thoroughly. It is
equally difficult to generate contextually appropriate intonation for
synthetic speech. This dissertation examines intonational variation in
different speech situations and demonstrates that the distributions of
intonation features and patterns vary systematically with the
situational context, or register domain.
More than 9,000 utterances were annotated with ToBI
labels indicating pitch accents, intermediate phrases, and boundary
tones. The distributional characteristics of eight intonation variables
were analyzed for systematic variation corresponding to register
domain:
- Boundary tone
- Initial pitch accent tone
- Simple vs. complex initial pitch accent
- Pitch frame
- Phrase offset contour
- Pitch accent quantity
- Intermediate phrase quantity
- Tone contour type and token
The register domains were evaluated as a
group and in pairs on each intonational measure. A significant effect
for register was found for every measure at the group level and in more
than half of the register domain pairs. These results confirm for
intonation what has already been demonstrated through analyses of
lexical and grammatical characteristics of other aspects of language: a
register is distinguished by a constellation of features and their
relative distributions. One important result is the lack of persistent
similarities or differences between register domains. Some register
domains systematically differed along several or all dimensions while
others behaved similarly along some dimensions and diverged on others.
There appears to be no single continuum along which registers can be
arranged to explain their complex interrelationships. This last result
has implications for orienting the speaking situations themselves and
suggests that they are as multidimensional as the linguistic features
that characterize them. Another noteworthy finding was evidence of tone
selection dependencies at different levels of the corpus. Some
dependencies were observed throughout the corpus while others appeared
to operate within particular register domains.
Overall, the results of this study suggest that a
general model of intonation probably glosses over a range of
significant situation-based intonational behavior. Fortifying existing
(text-based) multidimensional analyses with prosodic features will
sharpen our understanding of the relationship between linguistic
variability and situational factors.
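The group and pairwise comparisons described above can be sketched in a few lines. The abstract does not name the statistical test that was used, so the chi-square test of independence below is an assumption, and the register names and boundary tone counts are invented for illustration only.

```python
from itertools import combinations

# Hypothetical boundary tone counts per register domain (invented data).
counts = {
    "conversation": {"L%": 120, "H%": 80},
    "news":         {"L%": 150, "H%": 30},
    "instructions": {"L%": 90,  "H%": 60},
}

def chi_square(table):
    """Return (statistic, degrees of freedom) for a dict-of-dicts contingency table."""
    rows = list(table)
    cols = sorted({c for r in rows for c in table[r]})
    row_tot = {r: sum(table[r].get(c, 0) for c in cols) for r in rows}
    col_tot = {c: sum(table[r].get(c, 0) for r in rows) for c in cols}
    total = sum(row_tot.values())
    stat = 0.0
    for r in rows:
        for c in cols:
            expected = row_tot[r] * col_tot[c] / total
            stat += (table[r].get(c, 0) - expected) ** 2 / expected
    return stat, (len(rows) - 1) * (len(cols) - 1)

# Group-level test: all register domains at once.
group_stat, group_df = chi_square(counts)

# Pairwise tests: every pair of register domains on the same measure.
pair_stats = {
    (a, b): chi_square({a: counts[a], b: counts[b]})
    for a, b in combinations(counts, 2)
}
```

In this toy table, "conversation" and "instructions" have identical tone proportions, so their pairwise statistic is zero even though the group-level statistic is large. That mirrors the pattern reported above, where some register pairs diverge on a measure and others do not.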
Microsoft Word document (2.1 MB)
PDF (1.1 MB)
For bibliographical reference, use:
Carmichael, Lesley. (2005). Situation-Based Intonation Pattern
Distribution in a Corpus of American English. Ph.D. dissertation,
University of Washington.
Prosodic Fortification in Error Resolution
Abstract
People make changes in their speech along various
dimensions when attempting to resolve speech or information recognition
errors by a computing device. The prosody of an utterance can be
manipulated in a manner analogous to the hyperarticulation of speech
previously reported in spoken error resolution (Oviatt et al., 1998).
Prosodic fortification is defined as categorically changing or
augmenting the prosody of an utterance. Prosodic fortification
strategies include making words prominent at the phrase level, using
phrasal boundary elements to group words together, and changing the
tonal features associated with a pitch accent or phrase boundary. The
present research investigates speakers' use of prosodic fortification
strategies to resolve recognition errors occurring when working with a
multimodal computing interface. A total of 103 utterance pairs with identical
lexical content before and after receiving an error message were
subjected to phonological prosodic analysis (using the Tones and Break
Indices (ToBI) markup system [Silverman et al., 1992]). For each of these
verbatim matched pairs, the original utterance provided a baseline for
evaluating the prosodic events and features of the repeated utterance.
The results show that speakers fortify prosodic structure in error
resolution using various strategies, including adding new
pitch accents and breaks, augmenting the complexity of pitch accents,
and increasing the strength of breaks. Each verified strategy emerges
as an independently robust method for differentiating repeated speech
from original input. Future work may investigate a possible weighting
of prosodic fortification strategies to explore their dominance and
interaction. Phonetic data-driven methods should also be implemented to
better understand prosodic fortification in speech. Implications for
speech technologies are discussed, including using prosodic cues to
signal a speech change to a hyperarticulated register.
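The matched-pair comparison described above can be sketched as follows. This is a minimal illustration, not the procedure used in the study: the `Word` record, the `fortification_tally` function, the example utterance, and the decision to treat break indices of 3 or higher as phrasal breaks and bitonal labels (those containing "+") as complex accents are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Word:
    text: str
    accent: Optional[str]  # ToBI pitch accent label, e.g. "H*" or "L+H*"; None if unaccented
    break_index: int       # ToBI break index following the word (0-4)

def is_complex(accent):
    """Assumption: a bitonal accent (label containing '+') counts as complex."""
    return accent is not None and "+" in accent

def fortification_tally(original, repeat):
    """Count fortification strategies in a repeat relative to its original."""
    if [w.text for w in original] != [w.text for w in repeat]:
        raise ValueError("pair is not a verbatim match")
    tally = {"added_accents": 0, "more_complex_accents": 0,
             "added_breaks": 0, "stronger_breaks": 0}
    for before, after in zip(original, repeat):
        if before.accent is None and after.accent is not None:
            tally["added_accents"] += 1
        elif (before.accent is not None and not is_complex(before.accent)
              and is_complex(after.accent)):
            tally["more_complex_accents"] += 1
        # Assumption: break indices 3 and 4 mark phrasal breaks.
        if before.break_index < 3 <= after.break_index:
            tally["added_breaks"] += 1
        elif before.break_index >= 3 and after.break_index > before.break_index:
            tally["stronger_breaks"] += 1
    return tally

# Invented example pair: the original utterance is the baseline for the repeat.
original = [Word("show", None, 1), Word("me", None, 1), Word("hotels", "H*", 4)]
repeat = [Word("show", "H*", 3), Word("me", None, 1), Word("hotels", "L+H*", 4)]
tally = fortification_tally(original, repeat)
```

Because the lexical content is identical, every difference in the tally can be attributed to prosody rather than wording, which is the reason for restricting the analysis to verbatim pairs.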
PowerPoint presentation
For bibliographical reference, use:
Carmichael, Lesley. (2004). Prosodic fortification in error resolution. Presented
at the Symposium in Computational Linguistics sponsored by the
University of Washington Dept. of Linguistics, the UW Dept. of
Germanics, and UW alumni at Microsoft, January 23, 2004.
Intonation: Categories and Continua
Abstract
Prosody pervades all aspects of a speech signal, both
in terms of raw acoustic outcomes and linguistically meaningful units,
from the phoneme to the discourse unit. It is carried in the
suprasegmental features of fundamental frequency, loudness, and
duration. Several models have been developed to account for the way
prosody organizes speech, and they vary widely in terms of their
theoretical assumptions, organizational primitives, actual procedures
of application to speech, and intended use. In many cases, these models
disagree with regard to their fundamental premises or their
identification of the perceptible objects of linguistic prosody. One
fundamental division among models is whether they evaluate intonation
events phonetically or phonologically. Phonetic models deal with
continuous acoustic information whereas phonological models view
intonation behavior as the outcome of discrete intonation events.
Phonetic models thus quantitatively evaluate the movement and
transitions of intonation features while phonological models use
qualitative descriptions of level tone targets or tonal shapes of
intonation events. Another critical distinction among intonation
theories and models is their understanding of intonation events as a
linear sequence of exclusive components or the result of layered,
potentially overlapping elements. Both phonetic and phonological models
can also be linear or layered models. While intonation models differ in
many ways, the assumptions and application methods of each model
implicitly (if not explicitly) specify whether it is phonetic or
phonological, linear or layered. Axes representing opposing theoretical
foundation pairs (phonetic-phonological and linear-layered) can be
positioned in a two-dimensional space, creating a grid on which
intonation models can be located in relation to one another. In this
paper, models of each type (phonetic-linear, phonetic-layered,
phonological-linear, phonological-layered) are directly compared. Each
model is applied to the same speech samples. These parallel analyses
allow for an inspection of each model type and its efficacy in
assessing the suprasegmental behavior of the speech. The analyses
illustrate how different approaches are better equipped to account for
different aspects of prosody. Viewing the models and their successes
from an objective perspective allows for creative possibilities in
terms of combining strengths from models which might otherwise be
considered fundamentally incompatible.
PDF
For bibliographical reference, use:
Carmichael, Lesley. (2003). Intonation: categories and continua. Paper presented at
the 19th Northwest Linguistics Conference, March 1-2, 2003, Victoria,
BC, Canada.
Developing a Corpus of Spoken Language Variability
Abstract
We are developing a novel, searchable corpus as a research tool for
investigating phonetic and phonological phenomena across various speech
styles. Five speech styles have been well studied independently in
previous work: reduced (casual), corrective (hyperarticulated), careful
(word list in carrier), Lombard effect (speech in noise), and Motherese
(child-directed speech). Few studies to date have collected a wide
range of styles from a single set of speakers, and fewer yet have
provided publicly available corpora. The pilot corpus includes
recordings of (1) a set of speakers participating in a variety of tasks
designed to elicit the five speech styles, and (2) casual peer
conversations and wordlists to illustrate regional vowels. The data
include high-quality recordings and time-aligned transcriptions linked
to text files that can be queried. Initial measures drawn from the
database provide comparison across speech styles along the following
acoustic dimensions: MLU (changes in unit duration); relative
intra-speaker intensity changes (mean and dynamic range); and
intra-speaker pitch values (maximum, minimum, mean, range). The corpus
design will allow for a variety of analyses requiring control of
demographic and stylistic factors, including hyperarticulation variety,
disfluencies, intonation, discourse analysis, and detailed spectral
measures.
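The intra-speaker pitch measures listed above (maximum, minimum, mean, range) are simple to compute once a pitch track is available. The sketch below assumes one speaker's fundamental frequency values in Hz per speech style, with None marking unvoiced frames; the style names follow the abstract, but the numbers and the `summarize_pitch` helper are invented for illustration and do not describe the corpus tools.

```python
def summarize_pitch(f0_track):
    """Return max, min, mean, and range over the voiced frames of a pitch track."""
    voiced = [f0 for f0 in f0_track if f0 is not None]
    if not voiced:
        raise ValueError("no voiced frames in track")
    lowest, highest = min(voiced), max(voiced)
    return {
        "max": highest,
        "min": lowest,
        "mean": sum(voiced) / len(voiced),
        "range": highest - lowest,
    }

# Hypothetical pitch tracks (Hz) for one speaker in two speech styles.
tracks = {
    "reduced":    [110, None, 120, 130],
    "corrective": [120, 160, None, 200],
}

# One summary per style, so styles can be compared within the same speaker.
summaries = {style: summarize_pitch(track) for style, track in tracks.items()}
```

Keeping the summaries keyed by style within a single speaker reflects the intra-speaker design: differences between rows are differences of style, not of speaker.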
Poster PDF
For bibliographical reference, use:
Carmichael, Lesley, Richard Wright, and Alicia Beckford Wassink. (2003). Developing
a corpus of spoken language variability. Poster presented at the 146th
Acoustical Society of America conference, November 10-14, 2003, Austin,
TX.
Second Language Acquisition of Suprasegmental Phonology
Abstract
New questions need to be asked in second language acquisition (L2A)
research: Can L2ers have differential success in acquiring the
segmental vs. suprasegmental phonology of an L2? That is, are segmental
and suprasegmental phonologies independent aspects of phonological
acquisition? If so, are segmental and suprasegmental acquisition
necessarily constrained according to the same age-based schedule?
Existing L2A research indicates a
natural division between segmental and suprasegmental phonology in
acquisition. I propose that L2ers do achieve differential
pronunciation success in the segmental and suprasegmental components of
speech as a function of the age-related characteristics of their L2A
experience. Specifically, suprasegmental acquisition success in an L2
requires an earlier start in life than segmental acquisition success.
Microsoft Word document
For bibliographical reference, use:
Carmichael, Lesley. (2002). Second Language Acquisition of
Suprasegmental Phonology. Unpublished ms., University of Washington
Department of Linguistics.
http://students.washington.edu/lesley/projects.html.
Measurable Degrees of Foreign Accent: A Correlational Study of Perception, Production, and Acquisition
Abstract
Most people who learn a second language (L2) in late adolescence or
adulthood retain some degree of foreign accent. This study investigates
whether measurable intra-factor variability in the speech signal
correlates with subjective ratings of degree of foreign accentedness,
and whether there is an allowable deviation from native speaker norms
corresponding to ratings of little or no accent. Native Korean speakers with varying degrees of English L2
proficiency and native English controls provided the stimuli. Tokens
each contained one of three differences between Korean and English
phonologies, potentially creating opportunities for accented speech:
English contrasts between [l] and [ɹ] (American English "r"); between
tense and lax vowels; and the presence of consonant clusters.
Monolingual native English respondents indicated the degree of foreign
accent they perceived. The stimuli recordings were subjected to
acoustic analysis. Intra-factor variability measurements correlated
with subjective perceptions of degree of foreign accent, implying that
listeners are sensitive to specific, measurable variances in the
acoustic signal. Consonant cluster avoidance correlated strongly with
perceptions of foreign accent, followed by [l]-[ɹ] distance, and then
vowel quality. Together with the measurement data, the ratings indicate
that native listeners accept ranges of acoustic performance for
different degrees of accentedness.
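The correlational step described above can be sketched as follows. The abstract does not state which correlation statistic was used, so the Pearson coefficient here is an assumption, and the per-speaker measurements and ratings are invented solely to show the computation; they are not results from the thesis.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical mean accent ratings for five speakers (1 = no accent, 5 = strong).
ratings = [1, 2, 3, 4, 5]

# Hypothetical per-speaker acoustic measurements for the three factors.
measurements = {
    "cluster_avoidance": [0.0, 0.1, 0.2, 0.3, 0.4],
    "liquid_distance":   [5, 3, 4, 2, 1],
    "vowel_quality":     [2, 1, 4, 3, 5],
}

correlations = {name: pearson(values, ratings) for name, values in measurements.items()}

# Rank factors by strength of association, ignoring the sign of the correlation.
ranking = sorted(correlations, key=lambda name: abs(correlations[name]), reverse=True)
```

Ranking by absolute value matters because some measures may fall as accentedness rises (for instance, a smaller acoustic distance between two sounds) while others rise with it.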
Microsoft Word document
For bibliographical reference, use:
Carmichael, Lesley. (2000). Measurable Degrees of Foreign Accent: A
Correlational Study of Perception, Production, and Acquisition.
Unpublished M.A. Thesis, University of Washington.
This document may also be referred to as:
Carmichael, Lesley. (2000). Acoustic variability and perceived foreign
accent. Poster presented at the 140th meeting of the Acoustical Society
of America, Orange County, CA, December, 2000.
The Influence of Paralanguage on Intonational Complexity
Abstract
Suprasegmental features such as pitch, duration and
loudness are used to signal grammatical units and indicate the
relationship between linguistic objects and their relative importance.
These features (pitch in particular) can be organized to form
categorical expressions of linguistic structure. Paralanguage uses
these same suprasegmental features to express the situational context,
interpersonal interaction among interlocutors, and the affective state
of the speaker. Language and paralanguage therefore yield amalgamated
acoustic-phonetic outcomes. Intonational taxonomies attempt to organize
suprasegmental feature behavior into meaningful linguistic categories,
often including contextual references. While it has long been observed
that acoustic outcomes such as expanded pitch range and increased
loudness signal emotional states such as anger and surprise, a
connection between pitch fluctuation and paralanguage has not yet been
concretely investigated. A survey of seven languages shows an
interesting relationship between tonal shapes (and their temporal
distribution) and paralinguistic influence. Simple tones and tunes are
typically used for more basic communicative objectives, while complex
tone shapes and tunes are typically reserved for more pragmatically
and/or paralinguistically complicated messages.
Microsoft Word document
For bibliographical reference, use:
Carmichael, Lesley. (2001). The Influence of Paralanguage on Intonational
Complexity. Unpublished ms., University of Washington Department of
Linguistics. http://students.washington.edu/lesley/projects.html.
Spontaneous Speech and the Rhythm Rule
Abstract
The Rhythm Rule in English, a phonological
process to alleviate stress clash (Hayes, 1984), has been primarily
investigated in controlled speech. This study examines the Rhythm Rule
in spontaneous speech to determine its robustness in a natural,
spontaneous intonation environment. Two hypotheses about the effect of
intonational phonology on the realization of the Rhythm Rule are
considered: Hypothesis 1-The Rhythm Rule is a prominent phonological
process and is always indicated by acoustically salient features on the
target phrase. Hypothesis 2-The Rhythm Rule is subject to intonation
forces and its realization is dependent upon its placement in the
intonation contour. Pitch, duration, and amplitude were measured on all
syllables potentially involved in the stress clash. Duration proved to
be the most consistent acoustic cue to stress placement when evaluating
the target phrases for evidence of stress clash resolution via the
Rhythm Rule. When the target phrases were of some topical prominence in
spontaneous utterances, clash resolution was most commonly achieved by
reversing the stress of the first and third syllables of the first word
in the target phrase. When the target phrase occurred late in an
utterance, the acoustic features of pitch and duration provided some
evidence for the accent deletion analysis of the Rhythm Rule.
Crucially, this study shows that even when the lexically stressed
syllables are not particularly prominent at the utterance level, stress
clash is resolved. It is clear that the intonation patterns of English
have some effect on the realization of the Rhythm Rule because the rule
is not invariantly applied in all intonational contexts; however, these
results provide support for Hypothesis 1.
Microsoft Word document
For bibliographical reference, use:
Carmichael, Lesley. (2001). Spontaneous speech and the rhythm rule. Poster
presented at the 141st meeting of the Acoustical Society of America,
Chicago, IL, June, 2001.