Chapter 1
Introduction

A method for accurately describing pathologies of the human voice in acoustic terms has long been sought. Rating scales of “roughness” or “breathiness” have been applied, but they are heavily rater-dependent. Ideally, pathological voices would be sampled and automatically analyzed in terms of model parameters, which could provide more objective ratings. This work mounted an extended effort to apply principles of electrical engineering and signal processing to the study of pathological human voices. Pathological vowels were analyzed via the source-filter model, producing objective parameters that define the voice and allow accurate re-synthesis. The model parameters notably included nonperiodic components, which are most prominent and defining in pathological voices. Nonperiodic components were expressed in terms of three parts: nonperiodic frequency modulation (FM), consisting of both high frequency period variation (HFPV) and low frequency period variation (tremor); nonperiodic amplitude modulation (AM), consisting of both high frequency shimmer and low frequency power variation; and aspiration noise. The integration of this three-component model with AM and FM demodulation techniques yielded a novel approach to the analysis and synthesis of pathological vowels. This work addressed the experimental question: “Can modeling nonperiodic components with AM, FM, and aspiration noise improve the accuracy of analysis and fidelity of synthesis of pathological vowels?” This question was addressed both by re-analysis of the synthetic signals and by subjective analysis-by-synthesis (SABS) experiments, in which a listener adjusts the model parameters of a synthesizer to produce a synthetic voice sample that matches the original voice as closely as possible.

In brief, the general process of voice analysis and modeling is displayed in Fig. 1.1 and consists of the following steps:

1. Pathological voice samples are recorded from patients.
2. The voice samples are analyzed into descriptive parameters that provide sufficient information to reconstruct them.
3. Parameter extraction is validated manually and modified where necessary.
4. Synthetic versions of the original voices are computed.
5. The synthetic versions are compared to the originals in perceptual experiments.

In this effort, the pathological voices used were selected from a range of disorders including vocal nodules, cancer, and lack of neural control. Pathological voices may result from a large variety of conditions; examples include cleft palate, deaf talkers, and dysarthria [21]. In this study of sustained pathological vowels, a major source of difference from the normal voice lies in the mechanism that generates the source driving function in the source-filter model of speech production (Fig. 1.2). In this model, the source is the time-varying airflow from the lungs as modulated by the vibrations of the vocal folds. In normal voices the glottal vibrations are rhythmic and produce abrupt closures of the vocal folds, which generate a steady fundamental frequency and excite the higher frequency resonances of the vocal tract; this produces vowels of a pleasing perceptual quality, which are deemed “normal.” In pathological voices, the physical structures of the glottis and their neural control mechanisms may be disrupted, producing irregular vibrations and slow or incomplete closure; this may result in voices that are perceived as abnormal. Terms such as “rough,” “breathy,” “creaky,” “gargled,” “hoarse,” or “raspy” may be applied to these.

1.1 Motivation

Research in modeling and synthesizing pathological vowels is motivated by at least two goals:

1. Objective analysis and parameterization of pathology in voices. Previous efforts [23] have expounded on the need for non-subjective measures of pathology in voices. For example, different clinicians rate the same voice differently on subjective scales of “breathiness” or “roughness.” Such ratings matter because they are used to evaluate costly medical procedures that are sometimes applied to improve voice quality. Objective measures provide a much-needed standard against which to measure the results of such efforts. In the ideal scenario, a fully automatic voice analysis system samples the pathological voice and establishes objective values for acoustic measures such as jitter, shimmer, tremor, volume variation, fundamental frequency variation, formant modulation, and other parameters. The validity of this automatic analysis would be confirmed by SABS experiments, in which a listener adjusts the synthesizer parameters until the synthetic sample matches the original voice as closely as possible, thus validating the automatically determined measures. The set of measures would then constitute a standard against which different voices, or the same voice at different times, could be compared.

2. Generation of high fidelity synthetic voice samples for use in perceptual studies. A second goal is to generate synthetic vowel samples matching the original as closely as possible. Once accurate synthetic versions of voices have been established, their analysis and synthesis parameters can be varied in SABS experiments to establish their perceptual effects. Accurate synthetic samples provide the starting point for several types of studies [17], including:
- Evaluation of the perceptual importance of each synthesis parameter.
- Measurement of variations in listener perception.
- Measurement of minimum perceivable changes in parameters (“difference limens”).

1.2 Background and Related Work

To place the current project in perspective with existing similar work, some related efforts are outlined here.
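The objective measures listed under goal 1 (jitter, shimmer, tremor) and the nonperiodic FM and AM components of the three-component model can be made concrete with a short sketch. The Python below is an illustration under stated assumptions, not the analysis procedure used in this work. It uses the common “local” definitions of jitter and shimmer (mean absolute difference between consecutive cycles, divided by the mean over all cycles), and a hypothetical per-cycle generator, `synthesize_cycles`, in which HFPV and shimmer are assumed to be independent Gaussian cycle-to-cycle perturbations while tremor and low frequency power variation are assumed to be slow sinusoids. All function names, parameter names, and values are illustrative assumptions; aspiration noise and the vocal tract filter are omitted.

```python
import math
import random


def _relative_mean_abs_diff(values: list[float]) -> float:
    """Mean absolute difference between consecutive values, divided by the mean value."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return (sum(diffs) / len(diffs)) / (sum(values) / len(values))


def local_jitter(periods: list[float]) -> float:
    """Local jitter: cycle-to-cycle period variation relative to the mean period."""
    return _relative_mean_abs_diff(periods)


def local_shimmer(amplitudes: list[float]) -> float:
    """Local shimmer: cycle-to-cycle peak-amplitude variation relative to the mean amplitude."""
    return _relative_mean_abs_diff(amplitudes)


def synthesize_cycles(n_cycles: int, f0_hz: float,
                      hfpv_std: float = 0.0, tremor_depth: float = 0.0, tremor_hz: float = 5.0,
                      shimmer_std: float = 0.0, power_depth: float = 0.0, power_hz: float = 3.0,
                      seed: int = 0) -> tuple[list[float], list[float]]:
    """Hypothetical per-cycle source model (illustration only).

    Nonperiodic FM: period = nominal period * (1 + slow sinusoidal tremor + random HFPV).
    Nonperiodic AM: amplitude = 1 + slow sinusoidal power variation + random shimmer.
    Returns (periods in seconds, relative peak amplitudes).
    """
    rng = random.Random(seed)
    nominal = 1.0 / f0_hz
    t = 0.0  # onset time of the current cycle, in seconds
    periods: list[float] = []
    amplitudes: list[float] = []
    for _ in range(n_cycles):
        fm = tremor_depth * math.sin(2.0 * math.pi * tremor_hz * t) + rng.gauss(0.0, hfpv_std)
        am = power_depth * math.sin(2.0 * math.pi * power_hz * t) + rng.gauss(0.0, shimmer_std)
        # Clamp so periods and amplitudes stay positive even for large perturbations.
        period = nominal * max(1.0 + fm, 0.1)
        amplitude = max(1.0 + am, 0.01)
        periods.append(period)
        amplitudes.append(amplitude)
        t += period  # the next cycle starts where this one ends
    return periods, amplitudes


if __name__ == "__main__":
    periods, amps = synthesize_cycles(300, f0_hz=120.0, hfpv_std=0.01, tremor_depth=0.03,
                                      shimmer_std=0.05, power_depth=0.1)
    print(f"jitter (local):  {100 * local_jitter(periods):.2f} %")
    print(f"shimmer (local): {100 * local_shimmer(amps):.2f} %")
```

Because the local measures compare only adjacent cycles, they are dominated by the cycle-to-cycle terms (HFPV, shimmer) and are comparatively insensitive to slow tremor or power variation; this is consistent with treating high and low frequency modulation as separate components.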
In regard to complete analysis/synthesis approaches for pathological voice study, the following are related:

Childers and Lee [5] describe a study in which voices containing vocal fry, falsetto, and breathiness are analyzed and synthesized. The two-step linear prediction (LP) analysis procedure for formant determination and inverse filtering (adopted by Norma Antonanzas for the current study, as described in Section 2.1) is used. Fundamental frequency determination is aided by electroglottography (EGG). The