UW-Madison ECE 533 - Automatic Tracing of Vocal Fold Edges in High Speed Laryngeal Imaging - D3004672

School name University of Wisconsin, Madison

Course ECE 533 - Image Processing

Automatic Tracing of Vocal Fold Edges in High Speed Laryngeal Imaging
Erik Bieging
ECE 533 Final Project

1. Introduction

Vibratory patterns of the vocal folds are of interest in the field of laryngeal physiology. In the case of vocal fold pathologies, irregular motion of the vocal folds is known to occur. To better understand how vocal pathologies affect vocal fold vibration, video data is needed to quantify the spatial and temporal dynamics of the vocal folds. As the main functional tissue in the larynx, the vocal folds open and close rapidly while under tension when air is forced past them. The opening between the vocal folds is known as the glottis (Figure 1). The vocal folds oscillate at 100 to 400 Hz during normal phonation. [3]

Figure 1: A typical laryngeal image obtained from an HSDI system, with the glottis and vocal folds designated.

Former methods for capturing this motion include videostroboscopy, in which a strobe light was used to illuminate the vocal folds and images were acquired at approximately 25 Hz. This method gave an approximate picture of vocal fold motion, but the sampling rate was far too low to reconstruct the actual motion of the vocal folds. Advances in technology have led to the use of high speed digital imaging (HSDI) systems for imaging the vocal folds. Images can be acquired at rates up to 4000 frames/sec. This sampling rate allows for complete capture of the vocal fold motion within an individual oscillation: at 4000 frames/sec, a 100 to 400 Hz vibration is sampled 10 to 40 times per cycle. However, due to the high rate of image acquisition, large amounts of image data must be analyzed. In a typical 1 to 4 second sampling period, several thousand images are acquired and need to be analyzed. The vocal fold edges must be traced systematically in each frame of the video so that the glottal area can be extracted. The glottal area waveform extracted from an image series can then be used to determine whether the vocal fold vibratory motion is normal or abnormal. [3]

2. Approach

The goal of this project is to implement and test a new method for the extraction of the vocal fold edges and the glottal area in high speed laryngeal images. This method is then compared to existing methods of edge extraction that are currently used in laryngeal physiology. The method was conceived by the author and other members of the UW-Laryngeal Physiology Lab.

2.1. Our New Method

The method developed has two main steps: differentiation and Canny edge detection. The high speed laryngeal images under investigation are all grayscale images, so each pixel has a single value denoting its intensity. The images are converted from Audio Video Interleave (.avi) files to 8-bit MATLAB images, and thus have the intensity range [0, 255]. The first step is to ensure that the glottis is oriented horizontally in the image frame, as it is in Figure 1. Each image in the video is then cropped to minimize the amount of unnecessary surrounding tissue in the image frame.

In most laryngeal images the significant regions are of relatively low intensity. High intensity pixels are normally caused by reflective areas on the vocal fold tissue due to moisture. To remove high intensities from the images, a user-defined threshold is applied to the image such that every pixel above the threshold is set to the threshold value, eliminating errors caused by high intensities. The default value of this threshold is 160.

After these preprocessing steps take place, each column of the image is processed individually. A second user-defined threshold is applied to the minimum value of each column. If the column's minimum value does not drop below this threshold, it is assumed that there is no glottal opening in the column, and no edge needs to be detected. This threshold is normally between 40 and 80.
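The two thresholding steps described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the project's MATLAB code: the function names, the list-of-rows image layout, the synthetic frame, and the column threshold of 60 (chosen from the stated 40 to 80 range) are all hypothetical; only the default clipping value of 160 comes from the text.

```python
def clip_highlights(image, clip_threshold=160):
    """Set every pixel above clip_threshold to clip_threshold."""
    return [[min(pixel, clip_threshold) for pixel in row] for row in image]


def columns_with_glottis(image, column_threshold=60):
    """Return indices of columns whose minimum intensity falls below
    column_threshold; only these columns are searched for edges."""
    n_cols = len(image[0])
    selected = []
    for col in range(n_cols):
        column_min = min(row[col] for row in image)
        if column_min < column_threshold:
            selected.append(col)
    return selected


# Tiny synthetic frame (hypothetical data): each inner list is one image row.
frame = [
    [120, 130, 250, 125],
    [110,  30, 140,  20],
    [115,  25, 135,  90],
    [125, 128, 200, 122],
]

clipped = clip_highlights(frame)
glottal_columns = columns_with_glottis(clipped)
print(clipped[0])       # [120, 130, 160, 125]
print(glottal_columns)  # [1, 3]
```

In this toy frame the bright reflections (250 and 200) are clipped to 160, and only columns 1 and 3 contain pixels dark enough to be treated as a possible glottal opening.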
If a column's minimum does fall below the threshold, a five-point differentiating filter is applied to the intensity values of that column. The filter smooths the intensity profile as it differentiates, which reduces the effect of noise. The filter's transfer function is as follows.

\[ H(z) = \frac{2z^{2} + z - z^{-1} - 2z^{-2}}{6} \]

The maximum and minimum of the differentiated column correspond to the two most rapid changes in image intensity in the column. These are assumed to be the vocal fold edges. Points lying between the maximum and minimum are assumed to be part of the glottis and are given the value 1; points lying outside are assumed to be vocal fold tissue and are given the value 0. After this process is applied to each column, a binary image is created (Figure 2).

\[ L(y) = \arg\min_{x} \frac{\partial I(x, y)}{\partial x}, \qquad R(y) = \arg\max_{x} \frac{\partial I(x, y)}{\partial x} \]

In the theoretical equations for determining the vocal fold edges, L(y) and R(y) are the positions of the two vocal fold edges in column y, and I(x, y) is the intensity image. The binary image is based on the following equation.

\[ I_{\mathrm{binary}}(x, y) = \begin{cases} 1, & L(y) < x < R(y) \\ 0, & x < L(y) \ \text{or} \ x > R(y) \end{cases} \]

Figure 2: The binary image found after the differentiation step has taken place.

This process alone gives a reasonable approximation of the vocal fold edge location. However, due to inconsistencies in the location of the maximum and minimum slope in intensity, the edge is very rough. The Canny edge detection method is used to smooth the vocal fold edges, using the implementation built into MATLAB. In Canny edge detection, an edge detection mask is applied to the image and two thresholds are used to determine the edge points. The first threshold is used to start an edge, and the second, lower threshold is used to continue an existing edge. This step both smooths the edge and eliminates some unwanted edge points when errors occur in the first step of the algorithm. In this way, an error in the differentiation step in a single column will not affect the final edge.
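The per-column differentiation step can be sketched as follows in plain Python. This is a hedged illustration rather than the project's code: the antisymmetric kernel (-2, -1, 0, 1, 2) with a divisor of 6 is an assumption about the filter's intended form (any positive scale factor leaves the positions of the extrema unchanged), and the function names, the zero-filled borders, and the synthetic column are hypothetical.

```python
# Assumed antisymmetric five-point kernel, taps for offsets -2..+2.
KERNEL = (-2, -1, 0, 1, 2)
SCALE = 6  # assumed normalization; does not move the extrema


def smoothed_derivative(column):
    """Five-point smoothed derivative of one image column.
    The first and last two samples are left at 0 (no full window there)."""
    n = len(column)
    deriv = [0.0] * n
    for x in range(2, n - 2):
        acc = 0
        for offset, weight in zip(range(-2, 3), KERNEL):
            acc += weight * column[x + offset]
        deriv[x] = acc / SCALE
    return deriv


def column_edges(column):
    """Return (L, R): positions of the most negative and most positive
    slope, taken as the two vocal fold edges in this column."""
    deriv = smoothed_derivative(column)
    left = min(range(len(deriv)), key=deriv.__getitem__)
    right = max(range(len(deriv)), key=deriv.__getitem__)
    return left, right


def binary_column(column):
    """1 strictly between the two edges (glottis), 0 elsewhere (tissue)."""
    left, right = column_edges(column)
    return [1 if left < x < right else 0 for x in range(len(column))]


# Synthetic column (hypothetical data): bright tissue, dark glottis, bright tissue.
column = [120, 120, 120, 120, 70, 20, 20, 20, 20, 70, 120, 120, 120, 120]

print(column_edges(column))   # (4, 9)
print(binary_column(column))  # [0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0]
```

Running the same function over every column that passed the minimum-intensity check and stacking the results column by column would give a binary image of the kind shown in Figure 2.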
The output after the Canny edge detection step is shown with the original image in Figure 3.

Figure 3: The result of the edge detection algorithm after Canny edge detection has taken place.

From the Canny edge detection algorithm, new functions denoting the glottal edges, L'(y) and R'(y), are obtained. To extract the glottal area from the frame, the glottal width function is first calculated as

\[ \mathrm{Wid}(y) = \left| L'(y) - R'(y) \right| \]

Although infrequent, errors can occur in the glottal edge detection that cause the width function to change drastically from one point to the next. To account for these drastic jumps in width, a filter is applied
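
The width function defined above can be sketched in plain Python as follows. The edge positions are hypothetical sample values, and approximating the glottal area as the sum of the column widths (in pixels) is an assumption made for illustration; the filter that suppresses jumps in width is not reproduced here.

```python
def glottal_width(left_edge, right_edge):
    """Wid(y) = |L'(y) - R'(y)| for each column y."""
    return [abs(l - r) for l, r in zip(left_edge, right_edge)]


def glottal_area(widths):
    """Approximate the glottal area in pixels as the sum of the column
    widths (an assumption, not stated in the text)."""
    return sum(widths)


# Hypothetical edge positions L'(y) and R'(y) for six columns.
left_edge = [10, 9, 8, 8, 9, 10]
right_edge = [10, 12, 14, 14, 12, 10]

widths = glottal_width(left_edge, right_edge)
print(widths)                # [0, 3, 6, 6, 3, 0]
print(glottal_area(widths))  # 18
```

Repeating this for every frame would yield one area value per frame, which is the form a glottal area waveform takes.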
