Optical Character Recognition using Neural Networks
(ECE 539 Project Report)

Deepayan Sarkar
Department of Statistics
University of Wisconsin – Madison
UW ID: [email protected]
18, 2003

Contents

1 Introduction
  1.1 Software choices
2 Segmentation
3 Feature Extraction
4 Classification
5 Limitations
6 Results
7 Comparison with existing methods
8 Discussion

1 Introduction

The goal of my project is to create an application interface for Optical Character Recognition (OCR) that uses an Artificial Neural Network (ANN) as the backend to solve the classification problem. It was originally motivated by Sural and Das (1999), which reports using a multi-layer perceptron approach to do OCR for an Indian language, namely Bengali. However, the approach should work with English as well.

The input for the OCR problem is pages of scanned text. To perform the character recognition, our application has to go through three important steps. The first is segmentation, i.e., given a binary input image, identifying the individual glyphs (basic units representing one or more characters, usually contiguous). The second is feature extraction, i.e., computing from each glyph a vector of numbers that will serve as input features for an ANN. This step is the most difficult, in the sense that there is no obvious way to obtain these features.

The final task is classification. In our approach, there are two parts to this. The first is the training phase, where we manually identify the correct class of several glyphs; the features extracted from these serve as the data used to train the neural network. Once the network is trained, new glyphs can be classified by extracting their features and using the trained network to predict their class.

We shall describe each of these steps in the following sections, after some brief comments on the choice of software used to implement the ideas described here.

1.1 Software choices

I chose to implement this as an add-on package for R (http://www.r-project.org), a statistical and graphical programming environment. The other option I considered was MATLAB, but I decided against it for two reasons: firstly, I am not as familiar with MATLAB, and secondly, I will not have access to MATLAB after this semester.

R is an open source implementation of the S language developed at Bell Labs, and is similar to the commercial package S-PLUS. It is quite popular among statisticians and has MATLAB-like capabilities for dealing with vectors and matrices. It is easy to create new add-on packages for R that perform specific tasks, building on the numerous packages already available. An R package (such as the one I have written) can be used on all platforms where R runs, including Linux, UNIX, Windows and Macintosh.

2 Segmentation

The most basic step in OCR is to segment the input image into individual glyphs. In our approach, this is needed in two different phases, with slightly different requirements. The first is during the training stage, where segmented glyphs are presented to the human supervisor for manual classification. The other is after the network is trained and we want to recognize a new image; in this case, we need to identify each glyph in the correct sequence before extracting features from it and classifying it.

To make things easier, especially for the second phase, I first try to split the image into individual lines. Our input images are thresholded binary images. Assuming the images are oriented properly, the mean intensities across the rows (see Figure 1) tell us the locations of the gaps between lines (mean intensity close to 1). We use this fact to identify the line gaps and split the image into smaller pieces.

[Figure 1: Average intensity across the first 500 rows of the input image; mean intensity (0.5 to 1.0) plotted against row number (pixels).]

The current implementation is not very sophisticated, and sometimes fails when there are very short lines at the end of a paragraph. However, the calculation that identifies line gaps from the mean row intensities is implemented as a separate function, and can easily be improved later without affecting the rest of the procedure. A similar procedure can be used to split lines into words.
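In outline, the gap-finding calculation amounts to something like the sketch below. The function name splitIntoLines and the 0.99 gap threshold are illustrative choices for this sketch rather than details of the actual package; the input is assumed to be a binary matrix with 0 for ink and 1 for background.

    splitIntoLines <- function(img, threshold = 0.99)
    {
        ## rows whose mean intensity is close to 1 are gaps between lines
        isGap <- rowMeans(img) > threshold
        ## runs of consecutive non-gap rows correspond to text lines
        runs <- rle(isGap)
        ends <- cumsum(runs$lengths)
        starts <- ends - runs$lengths + 1
        keep <- !runs$values
        ## return one sub-matrix per detected text line
        mapply(function(s, e) img[s:e, , drop = FALSE],
               starts[keep], ends[keep], SIMPLIFY = FALSE)
    }

Applied to a properly oriented page, splitIntoLines(img) returns a list with one binary matrix per detected text line.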
The segmentation into lines is a useful preprocessing step. The actual segmentation code accepts a matrix with entries 0 and 1, and returns a matrix of the same dimensions with entries 0, 1, 2, 3, ..., N, where N − 1 is the number of identified segments; the elements of the matrix marked i, i = 2, ..., N, correspond to the ith segment. This part is computationally intensive, and is implemented in C code called from within R. Subsequently, another small R function extracts the individual segments as binary matrices.
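The extraction step can be sketched as follows. Here extractSegments is a hypothetical name for that small R function, and the labelling convention is the one just described (segment pixels carry the values 2, ..., N); the returned glyphs use 0 for ink and 1 for background, as in the input images.

    extractSegments <- function(labels)
    {
        ## segment labels, skipping the background values 0 and 1
        segIds <- setdiff(sort(unique(as.vector(labels))), 0:1)
        lapply(segIds, function(i) {
            ## bounding box of the pixels carrying label i
            idx <- which(labels == i, arr.ind = TRUE)
            rows <- range(idx[, 1])
            cols <- range(idx[, 2])
            glyph <- labels[rows[1]:rows[2], cols[1]:cols[2], drop = FALSE]
            ## back to a binary matrix: 0 = ink, 1 = background
            ifelse(glyph == i, 0, 1)
        })
    }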
As mentioned above, one important use of segmentation is in training the classifier. In the training stage, we need to manually identify several glyphs for later use as training data. There are several possible approaches to this; what I currently do is the following. At any given point, I maintain the training glyphs as a list of binary matrices (one for each glyph), along with the manually selected class of each. This list is stored as a data file on disk (typically named "trainingData.rda") in an internal R binary format. To make additions to the list, one calls the function updateTrainingSet with the name of an image file as its argument. This loads the specified image, segments it, and, for each identified glyph, displays an image of the glyph and asks the user to interactively input its class. Once the class is specified, the corresponding matrix is added to the list along with its class label. It is also possible not to specify a class, in which case the glyph is ignored; this is useful for bad segments, as well as for very frequent letters that would otherwise dominate the list.
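The flow just described translates roughly into the following outline. This is a sketch of the logic only, not the package's actual code: readImage and segmentImage are placeholders for whatever reader and segmentation routine the package uses, extractSegments is the hypothetical helper sketched above, and the display and prompting details are guesses.

    updateTrainingSet <- function(imageFile, dataFile = "trainingData.rda")
    {
        ## load the current list of (glyph, class) pairs, if any
        trainingData <- if (file.exists(dataFile)) {
            e <- new.env()
            load(dataFile, envir = e)
            get("trainingData", envir = e)
        } else list()

        img <- readImage(imageFile)                   # placeholder image reader
        glyphs <- extractSegments(segmentImage(img))  # placeholder segmenter

        for (glyph in glyphs) {
            ## show the glyph, then prompt for its class;
            ## an empty answer skips the glyph entirely
            image(t(glyph)[, nrow(glyph):1, drop = FALSE], axes = FALSE)
            cls <- readline("Class (press Enter to skip): ")
            if (nzchar(cls))
                trainingData[[length(trainingData) + 1]] <-
                    list(glyph = glyph, class = cls)
        }
        save(trainingData, file = dataFile)
        invisible(trainingData)
    }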
This approach has its drawbacks, discussed later.

3 Feature Extraction

The glyphs identified by segmentation are binary matrices and, as such, are not suitable for direct use in a neural network. We therefore have to extract from each glyph features that can subsequently be used for classification. This is definitely the most important design decision in the procedure, since without a good feature set we cannot expect to see good results.

There is no single obvious choice of features. I decided to base my features on identifiable regular parabolic curves