Optical Character Recognition using Neural Networks
(ECE 539 Project Report)

Deepayan Sarkar
Department of Statistics
University of Wisconsin – Madison
UW ID: [email protected]
18, 2003

Contents

1 Introduction
  1.1 Software choices
2 Segmentation
3 Feature Extraction
4 Classification
5 Limitations
6 Results
7 Comparison with existing methods
8 Discussion

1 Introduction

The goal of my project is to create an application interface for Optical Character Recognition (OCR) that uses an Artificial Neural Network (ANN) as the backend to solve the classification problem. It was originally motivated by Sural and Das (1999), which reports using a multi-layer perceptron approach to do OCR for an Indian language, namely Bengali. However, the approach should work with English as well.

The input for the OCR problem is pages of scanned text. To perform the character recognition, our application has to go through three important steps. The first is segmentation, i.e., given a binary input image, identifying the individual glyphs (basic units representing one or more characters, usually contiguous). The second is feature extraction, i.e., computing from each glyph a vector of numbers that will serve as input features for an ANN. This step is the most difficult, in the sense that there is no obvious way to obtain these features.

The final task is classification. In our approach, there are two parts to this. The first is the training phase, where we manually identify the correct class of several glyphs; the features extracted from these serve as the data used to train the neural network. Once the network is trained, new glyphs can be classified by extracting their features and using the trained network to predict their class.

We shall describe each of these steps in the following sections, after some brief comments on the choice of software used to implement the ideas described here.

1.1 Software choices

I chose to implement this as an add-on package for R (http://www.r-project.org), a statistical and graphical programming environment. The other option I considered was MATLAB, but I decided against it for two reasons: firstly, I am not as familiar with MATLAB, and secondly, I will not have access to MATLAB after this semester.

R is an open source implementation of the S language developed at Bell Labs, and is similar to the commercial package S-PLUS. It is quite popular among statisticians and has MATLAB-like capabilities for dealing with vectors and matrices. It is easy to create new add-on packages for R that perform specific tasks, building on the numerous packages already available. An R package (such as the one I have written) can be used on all platforms where R runs, including Linux, UNIX, Windows and Macintosh.

2 Segmentation

The most basic step in OCR is to segment the input image into individual glyphs. In our approach, this is needed in two different phases, with slightly different requirements. The first is during the training stage, where segmented glyphs are presented to the human supervisor for manual classification. The other is after the network is trained and we want to recognize a new image; in this case, we need to identify each glyph in the correct sequence before extracting features from it and classifying it.

To make things easier, especially for the second phase, I first try to split the image into individual lines. Our input images are thresholded binary images. Assuming the images are oriented properly, the mean intensities across the rows (see Figure 1) tell us the locations of the gaps between lines (mean intensity close to 1). We use this fact to identify the line gaps and split the image into smaller pieces.

[Figure 1: Average intensity across the first 500 rows of the input image; mean intensity (0.5 to 1.0) plotted against row number (pixels).]

The current implementation is not very sophisticated, and sometimes fails when there are very short lines at the end of a paragraph. However, the calculation that identifies line gaps from the mean row intensities is implemented as a separate function, and can easily be improved later without affecting the rest of the procedure. A similar procedure can be used to split lines into words.
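In outline, the gap-finding calculation amounts to something like the sketch below. The function name splitIntoLines and the 0.99 gap threshold are illustrative choices for this sketch rather than details of the actual package; the input is assumed to be a binary matrix with 0 for ink and 1 for background.

    splitIntoLines <- function(img, threshold = 0.99)
    {
        ## rows whose mean intensity is close to 1 are gaps between lines
        isGap <- rowMeans(img) > threshold
        ## runs of consecutive non-gap rows correspond to text lines
        runs <- rle(isGap)
        ends <- cumsum(runs$lengths)
        starts <- ends - runs$lengths + 1
        keep <- !runs$values
        ## return one sub-matrix per detected text line
        mapply(function(s, e) img[s:e, , drop = FALSE],
               starts[keep], ends[keep], SIMPLIFY = FALSE)
    }

Applied to a properly oriented page, splitIntoLines(img) returns a list with one binary matrix per detected text line.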
The segmentation into lines is a useful preprocessing step. The actual segmentation code accepts a matrix with entries 0 and 1, and returns a matrix of the same dimensions with entries 0, 1, 2, 3, ..., N, where N − 1 is the number of identified segments; the elements of the matrix marked i, i = 2, ..., N, correspond to the ith segment. This part is computationally intensive, and is implemented in C code called from within R. Subsequently, another small R function extracts the individual segments as binary matrices.
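The extraction step can be sketched as follows. Here extractSegments is a hypothetical name for that small R function, and the labelling convention is the one just described (segment pixels carry the values 2, ..., N); the returned glyphs use 0 for ink and 1 for background, as in the input images.

    extractSegments <- function(labels)
    {
        ## segment labels, skipping the background values 0 and 1
        segIds <- setdiff(sort(unique(as.vector(labels))), 0:1)
        lapply(segIds, function(i) {
            ## bounding box of the pixels carrying label i
            idx <- which(labels == i, arr.ind = TRUE)
            rows <- range(idx[, 1])
            cols <- range(idx[, 2])
            glyph <- labels[rows[1]:rows[2], cols[1]:cols[2], drop = FALSE]
            ## back to a binary matrix: 0 = ink, 1 = background
            ifelse(glyph == i, 0, 1)
        })
    }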
As mentioned above, one important use of segmentation is in training the classifier. In the training stage, we need to manually identify several glyphs for later use as training data. There are several possible approaches to this; what I currently do is the following. At any given point, I maintain the training glyphs as a list of binary matrices (one for each glyph), along with the manually selected class of each. This list is stored as a data file on disk (typically named "trainingData.rda") in an internal R binary format. To make additions to the list, one calls the function updateTrainingSet with the name of an image file as its argument. This loads the specified image, segments it, and, for each identified glyph, displays an image of the glyph and asks the user to interactively input its class. Once the class is specified, the corresponding matrix is added to the list along with its class label. It is also possible not to specify a class, in which case the glyph is ignored; this is useful for bad segments, as well as for very frequent letters that would otherwise dominate the list.
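The flow just described translates roughly into the following outline. This is a sketch of the logic only, not the package's actual code: readImage and segmentImage are placeholders for whatever reader and segmentation routine the package uses, extractSegments is the hypothetical helper sketched above, and the display and prompting details are guesses.

    updateTrainingSet <- function(imageFile, dataFile = "trainingData.rda")
    {
        ## load the current list of (glyph, class) pairs, if any
        trainingData <- if (file.exists(dataFile)) {
            e <- new.env()
            load(dataFile, envir = e)
            get("trainingData", envir = e)
        } else list()

        img <- readImage(imageFile)                   # placeholder image reader
        glyphs <- extractSegments(segmentImage(img))  # placeholder segmenter

        for (glyph in glyphs) {
            ## show the glyph, then prompt for its class;
            ## an empty answer skips the glyph entirely
            image(t(glyph)[, nrow(glyph):1, drop = FALSE], axes = FALSE)
            cls <- readline("Class (press Enter to skip): ")
            if (nzchar(cls))
                trainingData[[length(trainingData) + 1]] <-
                    list(glyph = glyph, class = cls)
        }
        save(trainingData, file = dataFile)
        invisible(trainingData)
    }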
This approach has its drawbacks, discussed later.

3 Feature Extraction

The glyphs identified by segmentation are binary matrices and, as such, are not suitable for direct use in a neural network. We therefore have to extract from each glyph features that can subsequently be used for classification. This is definitely the most important design decision in the procedure, since without a good feature set we cannot expect to see good results.

There is no single obvious choice of features. I decided to base my features on identifiable regular parabolic curves