Artificial Neural Networks: written examination
Monday, May 15, 2006, 9:00-14:00

Department of Information Technology (Institutionen för informationsteknologi)
Olle Gällmo, Lecturer
Address: Lägerhyddsvägen 2, Box 337, SE-751 05 Uppsala, SWEDEN
Telephone: +46 18 471 10 09, Telefax: +46 18 51 19 25
Web site: user.it.uu.se/~crwth
E-mail: olle.gallmo@it.uu.se

Allowed help material: pen, paper and rubber; dictionary.

Please answer, in Swedish or English, the following questions to the best of your ability. Any assumptions made which are not already part of the problem formulation must be stated clearly in your answer. Write your name on top of each page. Don't forget to hand in the last page (your answers to question 10).

The maximum number of points is 40. To get the grade G (pass), a total of 20 points is required. The grade VG (pass with distinction) requires approximately 30 points, but also depends on the results on the lab course (labs + project).

Your teacher will drop in sometime between 10:00 and 11:00 to answer questions.

In this exam, some concepts may be called by different names than the ones used in the book. Here is a list of useful synonyms and acronyms:

Perceptron: summation unit (SU), conventional neuron
Binary perceptron: summation unit with binary step activation function
Multilayer perceptron (MLP): feedforward network of summation units
RBF: Radial Basis Functions
Standard competitive learning: LVQ-I without a neighbourhood function
Objective function: the function to be minimized or maximized (error function, fitness function)

Now sit back, relax and enjoy the exam. Good luck!

21 students attended this exam, of which 9 failed, 7 passed (G) and 5 passed with 30 points or more (may become pass with distinction, depending on the results from the lab course). The best result was 37 points.

1. Why is it impossible for a single binary perceptron to solve the XOR
problem? (2 p)

Answer: Because XOR is not a linearly separable problem. Perceptrons solve classification tasks by adjusting a hyperplane in the input space; in 2D, as in this case, a line.

Most students got this. Some failed to mention that the discriminant is a hyperplane (line), though, which is the main point here. 21 answers, 15 with max credit. Average: 1.5.

2. Neural networks require lots of data to be trained properly. If you have too little data (too few input/target pairs), the first thing to try is to get more. However, sometimes this is simply not possible, and then splitting up the few data you have into a training set and a test set might be considered wasteful. Describe how K-fold cross-validation can be used to deal with this problem. (Note: this is not early stopping!) (3 p)

Answer: Split the data into K sets of N/K patterns each, where N is the total number of patterns. Train on all but one, and test on the one left out. Do that for each of the K sets. Report the average error over the K tests. (Alternative: select N/K patterns at random, train on the rest, test on the ones selected, and run this K times. Report the result as above.)

K-fold cross-validation is not an early stopping technique, though, as was clearly pointed out in the question. 1 point was deducted for answers which did not mention what to do with the K test results. 18 answers, 5 with max credit. Average: 1.9.

3. What is weight decay? What is it good for, and how can it be implemented? (2 p)

Answer: Weight decay is to let each weight in a neural network strive for 0 (in addition to the change given by the training algorithm, of course). There are several reasons for wanting to do this. For example (most common answer): so that we can remove unnecessary weights after training, since they will be very close to 0. (By the way, if you do this, you should retrain the network afterwards.) Another reason is to avoid numerical problems with too large weights: since the weighted sum is in the exponent of the sigmoid, large weights may quickly lead to numerical problems. Related to the previous point: large weights also mean
that the sigmoids are likely to bottom out in either end, where the derivative is close to 0. This makes the network rigid, since this derivative is multiplied in the weight update formula. So weight decay gives the network more flexibility, and can speed up learning, since it tends to move the weighted sums closer to the region of the sigmoid where the derivative is the largest.

Implementation: after updating the weights according to the update rule, update them again by w ← (1 − ε)w, where ε is the decay rate.

Some students associated weight decay with Ant Colony Optimization instead of neural networks. Indeed, the decay of pheromones can be viewed as a form of weight decay, but since the concept has not been discussed in those terms on this course, only partial credit was given for such answers. 21 answers, 7 with max credit. Average: 1.1.

4. Write down the back-propagation algorithm. For full credit, your description must be clear enough for someone who knows what a multilayer perceptron is to implement the algorithm. (5 p)

Considering that so much of this course focuses on MLPs and backprop, I was very surprised to see how many students failed this question. Only two students got max credit, and only two more were close; most got less than half of that. 20 answers, 2 with max credit. Average: 1.9.

5. How is the hidden layer of an RBF network different from the hidden layer in an MLP? Explain this difference in terms of

a) what the hidden nodes compute when feeding data to the network. (2 p)

Answer: MLP hidden nodes compute weighted sums of the input and feed that through a sigmoid. RBF nodes compute the distance between the input vector and the weight vector, and feed that through a Gaussian (or similar function).

b) how this affects the shape of the discriminant when using the networks for classification. (2 p)

Answer: MLP hidden nodes form hyperplanes; RBF nodes form hyperspheres (or hyperellipses). Sidenote: it is not the activation function which decides the shape of the discriminant. The weighted sum forms the hyperplane in MLPs; the sigmoid only
decides what to output, given the distance from that hyperplane. Similarly for RBFs: it is the distance calculation between input vector and weight vector which forms the hypersphere, not the Gaussian.

c) how the hidden nodes are trained. (2 p)

Answer: MLPs are usually trained by some form of backprop (see Q4). The hidden layer of an RBF network is usually trained by some form of unsupervised learning, e.g. competitive learning or K-means. Some students got minor deductions for only describing RBF, not comparing to MLP.
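Question 1's point about linear separability can be illustrated with a short sketch. This is not part of the exam; it brute-forces a small grid of weights to show that a single binary perceptron can realise AND (linearly separable) but never XOR, since no line separates XOR's classes at all.

```python
# Linear separability sketch (question 1): a binary perceptron outputs
# 1 when its weighted sum plus bias is positive, else 0. We search a
# coarse weight grid for a setting that fits all four patterns.

def perceptron(x, w1, w2, bias):
    return 1 if w1 * x[0] + w2 * x[1] + bias > 0 else 0

def solvable(targets):
    """Can any (w1, w2, bias) on a small grid fit all four patterns?"""
    inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
    grid = [i / 2 for i in range(-6, 7)]  # weights from -3.0 to 3.0
    return any(
        all(perceptron(x, w1, w2, b) == t for x, t in zip(inputs, targets))
        for w1 in grid for w2 in grid for b in grid)

print(solvable([0, 0, 0, 1]))  # AND: True, e.g. w1 = w2 = 1, bias = -1.5
print(solvable([0, 1, 1, 0]))  # XOR: False, no line separates the classes
```

The grid search only proves AND is solvable; for XOR the negative result holds in general, because the four points cannot be split by any single hyperplane.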
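The K-fold procedure from question 2 can be sketched in a few lines. This is an illustrative sketch, not from the exam; `train_and_test` is a hypothetical stand-in for training a network on one split and returning its error on the held-out fold.

```python
# K-fold cross-validation sketch (question 2): split the N patterns
# into K folds of N/K patterns, train on K-1 folds, test on the one
# left out, repeat for every fold, and report the average error.

def k_fold_errors(patterns, k, train_and_test):
    """Return the test error from each of the k folds."""
    fold_size = len(patterns) // k
    errors = []
    for i in range(k):
        test = patterns[i * fold_size:(i + 1) * fold_size]  # fold left out
        train = patterns[:i * fold_size] + patterns[(i + 1) * fold_size:]
        errors.append(train_and_test(train, test))
    return errors

# Toy usage: a fake "error" that just counts test patterns, only to
# show the splitting; a real call would train an MLP here instead.
data = list(range(12))
errs = k_fold_errors(data, 4, lambda train, test: len(test))
avg_error = sum(errs) / len(errs)  # the number to report, per the answer
```

Note that the average over the K runs is the reported result, which is exactly the step the question deducted a point for omitting.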
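The weight decay rule from question 3, w ← (1 − ε)w applied after the ordinary update, can be sketched as follows (an illustration, not the exam's notation; the weights here are a flat list for simplicity):

```python
# Weight decay sketch (question 3): after the training algorithm's own
# update, shrink every weight towards 0 by the decay rate eps.

def apply_weight_decay(weights, eps):
    """Return weights after one decay step: w <- (1 - eps) * w."""
    return [(1.0 - eps) * w for w in weights]

weights = [2.0, -1.0, 0.5]
weights = apply_weight_decay(weights, 0.01)
# each weight moves slightly towards 0: approximately [1.98, -0.99, 0.495]
```

Weights the training algorithm never pushes away from 0 keep shrinking, which is why near-zero weights can be pruned afterwards (followed by retraining, as the answer notes).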
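Question 4 asks for the back-propagation algorithm, which the commentary does not reproduce. As a hedged sketch of one possible answer (single hidden layer, sigmoid units everywhere, squared error, online updates; the function name and weight layout are my own, not from the exam):

```python
# Back-propagation sketch (question 4): one online training step for an
# MLP with a single hidden layer of sigmoid units. Each node's weight
# list stores the bias at index 0, then one weight per incoming signal.
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def backprop_step(x, target, w_hid, w_out, lr):
    """Forward pass, delta computation, then weight update (in place)."""
    # 1. Forward pass
    hidden = [sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))
              for w in w_hid]
    output = [sigmoid(w[0] + sum(wi * hi for wi, hi in zip(w[1:], hidden)))
              for w in w_out]
    # 2. Output deltas: (t - y) * y * (1 - y), where y(1 - y) is the
    #    sigmoid derivative
    d_out = [(t - y) * y * (1 - y) for t, y in zip(target, output)]
    # 3. Hidden deltas: propagate output deltas back through w_out
    d_hid = [h * (1 - h) * sum(d * w[1 + j] for d, w in zip(d_out, w_out))
             for j, h in enumerate(hidden)]
    # 4. Updates: each weight moves by lr * delta * (its input signal)
    for w, d in zip(w_out, d_out):
        w[0] += lr * d
        for j, h in enumerate(hidden):
            w[1 + j] += lr * d * h
    for w, d in zip(w_hid, d_hid):
        w[0] += lr * d
        for j, xi in enumerate(x):
            w[1 + j] += lr * d * xi
    return output
```

Spelling out these four steps (forward pass, output deltas, hidden deltas, updates) is roughly the level of detail the question seems to ask for, since the description must be implementable.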
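The contrast in question 5a, weighted sum through a sigmoid versus distance through a Gaussian, can be made concrete with two one-node sketches (illustrative only; the Gaussian width parameter is an assumption, as the exam answer just says "Gaussian or similar"):

```python
# Hidden-node computations (question 5a). An MLP node's activation is
# constant wherever the weighted sum is constant (a hyperplane); an RBF
# node's activation is constant at a fixed distance from its weight
# vector (a hypersphere), matching the discriminant shapes in 5b.
import math

def mlp_hidden(x, w, bias):
    """Weighted sum plus bias, squashed by a sigmoid."""
    s = bias + sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-s))

def rbf_hidden(x, centre, width):
    """Gaussian of the Euclidean distance to the node's weight vector."""
    dist2 = sum((xi - ci) ** 2 for xi, ci in zip(x, centre))
    return math.exp(-dist2 / (2.0 * width ** 2))
```

For example, two inputs at equal distance from an RBF node's centre get identical activations regardless of direction, whereas the MLP node's activation depends only on which side of its hyperplane the input lies, and how far.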