Exam in Neural Networks and Learning Systems - TBMI26

Time: Some day at some time
Place:
Teacher: Magnus Borga, Phone:
Allowed additional material: Calculator, Tefyma, Beta, Physics handbook

The exam consists of three parts:

Part 1 consists of ten questions. The questions test general knowledge and understanding of central concepts in the course. The answers should be short and given in the blank space after each question. Calculations do not have to be presented. Maximum one point per question.

Part 2 consists of five questions. These questions may require more detailed knowledge. Here too, the answers should be short and given in the blank space after each question. Only requested calculations have to be presented. Maximum two points per question.

Part 3 consists of four questions. All assumptions and calculations made should be presented. Reasonable simplifications may be made in the calculations. All calculations and answers should be on separate papers (not in the exam). Each question gives a maximum of five points.

The maximum sum of points is 40, and to pass the exam (grade 3) normally 18 points are required. There is no requirement of a certain number of points in the different parts of the exam. The answers may be given in English or Swedish.

The results will be reported at ... at the latest. The exams will then be available at IMT.

GOOD LUCK!

Part 1

1. Mention three types of supervised learning!

2. Why can't the "xor problem" be solved by a one-layer perceptron?

3. What can unsupervised learning be used for?

4. What is required of an optimization problem in order to be able to solve it with kernel methods?

5. Write a cost function which can be used for PCA!

6. There is a method that minimizes the quotient between the variance of each class and the distance between the classes. What is it called?

7. BSS solves a problem with several unknown components. Which are these?

8. What are the connections called that transfer information between biological neurons?

9.
What is the meaning, in words, of the Schema theorem?

10. What is it called when a system identifies two different states as being one and the same?

Part 2

11. We have the following three 9-dimensional data vectors:

(1, 1, 1, 1, 0, 1, 2, 3, 4)^T, (2, 2, 2, 2, 0, 1, 2, 3, 4)^T, (3, 3, 3, 3, 0, 1, 2, 3, 4)^T

What will the 9 eigenvalues be if we analyse the distribution with PCA?

12. a) In ordinary one-dimensional correlation analysis, the sign of the correlation coefficient is important, since it tells whether the variables vary in the same way or in opposite ways. Can the corresponding information be obtained from the correlation coefficients in CCA? Give a motivation for your answer! (1p)

b) What problem may turn up in CCA if the dimensionality of the signal is of the same order of size as the number of samples? (1p)

13. In a SOM, not only the winner unit is moved towards the sample; other units in the network also move along. Why is that an important property of this method?

14. A Markov model has three states: 1, 2 and 3. The probability of a transition from state 1 to state 2 is 40% and from state 1 to state 3 it is 60%. In state 2, the system will stay there, and from state 3 the system will always move to state 1. Draw the model and calculate the probability that the system will be in state 1, 2 and 3 respectively after a long time, i.e. the stationary distribution. Assume that the distribution of initial states is given by the vector p = [p1(1), p1(2), p1(3)]^T. (2p)

15. Which of the following functions are not Lyapunov functions for ẋ(t) = e − exp(x)? Motivate why! (1p)

• V(x) = x²
• V(x) = log(e^x)²
• V(x) = (x − 1)²
• V(x) = (x − e)²

Show that one of these functions actually is a Lyapunov function. (1p)

Part 3

16. Assume we have an RBF network with the following functions in the hidden layer:

s_i = 1 / (1 + ||x − µ_i||² / σ_i²)

Assume a linear activation function in the output layer. The network should be trained in the same way as in the back-propagation exercise, i.e. off-line (batch) with a mean square error measure.
The desired output is, as usual, in a variable d.

[Figure: RBF network with two inputs x1 and x2, two RBF units with parameters (µ1, σ1) and (µ2, σ2) and outputs s1 and s2, and a linear output layer with weights v producing the output y.]

a) First, assume that the RBF units have a constant size, i.e. that σ1 = σ2 = σ, where σ is constant. Give the update rule for the RBF layer (i.e. determine how the parameter vectors µ_i should be updated). (2p)

b) Now the size of the RBF units should also be adapted. Give an update rule for the RBF layer when the σ_i parameters are updated as well. (3p)

17. Suppose we have a quadratic mapping of the input signal x to a high-dimensional feature space, x → ϕ(x), according to ϕ(x) = x × x, where "×" means that you take the outer product and then make a vector of the resulting matrix, which will contain all products between the components of the input vector.

a) Show that (x_i^T x_j)² is a kernel function corresponding to the mapping ϕ. (For simplicity, you can assume that x is two-dimensional.) (2p)

b) What will the kernel matrix look like if we have the following three data vectors:

(−1, 1)^T, (0, 0)^T, (1, 1)^T?

(1p)

c) For this particular case, show that you get a kernel matrix that corresponds to the samples being centered in the feature space by taking the non-centered kernel matrix, subtracting the column mean from each column, then subtracting the row mean from each row, and finally adding the total mean value of the whole matrix, i.e.

k'_ij = k_ij − (1/n) Σ_i k_ij − (1/n) Σ_j k_ij + (1/n²) Σ_ij k_ij

where k'_ij are the components of the centered kernel matrix. (2p)

18. Show why CCA cannot be used for blind source separation if the sources have the same auto-correlation! (5p)

19. The figure shows three different deterministic state models and the corresponding reward functions. The states are enumerated and arrows represent actions. The numbers close to the actions denote the corresponding rewards (negative costs). If the system reaches a state denoted "slut" (end), no more reward is obtained, i.e. the value function of that state is zero.

[Figure 1: State models A and B.]

a) Calculate the optimal Q and V functions for system A as a function of 0 < γ < 1.
(2p)

b) Calculate the optimal Q and V functions for system B as a function of 0 < γ < 1. (2p)

c) What happens if you set γ = 1 in systems A and B, respectively?
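As a study aid (not part of the exam), the stationary distribution asked for in question 14 can be cross-checked numerically. The sketch below builds the transition matrix implied by the question text and iterates the distribution; variable names are invented for the example.

```python
import numpy as np

# Transition matrix implied by question 14: rows = current state,
# columns = next state, for states 1, 2 and 3.
P = np.array([
    [0.0, 0.4, 0.6],  # from state 1: 40% to state 2, 60% to state 3
    [0.0, 1.0, 0.0],  # in state 2 the system stays there
    [1.0, 0.0, 0.0],  # from state 3 the system always moves to state 1
])

# Iterate p_{t+1} = P^T p_t from an arbitrary initial distribution.
p = np.array([1.0, 0.0, 0.0])
for _ in range(200):
    p = P.T @ p

print(p.round(3))  # long-run distribution over states 1, 2, 3
```

Because state 2 is absorbing and is eventually reached from states 1 and 3, the iteration converges with all probability mass in state 2, which is what the hand calculation should confirm.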
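The kernel calculations in questions 17b and 17c can likewise be verified numerically. This sketch (not part of the exam; variable names are invented) computes the kernel matrix for the three given samples and checks that the double-centering formula matches explicit centering in the feature space:

```python
import numpy as np

# The three 2-D samples from question 17b.
X = np.array([[-1.0, 1.0],
              [ 0.0, 0.0],
              [ 1.0, 1.0]])

# Kernel k(x_i, x_j) = (x_i^T x_j)^2 from question 17a.
K = (X @ X.T) ** 2

# Double centering as in question 17c: subtract the column mean from each
# column, subtract the row mean from each row, and add the grand mean.
Kc = K - K.mean(axis=0, keepdims=True) - K.mean(axis=1, keepdims=True) + K.mean()

# Cross-check: explicit centering in the feature space phi(x) = vec(x x^T).
Phi = np.array([np.outer(x, x).ravel() for x in X])
Phi_c = Phi - Phi.mean(axis=0)
assert np.allclose(Kc, Phi_c @ Phi_c.T)

print(K)  # the non-centered kernel matrix asked for in 17b
```

The assertion passing is exactly the statement of 17c for this data set: double-centering the kernel matrix equals computing inner products of mean-subtracted feature vectors.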