
Solution Sketches Midterm2 Exam
COSC 4368 Fundamentals of Artificial Intelligence
April 10, 2019

Your Name:
Your student id:

Problem 1 --- Constraint Satisfaction Problems [8]
Problem 2 --- Supervised Learning in General [8]
Problem 3 --- Neural Networks [11]
Problem 4 --- Support Vector Machines [12]
Problem 5 --- Using First Order Predicate Logic as a Language [8]
Problem 6 --- Reinforcement Learning [13]
:
Number Grade:

The exam is “open books and notes” but the use of computers is not allowed; you have 75 minutes to complete the exam. The exam will count approx. 15% towards the course grade.

1) Constraint Satisfaction Problems [8]
Provide a definition¹ of the letter constraint satisfaction problem given below:
1. Define the variables
2. Define the set of values each variable can take
3. Define all constraints!

      T W O
    + T W O
    -------
    F O U R

Assume each letter can take only one digit, and reciprocally each digit can be associated with at most one letter.

¹ Be aware of the fact that you are not asked to solve this letter constraint satisfaction problem!

Variables:
T, W, O, F, U, R in 0…9
X1, X2, X3 in {0,1}
Constraints:
1. DIFF(T, W, O, F, U, R)
2. O+O=R+10*X1
3. X1+W+W=U+10*X2
4. X2+T+T=O+10*X3
5. X3=F

2) Supervised Learning in General [8]
a) What is the purpose of using N-fold Cross Validation? Explain in a few sentences how 2-fold cross validation works! [4]
To determine the generalization error/testing accuracy of the learned model (if they just say accuracy, give them only 0.5 points) [1.5]
Correct description of 2-fold cross validation [2.5]
b) What is overfitting? [2]
The model is too complex [1]; the testing accuracy is not optimal, although the training error is quite low. [1]
c) Deep Neural Networks usually employ very complex models; what can be done to alleviate the problem of overfitting when using deep neural networks? [2]
Use very large training sets [2].

3) Neural Networks [11]
a) How do neural networks compute the value/activation of a node?
[2]
By applying the activation function to the weighted sum of the activations of its parent nodes.
b) Describe how multi-layer neural networks, consisting of 3+ layers, learn a model for a training set! Limit your answer to at most 9 sentences! [7]
Neural network learning tries to find weights that minimize the error in the neural network prediction for a training set [1]. Neural networks employ gradient descent hill climbing to find the “best” weights [1]. In particular, neural network learning adjusts weights example by example [1]; weights are adjusted in the direction of the steepest negative gradient of the error function; that is, weights are updated by moving in the direction that reduces the error the most [2]. The step width of the weight update in the direction of the steepest gradient depends on the learning rate and other factors [0.5]. In order to apply this procedure, the error for each non-input node has to be known. As the error for intermediate-layer nodes is not initially given, it is computed using the back-propagation algorithm [2].
Other observations might deserve credit. At most 7 points!
c) 2-Layer Neural Networks do not use the Backpropagation Algorithm; why is this the case? [2]
There is no intermediate layer; consequently, all necessary errors are known [2]

4) Support Vector Machines (SVM) [12]
a) What is the margin for an SVM hyperplane? Why do SVM models maximize the margin? What are support vectors? [4]
Margin means the width of the slab parallel to the hyperplane that has no interior data points [1.5]; not mentioning “no interior points”: at most 0.5 points.
The margin is maximized to better handle noise/to become more fault tolerant [1]
Support vectors are the data points nearest to the hyperplane, the points of a data set that, if removed, would alter the position of the dividing hyperplane. [1.5]
b) There has been a lot of work in designing new kernels in machine learning, including using kernels in conjunction with support vector machines. What do kernels do?
Why do most support vector machine approaches employ non-linear kernels? What do you believe is the reason that support vector machines in conjunction with kernels accomplish quite high accuracies for challenging datasets? [5]
Kernels map the dataset into a different, usually higher dimensional space [1]
To deal with datasets whose classes are not linearly separable [2]
By mapping the data to a higher dimensional space there are more ways to separate the examples of the 2 classes, improving the potential to obtain higher accuracies [2]
Other answers for the second or third question might deserve partial credit!
c) Assume we have a dataset with numerical attributes x, y, and an attribute c where c is a class variable which we assume takes values in {0,1}. Give the equation of a hyperplane that the SVM learning could potentially learn for this dataset! [3]
e.g. xy
No partial credit!

5) First Order Predicate Logic as a Language [8]
Map the following two sentences into First Order Predicate Logic formulas:
a) There are at least two green frogs in room 205 GAR.
b) Every house owner in Texas owns a dog.
a) ∃f∃g (frog(g) ∧ frog(f) ∧ green(g) ∧ green(f) ∧ f≠g ∧ in-room(f, 205GAR) ∧ in-room(g, 205GAR))
b) ∀o ((house-owner(o) ∧ lives(o, Texas)) → (∃d (dog(d) ∧ owns(o,d))))
Solutions that specify that the owned house is in Texas, instead of the house owner being in Texas, also deserve full credit, assuming that they are correct. One error: up to 1.5 points (e.g. omitting f≠g in a.), but no partial credit if 2+ errors or if formulas do not make any sense at all.

6) Reinforcement Learning [13]
a) What are the main differences between supervised learning and reinforcement learning?
[5]
SL: assumes a static world [0.5]; the correct answer/action is known and described in training sets from which models are learnt! [1.5]
RL: can deal with dynamic, changing worlds/can adapt [1]; needs to learn from indirect, sometimes delayed feedback/rewards [1]; suitable for exploration of unknown worlds [1]; temporal analysis/worried about the future/interested in an agent’s long term wellbeing [1]; needs to carry out actions to find out if they are good; which actions/states are good is (usually) not known in advance [1]
Other answers might deserve credit; answers might also use the RL-Paper paragraph on that matter (page 239)! At most 5 points!
b) Assume the

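The constraint model sketched for Problem 1 (TWO + TWO = FOUR) can be sanity-checked with a few lines of Python. This is only an illustrative sketch, not part of the exam solution: the function name solve_two_two_four is hypothetical, the carries X1, X2, X3 are derived from the column sums rather than searched over, and the third column is read as X2+T+T = O+10*X3. Leading zeros are not excluded, because the sketch above does not state such a constraint.

```python
from itertools import permutations


def solve_two_two_four():
    """Enumerate assignments satisfying the column constraints of TWO + TWO = FOUR.

    Hypothetical helper for illustration; brute force over distinct digits.
    """
    solutions = []
    # permutations() yields tuples of distinct digits, which enforces
    # constraint 1: DIFF(T, W, O, F, U, R).
    for T, W, O, F, U, R in permutations(range(10), 6):
        # Constraint 2: O + O = R + 10*X1
        x1, units = divmod(O + O, 10)
        if units != R:
            continue
        # Constraint 3: X1 + W + W = U + 10*X2
        x2, tens = divmod(x1 + W + W, 10)
        if tens != U:
            continue
        # Constraint 4: X2 + T + T = O + 10*X3
        x3, hundreds = divmod(x2 + T + T, 10)
        if hundreds != O:
            continue
        # Constraint 5: X3 = F
        if x3 != F:
            continue
        solutions.append({"T": T, "W": W, "O": O, "F": F, "U": U, "R": R})
    return solutions


if __name__ == "__main__":
    for s in solve_two_two_four():
        two = 100 * s["T"] + 10 * s["W"] + s["O"]
        four = 1000 * s["F"] + 100 * s["O"] + 10 * s["U"] + s["R"]
        print(f"{two} + {two} = {four:04d}")
```

Every assignment the function returns satisfies the original sum, since adding the four column equations with weights 1, 10, 100 and 1000 gives 2*(100T+10W+O) = 1000F+100O+10U+R.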