# ILLINOIS CS 446 - 102417.1 (5 pages)

- Pages: 5
- School: University of Illinois Urbana-Champaign
- Course: CS 446 - Machine Learning


CS 446 Machine Learning, Fall 2017

## Lecture 17: Neural Networks with Non-Linear Activations

Lecturer: Sanmi Koyejo. Scribe: Justin Szaday (szaday2). Oct 24th, 2017.

### Announcements

- Please complete the midterm survey if you have not already.
- The details of the project will be out by the end of the week; half a class will be dedicated to discussing it next week.
- The project will need a lot of compute hours, and to help, Microsoft has donated some to the class. If you would like to learn more about their platform, Azure, they will be hosting a tutorial on it Wednesday, November 1st. There will be a Piazza post with details soon.

### The Basic Perceptron

Figure 1: A perceptron in graph form, with linear activations.

The goal of the perceptron is to estimate a function $f : \mathbb{R}^d \to \{-1, 1\}$ that makes good predictions given a dataset $D_n = \{(x_i, y_i)\}_{i=1}^n$, where $x_i \in \mathbb{R}^d$ and $y_i \in \{-1, 1\}$. To do this, we aim to minimize the loss

$$\ell(D_n) = \frac{1}{n} \sum_{i=1}^{n} \ell(y_i, f(x_i)),$$

where, for the perceptron in particular, the function $f$ is a simple linear function of the form $f(x) = w^T x + b$, and the loss is given by

$$\ell(y_i, f(x_i)) = \max(0, -y_i f(x_i)).$$

This loss function linearly penalizes mistakes and is zero otherwise. Figure 1 shows a graphical representation of the perceptron; in the next section we extend this concept to more complex functions.

### The Multi-Layer Perceptron

Figure 2: A multi-layer perceptron in graph form, with linear activations.

The idea of the multi-layer perceptron stems from repeating the basic perceptron more than once: we repeat the basic perceptron multiple times to form a layer, feed the results into another layer, then feed those results into yet another layer, and so on. Figure 2 shows a simple two-layer example. The first layer consists of $k$ functions, each equivalent to the standard perceptron with its own weights and bias. The $k \times 1$ vector they form, called $z$, is then fed into the output layer, which produces the $h \times 1$ output vector that gets fed into the loss function during training. This can be thought of as a 2-layer neural network with one hidden layer and linear activations (we will explain what "activations" means in a later section). Hidden layers refer to the layers leading up to the output layer; their results form the vector $z$, given by

$$z(x_i) = \begin{bmatrix} w_{1,1}^T x_i + b_{1,1} \\ w_{1,2}^T x_i + b_{1,2} \\ \vdots \\ w_{1,k}^T x_i + b_{1,k} \end{bmatrix} = W_1 x_i + b_1 \tag{1}$$

Likewise, we can express $f$ as

$$f(x_i) = W_2 z + b_2 = W_2 (W_1 x_i + b_1) + b_2 \tag{2}$$

This shows that the multi-layer perceptron (with linear activations) is equivalent to a simple linear model, like linear regression. As such, we would like to expand it to include non-linearity.
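The perceptron loss $\max(0, -y_i f(x_i))$ above can be minimized with stochastic (sub)gradient steps, which update the weights only on misclassified points. The following is a minimal sketch of that procedure; the toy data, learning rate, and epoch count are illustrative choices, not from the notes.

```python
import numpy as np

def train_perceptron(X, y, epochs=20, lr=1.0):
    """SGD on the perceptron loss max(0, -y * f(x)), with f(x) = w.T @ x + b."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            # Subgradient of max(0, -y*f(x)) is nonzero only when y_i * f(x_i) <= 0,
            # i.e., on a mistake; then we move w toward y_i * x_i.
            if yi * (w @ xi + b) <= 0:
                w += lr * yi * xi
                b += lr * yi
    return w, b

# Toy linearly separable data (hypothetical example)
X = np.array([[2.0, 1.0], [1.0, 2.0], [-1.0, -1.5], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])
w, b = train_perceptron(X, y)
preds = np.sign(X @ w + b)
```

On separable data like this, the updates stop once every point satisfies $y_i f(x_i) > 0$, matching the zero-loss region of the hinge-style penalty.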

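Equation (2)'s collapse of a linear two-layer network into a single affine map can be checked numerically. This is a small sketch: the dimensions, random seed, and variable names are arbitrary assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, h = 4, 3, 2  # input, hidden, and output dimensions (arbitrary)
W1, b1 = rng.normal(size=(k, d)), rng.normal(size=k)
W2, b2 = rng.normal(size=(h, k)), rng.normal(size=h)
x = rng.normal(size=d)

# Two-layer forward pass with linear activations, as in Eq. (2)
z = W1 @ x + b1
f_two_layer = W2 @ z + b2

# Equivalent single linear model: W = W2 @ W1, b = W2 @ b1 + b2
W, b = W2 @ W1, W2 @ b1 + b2
f_linear = W @ x + b

assert np.allclose(f_two_layer, f_linear)
```

Since the composition of affine maps is affine, no amount of extra linear layers adds expressive power; this is the motivation for the non-linear activations discussed next.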