Princeton MAE 345 - Communication, Information, and Machine Learning

Communication, Information, and Machine Learning
Robert Stengel
Robotics and Intelligent Systems, MAE 345, Princeton University, 2013

• Communication/Information Theory
  – Wiener vs. Shannon
• Finding Decision Rules in Data
• Markov Decision Processes
• Graph and Tree Search

Copyright 2013 by Robert Stengel. All rights reserved. For educational use only.
http://www.princeton.edu/~stengel/MAE345.html

"Communication Theory" or "Information Theory"?

Norbert Wiener (1894-1964)
• Prodigy at Harvard, professor at MIT
• Cybernetics
• Feedback control
• Communication theory

Claude Shannon (1916-2001)
• University of Michigan, MIT (student), Bell Labs, MIT (professor)
• Boolean algebra
• Cryptography, telecommunications
• Information theory

Dark Hero of the Information Age: In Search of Norbert Wiener, the Father of Cybernetics, Flo Conway and Jim Siegelman, Basic Books, 2005.
The Information: A History, a Theory, a Flood, James Gleick, Pantheon, 2011.

Communication: Separating Signals from Noise

Signal-to-Noise Ratio, SNR (often expressed in decibels):

  SNR = Signal Power / Noise Power = S/N = σ²_signal / σ²_noise (zero-mean signals), e.g., watts/watts

  SNR(dB) = 10 log2? no: SNR(dB) = 10 log10(Signal Power / Noise Power) = 20 log10(Signal Amplitude / Noise Amplitude) = S(dB) - N(dB)

Communication: Separating Analog Signals from Noise

Signal-to-Noise Spectral Density Ratio:

  SNSDR(ω) = Signal Power Spectral Density(ω) / Noise Power Spectral Density(ω) = PSD_signal(ω) / PSD_noise(ω)

Optimal (non-causal) Wiener filter:

  H(ω) = PSD_signal(ω) / [PSD_signal(ω) + PSD_noise(ω)] = SNSDR(ω) / [SNSDR(ω) + 1]

Communication: Bit Rate Capacity of a Noisy Channel

Shannon-Hartley Theorem, C bits/s:

  C = B log2(S/N + 1) = B log2(SNR + 1)

  S = signal power, e.g., watts
  N = noise power, e.g., watts
  B = channel bandwidth, Hz
  C = channel capacity, bits/s

Early Codes: How Many Bits?

Semaphore Line Code
• ~(10 x 10) = 100 pixels = 100 bits required to discern a character

Morse Code
• Dot = 1 bit
• Dash = 3 bits
• Dot-dash space = 1 bit
• Letter space = 2 bits
• 3 to 21 bits per character

ASCII encodes 128 characters in 7 bits.

Entropy Measures Information Content of a Signal

• H = entropy of a signal encoding I distinct events:

  H = -Σ_{i=1}^{I} Pr(i) log2 Pr(i)

• Entropy is a measure of the signal's uncertainty
  – High entropy connotes high uncertainty
  – Low entropy portrays high information content
• i = index identifying an event encoded by the signal
• Pr(i) = probability of the ith event
• log2 Pr(i) = number of bits required to characterize the probability that the ith event occurs

  0 ≤ Pr(·) ≤ 1,  log2 Pr(·) ≤ 0,  0 ≤ H ≤ log2 I  (= 1 for two events)
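To make the formula concrete, here is a minimal Python sketch (the function name entropy and the sample counts are illustrative choices, not part of the lecture). It evaluates H for a two-event signal; the fair-coin case gives the maximum value of 1 bit, and the other cases match the 128-trial table below.

```python
import math

def entropy(probabilities):
    """H = -sum_i Pr(i) log2 Pr(i), in bits; zero-probability events add nothing."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0.0)

# Two-event signal with Pr(#1) = n/N and Pr(#2) = 1 - n/N, as in the 128-trial table.
N = 128
for n in (1, 16, 64, 127):
    p1 = n / N
    print(f"n = {n:3d}   Pr(#1) = {p1:.3f}   H = {entropy([p1, 1 - p1]):.3f} bits")

print(entropy([0.5, 0.5]))   # fair coin flip -> 1.0 bit, the two-event maximum
```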
Entropy of Two Events with Various Frequencies of Occurrence

• Frequencies of occurrence estimate the probabilities of each event (#1 and #2):

  Pr(#1) = n(#1)/N,  Pr(#2) = n(#2)/N = 1 - n(#1)/N,  log2 Pr(#1 or #2) ≤ 0

• Combined entropy:

  H = H_#1 + H_#2 = -Pr(#1) log2 Pr(#1) - Pr(#2) log2 Pr(#2)

• -Pr(i) log2 Pr(i) represents the channel capacity (i.e., average number of bits) required to portray the ith event

Entropies for 128 trials (N = 128):

  n(#1)   Pr(#1) = n/N   log2(n/N)   Pr(#2) = 1 - n/N   log2(1 - n/N)   Entropy H
    1        0.008          -7            0.992             -0.011        0.066
    2        0.016          -6            0.984             -0.023        0.116
    4        0.031          -5            0.969             -0.046        0.201
    8        0.063          -4            0.938             -0.093        0.337
   16        0.125          -3            0.875             -0.193        0.544
   32        0.250          -2            0.750             -0.415        0.811
   64        0.500          -1            0.500             -1            1.000
   96        0.750          -0.415        0.250             -2            0.811
  112        0.875          -0.193        0.125             -3            0.544
  120        0.938          -0.093        0.063             -4            0.337
  124        0.969          -0.046        0.031             -5            0.201
  126        0.984          -0.023        0.016             -6            0.116
  127        0.992          -0.011        0.008             -7            0.066

Entropy of a fair coin flip = 1.

Accurate Detection of Events Depends on Their Probability of Occurrence

[Figures: signals rounded to their intended values, shown for σ_noise = 0.1, 0.2, and 0.4]

Finding Efficient Decision Rules in Data (Off-Line)

• Choose most important attributes first
• Recognize when no result can be deduced
• Exclude irrelevant factors
• Iterative Dichotomizer*: the ID3 Algorithm
  – Build an efficient decision tree from a fixed set of examples (supervised learning)

*Dichotomy: division into two (usually contradictory) parts or opinions

Fuzzy Ball-Game Training Set

  Case #   Forecast   Temperature   Humidity   Wind     Play Ball?
    1      Sunny      Hot           High       Weak     No
    2      Sunny      Hot           High       Strong   No
    3      Overcast   Hot           High       Weak     Yes
    4      Rain       Mild          High       Weak     Yes
    5      Rain       Cool          Low        Weak     Yes
    6      Rain       Cool          Low        Strong   No
    7      Overcast   Cool          Low        Strong   Yes
    8      Sunny      Mild          High       Weak     No
    9      Sunny      Cool          Low        Weak     Yes
   10      Rain       Mild          Low        Weak     Yes
   11      Sunny      Mild          Low        Strong   Yes
   12      Overcast   Mild          High       Strong   Yes
   13      Overcast   Hot           Low        Weak     Yes
   14      Rain       Mild          High       Strong   No

Forecast, Temperature, Humidity, and Wind are the attributes; Play Ball? is the decision.

Parameters of the ID3 Algorithm

• Decisions, e.g., play ball or don't play ball
  – D = number of possible decisions
  – Decision values: yes, no
• Attributes, e.g., temperature, humidity, wind, weather forecast
  – M = number of attributes to be considered in making a decision
  – I_m = number of values that the mth attribute can take
  – Temperature: hot, mild, cool
  – Humidity: high, low
  – Wind: strong, weak
  – Forecast: sunny, overcast, rain
• Training trials, e.g., all the games attempted last month
  – N = number of training trials
  – n(i) = number of examples with the ith attribute value

Best Decision is Related to Entropy and the Probability of Occurrence

  H = -Σ_{i=1}^{I} Pr(i) log2 Pr(i)

• High entropy
  – Signal provides low coding precision of distinct events
  – Differences coded with few bits
• Low entropy
  – More complex signal structure
  – Detecting differences requires many bits
• Best classification of events when H = 1 ...
  – but that may not be achievable

Decision-Making Parameters for ID3

H_D = entropy of all possible decisions:

  H_D = -Σ_{d=1}^{D} Pr(d) log2 Pr(d)

G_m = information gain of the mth attribute, whose I_m values partition the training set:

  G_m = H_D + Σ_{i=1}^{I_m} Pr(i) Σ_{d=1}^{D} Pr(d|i) log2 Pr(d|i)

where Pr(i) is the fraction of training trials with the ith value of the attribute, and Pr(d|i) is the fraction of those trials that lead to decision d.
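As a check on these definitions, the Python sketch below (the helper names entropy and information_gain and the list layout of the training set are illustrative assumptions, not from the lecture) computes H_D and the information gain of each attribute of the Fuzzy Ball-Game training set. Forecast comes out with the largest gain, so ID3 tests it first.

```python
import math
from collections import Counter

# Fuzzy Ball-Game training set: attribute values in the order of ATTRIBUTES,
# paired with the decision ("Play Ball?").
ATTRIBUTES = ("Forecast", "Temperature", "Humidity", "Wind")
TRIALS = [
    (("Sunny",    "Hot",  "High", "Weak"),   "No"),
    (("Sunny",    "Hot",  "High", "Strong"), "No"),
    (("Overcast", "Hot",  "High", "Weak"),   "Yes"),
    (("Rain",     "Mild", "High", "Weak"),   "Yes"),
    (("Rain",     "Cool", "Low",  "Weak"),   "Yes"),
    (("Rain",     "Cool", "Low",  "Strong"), "No"),
    (("Overcast", "Cool", "Low",  "Strong"), "Yes"),
    (("Sunny",    "Mild", "High", "Weak"),   "No"),
    (("Sunny",    "Cool", "Low",  "Weak"),   "Yes"),
    (("Rain",     "Mild", "Low",  "Weak"),   "Yes"),
    (("Sunny",    "Mild", "Low",  "Strong"), "Yes"),
    (("Overcast", "Mild", "High", "Strong"), "Yes"),
    (("Overcast", "Hot",  "Low",  "Weak"),   "Yes"),
    (("Rain",     "Mild", "High", "Strong"), "No"),
]

def entropy(decisions):
    """H_D = -sum_d Pr(d) log2 Pr(d), in bits, over a list of decision labels."""
    n = len(decisions)
    return -sum((c / n) * math.log2(c / n) for c in Counter(decisions).values())

def information_gain(trials, attr_index):
    """Gain of one attribute: H_D minus the Pr(i)-weighted entropy of the
    decisions within each attribute value i (equivalent to the slide formula)."""
    gain = entropy([d for _, d in trials])
    for value in {attrs[attr_index] for attrs, _ in trials}:
        subset = [d for attrs, d in trials if attrs[attr_index] == value]
        gain -= len(subset) / len(trials) * entropy(subset)
    return gain

print(f"H_D = {entropy([d for _, d in TRIALS]):.3f} bits")   # 9 Yes, 5 No -> ~0.940
for m, name in enumerate(ATTRIBUTES):
    print(f"Gain({name}) = {information_gain(TRIALS, m):.3f}")
# Forecast shows the largest gain (~0.25 bits), so ID3 tests it at the root.
```

Applying the same gain calculation recursively to each Forecast subset (Sunny, Overcast, Rain cases) produces the rest of the decision tree, as sketched next.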
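Continuing the sketch above (it reuses ATTRIBUTES, TRIALS, and information_gain from the previous block, so it is not standalone), the tree-building recursion that the ID3 bullets describe might look like this; the id3 helper and the nested-dictionary tree representation are assumptions for illustration.

```python
def id3(trials, attr_indices):
    """Recursive ID3 sketch: returns a decision string, or a dict keyed by
    (attribute name, attribute value) whose entries are subtrees."""
    decisions = [d for _, d in trials]
    if len(set(decisions)) == 1:                      # all examples agree: leaf
        return decisions[0]
    if not attr_indices:                              # no attributes left: majority vote
        return Counter(decisions).most_common(1)[0][0]
    best = max(attr_indices, key=lambda m: information_gain(trials, m))
    tree = {}
    for value in sorted({attrs[best] for attrs, _ in trials}):
        subset = [(attrs, d) for attrs, d in trials if attrs[best] == value]
        tree[(ATTRIBUTES[best], value)] = id3(subset, [m for m in attr_indices if m != best])
    return tree

print(id3(TRIALS, list(range(len(ATTRIBUTES)))))
```

For this training set the recursion places Forecast at the root, classifies every Overcast case as Yes, and splits the Sunny and Rain branches on Humidity and Wind, respectively.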

