CS664 Lecture #5: Markov chains

Some source material taken from Joseph Chang:
http://www.stat.yale.edu/~jtc5/jtc.html

Coins with memory

• Suppose that the coin acts the way that gamblers think
  – Look back at the last result
  – Produce the opposite answer (probability p) or the same answer (probability 1 - p)
• At p = .5, what percentage of heads do we expect in the limit?
  – What about at p = .1? ("Stubborn" coin)
  – What about at p = .9? ("Flighty" coin)
  – What about at p = 0? ("Stuck" coin)

Markov chains

• Generalization of a finite automaton
• Probabilistic transitions (edge weights)

[Diagram: states H and T; each state loops to itself with probability 1 - p and crosses to the other state with probability p]

Markov chain evolution

• Distribution over states at a given time
• Taking a step updates the distribution
  – According to the edge weights
  – Consider the "Markov frog"

[Diagram: states s1 and s2; s1 loops to itself with probability 1/3 and hops to s2 with probability 2/3; s2 loops to itself with probability 3/5 and hops to s1 with probability 2/5]

Example in action

[Figure: the same two-state frog chain, with the distribution over states shown evolving step by step]

Transition matrix

• The "probability mass" moved to a state is a linear combination of the masses at adjacent states
  – Coefficients are the edge weights

Some notation

• Stochastic vector π has non-negative elements that sum to 1
  – Stochastic matrix K has stochastic columns
  – π_n is the distribution after n steps

Stationary distributions

• For the frog chain, π = (3/8, 5/8) is stationary:

    1/3 · 3/8 + 2/5 · 5/8 = 3/8
    2/3 · 3/8 + 3/5 · 5/8 = 5/8

Perron-Frobenius theorem

• If a Markov chain is strongly connected and has self-loops, it converges to a unique stationary distribution
  – No matter what the starting distribution
• Multiple self-loops are not required
  – Need to avoid "oscillating" cases

Convergence rates

• Nothing in this theorem about the rate!
• There are some complicated theorems on this topic
  – Nothing that guarantees fast convergence for the cases of interest

Markov coin revisited

• The transition matrix is given by (reconstructed here, taking p as the probability of flipping after heads and q after tails):

      K = | 1-p    q  |
          |  p    1-q |

  – What about p = q = 0?

Markov chains in vision

• Vital tool for many vision problems
  – Basis for trigrams, hence Efros & Leung
    • Images have "local" structure
• Major application: sampling
  – Generating answers from a distribution
• Major application: energy minimization
  – Also known as optimization
  – Elegant way to formulate most vision problems
  – Lots of interesting and powerful algorithms

Energy function

[Figure: an energy function E(x, y) over candidates (x, y), marking a local min, the global min, and a candidate point]

Gradient descent

[Figure: gradient descent rolling downhill on E(x, y)]

Properties of E

• Local versus global minimum
• If E is convex, every local minimum is a global minimum
  – Issue becomes convergence speed
  – In vision, we're rarely so lucky
• We can compute the global min sometimes
  – Other times compute a "strong" local min

Tradeoffs of optimization

• Advantages
  – Clean separation between what you want to compute and how you compute it
  – Easy to add new constraints (terms)
  – Simple to explain
• Disadvantages
  – Optimization is often difficult
  – Separation of what and how can hurt you

Complexity

• In complete generality, computing the global min requires exhaustive search
• Consider 2 energy functions
  – Uniform (flat everywhere)
  – Uniform with a well somewhere
• True even if P=NP

Consequences

• Consider an optimization method that can find the global min of an arbitrary E
  – Must require exponential time
  – Asymptotically the same as exhaustive search
• Might work for a particular problem
• Strong methods have limited E
  – You need to understand and exploit the structure of the problem

General-purpose methods

• Example: genetic algorithms
  – Not a method taken seriously by reputable academics, in vision or elsewhere
• Population of candidate solutions
  – Representation is key
• Create new population
  – Crossovers, mutations
  – Replace the worst (highest E) candidates

Gradient descent alternatives?
• If E is convex we can just roll downhill
  – Otherwise, the risk is getting stuck in a local min
• What if we sometimes move uphill?

Metropolis(E, T)

1. Generate a random change ("sampling")
2. If the energy is lower, go there
3. If the energy is higher:
   3.1. Go there with probability ∝ exp(-ΔE/T)
   3.2. Otherwise, stay at the old candidate

Metropolis properties

• We can do nothing (step 3.2)
• Gradient descent at low T, random search at high T
• Randomized algorithm
  – Output is a distribution over candidates
  – Hence, a distribution over energies

Random walks on graphs

• Suppose we pick an edge uniformly at random from the outgoing edges
  – Undirected graph with self-loops
  – What is the stationary distribution?
    • More likely to end up at a node with many (incoming) edges, i.e. high degree

[Figure: small undirected graph; nodes labeled with their degrees (2, 3, 3, 2)]

Biased random walks

• What if we want a different stationary distribution?
  – E.g., a high-degree node should be "unpopular"
  – Solution: change the transition probabilities
    • I.e., don't pick an outgoing edge uniformly
• Weight the outgoing edges by their relative popularity in the desired distribution

Example

[Figure: the graph above with node weights rescaled ("Multiply by 2", "Multiply by 2/3") to produce the desired stationary distribution]
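
The last few slides can be checked numerically. The sketch below power-iterates two random walks on a small undirected graph with node degrees 2, 3, 3, 2 (the particular edge set, and the Metropolis-style reweighting rule, are illustrative assumptions; the slide's own figure is not recoverable from the text). The unbiased walk converges to a stationary distribution proportional to degree; reweighting each hop i → j by min(1, deg(i)/deg(j)) makes the high-degree nodes "unpopular" enough that the stationary distribution becomes uniform.

```python
# An assumed 4-node undirected graph with degrees 2, 3, 3, 2.
adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 3], 3: [1, 2]}
n = len(adj)

def walk(P, steps=200):
    """Power iteration: repeatedly push all probability mass along the
    transition probabilities P[i][j] = Pr(i -> j)."""
    pi = [1.0] + [0.0] * (n - 1)          # start with all mass on node 0
    for _ in range(steps):
        pi = [sum(P[i][j] * pi[i] for i in range(n)) for j in range(n)]
    return pi

# Unbiased walk: pick an outgoing edge uniformly at random.
P_unif = [[1 / len(adj[i]) if j in adj[i] else 0.0 for j in range(n)]
          for i in range(n)]
pi_unif = walk(P_unif)    # -> (0.2, 0.3, 0.3, 0.2), proportional to degree

# Biased walk (a Metropolis-style reweighting, assumed here): accept a
# proposed hop i -> j with probability min(1, deg(i)/deg(j)), otherwise
# stay put.  Detailed balance then holds for the uniform distribution.
P_bias = [[0.0] * n for _ in range(n)]
for i in adj:
    for j in adj[i]:
        P_bias[i][j] = (1 / len(adj[i])) * min(1, len(adj[i]) / len(adj[j]))
    P_bias[i][i] = 1 - sum(P_bias[i])     # rejected moves become self-loops
pi_bias = walk(P_bias)    # -> (0.25, 0.25, 0.25, 0.25)
```

Both chains satisfy the Perron-Frobenius conditions above (strongly connected, aperiodic), so the starting distribution does not matter; starting from any node yields the same limits.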