Complex-Cell Models

Yutian Chen
Department of Computer Science

Abstract

A Product of Experts (PoE) model is a probabilistic model which combines a number of individual component models by multiplying their probabilities. A typical PoE model is the Exponential Family Harmonium introduced by Welling et al. In this paper, we apply a hierarchical PoE model, the Complex-Cell Model (CCM), which models second-order correlations between pixels, to image retrieval, and compare its performance with a standard harmonium model, the Simple-Cell Model (SCM). We find that the precision of the CCM is higher than that of the SCM when the recall is large.

1 Introduction

A Product of Experts (PoE) model [1] combines a number of individual component models by taking their product and normalizing the result. The probabilistic model can be expressed as

$$P(x \mid \{\theta_j\}) = \frac{1}{Z} \prod_{j=1}^{M} f_j(x \mid \theta_j), \qquad \text{with} \quad Z = \int \! dx \prod_{j=1}^{M} f_j(x \mid \theta_j)$$

Compared with a mixture of models, a PoE has the advantage of modeling constraints on the data. One kind of PoE model, introduced by Welling et al., is the Exponential Family Harmonium (EFH) [2]. It can be understood as a two-layer Markov Random Field, shown in Figure 1: one layer contains the observed variables x and the other the hidden variables h. The conditional probabilities P(h|x) and P(x|h) both belong to the exponential family. The EFH offers much faster inference than directed graphical models, because its hidden variables are conditionally independent given the observed data, whereas in directed graphical models the hidden variables are conditionally dependent due to the explaining-away property. When applied to document retrieval and object recognition [2][3], the EFH has been shown to perform better than directed graphical models such as pLSI and LSI.

A hierarchical EFH model, the hierarchical Product of Student-t model (hPoT) [4], was introduced to analyze natural scenes. It adds another layer on top of a PoE model in which the marginal probability of x in each component is a Student-t distribution. This model exhibits topographic organization of Gabor-like receptive fields.

Figure 1: Exponential Family Harmonium

In this paper, we use a simplified version of the hierarchical EFH model. It is called the Complex-Cell Model (CCM) because its filters show features similar to those of complex cells in the primary visual cortex. The observed and hidden variables are both binary, taking values in {0, 1}. The model is trained on unlabeled digits by minimizing contrastive divergence. We then apply it to digit retrieval as well as k-nearest-neighbour classification, and compare its precision-recall curve with that of a Simple-Cell Model (SCM), a standard two-layer harmonium whose name also comes from a type of cell in the primary visual cortex. It turns out that CCMs have Gabor-like filters that model the edges of digits, and their precision is higher than the SCM's when recall is large.

2 Simple-Cell Models and Complex-Cell Models

SCMs and CCMs are both harmonium models proposed to model the priors of unlabeled data such as images and documents. We introduce the probabilities and the corresponding undirected graphs of the two models in the following subsections.

2.1 Simple-Cell Models

The joint probability of the observed and hidden variables in an SCM is

$$P(x, h) = \frac{1}{Z} e^{-E(x, h)} \qquad (1)$$

$$-E(x, h) = \sum_i \alpha_i x_i + \sum_j \beta_j h_j + \sum_{ij} h_j W_{ij} x_i \qquad (2)$$

$$Z = \sum_{x, h} e^{-E(x, h)} \qquad (3)$$

This is a restricted Boltzmann machine with energy E(x, h); Z is a normalization term called the partition function. The domain of both the observed and the hidden variables is {0, 1}.
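To make the bilinear energy concrete, the following is a minimal NumPy sketch (an illustration, not code from the paper); the names `neg_energy`, `alpha`, `beta`, and `W` are placeholders for the quantities in eqs. (1)-(3):

```python
import numpy as np

def neg_energy(x, h, alpha, beta, W):
    """Negative energy -E(x, h) of the SCM, eq. (2):
    sum_i alpha_i x_i + sum_j beta_j h_j + sum_ij h_j W_ij x_i.
    x: (n_vis,), h: (n_hid,) binary vectors; W: (n_vis, n_hid)."""
    return alpha @ x + beta @ h + x @ W @ h

def unnormalized_joint(x, h, alpha, beta, W):
    """exp(-E(x, h)) from eq. (1). Dividing by the partition function Z
    of eq. (3), a sum over all 2^(n_vis + n_hid) joint states, would
    turn this into the normalized probability P(x, h)."""
    return np.exp(neg_energy(x, h, alpha, beta, W))
```

Because Z sums over exponentially many joint states, it is intractable for all but tiny models, which is what motivates the sampling-based training in Section 3.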
The conditional probabilities P(x|h) and P(h|x) are both products of logistic functions:

$$P(x \mid h) = \prod_i \sigma\Big(\alpha_i + \sum_j W_{ij} h_j\Big)^{x_i} \qquad (4)$$

$$P(h \mid x) = \prod_j \sigma\Big(\beta_j + \sum_i W_{ij} x_i\Big)^{h_j} \qquad (5)$$

$$\text{where} \quad \sigma(x) = \frac{1}{1 + e^{-x}} \qquad (6)$$

The mean value of h_j is a function of the output of a linear filter W_j. The model corresponds to a two-layer Markov model (Figure 1): the observed and hidden variables are conditionally independent given the other layer. It is a simplified version of the EFH introduced in [2] and [3], where the conditional probabilities are more complex exponential-family distributions. That an SCM is a PoE becomes more obvious when we marginalize out the hidden variables:

$$P(x) = \frac{1}{Z'} \prod_j \Big(1 + \exp\big(\beta_j + \sum_i W_{ij} x_i\big)\Big) \exp(\alpha^T x) \qquad (7)$$

$$\phantom{P(x)} = \frac{1}{Z'} \exp\bigg(\sum_j \log\Big(1 + \exp\big(\beta_j + \sum_i W_{ij} x_i\big)\Big) + \alpha^T x\bigg) \qquad (8)$$

where Z' is the partition function.

2.2 Complex-Cell Model

A hierarchical PoE was introduced in [4]; the undirected model is shown in Figure 2. It adds a second layer on top of a PoE, including the nonlinear transformation y ⟹ y². In this paper, instead of using a Student-t distribution, we use a binary model.

Figure 2: hPoE. Dashed lines denote deterministic functions.

The energy of the CCM is

$$-E(x, h) = \sum_i \alpha_i x_i + \sum_k \gamma_k h_k + \sum_{kj} h_k V_{jk} \Big(\sum_i J_{ji} x_i\Big)^2 \qquad (9)$$

The hidden variables h are still conditionally independent given the observed variables x, but x is no longer conditionally independent because of the nonlinear operation:

$$P(x_i \mid x_{-i}, h) = \sigma\bigg(\alpha_i + \sum_{kj} h_k V_{jk} J_{ji} \Big(2 \sum_{l \neq i} J_{jl} x_l + J_{ji}\Big)\bigg)^{x_i} \qquad (10)$$

$$P(h \mid x) = \prod_k \sigma\bigg(\gamma_k + \sum_j V_{jk} \Big(\sum_i J_{ji} x_i\Big)^2\bigg)^{h_k} \qquad (11)$$

The CCM puts constraints on the second-order relationships between observed variables: given different h, it assigns high probability to x with different covariance matrices.

3 Training Algorithm

Parameter learning for harmonium models is performed by stochastic gradient ascent on the log-likelihood of the data [2]. For a large, redundant dataset it is more efficient to estimate the required gradients on small batches rather than on the entire dataset. We also include a momentum term to speed up convergence and a weight-decay parameter to prune unneeded weights.

The derivative of the log-likelihood with respect to a parameter w has the form

$$\frac{\partial \log P(x)}{\partial w} \propto \Big\langle -\frac{\partial E(x, h)}{\partial w} \Big\rangle_{\tilde{p}} - \Big\langle -\frac{\partial E(x, h)}{\partial w} \Big\rangle_{p} \qquad (12)$$

where ⟨·⟩ with subscript p̃ denotes the expectation over the empirical distribution, i.e. the image samples, and the subscript p denotes the expectation over the model distribution given by the current parameters. The first term is easy to compute by averaging over samples, while the second is computationally intractable. One approach is to run the Gibbs sampler defined by equations (4), (5), (10), and (11); however, running this sampler to equilibrium at every iteration takes too long. An alternative is to initialize the Gibbs sampler at each data point, run only a few or even a single step of sampling, and use the unconverged points to estimate the model expectation. This is known as contrastive divergence learning [5]. It reduces variance at the expense of a bias in the parameter estimates, but the bias is usually very small in practice. When computing the derivative of −E(x, h), it is useful to reduce the variance of the estimates by using the conditional means of the hidden variables rather than their sampled binary values.
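As a sketch of this procedure (an assumed, minimal implementation for the SCM, not the paper's code; `X`, `alpha`, `beta`, `W`, and `lr` are hypothetical names), one CD-1 update built from eqs. (4), (5), and (12) might look like:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cd1_step(X, alpha, beta, W, lr=0.1):
    """One contrastive-divergence (CD-1) update for the SCM.

    X: (batch, n_vis) binary data; W: (n_vis, n_hid) weights.
    The positive phase uses the data, the negative phase a single
    Gibbs step, approximating the two expectations in eq. (12).
    """
    n = X.shape[0]

    # Positive phase: hidden activation probabilities P(h|x), eq. (5).
    ph_data = sigmoid(beta + X @ W)
    h_sample = (rng.random(ph_data.shape) < ph_data).astype(float)

    # Negative phase: one Gibbs step -- reconstruct x via eq. (4),
    # then re-infer the hidden probabilities via eq. (5).
    px_model = sigmoid(alpha + h_sample @ W.T)
    x_model = (rng.random(px_model.shape) < px_model).astype(float)
    ph_model = sigmoid(beta + x_model @ W)

    # Gradient estimate <-dE/dw>_data - <-dE/dw>_model, eq. (12).
    # Using the probabilities ph_* rather than binary hidden samples
    # is the variance-reduction choice noted at the end of Section 3.
    W += lr * (X.T @ ph_data - x_model.T @ ph_model) / n
    alpha += lr * (X - x_model).mean(axis=0)
    beta += lr * (ph_data - ph_model).mean(axis=0)
    return alpha, beta, W
```

For instance, initializing `W = 0.01 * rng.standard_normal((784, 100))` with zero biases and calling `cd1_step` repeatedly over mini-batches of binarized digit images gives the mini-batch training loop described above; the momentum and weight-decay terms the paper also uses are omitted here for brevity.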

