PERFORMANCE EVALUATION OF LATENT VARIABLE MODELS WITH SPARSE PRIORS

David Wipf, Jason Palmer, Bhaskar Rao, and Kenneth Kreutz-Delgado
Department of Electrical and Computer Engineering
University of California, San Diego
La Jolla, CA 92093-0407 USA
e-mail: {dwipf,japalmer}@ucsd.edu, {brao,kreutz}@ece.ucsd.edu

ABSTRACT

A variety of Bayesian methods have recently been introduced for finding sparse representations from overcomplete dictionaries of candidate features. These methods often capitalize on latent structure inherent in sparse distributions to perform standard MAP estimation, variational Bayes, approximation using convex duality, or evidence maximization. Despite their reliance on sparsity-inducing priors, however, these approaches may or may not actually lead to sparse representations in practice, and so it is a challenging task to determine which algorithm and sparse prior are appropriate. Rather than justifying prior selections and modelling assumptions based on the credibility of the full Bayesian model, as is commonly done, this paper bases evaluations on the actual cost functions that emerge from each method. Two minimal conditions are postulated that ideally any sparse learning objective should satisfy. Out of all possible cost functions that can be obtained from the methods described above using (virtually) any sparse prior, a unique function is derived that satisfies these conditions. Both sparse Bayesian learning (SBL) and basis pursuit (BP) are special cases. Later, all methods are shown to be performing MAP estimation using potentially non-factorable implicit priors, which suggests new sparse learning cost functions.

Index Terms — sparse representations, sparse priors, latent variable models, underdetermined inverse problems, Bayesian learning

1. INTRODUCTION

Here we will be concerned with the generative model

    y = Φx + ε,                                                        (1)

where Φ ∈ R^{N×M} is a dictionary of unit ℓ2-norm basis vectors or features, x is a vector of unknown weights, y is the observation vector, and ε is uncorrelated noise distributed as N(0, λI). In many practical situations, this dictionary will be overcomplete, meaning M > N and rank(Φ) = N. When large numbers of features are present relative to the signal dimension, the estimation problem is fundamentally ill-posed. A Bayesian framework is intuitively appealing for formulating these types of problems because prior assumptions must be incorporated, whether explicitly or implicitly, to regularize the solution space.

Recently, there has been a growing interest in models that employ sparse priors to encourage solutions with mostly zero-valued coefficients. For purposes of optimization, approximation, and inference, these models can be conveniently framed in terms of a collection of non-negative latent variables γ ≜ [γ_1, ..., γ_M]^T. The latent variables dictate the structure of the sparse prior in one of two ways. First, in the integral-type representation, the prior is formed as a scale mixture of Gaussians via

    p(x) = ∏_{i=1}^{M} p(x_i),    p(x_i) = ∫ N(x_i; 0, γ_i) p(γ_i) dγ_i.    (2)

In contrast, the convex-type representation takes the form¹

    p(x_i) = sup_{γ_i > 0} N(x_i; 0, γ_i) p(γ_i),                          (3)

whose form is rooted in convex analysis and duality theory. As shown in [10], virtually all sparse priors of interest can be decomposed using both (2) and (3), including the popular Laplacian, Jeffreys, Student's t, and generalized Gaussian priors.² The key requirement is that p(x_i) is strongly supergaussian, which requires that

    p(x_i) ∝ exp[−g(x_i²)],                                                (4)

with g(·) a concave and non-decreasing function.

In the context of regression and model selection, the fully Bayesian treatment would involve integration (or maximization for the convex representation) over both the latent variables and the unknown weights. With sparse priors, however, this is intractable. Moreover, in applications where sparsity is important, often a sparse point estimate x̂ is all that is required, rather than merely a good estimate of p(y) or the conditional distribution of new data points y*, i.e., p(y*|y). As such, nearly all models with sparse priors are handled in one of two ways, both of which can be viewed as approximations to the full model. First, the latent structure afforded by (2) and (3) offers a very convenient means of obtaining (local) MAP estimates of x using generalized EM procedures that iteratively solve

    x̂ = arg max_x p(y|x) p(x).                                            (5)

Henceforth referred to as Type I methods, common examples include minimum ℓp-quasi-norm approaches [6, 12], Jeffreys prior-based methods sometimes called FOCUSS [2, 3, 5], and algorithms for computing the basis pursuit (BP) or Lasso solution [3, 7, 12].

Secondly, instead of integrating out (or maximizing out) the hyperparameters, Type II methods integrate out the unknown x and then solve

    γ̂ = arg max_γ p(γ|y) = arg max_γ ∫ p(y|x) N(x; 0, Γ) p(γ) dx,          (6)

where Γ ≜ diag(γ).

____________________
This work was supported by NSF grant DGE-0333451 and NSF grant IIS-0613595.
¹ Here we use a slight abuse of notation, in that p(γ_i) need not be a proper probability distribution.
² The convex-type representation is slightly more general than (2).

1-4244-0728-1/07/$20.00 ©2007 IEEE            II-453            ICASSP 2007

Once γ̂ is obtained, a point estimate for x naturally emerges as

    x̂ = E[x | y; γ̂] = Γ̂Φ^T (λI + ΦΓ̂Φ^T)^{-1} y,                          (7)

where Γ̂ ≜ diag(γ̂). Relevant examples include sparse Bayesian learning (SBL) [15], automatic relevance determination (ARD) [9], evidence maximization [13], and methods for learning overcomplete dictionaries [4]. Perhaps surprisingly, even the popular variational mean-field approximations, which optimize a factorial posterior distribution such that p(x, γ | y) ≈ q(x | y) q(γ | y), are equivalent to the Type II methods in the context of strongly supergaussian priors [10]. A specific example of this can be found in [1].

In applying all of these methods in practice, the performance in achieving sparse solutions varies. Results can be highly dependent on the (subjective) parameterization used in forming the latent variables; this occurs because the decomposition of p(x) is generally not unique. In some cases, these methods lead to identical results; in others, they may perform poorly or even lead to provably non-sparse representations, despite their foundation on a sparse prior-based generative model. In still other cases, they may be very successful. As such, sorting out the meaningful differences between these methods remains an important issue.

In this paper, we will begin by examining the cost functions that emerge from all possible Type I and Type II methods, demonstrating that the former is actually a limiting case of the latter, with a common underlying set of objective functions uniting both approaches. However, it still remains unclear how to reliably select from this class of algorithms when sparsity is the foremost concern. To this effect, we postulate two minimal conditions that ideally any sparse approximation cost function should satisfy. We then select, out of all the possible Type I and II methods discussed above, the unique function that satisfies these two conditions. Interestingly, both BP and SBL are special cases. In general, we would argue that these results significantly compress the space of 'useful' sparse algorithms and provide a more rigorous justification for using a particular method consistent with observed empirical results [16]. We conclude by showing that all Type II methods can be viewed as performing MAP estimation using non-separable (i.e., non-factorable) implicit priors. This elucidates connections between
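As a concrete illustration, the generative model (1) is straightforward to simulate. The dimensions, sparsity level, and noise variance below are illustrative choices, not values taken from the paper.

```python
import numpy as np

# Sketch of the generative model y = Phi x + eps from Eq. (1).
rng = np.random.default_rng(0)

N, M = 20, 50          # overcomplete dictionary: M > N
lam = 0.01             # noise variance (the lambda in N(0, lam * I))

# Dictionary with unit ell-2 norm columns; a random Gaussian matrix
# of this shape has rank(Phi) = N with overwhelming probability.
Phi = rng.standard_normal((N, M))
Phi /= np.linalg.norm(Phi, axis=0)

# A sparse weight vector x: only a few nonzero coefficients.
x = np.zeros(M)
support = rng.choice(M, size=4, replace=False)
x[support] = rng.standard_normal(4)

# Observation corrupted by uncorrelated Gaussian noise.
y = Phi @ x + np.sqrt(lam) * rng.standard_normal(N)
```

Because M > N, infinitely many x reproduce y exactly; this is the ill-posedness that the sparse prior is meant to resolve.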
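The integral-type representation (2) can be checked numerically for one concrete case: by a standard identity, the Laplacian prior p(x_i) = (a/2) exp(−a|x_i|) is a scale mixture of Gaussians whose mixing density on the variance γ_i is exponential, p(γ_i) = (a²/2) exp(−a² γ_i / 2). The rate a below is an arbitrary illustrative choice.

```python
import numpy as np
from scipy.integrate import quad

def gaussian(x, var):
    # N(x; 0, var)
    return np.exp(-x**2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

def mixing_density(gamma, a):
    # Exponential density on the Gaussian variance gamma.
    return (a**2 / 2.0) * np.exp(-a**2 * gamma / 2.0)

def laplacian_via_gsm(x, a):
    # p(x) = int_0^inf N(x; 0, gamma) p(gamma) dgamma, Eq. (2).
    val, _ = quad(lambda g: gaussian(x, g) * mixing_density(g, a), 0.0, np.inf)
    return val

# Compare the mixture integral against the closed-form Laplacian.
direct = 0.5 * 1.0 * np.exp(-1.0 * abs(1.3))   # (a/2) exp(-a |x|), a = 1
via_gsm = laplacian_via_gsm(1.3, 1.0)
print(direct, via_gsm)  # the two values agree to quadrature accuracy
```

The same recipe, with a different p(γ_i), yields the Jeffreys, Student's t, and generalized Gaussian priors mentioned above.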
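A minimal sketch of a Type I scheme in the spirit of (5): an EM-style iteratively reweighted least-squares loop, FOCUSS-like for general p and Lasso-like at p = 1. The update form, initialization, iteration count, and regularization constants here are illustrative assumptions, not the exact algorithms of the cited references.

```python
import numpy as np

def irls_map(Phi, y, lam, p=1.0, iters=100, eps=1e-8):
    """Reweighted least squares for MAP estimation with an
    ell-p-quasi-norm penalty; each pass solves a Gaussian problem
    with latent weights derived from the current estimate."""
    N = Phi.shape[0]
    # Initialize with the minimum-norm (ridge) solution.
    x = Phi.T @ np.linalg.solve(Phi @ Phi.T + lam * np.eye(N), y)
    for _ in range(iters):
        gamma = np.abs(x) ** (2.0 - p) + eps   # latent weights from current x
        G = Phi * gamma                        # Phi @ diag(gamma), broadcast
        x = gamma * (Phi.T @ np.linalg.solve(lam * np.eye(N) + G @ Phi.T, y))
    return x

# Demo on a synthetic sparse problem (sizes are arbitrary choices).
rng = np.random.default_rng(2)
N, M = 15, 30
Phi = rng.standard_normal((N, M))
Phi /= np.linalg.norm(Phi, axis=0)
x_true = np.zeros(M)
x_true[[2, 9, 23]] = [1.5, -1.0, 2.0]
y = Phi @ x_true + 0.01 * rng.standard_normal(N)

x_hat = irls_map(Phi, y, lam=1e-3, p=1.0)
```

Repeatedly shrinking the weights in this way is what drives most coefficients toward zero, but as the paper notes, whether a given prior/update pair provably yields sparse solutions is exactly the question at issue.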
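Given hyperparameters, the Type II point estimate (7) is a single linear solve. In the sketch below γ is fixed by hand, rather than learned by maximizing (6), purely to illustrate the mechanism: γ_i → 0 forces the corresponding x̂_i → 0, which is how Type II methods realize sparsity. All numerical values are illustrative.

```python
import numpy as np

# Posterior mean x_hat = Gamma Phi^T (lam I + Phi Gamma Phi^T)^{-1} y, Eq. (7).
rng = np.random.default_rng(1)
N, M, lam = 10, 25, 0.01

Phi = rng.standard_normal((N, M))
Phi /= np.linalg.norm(Phi, axis=0)
x_true = np.zeros(M)
x_true[[3, 11, 19]] = [1.0, -2.0, 1.5]
y = Phi @ x_true + np.sqrt(lam) * rng.standard_normal(N)

# Hypothetical hyperparameters: order-one variance on the true support,
# near zero elsewhere (in practice these would come from solving (6)).
gamma = np.full(M, 1e-8)
gamma[[3, 11, 19]] = 1.0
Gamma = np.diag(gamma)

x_hat = Gamma @ Phi.T @ np.linalg.solve(lam * np.eye(N) + Phi @ Gamma @ Phi.T, y)
print(np.round(x_hat, 2))  # mass concentrates on the chosen support
```

Note that the inverse in (7) is N×N rather than M×M, which is what makes the Type II posterior mean cheap even for highly overcomplete dictionaries.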