# IUB MICS-Q 550 - Parameter Identification and Objective Functions

Q550: Models in Cognitive Science

## Qualitative vs. Quantitative

- Decay example: qualitative comparison (H0). This rarely happens, and we rely on a quantitative comparison of models: which best fits the data?
- Quantitative predictions must be evaluated at the optimal set of parameters; otherwise we could reject a good model simply because we selected poor parameters.
- The comparison should not depend on the researcher's arbitrary selection of parameters (compare to regression).

## Quantitative Comparisons Require

1. Reliable empirical data
2. An objective function of (mis)fit
3. Optimal parameters
4. A quantitative comparison technique

## Types of Parameters

- **Task parameters**: fixed; set from the values used in the experiment.
- **Assumptions**: randomized or sampled; representation, environmental structure, etc.
- **Cognitive parameters**: free (estimated from data); representative of a cognitive process: criterion, attention weight, process rates, etc.
- **Scaling parameters**: free; undesirable parameters that aspire to become cognitive-process parameters, but we don't know what process or why.

## Outline

- Simple process model and experimental data
- Identify, fix, and free parameters
- Objective functions
- Hand fitting, grid search, optimization algorithms

Strategy: generate data from the model with given parameters, then try to reconstruct the generating parameters from the data.

## Classification/Retention Model

- Ss are trained to classify MDS stimuli (mushrooms, fish, etc.) into two categories.
  - Gaussian representation, not fully linearly separable (depends on σ²), but we'll use a simple SLP neural net and the delta learning rule.
- Two groups: Sober/Drunk.
  - Learn in state to asymptotic performance.
  - Generalization stimuli.
  - Test retention of learning over a 0-10 week delay.
  - We get a group × delay interaction on performance.

## Our SLP Classification Model

(diagram: input nodes x₁ … xₙ feeding output nodes A and B)

- Input: Gaussian vectors.
- Two output nodes; delta update on the connection weights:

  $$o_i = \sum_{j=1}^{N_x} x_j w_{ij}, \qquad \Delta w_{ij} = \alpha\,(t_i - o_i)\,x_j$$

- A more realistic decision rule: the Luce ratio choice rule determines p(correct) given the activations, with sensitivity parameter $b$ (cf. $d'$). The probability of choosing category A is based on a ratio of the strengths of the output activations. The ratio rule (from Luce, 1959):

  $$p(A \mid x) = \frac{e^{b\,o_A}}{e^{b\,o_A} + e^{b\,o_B}}$$

See `Classify.f95`.

## Pseudocode

```
Constants:
  N_In  = 100     ! # input nodes
  N_Out = 2       ! # output nodes
  Alpha = 0.01    ! Learning rate parameter

Data Structures:
  Prototype[N_Out, N_In]
  Exemplar[N_In]
  Weight[N_In, N_Out]
  Output[N_Out]

Tools:
  Distort(Vector, D, mu, sigma)
  Random_Vector(D)
  Classify(Exemplar, Weight, Output)
  Update_Weights(Weight, Error, Exemplar, Alpha)
```

```
Prototype[1, :] = Random_Vector(N_In)
Prototype[2, :] = Random_Vector(N_In)

FOR i = 1 to N_Train DO
  ! Flip a coin to pick a category (result = x):
  Exemplar = Distort(Prototype[x, :], 10, 0, 1)   ! using z-dist
  Classify(Exemplar, Weight, Output)
  Error[1] = true_val - Output[1]
  Error[2] = true_val - Output[2]
  Update_Weights(Weight, Error, Exemplar, Alpha)  ! w_ij = w_ij + Alpha*(t_i - o_i)*x_j
ENDDO

! Weights are now clamped:
FOR i = 1 to N_Test DO
  ! Flip a coin to pick a category (result = x):
  Exemplar = Distort(Prototype[x, :], 10, 0, 1)   ! using z-dist
  Classify(Exemplar, Weight, Output)
ENDDO
```

## Classification + Retention Model

- ("Classificention"?)
- Simple hypothesis: the connection weights decay back to zero as a function of time.
- The weights immediately after learning are $w_{ij}$; after a delay of $t$ weeks, with decay parameter $\lambda$, the connection weights are:

  $$w_{ij}(t) = \lambda^{t}\, w_{ij}$$

- Since $o_i = \sum_j x_j w_{ij}$, we have $o_i(t) = \sum_j x_j w_{ij}\,\lambda^{t} = o_i\,\lambda^{t}$, so the ratio rule becomes:

  $$p\,[A \mid x(t)] = \frac{e^{b\,o_A \lambda^{t}}}{e^{b\,o_A \lambda^{t}} + e^{b\,o_B \lambda^{t}}}$$

## Classificention Model

We could rely on the equation above or, if we're unsure, actually implement the decay in the model:
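Either route (closed-form decay applied to the outputs, or explicit decay of every connection weight) gives identical predictions, because the outputs are linear in the weights. A quick numerical check in Python; the input and weight values below are made up for illustration, and `b` and `lam` are the sensitivity and decay parameters:

```python
import math

def luce(o_a, o_b, b):
    """Ratio rule with sensitivity b: p(A) = e^(b*oA) / (e^(b*oA) + e^(b*oB))."""
    return math.exp(b * o_a) / (math.exp(b * o_a) + math.exp(b * o_b))

b, lam = 3.0, 0.7                 # sensitivity and decay parameters
x   = [0.8, -0.3, 1.2]            # illustrative input vector
w_a = [0.5, 0.1, 0.4]             # illustrative learned weights, output node A
w_b = [-0.2, 0.3, -0.1]           # output node B

for t in range(0, 11):            # 0-10 week delays
    # Route 1: explicitly decay every weight, then recompute the outputs
    o_a = sum(xi * wi * lam ** t for xi, wi in zip(x, w_a))
    o_b = sum(xi * wi * lam ** t for xi, wi in zip(x, w_b))
    p_explicit = luce(o_a, o_b, b)
    # Route 2: closed form, decay applied to the intact outputs
    o_a0 = sum(xi * wi for xi, wi in zip(x, w_a))
    o_b0 = sum(xi * wi for xi, wi in zip(x, w_b))
    p_closed = luce(o_a0 * lam ** t, o_b0 * lam ** t, b)
    assert abs(p_explicit - p_closed) < 1e-12
    print(f"t={t:2d}  p(A|x(t)) = {p_closed:.4f}")
```

As the delay grows, both outputs shrink toward zero and p(A) relaxes toward chance (0.5), which is the qualitative retention pattern the model is meant to capture.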
Train the classification model to asymptotic performance (matching the learning rate of the Ss); at each time delay, decay the connection weights; then test on the generalization stimuli.

Do this while varying the decay and sensitivity parameters. What are the likely parameter values for each group to produce the data, if our assumptions are correct?

We no longer care about α at test; it has served its purpose. See `Retention.f95` and `Parameter_Space.f95` for fits to data.

## Alternate Models

- **The null model**
  - One free parameter: the mean.
  - Assumes p(correct) is constant across time delays (no effect).
  - The simplest possible model; provides a lower bound for measuring model fit.
- **The saturated model**
  - Free parameters = number of data points.
  - A new free parameter for each data point perfectly reproduces the data, and sets an upper bound for model fit.

How far above the null model, and below the saturated model, is our retention model? We need a measure of fit.

- The improvement in fit of our cognitive model over the null model indicates the amount of the treatment effect predicted by the cognitive model.
- The improvement of the saturated model over the cognitive model indicates the amount of the treatment effect left unexplained by the cognitive model.

To determine the performance of our model, we need:

1. Observed data
2. A cognitive model with identifiable parameters
3. Statistical upper- and lower-bound models (null and saturated)
4. An objective function of (mis)fit
5. A parameter optimization algorithm

## Objective Functions

1. Least-squares objective (SSE)
2. Weighted least-squares (WSSE)
3. Log-likelihood (G²)

Objective functions map parameters onto fit indices: for each combination of parameter values, the predictions are computed and the fit to the data is measured.

Model predictions with sensitivity $b = 3.0$ and decay $\lambda = 0.7$:

| Delay | Observed | Predicted |
|-------|----------|-----------|
| 0  | 0.9505 | 0.9511 |
| 1  | 0.9354 | 0.8889 |
| 2  | 0.9170 | 0.8117 |
| 3  | 0.8976 | 0.7356 |
| 4  | 0.8754 | 0.6722 |
| 5  | 0.8526 | 0.6233 |
| 6  | 0.8298 | 0.5868 |
| 7  | 0.8058 | 0.5611 |
| 8  | 0.7826 | 0.5429 |
| 9  | 0.7599 | 0.5300 |
| 10 | 0.7388 | 0.5211 |
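The predicted column can be regenerated from the decayed ratio rule. The slides do not give the raw output activations, so `d_out = 0.99` below is a hypothetical value, back-solved so the delay-0 prediction matches the table; with it, the two-alternative form of the ratio rule reproduces the predicted column to within about 0.002:

```python
import math

def p_correct(t, b, lam, d_out):
    """Two-alternative decayed ratio rule:
    p = e^(b*oA*lam^t) / (e^(b*oA*lam^t) + e^(b*oB*lam^t)),
    which depends only on the output difference d_out = oA - oB."""
    return 1.0 / (1.0 + math.exp(-b * d_out * lam ** t))

b, lam = 3.0, 0.7   # sensitivity and decay, as in the table
d_out = 0.99        # hypothetical output-activation difference (an assumption)

table = [0.9511, 0.8889, 0.8117, 0.7356, 0.6722, 0.6233,
         0.5868, 0.5611, 0.5429, 0.5300, 0.5211]
for t, pred in enumerate(table):
    print(f"t={t:2d}  model={p_correct(t, b, lam, d_out):.4f}  table={pred:.4f}")
```

Because only the difference between the two output activations enters this form, $b$ and the activation scale trade off; that is exactly the scaling-parameter problem flagged earlier.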
## Least Squares

Sum of squared residuals (SSE): `SSE = sum[(Obs - Pred)**2]`, i.e.

$$\mathrm{SSE} = \sum_t \big[p_t - P(t)\big]^2$$

where $p_t$ is the observed and $P(t)$ the predicted proportion correct at delay $t$. Squared residuals for the first few delays:

| Delay | Observed | Predicted | (O − P)² |
|-------|----------|-----------|----------|
| 0 | 0.9505 | 0.9511 | 3.6E-07 |
| 1 | 0.9354 | 0.8889 | 0.00216225 |
| 2 | 0.9170 | 0.8117 | 0.01108809 |
| 3 | 0.8976 | 0.7356 | 0.026244 |
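Putting the pieces together, here is a sketch in Python of the SSE objective, the null-model lower bound (one free parameter, the mean), and a `Parameter_Space.f95`-style grid search over the sensitivity and decay parameters. The constant `D_OUT = 0.99` is a hypothetical output-activation difference (an assumption, since the slides do not give the raw activations):

```python
import math

observed  = [0.9505, 0.9354, 0.9170, 0.8976, 0.8754, 0.8526,
             0.8298, 0.8058, 0.7826, 0.7599, 0.7388]
predicted = [0.9511, 0.8889, 0.8117, 0.7356, 0.6722, 0.6233,
             0.5868, 0.5611, 0.5429, 0.5300, 0.5211]

# SSE straight from the table (fit of the b = 3.0, lam = 0.7 predictions):
sse_table = sum((o - p) ** 2 for o, p in zip(observed, predicted))

D_OUT = 0.99  # hypothetical output-activation difference (assumption)

def prediction(b, lam, t):
    """Two-alternative form of the decayed ratio rule."""
    return 1.0 / (1.0 + math.exp(-b * D_OUT * lam ** t))

def sse(b, lam):
    """Objective function: maps a parameter combination onto a fit index."""
    return sum((obs - prediction(b, lam, t)) ** 2
               for t, obs in enumerate(observed))

# Null-model lower bound: a single free parameter, the mean.
mean = sum(observed) / len(observed)
sse_null = sum((o - mean) ** 2 for o in observed)
# (The saturated model reproduces the data exactly, so its SSE is 0.)

# Grid search over sensitivity b and decay lam:
grid = [(b / 2, lam / 20)
        for b in range(2, 11)      # b   = 1.00, 1.50, ..., 5.00
        for lam in range(10, 21)]  # lam = 0.50, 0.55, ..., 1.00
best_b, best_lam = min(grid, key=lambda params: sse(*params))

print(f"SSE from table = {sse_table:.4f}, null-model SSE = {sse_null:.4f}")
print(f"best grid fit: b = {best_b}, lam = {best_lam}, SSE = {sse(best_b, best_lam):.6f}")
```

Under these assumptions the grid search finds a decay parameter much closer to 1 than the table's λ = 0.7, and its SSE falls well below the null-model bound, illustrating why the comparison must use optimal rather than hand-picked parameters.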
