CSUN COMP 106 - Multivariate Regression - D1582906


School name California State University, Northridge

Course COMP 106

Pages 4


College of Engineering and Computer Science
Computer Science Department
Computer Science 106 Computing in Engineering and Science
Spring 2006   Class number: 11672   Instructor: Larry Caretto
Jacaranda (Engineering) 3333   Mail Code 8348   Phone: 818.677.6448   E-mail: [email protected]   Fax: 818.677.7062

Third Programming Project – Multivariate Regression

Objective

This project illustrates the use of two-dimensional arrays and extends the linear regression example of exercise eight to multivariate regression. It also shows how to use two separate source files in one project and how to pass two-dimensional arrays from a calling program into a function.

Background for the project

In exercise eight, task one, you used a program for fitting a straight line to two experimental variables. In this project, you will program the equations that are used when several data items are used to predict one data item. For example, we might have measurements of emissions from diesel engines as a function of three fuel properties: cetane number, aromatic content, and density. Based on these measurements, we would like to predict how the emissions from other fuels would vary as a function of those three properties.

In this case we call emissions the response variable, and cetane number, aromatic content, and density are called predictive variables. We want to use the measured data on these four variables to develop a model that relates emissions to the predictive variables. The simplest such model is a linear one. To use it, we have to find the coefficients $b_0$, $b_1$, $b_2$, and $b_3$ in the equation:

$$\text{emissions} = b_0 + b_1(\text{cetane}) + b_2(\text{aromatic content}) + b_3(\text{density})$$

The above example uses three predictive variables; these could be labeled $x_1$ = cetane, $x_2$ = aromatic content, and $x_3$ = density. We are trying to predict a fourth variable, emissions. In general we can have $K$ predictive variables, where $K$ will change from problem to problem. (In the example above, $K = 3$.)

We need a notation that readily accommodates a different number of predictive variables in the code. In general, we label the predictive variables as $x_1$ to $x_K$. This is an extension of the labels used above ($x_1$ = cetane, $x_2$ = aromatic content, and $x_3$ = density). The variable that we are trying to predict is labeled $y$; in the example above, $y$ is the emissions. In this general case, where $K$ different variables are used to predict $y$, the regression equation can be written as follows:

$$y = b_0 + \sum_{j=1}^{K} b_j x_j$$

We will have several sets of input data. Each input data set will consist of one value of $y$ and one value for each of the predictive variables, $x_j$. In the emissions example, each data set would consist of one value for each of the following variables: the emissions, the cetane number, the aromatic content, and the density. See the sample data set on page 3 for an example of the required data sets for multivariate regression.

We use the following notation for the variables that are associated with the $m$th data set. The value of the variable to be predicted, for data set $m$, is labeled $y_m$. This $m$th data set will have one value for each of the $K$ predictive variables. We label the values of the individual predictive variables in this data set as $(x_{1m}, x_{2m}, \ldots, x_{Km})$. The notation $x_{im}$ means the value of the variable $x_i$ for the $m$th data set. In our emissions example, where $x_2$ is the notation for the aromatic content, the symbol $x_{24}$ represents the aromatic content of data set number four. If we have $N$ sets of data, we will have $N(K+1)$ different numerical values in the input data. These are used to determine the values of the $K+1$ coefficients $b_0$ to $b_K$.
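
As an illustration of this notation, the sketch below shows one possible way to hold the input data in C++ so that the array indices match the subscripts: `x[i][m]` is the value of variable x_i for data set m. The `RegressionData` struct, its member names, and the use of `std::vector` in place of built-in two-dimensional arrays are assumptions made for this example; they are not part of the handout. Row 0 of `x` is allocated but left unused here so that the row index equals the variable number.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container for the regression input (illustrative names only).
struct RegressionData {
    std::size_t K;                       // number of predictive variables
    std::size_t N;                       // number of data sets
    std::vector<std::vector<double>> x;  // (K+1) rows by N columns; x[i][m] is x_i for data set m
    std::vector<double> y;               // y[m] is the response for data set m

    RegressionData(std::size_t numPredictors, std::size_t numDataSets)
        : K(numPredictors), N(numDataSets),
          x(numPredictors + 1, std::vector<double>(numDataSets, 0.0)),
          y(numDataSets, 0.0)
    {
        assert(numDataSets > 0);
    }

    // Number of input values: one y plus K predictive values per data set.
    std::size_t inputValueCount() const { return N * (K + 1); }
};
```

With K = 3 and N = 5, `RegressionData d(3, 5);` gives `d.inputValueCount()` equal to 20, and `d.x[2][4]` is the slot for the aromatic content of data set number four.
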
To follow the conventional notation for C++ arrays, the initial input data set is referred to as data set zero and the last of the $N$ data sets is referred to as data set $N-1$.

The coefficients $b_0$ to $b_K$ are determined by solving a set of $K+1$ simultaneous linear equations. The unknowns in those equations are the $K+1$ coefficients, $b_0$ to $b_K$. The linear equation coefficients are denoted as $A_{ij}$ and the right-hand sides of the linear equations are given the symbol $c_i$. With this notation the system of linear equations is written as follows:

$$\sum_{j=0}^{K} A_{ij} b_j = c_i, \qquad i = 0, \ldots, K \qquad [1]$$

The coefficients, $A_{ij}$, and the right-hand sides, $c_i$, are found from various sums of the input data as shown below. These equations assume that a fictitious variable, $x_0$, is defined such that the value of $x_0$ for each data set, $m$, is one:

$$x_{0m} = 1, \qquad m = 0, \ldots, N-1 \qquad [2]$$

With this definition the equations for computing $A_{ij}$ and $c_i$ are written as follows:

$$A_{ij} = \sum_{m=0}^{N-1} x_{im} x_{jm} \qquad \text{and} \qquad c_i = \sum_{m=0}^{N-1} x_{im} y_m \qquad [3]$$

Note that the coefficients $A_{ij}$ are symmetric; that is, $A_{ij} = A_{ji}$. You can use this symmetry relation to reduce the number of coefficients that need to be computed.

Requirements for this project

Write a program that can read an arbitrary number of data sets $\{y_m, x_{im}, i = 1, \ldots, K\}$ from a file and compute the coefficients $\{A_{ij}, i = 0, \ldots, K, j = 0, \ldots, K\}$ and $\{c_i, i = 0, \ldots, K\}$.** Use equation [3] to compute these coefficients. You will be provided with a library program that accepts a two-dimensional array for the $A_{ij}$ coefficients and a one-dimensional array for the $c_i$ coefficients on the right-hand side of the system of linear equations shown in equation [1]. The library program then computes and returns the solutions, $b_j$, to the set of simultaneous linear equations shown in equation [1].

Once your program has obtained the coefficients $\{b_0, \ldots, b_K\}$ from the library program, you should compute an estimated value of $y$ for each data set.
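
The sums in equation [3] can be sketched in C++ as follows. This is a minimal sketch under stated assumptions, not the required solution: the function name `buildNormalEquations`, the `Matrix` alias, and the use of `std::vector` rather than built-in two-dimensional arrays are all hypothetical. It assumes that `x` has K+1 rows of N values each and that row 0 has already been filled with ones, as equation [2] requires. Only the entries with j >= i are summed; the rest are copied using the symmetry of A.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // illustrative alias

// Fill A ((K+1) by (K+1)) and c (K+1 entries) from equation [3].
// x[i][m] is variable i for data set m, with x[0][m] == 1 for every m.
void buildNormalEquations(const Matrix& x, const std::vector<double>& y,
                          Matrix& A, std::vector<double>& c)
{
    const std::size_t rows = x.size();  // K + 1
    const std::size_t N = y.size();     // number of data sets
    assert(rows > 0);
    for (const auto& row : x) {
        assert(row.size() == N);        // every variable needs one value per data set
    }

    A.assign(rows, std::vector<double>(rows, 0.0));
    c.assign(rows, 0.0);

    for (std::size_t i = 0; i < rows; ++i) {
        // Right-hand side: c_i = sum over m of x_im * y_m
        for (std::size_t m = 0; m < N; ++m) {
            c[i] += x[i][m] * y[m];
        }
        // Upper triangle only: A_ij = sum over m of x_im * x_jm for j >= i
        for (std::size_t j = i; j < rows; ++j) {
            double sum = 0.0;
            for (std::size_t m = 0; m < N; ++m) {
                sum += x[i][m] * x[j][m];
            }
            A[i][j] = sum;
            A[j][i] = sum;              // symmetry: A_ji = A_ij
        }
    }
}
```

The resulting `A` and `c` would then be handed to the course-supplied library program; its name and argument list are not given in this excerpt, so no call to it is shown.
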
** See instructions below for obtaining and reading data from this file.

The equations for calculating $b_0$ are derived in the same way as the equations for the case of two variables; in the multivariate case, however, the result is more complicated. You can show that the two-variable result follows from the general result shown above by applying the multivariate equations to the single-variable case of $K = 1$.

The estimated value of $y$ for data set $i$, written $\hat{y}_i$, is found from the following equation:

$$\hat{y}_i = b_0 + \sum_{j=1}^{K} b_j x_{ji} \qquad [4]$$

Note the difference between this equation and the regression equation at the bottom of page 1. The equation on page 1 describes a general relationship between the response variable,
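
Equation [4] can be sketched in C++ as follows. The function name `estimateResponses` and the `std::vector` storage are assumptions made for illustration; the handout itself asks for two-dimensional arrays. The sketch assumes `x` has K+1 rows (row 0 belonging to the fictitious variable x_0) and that `b` holds the K+1 coefficients b_0 to b_K already returned by the solver.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Equation [4]: yhat_i = b_0 + sum over j = 1..K of b_j * x_ji, for each data set i.
// x[j][i] is variable j for data set i; row 0 of x is not read.
std::vector<double> estimateResponses(const std::vector<std::vector<double>>& x,
                                      const std::vector<double>& b)
{
    assert(!x.empty() && x.size() == b.size());
    const std::size_t N = x[0].size();       // number of data sets

    std::vector<double> yhat(N, b[0]);       // start every estimate at b_0
    for (std::size_t j = 1; j < b.size(); ++j) {
        assert(x[j].size() == N);
        for (std::size_t i = 0; i < N; ++i) {
            yhat[i] += b[j] * x[j][i];       // add the contribution of variable j
        }
    }
    return yhat;
}
```

For example, with one predictive variable taking the values 0, 1, and 2 and coefficients b_0 = 1 and b_1 = 2, the estimates are 1, 3, and 5.
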
