CSUN COMP 106 - Programming Exercise Eight

School name: California State University, Northridge

Course: COMP 106

College of Engineering and Computer Science
Computer Science Department
Computer Science 106: Computing in Engineering and Science
Spring 2006    Class number: 11672    Instructor: Larry Caretto

Programming Exercise Eight (and last)

Objective

This assignment provides an example of the use of one-dimensional arrays and introduces the concept of regression analysis, which is used to estimate a relationship between two variables.

Mathematical Background

If several measurements are made on pairs of experimental data {(xi, yi), i = 1, ..., N}, we can use a technique known as regression analysis to determine an approximate equation of a straight line that gives a best fit to the data. The equation of this best-fit line is written as follows.

ŷ = a + b x

In this equation, we use the symbol ŷ instead of y to indicate that the value predicted by the equation ŷ = a + b x is an approximate result. For a given data point (xi, yi), the value of yi represents the actual data, and we obtain the predicted value of y at the point x = xi from the equation ŷi = a + b xi. The difference between the measured and predicted values is |yi - ŷi|.

In the chart at the left, the data points are indicated by the small ellipses. The coordinates of a typical data point, xi and yi, are shown by the dotted lines. The solid line is the fitted regression line, ŷ = a + b x. The point where the dotted line at x = xi crosses the regression line has the coordinates (xi, ŷi). In this particular example the value of ŷi is less than the value of yi. There is a large scatter of data points about the regression line in this example.

The example plot above might represent calibration data on an instrument. The x values would denote the instrument reading and the y values would indicate the true value of the quantity being measured.
Once the calibration tests were completed, it would be useful to have a simple equation to relate the instrument reading (x) to the actual quantity being measured (y).

In addition to finding the values of a and b that give the best-fit line, we would also like to have some measure of how well the line fits the data. Two different goodness-of-fit measures, the standard error and the coefficient of determination, are presented below in the equations section.

Jacaranda (Engineering) 3333    Mail Code 8348    Phone: 818.677.6448    E-mail: [email protected]    Fax: 818.677.7062

Equations used

The equations used to calculate a and b can be found by an analysis that minimizes the sum of the squared differences between the actual data points, yi, and the fitted points, ŷi = a + b xi. The results of this analysis are shown below. The equations to compute the intercept, a, and the slope, b, in terms of the entire set of data, {xi, yi}, use the following definitions of mean values:

$$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i \quad\text{and}\quad \bar{y} = \frac{1}{N}\sum_{i=1}^{N} y_i$$

With these definitions, the slope, b, and the intercept, a, are found as follows.

$$b = \frac{\sum_{i=1}^{N} x_i y_i - N\,\bar{x}\,\bar{y}}{\sum_{i=1}^{N} x_i^2 - N\,\bar{x}^2} \quad\text{and}\quad a = \bar{y} - b\,\bar{x}$$

A statistical estimate of the variability can be found from the difference between the actual data points yi and the estimated values ŷi = a + b xi. This measure, which is called the standard error and has the symbol sy|x, is defined as follows:

$$s_{y|x} = \sqrt{\frac{\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}{N - 2}}$$

Another measure, called the R² value or the coefficient of determination, is considered to be a measure of the amount of variation in the data that is explained by the regression equation. An R² value of zero means that the regression cannot explain any of the variation in y; an R² value of one means that all the variation in y can be explained by the regression equation. The value of R² is computed from the following equation:

$$R^2 = 1 - \frac{(N - 2)\, s_{y|x}^2}{\sum_{i=1}^{N} y_i^2 - N\,\bar{y}^2}$$

Task One

You can use a previously written program for this task. Download the program file from the exercise page on the course web site.
Review that program and see how the various functions are used to enter array data and do calculations with array data in loops. Note that the program determines the number of data points (N in the equations above) by reading the data. The user is not required to count the data and input a value for N. The program has summary output to the screen and detailed output of a, b, sy|x, R², and a table of xi, yi, and ŷi.

Prepare a data file for the test case below. Review the input statements to see how you should prepare this file. Run the program with your test data file to make sure you are using the program correctly by matching the results below.

Test Data and Results for Linear Regression

xi    510    533    603    670    750
yi    1.3    0.1    1.5    1.8    3.9

Results: a = -5.77566; b = 0.0122238; R² = 0.768457

Copy the output file from the test data set in the table above to your submission file. Do not copy the code or the full output file from the downloaded data set to the submission file.

Task Two

Download the data file for this exercise from the course web site. This data file has several pairs of (xi, yi) data points. In this task you will obtain some overall statistics (1) for the entire data file, (2) for the (xi, yi) data points in which xi ≥ 1000, and (3) for the (xi, yi) data points in which xi < 1000.

You can use some of the code from task one for this task. You do not have to keep the same function structure used for task one.
However, you should be able to use the function that reads data from an input data file with no changes.

The program you write for this task should compute and print out the results listed below for the data in the data file that you download:

- The count, mean value and standard deviation of all xi data.
- The count, mean value and standard deviation of all yi data.
- The maximum and minimum values of xi and yi for the full set of data.
- The count, mean value and standard deviation of the subset of xi data for which xi ≥ 1000.
- The count, mean value and standard deviation of the subset of yi data for which the corresponding value of xi ≥ 1000.
- The maximum and minimum values of xi and yi for the subset of data in which the value of xi ≥ 1000.
- The count, mean value and standard deviation of the set of xi data for which xi < 1000.
- The count, mean value and standard deviation of the set of yi data for which the corresponding value of xi < 1000.
- The maximum and minimum values of xi and yi for the set of data in which the
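The Task Two statistics listed above could be computed along the following lines. This is only a sketch in Python, not the course program: the handout does not say which language the downloaded program uses, the names `summarize`, `split_by_threshold`, and `report` are hypothetical, and the sample standard deviation (dividing by N - 1) is an assumption because the handout does not say which definition is intended. The Task One test data are reused purely for illustration, with a demonstration threshold of 600 instead of 1000, since all of those x values are below 1000.

```python
import statistics


def summarize(values):
    """Return count, mean, sample standard deviation, minimum and maximum."""
    n = len(values)
    nan = float("nan")
    return {
        "count": n,
        "mean": statistics.fmean(values) if n else nan,
        # Assumption: sample standard deviation (N - 1 in the denominator).
        "stdev": statistics.stdev(values) if n > 1 else nan,
        "min": min(values) if n else nan,
        "max": max(values) if n else nan,
    }


def split_by_threshold(x, y, threshold=1000.0):
    """Partition (x, y) pairs into those with x >= threshold and x < threshold."""
    pairs = list(zip(x, y))
    high = [p for p in pairs if p[0] >= threshold]
    low = [p for p in pairs if p[0] < threshold]
    return high, low


def report(label, pairs):
    """Print the summary statistics for the x and y values of a set of pairs."""
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    for name, vals in (("x", xs), ("y", ys)):
        s = summarize(vals)
        print(f"{label} {name}: count={s['count']} mean={s['mean']:.4g} "
              f"stdev={s['stdev']:.4g} min={s['min']} max={s['max']}")


# Task One test data, used here only to exercise the functions.
x = [510, 533, 603, 670, 750]
y = [1.3, 0.1, 1.5, 1.8, 3.9]

high, low = split_by_threshold(x, y, threshold=600)  # demo threshold, not 1000
report("all", list(zip(x, y)))
report("x >= 600", high)
report("x < 600", low)
```

Keeping the (x, y) pairs together while splitting ensures that each y subset is selected by its corresponding x value, as the list above requires.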
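The formulas in the "Equations used" section can be checked against the Task One test case with a short script. The following is a minimal Python sketch under stated assumptions: it is not the downloadable course program, the function name `linear_regression` is hypothetical, and the data are typed in directly rather than read from a file.

```python
import math


def linear_regression(x, y):
    """Return (a, b, s_yx, r2) for the least-squares line y_hat = a + b*x."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three (x, y) pairs of equal length")

    # Mean values of x and y.
    x_bar = sum(x) / n
    y_bar = sum(y) / n

    # Slope and intercept from the sums in the handout's equations.
    sxy = sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar
    sxx = sum(xi * xi for xi in x) - n * x_bar ** 2
    b = sxy / sxx
    a = y_bar - b * x_bar

    # Standard error: root of the summed squared residuals over N - 2.
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s_yx = math.sqrt(sse / (n - 2))

    # R^2 = 1 - (N - 2) s_yx^2 / (sum of y_i^2 - N * y_bar^2).
    syy = sum(yi * yi for yi in y) - n * y_bar ** 2
    r2 = 1.0 - (n - 2) * s_yx ** 2 / syy
    return a, b, s_yx, r2


# Test data from the handout's Task One table.
x = [510, 533, 603, 670, 750]
y = [1.3, 0.1, 1.5, 1.8, 3.9]

a, b, s_yx, r2 = linear_regression(x, y)
print(f"a = {a:.6g}; b = {b:.6g}; R2 = {r2:.6g}")
```

For the test data, the printed values should agree with the handout's stated results (a = -5.77566; b = 0.0122238; R² = 0.768457) to the digits shown.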
