CORNELL CS 404 - Study Notes

CS 404: Lab 1: Finding Libraries

Getting Started

1. Boot into Linux. To do this, click on “shutdown” and then select “restart.” The computer will begin restarting. You will eventually have the option to select either Win2K or Linux (use the up and down arrows to switch, press enter to select).
2. Open a web browser and go to the course website (www.cs.cornell.edu/Course/cs404/2002sp). Download Fpca.tar.
3. To unpack the tar file, type tar xvf Fpca.tar in a terminal window. Cd into the directory “Fpca.”

BLAS and LAPACK

1. Fpca contains three FORTRAN files (main.f, subs.f, and system.f) and a Makefile. Open these files in an editor and have a look. The three code files implement a statistical technique known as “principal component analysis” (PCA). The goal of PCA (also known as “empirical orthogonal functions” analysis) is to reduce the dimensionality of the data set. Specifically, PCA determines from multiple observations of k variables how the variables are related and assigns each variable a weight. By scaling each variable by its weight and adding them together, we can produce a one-dimensional approximation of our k-dimensional data. This approximation, known as the “first principal component” or “leading mode,” represents a pattern common to several of the variables, and hopefully it explains a large percentage of the total variance in the system. Although we’re typically most interested in the leading mode, it is possible to partition the variance among additional modes, each representing successively less variance.

   Although PCA sounds complicated, it is simple to compute using tools from linear algebra. Given an array of data, C, with each column representing a variable and each row representing a sample, we first need to compute the covariance matrix Cov. The covariance matrix is defined using matrix multiplication:

   $$\mathrm{Cov} = \frac{1}{m - 1} C^{T} C$$

   where m is the number of observations and the superscript T indicates the transpose of the matrix.
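As a minimal sketch of the covariance formula (in plain Python, not the lab's FORTRAN code), the matrix can be formed with two nested loops. The sketch assumes the columns of C already have zero mean, which the formula implies but the handout does not state. The function name `covariance` and the small data set are made up for illustration.

```python
def covariance(C):
    """Return Cov = C^T C / (m - 1) for an m-by-k list of rows.

    Assumes each column of C already has zero mean (an assumption of this sketch).
    """
    m = len(C)        # number of observations (rows)
    k = len(C[0])     # number of variables (columns)
    cov = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i, k):   # fill the upper triangle and mirror it; Cov is symmetric
            s = sum(row[i] * row[j] for row in C)
            cov[i][j] = cov[j][i] = s / (m - 1)
    return cov


# Hypothetical data: 3 samples of 2 variables, columns already zero-mean.
C = [[-1.0, -2.0],
     [ 0.0,  0.0],
     [ 1.0,  2.0]]
print(covariance(C))   # [[1.0, 2.0], [2.0, 4.0]]
```

Only the upper triangle is computed directly because the result is symmetric, which is the same property that a symmetric rank-k update routine can exploit.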
   We then need to compute the eigenvectors and eigenvalues of Cov. The eigenvectors are the principal components (the weights), and the eigenvalues indicate the amount of variance explained by each component.

   The PCA algorithm is implemented in the three FORTRAN files: main.f is the main program, subs.f contains subroutines for reading and writing data and producing the covariance matrix, and system.f contains some utility routines. The first thing that main does is ask you for a data file, which is then read in the ReadData routine. The data is stored as a column of numbers. The first number is the number of samples (n), the second is the number of variables (m), and the next n numbers are the first column of data (variable one), followed by the samples for the remaining m − 1 variables. (Here n plays the role that m plays in the covariance formula.) The data is stored in the array C. The covariance matrix is constructed in the routine GetCov. The covariance matrix is then saved to the file Cov.txt. The eigenvalues/vectors are computed using the LAPACK routine SSYEV (Single precision, SYmmetric EigenValue). The eigenvalues and the percentage of the variance which they explain are printed to the screen, and the principal components are stored in the file PComp.txt.
2. Whew! Don’t worry about the details of PCA. Try to focus on the main problem: producing Cov and then solving the eigenproblem. Look in GetCov (in subs.f). This routine uses the BLAS level 3 routine SSYRK, which computes

   $$\mathrm{Cov} := \mathrm{fac} \cdot C^{T} C + 0.0 \cdot \mathrm{Cov}$$

   Go to Netlib and find this routine in the BLAS package. Try to figure out how we are calling it (you don’t need to download it; it is already installed). Do we really need to initialize Cov to zero?
3. Now, go back to main.f and look at the call to SSYEV. The easiest way to look up a LAPACK routine is with the LAPACK search engine (go to the LAPACK package in Netlib and click on the search engine link). We’re interested in driver routines, so click on that link (on the left). Then click on the “Symmetric Eigenproblems” link.
   This will bring up a Java applet with several menus. If you select “Real, Single” as the precision, “Simple” as the driver, “With Dependencies,” and “Symmetric/Hermitian,” then SSYEV will be listed in the box on the upper right. Try changing the precision. Does the name change? Restore the settings. LAPACK’s search engine is handy for looking up the subroutine to call for a specific problem, but to figure out how to call it, we need to see the code. Click on the “see code” button to view the code for SSYEV. The subroutine call and the explanation of the needed variables are found at the top of all LAPACK routines. What is the strange array WORK? How big should it be?
4. Now that you know how the code works, let’s get it to compile. Try typing make at the command prompt. What happens? The problem is that our source code contains no information on either SSYEV or SSYRK; these are found in the LAPACK and BLAS libraries. As I explained, I’ve built the ATLAS version of BLAS as well as LAPACK and placed them in my directory (/home/ajp9/cs404/ATLASLinuxPIIISSE1256).

   UNIX libraries are stored in “archive” files, which end with the .a suffix and typically begin with “lib.” Each archive is actually a collection of related object-code files (.o files, obtained by compiling with the -c flag), much as a .tar file can contain several files. What archives are available in the ATLAS directory? To get our code to run, we need to link to some of these libraries. This is done using the -l compiler flag: -lFoo would link to libFoo.a.

   Look at the Makefile. Everything is set except for the macro LIBS (-Lpath, as defined for LIBPTH, tells the compiler where to look for libraries). We need to link to the LAPACK library (for SSYEV) and the f77 BLAS library (for SSYRK). This suggests that we should put “-llapack” and “-lf77blas” on the LIBS line. However, this is not quite complete. The routines in libf77blas call routines in the atlas library, so we need to link to that library as well. Note: the order of the libraries is important. If library A calls routines in library B, then you must link to B after linking to A. Finish setting up the Makefile and build Fpca.
5. Run the program. Use the 5-by-3 sample problem in
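Before running the program on a data file, it may help to see how the file layout described in step 1 of the BLAS and LAPACK section maps onto the array C. The following is a rough Python sketch of that layout only, not a translation of the ReadData routine. The function name `parse_data` and the file contents are hypothetical.

```python
def parse_data(text):
    """Parse the one-number-per-line layout: n, m, then m columns of n samples each."""
    values = [float(tok) for tok in text.split()]
    n, m = int(values[0]), int(values[1])   # n samples, m variables
    data = values[2:]
    if len(data) != n * m:
        raise ValueError("expected %d values, found %d" % (n * m, len(data)))
    # The file lists all n samples of variable 1, then variable 2, and so on,
    # so C[i][j] (sample i of variable j) sits at position j * n + i.
    return [[data[j * n + i] for j in range(m)] for i in range(n)]


# Made-up file contents: n = 3 samples of m = 2 variables.
sample = "3\n2\n1.0\n2.0\n3.0\n10.0\n20.0\n30.0\n"
C = parse_data(sample)
print(C)   # [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
```

Each row of the result is one sample and each column is one variable, matching the orientation of C assumed by the covariance formula.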

