Berkeley STAT 133 - Batch Jobs Garbage Collection Memory Management Debugging

Advanced Topics in R Programming
Batch Jobs, Garbage Collection, Memory Management, Debugging
Duncan Temple Lang

In the 2 lectures I will present, we'll try to cover:
General questions (R, ad hoc networks, programming, etc.)
Batch & background jobs.
Garbage collection.
Managing memory.
Debugging.
Recursive functions.
Notes at http://eeyore.ucdavis.edu/stat133/

"Batch" Jobs
We usually run R commands interactively.
But if they take a long time, you want to leave them and come back when they are finished.
Can lock the screen - BAD
Instead, use a batch or background job using the shell.
Important part of scientific computing.

Batch Jobs
Put the R commands into a file, say code.R.
Run R reading commands from that file, and put the output into another file:
    R --no-save < code.R >& output.Rout
--no-save just tells R not to bother saving the workspace when it finishes.
Other possible options are --vanilla, --save, --no-environ, etc. See the documentation for the R shell command, ?Startup.

R --no-save < myCode.R >& output.Rout
What does the < mean?
The shell "redirects input" to R using the contents of the file myCode.R.
Very similar to typing the lines one at a time at the R prompt.
Not quite the same as source("myCode.R"), but close.

R --vanilla < myCode.R >& output.Rout
The >& means "redirect both output and errors" to the file output.Rout.
If we just had > output.Rout, the errors would go to the console/terminal.
The >& form used here is C shell (csh/tcsh) syntax.
For the Bourne shell (sh/bash), redirect output first and then send errors to the same place:
    R --no-save < myCode.R > output.Rout 2>&1

Background Jobs
We still had to wait for the R --no-save < myCode.R ... command to finish before we can start a new command (in that terminal).
If we log out, the process will terminate!
We want to get a new prompt so we can do other things, including logging out.

nohup nice +18 R --no-save < myCode.R >& output.Rout &
The second & tells the shell to put this process in the background and return a new prompt.
No connection to the >&.
nice says "schedule my job when others aren't using the computer".
+18 is close to the maximum amount of niceness (typically 19).
Prefix the command with nohup (no hangup). On many machines this is not needed, but it never hurts and guarantees the job keeps running when you log out.

Things to Remember
You can log out and return later to see if the job is finished.
First, remember which machine you used.
Often people check on the wrong machine.
Has the task finished?
If you arrange for your code to generate output at different points, you can look at the output file and look for those markers, e.g. print something at the end of each iteration of a loop.
To look at the file, use cat output.Rout or tail -f output.Rout.

General Job Monitoring
Each job or "process" has a unique identifier - a number.
    kestrel> /app/bin/R --no-save < long.R >& out &
    [1] 19766
The 19766 is the process identifier.
Use the commands top and ps to find the status of the machine and of the job.
Use kill to force a job to finish:
    kill -9 19766

More to Remember
When creating plots, explicitly open graphics devices and close them.
    pdf("myPlot.pdf")
    hist(x)
    dev.off()
This avoids them going to one big file, on different pages.

And More
If your job stops unexpectedly, you will have to start again from the beginning.
It is sometimes useful to save results as you go along, i.e. at different stages/parts of the script.
    save(a, b, c, file = "myFile.rda")
Then you can come back, reload them, and continue on from that point or do additional computations.

Debugging
If you get an error in your script, the job will stop and there will be a message in output.Rout.
Hopefully the message will make it clear how to fix the problem.
Often we need to examine the state of the session to figure out why things failed.
So we need to be able to interactively explore the values of the different variables.

Post-mortem Debugging
First of all, test code on smaller datasets.
But if an error does happen in a batch job, we don't have interactive access!
Can't use options(error = recover).
Do "post-mortem" debugging (see ?debugger).
At the start of the script (myCode.R), put
    options(error = quote({dump.frames(to.file = TRUE); q()}))
Then, after the error, explore in a new R session:
    load("last.dump.rda")
    debugger(last.dump)

Debugging
This debugger is basically the same as the one used interactively, e.g. with options(error = recover).
Jump to different calls, find out what variables are available, print values, do computations.
Debugging is an art. Get experience.
Think about probable causes and then try to construct experiments to verify that is the reason.

What is Garbage Collection?
Notice that in R, when you create data you don't have to explicitly declare or allocate it.
And you don't have to release it.
e.g. x = 2*x + 10 + rnorm(length(x))
The rnorm() values are created, added to the other components, and then discarded. Same for the original x.
Garbage collection is the process of reclaiming the memory associated with objects and computations that are no longer being used - garbage.

When R needs memory to do a computation, it asks its memory manager for space.
The memory manager has already allocated a lot of space that it doles out, and so it can provide space for such requests.
If the memory manager doesn't have enough space for the request, then it tries to clean up - garbage collect.
It runs through all the spaces that it has given out in earlier requests and reclaims those that are no longer being used.
If the memory manager still needs more space, it can grow its pool.

Preallocate Space for the Result
Reiterating what Deb covered last time.
Consider the following code:
    ans = numeric()
    for(i in 1:n)
        ans = cbind(ans, foo(i))
In each step, we combine the new result with the previous ones via cbind.

Consider the last iteration, i.e. i == n.
The result from the previous iteration is a matrix with n-1 columns.
We then create a new result with n columns.
So before we assign the new result to ans, we have approximately 2 copies of the results!
And we have to copy all the data from the original to the new result.
This is bad news.
Some computations will not be feasible.

Alternative
We know the result is a matrix of size m x n, so allocate it first and then assign each iteration's result into the corresponding column.
    ans = matrix(NA, m, n)
    for(i in 1:n)
        ans[, i] = foo(i)
This does the allocation (for the result) just once and doesn't create new objects, just modifies the existing one.
The key thing is that ans[, i] doesn't create a new copy of ans, but writes the values into the
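
As a rough sketch of the two approaches on the preallocation slides, the R code below builds the same matrix once by growing it with cbind and once by filling a preallocated matrix. The function foo here is a hypothetical stand-in (the slides do not define it), and the sizes m and n are arbitrary assumptions chosen only so the example runs quickly.

```r
# Hypothetical stand-in for the slides' foo(): returns one column of m values.
m <- 1000
n <- 200
foo <- function(i) rep(as.numeric(i), m)

# Growing the result: every cbind() allocates a new matrix and copies the old one.
grow <- function(n) {
  ans <- numeric()
  for (i in seq_len(n))
    ans <- cbind(ans, foo(i))
  ans
}

# Preallocating: allocate the m x n result once, then fill one column per iteration.
prealloc <- function(n) {
  ans <- matrix(NA_real_, m, n)
  for (i in seq_len(n))
    ans[, i] <- foo(i)
  ans
}

a <- grow(n)
b <- prealloc(n)

# Both approaches should produce the same values.
stopifnot(identical(dim(a), dim(b)), all(a == b))

# Timings depend on the machine, so none are quoted here.
print(system.time(grow(n)))
print(system.time(prealloc(n)))
```

Running this on your own machine and increasing n should give a feel for how the cost of repeated copying grows; the exact numbers will vary.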

