Incorporating Domain-Specific Information into the Compilation ProcessMotivationFind the error – part 1Find the error – part 2Find the error – part 3ProblemSolutionThe Broadway CompilerBenefitsOutlineSecurity vulnerabilitiesRemote access vulnerabilityChallenge 1: PointersChallenge 2: ScopeChallenge 3: PrecisionInsufficient precisionCost versus precisionClient-Driven AlgorithmAlgorithm componentsSources of imprecisionIn action...MethodologyProgramsError detection problemsResultsOverall resultsWhy does it work?Slide 28Central contributionSpecific contributionsRelated workFuture workSlide 33Annotations (I)Annotations (II)Annotations (III)Annotations (IV)Slide 38TimeValidationType TheoryGeneratorsIs it correct?Annotation correctnessError Checking vs OptimizationComplexityOptimizationPLAPACK OptimizationsResultsSlide 50End backup slides1Incorporating Domain-Specific Information into the Compilation ProcessSamuel Z. GuyerSupervisor: Calvin LinApril 14, 20032MotivationTwo different views of software:Compiler’s viewAbstractions: numbers, pointers, loopsOperators: +, -, *, ->, []Programmer’s viewAbstractions: files, matrices, locks, graphicsOperators: read, factor, lock, draw This discrepancy is a problem...3Find the error – part 1Example:Error: case outside of switch statementPart of the language definitionError reported at compile timeCompiler indicates the location and nature of errorswitch (var_83) {case 0: func_24(); break;case 1: func_29(); break;}case 2: func_78();!4Find the error – part 2Example:Improper call to libfunc_38Syntax is correct – no compiler messageFails at run-timeProblem: what does libfunc_38 do? This is how compilers view reusablesstruct __sue_23 * var_72;char var_81[100];var_72 = libfunc_84(__str_14, __str_65);libfunc_44(var_72);libfunc_38(var_81, 100, 1, var_72);!5Find the error – part 3Example:Improper call to fread() after fclose()The names reveal the mistake No traditional compiler reports this errorRun-time system: how does the code fail?Code review: rarely this easy to spotFILE * my_file;char buffer[100];my_file = fopen(“my_data”, “r”);fclose(my_file);fread(buffer, 100, 1, my_file);!6ProblemCompilers are unaware of library semanticsLibrary calls have no special meaningThe compiler cannot provide any assistanceBurden is on the programmer:Use library routines correctlyUse library routines efficiently and effectively These are difficult manual tasksTedious and error-proneCan require considerable expertise7SolutionA library-level compilerCompiler support for software librariesTreat library routines more like built-in operatorsCompile at the library interface levelCheck programs for library-level errorsImprove performance with library-level optimizations Key: Libraries represent domainsCapture domain-specific semantics and expertiseEncode in a form that the compiler can use8The Broadway CompilerBroadway – source-to-source C compiler Domain-independent compiler mechanismsAnnotations – lightweight specification language Domain-specific analyses and transformations Many libraries, one compilerApplicationSource codeLibraryAnnotationsHeader filesSource codeBroadwayAnalyzerOptimizerError reportsLibrary-specific messagesApplication+LibraryIntegrated source code9BenefitsImproves capabilities of the compilerAdds many new error checks and optimizationsQualitatively differentWorks with existing systemsDomain-specific compilation without recodingFor us: more thorough and convincing validationImprove productivityLess time spent on manual tasksAll users benefit from one set of annotations10OutlineMotivationThe Broadway CompilerRecent work on scalable program analysisProblem: Error checking demands powerful analysisSolution: Client-driven analysis algorithmExample: Detecting security vulnerabilitiesContributionsRelated workConclusions and future work11Security vulnerabilitiesHow does remote hacking work?Most are not direct attacks (e.g., cracking passwords)Idea: trick a program into unintended behaviorAutomated vulnerability detection:How do we define “intended”?Difficult to formalize and check application logic Libraries control all critical system servicesCommunication, file access, process controlAnalyze routines to approximate vulnerability12Remote access vulnerabilityExample:Vulnerability: executes any remote commandWhat if this program runs as root?Clearly domain-specific: sockets, processes, etc.Requirement:Why is detecting this vulnerability hard?int sock;char buffer[100];sock = socket(AF_INET, SOCK_STREAM, 0);read(sock, buffer, 100);execl(buffer);Data from an Internet socket should not specify a program to execute!13Challenge 1: PointersExample:Still contains a vulnerabilityOnly one bufferVariables buffer and ref are aliases We need an accurate model of memoryint sock;char buffer[100];char * ref = buffer;sock = socket(AF_INET, SOCK_STREAM, 0);read(sock, buffer, 100);execl(ref);!14Challenge 2: ScopeCall graph:Objects flow throughout programNo scoping constraintsObjects referenced through pointers We need whole-program analysismainreadsocketsock = (AF_INET, SOCK_STREAM, 0); (sock, buffer, 100); (ref);execl!15Challenge 3: PrecisionStatic analysis is always an approximationPrecision: level of detail or sensitivityMultiple calls to a procedureContext-sensitive: analyze each call separatelyContext-insensitive: merge information from all callsMultiple assignments to a variableFlow-sensitive: record each value separatelyFlow-insensitive: merge values from all assignments Lower precision reduces the cost of analysis Exponential polynomial ~linear16Insufficient precisionExample:Context-insensitivityInformation merged at callAnalyzer reports 2 possible errorsOnly 1 real error Imprecision leads to false positives!mainsocketexeclexeclreadstdin??17Cost versus precisionProblem: A tradeoffPrecise analysis prohibitively expensiveCheap analysis too many false positivesIdea: Mixed precision analysisFocus effort on the parts of the program that matterDon’t waste time over-analyzing the rest Key: Let error detection problem drive precision Client-driven program
View Full Document