WSU CSE 6363 - Predictive algorithms in the management of computer systems


Predictive algorithms in the management of computer systems

by R. Vilalta, C. V. Apte, J. L. Hellerstein, S. Ma, and S. M. Weiss

Predictive algorithms play a crucial role in systems management by alerting the user to potential failures. We report on three case studies dealing with the prediction of failures in computer systems: (1) long-term prediction of performance variables (e.g., disk utilization), (2) short-term prediction of abnormal behavior (e.g., threshold violations), and (3) short-term prediction of system events (e.g., router failure). Empirical results show that predictive algorithms can be successfully employed in the estimation of performance variables and the prediction of critical events.

An important characteristic of an intelligent agent is its ability to learn from previous experience in order to predict future events. The mechanization of the learning process by computer algorithms has led to vast amounts of research in the construction of predictive algorithms. In this paper, we narrow our attention to the realm of computer systems; we demonstrate how predictive algorithms enable us to anticipate the occurrence of events of interest related to system failures, such as CPU overload, threshold violations, and low response time.

Predictive algorithms can play a crucial role in systems management. The ability to predict service problems in computer networks, and to respond to those warnings by applying corrective actions, brings multiple benefits. First, detecting system failures on a few servers can prevent the spread of those failures to the entire network. For example, low response time on a server may gradually escalate to technical difficulties on all nodes attempting to communicate with that server. Second, prediction can be used to ensure continuous provision of network services through the automatic implementation of corrective actions.
For example, prediction of high CPU demand on a server can initiate a process to balance the CPU load by rerouting new demands to a back-up server.

Several types of questions are often raised in the area of computer systems:

● What will be the disk utilization or CPU utilization next month (next year)?
● What will be the server workload in the next hour (n minutes)?
● Can we predict a severe system event (e.g., router failure) in the next n minutes?

The questions above differ in two main aspects: time horizon and object of prediction. The former characterizes our ability to perform short-term or long-term predictions and has a direct bearing on the kind of corrective actions one can apply. Any action requiring human intervention requires at least several hours, but if actions are automated, minutes or even seconds may suffice. The latter relates to the outcome of a prediction and can be either a numeric variable (e.g., amount of disk utilization) or a categorical event (e.g., router failure).

© Copyright 2002 by International Business Machines Corporation. Copying in printed form for private use is permitted without payment of royalty provided that (1) each reproduction is done without alteration and (2) the Journal reference and IBM copyright notice are included on the first page. The title and abstract, but no other portions, of this paper may be copied or distributed royalty free without further permission by computer-based and other information-service systems. Permission to republish any other portion of this paper must be obtained from the Editor.

IBM SYSTEMS JOURNAL, VOL 41, NO 3, 2002. 0018-8670/02/$5.00 © 2002 IBM

Both time horizon and object of prediction are important factors in deciding which predictive algorithm to use.
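
The two aspects described above, time horizon and object of prediction, can be captured in a small data structure. The following Python sketch is illustrative only: the names `PredictionTask`, `Target`, and `feasible_actions` are hypothetical, and the three-hour cutoff is an assumed reading of "at least several hours", not a figure given in the paper.

```python
from dataclasses import dataclass
from enum import Enum


class Target(Enum):
    """Object of prediction: a numeric variable or a categorical event."""
    NUMERIC = "numeric variable"
    CATEGORICAL = "categorical event"


@dataclass(frozen=True)
class PredictionTask:
    question: str
    horizon_minutes: float  # time horizon of the prediction
    target: Target


# Assumption: "at least several hours" is taken here as three hours.
HUMAN_RESPONSE_MINUTES = 3 * 60


def feasible_actions(task: PredictionTask) -> list[str]:
    """Automated actions fit any horizon; human intervention needs a long one."""
    actions = ["automated"]
    if task.horizon_minutes >= HUMAN_RESPONSE_MINUTES:
        actions.append("human intervention")
    return actions


# The three example questions, with illustrative horizons.
tasks = [
    PredictionTask("Disk utilization next month?", 30 * 24 * 60, Target.NUMERIC),
    PredictionTask("Server workload in the next hour?", 60, Target.NUMERIC),
    PredictionTask("Router failure in the next 10 minutes?", 10, Target.CATEGORICAL),
]

for task in tasks:
    print(f"{task.question} -> {task.target.value}; actions: {feasible_actions(task)}")
```

Under these assumptions, only the month-ahead question leaves room for human intervention; the two short-term questions can be acted on only automatically.
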
In this paper, we present three major predictive algorithms addressing the following problems: (1) long-term prediction of performance variables (e.g., disk utilization), (2) short-term prediction of abnormal behavior (e.g., threshold violations), and (3) short-term prediction of system events (e.g., router failure). The first problem is solved using a regression-based approach. A salient characteristic of the regression algorithm is its ability to form a piecewise model of the time series that can capture patterns occurring at different points in time. The second problem employs time-series analysis to predict abnormal behavior (e.g., threshold violations); prediction is achieved through a form of hypothesis testing. The third problem predicts critical events by using data-mining techniques to search for patterns frequently occurring before these events.

Our goal in this paper is to provide some criteria for the selection of predictive algorithms. We proceed by matching problem characteristics (e.g., time horizon and object of prediction) with the right predictive algorithm. We use our selection criteria in three case studies corresponding to the problems described above.

Extensive work has been conducted in the past on predicting computer performance. For example, work is reported on the prediction of network performance to support dynamic scheduling,[1] on the prediction of network traffic,[2] and on the production of a branch predictor to improve the performance of a deep pipelined micro-architecture.[3] Other studies reported in the literature[4–7] focus on predicting at the instruction level, whereas we focus on predicting at the system and event level (e.g., response time, CPU utilization, network node down, etc.).
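
To illustrate the third problem described above (searching for patterns that frequently occur before critical events), here is a minimal Python sketch. It is not the authors' algorithm. It assumes events arrive as `(timestamp, event_type)` pairs, and the function name `frequent_precursors`, the window length, the support threshold, and the event log are all hypothetical.

```python
from collections import Counter


def frequent_precursors(events, target_type, window, min_support):
    """Return event types found in at least `min_support` (a fraction) of the
    windows of length `window` that immediately precede each target event."""
    events = sorted(events)  # order (timestamp, event_type) pairs by time
    target_times = [t for t, kind in events if kind == target_type]
    if not target_times:
        return {}

    counts = Counter()
    for t_target in target_times:
        # Distinct non-target event types seen in [t_target - window, t_target).
        seen = {
            kind
            for t, kind in events
            if t_target - window <= t < t_target and kind != target_type
        }
        counts.update(seen)

    n = len(target_times)
    return {kind: c / n for kind, c in counts.items() if c / n >= min_support}


# Hypothetical event log: (timestamp in minutes, event type).
log = [
    (0, "link_flap"), (5, "high_cpu"), (8, "router_failure"),
    (40, "high_cpu"), (60, "link_flap"), (63, "high_cpu"), (66, "router_failure"),
    (100, "disk_warning"), (120, "link_flap"), (125, "router_failure"),
]

result = frequent_precursors(log, "router_failure", window=10, min_support=0.6)
print(result)
```

In this made-up log, `link_flap` precedes all three failures and `high_cpu` precedes two of them, so both pass the 0.6 support threshold, while `disk_warning` does not. A usable predictor would also need to check that such patterns are rare outside the windows preceding target events; this sketch omits that step.
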

A common approach to performance prediction proceeds analytically, by relying on specific performance models; one example is the study of prediction models at the source code level, which plays an important role for compiler optimization, programming environments, and debugging tools.[8] Our view of the prediction problem is mainly driven by historical data (i.e., is data-based). Many studies have tried to bridge the gap between model-based and data-based approaches.[9]

The rest of the paper is organized as follows. First, we provide a general view of prediction algorithms and describe our approach to selecting an algorithm for the problem at hand. In the following section, we discuss algorithms for long-term prediction of computer performance. Next, we discuss an algorithm for detecting threshold violations of workload demands, and then we describe our approach to the prediction of system events. We list our conclusions in the last section.

Prediction in computer networks

We begin by giving a general view of the prediction problem. We then provide some criteria for selecting a predictive algorithm to use, based

