DOC PREVIEW
CMU CS 15745 - Regression-Based Multi-Model Prediction of Data Reuse Signature

This preview shows page 1-2-3-4 out of 12 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 12 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 12 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 12 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 12 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 12 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Regression-Based Multi-Model Prediction ofData Reuse SignatureXipeng Shen Yutao Zhong Chen DingComputer Science Department, University of Rochester{xshen,ytzhong,cding}@cs.rochester.eduAbstractAs a locality metric, the distance of data reuses has been used in designing compiler, architecture, and file systems.Recently, Ding and Zhong described a method that predicts reuse distance histograms across all inputs of a program.In this paper we extend their method in two ways. First, we consider more than two training inputs using regressionanalysis. Second, we use a method called multi-model prediction to overcome the limitation due to small training inputsor coarse-grain data collection. Compared to Ding and Zhong’s method, the new locality prediction can reduce about halfof the prediction error, remove 95% of space cost, and use much smaller inputs and faster data collection in training.1 IntroductionAs the speed gap between CPU and main memory widens, memory hierarchy has become increasingly important in de-termining system performance, cost, and energy consumption. Cache performance depends on data locality in programs.A measure of locality is LRU stack distance or reuse distance, which is the number of distinct elements accessed betweentwo consecutive uses of the same datum [17, 11]. Reuse distance information has been used in managing various layersof the memory system including register allocation [15], cache optimization [2, 5, 6, 22], server caching [24], and filecaching [14]. Recently, Ding and Zhong shows that the reuse distance histogram, or reuse signature, of many programshas a consistent pattern across different inputs [11]. The pattern is a parameterized formula that for a given program input,it predicts the reuse signature for the corresponding execution. The pattern summarizes program locality and may havemany uses. Zhong et al. showed that a precise signature pattern can accurately predict cache miss rates across all programinputs [23].The pattern analysis method given by Ding and Zhong has two important limitations. First, it uses only two trainingruns and therefore may be misled by noises from specific executions. Second, the accuracy is limited by the precision ofdata collection. Accurate prediction requires using large size program inputs and fine-grained reuse distance histograms.The space and time cost of the analysis is consequently high, which makes the analysis slower and prohibits simultaneousanalysis of different patterns, for example, patterns of individual data elements.This paper presents a new set of techniques that overcome these limitations in two ways. First, we use regression toextract signature pattern from more than two training runs. Second, we employ multiple models (defined later). Next wedescribe the basic prediction method and the main factors affecting its accuracy.1.1 Basic Prediction MethodGiven an input of a program, we measure the locality of the execution by the histogram of the distance of all data reuses.We call it interchangeably as reuse distance histogram or reuse signature of the program execution (see Section 2 forformal definitions). The prediction method by Ding and Zhong uses a training step to construct a pattern by running twodifferent inputs of a program. Let s and ˆs be the sizes of the two input data. For each of the reuse distance histogram,the analysis forms 1000 groups by assigning 0.1% of all memory accesses to each group, starting from the shortest reusedistance to the largest. We denote the two sets of 1000 groups as hg1, g2, · · · , g1000i and hˆg1, ˆg2, · · · , ˆg1000i and denote theaverage reuse distances of giand ˆgiby rdiandˆrdirespectively (i = 1, 2, · · · , 1000.) Based on rdiandˆrdi, the analysisclassifies group i as a constant, linear, or sub-linear pattern. Group i has a constant pattern if its average reuse distancestays the same in the two runs, i.e. rdi=ˆrdi. Group i has a linear pattern if the average distance changes linearly with the1change in program input size, i.e.rdiˆrdi= c + ksˆs, where c and k are both constant parameters. Ding and Zhong measuredthe size of input data through distance-based sampling [11]. We use the same sampling method in this work.After the training step, the reuse signature for another input can be predicted by calculating the new distance for eachgroup according to its pattern. Interested reader can find a more detail discussion of this process in Ding and Zhong’spaper [11]. Figure 1 shows the flow diagram of their prediction method. We will explain the different types of histogramsin Section 2.Sample Size 1Sample Size 2RF−Histogram 1RF−Histogram 2 PatternsNew SampSizeNew RD−HistogramNew RF−HistogramRD−Histogram 1RD−Histogram 2Figure 1: The flow diagram of Ding and Zhong’s prediction method, which uses only two training inputs. A RD-Histogramis a reuse-distance histogram, and a RF-Histogram is a reference histogram. Sample size is the estimated input data sizeby sampling.Note that not all programs have a consistent pattern, and not all patterns are predictable. However, Ding and Zhongshowed that their method can find predictable patterns in a wide range of large, complex programs. The goal of this workis to improve the analysis accuracy and efficiency for programs that have a predictable pattern.1.2 Factors Affecting Prediction AccuracyThree factors strongly affect the prediction accuracy: the number of training inputs, the precision of data collection, andthe complexity of patterns. The number of training inputs needs to be at least two, although using more inputs may allowmore precise recognition of common patterns. The precision of data collection is determined by the number of groups.Since each group is represented by its average reuse distance, the more groups the analysis uses, the more precise thereuse distance information is. However, using more groups leads to slower pattern recognition and prediction since thespace and time costs are proportional to the number of groups. The third factor is the complexity of patterns in each group.If we assume that the entire group has a single pattern, the analysis is a single-model prediction. If we assume that thegroup may consist of different subgroups that each may have a different pattern, the analysis is a multi-model prediction.Single-model prediction has two limitations. First, the accuracy of the prediction is strictly limited by the precisionof data collection, i.e. the


View Full Document

CMU CS 15745 - Regression-Based Multi-Model Prediction of Data Reuse Signature

Documents in this Course
Lecture

Lecture

14 pages

Lecture

Lecture

19 pages

Lecture

Lecture

8 pages

Lecture

Lecture

5 pages

Lecture

Lecture

6 pages

lecture

lecture

17 pages

Lecture 3

Lecture 3

12 pages

Lecture

Lecture

17 pages

Lecture

Lecture

18 pages

lecture

lecture

14 pages

lecture

lecture

8 pages

lecture

lecture

5 pages

Lecture

Lecture

19 pages

lecture

lecture

10 pages

Lecture

Lecture

20 pages

Lecture

Lecture

8 pages

Lecture

Lecture

7 pages

lecture

lecture

59 pages

Lecture

Lecture

10 pages

Task 2

Task 2

2 pages

Handout

Handout

18 pages

Load more
Download Regression-Based Multi-Model Prediction of Data Reuse Signature
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Regression-Based Multi-Model Prediction of Data Reuse Signature and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Regression-Based Multi-Model Prediction of Data Reuse Signature 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?