UT EE 382C - Node Prefetch Prediction in Dataflow Graphs


Node Prefetch Prediction in Dataflow Graphs

Newton G. Petersen and Martin R. Wojcik
The Department of Electrical and Computer Engineering, The University of Texas at Austin
[email protected]  [email protected]

EE382C: Embedded Software Systems, Final Report
May 8, 2002

Abstract

Dataflow languages provide a high-level description that can expose inherent parallelism in many applications. This high-level description can be used to automatically generate efficient code and schedules based on patterns in the dataflow graph and knowledge of the target architecture. When targeting a dataflow graph to custom hardware, it is sometimes advantageous to share nodes with similar functionality to save silicon. Any state information associated with the caller of a shared node must be stored and subsequently loaded upon firing. If prediction logic can determine which caller of a shared node is next, the associated state information can be prefetched while other nodes of the graph are executing. While some applications can be scheduled entirely at compile time, many multi-channel measurement and control applications require some degree of dynamic scheduling. Prefetching in statically scheduled dataflow graphs can be arranged at compile time, while prefetching in dynamically scheduled graphs requires dynamic prediction.

INTRODUCTION

Many measurement and control applications perform similar tasks on multiple channels. When targeting such systems to custom hardware, sharing functionality between channels can save silicon. These functional units often contain state information that must be loaded from memory before execution, and prefetching this state information can improve system performance. If the application is statically schedulable, the sequence of callers can be determined at compile time.
If the sequence is dynamic in nature, a prediction mechanism is necessary to support prefetching. Our main objective is to explore the feasibility of caller prediction as a method for predicting the next caller of a shared functional unit executing in a parallel processing system. Dataflow graphs are good at describing parallelism, and are thus a natural fit for modeling systems where a caller prediction mechanism may be beneficial. We examine related work in classifying dataflow graphs, and build on the concept of branch prediction in pipelined processors to implement our prefetch prediction unit for dynamically scheduled dataflow graphs.

RELATED WORK

Statically schedulable models of computation do not need prediction because all scheduling decisions are made at compile time; however, they can still take advantage of prefetching state data. Synchronous dataflow (SDF), computation graphs, and cyclo-static dataflow (CSDF) are all powerful models of computation for applications where the schedule is known at compile time [3]. Given a valid schedule, execution can be sped up by simply pre-loading actors and their respective internal state data. Figure 1 shows the prefetch nodes explicitly in the diagram; however, the prefetch nodes could be added implicitly when targeting hardware capable of exploiting the parallelism exposed in the dataflow graph.

Figure 1: Homogeneous SDF graph with prefetch of state information for the shared node D

The idea of prefetching data is not new. Cahoon and McKinley have researched extracting dataflow dependencies from Java applications for the purpose of prefetching state data associated with nodes [2]. Wang et al., exploring how best to schedule loops expressed as dataflow graphs, also schedule the prefetching of data needed for loop iterations [5].
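The compile-time case described above can be sketched in a few lines: given a static SDF schedule and a shared stateful node, a prefetch action for that node's state can be hoisted a fixed number of firings earlier. This is an illustrative sketch, not the paper's tooling; the schedule, the `insert_prefetch` helper, and the `lead` parameter are our assumptions.

```python
# Hypothetical sketch of compile-time prefetch insertion for a static SDF
# schedule. The function name, schedule, and lead distance are illustrative.

def insert_prefetch(schedule, shared_node, lead=2):
    """Insert a prefetch of shared_node's state `lead` firings early."""
    out = list(schedule)
    # Walk right-to-left so earlier insertions don't shift later indices.
    for i in range(len(schedule) - 1, -1, -1):
        if schedule[i] == shared_node:
            out.insert(max(0, i - lead), f"prefetch({shared_node})")
    return out

# An assumed static schedule in which node D (shared, as in Figure 1) fires twice.
print(insert_prefetch(["A", "B", "C", "D", "E", "F", "D"], "D"))
# → ['A', 'prefetch(D)', 'B', 'C', 'D', 'prefetch(D)', 'E', 'F', 'D']
```

Because the whole firing order is known at compile time, no prediction is involved: each prefetch overlaps with the firings of the intervening nodes.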
Dynamically Schedulable Models

While statically schedulable models of computation are common in signal and image processing applications and make efficient scheduling easier, their range of applications is restricted because runtime scheduling is not allowed. Dynamically schedulable models of computation, such as Boolean dataflow, dynamic dataflow, and process networks, allow runtime decisions, but in the process make static prefetch difficult, if not impossible. G, the LabVIEW graphical programming language, requires dynamic scheduling. According to Andrade et al., the G model of computation is homogeneous, dynamically scheduled, and multidimensional [1]. Local variables, global variables, and other features of G make it possible to implement general process networks. The dynamic, yet structured, nature of G makes certain subsets well suited to exploring prefetch prediction for shared G nodes. We used G as our prototyping environment for practical reasons.

Branch Prediction

Two-level branch prediction was pioneered by Patt and Yeh to keep processor pipelines full for a greater percentage of the time [7]. This prediction model uses a lookup table of saturating two-bit counters representing the likelihood that a branch will be taken given a certain history. As illustrated in Figure 2, the history register records whether the branch was taken over the past n executions: a '1' represents a taken branch, and a '0' represents a branch not taken. The table therefore has 2^n entries.

Figure 2: Two-level branch prediction

A '1' in the counter MSB predicts that the branch will be taken, while a '0' predicts that it will not. This approach achieves prediction success rates in the 90% range [4]. Patt and Yeh have tabulated the hardware costs, which become significant for large history lengths [6].
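The two-level scheme above can be sketched in software: an n-bit history register indexes a table of 2^n saturating two-bit counters, and the counter's MSB gives the prediction. This is a minimal illustrative model, not the hardware design; the class and parameter names are ours.

```python
# Minimal sketch of a two-level branch predictor: an n-bit history of
# taken/not-taken outcomes indexes 2^n saturating two-bit counters.

class TwoLevelPredictor:
    def __init__(self, history_bits=5):
        self.n = history_bits
        self.history = 0                            # last n outcomes, 1 = taken
        self.counters = [1] * (2 ** history_bits)   # init weakly not-taken

    def predict(self):
        # MSB of the indexed counter gives the prediction: 1 = taken.
        return self.counters[self.history] >= 2

    def update(self, taken):
        c = self.counters[self.history]
        self.counters[self.history] = min(3, c + 1) if taken else max(0, c - 1)
        # Shift the actual outcome into the history register, keeping n bits.
        self.history = ((self.history << 1) | int(taken)) & ((1 << self.n) - 1)

# A strongly biased branch trains to correct predictions after a short warm-up.
p = TwoLevelPredictor(history_bits=3)
correct = 0
for _ in range(100):
    if p.predict():
        correct += 1
    p.update(True)   # this branch is always taken
print(correct > 90)  # prints True
```

The saturating counters provide hysteresis: a single anomalous outcome does not immediately flip the prediction for that history.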
IMPLEMENTATION

Our initial design of the prediction mechanism borrowed from the two-level branch prediction mechanism. Figure 3 shows the overall block-diagram architecture. The past call history shift register keeps track of the last n calls to a shared node. The values held in the shift register are wired directly to a hash function that converts the history entry into a row lookup index into the call history table. The call history table has a column for each possible caller of the shared node.

[Figure 3: block diagram of the prediction mechanism: past call history register, hash function for lookup, call history table, update logic, and a greatest-value comparison of the actual caller's column producing the prediction]
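The mechanism described above can be sketched as follows. The class name, table size, hash function, and counter update policy are our assumptions for illustration; the preview does not specify the hash or the update logic in detail. The prediction is the column with the greatest counter in the row selected by the hashed call history.

```python
# Illustrative sketch of the caller-prediction unit: a shift register of the
# last n caller IDs is hashed to a row of a call history table that holds one
# saturating counter per caller; the largest counter predicts the next caller.

class CallerPredictor:
    def __init__(self, num_callers, history_len=4, table_rows=64):
        self.num_callers = num_callers
        self.rows = table_rows
        self.history = [0] * history_len            # last n caller IDs
        self.table = [[0] * num_callers for _ in range(table_rows)]

    def _hash(self):
        # Assumed hash folding the history into a row index; the real
        # hardware hash function is not specified in this sketch's source.
        h = 0
        for c in self.history:
            h = (h * 31 + c) % self.rows
        return h

    def predict(self):
        row = self.table[self._hash()]
        return row.index(max(row))                  # column with largest counter

    def update(self, actual_caller):
        # Assumed policy: strengthen the actual caller's counter (saturating
        # at 3) and weaken the others, then shift the caller into the history.
        row = self.table[self._hash()]
        row[actual_caller] = min(3, row[actual_caller] + 1)
        for c in range(self.num_callers):
            if c != actual_caller and row[c] > 0:
                row[c] -= 1
        self.history = self.history[1:] + [actual_caller]

# A repeating caller pattern, e.g. round-robin channels, is learned quickly.
p = CallerPredictor(num_callers=3)
hits = 0
for caller in [0, 1, 2] * 40:
    if p.predict() == caller:
        hits += 1
    p.update(caller)
print(hits > 100)  # prints True: near-perfect after warm-up
```

In hardware, `predict` would drive the prefetch of the predicted caller's state while other nodes execute, and `update` would run when the actual caller fires.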

