Instruction Generation and Regularity Extraction For Reconfigurable Processors

Philip Brisk, Adam Kaplan, Ryan Kastner, Majid Sarrafzadeh
Computer Science Department
University of California, Los Angeles
Los Angeles, CA 90095
{philip, kaplan, kastner, majid}@cs.ucla.edu

ABSTRACT

The increasing demand for complex and specialized embedded hardware must be met by processors that are optimized for performance, yet are also extremely flexible. In our work, we explore the tradeoff between flexibility and performance in the domain of reconfigurable processor design. Specifically, we seek to identify regularly occurring, computation-heavy patterns in an application or set of applications. These patterns become candidates for hard-logic implementation, potentially embedded in the flexible reconfigurable fabric as special optimized instructions. In this work we present an extension to previous work in instruction generation: an algorithm that identifies parallel templates. We discuss the advantages of parallel templates and prove the correctness of our algorithm. We introduce the All-Pairs Common Slack Graph (APCSG) as an effective tool for parallel template generation. Finally, we demonstrate the effectiveness of our algorithm on several applications' dataflow graphs, reducing latency on average by 51.98% without unreasonably increasing chip area.

Categories
Compilers and Operating Systems

General Terms
Algorithms, Performance, Theory

Keywords
Hardware Compiler, Template, Slack, Control Data-flow Graph

1. INTRODUCTION

The complexity of modern computing systems, combined with the creation of faster, smaller circuitry, has fueled the genesis of extremely small and powerful embedded systems. With the increasing design trend toward smaller and more intimate computing systems comes a need for specialization, as each of these smaller systems is more limited in functionality than a general-purpose processor. Embedded processors need not perform the same set of operations as their larger counterparts, and yet the limited number of computations they do perform must be optimized for power, area, and computational speed.

Frequently, ASICs are employed in the creation of such devices, due to their traditional ability to meet these demands. Yet ASICs are extremely inflexible; once the device is fabricated, its functionality can never be changed. Thus, as design standards and protocols evolve, embedded ASICs must be thrown away and new ones built to meet the new demand. This is both wasteful of good circuitry and extremely challenging for the designers of these chips.

These devices should be able to satisfy the rigid performance requirements that ASICs have classically met. Reconfigurable devices (built upon programmable logic device technology) contain logic that can be quickly and completely re-programmed. Thus, due to their inherent flexibility, reconfigurable chips have been proposed as the design platforms of specialized embedded processors. Unfortunately, there exists a tradeoff between the flexibility and performance of a system. We desire the ability to customize a system toward the special set of tasks that it needs to perform, while retaining the ability to evolve. We believe that, given the subset of functions that such a system would need to execute frequently, it is possible to implement some of these instructions as hard (or fixed) logic blocks. Such blocks would perform better than the softer, reconfigurable parts of the device.
If well chosen, they could enhance the performance of the system, providing the benefits of frequent hard-logic execution, while simultaneously affording flexibility to other (possibly subsuming) execution components.

With the goal of extracting instruction regularity, and consequently generating more complex instruction candidates for hard-logic implementation, we propose a scheme whereby an application or set of representative applications is sent to a compiler. Using the compiler's intermediate representation (IR) of a program, we can determine a candidate set of templates (instructions) to optimize.

In Section 2, we will further expound upon template generation as an idea, and discuss the control dataflow graph as a compiler IR. In Section 3, we will explain a previous approach to sequential template generation, including some of its shortcomings. In Section 4, we present our algorithmic extension to template generation. In Section 5, we discuss our implementation as well as some preliminary empirical results of our instruction generation algorithm. In Section 6, we present related work. Finally, in Section 7, we conclude with some future directions for this work.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. CASES 2002, October 8-11, 2002, Grenoble, France. Copyright 2002 ACM 1-58113-575-0/02/0010…$5.00.

2. TEMPLATE GENERATION

Templates are repeated occurrences of possibly interdependent nodes and edges in a dataflow graph (DFG). Each node in a DFG represents some simple operation, such as ADD or LOAD. Templates can be thought of as vertex clusters, or super-nodes, which represent instructions at the architecture level. In this paper, we define two different types of templates. Sequential templates correspond to nodes connected by directed edges in the DFG (i.e., nodes whose operations execute in sequence). Parallel templates correspond to nodes whose operations can be scheduled for simultaneous execution. Sequential templates contain direct data dependencies (as they follow DFG edges) and thus cannot directly increase the inherent parallelism of the execution. Parallel templates have no data dependencies among their nodes, but may restrict the scheduling mechanism at lower synthesis stages.

Template generation attempts to extract regularities of simple mathematical operations in DFGs. DFGs define interdependent sequences of ALU operations that occur between branching statements. DFGs are an integral piece of an intermediate compiler format known as control data flow graphs (CDFGs). CDFGs are a commonly accepted form of
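
To make the distinction between sequential and parallel templates concrete, the sketch below builds a toy DFG, computes ASAP/ALAP levels and slack, and then lists sequential template candidates (operation pairs that recur along DFG edges) and parallel template candidates (independent operations whose schedule windows overlap). This is only an illustration: the node names, edge list, unit operation delays, and the window-overlap test are assumptions made for this example, not the paper's APCSG construction or its template-generation algorithm.

# A minimal, illustrative sketch (not the paper's algorithm): a toy DFG with
# ASAP/ALAP levels, slack, and naive sequential / parallel template candidate
# detection. Node names, edges, unit delays, and the window-overlap test for
# "parallel" pairs are assumptions made for this example only.

from collections import defaultdict
from itertools import combinations

# Each node is a simple operation; edges are data dependencies (producer -> consumer).
ops = {
    "n1": "LOAD", "n2": "LOAD", "n3": "ADD",
    "n4": "MUL",  "n5": "ADD",  "n6": "STORE",
}
edges = [("n1", "n3"), ("n2", "n3"), ("n3", "n4"),
         ("n2", "n5"), ("n4", "n6"), ("n5", "n6")]

succ, pred = defaultdict(list), defaultdict(list)
for u, v in edges:
    succ[u].append(v)
    pred[v].append(u)

def asap_levels():
    # Earliest level of each node: one past its latest predecessor (unit delays).
    level = {}
    def visit(n):
        if n not in level:
            level[n] = 1 + max((visit(p) for p in pred[n]), default=0)
        return level[n]
    for n in ops:
        visit(n)
    return level

def alap_levels(depth):
    # Latest level at which each node can run without delaying any successor.
    level = {}
    def visit(n):
        if n not in level:
            level[n] = min((visit(s) for s in succ[n]), default=depth + 1) - 1
        return level[n]
    for n in ops:
        visit(n)
    return level

asap = asap_levels()
alap = alap_levels(max(asap.values()))
slack = {n: alap[n] - asap[n] for n in ops}

def reachable(a, b):
    # True if a dependency path a -> ... -> b exists.
    stack, seen = [a], set()
    while stack:
        n = stack.pop()
        if n == b:
            return True
        if n not in seen:
            seen.add(n)
            stack.extend(succ[n])
    return False

# Sequential template candidates: (producer-op, consumer-op) pairs recurring along DFG edges.
seq_counts = defaultdict(int)
for u, v in edges:
    seq_counts[(ops[u], ops[v])] += 1
sequential_candidates = {pair: c for pair, c in seq_counts.items() if c > 1}

# Parallel template candidates: independent node pairs whose ASAP/ALAP windows
# overlap, so they could be scheduled in the same cycle (a crude stand-in for
# the common-slack reasoning that the APCSG in the abstract alludes to).
parallel_candidates = [
    (a, b) for a, b in combinations(ops, 2)
    if not reachable(a, b) and not reachable(b, a)
    and asap[a] <= alap[b] and asap[b] <= alap[a]
]

print("slack:", slack)
print("sequential template candidates:", sequential_candidates)
print("parallel template candidates:", parallel_candidates)

Running this prints a LOAD-to-ADD edge pattern occurring three times (a sequential template candidate) and flags the two independent ADD nodes, among others, as parallel candidates; the paper's APCSG-based approach presumably performs this kind of analysis in a far more principled way over real application DFGs.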

