
Hardware/Software Instruction Set Configurability for System-on-Chip Processors

Albert Wang [email protected]
Earl Killian [email protected]
Dror Maydan [email protected]
Chris Rowen [email protected]

Tensilica, Inc.
3255-6 Scott Blvd.
Santa Clara, CA 95054
+1 408 986 8000

ABSTRACT

New application-focused system-on-chip platforms motivate new application-specific processors. Configurable and extensible processor architectures offer the efficiency of tuned logic solutions with the flexibility of standard high-level programming methodology. Automated extension of processor function units and the associated software environment – compilers, debuggers, simulators and real-time operating systems – satisfies these needs. At the same time, designing at the level of software and instruction set architecture significantly shortens the design cycle and reduces verification effort and risk. This paper describes the key dimensions of extensibility within the processor architecture, the instruction set extension description language, and the means of automatically extending the software environment from that description. It also describes two groups of benchmarks, EEMBC's Consumer and Telecommunications suites, that show 20 to 40 times acceleration of a broad set of algorithms through application-specific instruction set extension, relative to high-performance RISC processors.

1. WHY CONFIGURE PROCESSORS?

Two major shifts – one technical, one economic – are changing the design of electronic systems. First, continuing growth in silicon chip capability is rapidly reducing the number of chips in a typical system and magnifying the size, performance, and power benefits of system-on-chip integration. Second, many of the fastest-growing electronics products demand ever-better cost, bandwidth, battery life, and software functionality.
These systems – network routers, MP3 players, cell-phones, home gateways, PDAs, and many others – require both full programmability (to manage complexity and rapidly evolving requirements) and high silicon efficiency (for superior application performance per watt, per dollar and per mm2). Application-specific processor cores promise such a combination of full software flexibility with high efficiency.

The demand for application-specific processors creates a paradox for modern system design: how do architects develop new processors that combine the key benefits of generic programmable chips – longevity, development costs amortized over large volume, adaptability to changing market requirements – without taking too much development time or expense? If the cost of fashioning new optimized processors could be radically reduced, then a much broader array of highly refined processor cores could be used in system-on-chip designs.

Tensilica enables rapid design of highly efficient processor cores by providing a base architecture, a lean core implementation, and an automated method to seamlessly extend the processor hardware and software to fit each system's application requirements. Processors extended by this methodology close the performance gap between high-overhead general-purpose programmable processors and efficient, specialized hardware-only solutions based on hardwired-datapath-plus-state-machine logic functions [1]. This methodology also closes the design gap between the rapid, exponential growth of silicon capacity and the slower growth in designer productivity [2]. This paper outlines the capabilities of Tensilica's Xtensa processor generator [3], including the Tensilica Instruction Extension (TIE) methodology, and demonstrates a resolution to the paradox.

2. WHAT'S THE RIGHT ARCHITECTURE?

But what is the right new architecture for extended processors? What instructions should we add? There is no universal extension, or even one for each application class.
System designers may already know the answer for their own problem area. Good candidate instructions can be found in the datapaths of dedicated hardware solutions sometimes added outside the processor to enhance application performance. By moving these datapaths into the processor, the system architect can discard the external control logic: the finite state machines and micro-sequencers. The processor and its software can provide this sequencing much more flexibly. Moreover, removing the function-specific control logic also eliminates most of the verification infrastructure necessary to test that logic, and guarantees the flexibility to accommodate new algorithms using the same datapath functions.

Moving the application-specific datapaths into the processor provides several other advantages. Fast instruction set extension and performance testing encourages, in turn, rapid prototype validation and experimentation. This allows the system designer to home in quickly on the best design for the target application set. The datapath is fully accessible from C/C++ code through the compiler, using extended intrinsics and data-types. Storage elements (register files and special state registers) and pipeline flip-flops are generated by the TIE compiler in response to a high-level specification, and need not be created manually. Storage elements can be configured in width and number to adapt to the data precision and bandwidth requirements of the algorithm.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. DAC 2001, June 18-22, 2001, Las Vegas, Nevada, USA. Copyright 2001 ACM 1-58113-297-2/01/0006…$5.00.
The paradigm also simplifies the use of data memory, since the processor can simply share a unified data memory across many different tasks. This avoids the typical duplication of the assorted RAM structures, address generators, access ports and external interfaces found in designs that attempt to combine a range of specialized execution engines. Xtensa's RAM structures are configurable in type, depth and width beyond what the processor core requires, so as to support the width required by the added datapaths. So this approach reduces design time and also reduces system cost through hardware sharing.

3. EXTENSION

