MIT 6.375 - SIMD Extensions for the SMIPS core

SIMD Extensions for the SMIPS core (Group 4)
Mykal Valentine, Svilen Kanev
May 13, 2010

1 Project Objective

This project aims to create SIMD extensions to the SMIPS core already designed in the class. Our main goal is to obtain better performance for applications with a high level of data parallelism. The instructions we plan to implement are general enough to be used in a wide class of applications that benefit from a smaller number of instructions for the same data throughput. Generally speaking, we are targeting multimedia, DSP and scientific applications.

Vector extensions to scalar processors are not exactly a novel idea. There are quite a few successful commercial products shipping (notably MMX and SSE1-4) and under development (e.g. Intel's Larrabee), not to mention that the whole GPU segment is based on the same basic principles. But even if the idea is not groundbreaking, actually implementing the extensions gives us a lot of hands-on experience in a field that is going to stay active over the next few years.

At a very high level, our proposed architecture consists of a scalar SMIPS core and a vector coprocessor that can exploit data-level parallelism. The coprocessor is essentially another execution unit in our core, one that supports variable-length instructions. Synchronization is achieved via main memory and a limited amount of value forwarding. Since we share the front end of an in-order core, we achieve serial consistency.

2 High-level Design

2.1 Architecture Overview

Choice of extensions. The first major choice we had to make was whether to implement an existing set of MIPS SIMD extensions or to create a custom solution. While we are aware of two extension sets for the MIPS architecture, we chose the latter option. Of the existing solutions, MIPS-3D has been implemented in commercial processors but focuses only on 3D operations (clipping, lighting), while MDMX was never implemented and therefore lacks a clear standard. Our aim is to implement a slightly broader set of extensions than MIPS-3D, usable in a wider class of applications. A prior project for this class [2] has already implemented a similar set of extensions; in order to reuse the infrastructure that has been built, we keep the instructions shared by the two projects as compatible as possible.

Architecture. Our architecture includes a SIMD pipeline that operates on vectors of 4 32-bit values. In order to minimize area overhead, we keep the width of the SIMD functional unit at 32 bits. An overall view of the architecture is shown in Figure 1; a much more detailed description follows in Section 4.

[Figure 1: Overall architecture for the SMIPS SIMD machine]

The SIMD instructions follow the general MIPS ISA and are part of the regular instruction stream. They are fetched by the shared fetch unit, decoded, and then sent to the vector pipeline. Once such an instruction is sent to the vector unit, the scalar pipeline is stalled until the vector operation is complete. Since it operates on a different data width, the vector unit uses a separate 128-bit register file, but it shares the memory access port with the scalar pipeline. Consistency is ensured because neither pipeline is allowed to execute while the other one is busy.

Initially, the vector pipeline consists of a single 32-bit ALU and a vector register file. Even though in terms of pure computational capability this is not significantly more than a regular scalar pipeline, we still expect a moderate performance improvement. The main reason is the reduced pressure on both instruction and data memory: fewer instructions per arithmetic operation, and more regular (and hence more predictable) data memory access patterns. Such an architecture is also able to hide some of the data memory access latency by beginning computation on vector elements before the whole vector has been fetched.

In Section 6 we explore decoupling the two pipelines and allowing the scalar one to run past a long-latency vector operation, as long as there is no memory data dependency. With such decoupling, in theory, both pipelines can execute on each cycle, keeping throughput close to 1 instruction per cycle despite the shared fetch unit. This choice of architecture is motivated by the fact that most of today's workloads are memory-bound; by using instruction memory more effectively, we are trying to decrease the load on memory.

2.2 Instruction set extensions

We extend the SMIPS ISA through the COP2 interface described in the ISA definition [1]. A complete list of the extended instruction formats is given in Appendix A.

Based on the instruction format we chose early on, we have 16 opcodes available for c2 instructions. We have implemented 12 c2 instructions, so we are actually close to that limit. In choosing which instructions to implement, we were guided by what our benchmarks require in order to execute. That is why the implementation is missing, for example, a full spectrum of logic operations.

Before describing the actual instructions, we should set up some semantic definitions. For a vector register, rx.compy refers to the y-th component of vector x. Furthermore, all vector operations are masked: the results for some elements can be discarded if the internal mask bits for those elements are cleared. This is done mainly so that the machine can operate on vectors of length smaller than 4. We have implemented instructions to manipulate the masks explicitly.

We also extend the SMIPS ISA convention that register $0 always evaluates to 0 to the vector register $0.
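
As an illustration of these definitions, here is a minimal C reference-model sketch of the vector register state and the masked writeback. It is our own sketch, not code from the report; the names (vreg_t, vmask_t, masked_write) are hypothetical.

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define VLEN 4  /* vectors of 4 32-bit values */

    /* One 128-bit vector register; rx.compy corresponds to vrf[x].comp[y]. */
    typedef struct { uint32_t comp[VLEN]; } vreg_t;

    /* Per-element mask bits: a cleared bit means the result for that element
       is discarded, which is how the machine handles vectors shorter than 4. */
    typedef struct { bool bit[VLEN]; } vmask_t;

    /* Masked writeback: only elements whose mask bit is set are updated.
       Writes to vector register $0 are dropped, mirroring the scalar
       convention that $0 always evaluates to 0. */
    void masked_write(vreg_t *vrf, int rd, vreg_t value, vmask_t mask)
    {
        if (rd == 0)
            return;                                   /* vector $0 stays 0 */
        for (int i = 0; i < VLEN; i++)
            if (mask.bit[i])
                vrf[rd].comp[i] = value.comp[i];
    }

    int main(void)
    {
        vreg_t vrf[4] = { { { 0 } } };                   /* tiny vector register file */
        vmask_t len2 = { { true, true, false, false } }; /* "vector of length 2" */
        vreg_t result = { { 7, 8, 9, 10 } };

        masked_write(vrf, 1, result, len2);
        for (int i = 0; i < VLEN; i++)
            printf("%u ", (unsigned)vrf[1].comp[i]);  /* prints: 7 8 0 0 */
        printf("\n");
        return 0;
    }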
2.2.1 Vector memory instructions - lwc2 and swc2

These instructions perform the memory operations for the vector unit. Each of them operates on a whole vector (even though the actual implementation can take multiple cycles to execute). They are the only interface for transferring data between the scalar and vector pipelines.

The address base is stored in the scalar register file. The motivation for that decision is that address calculations (meaning address generation by previous instructions, not simply offsetting the address base) are generally simple and scalar, so there is no reason to store their results in the vector register file. With the base in a scalar register, the full address generation can be done immediately after reading the scalar register file, which in our pipeline happens in the scalar decode stage.

2.2.2 Two-operand instructions - addv and mulv

These instructions simply add two vectors (addv) or multiply them element-wise (mulv).
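
To make the semantics of the four instructions above concrete, the following C sketch models what they compute. It reflects our reading of the text rather than the report's actual implementation; the word-granular addressing, the immediate offset, and the reuse of the mnemonics as function names are assumptions made for illustration.

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define VLEN 4

    typedef struct { uint32_t comp[VLEN]; } vreg_t;   /* 128-bit vector register */
    typedef struct { bool bit[VLEN]; }     vmask_t;   /* per-element mask */

    /* lwc2: the base address comes from a scalar register, so the full address
       (base + offset) can be formed right after the scalar register-file read,
       i.e. in the scalar decode stage.  Elements are then transferred one
       32-bit word at a time; masked-off elements are left untouched.
       Memory is modeled as a word array here for brevity. */
    void lwc2(vreg_t *vd, uint32_t base, uint32_t offset,
              const uint32_t *mem, vmask_t mask)
    {
        uint32_t addr = base + offset;
        for (int i = 0; i < VLEN; i++)
            if (mask.bit[i])
                vd->comp[i] = mem[addr + i];
    }

    /* swc2: the mirror image, storing the masked elements back to memory. */
    void swc2(const vreg_t *vs, uint32_t base, uint32_t offset,
              uint32_t *mem, vmask_t mask)
    {
        uint32_t addr = base + offset;
        for (int i = 0; i < VLEN; i++)
            if (mask.bit[i])
                mem[addr + i] = vs->comp[i];
    }

    /* addv / mulv: element-wise add or multiply with masked writeback. */
    void addv(vreg_t *vd, vreg_t va, vreg_t vb, vmask_t mask)
    {
        for (int i = 0; i < VLEN; i++)
            if (mask.bit[i])
                vd->comp[i] = va.comp[i] + vb.comp[i];
    }

    void mulv(vreg_t *vd, vreg_t va, vreg_t vb, vmask_t mask)
    {
        for (int i = 0; i < VLEN; i++)
            if (mask.bit[i])
                vd->comp[i] = va.comp[i] * vb.comp[i];
    }

    /* Usage sketch: load two vectors, add them, store the result. */
    int main(void)
    {
        uint32_t mem[12] = { 1, 2, 3, 4, 10, 20, 30, 40, 0, 0, 0, 0 };
        vmask_t all = { { true, true, true, true } };
        vreg_t v1, v2, v3;

        lwc2(&v1, 0, 0, mem, all);   /* v1 = mem[0..3]  */
        lwc2(&v2, 4, 0, mem, all);   /* v2 = mem[4..7]  */
        addv(&v3, v1, v2, all);      /* v3 = v1 + v2    */
        swc2(&v3, 8, 0, mem, all);   /* mem[8..11] = v3 */

        for (int i = 8; i < 12; i++)
            printf("%u\n", (unsigned)mem[i]);  /* prints 11 22 33 44 */
        return 0;
    }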

