MIT 6.893 - Study References

Alpha 21264 Microarchitecture
Kenneth Conley
6.893
9/14/00

21264 Overview
• 64-bit RISC Processor
• 500-1000 MHz
• 7-stage pipeline
• 15 million transistors
• 2.2V, 60W
• 310 mm² (.35 micron)
• Target apps: Internet servers, data warehousing, digital video, speech recognition

21264 Fetch Unit
• 4 instructions/cycle, speculative
• Prediction:
  – Line/way predictor for each icache line (2-way, 64K)
  – 3 branch prediction mechanisms
    • Local: 2 level, 10-bit history pattern predictor (e.g. 10101010)
    • Global: History of last 12 branches, 4096 entry, 2-bit saturation
    • Chooser: Chooses between local/global
  – Prediction tables: 3.6KB
  – Targets: 6 KB
  – 90-100% accurate on most benchmarks

21264 Dispatch and Execution
• 4 integer execution units (2 clusters)
  – Each maintains copy of 80-entry register file
  – Single cycle latency for basic integer ops
  – Integer population count/leading zero count
  – Fully-pipelined multiplier
  – Motion Video Instructions (MVI)
• 2 FP execution units (1 cluster):
  – Upper: Multiply
  – Lower: Add, IEEE Divide, SQRT
  – 72-entry RF

21264 Memory System
• 2, 64-bit data buses for icache/dcache
• 32 in-flight loads, 32 in-flight stores
• Dcache increased to 64K (2-way), double-pumped
• L2 Cache:
  – Moved off-chip (increased latency by 6)
  – 4 GB/s sustained bandwidth
• Speculative issue consumers of loads for 3 cycle integer load hit latency
• 1.3 GB/s sustained bandwidth on McCalpin Stream

Out-of-order execution
• User visible registers: 32 int/32 float
• Renaming registers: 41 int/41 float
• Renaming map data saved for precise exception handling
• 80 instruction in-flight window, in-order retirement
• Loads can speculatively bypass stores
  – Store wait bits for mis-speculation

21264 Prediction Mechanisms

21264 Execution Units

