Spring 2010 Prof. Hyesoon Kim

• Outstanding performance, especially on game/multimedia applications.
  – Challenges: Power Wall, Frequency Wall, Memory Wall
• Real-time responsiveness to the user and the network.
  – Challenges: Real-time in an SMP environment, Security
• Applicable to a wide range of platforms.
  – Challenge: Maintain programmability while increasing performance
Micro38 Keynote

• Memory wall:
  – More slower threads
  – Asynchronous loads
• Efficiency wall:
  – More slower threads
  – Specialized function
• Power wall:
  – Reduce transistor power
    • operating voltage
    • limit oxide thickness scaling
    • limit channel length
  – Reduce switching per function
(Diagram labels: Increase Concurrency, Increase Serialization)

• Compatibility with 64b Power Architecture™
  – Builds on and leverages IBM investment and community
• Increased efficiency and performance
  – Non Homogeneous Coherent Chip Multiprocessor
    • Allows an attack on the “Frequency Wall”
  – Streaming DMA architecture attacks “Memory Wall”
  – High design frequency, low operating voltage attacks “Power Wall”
  – Highly optimized implementation
• Interface between user and networked world
  – Flexibility and security
  – Multi-OS support, including RTOS/non-RTOS
  – Architectural extensions for real-time management

• High design frequency -> low voltage and low power
• Power architecture compatibility to utilize IBM software infrastructure & experience
• SPE: SIMD architecture. Supports media/game applications
• A power & area efficient PPE

• DMA into and out of Local Store equivalent to Power core loads & stores
• Governed by Power Architecture page and segment tables for translation and protection
• Shared memory model
  – Power architecture compatible addressing
  – MMIO capabilities for SPEs
  – Local Store is mapped (alias) allowing LS to LS DMA transfers
  – DMA equivalents of locking loads & stores
  – OS management/virtualization of SPEs
    • Pre-emptive context switch is supported (but not efficient)

• Pipeline depth: 23 stages
• Dual in-order issue
• 2-way SMT (issue 2 instructions from 2 threads)
• 1st-level cache: 32KB; 2nd-level cache: 512KB
• Cache optimization:
  – Set locking, no write (reduce pollution) feature

• IU (Instruction unit): instruction fetch, decode, branch, issue and completion
  – Fetch 4 instructions per cycle per thread
  – 4KB branch predictor (global + local)
• XU (Fixed point unit)
• VSU (A vector scalar unit): vector scalar and floating point

• Local store is a private memory
• Load/store instructions read or write the local store
• DMA (Direct Memory Access) unit transfers data between local store and system memory

• SIMD RISC-style 32-bit fixed-length instructions
• 2-issue core (static scheduling)
• 128 general purpose registers (both floating point and integer)
• Most instructions operate on 128-bit wide data (2 x 64-bit, 4 x 32-bit, 8 x 16-bit, 16 x 8-bit, and 128 x 1-bit)
• Operations: single precision floating point, integer arithmetic, logical, loads, stores, compares and branches
• 256KB of private memory
Static scheduling:
  – Fetch 2 instructions
  – Check whether they can be executed in parallel or not
  – If not, execute in order

• No O/S on SPE
• Only user mode
• Fixed delay and without exception, greatly simplifying the core design

• Transfers are divided into 128-byte packets for the on-chip interconnect
• A typical 128B transfer requires 16 processor cycles
• Instruction fetch is 128B (reduces the pressure on DMA)
• DMA priority
  – Commands (high) > loads/stores > instruction (prefetch)
  – Special instruction to force instruction fetch

• Compiler/programmer hint
  – An upcoming branch address and branch target, prefetching at least 17 instructions
• 3-source bitwise selection instruction to eliminate branches (similar to predication)
• Multi-path and select instructions
• SMBTB: software managed BTB, software loads the target address into a register file.

• Rambus XDR
• 12.8 GB/s per 32-bit memory channel (x2)
• High bandwidth support between Cell processors
• IOIF: Input–output interface; BIF: broadband interface

• 360 Hardware:
  1. Support for DVD-video, DVD-Rom, DVD-R/RW, CD-DA, CD-Rom, CD-R, CD-RW, WMA CD, MP3 CD, JPEG photo CD
  2. All games supported at 16:9, 720p and 1080i, anti-aliasing
  3. Customizable face plates to change appearance
  4. 3 USB 2.0 ports
  5. Support for 4 wireless controllers
  6. Detachable drive
  7. Wi-Fi ready

• Custom IBM PowerPC-based CPU
  - 3 symmetrical cores at 3.2 GHz each
  - 2 hardware threads per core
  - 1 VMX-128 vector unit per core
  - 1 MB L2 cache
• CPU Game Math Performance
  - 9 billion dots per second
http://www.ps3vault.com/ps3-specifications/ps3-vs-xbox-360

• Custom ATI Graphics Processor
  - 500 MHz
  - 10 MB embedded DRAM
  - 48-way parallel floating-point shader pipelines
  - unified shader architecture
• Memory
  - 512 MB GDDR3 RAM
  - 700 MHz DDR
• Memory Bandwidth
  - 22.4 GB/s memory interface bus bandwidth
  - 256 GB/s memory bandwidth to EDRAM
  - 21.6 GB/s frontside bus
• Audio
  - Multichannel surround sound output
  - Supports 48 kHz 16-bit audio
  - 320 independent decompression channels
  - 32-bit processing
  - 256+ audio channels
• Games: Over 100 games available. Marquee games include Gears of War, Tom Clancy line of games, Call of Duty 3, and F.E.A.R.

• PS3 Specification
• CPU: Cell Processor
  - PowerPC-base Core @3.2GHz
  - 1 VMX vector unit per core
  - 512KB L2 cache
  - 7 x SPE @3.2GHz
  - 7 x 128b 128 SIMD GPRs
  - 7 x 256KB SRAM for SPE
  - *1 of 8 SPEs reserved for redundancy
  - Total floating point performance: 218 gigaflops
• GPU: RSX @ 550MHz
  - 1.8 TFLOPS floating point performance
  - Full HD (up to 1080p) x 2 channels
  - Multi-way programmable parallel floating point shader pipelines
• Sound: Dolby 5.1ch, DTS, LPCM, etc. (Cell-based processing)
• Memory
  - 256MB XDR Main RAM @3.2GHz
  - 256MB GDDR3 VRAM @700MHz
• System Bandwidth
  - Main RAM: 25.6GB/s
  - VRAM: 22.4GB/s
  - RSX: 20GB/s (write) + 15GB/s (read)
  - SB: 2.5GB/s (write) + 2.5GB/s (read)
http://www.ps3vault.com/ps3-specifications/ps3-vs-xbox-360

• SYSTEM FLOATING POINT PERFORMANCE: 2 teraflops
• STORAGE
  - HDD: Detachable 2.5″ HDD slot x 1
  - I/O: USB Front x 4, Rear x 2 (USB2.0)
  - Memory Stick standard/Duo, PRO x 1
  - SD standard/mini x 1
  - CompactFlash (Type I, II) x 1
• COMMUNICATION
  - Ethernet (10BASE-T, 100BASE-TX, 1000BASE-T) x 3 (input x 1 + output x 2)
  - Wi-Fi: IEEE 802.11 b/g (60gig only)
  - Bluetooth: Bluetooth 2.0 (EDR)
  - Controller: Bluetooth (up to 7), USB 2.0 (wired), Wi-Fi
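
The SPE notes above mention a 3-source bitwise selection instruction that eliminates branches, similar to predication. A minimal sketch of the idea follows, modeling a 128-bit register as a Python integer holding 4 x 32-bit lanes. The helper names (pack, unpack, compare_gt, select, lanewise_max) are hypothetical and chosen for illustration; they are not SPE mnemonics, and the Python `if` inside compare_gt only stands in for a hardware compare that produces a lane mask.

```python
# Sketch only: models a 128-bit register as a Python int with 4 x 32-bit lanes.
# All function names here are illustrative assumptions, not real SPE instructions.
LANES = 4
LANE_BITS = 32
LANE_MASK = (1 << LANE_BITS) - 1


def pack(values):
    """Pack LANES unsigned 32-bit values into one 128-bit word (lane 0 lowest)."""
    word = 0
    for i, v in enumerate(values):
        word |= (v & LANE_MASK) << (i * LANE_BITS)
    return word


def unpack(word):
    """Split a 128-bit word back into its LANES unsigned 32-bit values."""
    return [(word >> (i * LANE_BITS)) & LANE_MASK for i in range(LANES)]


def compare_gt(a, b):
    """Return a mask with all-ones in each lane where a > b, else all-zeros.

    The `if` below models what a SIMD compare does in hardware; the point of
    the sketch is that the *selection* step afterwards needs no branch.
    """
    mask = 0
    for i, (x, y) in enumerate(zip(unpack(a), unpack(b))):
        if x > y:
            mask |= LANE_MASK << (i * LANE_BITS)
    return mask


def select(a, b, mask):
    """3-source bitwise select: bits of b where mask is 1, bits of a where 0."""
    return (a & ~mask) | (b & mask)


def lanewise_max(a, b):
    """Branch-free per-lane maximum: pick a where a > b, otherwise b."""
    return select(b, a, compare_gt(a, b))


if __name__ == "__main__":
    x = pack([1, 9, 5, 7])
    y = pack([4, 2, 5, 8])
    print(unpack(lanewise_max(x, y)))
```

Both candidate results are available in registers and the mask picks between them bit by bit, so a data-dependent `if/else` becomes straight-line code. That matters on a core with static scheduling and software-managed branch hints, where a mispredicted branch is costly.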
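
The transfer and bandwidth figures in the notes can be checked with simple arithmetic. The sketch below uses only numbers stated above (128-byte packets, 16 processor cycles per 128B, a 3.2 GHz clock, and 12.8 GB/s per XDR channel with two channels). It assumes packets move back to back with no overlap or stalls, which is a simplification, and the function names are hypothetical.

```python
import math

# Figures taken from the notes; the back-to-back packet model is an assumption.
CLOCK_HZ = 3.2e9           # processor clock from the PS3 specification slide
PACKET_BYTES = 128         # transfers are divided into 128-byte packets
CYCLES_PER_PACKET = 16     # "typical 128B requires 16 processor cycles"
XDR_CHANNEL_GBPS = 12.8    # per 32-bit XDR memory channel
XDR_CHANNELS = 2


def packets_for(nbytes):
    """Number of 128-byte packets needed to move nbytes (rounded up)."""
    return math.ceil(nbytes / PACKET_BYTES)


def transfer_cycles(nbytes):
    """Processor cycles for a transfer, assuming packets run back to back."""
    return packets_for(nbytes) * CYCLES_PER_PACKET


def bytes_per_cycle():
    """Sustained rate implied by 128 bytes every 16 cycles."""
    return PACKET_BYTES / CYCLES_PER_PACKET


def implied_rate_gbps():
    """The bytes-per-cycle rate expressed in GB/s at the stated clock."""
    return bytes_per_cycle() * CLOCK_HZ / 1e9


def xdr_total_gbps():
    """Aggregate XDR bandwidth across both memory channels."""
    return XDR_CHANNEL_GBPS * XDR_CHANNELS


if __name__ == "__main__":
    print("packets for 16 KB:", packets_for(16 * 1024))
    print("cycles for 16 KB:", transfer_cycles(16 * 1024))
    print("bytes per cycle:", bytes_per_cycle())
    print("implied rate (GB/s):", implied_rate_gbps())
    print("XDR total (GB/s):", xdr_total_gbps())
```

Under these assumptions, two XDR channels at 12.8 GB/s each give the 25.6 GB/s listed for PS3 main RAM, and 128 bytes per 16 cycles works out to 8 bytes per cycle.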



GT CS 4803 - LECTURE NOTES
