UltraSPARC T2
Sun Microsystems
CS 433 – Computer System Organization
Manish Agrawal, Brett Daniel, Josh Smith

Overview of the UltraSPARC T2
• Multi-core, multi-threaded CPU: 8 cores with 8 threads each (64 hardware threads)
• Clock frequency ranges from 900 MHz to 1.4 GHz
• Consumes less than 95 W (nominal), with less than 2 W per thread
• Integrated:
  • 10 Gb Ethernet networking
  • PCI Express I/O expansion
  • An FPU and a cryptographic processing unit per core

History
• Codename: Niagara2
• Member of the SPARC family
• Two earlier multi-core processors:
  • UltraSPARC IV
  • UltraSPARC IV+
• UltraSPARC T1 (the first that is both multi-core and multi-threaded)
  • Released 14 November 2005
  • 4, 6, or 8 cores with 4 threads each
• UltraSPARC T2
  • Released 7 August 2007
  • 8 threads per core (up from 4)

Motivation
• Rather than optimizing each core for single-thread speed, the goal was to run as many concurrent threads as possible and keep every core's pipeline busy
• Each core is simpler than the cores of contemporary high-end processors, which allows 8 cores to fit on one die
• No out-of-order execution and comparatively little cache
• Each core is a barrel processor (it switches among its hardware threads from cycle to cycle)

Components
Source: http://www.sun.com/processors/UltraSPARC-T2/datasheet.pdf
• 8 fully pipelined FPUs
• 8 SPUs (cryptographic units, one per core)
• 2 integer ALUs per core, each shared by a group of four threads
• 4 MB L2 cache (8 banks, 16-way set associative)
• 8 KB data cache and 16 KB instruction cache per core
• Two 10 Gb Ethernet ports and one PCIe port

Chip
[Figure] Source: http://www.opensparc.net/images/stories/t2/ultrasparc-t2-layout.png
Source: Golla, R., "Niagara2: A Highly Threaded Server-on-a-Chip," Oct. 2006, http://www.opensparc.net/pubs/preszo//06/04-Sun-Golla.pdf

For a single thread
• Memory is THE bottleneck to improving performance
• Commercial server workloads exhibit poor memory locality
• Reducing compute time yields only a modest throughput speedup
• Conventional single-thread processors optimized for ILP have low utilization

With many threads
• It is possible to find something to execute every cycle
• Significant throughput speedups are possible
• Processor utilization is much higher

Engineering Solutions
• Goals of the T2 project:
  • Double UltraSPARC T1's throughput and throughput/watt
  • Improve UltraSPARC T1's single-thread FP performance and FP throughput (T1 could not handle workloads with more than 1-3% FP instructions)
  • Minimize the die area required for these improvements
• Considered doubling the number of UltraSPARC T1 cores (16 cores of 4 threads each):
  • Takes too much die area
  • Leaves no area for improving FP performance

Core Architecture
[Figure] Source: http://realworldtech.com/page.cfm?ArticleID=RWT090406012516&p=2

Core Architecture (continued)
[Figure] Source: http://blogs.sun.com/sprack/resource/N2_Announce_Breakout_final.pdf

Efficient in-order single-issue pipeline
• Eight-stage integer pipeline:
  Fetch | Cache | Pick | Decode | Execute | Mem | Bypass | W
• Pick selects 2 threads for execution each cycle (stage added for T2)
• In the Bypass stage, the load/store unit (LSU) forwards data to the integer register files (IRFs) with sufficient write timing margin; all integer operations pass through this stage
• 12-stage floating-point pipeline:
  Fetch | Cache | Pick | Decode | Execute | Fx1 | ... | Fx5 | FB | FW
• 6-cycle latency for dependent FP operations
• Integer multiplies are pipelined across different threads but block within the same thread
• Integer divide is a long-latency operation; divides are not pipelined, even across different threads

"Server on a chip"
• Two 10/1 Gigabit Ethernet ports
• Integrated PCI Express
• Embedded cryptography
• http://www.podtech.net/home/1293/niagara-2-server-on-a-chip/

Comparison Against AMD Opteron
• Up to 4 cores
• Supports multiprocessor systems
• HyperTransport links between processors
• Shared execution units

Comparison Against Intel Core
• 4 cores
• 6 in development
• 8+ in "Nehalem"
• Supports multiprocessor systems
• Shared front-side bus (FSB)

Open source release under GNU GPL
• Verilog, verification/tests, simulation/modeling tools
• ISA specification
• http://www.opensparc.net/

OpenSPARC
"We truly believe OpenSparc will blossom in the future because it is open." (Naxin Zhang, Polaris Micro)

Future
• Niagara III: "Victoria Falls"
• "Pushing up threads and cores"
• Retain simplicity: in-order processing
• Target multiprocessor systems
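The Components slide lists a 4 MB, 8-bank, 16-way set-associative L2 cache. The minimal Python sketch below shows how such a banked cache could split a physical address. It assumes 64-byte lines and bank selection from the address bits just above the line offset; neither detail is stated on the slides, and the helper name `split_address` is hypothetical.

```python
# Capacity, bank count and associativity come from the Components slide.
# The 64-byte line size and the placement of the bank bits are ASSUMPTIONS.
L2_BYTES = 4 * 1024 * 1024
BANKS = 8
WAYS = 16
LINE_BYTES = 64  # assumed line size

SETS_PER_BANK = L2_BYTES // (BANKS * WAYS * LINE_BYTES)  # 512 sets per bank

OFFSET_BITS = (LINE_BYTES - 1).bit_length()    # 6 bits pick a byte in the line
BANK_BITS = (BANKS - 1).bit_length()           # 3 bits pick one of 8 banks
INDEX_BITS = (SETS_PER_BANK - 1).bit_length()  # 9 bits pick a set in the bank


def split_address(pa: int) -> tuple[int, int, int, int]:
    """Hypothetical helper: split a physical address into (tag, set, bank, offset)."""
    offset = pa & (LINE_BYTES - 1)
    bank = (pa >> OFFSET_BITS) & (BANKS - 1)
    index = (pa >> (OFFSET_BITS + BANK_BITS)) & (SETS_PER_BANK - 1)
    tag = pa >> (OFFSET_BITS + BANK_BITS + INDEX_BITS)
    return tag, index, bank, offset


if __name__ == "__main__":
    # Consecutive 64-byte lines land in consecutive banks under these assumptions.
    for line in range(8):
        print(hex(line * LINE_BYTES), split_address(line * LINE_BYTES))
```

Under these assumptions, addresses 0x000, 0x040, ..., 0x1C0 map to banks 0 through 7, so streaming accesses from many threads spread across all eight banks instead of queuing at one.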

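The "For a single thread / With many threads" argument can be illustrated with a deliberately simple utilization model. This is an assumption for illustration, not Sun's analysis. Each thread computes for C cycles and then stalls for M cycles on a memory miss, and N threads share one in-order, single-issue pipeline. One thread keeps the pipeline busy C/(C+M) of the time, and N threads can fill it up to N·C/(C+M), capped at 1. The cycle counts used below are hypothetical.

```python
def pipeline_utilization(threads: int, compute_cycles: int, miss_cycles: int) -> float:
    """Fraction of cycles a shared single-issue pipeline does useful work.

    Simplified model (assumption): every thread alternates compute_cycles of
    work with a miss_cycles memory stall, and the stalls of different threads
    overlap perfectly, so the only limit is one instruction per cycle.
    """
    busy = threads * compute_cycles / (compute_cycles + miss_cycles)
    return min(1.0, busy)


if __name__ == "__main__":
    # Hypothetical workload: 20 cycles of work per 100-cycle memory stall.
    for n in (1, 2, 4, 8):
        print(n, round(pipeline_utilization(n, 20, 100), 3))
```

With these numbers a single thread keeps the pipeline busy about 17% of the time, while six or more threads saturate it. This is the intuition behind trading per-core complexity for more threads.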

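The pipeline slide says Pick selects 2 threads per cycle, and the Components slide says each integer ALU is shared by a group of four threads. The sketch below combines the two: each group sends at most one ready thread per cycle. It assumes a least-recently-picked policy among ready threads, which the slides do not state. `ThreadGroup` and `pick_cycle` are hypothetical names.

```python
class ThreadGroup:
    """Hypothetical model of one group of four threads sharing an integer pipeline."""

    def __init__(self, thread_ids):
        # Ordered from least recently picked (front) to most recently picked (back).
        self.order = list(thread_ids)

    def pick(self, ready):
        """Return the least recently picked ready thread, or None if none is ready."""
        for tid in self.order:
            if tid in ready:
                self.order.remove(tid)
                self.order.append(tid)  # now the most recently picked
                return tid  # return immediately, so mutating the list is safe
        return None  # no ready thread: this pipeline gets a bubble this cycle


def pick_cycle(groups, ready):
    """One Pick cycle: each thread group issues at most one thread."""
    return [group.pick(ready) for group in groups]


if __name__ == "__main__":
    core = [ThreadGroup(range(0, 4)), ThreadGroup(range(4, 8))]
    everyone = set(range(8))
    print(pick_cycle(core, everyone))      # [0, 4]
    print(pick_cycle(core, everyone))      # [1, 5]
    print(pick_cycle(core, {0, 1, 5, 6}))  # [0, 6]: threads 2, 3, 4, 7 are stalled
    print(pick_cycle(core, set()))         # [None, None]
```

In this model a stalled thread is simply skipped. A long-latency miss in one thread therefore costs its group an issue slot only when none of the group's four threads is ready.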
U of I CS 433 - Sun Microsystems