UltraSPARC T2
Sun Microsystems
CS 433 – Computer System Organization
Manish Agrawal, Brett Daniel, Josh Smith

Overview of the UltraSPARC T2
• Multi-core, multi-threaded CPU: 8 cores with 8 threads each
• Frequency ranges from 900 MHz to 1.4 GHz
• Draws less than 95 watts (nominal), or less than 2 watts per thread
• Integrated 10 Gb Ethernet networking
• PCI Express I/O expansion
• FPU and cryptographic processing unit per core

History
• Codename Niagara2
• Member of the SPARC family
• 2 previous multi-core processors
  • UltraSPARC IV
  • UltraSPARC IV+
• UltraSPARC T1 (first multi-core and multi-threaded)
  • Released 14 November 2005
  • 4, 6, or 8 cores with 4 threads each
• UltraSPARC T2
  • Released 7 August 2007
  • Now 8 threads per core (instead of 4)

Motivation
• Instead of optimizing each core, the overall goal was to run as many concurrent threads as possible, keeping each core's pipeline fully utilized
• Each core is less complex than those of current high-end processors, allowing 8 cores to fit on the same die
• Does not feature out-of-order execution or a sizable amount of cache
• Each core is a barrel processor

Components
Source: http://www.sun.com/processors/UltraSPARC-T2/datasheet.pdf
• 8 fully pipelined FPUs
• 8 SPUs
• 2 integer ALUs per core, each shared by a group of four threads
• 4 MB L2 cache (8 banks, 16-way associative)
• 8 KB data cache and 16 KB instruction cache per core
• Two 10 Gb Ethernet ports and one PCIe port

Chip
Source: http://www.opensparc.net/images/stories/t2/ultrasparc-t2-layout.png

Source: Golla R, “Niagara2: A Highly Threaded Server-on-a-Chip,” Oct. 2006, http://www.opensparc.net/pubs/preszo//06/04-Sun-Golla.pdf

For a single thread
• Memory is THE bottleneck to improving performance
• Commercial server workloads exhibit poor memory locality
• Only a modest throughput speedup is possible by reducing compute time
• Conventional single-thread processors optimized for ILP have low utilization

With many threads
• It's possible to find something to execute every cycle
• Significant throughput speedups are possible
• Processor utilization is much higher

Engineering Solutions
• Goals of the T2 project were:
  • Double UltraSPARC T1's throughput and throughput/watt
  • Improve UltraSPARC T1's FP single-thread throughput performance (T1 was unable to handle workloads with more than 1-3% FP instructions)
  • Minimize required area for these improvements
• Considered doubling the number of UltraSPARC T1 cores
  • 16 cores of 4 threads each
  • Takes too much die area
  • No area left for improving FP performance

Core Architecture
Source: http://realworldtech.com/page.cfm?ArticleID=RWT090406012516&p=2
Source: http://blogs.sun.com/sprack/resource/N2_Announce_Breakout_final.pdf

Efficient in-order single issue pipeline
• Eight-stage integer pipeline:
  Fetch, Cache, Pick, Decode, Execute, Mem, Bypass, W
  • Pick selects 2 threads for execution (this stage was added for T2)
  • In the bypass stage, the load/store unit (LSU) forwards data to the integer register files (IRFs) with sufficient write timing margin. All integer operations pass through the bypass stage.
• 12-stage floating point pipeline:
  Fetch, Cache, Pick, Decode, Execute, Fx1 ... Fx5, FB, FW
  • 6-cycle latency for dependent FP ops
• Integer multiplies are pipelined between different threads. Integer multiplies block within the same thread.
• Integer divide is a long-latency operation.
• Integer divides are not pipelined between different threads.

“Server on a chip”
• Two 10/1 Gigabit Ethernet ports
• Integrated PCI Express
• Embedded cryptography
http://www.podtech.net/home/1293/niagara-2-server-on-a-chip/

Comparison Against AMD Opteron
• 4 cores max
• Allows multiprocessors
• “HyperTransport” between cores
• Shared execution units

Comparison Against Intel Core
• 4 cores
  • 6 in development
  • 8+ in “Nehalem”
• Allows multiprocessors
• Shared FSB

OpenSPARC
• Open source release under GNU GPL
• Verilog, verification/tests, simulation/modeling tools
• ISA specification
• http://www.opensparc.net/
"We truly believe OpenSparc will blossom in the future because it is open."
Naxin Zhang, Polaris Micro

Future
• Niagara III: “Victoria Falls”
• "Pushing up threads and cores"
• Retain simplicity: in-order processing
• Target multiprocessor
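
Appendix: a toy model of the throughput argument
The "For a single thread / With many threads" argument can be illustrated with a small simulation. This is only a sketch under stated assumptions, not a model of the actual T2 pick logic: it assumes one issue slot shared by all threads, a round-robin pick among ready threads, and hypothetical values for the miss rate (miss_rate) and memory stall (stall_cycles). None of these numbers come from the slides.

```python
import random


def simulate_utilization(num_threads, miss_rate=0.1, stall_cycles=30,
                         total_cycles=50_000, seed=0):
    """Return the fraction of cycles in which an instruction is issued.

    Assumptions (illustrative only, not T2 measurements):
    - one in-order issue slot shared by all hardware threads
    - a ready thread is picked round-robin each cycle
    - each issued instruction misses in cache with probability
      `miss_rate`, stalling its thread for `stall_cycles` cycles
    """
    rng = random.Random(seed)
    ready_at = [0] * num_threads  # first cycle each thread may issue again
    next_thread = 0               # round-robin starting point
    issued = 0

    for cycle in range(total_cycles):
        # Scan threads in round-robin order and issue from the first ready one.
        for offset in range(num_threads):
            t = (next_thread + offset) % num_threads
            if ready_at[t] <= cycle:
                issued += 1
                if rng.random() < miss_rate:
                    # The thread waits on memory; others may use the slot.
                    ready_at[t] = cycle + 1 + stall_cycles
                next_thread = (t + 1) % num_threads
                break
        # If no thread was ready, the issue slot stays idle this cycle.

    return issued / total_cycles


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(f"{n} thread(s): utilization = {simulate_utilization(n):.2f}")
```

With these assumed parameters, a single thread should settle near 1 / (1 + miss_rate * stall_cycles) = 0.25, since every instruction costs one issue cycle plus, on average, miss_rate * stall_cycles idle cycles. Adding threads fills those idle cycles, so utilization climbs toward 1 as the thread count grows, which is the effect the slide describes.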