CSUN COMP 546 - Multicore Computers



William Stallings, Computer Organization and Architecture, 8th Edition
Chapter 18: Multicore Computers

Contents:
Hardware Performance Issues
Alternative Chip Organizations
Intel Hardware Trends
Increased Complexity
Power and Memory Considerations
Chip Utilization of Transistors
Software Performance Issues
Effective Applications for Multicore Processors
Multicore Organization
Multicore Organization Alternatives
Advantages of Shared L2 Cache
Individual Core Architecture
Intel x86 Multicore Organization - Core Duo (1)
Intel x86 Multicore Organization - Core Duo (2)
Intel x86 Multicore Organization - Core i7
ARM11 MPCore
ARM11 MPCore Block Diagram
ARM11 MPCore Interrupt Handling
DIC Routing
Interrupt States
Interrupt Sources
ARM11 MPCore Interrupt Distributor
Cache Coherency
Recommended Reading
Intel Core i7 Block Diagram
Intel Core Duo Block Diagram
Performance Effect of Multiple Cores
Slide 29

Hardware Performance Issues
• Microprocessors have seen an exponential increase in performance
— Improved organization
— Increased clock frequency
• Increase in parallelism
— Pipelining
— Superscalar
— Simultaneous multithreading (SMT)
• Diminishing returns
— More complexity requires more logic
— Increasing share of chip area goes to coordination and signal-transfer logic
– Harder to design, fabricate, and debug

Alternative Chip Organizations

Intel Hardware Trends

Increased Complexity
• Power requirements grow exponentially with chip density and clock frequency
— More of the chip area can instead be used for cache
– Denser
– Roughly an order of magnitude lower power requirements
• By 2015:
— 100 billion transistors on a 300 mm² die
– Cache of 100 MB
– 1 billion transistors for logic
• Pollack's rule:
— Performance is roughly proportional to the square root of the increase in complexity
– Doubling complexity yields about 40% more performance
• Multicore has the potential for near-linear performance improvement
• Unlikely that one core can use all of the cache effectively

Power and Memory Considerations

Chip Utilization of Transistors

Software Performance Issues
• Performance benefits depend on effective exploitation of parallel resources
• Even small amounts of serial code hurt performance
— 10% inherently serial code on an 8-processor system yields only a 4.7× speedup
• Overheads: communication, distribution of work, and cache coherence
• Some applications exploit multicore processors effectively

Effective Applications for Multicore Processors
• Databases
• Servers handling independent transactions
• Multi-threaded native applications
— Lotus Domino, Siebel CRM
• Multi-process applications
— Oracle, SAP, PeopleSoft
• Java applications
— The Java VM is multi-threaded, with threaded scheduling and memory management
— Sun's Java Application Server, BEA's WebLogic, IBM WebSphere, Tomcat
• Multi-instance applications
— One application running multiple times
— E.g. Value Game Software

Multicore Organization
• Number of core processors on the chip
• Number of levels of cache on the chip
• Amount of shared cache
• The next slide shows examples of each organization:
— (a) ARM11 MPCore
— (b) AMD Opteron
— (c) Intel Core Duo
— (d) Intel Core i7

Multicore Organization Alternatives

Advantages of Shared L2 Cache
• Constructive interference reduces the overall miss rate
• Data shared by multiple cores is not replicated at the shared cache level
• With proper frame-replacement algorithms, the amount of shared cache allocated to each core is dynamic
— Threads with less locality can have more cache
• Easy inter-process communication through shared memory
• Cache coherency is confined to L1
• A dedicated L2 cache, by contrast, gives each core more rapid access
— Good for threads with strong locality
• A shared L3 cache may also improve performance

Individual Core Architecture
• Intel Core Duo uses superscalar cores
• Intel Core i7 uses simultaneous multithreading (SMT)
— Scales up the number of threads supported
– 4 SMT cores, each supporting 4 threads, appear as 16 cores

Intel x86 Multicore Organization - Core Duo (1)
• 2006
• Two x86 superscalar cores with a shared L2 cache
• Dedicated L1 cache per core
— 32 KB instruction and 32 KB data
• Thermal control unit per core
— Manages chip heat dissipation
— Maximizes performance within thermal constraints
— Improved ergonomics
• Advanced Programmable Interrupt Controller (APIC)
— Inter-processor interrupts between cores
— Routes interrupts to the appropriate core
— Includes a timer so the OS can interrupt a core

Intel x86 Multicore Organization - Core Duo (2)
• Power management logic
— Monitors thermal conditions and CPU activity
— Adjusts voltage and hence power consumption
— Can switch individual logic subsystems on or off
• 2 MB shared L2 cache
— Dynamic allocation
— MESI support for the L1 caches
— Extended to support multiple Core Duo chips in an SMP configuration
– L2 data is shared between local cores or external
• Bus interface

Intel x86 Multicore Organization - Core i7
• November 2008
• Four x86 SMT processors
• Dedicated L2 cache per core, shared L3 cache
• Speculative prefetch for caches
• On-chip DDR3 memory controller
— Three 8-byte channels (192 bits total) giving 32 GB/s
— No front-side bus
• QuickPath Interconnect (QPI)
— Cache-coherent, point-to-point link
— High-speed communication between processor chips
— 6.4 G transfers per second, 16 bits per transfer
— Dedicated bidirectional pairs
— Total bandwidth 25.6 GB/s

ARM11 MPCore
• Up to 4 processors, each with its own L1 instruction and data caches
• Distributed interrupt controller
• Timer per CPU
• Watchdog
— Issues warning alerts on software failures
— Counts down from a predetermined value
— Issues a warning at zero
• CPU interface
— Interrupt acknowledgement, masking, and completion acknowledgement
• CPU
— A single ARM11 core, called an MP11
• Vector floating-point unit
— FP co-processor
• L1 cache
• Snoop control unit (SCU)
— Maintains L1 cache coherency

ARM11 MPCore Block Diagram

ARM11 MPCore Interrupt Handling
• Distributed Interrupt Controller (DIC) collates interrupts from many sources and provides:
— Masking
— Prioritization
— Distribution to the target MP11 CPUs
— Status tracking
— Software interrupt generation
• Number of interrupts is independent of the MP11 CPU design
• Memory-mapped
• Accessed by the CPUs via a private interface through the SCU
• Can route interrupts to a single CPU or to multiple CPUs
• Provides inter-process communication
— A thread on one CPU can cause activity by a thread on another CPU

DIC Routing
• Direct to a specific CPU
• To a defined group of CPUs
• To all CPUs
• The OS can generate an interrupt to:
— All but self
— Self
— Another specific CPU
• Typically combined with shared memory for inter-process communication
• 16 interrupt IDs are available for inter-process communication

Interrupt States
• Inactive
— Non-asserted
— Completed by
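The Pollack's-rule figure above (doubling complexity yields about 40% more performance) follows directly from the square-root relationship. A quick check, as a generic sketch rather than anything from the slides:

```python
import math

def pollack_speedup(complexity_ratio):
    """Pollack's rule: single-core performance scales roughly with
    the square root of the increase in complexity (transistor count)."""
    return math.sqrt(complexity_ratio)

# Doubling complexity: sqrt(2) ~ 1.41x, i.e. about 40% more performance.
# Two simpler cores, by contrast, offer a potential (near-linear) 2x.
print(f"{pollack_speedup(2.0):.2f}x")  # → 1.41x
```

This is the arithmetic behind the multicore argument: spending a transistor budget on additional cores has a higher performance ceiling than spending it on one more complex core.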
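The 4.7× figure for 10% serial code on 8 processors comes from Amdahl's law; the calculation can be sketched as:

```python
def amdahl_speedup(parallel_fraction, n_processors):
    """Amdahl's law: overall speedup is capped by the serial fraction,
    which cannot be spread across processors."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_processors)

# 10% inherently serial code on an 8-processor system:
# 1 / (0.1 + 0.9/8) = 1 / 0.2125 ~ 4.7, far short of the ideal 8x.
print(f"{amdahl_speedup(0.9, 8):.1f}x")  # → 4.7x
```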
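The "MESI support for L1 caches" bullet in the Core Duo material refers to the standard four-state invalidation protocol (Modified, Exclusive, Shared, Invalid). As an illustration of the idea, not material from the slides, the basic per-line transitions can be modeled as a lookup table:

```python
# Basic MESI transitions for one cache line, as seen by one cache.
# Events: local reads/writes by this core, plus snooped bus traffic
# generated by other cores' caches.
MESI_TRANSITIONS = {
    # (state, event): next state
    ("I", "local_read_shared"): "S",  # read miss; another cache holds the line
    ("I", "local_read_excl"):   "E",  # read miss; no other cache holds it
    ("I", "local_write"):       "M",  # read-for-ownership; others invalidated
    ("S", "local_write"):       "M",  # upgrade; other sharers invalidated
    ("E", "local_write"):       "M",  # silent upgrade, no bus traffic needed
    ("M", "snoop_read"):        "S",  # supply data / write back, keep a copy
    ("E", "snoop_read"):        "S",
    ("M", "snoop_write"):       "I",  # write back, then invalidate
    ("E", "snoop_write"):       "I",
    ("S", "snoop_write"):       "I",
}

def next_state(state, event):
    """Events not listed leave the state unchanged (e.g. read hits)."""
    return MESI_TRANSITIONS.get((state, event), state)
```

For example, `next_state("S", "snoop_write")` returns `"I"`: when another core writes a shared line, every other copy is invalidated, which is exactly the coherence traffic the snoop control unit on the ARM11 MPCore manages for its L1 caches.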
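The Core i7 bandwidth figures follow from simple arithmetic. Note the per-channel DDR3 transfer rate below is inferred from the 32 GB/s total, since the slides give only the total:

```python
# DDR3: three 8-byte (64-bit) channels totalling 32 GB/s implies
# 32 / (3 * 8) ~ 1.33 GT/s per channel (an inference; the slide
# states only the 32 GB/s aggregate).
ddr3_gbps = 3 * 8 * (4.0 / 3.0)   # channels * bytes/transfer * GT/s

# QPI: 6.4 GT/s * 16 bits (2 bytes) = 12.8 GB/s in each direction;
# the dedicated bidirectional pairs double that.
qpi_gbps = 6.4 * 2 * 2

print(f"{ddr3_gbps:.1f} {qpi_gbps:.1f}")  # → 32.0 25.6
```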
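The per-CPU watchdog described for the ARM11 MPCore counts down from a predetermined value and issues a warning at zero. A minimal software model of that behavior (class and method names are hypothetical, for illustration only):

```python
class Watchdog:
    """Minimal countdown-watchdog model: software must reload ('kick')
    the counter periodically; reaching zero signals a software failure."""

    def __init__(self, reload_value):
        self.reload_value = reload_value
        self.counter = reload_value

    def kick(self):
        # Healthy software resets the countdown before it expires.
        self.counter = self.reload_value

    def tick(self):
        # Called once per timer interval; returns True when the
        # warning fires (counter has reached zero).
        if self.counter > 0:
            self.counter -= 1
        return self.counter == 0

wd = Watchdog(reload_value=3)
print([wd.tick() for _ in range(3)])  # → [False, False, True]
```

If the software kicks the watchdog between ticks, the counter never reaches zero and no warning is issued.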

