
Advanced Computer Architecture
CSE 8383
May 2, 2006
Session 29
Computer Science and Engineering
Copyright by Hesham El-Rewini

Contents
• Group work
• Exams
• Assignments
• Project
• Presentations
• Literature Search
• Lectures

Put-it-all-together
• Memory System Design
• Pipeline Design Techniques
• Multiprocessors
• Shared Memory Systems
• Message Passing Systems
• Multiprocessor Systems-on-Chips
• Network Computing

Put-it-all-together: Memory System Design

Memory Hierarchy
• Levels: CPU Registers, Cache, Main Memory, Secondary Storage
• Figure labels: Latency, Bandwidth, Speed, Cost per bit

Pentium IV two-level cache
[Figure: Processor, Cache Level 1 (L1), Cache Level 2 (L2), Main Memory]

Placement Policies
How to map memory blocks (lines) to cache block frames (line frames)
[Figure: Memory holds blocks (lines); Cache holds block frames (line frames)]
• Direct Mapping
• Fully Associative
• Set Associative

Direct Mapping
[Figure: memory blocks 0 to 4095 (0 ... 127, 128, 129 ... 255, ..., 3968 ... 4095) mapped to cache block frames 0 to 127; each frame stores a 5-bit tag (0, 1, ..., 31)]
Address fields: Tag (5 bits), Block frame (7 bits), Word (4 bits)
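The direct-mapping figure can be illustrated with a short sketch. It assumes the field widths read off the slide (5-bit tag, 7-bit block frame, 4-bit word), which implies 4096 memory blocks, 128 cache block frames, and 16 words per block. The function names `split_address` and `block_to_frame` are hypothetical and chosen only for this illustration.

```python
# Sketch of direct-mapped address decoding.
# Assumed field widths (taken from the slide figure, not from any real CPU):
WORD_BITS = 4    # 16 words per block
FRAME_BITS = 7   # 128 cache block frames
TAG_BITS = 5     # 32 memory blocks compete for each frame


def split_address(address: int) -> tuple[int, int, int]:
    """Split a 16-bit word address into (tag, block frame, word)."""
    if not 0 <= address < 1 << (TAG_BITS + FRAME_BITS + WORD_BITS):
        raise ValueError("address out of range")
    word = address & ((1 << WORD_BITS) - 1)
    frame = (address >> WORD_BITS) & ((1 << FRAME_BITS) - 1)
    tag = address >> (WORD_BITS + FRAME_BITS)
    return tag, frame, word


def block_to_frame(block: int) -> tuple[int, int]:
    """Map a memory block number (0..4095) to (tag, block frame)."""
    if not 0 <= block < 1 << (TAG_BITS + FRAME_BITS):
        raise ValueError("block number out of range")
    return block >> FRAME_BITS, block & ((1 << FRAME_BITS) - 1)


if __name__ == "__main__":
    # Blocks 1, 129, 257, ... all land in frame 1 and differ only in tag.
    for block in (1, 129, 4095):
        print(block, block_to_frame(block))
```

Under these assumptions, a memory block can live in exactly one frame (its block number modulo 128), so the stored tag alone is enough to tell which of the 32 candidate blocks currently occupies that frame.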

Example – Fully Associative
[Figure: memory blocks 0 to 4095; cache block frames 0 to 127; each frame stores a 12-bit tag]
Address fields: Tag (12 bits), Word (4 bits)

Example – Set Associative
[Figure: cache block frames 0 to 127 grouped into Set 0 (frames 0, 1, 2, 3) through Set 31 (frames 124, 125, 126, 127); each frame stores a 7-bit tag (0, 1, ..., 127); memory blocks 0 to 4095 (0 ... 31, 32, 33 ... 63, ..., 4095)]
Address fields: Tag (7 bits), Set (5 bits), Word (4 bits)

Put-it-all-together: Pipeline Design Techniques

Pipeline
[Figure: a task divided into sub-tasks 1, 2, ..., n; a stream of tasks entering a pipeline with stages 1, 2, ..., n]

5 Tasks on 4 stage pipeline
[Figure: space-time diagram of Task 1 to Task 5; Time axis from 1 to 8]

Speedup
[Figure: a stream of m tasks entering an n-stage pipeline, each stage with delay t]
T(Seq) = n * m * t
T(Pipe) = n * t + (m - 1) * t
Speedup = (n * m) / (n + m - 1)

Linear Pipeline
• Processing stages are linearly connected
• Perform fixed function
• Synchronous Pipeline: clocked latches between Stage i and Stage i+1; equal delays in all stages
• Asynchronous Pipeline (Handshaking)

Reservation Table
[Figure: reservation table with rows S1, S2, S3, S4 and a Time axis; four cells marked X]

Non Linear Pipelines
• Variable functions
• Feed-Forward
• Feedback

3 stages & 2 functions
[Figure: stages S1, S2, S3 implementing functions X and Y]

Reservation Tables for X & Y
[Figure: two reservation tables over stages S1, S2, S3, one with X entries and one with Y entries]

State Diagram
[Figure: state diagram with states 1011010, 1111111, and 1011011; transitions labeled 3, 6, 8+, 1*, 3*]

Put-it-all-together: Multiprocessors
• Shared Memory Systems
• Message Passing Systems
• Multiprocessor Systems-on-Chips
• Network Computing

Types of Parallelism (Flynn's Taxonomy)
                              Single Data Stream    Multiple Data Stream
Single Instruction Stream     SISD                  SIMD
                              Uniprocessors         Array Processors, Vector
Multiple Instruction Stream   MISD                  MIMD
                                                    Multiprocessors, Multicomputers

Amdahl's Law
[Figure: trip from A to B with a 200-mile segment and a 20-hour segment labeled "must walk"]
• Walk: 4 miles/hour
• Bike: 10 miles/hour
• Car-1: 50 miles/hour
• Car-2: 120 miles/hour
• Car-3: 600 miles/hour

Amdahl's Law
[Figure: Speedup (0 to 25) versus % Serial (10% to 99%) for 4 CPUs, 16 CPUs, and 1000 CPUs]

Gustafson – Barsis Law (1988)
• Gordon Bell Prize
• Overcoming the conceptual barrier established by Amdahl's law
• Scale the problem to the size of the parallel system
• No fixed size problem

Amdahl vs. Gustafson-Barsis
[Figure: Speedup (0 to 100) versus % Serial (10% to 99%); curves: Gustafson-Barsis, Amdahl]

SIMD Systems
[Figure: a von Neumann computer (Processor, Memory) next to an array of processor-memory (P, M) pairs joined by some interconnection network]
• One control unit
• Lockstep
• All Ps do the same or nothing

MIMD Shared Memory Systems
[Figure: processors (P), some with caches (C), connected through interconnection networks to memory modules (M) or to a global memory]
• One global memory
• Cache Coherence
• All Ps have equal access to memory

Cache Coherent NUMA
[Figure: nodes, each with memory (M), cache (C), and processor (P), connected by an interconnection network]
• Each P has part of the shared memory
• Non uniform memory access

MIMD Distributed Memory Systems
[Figure: processor-memory (P, M) nodes connected by interconnection networks; 16 nodes labeled 0000 to 1111; LAN/WAN]
• No shared memory
• Message Passing
• Topology

Cluster Architecture
[Figure: nodes, each with M, C, P, I/O, and OS; Middleware; Programming Environment; Interconnection Network; Home cluster]
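The speedup results on the pipeline, Amdahl, and Gustafson-Barsis slides can be compared numerically with a small sketch. The pipeline formula is the one given on the Speedup slide. The Amdahl and Gustafson-Barsis expressions are the standard textbook forms and are an assumption here, since the slides show only the plots. Note that the serial fraction is measured against the sequential run in Amdahl's form and against the parallel run in the Gustafson-Barsis form. All function names are hypothetical.

```python
def pipeline_speedup(n: int, m: int) -> float:
    """Speedup of an n-stage pipeline over m tasks: (n * m) / (n + m - 1)."""
    if n < 1 or m < 1:
        raise ValueError("n and m must be positive")
    return (n * m) / (n + m - 1)


def amdahl_speedup(serial_fraction: float, processors: int) -> float:
    """Fixed-size speedup (textbook form): 1 / (s + (1 - s) / p)."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / processors)


def gustafson_speedup(serial_fraction: float, processors: int) -> float:
    """Scaled speedup (textbook form): p - s * (p - 1)."""
    return processors - serial_fraction * (processors - 1)


if __name__ == "__main__":
    # 5 tasks on a 4-stage pipeline finish in 8 time units instead of 20.
    print(pipeline_speedup(4, 5))  # 2.5

    # With 10% serial work, compare the two laws for the CPU counts
    # that appear in the Amdahl plot.
    for p in (4, 16, 1000):
        print(p, round(amdahl_speedup(0.1, p), 2),
              round(gustafson_speedup(0.1, p), 2))
```

As m grows, `pipeline_speedup` approaches n. As the processor count grows, `amdahl_speedup` approaches 1 / s, while `gustafson_speedup` keeps growing with p because the problem size is assumed to scale with the machine.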

Grids
[Figure labels: Internet]
• Dependable, consistent, pervasive, and inexpensive access to high end computing.
• Geographically distributed platforms.

Multi-core
• Gate delay does not reduce much
• The frequency and performance of each core are the same as or a little less than in the previous generation
Generation



SMU CSE 8383 - Lecture Notes
