Contents:
- Lecture 18: Introduction to Multiprocessors
- Why Multiprocessors?
- Exploiting (Program) Parallelism
- Exploiting (Program) Parallelism - 2
- Need for Parallel Computing
- What to do with a billion transistors?
- Elements of a multiprocessing system
- Use, Granularity
- Topology
- Coupling
- Control/Data
- Task allocation and routing
- Reconfiguration
- Programmer's model
- Parallel Programming Models
- Message Passing Multicomputers
- Shared-Memory Multiprocessors
- Cache Coherence - A Quick Overview
- Implementation issues
- Performance objectives
- Flynn's Taxonomy of Multiprocessing
- Examples
- Predominant Approaches
- C62x Pipeline Operation: Pipeline Phases
- Superscalar: PowerPC 604 and Pentium Pro
- IA-64 aka EPIC aka VLIW
- Philips Trimedia Processor
- TMS320C6201 Revision 2
- TMS320C6701 DSP Block Diagram
- TMS320C67x CPU Core
- Single-Chip Multiprocessors (CMP)
- Intel IXP1200 Network Processor
- IXP1200 MicroEngine
- IXP1200 Instruction Set
- UCB: Processor with DRAM (PIM) - IRAM, VIRAM
- IRAM Vision Statement
- Potential Multimedia Architecture
- Revive Vector (= VSIW) Architecture!
- V-IRAM1: 0.18 µm, Fast Logic, 200 MHz, 1.6 GFLOPS (64b) / 6.4 GOPS (16b) / 16 MB
- Tentative VIRAM-1 Floorplan
- Tentative VIRAM-"0.25" Floorplan
- Stanford: Hydra Design
- Mescal Architecture
- Outline
- Architectural Rationale and Motivation
- Slide 46
- Architecture Goals
- Architecture Template
- Range of Architectures
- Slide 50
- Slide 51
- Slide 52
- Slide 53
- Range of Architectures (Future)
- The RAW Architecture
- Slide 56
- RAW Machine Overview
- RAW Tiles
- RAW Tiles (cont.)
- Configurable Hardware in RAW
- Benefits of RAW
- Disadvantages of RAW
- Traditional Operations on RAW
- Compiling for RAW machines
- Compiling for RAW (cont.)
- Structure of RAWCC
- The MAPS System
- Space-Time Scheduler
- Basic Block Orchestrator
- Initial Code Transformation
- Instruction Partitioner
- Global Data Partitioner
- Data and Instruction Placer
- Event Scheduler
- Control Flow
- Performance
- Future Work
- Reconfigurable processors
- SCORE: Stream Computation Organized for Reconfigurable Execution
- Opportunity
- Problem
- Introduce: SCORE
- Viewpoint
- Slide 84
- ...borrows heavily from...
- Enabling Hardware
- BRASS Architecture
- Array Model
- Platform Vision
- Example: SCORE Execution
- Spatial Implementation
- Serial Implementation
- Summary: Elements of a multiprocessing system
- Conclusions

Slide 1: Lecture 18: Introduction to Multiprocessors
Prepared and presented by Kurt Keutzer, with thanks for materials from Kunle Olukotun (Stanford) and David Patterson (UC Berkeley).

Slide 2: Why Multiprocessors?
- Needs
  » Relentless demand for higher performance
    – Servers
    – Networks
  » Commercial desire for product differentiation
- Opportunities
  » Silicon capability
  » Ubiquitous computers

Slide 3: Exploiting (Program) Parallelism
[Figure: levels of parallelism (instruction, loop, thread, process) plotted against grain size, from 1 to 1M instructions.]

Slide 4: Exploiting (Program) Parallelism - 2
[Figure: the same chart with a bit level of parallelism added.]

Slide 5: Need for Parallel Computing
- Diminishing returns from ILP
  » Limited ILP in programs
  » ILP increasingly expensive to exploit
- Peak performance increases linearly with more processors
  » Amdahl's law applies
- Adding processors is inexpensive
  » But most people add memory also
[Figure: two performance vs. die area plots comparing the configurations P+M, 2P+M, and 2P+2M.]

Slide 6: What to do with a billion transistors?
- Technology changes the cost and performance of computer elements in a non-uniform manner
  » logic and arithmetic are becoming plentiful and cheap
  » wires are becoming slow and scarce
- This changes the tradeoffs between alternative architectures
  » superscalar doesn't scale well
    – global control and data
- So what will the architectures of the future be?
[Figure: scaling trend 1998-2001-2004-2007: 64x the area, 4x the speed, but slower wires — cross-chip delay growing from 1 clk to 3 (10, 16, 20?) clks.]

Slide 7: Elements of a multiprocessing system
- General purpose/special purpose
- Granularity - capability of a basic module
- Topology - interconnection/communication geometry
- Nature of coupling - loose to tight
- Control-data mechanisms
- Task allocation and routing methodology
- Reconfigurability
  » Computation
  » Interconnect
- Programmer's model/language support/models of computation
- Implementation - IC, board, multiboard, networked
- Performance measures and objectives
[After E. V. Krishnamurty, Chapter 5]

Slide 8: Use, Granularity
- General purpose - attempting to improve general-purpose computation (e.g. SPEC benchmarks) by means of multiprocessing
- Special purpose - attempting to improve a specific application or class of applications by means of multiprocessing
- Granularity - scope and capability of a processing element (PE):
  » NAND gate
  » ALU with registers
  » Execution unit with local memory
  » RISC R1000 processor

Slide 9: Topology
Topology - method of interconnection of processors:
- Bus
- Full-crossbar switch
- Mesh
- N-cube
- Torus
- Perfect shuffle, m-shuffle
- Cube-connected components
- Fat-trees

Slide 10: Coupling
Relationship of communication among processors:
- Shared clock (pipelined)
- Shared registers (VLIW)
- Shared memory (SMM)
- Shared network

Slide 11: Control/Data
Way in which data and control are organized:
- Control - how the instruction stream is managed (e.g. sequential instruction fetch)
- Data - how the data is accessed (e.g. numbered memory addresses)
- Multithreaded control flow - explicit constructs (fork, join, wait) control program flow; central controller
- Dataflow model - instructions execute as soon as operands are ready; the program structures the flow of data; decentralized control

Slide 12: Task allocation and routing
Way in which tasks are scheduled and managed:
- Static - allocation of tasks onto processing elements predetermined before runtime
- Dynamic - hardware/software support allocates tasks to processors at runtime

Slide 13: Reconfiguration
- Computational - restructuring of computational elements
  » reconfigurable - reconfiguration at compile time
  » dynamically reconfigurable - restructuring of computational elements at runtime
- Interconnection scheme
  » switching network - software controlled
  » reconfigurable fabric

Slide 14: Programmer's model
How is parallelism expressed by the user?
- Expressive power
  » Process-level parallelism
    – Shared-memory
    – Message-passing
  » Operator-level parallelism
  » Bit-level parallelism
- Formal guarantees
  » Deadlock-free
  » Livelock-free
- Support for other real-time notions
- Exception handling

Slide 15: Parallel Programming Models
- Message Passing
  » Fork thread
    – Typically one per node
  » Explicit communication
    – Send messages: send(tid, tag, message), receive(tid, tag, message)
  » Synchronization
    – Block on messages (implicit sync)
    – Barriers
- Shared Memory (address space)
  » Fork thread
    – Typically one per node
  » Implicit communication
    – Using shared address space: loads and stores
  » Synchronization
    – Atomic
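Slide 5's note that Amdahl's law applies can be made concrete. If a fraction f of a program's work is inherently serial, the speedup on N processors is bounded by:

```latex
S(N) = \frac{1}{f + \frac{1-f}{N}} \quad\xrightarrow{\;N \to \infty\;}\quad \frac{1}{f}
```

With f = 0.1, for example, even an unlimited number of processors delivers at most a 10x speedup, which is why linearly increasing peak performance rarely translates into linearly increasing delivered performance.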
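Of the topologies listed on slide 9, the n-cube has a particularly compact description: node ids are n-bit numbers, and two nodes are linked iff their ids differ in exactly one bit, so each node has n neighbors and any route takes at most n hops. A small sketch (the function names are ours, not from the slides):

```python
def ncube_neighbors(node, n):
    """Nodes directly linked to `node` in an n-cube: flip each of its n id bits."""
    return [node ^ (1 << bit) for bit in range(n)]

def route_hops(src, dst):
    """Minimum hop count between two n-cube nodes = Hamming distance of their ids."""
    return bin(src ^ dst).count("1")
```

For a 3-cube, node 0 connects to nodes 1, 2, and 4, and the longest route (e.g. 000 to 111) is 3 hops — logarithmic diameter for the cost of log2(P) links per node.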
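The static/dynamic distinction of slide 12 is easiest to see in code: a dynamic scheme is commonly realized as a shared work queue from which idle processors pull the next task at runtime, rather than fixing the assignment before execution. A minimal sketch using Python threads (the worker count and the squaring "task" are arbitrary stand-ins):

```python
# Dynamic task allocation: tasks sit in a shared queue and are claimed
# by whichever worker becomes idle first, instead of being assigned to
# fixed processors before runtime (the static scheme).
import queue
import threading

def run(tasks, n_workers=3):
    work = queue.Queue()
    results = queue.Queue()
    for t in tasks:
        work.put(t)

    def worker():
        while True:
            try:
                item = work.get_nowait()   # claim the next unassigned task
            except queue.Empty:
                return                      # no work left: this worker retires
            results.put(item * item)        # stand-in for real computation

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(results.get() for _ in range(results.qsize()))
```

Because assignment happens at runtime, load balances itself: a worker that draws short tasks simply claims more of them, at the cost of contention on the shared queue.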
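The message-passing column of slide 15 can be sketched in a few lines. This is only an illustration, not a real message-passing API: the send/receive/tid/tag names mirror the slide, and threads with private per-node mailboxes stand in for nodes with private memories.

```python
# Message-passing sketch: "nodes" share nothing by convention; all
# communication is explicit, via send/receive on per-node mailboxes.
import queue
import threading

mailboxes = {0: queue.Queue(), 1: queue.Queue()}  # one mailbox per node id (tid)

def send(tid, tag, message):
    mailboxes[tid].put((tag, message))            # explicit communication

def receive(tid):
    return mailboxes[tid].get()                   # blocks until a message arrives
                                                  # (implicit synchronization)
def node1():
    tag, message = receive(1)                     # block on message
    send(0, "result", sum(message))

def run():
    t = threading.Thread(target=node1)            # fork thread, one per node
    t.start()
    send(1, "work", [1, 2, 3])
    tag, result = receive(0)
    t.join()
    return tag, result
```

Note that the blocking receive doubles as the synchronization mechanism, exactly as the slide's "block on messages (implicit sync)" bullet says.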
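The shared-memory column can be sketched the same way: forked threads communicate implicitly through ordinary loads and stores to a shared variable, a lock makes the read-modify-write atomic, and a barrier gives a synchronization point. A minimal sketch using Python's threading module (the thread and iteration counts are arbitrary):

```python
# Shared-memory sketch: communication is implicit (plain loads/stores to
# a shared counter); synchronization is explicit (lock + barrier).
import threading

counter = 0
lock = threading.Lock()
barrier = threading.Barrier(5)      # 4 workers + the main thread

def worker():
    global counter
    for _ in range(1000):
        with lock:                  # atomic increment: lock guards the
            counter += 1            # load-add-store on the shared word
    barrier.wait()                  # synchronization point

def run():
    threads = [threading.Thread(target=worker) for _ in range(4)]
    for t in threads:
        t.start()                   # fork one thread per "node"
    barrier.wait()                  # wait until every worker is done
    for t in threads:
        t.join()
    return counter
```

Without the lock, the four threads' increments would interleave and updates would be lost — the classic data race that the cache-coherence and synchronization machinery of shared-memory multiprocessors exists to manage.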