Berkeley COMPSCI 252 - Lecture 11: Multiprocessor 1 - D1977540

School name University of California, Berkeley

Course Compsci 252- Graduate Computer Architecture

Pages 55

CS252 Graduate Computer Architecture
Lecture 11: Multiprocessor 1: Reasons, Classifications, Performance Metrics, Applications

Slide titles:
Review: Networking
Parallel Computers
Parallel Processors “Religion”
What level Parallelism?
Why Multiprocessors?
Parallel Processing Intro
Whither Supercomputing?
Popular Flynn Categories (e.g., ~RAID level for MPPs)
Major MIMD Styles
Decentralized Memory versions
Performance Metrics: Latency and Bandwidth
CS 252 Administrivia
Parallel Architecture
Parallel Framework
Shared Address Model Summary
Shared Address/Memory Multiprocessor Model
SMP Interconnect
Message Passing Model
Data Parallel Model
Slide 21
Advantages shared-memory communication model
Advantages message-passing communication model
Communication Models
3 Parallel Applications
Parallel App: Commercial Workload
Parallel App: Multiprogramming and OS
Parallel App: Scientific/Technical
Slide 29
Parallel Scientific App: Scaling
Amdahl’s Law and Parallel Computers
Small-Scale - Shared Memory
What Does Coherency Mean?
Potential HW Coherency Solutions
Basic Snoopy Protocols
Slide 36
Snooping Cache Variations
An Example Snoopy Protocol
Snoopy-Cache State Machine-I
Snoopy-Cache State Machine-II
Snoopy-Cache State Machine-III
Example
Slide 43
Slide 44
Slide 45
Slide 46
Slide 47
Implementation Complications
Implementing Snooping Caches
Slide 50
Fundamental Issues
Fundamental Issue #1: Naming
Slide 53
Fundamental Issue #2: Synchronization
Summary: Parallel Framework

CS252/Patterson Lec 11.1 2/23/01

CS252 Graduate Computer Architecture
Lecture 11: Multiprocessor 1: Reasons, Classifications, Performance Metrics, Applications
February 23, 2001
Prof. David A. Patterson
Computer Science 252
Spring 2001

Review: Networking
• Clusters +: fault isolation and repair, scaling, cost
• Clusters -: maintenance, network interface performance, memory efficiency
• Google as cluster example:
  – Scaling (6000 PCs, 1 petabyte storage)
  – Fault isolation (2 failures per day yet available)
  – Repair (replace failures weekly/repair offline)
  – Maintenance: 8 people for 6000 PCs
• Cell phone as portable network device
  – # Handsets >> # PCs
  – Universal mobile interface?
• Is the future services built on Google-like clusters and delivered to gadgets like cell phone handsets?

Parallel Computers
• Definition: “A parallel computer is a collection of processing elements that cooperate and communicate to solve large problems fast.” Almasi and Gottlieb, Highly Parallel Computing, 1989
• Questions about parallel computers:
  – How large a collection?
  – How powerful are processing elements?
  – How do they cooperate and communicate?
  – How are data transmitted?
  – What type of interconnection?
  – What are HW and SW primitives for programmer?
  – Does it translate into performance?

Parallel Processors “Religion”
• The dream of computer architects since 1950s: replicate processors to add performance vs. design a faster processor
• Led to innovative organization tied to particular programming models since “uniprocessors can’t keep going”
  – e.g., uniprocessors must stop getting faster due to limit of speed of light: 1972, … , 1989
  – Borders on religious fervor: you must believe!
  – Fervor damped somewhat when 1990s companies went out of business: Thinking Machines, Kendall Square, ...
• Argument instead is the “pull” of opportunity of scalable performance, not the “push” of uniprocessor performance plateau?

What level Parallelism?
• Bit level parallelism: 1970 to ~1985
  – 4 bit, 8 bit, 16 bit, 32 bit microprocessors
• Instruction level parallelism (ILP): ~1985 through today
  – Pipelining
  – Superscalar
  – VLIW
  – Out-of-Order execution
  – Limits to benefits of ILP?
• Process level or thread level parallelism; mainstream for general purpose computing?
  – Servers are parallel
  – High-end desktop dual-processor PC soon?? (or just sell the socket?)

Why Multiprocessors?
1. Microprocessors as the fastest CPUs
  • Collecting several much easier than redesigning 1
2. Complexity of current microprocessors
  • Do we have enough ideas to sustain 1.5X/yr?
  • Can we deliver such complexity on schedule?
3. Slow (but steady) improvement in parallel software (scientific apps, databases, OS)
4. Emergence of embedded and server markets driving microprocessors in addition to desktops
  • Embedded functional parallelism, producer/consumer model
  • Server figure of merit is tasks per hour vs. latency

Parallel Processing Intro
• Long term goal of the field: scale number of processors to size of budget, desired performance
• Machines today: Sun Enterprise 10000 (8/00)
  – 64 400 MHz UltraSPARC® II CPUs, 64 GB SDRAM memory, 868 18 GB disks, tape
  – $4,720,800 total
  – 64 CPUs 15%, 64 GB DRAM 11%, disks 55%, cabinet 16% ($10,800 per processor or ~0.2% per processor)
  – Minimal E10K - 1 CPU, 1 GB DRAM, 0 disks, tape ~$286,700
  – $10,800 (4%) per CPU, plus $39,600 board/4 CPUs (~8%/CPU)
• Machines today: Dell Workstation 220 (2/01)
  – 866 MHz Intel Pentium® III (in Minitower)
  – 0.125 GB RDRAM memory, 1 10 GB disk, 12X CD, 17” monitor, nVIDIA GeForce 2 GTS, 32 MB DDR graphics card, 1 yr service
  – $1,600; for extra processor, add $350 (~20%)

Whither Supercomputing?
• Linpack (dense linear algebra) for Vector Supercomputers vs. Microprocessors
• “Attack of the Killer Micros”
  – (see Chapter 1, Figure 1-10, page 22 of [CSG99])
  – 100 x 100 vs. 1000 x 1000
• MPPs vs. Supercomputers when rewrite Linpack to get peak performance
  – (see Chapter 1, Figure 1-11, page 24 of [CSG99])
• 1997, 500 fastest machines in the world: 319 MPPs, 73 bus-based shared memory (SMP), 106 parallel vector processors (PVP)
  – (see Chapter 1, Figure 1-12, page 24 of [CSG99])
• 2000, 381 of 500 fastest: 144 IBM SP (~cluster), 121 Sun (bus SMP), 62 SGI (NUMA SMP), 54 Cray (NUMA SMP)

[CSG99] = Parallel Computer Architecture: A Hardware/Software Approach, David E. Culler, Jaswinder Pal Singh, with Anoop Gupta. San Francisco: Morgan Kaufmann, c1999.

Popular Flynn Categories (e.g., ~RAID level for MPPs)
• SISD (Single Instruction Single Data)
  – Uniprocessors
• MISD (Multiple Instruction Single Data)
  – ???; multiple processors on a single data stream
• SIMD (Single Instruction Multiple Data)
  – Examples: Illiac-IV, CM-2
    » Simple programming model
    » Low overhead
    » Flexibility
    » All custom integrated circuits
  – (Phrase reused by Intel marketing for media instructions ~
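
The per-processor cost percentages quoted on the Parallel Processing Intro slide can be rechecked with a few lines of arithmetic. The sketch below is only an illustration: the prices are the ones quoted on that slide, while the constant names and the helper `share_percent` are made up for this example and are not part of the lecture.

```python
# Prices quoted on the "Parallel Processing Intro" slide (US dollars).
# All names below are illustrative; only the numbers come from the slide.
E10K_FULL_PRICE = 4_720_800     # 64-CPU Sun Enterprise 10000 (8/00)
E10K_MINIMAL_PRICE = 286_700    # minimal E10K: 1 CPU, 1 GB DRAM, 0 disks
E10K_CPU_PRICE = 10_800         # one UltraSPARC II CPU
E10K_CPU_COUNT = 64
DELL_BASE_PRICE = 1_600         # Dell Workstation 220 (2/01), one Pentium III
DELL_EXTRA_CPU_PRICE = 350      # second processor


def share_percent(part: float, whole: float) -> float:
    """Return `part` as a percentage of `whole`."""
    return 100.0 * part / whole


# One CPU relative to the fully configured E10K (slide: ~0.2%).
per_cpu_of_full = share_percent(E10K_CPU_PRICE, E10K_FULL_PRICE)

# All 64 CPUs relative to the fully configured E10K (slide: 15%).
all_cpus_of_full = share_percent(E10K_CPU_PRICE * E10K_CPU_COUNT, E10K_FULL_PRICE)

# One CPU relative to the minimal E10K (slide: 4%).
per_cpu_of_minimal = share_percent(E10K_CPU_PRICE, E10K_MINIMAL_PRICE)

# Second processor relative to the base Dell workstation (slide: ~20%).
dell_extra_cpu = share_percent(DELL_EXTRA_CPU_PRICE, DELL_BASE_PRICE)

print(f"E10K, one CPU vs. full system:    {per_cpu_of_full:.2f}%")
print(f"E10K, 64 CPUs vs. full system:    {all_cpus_of_full:.1f}%")
print(f"E10K, one CPU vs. minimal system: {per_cpu_of_minimal:.1f}%")
print(f"Dell, extra CPU vs. base system:  {dell_extra_cpu:.1f}%")
```

The contrast the slide draws shows up directly in these ratios: a processor is a small fraction of the price of a large server, but a noticeable fraction of the price of a desktop workstation.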