DOC PREVIEW
Berkeley COMPSCI 252 - Hardware-Software Trade-offs in Synchronization

This preview shows page 1-2-3-4-26-27-28-53-54-55-56 out of 56 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 56 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 56 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 56 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 56 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 56 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 56 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 56 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 56 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 56 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 56 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 56 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 56 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Hardware-Software Trade-offs in SynchronizationRole of SynchronizationLayers of synch supportMini-Instruction Set debateOther forms of hardware supportComponents of a Synchronization EventStrawman LockAtomic InstructionsSimple Test&Set LockPerformance Criteria for Synch. OpsT&S Lock Microbenchmark: SGI Chal.Enhancements to Simple LockImproved Hardware Primitives: LL-SCSimple Lock with LL-SCTrade-offs So FarTicket LockArray-based Queuing LocksLock Performance on SGI ChallengeFairnessPoint to Point Event SynchronizationBarriersA Simple Centralized BarrierA Working Centralized BarrierCentralized Barrier PerformanceImproved Barrier Algorithms for a BusBarrier Performance on SGI ChallengeSynchronization SummaryImplications for SoftwareMemory Consistency ModelMultiprogrammed Uniprocessor Mem. ModelReasoning with Sequential ConsistencyThen again, . . .Requirements for SC (Dubois & Scheurich)Architecture ImplicationsSummary of Sequential ConsistencyDo we really need SC?Does SC eliminate synchronization?Is SC hardware enough?What orderings are essential?How do we exploit this?Hardware Centric ModelsProperly Synchronized ProgramsComplete Relaxed Consistency ModelRelaxing write-to-read (PC, TSO)Detecting weakness wrt SCRelaxing write-to-read and write-to-write (PSO)Relaxing all ordersPreserved OrderingsExamplesProgrammer’s InterfaceIdentifying Synch eventsExampleHow should programs be labeled?Summary of Programmer ModelInterplay of Micro and multi processor designQuestionsHardware-Software Trade-offs in Synchronization CS 252, Spring 05David E. CullerComputer Science DivisionU.C. Berkeley01/13/19CS252 S052Role of Synchronization•“A parallel computer is a collection of processing elements that cooperate and communicate to solve large problems fast.”•Types of Synchronization–Mutual Exclusion–Event synchronization»point-to-point»group»global (barriers)•How much hardware support?–high-level operations?–atomic instructions?–specialized interconnect?01/13/19CS252 S053Layers of synch supportHW SupportAtomic RMW opsSynchronization LibraryOperating System SupportUser libraryApplication01/13/19CS252 S054Mini-Instruction Set debate•atomic read-modify-write instructions–IBM 370: included atomic compare&swap for multiprogramming–x86: any instruction can be prefixed with a lock modifier–High-level language advocates want hardware locks/barriers»but it’s goes against the “RISC” flow,and has other problems–SPARC: atomic register-memory ops (swap, compare&swap)–MIPS, IBM Power: no atomic operations but pair of instructions»load-locked, store-conditional»later used by PowerPC and DEC Alpha too•Rich set of tradeoffs01/13/19CS252 S055Other forms of hardware support•Separate lock lines on the bus•Lock locations in memory•Lock registers (Cray Xmp)•Hardware full/empty bits (Tera)•Bus support for interrupt dispatch01/13/19CS252 S056Components of a Synchronization Event•Acquire method–Acquire right to the synch »enter critical section, go past event•Waiting algorithm–Wait for synch to become available when it isn’t–busy-waiting, blocking, or hybrid•Release method–Enable other processors to acquire right to the synch•Waiting algorithm is independent of type of synchronization–makes no sense to put in hardware01/13/19CS252 S057Strawman Locklock: ld register, location /* copy location to register */cmp location, #0 /* compare with 0 */bnz lock /* if not 0, try again */st location, #1 /* store 1 to mark it locked */ret /* return control to caller */unlock: st location, #0 /* write 0 to location */ret /* return control to caller */Busy-WaitWhy doesn’t the acquire method work?Release method?01/13/19CS252 S058Atomic Instructions•Specifies a location, register, & atomic operation–Value in location read into a register–Another value (function of value read or not) stored into location•Many variants–Varying degrees of flexibility in second part•Simple example: test&set–Value in location read into a specified register–Constant 1 stored into location–Successful if value loaded into register is 0–Other constants could be used instead of 1 and 001/13/19CS252 S059Simple Test&Set Locklock: t&s register, location bnz lock /* if not 0, try again */ret /* return control to caller */unlock: st location, #0 /* write 0 to location */ret /* return control to caller */• Other read-modify-write primitives –Swap, Exch–Fetch&op–Compare&swap»Three operands: location, register to compare with, register to swap with»Not commonly supported by RISC instruction sets• cacheable or uncacheable01/13/19CS252 S0510Performance Criteria for Synch. Ops•Latency (time per op)–especially when light contention•Bandwidth (ops per sec)–especially under high contention•Traffic–load on critical resources–especially on failures under contention•Storage•FairnessUnder what conditions do you measure synchronization performance?•Contention? Scale? Duration?01/13/19CS252 S0511 Number of processorsTime (s)11 13 1502468101214161820Test&set, c = 0Test&set, exponential backoff, c = 3.64Test&set, exponential backoff, c = 0Ideal9753T&S Lock Microbenchmark: SGI Chal.lock; delay(c); unlock;•Why does performance degrade?•Bus Transactions on T&S?•Hardware support in CC protocol?01/13/19CS252 S0512Enhancements to Simple Lock•Reduce frequency of issuing test&sets while waiting–Test&set lock with backoff–Don’t back off too much or will be backed off when lock becomes free–Exponential backoff works quite well empirically: ith time = k*ci•Busy-wait with read operations rather than test&set–Test-and-test&set lock–Keep testing with ordinary load»cached lock variable will be invalidated when release occurs–When value changes (to 0), try to obtain lock with test&set»only one attemptor will succeed; others will fail and start testing again01/13/19CS252 S0513Improved Hardware Primitives: LL-SC•Goals: –Test with reads–Failed read-modify-write attempts don’t generate invalidations–Nice if single primitive can implement range of r-m-w operations•Load-Locked (or -linked), Store-Conditional–LL reads variable into register–Follow with arbitrary instructions to manipulate its value–SC tries to store back to


View Full Document

Berkeley COMPSCI 252 - Hardware-Software Trade-offs in Synchronization

Documents in this Course
Quiz

Quiz

9 pages

Caches I

Caches I

46 pages

Lecture 6

Lecture 6

36 pages

Lecture 9

Lecture 9

52 pages

Figures

Figures

26 pages

Midterm

Midterm

15 pages

Midterm

Midterm

14 pages

Midterm I

Midterm I

15 pages

ECHO

ECHO

25 pages

Quiz  1

Quiz 1

12 pages

Load more
Download Hardware-Software Trade-offs in Synchronization
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Hardware-Software Trade-offs in Synchronization and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Hardware-Software Trade-offs in Synchronization 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?