DOC PREVIEW
Berkeley COMPSCI 252 - Hardware-Software Trade-offs in Synchronization

This preview shows page 1-2-3 out of 10 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 10 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 10 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 10 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 10 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

CS258 S99 1NOW Handout Page 1Hardware-Software Trade-offs in Synchronization CS 252, Spring 05David E. CullerComputer Science DivisionU.C. Berkeley3/29/2005 CS252 S05 2Role of Synchronization• “A parallel computer is a collection of processing elements that cooperateand communicate to solve large problems fast.”• Types of Synchronization– Mutual Exclusion– Event synchronization» point-to-point» group» global (barriers)• How much hardware support?– high-level operations?– atomic instructions?– specialized interconnect?3/29/2005 CS252 S05 3Layers of synch supportHW SupportAtomic RMW opsSynchronization LibraryOperating System SupportUser libraryApplication3/29/2005 CS252 S05 4Mini-Instruction Set debate• atomic read-modify-write instructions– IBM 370: included atomic compare&swap for multiprogramming– x86: any instruction can be prefixed with a lock modifier– High-level language advocates want hardware locks/barriers» but it’s goes against the “RISC” flow,and has other problems– SPARC: atomic register-memory ops (swap, compare&swap)– MIPS, IBM Power: no atomic operations but pair of instructions» load-locked, store-conditional» later used by PowerPC and DEC Alpha too• Rich set of tradeoffs3/29/2005 CS252 S05 5Other forms of hardware support• Separate lock lines on the bus• Lock locations in memory• Lock registers (Cray Xmp)• Hardware full/empty bits (Tera)• Bus support for interrupt dispatch3/29/2005 CS252 S05 6Components of a Synchronization Event• Acquire method– Acquire right to the synch » enter critical section, go past event• Waiting algorithm– Wait for synch to become available when it isn’t– busy-waiting, blocking, or hybrid• Release method– Enable other processors to acquire right to the synch• Waiting algorithm is independent of type of synchronization– makes no sense to put in hardwareCS258 S992NOW Handout Page 23/29/2005 CS252 S05 7Strawman Locklock: ld register, location /* copy location to register */cmp location, #0 /* compare with 0 */bnz lock /* if not 0, try again */st location, #1 /* store 1 to mark it locked */ret /* return control to caller */unlock: st location, #0 /* write 0 to location */ret /* return control to caller */Busy-WaitWhy doesn’t the acquire method work?Release method?3/29/2005 CS252 S05 8Atomic Instructions• Specifies a location, register, & atomic operation– Value in location read into a register– Another value (function of value read or not) stored into location• Many variants– Varying degrees of flexibility in second part• Simple example: test&set– Value in location read into a specified register– Constant 1 stored into location– Successful if value loaded into register is 0– Other constants could be used instead of 1 and 03/29/2005 CS252 S05 9Simple Test&Set Locklock: t&s register, locationbnz lock /* if not 0, try again */ret /* return control to caller */unlock: st location, #0 /* write 0 to location */ret /* return control to caller */• Other read-modify-write primitives – Swap, Exch– Fetch&op– Compare&swap»Three operands: location, register to compare with, register to swap with»Not commonly supported by RISC instruction sets• cacheable or uncacheable3/29/2005 CS252 S05 10Performance Criteria for Synch. Ops• Latency (time per op)– especially when light contention• Bandwidth (ops per sec)– especially under high contention• Traffic– load on critical resources– especially on failures under contention• Storage• FairnessUnder what conditions do you measure synchronization performance?•Contention? Scale? Duration?3/29/2005 CS252 S05 11zzzzzzzzzzzzzzzz Number of processorsTime (µs)11 13 1502468101214161820Test&set, c = 0zTest&set, exponential backoff, c = 3.64Test&set, exponential backoff, c = 0Ideal9753T&S Lock Microbenchmark: SGI Chal.lock; delay(c); unlock;• Why does performance degrade?• Bus Transactions on T&S?• Hardware support in CC protocol?3/29/2005 CS252 S05 12Enhancements to Simple Lock• Reduce frequency of issuing test&sets while waiting– Test&set lock with backoff– Don’t back off too much or will be backed off when lock becomes free– Exponential backoff works quite well empirically: ithtime = k*ci• Busy-wait with read operations rather than test&set– Test-and-test&set lock– Keep testing with ordinary load» cached lock variable will be invalidated when release occurs– When value changes (to 0), try to obtain lock with test&set» only one attemptor will succeed; others will fail and start testing againCS258 S993NOW Handout Page 33/29/2005 CS252 S05 13Improved Hardware Primitives: LL-SC• Goals: – Test with reads– Failed read-modify-write attempts don’t generate invalidations– Nice if single primitive can implement range of r-m-w operations• Load-Locked (or -linked), Store-Conditional– LL reads variable into register– Follow with arbitrary instructions to manipulate its value– SC tries to store back to location – succeed if and only if no other write to the variable since thisprocessor’s LL» indicated by condition codes;• If SC succeeds, all three steps happened atomically• If fails, doesn’t write or generate invalidations– must retry aquire3/29/2005 CS252 S05 14Simple Lock with LL-SClock: ll reg1, location /* LL location to reg1 */sc location, reg2 /* SC reg2 into location*/beqz reg2, lock /* if failed, start again */retunlock: st location, #0 /* write 0 to location */ret• Can do more fancy atomic ops by changing what’s between LL & SC– But keep it small so SC likely to succeed– Don’t include instructions that would need to be undone (e.g. stores)• SC can fail (without putting transaction on bus) if:– Detects intervening write even before trying to get bus– Tries to get bus but another processor’s SC gets bus first• LL, SC are not lock, unlock respectively– Only guarantee no conflicting write to lock variable between them– But can use directly to implement simple operations on shared variables3/29/2005 CS252 S05 15Trade-offs So Far• Latency?• Bandwidth?• Traffic?• Storage?• Fairness?• What happens when several processors spinning on lock and it is released?– traffic per P lock operations?3/29/2005 CS252 S05 16Ticket Lock• Only one r-m-w per acquire•


View Full Document

Berkeley COMPSCI 252 - Hardware-Software Trade-offs in Synchronization

Documents in this Course
Quiz

Quiz

9 pages

Caches I

Caches I

46 pages

Lecture 6

Lecture 6

36 pages

Lecture 9

Lecture 9

52 pages

Figures

Figures

26 pages

Midterm

Midterm

15 pages

Midterm

Midterm

14 pages

Midterm I

Midterm I

15 pages

ECHO

ECHO

25 pages

Quiz  1

Quiz 1

12 pages

Load more
Download Hardware-Software Trade-offs in Synchronization
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Hardware-Software Trade-offs in Synchronization and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Hardware-Software Trade-offs in Synchronization 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?