New version page

CORNELL CS 3410 - Atomic Instructions

Documents in this Course
Marra

Marra

43 pages

Caches

Caches

34 pages

ALUs

ALUs

5 pages

Caches!

Caches!

54 pages

Memory

Memory

41 pages

Caches

Caches

32 pages

Caches

Caches

54 pages

Caches

Caches

34 pages

Caches

Caches

54 pages

Load more

This preview shows page 1-2-16-17-18-33-34 out of 34 pages.

View Full Document
View Full Document

End of preview. Want to read all 34 pages?

Upload your study docs or become a GradeBuddy member to access this document.

View Full Document
Unformatted text preview:

Slide 1AnnouncementsAnnouncementsGoals for TodayMutexesSynchronizationSlide 7Atomic Test and SetUsing test-and-set for mutual exclusionSpin waitingAlternative Atomic InstructionsAlternative Atomic Instructionsmutex from LL and SCSlide 14Broken invariantsProtecting an invariantGuidelines for successful mutexingSlide 18Remember: Cache CoherenceRelaxed consistency implicationsAcquire/releaseSlide 22Beyond mutexesBeyond mutexesBeyond mutexesBeyond mutexesSlide 27Condition variablesUsing a condition variableUsing a condition variableMonitorsJava concurrencyMore synchronization mechanismsSummaryAtomic InstructionsHakim WeatherspoonCS 3410, Spring 2011Computer ScienceCornell UniversityP&H Chapter 2.112AnnouncementsPA4 due next, Friday, May 13th•Work in pairs•Will not be able to use slip days•Need to schedule time for presentation May 16, 17, or 18•Signup today after class (in front)3AnnouncementsPrelim2 results•Mean 56.4 ± 16.3 (median 57.8), Max 95.5•Pickup in Homework pass back room (Upson 360)! "#"$"%"&"' ! "' #"' $"' %"' &"' ( " #! " #( " ) ! " ) ( " $! " $( " ( ! " ( ( " %! " %( " *! " *( " &! " &( " +! " +( " ' ! ! "!"#$%&' () *+"#,(4Goals for TodayFinish Synchronization•Threads and processes•Critical sections, race conditions, and mutexes•Atomic Instructions•HW support for synchronization•Using sync primitives to build concurrency-safe data structures•Cache coherency causes problems•Locks + barriers•Language level synchronization5MutexesQ: How to implement critical section in code?A: Lots of approaches….Mutual Exclusion Lock (mutex)lock(m): wait till it becomes free, then lock itunlock(m): unlock itsafe_increment() {pthread_mutex_lock(m);hits = hits + 1;pthread_mutex_unlock(m)}6SynchronizationSynchronization techniquesclever code •must work despite adversarial scheduler/interrupts•used by: hackers•also: noobsdisable interrupts•used by: exception handler, scheduler, device drivers, …disable preemption•dangerous for user code, but okay for some kernel codemutual exclusion locks (mutex)•general purpose, except for some interrupt-related cases7Hardware Support for Synchronization8Atomic Test and SetMutex implementation•Suppose hardware has atomic test-and-setHardware atomic equivalent of…int test_and_set(int *m) {old = *m;*m = 1;return old;}9Using test-and-set for mutual exclusionUse test-and-set to implement mutex / spinlock / crit. sec.int m = 0;...while (test_and_set(&m)) { /* skip */ };m = 0;10Spin waitingAlso called: spinlock, busy waiting, spin waiting, …•Efficient if wait is short•Wasteful if wait is longPossible heuristic:•spin for time proportional to expected wait time•If time runs out, context-switch to some other thread11Alternative Atomic InstructionsOther atomic hardware primitives - test and set (x86) - atomic increment (x86) - bus lock prefix (x86)12Alternative Atomic InstructionsOther atomic hardware primitives - test and set (x86) - atomic increment (x86) - bus lock prefix (x86) - compare and exchange (x86, ARM deprecated) - linked load / store conditional (MIPS, ARM, PowerPC, DEC Alpha, …)13mutex from LL and SCLinked load / Store Conditionalmutex_lock(int *m) {again:LL t0, 0(a0)BNE t0, zero, againADDI t0, t0, 1SC t0, 0(a0)BEQ t0, zero, again}14Using synchronization primitives to buildconcurrency-safe datastructures15Broken invariantsAccess to shared data must be synchronized•goal: enforce datastructure invariants// invariant: // data is in A[h … t-1]char A[100];int h = 0, t = 0;// writer: add to list tailvoid put(char c) {A[t] = c;t++;}// reader: take from list headchar get() {while (h == t) { };char c = A[h];h++;return c;}16Protecting an invariantRule of thumb: all updates that can affectinvariant become critical sections// invariant: (protected by m)// data is in A[h … t-1]pthread_mutex_t *m = pthread_mutex_create();char A[100];int h = 0, t = 0;// writer: add to list tailvoid put(char c) {pthread_mutex_lock(m);A[t] = c;t++;pthread_mutex_unlock(m);}// reader: take from list headchar get() {pthread_mutex_lock(m);char c = A[h];h++;pthread_mutex_unlock(m);return c;}17Guidelines for successful mutexingInsufficient locking can cause races•Skimping on mutexes? Just say no!Poorly designed locking can cause deadlock•know why you are using mutexes!•acquire locks in a consistent order to avoid cycles•use lock/unlock like braces (match them lexically)–lock(&m); …; unlock(&m)–watch out for return, goto, and function calls!–watch out for exception/error conditions!P1: lock(m1);lock(m2);P2: lock(m2);lock(m1);18Cache Coherencycauses yet more trouble19Remember: Cache CoherenceRecall: Cache coherence defined...Informal: Reads return most recently written valueFormal: For concurrent processes P1 and P2•P writes X before P reads X (with no intervening writes) read returns written value•P1 writes X before P2 reads X  read returns written value•P1 writes X and P2 writes X all processors see writes in the same order–all see the same final value for X20Relaxed consistency implicationsIdeal case: sequential consistency•Globally: writes appear in interleaved order•Locally: other core’s writes show up in program orderIn practice: not so much…•write-back caches  sequential consistency is tricky•writes appear in semi-random order•locks alone don’t help* MIPS has sequential consistency; Intel does not21Acquire/releaseMemory Barriers and Release Consistency •Less strict than sequential consistency; easier to buildOne protocol:•Acquire: lock, and force subsequent accesses after•Release: unlock, and force previous accesses beforeP1: ...acquire(m);A[t] = c;t++;release(m);P2: ...acquire(m);A[t] = c;t++;unlock(m);Moral: can’t rely on sequential consistency(so use synchronization libraries)22Are Locks + Barriers enough?23Beyond mutexesWriters must check for full buffer& Readers must check if for empty buffer•ideal: don’t busy wait… go to sleep insteadchar get() {acquire(L);char c = A[h];h++;release(L);return c;}24Beyond mutexesWriters must check for full buffer& Readers must check if for empty buffer•ideal: don’t busy wait… go to sleep insteadchar get() {acquire(L);char c = A[h];h++;release(L);return c;}char get() {acquire(L);while (h == f) { };char c = A[h];h++;release(L);return c;}char get() {while (h == t) { };acquire(L);char c = A[h];h++;release(L);return c;}25Beyond mutexesWriters must check for full buffer& Readers must check if for empty buffer•ideal:


View Full Document
Loading Unlocking...
Login

Join to view Atomic Instructions and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Atomic Instructions and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?