DOC PREVIEW
Stanford CS 140 - Thread Package API

This preview shows page 1-2-3-20-21-22-41-42-43 out of 43 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 43 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 43 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 43 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 43 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 43 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 43 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 43 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 43 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 43 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 43 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Review: Thread package APIhypertarget {prog-a}{Program A}hypertarget {prog-b}{Program B}hypertarget {prog-c}{Program C}Correct answersSequential ConsistencySC thwarts hardware optimizationsSC thwarts compiler optimizationsx86 consistencyx86 WB consistencyx86 WB consistencyx86 atomicityAssuming sequential consistencyData racesData races (continued)Data races (continued)Desired solutionPeterson's solutionDoes Peterson's solution work?MutexesSame concept, many namesImproved producerImproved consumerCondition variablesImproved producerImproved consumerCondition variables (continued)Condition variables (continued)Other thread package featuresImplementing synchronizationApproach #1: Disable interruptsApproach #2: SpinlocksSynchronization on x86Synchronization on alphaKernel SynchronizationKernel locksKernel locksMonitors [BH]href {http://www.scs.stanford.edu/10wi-cs140/sched/readings/monitors.pdf}{[Hoar]}Monitor implementationSemaphores href {http://www.cs.utexas.edu/users/EWD/transcriptions/EWD01xx/EWD123.html}{[Dijkstra]}Semaphore producer/consumerReview: Thread package API• tid threadcreate (void (*fn) (void *), void *arg);- Create a new thread that calls fn with arg• void threadexit ();• void thread join (tid thread);• The execution of multiple threads is interleaved• Can havenon-preemptive threads:- One thread executes exclusively until it makes a blocking call.• Orpreemptive threads:- May switch to another thread between any two instructions.• Using multiple CPUs is inherently preemptive- Even if you don’t take CPU0away from thread T, anotherthread on CPU1can execute between any two instructions of T.1/39Program Aint flag1 = 0, flag2 = 0;void p1 (void *ignored) {flag1 = 1;if (!flag2) { critical_section_1 (); }}void p2 (void *ignored) {flag2 = 1;if (!flag1) { critical_section_2 (); }}int main () {tid id = thread_create (p1, NULL);p2 (); thread_join (id);}• Can both critical sections run?2/39Program Bint data = 0, ready = 0;void p1 (void *ignored) {data = 2000;ready = 1;}void p2 (void *ignored) {while (!ready);use (data);}int main () { ... }• Can use be called with value 0?3/39Program Cint a = 0, b = 0;void p1 (void *ignored) { a = 1; }void p2 (void *ignored) {if (a == 1)b = 1;}void p3 (void *ignored) {if (b == 1)use (a);}int main () { ... }• Can use be called with value 0?4/39Correct answers• Program A: I don’t know• Program B: I don’t know• Program C: I don’t know• Why?-It depends on your hardware- If it provides sequential consistency, then answers all No- But not all hardware provides sequential consistency• Note: Examples and other slide content from[Adve & Gharachorloo]5/39Sequential Consistency• Sequential consistency: The result of execution is as if alloperations were executed in so me sequential order,and the operations of each processor occurred in theorder specified by the prog ram .[Lamport]• Boils down to two requirem ents :1. Maintaining program order on individual proces sors2. Ensuring write atomicity• Without SC, multiple CPUs can be “worse” thanpreemptive threads- May see results that cannot occur with any interleaving on 1 CPU• Why doesn’t all hardware support sequentialconsistency?6/39SC thwarts hardware optimizations• Complicates write buffers- E.g., read flagn before flag(2 − n) w ritten through in Program A• Can’t re-order overlapping write operations- Concurrent writes to different memory modules- Coalescing writes to same cache line• Complicates non-blocking reads- E.g., speculatively prefetch data in Program B• Makes cache coherence more expensive- Must delay write completion until invalidation/update(Program B)- Can’t allow overlapping updates if no globally visible order(Program C)7/39SC thwarts compiler optimizations• Code motion• Caching value in register- E.g., ready flag inProgram B• Common subexpression elimination- Could cause memory location to be read fewer times• Loop blocking- Re-arrange loops for better cache performance• Software pipelining- Move instructions across iterations of a loop to overlapinstruction latency w ith branch cost8/39x86 consiste ncy• x86 supports multiple consistency/caching models- Memory Type Range Registers (MTRR) specify consistency forranges of physical memory (e.g., frame buffer)- Page Attribute Table (PAT) allows control for each 4K page• Choices include:- WB: Write-back caching (the default)- WT: Write-through caching (all writes go to memory)- UC: Uncacheable (for device memory)- WC: Write-combining – weak consistency & no caching• Some instructions have weaker consistency- String instructions- Special “non-temporal” instructions that bypass cache9/39x86 WB consiste nc y• Old x86s (e.g, 486, Pentium 1) had almost SC- Exception: A read could finish before an earlier write to adifferent location- Which of ProgramsA, B, C might be affected?Just A• Newer x86s let a processor read its own writes early10/39x86 WB consiste nc y• Old x86s (e.g, 486, Pentium 1) had almost SC- Exception: A read could finish before an earlier write to adifferent location- Which of ProgramsA, B, C might be affected?Just A• Newer x86s let a processor read its own writes early- E.g., both of these functions can return 2:int flag1 = 0, flag2 = 0;int p1 (void *ignored) int p2 (void *ignored){ {register int f, g; register int f, g;flag1 = 1; flag2 = 1;f = flag1; f = flag2;g = flag2; g = flag1;return 2*f + g; return 2*f + g;} }- Older CPUs would wait at “f = ...” until store complete10/39x86 atomicity• lock prefix makes a memory instruction atomic- Usually locks bus for duration of instruction (expensive!)- Can avoid locking if memory already exclusively cached- All lock instructions totally ordered- Other memory instructions cannot be re-ordered w. locked ones• xchg instruction is always locked (even w/o prefix)• Special fence instructions can prevent re-ordering- LFENCE – can’t be reordered w. reads (or later writes)- SFENCE – can’t be reordered w. writes- MFENCE – can’t be reordered w. reads or writes11/39Assuming sequential consistency• Let’s for now say we have sequential consistency• Example concurrent code: Producer/Consumer- buffer stores BUFFERSIZE items- count is number of used slots- out is next empty buffer slot to fill (if any)- in is oldest filled slot to consume (if any)12/39void producer (void *ignored) {for (;;) {/* produce an item and put in nextProduced */while (count == BUFFER_SIZE); // do nothingbuffer


View Full Document

Stanford CS 140 - Thread Package API

Documents in this Course
Homework

Homework

25 pages

Notes

Notes

8 pages

Load more
Download Thread Package API
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Thread Package API and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Thread Package API 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?