CSE332: Data Abstractions
Lecture 19: Introduction to Multithreading and Fork-Join Parallelism
Tyler Robison
Summer 2010

Changing a major assumption

So far in 142, 143, 311, and 332, we have assumed that one thing happens at a time. This is called sequential programming: everything is part of one sequence.

Removing this assumption creates major challenges and opportunities:
- Programming: divide work among threads of execution and coordinate (synchronize) among them
- Algorithms: how can parallel activity provide speed-up (more throughput: work done per unit time)?
- Data structures: may need to support concurrent access (multiple threads operating on data at the same time)

Writing correct and efficient multithreaded code is often much more difficult than writing single-threaded (i.e., sequential) code.

A simplified view of history

From roughly 1980 to 2005, desktop computers got exponentially faster at running sequential programs: about twice as fast every couple of years. But nobody knows how to continue this trend:
- Increasing the clock rate generates too much heat
- The relative cost of memory access is too high

However, we can keep making "wires exponentially smaller" (Moore's "Law"), so we put multiple processors on the same chip ("multicore").

What to do with multiple processors?

- The next computer you buy will likely have 4 processors; wait a few years and it will be 8, 16, 32, …
- The chip companies have decided to do this (it is not a "law")
- What can you do with them?
  - Run multiple totally different programs at the same time
    - We already do that, but with time-slicing
  - Do multiple things at once in one program
    - This is our focus, and it is more difficult
    - It requires rethinking everything, from asymptotic complexity to how to implement data-structure operations
Parallelism vs. Concurrency

Note: these terms are not yet standard, but the difference in perspective is essential. Many programmers confuse them; remember that parallelism != concurrency.

- Parallelism: use more resources to get a faster answer
- Concurrency: correctly and efficiently allow simultaneous access to something (memory, a printer, etc.)

There is some connection: many programmers use threads for both. If parallel computations need access to shared resources, then something needs to manage the concurrency.

CSE332: the next few lectures are on parallelism, then a few on concurrency.

Parallelism Example

Parallelism: increasing throughput by using additional computational resources (code running simultaneously on different processors).

Example: we have a huge array of numbers to add up; split the work among 4 people.

The pseudocode below (not Java, yet) sums the elements of an array. There is no such FORALL construct, but we'll see something similar. If you had 4 processors, you might get roughly a 4x speedup.

    int sum(int[] arr) {
      res = new int[4];
      len = arr.length;
      FORALL(i=0; i < 4; i++) {  // parallel iterations
        res[i] = help(arr, i*len/4, (i+1)*len/4);
      }
      return res[0] + res[1] + res[2] + res[3];
    }

    int help(int[] arr, int lo, int hi) {
      result = 0;
      for (j=lo; j < hi; j++)
        result += arr[j];
      return result;
    }

Concurrency Example

Concurrency: allowing simultaneous or interleaved access to shared resources from multiple clients.

Example: multiple threads accessing a hash table, but not getting in each other's way.

The pseudocode below (not Java, yet) sketches a chaining hash table:
- The essential correctness issue is preventing bad interleavings
- The essential performance issue is not preventing good concurrency
- One "solution" that prevents bad interleavings is to do everything sequentially

    class Hashtable<K,V> {
      …
      Hashtable(Comparator<K> c, Hasher<K> h) { … };
      void insert(K key, V value) {
        int bucket = …;
        prevent-other-inserts/lookups in table[bucket];
        do the insertion
        re-enable access to table[bucket];
      }
      V lookup(K key) {
        (like insert, but can allow concurrent lookups to the same bucket)
      }
    }
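The bucket-locking idea in the pseudocode above can be sketched in real Java. This is a minimal sketch, not the lecture's code: the class name BucketMap and all helper names are illustrative, and it uses `synchronized` on each bucket, which serializes lookups to the same bucket as well (a read-write lock per bucket would be needed to allow the concurrent same-bucket lookups the slide mentions).

```java
import java.util.LinkedList;

// Minimal sketch (assumed names, not from the lecture) of a chaining hash
// table with one lock per bucket: operations on different buckets can run
// in parallel, while operations on the same bucket are serialized.
class BucketMap<K, V> {
    private static class Entry<K, V> {
        final K key; V value;
        Entry(K key, V value) { this.key = key; this.value = value; }
    }

    private final LinkedList<Entry<K, V>>[] table;

    @SuppressWarnings("unchecked")
    BucketMap(int numBuckets) {
        table = (LinkedList<Entry<K, V>>[]) new LinkedList[numBuckets];
        for (int i = 0; i < numBuckets; i++)
            table[i] = new LinkedList<>();
    }

    private int bucketOf(K key) {
        return Math.floorMod(key.hashCode(), table.length);
    }

    void insert(K key, V value) {
        LinkedList<Entry<K, V>> bucket = table[bucketOf(key)];
        synchronized (bucket) {  // prevent other inserts/lookups in this bucket
            for (Entry<K, V> e : bucket) {
                if (e.key.equals(key)) { e.value = value; return; }
            }
            bucket.add(new Entry<>(key, value));
        }                        // leaving the block re-enables access
    }

    V lookup(K key) {
        LinkedList<Entry<K, V>> bucket = table[bucketOf(key)];
        synchronized (bucket) {
            for (Entry<K, V> e : bucket) {
                if (e.key.equals(key)) return e.value;
            }
            return null;
        }
    }
}
```

Locking one bucket at a time (rather than the whole table) is what lets independent operations proceed concurrently; doing it all under a single table-wide lock would be the sequential "solution" the slide mentions.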
An analogy

CSE142 idea: writing a program is like writing a recipe for a cook, one step at a time.

- Parallelism: have lots of potatoes to slice? Hire helpers and hand out potatoes and knives. But we can go too far: with one helper per potato, we'd spend too much time coordinating.
- Concurrency: lots of cooks making different things, but only 2 stove burners. We want to allow simultaneous access to both burners, but not cause spills or incorrect burner settings.

Shared memory with Threads

The model we will assume is shared memory with explicit threads.

Old story: a running program has
- One call stack (with each stack frame holding local variables)
- One program counter (the current statement executing)
- Static fields
- Objects (created by new) in the heap (nothing to do with the heap data structure)

New story: a set of threads, each with its own call stack and program counter
- No access to another thread's local variables
- Threads can (implicitly) share static fields and objects
- To communicate, a thread writes somewhere that another thread reads

[Figure: one heap shared by all threads holds all objects and static fields; each thread has its own unshared call stack and current statement (pc, for "program counter"); local variables are numbers/null or heap references.]

Other models

We will focus on shared memory, but you should know several other models exist:
- Message-passing: each thread has its own collection of objects; communication is via explicit messages, and the language has primitives for sending and receiving them. (Cooks working in separate kitchens, with telephones.)
- Dataflow: programmers write programs in terms of a DAG, and a node executes after all of its predecessors in the graph. (Cooks wait to be handed the results of previous steps.)
- Data parallelism: primitives for things like "apply a function to every element of an array in parallel"
- …

Java Threads (at a high level)

Many languages/libraries (including Java) provide primitives for creating threads and synchronizing them.

Steps to create another thread:
1. Define a subclass C of java.lang.Thread, overriding run()
2. Create an object of class C
3. Call that object's start() method
   - The code that called start will continue to execute after start
   - A new thread will be created, with code executing in the object's run() method

What happens if, in step 3, we called run instead of start?

Parallelism idea: First approach

Example: sum the elements of a (presumably large) array. Use 4 threads, each of which sums 1/4 of the array.

[Diagram: the array is split into four pieces whose partial sums ans0, ans1, ans2, ans3 are combined into ans.]

Steps:
- Create 4 thread objects, assigning each its portion of the work
- Call …
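The first approach above can be sketched in Java. This is a sketch under assumptions: the class name SumThread is my own, and the slide text is cut off before it finishes listing the steps, so the `join()`-based combining step is an addition, not taken from the slide.

```java
// Minimal sketch of the "4 threads, each sums 1/4 of the array" idea.
// (SumThread and the join()-based combining step are assumptions; the
// slide is truncated before listing the remaining steps.)
class SumThread extends Thread {
    private final int[] arr;
    private final int lo, hi;
    int ans = 0;  // this thread's partial sum, read after join()

    SumThread(int[] arr, int lo, int hi) {
        this.arr = arr; this.lo = lo; this.hi = hi;
    }

    @Override
    public void run() {  // step 1: override run() with this thread's work
        for (int j = lo; j < hi; j++)
            ans += arr[j];
    }

    static int sum(int[] arr) throws InterruptedException {
        int len = arr.length;
        SumThread[] ts = new SumThread[4];
        for (int i = 0; i < 4; i++) {
            // step 2: create a thread object for each quarter of the array
            ts[i] = new SumThread(arr, i * len / 4, (i + 1) * len / 4);
            ts[i].start();  // step 3: start (not run!) each thread
        }
        int total = 0;
        for (SumThread t : ts) {
            t.join();        // wait for this thread to finish
            total += t.ans;  // combine ans0 + ans1 + ans2 + ans3
        }
        return total;
    }
}
```

On the question posed above: calling run() instead of start() simply executes the loop in the current thread, one call after another, so no new thread is created and the program runs sequentially.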