COSC 6374 Parallel Computation
Introduction to OpenMP
Edgar Gabriel
Fall 2011


Introduction

- Threads vs. processes
- Recap of shared memory systems
- SMP vs. NUMA
- Cache coherence
- Process and thread affinity
- Operating system scheduling
- Load balancing across multiple processors


Processes vs. Threads

- Process
  - an address space with one or more threads executing within that address space, plus the system resources required by those threads
  - a program that is running
- Thread
  - a sequence of instructions executed within a process
  - shares the resources of that process


Processes vs. Threads II

- Advantages of multi-threaded programming
  - easier programming than multi-process models
  - lower overhead for creating a thread than for creating a process
  - lower overhead for the OS when switching between threads than when switching between processes
- Drawbacks of multi-threaded programming
  - more difficult to debug than single-threaded programs
  - on single-processor machines, creating several threads in a program does not necessarily increase performance


Processes vs. Threads III

- Data per process
  - Address space
  - Open files
  - Child processes
  - Signal handler
  - Timer
  - Accounting
- Data per thread
  - Program counter
  - Processor register
  - Processor status
  - Signal mask
  - Stack


Execution model

- Main thread
  - initial thread, created when main() (in C) is invoked by the process loader
  - once in main(), the application can create additional threads
  - exit() (or return) called in the main function terminates all threads, i.e. the process terminates even if there are still running threads
- Threads can be terminated separately by using different functions


Threads vs. processes

- In Linux there is no difference at the OS level between processes and threads
- Threads are processes which have access to the same resources, e.g. the address space
- E.g. internally, Linux creates a thread by calling

    clone(CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND, 0);

- In contrast, a new process is created internally by Linux calling

    clone(SIGCHLD, 0);


Recap: shared memory systems

- All processes have access to the same address space
  - PC with more than one processor
  - PC with a multi-core processor
- Data exchange between processes by writing/reading shared variables
- Two versions of shared memory systems are available
  - Symmetric multiprocessors (SMP)
  - Non-uniform memory access (NUMA) architectures


Symmetric multi-processors (SMPs)

- All processors share the same physical main memory

  [Figure: four CPUs attached to one shared memory]

- Memory bandwidth per processor is the limiting factor for this type of architecture


NUMA architectures I

- Some memory is closer to a certain processor than other memory
  - The whole memory is still addressable from all processors
  - Depending on which data item a processor retrieves, the access time may vary strongly

  [Figure: eight CPUs and four memory modules]


NUMA architectures II

- Reduces the memory bottleneck compared to SMPs
- More difficult to program efficiently
  - E.g. first-touch policy: a data item is placed in the memory of the processor which uses it first
- To reduce the effects of non-uniform memory access, caches are often used
  - ccNUMA: cache-coherent non-uniform memory access architectures


Cache Coherence

- Real-world shared memory systems have caches between memory and CPU
- Copies of a single data item can exist in multiple caches
- Modification of a shared data item by one CPU leads to outdated copies in the cache of another CPU

  [Figure: memory holds the original data item; the cache of CPU 0 and the cache of CPU 1 each hold a copy of it]


Cache coherence II

- Typical solution:
  - Caches keep track of whether a data item is shared between multiple processes
  - Upon modification of a shared data item, the other caches have to be notified
  - The other caches have to reload the shared data item into their cache on the next access
- Cache coherence is only an issue if multiple tasks access the same item
  - Multiple threads
  - Multiple processes that have a joint shared memory segment
  - A process that is being migrated from one CPU to another


Thread and Process Affinity

- Each thread/process has an affinity mask
- The mask specifies which processors a thread is allowed to use
- Different threads can have different masks
- Affinities are inherited across process creation
- Example: 4-way multi-core

    mask bit:    1        1        0        1
                 core 3   core 2   core 1   core 0

  The process/thread is allowed to run on cores 0, 2, and 3, but not on core 1

Slide based on a lecture of Jernej Barbic, MIT:
http://people.csail.mit.edu/barbic/multi-core-15213-sp07.ppt


Linux Kernel scheduler API

- Retrieve the current affinity mask of a process

    #define _GNU_SOURCE     /* needed for the CPU_* macros; must precede all includes */
    #include <sys/types.h>
    #include <sched.h>
    #include <unistd.h>
    #include <errno.h>
    #include <stdio.h>      /* printf */
    #include <string.h>     /* strerror */

    int ret, i;             /* NUMCPUS: number of CPUs to test, defined elsewhere */
    unsigned int len = sizeof(cpu_set_t);
    cpu_set_t mask;
    pid_t pid = getpid();   /* get the process id of this app */

    ret = sched_getaffinity(pid, len, &mask);
    if (ret != 0)
        printf("Error in getaffinity %d (%s)\n", errno, strerror(errno));

    for (i = 0; i < NUMCPUS; i++) {
        if (CPU_ISSET(i, &mask))
            printf("Process could run on CPU %d\n", i);
    }


Linux Kernel scheduler API II

- Set the affinity mask of a process

    unsigned int len = sizeof(cpu_set_t);
    cpu_set_t mask;
    pid_t pid = getpid();   /* get the process id of this app */

    /* clear the mask */
    CPU_ZERO(&mask);

    /* set the mask such that the process is only allowed to
       execute on the desired CPU */
    CPU_SET(cpu_id, &mask);

    ret = sched_setaffinity(pid, len, &mask);
    if (ret != 0)
        printf("Error in setaffinity %d (%s)\n", errno, strerror(errno));


Linux Kernel scheduler API III

- Setting the affinity mask of a thread

    #define _GNU_SOURCE
    pthread_setaffinity_np(pthread_t t, size_t len, const cpu_set_t *mask);
    pthread_attr_setaffinity_np(pthread_attr_t *a, size_t len, const cpu_set_t *mask);

- The first function modifies the affinity mask of an existing thread
- The second function sets the affinity mask of a thread before it is created
- Otherwise, a thread inherits the affinity mask of the thread that creates it (typically the main thread) and initially runs on the same core as that thread

    #ifndef _GNU_SOURCE
    #define _GNU_SOURCE 1
    #endif
    #include <stdlib.h>
    #include <stdio.h>
    #include <pthread.h>
    #include <unistd.h>

    pthread_t tid;
    pthread_attr_t attr;
    cpu_set_t cpuset;

    CPU_ZERO(&cpuset);
    CPU_SET(1, &cpuset);    /* thread will be allowed to run on core 1 only */

    pthread_attr_init [remainder missing from the source]


UH COSC 6374 - Introduction to OpenMP
