UNO CSCI 8150 - Program and Network Properties

CSCI 8150 Advanced Computer Architecture
Hwang, Chapter 2: Program and Network Properties
2.1 Conditions of Parallelism
2.2 Program Partitioning and Scheduling

Conditions of Parallelism

The exploitation of parallelism in computing requires understanding the basic theory associated with it. Progress is needed in several areas:
- computation models for parallel computing
- interprocessor communication in parallel architectures
- integration of parallel systems into general environments

Data and Resource Dependencies

Program segments cannot be executed in parallel unless they are independent. Independence comes in several forms:
- Data dependence: data modified by one segment must not be modified by another parallel segment.
- Control dependence: if the control flow of segments cannot be identified before run time, then the data dependence between the segments is variable.
- Resource dependence: even if several segments are independent in other ways, they cannot be executed in parallel if there aren't sufficient processing resources (e.g. functional units).

Data Dependence - 1

- Flow dependence: S1 precedes S2, and at least one output of S1 is an input to S2.
- Antidependence: S1 precedes S2, and the output of S2 overlaps the input to S1.
- Output dependence: S1 and S2 write to the same output variable.
- I/O dependence: two I/O statements (read/write) reference the same variable and/or the same file.

Data Dependence - 2

Unknown dependence:
- The subscript of a variable is itself subscripted.
- The subscript does not contain the loop index variable.
- A variable appears more than once with subscripts having different coefficients of the loop variable (that is, different functions of the loop variable).
- The subscript is nonlinear in the loop index variable.

Parallel execution of program segments which do not have total data independence can produce non-deterministic results.

Control Dependence

Control-independent example:

    for (i = 0; i < n; i++) {
        a[i] = c[i];
        if (a[i] < 0) a[i] = 1;
    }

Control-dependent example (whether iteration i writes a[i] depends on the value produced by the previous iteration):

    for (i = 1; i < n; i++) {
        if (a[i-1] < 0) a[i] = 1;
    }

Compiler techniques are needed to get around control dependence limitations.

Resource Dependence

Data and control dependencies are based on the independence of the work to be done. Resource dependence is concerned with conflicts in using shared resources, such as registers, integer and floating-point ALUs, etc. ALU conflicts are called ALU dependence. Memory (storage) conflicts are called storage dependence.

Bernstein's Conditions - 1

Bernstein's conditions are a set of conditions that must hold for two processes to execute in parallel.

Notation:
- Ii is the set of all input variables for a process Pi.
- Oi is the set of all output variables for a process Pi.

If P1 and P2 can execute in parallel (written P1 || P2), then:

    I1 ∩ O2 = ∅
    I2 ∩ O1 = ∅
    O1 ∩ O2 = ∅

Bernstein's Conditions - 2

In terms of data dependencies, Bernstein's conditions imply that two processes can execute in parallel if they are flow-independent, anti-independent, and output-independent.

The parallelism relation || is commutative (Pi || Pj implies Pj || Pi), but not transitive (Pi || Pj and Pj || Pk does not imply Pi || Pk). Therefore, || is not an equivalence relation. Intersection of the input sets is allowed.
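To make the conditions concrete, here is a minimal C sketch, assuming the variable sets are small enough to encode as bitmasks; the names var_set and can_run_parallel and the example processes are illustrative, not from the notes:

    #include <stdio.h>

    /* Each bit of a var_set represents one variable. */
    typedef unsigned int var_set;

    /* P1 || P2 holds iff I1 ∩ O2 = ∅, I2 ∩ O1 = ∅, and O1 ∩ O2 = ∅. */
    int can_run_parallel(var_set i1, var_set o1, var_set i2, var_set o2)
    {
        return (i2 & o1) == 0 &&  /* flow independence: P2 reads nothing P1 writes  */
               (i1 & o2) == 0 &&  /* anti-independence: P2 writes nothing P1 reads  */
               (o1 & o2) == 0;    /* output independence: no common output variable */
    }

    int main(void)
    {
        enum { A = 1 << 0, B = 1 << 1, C = 1 << 2, D = 1 << 3 };

        /* P1: C = A + B and P2: D = A * B share the inputs {A, B},
         * which is allowed; all three conditions hold, so P1 || P2. */
        printf("P1 || P2: %s\n",
               can_run_parallel(A | B, C, A | B, D) ? "yes" : "no");

        /* P3: B = C + 1 reads C, which P1 writes (flow dependence),
         * and writes B, which P1 reads (antidependence). */
        printf("P1 || P3: %s\n",
               can_run_parallel(A | B, C, C, B) ? "yes" : "no");
        return 0;
    }

Note that the shared input set in the first example does not prevent parallel execution; only the three read/write intersections matter.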
Hardware Parallelism

Hardware parallelism is defined by machine architecture and hardware multiplicity. It can be characterized by the number of instructions that can be issued per machine cycle. If a processor issues k instructions per machine cycle, it is called a k-issue processor. Conventional processors are one-issue machines.

Examples: the Intel i960CA is a three-issue processor (arithmetic, memory access, branch); the IBM RS/6000 is a four-issue processor (arithmetic, floating-point, memory access, branch). A machine with n k-issue processors should be able to handle a maximum of nk threads simultaneously.

Software Parallelism

Software parallelism is defined by the control and data dependence of programs, and is revealed in the program's flow graph. It is a function of algorithm, programming style, and compiler optimization.

Mismatch between software and hardware parallelism - 1

[Figure: maximum software parallelism (L = load, X/+/- = arithmetic). Loads L1-L4 execute in cycle 1, the multiplies X1 and X2 in cycle 2, and the + and - producing A and B in cycle 3.]

Mismatch between software and hardware parallelism - 2

[Figure: the same problem on a two-issue superscalar processor takes 7 cycles.]

Mismatch between software and hardware parallelism - 3

[Figure: the same problem on two single-issue processors takes 6 cycles, with extra operations S1/S2 and L5/L6 inserted for synchronization between the processors.]

Types of Software Parallelism

- Control parallelism: two or more operations can be performed simultaneously. This can be detected by a compiler, or a programmer can explicitly indicate control parallelism by using special language constructs or by dividing a program into multiple processes.
- Data parallelism: multiple data elements have the same operations applied to them at the same time. This offers the highest potential for concurrency (in SIMD and MIMD modes). Synchronization in SIMD machines is handled by hardware.

Solving the Mismatch Problems

- Develop compilation support.
- Redesign hardware for more efficient exploitation by compilers.
- Use large register files and sustained instruction pipelining.
- Have the compiler fill the branch and load delay slots in code generated for RISC processors.

The Role of Compilers

Compilers are used to exploit hardware features to improve performance. Interaction between compiler and architecture design is a necessity in modern computer development. More software parallelism does not necessarily improve performance on conventional scalar processors. The hardware and compiler should be designed at the same time.

Program Partitioning & Scheduling

The size of the parts or pieces of a program that can be considered for parallel execution can vary. The sizes are roughly classified using the term "granule size," or simply "granularity." The simplest measure, for example, is the number of instructions in a program part. Grain sizes are usually described as fine, medium, or coarse, depending on the level of parallelism involved.

Latency

Latency is the time required for communication between different subsystems in a computer. Memory latency, for example, is the time required by a processor to access memory. Synchronization latency is the time required for two processes to synchronize their execution. Computational granularity and communication latency are closely related.
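The relationship between grain size and latency can be made concrete with a small back-of-the-envelope sketch. The cost model below is an assumption for illustration (the notes do not give one): work is spread evenly over the processors and every grain pays a fixed synchronization latency, so finer granularity multiplies the overhead.

    #include <stdio.h>

    /* Hypothetical cost model: total_work time units are split into
     * n_grains grains, spread evenly over n_procs processors, and each
     * grain pays a fixed communication/synchronization latency. */
    double parallel_time(double total_work, int n_grains, int n_procs,
                         double latency)
    {
        double grain_size = total_work / n_grains;            /* work per grain */
        double grains_per_proc = (double)n_grains / n_procs;  /* even spread */
        return grains_per_proc * (grain_size + latency);
    }

    int main(void)
    {
        const double work = 1.0e6;   /* total work, arbitrary time units */
        const double latency = 50.0; /* per-grain synchronization cost   */
        const int procs = 8;

        /* Pure compute time per processor stays at work/procs;
         * only the latency term grows as the grains get finer. */
        for (int grains = 8; grains <= 8192; grains *= 8)
            printf("%5d grains: estimated parallel time = %.0f\n",
                   grains, parallel_time(work, grains, procs, latency));
        return 0;
    }

Under this deliberately simple model, coarse grains amortize the latency; the countervailing benefit of finer grains (more schedulable parallelism and better load balance) is what the mismatch figures above illustrate, which is why medium grain sizes are often the practical compromise.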

