TAMU ECEN 676 - ch3_2 (23 pages)

Pages: 23
School: Texas A&M University
Course: ECEN 676 - Adv Computer Architec

Limitations of Algorithm Analysis
  - Inherent communication in parallel algorithm is not all
      - artifactual communication, caused by program implementation and architectural interactions, can even dominate
      - thus, amount of communication not dealt with adequately
  - Cost of communication determined not only by amount
      - also how communication is structured
      - and cost of communication in system
  - Both architecture-dependent, and addressed in orchestration step
  - To understand techniques, first look at system interactions

What is a Multiprocessor?
  - A collection of communicating processors
      - View taken so far
      - Goals: balance load, reduce inherent communication and extra work
  - A multi-cache, multi-memory system
      - Role of these components essential regardless of programming model
      - Programming model and communication abstraction affect specific performance tradeoffs
  - Most of remaining performance issues focus on second aspect

Memory-oriented View
  - Multiprocessor as Extended Memory Hierarchy
      - as seen by a given processor
  - Levels in extended hierarchy:
      - Registers, caches, local memory, remote memory (topology)
      - Glued together by communication architecture
      - Levels communicate at a certain granularity of data transfer
  - Need to exploit spatial and temporal locality in hierarchy
      - Otherwise extra communication may also be caused
      - Especially important since communication is expensive

Uniprocessor
  - Performance depends heavily on memory hierarchy
  - Time spent by a program: $Time_{prog}(1) = Busy(1) + Data\ Access(1)$
      - Divide by cycles to get CPI equation
  - Data access time can be reduced by:
      - Optimizing machine: bigger caches, lower latency
      - Optimizing program: temporal and spatial locality

Management of levels
  - Caches managed by hardware
  - Main memory depends on programming model
      - SAS: data movement between local and remote transparent
      - Message passing: explicit
  - Levels closer to processor are lower latency and higher bandwidth
  - Improve performance through architecture or program locality
  - Tradeoff with parallelism: need good node performance and parallelism
  - Distributed Memory: some
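
The uniprocessor time breakdown above (busy time plus data access time, then normalized to a CPI figure) can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration and does not come from the slides: the function names, the model of data access time as cache misses times a fixed miss penalty, the choice to express both terms in cycles and divide by instruction count, and all workload numbers.

```python
def data_access_cycles(misses, miss_penalty):
    """Cycles spent stalled on memory.

    Assumed model: every cache miss costs a fixed penalty, and hit time
    is folded into the busy term.
    """
    return misses * miss_penalty


def time_prog(busy, data_access):
    """Total time on one processor: busy time plus data access time."""
    return busy + data_access


def cpi(busy, data_access, instructions):
    """Cycles per instruction, assuming both time terms are in cycles."""
    return time_prog(busy, data_access) / instructions


# Hypothetical workload: one million instructions, one busy cycle each.
INSTRUCTIONS = 1_000_000
BUSY = 1_000_000

# Baseline machine: 15,000 misses at 100 cycles each.
base = cpi(BUSY, data_access_cycles(15_000, 100), INSTRUCTIONS)

# "Optimizing machine", bigger cache: fewer misses, same penalty.
bigger_cache = cpi(BUSY, data_access_cycles(6_000, 100), INSTRUCTIONS)

# "Optimizing machine", lower latency: same misses, smaller penalty.
lower_latency = cpi(BUSY, data_access_cycles(15_000, 50), INSTRUCTIONS)

# "Optimizing program", better locality: also modeled as fewer misses.
better_locality = cpi(BUSY, data_access_cycles(3_000, 100), INSTRUCTIONS)

print("baseline CPI:       ", base)
print("bigger cache CPI:   ", bigger_cache)
print("lower latency CPI:  ", lower_latency)
print("better locality CPI:", better_locality)
```

In this toy model the busy term stays fixed, so every improvement shows up only through the data access term, whether it comes from the machine or from the program's locality.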