UMD ASTR 415 - Computer Architecture

Class 2. Computer Architecture (1/30/07)

Computer Architecture

• The components that make up a computer system, and their interconnections.
• Basic components (draw schematic):
  1. Processor.
  2. Memory.
  3. I/O (input/output) devices.
  4. Communication channels (buses).

We will discuss each of these in turn.

Processors

• Component that executes a program.
• Most PCs have only one processor (CPU); these are “serial” or “scalar” machines.
• High-performance machines usually have many processors; these are “vector” or “parallel” machines.
• Processors execute a fetch/decode/execute cycle:
  – fetch: get instruction and/or data from memory;
  – decode: store instruction and/or data in register;
  – execute: perform operation, storing results in memory
    (e.g., LD A,R1; LD B,R2; ADD R1,R2,R3; STORE R3,C).
  – Instruction address held in program counter (PC).
  – PC incremented after each cycle.
• Very primitive commands! “Compilers” or “interpreters” are used to translate high-level code into such low-level operations.

Cycle

• Timing of the cycle depends on internal construction and complexity of instructions.
• The quantum of time in a processor is called a “clock cycle.” All tasks take an integer number of clock cycles to occur.
• The fewer the clock cycles for a task, the faster it occurs.
• NOTE: Higher clock speeds imply faster heating of components, increasing cooling requirements.

Measuring CPU performance

• Time to execute a program (see the sketch after the next list):

      t = n_i × CPI × t_c,

  where
      n_i = number of instructions,
      CPI = cycles per instruction,
      t_c = time per cycle.

Improving CPU performance

1. Obviously, can decrease t_c. Mostly an engineering problem (e.g., increase clock frequency, use better chip materials, ...).
2. Decrease CPI, e.g., by making instructions as simple as possible (RISC, Reduced Instruction Set Computer, as opposed to CISC, Complex Instruction Set Computer). Can also “pipeline” by performing different stages of the fetch/decode/execute cycle at the same time, like an assembly line.
3. Decrease the n_i any one processor works on:
   • Improve the algorithm.
   • Distribute n_i over n_p processors, so ideally n_i′ = n_i/n_p.
     – Actually, the process of distributing work adds overhead: n_i′ = n_i/n_p + n_0.
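
The following minimal C sketch just evaluates the timing model above; the instruction counts, CPI values, and clock rate are invented for illustration and do not describe any real machine.

    /* Execution-time model t = n_i * CPI * t_c, with hypothetical numbers. */
    #include <stdio.h>

    double exec_time(double n_i, double cpi, double t_c)
    {
        return n_i * cpi * t_c;          /* seconds */
    }

    double distributed_ni(double n_i, double n_p, double n_0)
    {
        return n_i / n_p + n_0;          /* per-processor n_i, with overhead n_0 */
    }

    int main(void)
    {
        double t_c = 1.0e-9;             /* assume a 1 GHz clock: 1 ns per cycle */

        /* CISC-like machine: fewer instructions, higher CPI. */
        printf("CISC-like: %.2f s\n", exec_time(1.0e9, 4.0, t_c));

        /* RISC-like machine: more instructions, lower CPI. */
        printf("RISC-like: %.2f s\n", exec_time(1.5e9, 1.2, t_c));

        /* Distributing 1e9 instructions over 4 processors, overhead n_0 = 1e6. */
        printf("n_i' = %.3g instructions per processor\n",
               distributed_ni(1.0e9, 4.0, 1.0e6));
        return 0;
    }

Note how the RISC-like machine wins here despite executing 50% more instructions, because its CPI is so much lower; that is exactly the trade-off described in item 2.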
Defining CPU performance

• MIPS (“million instructions per second”): not useful due to variations in instruction length, implementation, etc.
• Mflop/s (“million floating-point operations per second”): measures time to complete a meaningful task, e.g., multiplying two matrices (~ n^3 ops).
• Computers A and B may have different MIPS but the same Mflop/s.
• Often refer to “peak Mflop/s” (highest possible performance if the machine only did arithmetic calculations) and “sustained Mflop/s” (effective speed over an entire run).
• “Benchmark”: a standard performance test, e.g., LINPACK [1], SPEC [2], etc.

[1] See http://www.netlib.org/benchmark/performance.ps.
[2] Visit http://www.specbench.org/.

Memory

• Passive component that stores data or instructions, accessed by address.
• Data flows from memory (“read”) or to memory (“write”).
• RAM: “Random Access Memory” supports both reads and writes.
• ROM: “Read Only Memory”; no writes.

Bits & bytes

• Smallest piece of memory = 1 bit (off/on).
  – 8 bits = 1 byte.
  – 4 bytes = 1 word (on 32-bit machines).
  – 8 bytes = 1 word (on 64-bit machines).
• 1 word = number of bits used to store, e.g., a single-precision floating-point number. Usually equals the width of the data bus.
• Typical home computers these days have ~ 128–512 MB of usable RAM.
  – 1 MB = 1 megabyte = 1,048,576 (2^20) bytes (sometimes just 10^6).
  – 1 Mb = 1 megabit = 10^6 bits (rarely 2^20).

Memory performance

• Determined by access time, or latency, usually 10–80 ns. [3]
  – Latency hiding: perform other operations while waiting for memory to respond.
• Would like to build all memory from the fastest chips, but this is often too expensive.
• Instead, exploit “locality of reference.”

[3] Note: DDR SDRAM (double data rate synchronous dynamic RAM), the newest type of memory, is speed-rated in terms of “memory cycles,” i.e., the time required between successive memory accesses, typically ~ 10 ns or less.

Improving memory performance

• Typical applications store and access data in sequence.
• Instructions are also stored sequentially in memory.
• Hence if address M is accessed at time t, there is a high probability that address M + 1 will be accessed at time t + 1 (e.g., vector ops).
• Instead of building the entire memory from fast chips, use “hierarchical memory”:
  – Memory closest to the processor is built from the fastest chips: “cache” (often more than one level).
  – Main memory is built from RAM: “primary memory.”
  – Additional memory is built from the slowest/cheapest components (e.g., hard disks): “secondary memory.”
• Then, transfer entire blocks of memory between levels, not just individual values.
  – Block of memory transferred between cache and primary memory = “cache line.”
  – Between primary and secondary memory = “page.”

How does it work?

– If the processor needs item x, and it is not in cache, the request is forwarded to primary memory.
– Instead of just sending x, primary memory sends the entire cache line (x, x + 1, ...).
– Then, when/if the processor needs x + 1 the next cycle, it is already there.
– Possible cache block replacement strategies: random, first-in-first-out (FIFO, i.e., replace the block that has been in cache longest), least-recently-used (LRU).

Hits & Misses

– A memory request to cache which is satisfied is called a “hit.”
– A memory request which must be passed to the next level is called a “miss.”
– The fraction of requests which are hits is called the “hit rate.”
– Must try to optimize the hit rate (≳ 90%); the sketch below shows how strongly the access pattern affects it.
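
As a toy illustration of hits and misses (not a model of any real cache), the C sketch below assumes a single 8-word cache line and counts a hit whenever the requested word lies in the most recently fetched line. Sequential access hits 7 times out of 8; a stride equal to the line size never hits.

    /* Toy hit-rate counter: one assumed 8-word line, no replacement policy. */
    #include <stdio.h>

    #define LINE_WORDS 8                     /* assumed cache-line size, in words */

    double hit_rate(long n, long stride)
    {
        long cached_line = -1;               /* line currently held in cache */
        long hits = 0;

        for (long i = 0; i < n; i++) {
            long addr = i * stride;          /* word address of this access */
            long line = addr / LINE_WORDS;   /* line containing that word */
            if (line == cached_line)
                hits++;                      /* satisfied from cache: a hit */
            else
                cached_line = line;          /* a miss: fetch the whole line */
        }
        return (double)hits / (double)n;
    }

    int main(void)
    {
        printf("sequential (stride 1): HR = %.3f\n", hit_rate(100000, 1));
        printf("strided    (stride 8): HR = %.3f\n", hit_rate(100000, 8));
        return 0;
    }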
Measuring memory performance

• Define the “effective access time” as:

      t_eff = (HR) t_cache + (1 − HR) t_pm,

  where
      t_cache = access time of cache,
      t_pm = access time of primary memory,
      HR = hit rate.

• E.g., t_cache = 10 ns, t_pm = 100 ns, HR = 98% ⇒ t_eff = 11.8 ns, close to the access time of the cache itself.

Maximizing hit rate

• The key to good performance is to design application code to maximize the hit rate.
• One simple rule: always try to access memory contiguously; e.g., in array operations, the fastest-changing index should correspond to successive locations in memory.

Good Example

– In FORTRAN:

      DO J = 1, 1000
         DO I = 1, 1000
            A(I,J) = 0
         ENDDO
      ENDDO

– This references A(1,1), A(2,1), etc., which are stored contiguously in memory.
– NOTE: C, unlike FORTRAN, stores 2-D array data by row, not by column, so this loop ordering would be a bad example for C! (A C version is sketched below.)

Bad Example

– Same loops, but with the order interchanged:

      DO I = 1, 1000
         DO J = 1, 1000
            A(I,J) = 0
         ENDDO
      ENDDO

– This references A(1,1), A(1,2), etc., which are 1000 words apart in memory.
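
For completeness, here is the equivalent sketch in C, which is row-major; the array size matches the FORTRAN example but is otherwise arbitrary.

    /* In C (row-major storage), the LAST index should vary fastest. */
    #include <stdio.h>

    #define N 1000
    static double a[N][N];

    int main(void)
    {
        /* Good in C: j innermost, so a[i][0], a[i][1], ... are adjacent. */
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                a[i][j] = 0.0;

        /* Bad in C: i innermost, so successive accesses are N doubles apart. */
        for (int j = 0; j < N; j++)
            for (int i = 0; i < N; i++)
                a[i][j] = 0.0;

        printf("%g\n", a[N-1][N-1]);     /* use the array so it is not optimized away */
        return 0;
    }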

