CMU CS 15213 - Lecture - D1639672

Home> Schools> Carnegie Mellon University> Computer Science (CS) > CS 15213> Lecture

DOC PREVIEW

CMU CS 15213 - Lecture

School name Carnegie Mellon University

Course Cs 15213- Introduction to Computer Systems

Pages 44

This preview shows page 1-2-3-21-22-23-42-43-44 out of 44 pages.

Save

View full document

Premium Document

Do you want full access? Go Premium and unlock all 44 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 44 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 44 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 44 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 44 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 44 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 44 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 44 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 44 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Premium Document

Do you want full access? Go Premium and unlock all 44 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Unformatted text preview:

Carnegie MellonIntroduction to Computer Systems15-213/18-243, spring 200912thLecture, Feb. 19thInstructors:Gregory Kesden and Markus PüschelCarnegie MellonLast Time Program optimization Optimization blocker: Memory aliasing One solution: Scalar replacement of array accesses that are reusedfor (i = 0; i < n; i++) {b[i] = 0;for (j = 0; j < n; j++)b[i] += a[i*n + j];}for (i = 0; i < n; i++) {double val = 0;for (j = 0; j < n; j++)val += a[i*n + j];b[i] = val;}Carnegie MellonLast Time Instruction level parallelism Latency versus throughputFunctionalUnitsInteger/BranchFPAddFPMult/DivLoad StoreGeneralIntegerStep 11 cycleStep 21 cycleStep 101 cyclelatency cycles/issueInteger Multiply 10 1Carnegie MellonLast Time Consequence**1 d0d1*d2*d3*d4*d5*d6*d7**1 d1d3*d5*d7***1 d0d2*d4*d6Twice as fastCarnegie MellonToday Memory hierarchy, caches, locality Cache organization Program optimization: Cache optimizationsCarnegie MellonProblem: Processor-Memory BottleneckMain MemoryCPU RegProcessor performancedoubled about every 18 monthsBus bandwidthevolved much slowerCore 2 Duo:Can process at least256 Bytes/cycle(1 SSE two operand add and mult)Core 2 Duo:Bandwidth2 Bytes/cycleLatency100 cyclesSolution: CachesCarnegie MellonCache Definition: Computer memory with short access time used for the storage of frequently or recently used instructions or dataCarnegie MellonGeneral Cache Mechanics0 1 2 34 5 6 78 9 10 1112 13 14 158 9 14 3CacheMemoryLarger, slower, cheaper memoryviewed as partitioned into “blocks”Data is copied in block-sized transfer unitsSmaller, faster, more expensivememory caches a subset ofthe blocks444101010Carnegie MellonGeneral Cache Concepts: Hit0 1 2 34 5 6 78 9 10 1112 13 14 158 9 14 3CacheMemoryData in block b is neededRequest: 1414Block b is in cache:Hit!Carnegie MellonGeneral Cache Concepts: Miss0 1 2 34 5 6 78 9 10 1112 13 14 158 9 14 3CacheMemoryData in block b is neededRequest: 12Block b is not in cache:Miss!Block b is fetched frommemoryRequest: 12121212Block b is stored in cache• Placement policy:determines where b goes• Replacement policy:determines which blockgets evicted (victim)Carnegie MellonCache Performance Metrics Miss Rate Fraction of memory references not found in cache (misses / accesses)= 1 – hit rate Typical numbers (in percentages): 3-10% for L1 can be quite small (e.g., < 1%) for L2, depending on size, etc. Hit Time Time to deliver a line in the cache to the processor includes time to determine whether the line is in the cache Typical numbers: 1-2 clock cycle for L1 5-20 clock cycles for L2 Miss Penalty Additional time required because of a miss typically 50-200 cycles for main memory (Trend: increasing!)Carnegie MellonLets think about those numbers Huge difference between a hit and a miss Could be 100x, if just L1 and main memory Would you believe 99% hits is twice as good as 97%? Consider: cache hit time of 1 cyclemiss penalty of 100 cycles Average access time:97% hits: 1 cycle + 0.03 * 100 cycles = 4 cycles99% hits: 1 cycle + 0.01 * 100 cycles = 2 cycles This is why “miss rate” is used instead of “hit rate”Carnegie MellonTypes of Cache Misses Cold (compulsory) miss Occurs on first access to a block Conflict miss Most hardware caches limit blocks to a small subset (sometimes a singleton) of the available cache slots e.g., block i must be placed in slot (i mod 4) Conflict misses occur when the cache is large enough, but multiple data objects all map to the same slot e.g., referencing blocks 0, 8, 0, 8, ... would miss every time Capacity miss Occurs when the set of active cache blocks (working set) is larger than the cacheCarnegie MellonWhy Caches Work Locality: Programs tend to use data and instructions with addresses near or equal to those they have used recently Temporal locality:  Recently referenced items are likely to be referenced again in the near future Spatial locality:  Items with nearby addresses tend to be referenced close together in timeblockblockCarnegie MellonExample: Locality? Data: Temporal: sum referenced in each iteration Spatial: array a[] accessed in stride-1 pattern Instructions: Temporal: cycle through loop repeatedly Spatial: reference instructions in sequence Being able to assess the locality of code is a crucial skill for a programmersum = 0;for (i = 0; i < n; i++)sum += a[i];return sum;Carnegie MellonLocality Example #1int sum_array_rows(int a[M][N]){int i, j, sum = 0;for (i = 0; i < M; i++)for (j = 0; j < N; j++)sum += a[i][j];return sum;}Carnegie MellonLocality Example #2int sum_array_cols(int a[M][N]){int i, j, sum = 0;for (j = 0; j < N; j++)for (i = 0; i < M; i++)sum += a[i][j];return sum;}Carnegie MellonLocality Example #3int sum_array_3d(int a[M][N][N]){int i, j, k, sum = 0;for (i = 0; i < M; i++)for (j = 0; j < N; j++)for (k = 0; k < N; k++)sum += a[k][i][j];return sum;} How can it be fixed?Carnegie MellonMemory Hierarchies Some fundamental and enduring properties of hardware and software systems: Faster storage technologies almost always cost more per byte and have lower capacity The gaps between memory technology speeds are widening True of registers ↔ DRAM, DRAM ↔ disk, etc. Well-written programs tend to exhibit good locality These properties complement each other beautifully They suggest an approach for organizing memory and storage systems known as a memory hierarchyCarnegie MellonAn Example Memory Hierarchyregisterson-chip L1cache (SRAM)main memory(DRAM)local secondary storage(local disks)Larger, slower, cheaper per byteremote secondary storage(tapes, distributed file systems, Web servers)Local disks hold files retrieved from disks on remote network serversMain memory holds disk blocks retrieved from local disksoff-chip L2cache (SRAM)L1 cache holds cache lines retrieved from L2 cacheCPU registers hold words retrieved fromL1 cacheL2 cache holds cache lines retrieved from main memoryL0:L1:L2:L3:L4:L5:Smaller,faster,costlierper byteCarnegie MellonExamples of Caching in the HierarchyHardware0On-Chip TLBAddress translationsTLBWeb browser10,000,000Local diskWeb pagesBrowser cacheWeb cacheNetwork buffer cacheBuffer cacheVirtual MemoryL2 cacheL1 cacheRegistersCache TypeWeb pagesParts of filesParts of files4-KB page64-bytes block64-bytes block4-byte wordsWhat is Cached?Web proxy server1,000,000,000Remote server disksOS100Main

View Full Document

CMU CS 15213 - Lecture

Sign up for free to view:

This document and 3 million+ documents and flashcards
High quality study guides, lecture notes, practice exams
Course Packets handpicked by editors offering a comprehensive review of your courses
Better Grades Guaranteed


School:
Email:
New Password:
Confirm Password:

This preview shows page 1-2-3-21-22-23-42-43-44 out of 44 pages.

CMU CS 15213 - Lecture

Sign up for free to view:

Please select your school