DOC PREVIEW
Berkeley COMPSCI 152 - Locality and Memory Technology

This preview shows page 1-2-3-4-24-25-26-50-51-52-53 out of 53 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 53 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 53 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 53 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 53 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 53 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 53 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 53 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 53 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 53 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 53 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 53 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 53 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

CS152 Computer Architecture and Engineering Lecture 19 Locality and Memory TechnologyReview: Explicit Renaming/Limits to ILPRecall: Upper Limit to ILP: Ideal MachineRecall: More Realistic HW: Branch ImpactThe Big Picture: Where are We Now?Recall: Who Cares About the Memory Hierarchy?Recall: Memory Hierarchy of a Modern Computer SystemImpact of Memory Hierarchy on AlgorithmsQuicksort vs. Radix as vary number keys: InstructionsQuicksort vs. Radix as vary number keys: Instrs & TimeQuicksort vs. Radix as vary number keys: Cache missesMain Memory BackgroundRandom Access Memory (RAM) TechnologyStatic RAM CellTypical SRAM Organization: 16-word x 4-bitLogic Diagram of a Typical SRAMTypical SRAM TimingProblems with SRAMAdministrative IssuesMain Memory Deep Background1-Transistor Memory Cell (DRAM)DRAM Capacitors: more capacitance in a small areaClassical DRAM Organization (square)DRAM logical organization (4 Mbit)DRAM physical organization (4 Mbit)Logic Diagram of a Typical (Asynchronous) DRAMDRAM Read TimingDRAM Write TimingKey DRAM Timing ParametersDRAM PerformanceSomething new: Structure of Tunneling Magnetic JunctionMain Memory PerformanceSlide 33Increasing Bandwidth - InterleavingSlide 35Independent Memory BanksFewer DRAMs/System over TimeFast Page Mode OperationSlide 39What does “Syncronous” RAM mean?Example: SDRAM timing for Lab6DRAMs over TimeDRAM HistoryDRAM Design GoalsToday’s Situation: DRAMSlide 46The Art of Memory System DesignA Summary on Sources of Cache MissesExample: 1 KB Direct Mapped Cache with 32 B BlocksSet Associative CacheDisadvantage of Set Associative CacheExample: Fully AssociativeSummaryCS152Computer Architecture and EngineeringLecture 19Locality and Memory TechnologyApril 12, 2004John Kubiatowicz (www.cs.berkeley.edu/~kubitron)lecture slides: http://inst.eecs.berkeley.edu/~cs152/4/12/04 ©UCB Spring 2004CS152 / Kubiatowicz Lec19.2Review: Explicit Renaming/Limits to ILP°Explicit Renaming: more physical registers than ISA. •Separates renaming from scheduling -Opens up lots of options for resolving RAW hazards•Rename table: tracks current association between architectural registers and physical registers•Potentially complicated rename table management°Multi-issue: simple matter of accounting•Must do dataflow analysis across multiple instructions simultaneously•Rename table updated as if instructions happened serially!°To sustain: need execution bandwidth+commit bandwidth•To sustain ILP of X need at least-X-way issue, > X execution bandwidth (for mix), X way commit °Limits to ILP•Inherent parallelism of applications as high as 150 IPC•Realistic limits rapidly reduce this to < 4 IPC for most applications4/12/04 ©UCB Spring 2004CS152 / Kubiatowicz Lec19.3ProgramsInstruction Issues per cycle020406080100120140160gcc espresso li fpppp doducd tomcatv54.862.617.975.2118.7150.1Integer: 18 - 60FP: 75 - 150IPCRecall: Upper Limit to ILP: Ideal Machine4/12/04 ©UCB Spring 2004CS152 / Kubiatowicz Lec19.4ProgramInstruction issues per cycle0102030405060gcc espresso li fpppp doducd tomcatv3541166158609121048156764613456674514452222941946PerfectSelective predictorStandard 2-bitStatic NoneChange from Infinite window to examine to 2000 and maximum issue of 64 instructions per clock cycleProfileBHT (512)Pick Cor. or BHTPerfect No predictionFP: 15 - 45Integer: 6 - 12IPCRecall: More Realistic HW: Branch Impact4/12/04 ©UCB Spring 2004CS152 / Kubiatowicz Lec19.5°The Five Classic Components of a Computer°Today’s Topics: •Recap last lecture•Locality and Memory Hierarchy•Administrivia•SRAM Memory Technology•DRAM Memory Technology•Memory OrganizationThe Big Picture: Where are We Now? ControlDatapathMemoryProcessorInputOutput4/12/04 ©UCB Spring 2004CS152 / Kubiatowicz Lec19.6µProc60%/yr.(2X/1.5yr)DRAM9%/yr.(2X/10 yrs)110100100019801981198319841985198619871988198919901991199219931994199519961997199819992000DRAMCPU1982Processor-MemoryPerformance Gap:(grows 50% / year)PerformanceTime“Moore’s Law”Processor-DRAM Memory Gap (latency)Recall: Who Cares About the Memory Hierarchy?“Less’ Law?”4/12/04 ©UCB Spring 2004CS152 / Kubiatowicz Lec19.7Recall: Memory Hierarchy of a Modern Computer System°By taking advantage of the principle of locality:•Present the user with as much memory as is available in the cheapest technology.•Provide access at the speed offered by the fastest technology.ControlDatapathSecondaryStorage(Disk)ProcessorRegistersMainMemory(DRAM)SecondLevelCache(SRAM)On-ChipCache1s10,000,000s (10s ms)Speed (ns): 10s 100s100s GsSize (bytes): Ks MsTertiaryStorage(Tape)10,000,000,000s (10s sec)Ts4/12/04 ©UCB Spring 2004CS152 / Kubiatowicz Lec19.8Impact of Memory Hierarchy on Algorithms°Today CPU time is a function of (ops, cache misses)°What does this mean to Compilers, Data structures, Algorithms?•Quicksort: fastest comparison based sorting algorithm when keys fit in memory•Radix sort: also called “linear time” sortFor keys of fixed length and fixed radix a constant number of passes over the data is sufficient independent of the number of keys°“The Influence of Caches on the Performance of Sorting” by A. LaMarca and R.E. Ladner. Proceedings of the Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, January, 1997, 370-379.•For Alphastation 250, 32 byte blocks, direct mapped L2 2MB cache, 8 byte keys, from 4000 to 40000004/12/04 ©UCB Spring 2004CS152 / Kubiatowicz Lec19.9Quicksort vs. Radix as vary number keys: InstructionsJob size in keysInstructions/keyRadix sortQuicksort4/12/04 ©UCB Spring 2004CS152 / Kubiatowicz Lec19.10Quicksort vs. Radix as vary number keys: Instrs & TimeTimeJob size in keysInstructionsRadix sortQuicksort4/12/04 ©UCB Spring 2004CS152 / Kubiatowicz Lec19.11Quicksort vs. Radix as vary number keys: Cache missesCache missesJob size in keysRadix sortQuicksortWhat is proper approach to fast algorithms?4/12/04 ©UCB Spring 2004CS152 / Kubiatowicz Lec19.12°Performance of Main Memory: •Latency: Cache Miss Penalty-Access Time: time between request and word arrives-Cycle Time: time between requests•Bandwidth: I/O & Large Block Miss Penalty (L2)°Cache uses SRAM : Static Random Access Memory•No refresh (6 transistors/bit vs. 1 transistor)Size: DRAM/SRAM 4-8 Cost/Cycle time: SRAM/DRAM 8-16°Main Memory is DRAM : Dynamic Random Access Memory•Dynamic since needs to be refreshed periodically (8 ms)•Addresses divided into 2 halves (Memory as a 2D


View Full Document

Berkeley COMPSCI 152 - Locality and Memory Technology

Documents in this Course
Quiz 5

Quiz 5

9 pages

Memory

Memory

29 pages

Quiz 5

Quiz 5

15 pages

Memory

Memory

29 pages

Memory

Memory

35 pages

Memory

Memory

15 pages

Quiz

Quiz

6 pages

Midterm 1

Midterm 1

20 pages

Quiz

Quiz

12 pages

Memory

Memory

33 pages

Quiz

Quiz

6 pages

Homework

Homework

19 pages

Quiz

Quiz

5 pages

Memory

Memory

15 pages

Load more
Download Locality and Memory Technology
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Locality and Memory Technology and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Locality and Memory Technology 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?