The Memory
B. Ramamurthy
© B. Ramamurthy

Topics for discussion
•On chip memory
•On board memory
•System memory
•Off-system/online storage/secondary memory
•File system abstraction
•Offline/tertiary memory
•RAID: Redundant Array of Inexpensive Disks
•NAS: Network Attached Storage
•SAN: Storage Area Networks
•DB and DBMS: Database and Database Management Systems
•Distributed file system
•Google file system
•Hadoop file system

Data and Computation Continuum
•Compute intensive. Ex: computation of digits of PI
•Data intensive. Ex: analyzing web logs

On chip memory
•Registers
•Cache
•Buffers (instruction pipeline)
•Characteristics: volatile

On board memory
•Cache
 –Instruction cache
 –Data cache
 –Translation lookaside buffers (TLB)
•Characteristics: content addressable, set-associative organization

System memory

Off-system storage (Earlier Lectures covered these)

Database and Database Management System
•Data source
•Transactional
•Database server
•Relational db or similar foundation
•Tables, rows, result set, SQL
•ODBC: Open Database Connectivity
•Very successful business model: Oracle, DB2, MySQL, and others
•Persistence models: EJB, DAO, ADO (I am not going to expand the abbreviations...)

Distributed file system (DFS)
•A dedicated server manages the files for a compute environment
•For example, nickelback.cse.buffalo.edu is your file server, and that is why we did not want you to run your user applications on this machine.
•DFS addresses various transparencies: location transparency, sharing, performance, etc.
•Examples: NFS, NFS+, AFS (Andrew FS)… (you will study these in the Distributed Systems course)

Issues with ultra-scale data
•How to store the large amount of data?
 –On commodity hardware or special hardware
•Large storage implies a large number of devices to store the data.
 –How to address the shortening MTTF (mean time to failure)?
 –How to realize “fault tolerance”?
 –Redundancy/replication is a solution
•How to manage the replication and the health of the large number of devices?
•More importantly, how to partition the large-scale data to store it on these storage devices (nodes)?
•How to parallelize processing of the data stored at multiple “nodes”?

On to Google File
•The Internet introduced a new challenge in the form of web logs and web crawler data: large scale, “peta scale”
•But observe that this type of data has a characteristic that sets it apart from your transactional or “order” data on amazon.com: it is “write once”; so is HIPAA-protected healthcare and patient information
•Google exploited this characteristic in its Google file system: S. Ghemawat

Hadoop File System (HFS)
•Hadoop file system is a reverse engineered version of the GFS: this is my first opinion on HFS
•HFS is a distributed file system for large scale data
•Data throughput is more important than latency
•Batch computing rather than interactive time-shared computing

MapReduce
[Figure: terms (Cat, Bat, Dog, Other Words; size: TByte) pass through four split and map stages, then combine and reduce stages, producing the output files part0, part1, part2]

Exercise: Count the number of occurrences of the word in the text
This is a cat. Cat sits on a roof. The roof is a tin roof. There is a tin can on the roof. Cat kicks the can. It rolls on the roof and falls on the next roof. The cat rolls too. It sits on the can.
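
The exercise can be tried on a single machine before thinking about a cluster. The following is a minimal sketch in plain Python, not Hadoop code: it imitates the split, map, combine, and reduce stages named on the MapReduce slide. All function names, the choice of four splits and three reducers, the case-insensitive word matching, and the use of a CRC32 checksum to assign words to reducers are assumptions made for illustration.

```python
import re
import zlib
from collections import Counter

# The exercise text from the slide.
TEXT = ("This is a cat. Cat sits on a roof. The roof is a tin roof. "
        "There is a tin can on the roof. Cat kicks the can. "
        "It rolls on the roof and falls on the next roof. "
        "The cat rolls too. It sits on the can.")


def split_input(text, n_splits):
    """Split: deal the sentences out round-robin into n_splits chunks."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    return [sentences[i::n_splits] for i in range(n_splits)]


def map_words(sentences):
    """Map: emit a (word, 1) pair for every word, ignoring case (assumption)."""
    for sentence in sentences:
        for word in re.findall(r"[a-z]+", sentence.lower()):
            yield word, 1


def combine(pairs):
    """Combine: pre-aggregate the pairs produced by one mapper."""
    local = Counter()
    for word, count in pairs:
        local[word] += count
    return local


def partition(word, n_reducers):
    """Pick a reducer for a word; CRC32 keeps the choice stable across runs."""
    return zlib.crc32(word.encode("utf-8")) % n_reducers


def reduce_counts(combined, n_reducers):
    """Reduce: each reducer sums the counts of the words assigned to it."""
    parts = [Counter() for _ in range(n_reducers)]
    for local in combined:
        for word, count in local.items():
            parts[partition(word, n_reducers)][word] += count
    return parts


def word_count(text, n_splits=4, n_reducers=3):
    """Run the whole pipeline and return one Counter per output part."""
    splits = split_input(text, n_splits)
    combined = [combine(map_words(chunk)) for chunk in splits]
    return reduce_counts(combined, n_reducers)


parts = word_count(TEXT)          # parts[0], parts[1], parts[2] ~ part0..part2
totals = sum(parts, Counter())    # merge the parts only to inspect the result
print(totals.most_common(3))
```

Each word lands in exactly one part, so the parts can be written out independently, which is the property that lets the reduce stage run in parallel on separate nodes.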



UB CSE 421 - The Memory
