Berkeley COMPSCI 186 - Storing Data: Disks and Files

Storing Data: Disks and Files
CS 186 Spring 2006, Lecture 3 (R&G Chapter 9)

“Yea, from the table of my memory
I’ll wipe away all trivial fond records.”
-- Shakespeare, Hamlet

The BIG Picture
[Figure: layered DBMS architecture. Labels: Queries; Query Optimization and Execution; Relational Operators; Files and Access Methods; Buffer Management; Disk Space Management; DB]

Transactions: ACID Properties
• Key concept is a transaction: a sequence of database actions (reads/writes).
  – Bracketed by “Begin Transaction” and “Commit” or “Abort”
  – (actually, first statement starts a transaction, and you end with “commit work” or “rollback work”)
• For Transactions, DBMS ensures:
  – Atomicity (all-or-nothing property) even if system crashes in the middle of a Xact.
  – Each transaction, executed completely, must take the DB between Consistent states or must not run at all.
  – Concurrent transactions appear to run in Isolation.
  – Durability of committed Xacts even if system crashes.

Disks and Files
• DBMS stores information on disks.
  – In an electronic world, disks are a mechanical anachronism!
• This has major implications for DBMS design!
  – READ: transfer data from disk to main memory (RAM).
  – WRITE: transfer data from RAM to disk.
  – Both are high-cost operations, relative to in-memory operations, so must be planned carefully!

Why Not Store It All in Main Memory?
• Costs too much. $100 will buy you either 1 GB of RAM or 150 GB of disk (EIDE/ATA) today.
  – High-end databases today are in the 10-200 TB range.
  – Approx 60% of the cost of a production system is in the disks.
• Main memory is volatile. We want data to be saved between runs. (Obviously!)
• Note, some specialized systems do store the entire database in main memory.
  – Vendors claim 10x speed up vs. traditional DBMS running in main memory.

The Storage Hierarchy
[Figure: storage hierarchy, from "Smaller, Faster" at the top to "Bigger, Slower" at the bottom. Source: Operating System Concepts, 5th Edition]
  – Main memory (RAM) for currently used data.
  – Disk for the main database (secondary storage).
  – Tapes for archiving older versions of the data (tertiary storage).
QUESTION: Why does it have to be a hierarchy?

Jim Gray’s Storage Latency Analogy: How Far Away is the Data?
(storage level: relative latency, analogous location, analogous travel time)
  Registers: 1, My Head, 1 min
  On Chip Cache: 2, This Room
  On Board Cache: 10, This Campus, 10 min
  Memory: 100, Sacramento, 1.5 hr
  Disk: 10^6, Pluto, 2 Years
  Tape / Optical Robot: 10^9, Andromeda, 2,000 Years

Disks
• Secondary storage device of choice.
• Main advantage over tapes: random access vs. sequential.
  – Also, they work. (Tapes deteriorate over time)
• Data is stored and retrieved in units called disk blocks or pages.
• Unlike RAM, time to retrieve a disk page varies depending upon location on disk.
  – Therefore, relative placement of pages on disk has major impact on DBMS performance!

Anatomy of a Disk
[Figure: disk anatomy. Labels: Spindle; Platters; Tracks; Sector; Disk head; Arm assembly; Arm movement]
• The platters spin (say, 150 rps).
• The arm assembly is moved in or out to position a head on a desired track. Tracks under heads make a cylinder (imaginary!).
• Only one head reads/writes at any one time.
• Block size is a multiple of sector size (which is fixed).
• Newer disks have several “zones”, with more data on outer tracks.

Accessing a Disk Page
• Time to access (read/write) a disk block:
  – seek time (moving arms to position disk head on track)
  – rotational delay (waiting for block to rotate under head)
  – transfer time (actually moving data to/from disk surface)
• Seek time and rotational delay dominate.
  – Seek time varies from about 1 to 20 msec
  – Rotational delay varies from 0 to 10 msec
  – Transfer time is < 1 msec per 4KB page
• Key to lower I/O cost: reduce seek/rotation delays! Hardware vs. software solutions?
• Also note: for shared disks, most time is spent waiting in queue for access to arm/controller.
[Figure: access timelines. Labels: Seek, Rotate, Transfer; Wait, Seek, Rotate, Transfer]

Arranging Pages on Disk
• `Next’ block concept:
  – blocks on same track, followed by
  – blocks on same cylinder, followed by
  – blocks on adjacent cylinder
• Blocks in a file should be arranged sequentially on disk (by `next’), to minimize seek and rotational delay.
• For a sequential scan, pre-fetching several pages at a time is a big win!
• Also, modern controllers do their own caching.

Disk Space Management
• Lowest layer of DBMS software manages space on disk (using OS file system or not?).
• Higher levels call upon this layer to:
  – allocate/de-allocate a page
  – read/write a page
• Best if a request for a sequence of pages is satisfied by pages stored sequentially on disk! Higher levels don’t need to know if/how this is done, or how free space is managed.

Administrivia Break
• Homework 1 – PostgreSQL Buffer Manager
  – Two parts: 1 individual, 1 with a group of 2
  – Individual part (no programming) due next Wed.
  – To be posted today
  – Will be discussed in Sections this Wednesday
• TAs will be running a PostgreSQL/C programming tutorial T.B.D.
  – Good if at least one of each group can go.
• Group memberships due next Wednesday
  – Assignment handout will tell how to register.
• Eugene Wu is officially on the team!
• Web page is mostly there!

Trivia Break
Last week, Seagate announced a new 2.5” disk drive for notebook PCs with up to 160GB capacity. The new capacity breakthrough they claimed was:
A. They put bits on both sides of each platter.
B. They increased the spin to 7,200 RPM.
C. They used quantum bits instead of regular ones.
D. They placed the bits perpendicular to the platter instead of flat.
E. They switched to three-valued bits instead of boring old ones and zeros.

Context
[Figure: layered DBMS architecture. Labels: Query Optimization and Execution; Relational Operators; Files and Access Methods; Buffer Management; Disk Space Management; DB]

Buffer Management in a DBMS
• Data must be in RAM for DBMS to operate on it!
• Buffer Mgr hides the fact that not all data is in RAM.
[Figure: buffer pool in main memory, above the DB on disk. Labels: Page Requests from Higher Levels; BUFFER POOL; disk page; free frame; MAIN MEMORY; DISK; DB; choice of frame dictated by replacement policy]

When a Page is Requested ...
• Buffer pool information table contains: <frame#, pageid, pin_count, dirty>
• If requested page is not in pool:
  – Choose a frame for replacement (only un-pinned pages are candidates)
  – If frame is “dirty”, write it to disk
  – Read requested page into chosen frame
• Pin the page and return its address.
• If requests can be predicted (e.g., sequential scans), pages can be pre-fetched several pages at a time!

More on
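
The page-request steps listed under "When a Page is Requested ..." can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the PostgreSQL buffer manager or the course's implementation: the names `Frame`, `BufferPool`, `request_page` and `release_page` are hypothetical, the "disk" is a plain dictionary standing in for the disk space manager, the replacement policy (first free frame, else first un-pinned frame) is a placeholder, and the unpin step in `release_page` is an assumption the slides above do not spell out.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Frame:
    """One buffer-pool slot; mirrors the <frame#, pageid, pin_count, dirty> table."""
    page_id: Optional[int] = None   # None means the frame is free
    data: Optional[bytes] = None
    pin_count: int = 0
    dirty: bool = False


class BufferPool:
    def __init__(self, num_frames: int, disk: Dict[int, bytes]):
        self.frames = [Frame() for _ in range(num_frames)]
        self.page_table: Dict[int, int] = {}   # page_id -> frame#
        self.disk = disk                        # hypothetical stand-in for the disk layer

    def _choose_victim(self) -> int:
        # Placeholder policy (an assumption): free frames first,
        # then the first un-pinned frame. Pinned frames are never candidates.
        for i, f in enumerate(self.frames):
            if f.page_id is None:
                return i
        for i, f in enumerate(self.frames):
            if f.pin_count == 0:
                return i
        raise RuntimeError("all frames are pinned")

    def request_page(self, page_id: int) -> Frame:
        if page_id in self.page_table:
            frame = self.frames[self.page_table[page_id]]
        else:
            idx = self._choose_victim()
            frame = self.frames[idx]
            if frame.page_id is not None:
                if frame.dirty:
                    # Dirty victim: write it back before reusing the frame.
                    self.disk[frame.page_id] = frame.data
                del self.page_table[frame.page_id]
            # Read the requested page into the chosen frame.
            frame.page_id = page_id
            frame.data = self.disk[page_id]
            frame.dirty = False
            self.page_table[page_id] = idx
        frame.pin_count += 1   # pin the page and hand it to the caller
        return frame

    def release_page(self, page_id: int, dirty: bool = False) -> None:
        # Assumed counterpart to pinning: the caller unpins and reports changes.
        frame = self.frames[self.page_table[page_id]]
        frame.pin_count -= 1
        frame.dirty = frame.dirty or dirty


disk = {1: b"a", 2: b"b", 3: b"c"}
pool = BufferPool(num_frames=2, disk=disk)
f1 = pool.request_page(1)
f1.data = b"A"                        # modify the page in memory
pool.release_page(1, dirty=True)      # unpin; page 1 is now a dirty candidate
f2 = pool.request_page(2)             # takes the remaining free frame
f3 = pool.request_page(3)             # evicts page 1, writing it back first
```

In this usage, page 1 is the only un-pinned page when page 3 is requested, so it is the one replaced, and its modified contents reach the dictionary "disk" before the frame is reused. A further request while both frames are pinned raises an error, which is one simple way to surface the "only un-pinned pages are candidates" rule.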

