Duke CPS 210 - I/O Buffering and Streaming

I/O Buffering and Caching

I/O accesses are reads or writes (e.g., to files). Application access is arbitrary (offset, len); the system converts these accesses to reads and writes of fixed-size blocks or pages. Blocks have an (object, logical block) identity. Blocks/pages are cached in memory:
• spatial and temporal locality
• fetch/replacement issues, just as in VM paging
• tradeoff of block size

The I/O Pipeline

Each I/O passes through several stages:
• application processing
• I/O initiation (e.g., syscall, driver, etc.)
• I/O access request latency (e.g., disk seek, network)
• block transfer (disk surface, bus, network)
• I/O completion overhead (e.g., block copy)

Effective Bandwidth

Call the fixed per-transfer overhead the gap g. Define G to be the transfer time per byte (bandwidth = 1/G). If the block size is B bytes, the transfer time is BG, so each transfer takes g + BG in total. What's the effective bandwidth (throughput)? It is B/(g + BG). For example, with, say, g = 10 µs of overhead and G = 1/132 µs per byte (132 MB/s peak), a B = 4 KB transfer takes about 10 + 31 = 41 µs, for an effective bandwidth of roughly 100 MB/s, about 76% of peak.

Impact of Transfer Size

[Figure: effective bandwidth B/(g + BG) plotted against transfer size B, one curve per (g, G) pair, where g = overhead (µs) and G = inverse bandwidth. For these curves, G matches 32-bit 33 MHz PCI and the Myrinet LANai-4 link speed (132 MB/s).]

Bubbles in the I/O Pipeline

When the CPU and the I/O units are both underutilized, latency is critical for throughput: there are "bubbles" in the pipeline. How can we overlap activity on the CPU and I/O units?
• Multiprogramming is one way. But what if there is only one task?
Goal: keep all units fully utilized to improve throughput, i.e., hide the latency.

Prefetching or Streaming: Prediction

How can the system predict which blocks to fetch next?
• Compiler-driven: compile-time information about loop nests, etc.
• Markov prediction: "learn" repeated patterns as the program executes.
• Pre-execution: execute the program speculatively, and watch its accesses.
• Query optimization or I/O-efficient algorithms: "choreograph" I/O accesses for a complex operation.
How can application-level hints reach the kernel? Hinting or asynchronous I/O.

Readahead

As the app requests blocks n and n+1, the system prefetches blocks n+2 and n+3. Readahead: the system predictively issues I/Os in advance of need. This may use low-level asynchrony or create threads to issue the I/Os and wait for them to complete (e.g., RPC-based file systems such as NFS). A sketch of application-driven readahead appears at the end of this part.

Prefetching and Streaming I/O: Examples

• Parallel disks: prefetching hides the latency of arm movement.
• Network data fetch (e.g., network memory, fetching from a server cache): prefetching hides the latency of request propagation.

Prefetching and I/O Scheduling

Asynchronous I/O or prefetching can expose more information to the I/O system, which may allow it to schedule accesses more efficiently, e.g., read one large block with a single seek/rotation.

The I/O Pipeline and I/O Overhead

[Figure: a network data fetch that is bandwidth-limited becomes CPU-limited with a faster network.] In this example, overhead rather than latency is the bottleneck for I/O throughput. How important is it to reduce I/O overhead as I/O devices get faster?
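The readahead idea above can also be driven from user level. Below is a minimal sketch, assuming a POSIX system that supports posix_fadvise (e.g., Linux): while the application processes block n, it hints to the kernel that block n+1 will be needed soon (POSIX_FADV_WILLNEED), so the next disk transfer can overlap with the current computation. BLOCK_SIZE and the empty process() body are placeholder assumptions, not part of the original notes.

/* readahead.c: overlap I/O with computation via kernel hints */
#define _POSIX_C_SOURCE 200112L
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define BLOCK_SIZE 65536   /* plays the role of B: amortizes the gap g */

static void process(const char *buf, ssize_t n) {
    /* placeholder for application processing of one block */
    (void)buf; (void)n;
}

int main(int argc, char **argv) {
    if (argc != 2) { fprintf(stderr, "usage: %s file\n", argv[0]); return 1; }
    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    char *buf = malloc(BLOCK_SIZE);
    off_t off = 0;
    ssize_t n;
    while ((n = pread(fd, buf, BLOCK_SIZE, off)) > 0) {
        /* Hint: block n+1 will be needed soon. The kernel may start
         * that disk I/O now, so it overlaps with process() below. */
        (void)posix_fadvise(fd, off + n, BLOCK_SIZE, POSIX_FADV_WILLNEED);
        process(buf, n);           /* CPU work overlaps the prefetch */
        off += n;
    }
    free(buf);
    close(fd);
    return 0;
}

Note that BLOCK_SIZE should be large enough to amortize the per-transfer gap g, but prefetching more deeply costs buffer memory, which is exactly the tradeoff the next section raises.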
Can Prefetching Hurt Performance?

Prefetching "trades bandwidth for latency".
• Need some bandwidth to trade...
• Mispredictions impose a cost.
How deeply should we prefetch?
• Prefetching requires memory for the prefetch buffer.
• Must prefetch deeply enough to absorb bursts.
• How much do I need to avoid stalls?
• Fixed-depth vs. variable-depth ("forestall") policies.

File Block Buffer Cache

Buffers with valid data are retained in memory in a buffer cache or file cache. Each item in the cache is a buffer header pointing at a buffer. Buffers are located by HASH(vnode, logical block); blocks from different files may be intermingled in the hash chains. System data structures hold pointers to buffers only when I/O is pending or imminent:
• a busy bit instead of a refcount
• most buffers are "free"
Most systems use a pool of buffers in kernel memory as a staging area for memory<->disk transfers. (A toy sketch of such a cache appears at the end of these notes.)

Why Are File Caches Effective?

1. Locality of reference: storage accesses come in clumps.
• Spatial locality: if a process accesses data in block B, it is likely to reference other nearby data soon (e.g., the remainder of block B). Example: reading or writing a file one byte at a time.
• Temporal locality: recently accessed data is likely to be used again.
2. Read-ahead: if we can predict which blocks will be needed soon, we can prefetch them into the cache.
• Most files are accessed sequentially.

I/O Caching vs. Memory Caches

What's different from paging?
• Associativity.
• Software to track references (but no need to sample to track them).
• Variable-cost backing storage (e.g., rotational).
Also: access properties are different.

I/O Block Caching: When, What, Where?

Question: should I/O caching (the page/block cache) be the responsibility of the kernel? ...or can/should we push it up to the application level? (Be sure you understand the tradeoffs.)

Replacement

• What's the right cache replacement policy for sequentially accessed files?
• How is replacement different from virtual memory page cache management?
• How to control the impact of deep prefetching on the cache?
• Integrated caching and prefetching.

Handling Updates in the File Cache

1. Blocks may be modified in memory once they have been brought into the cache. Modified blocks are dirty and must (eventually) be written back.
2. Once a block is modified in memory, the write back to disk may not be immediate (synchronous).
• Delayed writes absorb many small updates with one disk write. How long should the system hold dirty data in memory?
• Asynchronous writes allow overlapping of computation and disk update activity (write-behind): do the write call for block n+1 while the transfer of block n is in progress.
Thus file caches can also improve performance for writes.

Write-Behind

[Figure: the application issues the write for block n+1 while the transfer of block n is still in progress.] This is write-behind. Prediction? Performance? Memory cost? Reliability?

Delayed Writes

[Figure: several small writes to block N (byte ranges i, j, k) are absorbed in memory and written back with a single disk write of block N.] This is a delayed-write strategy. Prediction? Performance? Memory cost? Reliability?

Write Batching/Gathering

[Figure: writes to blocks N, N+1, and N+2 are gathered and written back as a single transfer of blocks N to N+2.] This combines delayed write and write-behind. Prediction? Performance? Memory cost? Reliability?

Exploiting Asynchrony in Writes

Advantages:
• Absorb multiple writes to the same block.
• Batch consecutive writes into a single, larger disk transfer.
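To make the buffer-cache and delayed-write ideas concrete, here is a toy user-level sketch in C. It is hypothetical, not any real kernel's code: buffers are found by hashing (vnode, logical block), a write merely dirties the in-memory copy, and a later sync pass writes dirty buffers back, so several small updates to block N are absorbed by one write-back, as in the delayed-write figure above. All names (getblk, write_block, sync_cache) and sizes are illustrative.

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define NHASH     64
#define BLOCKSIZE 8192

struct buf {
    struct buf *hash_next;   /* hash chain: blocks from different files
                                may be intermingled here */
    int         vnode;       /* which file object (simplified id) */
    long        lblkno;      /* logical block number within the file */
    bool        busy;        /* I/O pending or imminent */
    bool        dirty;       /* modified; must (eventually) be written back */
    char        data[BLOCKSIZE];
};

static struct buf *hashtab[NHASH];

static unsigned hash(int vnode, long lblkno) {
    return ((unsigned)vnode * 31u + (unsigned)lblkno) % NHASH;
}

/* Look up a cached block; NULL means a miss (a real system would then
 * allocate a free buffer and schedule a disk read). */
static struct buf *getblk(int vnode, long lblkno) {
    for (struct buf *b = hashtab[hash(vnode, lblkno)]; b; b = b->hash_next)
        if (b->vnode == vnode && b->lblkno == lblkno)
            return b;
    return NULL;
}

/* Delayed write: absorb the update in memory and mark the buffer dirty;
 * no disk I/O happens here. */
static void write_block(int vnode, long lblkno,
                        const char *src, size_t len, size_t off) {
    struct buf *b = getblk(vnode, lblkno);
    if (!b) return;                    /* miss handling elided */
    memcpy(b->data + off, src, len);
    b->dirty = true;
}

/* Periodic sync: write dirty buffers back. A real system would sort by
 * disk address and coalesce runs of consecutive blocks into one large
 * transfer (write gathering). */
static void sync_cache(void) {
    for (int i = 0; i < NHASH; i++)
        for (struct buf *b = hashtab[i]; b; b = b->hash_next)
            if (b->dirty && !b->busy) {
                printf("write-back: vnode %d, block %ld\n",
                       b->vnode, b->lblkno);
                b->dirty = false;
            }
}

int main(void) {
    static struct buf b0 = { .vnode = 1, .lblkno = 0 };
    hashtab[hash(1, 0)] = &b0;         /* pretend the block was read in */

    write_block(1, 0, "hello", 5, 0);  /* three small updates ...       */
    write_block(1, 0, "world", 5, 5);  /* ... absorbed in memory ...    */
    write_block(1, 0, "!", 1, 10);
    sync_cache();                      /* ... and one disk write later  */
    return 0;
}

A fuller version would add a free list with LRU ordering for replacement, the busy bit protocol for pending I/O, and write-behind (issuing the write-back for block n+1 while block n is still transferring), trading reliability for performance as the questions above suggest.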

