SDSU CS 696 - DORA


DORA: Exploring a Dynamic File Assignment Strategy with Replication

Jonathan Tjioe, Renata Widjaja, and Abraham Lee
Computer Science Department
San Diego State University
San Diego, CA 92182

1. Introduction

The problem of managing and distributing files to maximize disk performance has been widely studied [1][2][3][4][5]. Several effective static algorithms address this problem, among them the static round robin (SOR) algorithm. SOR has been shown to produce better response times than other static algorithms such as Greedy, Sort Partition (SP), and Hybrid Partition (HP) [1]. SOR is unique among the static algorithms because it provides considerable performance improvements even when the usual workload assumption, namely that file size and popularity are inversely correlated (small files are more popular than large files), does not hold [1]. However, as its name states, SOR is a static algorithm, and its usefulness is limited by the assumption that the files and their access patterns do not change over time. In reality, that assumption does not hold in all cases.

We therefore propose a new dynamic algorithm called dynamic round robin with replication (DORA). The main characteristic of DORA is that it takes the dynamic nature of file and data access patterns into account so that it can adapt to changing user demand. In addition, DORA uses file replication to further improve response time and file throughput; both features are explained in the project details section.

This proposal is organized as follows. Section 1 provides a brief introduction. Section 2 details the factors motivating this research. Section 3 summarizes the project from a high-level perspective. Section 4.1 describes the overall architecture and implementation details of DORA. Section 4.2 discusses several challenges that must be addressed when implementing DORA with replication. Sections 4.3 and 4.4 outline specific deliverables and the schedule of the project. Finally, Section 5 reiterates the functionality of DORA and how it compares to other dynamic algorithms such as cool vanilla (C-V) [11] and simple cost minimization (CM) [11].

2. Motivation

End users have come to expect fast response times. As distributed applications and web pages grow increasingly bandwidth-intensive, considerable research has gone into methods for providing near-instantaneous responses to the impatient end user. The physical disk is often the bottleneck in responding to users' requests in a timely manner. As a result, much of today's research centers on efficient file assignment and disk scheduling. One example is the RAID architecture, which uses data striping, data replication, and data mirroring to achieve high data throughput and high data reliability [6][7]. Substantial research has also been done on reducing the disk head latencies associated with moving the head to the physical location of the data on disk [2][3].

SOR, for example, is a successful static file management algorithm that first sorts the files according to their size and then allocates them to homogeneous disks in a round robin manner, such that the popularity (heat) of the files is distributed equally across the disks [1]. Several published dynamic algorithms, such as C-V and CM, have also made significant contributions to this research field. C-V and CM both work by constantly monitoring the heat imbalance between disks in order to keep heat evenly distributed across them. In addition, the CM algorithm implements methods for finding an optimized location for each newly created file, so that the cost of later file reorganization is minimized [11].

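
The sort-then-round-robin step described for SOR, and the per-disk heat that dynamic algorithms monitor, can be sketched as follows. This is a minimal illustration in Python, not the authors' implementation. The names FileInfo, sor_assign, disk_heats, and heat_imbalance are hypothetical, the ascending sort order is an assumption (the text only says files are sorted by size), and measuring imbalance as the gap between the hottest and coolest disk is a stand-in for whatever criteria C-V and CM actually use.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FileInfo:
    """Hypothetical file record; units are illustrative assumptions."""
    name: str
    size: float  # file size, e.g. in MB
    heat: float  # popularity of the file, e.g. accesses per second


def sor_assign(files: List[FileInfo], num_disks: int) -> List[List[FileInfo]]:
    """Sort files by size, then deal them out to disks in round robin order.

    Ascending order is an assumption made for this sketch.
    """
    if num_disks < 1:
        raise ValueError("num_disks must be at least 1")
    disks: List[List[FileInfo]] = [[] for _ in range(num_disks)]
    for i, f in enumerate(sorted(files, key=lambda f: f.size)):
        disks[i % num_disks].append(f)
    return disks


def disk_heats(disks: List[List[FileInfo]]) -> List[float]:
    """Total heat per disk: the sum of the heats of the files stored on it."""
    return [sum(f.heat for f in d) for d in disks]


def heat_imbalance(disks: List[List[FileInfo]]) -> float:
    """Gap between the hottest and coolest disk (an assumed imbalance measure)."""
    heats = disk_heats(disks)
    return max(heats) - min(heats)


if __name__ == "__main__":
    # Six files whose heat falls as size grows (the usual workload assumption).
    files = [FileInfo(f"f{i}", size=10 * i, heat=70 - 10 * i) for i in range(1, 7)]
    disks = sor_assign(files, num_disks=3)
    print([[f.name for f in d] for d in disks])
    print(disk_heats(disks))      # [90, 70, 50]
    print(heat_imbalance(disks))  # 40
```

A static assignment computes this placement once. A dynamic scheme would re-evaluate something like heat_imbalance as file heat changes and move files when the gap grows too large.
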

The DORA algorithm advances research on data management algorithms by assigning files dynamically while still providing fast access to them. Consider a web-server application such as Amazon.com, where prospective buyers search for a book title and the server responds within seconds. Sellers can add new products, which are stored in the database, or modify an existing posting, thereby changing the contents and ultimately the size of the file. Such scenarios assume that algorithms are in place to manage the thousands of files in such a way that users receive fast responses to the requests they issue to the server.

In a dynamic scenario like this one, static algorithms share a common drawback: they do not take into account the changing popularity (heat) of files. The advantage of DORA over static algorithms is therefore its ability to adapt dynamically to the heat of files by constantly monitoring the files and periodically rearranging them in such a way that file throughput is maximized without compromising response time. Additionally, DORA will be designed to handle file reorganization while the system remains online and continues to serve users' requests.

3. Project Summary

The following techniques will be implemented in DORA to improve response time in a dynamic environment: file replication [10], file-size variance minimization [5], and garbage collection for files. In addition, simulations will be used to corroborate that performance can be improved by considering the heat variance of files and by replicating hot files. A performance comparison with other dynamic algorithms such as C-V and CM will also be shown over varying workloads to demonstrate the robustness of the DORA algorithm. The simulation results will show that DORA is able to improve response time regardless of file sizes or the changing heat of files.

4. Project Details

4.1 Architecture and Environment

MATLAB will be used for all simulations in this project. For benchmark comparison, the dynamic algorithms C-V and CM will be simulated alongside DORA. Ten homogeneous physical hard disks will be used in the simulation; however, they will not be in a RAID configuration. Non-partitioned files of varying sizes, with loads corresponding to variable heat distributions, will be used to show convincingly that DORA performs significantly better than other dynamic algorithms by adding replication for

