FAWN: A Fast Array of Wimpy Nodes
David G. Andersen, Jason Franklin, Michael Kaminsky, Amar Phanishayee, Lawrence Tan, Vijay Vasudevan
22nd ACM Symposium on Operating Systems Principles (SOSP) – October 2009

Projected Electricity Usage
(figure: projected data-center electricity usage)
Source: Report to Congress on Server and Data Center Energy Efficiency – August 2007

Distributed Key-Value Stores
• Dynamo
• Cassandra
• Voldemort

(figure: power-draw comparison – 200 Watts, 410 Watts, 200 Watts)

FAWN-KV Store
(diagram: front-ends behind a switch routing Store / Lookup / Delete requests to back-end nodes A–F)

Flash
• Fast random access
  – Optimized for random reads
• Slow random writes
  – Sequential writes using an append-only log

FAWN-DS
(diagram: an in-memory hash index, where the low-order index bits of a key's hash select a bucket holding a key fragment, a valid bit, and an offset into an append-only on-disk data log; a code sketch appears at the end of the FAWN notes)

FAWN – Replication
• R = 3: each key range is replicated on a chain of three nodes (e.g., among A–F)

FAWN – Join Protocol
(diagrams: node C joins the chain B → D → E between head and tail, in two phases labeled "pre-copy" and "log flush"; a sketch follows at the end of the FAWN notes)

FAWN Cluster
(photo of the prototype cluster)
Source: http://www.cs.cmu.edu/~fawnproj

Throughput
• 1100–1700 QPS per node
• 21-node FAWN cluster, 20 GB of data, no front-end cache
Source: http://www.sigops.org/sosp/sosp09/papers/andersen-sosp09.pdf

Performance and Power (256-byte lookups)
System / Storage          QPS    Watts   Queries/Joule
Alix3c2 / Sandisk (CF)    1298    3.75   346
Desktop / Mobi (SSD)      4289   83      51.7
Desktop / HD               171   87       1.96
(Queries/Joule = QPS ÷ Watts; e.g., 1298 / 3.75 ≈ 346)

FAWN vs. Traditional Servers
Total Cost = Capital Cost + 3-Year Power Cost (at $0.10/kWh)
(figure comparing total cost across configurations)
Source: http://www.sigops.org/sosp/sosp09/slides/andersen-slides-sosp09.pdf

Discussion
• Rethink Hadoop/Dryad for FAWN
  – Read as key-value pairs in place of bulk reads
• Low-power processor vs. SSD savings
  – CPU-intensive workloads
• From RAID to FAWN
  – I/O-bound drives, memory wall
  – Flash arrays, limited power
• Log-based store
  – More efficient for frequent reads

Questions

SSD vs. HDD
                         Sandisk 5000    WD2500 HDD (250 GB, 7200 RPM)
Access time              0.1 ms          13.4 ms
Sequential read rate     28.5 MB/s
Sequential write rate    24 MB/s
Random read IOPS         1424
Random write IOPS        125
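The FAWN-DS diagram compresses a lot: index bits select a bucket, and the bucket stores a key fragment, a valid bit, and an offset into the append-only data log. Below is a minimal Python sketch of that lookup path. The bucket count, fragment width, and log record format are illustrative assumptions, not the paper's values, and real FAWN-DS handles bucket collisions rather than keeping one slot per bucket as this sketch does.

```python
import hashlib
import os

class FawnDS:
    """Minimal sketch of a FAWN-DS-style datastore (illustrative, not the
    authors' code): a small in-memory hash index maps the low-order index
    bits of a key's hash to (key fragment, valid bit, log offset); values
    live in an append-only on-disk log, so all writes are sequential."""

    INDEX_BITS = 3       # 2**3 = 8 buckets; the diagram shows a 3-bit index
    FRAGMENT_BITS = 15   # key-fragment width kept in the index (assumed)

    def __init__(self, path):
        self.log = open(path, "ab+")              # append-only data log
        self.index = [None] * (1 << self.INDEX_BITS)

    def _hash(self, key):
        return int.from_bytes(hashlib.sha1(key).digest()[:8], "big")

    def put(self, key, value):
        self.log.seek(0, os.SEEK_END)
        offset = self.log.tell()
        # log entry: key length, key, value length, value (appended only)
        self.log.write(len(key).to_bytes(2, "big") + key +
                       len(value).to_bytes(4, "big") + value)
        self.log.flush()
        h = self._hash(key)
        bucket = h & ((1 << self.INDEX_BITS) - 1)
        fragment = (h >> self.INDEX_BITS) & ((1 << self.FRAGMENT_BITS) - 1)
        # a delete would instead clear the valid bit and append a tombstone
        self.index[bucket] = (fragment, True, offset)

    def get(self, key):
        h = self._hash(key)
        slot = self.index[h & ((1 << self.INDEX_BITS) - 1)]
        if slot is None:
            return None
        fragment, valid, offset = slot
        want = (h >> self.INDEX_BITS) & ((1 << self.FRAGMENT_BITS) - 1)
        if not valid or fragment != want:
            return None          # fragment filter avoids a wasted disk read
        self.log.seek(offset)    # one random read into the data log
        klen = int.from_bytes(self.log.read(2), "big")
        stored_key = self.log.read(klen)
        vlen = int.from_bytes(self.log.read(4), "big")
        value = self.log.read(vlen)
        return value if stored_key == key else None  # fragment collision
```

The fragment check is the point of the design: the full key stays on flash, so the in-memory index stays tiny, and only a matching fragment pays for a flash read.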
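The join-protocol diagrams carry only the phase labels "pre-copy" and "log flush". The toy simulation below is one assumption-laden reading of those labels (node and method names are invented, not the paper's): copy the predecessor's data while it keeps serving writes, log writes that arrive in the meantime, replay that log, then splice the new node into the chain.

```python
class Node:
    """Toy chain-replication node for the join sketch (illustrative only)."""
    def __init__(self, name):
        self.name = name
        self.store = {}        # replicated key-value data
        self.update_log = []   # puts accepted while a join is in progress

def join(chain, new_node, after):
    pred = chain[after]
    # Phase 1: "pre-copy" - stream the predecessor's current data to the
    # joining node while the predecessor keeps serving requests.
    new_node.store = dict(pred.store)
    # A put arriving mid-join is applied normally and also logged for the joiner.
    pred.store["k2"] = "v2"
    pred.update_log.append(("k2", "v2"))
    # Phase 2: "log flush" - replay the interim updates, then splice the new
    # node into the chain so writes flow head -> ... -> tail through it.
    for k, v in pred.update_log:
        new_node.store[k] = v
    pred.update_log.clear()
    chain.insert(after + 1, new_node)

chain = [Node("B"), Node("D"), Node("E")]   # head = B, tail = E, as in the diagram
chain[0].store["k1"] = "v1"
join(chain, Node("C"), after=1)             # C joins after D
print([n.name for n in chain])              # ['B', 'D', 'C', 'E']
```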
HAYSTACK
Peter Vajgel, Doug Beaver, and Jason Sobel
Presented by: Rini Kaushik

Facebook Photo Storage Needs
• 15 billion photos, each stored as 4 images → 60 billion images, 1.5 PB (10^15 bytes)
• Growth rate: 220 million new photos/week → 25 TB of new storage per week
• Bandwidth requirements: 550,000 photos/second; assuming an average photo size of 1 MB, that is 550 GB/sec of bandwidth
• 300 million users currently; 1.3 billion people with quality internet → 4× growth possible
• Two workloads
  – Profile pictures: heavy access, smaller size
  – Photos: intermittent access, heavy in the beginning, periodic afterwards

Motivation for Haystack
• Build a specialized store for the photo workload
• Highly scalable to meet growing storage needs
• High disk bandwidth
• Reduce metadata disk IO
• Reduce Content Distribution Network (CDN) reliance
• Build from commodity servers as opposed to expensive NetApp filers ($2 million each)
• Simple key-value lookup of photos; no need for POSIX

Enter Haystack
• Generic object store
• Several photos ("needles") combined into a large 10 GB appendable file (a "haystack")
• Index file per haystack for determining needle offsets (a code sketch of this layout appears after the discussion section)

Then and Now
(diagram comparing the earlier NAS-based design with Haystack)

Storage Challenges before Haystack
• Photos stored in traditional NetApp NFS filers (network-attached storage, NAS)
• Metadata too huge to be cached
  – POSIX compliance resulted in more metadata per file
  – Each image is a file → 60 billion files → 15 TB of metadata (256-byte inodes)
• 10 disk IOs (3 with a lookup cache) per file just for metadata
  – Drastically reduces disk throughput
• No direct path from client to storage → limited bandwidth
• Result: relied heavily on CDNs to cache data to meet goals (99.98% hit rate for profile pictures, 92% for photos); the NAS served more as a backup
• Inefficient and expensive

Haystack Object
(diagram: a haystack file of needles, each with a header and footer, plus a separate per-haystack index file)

Advantages
• Reduced disk IO → higher disk throughput
  – 1 MB of in-memory secondary metadata for every 1 GB on disk; 10 TB per node → 10 GB of metadata, easily cacheable
• Simpler metadata → easier lookups
  – Not POSIX compliant
  – No directory structures or file names; a 64-bit ID instead of file names
• Single photo serving and storage layer
  – Direct IO path between client and storage → higher bandwidth
• Less metadata for XFS

Haystack Infrastructure
• Photo Store Server
  – Accepts HTTP requests
  – Performs Haystack store operations
  – Maintains the in-memory haystack index
• Haystack Object Store
• Filesystem
  – Extent-based XFS
• Storage
  – 2 × quad-core CPUs, 16–32 GB memory
  – Hardware RAID controller with 256–512 MB of NVRAM cache
  – 12+ 1 TB SATA drives

Operations (a read-path sketch appears after the discussion section)
• Upload
  – Photo assigned a 64-bit ID and scaled into 4 image sizes
  – The <profileID, photo key> → pvolID (volume ID) mapping is stored in a MySQL DB
  – The pvolID identifies the volume container of a haystack
• Read
  – Input: profileID, photo key, size, and cookie
  – Output: needle data
• Write/Modify
  – Selects a haystack to store the photo and updates the in-memory index
  – A modify results in a new version at a higher offset
• Delete

Existing Limitations / Discussion
• Ad-hoc allocation of photos to haystacks
  – If photos in the same album are added at different times by the same user, it would be good if they were placed sequentially or close together for better data locality
• No support for in-place delete/overwrite
  – May lead to many unnecessary versions and much dead data, hence reduced storage efficiency
  – Compaction seems expensive, as it involves creating a new copy of the haystack (see the sketch below); LFS has a much more sophisticated cleaning mechanism
• What happens if requests arrive at the same time?
• If a file is updated, is it guaranteed to be placed in the same haystack or a separate one? If the old haystack is already full, how will the version check work? How will the older versions get identified and deleted?
• Assumes just one disk read per photo: if XFS doesn't have the extent information in its cache, there will be an extra lookup for the file
• Once a haystack grows bigger than the largest extent size supported by XFS, extra lookups may be necessary if a needle is split across extents
• It would be good to have an abstraction at the album level as well, to reduce lookup overhead
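As a concrete reading of the needle/index design above, here is a minimal Python sketch. The header layout (key, cookie, length) and field widths are assumptions for illustration, not the paper's exact on-disk format; the point is that the in-memory index turns a photo key into one seek plus one sequential read.

```python
import os
import struct

# Assumed needle header: 64-bit photo key, 64-bit cookie, 32-bit data length.
NEEDLE_HDR = struct.Struct(">QQI")

class Haystack:
    """Minimal sketch of one haystack volume: photos ("needles") are
    appended to a single large file, and a small in-memory index maps
    photo key -> (offset, size)."""

    def __init__(self, path):
        self.f = open(path, "ab+")
        self.index = {}                     # photo key -> (offset, size)

    def write(self, key, cookie, data):
        self.f.seek(0, os.SEEK_END)
        offset = self.f.tell()
        self.f.write(NEEDLE_HDR.pack(key, cookie, len(data)) + data)
        self.f.flush()
        # Last write wins: a modified photo becomes a new needle at a
        # higher offset, and the index points at the newest version.
        self.index[key] = (offset, len(data))

    def read(self, key, cookie):
        entry = self.index.get(key)
        if entry is None:
            return None                     # unknown or deleted photo
        offset, size = entry
        self.f.seek(offset)
        hdr = NEEDLE_HDR.unpack(self.f.read(NEEDLE_HDR.size))
        if hdr[1] != cookie:
            return None                     # cookie guards against URL guessing
        return self.f.read(size)            # the single disk read per photo

    def delete(self, key):
        # Delete only drops the index entry; the needle's bytes stay on
        # disk until compaction copies live needles into a new haystack.
        self.index.pop(key, None)
```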
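A sketch of the read path described under Operations, reusing the Haystack class from the previous sketch; the dict standing in for the MySQL <profileID, photo key> → pvolID table, and all names here, are illustrative assumptions.

```python
volumes = {}          # pvolID -> Haystack volume (one per volume container)
photo_to_volume = {}  # (profileID, photo key) -> pvolID; stands in for MySQL

def read_photo(profile_id, photo_key, cookie):
    # Step 1: consult the (profileID, photo key) -> pvolID mapping ("MySQL").
    pvol_id = photo_to_volume.get((profile_id, photo_key))
    if pvol_id is None:
        return None
    # Step 2: the pvolID names the volume container; its in-memory index
    # turns the photo key into a single disk read for the needle data.
    return volumes[pvol_id].read(photo_key, cookie)
```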
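Finally, a sketch of the compaction step the discussion flags as expensive, again reusing the Haystack class and assumed header format from above: every live needle is re-read and re-written into a fresh file, so the cost scales with the volume of live data, not with the amount of garbage reclaimed.

```python
def compact(old, new_path):
    """Copy only live needles (those still in the in-memory index) into a
    fresh haystack; deleted photos and superseded versions are left behind."""
    new = Haystack(new_path)
    for key, (offset, size) in old.index.items():
        old.f.seek(offset)
        _, cookie, _ = NEEDLE_HDR.unpack(old.f.read(NEEDLE_HDR.size))
        new.write(key, cookie, old.f.read(size))
    return new
```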