HARVARD CS 263 - Modern Distributed Systems

CS263 - Modern Distributed Systems
September 16, 2003
Matt Welsh, Harvard University
www.eecs.harvard.edu/~mdw/course/cs263

Introduction
What is this course about?
  • Modern renaissance in “systems research”
  • “Systems” is no longer about the kernel running on your desktop
  • Networking vast numbers of computers together yields a new kind of systems science
Harness large amounts of resources for novel applications
  • Enormous clusters as scalable Internet services (e.g., Google)
  • The Grid: Global network of supercomputers allowing a user virtually anywhere to submit jobs to run on them
Napster/Gnutella/KaZaa - pool resources to “share” music
  • Napster spawned a new research field
  • “Peer-to-peer computing”
  • Try to build legitimate applications on this model
Embedding networked systems in the physical world
  • Sensor networks allow us to monitor and compute over real physical space at high resolution

The goals of this class
Survey the brave new world of systems research
Read a bunch of papers on exciting new topics
Experiment with sensor networks
Do a research project on your favorite topic
Hopefully publish a paper on your work

Other Systems Classes this Term
CS261 - Advanced Operating Systems (M. Seltzer)
  • Focus on historical context and “operating systems”
  • Here we focus on networked, scalable, decentralized systems
CS246 - Power-Aware Computing (D. Brooks)
  • Really “Advanced Computer Architecture”
  • Both hardware and software approaches to energy management
CS222 - Algorithms at the End of the Wire (M. Mitzenmacher)
  • An algorithms class, but focuses on great ideas that one can apply to real systems

Running themes
Massive scale
  • Internet consists of hundreds of millions of nodes
  • Potential number of users is incredibly large
  • CNN.com on Sept 11, 2001: 30,000 hits a second
Self-organization and decentralization
  • No central authority managing, organizing, or deploying the system
  • e.g., Gnutella nodes discover each other through broadcasting advertisements
  • Any part of the network can be taken down and the rest will survive
Robustness and fault tolerance
  • Novel systems are not deployed on well-maintained, well-configured hardware in an “engineered environment”
  • Systems must tolerate unprecedented degrees of heterogeneity and rates of failure

Large-scale Internet Services
Complex, multi-tiered, clustered systems
  • Front-end web server pool
  • Cluster of “middle tier” application logic servers
  • Back-end database
Great deal of work on scalability, performance, and reliability
  • Load balancing across nodes to avoid bottlenecks
  • Rapid failover in case of node failure
  • Novel directions in data storage and retrieval

Overload in the Internet
Overload is an inevitable aspect of systems connected to the Internet
  • (Approximately) infinite user populations
  • High correlation of user demand (e.g., flash crowds)
  • Peak load can be orders of magnitude greater than average
Modern Internet services are highly dynamic
  • Web servers do much more than serve up static pages
  • e.g., server-side scripts (CGI, PHP), SSL, database access
  • Requests have highly unpredictable CPU, memory, and I/O demands
  • Makes overload very difficult to predict and manage

Massive Overload is Sudden and Unpredictable
September 11 - unprecedented web traffic
  • CNN: 20x over expected peak - 30,000+ hits/sec
  • Grew server farm by 5x by scrounging machines, but still no service for 2.5 hours
USGS site load after M7.1 earthquake
  • 3 orders of magnitude increase in 10 minutes, disk log filled up
[Figure: USGS Web server load; hits per second vs. time of day]

Paper Topics
Clustered search engines
  • Inktomi and Google
  • “Lessons from massive scale services”
Storage systems
  • Distributed Data Structures - novel approach to cluster-based storage
  • Porcupine - A clustered, replicated e-mail server
Managing concurrency and load
  • SEDA - Event-driven server design for managing massive concurrency
  • Capriccio - lightweight threads to simplify server design
Resource management in clustered systems
  • Allocation of resources to different competing apps on a cluster
  • Develop models of application resource requirements and load

Peer-to-peer computing and the Grid
Huge, decentralized systems formed out of heterogeneous machines scattered across the Internet
  • Napster and Gnutella among the first examples
  • Lots of recent research to build useful systems on this paradigm
Distributed, massively replicated data storage
  • Access your data from anywhere, efficiently, and robustly
  • Replicate data across many nodes on the Internet, encode and encrypt to ensure security
  • Question - Are wide-area distributed filesystems really compelling?
“The Grid”
  • Pool the CPU/memory/disk resources of the world’s supercomputers
  • Allow anyone to get access to vast amounts of computing power
Distributed Hash Tables
  • All data represented as (key, value) pairs
  • Each node in the system associated with a range of keys
  • Route messages to the node with the appropriate key

Chord approach
  • Nodes associated with an address between 0 ... 2^N
  • Data stored on node with nearest address
  • Nodes maintain set of “fingers” to other nodes
  • Route request to nearest predecessor to requested key
[Figure: Chord finger diagram; labels 1/2, 1/4, 1/16, 1/32, 1/64, 1/128]

Research Challenges
Data caching and replication
  • Replicate data to improve resilience to node failure
  • Cache frequently requested data items along lookup path
Security and resilience to attacks
  • e.g., “Sybil attack” - attacker owns enough nodes in keyspace to foil lookups
Locality and performance
  • Reduce number of hops for each lookup, avoid long-distance hops
[Figure: wide-area sites (CA-T1, CCI, Aros, Utah, CMU, vu.nl, Lulea.se, MIT, MA-Cable, Cisco, Cornell, NYU, OR-DSL) and Chord nodes N20, N40, N41, N80]

Paper Topics
P2P storage systems
  • Focus on complete, vertical systems: Pond and PAST
Indexing and search in P2P systems
  • Going beyond just lookup of documents with keys
  • PIER: Run a database on top of a P2P DHT!
Fun and funky P2P Applications
  • SplitStream: High-bandwidth content distribution
  • Palimpsest: Soft-capacity, best-effort storage for wide area
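
The Chord slide earlier in these notes (addresses on a ring, "fingers", routing to the nearest predecessor of a key) can be made concrete with a small simulation. The sketch below is illustrative only and is not taken from the lecture: it assumes a 7-bit identifier space, assumes a key is owned by the first node at or after it on the ring (the successor convention from the Chord paper, which may differ from the "nearest address" wording on the slide), and builds every finger table from global knowledge rather than through a join protocol. The names Ring, ident, in_interval, and lookup are hypothetical.

```python
import hashlib
from bisect import bisect_left

M = 7            # identifier bits (assumed for illustration)
RING = 2 ** M    # the ring has 2**M addresses


def ident(name: str) -> int:
    """Hash an arbitrary name onto the ring (SHA-1 reduced mod 2**M)."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big") % RING


def in_interval(x: int, a: int, b: int) -> bool:
    """True if x lies in the ring interval (a, b], wrapping past zero."""
    if a < b:
        return a < x <= b
    return x > a or x <= b  # wrapped interval; a == b covers the whole ring


class Ring:
    """A static Chord-like ring whose finger tables are built centrally."""

    def __init__(self, node_ids):
        self.nodes = sorted(set(node_ids))
        # Finger i of node n points at the owner of address n + 2**i.
        self.fingers = {
            n: [self.successor((n + 2 ** i) % RING) for i in range(M)]
            for n in self.nodes
        }

    def successor(self, key: int) -> int:
        """First node at or after key, wrapping around the ring."""
        i = bisect_left(self.nodes, key)
        return self.nodes[i % len(self.nodes)]

    def lookup(self, start: int, key: int):
        """Route from node `start` to the owner of `key`.

        Returns (owner, path), where path lists every node visited.
        """
        node, path = start, [start]
        while True:
            succ = self.fingers[node][0]  # immediate successor of node
            if in_interval(key, node, succ):
                path.append(succ)
                return succ, path
            # Forward to the farthest finger that still precedes the key.
            node = next(
                f for f in reversed(self.fingers[node])
                if f != key and in_interval(f, node, key)
            )
            path.append(node)


ring = Ring([20, 40, 41, 80])
print(ring.lookup(20, 60))  # (80, [20, 40, 41, 80])
```

Each hop forwards to the farthest finger that still precedes the key, which is what keeps the hop count logarithmic in the number of nodes on a well-populated ring. With only four nodes the path is short either way, so the example mainly shows the routing rule itself.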

