UMD CMSC 714 - Cluster-Based Scalable Network Services - D2815486

School name University of Maryland, College Park

Course Cmsc 714- High Performance Computing Systems

Pages 4

Cluster-Based Scalable Network Services
Armando Fox, Steven D. Gribble, Yatin Chawathe, Eric A. Brewer and Paul Gauthier
Presented by Hari Sivaramakrishnan

Advantages of Clusters
- Scalability
  - Linear increase in hardware to handle load
  - Adding resources easy for clusters
- Availability
  - 24 x 7 service, despite transient hardware or software errors
  - Nodes are independent in a cluster. Failures masked by software
- Cost Effectiveness
  - Economical to maintain and expand
  - Commodity hardware

Challenges to using Clusters
- Administration
  - Software available
- Component vs System Replication
  - Can support part of a service, not all of it
  - Handled in the architecture design
  - Functions are well described, and interchangeable
- Partial Failures
- Shared State
  - None in a cluster
  - Can be emulated, but performance can be improved if need for shared state is minimized

Architectural Features
- Exploits strengths of cluster computing
- Separation of content from services
- Programming model based on composition of worker models
- BASE semantics
  - Basically Available, Soft State, Eventual Consistency
- Measurements and monitoring

Architecture of a SNS

Layered Architecture

SNS Layer
- Scalability
  - Use incrementally added nodes to spawn new components
  - Workers are simple and stateless
- Centralized load balancing
  - Policy implemented in manager, can be changed easily
  - Trace information collected from workers, decisions sent to FEs
  - Fault tolerant
- Prolonged Bursts, Incremental growth
  - Overflow pool
  - Workers spawned by manager
- API
  - Provided by manager and FE to allow for new services
  - Worker stub handles load balancing, fault tolerance etc.
  - Worker code focuses on service implementation

TACC: Programming model
- Transformation
  - Operation on a single data object
  - Example: encryption, encoding, compression
- Aggregation
  - Collating data from various objects
- Customization
  - User specific data automatically fed to workers
  - Same worker can be used with different parameter sets
- Caching
  - ISPs observed 40 – 50 % savings…critical
  - Can cache original and transformed data

TranSend
- Front Ends
  - SPARCstation machine cluster
  - HTTP interface
  - Request served from cache if available or computed
  - 400 threads
- Load balancer
  - MS contacts manager to locate a distiller
  - WS accepts requests and reports load info
  - Manager spawns distiller if load increases

TranSend contd.
- Fault Tolerance
  - Registration system used to locate distillers
  - Timeouts detect dead nodes
  - All state is soft
  - Watcher process needs to know if peer is alive by periodic monitoring
  - Peers start one another
    - Manager starts FE
    - FE starts a manager
  - Manager reports distiller failures to MS which updates its cache
  - Programmed in the manager stubs

TranSend contd.
- User profile database
  - Normal ACID database
- Caching
  - Harvest object cache workers
- Distillers
  - Image processing
  - Off the shelf code
  - Did not have to remove all the bugs because if a node crashes, it will be restarted by a peer
- Graphical Monitor
  - Detect system state and resource usage

TranSend's use of BASE
- Load balancing data
  - MS don't have most recent information
  - Errors are corrected by using timeouts
  - Perf improvements outweigh problems
- Soft state
  - Transformed content is cached
- Approximate answers
  - If system is overloaded, can return a slightly different version of data from cache
  - User can get accurate answer by resubmitting a request

Input Characteristics

Cache Performance
- Average cache hit takes 27 ms to serve
- 95% of hits take less than 100 ms
- Miss penalty anywhere from 100 ms to 100 s
- Cache perf related to number of users and size
  - Hit rate increases monotonically with size
  - When sum of users exceeds cache size, hit rate falls

Load balancing
- Metric: queue length at distillers
- New distillers spawned when load is very high
- Delay D to allow for new distillers to stabilize the system before adding more distillers

Scalability
- Limited by shared or centralized components: SAN, manager, user profile DB
- DB
  - Was never near saturation in their tests
- Manager
  - Has capability to handle three orders of magnitude more traffic than the peak load
  - Even commodity hardware can get the job done

Scalability of SAN
- Close to saturation, unreliable multicast traffic dropped
  - This information is needed by manager to load balance
- Workarounds
  - Separate network for data and control traffic
  - High performance interconnect

Economic Feasibility
- Caching saves an ISP a lot of money
  - A server can pay for itself in 2 months
- Administration costs not considered
  - Do not expect it to be very significant

Conclusion
- Architecture works around deficiencies of using clusters
- Defined a new programming model which makes adding new services extremely easy
- BASE (weaker than ACID) semantics enhances
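
The load-balancing slide names three ingredients: queue length at the distillers as the metric, spawning when load is very high, and a delay D before spawning again. A minimal sketch of how those pieces might fit together follows. The class name `SpawnPolicy`, the `high_water` threshold, the use of an average over reported queues, and the shortest-queue choice in `pick_distiller` are all assumptions made for illustration; the slides do not specify them.

```python
from dataclasses import dataclass, field


@dataclass
class SpawnPolicy:
    """Decide when to spawn a distiller from reported queue lengths."""
    high_water: float  # hypothetical threshold on average queue length
    delay_d: float     # seconds to wait after a spawn before spawning again
    last_spawn: float = field(default=float("-inf"))

    def should_spawn(self, queue_lengths, now):
        """Return True if a new distiller should be spawned at time `now`."""
        if not queue_lengths:
            return False
        avg = sum(queue_lengths) / len(queue_lengths)
        overloaded = avg > self.high_water
        # Delay D: give the most recently spawned distiller time to absorb
        # load before deciding that yet another one is needed.
        settled = (now - self.last_spawn) >= self.delay_d
        if overloaded and settled:
            self.last_spawn = now
            return True
        return False


def pick_distiller(queue_lengths):
    """Return the index of the distiller with the shortest reported queue
    (an assumed selection rule, not taken from the slides)."""
    return min(range(len(queue_lengths)), key=queue_lengths.__getitem__)
```

With `SpawnPolicy(high_water=5, delay_d=10)`, an overloaded report at time 0 triggers a spawn, while the same report at time 5 does not, because the delay has not yet elapsed. Without the delay, stale queue-length reports could cause several distillers to be spawned for a single burst.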
