Aspen: Buy and Sell Spare CPU Cycles

Sam Davies, Martin Ouimet, Reuben Sterling
{sdavies, mouimet, benster}@mit.edu

Abstract

We describe a system that allows computer owners who have unused cycles to rent out their machines to people who want to run many computational jobs. The system is designed to run 1-2 hour jobs with deterministic results that do not require intermediate communication between nodes or computation checkpoints. (1) We introduce a payment and accounting scheme based on a simple measure of worker machine quality, the speed of computing jobs; (2) we implement a policy to detect cheating workers; (3) we safely handle worker node failures by replicating jobs across multiple machines; (4) we allow untrusted CPU owners to modify their worker client code; and (5) we check the correctness of clients by comparing replicated results.

I. INTRODUCTION

Aspen allows people in need of extra computing resources to purchase computing time from other computer owners. Typical uses of computing resources include pharmaceutical research [8], indexing algorithms [2], and machine learning [9]. Organizations involved in this type of research could benefit from a need-based computing grid [8]. Unlike other grid computing initiatives, Aspen enables any computer owner to make his or her computing resources available to anyone who needs extra resources. Computer owners receive financial compensation in exchange for completing computations requested by those in need of computing resources. Aspen creates a free market for the exchange of computer resources, where sellers and buyers set their own prices based on the quality of the resources they offer or require.

Although many examples of grid computing exist today, they are limited in scope. Projects like SETI@home [13] only benefit one organization and require charitable users to offer their spare resources for free. Projects like PlanetLab [12] make computation resources available to the general population but offer no financial incentive to project participants.
Furthermore, PlanetLab operates in a community model and requires users of resources to contribute to the resource pool by offering their own computation resources. Aspen is unique because it provides a financial incentive to users who contribute computation resources and allows any user with money to benefit from those available resources.

Aspen is intended to facilitate distributed processing of computations that are easy to parallelize. That is, Aspen works best with computations that can be broken down into small jobs whose results can be computed solely as a deterministic function of their arguments. For example, Aspen makes it easy for programmers to distribute the computations performed inside the body of a loop to different computers. The types of computations that are suitable for Aspen require that each parallelizable unit of computation be deterministic and run long enough to justify the overhead of network communication.

Offering financial compensation to CPU owners increases the potential computational resources but adds new challenges to the system; for example, there is a substantially higher motivation for sellers to try to cheat the system. Aspen detects cheaters by replicating jobs and comparing the results and the completion times of each job. Furthermore, Aspen has a responsibility to ensure that buyers of computing resources receive the resources they are paying for. For example, the purchased resources could degrade over time when the system slows down, whether due to network latency, load on the seller's machine, or intentional seller cheating. Aspen uses benchmark testing to detect the performance degradation and to take corrective action. If Aspen detects that a seller has purposely slowed down to earn more money, it will not charge the buyer of the resource and will not pay the seller.

This paper is divided into 11 sections. The Goals section identifies the key problems that Aspen addresses. The Design section explains the main
functionality of the Brokerage System and Runtime Architecture and specifies how parties on either side can participate in the Aspen project. The Goals Revisited section explains how the presented design meets the stated goals. The Alternative Designs section lists a few of the alternate approaches that were considered and explains why the chosen approach is superior to the alternatives. The Relevant Work section situates the contribution of the Aspen project in relation to existing grid computing projects. The Future Work and Conclusion sections summarize the contributions of the Aspen project.

A. Terminology

This paper uses the following terms:

Consumer: A machine that must execute a large suite of parallelizable jobs, has inadequate computation resources, and requires other machines to perform its computations in a timely manner. The consumer owner wants to execute the aggregate computations faster than his resources allow and is therefore willing to pay money for extra computation resources.

Worker: A machine that has excess computational resources and makes those resources available to consumers. The worker owner offers his machine's resources to the Aspen community for a specified minimum hourly price.

Job: A job is the smallest unit of computation on Aspen and runs on a single machine. A consumer's parallelizable computation will be broken down into many jobs. Jobs must be deterministic and therefore may not make use of sources of nondeterminism such as random number generators or multiple threads. We introduce this restriction because Aspen uses redundant computations to ensure correct results. A job is expected to run on the order of one or two hours. A consumer is likely to run many jobs as part of a single large computation. A simple example of this style of computation is the map function [2], which applies a single function to each item in a list and produces a list of results. Aspen allows parts of the list to be sent to different machines so that the function may be run in
parallel before the results are collated on the consumer's machine.

Aspen: In the context of the Aspen system design, the term Aspen refers to one or more servers that, under the control of the Aspen project, distribute jobs to workers, return results to consumers, manage executing jobs, and handle financial transactions.

Aspen System: The term Aspen system encompasses the entire computation network, including Aspen, all workers, and all consumers.

II. GOALS

In light of the enumerated challenges, Aspen seeks to fulfill the