Aaron Witt
EECC 756
May 17, 1999

Computational Grids

1 Introduction

Advancements in networking have, for the first time, allowed levels of communication, and therefore cooperation, that were previously unattainable. As the next logical step in high-performance distributed computing, computational grids seek to attain levels of performance never seen before by connecting computers from around the world into a single machine capable of executing a wide array of parallel applications. A system providing this level of performance enables classes of applications that were also previously unrealistic.

However, this type of distributed system really defines a whole new model for computing. Instead of an application being restricted by the computing power of the local system, the production of computational power is now decoupled from its use. This concept introduces new avenues for programming models as well, since the focus of a grid application is now on locating the appropriate resources to run the application and describing, in a general way, the operation of the algorithm, rather than specifying a low-level sequence of processor commands that is not easily portable to other architectures.

Attaining this goal requires meeting a number of significant challenges, however. Foremost, a computational grid will be an inherently dynamic system as nodes and entire sites are brought online and offline. Changes in network loading and demand, which affect system performance, will increase the dynamic aspect of the environment. Secondly, the heterogeneous nature of the resources in the grid must not only be supported but exploited in order to achieve the best possible performance.

In this paper we will develop a set of characteristics that better define a computational grid, examine some of the application areas that are enabled through grid technologies, and provide an overview of the Globus Metacomputing Toolkit, which attempts to provide a standard interface through which grid components can be built.

2 Evolution

The construction of computational grids is based directly on the techniques used in earlier distributed systems. These systems simply placed more stringent requirements on the type of computational resources used in the system and the scale to which they could be distributed.

2.1 Clustering

The term clustering is usually used to refer to a collection of homogeneous workstations connected together on a local area network and used as a single parallel computer. Typically these workstations are off-the-shelf PCs using standard Ethernet/IP interconnections. This type of environment is inexpensive to set up, and the wide availability of message-passing libraries like PVM has made clusters the most numerous type of parallel computer to date. This environment is exactly what is currently in existence at RIT.

However, there have been several different projects to create higher-performance clusters than the simple version described above. Often this involves enhancements to two specific components of the system. The first of these is the physical interconnection system. Ethernet is a relatively slow connection and cannot provide the network performance needed for high-performance distributed computing. Although switched megabit and gigabit Ethernet networks improve upon their predecessor, the highest-performance clusters use special-purpose interconnection hardware like that provided by Myrinet, ServerNet, or VIA.

The second enhancement is in the form of special-purpose communication protocols. Standard TCP/IP communication requires too much overhead to achieve the desired performance. One such solution is the FastMessages package that is part of the High Performance Virtual Machine (HPVM) software used in the NCSA NT supercluster. This messaging protocol is optimized for the low-latency, high-bandwidth environment needed for optimal performance, at the expense of some of the error recovery options available with standard TCP/IP. Using both the Myrinet interconnection hardware and HPVM, the NCSA NT supercluster has been able to achieve processing rates of over 4 gigaflops.

[Figure: NT Supercluster Network Topology]

2.2 Heterogeneous distributed computing

The next obvious step from localized groups of homogeneous machines is to localized groups of heterogeneous machines. The focus of these machines was interoperability, both among differing hardware platforms and among differing operating systems. Typically they used existing, widely used application interfaces like RPC to achieve this interoperability. While there are working implementations of this class of machine, they have in general not achieved much of the potential performance available. This is in part due to the fact that, while these machines allowed heterogeneity, they did not exploit it.

2.3 Metacomputing

The final leap is to create distributed groups of heterogeneous machines, in essence making unlike supercomputers talk to each other and interoperate. The goal in this type of machine is to exploit the characteristics of each machine rather than to abstract them. For example, if an application contained some vector code, this portion of the code would be executed on a vector machine in the metacomputer, while the massively parallel code would be executed on a massively parallel machine or cluster.

Here our focus is on both interoperability and performance. We already know how to create high-performance SMPs and supercomputers. Existing fiber interconnection technologies can be used to bridge the gap between supercomputer sites and still maintain the performance level needed. All that is then required to create a metacomputer, or computational grid, are the software components to provide the interoperability and other needed services.

3 Grid Characteristics

Now that we have described a rough model of a computational grid, we will define more precisely what attributes are required to form one.

3.1 Scaling and selection

A computational grid infrastructure must be able to support thousands of sites. Although existing examples do not exceed 100 individual sites, any standard must be able to scale to the sizes that will be required in the near future. Once this increased capacity is realized, it will be necessary for an application to specify which of the many available resources it wishes to access. In many cases some of this selection will be done automatically by the run-time libraries in order to achieve the maximum system performance, so there must be information available to perform this
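
The automatic selection described in Section 3.1 can be illustrated with a minimal sketch. It is not taken from the Globus toolkit or any other real grid library: the `Resource` record, its fields, and the `select_resources` function are hypothetical names invented here, and the policy (prefer the least-loaded online sites of the requested architecture) is only one assumption about what "maximum system performance" might mean.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """Hypothetical record that a grid information service might publish per site."""
    name: str
    architecture: str       # assumed labels, e.g. "vector", "mpp", "cluster"
    free_processors: int
    load: float             # assumed scale: 0.0 (idle) to 1.0 (saturated)
    online: bool = True     # sites may go offline at any time


def select_resources(directory, architecture, processors_needed):
    """Choose the least-loaded online resources of one architecture until the
    processor request is covered. Returns [] if the request cannot be met."""
    candidates = sorted(
        (r for r in directory
         if r.online and r.architecture == architecture and r.free_processors > 0),
        key=lambda r: r.load,
    )
    chosen, remaining = [], processors_needed
    for resource in candidates:
        if remaining <= 0:
            break
        chosen.append(resource)
        remaining -= resource.free_processors
    return chosen if remaining <= 0 else []


if __name__ == "__main__":
    # Illustrative data only; these sites and numbers are made up.
    directory = [
        Resource("site-a", "mpp", 64, 0.70),
        Resource("site-b", "mpp", 32, 0.10),
        Resource("site-c", "vector", 8, 0.20),
        Resource("site-d", "mpp", 128, 0.40, online=False),
    ]
    picked = select_resources(directory, "mpp", 80)
    print([r.name for r in picked])
```

The sketch filters out offline sites before ranking, which reflects the dynamic nature of the grid noted in the introduction, and it keys the choice on architecture so that heterogeneity is used rather than hidden. A real run-time library would also need network conditions and far richer resource descriptions than this record holds.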


