Autonomic Web-based Simulation (AWS)
Yingping Huang and Gregory Madey
University of Notre Dame
Presented by Tariq M. King
Published by the IEEE Computer Society in the 38th Annual Simulation Symposium (April 2005)

Autonomic Web-based Simulation
Web-based simulation + autonomic computing
Motivations:
  Scientific simulations are large programs that are likely to contain errors when deployed to the web.
  Large-scale web-based simulations are increasingly complex because they integrate many different services.
Goal: self-manageable web-based simulations (AWS)

The brain controls higher-order conscious activities (thought, reasoning, abstraction).
The brain also controls lower-level involuntary activities, called autonomic functions.
The autonomic nervous system (ANS) monitors and regulates these functions without any conscious human involvement.
The ANS was the basis for IBM's Autonomic Computing initiative for system self-management.
Human nervous system = CNS (central nervous system) + PNS (peripheral nervous system)

Autonomic Computing Vision (IBM)
Adapt to dynamically changing environments (self-configuring)
Monitor and tune resources automatically (self-optimizing)
Discover, diagnose, and react to disruptions (self-healing)
Anticipate, detect, identify, and protect against attacks (self-protecting)

AWS Requirements
1. Simulation checkpointing and restarting
2. Simulation self-awareness and proactive failure detection
3. A self-manageable computing infrastructure to host simulations

Checkpointing (Self-healing/Optimizing) [RQ1]
Checkpointing is widely used in simulations, databases, systems, and operations research.
Determining the optimal checkpoint interval is not trivial:
  Excessive checkpointing => performance degradation from checkpoint overhead
  Deficient checkpointing => expensive redo of lost work after a failure
  Both yield a longer execution time.
Choosing the checkpoint interval is therefore an optimization problem.

Expected Execution Time

Modeling Simulation Execution

Proactive Failure Detection [RQ2]
A major cause of simulation crashes is low memory.
APIs in J2SE 5.0 can be used for:
  External monitoring, using external monitoring software
  Internal monitoring, by adding logic inside the simulation
E.g., MXBeans low-memory notification => checkpoint and terminate gracefully

Autonomic Infrastructure for AWS [RQ3]
Autonomic agent on each server
Autonomic manager on the DB server
Firewall/router with an autonomic IP forwarding switch
Database server with a standby DB
Data warehouse (DW) with a standby DW

Self-Configuring under AWS
Autonomic discovery of new servers
Autonomic resizing of the server pool
Autonomic configuration of the firewall/router, application servers, and simulation servers
Autonomic configuration of the database server and the data warehouse

Self-Healing under AWS
Some degree of redundancy is required to achieve self-healing in AWS:
  A hot standby data warehouse and a hot standby database
  The database and the data warehouse are placed on two physical hosts.
  The server pool ensures that when an application server is down, other servers can pick up its tasks.

Self-Healing under AWS (contd)
Application servers
  The autonomic agent monitors execution status; an untimely response => failed application server.
  A new server is started, and the autonomic agent on the firewall changes the IP forwarding.
Simulation servers
  Autonomic agents upload operating system metrics (load average, free memory).
  This upload also serves as the "heartbeat": if the autonomic manager does not receive it => failed simulation server.

Self-Healing under AWS (contd)
Database servers
  The autonomic manager resides on the database server (DBS). It is vital to keep this server running 24/7.
  Whenever the primary database is down, database connections can be failed over to the standby database.
Simulations
  Checkpointing
  The Dispatcher redistributes crashed simulations to appropriate simulation servers.

Self-Optimizing under AWS
Load balancing of the server pool is achieved by the Dispatcher and the autonomic agents:
  A new simulation is assigned to the simulation server with the lowest OS load.
  Agents periodically check the Dispatcher table to start any unassigned simulations.
  At each checkpoint, agents check with the autonomic manager to see whether migration is necessary.
  Simulations on heavily loaded servers are checkpointed and restarted on lightly loaded servers.

Self-Protecting under AWS
Careful configuration of the firewall
Security configuration on the grid:
  Users of the grid must register and be verified by the system administrator.
  The system administrator must assign appropriate user roles.
  The data model tables USERS, USER_ROLES, and VERIFIER are used.
Is this self-protecting/autonomic?

Conclusions and Future Work
The paper presents a prototype of autonomic web-based simulation.
It discusses the implementation of an autonomic infrastructure to support AWS.
Future work focuses on implementing more autonomic features in AWS.

Agnostic Question #1
The authors describe one possible implementation of autonomic web-based simulations. One example of a project that uses such an implementation is the NOM project. Do you know of any other projects that have been proposed or developed? How do they compare with each other in terms of efficiency, technique, and architecture?

Agnostic Question #2
The paper states that web-based simulations need to be deployed through computing systems (i.e., storage devices, databases, web servers, and simulation servers). Can you think of any component(s) that would increase the level of complexity more than the others?

Agnostic Question #3
One method the authors provide for handling faults after they have occurred is checkpointing and restarting.
Which approach is better?
  Static checkpointing (fixed time intervals)
  Dynamic checkpointing (context-specific: amount of computation, etc.)

Agnostic Question #4
The authors suggest that for a system to achieve autonomic features, it must become even more complex, because the complexity is embedded into the system infrastructure itself. Is there any approach that achieves autonomic features with less complexity? If so, give examples.

Agnostic Question #5
One method given by the
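The Checkpointing slide frames the choice of checkpoint interval as an optimization problem. Agnostic Question #3 asks whether static or dynamic checkpointing is better. However, the formulas on the Expected Execution Time and Modeling Simulation Execution slides did not survive extraction.

The Java sketch below uses a standard textbook model as a stand-in; it is not necessarily the authors' model. It rests on three assumptions:
  Failures arrive as a Poisson process with rate λ per hour.
  Each segment of τ hours of work is followed by a checkpoint costing c hours.
  A failure costs a failure-free restart of r hours plus a redo of the current segment.
Under these assumptions, one segment of length x = τ + c takes (exp(λx) - 1)(1/λ + r) hours in expectation. A short τ therefore pays checkpoint overhead too often, and a long τ risks a large redo.

The class and method names (CheckpointIntervalModel, expectedTime, bestInterval, youngInterval) and all parameter values are hypothetical. Young's approximation τ ≈ sqrt(2c/λ) is included only as a cross-check.

```java
/**
 * Hypothetical sketch, not the AWS paper's model: expected run time of a
 * checkpointed simulation when failures follow a Poisson process.
 */
public final class CheckpointIntervalModel {

    private CheckpointIntervalModel() {}

    /**
     * Expected wall-clock hours to finish {@code work} hours of computation split into
     * segments of {@code tau} hours, each followed by a checkpoint costing {@code c} hours.
     * Failures arrive at rate {@code lambda} per hour; each failure costs a restart of
     * {@code r} hours (assumed failure-free) plus a redo of the current segment.
     */
    public static double expectedTime(double work, double tau, double c, double r, double lambda) {
        if (work <= 0 || tau <= 0 || c < 0 || r < 0 || lambda <= 0) {
            throw new IllegalArgumentException("invalid model parameters");
        }
        // Expected time for one segment of length x = tau + c: (e^(lambda*x) - 1)(1/lambda + r).
        double perSegment = Math.expm1(lambda * (tau + c)) * (1.0 / lambda + r);
        // Continuous approximation: work / tau segments, even when that is not an integer.
        return (work / tau) * perSegment;
    }

    /** Grid search over tau = step, 2*step, ... up to work for the lowest expected time. */
    public static double bestInterval(double work, double c, double r, double lambda, double step) {
        if (step <= 0) {
            throw new IllegalArgumentException("step must be positive");
        }
        double bestTau = step;
        double bestTime = Double.POSITIVE_INFINITY;
        for (int i = 1; i * step <= work; i++) {
            double tau = i * step;
            double t = expectedTime(work, tau, c, r, lambda);
            if (t < bestTime) {
                bestTime = t;
                bestTau = tau;
            }
        }
        return bestTau;
    }

    /** Young's first-order approximation of the optimal interval: sqrt(2c / lambda). */
    public static double youngInterval(double c, double lambda) {
        return Math.sqrt(2.0 * c / lambda);
    }

    public static void main(String[] args) {
        // Illustrative values only: 100 h of work, 6-minute checkpoints,
        // 12-minute restarts, and a mean time between failures of 24 h.
        double work = 100.0, c = 0.1, r = 0.2, lambda = 1.0 / 24.0;
        double best = bestInterval(work, c, r, lambda, 0.01);
        System.out.printf("grid-search interval: %.2f h, expected time %.1f h%n",
                best, expectedTime(work, best, c, r, lambda));
        System.out.printf("Young's approximation: %.2f h%n", youngInterval(c, lambda));
        System.out.printf("every 0.1 h: %.1f h; every 50 h: %.1f h%n",
                expectedTime(work, 0.1, c, r, lambda), expectedTime(work, 50.0, c, r, lambda));
    }
}
```

With the illustrative values in main, the grid search should land close to Young's approximation (about 2.2 hours). Both the 0.1-hour and the 50-hour intervals should give a longer expected time.

A static scheme would fix τ once from estimates like these. A dynamic scheme might re-estimate λ or c while the simulation runs and recompute τ at each checkpoint.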
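The Proactive Failure Detection slide proposes internal monitoring with J2SE 5.0 MXBeans: a low-memory notification triggers a checkpoint and a graceful termination. A minimal sketch of that idea with the java.lang.management API follows. It arms a usage threshold on each heap memory pool that supports one, then registers a listener on the platform MemoryMXBean.

The LowMemoryGuard class, the Checkpointable interface, its checkpointAndStop method, and the 85% threshold are hypothetical. The MXBean calls date from J2SE 5.0, but the sketch uses lambdas and therefore needs Java 8 or later.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryNotificationInfo;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import javax.management.Notification;
import javax.management.NotificationEmitter;
import javax.management.NotificationListener;

public final class LowMemoryGuard {

    private LowMemoryGuard() {}

    /** Hypothetical hook into the simulation: save state, then stop cleanly. */
    public interface Checkpointable {
        void checkpointAndStop();
    }

    /** Listener that calls the hook only for usage-threshold-exceeded notifications. */
    public static NotificationListener listenerFor(Checkpointable sim) {
        return (Notification n, Object handback) -> {
            if (MemoryNotificationInfo.MEMORY_THRESHOLD_EXCEEDED.equals(n.getType())) {
                sim.checkpointAndStop();
            }
        };
    }

    /**
     * Sets a usage threshold at {@code fraction} of the maximum size on every heap pool
     * that supports one, registers the listener on the platform MemoryMXBean, and
     * returns the number of pools armed.
     */
    public static int install(double fraction, Checkpointable sim) {
        if (fraction <= 0.0 || fraction > 1.0) {
            throw new IllegalArgumentException("fraction must be in (0, 1]");
        }
        int armed = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            // Skip non-heap pools, pools without threshold support, and pools with no defined max.
            if (pool.getType() == MemoryType.HEAP && pool.isUsageThresholdSupported()
                    && usage != null && usage.getMax() > 0) {
                pool.setUsageThreshold((long) (usage.getMax() * fraction));
                armed++;
            }
        }
        // The platform MemoryMXBean is documented to implement NotificationEmitter.
        NotificationEmitter emitter = (NotificationEmitter) ManagementFactory.getMemoryMXBean();
        emitter.addNotificationListener(listenerFor(sim), null, null);
        return armed;
    }

    public static void main(String[] args) {
        int armed = install(0.85, () -> System.out.println("low memory: checkpointing and stopping"));
        System.out.println("armed " + armed + " heap pool(s)");
    }
}
```

The listener may be invoked on an internal JVM thread rather than on the simulation's own thread. A real checkpointAndStop would therefore probably just set a flag. The simulation loop would check that flag at its next safe point, write the checkpoint, and exit from there.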