Faster!
U-Net : A User-Level Network Interface for Parallel and Distributed Computing
Background – Fast Computing
Issues
Issues (contd.)
U-Net Philosophy
Do MPPs do this?
Basic U-Net architecture
The U-Net Architecture
U-Net Architecture (contd.)
Observations
Zero-copy and True zero-copy
Kernel emulated end-point
U-Net Implementation
U-Net Performance
U-Net Active Messages Layer
AM – Micro-benchmarks
Split-C application benchmarks
TCP/IP and UDP/IP over U-Net
Performance Graphs
Conclusion
Lightweight Remote Procedure Calls
Motivation
Measurements
Overheads in Cross-domain Calls
Available solutions?
Solution proposed : LRPCs
Overview of LRPCs
Implementation - Binding
Data Structures used and created
Calling
Stub Generation
What are optimized here?
Argument Copy
But… Is it really good enough?
Other Issues – Domain Termination
Multiprocessor Issues
Evaluation of LRPC
Cost Breakdown for the Null LRPC
Throughput on a multiprocessor
NOW

Faster!
Vidhyashankar Venkataraman
CS614 Presentation

U-Net : A User-Level Network Interface for Parallel and Distributed Computing

Background – Fast Computing
- Emergence of MPPs (Massively Parallel Processors) in the early 90’s
  - Repackage hardware components to form a dense configuration of very large parallel computing systems
  - But require custom software
- Alternative: NOW (Berkeley), Network Of Workstations
  - Formed by inexpensive, low-latency, high-bandwidth, scalable interconnect networks of workstations
  - Interconnected through fast switches
- Challenge: To build a scalable system that is able to use the aggregate resources in the network to execute parallel programs efficiently

Issues
- Problem with traditional networking architectures
  - Software path through kernel involves several copies: processing overhead
  - In faster networks, may not get application speed-up commensurate with network performance
- Observations:
  - Small messages: processing overhead is more dominant than network latency
  - Most applications use small messages
  - E.g., UCB NFS trace: 50% of bits sent were messages of size 200 bytes or less

Issues (contd.)
- Flexibility concerns:
  - Protocol processing in kernel
  - Greater flexibility if application-specific information is integrated into protocol processing
  - Can tune protocol to application’s needs
  - E.g., customized retransmission of video frames

U-Net Philosophy
- Achieve flexibility and performance by
  - Removing kernel from the critical path
  - Placing entire protocol stack at user level
  - Allowing protected user-level access to network
  - Supplying full bandwidth to small messages
  - Supporting both novel and legacy protocols

Do MPPs do this?
- Parallel machines like Meiko CS-2, Thinking Machines CM-5
  - Have tried to solve the problem of providing user-level access to network
  - Use of custom network and network interface: no flexibility
- U-Net targets applications on standard workstations
  - Using off-the-shelf components

Basic U-Net architecture
- Virtualize N/W device so that each process has illusion of owning NI
- Mux/Demuxing device virtualizes the NI
- Offers protection!
- Kernel removed from critical path
- Kernel involved only in setup

The U-Net Architecture
- Building Blocks
  - Application End-points
  - Communication Segment (CS)
  - Message Queues
- Sending
  - Assemble message in CS
  - EnQ Message Descriptor
- Receiving
  - Poll-driven/Event-driven
  - DeQ Message Descriptor
  - Consume message
  - EnQ buffer in free Q
- Diagram labels: An application endpoint; A region of memory

U-Net Architecture (contd.)
- More on event-handling (upcalls)
  - Can be UNIX signal handler or user-level interrupt handler
  - Amortize cost of upcalls by batching receptions
- Mux/Demux:
  - Each endpoint uniquely identified by a tag (e.g., VCI in ATM)
  - OS performs initial route setup and security tests and registers a tag in U-Net for that application
  - The message tag is mapped to a communication channel
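
The send, receive, and mux/demux steps listed on the architecture slides can be sketched as a small simulation. This is a toy model under stated assumptions, not the U-Net implementation: the class names (`Descriptor`, `Endpoint`, `NetworkInterface`), the fixed-size buffer layout, the separate pool of transmit buffers, and the choice to drop messages on an unregistered tag or an empty free queue are all hypothetical choices made for illustration.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Descriptor:
    """Hypothetical message descriptor: locates a message in the segment."""
    tag: int      # channel tag (e.g. a VCI on ATM)
    offset: int   # start of the buffer inside the communication segment
    length: int   # message length in bytes


class Endpoint:
    """Toy endpoint: one communication segment plus send/recv/free queues."""

    def __init__(self, buf_size=64, num_bufs=4):
        self.buf_size = buf_size
        self.segment = bytearray(buf_size * num_bufs)  # the "region of memory"
        offsets = [i * buf_size for i in range(num_bufs)]
        half = num_bufs // 2
        self.tx_bufs = deque(offsets[:half])  # assumption: buffers for outgoing data
        self.free_q = deque(offsets[half:])   # buffers the NI may fill on receive
        self.send_q = deque()
        self.recv_q = deque()

    def send(self, tag, payload):
        """Assemble the message in the segment, then enqueue its descriptor."""
        if len(payload) > self.buf_size:
            raise ValueError("payload larger than one buffer")
        if not self.tx_bufs:
            raise BufferError("no transmit buffer available")
        offset = self.tx_bufs.popleft()
        self.segment[offset:offset + len(payload)] = payload
        self.send_q.append(Descriptor(tag, offset, len(payload)))

    def poll(self):
        """Poll-driven receive: dequeue, consume, return buffer to the free queue."""
        if not self.recv_q:
            return None
        d = self.recv_q.popleft()
        data = bytes(self.segment[d.offset:d.offset + d.length])
        self.free_q.append(d.offset)
        return d.tag, data


class NetworkInterface:
    """Toy mux/demux: maps a registered tag to the receiving endpoint."""

    def __init__(self):
        self.routes = {}

    def register(self, tag, endpoint):
        """Stands in for the OS setup step that registers a tag."""
        self.routes[tag] = endpoint

    def transmit(self, src):
        """Drain src's send queue; return the number of messages delivered."""
        delivered = 0
        while src.send_q:
            d = src.send_q.popleft()
            data = bytes(src.segment[d.offset:d.offset + d.length])
            src.tx_bufs.append(d.offset)      # transmit buffer is reusable again
            dest = self.routes.get(d.tag)
            if dest is None or not dest.free_q:
                continue                      # assumption: drop silently
            offset = dest.free_q.popleft()
            dest.segment[offset:offset + len(data)] = data
            dest.recv_q.append(Descriptor(d.tag, offset, len(data)))
            delivered += 1
        return delivered
```

In this sketch, a sender calls `send(tag, payload)`, the simulated interface moves the bytes with `transmit(sender)`, and the receiver retrieves them with `poll()`. The only step that resembles kernel involvement is `register`, which mirrors the point that the OS takes part in setup only.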