UMD CMSC 714 - Synchronization and Communication in the T3E Multiprocess


Synchronization and Communication in the T3E Multiprocessor
Steven L. Scott
Cray Research, Inc
Presented by Hari Sivaramakrishnan

T3E Features
Distributed shared memory system
• Up to 2GB memory per processor
• DEC Alpha 21164 processor
• Shell: control and router chips, memory

T3E Features
Buffering
• Buffers can detect multiple interleaved streams
Local memory cached
• No onboard cache
• External backmap to maintain data consistency
E-Registers
• 512 user + 128 system
• Remote communication and synchronization
• Highly pipelined
• Extend the processor's physical address space

Global Communication
Operations performed on E-Registers
• Direct loads, stores between E-registers and processor registers
• Global operations (message passing, synchronization, remote loads)
Global references
• Global Virtual Address (GVA)

Address Translation
Global Virtual Address (GVA)
Virtual PE number
Centrifuge
• Mask, index, base
Should be only 6 bits, not 8
Source or destination

Get and Put operations
Reads and writes to global E-Registers
• Single word or a vector
Flags on each register for synchronization
• Empty
• Full
• Memory to memory copy through E-registers
Does not touch processor bus
• No RAW hazards
Highly pipelined
• 256 bytes in 26.7ns can be issued
• Large number of E-registers
• Max transfer rate = 480MB/s between two nodes

Atomic Memory Operations
T3D used dedicated SWAP registers
T3E uses memory locations
Universal construct

How to perform an AMO?
Operands written to E-registers
Store to I/O space to trigger operation
Atomic Memory Operation packet sent to particular memory location
Result returned to E-Register specified on the address line
Most AMOs need a read-modify-write of RAM
• 11 sysclocks at 147ns per clock
• 8M operations per second
High bandwidth fetch_and_inc served out of buffer at memory controller for each node

Messages
T3D
• Single hardware message queue for user and system messages
• Every message generates an interrupt
T3E
• Arbitrary number of message queues
• Mapped to memory
• Queue max size = 128 MB. Message size = 64 bytes
Message notification
• Always interrupt
• Never interrupt (polling)
• Interrupt on a threshold
Message passing and shared memory integration

Message Queue Control Word
Descriptor for a message queue
Messages rejected when queue is full
If message insertion creates a segmentation violation, nack is returned

Sending Messages
Messages assembled in an aligned block of 8 E-Registers
Sent to address of MQCW
MQCW updates and message storage are atomic
E-Registers status is set to empty on send
• If message accepted, changed to full
• If message rejected, changed to full-send-rejected

Barrier/Eureka Synchronization
Barrier
• Wait for all processors to signal an event
Eureka
• Wait for some processor to signal an event
Barrier/Eureka Synchronization units
• 32 BSUs
• Memory-mapped
• Set of processors given access to a BSU

Barrier/Eureka Trees
Barrier/Eureka network embedded in torus interconnect
• Keeps latency lower than a remote reference
Network router has a register for each BSU
• Node can be configured as internal in BSU tree
• Information about which of six network directions is the parent
Eurekas and Barrier notifications are sent to the
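
The difference between a barrier (wait for all processors) and a eureka (wait for some processor) on a BSU tree can be illustrated with a small software model. This is only a sketch under stated assumptions: the class `TreeNode`, the helper functions, and the two-level example tree are all hypothetical names chosen for illustration, and the model ignores the hardware details (router registers, the six torus directions) that the slides mention.

```python
class TreeNode:
    """One node of a hypothetical barrier/eureka tree (illustrative only)."""

    def __init__(self, name):
        self.name = name
        self.parent = None      # the slides say each node records its parent direction
        self.children = []      # empty for a leaf (a participating processor)
        self.signaled = False   # meaningful for leaves only

    def add_child(self, child):
        child.parent = self
        self.children.append(child)
        return child


def leaves_below(node):
    """Collect the leaf nodes (processors) in the subtree rooted at node."""
    if not node.children:
        return [node]
    found = []
    for child in node.children:
        found.extend(leaves_below(child))
    return found


def barrier_complete(root):
    """Barrier: satisfied only when ALL processors have signaled."""
    return all(leaf.signaled for leaf in leaves_below(root))


def eureka_raised(root):
    """Eureka: satisfied as soon as SOME processor has signaled."""
    return any(leaf.signaled for leaf in leaves_below(root))


def build_example_tree():
    """Assumed example shape: a root, two internal nodes, four processors."""
    root = TreeNode("root")
    left = root.add_child(TreeNode("left"))
    right = root.add_child(TreeNode("right"))
    for i in range(2):
        left.add_child(TreeNode(f"pe{i}"))
        right.add_child(TreeNode(f"pe{i + 2}"))
    return root
```

With this model, signaling a single leaf makes `eureka_raised` true while `barrier_complete` stays false until every leaf has signaled.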

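
The send protocol from the Sending Messages and Message Queue Control Word slides can be modelled in a few lines. The sketch below is an assumption-laden illustration, not the hardware interface: `ERegState`, `MessageQueue`, and `send` are hypothetical names, the queue capacity is counted in messages rather than bytes, and the 8-word message size is inferred from the slides' figures (8 E-registers, 64 bytes) assuming 64-bit words.

```python
from collections import deque
from enum import Enum


class ERegState(Enum):
    """E-register status values named on the slides."""
    EMPTY = "empty"
    FULL = "full"
    FULL_SEND_REJECTED = "full-send-rejected"


# Assumption: 8 E-registers of 64 bits each make up one 64-byte message.
WORDS_PER_MESSAGE = 8


class MessageQueue:
    """Hypothetical stand-in for a memory-mapped queue described by an MQCW."""

    def __init__(self, capacity):
        self.capacity = capacity  # in messages, for simplicity
        self.slots = deque()

    def try_insert(self, words):
        """Store the message and update the queue state as one step.

        Returns False when the queue is full, mirroring the slide's
        'messages rejected when queue is full'.
        """
        if len(self.slots) >= self.capacity:
            return False
        self.slots.append(tuple(words))
        return True


def send(queue, words):
    """Send one message and return the final E-register status."""
    if len(words) != WORDS_PER_MESSAGE:
        raise ValueError("a message is assembled from exactly 8 E-registers")
    status = ERegState.EMPTY              # status is set to empty on send
    if queue.try_insert(words):
        status = ERegState.FULL           # accepted
    else:
        status = ERegState.FULL_SEND_REJECTED  # rejected
    return status
```

A sender in this model would check the returned status and retry later when it sees `FULL_SEND_REJECTED`.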
