EECE 276 – Embedded Systems
Performance, non-von Neumann systems

Performance enhancements
Other devices
Non-von-Neumann machines

High-performance Processing: Cache memory
o CPU <-> MEM: runs at the speed of the slower component.
o CPU <-> Cache <-> MEM: higher performance. The cache gives faster access; main memory is slower.
o Locality principle: if many memory references occur in the same “neighborhood”, then keeping that page in high-speed memory will improve performance.

High-performance Processing: Pipelining
o The Fetch/Decode/Execute stages are done by independent units, so their functions overlap in time.

High-performance Processing
DSP Architectures:
o Traditional microprocessor with support for high-speed DSP-oriented instructions:
  » “Multiply-Accumulate” instruction
o High-speed communication ports
RISC: Reduced Instruction Set Computer
o Simple Load/Store instructions, 1-cycle
o Many registers, pipelined design
o Complex compiler

Special Devices: Programmable Array Logic
o Inputs: A, B, C; outputs: F0, F1, F2, F3
o AND-array: products
o OR-array: sums
o “Programmable” connections

Special Devices: Field Programmable Gate Array
o Flexible “fabric” to implement arbitrary digital circuits
o I/O Blocks
o Logic Blocks: LUTs, memory
o Interconnects
[Figure: grid of logic blocks (LB) surrounded by I/O blocks (IO)]

Non-von-Neumann Systems
Classic: Single Instruction/Single Data
Parallel systems:
o Multiple Instructions/Single Data
  » Pipelined architectures
  » Very Long Instruction Word architectures
o Single Instruction/Multiple Data
  » Systolic arrays: all elements perform the same operation
o Multiple Instructions/Multiple Data
  » Full multi-processing
  » Dataflow architectures
  » Transputers, DSP networks: high-speed comm. ports
  » Time-triggered architecture: time-shared bus, fault tolerant

Non-von-Neumann Systems: Examples
o MISD:
  » VLIW: Transmeta processor. x86 instructions are translated on the fly into VLIW instructions
o SIMD:
  » Systolic arrays for real-time image processing
o MIMD:
  » Multi-processor servers: multiple CPUs, shared bus

Non-von-Neumann Systems: Time-Triggered Architecture
o Time-shared, scheduled bus
o Communication Network Interface
o Fault-tolerant clock synchronization protocol
All tasks and communications are strictly scheduled at design time.
Nodes must exhibit “fault-silent”
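
The idea of a time-shared bus whose communications are fixed at design time can be sketched as a static slot table. This is only an illustration under stated assumptions: the node names, slot times, round length, and the helpers `validate` and `bus_owner` are hypothetical and not taken from the slides, and the sketch leaves out clock synchronization and fault handling entirely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    node: str       # hypothetical node name
    start_us: int   # slot start, as an offset within the round
    length_us: int  # slot duration


ROUND_US = 1000  # assumed length of one communication round

# Assumed design-time schedule: each node owns one fixed slot per round.
SCHEDULE = (
    Slot("sensor", 0, 200),
    Slot("controller", 250, 300),
    Slot("actuator", 600, 200),
)


def validate(schedule, round_us):
    """Return True if every slot fits in the round and no two slots overlap."""
    end_prev = 0
    for slot in sorted(schedule, key=lambda s: s.start_us):
        if slot.length_us <= 0 or slot.start_us < end_prev:
            return False
        end_prev = slot.start_us + slot.length_us
    return end_prev <= round_us


def bus_owner(schedule, round_us, t_us):
    """Return the node allowed to transmit at global time t_us, or None in a gap."""
    offset = t_us % round_us  # the same schedule repeats every round
    for slot in schedule:
        if slot.start_us <= offset < slot.start_us + slot.length_us:
            return slot.node
    return None


if __name__ == "__main__":
    print(validate(SCHEDULE, ROUND_US))
    print(bus_owner(SCHEDULE, ROUND_US, 1300))
```

Because the table is fixed before the system runs, bus access at any instant is decided by time alone; no node has to arbitrate for the bus at run time.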