CS433: Computer System Organization
Note on Figures
Introduction
System Uses
Basestation with Specialized Parts
TigerSHARC replaces many parts
Core Block Diagram
Core
Computation Block Block Diagram
Computation Block
Slide 11
Vector Addition
X-side vector ops (2x32)
X-side vector ops (4x16)
Y-side vector ops
Y-side vector ops (4x16)
XY vector ops (4x16 X&Y)
ISA Summary - Registers
ISA Summary – Assembly Format
ISA Summary - Predicates
ISA Summary - ALU
Slide 22
ISA Summary - Multiplier
ISA Summary – CLU and Shifter
ISA Summary - IALU
ISA Summary - Sequencer
Instruction Encoding
Instruction Encoding (Compute Block)
Application Uses
FIR Filter
Slide 31
FIR Filter - Overall
FIR Filter – some details
FIR Filter Loop Detail – Half Unroll
FIR Filter Highlights
TigerSHARC Pipeline
Pipeline - Notes
Timing – Result Use Constraints
Timing – Result Use
Processor Core
Processor Peripherals
Uniprocessor Configuration
Multiprocessor Configuration
Memory and Busses
Slide 45
Clock Domains
Register Data Formats
Instruction Line Organization
Compute Block Registers
X/YStat Upper Bits
X/YStat Lower Bits
Register Name Syntax
Universal Registers
4x16 Vector Addition
Instruction Summaries
ALU Arithmetic
Slide 57
ALU Logical
ALU Miscellaneous
ALU Floating Point
Slide 61
ALU Multiplier (32 bit)
ALU Multiplier (quad 16-bit)
ALU Multiply 32-bit Complex
ALU Shifter
Slide 66
J Unit Arithmetic and Logical
K Unit Arithmetic and Logical
Load Data Regs
Store Data Regs
Sequencer
Slide 72
Slide 73
Slide 74
Compute Block
Fields in Standard Compute Block Insn
Slide 77
IALU
Load and Store
Slide 80
Link Ports
Link Port Buffers
Link Ports – 4 bit mode
Link Ports – 1 bit mode
Link Ports – TS to TS
Link Port 4-bit DDR
Successful Transfer
NACK’ed transfer
Delays and Timing
Timing - Branches
Memory
Memory Block Detail
Slide 93
Block Diagram
Physical Pin Layout
Pinout Description
Slide 97
Mechanical
Mechanical Considerations
Slide 100
Slide 101
Thermal Note
Conclusions

CS433: Computer System Organization
Luddy Harrison
Analog Devices Incorporated TigerSHARC®
CS 433, Prof. Luddy Harrison. Copyright 2005 University of Illinois.

Note on Figures
  Some of the figures in this lecture are taken from the Analog Devices Hardware and Programming References for the TigerSHARC 201.

Introduction
  Digital Signal Processor (DSP) by Analog Devices Inc.
  System-on-chip design
    RAM on chip
    DMA controller on chip
    SDRAM controller on chip
  “Static Superscalar” (VLIW)
    Hardware will issue things in parallel if told to
    But it examines dependencies, and has no alignment requirements
  4-way issue
  SIMD (single instruction multiple data) capable
  Very high throughput

System Uses
  3G cellular/wireless infrastructure
    Libraries to deal with physical-layer processing available for license
  Software-defined radios
  Image processing
  Embedded applications
  Use anywhere for high throughput math, with low power and cost
  Intended to replace ASICs and FPGAs

[Figure: Basestation with Specialized Parts]

[Figure: TigerSHARC replaces many parts]

[Figure: Core Block Diagram]

Core
  Two I-ALUs, J and K
    Do integer arithmetic, moves, and load/stores
    Called “Data Address Generators” in original SHARC
  Two computation blocks, X and Y
    Do ALU-type things
  4 busses
    J/K for data
    I for instructions
    S for outside world
  Internal memory
    On-die DRAM memory (24 Mbits)
    Also has buffers/cache
  Program sequencer
    Instruction fetch, branch prediction, BTB, control
  System on chip interface
    Connects to main external bus, SRAM and DRAM controllers, link ports, JTAG port, etc.

[Figure: Computation Block Block Diagram]

Computation Block
  Four functional units
    ALU
      Does arithmetic, logic, and packed data manipulation
    Multiplier
      Does multiply, multiply-accumulate, and complex numbers
    Shifter
      Does shifts, rotates, and bit-field manipulations
    CLU
      Communications logic unit
      Nifty things for communications and coding
  Can use two of them every cycle

Computation Block
  Data Alignment Buffer
    Helps align unaligned accesses to circular buffers
  Register file
    32 registers, 32 bits each
    Memory mapped
  Functional units can operate on “packed” data
    Treat a 32 or 64 bit wide chunk of data as many 16 or 8 bit sub-chunks
    Used for SIMD – single instruction multiple data

[Figure: Vector Addition]

X-side vector ops (2x32)
  [Figure: XR0:1 + XR2:3 = XR4:5]
  XR4:5 = R0:1 + R2:3

X-side vector ops (4x16)
  [Figure: XR0:1 + XR2:3 = XR4:5]
  XSR4:5 = R0:1 + R2:3

Y-side vector ops
  [Figure: YR0:1 + YR2:3 = YR4:5]
  YR4:5 = R0:1 + R2:3

Y-side vector ops (4x16)
  [Figure: YR0:1 + YR2:3 = YR4:5]
  YSR4:5 = R0:1 + R2:3

XY vector ops (4x16 X&Y)
  [Figure: XR0:1 + XR2:3 = XR4:5 and YR0:1 + YR2:3 = YR4:5]
  SR4:5 = R0:1 + R2:3

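The 2x32 and 4x16 slides differ only in where the 64-bit register pair is cut into lanes. The following is a minimal Python sketch of that idea, not TigerSHARC code. The name packed_add and its parameters are hypothetical, and each lane is assumed to wrap around on overflow, because the slides do not say how the real ALU handles overflow or saturation.

```python
def packed_add(a: int, b: int, lane_bits: int, total_bits: int = 64) -> int:
    """Add two packed values lane by lane.

    Carries are not propagated from one lane into the next, and each lane
    wraps on overflow (an assumption, not documented hardware behaviour).
    """
    if total_bits % lane_bits != 0:
        raise ValueError("lane width must divide the total width")
    lane_mask = (1 << lane_bits) - 1
    result = 0
    for shift in range(0, total_bits, lane_bits):
        lane_sum = ((a >> shift) & lane_mask) + ((b >> shift) & lane_mask)
        result |= (lane_sum & lane_mask) << shift
    return result


# The same 64 bits of input, interpreted two different ways.
a = 0x0001_FFFF_0003_0004
b = 0x0001_0001_0001_0001

as_words = packed_add(a, b, lane_bits=32)   # 2x32: carry crosses bit 15
as_shorts = packed_add(a, b, lane_bits=16)  # 4x16: each short is independent

print(f"2x32: {as_words:016X}")   # 2x32: 0003000000040005
print(f"4x16: {as_shorts:016X}")  # 4x16: 0002000000040005
```

The two results differ only in the upper half: with 32-bit lanes the 0xFFFF + 0x0001 carry moves into the next 16 bits, while with 16-bit lanes it is dropped at the lane boundary.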
ISA Summary - Registers
  ALU register names have four fields:
    Which compute block(s)
    The data type
    “R”
    The number of the register
  Numbers:
    32x32 bit registers
      XR0, XR1, ..., XR31
    16x64 bit double registers
      XR1:0, XR3:2, ..., XR31:30
    8x128 bit quad registers
      XR3:0, XR7:4, ..., XR31:28
    XR0 overlaps with XR1:0, etc.
  Which compute block:
    XR0     // X block
    YR0     // Y block
    XYR0    // Both (SIMD)
    R0      // Shorthand for both
    Only matters for destination, as sources must match (except move)
  Data types:
    XBR0    // A set of bytes
    XSR0    // A set of shorts (16 bits)
    XR0     // A set of words (32 bits)
    XR1:0   // A set of words (32 bits)
    XLR3:0  // A set of longs (64 bits)
    XFR0    // A 32 bit float
    XFR1:0  // A 40 bit float
  XSTAT/YSTAT for status flags
  IALU is simpler
    J and K register files
    J0, J1, to J31, K0,
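
The four fields of an ALU register name can be pulled apart mechanically. The following is a minimal Python sketch, assuming only the names listed on the register summary slide. RegName, parse_reg, and overlaps are hypothetical helpers, not part of any Analog Devices tool. The parser expects the high:low order of that slide (XR1:0), and the rule that pairs start on even registers and quads on multiples of four is inferred from the listed names rather than stated in the slides.

```python
import re
from typing import NamedTuple


class RegName(NamedTuple):
    blocks: str  # "X", "Y", or "XY" (a bare R0 is shorthand for both)
    dtype: str   # "" for words, or one of "B", "S", "L", "F"
    high: int    # highest register number in the group
    low: int     # lowest register number in the group
    bits: int    # total width: 32, 64, or 128


# Fields in order: compute block(s), data type, the letter R, register number(s).
_PATTERN = re.compile(r"^(XY|X|Y)?([BSLF])?R(\d+)(?::(\d+))?$")


def parse_reg(name: str) -> RegName:
    m = _PATTERN.match(name)
    if m is None:
        raise ValueError(f"not an ALU register name: {name!r}")
    blocks = m.group(1) or "XY"
    dtype = m.group(2) or ""
    high = int(m.group(3))
    low = int(m.group(4)) if m.group(4) is not None else high
    count = high - low + 1
    # Assumed from the slide's listing: singles, aligned pairs, aligned quads.
    if high > 31 or count not in (1, 2, 4) or low % count != 0:
        raise ValueError(f"unsupported register group: {name!r}")
    return RegName(blocks, dtype, high, low, 32 * count)


def overlaps(a: RegName, b: RegName) -> bool:
    """True if the two names share a compute block and at least one register."""
    same_block = bool(set(a.blocks) & set(b.blocks))
    return same_block and a.low <= b.high and b.low <= a.high


print(parse_reg("XSR0"))    # block X, shorts, register 0, 32 bits
print(parse_reg("XLR3:0"))  # block X, longs, registers 3..0, 128 bits
print(overlaps(parse_reg("XR0"), parse_reg("XR1:0")))  # True
```

Note that the vector-addition slides write pairs in low:high order (R0:1); this sketch rejects that spelling rather than guessing which order the assembler accepts.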