BR 1/99

Stratix FPGA Homework

Compare the Altera Stratix and FLEX families in several areas.

The basic mechanism for implementing random logic is the Logic Element (LE), which contains a 4-input lookup table (LUT4) plus a DFF. This has not changed much between families.

One difference is that Stratix added synchronous load and clear logic to every LE. This is common functionality to have in a register, and in the FLEX10K it was implemented as part of the LUT4 logic. Also, the old 'cascade' function in FLEX for wide AND/OR functions is generalized in Stratix by allowing LUTs to be chained together for wide functions (the chain input is a dedicated route between logic elements).

New in Stratix

Fast Addition/Subtraction
FLEX10K had carry logic in each LE and dedicated carry-chain routing between LEs to speed up carry propagation, so that the LUT4 and programmable routing did not have to be used for carry logic.

Fact: the speed of carry generation determines the speed of an adder for binary addition. Stratix does several things to make addition and adder/subtractors faster and more efficient.

Adder/Subtractor Operation
[Figure: two 4-bit adder/subtractor circuits with inputs A[3:0] and B[3:0], carry-in Cin, control input Sub, and output Y[3:0]; one uses a 2/1 mux and the other an XOR.]
A - B = A + B' + 1

New to the Stratix LE. Used for building an adder/subtractor: the LUT logic does not have to be used for this, which speeds up the operation and is more efficient in terms of resource usage.

Carry-Select Adder
The carry path is the slowest path in the ripple-carry adder. We can speed it up with the following scheme (8-bit adder):
[Figure: 8-bit carry-select adder built from three 4-bit ripple adders: one for A[3:0], B[3:0], and Cin producing Sum[3:0], and two for A[7:4], B[7:4] with carry-ins '0' and '1', whose results are selected to give Sum[7:4] and Cout.]
Note that the Cout of the first 4-bit stage selects the correct sum of the next stage.
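The adder/subtractor identity and the 8-bit carry-select scheme described above can be sketched behaviorally in Python. This is only a software model under stated assumptions, not the Stratix hardware: the function names (ripple_add, carry_select_add8, add_sub4) are made up for illustration, and the adder/subtractor assumes the XOR form with Sub also driving Cin.

```python
def ripple_add(a, b, cin, width=4):
    """Bit-by-bit ripple-carry addition; returns (sum, carry_out)."""
    total = 0
    carry = cin
    for i in range(width):
        abit = (a >> i) & 1
        bbit = (b >> i) & 1
        total |= (abit ^ bbit ^ carry) << i
        # Carry out of this bit: generate, or propagate the incoming carry.
        carry = (abit & bbit) | (carry & (abit ^ bbit))
    return total, carry


def carry_select_add8(a, b, cin=0):
    """8-bit carry-select adder: three 4-bit ripple adders and a 2/1 mux."""
    lo_sum, lo_cout = ripple_add(a & 0xF, b & 0xF, cin)
    # The upper stage is computed twice, once for each possible carry-in.
    hi0_sum, hi0_cout = ripple_add((a >> 4) & 0xF, (b >> 4) & 0xF, 0)
    hi1_sum, hi1_cout = ripple_add((a >> 4) & 0xF, (b >> 4) & 0xF, 1)
    # Cout of the lower stage selects the correct upper result (the mux).
    hi_sum, cout = (hi1_sum, hi1_cout) if lo_cout else (hi0_sum, hi0_cout)
    return (hi_sum << 4) | lo_sum, cout


def add_sub4(a, b, sub):
    """4-bit adder/subtractor using A - B = A + B' + 1.

    Assumption: B is XORed with Sub, and Sub also drives the carry-in.
    """
    b_in = (b ^ (0xF if sub else 0x0)) & 0xF
    return ripple_add(a & 0xF, b_in, 1 if sub else 0)


if __name__ == "__main__":
    # Exhaustive check of the carry-select adder against integer addition.
    for a in range(256):
        for b in range(256):
            s, c = carry_select_add8(a, b)
            assert (c << 8) | s == a + b
    print(carry_select_add8(0x3C, 0xC7))  # (3, 1): 0x3C + 0xC7 = 0x103
    print(add_sub4(9, 3, 1))              # (6, 1): 9 - 3 = 6
```

In the subtract case the carry out is 1 when no borrow occurs. In hardware the two upper-stage additions run in parallel with the lower stage, which is where the speedup comes from; the sequential Python model shows only the selection logic.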
The upper stage requires two 4-bit adders and a 2/1 mux.

Carry-Select Adder (larger N)
[Figure: 16-bit carry-select adder built from ripple (Rpl) adder stages of 4, 5, and 7 bits, covering A[3:0]/B[3:0], A[8:4]/B[8:4], and A[15:9]/B[15:9]; the upper stages are duplicated for carry-in '0' and '1'.]

Note that in this mode, the LUT4 is split into four 2-input LUTs, for computing the sum bits with carry-in = 0, 1 and computing the carry outs with carry-in = 0, 1.

10-bit carry-select adder: note that the stage size is 5 bits. The stage size can be configured to be any size desired.

Higher Level Arithmetic Support
FLEX10K had no support for higher-level arithmetic other than implementation as a netlist of LUT4s. Stratix has monolithic multipliers that can be configured as 9x9 or 18x18 multipliers. Four 18x18 multipliers can be used with a dedicated adder to form a 36x36 multiplier. The multiplier sub-blocks are embedded in a DSP block. The outputs of the multipliers go to an adder block that can be used for accumulation.

An interesting omission is that the DSP block cannot do saturating arithmetic.

Can do either 9x9 or 18x18 multiplication, signed or unsigned.

The accumulator can be up to 52 bits (36-bit product from 18x18, plus 16 bits of accumulation).

Summary of DSP Block Modes

SRAM Comparison
FLEX10K: 2 Kb single-port SRAM blocks, configured as 256x8, 512x4, 1024x2, or 2048x1. Any dual-port SRAM support is strictly multi-cycle dual-port SRAM.

Stratix: three different block sizes are available: 512 b, 4 Kb, and one large block whose size depends on the part. True dual-port SRAM is available.

In dual-port modes, the widths of the ports can be different.

True dual-port: any combination of reads/writes on the ports, at the same or different clock frequencies.

Simple dual-port: only supports a read and a write in the same clock cycle. Good for FIFOs.

Clocking Comparison
FLEX10K: each LE can select 1 of 2 global clocks.
PLL provides clock multiplication by 2 and also synchronizes internal clock edges to external clock edges.

Stratix: hierarchical clocking scheme with 16 global clock networks, driven by 4 enhanced PLLs; 16 regional clocks (4 per device quadrant); and 8 dedicated fast regional clock networks.

Clock frequency scaling: m / (n * post-scale counter), where m, n, and the post-scale counter each range from 1 to 512. m and n are used for clock frequency; the post-scale counter controls duty cycle.

Clock Skew
We have used the equation:
    reg-to-reg delay = C2q + MaxCombDelay + Tsu
It is actually:
    reg-to-reg delay = C2q + MaxCombDelay + Tsu + Tskew
where Tskew is the clock skew. Clock skew is the difference in arrival times of clock edges at DFFs on the device. Tskew is determined by die size and the propagation delay across the chip. Gate delays used to be large compared to Tskew, so it could be ignored. As transistor lengths have scaled down, gates have gotten faster and Tskew can no longer be ignored.

Why Hierarchical Clocks?
Why have global, regional, and fast regional clocks?
Because different skews can be specified on the clock networks: regional clocks will have smaller skew than global clocks, so any register-to-register path clocked only by a regional clock can have tighter timing than a register-to-register path that crosses a regional boundary. Can also save power.

Can be used for any clocking source.

Regional clocks are associated with a particular chip quadrant.

Suggested uses for fast regional clocks are high-fanout signals like synchronous loads/clears and clock enables.

IO Technology
• Input/Output (IO) has become very complex
– Used to only have to worry about TTL vs CMOS
– TTL had current drive requirements, CMOS just voltage level requirements
– Both used full-swing signals (0 to Vdd, which used to be 5 V)
• New issues in IO technology
– Limit voltage swing to speed up signaling
– Voltage swing about a reference voltage instead of between 0 and Vdd
– Differential signaling to reject noise
– Termination required to prevent signal reflections from corrupting signals

Stratix supported IO standards

Classification of Advanced IO standards
• Single ended: full-swing signals between 0 and VCCIO, tolerant of overdrive of input signals, various current drive capability
– LVTTL, LVCMOS: tolerant of 5 V overdrive, expects 3.3 V
– 2.5V, 1.8V, 1.5V: simply lower-voltage versions of LVTTL, LVCMOS
– PCI 3.3V, PCI-X 3.3V: expects 3.3 V input signals; a clamping diode is used to limit signal overshoot, but this makes the pin intolerant of any drive over the clamping limit unless an external series resistor is used to limit current.

The clamping diode can be turned off/on for each IO, and clamps to VCCIO. The VCCIO is the voltage of IO, and can be



MSU ECE 4743 - Stratix FPGA Homework
