ISU CPRE 583 - Lect-05 - D2845783

Home> Schools> Iowa State University> Computer Engineering (CPRE) > CPRE 583> Lect-05

DOC PREVIEW

ISU CPRE 583 - Lect-05

School name Iowa State University

Course Cpre 583- Reconfig Comptg Sys

Pages 5

This preview shows page 1-2 out of 5 pages.

Save

View full document

Premium Document

Do you want full access? Go Premium and unlock all 5 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 5 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Premium Document

Do you want full access? Go Premium and unlock all 5 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Unformatted text preview:

1CprE / ComS 583Reconfigurable ComputingProf. Joseph ZambrenoDepartment of Electrical and Computer EngineeringIowa State UniversityLecture #5 – FPGA ArithmeticCprE 583 – Reconfigurable ComputingSeptember 5, 2006 Lect-05.2Quick Points• HW #1 due Thursday at 12:00pm• Any comments?CprE 583 – Reconfigurable ComputingSeptember 5, 2006 Lect-05.3Recap• Cluster size of N = [6-8] is good, K = [4-5]CprE 583 – Reconfigurable ComputingSeptember 5, 2006 Lect-05.4LUT Mapping TechniquesCprE 583 – Reconfigurable ComputingSeptember 5, 2006 Lect-05.5LUT Mapping Techniques (cont.)CprE 583 – Reconfigurable ComputingSeptember 5, 2006 Lect-05.6LUT Mapping Techniques (cont.)2CprE 583 – Reconfigurable ComputingSeptember 5, 2006 Lect-05.7Outline• Recap• Motivation• Carry / Cascade Logic• Addition• Ripple Carry• Carry Bypass• Carry Select• Carry Lookahead• Basic MultipliersCprE 583 – Reconfigurable ComputingSeptember 5, 2006 Lect-05.8Motivation• Traditional microprocessors, DSPs, etc. don’t use LUTs• Instead use a w-bit Arithmetic and Logic Unit (ALU)• Carry connections are hard-wired• No switches, no stubs, short wires(1)AND2OR2XOR2(2)ADDSUBCMP3-LUT 3-LUT3-LUT3-LUT(2)ALUABOutOp(1, 2)2-LUTABOut(1)A B CinSumCoutSumCout / CinABCprE 583 – Reconfigurable ComputingSeptember 5, 2006 Lect-05.9Adder Delays• Assuming a ripple-carry adder:• 32-bit ALU delay – 6ns• 32-cascaded 4-LUTs – 32 x 2.5ns = 80ns• Motivates extra hardware to accelerate carry operations2.0 ns0.5 ns2.5 nsLogic delaySingle channel delayper bit4-LUT delay16 ns6 nsArea optimizedDelay optimizedCompare: 32-bit ALU (0.6λ)CprE 583 – Reconfigurable ComputingSeptember 5, 2006 Lect-05.10Altera Flex 8000 Carry ChainCprE 583 – Reconfigurable ComputingSeptember 5, 2006 Lect-05.11Xilinx XC4000 Carry ChainCprE 583 – Reconfigurable ComputingSeptember 5, 2006 Lect-05.12Cascades• Large fanin operations (reductions):• Decoding• Matching• Completion detection• Many-to-one reductions• Combining logic is simple • Makes use of dedicated paths3CprE 583 – Reconfigurable ComputingSeptember 5, 2006 Lect-05.13Altera Cascade Logic• LE delay – 2.4 ns• Cascade delay – 0.6 nsCprE 583 – Reconfigurable ComputingSeptember 5, 2006 Lect-05.14Why Look at Arithmetic?• Parallelization • Specialization• Architecture• Size• Inputs• Adder problem – delay grows linearly with bit width• Solutions for larger adders:• Pipelining• Carry bypass• Carry selectCprE 583 – Reconfigurable ComputingSeptember 5, 2006 Lect-05.15Adder Pipelining• Not as practical in ASIC world (registers are expensive)• Registers essentially “free” in FPGA logic blocks+A1B1S1+CoutA2B2S2+A3B3S3CoutCoutCprE 583 – Reconfigurable ComputingSeptember 5, 2006 Lect-05.16Carry Bypass Adders• If all the propagates are 1 (P0P1P2P3= 1) then Cout3= Cin• Pi= Aixor Bi• Skip all the carry logic• Inexpensive+A0B0Cin+A1B1+A2B2+A3B3Cout0Cout1Cout301Cout2P0P1P2P3CprE 583 – Reconfigurable ComputingSeptember 5, 2006 Lect-05.17Carry Bypass Performance• Small hardware cost:• 16-bit add – 4 CLB overhead• 32-bit add – 9 CLB overhead• Delay growth still linear, smaller slopeDelayRipple Carry Carry Bypass N-bit AdderCprE 583 – Reconfigurable ComputingSeptember 5, 2006 Lect-05.18Carry Select Adders• Precompute addition value for (Cin = 0) case and (Cin = 1) case• Use mux to select between two with actual Cin value• Cost of this approach?+A0B0+A1B1+A2B2Cin+A4B4+A4B401+A3B3+A5B5+A6B6+A7B7+A5B5+A6B6+A7B7S4-7014CprE 583 – Reconfigurable ComputingSeptember 5, 2006 Lect-05.19Linear Carry-Select• Adder delay = w, mux delay = 1+A31-24B31-2401+01S31-24+A23-16B23-1601+01S23-16+A15-8B15-801+1S15-8+A7-0B7-001+1S7-0t8t8t8t8t8t8t8t8t9t10t11t1200CprE 583 – Reconfigurable ComputingSeptember 5, 2006 Lect-05.20Square-Root Carry Select• Each carry arrives when the corresponding sum is ready01++01A31-30B31-30S31-3001++01A29-22B29-22S29-2201++01A21-15B21-15S21-1501++01A14-9B14-9S14-901++01A8-4B8-4S8-401++01A3-0B3-0S3-0t4t4t5t5t5t6t6t7t7t8t8t6t7t8t9t10CprE 583 – Reconfigurable ComputingSeptember 5, 2006 Lect-05.21Constant Addition• If one operand is constant:• More speed?• Less hardware?HAA00S0FAA1S1FAA2S2FAA3S31 0 1C3A0S0HAA2S2HAA3S3C3A1S1CprE 583 – Reconfigurable ComputingSeptember 5, 2006 Lect-05.22Multiplication• Shift and add operations• Need N bit adder, M cycles42 = 101010 Multiplicand (N bits)x 10 = x 1010 Multiplier (M bits) 000000 101010 Partial products000000+ 101010420 = 111001110 Product (N+M bits)CprE 583 – Reconfigurable ComputingSeptember 5, 2006 Lect-05.23Array Multiplier• Area – N x M cells• Delay – O(N+M)+X0Y0X1X2X3Z0Y1X0+X1+X2+X3Z1+Y2X0+X1+X2+X3+Y3X0+X1+X2+X3Z2CprE 583 – Reconfigurable ComputingSeptember 5, 2006 Lect-05.24Carry-Save+X0Y0X1X2X3Z0Y1X0+X1+X2+X3Z1+Y2++++Y3+++Z25CprE 583 – Reconfigurable ComputingSeptember 5, 2006 Lect-05.25Multiplier Pipelining• Register cost:• Multiplicand – (N bits/stage x M stages)• Multiplier – (M2+ M) / 2 bits• Early output values – (M2+ M) / 2 bits• Total – M x (N + M + 1) bits• Critical path = max:• DFF + FA + setup• Bottom-level adderCprE 583 – Reconfigurable ComputingSeptember 5, 2006 Lect-05.26Constant MultiplierY0=0Z0X0X1X2X3Z1+Y3=1X0+X1+X2+X3Z2• Can greatly reduce the number of adders• Removes all and gatesY1=1Y2=0CprE 583 – Reconfigurable ComputingSeptember 5, 2006 Lect-05.27LUT-based Constant Multipliers• k-LUT can perform constant multiply of k-bit number• Break operand into k-bit quantities• Example: 8-bit x 8-bit constant, k=410101011x NNNNNNNNAAAAAAAAAAA (N * 1011 (LSN))+ BBBBBBBBBBBBBBB (N * 1010 (MSN))SSSSSSSSSSSSSSS ProductCprE 583 – Reconfigurable ComputingSeptember 5, 2006 Lect-05.28Summary• Latency overhead of programmable logic• Several approaches to reducing design latency:• Fast carry• Cascade• Hardwired connections• Multiplier optimization goals different from adder• Other techniques:• Logarithmic v. linear (Wallace Tree multiplier)• Data encoding (Booth’s

View Full Document


School:
Email:
New Password:
Confirm Password:

This preview shows page 1-2 out of 5 pages.

ISU CPRE 583 - Lect-05

Sign up for free to view:

Please select your school