6.375: Complex Digital Systems
L01-1 February 3, 2010 http://csg.csail.mit.edu/6.375
Lecturer: Arvind
TA: Richard S. Uhler
Administration: Sally Lee

Why take 6.375
• Something new and exciting as well as useful
• Fun: Design systems that you never thought you could design in a course
  • made possible by large FPGAs and Bluespec
• You will also discover that it is possible to design complex digital systems with little knowledge of circuits

New, exciting and useful …

Wide Variety of Products Rely on ASICs
ASIC = Application-Specific Integrated Circuit

What’s required?
• ICs with dramatically higher performance, optimized for applications
• and at a size and power to deliver mobility
• cost to address mass consumer markets
Source: http://www.intel.com/technology/silicon/mooreslaw/index.htm

Current Cellphone Architecture
• Two chips, each with an ARM general-purpose processor (GPP) and a DSP (TI OMAP 2420)
• Many specialized complex blocks
[Figure: block diagram. Labels: Comms. Processing, Application Processing, WLAN RF, WCDMA/GSM RF]

Server microprocessors also need specialized blocks
• compression/decompression
• encryption/decryption
• intrusion detection and other security related solutions
• Dealing with spam
• Self diagnosing errors and masking them
• …

Real power saving implies specialized hardware
• H.264 video decoder implementations in software vs. hardware
  • the power/energy savings could be 100 to 1000 fold
• but our mind set is that hardware design is:
    • Difficult, risky
    • Increases time-to-market
    • Inflexible, brittle, error prone, ...
    • Difficult to deal with changing standards, …

Will multicores reduce the need for new hardware?
[Figure: 64-core Tilera]

SoC & Multicore Convergence: more application specific blocks
[Figure: chip layout. Labels: On-chip memory banks, Application-specific processing units, General-purpose processors, Structured on-chip networks]

To reduce the design cost of SoCs we need …
• Extreme IP reuse (IP = “Intellectual Property”)
  • Multiple instantiations of a block for different performance and application requirements
  • Packaging of IP so that the blocks can be assembled easily to build a large system (black box model)
• Architectural exploration to understand cost, power and performance tradeoffs
• Full system simulations for validation and verification

Hardware design today is like programming was in the fifties, i.e., before the invention of high-level languages

Programmers had to know many details of their computer
IBM 650 (1954)
An IBM 650 Instruction: 60 1234 1009
• “Load the contents of location 1234 into the distribution; put it also into the upper accumulator; set lower accumulator to zero; and then go to location 1009 for the next instruction.”

For designing complex SoCs deep circuits knowledge is secondary
Using modern high-level hardware synthesis tools like Bluespec requires computer science training in programming and architecture rather than circuit design

Bluespec
• A new way of expressing behavior
• A formal method of composing modules with parallel interfaces (ports)
  • Compiler manages muxing of ports and associated control
• Powerful and zero-cost parameterization of modules
• Encapsulation of C and
  Verilog codes using Bluespec wrappers
• Helps Transaction Level modeling
• Smaller, simpler, clearer, more correct code
• not just simulation, synthesis as well

IP Reuse via parameterized modules
Example: OFDM based protocols
[Figure: transmit and receive pipelines, with a legend marking blocks as "standard specific" or "potential reuse". Transmit labels: MAC, TX Controller, Scrambler, FEC Encoder, Interleaver, Mapper, Pilot & Guard Insertion, IFFT, CP Insertion, D/A. Receive labels: A/D, Synchronizer, S/P, FFT, Channel Estimator, De-Mapper, De-Interleaver, FEC Decoder, De-Scrambler, RX Controller, MAC]
• Reusable algorithm with different parameter settings
• Different algorithms
• Different throughput requirements
(Alfred) Man Cheuk Ng, …

High-level Synthesis from Bluespec
[Figure: tool flow. Labels: Bluespec SystemVerilog source, Bluespec Compiler, C, Bluesim, Cycle Accurate, Verilog 95 RTL, Verilog sim, VCD output, Debussy Visualization, RTL synthesis, gates, FPGA, Power estimation tool]

FPGAs: a new opportunity

Chip Design Styles
• Custom and Semi-Custom
  • Hand-drawn transistors (+ some standard cells)
  • High volume, best possible performance: used for most advanced microprocessors
• Standard-Cell-Based ASICs
  • High volume, moderate performance: Graphics chips, network chips, cell-phone chips
• Field-Programmable Gate Arrays
  • Prototyping
  • Low volume, low-moderate performance applications
Different design styles have vastly different costs

Exponential growth: Moore’s Law
February 7, 2007 L01-20 http://csg.csail.mit.edu/6.375/
Intel 8080A, 1974, 3 MHz, 6K transistors, 6 µm
Intel 8086, 1978, 33 mm², 10 MHz, 29K transistors, 3 µm
Intel 80286, 1982, 47 mm², 12.5 MHz, 134K transistors, 1.5 µm
Intel 386DX, 1985, 43 mm², 33 MHz, 275K transistors, 1 µm
Intel 486, 1989, 81 mm², 50 MHz, 1.2M transistors, 0.8 µm
Intel Pentium, 1993/1994/1996, 295/147/90 mm², 66 MHz, 3.1M transistors, 0.8 µm/0.6 µm/0.35 µm
Intel Pentium II, 1997, 203 mm²/104 mm², 300/333 MHz, 7.5M
transistors, 0.35 µm/0.25 µm
http://www.intel.com/intel/intelis/museum/exhibit/hist_micro/hof/hof_main.htm
Shown with approximate relative sizes

Intel Penryn (2007)
• Dual core
• Quad-issue out-of-order superscalar processors
• 6MB shared L2 cache
• 45nm technology
  • Metal gate transistors
  • High-K gate dielectric
• 410 Million transistors
• 3+? GHz clock frequency
Could fit over 500 486 processors on same size die.
But