A Defect Tolerant Self-organizing Nanoscale SIMD Architecture

Contents

Abstract
1 Introduction
2 DNA-based Self-Assembled Nanoscale Systems and the Architectural Implications
  FIGURE 1. Patterned DNA [26]
  FIGURE 2. DNA Lattice with transistors and interconnect
  FIGURE 3. Lattice with two levels of interconnect, and connections to Vdd and Gnd
  FIGURE 4. Self-assembled network of nodes
3 System Overview
4 Node Microarchitecture
  4.1 Data Path
  4.2 Control
  FIGURE 5. Node Floorplan
  4.3 Inter-Node Communication
  4.4 Circuit Size and Power Estimates
  4.5 Summary
5 System Configuration
  5.1 Logical Structure and Defect Isolation
  5.2 Configuring Processing Elements
  FIGURE 6. PE Layout
  FIGURE 7. Instruction execution in a random network with three configured PEs. The via is shown to cover multiple nodes, which are rendered unusable. The via is connected to the PEs through the anchor node (A). I/O bandwidth into the system c...
6 System Architecture
  6.1 Instruction Set Architecture
  TABLE 1. Instruction Set
  6.2 Execution Model
  6.3 Instruction Execution Example
7 Evaluation
  7.1 Methodology
  TABLE 2. SOSA System Parameters
  TABLE 3. Ideal Superscalar Parameters
  TABLE 4. Benchmark Descriptions
  7.2 Results
  FIGURE 8. Single Cell Program Runtimes: (a) Matrix Multiplication, (b) Gaussian Filter, (c) Median Filter and (d) Sort. The vertical line denotes the input size beyond which SOSA does better than the Pentium 4
  Execution Time
  FIGURE 9. Matrix Multiply: Assembly Code (no unrolling)
  Throughput
  TABLE 5. TEA Throughput for different architectures
  7.3 Defect Tolerance
  FIGURE 10. TEA/XTEA: Graceful degradation of throughput with increasing node defect rate
  FIGURE 11. Matrix Multiplication: Effect of defects on runtime
  7.4 Result Summary
8 Limitations and Future Work
9 Related Work
10 Conclusions
Acknowledgements
References

Abstract

The continual decrease in transistor size (through either scaled CMOS or emerging nanotechnologies) promises to usher in an era of tera to peta-scale integration.
However, this decrease in size is also likely to increase defect densities, contributing to the exponentially increasing cost of top-down lithography. Bottom-up manufacturing techniques, like self-assembly, may provide a viable lower-cost alternative to top-down lithography, but may also be prone to higher defects. Therefore, regardless of fabrication methodology, defect tolerant architectures are necessary to exploit the full potential of future increased device densities.

This paper explores a defect tolerant SIMD architecture. A key feature of our design is the ability of a large number of limited capability nodes with high defect rates (up to 30%) to self-organize into a set of SIMD processing elements. Despite node simplicity and high defect rates, we show that by supporting the familiar data parallel programming model the architecture can execute a variety of programs. The architecture efficiently exploits a large number of nodes and higher device densities to keep device switching speeds and power density low. On a medium sized system (~1 cm² area), the performance of the proposed architecture on our data parallel programs matches or exceeds the performance of an aggressively scaled out-of-order processor (128-wide, 8k reorder buffer, perfect memory system).
For larger systems (>1 cm²), the proposed architecture can match the performance of a chip multiprocessor with 16 aggressively scaled out-of-order cores.

Categories and Subject Descriptors: B.4.3 [Input/Output and Data Communications]: Interconnections (Subsystems); B.6.1 [Logic Design]: Design Styles; C.1.2 [Processor Architectures]: Multiple Data Stream Architectures (Multiprocessors).

General Terms: Design, Performance, Reliability

Keywords: self-organizing, SIMD, data parallel, bit-serial, defect tolerance, DNA, nanocomputing.

1 Introduction

Manufacturing defects, power density, process variability, transient faults, bulk silicon limits, rising test costs and multibillion dollar fabrication facilities are some of the challenges facing the continued scaling of CMOS. While architectural modifications (e.g., multicore) can provide some short-term relief, the semiconductor industry recognizes the importance of these issues and the need to explore long term alternatives to CMOS devices and fabrication techniques [18].

One promising alternative is DNA-based self-assembly of nanoscale components using inexpensive laboratory equipment to achieve tera to peta-scale integration. Although much of this technology is in its infancy (i.e., demonstrated in research lab experiments), by studying its potential uses for building computing systems, architects can gain a deeper understanding of its limitations and opportunities while providing important feedback to the scientists developing the new technologies.

DNA-based fabrication produces precise control within a small area (e.g., 9 µm²) enabling the construction of a large number (~10⁹ to 10¹²) of small nodes (computational circuits with ~10⁴ transistors) that can be linked together using self-assembly. This produces a random network of nodes, due to the lack of control over placement and orientation of nodes, that contains defective nodes and links.
While our work is motivated by DNA-based self-assembly, it is applicable to any technology with similar characteristics (e.g., scaled CMOS with high process variability, high defect rates and point-to-point links between relatively small compute nodes). The challenge for computer architects is to efficiently exploit the computational power of the large number of nodes while overcoming two primary challenges: 1) loss of precise control over the entire fabrication process, and 2) high defect rates.

This paper presents a SIMD architecture designed to address these challenges. The fundamental building block in our architecture is a relatively small node (e.g., 1-bit ALU with 32 bits of storage and communication support for four neighbors) that operates asynchronously. A configuration phase at startup isolates defective nodes and allows groups of nodes to self-organize into SIMD processing elements (PEs) which are connected in a logical ring, thus simplifying the programmer’s view of the system.

Simulations using conservative estimates for node size and device speed show that the proposed design can match the performance of aggressively scaled architectures for 8 out of 9 benchmarks tested. Furthermore, this performance is achieved with a very low power density of 6.5 W/cm² (vs. >75 W/cm² for modern cores) while
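
The configuration phase described above (isolating defective nodes, then grouping working nodes into PEs on a logical ring) can be illustrated with a small Python sketch. This is not the paper's configuration algorithm, whose details lie beyond this preview. It assumes a regular grid with randomly placed defects as a stand-in for the random network, a breadth-first flood from a hypothetical anchor node, and a fixed, arbitrarily chosen number of nodes per PE. All function names and parameters are illustrative.

```python
import random
from collections import deque

def make_grid(width, height, defect_rate, seed=0):
    """Map (x, y) -> True if the node works, False if it is defective."""
    rng = random.Random(seed)
    return {(x, y): rng.random() >= defect_rate
            for x in range(width) for y in range(height)}

def neighbors(node):
    """Up to four neighbors, mirroring the four-neighbor links in the text."""
    x, y = node
    return [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]

def reachable_working_nodes(grid, anchor):
    """Breadth-first flood from the anchor (assumed mechanism).

    Defective nodes never forward the flood, so they and any working
    nodes cut off behind them are left out of the result.
    """
    if not grid.get(anchor, False):
        return []
    order, seen, queue = [], {anchor}, deque([anchor])
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in neighbors(node):
            if grid.get(nxt, False) and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order

def form_pe_ring(order, nodes_per_pe):
    """Group reached nodes into PEs and assign each PE a ring successor.

    Simplification: grouping by flood order does not guarantee that the
    nodes inside one PE are physically adjacent. Leftover nodes that do
    not fill a whole PE are dropped.
    """
    pes = [order[i:i + nodes_per_pe]
           for i in range(0, len(order) - nodes_per_pe + 1, nodes_per_pe)]
    ring = {i: (i + 1) % len(pes) for i in range(len(pes))}
    return pes, ring

# 30% defect rate, the upper bound quoted in the abstract.
grid = make_grid(width=20, height=20, defect_rate=0.30, seed=1)
grid[(0, 0)] = True  # assumption: the anchor node itself works
order = reachable_working_nodes(grid, anchor=(0, 0))
pes, ring = form_pe_ring(order, nodes_per_pe=4)
print(f"{sum(grid.values())} working, {len(order)} reachable, {len(pes)} PEs")
```

The ring is kept as a plain successor map so that the programmer-visible structure (PE i passes data to PE ring[i]) stays independent of where the nodes physically sit, which is the simplification the logical ring is meant to provide.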

