UT EE 382V - Defect Tolerance on the Teramac Custom Computer

EE382V: Computer Architecture: User System Interplay
Lecture #6
Department of Electrical and Computer Engineering
The University of Texas at Austin
Wednesday, 14 February 2007

Disclaimer: “The contents of this document are scribe notes for The University of Texas at Austin EE382V Spring 2007, Computer Architecture: User System Interplay∗. The notes capture the class discussion and may contain erroneous and unverified information and comments.”

Defect Tolerance on the Teramac Custom Computer - Scribe Notes
Lecture #6: Wednesday, 07 February 2007
Lecturer: Professor Mattan Erez
Scribe: Kshama Pawar
Reviewer: Professor Mattan Erez, Min Kyu Jeong

The introductory paper on defect tolerance identifies issues hampering yield and reliability. It highlights the importance of Design for Manufacturability (DFM). The emphasis is on the issues faced by the silicon design engineer and the steps s/he can take to incorporate DFM. The following are the scribe notes for the class discussion on the paper about defect tolerance on the Teramac computer.

1 Problem being Solved

The following are the problems being solved with respect to Teramac:
• Reconfigurable high-performance hardware. (Though not explicitly stated.)
• Systems should work with a variety of abstract designs - implementation of a variety of designs.
• Reduced cost - utilization of defective parts.
• Hiding defect tolerance from users - automated mapping of designs onto the working parts.

The following were some of the points regarding their methodology for solving the above problems.

∗Copyright 2007 Kshama Pawar and Mattan Erez, all rights reserved.
This work may be reproduced and redistributed, in whole or in part, without prior written permission, provided all copies cite the original source of the document including the names of the copyright holders and “The University of Texas at Austin EE382V Spring 2007, Computer Architecture: User System Interplay”.

• Use defective parts (FPGAs) to build a high-performance design for large systems. The Teramac system runs at a speed of 1 MHz for simulations of computer architecture explorations, which was significantly higher than its competitors.
• Designs are remapped after a defect is found, not during runtime, and the failed simulations have to be run again.
• The defect-analysis software can be run by the user on the faulting machine, which then rectifies itself through a revised mapping and can be used again.
• The paper does not deal with fault tolerance or fault detection. One cannot know during runtime that the system is not working as expected unless a fault is encountered. Discovering a fault is not addressed by the paper.

2 Intended Users

• SoC/IC manufacturers possessing defective parts.
• Consumers of cheap processors.
• Manufacturers wanting to advertise cost reduction for these defective parts.
• Custom computer designers.

The following are some of the points that came up in the context of the intended users for this paper.
• Frequency binning of designs is due to process variations (explained in the introductory paper). Process variations cause parts to over- or underperform relative to the expected tolerances.
• Including redundancy within the design implies adding an extra row of memory or cache. Traditional defect tolerance is done by adding rows.
  Teramac went with adding extra interconnect to work around faulty blocks at a smaller granularity.
• The Teramac approach does not scale well to manufacturing because of the time it takes to find and map around faults.
• The Teramac custom computer is about 100 times faster than regular computers of its time according to wall clock time.

3 Uniqueness

• Diagnostic tests isolate faulty design parts using mapping software, diagnostic software and a fault database.
• No redundant dedicated regular modules. An attempt to reduce the redundancy in the design and use as many parts as possible.
• Column or cylindrical structure for faster routing. Hierarchical interconnect system.
• Utilize Rent’s rule for modules and interconnects to map a variety of designs to the hardware; ensures different designs.
• Software systems to detect the exact location of the fault.
• Efficient use of resources: they narrow down as much as possible to isolate the defective parts that they will not use.
• Other fault-tolerant designs have built-in redundancy for defects. Redundancy is built into the chip and the chip will work exactly as specified. This is what Teramac did not do. Teramac chips are not logically equivalent to one another, whereas traditional defect-tolerant designs would be.
• Exposes all designs on-chip and maps the exact defects. A lot of commercial designs are identical logically, that is, they do not expose redundant parts of the design outside the test equipment. Teramac exposes the hardware and does not have explicitly redundant designs which cannot be used at all once they are hardwired.

There was an elaboration of Rent’s rule by Professor Erez. A researcher at IBM observed a power law between the amount of functionality in a block, in terms of logic gates, and the number of wires in the block. Rent’s rule is roughly a square root law but generically is defined as a power law.
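
As a rough illustration of the power law just described, Rent's rule is commonly written T = t * g^p, where g is the number of gates in a block, T is the number of external connections, t is the average number of connections per gate, and p is the Rent exponent. The sketch below is not taken from the Teramac paper: the function names and the default values t = 4.0 and p = 0.5 (the "square root" case mentioned in class) are assumptions chosen only for illustration.

```python
import math


def rent_terminals(gates, t=4.0, p=0.5):
    """Estimate external connections T = t * gates**p (Rent's rule).

    t and p are illustrative defaults, not values from the paper.
    """
    if gates < 1:
        raise ValueError("gates must be >= 1")
    return t * gates ** p


def fit_rent(samples):
    """Fit (t, p) by least squares on log-log axes.

    samples is an iterable of (gates, terminals) pairs, with at least
    two distinct gate counts and all values positive.
    """
    xs = [math.log(g) for g, _ in samples]
    ys = [math.log(term) for _, term in samples]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct gate counts")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    p = sxy / sxx                       # slope on log-log axes is the exponent
    t = math.exp(mean_y - p * mean_x)   # intercept gives the coefficient
    return t, p


if __name__ == "__main__":
    # With p = 0.5, quadrupling the gate count doubles the wire estimate.
    for g in (100, 400, 1600):
        print(g, rent_terminals(g))
```

A tool could use such an estimate to budget interconnect for a block before placement; the fit function shows how an exponent might be recovered from measured (gates, wires) pairs.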
CAD tools can make use of this phenomenon.

4 Evaluation

• The study lacked comprehensive quantitative evaluation.
• There is no comparison of the provided solution with any other design. The existing comparisons are qualitative. For example, the number of FPGAs has been quoted but no quantitative details are given.
• Reduction of cost is indicated by examples like three-fourths of the parts being free - a strong hint that a lot of parts were obtained free.
• Ratios were expected for the goals set at the beginning of the paper, but even the goals that were set were not numeric, only qualitative. The aim was to make a functionally correct “working system”.

5 Evaluation in line with the Stated Requirements?

• There has not been a convincing argument that they met user requirements.
• Speeds of resources have not been measured/evaluated. (They are mentioned qualitatively as “excellent”.)
• No quantitative mention of how effective their defect tolerance scheme is.
• Critical defective resources are thrown away, but they do not mention how many they throw away.
• Ribbon cables are not a viable solution today for interconnection networks. Ribbons are not
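
Returning to the diagnose-then-remap flow discussed under Sections 1 and 3 (a fault database, mapping software that avoids the listed parts, and remapping done offline after a defect is found), a minimal sketch could look like the following. This is a toy model, not Teramac's actual compiler: it ignores routing and interconnect entirely, and the names map_design, remap_after_fault and the resource labels are hypothetical.

```python
def map_design(logical_blocks, physical_resources, fault_db):
    """Assign each logical block to a distinct resource not listed in fault_db.

    Hypothetical helper: a real mapper would also place and route.
    """
    usable = [r for r in physical_resources if r not in fault_db]
    if len(logical_blocks) > len(usable):
        raise RuntimeError("not enough working resources for this design")
    return dict(zip(logical_blocks, usable))


def remap_after_fault(logical_blocks, physical_resources, fault_db, new_faults):
    """Record newly diagnosed faults, then recompute the mapping offline."""
    updated_db = set(fault_db) | set(new_faults)
    return updated_db, map_design(logical_blocks, physical_resources, updated_db)


if __name__ == "__main__":
    blocks = ["alu", "regfile"]
    resources = ["r0", "r1", "r2", "r3"]
    known_faults = {"r1"}

    print(map_design(blocks, resources, known_faults))

    # A diagnostic run later reports r0 as defective; the design is remapped
    # and, as the notes point out, the failed simulation must be rerun.
    known_faults, mapping = remap_after_fault(blocks, resources, known_faults, {"r0"})
    print(mapping)
```

The point of the sketch is only that the design never changes; the fault database grows and the mapping is recomputed around it, which is how defect tolerance stays hidden from the user.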

