MIT 6.454 - Highly Fault-Tolerant Parallel Computation

Highly Fault-Tolerant Parallel Computation
John Z. Sun
Massachusetts Institute of Technology
October 12, 2011

Outline
• Preliminaries
• Primer on Polynomial Coding
• Coding Strategy

Recap
• von Neumann (1952)
  – Introduced the study of reliable computation with faulty gates
  – Used computation replication and majority rule to ensure reliability
  – Main statement: If any gate can fail with probability ε, then the output gate fails with constant probability δ by constructing bundles of r = f(δ, ε) wires. The "blowup" of such a system is O(r).
  – Alternative statement: An error-free circuit of m gates can be reliably simulated by a circuit composed of O(m log m) unreliable components.
• Dobrushin and Ortyukov (1977b)
  – Rigorously extended von Neumann's architecture using exactly ε wire probability of error
• Pippenger (1985)
  – Gave an explicit construction for the above analysis
  – Main statement: There is a constant ε such that, for all circuits C, there is a way to replace each wire in C with a bundle of O(r) wires and an amplifier of size O(r) so that the probability that any bundle in the circuit fails to represent its intended value is at most w·2^(−r). The blowup of such a simulation is O(r).
• Can we do better?

Computation via Local Codes
• Elias (1958)
  – Focused on multiple instances of a particular Boolean function applied to pairs of inputs
  – Showed fundamental differences between xor and inclusive-or
  – For the latter, showed that repetition coding is best
• Winograd (1962) and others
  – Further development of negative results along the lines of Elias (see Pippenger 1990 for a summary)
• Taylor (1968)
  – Used LDPC codes for reliable storage in unreliable memory cells
  – Can be extended to other linear functionals

Main Result
• Spielman moves beyond local coding to get improved performance
• Setup: Consider a parallel computation machine M with w processors running for t time units
• Result: M can be simulated by a faulty machine M′ with w·log^O(1) w processors and t·log^O(1) w time steps such that the probability of error is < t·2^(−w^(1/4))
• Novelty:
  – Using processors (finite-state machines) rather than logic gates
  – Running parallel computations to allow for coding
  – Using heterogeneous components

Notation
Definition. For a set S and integer d, let S^d denote the set of d-tuples of elements of S.
Definition. For sets S and T, let S^T denote the set of |T|-tuples of elements of S indexed by elements of T.
Definition. A pair of functions (E, D) is an encoding-decoding pair if there exists a function l such that
  E : {0,1}^n → {0,1}^(l(n)),
  D : {0,1}^(l(n)) → {0,1}^n ∪ {?},
satisfying D(E(a)) = a for all a in {0,1}^n.
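As a concrete toy illustration of this definition (not part of the slides), the Python sketch below implements the simplest encoding-decoding pair: a 3-fold repetition code with majority decoding. The names E, D, and the replication factor R are illustrative choices, and None stands in for the failure symbol "?".

```python
# Toy encoding-decoding pair (E, D): a 3-fold repetition code on {0,1}^n,
# so l(n) = 3n.  D majority-votes each block of R copies and returns None
# (standing in for "?") if the received word has an impossible length.

R = 3  # replication factor (illustrative choice)

def E(bits):
    """Encode {0,1}^n -> {0,1}^(R*n): repeat each bit R times."""
    return [b for b in bits for _ in range(R)]

def D(word):
    """Decode {0,1}^(R*n) -> {0,1}^n or None: majority-vote each block of R bits."""
    if len(word) % R != 0:
        return None  # the failure symbol "?"
    return [1 if sum(word[i:i + R]) * 2 > R else 0
            for i in range(0, len(word), R)]

if __name__ == "__main__":
    a = [1, 0, 1, 1]
    assert D(E(a)) == a        # D(E(a)) = a, as the definition requires
    noisy = E(a)
    noisy[0] ^= 1              # flip one copy; the majority vote still recovers a
    assert D(noisy) == a
```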
Notation (continued)
Definition. Let (E, D) be an encoding-decoding pair. A parallel machine M′ (ε, δ, E, D)-simulates a machine M if
  Prob{D(M′(E(a))) = M(a)} > 1 − δ
for all inputs a, provided each processor produces the wrong output with probability less than ε at each time step.
Definition. Let (E, D) be an encoding-decoding pair. A circuit C′ (ε, δ, E, D)-simulates a circuit C if
  Prob{D(C′(E(a))) = C(a)} > 1 − δ
for all inputs a, provided each wire produces the wrong output with probability less than ε at each time step.

Remarks
• The blow-up of the simulation is the ratio of the number of gates in C′ to the number in C
• The notion of failure here is a per-wire error probability of at most ε [Pippenger (1989)]
• Restrict (E, D) to be simple, so that the computation is done by M′ rather than hidden in the encoder and decoder
• In this case, the encoder-decoder pair is the same for all simulations
• No recoding is necessary between levels of circuits

Reed-Solomon Codes
Fields
• A field F is a countable set with the following properties:
  – F forms an abelian group under the addition operator
  – F − {0} forms an abelian group under the multiplication operator
  – The operators satisfy the distributive law
• A Galois field has q^n elements for q prime
• GF(q^n) is isomorphic to polynomials of degree n − 1 over GF(q)
Reed-Solomon code
• Consider a message (f_0, ..., f_{k−1})
• For n = q, evaluate f(z) = f_0 + f_1·z + ... + f_{k−1}·z^(k−1) for each z ∈ GF(q)
• The codeword associated with the message is (f(1), f(α), ..., f(α^(q−2)))
• The minimum distance is d = n − k + 1

Extended Reed-Solomon Codes
Definition. Let F be a field and let H ⊂ F. We define the encoding function of an extended RS code C_{H,F} to be
  E_{H,F} : F^H → F^F,
where the message is mapped to the evaluations of the unique degree-(|H| − 1) polynomial that interpolates it.
The decoding function is
  D_{H,F} : F^F → F^H ∪ {?},
where the input is mapped to the codeword of C_{H,F} from which it differs in at most k places, and the output is the inverse mapping of that codeword to the message space.
The error-correcting function is
  D^k_{H,F} : F^F → F^F ∪ {?},
where the input is mapped to the codeword of C_{H,F} from which it differs in at most k places.

Theorem. The encoding and decoding functions E_{H,F} and D_{H,F} can be computed by circuits of size |F|·log^O(1) |F|.
Proof: See Justesen (1976) and Sarwate (1977).
Lemma. The function D^k_{H,F} can be computed by a randomized parallel algorithm that takes time log^O(1) |F| on (k²·|F|)·log^O(1) |F| processors, for k < (|F| − |H|)/2. The algorithm succeeds with probability 1 − 1/|F|.
Proof: See Kaltofen and Pan (1994). Requires k = O(√|F|).

Generalized Reed-Solomon Codes
Definition. Let F be a field and let H ⊂ F. We define the encoding function of a generalized RS code C_{H²,F} to be
  E_{H²,F} : F^(H²) → F^(F²).
The decoding function is
  D_{H²,F} : F^(F²) → F^(H²) ∪ {?}.
• Encoding: Run the RS encoder along the first dimension, then along the second.
• Decoding: Run the RS decoder along the second dimension, then along the first.
• Can correct up to ((|F| − |H|)/2)² errors, but only (|F| − |H|)/2 in each dimension.
(A toy encoder sketch for these codes is given below, after the slides.)

Computation on Hypercubes
Network model
• Consider an n-dimensional hypercube with a processor at each vertex (labeled by a string in {0,1}^n)
• Processors are connected via the edges of the hypercube (labels that differ in only one bit)
• Processors are synchronized and are allowed to communicate with one neighbor during each time step
• At each time step, all communication must happen in the same direction
Proposition. Any parallel machine with w processors can be simulated with polylogarithmic slowdown by a hypercube with O(w) processors.
Processor model
• Processors are identical finite automata with a valid set of states S = GF(2^s) for some constant s
• A processor changes state based on a deterministic instruction, its previous state, and the state of one neighbor
• The communication direction is deterministic and known to each processor
(A toy sketch of one synchronized hypercube step is also given below.)

Sketch of Main Idea
• The FSM's previous state σ_{i,t}, neighbor state σ′_{i,t}, and instruction w_{i,t} are mapped to a set S ⊂ F
• Encode states ...
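Returning to the extended and generalized Reed-Solomon codes defined above: the following is a minimal Python sketch, not taken from the slides or from Spielman's paper, of the encoder E_{H,F} (Lagrange interpolation on H, evaluation on all of F) and of the product encoder E_{H²,F} (encode along the first dimension, then the second). The field GF(17), the choice of H, and the names encode_1d / encode_2d are all illustrative assumptions.

```python
# Minimal sketch of the extended RS encoder E_{H,F}: F^H -> F^F over a small
# prime field F = GF(p), and of the generalized (product) encoder
# E_{H^2,F}: F^(H^2) -> F^(F^2), which encodes along the first dimension and
# then along the second.  p = 17, H, and the function names are illustrative.

p = 17                      # F = GF(17); any prime gives a field for this sketch
H = [1, 2, 3, 4, 5]         # evaluation points indexing the message (|H| = 5)

def interpolate(points, x):
    """Evaluate at x the unique polynomial through the (h, value) pairs, mod p."""
    total = 0
    for i, (hi, yi) in enumerate(points):
        num, den = 1, 1
        for hj, _ in points[:i] + points[i + 1:]:
            num = num * (x - hj) % p
            den = den * (hi - hj) % p
        total = (total + yi * num * pow(den, p - 2, p)) % p  # den^(-1) by Fermat
    return total

def encode_1d(message):
    """E_{H,F}: interpolate the message on H, then evaluate at every point of F."""
    points = list(zip(H, message))
    return [interpolate(points, x) for x in range(p)]

def encode_2d(grid):
    """E_{H^2,F}: RS-encode each row (first dimension), then each column (second)."""
    row_enc = [encode_1d(row) for row in grid]                    # |H| x |F|
    col_enc = [encode_1d(list(col)) for col in zip(*row_enc)]     # encode the |F| columns
    return [list(r) for r in zip(*col_enc)]                       # back to row-major, |F| x |F|

if __name__ == "__main__":
    msg = [3, 14, 1, 5, 9]                    # an element of F^H
    cw = encode_1d(msg)                       # an element of F^F (length 17)
    assert [cw[h] for h in H] == msg          # the codeword agrees with the message on H

    grid = [[(5 * i + j) % p for j in range(5)] for i in range(5)]  # element of F^(H^2)
    cw2 = encode_2d(grid)                                           # |F| x |F| array
    assert all(cw2[H[i]][H[j]] == grid[i][j]
               for i in range(5) for j in range(5))
```

Because the codeword restricted to H (or H²) reproduces the message, decoding an error-free word is just reading off those positions; the decoding and error-correcting functions of the slides additionally handle up to k corrupted positions.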
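The hypercube network model above can likewise be pictured with a short sketch. This only illustrates the model (synchronized steps, all processors communicating across the same dimension in a given step); the "take the neighbor's state" instruction and the helper names are hypothetical and are not Spielman's actual simulation.

```python
# Sketch of the hypercube network model: 2^n processors, one per vertex of the
# n-cube, labeled by n-bit strings.  In one synchronized time step, every
# processor communicates with its neighbor across the SAME dimension d (the
# label with bit d flipped) and applies the same deterministic instruction.
# The "take the neighbor's state" instruction below is purely illustrative.

n = 3
states = {v: v for v in range(2 ** n)}   # toy initial state: processor v holds v

def neighbor(v, d):
    """The vertex adjacent to v along dimension d: flip bit d of v's label."""
    return v ^ (1 << d)

def step(states, d, instruction):
    """One synchronized step: every processor updates from its own state and
    its dimension-d neighbor's state, using the shared deterministic rule."""
    return {v: instruction(states[v], states[neighbor(v, d)]) for v in states}

if __name__ == "__main__":
    take_neighbor = lambda mine, theirs: theirs    # illustrative instruction
    s = step(states, d=0, instruction=take_neighbor)
    assert all(s[v] == v ^ 1 for v in s)   # each processor now holds its 0-neighbor's label
```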

