Berkeley MATH 128A - Floating-Point Tricks to Solve Boundary-Value Problems Faster - D536884

Home> Schools> University of California, Berkeley> Mathematics (MATH) > MATH 128A> Floating-Point Tricks to Solve Boundary-Value Problems Faster

Berkeley MATH 128A - Floating-Point Tricks to Solve Boundary-Value Problems Faster

School name University of California, Berkeley

Course Math 128a- Numerical Analysis

Pages 28

Download Save

Unformatted text preview:

File FloTrik Floating-Point Tricks … version dated September 10, 2013 6:21 pmProf. W. Kahan’s notes for Math. 128B Page 1/28 Floating-Point Tricks to Solve Boundary-Value Problems Faster Prof. W. KahanMath. and Computer Sci. Depts.Univ. of Calif. @ Berkeley §0. Abstract: Old tricks are exhumed to accelerate the numerical solution of certain discretized boundary-value problems. Without the tricks, half the digits carried by the arithmetic can be lost to roundoff when the discretization’s grid-gaps get very small. The tricks can procure adequate accuracy from arithmetic with float variables 4-bytes wide instead of double variables 8-bytes wide that move slower through the computer’s memory system and pipelines. The tricks are tricky for programs written in M ATLAB™ 7+, J AVA , F ORTRAN and post-1985 A NSI C . For the original Kernighan-Ritchie C of the 1970s, and for the few implementations of C 99 that fully support IEEE Standard 754 for Binary Floating-Point, most of the tricks are easy or unnecessary. Their efficacy is illustrated here by examples. Contents: §1. Introduction: page 2§2: Discretized Initial-Value Problem d y /d τ = f ( y ) : 3§3: An Example 4§4. Discretized Boundary-Value Problem ( P · U ' ) ' + Q · U = R 5§5. How Roundoff Corrupts the Discretization: 6§6. Accurate Residuals: 8§7. A Trickier Trick: 9§8. Example: (x· u' ) ' + 4x·(1–x 2 )· u = 0 10 First Program’s Computed Graphs of u , u ' and v , and their errors 12 - 13§9. The 2nd Program: preserves symmetry 14 2nd Program’s Results 15§10. Iterative Refinement: recovers accuracy if done right 16 4th Program’s Results 17 - 18§11. Discretization of an Elliptic Boundary-Va;ue Problem: Laplace’s Equation 19Graphs of Φ and ||Grad Φ || 20Computed Results for Φ 21§12. Computing Grad Φ 22§13. Conclusions: Hardly any programmers will ever know the tricks. 25§14. Appendix 1: accuracy despite cancellation 26§15. Appendix 2: tridiagonal and 2nd order despite variable gaps 27 Posted at www.eecs.berkeley.edu/~wkahan/Math128/FloTrik.pdfFile FloTrik Floating-Point Tricks … version dated September 10, 2013 6:21 pmProf. W. Kahan’s notes for Math. 128B Page 2/28 §1. Introduction: Computations, formerly carried out in 8-byte-wide double floating-point, could afford to lose over half the arithmetic’s 16 sig.dec. and yet retain accuracy adequate for almost every requirement by scientists and engineers. Now they are tempted to replace double by 4-byte-wide float variables that will move twice as fast through computers’ pipelines and vast memories, dissipate half the energy, and take advantage of inexpensive hardware mass-produced for entertainment and communications. But this replacement exacerbates the threat to accuracy from roundoff that was formerly ignored. Only if unnoticed can the loss of about half the 7 sig.dec. of float arithmetic be ignored.There are tricks that defend discretized differential equations against excessive loss to roundoff.A few of us used these tricks in the 1970s during the brief reign of small computers with float arithmetic hardware but not double . The tricks ran faster than double simulated in software. The tricks relied upon a property of floating-point subtractive cancellation:If p and q are floating-point numbers of the same precision,and if 1/2 ≤ p / q ≤ 2 ,then p – q is computed exactly, unsullied by a rounding error.A proof appeared on p. 138 of Floating-Point Computation by P.H. Sterbenz (1974, Prentice-Hall, NJ). Rare exceptions occurred on perverse hardware lacking a g uard digit , and for abrupt instead of gradual underflow of p – q ; these exceptions do not happen nowadays on hardware conforming fully to IEEE Standard 754. The tricks entail complexities that bloat a computer program’s capture-cross-section for mistakes.No endorsement of these tricks is implied by their lengthy discussions and analyses below. Quite the contrary. Their complexities are a penalty imposed by programming languages and compilers lacking convenient support for arithmetic operations more precise than their operands. A major exception was C designed by B.W. Kernighan and D.M. Ritchie for an early DEC PDP-11. Its floating-point board required a call upon the operating system to select one of float or double precision. For speed’s sake, the board was left in double ; consequently every floating-point expression was evaluated in double regardless of whether operands were float s. This practice was numerically advantageous. It rendered unnecessary almost all the tricks exhibited below, and greatly enhanced the accuracy and reliability of many a F ORTRAN program transliterated into C , especially 3-dimensional geometrical computations like those described on my web page’s www.eecs.berkeley.edu/~wkahan/MathH110/Cross.pdf . But in the mid 1980s, before those advantages were appreciated widely, ANSI committee X3J11 let C compilers revert to Fortranish expression-evaluation. Most did. Now tricks are necessary for most C compilers.The simplest trick is Compensated Summation used to suppress the worst rounding errors in the numerical solution of initial-value problems to which one-dimensional boundary-value problems are converted when solved by Shooting methods. These deliver better accuracy as a stepsize θ is diminished, though at the cost of greater work proportional to 1/ θ . Without that trick or else extra-precise arithmetic, the enhanced accuracy is vitiated by rounding errors that accumulate proportional to 1/ θ in worst cases, though these happen only rarely. An example of damaging accumulation is provided by an initial-value problem in §§2-3.File FloTrik Floating-Point Tricks … version dated September 10, 2013 6:21 pmProf. W.

View Full Document


School:
Email:
New Password:
Confirm Password:

Berkeley MATH 128A - Floating-Point Tricks to Solve Boundary-Value Problems Faster

Sign up for free to view:

Please select your school