Floating Point Arithmetic

I started my software engineering career building mechanical CAD software. One day, my tangent algorithm produced a line that did not even touch the curve it was supposed to be tangent to. The graphics on the screen were off by several pixels. I was miffed. I checked and re-checked. It was not a graphics problem, my algorithm was correct, and the implementation was not buggy. That stumped me for a few days, until I learned about floating point round-off errors. With just a few lines of code changed, the line snapped neatly onto the curve. That was the first time I learned the difference between real numbers and the floating point numbers a computer actually uses.

Then I worked for Sun and met a guy (I forget his name) who was on the IEEE standard committee. All he talked about was Fortran, Inf, NaN, floating point exceptions, and other things I had no clue about. Good thing I was young and he was patient.

In pretty much all computers these days, floating point arithmetic is not exact. If a programmer is not careful, a surprisingly large error can creep out of simple operations. Every programmer should really read the famous paper "What Every Computer Scientist Should Know About Floating-Point Arithmetic."

In addition to floating point round-off, novice programmers frequently trip on integer overflow (or underflow). Simply put, if you add two integers and the sum is larger than what the integer type can hold, the computer simply throws away the excess bits and leaves you with a result that can be very surprising. (Try adding 2,000,000,000 and another 2,000,000,000 into an "int32" typed integer variable. Guess what the answer is before running the program.)
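To make both pitfalls concrete, here is a minimal sketch in Java (my choice of language here, since Java's "int" is a 32-bit two's complement integer and the language defines what happens on overflow; the class name is just for illustration):

```java
public class RoundOffAndOverflow {
    public static void main(String[] args) {
        // Floating point round-off: 0.1, 0.2, and 0.3 have no exact binary
        // representation, so the computed sum 0.1 + 0.2 is not exactly 0.3.
        double x = 0.1, y = 0.2;
        System.out.println(x + y == 0.3);  // prints: false
        System.out.println(x + y);         // prints: 0.30000000000000004

        // Integer overflow: the true sum 4,000,000,000 does not fit in a
        // 32-bit int, so the excess bits are thrown away and the result
        // wraps around to a negative number.
        int a = 2_000_000_000;
        int b = 2_000_000_000;
        System.out.println(a + b);         // prints: -294967296
    }
}
```

The exact decimal digits printed may differ from language to language, but on any IEEE 754 machine the underlying binary values, and the surprise, are the same.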
