Tag Archives: IEEE_arithmetic

Half Precision Arithmetic: fp16 Versus bfloat16

The 2008 revision of the IEEE Standard for Floating-Point Arithmetic introduced a half precision 16-bit floating point format, known as fp16, as a storage format. Various manufacturers have adopted fp16 for computation, using the obvious extension of the rules for … Continue reading
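To make the comparison in the title concrete, here is a minimal sketch that derives the unit roundoff and the largest finite value of each 16-bit format from its bit layout. The layouts are assumptions taken from the commonly documented definitions, not from the excerpt: fp16 has 5 exponent bits and an 11-bit significand, bfloat16 has 8 exponent bits and an 8-bit significand, both counting the implicit leading bit. The helper names `unit_roundoff` and `largest_finite` are hypothetical.

```python
# Assumed bit layouts: (significand bits t, including the implicit bit; exponent bits).
FORMATS = {
    "fp16": (11, 5),
    "bfloat16": (8, 8),
}


def unit_roundoff(t: int) -> float:
    """Unit roundoff u = 2**-t for a binary format with a t-bit significand."""
    return 2.0 ** -t


def largest_finite(t: int, exp_bits: int) -> float:
    """Largest finite number: (2 - 2**(1 - t)) * 2**emax, with an IEEE-style exponent bias."""
    emax = 2 ** (exp_bits - 1) - 1
    return (2.0 - 2.0 ** (1 - t)) * 2.0 ** emax


for name, (t, exp_bits) in FORMATS.items():
    print(f"{name:9s} u = {unit_roundoff(t):.3e}  max = {largest_finite(t, exp_bits):.3e}")
```

Under these assumptions, fp16 offers more precision over a much narrower range, while bfloat16 trades precision for a range comparable to single precision.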

Posted in research | 9 Comments

The Rise of Mixed Precision Arithmetic

For the last 30 years, most floating point calculations in scientific computing have been carried out in 64-bit IEEE double precision arithmetic, which provides the elementary operations of addition, subtraction, multiplication, and division at a relative accuracy of about $10^{-16}$. … Continue reading

Posted in research | 3 Comments

Tiny Relative Errors

Let $x$ and $y$ be distinct floating point numbers. How small can the relative difference between $x$ and $y$ be? For IEEE double precision arithmetic the answer is $u = 2^{-53}$, which is called the unit roundoff. What if we now let $x$ and $y$ be vectors and … Continue reading
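The scalar claim can be checked directly. The sketch below assumes Python 3.9 or later (for `math.nextafter`) and a platform where `float` is IEEE double precision; the variable names are illustrative. It takes 1.0 and the largest double below it, and measures their difference relative to the larger of the two.

```python
import math
import sys

# Unit roundoff for IEEE double precision (53-bit significand).
u = 2.0 ** -53

# Machine epsilon, the gap between 1.0 and the next larger double, equals 2u.
eps = sys.float_info.epsilon

# Two adjacent doubles: 1.0 and its predecessor, 1 - 2**-53.
x = 1.0
y = math.nextafter(x, 0.0)

# Relative difference measured against the larger number.
rel_diff = abs(x - y) / abs(x)

print(rel_diff == u)   # the gap just below 1.0 is exactly u
print(eps == 2 * u)    # the gap just above 1.0 is twice as large
```

Doubles just below a power of 2 are spaced half as far apart as those just above it, which is why the pair is chosen on the lower side of 1.0.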

Posted in research | Leave a comment