
Tag Archives: IEEE_arithmetic
What Is Bfloat16 Arithmetic?
Bfloat16 is a floating-point number format proposed by Google. The name stands for “Brain Floating Point Format” and it originates from the Google Brain artificial intelligence research group. Bfloat16 is a 16-bit, base 2 storage format that allocates … Continue reading
What Is IEEE Standard Arithmetic?
The IEEE Standard 754, published in 1985 and revised in 2008 and 2019, is a standard for binary and decimal floating-point arithmetic. The standard for decimal arithmetic (IEEE Standard 854) was separate when it was first published in 1987, but … Continue reading
Half Precision Arithmetic: fp16 Versus bfloat16
The 2008 revision of the IEEE Standard for Floating-Point Arithmetic introduced a half precision 16-bit floating-point format, known as fp16, as a storage format. Various manufacturers have adopted fp16 for computation, using the obvious extension of the rules for … Continue reading
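To make the idea of fp16 as a storage format concrete, here is a minimal sketch in Python. It assumes the `"e"` format code of the standard `struct` module, which packs a value as IEEE 754 binary16. The helper name `round_to_fp16` is made up for this illustration, and the bound $2^{-11}$ is the fp16 unit roundoff that follows from an 11-bit significand.

```python
import struct


def round_to_fp16(x: float) -> float:
    """Round a double to fp16 and convert back (hypothetical helper).

    struct's "e" format stores the value as a 2-byte IEEE 754 binary16 number.
    """
    return struct.unpack("<e", struct.pack("<e", x))[0]


x = 0.1
stored = struct.pack("<e", x)      # the 16-bit stored representation
x16 = round_to_fp16(x)             # the value actually held in fp16
rel_err = abs(x16 - x) / abs(x)    # relative error committed by storing x
u16 = 2.0 ** -11                   # fp16 unit roundoff (11-bit significand)

print(len(stored), x16, rel_err <= u16)
```

The sketch only covers storage: the arithmetic is still carried out in double precision, and the value is rounded to fp16 when it is packed.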
The Rise of Mixed Precision Arithmetic
For the last 30 years, most floating-point calculations in scientific computing have been carried out in 64-bit IEEE double precision arithmetic, which provides the elementary operations of addition, subtraction, multiplication, and division at a relative accuracy of about $10^{-16}$. … Continue reading
Tiny Relative Errors
Let $x \ne 0$ and $y$ be distinct floating-point numbers. How small can the relative difference between $x$ and $y$ be? For IEEE double precision arithmetic the answer is $u = 2^{-53} \approx 1.1 \times 10^{-16}$, which is called the unit roundoff. What if we now let $x$ and $y$ be vectors and … Continue reading
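The scalar claim can be checked numerically. The following sketch assumes Python 3.9 or later for `math.nextafter` and measures the relative difference as $|x - y|/|x|$, which is an assumption about the definition intended in the excerpt. It compares $x = 1$ with its two neighbouring double precision numbers.

```python
import math
import sys

x = 1.0
below = math.nextafter(x, 0.0)   # largest double less than 1
above = math.nextafter(x, 2.0)   # smallest double greater than 1

rel_below = abs(x - below) / abs(x)   # spacing just below 1 is 2^-53
rel_above = abs(x - above) / abs(x)   # spacing just above 1 is 2^-52

u = sys.float_info.epsilon / 2        # unit roundoff, 2^-53

print(rel_below == u, rel_above == 2 * u)
```

The smaller of the two differences is attained by the neighbour just below a power of 2, where the spacing of the floating-point numbers is halved.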