What’s New in MATLAB R2020a and R2020b?

In this post I discuss new features in MATLAB R2020a and R2020b. As usual in this series, I focus on a few of the features most relevant to my work. See the release notes for a detailed list of the many changes in MATLAB and its toolboxes.

Exportgraphics (R2020a)

The exportgraphics function is very useful for saving to a file a tightly cropped version of a figure with the border white instead of gray. Simple usages are

exportgraphics(gca,'image.pdf')
exportgraphics(gca,'image.jpg','Resolution',200)

I have previously used the export_fig function, which is not built into MATLAB but is available from File Exchange; I think I will be using exportgraphics instead from now on.

Svdsketch (R2020b)

The new svdsketch function computes the singular value decomposition (SVD) USV^T of a low rank approximation to a matrix (U and V orthogonal, S diagonal with nonnegative diagonal entries). It is mainly intended for use with matrices that are close to having low rank, as is the case in various applications.

This function uses a randomized algorithm that computes a sketch of the given m-by-n matrix A, which is essentially a product Q^TA, where Q is an orthonormal basis for the product A\Omega, where \Omega is a random n-by-k matrix. The value of k is chosen automatically to achieve \|USV^T-A\|_F \le \mathrm{tol}\|A\|_F, where \mathrm{tol} is a tolerance that defaults to \epsilon^{1/4} and must not be less than \epsilon^{1/2}, where \epsilon is the machine epsilon (2\times 10^{-16} for double precision). The algorithm includes a power method iteration that refines the sketch before computing the SVD.

The output of the function is an SVD in which U and V are numerically orthogonal and the singular values in S of size \mathrm{tol} or larger are good approximations to singular values of A, but smaller singular values in S may not be good approximations to singular values of A.

Here is an example. The code

n = 8; rng(1); 8; A = gallery('randsvd',n,1e8,3);
[U,S,V] = svdsketch(A,1e-3);
rel_res = norm(A-U*S*V')/norm(A)
singular_values = [svd(A) [diag(S); zeros(n-length(S),1)]]

produces the following output, with the exact singular values in the first column and the approximate ones in the second column:

rel_res =
   1.9308e-06
singular_values =
   1.0000e+00   1.0000e+00
   7.1969e-02   7.1969e-02
   5.1795e-03   5.1795e-03
   3.7276e-04   3.7276e-04
   2.6827e-05   2.6827e-05
   1.9307e-06            0
   1.3895e-07            0
   1.0000e-08            0

The approximate singular values are correct down to around 10^{-5}, which is more than the 10^{-3} requested. This is a difficult matrix for svdsketch because there is no clear gap in the singular values of A.

Axis Padding (R2020b)

The padding property of an axis puts some padding between the axis limits and the surrounding box. The code

x = linspace(0,2*pi,50); plot(x,tan(x),'linewidth',1.4)
title('Original axis')
axis padded, title('Padded axis')

produces the output

padded_1a.jpg

padded_2a.jpg

Turbo Colormap (2020b)

The default colormap changed from jet (the rainbow color map) to parula in R2014b (with a tweak in R2017a), because parula is more perceptually uniform and maintains information when printed in monochrome. The new turbo colormap is a more perceptually uniform version of jet, as these examples show. Notice that turbo has a longer transition through the greens and yellows. If you can’t give up on jet, use turbo instead.

Turbo:

turbo.jpg

Jet:

colormap_jet.jpg

Parula:

colormap_parula.jpg

ND Arrays (R2020b)

The new pagemtimes function performs matrix multiplication on pages of n-dimensional arrays, while pagetranspose and pagectranspose carry out the transpose and conjugate transpose, respectively, on pages of n-dimensional arrays.

Performance

Both releases report significantly improved speed of certain functions, including some of the ODE solvers.

Simulating Low Precision Floating-Point Arithmetics in MATLAB

At least five floating-point arithmetics are available in mainstream hardware:

  • the IEEE double precision (fp64), single precision (fp32), and half precision (fp16) formats,
  • bfloat16, and
  • tf32, introduced in the recently announced NVIDIA A100, which uses the NVIDIA Ampere GPU architecture.

Only fp32 and fp64 are available on current Intel processors and most programming environments support only these two precisions. For many of us it is therefore necessary to simulate other precisions if we wish to explore their properties and the behavior of algorithms when run at these precisions.

chop-ex.jpg

Srikara Pranesh and I have written a MATLAB function chop that allows arithmetic of arbitrary precision and exponent range (both lower than fp64) to be simulated. The idea is very simple: variables are stored in fp32 or fp64 and then rounded to the required precision and exponent range after each operation by calling chop. The precision and range, as well as various other parameters, can be changed at any point in the computation.

Our approach can be compared with defining a new data type and storage format and converting to and from single precision or double precision in order to carry out the computations. Our approach avoids the overheads of repeatedly converting to and from the special storage format. It gains speed and generality at the cost of requiring chop calls to be inserted in the code.

Chop is available from this GitHub repository and from MathWorks File Exchange. The motivation behind chop is described in this open access paper in SIAM. J. Sci. Comput. We have made use of chop to simulate fp16 and bfloat16 in several recent papers, including on mixed precision algorithms, probabilistic rounding error analysis, and fast and accurate summation.

How to Use Chop

A call to chop has the form chop(X,options), where the structure options spceifies how to round the elements of the real array X to a lower precision arithmetic. X should be a single precision or double precision array and the output will have the same type. The structure options controls various aspects of the rounding, as we now explain.

  1. The arithmetic format is specified by options.format, which is one of
    • ‘b’, ‘bfloat16’: bfloat16,
    • ‘h’, ‘half’, ‘fp16’: IEEE half precision (the default),
    • ‘s’, ‘single’, ‘fp32’: IEEE single precision,
    • ‘d’, ‘double’, ‘fp64’: IEEE double precision,
    • ‘c’, ‘custom’: custom format.

    In the last case the (base 2) format is defined by options.params, which is a 2-vector [t,emax], where t is the number of bits in the significand (including the hidden bit) and emax is the maximum value of the exponent.

  2. options.subnormal specifies whether subnormal numbers are supported (if they are not, subnormals are flushed to zero):
    • 0 = do not support subnormals (the default for bfloat16),
    • 1 = support subnormals (the default for fp16, fp32, and fp64).
  3. The form of rounding is specified by options.round:
    • 1: round to nearest using round to even last bit to break ties (the default);
    • 2: round towards plus infinity (round up);
    • 3: round towards minus infinity (round down);
    • 4: round towards zero;
    • 5: stochastic rounding—round to the next larger or next smaller floating-point number with probability proportional to 1 minus the distance to those floating-point numbers;
    • 6: stochastic rounding—round to the next larger or next smaller floating-point number with equal probability.

    For stochastic rounding, exact floating-point numbers are not changed.

  4. If options.flip = 1 (default 0) then each element of the rounded result has, with probability options.p (default 0.5), a randomly chosen bit in its significand flipped. This option is useful for simulating soft errors.
  5. If options.explim = 0 (default 1) then emax (the maximal exponent) for the specified arithmetic is ignored, so overflow, underflow, or subnormal numbers will be produced only if necessary for the data type of X. This option is useful for exploring low precisions independent of range limitations.

To avoid repeatedly passing the options argument, one can issue a call chop([],options) and the subsequent calls will use the previously given options parameters. The value of options can be inspected with [~,options] = chop.

In the rest of this post we give examples of the use of chop.

Fp16 versus Bfloat

The code

x = 1/3;
opt_h.format = 'h'; xh = chop(x,opt_h); err_h = (x - xh)/x
opt_b.format = 'b'; xb = chop(x,opt_b); err_b = (x - xb)/x
x = 70000;
xh = chop(x,opt_h)
xb = chop(x,opt_b)

produces the output

err_h =
2.4414e-04
err_b =
-1.9531e-03
xh =
Inf
xb =
70144

which illustrates the higher precison but smaller range of fp16 compared with bfloat16.

Rounding Modes

The code

opt.format = 'h'; x = 0.1; rng(2)
for k = 1:6
opt.round = k;
y(k) = chop(x,opt);
end
errors = y - x
diff_y = diff(y)

rounds 0.1 to fp16 using six different rounding modes. The output is

errors =
-2.4414e-05   3.6621e-05  -2.4414e-05  -2.4414e-05  -2.4414e-05
3.6621e-05
diff_y =
6.1035e-05  -6.1035e-05            0            0   6.1035e-05

Rounding maps a number to the next larger or next smaller floating-point number, so there are only two possibilties for the error, and they have opposite signs. The diff_y values are consistent with the spacing of the fp16 floating-point numbers around 0.1, which is 2^{-11}/8.

Matrix Multiplication

The overheads of chop can be minimized by choosing a suitable implementation of a computation. Matrix multiplication provides a good example. The multiplication C = A*B of n-by-n matrices involves 2n^3 floating-point operations, each of which needs to be chopped. The following code uses only 2n calls to chop, because it processes an outer product in a single call, taking advantage of the fact that chop can take a matrix argument.

A = rand(m,n); B = rand(n,p);
C = zeros(m,p);
for i = 1:n
C = chop(C + chop(A(:,i)*B(i,:)));
end

Tf32 Arithmetic

The recently announced NVIDIA A100 GPU supports a new floating-point format called tf32 in its tensor cores. Tf32 is a 19-bit format that has 11 bits in the significand (including the hidden bit) and 8 bits in the exponent. So it has the precision of fp16 and the range of bfloat16, effectively combining the best features of these two formats. We can simulate tf32 using the custom format of chop. The code

opt.format = 'custom';
% tf32. Significand: 10 bits plus 1 hidden, exponent: 8 bits.
t = 11; emax = 127;
opt.params = [t,emax];
x = 1/3;
xt = chop(x,opt); err_t = (x - xt)/x
x = 70000;
xt = chop(x,opt)

produces the output

err_t =
2.4414e-04
xt =
70016

The error is the same as for fp16 but the number 70000 now rounds to a finite floating-point number.

Update of Catalogue of Software for Matrix Functions

funm_ex.jpg

Edvin Hopkins and I have updated to version 3.0 the catalogue of software for matrix functions that we originally produced in 2014 and updated in 2016. It covers what is available in various languages (C++, Fortran, Java, Julia, Python, Rust), problem solving environments (GNU Octave, Maple, Mathematica, MATLAB and associated toolboxes, R, Scilab), and libraries (Armadillo, GNU Scientific Library, NAG Library, SLEPc, SLICOT).

Here are some highlights of changes in the last four years that are reflected in the new version.

  • Several new MATLAB third-party functions have been written, by various authors, notably for f(A)b and for arbitrary precision evaluation of the exponential and logarithm.
  • Matrix function support in Julia has been expanded.
  • Armadillo, Rust, SLEPc, and Tensorflow are included in new entries.

In addition, all URLs and references have been updated.

Suggestions for inclusion in a future revision are welcome.

What’s New in MATLAB R2019a and R2019b?

I didn’t have time earlier this year to write about the first MATLAB release of 2019, so in this post I will discuss R2019a and R2019b together. As usual in this series, I concentrate on the features most relevant to my work.

Modified Akima Cubic Hermite Interpolation (R2019b)

makima_example.jpg

A new function makima computes a piecewise cubic interpolant with continuous first-order derivatives. It produces fewer undulations than the existing spline function, with less flattening than pchip. This form of interpolant has been an option in interp1, interp2, interp3, interpn, and griddedInterpolant since R2017b and is now also a standalone function. For more details see Cleve Moler’s blog post about makima, from which the code used to generate the image above is taken.

Linear Assignment Problem and Equilibration (R2019a)

The matchpairs function solves the linear assignment problem, which requires each row of a matrix to be assigned to a column in such a way that the total cost of the assignments (given by the “sum of assigned elements” of the matrix) is minimized (or maximized). This problem can also be described as finding a minimum-weight matching in a weighted bipartite graph. The problem arises in sparse matrix computations.

The equilibrate function take as input a matrix A (dense or sparse) and returns a permutation matrix P and nonsingular diagonal matrices R and C such that R*P*A*C has diagonal entries of magnitude 1 and off-diagonal entries of magnitude at most 1. The outputs P, R, and C are sparse. This matrix equilibration can improve conditioning and can be a useful preprocessing step both in computing incomplete LU preconditioners and in iterative solvers. See, for example, the recent paper Max-Balanced Hungarian Scalings by Hook, Pestana, Tisseur, and Hogg.

The scaling produced by equilibrate is known as Hungarian scaling, and its computation involves solving a linear assignment problem.

Function Argument Validation (R2019b)

A powerful new feature allows a function to check that its input arguments have the desired class, size, and other aspects. Previously, this could be done by calling the function validateattributes with each argument in turn or by using the inputParser object (see Loren Shure’s comparison of the different approaches).

The new feature provides a more structured way of achieving the same effect. It also allows default values for arguments to be specified. Function argument validation contains many aspects and has a long help page. We just illustrate it with a simple example that should be self-explanatory.

function fav_example(a,b,f,t)
  arguments
    a (1,1) double               % Double scalar.
    b {mustBeNumeric,mustBeReal} % Numeric type and real.
    f double {mustBeMember(f,[-1,0,1])} = 0 % -1, 0 (default), or 1.
    t string = "-"               % String, with default value.
  end
  % Body of function.
end

The functions mustBeNumeric and mustBeMember are just two of a long list of validation functions. Here is an example call:

>> a = 0; b = 2; f = 11; fav_example(a,b,f)
Error using fav_example
Invalid input argument at position 3. Value must be a member of ...
this set:
    -1
    0
    1 

One caveat is that a specification within the arguments block

v (1,:) double    

allows not just 1-by-n vectors (for any n) but also n-by-1 vectors, because of the convention that in many situations MATLAB does not distinguish between row and column vectors.

For more extensive examples, see New with R2019b – Function Argument Validation and Spider Plots and More Argument Validation by Jiro Doke at MathWorks Blogs.

Dot Indexing (R2019b)

One can now dot index the result of a function call. For example, consider

>> optimset('Display','iter').Display
ans =
    'iter'

The output of the dot call can even be indexed:

>> optimset('OutputFcn',{@outfun1,@outfun2}).OutputFcn{2}
ans =
  function_handle with value:
    @outfun2

Of course, these are contrived examples, but in real codes such indexing can be useful. This syntax was not allowed in R2019a and earlier.

Hexadecimal and Binary Values (R2019b)

One can now specify hexadecimal and binary values using the prefixes 0x or 0X for hexadecimal (with upper or lower case for A to F) and 0b or 0B for binary. By default the results have integer type.

>> 0xA
ans =
  uint8
   10
>> 0b101
ans =
  uint8
   5

Changes to Function Precedence Order (R2019b)

The Release Notes say “Starting in R2019b, MATLAB changes the rules for name resolution, impacting the precedence order of variables, nested functions, local functions, and external functions. The new rules simplify and standardize name resolution”. A number of examples are given where the meaning of code is different in R2019b, and they could cause an error to be generated for codes that ran without error in earlier versions. As far as I can see, though, these changes are not likely to affect well-written MATLAB code.

Program File Size (R2019b)

Program files larger than “approximately 128 MB” will no longer open or run. This is an interesting change because I cannot imagine writing an M-file of that size. Good programming practice dictates splitting a code into separate functions and not including all of them in one program file. Presumably, some MATLAB users have produced such long files—perhaps automatically generated.

For a modern introduction and reference to MATLAB, see MATLAB Guide (third edition, 2017)

mg3-front-cover.jpg

What’s New in MATLAB R2018b?

spy_new.jpg
spy; text(2,80,”What’s New?”,’FontSize’,28)

The MATLAB R2018b release notes report a number of performance improvements, including faster startup and faster calls to built-in functions. I pick out here a few other highlights from the release (excluding the toolboxes) that are of interest from the numerical analysis point of view.

The new xline and yline functions add vertical or horizontal lines to a plot—something that I have needed from time to time and have previously had to produce with a line or a plot command, together with hold on. For example, xline(pi) plots a vertical line at x = pi.

The stackedplot function is mainly of interest for plotting multiple variables from a table in a single plot, but it can also plot the columns of a matrix. In this example, A is the symmetric eigenvector matrix for the second difference matrix:

A = gallery('orthog',10,1); stackedplot(A);

The resulting plot clearly shows that the number of sign changes increases with the column index.

stackedplot_ex.jpg

String arrays, introduced in R2016b, are now supported throughout MATLAB. In the previous example I could have typed A = gallery("orthog",10,1).

A new option 'all' to the functions all, any, max, min, prod, sum (and a few others) makes them operate on all the dimensions of the array. For example:

>> A = pascal(3)
A =
     1     1     1
     1     2     3
     1     3     6
>> max(A,[],'all')
ans =
     6
>> [prod(A,'all'), sum(A,'all')]
ans =
   108    19

The empty second argument in the max call is needed because max(x,y) is also a supported usage. This is a useful addition. I have often written norm(A(:),inf) to avoid max(max(abs(A))) (which does not work for arrays with more than two dimensions), but now I can write max(abs(A),[],'all') without incurring the overhead of forming A(:).

New functions sinpi and cospi plot the sine and cosine functions at the specified multiple of \pi. Thus sinpi(x) is the same as sin(pi*x) except that it does not explicitly compute pi*x and so returns exact answers for integer arguments:

>> [sin(pi)         sinpi(1)]
ans =
   1.2246e-16            0

What’s New in MATLAB R2018a?

MATLAB R2018a was released in March 2018. With each biannual release I try to give a brief overview of the changes in MATLAB (not the toolboxes) that are of most interest to me. These are not comprehensive summaries of what’s new and you should check the release notes for full details.

matlabr2018highlights.jpg
From the MATLAB “R2018a at a Glance” page.

Complex empty arrays, such as that generated with complex([]) now have an (empty) imaginary part instead of being real.

% R2017b
>> complex([])
ans =
     []

% R2018a
>> complex([])
ans =
  0×0 empty complex double matrix

This is a consequence of a change in the way MATLAB stores complex arrays. Prior to R2018a it stored the real parts together and the imaginary parts together. In R2081 it uses an interleaved format in which the real and imaginary parts of a number are stored together; this is the storage format used in the C and C++ languages. For more details see MATLAB Support for Interleaved Complex API in C MEX Functions. Unless you write MEX files or create complex empty arrays, this change should have no effect on you.

The Live Editor continues to gain improved functionality, including embedded sliders and drop-down menus.

Legends can now have multiple columns, specified with the NumColumns property for Legend objects. I must admit that I had not realized that is was already possible to arrange legends in a horizontal (1-by-n) rather than vertical orientation (n-by-1) using the Orientation property.

Tall arrays have several new capabilities, including the ability to compute a least squares solution to an overdetermined linear system Ax = b with a tall A, which is done by QR factorization.

MATLAB starts up faster and has further optimizations in the execution engine.

The GraphPlot Object has some additional options for the 'force', 'force3', and 'circle' layouts.

I noticed that the recent Windows 10 Spring Creators Update broke the Symbolic Toolbox. An update to fix this (and various other issues) is available at this page (MathWorks login required).

For a concise but wide-ranging introduction and reference to MATLAB, see MATLAB Guide (third edition, 2017)

mg3-front-cover.jpg

Fun Books for Learning Programming

I learned Fortran from the TV course and book by Jeff Rohl. Some years later I came across A FORTRAN Coloring Book by Roger Emanuel Kaufman (MIT Press, 1978). The text is entirely handwritten (even the copyright page), is illustrated with numerous cartoons, and is full of witty wordplay. Yet it imparts the basics of Fortran very well and I could have happily learned Fortran from it. It even describes some simple numerical methods, such as the bisection method. The book is one continuous text, with no chapters or sections, but it has a good index. I’ve long been a fan of the book and Des Higham, and I include three quotes from it in MATLAB Guide.

kauf78_cover.jpg

kauf78_p75.jpg

Kaufman’s book has attracted attention in cultural studies. In the article Bend Sinister: Monstrosity and Normative Effect in Computational Practice, Simon Yuill describes it as “the first published computing text to use cartoon and comic strip drawings as a pedagogic medium” and goes on to say “and it could be argued, is the archetype to the entire For Dummies series and all its numerous imitators”. I would add that the use of cartoons within magazine articles on computing was prevalent in the 1970s, notably in Creative Computing magazine, though I don’t recall anything comparable with Kaufman’s book.

alco92_p139.jpg
A page from Illustrating C.

A book in a similar vein and from the same era, is the handwritten Illustrating Basic by Donald Alcock (Cambridge University Press, 1977). It’s a bit like Kaufman without the jokes, and is organized into sections. This was the first in a series of such books, culminating in Illustrating C (1992). Like Kaufman’s book, Alcock’s contain nontrivial examples and are a good way for anyone to learn a programming language.

Thinking Forth by Leo Brodie, about the Forth language, is typeset but contains lots of cartoons and hand-drawn figures. It was originally published in 1984 and is now freely available under a Creative Commons license.

A more recent book with a similarly fun treatment is Land of Lisp by Conrad Barski (No Starch Press, 2011). It teaches Common Lisp, coding various games along the way. It’s typeset but is heavily illustrated with cartoons and finishes with a comic strip.

What’s New in MATLAB R2017b?

Following my earlier posts What’s New in MATLAB R2016b? and What’s New in MATLAB R2017a? I take a look here at the R2017b release of MATLAB. As before, this is not a comprehensive treatment (for which see the Release Notes), but rather a brief description of the changes in MATLAB (not the toolboxes) that are the most notable from my point of view.

Decomposition Object

MATLAB now has a way of avoiding unnecessarily repeating the factorization of a matrix. In the past, if we wanted to solve two linear systems Ax = b and Ay = c involving the same square, general nonsingular matrix A, writing

x = A\b;
y = A\c;

was wasteful because A would be LU factorized twice. The way to avoid this unnecessary work was to factorize A explicitly then re-use the factors:

[L,U] = lu(A);
x = U\(L\b);
y = U\(L\c);

This solution lacks elegance. It is also less than ideal from the point of view of code maintenance and re-use. If we want to adapt the same code to handle symmetric positive definite A we have to change all three lines, even though the mathematical concept is the same:

R = chol(A);
x = R\(R'\b);
y = R\(R'\c);

The new decomposition function creates an object that contains the factorization of interest and allows it to be re-used. We can now write

dA = decomposition(A);
x = dA\b;
y = dA\c;

MATLAB automatically chooses the factorization based on the properties of A (as the backslash operator has always done). So for a general square matrix it LU factorizes A, while for a symmetric positive definite matrix it takes the Cholesky factorization. The decomposition object knows to use the factors within it when it encounters a backslash. So this example is functionally equivalent to the first two.

The type of decomposition can be specified as a second input argument, for example:

dA = decomposition(A,'lu');
dA = decomposition(A,'chol');
dA = decomposition(A,'ldl');
dA = decomposition(A,'qr');

These usages make the intention explicit and save a little computation, as MATLAB does not have to determine the matrix type. Currently, 'svd' is not supported as a second input augment.

This is a very welcome addition to the matrix factorization capabilities in MATLAB. Tim Davis proposed this idea in his 2013 article Algorithm 930: FACTORIZE: An Object-Oriented Linear System Solver for MATLAB. In Tim’s approach, all relevant functions such as orth, rank, condest are overloaded to use a decomposition object of the appropriate type. MATLAB currently uses the decomposition object only in backslash, forward slash, negation, and conjugate transpose.

The notation dA used in the MathWorks documentation for the factorization object associated with A grates a little with me, since dA (or \Delta A) is standard notation for a perturbation of the matrix A and of course the derivative of a function. In my usage I will probably write something more descriptive, such as decompA or factorsA.

String Conversions

Functions written for versions of MATLAB prior to 2016b might assume that certain inputs are character vectors (the old way of storing strings) and might not work with the new string arrays introduced in MATLAB R2016b. The function convertStringsToChars can be used at the start of a function to convert strings to character vectors or cell arrays of character vectors:

function y = myfun(x,y,z)
[x,y,z] = convertStringToChars(x,y,z);

The pre-2016b function should then work as expected.

More functions now accept string arguments. For example, we can now write (with double quotes as opposed to the older single quote syntax for character vectors)

>> A = gallery("moler",3)
A =
     1    -1    -1
    -1     2     0
    -1     0     3

Tall Arrays

There is more support in R2017b for tall arrays (arrays that are too large to fit in memory). They can be indexed, with sorted indices in the first dimension; functions including median, plot, and polyfit now accept tall array arguments; and the random number generator used in tall array calculations can be independently controlled.

Word Clouds

The new wordcloud function produces word clouds. The following code runs the function on a file of words from The Princeton Companion to Applied Mathematics.

words = fileread('pcam_words.txt');
words = string(words);
words = splitlines(words);
words(strlength(words)<5) = [];
words = categorical(words);
figure
wordcloud(words)

I had prepared this file in order to generate a word cloud using the Wordle website. Here is the MATLAB version followed by the Wordle version: wordcloud_pcam.jpg wordle-edited2.jpg Wordle removes certain stop words (common but unimportant words) that wordcloud leaves in. Also, whereas Wordle produces a different layout each time it is called, different layouts can be obtained from wordcloud with the syntax wordcloud(words,'LayoutNum',n) with n a nonnegative integer.

Minima and Maxima of Discrete Data

New functions islocalmin and islocalmax find local minima and maxima within discrete data. Obviously targeted at data manipulation applications, such as analysis of time series, these functions offer a number of options to define what is meant by a minimum or maximum.

The code

A = randn(10); plot(A,'LineWidth',1.5); hold on
plot(islocalmax(A).*A,'r.','MarkerSize',25); 
hold off, axis tight

finds maxima down the columns of a random matrix and produces this plot: islocalmax_plot.jpg

Whereas min and max find the smallest and largest elements of an array, the new functions mink(A,k) and maxk(A,k) find the k smallest and largest elements of A, columnwise if A is a matrix.

How Fast is Quadruple Precision Arithmetic?

170724-2024-49-7501.jpg
The 24th IEEE Symposium on Computer Arithmetic included several talks on multiprecision arithmetic.

When I am testing an algorithm running in double precision arithmetic I often want to compare the computed solution with a reference solution: a solution that is fully accurate to double precision. To obtain one I solve the same problem in quadruple precision and round the result back to double precision. If the conditioning of the problem leads me to doubt that that this reference solution will be correct to double precision, I compute it at a precision even higher than quadruple. Whether it is feasible to compute a reference solution in this way depends on the size of the problem and the speed of quadruple precision arithmetic. This is just one scenario where quadruple precision arithmetic is needed; others are discussed in my earlier post The Rise of Mixed Precision Arithmetic.

Roughly speaking, a quadruple precision number x can be represented by a pair of double precision numbers (x_1,x_2), where x = x_1 + x_2 and x_1 and x_2 represent the leading and trailing significant digits of x, respectively. Then a product x y can be written (x_1 + x_2)(y_1 + y_2) = x_1y_1 + x_1y_2 + x_2y_1 + x_2y_2, which involves four double precision multiplications and three double precision additions. We can therefore expect quadruple precision multiplication to be at least seven times slower than double precision multiplication. Almost all available implementations of quadruple precision arithmetic are in software (an exception is provided by the IBM z13 mainframe system), so we should expect quadruple precision to be slower than the theoretical prediction would suggest. Bailey and Borwein (2015) state that quadruple precision implemented in software typically runs five to ten times slower than hardware double precision arithmetic.

Various aspects need to be taken into consideration when we compare double precision and quadruple precision arithmetics. First, the relative speed of the arithmetics may depend on the type of operations being performed (scalar versus vectorizable operations), memory traffic considerations, and to what extent the implementations exploit multicore processors. Second, are we to compare the same code running in different precisions or computational kernels coded specially for each precision? I will take the second approach, as it is more relevant to my usage of quadruple precision. However, this does mean that I am comparing algorithms and their implementations as well as the underlying arithmetic.

I have done some very simple comparisons in MATLAB using the VPA arithmetic of the Symbolic Math Toolbox and the MP arithmetic of the third party Multiprecision Computing Toolbox from Advanpix.

The Multiprecision Computing Toolbox supports IEEE-compliant quadruple precision arithmetic, which is invoked by using the function mp.Digits to set the precision to 34 decimal digits (in fact, this is the default). For the VPA arithmetic in the Symbolic Math Toolbox 34 decimal digit precision is specified with the command digits(34) (the default is 32 digits). According to the documentation, VPA arithmetic uses a few guard digits, that is, it computes with a few extra digits of precision. The number of guard digits cannot be set by the user. In the Multiprecision Computing Toolbox the number of guard digits can be set with the function mp.GuardDigits, and the default is to use no guard digits.

These experiments were performed on an Intel Broadwell-E Core i7-6800K CPU @ 3.40GHz, which has 6 cores. The software details are:

MATLAB Version: 9.2.0.538062 (R2017a)
Operating System: Microsoft Windows 10 Home Version 10.0
                  (Build 15063)
Advanpix Multiprecision Computing Toolbox   Version 4.4.1.12580   
Symbolic Math Toolbox                       Version 7.2  (R2017a)

Further details on the Multiprecision Computing Toolbox can be seen by typing mp.Info, which lists details of open source libraries that are used.

In these experiments I do not check that the computed results are correct. Such checks are done in more extensive tests available in the test scripts at https://www.advanpix.com/2015/02/12/version-3-8-0-release-notes/. I note that the timings vary from run to run, but it is the order of magnitudes of the ratios that are of interest and these are correctly reflected by the reported results.

Test 1: LU Factorization

This code compares the speed of the arithmetics for LU factorization of a 250-by-250 matrix.

%QUAD_PREC_TEST1  LU factorization.
rng(1), n = 250;
A = randn(n); A_mp = mp(A,34); A_vpa = vpa(A,34);
t = clock; [L,U] = lu(A);     t_dp = etime(clock, t)
t = clock; [L,U] = lu(A_mp);  t_mp = etime(clock, t)
t = clock; [L,U] = lu(A_vpa); t_vpa = etime(clock, t)
fprintf('mp/double: %8.2e, vpa/double: %8.2e, vpa/mp: %8.2e\n',....
         t_mp/t_dp, t_vpa/t_dp, t_vpa/t_mp)

The output is

t_dp =
   1.0000e-03
t_mp =
   9.8000e-02
t_vpa =
   2.4982e+01
mp/double: 9.80e+01, vpa/double: 2.50e+04, vpa/mp: 2.55e+02

Here, the MP quadruple precision is 98 times slower than double precision, but VPA is 25,000 times slower than double precision.

Test 2: Complete Eigensystem of Nonsymmetric Matrix

%QUAD_PREC_TEST2  Complete eigensystem of nonsymmetric matrix.
rng(1), n = 125;
A = randn(n); A_mp = mp(A,34); A_vpa = vpa(A,34);
t = clock; [V,D] = eig(A);     t_dp = etime(clock, t)
t = clock; [V,D] = eig(A_mp);  t_mp = etime(clock, t)
t = clock; [V,D] = eig(A_vpa); t_vpa = etime(clock, t)
fprintf('mp/double: %8.2e, vpa/double: %8.2e, vpa/mp: %8.2e\n',....
         t_mp/t_dp, t_vpa/t_dp, t_vpa/t_mp)

The output is

t_dp =
   1.8000e-02
t_mp =
   1.3430e+00
t_vpa =
   1.0839e+02
mp/double: 7.46e+01, vpa/double: 6.02e+03, vpa/mp: 8.07e+01

Here, MP is 75 times slower than double, and VPA is closer to the speed of MP.

Test 3: Complete Eigensystem of Symmetric Matrix

%QUAD_PREC_TEST3  Complete eigensystem of symmetric matrix.
rng(1), n = 200;
A = randn(n); A = A + A'; A_mp = mp(A,34); A_vpa = vpa(A,34);
t = clock; [V,D] = eig(A);     t_dp = etime(clock, t)
t = clock; [V,D] = eig(A_mp);  t_mp = etime(clock, t)
t = clock; [V,D] = eig(A_vpa); t_vpa = etime(clock, t)
fprintf('mp/double: %8.2e, vpa/double: %8.2e, vpa/mp: %8.2e\n',....
         t_mp/t_dp, t_vpa/t_dp, t_vpa/t_mp)

The output is

t_dp =
   1.1000e-02
t_mp =
   3.5800e-01
t_vpa =
   1.2246e+02
mp/double: 3.25e+01, vpa/double: 1.11e+04, vpa/mp: 3.42e+02

Note that there are at least three different algorithms that could be used here (the QR algorithm, divide and conquer, and multiple relatively robust representations), so the three eig invocations may be using different algorithms.

Test 4: Componentwise Exponentiation

%QUAD_PREC_TEST4  Componentwise exponentiation.
rng(1), n = 1000;
A = randn(n); A_mp = mp(A,34); A_vpa = vpa(A,34);
t = clock; X = exp(A);     t_dp = etime(clock, t)
t = clock; X - exp(A_mp);  t_mp = etime(clock, t)
t = clock; X - exp(A_vpa); t_vpa = etime(clock, t)
fprintf('mp/double: %8.2e, vpa/double: %8.2e, vpa/mp: %8.2e\n',....
         t_mp/t_dp, t_vpa/t_dp, t_vpa/t_mp)

The output is

t_dp =
   7.0000e-03
t_mp =
   8.5000e-02
t_vpa =
   3.4852e+01
mp/double: 1.21e+01, vpa/double: 4.98e+03, vpa/mp: 4.10e+02

Both MP and VPA come closer to double precision on this problem of computing the scalar exponential at each matrix element.

Summary

Not too much emphasis should be put on the precise timings, which vary with the value of the dimension n. The main conclusions are that 34-digit MP arithmetic is 1 to 2 orders of magnitude slower than double precision and 34-digit VPA arithmetic is 3 to 4 orders of magnitude slower than double precision.

It is worth noting that the documentation for the Multiprecision Computing Toolbox states that the toolbox is optimized for quadruple precision.

It is also interesting to note that the authors of the GNU MPFR library (for multiple-precision floating-point computations with correct rounding) are working on optimizing the library for double precision and quadruple precision computations, as explained in a talk given by Paul Zimmermann at the 24th IEEE Symposium on Computer Arithmetic in July 2017; see the slides and the corresponding paper.

[Updated September 6, 2017 to clarify mp.Info and April 6, 2018 to change “Multiplication” to “Multiprecision” in penultimate paragraph.]

Elements of MATLAB Style

140415-1443-12-4032-cropped.jpg

Style is an important aspect of writing, and also of programming. While MATLAB is a quick and easy language in which to program, style should not be neglected. Good style aids readability, which in turn makes it easier to debug and maintain code. It also fosters confidence in the code. Here are six style tips, in which the examples given are all inspired by real-life code that I have come across.

For more on MATLAB style see the book MATLAB Guide, and in particular section 16.1, “Elements of Coding Style”.

1. Omit Unnecessary Parentheses

% Bad style.
x = (A')\b;
y = [1];
D = (diag(x));

The parentheses around A' are unnecessary. Without them, the statement is legal and unambiguous. But of course the parentheses would be necessary in, for example, x = (A*B)'\c;

y is a scalar. There is no need to enter it as a 1-by-1 matrix by putting it in square brackets.

There is no need for parentheses around the diag function.

% Good style.
x = A'\b;
y = 1;
D = diag(x);

2. Use Spaces Consistently

% Bad style.
q=sqrt(2);
a = 1;
Z =zeros(3);
for i=1:10, q=q+1/q; end

Stick to a convention about spacing around around equals signs and plus and minus signs. I think spaces make the code more readable.

% Good style.
q = sqrt(2);
a = 1;
Z = zeros(3);
for i = 1:10, q = q + 1/q; end

3. Omit Unnecessary Semicolons and Commas

% Bad style.
figure;
for i = 2:n,
    x(i) = x(i-1)^2*y(i);
end;

Semicolons suppress output, but in this example there is no need for the them since no output is returned by the figure or end functions. The comma would only be necessary if the for loop was collapsed onto one line.

% Good style.
figure
for i = 2:n
    x(i) = x(i-1)^2*y(i);
end

4. Use the Appropriate Construct

The following if statement is perfectly correct.

% Bad style.
if flag == 1
   fprintf('Failed to converge.\n')
elseif flag == 2
   fprintf('Overflow encountered.\n')
elseif flag == 3
   fprintf('Division by zero.\n')
elseif flag == 4
   fprintf('Tolerance too small.\n')
end

But given its regular structure, it is better written as a switch statement, which is shorter, brings out the parallelism in the logic, and is easier to check.

% Good style.
switch flag
   case 1, fprintf('Failed to converge.\n')
   case 2, fprintf('Overflow encountered.\n')
   case 3, fprintf('Division by zero.\n')
   case 4, fprintf('Tolerance too small.\n')
end

5. Omit Trailing Tildes

In the list of output arguments in a function call a tilde signifies that the output argument in that position is not wanted and so can be discarded (and hence need not be computed). This notation was introduced in MATLAB R2009b. Generally, there is no point in providing a tilde as the last output argument, as a well written function will check the number of requested outputs and not compute any trailing outputs that are not requested.

The singular value decomposition (SVD) of a matrix A is computed by the call [U,S,V] = svd(A). In the following example we wish to compute the left singular vectors of A but not the singular values or right singular vectors.

% Bad style.
[U,~,~] = svd(A);

% Good style
[U,~] = svd(A);

An obvious question is why the second example was not shortened to

U = svd(A);

The reason is that with one output argument the svd function has a special behavior: it returns a matrix of singular values, not the matrix U of left singular vectors.

6. Include an H1 line in Program Files

The H1 line in a MATLAB program file is a comment line that is the second line in the file and contains the program name followed by a one-line description of the program. It is good practice to provide every program with an H1 line, as MATLAB itself does. Several MATLAB functions make use of H1 lines. For example, when the help function is invoked with a directory name and the directory in question does not contain a contents.m file, a list of contents is created from the H1 lines of the program files in the directory and displayed.

Here is an example of a function with an H1-line (this is an updated version of a function from The Matrix Computation Toolbox).

function show(x)
%SHOW   Display signs of matrix elements.
%   SHOW(X) displays X in `FORMAT +' form, that is,
%   with `+', `-' and  blank representing positive, negative
%   and zero elements respectively.

f = get(0,'Format'); % Get current numerical format.
format +
disp(x)
format(f)            % Restore original numeric format.