February 2020 – Nick Higham

The third edition of Handbook of Writing for the Mathematical Sciences was published by SIAM in January 2020. It is SIAM’s fourth best selling book of all time if sales for the first edition (1993) and second edition (1998) are combined, and it is in my top ten most cited outputs. As well as being used by individuals from students to faculty, it is a course text on transferable skills modules in many mathematics departments. A number of publishers cite the book as a reference for recommended style—see, for example, the AMS Author Handbook, the SIAM Style Manual and, outside mathematics and computing, the Chicago Manual of Style.

Parts of the second edition were becoming out of date, as they didn’t reflect recent developments in publishing (open access publishing, DOIs, ORCID, etc.) or workflow (including modern LaTeX packages, version control, and markup languages). I’ve also learned a lot more about writing over the last twenty years.

I made a variety of improvements for the third edition. I reorganized the material in a way that is more logical and makes the book easier to use for reference. I also improved the design and formatting and checked and updated all the references.

I removed content that was outdated or is now unnecessary. For example, nowadays there is no need to talk about submitting a hard copy manuscript, or one not written in LaTeX, to a publisher. I removed the 20-page appendix Winners of Prizes for Expository Writing, since the contents can now be found on the web, and likewise for the appendix listing addresses of mathematical organizations.

I also added a substantial amount of new material. Here are some of the more notable changes.

A new chapter Workflow discusses how to organize and automate the many tasks involved in writing. With the increased role of computers in writing and the volume of digital material we produce it is important that we make efficient use of text editors, markup languages, tools for manipulating plain text, spellcheckers, version control, and much more.
The chapter on $\LaTeX$ has been greatly expanded, reflecting both the many new and useful packages and my improved knowledge of typesetting.
I used the enumitem $\LaTeX$ package to format all numbered and bulleted lists. This results in more concise lists that make better use of the page, as explained in this blog post.
I wrote a new chapter on indexing at the same time as I was reading the literature on indexing and making an improved index for the book. Indexing is an interesting task, but most of us do it only occasionally so it is hard to become proficient. This is my best index yet, and the indexing chapter explains pretty much everything I’ve learned abut the topic.
Since the second edition I have changed my mind about how to typeset tables. I am now a convert to minimizing the use of rules and to using the booktabs $\LaTeX$ package, as explained in this blog post.
The chapter Writing a Talk now illustrates the use of the Beamer $\LaTeX$ package.
The book uses color for syntax highlighted $\LaTeX$ listings and examples of slides.
Sidebars in gray boxes give brief diversions on topics related to the text, including several on “Publication Peculiarities”.
An expanded chapter English Usage includes new sections on Zombie Nouns; Double Negatives; Serial, or Oxford, Comma; and Split Infinitives.
There are new chapters on Writing a Blog Post; Refereeing and Reviewing; Writing a Book; and, as discussed above, Preparing an Index and Workflow.
The bibliography now uses the backref $\LaTeX$ package to point back to the pages on which entries are cited, hence I removed the author index.
As well as updating the bibliography I have added DOIs and URL links, which can be found in the online version of the bibliography in bbl and PDF form, which is available from the book’s website.

At 353 pages, and allowing for the appendices removed and the more efficient formatting, the third edition is over 30 percent longer than the second edition.

As always, working with the SIAM staff on the book was a pleasure. A special thanks goes to Sam Clark of T&T Productions, who copy edited the book. Sam, with whom I have worked on two previous book projects, not only edited for SIAM style but found a large number of improvements to the text and showed me some things I did not know about $\LaTeX$ .

SIAM News has published an interview with me about the book and mathematical writing and publishing.

Here is a word cloud for the book, generated in MATLAB using the wordcloud function, based on words of 6 or more characters.

The softmax function takes as input an $n$ -vector $x$ and returns a vector $g(x)$ with elements

$g_j(x) = \displaystyle\frac{\mathrm{e}^{x_j}}{\sum_{i=1}^n \mathrm{e}^{x_i}}, \quad j=1\colon n,$

The elements of $g$ are all between $0$ and $1$ and they sum to 1, so $g$ can be regarded as a vector of probabilities. Softmax is a key function in machine learning algorithms.

Softmax is the gradient vector of the log-sum-exp function

$f(x) = \displaystyle\log \sum_{i=1}^n \mathrm{e}^{x_i}.$

This function is an approximation to the largest element, $x_{\max} = \max_i x_i$ of the vector $x$ , as it lies between $x_{\max}$ and $x_{\max} + \log n$ .

A problem with numerical evaluation of log-sum-exp and softmax is that overflow is likely even for quite modest values of $x_i$ because of the exponentials, even though $g(x)$ cannot overflow and $f(x)$ is very unlikely to do so.

A standard solution it to incorporate a shift, $a$ , and use the formulas

$f(x) = a + \displaystyle\log \sum_{i=1}^n \mathrm{e}^{x_i-a}, \hspace*{4.5cm}(1)$

and

$g_j(x) = \displaystyle\frac{\mathrm{e}^{x_j-a}}{\sum_{i=1}^n \mathrm{e}^{x_i-a}}, \quad j=1\colon n, \hspace*{3.3cm}(2)$

where $a$ is usually set to $x_{\max}$ .

Another formula for softmax is obtained by moving the denominator into the numerator:

$g_j(x) = \exp\left(x_j - a - \log\displaystyle\sum_{i=1}^n\mathrm{e}^{x_i -a}\right). \hspace*{2cm}(3)$

This formulas is used in various codes, including in the SciPy 1.4.1 function softmax.

How accurate are these formulas when evaluated in floating-point arithmetic? To my knowledge, this question has not been addressed in the literature, but it is particularly important given the growing use of low precision arithmetic in machine learning. Two questions arise. First, is there any difference between the accuracy of the formulas (2) and (3) for $g_j(x)$ ? Second, in (1) and (3), $a$ is added to a nonnegative log term, so when $a = x_{\max}$ is negative can there be damaging subtractive cancellation?

In a recent EPrint with Pierre Blanchard and Des Higham I have investigated these questions using rounding error analysis and analysis of the conditioning of the log-sum-exp and softmax problems. In a nutshell, our findings are that while cancellation can happen, it is not a problem: the shifted formulas (1) and (2) can be safely used.

However, the alternative softmax formula (3) is not recommended, as its rounding error bounds are larger than for (2) and we have found it to produce larger errors in practice.

Here is an example from training an artificial neural network using the MATLAB Deep Learning Toolbox. The network is trained to classify handwritten digits from the widely used MNIST data set. The following figure shows the sum of the computed elements of the softmax vector $g(x)$ for 2000 vectors extracted from the training data, where $g(x)$ was computed in IEEE half precision arithmetic. The sum should be 1. The red circles are for formula (2) and the blue crosses are for the division-free formula (3). Clearly, (2) gives a better approximation to a vector of probabilities (in the sense of respecting the constraint that probabilities sum to unity); the actual errors in each vector component are also smaller for (2).

Month: February 2020

Handbook of Writing for the Mathematical Sciences, Third Edition

Accurately Computing the Softmax Function

Share this:

Share this: