## A Mathematician Looks at the Collins English Dictionary

I have several dictionaries on my shelf, among which is a well-thumbed Collins English Dictionary (third edition, 1991). Earlier this year I acquired the thirteenth edition (2018). At 26.5cm high, 20cm wide, and 6.5cm deep, and weighing approximately 2.5kg, it’s an imposing tome. It’s printed on thin paper with minimal show-through and in a specially designed font (Collins Fedra) that is very legible.

The thirteenth edition, which I will abbreviate to CED13, is a wonderful acquisition for any dictionary lover. It has a wide coverage, including

• new words such as micromort (“a unit of risk equal to a one-in-a million chance of dying”),
• obscure words such as compotation (“the act of drinking together in a company”), and
• a wide selection of proper nouns, including my home town Eccles and, somewhat unexpectedly, Laurel and Hardy and Torvill and Dean (Olympic ice dance champions, 1984).

It has no appendices on English usage, mathematical symbols, chemical elements, etc., as are found in many dictionaries—which is fine with me as I rarely use them.

I decided to take a close look at some of the mathematical words in the CED.

“determinant n maths: a square array of elements that represents the sum of certain products of these elements, used to solve simultaneous equations, in vector studies, etc.”

This definition has two problems. First, a determinant is the sum, not something that represents the sum. Of course, one will find in some textbooks statements such as “swapping two rows of a determinant changes its sign”, but it’s odd that this informal usage of determinant as array is the only one mentioned. A second problem is that the determinant is not a sum of products: it is a signed sum of products, and it is the permanent (not in this dictionary) that is obtained by taking all positive signs.
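To make the distinction concrete, here is a minimal Python sketch of the permutation expansion of the determinant alongside the permanent. The function names are my own, and summing over all $n!$ permutations is only sensible for tiny matrices; it is an illustration, not how determinants are computed in practice.

```python
from itertools import permutations
from math import prod


def sign(perm):
    """Return +1 for an even permutation and -1 for an odd one."""
    inversions = sum(
        1
        for i in range(len(perm))
        for j in range(i + 1, len(perm))
        if perm[i] > perm[j]
    )
    return -1 if inversions % 2 else 1


def determinant(a):
    """Signed sum over permutations of products of matrix elements."""
    n = len(a)
    return sum(
        sign(p) * prod(a[i][p[i]] for i in range(n))
        for p in permutations(range(n))
    )


def permanent(a):
    """The same sum of products, but with every sign taken as +1."""
    n = len(a)
    return sum(
        prod(a[i][p[i]] for i in range(n))
        for p in permutations(range(n))
    )


A = [[1, 2], [3, 4]]
print(determinant(A))  # 1*4 - 2*3
print(permanent(A))    # 1*4 + 2*3
```

Swapping the two rows of `A` flips the sign of the determinant but leaves the permanent unchanged, which is exactly the difference the dictionary definition glosses over.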

“matrix n maths a rectangular array of elements set out in rows and columns, used to facilitate the solution of problems, such as transformation of coordinates.”

A matrix is more than just an array: its key characteristic is that it has algebraic operations defined on it.

“rounding: n computing a process in which a number is approximated as the closest number that can be expressed using the number of bits or digits available.”

Rounding is not specifically a computing term: it’s more fundamentally a mathematical operation and predates computing. Bits are special cases of digits. And rounding does not have to be to the closest number: in some situations one needs to round to the next larger or smaller number.
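As a small illustration of rounding that is not to the closest number, the sketch below uses the rounding modes of Python's standard `decimal` module. The helper `round_to` and the sample value are my own choices for the example.

```python
from decimal import Decimal, ROUND_CEILING, ROUND_FLOOR, ROUND_HALF_EVEN


def round_to(value, places, mode):
    """Round a decimal string to `places` decimal digits using `mode`."""
    step = Decimal(1).scaleb(-places)  # e.g. places=2 gives 0.01
    return Decimal(value).quantize(step, rounding=mode)


x = "3.14159"
print(round_to(x, 2, ROUND_HALF_EVEN))  # closest representable number
print(round_to(x, 2, ROUND_CEILING))    # next larger number
print(round_to(x, 2, ROUND_FLOOR))      # next smaller number
```

Here rounding to nearest and rounding down agree, while rounding up gives a different two-digit result; for a negative input the roles of the last two modes are reversed.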

“index n maths c a subscript or superscript to the right of a variable to express a set of variables, as in using $x_i$ for $x_1$, $x_2$, $x_3$, etc”

An index does not (except maybe in informal usage) express a set, but rather identifies a member of a set.

“supercomputer n a powerful computer that can process large quantities of data of a similar type very quickly.”

Supercomputers do mathematical calculations (and are ranked on their speed in doing so), which is not apparent from this definition. I’m also not sure why “of a similar type” is necessary. The PC on which I am typing is a supercomputer according to this definition!

“integral n maths the limit of an increasingly large number of increasingly smaller quantities, related to the function that is being integrated (the integrand). The independent variables may be confined within certain limits (definite integral) or in the absence of limits (indefinite integral).”

This seems to be an attempt to state informally the Riemann sum definition of definite integration. Sadly, it’s technically incomplete and sure to baffle anyone who doesn’t already know about Riemann sums. It would have been much better to simply say that integration is the inverse of differentiation. The second sentence is grammatically incorrect.
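For readers who have not met Riemann sums, here is a minimal Python sketch of what the definition appears to be reaching for: a sum of many small quantities $f(x_i)\,h$ that approaches the definite integral as the number of terms grows. The function name, the choice of left endpoints, and the test integrand $x^2$ on $[0,1]$ (exact integral $1/3$) are my own assumptions for illustration.

```python
def left_riemann_sum(f, a, b, n):
    """Sum f(x_i) * h over n equal subintervals of [a, b], using left endpoints."""
    h = (b - a) / n
    return sum(f(a + i * h) * h for i in range(n))


def square(x):
    return x * x


# More and smaller terms give a better approximation to the integral 1/3.
for n in (10, 100, 1000):
    approx = left_riemann_sum(square, 0.0, 1.0, n)
    print(n, approx, abs(approx - 1 / 3))
```

The printed error shrinks as `n` increases, which is the "limit of an increasingly large number of increasingly smaller quantities" made explicit.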

“fractal maths n a figure or surface generated by successive subdivisions of a simpler polygon or polyhedron, according to some iterative process.”

Surely any definition should mention fractional dimension and self-similarity? This definition excludes the fractal that is the boundary of the Mandelbrot set.

I’m not too surprised by these weaknesses, because in 1994 I wrote an article Which Dictionary for the Mathematical Scientist? (PDF file here) in which I evaluated several dictionaries (including CED3) from the point of view of their mathematical words and found problems such as those above in several of them.

Despite these criticisms, I very much like this dictionary and I use it as much as the other dictionaries on my desk. It is especially good on the computing side. I was pleased to see that my favourite editor, emacs, is included (though I’m not sure why it is not capitalized). Vi users will be sad to hear that Vi is not included. A good number of programming languages are present, including awk (uncapitalized), Java, and Javascript, but not C++ (how would that be alphabetized?), Python, or R.

A particularly notable definition is

flops or FLOPS n acronym for floating-point operations per second: used as a measure of computer processing power (in combination with a prefix): megaflops; gigaflops.

This is much better than the Oxford English Dictionary’s definition of the singular flop as “a floating-point operation per second”. There are also entries for petaflop, “$10^{15}$ floating-point operations a second”, and teraflop, “a thousand billion floating-point operations a second”. I just wish the latter definition contained “$10^{12}$”, because the alternative UK meaning of a billion as a million million leaves scope for misunderstanding.


### 1 Response to A Mathematician Looks at the Collins English Dictionary

1. scruss2 says:

Hi Nick — glad you like the CED. I worked in Collins Dictionaries’ computational division back in the late 90s. I’m not in any way a lexicographer, but some of the decisions that go into dictionary compilation have stuck with me.

As you noted, some of the definitions of computer terms might not meet your expectations. While they were sent out to computer experts, those experts might be considered generalists in your field. There are also the issues of time and space: dictionaries have hard publication deadlines, and limits on the number of pages are set fairly early on. Creating a short entry that doesn’t require the reader to look up other words in the definition is often considered valuable by dictionary editors.

Because the computation team was the nearest and cheapest resource, we’d sometimes get asked to look over word choice. At the time, Collins Dictionaries was a Sun/Solaris shop and we processed and proofed most of the English and Bilingual dictionaries in a mix of shell, awk (hence the definition) and Perl scripts. Sometimes local human bias crept in: for a while there was a definition of mouse (computer) that said they operated on top of a gridded metal pad. If you used the old 3-button Mouse Systems mice that came with Sun workstations in the mid-1990s, you’ll know where that came from …

We developed a lot of check procedures for dictionaries. Not just spelling (the obvious one), but also sorting (far more subtle than you want to know about, and pretty horrid in the pre-Unicode days too) and coverage. Coverage was a really important one and was a hard requirement of the tiny Gem dictionary: every word used in the definitions had to be defined in the dictionary itself.
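A coverage check of the kind described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name and the toy entries are hypothetical, words are matched literally with no handling of inflected forms, and it is not the procedure actually used at Collins.

```python
import re


def uncovered_words(entries):
    """Return words used in definitions that are not themselves headwords."""
    headwords = {headword.lower() for headword in entries}
    missing = set()
    for definition in entries.values():
        for word in re.findall(r"[a-z]+", definition.lower()):
            if word not in headwords:
                missing.add(word)
    return sorted(missing)


# Hypothetical toy dictionary, invented for this example.
toy = {
    "cat": "a small animal",
    "a": "one",
    "one": "a single thing",
    "small": "not big",
    "animal": "a living thing",
}
print(uncovered_words(toy))
```

A real check would also need to map inflected forms back to their headwords, which is where much of the subtlety lies.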

We typeset most of the smaller dictionaries in-house using a homegrown tool that generated PostScript. We’d previously tried to use PostScript to generate all the hyphenation, justification and page breaks, but — as a numerics person, you’ll know what’s inevitably coming next — adding up lots of small floating-point numbers of indifferent precision resulted in different line breaks (or worse, page breaks/counts) when run on different PostScript RIPs. Complex PostScript was also unpopular with the printers as it could be terribly slow on their old phototypesetting machines.
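The effect is easy to reproduce in miniature. The Python sketch below is a hypothetical illustration, not the PostScript that was used: it adds the same three widths in two different orders in double precision and then asks whether they fit a line of a given measure. The widths, the measure, and the function names are all invented for the example.

```python
import math


def total_width(widths):
    """Accumulate widths left to right, as a simple interpreter loop might."""
    total = 0.0
    for w in widths:
        total += w
    return total


def fits(widths, measure):
    """Decide whether the accumulated width fits within the line measure."""
    return total_width(widths) <= measure


widths = [0.1, 0.2, 0.3]  # hypothetical glyph widths
measure = 0.6             # hypothetical line measure

print(fits(widths, measure))                  # summed left to right
print(fits(list(reversed(widths)), measure))  # summed right to left
print(math.fsum(widths) <= measure)           # correctly rounded sum
```

The two orders of summation disagree about whether the line fits, so two systems that accumulate in different orders or precisions can legitimately choose different line breaks.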

Larger dictionaries were set out-of-house. I think the CED was set in FrameMaker. Amazingly, some of the other large titles were set in troff by a small family of wizards living in Scarborough. We never quite could work out how they did some of their magic. The combined Dictionary & Thesaurus was a tough one: dictionary definitions at the top of the page, matching thesaurus definitions at the bottom. We always had the theory that they’d extended troff’s footnote mechanism to make the layout, but like all good wizards, they weren’t telling. (We tried TeX, by the way. At the time it didn’t have precise enough page size definitions for us, had the irritating habit of dropping back to bitmap fonts for seemingly random reasons and its handling of multilingual characters wasn’t up to snuff. There never was time to get it quite right before the next deadline.)

The treatment of new words was contentious when I was there. The academic dictionary tradition was to avoid new (“buzzy”) words until their usage settled. Dictionaries have to cover their costs though and Marketing loved to have the newest words so they could outdo the competition. Words of a certain time got stripped out due to finite space: “perestroika” was one that had aged out while I worked there.

I’m glad you like the paper they print on. Thin opaque dictionary paper is rather expensive (second only to the absurdly expensive bible paper, used only on a very few very pricey editions). Dictionary paper is very strong; one should be able to pick up a dictionary just by a single leaf from near the middle of the book. I don’t recommend doing this on your newest/most treasured dictionary, though!