Nova: Science in the news
Published by the Australian Academy of Science

When the numbers just don't add up


Mathematics and statistics provide essential information for the operation of today's technocratic society. But beware: numbers can be fudged!
Contents

Key text
Activities
Further reading
Useful sites
Glossary


Key text

For most of us, the process of number-crunching is a mysterious one. A statistician, mathematician or pollster takes a whole bunch of numbers (called data), feeds them into a computer (perhaps using models or statistical packages), punches a few buttons and – lo and behold! – out prints a number that tells us something about the world around us.

But should we accept such numbers at face value? Putting a number on something is a necessary part of science and society. It should set us thinking about differences and possibilities and the magnitude of problems and solutions, but only rarely should we assume that it is exact. There are many ways in which numbers can be fudged – either deliberately or through error. Let's take a look at a few.

Oranges and apples

In many areas of science, people disagree on definitions. What is a forest, for example? This may seem a silly question; after all, a forest is a lot of trees growing alongside each other. But look more closely and you can see that there is room for (mis)interpretation. Exactly how close together must the trees be? Or how tall? A recent government report showed that the area of forest in Australia had increased from about 43 million hectares in 1992 to just under 157 million hectares in 1998. This doesn't mean that trees have suddenly started sprouting up all over the country; it means that the definition of 'forest' has changed. Now we count all our woodlands (where the trees are quite far apart) and most of our mallee (where the trees aren't very tall) as forest, when previously they were ignored.

So, one of the first rules of number-crunching is to compare like with like; apples with apples. Ignoring this rule can lead to all kinds of mistakes. For example, the US Navy once advertised that it was safer to be in the navy than out of it, since the death rate in the navy during the Spanish-American War was 9 per thousand, compared to 16 per thousand in New York over the same period. But the naval recruiters were comparing apples with oranges – the navy consisted mostly of healthy young men, while the population of New York included people with higher natural death rates such as children, the elderly and the ill.

Sampling errors

Another common mistake is sampling error. Suppose, for example, that you want to know how many Australian children brush their teeth after breakfast. You could set out to ask every child – let's say there are 5 million in Australia – but this would take a lot of time and money and would probably prove totally impractical.

One way around this problem is to sample by surveying a small part of the total population. Let's say our sample size is 500; of these, 320 kids claim they do brush their teeth. All other things being equal, we can assume that the same proportion (64 per cent) of the entire population of children also brush their teeth (and 36 per cent don't).
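
The arithmetic behind this estimate can be sketched in a few lines of Python. The first function reproduces the 64 per cent figure from the text; the second is an addition that is not in the original article: a rough 95 per cent margin of error, which assumes a truly random sample and uses the standard normal approximation. The function names are illustrative only.

```python
import math

def sample_proportion(yes_count, sample_size):
    """Share of the sample answering yes, as a fraction between 0 and 1."""
    return yes_count / sample_size

def margin_of_error(proportion, sample_size, z=1.96):
    """Rough 95% margin of error (assumes a random sample and the
    normal approximation; this step is not part of the original text)."""
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)

p = sample_proportion(320, 500)   # 320 of the 500 children surveyed
moe = margin_of_error(p, 500)

print(f"Estimated share who brush their teeth: {p:.0%}")
print(f"Approximate margin of error: plus or minus {moe:.1%}")
```

Under these assumptions the estimate carries an uncertainty of a few percentage points, which is why a sample of 500 can say something useful about 5 million children.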

This technique is well established in science, but there are traps for the unwary. The sample must be unbiased, or random. This means that everything (every child, in this case) in the population must have an equal chance of being selected in the sample. This sounds easy, but in practice it might be quite difficult. How do you go about selecting the sample? Severe bias might occur if, for example, the sample group was drawn from the membership list of the Australian Orthodontists Association – we might expect that the children of dentists are more likely to brush their teeth than others (although we couldn't be sure without a survey!).

Sample size is also critical. Australian mathematician Jane Watson investigated a claim made by a seafood company that 'Seven in ten men who frequently eat canned tuna, sardines, salmon, mackerel or kippers admit to being ambitious' – the implication being that a diet of fish increased ambitiousness. She found that the claim was based on a real sample size of only six men (who all ate fish frequently), of whom four (or about 67 per cent – close enough to 70 per cent, or 7 out of 10) considered themselves more ambitious than their colleagues. So the claim was true enough for the sample. But with such a small sample size it would be misleading to suggest that it represented the entire population of fish-eating men in the country.
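
To see how little a sample of six can tell us, here is a minimal sketch. It assumes, purely for illustration, that half of all men would call themselves ambitious whatever they eat (the 50 per cent figure is hypothetical, not from the survey). It then asks how often four or more men out of six would say so by chance alone.

```python
from math import comb

def prob_at_least(k, n, p):
    """Probability of k or more 'yes' answers among n people,
    if each person independently says yes with probability p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Hypothetical assumption: half of all men call themselves ambitious,
# regardless of how much fish they eat.
chance = prob_at_least(4, 6, 0.5)

print(f"Chance of 4 or more out of 6 by luck alone: {chance:.0%}")
```

Under that assumption the 'four out of six' result turns up about one time in three with no help from fish at all.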

Deliberate misrepresentation

For various reasons, governments, organisations and individuals don't always tell the truth – although they may be able to produce impressive numbers to justify their arguments. In the Australian tax debate, for example, all sides of politics can support their case with a vast array of numbers produced by economic models. How can the people of Australia decide who is right? A model can be made to produce almost any outcome by varying the underlying assumptions. Perhaps all we can do is ask that these assumptions are made public.

One way of deliberately misrepresenting numbers is through the use of graphs or other illustrations. For example, look at this graph.

How much more money was this fictitious government spending on environmental matters than on national defence? The answer is twice as much, but the presentation of the figures using pretty cubes gives a different impression. One cube is eight times bigger (by volume) than the other, which is what happens when each side is drawn twice as long.
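
The cube trick comes down to simple arithmetic, sketched below. The variable names are illustrative. The 'fairer' scaling shown is one possible remedy, not something the article prescribes.

```python
spending_ratio = 2.0  # one budget item is twice the other (from the text)

# Misleading: make each side of the cube twice as long.
# The volume then grows by the cube of the ratio.
misleading_volume_ratio = spending_ratio ** 3

# Fairer (one option): choose the side length so that the volume,
# not the side, is twice as big.
fair_side_ratio = spending_ratio ** (1 / 3)

print(f"Volume ratio when sides are doubled: {misleading_volume_ratio:.0f} to 1")
print(f"Side ratio that gives a 2 to 1 volume: {fair_side_ratio:.2f} to 1")
```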

Beware of unjustified detail

If you are told that the atmospheric carbon dioxide concentration has increased by 25.8743 per cent since 1850, be suspicious. Although carbon dioxide can be measured in air with great accuracy, the concentration fluctuates on a daily and seasonal basis. Also, estimates of its concentration in 1850 have been derived by many means but cannot be as accurate as a present-day measurement. So its increase since that time is impossible to know with absolute precision. Scientists are aware of the limits of accuracy and will not quote figures with unjustified detail. For the example of carbon dioxide increase, a figure of 'about 26 per cent', or even 'between 20 and 30 per cent' would be more acceptable.

What is average?

The word 'average' can be used incorrectly. You often hear people complaining about the number of schoolchildren who are below average in reading and writing. This shows a lack of understanding of the concept. By definition, an arithmetic average, or mean, will have some values above it and some below it (unless every value is identical), otherwise it would not be an average. Thus, it is not surprising or shocking if you read that many households in Australia produce more than the average quantity of domestic rubbish. It is to be expected. Mind you, such a statistic might also be informative – if rubbish production is known to be above average in certain suburbs, these might be targeted by waste reduction campaigns. In this way, it might be possible to lower the overall average.

Averages can be misleading. Statisticians remind themselves that people can disappear without trace in a lake with an average depth of 2 centimetres – if they happen to fall into the small part of it that is 10 metres deep. Averages can give the wrong impression if they are taken from a set of numbers with a few very high or very low values. Most of this fictitious lake had a depth of less than 2 centimetres; the deep bit of 10 metres was hidden in the average figure.
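
The lake can be imitated with made-up numbers. In the sketch below the lake is divided into 10,000 equal patches; the patch counts and the 1 centimetre shallow depth are hypothetical, chosen only so that the mean comes out near 2 centimetres.

```python
from statistics import mean, median

# Hypothetical lake: 9,990 shallow patches and 10 deep ones (depths in cm).
shallow_patches = [1] * 9_990     # 1 cm deep
deep_patches = [1_000] * 10       # 10 metres deep
depths = shallow_patches + deep_patches

mean_depth = mean(depths)
median_depth = median(depths)
deepest = max(depths)

print(f"Mean depth: {mean_depth:.3f} cm")
print(f"Median depth: {median_depth} cm")
print(f"Deepest point: {deepest} cm")
```

The mean alone gives no hint that a deep hole exists; the maximum, or a look at the full spread of values, does.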

Averages are measured in different ways – confusing these can also lead to error. The most commonly used average is the mean. This is the sum of all the values in a series divided by the total number of values. Then there is the median, which is the middle value, and the mode, which is the most frequent number in a series. So in a sequence of 2, 2, 3, 8, 10 the mean is (2 + 2 + 3 + 8 + 10)/5 = 5, the median is 3 and the mode is 2.
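
The three averages for this sequence can be checked with Python's standard statistics module. This is just a quick check of the figures above; the variable names are illustrative.

```python
from statistics import mean, median, mode

scores = [2, 2, 3, 8, 10]

mean_score = mean(scores)      # sum of the values divided by how many there are
median_score = median(scores)  # middle value once the values are sorted
mode_score = mode(scores)      # most frequent value

print(f"mean = {mean_score}, median = {median_score}, mode = {mode_score}")
```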

The mean, median and mode are rarely identical but they can all legitimately be called an average; this opens up a nice little loophole for those wishing to present their case in the most positive light. If the numbers above are the scores you received in spelling tests, you would probably report to your parents that you were averaging 5 (the mean value). But if they represented your scorecard after 5 holes of golf, you might prefer to boast about an average of 2 (the modal value). And if you wanted to reflect your efforts most fairly, you could report all three values: mean, median and mode.

Correlations

Beware of correlations. Statisticians and other scientists love to look for cause-and-effect relationships – if such-and-such happens, then this-and-that will happen as a consequence. Medical researchers often look for correlations between habits (smoking, for example) and diseases (such as lung cancer).

But proving a correlation is not the same thing as proving a cause-and-effect relationship. The average wage may be correlated with the national debt (both have increased over time), but this doesn't mean that one causes the other. Good scientists use rigorous sampling and statistical techniques to eliminate all other possible factors before asserting a causal relationship. And they must propose a reasonable mechanism to account for the cause-and-effect relationship.
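
A small sketch shows how easily two unrelated quantities can appear strongly correlated simply because both rise over time. The wage and debt figures below are invented for illustration only, and the function is a plain implementation of the usual Pearson correlation coefficient.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)

# Made-up yearly figures: both simply drift upwards over six years.
average_wage = [30, 32, 35, 37, 40, 44]
national_debt = [100, 120, 130, 155, 170, 200]

r = pearson(average_wage, national_debt)
print(f"Correlation between wage and debt: {r:.2f}")
```

The coefficient comes out close to 1, yet nothing in the calculation says that wages drive debt or the other way round.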

Missing information

Sometimes it is the numbers that aren't presented that cause confusion. Ross Gittins pointed out a good example of this in the Sydney Morning Herald of 4 February 1998. At the time, the youth unemployment rate was reported to be running at 28 per cent – the implication being that more than a quarter of Australia's young were on the dole.

In deciding whether to accept this number or not, the first thing to establish is the definition of 'youth' – in this case it appears to mean people aged over 15 and under 20. The second is to look at the raw numbers. According to Gittins, there are about 1.05 million people in this age group in Australia. Of these, 740,000 are still in the education system, 223,000 are in the full-time work force and 86,000 are unemployed. The 86,000 unemployed represent 28 per cent of the teenage workforce but only 8.2 per cent of the total population in that age group. By looking more closely at the numbers, we are now in a position to consider the 28 per cent figure in its proper context.
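
The two percentages come from dividing the same number of unemployed teenagers by two different totals. The sketch below uses the figures quoted from Gittins. It treats the 'teenage workforce' as full-time workers plus the unemployed, which is what those figures imply.

```python
in_education = 740_000
employed_full_time = 223_000
unemployed = 86_000
age_group_total = 1_050_000   # 'about 1.05 million' in the text

# Workforce here means those with a full-time job plus those looking for one.
workforce = employed_full_time + unemployed

unemployment_rate = unemployed / workforce
share_of_age_group = unemployed / age_group_total

print(f"Unemployed as a share of the teenage workforce: {unemployment_rate:.0%}")
print(f"Unemployed as a share of the whole age group: {share_of_age_group:.1%}")
```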

The trend of using trends

Trends are widely used by people such as economists, company executives and stock market brokers to help predict the future. But they can lead to absurd results.

Consider the following statement: 'If present trends continue, the record times for many world athletic events will eventually be zero.' Real trends are almost never in straight lines. In this case the rate of improvement in athletic records gets slower and slower as we approach the limits of human performance.

It is possible for something to decrease forever without reaching zero. Similarly, something can increase without becoming infinite. This can happen if the rate of increase or decrease is itself becoming less. So, the amount chiselled off each world athletics record will get smaller and smaller. New records for shorter times can continue forever, but without the times reaching zero.
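
The difference between a straight-line trend and a slowing one can be sketched with invented numbers. Here a hypothetical record starts at 10 seconds. In the first version every new record cuts 0.1 seconds off. In the second the first cut is 0.1 seconds and each later cut is half the size of the one before. All figures and function names are illustrative.

```python
def linear_trend(start, step, records):
    """Record time if every new record removes the same amount."""
    return start - step * records

def slowing_trend(start, first_step, shrink, records):
    """Record time if each improvement is 'shrink' times the previous one
    (sum of a geometric series)."""
    total_improvement = first_step * (1 - shrink ** records) / (1 - shrink)
    return start - total_improvement

straight_line = linear_trend(10.0, 0.1, 100)
slowing = slowing_trend(10.0, 0.1, 0.5, 1000)

print(f"Straight-line trend after 100 records: {straight_line:.2f} seconds")
print(f"Slowing trend after 1000 records: {slowing:.2f} seconds")
```

The straight line reaches zero after 100 records, while the slowing trend keeps producing new records yet never drops below 9.8 seconds.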

The calculated risk

Not all numbers are wrong or misleading and quite a few are very informative about the world in which we live. Indeed, without numbers we couldn't measure our environment or our society and we would find it very difficult to address many environmental and social issues.

But it pays to be wary. When presented with a number, consider it carefully. How has it been collected? Is there any reason for the collector – or the presenter – to misrepresent the true picture? Do you need more information to decide whether to accept the number or not? Above all, does it make sense? Questions such as these will help you survive the numbers game.


Activities


Further reading


New Scientist
21 October 2006, page 6
Iraq’s body count
Looks at efforts to monitor the number of 'excess' deaths occurring in Iraq since the invasion in 2003.


23 September 2006, page 14
A reality check for conservationists (by Peter Aldhous)
Argues that conservationists should ensure their decisions are objectively based to preserve biodiversity.


4 March 2006, page 22
Hype and herceptin (by Ralph W Moss)
Argues that the real benefits of herceptin may not be all they seem.


20 February 1999, page 48
Rivers of doubt (by Fred Pearce)
Examines the difficulty scientists and writers have in communicating a degree of uncertainty to the public.


13 February 1999, page 16
Golden goals (by Robert Matthews)
Shows how the analysis of English football matches has exposed subtle trends in scores.


Scientific American
2 March 2007
Has James Cameron found Jesus's tomb or is it just a statistical error? (by Christopher Mims)
Looks at the critical assumptions made in the statistical analysis about the ‘tomb of Jesus’.


October 2006, page 14
Contentious calculation (by John Dudley Miller)
Looks at the controversy over estimates of Chernobyl’s future cancer toll.


15 September 2006
Darfur dead much higher than commonly reported (by J R Minkel)
Comments on the difficulty in obtaining accurate figures for the number of deaths in Sudan.


Useful sites

How numbers are tricking you (Technology Review, USA)

A guide to the most common types of errors in statistical reasoning found in the media.
http://www.geocities.com/CapitolHill/4834/barnett.htm


Glossary of mathematical mistakes

One individual's collection of the ways in which numbers can be misrepresented.
http://members.cox.net/mathmistakes/glossary1.htm


Statistics every writer should know (Niles Online, USA)

Explains some basic concepts in statistics from mean and median to standard deviation and margin of error. 'Data analysis' suggests ways to look at data more critically.
http://nilesonline.com/stats/


Ockham's Razor (ABC Radio National)

Transcripts from the ABC radio program, Ockham's Razor.

  • Be aware of mathematics (29 June 1997)
    Statistics, models and measurements are often presented to support a point of view. According to Dr David Blest, Associate Professor of Mathematics at the University of Tasmania, a critical interpretation of numbers like these requires a knowledge of mathematics.
    http://www.abc.net.au/rn/science/ockham/stories/s163.htm

  • The need for statistical literacy in Australia (13 April 1997)
    Dr Jane Watson, Reader in Mathematical Education at the University of Tasmania, presents a number of examples showing how statistics can be misused, and suggests that there are three stages in the development of statistical literacy.
    http://www.abc.net.au/rn/science/ockham/stories/s29.htm


Can you trust statistics? (Peter Macinnis, Australia)

A chatty discussion about the use and abuse of statistics. (Originally presented on Ockham's Razor, 1991.)
http://members.ozemail.com.au/~macinnis/ockhams/stats.htm


Glossary

correlation. The closeness of the relationship between two variables. The correlation is positive if an increase in one variable implies an increase in the other, and negative if an increase in one implies a decrease in the other. Variables having no relationship at all are said to be uncorrelated.

model. Solving complex problems associated with real situations is often made easier by setting up a model of the situation – a mathematical description of the problem. To set up a model, a problem is simplified and only those aspects that can be represented mathematically are included.

After the problem is solved mathematically, tentative solutions are translated back to the real situation, as possible real solutions. At this stage the inadequacy of the simple model may be revealed, and some parts of the process may need to be changed. More information on models and modelling can be found in 'What is modelling?' (Nova: Science in the news, Australian Academy of Science).

variable. Something that takes on different values that can be measured or counted. If one variable can be controlled exactly (such as the selling price of apples) then it is called an 'independent variable', while the remaining variable (in this case the number of apples bought) is called a 'dependent variable'.


External sites are not endorsed by the Australian Academy of Science.
Posted April 1999.

The Australian Foundation for Science is also a supporter of Nova.

This topic is sponsored by Australian university mathematical sciences departments and the Australian Government's National Innovation Awareness Strategy.


© Australian Academy of Science