Far more numbers start with the digit '1' than the digit '9'.
The first person to notice this counter-intuitive result was the American astronomer Simon Newcomb, who in 1881 observed that the pages at the front of his book of logarithms were tattier than those at the back.
The phenomenon was properly investigated by the physicist Frank Benford in 1938. He analysed 20 naturally occurring sets of data and noted a common pattern in their leading digits.
One of the sets of data he analysed was the areas of 335 river basins. He also looked at the populations of 3259 towns, all 308 numbers in a copy of Reader's Digest, the house numbers of 342 scientists and 1458 baseball records. In each case he noted the first digit of the numbers and came up with the following.
First digit        1    2    3    4    5    6    7    8    9
Rivers, Area      31%  16%  11%  11%   7%   9%   6%   4%   5%
Population        34%  20%  14%   8%   7%   6%   4%   4%   2%
Reader's Digest   34%  19%  12%   8%   7%   6%   5%   5%   4%
Addresses         29%  19%  13%   9%   8%   6%   6%   5%   5%
Baseball          33%  18%  13%   8%   8%   6%   5%   5%   5%
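Benford averaged out all his results and discovered that they closely matched a logarithmic function: the proportion of numbers with leading digit d is log10(1 + 1/d), which gives the following distribution.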
First digit    1     2     3     4     5     6     7     8     9
Distribution  30.1% 17.6% 12.5%  9.7%  7.9%  6.7%  5.8%  5.1%  4.6%
So around 30% of numbers start with 1, but fewer than 5% start with 9.
Today it's called Benford's Law. In a naturally occurring set of data, the leading digit of a number is much more likely to be small than large.
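The percentages in Benford's distribution all come from that one formula, log10(1 + 1/d). Here's a minimal sketch that recomputes them. Python is purely my choice for illustration, and the function name is made up.

```python
import math

def benford_probability(digit):
    """Probability that a number's leading digit is `digit` (1-9) under Benford's Law."""
    return math.log10(1 + 1 / digit)

# Print the expected share for each leading digit as a percentage.
for d in range(1, 10):
    print(f"{d}: {benford_probability(d):.1%}")
```

It should print the same nine percentages as the distribution table. Mathematically the nine probabilities add up to exactly 1, because log10(2/1) + log10(3/2) + ... + log10(10/9) collapses to log10(10).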
Let me attempt to sort of explain why this works.
Imagine you live in a random house in this street. With nine houses, each starting digit is equally likely.
Houses: 1 2 3 4 5 6 7 8 9
But add one more house and '1' is now twice as likely as the other starting digits.
Houses: 1 2 3 4 5 6 7 8 9 10
Add ten more houses and suddenly more than half of the numbers start with a 1!
Houses: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Extending the street to 30, 40, 50... decreases the proportion of 1s each time, until at 99 the probabilities are equal again. But all of the next 100 houses start with 1, which means by house 200 the overall percentage has leapt up to an impressive 55½%. Add further houses and the proportion falls back, reaching equality at house 999, but after that it shoots up massively again through the 1000s.
If you could draw a graph, it would look something like this.
Of course most streets don't have 1000 houses, let alone 10000, but the principle remains the same. The street you live in could be of any length, and in most streets '1' is much more likely than any other leading digit.
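The street argument is easy to check by brute force. This sketch (the function name is hypothetical) counts what fraction of houses numbered 1 to n start with a given digit, using exact fractions so nothing gets lost to rounding.

```python
from fractions import Fraction

def leading_digit_share(n, digit=1):
    """Exact fraction of house numbers 1..n (inclusive) whose first digit is `digit`."""
    matches = sum(1 for house in range(1, n + 1) if str(house)[0] == str(digit))
    return Fraction(matches, n)

# Street lengths from the text, plus one longer street.
for n in (9, 10, 20, 99, 200, 999, 2000):
    share = leading_digit_share(n)
    print(f"{n:>5} houses: {share} of them start with 1 ({float(share):.1%})")
```

The share of 1s drops back to exactly one in nine at 9, 99 and 999 houses, and reaches 55% or more at 20, 200 and 2000 houses, which is the sawtooth the graph describes.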
I thought I'd have a go at extracting some data myself. I found an old copy of the East London phone book and looked up the house numbers of 100 consecutive people, starting with Mr Benford at 8 Exning Road.
Here are those 100 house numbers rearranged by leading digit.
1 is by far the most popular leading digit, thanks to a lot of houses in the teens and a lot more in the hundreds. A single house in the thousands (on Westferry Road) also makes an appearance. House numbers in the 20s are also very popular, but there are rather fewer in the 30s and 40s, so the frequency of the higher leading digits starts to fall. The results don't perfectly match Benford's Law because my sample was small, and because most house numbers are fairly low, but they do fit the general overall pattern.
Next I decided to investigate the populations of the 69 cities in the UK. Here are those populations rearranged by leading digit.
Over a third of these city population figures start with the digit 1. They range from St Davids (1841) to Birmingham (1092330), with a heck of a lot of six-digit populations in between. Over half of the cities have populations starting with either 1 or 2, a greater proportion than the roughly 48% Benford's Law predicts. But the numbers certainly drop away fast. Only Glasgow starts with 6. Only Bath starts with 8.
Importantly, Benford's Law doesn't apply to all kinds of statistical data. It works best when numbers are plucked from a continuum of values covering several orders of magnitude, for example measurements from a population or prices of goods. It doesn't work well for numbers issued in sequence, uniformly random numbers (e.g. lottery draws) or variables with constraints (e.g. years of birth). And where the law does apply, it doesn't matter what the units are, so for example lengths of rivers should work just as well in kilometres as in miles.
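To see the "units don't matter" point in action, here's a rough simulation rather than real data. It assumes some hypothetical river lengths spread evenly on a logarithmic scale between 1 and 10,000 miles (an idealised stand-in for "a continuum covering several orders of magnitude"), converts them to kilometres at 1.609344 km per mile, and tallies first digits both ways. For contrast it also tallies four-digit serial numbers issued in sequence from 1000 to 9999.

```python
import random
from collections import Counter

def leading_digit(x):
    """First significant digit of a positive number, read from its exact decimal form."""
    return int(str(x).lstrip("0.")[0])

def first_digit_shares(values):
    """Map each digit 1-9 to the fraction of values that start with it."""
    counts = Counter(leading_digit(v) for v in values)
    return {d: counts[d] / len(values) for d in range(1, 10)}

rng = random.Random(1938)  # fixed seed so the run is repeatable

# Hypothetical river lengths in miles, uniform on a log scale from 10**0 to 10**4.
miles = [10 ** rng.uniform(0, 4) for _ in range(100_000)]
kilometres = [m * 1.609344 for m in miles]

# Serial numbers issued in sequence, all four digits long.
serials = list(range(1000, 10000))

for label, data in (("miles", miles), ("kilometres", kilometres), ("serials", serials)):
    shares = first_digit_shares(data)
    print(f"{label:>10}:", " ".join(f"{shares[d]:.1%}" for d in range(1, 10)))
```

Both the miles and the kilometres should come out at roughly 30% 1s, 18% 2s and so on down to under 5% 9s, whereas every digit gets exactly one ninth (about 11%) of the sequential serial numbers.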
For one last experiment, I thought ticket numbers would be an interesting thing to check. I don't have hundreds of old tickets lying around at home, so I went to the London Transport Museum's online collection, which does. First I sampled 100 train tickets (mostly from the tube) and then 100 bus tickets. And this happened.
First digit      1    2    3    4    5    6    7    8    9
Train tickets   28%  11%  14%  11%   9%   9%   6%   4%   8%
Bus tickets     13%  12%   5%  12%  15%  11%  11%  13%   8%
Serial numbers on train tickets matched Benford's Law fairly well, with a lot more 1s than any other digit and not so many 7s, 8s and 9s. This may be because the tickets came in all kinds of historical formats, some with three-digit serial numbers, some four, some five or even six. But the numbers on the bus tickets were spread much more evenly across the different digits, at roughly 11% each. I think this is because bus ticket numbers were invariably four digits long and issued in sequence, in which case the law wouldn't apply.
Benford's Law isn't just a mathematical curiosity; it has a genuine use in the detection of fraud. An accountant inventing numbers is likely to spread their first digits evenly rather than including lots more 1s, and a simple analysis of leading digits can be all it takes to catch them out.
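One common way to put a number on "fits the pattern" is Pearson's chi-squared goodness-of-fit test. This sketch applies it to the ticket results; since each sample was 100 tickets, the percentages double as counts. The 15.51 cut-off is the standard 5% critical value for 8 degrees of freedom (nine digits minus one). Treat it as a rough guide only: with just 100 numbers the expected count for the digit 9 falls below 5, where the chi-squared approximation gets shaky, and this isn't necessarily the test a real auditor would use.

```python
import math

# First-digit counts (digits 1-9) from the 100 train tickets and 100 bus tickets.
TRAIN = [28, 11, 14, 11, 9, 9, 6, 4, 8]
BUS = [13, 12, 5, 12, 15, 11, 11, 13, 8]

# 95th percentile of the chi-squared distribution with 8 degrees of freedom.
CRITICAL_5_PERCENT = 15.51

def benford_chi_square(counts):
    """Pearson chi-squared statistic comparing observed first-digit counts with Benford's Law."""
    total = sum(counts)
    stat = 0.0
    for digit, observed in enumerate(counts, start=1):
        expected = total * math.log10(1 + 1 / digit)
        stat += (observed - expected) ** 2 / expected
    return stat

for name, counts in (("Train", TRAIN), ("Bus", BUS)):
    stat = benford_chi_square(counts)
    verdict = "consistent with" if stat < CRITICAL_5_PERCENT else "departs from"
    print(f"{name} tickets: chi-squared = {stat:.1f}, {verdict} Benford's Law")
```

The train tickets score about 7, comfortably under the cut-off, while the bus tickets score about 45, far above it.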
If you're interested you could always read more about Benford's Law (at varying levels of mathematical difficulty) or view some contemporary datasets at testingbenfordslaw.com. Better still, do some practical investigating of your own. Pick a set of data, tally the first digits and see whether or not they fit the pattern. Chances are you'll find a lot more 1s than 9s, just like Frank Benford did.
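If you'd rather let a computer do the tallying, this sketch counts the first digits in any list of numbers and sets the shares beside Benford's predictions. The function names are my own, and the sample list is just a handful of numbers quoted in this post, far too few to test anything, so swap in your own data set.

```python
import math
from collections import Counter

def leading_digit(x):
    """Return the first non-zero digit of x, or None if there isn't one (e.g. for 0)."""
    # str() gives the shortest exact decimal form, e.g. '0.0042' or '1e-07', so stripping
    # leading zeros and the decimal point exposes the first significant digit.
    s = str(abs(x)).lstrip("0.")
    return int(s[0]) if s and s[0].isdigit() else None

def compare_with_benford(values):
    """Map each digit 1-9 to (observed share, share predicted by Benford's Law)."""
    digits = [d for d in map(leading_digit, values) if d is not None]
    if not digits:
        raise ValueError("no non-zero numbers to tally")
    counts = Counter(digits)
    return {d: (counts[d] / len(digits), math.log10(1 + 1 / d)) for d in range(1, 10)}

# Toy input: a few numbers quoted in this post. Replace with your own data.
sample = [335, 3259, 308, 342, 1458, 1881, 1938, 1841, 1092330]

for digit, (observed, expected) in compare_with_benford(sample).items():
    print(f"{digit}: observed {observed:6.1%}   Benford {expected:6.1%}")
```

Zeros are skipped, since they have no non-zero leading digit, and negative numbers are tallied by their absolute value.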