diamond geezer

Monday, April 20, 2020

Far more numbers start with the digit '1' than the digit '9'.

The first person to notice this counter-intuitive result was the American astronomer Simon Newcomb, who in 1881 observed that the pages at the front of his book of logarithms were tattier than those at the back.

The phenomenon was properly investigated by the physicist Frank Benford in 1938. He analysed 20 naturally occurring sets of data and noted a common pattern in their leading digits.

One of the sets of data he analysed was the areas of 335 river basins. He also looked at the populations of 3259 towns, all 308 numbers in a copy of Reader's Digest, the house numbers of 342 scientists and 1458 baseball records. In each case he noted the first digit of the numbers and came up with the following.

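| First digit | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Rivers, Area | 31% | 16% | 11% | 11% | 7% | 9% | 6% | 4% | 5% |
| Population | 34% | 20% | 14% | 8% | 7% | 6% | 4% | 4% | 2% |
| Reader's Digest | 34% | 19% | 12% | 8% | 7% | 6% | 5% | 5% | 4% |
| Addresses | 29% | 19% | 13% | 9% | 8% | 6% | 6% | 5% | 5% |
| Baseball | 33% | 18% | 13% | 8% | 8% | 6% | 5% | 5% | 5% |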

Benford averaged out all his results and discovered they related to a logarithmic function.

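| First digit | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Distribution | 30.1% | 17.6% | 12.5% | 9.7% | 7.9% | 6.7% | 5.8% | 5.1% | 4.6% |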
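The logarithmic function in question is usually written P(d) = log10(1 + 1/d). That's the standard textbook statement of the law rather than anything quoted above, so treat this as a minimal sketch; the name benford_share is just a label chosen for illustration. It should reproduce the percentages in the table.

```python
import math

def benford_share(digit):
    """Expected share of numbers whose first digit is `digit` (1-9),
    using the standard Benford formula log10(1 + 1/d)."""
    return math.log10(1 + 1 / digit)

# Percentages for digits 1 to 9, rounded to one decimal place.
benford_percentages = [round(100 * benford_share(d), 1) for d in range(1, 10)]

for d, pct in zip(range(1, 10), benford_percentages):
    print(d, f"{pct}%")
```

The nine shares add up to exactly 1, because the nine logarithms telescope to log10(10).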

30% of numbers start with 1. Only 5% of numbers start with 9.


Today it's called Benford's Law. In a naturally occurring set of data, the leading digit of a number is much more likely to be small than large.

Let me attempt to sort-of explain why this works.

Imagine you live in a random house in this street. With nine houses, each starting digit is equally likely.

1 2 3 4 5 6 7 8 9

But add one more house and '1' is now twice as likely as the other starting digits.

1 2 3 4 5 6 7 8 9 10

Add ten more houses and suddenly more than half of the numbers start with a 1!

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

Extending the street to 30, 40, 50... decreases the proportion of 1s each time, until at 99 the probabilities are equal again. But all of the next 100 houses start with 1, which means by house 200 the overall percentage has leapt up to an impressive 55½%. Add further houses and the proportion falls back, reaching equality at house 999, but after that it shoots up massively again through the 1000s.

If you could draw a graph, it would look something like this.

Of course most streets don't have 1000 houses, let alone 10000, but the principle remains the same. The street you live in could be of any length, and in most streets '1' is much more likely than any other leading digit.
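The rise and fall of the 1s is easy to check by brute force. Here's a rough Python sketch, assuming a street numbered consecutively from 1 with no gaps; the function name share_starting_with_one is made up for the purpose.

```python
def share_starting_with_one(street_length):
    """Fraction of house numbers 1..street_length whose first digit is 1."""
    ones = sum(
        1 for house in range(1, street_length + 1) if str(house)[0] == "1"
    )
    return ones / street_length

# Street lengths mentioned in the text, plus one in the thousands.
for length in (9, 10, 20, 99, 200, 999, 2000):
    print(length, f"{share_starting_with_one(length):.1%}")
```

A 20-house street comes out at 55% and a 200-house street at 55.5%, while streets of 9, 99 and 999 houses all drop back to one in nine.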

I thought I'd have a go at extracting some data myself. I found an old copy of the East London phone book and looked up the house numbers of 100 consecutive people, starting with Mr Benford at 8 Exning Road.

Here are those 100 house numbers rearranged by leading digit.

| Leading digit | House numbers | Share |
| --- | --- | --- |
| 1 | 1, 10, 11, 11, 11, 12, 13, 14, 14, 14, 14, 14, 14, 16, 17, 17, 17, 17, 18, 19, 105, 109, 112, 115, 119, 125, 129, 133, 141, 144, 145, 156, 157, 163, 166, 185, 186, 195, 1401 | 39% |
| 2 | 2, 2, 20, 22, 23, 23, 25, 27, 27, 27, 28, 28, 29, 29, 29, 29, 228, 244 | 18% |
| 3 | 3, 30, 34, 36, 303 | 5% |
| 4 | 4, 4, 42, 43, 44, 47, 47, 48 | 8% |
| 5 | 5, 5, 5, 51, 52, 52, 57 | 7% |
| 6 | 61, 62, 64, 65, 69 | 5% |
| 7 | 7, 7, 7, 7, 7, 7, 72, 78, 78, 704 | 10% |
| 8 | 8, 8, 8, 82, 89 | 5% |
| 9 | 9, 90, 97, 98 | 4% |

1 is by far the most popular leading digit, thanks to a lot of houses in the teens and a lot more in the hundreds. A single house in the thousands (on Westferry Road) also makes an appearance. House numbers in the twenties are also very popular, but there are rather fewer in the 30s and 40s so the frequency of the leading digits starts to fall. The results don't perfectly match Benford's Law because my sample was small, and because most house numbers are fairly low, but they do fit the general overall pattern.

Next I decided to investigate the populations of the 69 cities in the UK. Here are those populations rearranged by leading digit.

| Leading digit | City populations | Share |
| --- | --- | --- |
| 1 | 1841, 10536, 14777, 16702, 18766, 18808, 107524, 107877, 116595, 117773, 120165, 121688, 123867, 132512, 138375, 140202, 140644, 145736, 151145, 151906, 153990, 168310, 183631, 189120, 198051, 1092330 | 38% |
| 2 | 20256, 26795, 29946, 205056, 219396, 233933, 236882, 239023, 248752, 249008, 249470, 256384, 256406, 273369, 275506, 280177 | 23% |
| 3 | 3355, 32219, 34790, 305680, 316915, 325837, 329839, 333871, 346090 | 13% |
| 4 | 40302, 45770, 428234, 466415, 468720 | 7% |
| 5 | 58896, 503127, 522452, 552698 | 6% |
| 6 | 603080 | 1% |
| 7 | 7375, 79415, 751485 | 4% |
| 8 | 88859 | 1% |
| 9 | 91733, 93541, 94375, 98768 | 6% |

Over a third of these city population figures start with the digit 1. They range from St Davids (1841) to Birmingham (1092330), with a heck of a lot of six-digit populations in between. Over half of the cities have populations starting with either 1 or 2 - a greater proportion than Benford's Law might suggest. But the numbers certainly drop away fast. Only Glasgow starts with 6. Only Bath starts with 8.
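Tallying by hand is tedious, so here's a small Python sketch of the same count using the 69 city populations listed above. The helper name leading_digit is a label chosen for illustration, and it assumes positive whole numbers.

```python
from collections import Counter

# The 69 UK city populations listed above.
city_populations = [
    1841, 10536, 14777, 16702, 18766, 18808, 107524, 107877, 116595,
    117773, 120165, 121688, 123867, 132512, 138375, 140202, 140644,
    145736, 151145, 151906, 153990, 168310, 183631, 189120, 198051,
    1092330,
    20256, 26795, 29946, 205056, 219396, 233933, 236882, 239023,
    248752, 249008, 249470, 256384, 256406, 273369, 275506, 280177,
    3355, 32219, 34790, 305680, 316915, 325837, 329839, 333871, 346090,
    40302, 45770, 428234, 466415, 468720,
    58896, 503127, 522452, 552698,
    603080,
    7375, 79415, 751485,
    88859,
    91733, 93541, 94375, 98768,
]

def leading_digit(n):
    """First digit of a positive whole number."""
    return int(str(n)[0])

# Count how many populations start with each digit.
tally = Counter(leading_digit(p) for p in city_populations)
total = len(city_populations)

for digit in range(1, 10):
    print(digit, tally[digit], f"{tally[digit] / total:.0%}")
```

Swap in any other list of positive whole numbers and the tally works the same way.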

Importantly, Benford's Law doesn't apply to all kinds of statistical data. It works best when numbers are plucked from a continuum of values covering several orders of magnitude, for example measurements from a population or prices of goods. It doesn't work well for numbers issued in sequence, random numbers (e.g. lottery draws) or variables with constraints (e.g. years of birth). Also it doesn't matter what the units are, so for example lengths of rivers should work just as well in kilometres as in miles.
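The claim about units can be tested with a quick simulation. This Python sketch uses made-up lengths, not real rivers: 10,000 values spread evenly on a log scale between 1 and 10,000 miles, which is an assumption chosen because such data follows the law closely. The seed, the sample size and the function names are all arbitrary choices for illustration.

```python
import random
from collections import Counter

def leading_digit(x):
    """First significant digit of a positive number."""
    while x >= 10:
        x /= 10
    while x < 1:
        x *= 10
    return int(x)

def digit_shares(values):
    """Share of values starting with each digit 1-9, as a list."""
    tally = Counter(leading_digit(v) for v in values)
    return [tally[d] / len(values) for d in range(1, 10)]

# Invented data for illustration only: lengths in miles, evenly spread
# on a logarithmic scale across four orders of magnitude.
rng = random.Random(2020)
lengths_miles = [10 ** rng.uniform(0, 4) for _ in range(10_000)]

# The same lengths converted to kilometres.
lengths_km = [miles * 1.609344 for miles in lengths_miles]

shares_miles = digit_shares(lengths_miles)
shares_km = digit_shares(lengths_km)

for d in range(1, 10):
    print(d, f"{shares_miles[d - 1]:.1%}", f"{shares_km[d - 1]:.1%}")
```

Both columns should sit close to the Benford percentages, give or take the wobble you'd expect from a random sample.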

For one last experiment, I thought ticket numbers would be an interesting thing to check. I don't have hundreds of old tickets lying around at home so I went to the London Transport Museum's online collection which does. First I sampled 100 train tickets (mostly from the tube) and then 100 bus tickets. And this happened.

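| First digit | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Train tickets | 28% | 11% | 14% | 11% | 9% | 9% | 6% | 4% | 8% |
| Bus tickets | 13% | 12% | 5% | 12% | 15% | 11% | 11% | 13% | 8% |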

Serial numbers on train tickets matched Benford's Law fairly well, with a lot more 1s than any other digit and not so many 7s, 8s and 9s. This may be because the tickets were in all kinds of historical formats, some with three-digit serial numbers, some four, some five or even six. But the numbers on the bus tickets were spread out much more equally across the different digits at roughly 11% each. I think this is because bus ticket numbers were invariably four digits long and sequentially issued, in which case the law wouldn't apply.

Benford's Law isn't just a mathematical curiosity, it has a genuine use in the detection of fraud. Any accountant inventing numbers is likely to balance out their first digits rather than including lots more 1s, and a simple analysis of leading digits can be all it takes to catch them out.
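One common way to put a number on "fits" or "doesn't fit" is a chi-squared test, which is a choice made here for illustration and not necessarily what a fraud investigator would use. This Python sketch runs it on the two ticket samples above, taking the percentages as counts because each sample had 100 tickets. With so few tickets the expected count for the higher digits is only about 5, so treat the result as a rough guide.

```python
import math

# Expected Benford shares for first digits 1-9 (standard formula).
benford = [math.log10(1 + 1 / d) for d in range(1, 10)]

# First-digit counts from the two samples of 100 tickets each.
train_counts = [28, 11, 14, 11, 9, 9, 6, 4, 8]
bus_counts = [13, 12, 5, 12, 15, 11, 11, 13, 8]

def chi_squared(counts):
    """Pearson chi-squared statistic measured against Benford's Law."""
    total = sum(counts)
    return sum(
        (observed - total * share) ** 2 / (total * share)
        for observed, share in zip(counts, benford)
    )

# Standard 5% critical value for 8 degrees of freedom (9 digits minus 1).
CRITICAL_5_PERCENT = 15.507

for name, counts in (("train", train_counts), ("bus", bus_counts)):
    score = chi_squared(counts)
    if score < CRITICAL_5_PERCENT:
        verdict = "consistent with"
    else:
        verdict = "departs from"
    print(f"{name} tickets: {score:.1f}, {verdict} Benford's Law")
```

The train tickets score well under the threshold and the bus tickets well over it, which agrees with the eyeball verdict.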

If you're interested you could always read more about Benford's Law (at varying levels of mathematical difficulty) or view some contemporary datasets at testingbenfordslaw.com. Better still, do some practical investigating of your own. Pick a set of data, tally the first digits and see whether or not they fit the pattern. Likely you'll find a lot more 1s than 9s, just like Frank Benford did.

posted 07:00