diamond geezer

 Monday, April 20, 2020

Far more numbers start with the digit '1' than the digit '9'.

The first person to notice this counter-intuitive result was the American astronomer Simon Newcomb, who in 1881 observed that the pages at the front of his book of logarithms were tattier than those at the back.

The phenomenon was properly investigated by the physicist Frank Benford in 1938. He analysed 20 naturally occurring sets of data and noted a common pattern in their leading digits.

One of the sets of data he analysed was the areas of 335 river basins. He also looked at the populations of 3259 towns, all 308 numbers in a copy of Reader's Digest, the house numbers of 342 scientists and 1458 baseball records. In each case he noted the first digit of the numbers and came up with the following.

First digit           1     2     3     4     5     6     7     8     9
Rivers, Area        31%   16%   11%   11%    7%    9%    6%    4%    5%
Population          34%   20%   14%    8%    7%    6%    4%    4%    2%
Reader's Digest     34%   19%   12%    8%    7%    6%    5%    5%    4%
Addresses           29%   19%   13%    9%    8%    6%    6%    5%    5%
Baseball            33%   18%   13%    8%    8%    6%    5%    5%    5%

Benford averaged out all his results and discovered that they closely followed a logarithmic function.

First digit         1      2      3      4      5      6      7      8      9
Distribution    30.1%  17.6%  12.5%   9.7%   7.9%   6.7%   5.8%   5.1%   4.6%

30% of numbers start with 1. Only 5% of numbers start with 9.
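For anyone who'd like to check those figures, the logarithmic function usually quoted for Benford's Law is P(d) = log10(1 + 1/d), where d is the leading digit. Here's a minimal Python sketch of it. The function name benford_probability is made up for this illustration.

```python
import math


def benford_probability(digit: int) -> float:
    """Expected share of numbers whose first digit is `digit` (1 to 9)."""
    if not 1 <= digit <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(1 + 1 / digit)


# Percentages to one decimal place, for comparison with the distribution above.
benford_percentages = [round(100 * benford_probability(d), 1) for d in range(1, 10)]

for digit, percentage in zip(range(1, 10), benford_percentages):
    print(f"{digit}: {percentage}%")
```

The nine probabilities add up to exactly 1, because the fractions 2/1, 3/2, 4/3 ... 10/9 multiply together to make 10, and log10(10) = 1.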


Today it's called Benford's Law. In a naturally occurring set of data, the leading digit of a number is much more likely to be small than large.

Let me attempt to sort-of explain why this works.

Imagine you live in a random house in this street. With nine houses, each starting digit is equally likely.

1 2 3 4 5 6 7 8 9

But add one more house and '1' is now twice as likely as the other starting digits.

1 2 3 4 5 6 7 8 9 10

Add ten more houses and suddenly more than half of the numbers start with a 1!

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

Extending the street to 30, 40, 50... decreases the proportion of 1s each time, until at 99 the probabilities are equal again. But all of the next 100 houses start with 1, which means by house 200 the overall percentage has leapt up to an impressive 55½%. Add further houses and the proportion falls back, reaching equality at house 999, but after that it shoots up massively again through the 1000s.
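The street arithmetic is easy to check by brute force. This is a small Python sketch, assuming houses are simply numbered 1, 2, 3... with no gaps. The function name share_starting_with_one is invented for the example.

```python
def share_starting_with_one(street_length: int) -> float:
    """Fraction of house numbers 1..street_length whose first digit is 1."""
    if street_length < 1:
        raise ValueError("a street needs at least one house")
    ones = sum(1 for house in range(1, street_length + 1) if str(house)[0] == "1")
    return ones / street_length


# The street lengths mentioned in the text.
for length in (9, 10, 20, 99, 200, 999):
    share = share_starting_with_one(length)
    print(f"{length:>4} houses: {100 * share:.1f}% start with 1")
```

Calling the function for every length from 1 up to 10000 and plotting the results would give the sawtooth shape described: a jump at each power of ten followed by a slow decline back to one in nine.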

If you could draw a graph, it would look something like this.



Of course most streets don't have 1000 houses, let alone 10000, but the principle remains the same. The street you live in could be of any length, and in most streets '1' is much more likely than any other leading digit.

I thought I'd have a go at extracting some data myself. I found an old copy of the East London phone book and looked up the house numbers of 100 consecutive people, starting with Mr Benford at 8 Exning Road.

Here are those 100 house numbers rearranged by leading digit.

Leading digit: house numbers (share)
1: 1, 10, 11, 11, 11, 12, 13, 14, 14, 14, 14, 14, 14, 16, 17, 17, 17, 17, 18, 19, 105, 109, 112, 115, 119, 125, 129, 133, 141, 144, 145, 156, 157, 163, 166, 185, 186, 195, 1401 (39%)
2: 2, 2, 20, 22, 23, 23, 25, 27, 27, 27, 28, 28, 29, 29, 29, 29, 228, 244 (18%)
3: 3, 30, 34, 36, 303 (5%)
4: 4, 4, 42, 43, 44, 47, 47, 48 (8%)
5: 5, 5, 5, 51, 52, 52, 57 (7%)
6: 61, 62, 64, 65, 69 (5%)
7: 7, 7, 7, 7, 7, 7, 72, 78, 78, 704 (10%)
8: 8, 8, 8, 82, 89 (5%)
9: 9, 90, 97, 98 (4%)

1 is by far the most popular leading digit, thanks to a lot of houses in the teens and a lot more in the hundreds. A single house in the thousands (on Westferry Road) also makes an appearance. House numbers in the twenties are also very popular, but there are rather fewer in the 30s and 40s so the frequency of the leading digits starts to fall. The results don't perfectly match Benford's Law because my sample was small, and because most house numbers are fairly low, but they do fit the general overall pattern.

Next I decided to investigate the populations of the 69 cities in the UK. Here are those populations rearranged by leading digit.

Leading digit: populations (share)
1: 1841, 10536, 14777, 16702, 18766, 18808, 107524, 107877, 116595, 117773, 120165, 121688, 123867, 132512, 138375, 140202, 140644, 145736, 151145, 151906, 153990, 168310, 183631, 189120, 198051, 1092330 (38%)
2: 20256, 26795, 29946, 205056, 219396, 233933, 236882, 239023, 248752, 249008, 249470, 256384, 256406, 273369, 275506, 280177 (23%)
3: 3355, 32219, 34790, 305680, 316915, 325837, 329839, 333871, 346090 (13%)
4: 40302, 45770, 428234, 466415, 468720 (7%)
5: 58896, 503127, 522452, 552698 (6%)
6: 603080 (1%)
7: 7375, 79415, 751485 (4%)
8: 88859 (1%)
9: 91733, 93541, 94375, 98768 (6%)

Over a third of these city population figures start with the digit 1. They range from St Davids (1841) to Birmingham (1092330), with a heck of a lot of six-digit populations in between. Over half of the cities have populations starting with either 1 or 2 - a greater proportion than Benford's Law might suggest. But the numbers certainly drop away fast. Only Glasgow starts with 6. Only Bath starts with 8.
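Tallying leading digits by hand gets tedious, so here's a short Python sketch that does the counting for the 69 city populations listed above. The helper names leading_digit and tally_leading_digits are made up for this example, and the approach (reading the first character of the number written out as text) is just one simple way of doing it.

```python
from collections import Counter

# The 69 UK city populations from the list above.
city_populations = [
    1841, 10536, 14777, 16702, 18766, 18808, 107524, 107877, 116595, 117773,
    120165, 121688, 123867, 132512, 138375, 140202, 140644, 145736, 151145,
    151906, 153990, 168310, 183631, 189120, 198051, 1092330,
    20256, 26795, 29946, 205056, 219396, 233933, 236882, 239023, 248752,
    249008, 249470, 256384, 256406, 273369, 275506, 280177,
    3355, 32219, 34790, 305680, 316915, 325837, 329839, 333871, 346090,
    40302, 45770, 428234, 466415, 468720,
    58896, 503127, 522452, 552698,
    603080,
    7375, 79415, 751485,
    88859,
    91733, 93541, 94375, 98768,
]


def leading_digit(number: int) -> int:
    """First digit of a non-zero whole number, ignoring any minus sign."""
    return int(str(abs(number))[0])


def tally_leading_digits(numbers) -> list:
    """Counts of leading digits 1 to 9, in that order. Zeros are skipped."""
    counts = Counter(leading_digit(n) for n in numbers if n != 0)
    return [counts[digit] for digit in range(1, 10)]


city_counts = tally_leading_digits(city_populations)
city_percentages = [round(100 * count / len(city_populations)) for count in city_counts]

for digit, count, percentage in zip(range(1, 10), city_counts, city_percentages):
    print(f"{digit}: {count} cities ({percentage}%)")
```

Swapping in any other list of whole numbers is all it takes to try a different dataset.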

Importantly, Benford's Law doesn't apply to all kinds of statistical data. It works best when numbers are plucked from a continuum of values covering several orders of magnitude, for example measurements from a population or prices of goods. It doesn't work well for numbers issued in sequence, random numbers (e.g. lottery draws) or variables with constraints (e.g. years of birth). Also it doesn't matter what the units are, so for example lengths of rivers should work just as well in kilometres as in miles.
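The point about units can be tried out on made-up data. The Python sketch below does not use real river lengths. It assumes a synthetic list of 2000 values, each 10% bigger than the last, so that they spread evenly across many orders of magnitude, then converts them from miles to kilometres (1 mile = 1.609344 km) and compares the leading digits both ways. The names first_digit and digit_shares are invented for the example.

```python
import math


def first_digit(value: float) -> int:
    """Leading significant digit of a positive number, read from scientific notation."""
    return int(f"{value:.15e}"[0])


def digit_shares(values) -> list:
    """Share of values starting with each digit 1 to 9, in that order."""
    counts = [0] * 9
    for value in values:
        counts[first_digit(value) - 1] += 1
    return [count / len(values) for count in counts]


# Synthetic data, NOT real measurements: 1, 1.1, 1.21, ... growing by 10% each time.
lengths_miles = [1.1 ** k for k in range(2000)]
lengths_km = [miles * 1.609344 for miles in lengths_miles]

benford = [math.log10(1 + 1 / d) for d in range(1, 10)]
shares_miles = digit_shares(lengths_miles)
shares_km = digit_shares(lengths_km)

for d in range(1, 10):
    print(
        f"{d}: Benford {100 * benford[d - 1]:.1f}%  "
        f"miles {100 * shares_miles[d - 1]:.1f}%  "
        f"km {100 * shares_km[d - 1]:.1f}%"
    )
```

Changing units multiplies every value by the same factor, which slides the whole dataset along a logarithmic scale without altering how it is spread, so the two sets of percentages should stay close to each other and to Benford's figures.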

For one last experiment, I thought ticket numbers would be an interesting thing to check. I don't have hundreds of old tickets lying around at home so I went to the London Transport Museum's online collection which does. First I sampled 100 train tickets (mostly from the tube) and then 100 bus tickets. And this happened.

First digit         1     2     3     4     5     6     7     8     9
Train tickets     28%   11%   14%   11%    9%    9%    6%    4%    8%
Bus tickets       13%   12%    5%   12%   15%   11%   11%   13%    8%

Serial numbers on train tickets matched Benford's Law fairly well, with a lot more 1s than any other digit and not so many 7s, 8s and 9s. This may be because the tickets were in all kinds of historical formats, some with three-digit serial numbers, some four, some five or even six. But the numbers on the bus tickets were spread out much more equally across the different digits at roughly 11% each. I think this is because bus ticket numbers were invariably four digits long and sequentially issued, in which case the law wouldn't apply.

Benford's Law isn't just a mathematical curiosity; it has a genuine use in the detection of fraud. Any accountant inventing numbers is likely to balance out their first digits rather than including lots more 1s, and a simple analysis of leading digits can be all it takes to catch them out.
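What a "simple analysis" looks like in practice isn't spelled out here, but one common choice is a chi-squared goodness-of-fit test, which adds up how far each digit's count strays from what Benford's Law expects. The Python sketch below is an illustration under that assumption rather than a description of what fraud investigators actually run. It borrows the ticket percentages from the table above as counts, since each sample was 100 tickets, and compares the result with 15.507, the standard 5% cut-off for eight degrees of freedom. The function name chi_squared_vs_benford is made up.

```python
import math

# Benford's expected share for leading digits 1 to 9.
BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]

# Chi-squared critical value at the 5% level with 8 degrees of freedom
# (nine digit categories minus one).
CRITICAL_VALUE_5_PERCENT = 15.507


def chi_squared_vs_benford(counts) -> float:
    """Chi-squared statistic comparing nine leading-digit counts with Benford's Law."""
    if len(counts) != 9:
        raise ValueError("expected nine counts, one per leading digit 1 to 9")
    total = sum(counts)
    return sum(
        (observed - total * share) ** 2 / (total * share)
        for observed, share in zip(counts, BENFORD)
    )


# Counts out of 100, taken from the ticket table above.
train_tickets = [28, 11, 14, 11, 9, 9, 6, 4, 8]
bus_tickets = [13, 12, 5, 12, 15, 11, 11, 13, 8]

for name, counts in (("Train", train_tickets), ("Bus", bus_tickets)):
    statistic = chi_squared_vs_benford(counts)
    verdict = "consistent with" if statistic < CRITICAL_VALUE_5_PERCENT else "a poor fit for"
    print(f"{name} tickets: chi-squared {statistic:.1f}, {verdict} Benford's Law")
```

A statistic above the cut-off only says the digits are unlikely to have come from a Benford-style distribution. As the bus tickets show, that can have a perfectly innocent explanation.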

If you're interested you could always read more about Benford's Law (at varying levels of mathematical difficulty) or view some contemporary datasets at testingbenfordslaw.com. Better still, do some practical investigating of your own. Pick a set of data, tally the first digits and see whether or not they fit the pattern. Likely you'll find a lot more 1s than 9s, just like Frank Benford did.


