The surprisingly useful mathematical patterns in some real-world data


“Look at stock market prices and you might see the pattern …”
Muhla1 / Getty Images
If you were to look at the front page of a newspaper, you would probably notice that it contains many numbers: amounts of money, population sizes, measurements of length or area. If you pulled out all these numbers and put them in a list, you would have a collection of random numbers.
But these numbers would not be as random as you might think. In real-world data, such as cash totals or building heights, the first digit of any given number is surprisingly likely to be 1. If the digits were truly random, about 1 in 9 numbers would start with 1, but in practice it is often more like one in three. The digit 9 is the least likely to lead the way, occurring approximately 1/20th of the time, and the other digits follow a curve between them.
This pattern, known as Benford’s law, is a commonly observed distribution of first digits in certain types of data sets, in particular those where the values are drawn from a wide, unspecified range. You do not see it with things like human heights (where the figures all fall within a small range) or dates (where there are restrictions on the values that the number can take).
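The article does not give a formula, but the distribution usually quoted for Benford’s law puts the share of numbers with first digit d at log10(1 + 1/d). As a minimal sketch, assuming that standard form (the function name `benford_share` is chosen here purely for illustration):

```python
import math

def benford_share(d):
    """Expected share of numbers whose first digit is d (1 to 9),
    assuming the standard log10(1 + 1/d) form of Benford's law."""
    return math.log10(1 + 1 / d)

# Print the expected share for each possible first digit.
for d in range(1, 10):
    print(d, f"{benford_share(d):.1%}")
```

Under this formula, 1 leads about 30 per cent of the time and 9 a little under 5 per cent, which matches the "one in three" and "1/20th" figures above.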
But if you asked a group of people to check the amount of money in their bank account, or to give their house number, or if you looked up stock market prices (pictured), you would likely see the pattern: these are all figures that can span several orders of magnitude. Some streets have only a few houses, while others have hundreds. Here is why the phenomenon occurs.
Imagine a street with nine houses: the house numbers would be split equally between the nine possible first digits. But on a street with 19 houses, more than half of the numbers start with 1. These two extremes recur as we increase the number of houses: with 100, there is an almost equal number of each first digit; jump to 200 and, once again, more than half of them start with 1.
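The street example can be checked by brute force. The sketch below assumes houses are simply numbered 1, 2, 3 and so on with no gaps; the function name `share_starting_with_one` is illustrative.

```python
def share_starting_with_one(houses):
    """Fraction of house numbers 1..houses whose first digit is 1."""
    ones = sum(1 for n in range(1, houses + 1) if str(n)[0] == "1")
    return ones / houses

# The four street sizes mentioned in the text.
for houses in (9, 19, 100, 200):
    print(houses, f"{share_starting_with_one(houses):.1%}")
```

With nine houses, only one number in nine starts with 1; with 19 houses it is 11 out of 19. The share then falls back to 12 out of 100 before climbing to 111 out of 200.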
Since each piece of real-world data comes from a set of unknown size, the average probability of a number starting with 1 ends up somewhere between these two values. Similar calculations can be made for the other digits, which gives us the overall frequency with which each one appears. The effect is most visible in large collections of data.
One reason the law is useful is that it gives you a clue when data has been faked. If you are looking at a set of company accounts, you would expect to find Benford-type distributions in the sales figures. But if someone has made up data by choosing random numbers, a plot of the first-digit frequencies will not have the characteristic curve. This is a trick that forensic accountants use to detect suspicious activity.
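To make the idea concrete, here is a rough sketch of such a check. It is not a description of any real forensic tool: the data sets are invented for illustration, the score (average gap between observed and expected first-digit shares) is just one simple choice, and the expected shares assume the standard log10(1 + 1/d) form of Benford’s law.

```python
import math
import random
from collections import Counter

def first_digit(x):
    """Return the leading digit of a positive integer."""
    return int(str(x)[0])

def benford_deviation(values):
    """Average absolute gap between observed first-digit shares
    and the shares expected under Benford's law."""
    counts = Counter(first_digit(v) for v in values)
    total = len(values)
    gaps = []
    for d in range(1, 10):
        expected = math.log10(1 + 1 / d)
        observed = counts[d] / total
        gaps.append(abs(observed - expected))
    return sum(gaps) / 9

# Illustrative data only, not real accounts.
growing = [2 ** k for k in range(1, 501)]  # spans many orders of magnitude
rng = random.Random(0)                     # fixed seed for repeatability
invented = [rng.randint(100, 999) for _ in range(500)]  # made-up 3-digit figures

print("powers of 2:", benford_deviation(growing))
print("made-up numbers:", benford_deviation(invented))
```

The made-up figures, whose first digits are spread evenly, should score noticeably worse than the powers of 2. A large gap is only a prompt to look closer, not proof of fraud.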
So, the next time you check your accounts or compare the lengths of rivers, keep an eye on how many of the figures start with 1. You might just spot Benford’s law in action!
Katie Steckles is a mathematician, lecturer, YouTuber and author based in Manchester, UK. She is also an adviser for New Scientist’s puzzle column, BrainTwister. Follow her @steck
For other projects, visit Newscientist.com/maker