Doesn't it feel good when your favourite party or political cause is leading in the polls? Well, beware. Data is just as prone to manipulation and misinterpretation as words, and possibly more so.
In the areas of interest to me — European politics and political economy — fake data has become a bigger problem than fake news, because it gives policymakers and observers a false sense of security and leads to bad decisions. The most notorious example is former UK prime minister David Cameron, whose belief that he could easily win the Brexit referendum was based on the polling data available at the time. Pro-European centrists in Italy also tend to misread the public mood.
The problem with statistical fake news is not the devil we know, like the infamous £350m-a-week claim on the Leave campaign’s bus in the Brexit referendum. It is the more subtly misleading stuff. Dagbladet Information, a Danish newspaper, last week carried the extraordinary story that the respected Eurobarometer polls are based on dubious sampling. The response rate in Germany, for example, is so low as to make any statistical inference questionable.
The paper quotes a statistician who believes that the sampling error produces a pro-European bias in the poll, luring politicians into believing that support for the EU is stronger than it really is. The problem is not wilful manipulation, but incorrect inference. Brexit polling misled more than just Mr Cameron: it was the main factor behind the complacency of the Remain campaign. Remainers keep repeating that mistake even now, citing polls that show a persistent majority for their cause.
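One way a poll can drift like this without anyone manipulating it is differential non-response: if one side is more willing to answer than the other, the people who answer stop resembling the population. The sketch below is an illustration of that general mechanism, not a reconstruction of the statistician's analysis of Eurobarometer; the function name and the response rates are hypothetical.

```python
def observed_share(true_share, response_supporters, response_opponents):
    """Share of supporters among the people who actually answer a poll.

    Hypothetical helper. Assumes the initial sample is perfectly random,
    so the only distortion comes from the two groups answering at
    different rates.
    """
    answered_for = true_share * response_supporters
    answered_against = (1 - true_share) * response_opponents
    return answered_for / (answered_for + answered_against)


# Invented figures: the public is split 50/50, but supporters are twice
# as likely as opponents to take part (20% versus 10% response rate).
share = observed_share(0.50, 0.20, 0.10)
print(f"Poll shows {share:.1%} support")
```

With these made-up numbers, an evenly divided public shows up as two-thirds support. A larger sample does not fix this, because the bias comes from who answers, not from how many.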
Economic data is also prone to being misread. If you had looked at average per capita income before the Brexit referendum, you would have missed the underlying economic tensions in British society. You would have needed to dig deeper to find that real disposable income had stagnated for most of the lower- and middle-income groups.
Statistical averages can be false friends. When Bill Gates walks into a pub, the average income of the people inside soars, but only a fool would conclude that the other drinkers have become any richer. Yet we make that same mistake all the time in situations that are less obvious.
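The pub example can be put into numbers. The sketch below uses invented incomes for ten drinkers and an assumed round figure of £1bn for the billionaire; only the contrast between the mean (the usual "average") and the median (the income of the person in the middle) matters, not the figures themselves.

```python
from statistics import mean, median

# Hypothetical annual incomes (in pounds) of ten people in a pub.
# These figures are invented purely for illustration.
drinkers = [18_000, 22_000, 25_000, 27_000, 30_000,
            32_000, 35_000, 38_000, 42_000, 55_000]

# Assumed round-number income for the billionaire who walks in.
with_gates = drinkers + [1_000_000_000]

for label, incomes in (("Before", drinkers), ("After", with_gates)):
    # The mean is dragged up by a single extreme value;
    # the median only moves one position along the sorted list.
    print(f"{label}: mean {mean(incomes):,.0f}, median {median(incomes):,.0f}")
```

Here the median barely moves, from 31,000 to 32,000, while the mean jumps from 32,400 to roughly 91 million. Income distributions in real economies are skewed in the same direction, which is why the median, or a breakdown by income group, usually says more about how most people are doing than the mean does.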
Over-reliance on the wrong kind of data intrudes into all aspects of our lives. One of the most absurd statistics I have come across is the UK school ranking. It is well known that high attainment scores are heavily distorted by factors such as the affluence of neighbourhoods and school selection policies.
But some parents still treat these statistics as objective truths. The worst part is that some schools manipulate their rankings by excluding low-performing children from public exams. In this case, statistics are not just misleading, but encourage immoral behaviour: for the sake of a ranking, such schools fail in their principal task, which is to educate all children.
There may well be some valuable information in a school league table, just as there may be in an opinion poll. But the most useful data more often than not lie underneath the headline numbers, and those are not usually reported.
As consumers of published polls and other political statistics, we should not take data at face value but instead try to understand the margin of error, sampling methods, and adjustments made by polling organisations to the raw data. When analysing the data, we should beware lazy historical comparisons. There is no logical reason to think that the late surge in support for the Labour party in the 2017 general election will be repeated in 2019.
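The margin of error is the easiest of these to check. A minimal sketch, assuming the textbook formula for a proportion from a simple random sample at 95 per cent confidence; the function name is hypothetical, and real polls weight their raw data, which usually makes the true uncertainty larger than this figure suggests.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a share p measured on n respondents.

    Hypothetical helper. Assumes a simple random sample: it captures
    random sampling noise only, not bias from low or uneven response
    rates, and weighting by pollsters typically widens it further.
    """
    return z * math.sqrt(p * (1 - p) / n)


# A 50/50 split gives the widest margin for a given sample size.
for n in (500, 1000, 2000):
    print(f"n={n}: +/- {margin_of_error(0.5, n):.1%}")
```

For a typical poll of 1,000 people, each party's share carries a margin of about 3 percentage points either way, and the margin on the gap between two parties is wider still. A two-point lead in such a poll is therefore well within the noise, before any question of biased sampling even arises.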
For all we know, this election might end up like 1987, when Margaret Thatcher won a landslide. Financial regulators also tend to make the mistake of thinking the next crisis will be like the last one.
So how should we deal with data and statistics in areas where we are not experts? My most important advice is to treat statistics as tools to help you ask questions, not to answer them. If you have to seek answers from data, make sure that you understand the issues and that the data are independently verified by people with no skin in the game.
What I am issuing here is a plea for perspective, not a rant against statistics. On the contrary. I am in awe of mathematical statistics and its theoretical foundations. Modern statistics has a profound impact on our daily lives. I rely on Google's statistical translation technology to obtain information from Danish newspapers, for example.
Statistical advances allow our smartphone cameras to see in the dark, or a medical imaging device to detect a disease. But political data are of a much more uncertain quality. In political discussions, especially on social networks, statistics are used almost entirely to confirm political biases or as weapons in an argument. To the extent that this is so, you are better off without them.