If we want a sporting chance of using data to make insightful decisions in healthcare, how do we ensure it’s of the best quality?

James Friend, Programme Director for the London Health Data Strategy, reflects on how sport can show us the importance of good quality data.

There has been much discussion around how the use of data to support decision making can actively drive up the quality of the data itself. In many fields we are familiar with different drivers of data quality. Today, for example, not a football, rugby or cricket match goes by without the data being tracked: how many metres a player has run is a common metric of their productivity. Those measurements have been automated by adding active tracking devices to the players themselves, but even so there are incentives and disincentives at play. If a player receives constructive feedback from data insights that they can understand and act upon, they might study subsequent insights to see how they are getting on. However, if team selections start to be made predominantly on the data points themselves, it is important that the players have faith that the data is measuring the right things and is accurate.

Accuracy is one of the factors associated with what we refer to as good quality data. With the range of data being entered growing all the time, and increasingly driving decision making, does it matter whether data quality is acceptable? Or will AI and analytics sort the data wheat from the chaff, allowing insights to emerge from the patterns that are truly there? Viewing the patterns in good quality data at the right depth will allow appropriate insight, and feeding those insights back to the person who entered the data will help validate both the data and the insight.

But data is often not at the right quality because of errors within it. How we improve data quality depends on the source of the original error: the first challenge is to identify, understand and then resolve the root causes of the data errors.

In observing operational data input across commercial and public sector environments over many years, there appear to be at least eight different sources of potential error in data quality. The root causes to be resolved will differ in each case:

  • Tool Use – did the person measuring the data correctly use the tool that gave the measurement?
  • Calibration – was the tool used to generate the measurement correctly set up?
  • Reading – did the person recording the data read or measure the quantity correctly?
  • Input – did the person recording the data enter the quantity correctly?
  • Scale – was the data measured in the same units as it was recorded?
  • Recording – did the data point actually get captured into the data set?
  • Assignment – has the data point been recorded against the correct event or person?
  • Processing – has the data point been inappropriately changed in its processing?

User optimism, pessimism and environmental factors might play into many of those potential errors.

Whilst user training, communication, observation and monitoring might reduce the risk of some of these potential errors, these actions alone are unlikely to eliminate them entirely.

One way to reduce the risk of errors is to invest in systems where measurement devices seamlessly upload data to the core datasets without human interaction. This would make the right thing the easiest thing to do.

In the sports analogy above, this would mean the coach and manager could have faith that the decisions they were making about players were based on accurate and timely information, specific to that player. If we can see that such investments lead to better leadership decisions made with data at scale, then why wouldn’t we want to use this lesson in healthcare as well?