The power of digital health: what can we learn from one million posts?
Digital health is sexy. Last year our take on the eight technologies that will change health and care was the most popular piece on our website and we continue to support the NHS to engage through our Digital Health and Care Congress and other means.
But there is far more to digital health than how it is used in the NHS. For example, early last year Public Health England’s ‘Sugar Smart’ app was leading app download charts and had been downloaded more than one million times.
For me, one of the most exciting areas is whether and how we can gain insight from unstructured conversations in open chatrooms and forums. This rich ‘third space’ is where millions of people are already discussing their health and reflecting on their interaction with services. There is therefore enormous potential to understand the depth and detail of experience: a new form of mass insight, free from the hierarchy and time pressures of doctor–patient communication and from the structure of formal research processes such as questionnaires or focus groups.
The Fund has been involved in a small pilot project in this space, which links two of the eight technologies we highlighted last year: peer-to-peer networks and machine learning. This project has been funded by the Wellcome Trust and led by Demos – the cross-party think-tank – and the University of Sussex through their Centre for the Analysis of Social Media (CASM). The final report is out today.
The project sought to test whether machine learning and natural language processing software can be applied to health issues. To do this we adapted existing CASM software to look at experience in one health area: mental health. We analysed more than one million posts from more than 47,000 users who posted on six open forums between June 2004 and May 2016. All these posts were visible to the public, and the forums did not require a username or password to access.
We tested the software’s ability to distinguish and categorise three types of information from this data:
‘cries for help’ in times of crisis
experience with specific treatments, eg, cognitive behavioural therapy
the relationship between mental and physical health across three areas: respiratory conditions, diabetes and musculoskeletal conditions.
While the ability of the technology to do this varied, we could identify, count and correlate instances across the sample, and further identify very rich and meaningful accounts of experience in all of the areas above.
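To make ‘identify, count and correlate’ concrete, here is a minimal sketch in Python. It is not the CASM software, which this post describes only as machine learning and natural language processing software. It swaps in a far cruder technique, plain keyword matching, and every name, keyword list and example post in it is invented for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical keyword lists, invented for illustration only.
# The real project used trained classifiers, not keyword lookup.
CATEGORY_KEYWORDS = {
    "crisis": {"help", "crisis", "desperate"},
    "treatment": {"cbt", "therapy", "medication"},
    "physical_health": {"asthma", "diabetes", "arthritis"},
}


def tag_post(text):
    """Return the set of categories whose keywords appear in the post."""
    words = {w.strip(".,!?;:'\"()").lower() for w in text.split()}
    return {cat for cat, kws in CATEGORY_KEYWORDS.items() if words & kws}


def count_and_correlate(posts):
    """Count posts per category, and posts per pair of co-occurring categories."""
    counts = Counter()
    pairs = Counter()
    for post in posts:
        tags = tag_post(post)
        counts.update(tags)
        # Each pair of categories found in the same post is one co-occurrence.
        pairs.update(combinations(sorted(tags), 2))
    return counts, pairs


# Made-up example posts, not taken from the study data.
posts = [
    "I started CBT last month and it is helping.",
    "My diabetes makes the low days worse, and therapy is hard to get.",
    "Please help, I am desperate tonight.",
]
counts, pairs = count_and_correlate(posts)
```

Here `counts` holds how many posts fall into each category, and `pairs` holds how often two categories appear in the same post, which is the simplest form of the correlation described above. A keyword matcher like this would miss most of the nuance in real posts, which is why the project tested machine learning instead.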
To our knowledge, this is one of the first times that unstructured health data in a highly complex and nuanced health area has been collected and classified in this way. In the longer term, this method could be used to:
allow owners of forums to better understand the topics and issues discussed and to tailor possible service offers such as self-management
help the NHS and other service providers develop a better, deeper and more truthful understanding of users’ experience of services and more thoughtful design in response
give health regulators access to additional insight about organisational performance and safety.
The information in these posts has not been designed to answer specific questions. This lends it enormous strengths, especially in that it is a source of unbiased, unguarded, full and complex accounts. But this strength can also be a weakness. The data’s lack of built-in focus means that it requires careful and detailed interpretation, and this process is context-specific and value-laden. The sensitivity and specificity with which information is categorised also need to be improved.
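For readers less familiar with the terms, sensitivity is the share of genuinely relevant posts that the software flags, and specificity is the share of irrelevant posts that it correctly leaves alone. The sketch below shows the standard calculation in Python. The function name and the example labels are hypothetical, and the figures are not results from this study.

```python
def sensitivity_specificity(predicted, actual):
    """Compare software labels with human labels for one category.

    predicted and actual are equal-length lists of booleans:
    True means the post was judged to belong to the category.
    """
    tp = sum(p and a for p, a in zip(predicted, actual))          # correctly flagged
    tn = sum(not p and not a for p, a in zip(predicted, actual))  # correctly ignored
    fp = sum(p and not a for p, a in zip(predicted, actual))      # wrongly flagged
    fn = sum(not p and a for p, a in zip(predicted, actual))      # missed
    # Guard against categories with no positive or no negative examples.
    sensitivity = tp / (tp + fn) if (tp + fn) else None
    specificity = tn / (tn + fp) if (tn + fp) else None
    return sensitivity, specificity


# Made-up labels for six posts, for illustration only.
software_says = [True, True, False, False, True, False]
human_says = [True, False, False, False, True, True]
sens, spec = sensitivity_specificity(software_says, human_says)
```

In this invented example the software finds two of the three relevant posts and correctly ignores two of the three irrelevant ones, so both measures come out at two-thirds. For something like a ‘cry for help’, a missed post and a false alarm carry very different costs, which is one reason the two measures are reported separately.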
NHS decision-makers we talked to saw big potential in this sort of analysis, though they were also well aware of the pitfalls. For example, forum users are self-selected and some demographic groups are less likely to be included in online analysis. They were also keenly interested in the ethics of undertaking work like this. This study received ethical clearance from the University of Sussex, and we were careful to follow guidelines in retrieving, storing and handling this data. But as this technique develops we need to ensure that existing research ethics and codes developed for traditional health research are fit for purpose for this new form of knowledge.
In conclusion, we are only in the foothills of applying machine learning to complex health issues and there are many technical and ethical hurdles to overcome. But this study has demonstrated that we can identify, understand and construct wider meaning from millions of complex and unstructured online conversations about important issues that affect our health.