Is all data health data?

What is datafication and how does it impact our health?

While we embrace and promote the use of data, we should also attempt to understand the impact of that data. To know is to be transformed. To have data is also to have the responsibility to understand and act upon it.

Our health tends to evoke the ultimate authority. Life and death tend to be serious matters that come with significant responsibility. If we have the data to save a life, do we not have a responsibility to use that data and save that life?

Is health thereby the ultimate gravity or end point for data? Is health the purpose that justifies the collection and use of any and all data?

This is a relevant question in a pandemic that empowers public health authorities, who nonetheless still lack sufficient data to make decisive or participatory action possible.

Data as a concept is a bit of a misnomer or distraction, as data is what we make of it. For example, data is rarely singular, with only one use; instead it can be repurposed, perhaps infinitely.

Hence the idea that all data is health data. That all data can be used to understand or engage our health. That even the most tangential or seemingly inconsequential can be tied back to our bodies and wellbeing.

That is one of the provocations found in a recent report that looks at health datafication.

From wearable tech and mobile apps to loyalty cards and online platforms, data gathered from almost every aspect of people’s digital lives increasingly reveals insights about their health.

The latest smartwatches offer step-counts and electrocardiograms, online search data can be used to infer dietary and physiological conditions, and the way a person interacts with their smartphone or behaves on social media can be proxies for their mental health status.

This is the ‘datafication’ of health, and it has profound consequences for who can access data about health, how we practically and legally define ‘health data’, and on our relationship with our own wellbeing and the healthcare system. Health information can now be inferred from non-health data, and data about health can be used for purposes beyond healthcare.

There’s an ongoing asymmetry in our relationship with data, or more importantly in the relationship that data has with us. Further, that relationship is dynamic, and often automatic.

At first I thought of writing “the relationship the people who have our data have with us”, yet that’s not accurate.

As they insist in their testimony and counter-narratives, the executives of the big tech platforms do not themselves have access to our data. Nor do they share that data with their clients. Rather, that data is scrambled into machine learning models that act as intermediaries, or symbiotic instantaneous connections, that come and go with the whims of our clicks, attention, and constantly evolving data.

Data from various sources, health-related or not – like fitness tracking apps, advertising profiles, mapping tools – can come together to ‘hypernudge’ people’s behaviour: a runner might be notified that the nearest protein-shake shop is just three minutes away. Or data-driven insights from across a network of online platforms – from social media to shopping – can be used in ‘digital phenotyping’ to make inferences about people’s physiological health. Those inferences could then be pooled into a credit score, without the customer’s awareness.

Datafication means health data is now ubiquitous: the array of networked devices that can record data about health occur across almost every aspect of people’s lives. Health data is also more comprehensive, able to record and reflect a full picture of a person in real time, all the time.

The ability to gather and use highly detailed data about specific individuals makes health data more personalised. And risk can be effectively calculated and scored thanks to the measurement-based nature of data about health, to the benefit of those with the data, not those who the data is about.

The result and effect of datafication is a kind of feedback loop that fuels further datafication.

As we’ve previously discussed, the myth of datafication is that it enables the pursuit of truth. Yet the paradox is that truth is never obtained. Instead the result is a desire for more data, believing that with more data truth can be grasped. Always within reach yet never obtained, an endless loop of potential, that justifies greater data and perpetual iteration of the system for the system’s sake.

Here’s a figure from the report that helps illustrate this:

In this diagram the authors of the report are making the argument that a consequence of datafication is that it makes more of the human legible, i.e. visible and scrutable. However, my point is that said legibility is never enough. There’s always a desire for more, a justification for more. That’s the self-reinforcing logic of datafication.

Which doesn’t mean there aren’t benefits. For example:

Asymptomatic people who are infected with Covid-19 exhibit, by definition, no discernible physical symptoms of the disease. They are thus less likely to seek out testing for the virus, and could unknowingly spread the infection to others.

But it seems those who are asymptomatic may not be entirely free of changes wrought by the virus. MIT researchers have now found that people who are asymptomatic may differ from healthy individuals in the way that they cough. These differences are not decipherable to the human ear. But it turns out that they can be picked up by artificial intelligence.

In a paper published recently in the IEEE Journal of Engineering in Medicine and Biology, the team reports on an AI model that distinguishes asymptomatic people from healthy individuals through forced-cough recordings, which people voluntarily submitted through web browsers and devices such as cellphones and laptops.

The researchers trained the model on tens of thousands of samples of coughs, as well as spoken words. When they fed the model new cough recordings, it accurately identified 98.5 percent of coughs from people who were confirmed to have Covid-19, including 100 percent of coughs from asymptomatics — who reported they did not have symptoms but had tested positive for the virus.

The team is working on incorporating the model into a user-friendly app, which if FDA-approved and adopted on a large scale could potentially be a free, convenient, noninvasive prescreening tool to identify people who are likely to be asymptomatic for Covid-19. A user could log in daily, cough into their phone, and instantly get information on whether they might be infected and therefore should confirm with a formal test.

That’s a significant development, which, if accurate, could make testing for the virus as easy as people coughing into a device. That strikes me as a far more relevant and valuable means of containing the pandemic than vaccines (which I’m in favour of, though I’m skeptical of how effective they will be against coronaviruses).
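What “accurate” means for a prescreening tool depends on more than the 98.5 percent figure quoted above, which describes how many confirmed cases the model caught. How often a positive result is actually right also depends on how often the model flags healthy people and on how common infection is. Here is a rough sketch of that arithmetic in Python. The sensitivity comes from the quoted report; the specificity and prevalence values are illustrative placeholders, not figures from the paper.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Share of positive screening results that are true infections (Bayes' rule)."""
    true_positives = sensitivity * prevalence
    false_positives = (1.0 - specificity) * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)


SENSITIVITY = 0.985                      # from the quoted report
ASSUMED_SPECIFICITY = 0.94               # assumption, for illustration only
ASSUMED_PREVALENCES = [0.001, 0.01, 0.05]  # assumption: share of users infected

for prevalence in ASSUMED_PREVALENCES:
    ppv = positive_predictive_value(SENSITIVITY, ASSUMED_SPECIFICITY, prevalence)
    print(f"prevalence {prevalence:.1%}: "
          f"{ppv:.1%} of positive results are true infections")
```

Under these assumed inputs, the rarer the infection, the larger the share of positive results that are false alarms, which fits the framing above of a prescreen to be confirmed with a formal test.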

However, am I being paranoid in thinking that our cough data could be used for post-pandemic purposes? Or that our cough data could be mined and analyzed in ways we can’t anticipate?

Even if I am paranoid, it seems prudent to take up what the health datafication report argues for: that we discuss the impact of health data and the kind of protections or regulations we need to govern it.

This is the larger question of what kind of society we want to live in. Health data is not just what all data becomes; it also provides a model by which we govern in general. How we approach our health is how we approach our lives.

In this context, there’s already a growing range of concerns and critiques that surround datafication.

On the one hand there’s the false hierarchy and status that datafication fosters. On the other hand, there’s the silencing effect of datafication: we revert to a tyranny of only believing in that which has a data trail, as if anything else does not exist.

A lot of the leading and critical research has focused on children, who are in many respects the frontier of (health) datafication.

Although I’m also interested and actively engaged in the datafication of forests.

However, the larger conclusion we can take from the field of datafication studies in general is that datafication exacerbates and polarizes existing divisions.

This is in no small part due to the way in which it celebrates quantification and ignores anything that cannot be quantified.

Perhaps this is the meta imperialism of our times, the conquest of everything by digital media, and the attempt to convert all things into data.

Yet that’s where history reminds us that not everything can be conquered, and not everything that was conquered will remain so for very long.

Which is why, even as we benefit from data, we should nurture and value that which is not data. Although attempting to understand what that might be is a paradox unto itself.