Responsible innovation: the benefit of participatory research design in digital health technology
Just like other parts of society, the medical sector has high expectations of innovative health technologies, like artificial intelligence. Advanced analytics are used to quickly find significant patterns in large amounts of patient data that the human brain could not calculate at such speed. Such techniques are also known as ‘machine learning’ (ML). Unfortunately, ML is still challenged by some serious problems, such as training data that are not sufficiently representative of the patient population and an all-too-easy reduction of complex health issues to measurable signals. The method of Participatory Action Research offers some solutions.
We often consider digital health technology a promising tool and a facilitating factor for better societal health outcomes with regard to the prevention and treatment of diseases. But it is essential to highlight that, depending on the execution, digital health technology can increase health inequalities and generate more vulnerabilities (Stilgoe et al., 2013).
An example of current innovation in digital health technology is the application of machine learning (ML): a frequently debated technique used to process an abundance of information and turn it into behavioural patterns and insights, at a scale that human brains cannot match. For this reason, data scientists have moved towards exploratory data-mining techniques to develop classification algorithms that can unravel new knowledge hidden in data. Knowledge is not the only thing that is hidden, however; this new technology will come with new challenges and potentially unforeseen societal implications.
We at imec-SMIT-VUB propose a systematic investigation into technological development that is designed to be ethical, accountable, transparent and trustworthy. One component of this investigation is participatory action research with the aim of including a critical and representative sample of health data. We believe this is necessary for responsible (AI) innovation.
“We at imec-SMIT-VUB propose a systematic investigation into technological development that is designed to be ethical, accountable, transparent and trustworthy. One component of this investigation is participatory action research with the aim of including a critical and representative sample of health data.” – Myriam Smitt et al.
The International Collaboration for Participatory Health Research already addresses the problem of health inequalities. The importance of this problem, however, has been amplified by the potential societal impact of newly developed technology. We want to reflect here on the challenges we encounter when implementing one such approach of stakeholder participation within innovation research into data-driven health technology. In this policy brief, we focus on the area of accountable algorithm design, using a stress detection example.
Imec-SMIT-VUB currently contributes to the Nervocity study. The goal of this project is to research the phenomenon of stress in an urban context. Stress is a complex construct to measure. For instance, on top of genetic predispositions and ethnicity-related differences (Manrai et al., 2016), individual and subjective differences related to stress and health perception provide another layer of potentially misleading information that algorithms can’t easily account for. Stress is a highly individual state experienced in all layers of society. That is why it requires consistent and ongoing involvement of stakeholders in the development of stress detection technologies, from inception until implementation. If a stress detection algorithm is possible, it should be able to detect stress experiences from missing a deadline at work but also from not being able to make ends meet due to the current (Covid-19 virus) health crisis.
This brings us to three key recurring problems in digital health technology:
- Understanding the issue as a societal need – Why and how are we using technology?
- The variety of participants – Who should we include for measurement?
- Quality of the data – What are we actually measuring?
Logically, these questions are the foundation of every socio-technological enquiry, but their importance needs to be emphasized when someone’s health is determined by a range of numbers collected through sensors and wearables (Lupton, 2015). Not to mention that the classification of health depends on the autonomy of the device (how well and when does it measure?) and the human interaction with the type of technology.
One of the methods that can address the three aforementioned challenges in digital health technology is participatory action research. Participatory action research (PAR) is a qualitative research method that captures stakeholders’ perspectives by taking note of their situation, agency, feelings, and views. The aim of such an approach is to promote collaboration and to enact social change that is beneficial to the involved parties (e.g. participants, stakeholders). In other words, it defines the values of the stakeholders, so that solutions are sought in line with their expressed needs. This exercise engages stakeholders (for instance, citizens or policy makers) to co-define the complexity of a societal problem and work together towards a constructive plan of action. This act of co-definition creates co-ownership, which in turn creates a greater willingness to adopt the solution resulting from a PAR approach. The method is often applied in sociological and anthropological research, but is slowly gaining popularity in technological (Ausloos et al., 2018) and policy research as a way to foster responsible innovation.
Due to a growing reliance on algorithmic interpretation of data, the societal implications of a non-representative sample of participants are becoming more prominent (Cabitza et al., 2017). A well-known example is how Google Photos mislabelled people as gorillas, which has been attributed to an underrepresentation of people of colour in the data used to train its image recognition algorithms. For the Nervocity study, intelligent ML algorithms are able to detect stress; nevertheless, the quality of their training, and of the data-driven decisions that follow, depends on the data from the participants the algorithms are trained on (Taylor & Purtova, 2019). The Nervocity consortium recognised the challenge of skewed or biased data and decided to act upon it. Measuring stress requires the use of the chill+ band, an imec wearable that needs to connect to a smartphone. This technical requirement raises inclusion issues, such as: should we neglect to measure stress for people without a smartphone? This led to further questions around recruitment and self-selection bias. As a consequence, certain demographics will be underrepresented, resulting in further skewed socioeconomic demographics (Hidalgo et al., 2020; McAuley, 2014).
RESPONSIBLE PARTICIPANT SELECTION
The PAR method defined several demographics and characteristics of citizens who are prone to experience stress. A recurring problem in algorithm development is a lack of diversity in the data collection, which is why we experimented with a sampling method that combined purposive selection and quota sampling of the participants. First, we identified the demographics of the city where the study took place. Next, we identified all the types of stakeholders prone to experiencing stress, to guarantee information-rich cases. For marginalised groups in our stakeholder set, we organised participatory sessions to co-design a data collection method they were willing to trust and contribute to. With this sampling method, we compared the candidates who expressed their interest in the study against the rate of saturation of their demographics and characteristics.
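The quota part of this procedure can be illustrated with a minimal sketch. It is not the Nervocity recruitment tooling: the function name, the category labels and the numbers are hypothetical, and the purposive step (choosing which stakeholder types to target) is assumed to have happened beforehand.

```python
from collections import Counter

def screen_candidates(candidates, quotas):
    """Accept candidates in sign-up order until each category's quota is full.

    candidates: list of (candidate_id, category) tuples.
    quotas: dict mapping category -> number of participants wanted.
    Returns (accepted_ids, saturation), where saturation maps each category
    to the fraction of its quota that has been filled.
    """
    filled = Counter()
    accepted = []
    for candidate_id, category in candidates:
        # Only accept when the category exists and is not yet saturated.
        if filled[category] < quotas.get(category, 0):
            filled[category] += 1
            accepted.append(candidate_id)
    saturation = {cat: filled[cat] / n for cat, n in quotas.items() if n > 0}
    return accepted, saturation

# Hypothetical quotas and sign-ups, for illustration only.
quotas = {"higher_education": 2, "secondary_education": 2, "primary_education": 1}
candidates = [
    ("c1", "higher_education"),
    ("c2", "higher_education"),
    ("c3", "higher_education"),
    ("c4", "secondary_education"),
]
accepted, saturation = screen_candidates(candidates, quotas)
print(accepted)
print(saturation)
```

In this toy run the third highly educated candidate is not accepted because that quota is already saturated, while the categories that remain below 1.0 indicate where further recruitment effort would be needed.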
First, this plan serves to improve recruitment, and therefore the mix of participants across age, education, living area and gender, so as to obtain a representative, balanced dataset. Second, the aim is to retrospectively compare algorithmic accuracy and outcomes against the pool of participants actually included, and to see how the sample represents or deviates from the demographics we intended to include. The transparency this provides allows for necessary modifications to the algorithm to prevent overfitting. Through early and adequate detection of imbalanced datasets, algorithms can be optimised and limitations can be communicated. The self-selection bias we have identified in this study so far has led to an overrepresentation of citizens who are highly educated (bachelor’s degree or higher), predominantly female (ratio about 1:2) and between 30 and 39 years old.
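One way such a retrospective comparison could look is sketched below: accuracy is computed separately for each demographic group, so that gaps between groups become visible. The group names, labels and records are invented for illustration and do not come from the study.

```python
from collections import defaultdict

def accuracy_by_group(records):
    """Compute classification accuracy per demographic group.

    records: iterable of (group, true_label, predicted_label) tuples.
    Returns a dict mapping group -> fraction of correct predictions.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, truth, predicted in records:
        total[group] += 1
        correct[group] += int(truth == predicted)
    return {group: correct[group] / total[group] for group in total}

# Hypothetical evaluation records: (age group, self-reported state, prediction).
records = [
    ("30-39", "stress", "stress"),
    ("30-39", "calm", "calm"),
    ("30-39", "stress", "stress"),
    ("60+", "stress", "calm"),
    ("60+", "calm", "calm"),
]
per_group = accuracy_by_group(records)
# A large gap suggests the model works better for well-represented groups.
gap = max(per_group.values()) - min(per_group.values())
print(per_group, gap)
```

A gap of this kind does not by itself prove bias, since small groups give noisy estimates, but it is a signal that can be reported alongside the results as a known limitation.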
Table 1: An example of the ratio between the representative sample requirements and the interested candidates for the Nervocity study depending on their education (representative sample:expressed interest). Every value of the expressed interest above 1 demonstrates an underrepresentation for that category.
Table 2: An example of the ratio between the representative sample requirements and the interested candidates for the Nervocity study depending on their location and gender (representative sample:expressed interest). Every value of the expressed interest above 1 demonstrates an underrepresentation for that category.
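The ratio described in these captions can be sketched as follows, under the assumption that it is computed as the number of participants required for a representative sample divided by the number of candidates who expressed interest, so that a value above 1 means fewer candidates than needed. The category names and counts are hypothetical and are not the values from Table 1 or Table 2.

```python
def representation_ratios(required, interested):
    """Ratio of required participants to interested candidates per category.

    required: dict mapping category -> participants needed for representativeness.
    interested: dict mapping category -> candidates who expressed interest.
    A category with no interested candidates gets an infinite ratio.
    """
    ratios = {}
    for category, needed in required.items():
        count = interested.get(category, 0)
        ratios[category] = float("inf") if count == 0 else needed / count
    return ratios

def underrepresented(ratios):
    """Return the categories whose ratio is above 1, sorted by name."""
    return sorted(cat for cat, ratio in ratios.items() if ratio > 1)

# Hypothetical counts, for illustration only.
required = {"primary": 10, "secondary": 20, "higher": 20}
interested = {"primary": 4, "secondary": 20, "higher": 50}

ratios = representation_ratios(required, interested)
flagged = underrepresented(ratios)
print(ratios)
print(flagged)
```

Here only the category with fewer interested candidates than required is flagged, which is where additional, targeted recruitment would be directed.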
QUALITY OF THE DATA – WHAT ARE WE ACTUALLY MEASURING?
Sensor measurements have a tendency to reduce a complex phenomenon such as stress to variables that can be measured. In that situation, the interpretation of the sensor data is predominantly guided by the worldviews of the researchers and technology developers, without inclusion of other visions. We therefore also had to deal with the next question: what do our stakeholders themselves believe causes stress? We used our participatory research method to include stakeholders’ views. In this way, stakeholders could define how they see the construct called ‘stress’ and the various variables included in it. This has led to valuable insights that go beyond state-of-the-art knowledge in health research: insights into which aspects of the daily lives of citizens prone to stress we should measure, when to measure and how to measure.
As an example, work is often named as a reason why citizens experience stress, but, contrary to what might be expected, the opposite holds as well: not having work also causes stress. Our workshops and interviews with stakeholders have shown how the sensitivities of vulnerable citizens differ from those of citizens with a higher socioeconomic status. This is something we have to take into account when we try to measure and gauge stress for the development of a stress detection algorithm.
The qualitative information derived from the PAR method not only benefits stakeholders, but also influences the management and execution of the study. For instance, the researchers, project managers and developers of health technology became more aware of how to include vulnerable participants (allocate more resources for participants with lower digital literacy), of the variety of participants (who should we include in the study?), of practical questions regarding the data collection (what do we need to know from them?) and of the data processing (which participants’ experiences are meaningful?).
To obtain desired and ethical societal changes, PAR methods should be implemented more often in digital health technology research.
Overall, there are many emerging and promising technologies. We have mostly addressed AI and ML here, but responsible innovation applies to all health technologies. To conclude, trustworthy and ethical technology should be a product of collaboration between all relevant stakeholders. Even though multi-stakeholder collaboration throughout the development process is time-consuming, scoping digital health innovation for and with the stakeholders, whilst correcting for the potential negative social impact of technology, is indispensable.
Ausloos, J., Heyman, R., Bertels, N., Pierson, J., & Valcke, P. (2018). Designing-by-Debate: A Blueprint for Responsible Data-Driven Research & Innovation. In F. Ferri, N. Dwyer, S. Raicevich, P. Grifoni, H. Altiok, H. T. Andersen, Y. Laouris, & C. Silvestri, Responsible Research and Innovation Actions in Science Education, Gender and Ethics (pp. 47–63). Springer International Publishing. https://doi.org/10.1007/978-3-319-73207-7_8
Cabitza et al. – 2017—Unintended Consequences of Machine Learning in Med.pdf. (n.d.).
Hidalgo, A., Gabaly, S., Morales-Alonso, G., & Urueña, A. (2020). The digital divide in light of sustainable development: An approach through advanced machine learning techniques. Technological Forecasting and Social Change, 150, 119754. https://doi.org/10.1016/j.techfore.2019.119754
Lupton, D. (2015). Health promotion in the digital era: A critical commentary. Health Promotion International, 30(1), 174–183. https://doi.org/10.1093/heapro/dau091
Marchiori, D. M., Mainardes, E. W., & Rodrigues, R. G. (2019). Do Individual Characteristics Influence the Types of Technostress Reported by Workers? International Journal of Human–Computer Interaction, 35(3), 218–230. https://doi.org/10.1080/10447318.2018.1449713
McAuley, A. (2014). Digital health interventions: Widening access or widening inequalities? Public Health, 128(12), 1118–1120. https://doi.org/10.1016/j.puhe.2014.10.008
Stilgoe, J., Owen, R., & Macnaghten, P. (2013). Developing a framework for responsible innovation. Research Policy, 42(9), 1568–1580. https://doi.org/10.1016/j.respol.2013.05.008
Taylor, L., & Purtova, N. (2019). What is responsible and sustainable data science? Big Data & Society, 6(2), 205395171985811. https://doi.org/10.1177/2053951719858114
20201001 Policy Brief 40-3