title: A qualitative study of big data and the opioid epidemic: recommendations for data governance
authors: Evans, Elizabeth A.; Delorme, Elizabeth; Cyr, Karl; Goldstein, Daniel M.
journal: BMC Med Ethics
date: 2020-10-21
DOI: 10.1186/s12910-020-00544-9

BACKGROUND: The opioid epidemic has enabled rapid and unsurpassed use of big data on people with opioid use disorder to design initiatives to battle the public health crisis, generally without adequate input from impacted communities. Efforts informed by big data are saving lives and yielding significant benefits. Uses of big data may also undermine public trust in government and cause other unintended harms.

OBJECTIVES: We aimed to identify concerns and recommendations regarding how to use big data on opioid use in ethical ways.

METHODS: We conducted focus groups and interviews in 2019 with 39 big data stakeholders (gatekeepers, researchers, patient advocates) who had interest in or knowledge of the Public Health Data Warehouse maintained by the Massachusetts Department of Public Health.

RESULTS: Concerns regarding big data on opioid use are rooted in potential privacy infringements due to linkage of previously distinct data systems, increased profiling and surveillance capabilities, the limitless lifespan of the data, and the lack of explicit informed consent. Also problematic are the inability of affected groups to control how big data are used, the potential of big data to increase stigmatization and discrimination of those affected despite data anonymization, and uses that ignore or perpetuate biases. Participants support big data processes that protect and respect patients and society, ensure justice, and foster patient and public trust in public institutions. Recommendations for ethical big data governance offer ways to narrow the big data divide (e.g., prioritize health equity, set off-limits topics and methods, recognize blind spots), enact shared data governance (e.g., establish community advisory boards), cultivate public trust and earn social license for big data uses (e.g., institute safeguards and other stewardship responsibilities, engage the public, communicate the greater good), and refocus ethical approaches.

CONCLUSIONS: Using big data to address the opioid epidemic poses ethical concerns which, if unaddressed, may undermine its benefits. Findings can inform guidelines on how to conduct ethical big data governance in ways that protect and respect patients and society, ensure justice, and foster patient and public trust in public institutions.

A substantial proportion of Massachusetts residents are estimated to have opioid use disorder (OUD) [7, 8]. In 2017, Massachusetts had an estimated 2,056 opioid overdose deaths [9]. For every fatal overdose, there are about 12 nonfatal overdoses [8, 10]; in Massachusetts, this translates into nearly 65,000 nonfatal overdoses. To address the opioid problem, Massachusetts is using a nationally recognized multi-sectoral public health model [11, 12]. Critical to the Massachusetts opioid response is the establishment of the Public Health Data (PHD) Warehouse [8, 13-15]. The PHD Warehouse was created in August 2015 via a legislative mandate that empowered the Massachusetts Department of Public Health (MDPH) to monitor opioid-related overdose events [8, 14]. The mandate allows for individual-level linkage of administrative datasets from MDPH and other state agencies.
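The individual-level linkage that the mandate enables is described in detail in the cited reports [13-15]. As a generic illustration of how records from distinct agencies can be joined without circulating plain identifiers, the sketch below derives a pseudonymous linkage token with a keyed hash; the function names, toy records, and HMAC-based approach are assumptions for illustration only and are not the MDPH implementation.

```python
import hmac
import hashlib

# Hypothetical illustration only: one common approach to privacy-preserving
# record linkage (a keyed hash of identifying fields). This is NOT the method
# used for the PHD Warehouse, whose approach is described in refs [13-15].
SECRET_KEY = b"key-held-only-by-the-trusted-linkage-unit"  # placeholder value

def linkage_token(first_name: str, last_name: str, dob: str) -> str:
    """Derive a pseudonymous linkage key from personal identifiers.

    A keyed hash (HMAC-SHA256) means only the holder of the secret key can
    reproduce the token, while identical identifiers always yield the same
    token, so records about one person can be joined across datasets.
    """
    normalized = f"{first_name.strip().lower()}|{last_name.strip().lower()}|{dob}"
    return hmac.new(SECRET_KEY, normalized.encode("utf-8"), hashlib.sha256).hexdigest()

def tokenize(records):
    """Replace identifiers with a linkage token and drop the raw identifiers."""
    return {
        linkage_token(r["first"], r["last"], r["dob"]): {
            k: v for k, v in r.items() if k not in ("first", "last", "dob")
        }
        for r in records
    }

# Toy extracts standing in for two distinct administrative data systems.
ems_records = [{"first": "Ann", "last": "Lee", "dob": "1980-01-02",
                "nonfatal_overdose": True}]
treatment_records = [{"first": "Ann", "last": "Lee", "dob": "1980-01-02",
                      "received_treatment": True}]

linked = {}
for source in (tokenize(ems_records), tokenize(treatment_records)):
    for token, fields in source.items():
        linked.setdefault(token, {}).update(fields)

print(linked)  # one de-identified, person-level record keyed by the token
```

In practice, linkage of this kind is typically performed by a trusted unit that holds the key, and analysts receive only the de-identified, linked records.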
Today, the PHD Warehouse encompasses information from more than twenty sources on all Massachusetts residents aged 11 and older with public or private health insurance, covering more than 98% of the state's population [16]. Furthermore, MDPH uses a secure analytic environment to provide access to de-identified data for research purposes [13]. As described in detail elsewhere [13-15], innovative solutions were used to protect data privacy, even beyond the standards set by federal and state law, and to create mechanisms for data sharing. Studies conducted with PHD data over the past five years have been critical to documenting the causes and consequences of the opioid epidemic [7, 17-24]. The PHD Warehouse is a groundbreaking and essential resource for conducting data-driven public health surveillance, resource allocation, intervention planning, and innovative research [25-27].

From an ethical perspective, the PHD Warehouse raises new and mostly unconsidered issues. While administrative data have been used for more than two decades to study addiction treatment outcomes and costs [28-32], the PHD Warehouse is different in several ways. First, it encompasses most of the Massachusetts adult population, not only individuals with addiction who consented to research. Also, the PHD Warehouse was developed by state mandate, establishing it as a potential public health resource. It was created rapidly, aided by technological innovations, within an emergency response context [13-15]. Finally, it was developed without adequate knowledge or input from the general population or from the people most impacted by the opioid epidemic, i.e., people with OUD and their family or friends.

Ethics research on biomedical big data warns that although such data may save lives, when the affected population is excluded from data governance, efforts may also be experienced as harmful and may undermine public trust in government [33-36]. For example, the affected population may perceive big data as infringing on privacy [36-38], perpetuating biases, and being unjust [36, 39, 40]. As Massachusetts works to sustain the PHD Warehouse, and as other states seek to assemble and manage big data on opioid use, guidelines are needed on how to conduct ethical big data governance. We address this gap by exploring stakeholder concerns and perceptions of strategies for uses of big data on opioid use. We conclude by discussing recommendations for future big data governance.

We drew on the Kass Public Health Ethics Framework [41] to develop the project. This framework specifies that public health officials should communicate with and involve constituent communities, along with experts, to understand the benefits and risks of strategies to address public health threats. Within this context, we solicited perspectives on the benefits and harms of big data on opioid use as perceived by key stakeholder groups: researchers who conduct analyses of big data on opioid use, gatekeepers of these data, and patient advocates.

We interviewed 39 key informants. Researchers were recruited from those who had utilized the PHD Warehouse (i.e., biomedical researchers, clinician-researchers, epidemiologists, data scientists), with priority given to authors of peer-reviewed publications.
Big data gatekeepers (i.e., data managers, regulatory specialists, legal counsel, and ethicists whose positions entail gatekeeper duties) were recruited from MDPH staff who manage the PHD Warehouse and also from local agencies that create county-level big data repositories on opioid use. Patient advocates were recruited from community forums held by peer-led support networks for parents and families coping with opioid overdose. Individuals were invited to participate via flyers distributed at public meetings and direct email outreach.

A semi-structured 1:1 interview (n = 13) or focus group (4 groups of 2-10 participants, totaling 26 individuals) was conducted in person or by teleconference, after which participants completed a socio-demographic questionnaire. Data collection was conducted separately with researchers, gatekeepers, and advocates. The discussion guide included the following topics and prompts.

(1) Protect and respect patients and society: What are the concerns regarding using big data on opioid use? To what extent and how should we involve people with OUD and their allies in big data governance?

(2) Ensure justice: How do we ensure that big data does not further privilege certain groups or widen existing disparities? Which research topics, questions, or methods should be "off-limits"? What phenomena shape opioid use but are not captured in big data?

(3) Foster patient and public trust in public institutions: How might big data undermine or strengthen relationships between individuals and public institutions? How do we ensure that the potential harms of big data are outweighed by its benefits?

When participants hesitated or expressed uncertainty or disagreement, the facilitator invited them to share their thoughts, indicated that it was appropriate to disagree with the assumptions of the prompt, and asked probing follow-up questions. Data were collected in March-December 2019. Each discussion lasted 1.0-2.0 hours and was held privately, either in person or by video-conference. Participants were compensated $100. To maintain confidentiality, participants were assured that findings would be anonymized. Interviews were digitally recorded and professionally transcribed, and transcripts were reviewed for accuracy. All procedures were approved by the University of Massachusetts Institutional Review Board.

Using thematic analysis [42, 43], the research team reviewed transcripts and developed codes and their definitions. Two research staff coded each transcript independently in ATLAS.ti (ATLAS.ti 2020, Version 8) and then met with the Principal Investigator to compare and refine their codes, definitions, and themes, and to resolve minor discrepancies regarding the relative salience of themes through consensus-building. Each team member identified major themes inductively, identifying analytical categories from the data. The team examined patterns within and across the transcripts and grouped consistent responses along with illustrative quotations. The entire research team reviewed the resulting summary of themes.

We examine data from a non-random convenience sample (Table 1). More researchers and gatekeepers than advocates had direct experience with big data. However, most participants referenced the broader context of living in an "information era" in which personal data are routinely collected about individuals without their explicit knowledge or consent.
Participants observed that when institutions inadequately inform individuals about data uses, it engenders feelings of disrespect, inequity, and distrust. In this section, we first summarize participants' concerns about big data on opioid use and then present their ideas on the pros and cons of different strategies for big data use.

Participants' concerns about big data on opioid use mostly focused on respect for persons and potential individual-level harms. Big data links together information about individuals provided by previously distinct data systems. Participants were provided with accurate information about administrative data linkage and the privacy-preserving methods that are typically employed, after which they were asked to share potential concerns. Participants feared that big data could be misused by government or other institutional actors with "bad intentions," i.e., in ways that infringe on privacy or otherwise cause harm. While most participants understood that big data are anonymized and bound by other safeguards designed to preclude individual-level harms, some nevertheless worried that these data could be used to deny health insurance claims or access to social welfare programs, jeopardize employment, threaten parental rights, or increase criminal justice surveillance, prosecution, and incarceration. Others focused on the potentially limitless lifespan of big data, which could "permanently mark" individuals as having OUD and thus result in lifelong negative impacts. Furthermore, researchers, many of whom were also clinicians, observed that patients' beliefs that these harms could occur, however unlikely, would deter some from seeking healthcare altogether.

Participants reported that individuals with OUD generally are not aware of the existence of big data and were concerned that exclusion of patients from big data deliberations further diminishes the connection between public health professionals and the public. Individuals with OUD were perceived to be more vulnerable to data misuse yet among those least likely to be able to do anything about it. Finally, one gatekeeper observed that "…the benefits and harms [of big data] do not accrue to the same person," drawing attention to the injustice that arises when the individuals who contribute their information to big data do not themselves directly benefit. For these and other reasons, participants felt it was critical that individuals be informed about potential big data uses, that strict safeguards be observed, and that processes be instituted to ensure that data are used to benefit people with OUD.

Other concerns pertained to the potential for big data to be misused in ways that increase health inequities. Participants noted how some datasets in the big data warehouse come with significant limitations and "baked-in bias," such as ill-defined variables or omitted phenomena, missing data, and an uncertain causal ordering of events. One gatekeeper observed that the opioid epidemic has resulted from systemic inequities yet, given data limitations, we do not examine or address the conditions that enable the epidemic, a problem that ultimately contributes to continued health disparities. Participants cautioned that if big data limitations are ignored or mishandled, then results could be incorrect or misinterpreted.

Participants also highlighted how big data could exacerbate community-level health and social inequities. For example, geographical hotspot maps of opioid overdoses identify certain communities as being especially hard hit by the opioid epidemic.
This attention could have negative economic impacts, yet participants felt that, with appropriate safeguards, the knowledge gained was worth the potential community-level harms. One researcher said about hotspot maps, "…it shines a light on public health needs and perhaps can help to enforce the direction of resources to curtail that problem…if you don't shine a light on it, things will continue the same and…more people will die…so, I'd rather shine a light on the truth of what's happening, but with the hopes that intervention can follow, such that it doesn't become an ongoing perennial problem…." Big data's ability to yield otherwise unavailable insights into place-based OUD prevalence and harms was thought to be critical for avoiding preventable morbidity and premature mortality.

Participants noted that public mistrust of big data is created when it operates outside the awareness of the people being studied. An advocate said, "I think it's just a matter of trust. Like when you look at the things that happened with Facebook…[people] were doing some unethical things in collecting data and who they were sharing that data with and not telling people that that data was being collected…that's the basis for this mistrust is that it's just this passive thing. You don't even know it's happening…And I share people's concerns…[in] this information era, this isn't just going to magically disappear when we solve this [opioid] problem. This information is still all going to be sitting somewhere. So, how do we make sure…that we don't misuse it moving forward?" Others highlighted how public mistrust is worsened when the involved institutions do not adequately interact with individuals to inform them about, and to determine, how data are collected, managed, and used.

In sum, participants' big data concerns center on potential privacy infringements due to increased profiling and surveillance capabilities, the limitless lifespan of the data, and the lack of explicit informed consent. Also problematic are the inability of affected groups to influence how big data are used, the potential of big data to increase stigmatization and discrimination of affected communities and groups despite data anonymization, and uses that ignore or perpetuate biases. Next, we present participants' consideration of data use strategies mapped to three broad topics, i.e., ideas for using big data in ways that (1) protect and respect patients and society, (2) ensure justice, and (3) foster patient and public trust in public institutions.

We asked participants for ideas on how to use big data in ways that protect and respect both individuals and society. Participants discussed perceived pros and cons of better-informed consent processes, but most suggestions emphasized the value of community advisory boards.

Participants generally recognized the need for better communication about the purpose and uses of big data. Some focused on adapting consent forms to make it much clearer that, if the individual agrees, their data will be added to a big data warehouse. Others, however, felt that while more explicit opt-in procedures might create more informed populations, they might also cause significant selection bias, which could, in turn, impede the ability of science to benefit vulnerable populations. Others considered whether individuals should be given opportunities to review their records after being included in big data, or even to change or delete data after the fact. Participants felt this option would be impracticable and could lead to unexpected harms.
For example, a researcher drew on his clinical experiences to illustrate how doing so could jeopardize the ability of science to determine the truth. Participants highlighted how the nature and potential risks of big data research differ from those of clinical trials, explaining that data are gathered whether analyzed for research or not, which is part of why big data warrant different types of protections and consent processes. In considering whether individuals should be able to review and alter records, one data researcher said, "…in a randomized trial…if you don't want to be exposed to an intervention or randomized anymore, that is 100% your right. I should be able to say, 'I'm done right now. I don't want to do this anymore,' and those are clearly very important protections to have. But when we're looking at observational data, it already exists. However I've decided to label you, as someone with heart disease or someone with opioid use disorder or someone with diabetes, that just exists…it's already been observed. We're not altering anything, and…if you…say, 'I don't want to be seen as someone with opioid use disorder,' I understand…but we label things so that we can get answers. So, I think that the biggest con is that we get then even messier data and can't actually answer questions for people."

Participants equated big data on opioid use with other types of public health records that are used to monitor emerging epidemics, improve healthcare, and protect population health. Participants felt that consent processes should be different for big data that are used for public health rather than for commercial purposes.

In contrast to the uncertainty of instituting more or different consent processes, participants overwhelmingly endorsed opportunities for "thoughtful conversations." For example, one researcher said that distrust could be addressed if providers had a "larger conversation [with patients] about how this [big data] gives back to the community, and how it may improve lives." Participants also proposed that community forums be established to discuss these issues with the public at large and that decision-making incorporate the viewpoints of people with OUD, specifically through community advisory boards (CABs).

Participants identified CABs as a potential key component of ethical big data governance. CABs were seen as part of "a new research paradigm" that would entail "more direct involvement of people whose data are being used" and ensure appropriate oversight. Participants felt that individuals with OUD and their families should be involved in "every stage" of research. This included community involvement in developing research questions and hypotheses, specifying inclusion and exclusion criteria, selecting outcomes, interpreting results, and disseminating findings. Comments highlighted that stakeholder involvement should go beyond dissemination, such that stakeholder views influence the design and conduct of big data activities. Perceived CAB benefits included the potential to empower affected populations to identify potential harms and benefits, ensure research priorities are grounded in meeting health and social needs, enable accurate interpretation, and translate findings into salient practices and policies. At the same time, however, participants shared several expected CAB challenges, including the limited resources needed to form, manage, and sustain CABs and the inability to engage CAB members with sufficient big data expertise.
Some participants feared that CABs could influence the conduct or interpretation of big data research in harmful ways. Finally, participants recommended ways to optimize CAB utility, which we explore in depth elsewhere [44].

Participants had several ideas for ensuring that big data on opioid use does not further privilege certain groups or widen existing disparities in health knowledge or practice. Key suggestions were to advance health equity, set off-limits uses, and recognize big data "blind spots."

Participants valued health equity and identified processes to ensure that big data are used to achieve it. Suggestions underscored that health equity should be a prominent and distinct goal that is integrated into all aspects of big data research. For example, participants called for better training of researchers on how to conduct health equity research and for the use of analytic methods (e.g., sample weights) that enable broader generalization of findings. Many pointed to aspects of data stewardship itself, highlighting how it could be structured to ensure big data are used for health equity. One researcher called for processes to ensure adequate big data access "…so that people who are interested in disparities can access it and analyze it" and such that a "diversity of perspectives" is represented, including researchers from different institutions who think about opioid use epidemiologically, but also in terms of resource allocation and in relation to prevention and treatment. Another researcher suggested that big data stewards play "a more proactive role" by being responsible for setting research priorities and reviewing proposals with health equity and disparities in mind, reasoning that "the default of doing nothing has problems…[not] demanding more 'just' projects has its own ethical consequences." These results highlight the prominent and proactive role of big data stewards in safeguarding data while also acting to promote its value and utility.

Participants felt that big data on opioid use should never be used to harm individuals. Using big data for criminal justice purposes was the most commonly identified off-limits use. One participant said, "…if we were using [big data] to identify people who had used illegal drugs and then prosecute them, that would be, to me, a very, very different story…I wouldn't think [that] is an appropriate use of data sets constructed for research purposes." Others felt that certain framings of research results, and related policy implications, should also be off-limits. Speaking about how research has helped to establish understanding of in utero opioid exposure and its relationship to downstream developmental and medical issues, a researcher observed that this is "critical to know" but that findings have been "framed in the wrong way [by]…lawmakers [who have said], 'Children who are exposed to opioids in utero do poorly…and therefore, moms who use opioids during pregnancy should be in prison.' And so, for me it's less about the study questions being off-limits and more about…how the results are framed and the researcher's responsibility to really help policymakers understand the data in the right way." Another participant was concerned that big data could be used to perpetuate racial profiling and unjust criminal justice policies.

In contrast to these perspectives, others felt that, given the urgency of the opioid epidemic and the need for information on how to address it, no research should be off-limits.
Participants were careful to endorse a "no off-limits" approach only with appropriate safeguards and ethical review. Comments generally reflected a desire to balance ethical concerns against the ability to use big data in meaningful ways to resolve a health crisis.

Participants identified critical phenomena that shape opioid use but are not captured in big data. Notably, participants felt that the most important "blind spot" is the limited measurement of opioid and other substance use itself. This big data gap was thought to contribute to spurious or confounded results, unjustified conclusions and policy implications, and an inability to concentrate on the upstream causes of OUD. In addition to opioid and other substance use, participants pointed to several other experiences that shape OUD but are simply missing from big data. As we detail elsewhere [45], other blind spots include early life risk factors (e.g., childhood adversity, family factors), socioeconomic status and other social status indicators (e.g., homelessness, poverty), social support status, and exposure to contexts that can increase OUD risks (e.g., incarceration, military service). Participants felt that a broader implication of blind spots is an incomplete or biased understanding of the opioid epidemic and limited thinking on how to address it. Participants emphasized that these issues, if unaddressed, could maintain an unjust status quo. Results point to the need to create institutional processes to reflect on and respond to the complexities, limitations, and uncertainties embedded in big data.

Participants considered ways to foster patient and public trust in the institutions that contribute to and manage big data. An overarching aim was to identify ways to ensure that the potential harms of big data are outweighed by its benefits. Participants shared uncertain and divided views about the utility of enabling citizens to directly access and analyze big data on opioid use. A "citizen science" option was considered, i.e., placing data online for public download and analysis. While participants felt that citizen science could increase public buy-in, this potential benefit was weighed against concerns regarding the complicated nature of big data, the lack of expertise needed to understand it, and the risks of data breaches and inappropriate interpretations. Participants suggested that if big data were made available to citizens, it would be best to first pilot-test mechanisms and to institute safeguards such as releasing only limited datasets along with sufficient documentation and technical support to enable appropriate uses. Although participants were skeptical about the benefits of citizen science, they were clear that opportunities should be available for the public to be involved in other ways.

Participants agreed that the roles and responsibilities of the institutions charged with creating and managing big data are critical to fostering public trust and ensuring that potential harms are outweighed by benefits. Key data stewardship activities involved safeguards to protect patient privacy and data confidentiality, clear communication on how data will be used for the greater good, and application of high ethical standards. A gatekeeper said, "…a lot of the so-called 'ethical concerns' about harms with data breaches, confidentiality, privacy, they seem to be taken care of with…safeguards…."
"-An advocate said, "…it's really critical that whoever is in charge of…managing this data… [consider] what the criteria are…to release this data… [and] realizing what they're putting together here is community-sensitive information…there has to be a return from…the state [Public Health Department] , back to us [the public]…to sustain the trust. " A gatekeeper said, "…transparency…is really impor-tant…we say all the time, 'This isn't ever going to be used for clinical decision-making, because you will never be able to identify that this person who had this trajectory was that person…you can identify groups of people with similar characteristics and change things at…that higher systems level, but…it's really not there to be calling anyone out or trying to find a person. ' And so I think that can be helpful… you have to be really transparent about that…. " Another gatekeeper said, "…people really wanted to know how we were going to use the data… [ [and] often that data is going somewhere kind of without…the user's knowledge. And…everything that came out with Facebook…really shed a light on the lack of really any policies…It's a conscious decision… to proactively be an ethical steward, in whom the public could have trust. We set the bar high…higher than legally required. " Comments reflected participants' recognition of how big data stewardship can minimize harms and ensure that research mitigates, rather than creates or exacerbates, vulnerabilities faced by individuals with OUD. Moreover, transparent processes convey respect for individuals and, by enabling public scrutiny, can help build trust. Other ideas to foster public trust in big data, and maximize its potential benefits, entailed much greater education and engagement with the affected population and their allies, the health and social providers who serve them, and the general public. For example, a researcher observed that a way to strengthen trust is to "…translate the message from this academic big data study to the community that's affected…making that an effort, and potentially a stated goal, even before you start the study, is…really important. Like 'how are you going to translate these findings back to the community from which the data's obtained?'" Similarly, a gatekeeper said, "…the more that you can make those goods consumable by different audiences, not only academics or…policymakers, but… [other] audiences that are in the weeds of the programs or…[in] community engagement…that's very important. " Also important was returning findings to those who contributed data, while also informing them of how their data helps to address the opioid epidemic, and engagement of the affected population in deciding how to communicate findings. An advocate said, "…you might see some info-graphic…[for example] 'there's 120% more overdose deaths in this geographic region than there was this time five years ago' And…I think people would feel better about it, knowing if they were actually contributing to this data…I'm probably already a part of that data, but no one told me I was…I'd probably feel a heck of a lot better about it knowing that I was. 
" A researcher reflected on experiences of sharing research findings with patients, observing that, "…it falls pretty flat…people just sort of really hold their beliefs…they see even my medical advice as a personal belief…there is often this disconnect… [with] science and results that we as a…scientific community hold as valid…it just doesn't necessarily matter, if the process and the results are not translated in a way that they're palatable to people or in a way that they can digest. And I just don't think we do a very good job of doing that…and unfortunately now we're living in this time where people are…more paranoid than ever. And so…really engaging people in the process more is probably the only way to do it. I just don't think that either researchers or government… can talk their way into convincing people that this is a reasonable thing to do. " Participants' comments suggested that public engagement could offer ways to gain public buy-in and ensure that big data uses cohere with public expectations and values. A final idea for engendering public trust in big data was to better convey how it is an essential public resource to protect and produce population health. To this end, participants suggested big data media campaigns and public engagement projects. Some suggested that these and other efforts be guided by an ethics board, an entity with a much broader mission than an IRB. Massachusetts is engaged in unsurpassed use of administrative big data on opioid use as routinely provided by health, criminal justice, and social services systems. Informed by the Kass framework, we documented how key stakeholder concerns regarding big data on opioid use are rooted in perceptions of potential privacy infringements due to linkage of previously distinct data systems, increased profiling and surveillance capabilities, limitless lifespan, and lack of explicit informed consent. Also problematic is the inability of affected groups to control how big data are used, the potential of big data to increase stigmatization and discrimination of affected communities and groups despite data anonymization, and big data uses that ignore or perpetuate biases. We also synthesized stakeholder perceptions of different strategies for big data uses. Implications of our results inform the following recommendations for big data governance ( Table 2) . A key finding is that individuals with OUD may be particularly vulnerable to potential big data misuses (for examples of potential misuses, see section "Off-limits topics and methods"). This population is generally unaware of big data and is excluded from deciding how it is created or used, representing significant asymmetries in big data knowledge and power. Also, this population faces added risks of OUD-related discrimination and stigma, elevated susceptibility to systemic disadvantages, and diminished opportunities to avoid or ameliorate consequent harms. Furthermore, the benefits of big data on opioid use mostly accrue to future generations while any potential harms are borne today. Thus, when considering big data policies and procedures it may be useful to view individuals with OUD as a population whose status warrants added protections to guard against potential harms. It is also important to ensure that big data research mitigates vulnerabilities rather than creates or exacerbates them. Our results indicate that a few places to start are to prioritize health equity, set off-limits topics and methods, and recognize blind spots. 
Our findings indicate that shared big data governance systems offer ways to protect people with OUD from potential added harms. Other research has suggested that big data co-governance is an ideal rather than a feasible reality [46]. Consistent with this idea, our findings point to community advisory boards as forums for the affected population to have a say in how data about them are gathered, stored, disseminated, and translated. CABs can be used to engage in transparent and collaborative activities, to identify and respond to blind spots and other embedded limitations and uncertainties, and to appropriately frame findings and policy implications. More broadly, shared data governance enables affected groups to make the most of their own big data resources.

Results revealed that, as big data stewards, governmental public health agencies are responsible for establishing policies and procedures that enable ethical data governance. Essential elements include transparent information on how big data are regulated, the protections and potential risks for individuals whose data may be used, governance mechanisms, accountability pathways, and expected public benefits. These activities promote openness to public scrutiny of big data decision-making, processes, and actions. Such transparency demonstrates respect for persons and contributes to the trustworthiness of public institutions, conditions that are necessary for public support of big data [34, 47]. A related next step is to consider how established principles for good data management and stewardship, such as the Findable, Accessible, Interoperable and Reusable (FAIR) Guiding Principles [48], can be adapted to support knowledge discovery and innovation in uses of big data on opioid use (a brief illustrative sketch appears at the end of this discussion).

Furthermore, our results suggest that it is important not to assume there is social permission for big data activities, even when individuals have provided consent, a finding reported by other research [49]. Thus, an important role for big data stewards is to elicit public views, concerns, and expectations in relation to big data and to make efforts to ensure that uses do indeed align with public expectations and values. Another important role for stewards is to communicate how big data holds the prospect of direct benefits both for individuals with OUD and for the population at large. These efforts should be prepared to address how the potential harms and benefits of big data for public health differ from those posed by big data for commercial purposes [50]. Actions such as these can create a sense of commitment among persons with similar interests to share costs and benefits for the greater good. Finally, data stewards can lead efforts to identify key values for guiding how to use big data on opioid use and for making decisions when those values conflict. Ethics frameworks and deliberative balancing approaches [46, 51] provide options for considering salient values and for minimizing potential harms.

Ethical guidance for big data research has mostly been concerned with protection from presumed harms, consent, and individual control of data uses [52, 53]. While salient, our results indicate that the ideas of respect, equity, and trust are essential for creating guidelines on ethical uses of big data for public health purposes. In addition, it may be more helpful to refocus ethical approaches to give primacy to community engagement, which extends concerns beyond the individual [52-54].
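As a concrete but hypothetical illustration of the FAIR adaptation discussed above, the sketch below shows the kind of machine-readable dataset description a steward might publish for a controlled-access public health resource, together with a simple completeness check; all field names and values are assumptions for illustration and do not describe actual MDPH metadata.

```python
# Hypothetical sketch of FAIR-aligned metadata for a controlled-access public
# health dataset; all field names and values are illustrative assumptions.
dataset_metadata = {
    "identifier": "doi:10.XXXX/example-phd-extract",  # Findable: persistent identifier (placeholder)
    "title": "De-identified opioid overdose linkage extract (illustrative)",
    "keywords": ["opioid use disorder", "overdose", "administrative data"],
    "access": {  # Accessible: a clearly documented access protocol
        "mode": "controlled",
        "procedure": "application reviewed by the data steward and an IRB",
        "environment": "secure analytic enclave; no row-level export",
    },
    "standards": {  # Interoperable: shared vocabularies and formats
        "diagnosis_codes": "ICD-10-CM",
        "file_format": "CSV accompanied by a data dictionary",
    },
    "reuse": {  # Reusable: provenance and terms of use
        "provenance": "linked administrative records, de-identified before release",
        "terms": "data use agreement required",
    },
}

REQUIRED_FIELDS = ("identifier", "title", "access", "standards", "reuse")

def missing_fields(metadata: dict) -> list:
    """Return the FAIR-oriented top-level fields that are absent."""
    return [field for field in REQUIRED_FIELDS if field not in metadata]

print(missing_fields(dataset_metadata))  # prints [] when the description is complete
```

Whether and how such metadata would be published for a controlled-access resource like the PHD Warehouse is a governance decision for the data steward; the sketch only illustrates the kind of structure the FAIR principles point toward.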
Findings are based on a non-random convenience sample of 39 individuals in Massachusetts who are knowledgeable about or interested in big data. Small sample sizes are typical in qualitative research and are not meant to support generalizations but rather to provide depth of information [55, 56]. Our list of concerns and data governance recommendations is not exhaustive; rather, it highlights selected points identified by participants. Researchers and gatekeepers had more direct experience than advocates with big data collection, management, and analysis, which likely contributed to variation by group in responses to certain prompts. For example, gatekeepers and researchers shared more than advocates in relation to data stewardship and blind spots, whereas advocates shared more in relation to off-limits topics and the value of educating and informing the public. We highlight findings that represent overall views; we did not analyze variation in perspectives by group, pointing to an area for future research. Findings pertain to static cross-sectoral administrative big data that are created by and for public health. We do not consider issues that may be unique to big data that are assisted by artificial intelligence or the "internet of things" (e.g., mobile phones, environmental sensors, wearable devices), mined in real time, or created for commercial, criminal justice, or other uses.

A strength is that we solicited perspectives on big data on opioid use from diverse stakeholder groups, including the advocates most impacted by OUD, thereby examining topics that previously have been little studied [57]. Also, the study is set in Massachusetts, which is at the forefront of using big data for public health. We employed qualitative methods to explore the experiences of advocates, researchers, and gatekeepers, thereby gaining insight into the complex set of factors that shape views. Finally, the Massachusetts PHD Warehouse originated in the opioid epidemic; however, it was intentionally designed to be used to study other emergent public health issues and, as such, it is now being used by MDPH to understand maternal and child health inequities and to assess the impact of COVID-19 [58]. We expect that issues similar to those we have identified in relation to the opioid epidemic are likely to arise in these other fields of investigation. In this sense, our recommendations for big data governance may generalize to these and other areas where public health big data are used for research purposes.

Using big data to address the opioid epidemic poses significant ethical concerns that, if unaddressed, may undermine its benefits. Findings can inform guidelines on how to conduct ethical big data governance in ways that protect and respect patients and society, ensure justice, and foster patient and public trust in public institutions.
References:
Rising morbidity and mortality in midlife among white non-Hispanic Americans in the 21st century
Drug overdose deaths in the United States
National Center for Health Statistics: Mortality in the United States
Drug and opioid-involved overdose deaths—United States
Assessment of changes in the geographical distribution of opioid-related mortality across the United States by opioid type
Estimated prevalence of opioid use disorder in Massachusetts
Massachusetts: MDPH. Opioid-Related Overdose Deaths among Massachusetts Residents
The ratio of non-fatal to fatal heroin overdose
Post opioid overdose outreach by public health and public safety agencies: exploration of emerging programs in Massachusetts
Shared responsibility: Massachusetts legislators, physicians, and an act relative to substance use treatment, education, and prevention
Building a prototype of a statewide public health data warehouse: data privacy and security issues addressed by the Massachusetts Chapter 55 Opioid Initiative
Massachusetts: MDPH. Developing a collaborative approach to addressing critical public health issues: a case study of the Massachusetts Chapter 55 Opioid Initiative
The Massachusetts Public Health Data (PHD) Warehouse: history, current operation, and impacts. Report submitted to the Massachusetts Department of Public Health (MDPH)
Non-fatal opioid-related overdoses among adolescents in Massachusetts
Overdose risk for veterans receiving opioids from multiple sources
Medication for opioid use disorder after nonfatal opioid overdose and association with mortality: a cohort study
Medication for opioid use disorder after nonfatal opioid overdose and mortality
Potentially inappropriate opioid prescribing, overdose, and mortality in Massachusetts
Effect of age on opioid prescribing, overdose, and mortality in Massachusetts
Fatal and nonfatal overdose among pregnant and postpartum women in Massachusetts
Opioid overdose deaths and potentially inappropriate opioid prescribing practices (PIP): a spatial epidemiological study
The opioid epidemic: data-driven case studies
Data sources and data-linking strategies to support research to address the opioid crisis: final report. US Department of Health and Human Services; RAND Corp
Offender diversion into substance use disorder treatment: the economic impact of California's Proposition 36
Benefit-cost in the California treatment outcome project: does substance abuse treatment "pay for itself"?
Using administrative data for longitudinal substance abuse research
Cost-effectiveness of publicly funded treatment of opioid use disorder in California
Big data begin in psychiatry
What is trust?
Ethics and risk governance in precision medicine and predictive analytics
Public responses to the sharing and linkage of health data for research purposes: a systematic review and thematic synthesis of qualitative studies
Power to the people: data citizens in the age of precision medicine
The ethics of big data: current and foreseeable issues in biomedical contexts
Using routinely collected health data for surveillance, quality improvement and research: framework and key questions to assess ethics
The acceptability of conducting data linkage research without obtaining consent: lay people's views and justifications
Weapons of math destruction: how big data increases inequality and threatens democracy
Big data science: opportunities and challenges to address minority health and health disparities in the 21st century
An ethics framework for public health
Using thematic analysis in psychology
What can "thematic analysis" offer health and wellbeing researchers?
Community advisory boards and big data governance: lessons learned from the opioid epidemic in Massachusetts
Utility of big data for research on the current opioid epidemic
An ethics framework for big data in health and research
Big data ethics
The FAIR Guiding Principles for scientific data management and stewardship
Patients' and public views and attitudes towards the sharing of health data for research: a narrative review of the empirical evidence
The Age of Surveillance Capitalism: The Fight for a Human Future at the New Frontier of Power. London: Profile Books Ltd
Cross-sectoral big data: the application of an ethics framework for big data in health and research
Adjusting the focus: a public health ethics approach to data research
Ethics, big data and computing in epidemiology and public health
Beyond consent: seeking justice in research
Approaches to sampling and case selection in qualitative research: examples in the geography of health
Research design: qualitative, quantitative, and mixed methods approaches
Public health research, practice, and ethics for justice-involved persons in the big data era
Using data to guide action in response to the public health crisis of opioid overdoses

Acknowledgements: We would like to thank the research participants for sharing their insights and time for this study.

Authors' contributions: EE served as PI, conceptualized and designed the study, collected and analyzed data, drafted the paper, and had final approval for all aspects of the study. ED collected and analyzed data and assisted with writing sections of the paper. KC collected and analyzed data and assisted with writing sections of the paper. DG assisted with conceptualizing the study and contributed to writing the paper. All authors read and approved the final manuscript.

Funding: Supported by The Greenwall Foundation. Dr. Evans is also supported by the National Institute on Drug Abuse (NIDA) UG3 DA0044830-02S1 and 1UG1DA050067-01, and the Substance Abuse and Mental Health Services Administration (SAMHSA), Center for Substance Abuse Treatment (CSAT) Grant No. 1H79T1081387-01. None of the funding sources had influence over study design, data collection and analysis, or interpretation of findings.

Availability of data and materials: Not applicable.

Ethics approval and consent to participate: All procedures were approved by the University of Massachusetts Amherst Institutional Review Board.
All participants received written information about the study during recruitment, including their right to withdraw from the study at any time, and they provided written informed consent.

Consent for publication: Not applicable.

Competing interests: The authors report no competing interests.

Received: 9 June 2020; Accepted: 13 October 2020