title: Secondary Data Analysis in Nursing Research: A Contemporary Discussion
authors: O’Connor, Siobhan
journal: Clin Nurs Res
date: 2020-06-05
DOI: 10.1177/1054773820927144

This editorial provides an overview of secondary data analysis in nursing science and its application in a range of contemporary research. The practice of undertaking secondary analysis of qualitative and quantitative data is also discussed, along with the benefits, risks and limitations of this analytical method.

The earliest reference to secondary data analysis in the nursing literature dates back to the 1980s, when Polit & Hungler (1983), in the second edition of their classic nursing research methods textbook, discussed this emerging approach to analysis. At that time, the method was rarely used by nursing researchers. McArt & McDougal (1985) proposed several reasons for its limited uptake in nursing at that point, including a preference for empirical research, the limited datasets available in healthcare, which made the approach less attractive, and low awareness or appreciation of this type of analysis. Its origins lie further back, in the wider educational, social science and scientific literature: Glass (1976) described the process as 'reanalysis of data for the purpose of answering the original research question with better statistical techniques, or answering new questions with old data'. Over 40 years later, nursing science appears to have come full circle, with secondary data analysis widely employed across many areas of clinical, pedagogical and policy research (Aktan, 2012; Naef et al., 2017). So, what has changed? One paradigm shift over the preceding decades has been the digitisation of data, driven by advances in computing.
Early forms of modern-day Electronic Health Records (EHRs) and other hospital information systems emerged in the United States in the 1960s and 1970s, and their use slowly spread worldwide (Musen & van Bemmel, 1997). These systems enable researchers to tap into a wealth of clinical and administrative hospital data for secondary analysis. For example, nurses have examined electronic care plans to determine whether they meet national documentation standards (Häyrinen et al., 2010) and used EHRs in home health agencies to identify interventions that could improve urinary and bowel incontinence (Westra et al., 2011). As the World Wide Web became accessible to the public in the 1990s, researchers were able to use this new global communications tool to share and access health datasets more easily. In tandem, online environments themselves, particularly social media platforms, created new virtual forums where patients, carers, health professionals and others could interact. This permits researchers to mine these new data sources in numerous ways. For instance, nurses have investigated patient and family blogs about illness to enhance online communication (Heilferty, 2009) and used Twitter datasets to understand how social media could be employed to inform health policy (O'Connor, 2017). Rapid developments in computing, along with those in telecommunications, led to the rise of mobile technology in the 1990s and 2000s. Smartphones and accompanying health applications give patients and the public the ability to collect their own personal health data for self-management and self-care (Heidi et al., 2017). Mobile devices and applications are also used by health professionals and students for a range of purposes, such as monitoring patient vital signs and prescribing (Ventola, 2014). This allows researchers to employ citizen science techniques to gather data from apps for secondary analysis.
To illustrate, nurses developed analytics for an app to track tobacco use in psychiatric patients (Oliveira et al., 2016) and monitored how health professionals engage with apps through Google and other analytics platforms (Maskey et al., 2013). Wearable and sensor technologies for personal and home health monitoring followed, adding to the health datasets potentially available for re-analysis (Sloan et al., 2018). This is not to say that paper-based sources of information, such as patient diaries, have lost their value; they are still used for secondary analysis (Cheraghi-Sohi et al., 2013). Making the most of digital data by linking datasets to enable 'Big Data' analysis (O'Connor, 2018) is now considered by some to be the epitome of secondary data analysis in many areas of science. Big Data is often described using five 'Vs': volume, velocity, variety, veracity and value, reflecting the types of datasets it can encompass and the challenges of analysing them. This approach is being used in nursing in many ways, such as mining EHRs, web-based reporting systems, and clinical and organisational databases with a range of statistical and other techniques (Westra et al., 2017). More sophisticated software tools such as Hadoop and Tableau (see Figure 1) now enable secondary analysis of digital datasets, while some researchers use programming languages such as Python or R to create their own specialised analytical tools (Bogdan & Raluca Mariana, 2014). Furthermore, visualisation is becoming a popular way of presenting the results of this process. Although data visualisation was pioneered by Florence Nightingale over 150 years ago (O'Connor et al., 2020), it is now widely applied to augment primary and secondary analysis so that complex findings can be presented clearly and coherently.
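The linking of datasets described above can be illustrated with a small sketch of deterministic record linkage. It assumes, hypothetically, that two datasets (for example an EHR extract and app-derived data) share a pseudonymised identifier called `patient_id`; the function name, field names and values are illustrative only. Real linkage often has to cope with missing or inconsistent identifiers and may need probabilistic matching, which this example does not attempt.

```python
# Minimal sketch of deterministic record linkage on a shared identifier.
# Hypothetical: the key name "patient_id" and all field names/values below
# are illustrative assumptions, not drawn from any study cited in the text.
from typing import Dict, List, Tuple

Record = Dict[str, object]


def link_records(
    left: List[Record], right: List[Record], key: str = "patient_id"
) -> Tuple[List[Record], List[Record]]:
    """Join two lists of records on `key` (an inner join).

    Returns (linked, unmatched_left): records from `left` with a partner in
    `right` are merged; those without one are returned separately so the
    linkage rate can be reported.
    """
    # Index the right-hand dataset by key; if a key repeats, the last
    # record with that key is the one kept.
    index = {rec[key]: rec for rec in right}
    linked: List[Record] = []
    unmatched: List[Record] = []
    for rec in left:
        partner = index.get(rec[key])
        if partner is None:
            unmatched.append(rec)
        else:
            # Merge the two records; on a field-name clash the right-hand
            # value overwrites the left-hand one.
            linked.append({**rec, **partner})
    return linked, unmatched


if __name__ == "__main__":
    ehr_extract = [
        {"patient_id": "P1", "age": 72},
        {"patient_id": "P2", "age": 65},
        {"patient_id": "P3", "age": 80},
    ]
    app_data = [
        {"patient_id": "P1", "daily_steps": 3400},
        {"patient_id": "P3", "daily_steps": 1200},
    ]
    linked, unmatched = link_records(ehr_extract, app_data)
    print(f"linked {len(linked)} of {len(ehr_extract)} records")
    # prints "linked 2 of 3 records"
```

Keeping the unmatched records, rather than silently dropping them, is a deliberate choice: the share of records that could not be linked is one of the limitations a researcher may need to report when reusing combined datasets.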
Beyond the major technological shifts and digitisation of data in recent years, secondary data analysis has also become more widely used because of the challenges of undertaking empirical research. One issue some nursing scientists face is recruiting patients or carers who are difficult to reach for a myriad of social, cultural, economic and political reasons. These groups may include refugees and migrants, people who experience domestic and sexual violence or homelessness, and many more (Biederman & Forlan, 2016). Research fatigue in over-researched groups, which may include some cancer patients, certain indigenous communities, nursing students and others, can also be avoided by adopting secondary analysis (Clark, 2008). Hence, using existing datasets related to participants of interest can offer an alternative way to examine some issues without adding to respondent burden (Ziebland & Hunt, 2014). Szabo & Strang (1997) suggest it can also help reduce researcher bias and provide some objectivity, as the researcher may not have been immersed in the original study design or data collection. Equally, accessing different professional groups, from clinicians to policy makers, may prove difficult at times. For instance, emerging global health crises such as COVID-19 pose barriers to recruiting these types of participants and carrying out primary data collection, outside of research focused on addressing the immediate crisis (Nicol et al., 2020). Therefore, tapping into existing datasets and interpreting them to address research questions can help an area of nursing science move forward. Another important factor driving more secondary analysis of data is the emergence of open data initiatives and policies that promote open access to scientific data and research.
The open data movement has been gathering pace for several years as governments make public datasets more easily accessible to accelerate data-driven innovation and gain greater returns on the funds invested in research (Conway & Vanlare, 2010). For example, the Human Genome Project was a major international research initiative in the 1990s and early 2000s that sequenced the full human genome. Its data were made freely available to facilitate knowledge sharing and scientific discovery, contributing to new fields such as personalised medicine (Collins & McKusick, 2001), which has many implications for nursing (Vorderstrasse et al., 2014). Open access is now encouraged by many research funders, so that the datasets collected are placed in a freely available online repository for others to use. This culture of data sharing is likely to continue: empirical research requires large amounts of funding, and secondary analysis can be a cost-effective way to uncover new insights, particularly in a challenging financial landscape. A further driver of secondary data analysis in nursing is the professionalisation of nursing education, clinical practice and research. With a move to graduate nursing education in many countries, evidence-based practice is now taught to nursing students so they can incorporate the results of scientific research into future decision making and care delivery (Mackey & Bassendowski, 2017). Master's and doctoral-level study is now more widely available to nurses globally, with the number of nursing PhD and Doctor of Nursing Practice programmes increasing in some regions (Bednash et al., 2014). Nowadays, nurses often have more opportunities to undertake research throughout their clinical and academic careers (Currey et al., 2011; Francis & Humphreys, 1999). This means that more primary data in nursing and healthcare are being gathered that can form the basis for secondary analysis.
Furthermore, with added opportunities for advanced training, nursing science is broadening its traditional base and expanding its ontological, epistemological and methodological expertise, which facilitates more secondary data analysis.

Given the widespread and growing use of secondary data analysis in scientific research, it is worth revisiting this practice to appreciate how it can generate evidence in nursing, now and in the future. Beck (2019) discusses practical aspects to consider before beginning, such as identifying appropriate datasets and negotiating access to them, which can take time and money. Assessing the quality of the secondary dataset is also recommended so that its strengths and limitations are understood, as these may affect the analytical methods used and the findings, and will need to be reported in any published work. This could include reviewing the expertise and qualifications of those involved in gathering and processing the data, considering contextual information such as accompanying field notes or ethical approval, and examining the completeness of the primary data, whether qualitative, quantitative or a mixture of both. Heaton (1998) also suggests considering the size and diversity of the sample to gauge whether it is adequate to address certain research questions, along with the availability of the original researchers to consult and provide guidance and clarification where needed.

For the secondary analysis of qualitative data, Heaton (2004) describes five different approaches. Firstly, supplementary analysis is an in-depth analysis of an emergent concept in a qualitative dataset that was not fully explored in the primary study. For example, Hurlock-Chorostecki et al. (2013) used this approach on focus group data to uncover nurse practitioner interprofessional practice.
While this retroactive interpretation can provide useful insights quickly and easily, it may be limited if the underlying dataset is not rich enough. Secondly, supra analysis uses existing qualitative data to address a new research question in a separate study. Sanna-Maria et al. (2017) employed this technique to examine nurse leaders' perceptions of ethical recruitment in clinical research. Although this can bring new perspectives and settings that broaden our understanding of a phenomenon, bias could be introduced if there is not a good 'fit' between the secondary data and the new research questions or study design (Hinds et al., 1997). Thirdly, re-analysis relies on additional analyses of qualitative data to verify the results of a primary study. Saal et al. (2018) applied this in a mixed methods study to develop a complex intervention to improve social participation in care home residents with joint contractures. Even though re-analysis can strengthen the results of a primary study simply and speedily, reinterpreting qualitative data may lead to misconceptions and different findings (Swanson, 1986). Fourthly, amplified analysis occurs when two or more qualitative datasets are combined and then compared and contrasted using secondary analysis. Stickley et al. (2018) used this to examine the relationship between participatory arts and how people recover from mental health conditions. Granted, this may provide richer datasets with which to investigate a phenomenon, but by pooling qualitative data some contextual or conceptual insights may be lost (Sandelowski, 1991). Lastly, assorted analysis involves secondary analysis that is undertaken alongside the analysis of primary qualitative data. Watters et al. (2018) used this method to explore resilience and social inclusion among single mothers across Canada.
Although analysing qualitative datasets in tandem could enrich and augment the final results, there is a risk of cross-contamination during coding and analysis that might lead to inaccurate findings (Heaton, 2004).

For the secondary analysis of quantitative data, traditional approaches applying descriptive and inferential statistics to an array of datasets are common. Secondary sources of quantitative data may include national censuses conducted by governments, local or regional datasets held by public bodies, or questionnaires and surveys undertaken by researchers at a university or other national or international institution (Dale et al., 2008). For example, Oh et al. (2016) mined the Korea Youth Risk Behavior Web-based Survey to determine whether satisfaction with sleep was linked to stress in adolescents with atopic disease, while Jacoby et al. (2017) reused data from a longitudinal cohort study of minor injury to examine how early psychological symptoms relate to recovery and disability. Digital archives held by libraries, museums or other social and cultural agencies could also be useful sources of quantitative data, with some providing an extensive catalogue that is searchable online. The International Federation of Data Organizations (IFDO, 2020) and the Consortium of European Social Science Data Archives (CESSDA, 2020) may be helpful in identifying national archives for secondary analysis. However, Dale et al. (2008) warn of potential problems with the secondary analysis of quantitative data, as surveys and other measurement tools may have been constructed, and their reliability and validity determined, in specific ways. Equally, the sample of participants, their characteristics and response rates may pose issues when modelling for correlation or causation. Hence, a critical eye should be cast over a quantitative dataset to appreciate its inherent strengths, limitations and biases before reusing it.
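Completeness and response rates, two of the dataset properties discussed above, can be checked quickly before a quantitative dataset is reused. The following is a minimal sketch, assuming hypothetically that survey records are held as Python dictionaries in which a missing answer is either `None` or an absent key; the function names and variable names are invented for illustration and do not come from any dataset cited here.

```python
# Minimal sketch: summarising completeness of a secondary survey dataset.
# Hypothetical: records are dicts, a missing answer is None or an absent
# key, and the variable names are invented for illustration.
from typing import Dict, List, Optional


def missingness(
    records: List[Dict[str, Optional[object]]], variables: List[str]
) -> Dict[str, float]:
    """Return the proportion of records missing a value for each variable."""
    n = len(records)
    if n == 0:
        raise ValueError("dataset is empty")
    # r.get(v) returns None both for an explicit None and for an absent key.
    return {v: sum(1 for r in records if r.get(v) is None) / n for v in variables}


def response_rate(n_responded: int, n_invited: int) -> float:
    """Return the share of invited participants who responded."""
    if n_invited <= 0:
        raise ValueError("n_invited must be positive")
    return n_responded / n_invited


if __name__ == "__main__":
    survey = [
        {"sleep_score": 3, "stress_score": 2},
        {"sleep_score": None, "stress_score": 4},
        {"sleep_score": 5},  # stress_score not recorded
        {"sleep_score": 4, "stress_score": 1},
    ]
    print(missingness(survey, ["sleep_score", "stress_score"]))
    # prints {'sleep_score': 0.25, 'stress_score': 0.25}
    print(response_rate(len(survey), n_invited=10))
    # prints 0.4
```

High missingness on a variable or a low response rate does not by itself rule out reuse, but figures like these make such limitations explicit so they can be weighed, and reported, before modelling correlation or causation.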
The power of combining qualitative and quantitative datasets in a mixed methods study design is becoming more evident within nursing science (Hall et al., 2018), providing richer ground in which to employ secondary data analysis. Dugas et al. (2017) adopted a sequential explanatory design, examining the results of a systematic review on the processes of developing patient decision aids and then interviewing vulnerable patients to better understand how to involve them in this type of research in the future. Newer techniques from the growing artificial intelligence community, such as machine learning and in particular deep learning, are also being developed and offer new ways to interpret secondary quantitative datasets. Although this is still a relatively novel approach in nursing, Bose & Radhakrishnan (2018) applied a variety of clustering techniques to datasets from two home health agencies to identify subgroups of heart failure patients who used telehealth services.

Secondary data analysis is now firmly embedded in nursing science, helping researchers to uncover new insights that can improve nursing education, patient care, health service delivery, public health and health policy. New ways of approaching and conducting this analytical method will no doubt continue to emerge and be adopted into the practice of nursing research worldwide.
References
Social support and anxiety in pregnant and postpartum women: A secondary analysis
Secondary qualitative data analysis in the health and social sciences
PhD or DNP: Planning for doctoral nursing education
Desired destinations of homeless women: Realizing aspirations within the context of homelessness
Integrating R and Hadoop for big data analysis
Using unsupervised machine learning to identify subgroups among home health patients with heart failure using telehealth
Consortium of European Social Science Data Archives
Patient priorities in osteoarthritis and comorbid conditions: A secondary analysis of qualitative data
'We're Over-Researched Here!': Exploring accounts of research fatigue within qualitative research engagements
Implications of the human genome project for medical science
Improving access to health care data: The open government strategy
Clinical nurse research consultant: A clinical and academic role to advance practice and the discipline of nursing
Secondary analysis of quantitative data sources
Involving members of vulnerable populations in the development of patient decision aids: A mixed methods sequential explanatory study
Enrolled nurses and the professionalisation of nursing: A comparison of nurse education and skill-mix in Australia and the UK
Primary, secondary, and meta-analysis of research
Nurses' attitudes and behaviour towards patients' use of complementary therapies: A mixed methods study
Evaluation of electronic nursing documentation: Nursing process model and standardized terminologies as keys to visible and transparent nursing
Secondary analysis of qualitative data
Reworking Qualitative Data
Tailored communication within mobile apps for diabetes self-management: A systematic review
Toward a theory of online communication in illness: Concept analysis of illness blogs
The possibilities and pitfalls of doing a secondary analysis of a qualitative data set
The value of the hospital-based nurse practitioner role: Development of a team perspective framework
International Federation of Data Organizations
The effect of early psychological symptom severity on long-term functional recovery: A secondary analysis of data from a cohort study of minor injury patients
The history of evidence-based practice in nursing education and practice
Powering transplant professional collaborations with web and mobile apps
Secondary data analysis: A new approach to nursing research
Handbook of medical informatics
Variances in family carers' quality of life based on selected relationship and caregiving indicators: A quantitative secondary analysis
Action at a distance: Geriatric research during a pandemic
Smartphones and mobile applications (apps) in clinical nursing education: A student perspective
Using social media to engage nurses in health policy development
Big data and data science in health care: What nurses and midwives need to know
Data visualization in healthcare: The Florence effect
The mediating effect of sleep satisfaction on the relationship between stress and perceived health of adolescents suffering atopic disease: Secondary analysis of data from the 2013 9th Korea Youth Risk Behavior Web-based Survey
Development of the TabacoQuest app for computerization of data collection on smoking in psychiatric nursing
Nursing research: Principles and methods
Development of a complex intervention to improve participation of nursing home residents with joint contractures: A mixed-method study
Telling stories: Narrative approaches in qualitative research
Collaborative partnership and the social value of clinical research: A qualitative secondary analysis
The influence of a consumer-wearable activity tracker on sedentary time and prolonged sedentary bouts: Secondary analysis of a randomized controlled trial
The art of recovery: Outcomes from participatory arts activities for people using mental health services
Analyzing data for categories and description
Secondary analysis of qualitative data
Mobile devices and apps for health care professionals: Uses and benefits
Nursing implications of personalized and precision medicine
The lone mother resilience project: A qualitative secondary analysis
Predicting improvement in urinary and bowel incontinence for home health patients using electronic health record data
Big data science: A literature review of nursing research exemplars
Using secondary analysis of qualitative data of patient experiences of health care to inform health services research and policy

The sole author drafted and wrote the manuscript.
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
The author(s) received no financial support for the research, authorship, and/or publication of this article.

Siobhan O'Connor https://orcid.org/0000-0001-8579-1718
Dr Siobhan O'Connor, BSc, CIMA CBA, BSc, RN, FHEA, PhD, is a Lecturer in Nursing Studies at the University of Edinburgh, United Kingdom. She is a core member of the faculty, with a multidisciplinary background in both nursing and information systems. Hence, her research interests focus on the design, implementation, and use of technology in healthcare. https://www.research.ed.ac.uk/portal/en/persons/siobhan-oconnor(283996e9-3744-46c4-b4a2-dc2700e17297).html