key: cord-0017559-ydxe2kfi authors: Bergeron, Julie; Massicotte, Rachel; Atkinson, Stephanie; Bocking, Alan; Fraser, William; Fortier, Isabel title: Cohort Profile: Research Advancement through Cohort Cataloguing and Harmonization (ReACH) date: 2020-12-25 journal: Int J Epidemiol DOI: 10.1093/ije/dyaa207 sha: 6cf224b4a6cb7ffd53546afab4d55761278a1672 doc_id: 17559 cord_uid: ydxe2kfi

Why was the consortium set up?

Considerable evidence exists that the risk determinants of many non-communicable diseases have their origins in early life, including cardiovascular disease, cancer, chronic obstructive lung disease, musculoskeletal diseases, diabetes and mental illnesses. Developmental Origins of Health and Disease (DOHaD) research explores how the interplay between maternal and environmental factors programmes fetal and child growth and influences developmental trajectories and susceptibility to disease later in life. 1 Canada, like many other countries worldwide, has invested heavily in pregnancy and birth cohort studies supporting such research interests. However, the impact of existing cohort databases could be expanded so that, together, they provide more value to the research community, healthcare providers and policy-makers than each contributes individually. Cohort investigators recognize that individual studies often do not have the statistical power, specific data items or sufficient duration of follow-up needed to fully support the current and upcoming needs of research. Even the largest and best-designed cohorts often only generate enough participants to detect the most important relative risks, or to investigate relatively common health issues, and are limited in their potential to explore how the social and physical environment interacts with genetic factors to influence health. To address these issues and expedite discovery, we need a paradigm shift in the way we conduct our research. 
Enabling timely access to available data and samples, increasing the potential to share data across cohorts and promoting a multidisciplinary and collaborative approach to research are major assets of this new approach. They form the motivation behind the Research Advancement through Cohort Cataloguing and Harmonization (ReACH) initiative. Several national and international initiatives have been created to support DOHaD data access, integration and co-analysis. Birthcohorts.net, 2 LifeCycle Project, 3 Maternal, Infant, Child and Youth Research Network (MICYRN), 4 Environmental Health Risks in European Birth Cohorts (ENRIECO) 5 and the CoLab COLLECT database 6 are good examples of such initiatives. However, all these platforms face similar challenges. First, national, regional or organizational ethical and legal policies often limit external investigators' access to individual participant data. Second, even for many relatively well-known cohorts, information on the samples and data items collected is often not publicly available, or is only available in a format that does not allow investigators to easily find the specific information they need to understand data content. The lack of accessible and structured documentation represents a major barrier for external investigators interested in using cohort-specific data. Third, because of the complexity and inevitable heterogeneity of the information collected across pre-existing studies and databases, valid comparison and/or integration of information presents major methodological challenges. Finally, scientifically founded data harmonization and co-analysis require access to funding, secure data environments, and specialized expertise and resources, fundamentals that are not always accessible. Bringing together the expertise and tools developed by MICYRN 4 and Maelstrom Research, 7 ReACH was launched in 2016 to tackle these challenges. 
The initiative aims to leverage DOHaD research by providing the Canadian and international research community with a platform optimizing data discoverability and facilitating co-analysis across studies. Making use of the approach and tools developed by Maelstrom Research, 8, 9 ReACH implemented a centralized web-based catalogue documenting 26 Canadian pregnancy and birth studies relevant to DOHaD research. The initiative also offers support to investigators interested in harmonizing and co-analyzing data across studies. Building on data collected by the Canadian DOHaD studies, ReACH brings together cohort leaders, investigators requesting cohort data to achieve their research goals, and experts from different fields developing data discovery and integration resources.

The ReACH project assembled studies with potential for collaborative DOHaD research. Inclusion criteria for the studies were: (i) recruit Canadian mothers and/or children; (ii) have a longitudinal design (i.e. at least one follow-up of participants after the initial collection event); (iii) have collected data (baseline or follow-up) after the year 2000; (iv) collect information on pregnancy and birth outcomes. Currently, 26 Canadian studies have been selected, but new studies are welcome to join the initiative. Table 1 provides an overview of the study characteristics. One of the participating studies started recruitment of participants preconception, 19 during pregnancy, 4 when the child was an infant (<1 year old) and 2 when the child was older (>1 year old). The majority (n = 18) of the studies recruited women through a care provider during their routine pregnancy visits at the clinic. The number of participants recruited ranged from 70 to 11 379, with 2 studies including >5000 children (Table 1). Together, studies recruited 34 891 mothers, 45 907 children, 8835 fathers, 30 grandmothers and 264 siblings. 
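The four inclusion criteria listed above amount to a simple eligibility filter. The following Python sketch is only an illustration of that logic: the `Study` record, its field names and the three example studies are hypothetical and do not correspond to actual catalogue fields or ReACH studies, and criterion (iii) is read literally as "later than 2000", which is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Study:
    # Hypothetical fields, invented for illustration only.
    name: str
    recruits_canadian_mothers_or_children: bool
    follow_up_count: int        # follow-ups after the initial collection event
    last_collection_year: int   # most recent baseline or follow-up collection
    has_pregnancy_birth_outcomes: bool


def meets_inclusion_criteria(study: Study) -> bool:
    """Apply inclusion criteria (i) to (iv) as stated in the text."""
    return (
        study.recruits_canadian_mothers_or_children   # (i)
        and study.follow_up_count >= 1                # (ii) longitudinal design
        and study.last_collection_year > 2000         # (iii) literal reading (assumption)
        and study.has_pregnancy_birth_outcomes        # (iv)
    )


# Made-up candidate studies, not real ReACH members.
candidates = [
    Study("Study A", True, 3, 2015, True),
    Study("Study B", True, 0, 2012, True),   # no follow-up: fails (ii)
    Study("Study C", True, 5, 1998, True),   # no data after 2000: fails (iii)
]

eligible = [s.name for s in candidates if meets_inclusion_criteria(s)]
print(eligible)
```

In practice, eligibility was assessed by the ReACH team from study documentation; the sketch only makes the conjunction of the four criteria explicit.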
Of the 26 participating studies, 21 followed up both mothers and children, 4 children only and one followed only mothers. Fathers were included in 5 studies, whereas one also included grandmothers and another included siblings. The scientific focus varied broadly across studies. The most frequent outcomes pertain to child development (12 studies), child health (10 studies) and pregnancy and birth outcomes (10 studies). The main risk factors of interest include environmental exposures, mothers' lifestyle habits and behaviours, and familial and socio-economic environments.

How often have participants been followed up?

Significant variation is present across studies in years of recruitment, duration and frequency of the follow-up of participants (data collection time points). Two studies started recruitment before 2000, 18 from 2000 to 2009 and 6 in 2010 or later. Currently, 20 of the 26 studies continue to follow participants (Figure 1). The follow-up duration ranges between 16 months and 24 years. The number of data collection time points also varies across studies and ranges from 1 to 13 for mothers, 1 to 22 for children and 1 to 6 for fathers. Fifteen of the studies collected information from infancy to at least 5 years of age, but the time points of data collection differ across studies.

What has been measured?

Figure 2a outlines the sources of data collected by subpopulations and time points. All 26 studies collected information from questionnaires, 23 also collected biospecimens, 21 performed physical measurements, 12 conducted cognitive assessments and 3 retrieved information from administrative databases at least once during the follow-up of participants. Blood was the most common biospecimen collected (Figure 2b), with 21 studies having collected blood at least once. 
Among the 23 studies that collected biospecimens for a given sub-population, the number of biospecimen collection time points varies from 1 to 6 for mothers, 1 to 20 for children and 1 to 3 for fathers. A broad range of data was collected by the studies. Details of the specific information collected by at least 50% of the 24 studies with a documented list of variables are displayed in Figure 3. Based on the data dictionaries provided by the cohorts, all studies collected information about age, sex, anthropometric measures, and pregnancy and delivery outcomes. The majority of cohorts (>80%) also collected information about tobacco use, breastfeeding, diseases of the circulatory or respiratory systems, visits to health

Participating studies have collected an invaluable amount of data and samples particularly useful to support collaborative DOHaD research, and the tools (catalogue, methods and software) offered by ReACH can leverage the use of these scientific resources. The ReACH catalogue (https://www.maelstrom-research.org/mica/network/reach) documents the design of the 26 studies and the potential to access data and samples. It also offers a detailed description of each sub-population, data/biospecimen collection time points and variables collected. A user-friendly search engine (including comparison tables and a variable cart), as well as support from the ReACH team, facilitates data discovery. The catalogue resources allow identification of studies of interest, selection of variables and evaluation of the harmonization potential across studies. Furthermore, the methodological tools 8, 10 and open source software 9 developed are freely accessible. The software has been successfully used by a number of Canadian and international projects to catalogue and disseminate metadata (Mica), store, curate and harmonize data (Opal) and, where relevant, perform federated data analysis (DataSHIELD). 
11 A case study illustrating usage of the ReACH resources to support co-analysis of data is included in Supplementary Material 2 (Using the resources, a case example), available as Supplementary data at IJE online. Investigators interested in the available resources are also welcome to explore our website or contact the ReACH team. The first project developed under the ReACH umbrella was launched in 2017 and the number of Canadian and international initiatives established or supported by the team is rapidly increasing. Some of these projects aim to harmonize and co-analyze existing data to answer various research questions, each initiative assembling data from 10 000 to >250 000 mothers. For example, the Prenatal Alcohol Exposure initiative (Table 2) brought together data from 5 Canadian cohorts (10 263 mothers) and allowed successful harmonization of information about alcohol intake, tobacco and drug use, age, sex, marital status, education, ethnicity, working status, income, anthropometric measures, and pregnancy and birth outcomes. Analyses are currently underway to investigate the impact of alcohol consumption during pregnancy on birthweight and preterm birth, and to identify the correlates of drinking before, during and after pregnancy. Other initiatives focus on supporting the prospective implementation of common measures to be collected across studies to facilitate future data sharing and co-analysis among specialized DOHaD cohort networks. One of the projects supported by ReACH, the Healthy Life Trajectories Initiative 12 (Table 2), is addressing the burden of non-communicable diseases by developing evidence-based interventions. It brings together cohorts from Canada, China, South Africa and India. ReACH helped the network to define >2000 variables to be collected across countries for the mother, child and father sub-populations at ten time points (one preconception, two during pregnancy, one at delivery and six from birth to 5 years old). 
Finally, some projects foster the development of specialized methods and accessible resources to improve cost efficiency of DOHaD harmonization initiatives and optimize their scientific impact. The resources developed include ethico-legal and methodological tools and software applications. ReACH investigators are also working towards joining efforts with the Birthcohorts.net, 2 Lifecycle Project 3 and EUCAN-Connect 13 projects to foster implementation of common standards and international research activities. Most of the projects developed or supported by ReACH are still ongoing and have not generated publications yet. An overview of some of these initiatives is provided in Table 2.

Table 2 (fragment recovered from the extracted text): Identify the most commonly asked questions regarding alcohol consumption during pregnancy and explore standardized alcohol screening tools used to collect information in perinatal surveillance efforts, with the aim to develop recommendations for collection of information on alcohol intake during pregnancy.

What are the main strengths and weaknesses?

ReACH brings under a common umbrella (i) Canadian pregnancy and birth cohort studies having carefully conducted longitudinal follow-ups and collected well-phenotyped participant data and high-quality biospecimens; (ii) DOHaD investigators with various research interests; and (iii) expert data scientists, ethicists and software developers. In addition to the comprehensive study and variable catalogue developed, the ReACH team provides open access to expertise and specialized software to support data harmonization, integration and co-analysis. Although primarily developed to leverage usage of existing data, the ReACH resources also provide useful tools supporting conceptualization and implementation of new cohorts. 
A major strength of ReACH is the level of detail and standardization offered by the catalogue, combined with the capacity to search and easily extract information to (i) identify existing variables and studies of interest and (ii) explore harmonization potential across studies. The catalogue allows one to easily estimate whether data of interest is accessible, is suitable to answer the specific research questions addressed (e.g. level of physical activity measured with a specific scale), and is similar enough to enable co-analysis across multiple studies. In addition, the catalogue supports documentation of the harmonized data generated, helping investigators to learn from the rules and algorithms used by others and to improve the quality and efficiency of forthcoming data harmonization initiatives. ReACH does not offer access to a central data repository including study-specific and/or harmonized data from all participating studies. The platform was implemented as an adaptable resource aimed at leveraging and supporting discretionary harmonization initiatives. Researchers interested in using original data from specific studies need to contact the principal investigators and/or data access committees to seek permission for access to data, complete data transfer agreements and generate project-specific harmonized datasets. Although the process requires substantial effort and time for each project, the approach empowers achievement of projects requiring harmonization of comprehensive sets of data across a limited number of studies. Although the ReACH catalogue is among the catalogues worldwide providing the most comprehensive information, its true value clearly depends on its regular improvement and long-term maintenance. Cohorts are, by definition, longitudinal studies, and most of the ReACH partners are still following participants. Information included in the catalogue will thus need to be regularly updated to be kept current. 
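As a rough illustration of what "exploring harmonization potential across studies" can involve at its simplest, the Python sketch below computes, for a set of studies, the share of studies that collected each variable and lists the variables meeting a coverage threshold. The study names, variable names and the `coverage` and `common_variables` helpers are hypothetical; they are not part of the ReACH catalogue, Mica or Opal. Matching variable names is only a first screening step and does not by itself establish that the underlying measures are compatible.

```python
from collections import Counter


def coverage(study_variables: dict[str, set[str]]) -> dict[str, float]:
    """Share of studies (0 to 1) that collected each variable."""
    n_studies = len(study_variables)
    counts = Counter(
        var for variables in study_variables.values() for var in variables
    )
    return {var: count / n_studies for var, count in counts.items()}


def common_variables(
    study_variables: dict[str, set[str]], threshold: float = 1.0
) -> list[str]:
    """Variables collected by at least `threshold` of the studies, sorted by name."""
    shares = coverage(study_variables)
    return sorted(var for var, share in shares.items() if share >= threshold)


# Made-up variable lists, for illustration only.
study_variables = {
    "Study A": {"maternal_age", "birthweight", "tobacco_use"},
    "Study B": {"maternal_age", "birthweight"},
    "Study C": {"maternal_age", "tobacco_use", "breastfeeding"},
}

in_all_studies = common_variables(study_variables)          # threshold 100%
in_half_or_more = common_variables(study_variables, 0.5)    # threshold 50%
print(in_all_studies)
print(in_half_or_more)
```

A threshold of 0.5 mirrors the "at least 50% of the studies" cut-off used when summarizing the information collected; whether candidate variables can actually be harmonized still has to be judged from their definitions, formats and collection contexts.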
The quality of the catalogue also rests on the information

Can I access the data? Where can I find out more?

The ReACH web-based metadata catalogue is freely accessible: https://www.maelstrom-research.org/mica/network/reach#/. The individual participant data cannot be downloaded from the website; users interested in accessing the original data must contact the principal investigator or the study access committee. Contact information and the study website can be found on each study description page included in the catalogue. Researchers interested in obtaining more information about the ReACH initiative or in need of support to use the ReACH resources can contact the team (info@maelstrom-research.org).

All participating studies received ethical approvals from their respective institutions and informed consent from participants. Investigators requiring access to study-specific data need to obtain ethical approval from their home institution and follow study-specific data access rules and procedures.

Supplementary data are available at IJE online.

The developing world of DOHaD
Birth cohort studies: past, present and future
The LifeCycle Project-EU Child Cohort Network: a federated analysis infrastructure and harmonized data of more than 250,000 children and parents
An inventory of Canadian pregnancy and birth cohort studies: research in progress
European birth cohorts for environmental health research
Availability of COLLECT, a database for pregnancy and placental research studies worldwide
Fostering population-based cohort data discovery: The Maelstrom Research cataloguing toolkit
Software Application Profile: Opal and Mica: open-source software solutions for epidemiological data management, harmonization and dissemination
Maelstrom Research guidelines for rigorous retrospective data harmonization
DataSHIELD: taking the analysis to the data, not the data to the analysis
HeLTI Consortium. Healthy Life Trajectories Initiative (HeLTI)
The potential for fetal alcohol spectrum disorder prevention of a harmonized approach to data collection about alcohol use in pregnancy cohort studies
Maelstrom Research team. ReACH-Research Advancement through Cohort Cataloguing and Harmonization

The authors acknowledge the contribution and support of all participating studies as well as of the Maelstrom Research staff, particularly Guillaume Fabre, Karla Ordonez, Sara Samoely Lala, and Audrey Bégin Poissant.

Data from the ReACH participating studies are not publicly available. Access to data needs to be requested from individual studies.

None declared.