key: cord-1031794-fqwjadan
authors: Schwab-Reese, Laura M.; Hovdestad, Wendy; Tonmyr, Lil; Fluke, John
title: The potential use of social media and other internet-related data and communications for child maltreatment surveillance and epidemiological research: Scoping review and recommendations
date: 2018-02-01
journal: Child Abuse Negl
DOI: 10.1016/j.chiabu.2018.01.014
sha: 0238a1b19642b5488f67b429cb8368f332efc1e5
doc_id: 1031794
cord_uid: fqwjadan

Collecting child maltreatment data is a complicated undertaking for many reasons. As a result, child maltreatment researchers are interested in developing methodologies that allow for the triangulation of data sources. To better understand how social media and internet-based technologies could contribute to these approaches, we conducted a scoping review to provide an overview of social media and internet-based methodologies for health research, to report results of evaluation and validation research on these methods, and to highlight studies with potential relevance to child maltreatment research and surveillance. Many approaches were identified in the broad health literature; however, there has been limited application of these approaches to child maltreatment. The most common use was recruiting participants or engaging existing participants using online methods. From the broad health literature, social media and internet-based approaches to surveillance and epidemiologic research appear promising. Many of the approaches are relatively low cost and easy to implement without extensive infrastructure, but each method also has a range of limitations. Several methods have a mixed record of validation, and sources of error in estimation are not yet understood or predictable. In addition to the problems relevant to other health outcomes, child maltreatment researchers face additional challenges, including the complex ethical issues associated with both internet-based and child maltreatment research.
If these issues are adequately addressed, social media and internet-based technologies may be a promising approach to reducing some of the limitations in existing child maltreatment data.

literature (Levac, Colquhoun, & O'Brien, 2010). This type of review is ideal for topics with emerging evidence, where it would be difficult to complete a systematic review or meta-analysis (Levac et al., 2010). We searched PsycINFO, PubMed, ScienceDirect, and Academic Search Elite using two search frameworks. The first framework was broadly focused on how social media and other forms of internet-based technology were used for health surveillance, which also included some broader epidemiologic research. The second framework focused on the use of social media for child maltreatment research. For each framework, one author reviewed the title and, if available, the abstract of all articles found through the search framework. Articles were considered possibly relevant if social media or internet-based approaches to health research or surveillance were discussed in the abstract. Articles with possibly relevant content were downloaded and the full text of the article was reviewed. For this review, social media was conceptualized using the Bright, Margetts, Hale, and Yasseri (2014) definition as "a means of communication, based around a website or internet service, where the content being communicated is produced by the people using the service." The search terms for the general health framework were "social media" OR "social media surveillance" OR crowdsourcing OR crowdsource OR "internet surveillance" OR "online surveillance" OR Facebook OR Twitter OR Google OR Tumblr OR YikYak OR Instagram OR Youtube OR apps OR "mobile app" AND "public health surveillance" OR "bio-surveillance" OR "health surveillance".
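The general health search string above can be assembled programmatically, which makes it easier to reuse the technology terms in a second framework. The following is a minimal sketch, not the authors' procedure: the helper names `quote` and `build_query` are hypothetical, and the explicit parentheses around each OR group are an assumption, since the published string shows no grouping and databases differ in how they order AND and OR.

```python
# Hypothetical helpers for assembling a two-part Boolean search string.
# Term lists are copied from the search strategy in the text; the grouping
# with parentheses is an assumption, not something the source specifies.

TECHNOLOGY_TERMS = [
    "social media", "social media surveillance", "crowdsourcing", "crowdsource",
    "internet surveillance", "online surveillance", "Facebook", "Twitter",
    "Google", "Tumblr", "YikYak", "Instagram", "Youtube", "apps", "mobile app",
]
SURVEILLANCE_TERMS = [
    "public health surveillance", "bio-surveillance", "health surveillance",
]


def quote(term):
    """Wrap multi-word or hyphenated terms in double quotes."""
    return f'"{term}"' if (" " in term or "-" in term) else term


def build_query(group_a, group_b):
    """Join each group with OR, then join the two groups with AND."""
    left = " OR ".join(quote(t) for t in group_a)
    right = " OR ".join(quote(t) for t in group_b)
    return f"({left}) AND ({right})"


general_health_query = build_query(TECHNOLOGY_TERMS, SURVEILLANCE_TERMS)
print(general_health_query)
```

Passing a different second list to `build_query` would produce a query that keeps the technology terms and swaps out the surveillance terms.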
Articles were included in this portion of the review if data collection was conducted using social media or an internet-based technology and the study focused on a health-related issue. Commentaries on the use of social media, articles in languages other than English, articles without relevance to human disease or disability (i.e., plant/animal disease), and technical reports without application to human research were excluded. In total, 2134 possibly relevant articles were identified through this process (Fig. 1). Of these, 147 relevant articles were included in this review. Articles were most commonly excluded because they were commentaries or they focused on the computer science aspects of technology. The search terms for the child maltreatment portion of the search included all the technology search terms, but replaced the surveillance search terms with "child abuse" OR "child neglect" OR "child maltreatment". Articles were included in this portion of the review if data collection was conducted using social media or an internet-based technology and the study focused on a child maltreatment-related issue. As with the first review, commentaries, articles without human relevance, and technical reports were excluded. A total of 740 articles were considered for this review (Fig. 1). Of these, 12 articles were found to be related to social media or internet-based surveillance or research. Articles were commonly excluded because they were not directly relevant to child maltreatment (e.g., internet-based pedophilia; peer-to-peer harassment) or did not use social media and/or internet-based approaches for data collection purposes (e.g., only online dissemination of findings; participant self-reported use of social media). The initial searches for this review were completed in March of 2016. The child maltreatment-related search was repeated in August of 2017 with no additional articles found.
The general health research search was not repeated because the March 2016 search results provided a comprehensive overview of methods for conducting this type of research and it was unlikely there were significant advances in these methods. Most of the studies and methods found through this scoping review were focused on physical health, which may be least applicable to child maltreatment research and surveillance. In the interest of brevity, we have provided an overview of the approaches, strengths, and weaknesses in Table 1 with references throughout the text. For mental health and health behavior related studies, we have provided some additional detail on the topic and approach in the text. Active data collection methods include direct interaction with research participants. Although contact with participants is facilitated through the internet, the process of designing the measurement tool, collecting data through interaction with participants, and analyzing the data are similar to traditional data collection methods. Crowdsourcing is the process of obtaining services, ideas, or content from a large, undefined group of volunteers or part-time workers through a flexible open call. The volunteers and part-time workers have varying degrees of experience, knowledge, and skills. Most often, researchers use crowdsourcing to complete large, monotonous tasks or recruit large numbers of survey participants. Amazon Mechanical Turk, Google Consumer Surveys, and proprietary systems or websites are the most common methods of crowdsourcing. Amazon Mechanical Turk is an online multi-use crowdsourcing platform hosted through Amazon where users are recruited to answer surveys or complete repetitious tasks for a small amount of money. Google Consumer Surveys is another crowdsourcing platform used to recruit a large number of participants for short surveys (< 10 questions) (Sell, Goldberg, & Conron, 2015) . 
Samples may be constructed to be nationally representative based on age, gender, and geographic distribution of respondents. However, participants tend to be younger and more technologically savvy than the general population. Participants who complete surveys receive micropayments ($1 or less) in the form of Google Play Store credit. Crowdsourcing tends to be cost-effective, facilitate easy recruitment, and allow for access to a geographically diverse sample. However, there may be bias in the results of studies using Amazon Mechanical Turk and Google Consumer Surveys related to varying access to the internet within a population, so older adults, individuals with lower socioeconomic status, and others may be underrepresented (Bethlehem, 2010). Crowdsourcing has been used to collect information on a range of physical health issues (Adler, Eames, Funk, & Edmunds, 2014; Alqahtani et al., 2014; Camacho, Eames, Adler, Funk, & Edmunds, 2013; Candido Dos Reis et al., 2015; Chunara, Chhaya et al., 2012; Harber & Leroy, 2015; Ilakkuvan et al., 2014; Kim, Lieberman, & Dench, 2015; Lwin et al., 2015; Mandl et al., 2014; Norr, Albanese, Oglesby, Allan, & Schmidt, 2015; Nyman & Biener, 2016; Paolotti et al., 2014; Smolinski et al., 2015; Zhang, Ho, Fang, Lu, & Ho, 2014). Crowdsourcing has also been used to study several health behaviors, including pedestrian behavior (Hipp, Adlakha, Eyler, Chang, & Pless, 2013); public awareness and knowledge about ovarian cancer (Carter, DiFeo, Bogie, Zhang, & Sun, 2014); and the cost of diverted prescription opioid analgesics (Dasgupta et al., 2013).
Numerous studies have used websites and social media to recruit participants (Adler et al., 2014; Altshuler, Gerns Storey, & Prager, 2015; Barratt et al., 2015; Bauermeister et al., 2012; Ben-Ezra et al., 2013; Camacho et al., 2013; Chaulk & Jones, 2011; Dal Moro, 2013; Harris et al., 2014; Hernandez-Romieu et al., 2014; Janiec, Zielicka-Hardy, Polkowska, Rogalska, & Sadkowska-Todys, 2012; Jones, Saksvig, Grieser, & Young, 2012; Klein, Thomas, & Sutter, 2007; Moreno, Grant, Kacvinsky, Egan, & Fleming, 2012; Schumacher et al., 2014; Stein et al., 2014; Sueki, 2015; Thomas, Heysell, Houpt, Moore, & Keller, 2014; Turbow, Kent, & Jiang, 2008 ; van Genderen, Slobbe, Koene, Mastenbroek, & Overbosch, 2013; Zhang, Bi, Hiller, & Lv, 2008; Zheluk, Quinn, & Meylakhs, 2014) . Online recruitment is a convenient method of reaching samples for rare outcomes (Schumacher et al., 2014) or hidden or difficult to reach populations (Barratt et al., 2015; Hernandez-Romieu et al., 2014) . However, online recruitment may introduce bias in the sample due to disparities in technology use among minority race/ethnic groups, low socioeconomic status groups, and older adults (Bauermeister et al., 2012) . Passive data collection methods are more similar to secondary data analysis methods than to traditional data collection. In these methods, data are created for purposes other than research. Through a variety of methods, researchers gather the existing data, manipulate it into analyzable form, and analyze it. As these data may include millions of records, novel analytic techniques and software programs, often known as big data analytics, have been developed to handle these large sets. Internet search query analysis, specifically Google Search Trends, is one of the most common forms of online public health surveillance. Google Search Trends, a regularly updated database of aggregated search queries, provides the relative search volume of terms selected by the researcher. 
The relative search volume scale normalizes queries to a scale of 1-100 where 100 is the highest search proportion and 1 the lowest search proportion. In one of the earliest studies of internet search trends, Yahoo! and Google searches for cancer-related terms in the US were significantly associated with estimated incidence and mortality rate of cancer (Cooper, Mallon, Leadbetter, Pollack, & Peipins, 2005). This study and another also found significant associations between cancer-related internet searches and the volume of related news coverage, which may suggest at least some of the correlation is due to media coverage prompting online searches (Cooper et al., 2005; Fazeli Dehkordy, Carlos, Hall, & Dalton, 2014). For example, searches for multiple sclerosis in Italy are highest in the geographical areas with the highest rates of multiple sclerosis, but this pattern may also be attributed to increased media reports in the region (Brigo, Tezzon, Lochner, & Nardone, 2014). A review from 2013 found internet search query surveillance often had comparable findings to traditional surveillance methods (Bernardo et al., 2013). However, false positive and false negative results were a common problem in many of the reviewed studies. The review authors concluded internet search query data should be used to support, rather than replace, traditional surveillance methods.
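The relative search volume scale described at the start of this passage can be illustrated with a short calculation. The sketch below is an assumption about how such a scale could be computed, not a description of Google's actual procedure: the function name `relative_search_volume` is hypothetical, the weekly counts are made up, and the values are simply rescaled so the largest search proportion equals 100, which means the lowest value depends on the data.

```python
# Illustrative rescaling of search proportions so the peak period equals 100.
# This is an assumed calculation for teaching purposes, not Google's method.

def relative_search_volume(term_counts, total_counts):
    """Return per-period search proportions rescaled so the maximum is 100."""
    if len(term_counts) != len(total_counts):
        raise ValueError("both series must cover the same periods")
    if any(n <= 0 for n in total_counts):
        raise ValueError("total search counts must be positive")
    # Proportion of all searches in each period that used the term of interest.
    proportions = [t / n for t, n in zip(term_counts, total_counts)]
    peak = max(proportions)
    if peak == 0:
        return [0 for _ in proportions]
    return [round(100 * p / peak) for p in proportions]


# Made-up weekly counts, for illustration only.
weekly_term_searches = [120, 300, 450, 150]
weekly_all_searches = [10_000, 12_000, 15_000, 10_000]
rsv = relative_search_volume(weekly_term_searches, weekly_all_searches)
print(rsv)
```

Because the scale is relative to the peak within the selected period and region, the same raw search volume can map to different values in different extracts, which is one reason comparisons across extracts require care.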
Internet search query analysis has been used to study a range of mental health issues and health behaviors, including correlations between mental health-related searches and times of economic stressors (Ayers et al., 2012); seasonality of depressive symptoms (Ayers, Althouse, Allem, Rosenquist, & Ford, 2013); suicide (Bruckner, McClure, & Kim, 2014; Page, Chang, & Gunnell, 2011); use of tobacco, e-cigarettes, and vaping (Ayers, Ribisl, & Brownstein, 2011; Ayers, Althouse, Ribisl, & Emery, 2014; Ayers et al., 2016; Cavazos-Rehg et al., 2015); vaccine use (Barak-Corren & Reis, 2015); provider prescribing behaviors (Simmering, Polgreen, & Polgreen, 2014); bath salts use (Yin & Ho, 2012); prenatal care (D'Ambrosio et al., 2015); and krokodil use (an extremely dangerous street drug) (Zheluk et al., 2014). Internet search query analysis has also been used to predict or estimate many physical health conditions (Althouse, Yih Yng, & Cummings, 2011; Carneiro & Mylonakis, 2009; Chan, Sahai, Conrad, & Brownstein, 2011; Cho et al., 2013; Cook, Conrad, Fowlkes, & Mohebbi, 2011; Cooper et al., 2005; Desai et al., 2012; Dugas et al., 2012; Dugas et al., 2013; Fazeli Dehkordy et al., 2014; Gluskin, Johansson, Santillana, & Brownstein, 2014; Martin, Xu, & Yasui, 2014; Min, Haojie, Jianfeng, Rutherford, & Fen, 2013; Ocampo, Chunara, & Brownstein, 2013; Ortiz et al., 2011; Patwardhan, Bilkovski, & Goldstein, 2012; Samaras, García-Barriocanal, & Sicilia, 2012; Timpka et al., 2014; Wilson & Brownstein, 2009; Zheluk, Quinn, Hercz, & Gillespie, 2013; Zhou et al., 2013; Zhou, Ye, & Feng, 2011). Wikipedia, an online encyclopedia where the online community creates, edits, and modifies articles, has also been used for a modified internet search query analysis. Across several influenza seasons, the number of page views associated with influenza was significantly associated with CDC influenza diagnosis reports (McIver & Brownstein, 2014).
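Many of the validation studies cited here compare an internet-derived series, such as weekly page views or search volume, against an official surveillance series. A minimal sketch of that comparison is a Pearson correlation between the two series. The function name `pearson_r` and both data series below are made up for illustration, and the cited studies may have used other statistics or models.

```python
# Illustrative comparison of an online signal with official surveillance counts.
# All numbers are invented; they do not come from any study cited in the text.
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical weekly page views and officially reported case counts.
weekly_page_views = [1000, 1500, 2300, 3100, 2800, 1900]
weekly_reported_cases = [20, 35, 50, 80, 70, 40]

r = pearson_r(weekly_page_views, weekly_reported_cases)
print(f"correlation between page views and reported cases: {r:.2f}")
```

A high correlation over one period does not by itself establish validity, because media coverage can move the online signal independently of disease incidence.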
Despite the large number of studies supporting the use of internet search trends to predict infectious disease, search term algorithms may not perform well in the long term. Google Flu Trends failed to predict the A/H1N1 pandemic in 2009 and greatly overestimated the A/H3N2 epidemic in 2012/2013, which may suggest changes in internet search behavior, geographical heterogeneity, and differences in age-distribution of the epidemic significantly influence the predictive power of the algorithms (Olson et al., 2013). Online media reports may also be used for public health surveillance. Individual researchers may monitor online reports or researchers may use databases that are automatically created or curated by other researchers. Examples of automated databases include HealthMap (Barboza et al., 2014; Brownstein et al., 2010; Chanlekha & Collier, 2010; Collier, 2010, 2012; Freifeld, Mandl, Reis, & Brownstein, 2008; Lyon, Nunn, Grossel, & Burgman, 2012); GENI-DB (Collier & Doan, 2012); Project Argus (Nelson, Li, Reilly, Hardin, & Hartley, 2012; Torii et al., 2011); ProMed-mail (Zhang, Dang, Chen, Thurmond, & Larson, 2009); MiTAP (Zhang et al., 2009); and BioCaster (Lyon et al., 2012). In addition to existing databases, researchers may use text mining (Collier, 2011) or human analysts to identify media reports (Collier, 2011; Nerlich & Koteyko, 2012). Text mining of media reports may reduce the resources required for identification of articles but may not perform as well as human analysts (Collier, 2011). Human analysts have also been used to identify newspaper and website articles associated with drowning or near-drowning (Ferretti, De Angelis, Donati, & Torre, 2014; Zhu, Jiang, Li, Li, & Chen, 2015) and sudden cardiac death in athletes (Choi, Pan, Pock, & Chang, 2013).
Several other possible passive data collection methods have been examined in the literature, including internet death notices as a source of mortality surveillance data (Boak, M'Ikanatha, Day, & Harrison, 2008), forum postings (Kate, Negi, & Kalagnanam, 2014; Weitzman, Adida, Kelemen, & Mandl, 2011), and restaurant reviews and reservations (Harrison et al., 2014; Nsoesie, Kluberg, & Brownstein, 2014; Nsoesie, Buckeridge, & Brownstein, 2014).

3.3. Active or passive data collection methods

Several data collection methods may be used for active, passive, or combined types of data collection. Twitter is a microblogging site where users post messages (tweets) that are 140 characters or fewer. Users may "follow" other users to see their posts on the front page. Users may also respond to tweets posted by other users, share (retweet) messages posted by other users, or approve (like) other posts. Users may use hashtags to categorize their messages. When a large number of users are posting with a hashtag, the hashtag is listed on the Twitter front page and is said to be "trending." There are multiple methods of conducting research with Twitter, including both active and passive data collection methods. As a passive data collection method, researchers may download existing information from Twitter. Alternatively, researchers may use Twitter as a platform for reaching participants. Regardless of the collection method, there are several ways Twitter data may be used for research and surveillance. The content of individual tweets may be examined to determine how people are discussing topics, and changes in trending hashtags may be examined to determine how discussions around topics change over time. Social network analysis, the examination of the relationships between users, may also be conducted to determine how information moves through networks.
These analytic methods can be implemented by automated algorithms (Cao et al., 2015; Denecke et al., 2013; Odlum & Yoon, 2015; Paul & Dredze, 2014; Prieto, Matos, Alvarez, Cacheda, & Oliveira, 2014; Yom-Tov, Borsa, Cox, & McKendry, 2014) or by human analysts. These research methods are complicated by the microblogging (140-character limit) aspect of Twitter because the syntax and spelling are often altered to fit within the limit. Since these changes to language vary across users, it may be difficult for researchers to accurately categorize or recognize health information. However, Twitter recently doubled the character limit in many languages so issues with syntax and spelling may also change (Castillo, 2017) . Twitter data may also be difficult to work with because they are unstructured and created at a very rapid rate. To address some of these issues, MappyHealth was created to simplify data management and analysis (Boicey, 2013) . At the time of writing, it appears MappyHealth has been discontinued as the website and Twitter account are no longer active. Twitter has been used to examine a range of health behaviors, including e-cigarettes and tobacco use (Aphinyanaphongs, Lulejian, Brown, Bonneau, & Krebs, 2016; Jo, Kornfield, Kim, Emery, & Ribisl, 2015; Myslín, Zhu, Chapman, & Conway, 2013; Sofean & Smith, 2013; Step, Bracken, Trapl, & Flocke, 2016); drug use and abuse (Cavazos-Rehg, Krauss, Grucza, & Bierut, 2014; Daniulaityte et al., 2015; Hanson et al., 2013; Katsuki, Mackey, & Cuomo, 2015) ; suicide (O'Dea et al., 2015) ; and HPV vaccination (Zhou et al., 2015) . 
It has also been used to study a range of physical health outcomes (Aslam et al., 2014; Bosley et al., 2013; Broniatowski, Paul, & Dredze, 2013; Chew & Eysenbach, 2010; Chorianopoulos & Talvis, 2015; Collier, Son, & Nguyen, 2011; Fung et al., 2013; Gesualdo et al., 2013; Heaivilin, Gerbert, Page, & Gibbs, 2011; Jain & Kumar, 2015; Nagel et al., 2013; Signorini, Segre, & Polgreen, 2011; Velardi, Stilo, Tozzi, & Gesualdo, 2014) . It was rare in these studies to compare Twitter data to other sources of surveillance data so the validity of inferences based on Twitter data was not clear. However, three studies found the characteristics or number of tweets on a specific subject were associated with at least one validated measure of the subject (Ireland, Schwartz, Chen, Ungar, & Albarracin, 2015; Jashinsky et al., 2014; Widener & Li, 2014) . Facebook is an online social networking site where users create a profile, add other users to their network, send messages to their network connections, and post messages to their profiles. Users may also join common-interest groups and communicate with businesses. Similar to Twitter, Facebook may be used for active or passive data collection. It has frequently been used to recruit participants (Altshuler et al., 2015; Barratt et al., 2015; Bauermeister et al., 2012; Ben-Ezra et al., 2013; Hernandez-Romieu et al., 2014; Schumacher et al., 2014; Stein et al., 2014; Thomas et al., 2014; van Genderen et al., 2013) , but the information created by users has also been used for research. Facebook users have the option of listing interests, such as movies, books, sports teams, or activities, on their profile. In one study, obesity prevalence in communities was predicted using Facebook interests (Chunara, Bouton, Ayers, & Brownstein, 2013) . 
Geographical areas where a higher proportion of users endorsed activity-related interests, such as health and wellness or outdoor fitness activities, and a lower proportion of users endorsed interests in sedentary behaviors, particularly television watching, tended to have lower rates of obesity. Facebook "likes", users' expressions of interest or approval of posts, may also be used to predict health behaviors. The proportion of users by zip code who "like" certain categories of information is available through the advertising program interface. These data were significantly associated with life expectancy and many health conditions reported in the Behavioral Risk Factor Surveillance System (Gittelman et al., 2015). The content of Facebook groups may also be used to assess individual health behavior. Posts, comments, and photos in Facebook profiles and groups have been used to assess social support seeking among caregivers of children with autism spectrum disorder (Mohd Roffeei, Abdullah, & Basar, 2015), depressive symptoms (Moreno et al., 2011), and alcohol use (Ridout, Campbell, & Ellis, 2012). One study of nine public antifluoridation groups examined the connectedness between these groups and the extent to which individuals in these groups shared or endorsed posts (Seymour, Getman, Saraf, Zhang, & Kalenderian, 2015). Some research suggests combining multiple approaches may reduce the limitations associated with each of the single approaches. For example, media reports of outbreaks or unusual events may drive increases in social media activity as awareness increases in the population, so studies may combine the two approaches to evaluate the potential effects of traditional media (Chunara, Andrews, & Brownstein, 2012). One evaluation found that models predicting influenza rates from internet-based surveillance were most accurate when they accounted for newspaper and television reports (de Lange et al., 2013).
Another evaluation found the number of social media messages was more strongly correlated with the number of online news articles than the number of reported measles cases, which supports social media as a measure of public opinion, rather than disease detection (Mollema et al., 2015) . The limitations associated with social media and internet-based approaches were also demonstrated during the Ebola outbreak in the US. Social media posts and internet searches increased due to the public panic surrounding the outbreak, rather than due to a high incidence of the disease (Towers et al., 2015) . Despite these limitations, social media may provide important context not available in traditional forms of surveillance and combined approaches may reduce some of the limitations associated with each approach (Powell et al., 2016) . Models including Twitter data, Google search trends, and environmental sensors were able to accurately predict, in nearly real-time, asthma-related emergency department visits in the US with 70% accuracy (Ram, Zhang, Williams, & Pengetnze, 2015) . Another study found models based on Twitter data, Google search trends, and a crowdsourced survey were able to create robust weekly influenza predictions (Santillana et al., 2015) . A third study found the combination of Twitter and internet search data was able to estimate the influenza rate with a correlation of 0.72 to officially reported influenza diagnoses (Santos & Matos, 2014) . There has been limited application of social media and internet-based approaches to child maltreatment surveillance ( Table 2 ). The most common use found in the literature was recruiting participants or engaging existing participants through a variety of online methods. One study recruited participants through listservs, websites, groups, organizations, and clubs targeting lesbian, gay, and bisexual populations to retrospectively assess their exposure to child maltreatment (Balsam, Lehavot, Beadnell, & Circo, 2010) . 
Another study tracked existing participants in a study of child maltreatment through social media (Nwadiuki, Isbell, Zoloto, & Kotch, 2011). Other studies have recruited adults to report their experiences or their children's experience of childhood abuse (Brian, ...). One study surveyed professionals at child protection agencies to establish the rates of reported sexual maltreatment (Maier, Mohler-Kuo, Landolt, Schnyder, & Jud, 2013). Media reports have also been used to research aspects of child maltreatment, but similar to other health topics, there may be some bias in media reports. Researchers have found that media reports tend to disproportionately focus on physical and sexual abuse with limited reporting on emotional abuse and neglect (Lonne & Gillespie, 2014) and often present a simplified version of the events (Walklate & Petrie, 2013). Another study used media reports to examine perpetrator, victim, and context variations in sentences for fatal child abuse (Nambu, Nasu, Nishimura, Nishimura, & Fujiwara, 2011). There have been two other novel examples of social media and internet-based assessment of child maltreatment characteristics. The first was an evaluation of caregiver blogs in caregiver-fabricated child illness (Brown, Gonzalez, Wiester, Kelley, & Feldman, 2014). Researchers found that blogs created by these caregivers tended to distort and exaggerate the medical information shared by the doctors. There were also visually graphic images of the children and frequent discussions of fundraising and charity. Although the study was limited in size, the findings suggested physicians and child protective service providers may be able to evaluate caregiver blogs for these patterns.
Another study examined the profiles of youth with and without substantiated maltreatment reports to determine if there were differences in risky online behaviors and found maltreated youth tended to engage in more provocative self-presentation (Noll, Shenk, Barnes, & Haralson, 2013). In sum, online recruitment or follow-up of participants was the most common technology-based approach to child maltreatment research. Six recent studies have used online methods to recruit participants to self-report past childhood maltreatment experiences or the more recent experiences of their children or of children with whom they have had professional contact. Child maltreatment-related internet media reports have also been examined, but there were clear biases in reporting that suggest the use of media for surveillance would be problematic. Two innovative studies about child maltreatment using social media addressed very narrow questions about the risk behaviors of a subset of children who have been maltreated and about detecting risk for child maltreatment by examining the social media of caregivers of seriously ill children. At the time of writing, there was very little peer-reviewed published work related to social media and other internet-based approaches in the study of child maltreatment. From the broad technology and health literature, social media and internet-based approaches to surveillance and epidemiologic research appear promising. Several strengths were identified in the reviewed literature. First, many of the approaches are relatively low cost and easy to implement without extensive infrastructure. This is particularly true for well-established social media environments, such as Twitter, where there are existing tools for accessing and analyzing data. Second, social media may present an opportunity to reach communities or populations that would be otherwise difficult to reach through traditional approaches.
Many researchers promote this claim in their work, and a small number of researchers across disciplines have conducted parallel studies using traditionally-created and online samples to compare the findings (Rindfuss, Choe, Tsuya, Bumpass, & Tamaki, 2015; Simon Rosser, Oakes, Bockting, & Miner, 2007; Temple & Brown, 2011) . Although all authors agree there are some differences in the findings depending on the recruitment methods, there is disagreement about which approach best represents the underlying population. Finally, many of the technology-based approaches allow for continuous data collection in real-time or nearly real-time, which may facilitate the identification of trends or evaluation of national or community interventions. A range of limitations for each method were also identified in the literature. Across methods, there was a mixed record of validation and sources of error in estimation were not yet understood or predictable. Studies of internet search query trends suggested that methods operated well for a time but often failed extended tests of validation. Crowdsourcing methods, which may be another source of continuous data, appeared to suffer from substantial attrition of users over time. Media reporting seemed to incur observer effects that may have created short term issues with estimates. Media reports also caused issues for other types of online surveillance because increased public awareness may contribute to inflated estimates. Other concerns included changes over time in the technology or the way the technology is used. Although not explicitly raised in the literature captured in this review, misrepresentation on social media or other internet-based methods may also result in issues with the data (Whitehead, 2007) . Overall, validation of social media and internet-based methods is a challenge across all methods. 
To some extent, these issues may be of little concern if social media methods are combined and correlated with other established methods. That is, so long as these new methods are part of a triangulation approach, rather than replacing existing approaches, they may have much to offer. The use of social media methods for child maltreatment surveillance and research is promising, but there has been limited implementation, relative to other health outcomes. Based on the methods, strengths, and limitations identified in the reviewed literature, we propose several considerations for future research focused on child maltreatment. First, the strengths of social media and internet-based technologies may be leveraged to improve child maltreatment surveillance. The capacity to introduce these techniques anonymously into a range of social and professional environments (schools, community agencies, etc.) makes them potentially ideal for studying populations that are difficult to reach through traditional methods. Further, the ability to have access to these populations on a continuous basis creates the opportunity for long term monitoring, which may facilitate the identification of trends or evaluation of national or community interventions. Validation of the epidemiological use of child maltreatment data collected via social media or other internet-related technologies poses challenges. In addition to the problems relevant to the application of these approaches to other health outcomes, child maltreatment researchers face additional challenges. Unlike many other health issues addressed in this review, complete data about child maltreatment are rarely available. For example, influenza researchers in the United States can validate their technology-based research through FluView, a weekly influenza surveillance report prepared by the Centers for Disease Control and Prevention (2017). 
FluView provides weekly information on influenza virus type reported by public health laboratories, mortality information, proportion of outpatient visits for influenza-like illness, and information about the geographic spread of influenza (Centers for Disease Control & Prevention, 2017). This detailed, timely information allows for reliable assessment of the underlying influenza trends and provides gold standard data for validity assessment. In contrast, it is well known that child maltreatment reported to authorities is not indicative of the true prevalence, and other sources of data using traditional methods are rarely repeated with sufficient frequency. Further, the use of official statistics based on reported maltreatment and other existing maltreatment data to evaluate the accuracy of estimates drawn from social media is not straightforward. On the other hand, the ability to create consistently available, timely information about incidence may be the key promise of social media and internet-based methodologies, assuming issues of validity can be addressed.

Another critical consideration is the set of ethical issues in child maltreatment research, which are further complicated by the ethical challenges associated with social media and internet-based research. The Association of Internet Researchers suggests researchers engaging in internet-based research consider several ethical questions before beginning a study (Buchanan, 2004).
• How are the researchers accessing the participant/data and what expectations are established by the method (e.g., social media site, blog, forum)?
• Who is creating the data and what vulnerabilities may exist that create an obligation for the researcher to protect the data?
• Should the researcher obtain informed consent and how should it be collected?
• How will the data be used and could these uses create new or additional risks for participants?
In addition to these basic guidelines, a recent update to the recommendations from the Association of Internet Researchers suggests researchers consider several additional questions (Markham & Buchanan, 2012).
• What is the primary purpose of the study?
• How are data managed, stored, and analyzed during the study?
• How are findings presented?
• Who might be harmed or benefit from this study?

In reviewing these questions, the Association of Internet Researchers suggests researchers weigh several ethical considerations that are important in all research but may be particularly challenging in technology-based studies (Markham & Buchanan, 2012). The first challenge surrounds the definition of human subjects. Regulatory bodies have often used interaction with human subjects as an indicator of need for institutional ethical review (Markham & Buchanan, 2012). Since many technology-based studies, particularly passive data collection studies, do not directly engage with human subjects, these studies have been carried out without ethical review. It may be prudent, however, to engage with the institutional review board, an ethical regulatory body, or an external advisory committee to critically process the potential harms, vulnerabilities, benefits, and so forth, even if the research does not directly engage with human subjects and thus does not explicitly require institutional ethics review. In addition, it may be necessary to reevaluate what constitutes engaging with a human subject (Markham & Buchanan, 2012). Direct connection with individual-level data has historically been considered engagement with a human subject, while studies with aggregated or deidentified data were not. However, recent research suggests deidentified datasets often contain sufficient personal information to potentially identify individuals (Markham & Buchanan, 2012).
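One simple way to screen for this risk before analysis is to count how many records share each combination of indirectly identifying fields, in the style of a k-anonymity check. The Python sketch below is illustrative only: the function names are hypothetical, the fields to treat as indirect identifiers are an assumption the researcher would need to justify, and passing such a check does not by itself establish that a dataset is safe to use or share.

```python
from collections import Counter


def group_sizes(records, quasi_identifiers):
    """Count records sharing each combination of quasi-identifier values.

    `records` is a list of dicts; `quasi_identifiers` is a list of keys
    that could identify a person when combined (an assumption supplied
    by the researcher, not something the code can infer).
    """
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def smallest_group_size(records, quasi_identifiers):
    """Smallest group size (the k in k-anonymity); 0 for an empty dataset."""
    sizes = group_sizes(records, quasi_identifiers)
    return min(sizes.values()) if sizes else 0


def unique_records(records, quasi_identifiers):
    """Records whose quasi-identifier combination appears exactly once."""
    sizes = group_sizes(records, quasi_identifiers)
    return [
        r for r in records
        if sizes[tuple(r[q] for q in quasi_identifiers)] == 1
    ]
```

For example, with hypothetical fields such as `age_band` and `postcode_prefix`, any record returned by `unique_records` is the only one with its combination of values and could warrant coarsening or suppression before the data are analyzed.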
As a result, it may be necessary to develop an ethical review process, with or without institutional oversight, for research that involves person-based data without direct human contact. These ethical review processes should focus on reducing potential harms and vulnerabilities while balancing the benefits of the research.

The second challenge relates to definitions of public and private space (Markham & Buchanan, 2012). These expectations are often ambiguous and frequently change. As such, it is important to critically examine the potential harms and vulnerabilities associated with the context and data used in each study. Researchers must carefully consider the expectations of the individuals who are creating the data. In active data collection methods, the researcher has contact with the participants and may follow the standard informed consent process, so individuals expect their data are being used for research. However, in passive data collection methods, the individual may have no knowledge that their data are being used for research. The researcher must consider whether the individuals likely considered their data to be public or private. The creator of a public blog or social media profile that is viewable by anyone may have a lower expectation of privacy than a member of a private forum that requires log-in. Researchers must also consider the level of risk to participants and the potential benefits of the research to participants and other individuals. As child maltreatment is a particularly sensitive topic, it is likely that many participants would be embarrassed, hurt, or angry if their expectation of privacy did not match that of the researcher. In addition, if the researcher has access to identifiable information and learns of child maltreatment, there may be an institutional requirement to report to child protection authorities, which has the potential to impact data collection.
The researcher should also consider the expectations around research within the participants' culture. Although it is now relatively simple to conduct online research in communities across the world, it is necessary to understand the cultural norms around research and privacy within the participants' culture, as they may be different from those of the researcher. In some instances, communities have come together to define how non-community members should engage with them around research. In Canada, a First Nations Steering Committee representing diverse communities participating in a longitudinal health survey developed the OCAP principles (Ownership, Control, Access, Possession). OCAP is "a set of standards that establish how First Nations data should be collected, protected, used, or shared" (First Nations Information Governance Centre, 2017). Even when this type of guidance is available, it may be challenging for researchers from outside the community to fully understand the cultural expectations. When this type of explicit guidance is not available, it may be even more challenging for researchers to understand cultural norms, so careful attention must be given to understanding and following community expectations.

The final ethical challenge relates to how the field will choose to make ethical decisions around social media and internet-based technology research (Markham & Buchanan, 2012). In some aspects of research, there are tensions between regulation-driven and context-specific ethical decision making. Regulations are often intended to encourage ethical research and practice, but when applied universally without consideration, regulations may inadvertently restrict important, necessary research. In child maltreatment research, ethics regulations are rarely based on empirically derived information regarding the degree to which ethical concerns emerge (International Society for the Prevention of Child Abuse & Neglect, 2015).
As technology-based research moves forward, it will be important to establish firm ethical boundaries for some clearly defined issues, while encouraging flexibility and situation-based ethical decision-making for ethically grey areas.

Despite these ethical challenges, the complexity of research in child maltreatment necessitates the development and validation of novel data collection approaches. Although social media and internet-based data collection should not, due to the current limitations, replace other traditional forms of data collection, it is possible that these methods may complement existing methodologies to address the current limitations for the field. For example, analysis of child maltreatment-related internet search queries that account for media recognition may be a novel way to collect in-the-moment trends that could be used to inform tailoring and targeting of just-in-time interventions. However, these approaches should be carefully validated prior to wide-scale implementation and continuously monitored for reliability and validity. Although it would be difficult to validate these approaches for child maltreatment on a large scale due to existing data limitations, it may be possible to begin to validate in specific subpopulations. Official child protection data from a school or other geographically confined area could be paired with self-reported surveys and analysis of youth Facebook or Twitter postings during a specific time. Using this approach would reduce the limitations associated with each type of collection and could create a better source of validation data. Replicating this type of validation in other subpopulations could provide the research support for an eventual wide-scale implementation based on small-area data compilation.

This review had several limitations. First, the review excluded articles that were not published in English, so important work conducted in other languages would not be included in this review.
This limitation is common among reviews conducted by researchers from majority native English-speaking countries; yet, future reviews may benefit from including other languages. Second, it was possible to define social media, internet-based technology, and surveillance in various ways. Differing definitions may have resulted in a different literature base to review. Finally, the search framework resulted in the inclusion of a variety of research designs, health outcomes, and technology approaches. This variety prevented quantitative comparisons across studies. However, as previously noted, that type of analysis is outside the scoping review framework, which focused on the extent, range, and nature of research with respect to a focused topic, as well as the identification of gaps in the literature. As also noted, scoping reviews are ideal for topics with emerging evidence, such as social media. Social media and internet-based technologies may be a promising approach to address the existing issues with child maltreatment data collection. However, it is necessary to account for the issues within each type of data collection approach and carefully validate the approach. In addition, researchers should thoughtfully consider the ethical issues associated with both child maltreatment research and internet-based research and take steps to protect participants before conducting future studies. 
Incidence and risk factors for influenza-like-illness in the UK: Online surveillance using flusurvey Relationship between child abuse exposure and reported contact with child protection organizations: Results from the Canadian community health survey Pilot use of a novel smartphone application to track traveller health behaviour and collect infectious disease data during a mass gathering: Hajj pilgrimage Prediction of dengue incidence using search query surveillance Exploring abortion attitudes of US adolescents and young adults using social media Text classification for automatic detection of e-cigarette use and use for smoking cessation from twitter: A feasibility pilot The reliability of tweets as a supplementary method of seasonal influenza surveillance Novel surveillance of psychological distress during the great recession Revisiting the rise of electronic nicotine delivery systems using search query surveillance Seasonality in seeking mental health information on google Digital detection for tobacco control: Online reactions to the 2009 U.S. cigarette excise tax increase Tracking the rise in popularity of electronic nicotine delivery systems (electronic cigarettes) using search query Childhood abuse and mental health indicators among ethnically diverse lesbian, gay, and bisexual adults Internet activity as a proxy for vaccination compliance Factors influencing performance of internet-based biosurveillance systems used in epidemic intelligence for early detection of infectious diseases outbreaks Lessons from conducting trans-national internet-mediated participatory research with hidden populations of cannabis cultivators Innovative recruitment using online networks: Lessons learned from an online study of alcohol and other drug use utilizing a web-based, respondent-driven sampling (webRDS) strategy Face it: Collecting mental health and disaster related data using facebook vs. 
personal interview: The case of the 2011 Fukushima nuclear disaster Scoping review on search queries and social media for disease surveillance: A chronology of innovation Selection bias in web surveys Internet death notices as a novel source of mortality surveillance data Innovations in social media: The MappyHealth experience Decoding twitter: Surveillance and trends for cardiac arrest and resuscitation communication Childhood neglect: Exploring a short questionnaire in Poland and Germany The use of social media for research and analysis: A feasibility study Magic google in my hand, who is the sickest in my land?" infodemiology and epidemiology of multiple sclerosis and lyme disease in Italy National and local influenza surveillance through twitter: An analysis of the 2012-2013 influenza epidemic Caretaker blogs in caregiver fabricated illness in a child: A window on the caretaker's thinking? Information technology and global surveillance of cases of 2009 H1N1 influenza Google searches for suicide and risk of suicide Readings in virtual research ethics: Issues and controversies: IGI global Information sharing and reporting systems in the UK and Ireland: Professional barriers to reporting child maltreatment concerns The sexual maltreatment of students with disabilities in American school settings Estimation of the quality of life effect of seasonal influenza infection in the UK with the internetbased flusurvey cohort: An observational cohort study Crowdsourcing the general public for large scale molecular pathology studies in cancer A scalable framework for spatiotemporal analysis of location-based social media data. 
Computers, Environment and Urban Systems Google trends: A web-based tool for real-time surveillance of disease outbreaks Crowdsourcing awareness: Exploration of the ovarian cancer knowledge gap through amazon mechanical Turk Twitter expands tweets to 280 characters in most languages Characterizing the followers and tweets of a marijuana-focused twitter handle Monitoring of non-cigarette tobacco use using google trends FluView: A weekly influenza surevillance report prepared by the influenza division Using web search query data to monitor dengue epidemics: A new model for neglected tropical disease surveillance A methodology to enhance spatial understanding of disease outbreak events reported in news articles Online obsessive relational intrusion: Further concerns about facebook The child abuse prevention and treatment act including adoption opportunities & the abandoned infants assistance act Correlation between national influenza surveillance data and google trends in South Korea Active surveillance of sudden cardiac death in young athletes by periodic internet searches Flutrack.org: Open-source and linked data for epidemiology Social and news media enable estimation of epidemiological patterns early in the 2010 Haitian cholera outbreak Assessing the online social environment for surveillance of obesity prevalence Online reporting for malaria surveillance using micromonetary incentives What's unusual in online disease outbreak news Towards cross-lingual alerting for bursty epidemic events Uncovering text mining: A survey of current work on web-based epidemic intelligence GENI-DB: A database of global events for epidemic intelligence Assessing google flu trends performance in the United States during the 2009 influenza virus A (H1N1) pandemic Cancer internet search activity on a major search engine, United States Web-based surveillance of public information needs for informing preconception interventions Online survey on twitter: A urological experience Time for 
dabs Crowdsourcing black market prices for prescription opioids Comparison of five influenza surveillance systems during the 2009 pandemic and their association with media attention How to exploit twitter for public health monitoring? Norovirus disease surveillance using google internet query share aata Race and child maltreatment reporting: Are blacks overrepresented? Influenza forecasting with google flu trends Google flu trends: Correlation with emergency department influenza rates and crowding metrics Concordance between adolescent reports of childhood abuse and child protective service determinations in an at-risk sample of young adolescents Methodological challenges in measuring child maltreatment Novel data sources for women's health research: Mapping breast screening online information seeking through google trends Fatal and non-fatal unintentional drownings in swimming pools in Italy: Epidemiological data derived from the public press in 2008-2012 HealthMap: Global infectious disease monitoring through automated classification and visualization of internet media reports Chinese social media reaction to the MERS-CoV and avian influenza A(H7N9) outbreaks. 
Infectious Diseases of Poverty Influenza-like illness surveillance on twitter through automated learning of naive language A new source of data for public health surveillance: Facebook likes Evaluation of internet-based dengue query data: Google dengue trends Tweaking and tweeting: Exploring twitter for nonmedical use of a psychostimulant drug (Adderall) among college students Assessing work-asthma interaction with amazon mechanical Turk Health department use of social media to identify foodborne illness Using online reviews by restaurant patrons to identify unreported cases of foodborne illness Public health surveillance of dental pain via twitter The comparability of men who have sex with men recruited from venue-time-space sampling and facebook: A cohort study Emerging technologies: Webcams and crowd-sourcing to identify active transportation Cameras for public health surveillance: A methods protocol for crowdsourced annotation of point-of-sale photographs Ethical consideations for the collection, analysis, & publication of child maltreatment data Future-oriented tweets predict lower county-level HIV prevalence in the United States An effective approach to track levels of influenza-A (H1N1) pandemic in india using twitter Did public health travel advice reach EURO 2012 football fans? 
A social network survey Tracking suicide risk factors through twitter in the US Price-related promotions for tobacco products on twitter Recruiting adolescent girls into a follow-up study: Benefits of using a social networking website Evaluation by moments: Past and future Monitoring food safety violation reports from internet forums Establishing a link between prescription drug abuse and illicit online pharmacies: Analysis of twitter data Crowdsourcing data collection of the retail tobacco environment: Case study comparing data from crowdsourced workers to trained data collectors Self-reported smoking in online surveys: Prevalence estimate validity and item format effects What factors affect the identification and reporting of child abuse-related fractures? Given the increasing bias in random digit dial sampling, could respondentdriven sampling be a practical alternative Commentary-child maltreatment surveillance: Enumeration, monitoring, evaluation and insight. Health Promotion and Chronic Disease Prevention in Canada Teens, social media & technology overview Scoping studies: Advancing the methodology How do Australian print media representations of child abuse and neglect inform the public and system reform?: Stories place undue emphasis on social control measures and too little emphasis on social care responses Baseline evaluation of a participatory mobile health intervention for dengue prevention in Sri Lanka Comparison of web-based biosecurity intelligence systems: BioCaster, EpiSPIDER and HealthMap The tip of the iceberg. 
Incidence of disclosed cases of child sexual abuse in Switzerland: Results from a nationwide agency survey Participatory surveillance of diabetes device safety: A social media-based complement to traditional FDA reporting Ethical decision-making and internet research: Recommendations from the AoIR ethics working committee Improving google flu trends estimates for the United States through transformation Wikipedia usage estimates prevalence of influenza-like illness in the United States in near real-time Using google trends for influenza surveillance in South China Seeking social support on facebook for children with autism spectrum disorders (ASDs) Disease detection or public opinion reflection? Content analysis of tweets, other social media, and online newspapers during the measles outbreak in The Netherlands in 2013 College students' alcohol displays on facebook: Intervention considerations Feeling bad on facebook: Depression disclosures by college students on a social networking site Using twitter to examine smoking behavior and perceptions of emerging tobacco products The complex relationship of realspace events and messages in cyberspace: Case study of influenza and pertussis using tweets Fatal child abuse in Japan: Does a trend exist toward tougher sentencing Event-based internet biosurveillance: relation to epidemiological observation Crying wolf? Biosecurity and metacommunication in the context of the 2009 swine flu pandemic Association of maltreatment with high-risk internet behaviors and offline encounters Anxiety sensitivity and intolerance of uncertainty as potential risk factors for cyberchondria Guess who's not coming to dinner? 
Evaluating online restaurant reservations for disease surveillance Online reports of foodborne illness capture foods implicated in official foodborne outbreak reports Using social networking sites in subject tracing The new product watch: Successes and challenges of crowdsourcing as a method of surveillance Detecting suicidality on twitter Using search queries for malaria surveillance What can we learn about the ebola outbreak from tweets? Reassessing google flu trends data for detection of seasonal and pandemic influenza: A comparative epidemiological study at three geographic scales Monitoring influenza activity in the United States: A comparison of traditional surveillance systems with google flu trends Surveillance of Australian suicidal behaviour using the internet? Web-based participatory surveillance of infectious diseases: The influenzanet participatory surveillance experience Recruiting young adults to child maltreatment research through facebook: A feasibility study Comparison: Flu prescription sales data from a retail pharmacy in the US with google flu trends and US ILINet (CDC) data as flu activity indicator Discovering health topics in social media using topic models Social media listening for routine post-marketing safety surveillance Predicting asthma-related emergency department visits using big data Off your face(book)': Alcohol in online social identity construction and its relation to problem drinking in university students Do low survey response rates bias results? Evidence from Japan Syndromic surveillance models using web data: The case of scarlet fever in The UK. 
Informatics for Health & Social Care Combining search, social media, and traditional data sources to improve influenza surveillance Analysing twitter and web queries for flu trend prediction Child sexual abuse and psychological impairment in victims: Results of an online study initiated by victims Social media methods for studying rare diseases Fourth national incidence study of child abuse and neglect The utility of an online convenience panel for reaching rare and dispersed populations When advocacy obscures accuracy online: Digital pandemics of public health misinformation through an antifluoride case study The use of twitter to track levels of disease activity and public concern in the U.S. during the influenza A H1N1 pandemic Web search query volume as a measure of pharmaceutical utilization and changes in prescribing patterns Capturing the social demographics of hidden sexual minorities: An internet study of the transgender population in the United States Imprisoned by the past: Unhappy moods lead to a retrospective bias to mind wandering Flu near you: Crowdsourced symptom reporting spanning 2 influenza seasons Sentiment analysis on smoking in social networks Online respondentdriven sampling for studying contact patterns relevant for the spread of close-contact pathogens: A pilot study in Thailand The association of suicide-related twitter use with suicidal behaviour: A cross-sectional study of young internet users in Japan A comparison of internet-based participant recruitment methods: Engaging the hidden population of cannabis users in research Outbreak of pyrazinamide-monoresistant tuberculosis identified using genotype cluster and social media analysis Performance of eHealth data sources in local influenza surveillance: A 5-year open cohort study An exploratory study of a text classification framework for internetbased surveillance of emerging epidemics Mass media and the contagion of fear: The case of ebola in America Web-based investigation of water 
associated illness in marine bathers Administration for children and families, administration on children youth and families, & children's bureau. Child Maltreatment Keeping venomous snakes in the Netherlands: A harmless hobby or a public health threat? Twitter mining for fine-grained syndromic surveillance Witnessing the pain of suffering: Exploring the relationship between media representations, public understandings and policy responses to filicide-suicide Sharing data for public health research by members of an international online diabetes social network Methodological and ethical issues in internet-mediated research in the field of health: An integrated review of the literature Using geolocated twitter data to monitor the prevalence of healthy and unhealthy food references across the US Intergenerational transmission of child abuse and neglect: Real or detection bias? Early detection of disease outbreaks using the internet Monitoring a toxicological outbreak using internet search query data Detecting disease outbreaks in mass gatherings using internet data Web-based HIV/AIDS behavioral surveillance among men who have sex with men: Potential and challenges Methodology of developing a smartphone application for crisis research and its clinical application Automatic online news monitoring and classification for syndromic surveillance Internet search patterns of human immunodeficiency virus and the digital divide in the Russian Federation: Infoveillance study Internet search and krokodil in the Russian Federation: An infoveillance study Using social connection information to improve opinion mining: Identifying negative sentiment about HPV vaccines on twitter Monitoring epidemic alert levels by analyzing internet search volume Tuberculosis surveillance by analyzing google trends Mortality among drowning rescuers in China, 2013: A review of 225 rescue incidents from the press