title: Cross-Platform Difference in Facebook and Text Messages Language Use: Illustrated by Depression Diagnosis
authors: Liu, Tingting; Giorgi, Salvatore; Tao, Xiangyu; Bellew, Douglas; Curtis, Brenda; Ungar, Lyle
date: 2022-02-03

Abstract: How does language differ between one's Facebook status updates and one's text messages (SMS)? In this study, we show how Facebook and SMS use differs in psycho-linguistic characteristics and how these differences drive downstream analyses, with an illustration from depression diagnosis. We use a sample of consenting participants who shared Facebook status updates, shared SMS data, and answered a standard psychological depression screener. We quantify domain differences using psychologically driven lexical methods and find that language on Facebook involves more personal concerns, experiences, and content features, while language in SMS contains more informal and style features. Next, we estimate depression from both text domains, using a depression model trained on Facebook data, and find a drop in accuracy when predicting self-reported depression assessments from the SMS-based depression estimates. Finally, we evaluate a simple domain adaptation correction based on the words driving the cross-platform differences and apply it to the SMS-derived depression estimates, resulting in a significant improvement in prediction. Our work demonstrates the difference between Facebook and SMS language use and suggests the necessity of cross-domain adaptation for text-based predictions.

Language, reflecting users' psychology, has been used as an effective tool to understand and predict mental health conditions (e.g., De Choudhury et al. 2013). While language analyses widely utilize social media platforms like Facebook (Eichstaedt et al. 2018; Seabrook et al. 2018), text messages (SMS) have only recently been demonstrated as a new platform for detecting translational linguistic markers of mental health conditions such as depression and schizophrenia (Liu et al. 2021; Barnett et al. 2018). When new and different platforms emerge, researchers face the question of whether and how these platforms differ in language use patterns. As demonstrated in psycho-linguistic research, language use is a social behavior whose characteristics are closely tied to, and adjusted based on, social contexts and communication channels (Forgas 2012). Facebook and SMS use may differ because people selectively share content, engage in different social activities, and choose different communication styles on different platforms (Harari et al. 2020), and because SMS carries denser information and is used by broader populations than Facebook. However, many sentiment analysis studies focus on conclusions and model development (see reviews in Guntuku et al. 2017) and apply models pre-trained on one platform (e.g., Facebook) to language from another (e.g., SMS; Liu et al. 2021) without considering cross-platform differences. Awareness of the different linguistic characteristics of Facebook language and SMS is important for properly processing and analyzing the data, handling the machine learning models, and interpreting the results. The present work aims to understand the difference between Facebook and SMS language from the same users.
In this paper, we first compare Facebook and SMS language use; we then illustrate the change in performance of, and a correction for, depression diagnosis when a Facebook-derived model is applied to SMS.

Contributions
Our contributions are: 1) we provide evidence that Facebook and SMS language are psycholinguistically different for the same users; 2) with a focus on depression prediction, we show that a naive application of a Facebook-trained model suffers accuracy degradation on SMS due to cross-platform differences in language, not demographics; 3) we derive a domain adaptation correction that bridges the linguistic differences between Facebook and SMS, and demonstrate a significant improvement in depression-prediction performance; 4) we discuss the implications and broader impact of cross-platform language model selection and adaptation.

Preliminary work has shown that linguistic model predictions should be platform-aware. For example, Seabrook et al. (2018) examined the association between depression and emotion word expression on both Facebook and Twitter and found different patterns: instability of negative emotion words predicts depression on Facebook but not Twitter, whereas variability of negative emotion words reflects depression severity on Twitter. Jaidka, Guntuku, and Ungar (2018) showed the difference between Facebook and Twitter in predicting users' traits. By qualitatively comparing the linguistic and demographic features underlying the differences between Facebook and Twitter, they found that users prefer to talk about family, personal concerns, and emotions on Facebook, and more about ambitions and needs on Twitter. The variation of language across platforms may be attributed to users' psychological differences during communication and to the anticipated function of each platform. Although, to our knowledge, no study has compared SMS with public Facebook posts, some work has compared Facebook status updates with direct messages, a private communication channel on Facebook that is similar to SMS. Bazarova et al. (2013) observed that sharing positive emotions is associated with self-presentational concerns in Facebook status updates but not in private messages, noting the difference between communication on public and private channels. Bazarova and Choi (2014) further identified that self-disclosures in Facebook status updates and private messages are associated with different strategic goals and motivations: status updates are associated with higher odds of social validation, self-expression, and relief, whereas private messages are related to higher odds of relational development, social maintenance, and information sharing. Our focus is to demonstrate the social-psychological differences between Facebook and SMS language use, and then to provide an example from the language-based prediction of depression.

Participants were recruited between September 2020 and July 2021 for a larger national survey focused on COVID-19, mental health, and substance use. Participants were recruited online via the Qualtrics Panel. To qualify, consenting participants had to be 18 years or older, U.S. residents, and Facebook users. Specifically, participants must have posted at least 500 words across their status updates over the lifetime of the account and at least 5 posts within the past 180 days, to ensure that they are active users (Eichstaedt et al. 2021).
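As an illustration, the sketch below applies this activity filter to hypothetical per-user summary fields; the field names (`total_words`, `posts_last_180d`) are ours, not from the study's data pipeline.

```python
# Minimal sketch of the activity-based eligibility filter described above.
# Field names are hypothetical stand-ins for whatever the survey export uses.
import pandas as pd

users = pd.DataFrame({
    "user_id": [1, 2, 3],
    "total_words": [12000, 350, 800],   # words across all status updates
    "posts_last_180d": [40, 2, 7],      # posts within the past 180 days
})

# Keep users meeting both criteria: >= 500 lifetime words, >= 5 recent posts.
eligible = users[(users["total_words"] >= 500) & (users["posts_last_180d"] >= 5)]
print(eligible["user_id"].tolist())
```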
2,796 participants were paid $30 to complete an initial survey, which consisted of multiple items centered on socio-demographics, physical and mental health (including depression), substance use, and COVID-19. This pool of participants has been used to study loneliness and alcohol use (Bragard et al. 2021) and COVID-related victimization (Fisher et al. 2021). After completing this survey, participants were invited to install the open-source mobile sensing application AWARE on their mobile phones (Ferreira, Kostakos, and Dey 2015; Nishiyama et al. 2020). This application collects mobile sensor information, such as movement, app usage, and, importantly for the current study, keystroke data. Participants were paid $4 per day to keep the AWARE app running for at most 30 days. A total of 300 participants completed this phase of the study. We note that the availability of mobile sensor data depends on the phone manufacturer (i.e., iPhone vs. Android); in particular, keystroke data is only available for Android users. We collected keystroke data from a total of 192 Android users, of whom 123 wrote at least 500 words within the 30-day study period.[1] We note that while keystroke data is collected across all applications, we only consider the Google, Verizon, and Samsung messaging apps, hereafter referred to as SMS data. Finally, three participants were removed because their text-based depression estimates (see below) were outliers due to mostly Spanish Facebook status updates. Thus, the final sample consisted of 120 participants who posted at least 500 words across their Facebook status updates, wrote at least 500 words across their SMS, and answered a standard depression screener (PHQ-9; see below).

[1] Extensive cleaning was automatically applied (i.e., no human in the loop) to the keystroke data in order to remove any sensitive information.

We employ an off-the-shelf text-based depression estimation model, which was trained on Facebook status updates to predict self-reported depression (Schwartz et al. 2014). This model was built on roughly 28,000 Facebook users who consented to share their Facebook data and answered the depression facet of neuroticism in the "Big 5" personality inventory, a 100-item personality questionnaire (the International Personality Item Pool proxy to the NEO-PI-R; Goldberg et al. 1999). The model achieved a prediction accuracy (Pearson r) of 0.386; please see the original paper for full details.

The Patient Health Questionnaire (PHQ-9) is a 9-item questionnaire developed based on DSM-IV criteria, which has been widely used to assess depression in both clinical and non-clinical settings (Kroenke, Spitzer, and Williams 2001). We utilize this scale to assess the severity of individuals' depression symptoms and as a "gold standard" measure of depression in our participants.

In order to assess how Facebook and SMS use differs and how these differences drive downstream analyses, we proceed in three parts: (1) examine how language use (as measured through standard dictionary approaches) differs across platforms, (2) show that depression estimates derived from a model trained in a single domain are less accurate when applied out of domain, and (3) quantify platform differences and use them to correct the out-of-domain depression estimates.

Task 1: Cross-platform differences
We begin by tokenizing both the Facebook status updates and the SMS data, using a tokenizer designed for social media data (Schwartz et al. 2017). Given the small sample size of the study (N = 120), we do not have sufficient power to explore cross-platform differences in a large feature space. We therefore use the Linguistic Inquiry and Word Count (LIWC) dictionary, which consists of 73 manually curated categories (both function and content categories, such as positive emotions, sadness, and pronouns; Pennebaker et al. 2015). This dictionary has a rich history in the psychological sciences, with over 8,800 citations as of April 2020 (Eichstaedt et al. 2021), and can thus aid in interpreting cross-platform differences. For each of the 120 participants, we separately extract the 73 LIWC categories for both the Facebook and SMS data. Next, to calculate differences in LIWC usage, we compute a dependent t-test for paired samples (i.e., one sample, repeated measures) for each category and adjust the overall significance threshold using a Benjamini-Hochberg False Discovery Rate (FDR) correction.
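A minimal sketch of this paired-test procedure follows, using synthetic stand-ins for the per-user LIWC category scores (LIWC itself is proprietary, so random arrays substitute for real extractions):

```python
# Paired t-tests per LIWC category with Benjamini-Hochberg FDR correction.
import numpy as np
from scipy.stats import ttest_rel
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(0)
n_users, n_categories = 120, 73
fb = rng.random((n_users, n_categories))   # stand-in for Facebook LIWC scores
sms = rng.random((n_users, n_categories))  # stand-in for SMS LIWC scores

# Dependent (paired) t-test per category: same users, two platforms.
t_vals, p_vals = ttest_rel(fb, sms, axis=0)

# Benjamini-Hochberg FDR correction across all 73 tests.
reject, p_adj, _, _ = multipletests(p_vals, alpha=0.05, method="fdr_bh")
print(f"{reject.sum()} categories differ significantly after FDR correction")
```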
Task 2: In- vs. out-of-domain depression estimates
Next, for each participant we estimate depression from both the Facebook and the SMS text using the preexisting text-based depression model described in the previous section. We then correlate the depression estimates with responses to the PHQ-9 depression screener. To quantify which features drive the depression estimates in both domains, we examine the feature importance $i_f$, defined as

$$i_f = w_f \left( \mathrm{freq}_{FB}(f) - \mathrm{freq}_{SMS}(f) \right) \quad (1)$$

where $w_f$ is the weight of feature $f$ in the depression model and $\mathrm{freq}_{*}(f)$ is the frequency of feature $f$ in either the Facebook (FB) or SMS domain.

Task 3: Domain adaptation
Here we show that a simple domain adaptation algorithm can be applied to the SMS data in order to increase the predictive accuracy of the depression model. To do this, we multiply each participant's SMS word frequency by the ratio of the global mean Facebook frequency to the global mean SMS frequency. We note that this method is informed by the feature importance measure defined in Equation 1: if we apply this correction factor, the importance measure reduces to 0,

$$i_f = w_f \left( \mathrm{freq}_{FB}(f) - \mathrm{freq}_{SMS}(f) \cdot \frac{\mathrm{freq}_{FB}(f)}{\mathrm{freq}_{SMS}(f)} \right) = 0.$$

Since we use a ratio of word frequencies, we only adjust words that are not rare, because rare-word frequencies are noisy. As such, we only adjust words used by at least 5 users in each text data set (i.e., Facebook and SMS). This also stops single users from dominating the frequency measures.
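A minimal sketch of this correction under assumed inputs (dictionaries of global mean frequencies and per-user frequency arrays; the data layout and function name are ours, not the authors' code):

```python
# Frequency-ratio domain adaptation: rescale SMS word frequencies toward
# the Facebook domain, skipping rare words (< min_users users per platform).
import numpy as np

def adapt_sms_frequencies(sms_user_freq, fb_mean_freq, sms_mean_freq,
                          fb_user_counts, sms_user_counts, min_users=5):
    """sms_user_freq: dict word -> array of per-user SMS relative frequencies.
    fb_mean_freq / sms_mean_freq: dict word -> global mean relative frequency.
    fb_user_counts / sms_user_counts: dict word -> number of users using word.
    """
    adapted = {}
    for word, user_freqs in sms_user_freq.items():
        common = (fb_user_counts.get(word, 0) >= min_users
                  and sms_user_counts.get(word, 0) >= min_users)
        if common and sms_mean_freq.get(word, 0) > 0:
            # Multiply by global-mean Facebook / global-mean SMS frequency.
            ratio = fb_mean_freq[word] / sms_mean_freq[word]
        else:
            ratio = 1.0  # rare or Facebook-absent words are left unadjusted
        adapted[word] = np.asarray(user_freqs, dtype=float) * ratio
    return adapted
```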
Task 1: Cross-platform differences
Table 1 shows usage differences between Facebook and SMS: the psycho-linguistic characteristics underlying the within-person differences were compared using paired t-tests for each LIWC category extracted from Facebook and SMS. As shown in Table 1, people write more about personal concerns and experiences (e.g., achievement), describe what they see, and use certain grammar features (common adjectives, quantifiers) on Facebook. In SMS, people use more differentiation and discrepancy words, personal pronouns (I, you), informal language (assent), more auxiliary and common verbs, and more present and future temporal focus.

Task 2: In- vs. out-of-domain depression estimates
In-domain depression estimates from the Facebook data correlate (Pearson r) with PHQ-9 at 0.38, a prediction accuracy equivalent to that reported in the original paper from which the depression model was derived. When correlating the SMS-based depression estimates (i.e., out-of-domain) with PHQ-9 scores, we see a drop in prediction accuracy (Pearson r = 0.29), showing that the model does not work as well when applied to out-of-domain data. Results are listed in Table 3. Figure 1 shows the top-weighted features driving the discrepancies between Facebook- and SMS-based depression estimates. Style features, such as greater use of contractions ("i'll", "i'm", "they're", "she's", "haven't") in SMS, and content features reflecting the topics people discuss ("family", "sick", "chicago", "anniversary", "year") on Facebook, are among the features most responsible for these discrepancies. Given these differences, we argue that a domain-specific adaptation correction based on such features is needed.

We then applied the domain adaptation correction to the SMS data when predicting depression. Results are provided in Table 3: the correlation with participants' PHQ-9 scores improves after the correction, suggesting the necessity of domain adaptation in cross-platform language analysis.

Table 3: Pearson correlations between text-based depression estimates and survey-based PHQ-9 scores, before and after domain adaptation. The depression model was trained on Facebook data, so domain adaptation was applied only to the SMS data.

To the best of our knowledge, this is the first study to date investigating differences between Facebook and SMS language use. We show that, for the same users: (1) Facebook and SMS contain different linguistic features, (2) a Facebook-derived language model of depression performs worse on SMS, and (3) corrections based on word-use frequencies improve the Facebook-derived depression estimates on SMS. We found that the same users use Facebook and SMS for different purposes. In line with psychology research, Facebook use is linked to the need to belong and to self-presentation (Nadkarni and Hofmann 2012), leading to more content sharing and opinion expression, while SMS is used for playful forwarding, for phatic communication that maintains rather than impacts social relationships via "pointless" texts, and for intimate and informal discussions (Fibaek Bertel and Ling 2016). Our LIWC findings in Task 1 and the feature importance results in Task 2 confirm this variation, with more content features on Facebook and more style features in SMS. By using data from the same users, we show that the discrepancies in Facebook vs. SMS language and in model prediction accuracy are due not to demographic differences but to varied language patterns.

Our findings are important for future language analysis research. Facebook and SMS contain significantly different linguistic features reflecting social and psychological attributes, and future studies should explore more downstream applications along this line. Researchers in computational social science should be aware of such differences between Facebook and SMS in model selection and adaptation; domain-specific corrections based on users' language preferences are needed to preserve prediction accuracy.

Limitation
One limitation is the small sample size, which prevents generalization to a broader context; however, all our comparisons are within-person, which strengthens the power of the analysis. Another limitation comes from the unbalanced word and post counts on the two platforms, with the Facebook language containing more words and posts in total than the SMS texts (due to the data being collected over a longer time span). Instead of matching the samples by word or post counts, we chose to include all available text from both platforms to maximize the sample size, noting that a minimum of 500 words per text domain was required for inclusion. To ensure that our cross-platform differences are not caused by the unbalanced word counts, we further created a subset of the Facebook language matched to the SMS word counts and generated new depression estimates. Depression estimates from the matched Facebook language correlate with those from the original Facebook language at 0.86 and with PHQ-9 at 0.36, indicating that our findings are driven by cross-platform differences per se, not by the word count difference.
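A minimal sketch of this word-count matching check, under the assumption that each user's Facebook posts are randomly subsampled until the total word count roughly matches that user's SMS word count (the paper does not specify the exact subsampling procedure):

```python
# Subsample Facebook posts to approximately match a target word count.
import random

def match_word_count(fb_posts, target_words, seed=0):
    """Randomly keep Facebook posts until ~target_words words are reached."""
    rng = random.Random(seed)
    posts = fb_posts[:]
    rng.shuffle(posts)
    kept, total = [], 0
    for post in posts:
        if total >= target_words:
            break
        kept.append(post)
        total += len(post.split())
    return kept

fb_posts = ["had a great day at the lake", "work was long today", "happy friday"]
print(match_word_count(fb_posts, target_words=8))
```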
Ethics Statement
This study involves human subjects and was approved by an Institutional Review Board (IRB). The methods and types of data used in this study raise a number of ethical issues. First, social media, keystroke, and mobile sensing data are highly sensitive and can contain PII. We took extreme care in storing, cleaning, and analyzing the data (see the Supplemental Materials for exact data cleaning methods); as such, data sharing is unavailable for this study. We also estimate sensitive attributes like depression using social media data, SMS data, and machine learning methods. This can be problematic for many reasons, including biases in training data and misclassifications in downstream tasks, which can further marginalize vulnerable populations, among other issues.

Due to the sensitive nature of this study, data cannot be shared publicly. The AWARE mobile sensing app logs each non-password keystroke on Android phones across all apps (e.g., text messages and search engine entries). These logs are stored one character at a time and include modifications such as deletions and auto-correct. For example, if a user searched for "Taylor Swift" in a search engine, AWARE would log separate database entries for "T", "Ta", "Tay", etc. If the same user misspelled "Taylor" while typing, AWARE would also log the misspelling and the delete key; for example, "T", "Ta", "Tai", "Ta" (i.e., a backspace occurred), "Tay", etc. This presents a unique challenge when dealing with possibly sensitive information. While the main goal of cleaning Personally Identifiable Information (PII) is to enable non-trusted parties to access the collected data by removing PII, a secondary goal is to replace the PII with a tag indicating what kind of data has been removed, to allow deeper analysis.

Basic cleaning of each string was done in several stages. The first was to remove PII that was structurally identified by the device itself as either a password field or a phone number. The second stage was to use spaCy's Named Entity Recognizer (NER) and replace all flagged entities with their category labels. The third stage was to check against a list of common data formats using regular expressions from a modified version of CommonRegex (https://github.com/madisonmay/CommonRegex). We note that these category labels were ignored by our tokenizer and not used in the downstream analyses of the present study.
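A minimal sketch of the stage-two entity replacement, using spaCy's small English model (this is an illustration of the described step, not the authors' pipeline; it assumes `en_core_web_sm` is installed via `python -m spacy download en_core_web_sm`):

```python
# Replace spaCy-flagged entities with their category labels.
import spacy

nlp = spacy.load("en_core_web_sm")

def replace_entities_with_labels(text: str) -> str:
    doc = nlp(text)
    # Splice right-to-left so character offsets stay valid while replacing.
    for ent in reversed(doc.ents):
        text = text[:ent.start_char] + f"<{ent.label_}>" + text[ent.end_char:]
    return text

print(replace_entities_with_labels("Call Jane at 3pm on Friday in Chicago."))
# e.g. "Call <PERSON> at <TIME> on <DATE> in <GPE>." (labels are model-dependent)
```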
Cleaning keystroke data, which changes one character at a time, however, poses an additional challenge beyond standard complete-string cleaning: we must detect partial PII that does not yet match a known form (but eventually will). We accomplished this by rolling future data back through the previous data in two stages. In the first stage, each time the completion of a new token at the end of a string was detected, we applied the replacement information, or lack thereof, back through the previous strings to the beginning of that token (there may be incomplete tokens that match NER patterns but, based on subsequent characters, did not need to be replaced). This allowed us to clean data that might be removed via deletion before the entry was complete. In the second stage, once the whole entry was complete, we rolled all of the changed data back through all of the incomplete string items for that entry. This involved overlaying replacement item information for individual strings wholly contained within the completed entry information or, where the replacement data fields only overlapped, merging the possible replacement item information into a compound tag. This process was executed automatically on the study data, with no human intervention, so as to minimize the risk of leaking sensitive information. Finally, we note that while we collected full keystroke data, only the final text sent via SMS was analyzed (i.e., no partial text messages were considered).

Acknowledgments
This study was funded by the Intramural Research Program of the National Institutes of Health (NIH), National Institute on Drug Abuse (NIDA). Dr. Brenda Curtis is the corresponding author of the paper. Dr. Brenda Curtis and Dr. Lyle Ungar share senior authorship.

References
Barnett et al. 2018. Relapse prediction in schizophrenia through digital phenotyping: a pilot study.
Bazarova and Choi 2014. Self-Disclosure in Social Media: Extending the Functional Approach to Disclosure Motivations and Characteristics on Social Network Sites.
Bazarova et al. 2013. Managing Impressions and Relationships on Facebook: Self-Presentational and Relational Concerns Revealed Through the Analysis of Language Style.
Bragard et al. 2021. Loneliness and Daily Alcohol Consumption During the COVID-19 Pandemic.
De Choudhury et al. 2013. Predicting depression via social media.
Eichstaedt et al. 2021. Closed- and open-vocabulary approaches to text analysis: A review, quantitative comparison, and recommendations.
Eichstaedt et al. 2018. Facebook language predicts depression in medical records.
Ferreira, Kostakos, and Dey 2015. AWARE: mobile context instrumentation framework.
Fibaek Bertel and Ling 2016. "It's just not that exciting anymore": The changing centrality of SMS in the everyday lives of young Danes.
Fisher et al. 2021. COVID-Related Victimization, Racial Bias and Employment and Housing Disruption Increase Mental Health Risk Among US Asian, Black and Latinx Adults. Frontiers in Public Health.
Forgas 2012. Language and social situations.
Goldberg et al. 1999. A broad-bandwidth, public domain, personality inventory measuring the lower-level facets of several five-factor models.
Guntuku et al. 2017. Detecting depression and mental illness on social media: an integrative review.
Harari et al. 2020. Sensing sociability: Individual differences in young adults' conversation, calling, texting, and app use behaviors in daily life.
Jaidka, Guntuku, and Ungar 2018. Facebook versus Twitter: Differences in self-disclosure and trait prediction.
Kroenke, Spitzer, and Williams 2001. The PHQ-9: validity of a brief depression severity measure.
Liu et al. 2021. The relationship between text message sentiment and self-reported depression.
Nadkarni and Hofmann 2012. Why do people use Facebook?
Nishiyama et al. 2020. iOS crowd-sensing won't hurt a bit!: AWARE Framework and Sustainable Study Guideline for iOS Platform.
Pennebaker et al. 2015. The development and psychometric properties of LIWC2015.
Schwartz et al. 2014. Towards assessing changes in degree of depression through Facebook.
Schwartz et al. 2017. DLATK: Differential language analysis ToolKit.
Seabrook et al. 2018. Predicting depression from language-based emotion dynamics: longitudinal analysis of Facebook and Twitter status updates.