key: cord-0913875-qp401ihq
authors: Kwak, Eun Jin; Grable, John E.
title: Conceptualizing the use of the term financial risk by non-academics and academics using twitter messages and ScienceDirect paper abstracts
date: 2021-01-02
journal: Soc Netw Anal Min
DOI: 10.1007/s13278-020-00709-9
sha: cf2021c28029f87c459833842bfc81fb2f1074c8
doc_id: 913875
cord_uid: qp401ihq

A text mining technique, based on an Application Programming Interface (API) request and using narrative data from Twitter™ and ScienceDirect™, was used to identify how non-academics and academics conceptualize and evaluate sentiment indicators associated with the term financial risk in their communications. It was determined that, unlike the day-to-day uses of the term, all of which tend to focus predominately on the business and technology aspects of risk taking, the academic definition of the term is expressed broadly. It was also determined that the term was mainly associated with negative emotions in daily conversations, whereas the term tended to be used in a positive way in research paper abstracts. Results from this study suggest that the way financial risk is conceptualized and applied in real-life settings primarily represents negative emotional contexts, while academic papers tend to represent positive emotional contexts. Information presented in this paper can help educators, researchers, and policy makers better understand the way non-academics objectively and subjectively evaluate and describe financial risk. This information may help lead to better investor educational interventions and decision outcomes.

Defining what is meant by the term risk, let alone conceptualizing its emotional context (i.e., sentiment), is an elusive task. At the broadest level, risk refers to an event or action that has the potential to be a danger or problem in the future (MacMillan Dictionary, n.d.).
In practice, the term is most often used to describe an action that can result in a positive outcome, a negative outcome, or both. Risk is typically categorized as being either objective or subjective in nature. An objective risk is one that can be quantified with known probabilities. Subjective risk is something that is based on individual judgment and belief, making it, essentially, a personal perception.

Outside of the academy, the term risk is most often used to describe something that entails a potential (real or imagined) negative outcome. Phrases such as "The risk is worth the return" and "You only make money if you take a risk" are examples of the ways risk is frequently used in the day-to-day vernacular. As illustrated with these statements, there is an assumed potential negative outcome associated with nearly all risky behaviors. Incurring or taking a risk, when conceptualized negatively, is worthwhile only if the benefits accrued or other outcomes are both positive and in excess of the risk taken.

While some risk-taking activities have pre-determined probabilities associated with behavioral engagement (e.g., casino gambling), the majority of risky situations and choices faced by individuals involve subjective assessments of risk and return outcomes. This helps explain why certain words consistently come to mind when describing risk. Words that indicate negative sentiment are often used to describe risk when an outcome is perceived as potentially harmful. Examples of negative words include danger, hazard, liability, peril, and jeopardy. When perceived positively (e.g., taking risk results in a gain), it is common to hear risk framed in positive terms. Examples of words that represent positive sentiment include opportunity, security, safety, and fun. Some descriptors of risk are neutral in tone and disguise an individual's perception of a given risk. Words fitting a neutral description include possibility, gamble, luck, speculation, and wager.
Negative perceptions about risk, particularly financial risk, can result in skewed decision-making outcomes at the household level and lead to suboptimal goal achievement. Slovic (1986) argued that lay and professional risk judgments are often inaccurate. This does not mean that risk judgments are not important. An inaccurate, yet deeply held, perception can influence the type of decisions someone makes. It is well known that risks related to dramatic and sensational events, such as homicides, mass shootings, cancer, and natural disasters, tend to be overestimated (Lichtenstein et al. 1978). Terms used to describe common sentiment related to these types of events include dread, fear, anxiousness, and loathing. These terms are also used to describe outcomes that nearly all people strive to avoid. Thus, if individuals perceive risk primarily in negative terms, it is no surprise when these same individuals take steps to avoid situations in which risk is present.

The purpose of the study described in this paper is twofold. The first purpose is to identify how non-academics and academics conceptualize and use the term financial risk in their communications. A text mining technique, based on an Application Programming Interface (API) request, was used to determine the degree to which the term financial risk is associated with positive, negative, and neutral perceptions. The second purpose is to compare and contrast sentiment indicators associated with the term financial risk between non-academics and academics. Findings from this study provide a general sense of the way financial risk is conceptualized and applied in real-life settings, as well as conceptually in the academic literature. Information presented in this paper can help educators, researchers, and policy makers better understand the way non-academics objectively and subjectively evaluate and describe financial risk. In this regard, the following research questions were addressed in this study.
• What type of words are associated with the term financial risk in non-academic and academic conversations and publications?
• What type of sentiments are expressed when people use the term financial risk in non-academic and academic conversations and publications?
• What are the central concepts and words associated with the term financial risk when categorized as positive, negative, and neutral network sentiments?
• Do non-academics and academics utilize similar or different words when referring to financial risk?

The remainder of this paper is structured as follows: The next section presents the methodology used to analyze the number, the tone (i.e., sentiment), and the concepts underlying the use of words most widely associated with the term financial risk. This is followed by a description of study results and a discussion of findings with an emphasis on implications for educators, researchers, and policy makers who are interested in risk communication and risk education.

Two data sources were used for this study. The first set of data for the analysis was collected from Twitter™ (www.twitter.com) in 2019. Twitter is one of the most popular social media websites/mobile applications used by consumers, policy makers, businesses, and decision makers. Twitter is known to be a novel source of data for those studying attitudes, beliefs, and behaviors of consumers and opinion makers (Kang et al. 2017). Twitter data were analyzed to examine daily public conversations in relation to the term financial risk through users' social interactions and messages. The Twitter platform allows registered users to post and interact with others using messages known as "tweets". According to Statista (2019), Twitter had 68 million monthly active users in the United States in 2019. Table 1 provides descriptive statistics for general users of Twitter.
As shown in Table 1, in 2019, 38% of Twitter users were between 18 and 29 years old, followed by the age group between 30 and 49 years (26%). In 2019, more than half of Twitter users (56%) were male, and 32% of users had a college degree or higher level of education.

The second set of data for the analysis was collected from ScienceDirect™ (www.sciencedirect.com) in 2019. ScienceDirect is an open platform providing journal articles and book chapters from more than 2,500 peer-reviewed journals and 11,000 books (Journal of Medical Library Association 2013). ScienceDirect mainly covers publications in the health sciences, life sciences, and social sciences. ScienceDirect offers broad access to paper abstracts and full-text papers. Data from ScienceDirect were analyzed to identify words commonly used in association with the term financial risk among those in academia.

An Application Programming Interface (API) method was used to collect data from Twitter and ScienceDirect. An API is a communication protocol between clients (developers) and a server that receives and sends responses. There are several ways to conduct an API request, although the most common method, as used in this study, involves requesting permission to build an API application for a certain server (i.e., Twitter and ScienceDirect). An API request that asked for all messages and papers tweeted/published in 2019 containing the term financial risk was sent to the server. Approximately 300 tweet messages containing the term financial risk were acquired, whereas 104 research papers containing the term financial risk in the abstract, title, or as a keyword were obtained from ScienceDirect. Twitter data contained basic information, including usernames, tweet messages, favorite indicators, retweet counts, and other similar information. ScienceDirect data contained paper titles, journal names, publication years, abstracts, and authors' information.
For the purposes of this study, only tweet content (i.e., messages) and words in abstracts were analyzed. Several data cleaning adjustments were made. Duplicated words such as finance and financial were combined into a single word called financial. All letters were converted to lowercase letters as a way to ensure consistency. Symbols, numbers, non-alphabet characters, individual usernames, company names, Uniform Resource Locators (URLs), function words (e.g., but and with; see footnote 2), stop words (e.g., the, and, which, etc.; see footnote 3), and unnecessary spaces within and among letters and words were removed from the raw data prior to data analysis. For the ScienceDirect data, the words study and research, as well as heading words describing common paper sections (e.g., introduction, literature, data, methods, analysis, results, conclusion, etc.), were additionally removed.

The primary goal of this study was to examine and compare sentiment about the term financial risk across social media and academic papers by analyzing the semantic network of messages and abstracts from Twitter and ScienceDirect. Several tests were used to provide insights into the central concepts associated with the term financial risk. All words and phrases were analyzed using the R programming language. Figure 1 shows the main function procedures, illustrated as a three-step process, and packages used for the analysis.

Fig. 1 Process of analysis in three stages

Footnote 2: A function word is a term used mainly for expressing relationships between other words in a sentence. For example, a conjunction like "but" or a preposition like "with" is considered a function word (MacMillan Dictionary, n.d.). A word whose primary purpose is to contribute to the syntax of a sentence rather than the meaning of a sentence is also considered a function word (Oxford English Dictionary, n.d.).
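The cleaning steps described in this section can be sketched in a few lines. The authors worked in R; the sketch below uses Python instead, and the stop-word list, the merge map, and the function name `clean_text` are illustrative assumptions rather than the study's actual settings. Removal of company names, which would require a curated list of names, is omitted.

```python
import re
from typing import List

# Illustrative stop/function words only; the study's full list is not given in the text.
STOP_WORDS = {"the", "and", "which", "but", "with", "a", "an", "of", "to", "in", "is"}

# Hypothetical merge map mirroring the finance -> financial example in the text.
MERGE = {"finance": "financial"}


def clean_text(raw: str) -> List[str]:
    """Return the cleaned, lowercased tokens of one tweet or abstract."""
    text = raw.lower()                                   # consistent lowercase
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)   # drop URLs
    text = re.sub(r"@\w+", " ", text)                    # drop usernames
    text = re.sub(r"[^a-z\s]", " ", text)                # drop symbols, numbers, non-alphabet characters
    tokens = text.split()                                # also collapses extra whitespace
    tokens = [MERGE.get(tok, tok) for tok in tokens]     # combine duplicated word forms
    return [tok for tok in tokens if tok not in STOP_WORDS]


example = clean_text("@Bob The FINANCE risk is up 20%! https://t.co/abc #Markets")
print(example)
```

The order of the substitutions matters in this sketch: URLs and usernames are removed before the non-alphabet filter, because otherwise their letters would survive as stray tokens.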
Footnote 3: A stop word (i.e., usually one of a set of words most frequently occurring in a language or text) is one that is automatically omitted from or treated less fully in a computer-generated concordance or index (Oxford English Dictionary, n.d.; http://sentiment.nrc.ca/lexicons-for-research/).

Two data analysis procedures were used to identify words that were most often used in connection with the term financial risk. Word frequency distributions and pattern recognitions were analyzed using text-mining techniques. Text mining allows various definitions and ranges, as an extension of classical data mining methods, to be applied when making sophisticated formulations using text classification and clustering procedures (Meyer et al. 2008). Using the text mining method, 1,366 words from tweets that included the term financial risk, and 3,216 words from abstracts that included the term financial risk, were extracted and analyzed. In addition to single words, the word pair with which each word was associated in the text was extracted from tweets and abstracts. The way in which these word pairs tend to co-occur provided important information about the meaning of concepts (Evert 2005). A total of 5,981 word pairs from tweet messages and 23,816 word pairs from the abstracts were extracted and analyzed.

A unique aspect of this study was the analysis of words used in tweet messages and abstracts of academic papers as indicators of personal/authorial sentiment and emotion associated with the term financial risk. The analytical approach focused on identifying opinions that represent or express positive, negative, or neutral emotions (Ji et al. 2015; Messias et al. 2017). In this study, the sentiment analysis used natural language processing and computational linguistics to systematically identify, extract, and quantify affective states and subjective information (Liu 2012).
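The word-frequency and word-pair counts described above can be illustrated with a minimal sketch. It assumes that a word pair means two adjacent words within a cleaned message, which the text does not state explicitly. Python stands in for the R used in the study, and the sample token lists are invented for illustration.

```python
from collections import Counter
from typing import Iterable, List


def word_frequencies(docs: Iterable[List[str]]) -> Counter:
    """Count how often each word occurs across all cleaned messages."""
    counts = Counter()
    for tokens in docs:
        counts.update(tokens)
    return counts


def word_pair_frequencies(docs: Iterable[List[str]]) -> Counter:
    """Count adjacent word pairs (an assumed definition of 'word pair')."""
    counts = Counter()
    for tokens in docs:
        counts.update(zip(tokens, tokens[1:]))
    return counts


# Invented sample data: each inner list is one cleaned tweet or abstract.
docs = [
    ["financial", "risk", "management", "software"],
    ["market", "risk", "management"],
    ["financial", "risk", "market"],
]

words = word_frequencies(docs)
pairs = word_pair_frequencies(docs)
print(words.most_common(1))
print(pairs[("risk", "management")])
```

Pairs are kept as ordered tuples here, so ("market", "risk") and ("risk", "market") are counted separately; sorting each tuple before counting would merge them if unordered co-occurrence is preferred.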
This study adopted the lexicon that was developed in the Nebraska Literary Lab to obtain a sentiment score associated with each sentence observed in the sentiment analysis (Jockers 2017). Based on score numbers that can be positive, negative, or zero, each sentence was classified into a sentiment group (i.e., positive, negative, or neutral). A further analysis was conducted to test the relationships among the three sentiment groups. The primary goal of this element of the analysis was to identify and compare sentiment differences between tweet messages (daily conversations) and research papers (scholarly communications).

A semantic network analysis was then used to model semantic relationships using graphical representations with labeled nodes and edges, where nodes represent words and edges represent relations among nodes. Analyzing the semantic structure of networks, as a visual text analytics system, provided a pathway, in this study, for the identification of central concepts or meaningful relationships between and among words (Drieger 2013). Embedded in this approach is the notion that clusters in a network represent groups of strongly connected words. In this study, the Girvan-Newman (Girvan and Newman 2002; Newman and Girvan 2004) algorithm was used to detect clusters. Everett (2018) noted that the Girvan-Newman algorithm is among the best-known methods to form a cohesive subgroup in a network. This analysis was followed by the estimation of statistical quantity measures, such as centrality. Calculating multiple centralities allowed for the identification of the significance and importance of words in a network (Das et al. 2018; Ibarra and Andrews 1993). Table 2 provides the definitions of the centrality measures used in this paper.

Table 3 shows the most frequently used single words and word pairs associated with the term financial risk. Results from the two models are presented in the table.
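To make the centrality measures concrete, the following sketch computes three of them on a small word network built from word pairs. It is an illustration in Python, not the authors' R code; the sample pairs and all function names are assumptions. Closeness is computed here as the average shortest-path length from a node to the other nodes, following the wording of the paper's definition (many software libraries report the reciprocal instead), and betweenness uses Brandes' algorithm for unweighted, undirected graphs without normalization.

```python
from collections import defaultdict, deque


def build_graph(pairs):
    """Undirected word network: nodes are words, edges join paired words."""
    graph = defaultdict(set)
    for a, b in pairs:
        if a != b:
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


def degree(graph):
    """Number of links attached to each node."""
    return {node: len(nbrs) for node, nbrs in graph.items()}


def mean_distance(graph):
    """Average shortest-path length from each node to the nodes it can reach."""
    result = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        others = [d for node, d in dist.items() if node != source]
        result[source] = sum(others) / len(others) if others else 0.0
    return result


def betweenness(graph):
    """Brandes' algorithm: shortest paths between other nodes passing through each node."""
    score = {v: 0.0 for v in graph}
    for s in graph:
        stack, queue = [], deque([s])
        preds = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}   # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in graph}
        while stack:                    # accumulate dependencies, farthest nodes first
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each undirected pair of endpoints was visited from both ends.
    return {v: c / 2 for v, c in score.items()}


# Invented sample word pairs.
pairs = [("financial", "risk"), ("risk", "management"),
         ("risk", "market"), ("market", "growth")]
graph = build_graph(pairs)
print(degree(graph), mean_distance(graph), betweenness(graph))
```

In this toy network the word risk has the most links and lies on the most shortest paths, which is the sense in which a word is treated as central in the analysis.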
Each model includes the 10 most often used words and their frequencies. The first model includes the text source from Twitter. Among Twitter users, management was used most often whenever financial risk was included in a tweet. The word market was second, whereas software was the third most often used word. For word pairs, risk management was used most often. This was followed by financial management and market risk.

Table 2 Definitions of network measures
[measure label not recoverable]: The compactness of the network
Network modularity: The strength of division of a network into modules
Degree centrality: The measure of the links that a node has
Betweenness centrality: The number of times a node lies on the shortest path between other nodes
Closeness centrality: The average length of the shortest path between the node and other nodes
Eigenvector centrality: The measure of the influence that a node has

The second model shows text results from ScienceDirect. As shown in Table 3, the most often used single word when the term financial risk appeared in an abstract was market. This was followed by the words cost and system in that order. Risk management, health care, and monetary policy were the three most used word pairs contained in paper abstracts. The data shown in Table 3 indicate that financial risk was used broadly both in day-to-day and academic conversations. While most concepts shown in Table 3 relate to finance or business, non-business words, such as software, health, and energy, were also used to describe financial risk across non-academic and academic posts and abstracts.

Figure 2 illustrates the results from the sentiment analysis. The semantic analysis was conducted by sentiment group based on the text source. The purpose of the analysis was to determine the tone or emotion invoked in tweets and abstracts that contained the term financial risk.
The objective of the analysis was to determine whether these concepts were primarily used in a negative, positive, or neutral manner when applied in day-to-day conversations and when presented in academic publications. By conducting the sentiment analysis at a sentence level, every sentence was categorized into one of the three sentiment groups (i.e., positive, negative, and neutral) based on a sentiment score (i.e., positive, negative, and zero scores). Approximately 46% of tweet sentences that included the term financial risk were negative, whereas 39% were positive, and 15% were neutral. Among the paper abstracts, 32% of sentences represented a negative sentiment in relation to the term financial risk, while 62% were positive, and 6% were neutral.

Table 4 shows the most often used words associated with each sentiment group. In the positive group, management and market were the words most often cited from tweet messages and paper abstracts. Webinar and uncertainty were the most often used negative words, whereas business and patient were the most widely used neutral words associated with the term financial risk. The data in Table 4 indicate that the term financial risk was widely associated with concepts and terms from a variety of domains. Unique words (e.g., China, climate, lecturer, psychology, readmission, etc.) emerged as important in the sentiment analysis, indicating the diversity of domains compared to what was observed in Table 3.

Table 5 provides the descriptive statistics associated with each sentiment network after the semantic network analysis was conducted. The total number of nodes represents the network size. As shown in Table 5, paper abstracts exhibited larger networks than tweet messages across the three sentiments. All networks across the three paper abstract sentiments were less dense than networks from tweet messages.
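The sentence-level grouping reported earlier in this section, in which the sign of a lexicon-based score assigns each sentence to a sentiment group, can be sketched as follows. The lexicon below is a hypothetical toy dictionary and not the lexicon used in the study; the scoring rule (summing word scores within a sentence) is likewise an assumption, and the sample sentences are invented. Python is used here in place of R.

```python
import re
from collections import Counter

# Hypothetical toy lexicon: word -> sentiment weight. Not the study's lexicon.
TOY_LEXICON = {
    "opportunity": 0.75, "growth": 0.5, "gain": 0.5,
    "danger": -0.75, "loss": -0.75, "failure": -0.5, "uncertainty": -0.5,
}


def sentence_score(sentence, lexicon=TOY_LEXICON):
    """Sum the lexicon weights of the words in one sentence (assumed rule)."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return sum(lexicon.get(word, 0.0) for word in words)


def sentiment_group(score):
    """Positive, negative, or zero score -> positive, negative, or neutral group."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def group_shares(sentences):
    """Share of sentences that fall into each sentiment group."""
    counts = Counter(sentiment_group(sentence_score(s)) for s in sentences)
    total = sum(counts.values())
    groups = ("positive", "negative", "neutral")
    if total == 0:
        return {g: 0.0 for g in groups}
    return {g: counts[g] / total for g in groups}


# Invented sample sentences.
sentences = [
    "Financial risk brings danger and loss.",
    "Managing financial risk is an opportunity for growth.",
    "Financial risk software webinar today.",
    "Uncertainty raises financial risk.",
]
shares = group_shares(sentences)
print(shares)
```

A sentence with no lexicon words scores zero and is grouped as neutral, so the size of the neutral group in a sketch like this depends heavily on lexicon coverage.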
A community detection algorithm indicated more communities among positive and negative networks, but fewer in the neutral network, from tweet messages. Average closeness centralities from tweet networks were higher than the centralities from the paper abstract networks across the three sentiments. Compared to the paper abstract networks, positive and negative networks from tweet messages had a larger average path length, higher average degree centrality, higher average betweenness centrality, higher average eigenvector centrality, and higher average clustering coefficients.

Table 6 shows the values of multiple centralities that describe the importance and influence of concepts for each sentiment network. Duplicated words within the same sentiment and the same text source (e.g., the positive sentiment network from tweet messages) across the four centralities are highlighted. Overall, there was not much overlap among words used in tweet messages and paper abstracts. Excluding an expected node (i.e., financial risk), common central concepts within the positive network from tweet messages were found to be related to business and finance. Positive network words included market, management, business, report, and growth. In addition to these business-related words, energy and performance emerged as important concepts for the positive sentiment networks from paper abstracts. Significant words in the negative sentiment network, from both text sources, included danger, aversion, uncertainty, failure, and loss. The neutral sentiment network's important words from tweet messages centered around the concept of business and technology (e.g., cybersecurity, information security, and cybercrime), while networks from the paper abstracts were related to words like patient, hospital, and readmission.
Figures 3 and 4 show the significant words for both networks by betweenness centrality (x-axis), closeness centrality (y-axis), degree centrality (z-axis), and eigenvector centrality (bubble size). In relation to the four centralities, the following words emerged from the analysis of tweet data: market, danger, and security. These words represent positive, negative, and neutral sentiments, in that order, whereas energy, uncertainty, and readmission were observed to be important in relation to the analysis of paper abstracts.

This study served two purposes. The first involved identifying how non-academics conceptualize the term financial risk in day-to-day communications and how academics use the term financial risk when communicating with colleagues in research papers. The second purpose involved evaluating sentiment indicators associated with the term financial risk in personal and scholarly communications.

Based on a word frequency analysis using Twitter data from 2019, it was determined that, in general, key words were related to finance and business topics (e.g., management, market, and business). It appears that business discussions are often conceptualized in a way that frames the markets and the economy in a negative way. It is possible that those who use Twitter use the medium to express concerns related to certain activities. Twitter users may be looking for ways to share information about what they consider to be unsafe. After financial and business concepts, other words were found to be associated with digital and virtual markets (e.g., software, webinar, and blockchain) in daily conversations. This may mean that non-academics equate risk with the adoption and use of technologies. It could also mean that acceptance of digital and virtual products, services, and markets is thought to be risky.
It was also determined, using ScienceDirect data from research paper abstracts published in 2019, that the term financial risk was associated with words representing various domains (e.g., system, chain, energy, and patient). It is likely that academics, when writing about risk, are more precise in their terminology than non-academics who communicate with others when using Twitter.

Based on results from the sentiment analysis, non-academics and academics were found to differ considerably in the emotions elicited in their communications. The term financial risk was mainly associated with negative emotions in daily conversations, whereas the term tended to be used in a positive way in research paper abstracts. Non-academics may feel that situations that involve uncertainty are conditions to be feared and that when financial risk is conceptualized in day-to-day communication, taking a financial risk involves anticipating the future, being fearful of potential outcomes, and needing to trust someone or something outside of one's control. When viewed in the context of academic paper abstracts, academics appear to communicate about risk more positively. Overall, the term financial risk can be seen as something that non-academics must cope with on a daily basis, but like many aspects of daily life, financial risk evokes mixed emotional responses.

As an exploratory project, it is worth noting several limitations associated with this study. First, unlike the analysis of traditional data, analyzing words is difficult in that conventional statistical tools are often inadequate in terms of data management. The methods employed in this paper are, like all data mining techniques, based on nonlinear assumptions. It is possible that a more traditional methodological approach could generate conflicting results. Additionally, Twitter data may not be generally representative.
More detailed screening procedures of data components (e.g., user information) should be considered in future studies to reduce possible bias from this issue. Finally, because the narrative data used in this study were obtained from a third party, the ability to create a truly random sample was limited. This means that the results, while noteworthy, are not necessarily generalizable.

The results from this study suggest that the way financial risk is conceptualized and used in daily practice differs from the definitional and academic application of financial risk in meaningful ways. Non-academics appear to use the term financial risk very loosely to describe situations that involve a degree of uncertainty and lack of transparency. Findings also show that the emotional context of the term financial risk, for both non-academics and academics, spans negative, positive, and neutral sentiments. In some respects, this indicates that non-academics understand that the presence of threats is not always negative or even hazardous. Taking financial risk, as illustrated in the Twitter posts, is sometimes communicated in a positive manner. This means that non-academics likely do understand the nuanced meanings of the term and phrase, but in general, non-academics tend to express a negative sentiment in relation to the concept of risk. It is, as such, important for researchers, educators, and policy makers to make a distinction between the academic application of the term financial risk and the day-to-day use of this term.

The findings from this study have direct, implementable implications for those who are engaged in financial risk communication and education. It is no easy task to change risk perceptions or the manner in which a word like risk will elicit an emotional response. The stability of risk perceptions appears to be determined, in part, by the inelastic nature of strongly held beliefs (Slovic 1986).
People are very slow to change belief patterns, even in the face of overwhelming evidence. Consider investments in stocks and other equities. Those who lived through and lost money during the Great Depression, the global financial crisis, and the COVID-19 pandemic may have come to believe that stocks and other equity investments are inherently risky and result in wealth losses, and that the stock market acts like a game of chance. Even in the face of dramatic increases in stock values following the Great Depression, the global financial crisis, and the COVID-19 pandemic, many investors who had been negatively affected by these events continued to perceive stocks as problematic assets in which to place savings, perceiving risk as a negative outcome. At the same time, there was a general perception held by many investors that fixed-income securities and federally insured bank/credit union products were "safe". Those who acted on this belief over the period 2013 through 2020 found, in retrospect, that they had incurred significant opportunity costs as the value of stocks and other equities increased while yields on fixed-income and bank/credit union products remained flat or decreased.

Knowing that beliefs are difficult to change and that evidence that counters a deeply held perception will generally be considered erroneous, unrepresentative, or unreliable (Slovic 1986), those who are tasked with communicating about the positives and negatives associated with financial risk and financial risk taking need to find innovative ways to counteract the relative inelasticity of beliefs. One approach to risk communication and education that appears to offer some degree of success is formatting and visualization. Consider takeaways from prospect theory. The manner in which risks are presented, either positively or negatively, can shape patterns of behavior. In general, when risks are presented in the gain domain, nearly all decision makers exhibit degrees of risk aversion.
When the same risks are presented in the loss domain, many of the same decision makers shift their preference to risk seeking. Additionally, Lichtenstein et al. (1978) advocated the use of statistical displays as a way to combat deeply held beliefs. Rather than present data in nominal terms, evidence suggests that fear and trepidation can be reduced when event outcome data are shown as a comparison. While it is true that risk means different things to different people (Lichtenstein et al. 1978), it is also true that when people believe they can control risk or the outcomes associated with a risky situation, the perception of negative outcomes associated with engagement in a risky activity falls (Simon et al. 1999). This is the primary reason that visualizations and data presentations work well in shifting the emotional context of risk. That is, once a decision maker comes to understand the true dimensions of a risky situation, it becomes easier for the decision maker to feel more in control of their situation. The adage that it is safer to fly than it is to drive to the airport is an applied example of a risk comparison that helps reduce the fear of flying.

Footnote 5: The k-core is the maximal connected subgraph which has minimum degree greater than or equal to k.

Although exploratory, this study is noteworthy in being one of the first rigorous analytical attempts to examine how the term financial risk is used in practice by non-academics and academics, and to link an emotional context around this term using online media and publication data. This study provides documentation for what researchers have often assumed: non-academics evaluate, conceptualize, and communicate about risk through personal channels using user-generated definitions rather than scientifically generated definitions.
As shown in this study, gaining a better understanding of the use and emotional context of risk is one way to help educators, researchers, and policy makers develop tools and techniques that can enhance risk communication and education. Big data methodologies will continue to open up new pathways to better understand day-to-day emotions, human behavior, and cognitions. The use of these types of methodologies in the future can help provide additional insights into the perceptions and preferences of those who communicate about risk-taking topics.

Funding: This study was not funded by a grant.

The code that supports the findings of this study is available from the corresponding authors upon reasonable request.

Conflicts of interest: The authors declare that they have no conflict of interest.

Availability of data and material: The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.

Study on centrality measures in social network: a survey
Semantic network analysis as a method for visual text analytics
Classical algorithms for social network analysis: future and current trends
(Unpublished doctoral dissertation) University of Stuttgart
Community structure in social and biological networks
Power, social influence, and sense making: effects of network centrality and proximity on employee perceptions
Twitter sentiment classification for measuring public health concerns
Extracts sentiment and sentiment-derived plot arcs from text
Semantic network analysis of vaccine sentiment in online social media
Judged frequency of lethal events
Sentiment analysis and opinion mining
Risk: definition and synonyms
An evaluation of sentiment analysis for mobile devices
Text mining infrastructure in R
Finding and evaluating community structure in networks
Cognitive biases, risk perception, and venture formation: how individuals decide to start companies
Informing and educating the public about risk
Twitter-statistics and facts

See Figs. 5, 6, 7, 8, 9 and 10.