key: cord-0927829-fe6b21x8
authors: Zhang, Weifeng
title: Management and Plan of Undergraduates' Mental Health Based on Keyword Extraction
date: 2021-10-28
journal: J Healthc Eng
DOI: 10.1155/2021/3361755
sha: 1e871845c7734be0a1110680070a5b4d722eaa8e
doc_id: 927829
cord_uid: fe6b21x8

Mental health issues are rising alarmingly among undergraduates and have gradually become a focus of social attention. As more and more abnormal events occur, such as undergraduates suspending their studies or even committing suicide because of mental health issues, social attention to undergraduates' mental health has reached a climax. Using questionnaires on undergraduates' mental health issues, this paper applies keyword extraction to analyze the management and plan of undergraduates' mental health. Building on the classical TextRank algorithm, this paper proposes an improved TextRank algorithm based on upper approximation rough data-deduction. The experimental results show that the precision, recall, and F1 of the proposed algorithm are significantly improved, and that the proposed algorithm also performs well in running time and physical memory occupation.

The mental health and wellbeing of undergraduates have deteriorated over the last decade. Even before the COVID-19 pandemic, higher education was facing a "mental health crisis" [1, 2]. The rapid onset of the COVID-19 pandemic has introduced countless additional stressors, and faculty concern over student wellbeing has increased. Depression among undergraduates increased from about 25% in 2010 to almost 30% in 2020, anxiety increased from 22% in 2014 to 31% in 2020, and suicidal ideation increased from 6% in 2010 to 11% in 2020 [3]. How frequently mental health management is organized for undergraduates varies from university to university. The influence of the pandemic on undergraduates' mental health is a major concern. The pandemic has affected the economic development of many countries, and cooperation between countries on the pandemic has also led to conflicts. Widespread public reports on the Internet and in the media have left inexperienced undergraduates unable to distinguish reliable information from unreliable information. Therefore, the management and plan of undergraduates' mental health are important during the COVID-19 pandemic [4, 5].

Keywords are words that express the central content of a document. Keywords accurately describe a document's content and facilitate fast information processing. There are two main types of keyword extraction algorithms: unsupervised methods and supervised methods [6] [7] [8]. Unsupervised keyword extraction does not need a manually labeled corpus; instead, it scores the words in the text and selects the most important ones as keywords. In unsupervised keyword extraction, candidate words are first extracted, each candidate word is then scored, and the top-K candidate words with the highest scores are output as keywords. Different ranking strategies give different algorithms, such as term frequency-inverse document frequency (TF-IDF), TextRank, and latent Dirichlet allocation (LDA). Supervised keyword extraction treats keyword extraction as a binary classification problem. Candidate words are first extracted and labeled, and a keyword extraction classifier is trained on them. When a new document arrives, all of its candidate words are extracted, and the trained classifier classifies each candidate word. Finally, the candidate words classified as keywords are output as the document's keywords [9, 10].
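As an illustration of the unsupervised "score, then keep the top-K" pipeline described above, the following is a minimal Python sketch that uses plain TF-IDF as the ranking strategy. It assumes the documents are already tokenized and uses term frequency normalized by document length with idf = log(N / df); the function name `tfidf_top_k` and the toy documents are hypothetical, and this is not one of the specific TF-IDF variants cited in this paper.

```python
import math
from collections import Counter


def tfidf_top_k(docs, doc_index, k):
    """Score every word of docs[doc_index] by TF-IDF and return the top-k words.

    docs is a list of already tokenized documents (lists of words).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each word.
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    tf = Counter(docs[doc_index])
    total = len(docs[doc_index])
    scores = {
        word: (count / total) * math.log(n_docs / df[word])
        for word, count in tf.items()
    }
    # Highest score first; ties broken alphabetically so the output is deterministic.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:k]
```

For example, with the documents `[["anxiety", "sleep", "exam", "anxiety"], ["exam", "friends", "sleep"], ["exam", "family"]]`, `tfidf_top_k(docs, 0, 2)` returns `["anxiety", "sleep"]`; "exam" scores 0 because it occurs in every document.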

Accordingly, the main contributions of this paper are summarized as follows. (i) I study the TextRank keyword extraction algorithm. (ii) An improved TextRank algorithm based on upper approximation rough data-deduction is proposed. The rest of this paper is structured as follows. Section 2 reviews the related work. In Section 3, I propose an improved TextRank algorithm based on upper approximation rough data-deduction. The experimental results are shown in Section 4. Section 5 concludes this paper.

Many strategies for the management and plan of undergraduates' mental health have been proposed. In [11], the authors examined student perspectives on college mental health, including the primary mental health issues affecting students, common college student stressors, student awareness of campus mental health resources, and the mental health topics about which students wanted more information. Little research had examined trends in on-campus service utilization for college students' mental health concerns. Rates of broad service utilization were available, but no published study had examined the direct relationship between a range of common mental health symptoms and on-campus service utilization. In [12], the authors explored which common mental health concerns were associated with the use of specific on-campus services by undergraduate students and whether endorsing more mental health concerns predicted a higher number of services used. In [13], the study investigated the moderating role of perceived social support in the relationship between academic demands (measured as perceived academic stress) and the mental health of undergraduate students in full-time employment. A growing number of developing countries have experienced worsening air pollution, which has been shown to cause significant health problems. However, few studies had explored the impact of air pollution on the mental health of university students, particularly in the Chinese context. To address this gap, the study in [14] used a large-scale cross-sectional survey and multivariable logistic regression to examine the effects of air pollution on the mental health of final-year Chinese university undergraduates.

The TextRank algorithm plays an important role in keyword extraction. In [15], the author presented an automatic keyword extraction algorithm based primarily on a weighted TextRank model, in which the similarity between word embedding vectors was used as the edge weight. As a typical keyword extraction technique, TextRank has been used in a wide variety of commercial applications, including text classification, information retrieval, and clustering. In these applications, the parameters of TextRank, including the co-occurrence window size, the number of iterations, and the decay factor, were usually set roughly, without systematic tuning. In [16], the authors conducted an empirical study of TextRank to find optimal parameter settings for keyword extraction.
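The embedding-based edge weighting attributed to [15] can be sketched as below: each co-occurrence edge receives the cosine similarity of the two words' embedding vectors. This is a minimal illustration under stated assumptions, not the implementation in [15]; the vectors are hypothetical toy values, and clipping negative similarities to 0 is an added choice so that the weights can act as non-negative votes in TextRank.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def embedding_edge_weights(edges, vectors):
    """Give each co-occurrence edge (a, b) the cosine similarity of its word vectors.

    edges: iterable of (word_a, word_b) pairs; vectors: dict word -> list of floats.
    Words without a vector get weight 0.0, and negative cosines are clipped to 0.0.
    """
    weights = {}
    for a, b in edges:
        if a in vectors and b in vectors:
            weights[(a, b)] = max(0.0, cosine(vectors[a], vectors[b]))
        else:
            weights[(a, b)] = 0.0
    return weights
```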

Keyword weight propagation in TextRank focuses only on word frequency. To improve the performance of the algorithm, the authors of [17] proposed semantic clustering TextRank, a news keyword extraction algorithm based on semantic clustering and TextRank. In [18], the authors introduced a new human-annotated Chinese patent dataset and proposed a sentence-ranking-based term frequency-inverse document frequency algorithm for patent keyword extraction, motivated by the idea that "the keywords are in the key sentences." In this algorithm, a sentence-ranking model was constructed to filter the top-K-s percent of sentences from each patent based on a sentence semantic graph and heuristic rules. In [19], the authors introduced a word network whose nodes represent the words in a document and called any keyword extraction method based on a word network a Word-net method. They then proposed a new network model that considers the influence of sentences, together with a new word-sentence method based on this model. In [20], the authors proposed an ontology and enhanced word embedding-based methodology for automatic keyphrase extraction from geoscience documents.

There are also other methods for keyword extraction. In [21], an enhanced term weighting, in the form of a series of modified term frequency-inverse document frequencies, was proposed to improve keyword extraction. In [22], the authors proposed an improved rapid automatic keyword extraction method, which used the word string matching feature of the dictionary method to map words to the relevant execution action functions. In [23], a novel text mining approach based on keyword extraction and topic modeling was introduced to identify the key concerns of on-site issues and their dynamics for a better decision-making process.

The TextRank keyword extraction algorithm is a graph-based ranking algorithm derived from Google's PageRank algorithm [24]. TextRank first divides the target text into meaningful words, constructs a candidate word graph, and then uses a voting mechanism to rank the candidate words. The task of keyword extraction is to extract several important words from the target text. TextRank uses the local correlation between words (i.e., a co-occurrence sliding window) to determine the correlation between candidate words and then iteratively calculates and ranks the candidate keywords. Rough set theory has been used in text classification to speed up classification and improve accuracy. Rough data-deduction is based on rough set theory and integrates approximate information from the upper approximation concept into the data reasoning process. This paper introduces upper approximation-based rough data-deduction into the TextRank keyword extraction algorithm, and the extracted keywords are used in undergraduates' mental health management and plan.
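The classic pipeline described above can be sketched in Python as follows. This is a minimal, unweighted version for illustration: it assumes the input is an already segmented and filtered list of candidate words, treats two words as linked when they appear within `window` positions of each other, and uses the commonly cited damping factor d = 0.85; the function name and default values are assumptions rather than settings taken from this paper.

```python
def textrank(words, window=5, d=0.85, max_iter=100, tol=1e-6, top_n=10):
    """Classic unweighted TextRank over a list of filtered candidate words.

    Two distinct words are linked when they occur within `window` positions
    of each other (the co-occurrence sliding window).
    """
    # Undirected co-occurrence graph stored as adjacency sets.
    neighbors = {w: set() for w in words}
    for i, w in enumerate(words):
        for j in range(i + 1, min(i + window, len(words))):
            if words[j] != w:
                neighbors[w].add(words[j])
                neighbors[words[j]].add(w)
    if not neighbors:
        return []
    scores = {w: 1.0 for w in neighbors}
    for _ in range(max_iter):
        # Each neighbour "votes" for w with its score split evenly over its links.
        new_scores = {
            w: (1 - d) + d * sum(scores[v] / len(neighbors[v]) for v in neighbors[w])
            for w in neighbors
        }
        delta = max(abs(new_scores[w] - scores[w]) for w in scores)
        scores = new_scores
        if delta < tol:
            break
    # Rank by descending score (ties alphabetical) and keep the first top_n words.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_n]
```

On the toy sequence `["stress", "exam", "sleep", "stress", "anxiety"]` with `window=2`, "stress" is linked to three other words and is ranked first.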

In the TextRank keyword extraction algorithm, the candidate keywords in the text form a graph constructed from their co-occurrence relations, and the weight of each node is calculated repeatedly with the average transition probability matrix until convergence. After convergence, the words are ranked in descending order of weight, and the first N words are selected as the extracted keywords. This method is concise and effective, but it has certain limitations. In [25], the convergent operation utilizes a clustering strategy to group the population into multiple clusters. The co-occurrence window only captures the correlation between nearby words, so some words closely related to a keyword may be ignored, even though the keywords of a document are not limited to words that appear near one another. When extracting keywords, I should therefore consider both the words in the text and potentially related words. Words with potential correlations have an important impact on the whole iterative ranking process, and these potential relations can be discovered with the theory of rough data-deduction. Therefore, this paper proposes an improved TextRank algorithm based on upper approximation rough data-deduction.

The candidate keywords are partitioned according to the word-sense similarity of mental health words. Because a document may contain a group of words with similar senses that describe the same important content, the weight of this group of words should be increased to improve the accuracy of the extraction results. The TextRank algorithm considers each word on its own and ignores the contribution of words with similar senses. Therefore, the improved algorithm takes word sense into account and partitions the candidate words by sense, which extracts keywords more effectively. The rough data-deduction space M = (U, N, D) is introduced to describe keyword extraction for undergraduates' mental health issues structurally. U is the universe of discourse (UOD), the dataset composed of candidate keywords related to undergraduates' mental health. N is a set of equivalence relations, and E ∈ N, where for p, q ∈ U, <p, q> ∈ E if and only if p is similar to q. D ⊆ U × U is defined as D = {<p, q> | p, q ∈ U and there is a relation between p and q}.
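To make the role of the equivalence relation E concrete, the following sketch partitions a universe U of candidate words into equivalence classes and computes the standard rough-set upper approximation of a target set, that is, the union of all classes that intersect it. The similarity predicate and the toy words are hypothetical; merging similar words into connected components (so that the relation is transitive) is an assumption of this sketch, since the exact construction of E is not given here.

```python
def equivalence_classes(universe, similar):
    """Partition `universe` into the classes of the relation induced by `similar`.

    `similar(p, q)` is assumed reflexive and symmetric; classes are taken as
    connected components so that the resulting relation is also transitive.
    """
    classes, seen = [], set()
    for start in universe:
        if start in seen:
            continue
        component, stack = [], [start]
        seen.add(start)
        while stack:
            p = stack.pop()
            component.append(p)
            for q in universe:
                if q not in seen and similar(p, q):
                    seen.add(q)
                    stack.append(q)
        classes.append(frozenset(component))
    return classes


def upper_approximation(target, classes):
    """Rough-set upper approximation: union of every class that intersects `target`."""
    result = set()
    for cls in classes:
        if cls & target:
            result |= cls
    return result
```

With classes {cw1, cw2, cw3} and {cw4, cw5}, the upper approximation of {cw2} is {cw1, cw2, cw3}: a word pulls in every word that is indistinguishable from it under E.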

Suppose that the deduction correlation obtained by rough data-deduction is defined as in equation (1),

where cw1 to cw6 are candidate keywords obtained from the text through word segmentation and filtering, and the deduction correlation is determined by the degree of the association rules, that is, by pointwise mutual information (PMI).

At the same time, for equivalence relation E ∈ N,

where the equivalence division is based on the similarity between the candidate words.

In rough data-deduction, for candidate word cw1, the algorithm obtains cw2 and cw3 through the similarity rule and groups cw1, cw2, and cw3 into one equivalence class; cw4 to cw7 can be grouped in the same way. Then, cw4, as well as cw5, cw6, and cw7, can be obtained from cw1 through the PMI-based association rules. According to rough data-deduction, cw1 ⇒E cw7 can be obtained from cw1 ⇒E cw6 and cw6 ⇒E cw7. Thus, there is also a potential correlation between cw1 and cw7, which can contribute to the calculation. The associations between candidate keywords are established by the above rules, and the association weights can be added to the iterative calculation as contribution rates to improve the accuracy of keyword extraction. The TextRank keyword extraction algorithm with upper approximation-based rough data-deduction is summarized as follows.

Step 1. Following the TextRank algorithm, the text related to undergraduates' mental health is preprocessed by sentence splitting, word segmentation, and part-of-speech (POS) tagging, and the candidate keywords are obtained.
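For English text, Step 1 can be approximated by the sketch below, which splits sentences on terminal punctuation, tokenizes with a regular expression, and removes stop words. It is a stand-in built on assumptions: the stop-word list is a small hypothetical sample, POS tagging is omitted (a real pipeline would, for example, keep only nouns, verbs, and adjectives), and languages written without spaces between words, such as Chinese, would additionally require a word segmenter.

```python
import re

# Hypothetical stop-word sample; a real pipeline would use a full list plus POS filtering.
STOPWORDS = {"i", "am", "and", "the", "a", "of", "to", "my", "about", "very", "is"}


def preprocess(text):
    """Split text into sentences, tokenize each one, and keep candidate words.

    Returns a list of sentences, each a list of lower-cased candidate words.
    POS tagging is not performed; stop-word removal stands in for it here.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    result = []
    for sentence in sentences:
        tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", sentence.lower())
        result.append([t for t in tokens if t not in STOPWORDS and len(t) > 1])
    return result
```

For example, `preprocess("I am very anxious about exams. My sleep is poor!")` yields `[["anxious", "exams"], ["sleep", "poor"]]`, a sentence-level structure that the PMI computation in Step 3 can reuse.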

Step 2. The candidate keywords are divided into equivalence classes according to their similarities. In this paper, the division is based on WordNet and Wikitext. For any two candidate words cw1 and cw2, the division rule is defined as follows:

where s1 and s2 are the similarities calculated with WordNet and Wikitext, respectively, and ω1 and ω2 are the weights assigned to s1 and s2, with ω1 + ω2 = 1.

Suppose that candidate word cw1 appears in WordNet (WN), cw2 appears in Wikitext (WT), and WW is the intersection of WN and WT. The values of ω1 and ω2 are chosen as follows:

(i) When cw1 ∈ WW and cw2 ∈ WW, the similarity of cw1 and cw2 is calculated with WN and with WT, giving s1 and s2, respectively. In this paper, ω1 = ω2 = 0.5.
(ii) When cw1 ∈ WN and cw2 ∈ WN, or cw1 ∈ WT and cw2 ∈ WT, the similarity of cw1 and cw2 is calculated as s1 with WN or as s2 with WT, and the corresponding weight is 1 while the other is 0.
(iii) When cw1 ∈ WN and cw2 ∈ WT, the synonym set of cw2 is looked up in WT, the similarity of each synonym to cw1 is calculated with WN, and the maximum value is denoted as s1. If cw2 has no synonym in WT, then s1 = 0.2, ω1 = 1, and ω2 = 0.
(iv) When cw1 ∈ WN and cw2 ∈ WW, the similarity of cw1 and cw2 is calculated with WN and denoted as s1. Then, the synonym set of cw2 is looked up in WT, the similarity of each synonym to cw1 is calculated with WN, and the maximum value is denoted as s2. If cw2 has no synonym in WT, then s2 = s1, and ω1 > ω2. In this paper, ω1 = 0.6 and ω2 = 0.4.
(v) When cw1 ∈ WT and cw2 ∈ WW, the similarity of cw1 and cw2 is calculated with WT and denoted as s2. Then, the synonym set of cw1 is looked up in WT, the similarity of each synonym to cw2 is calculated with WN, and the maximum value is denoted as s1. If cw1 has no synonym in WT, then s1 = s2, and ω2 > ω1. In this paper, ω1 = 0.4 and ω2 = 0.6.
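The value strategy for ω1 and ω2 can be written as a small lookup, followed by the weighted combination ω1·s1 + ω2·s2. In this sketch, each word's membership is given as a set of resources ({"WN"}, {"WT"}, or {"WN", "WT"} for WW). Three things are assumptions: the mirrored cases that the list does not spell out (for example, cw1 ∈ WW and cw2 ∈ WN) are handled by symmetry; the 0.6/0.4 and 0.4/0.6 weights of cases (iv) and (v) are applied to the whole case; and the function names are hypothetical. The synonym lookups that produce s1 and s2 are outside its scope.

```python
def choose_weights(res1, res2):
    """Return (omega1, omega2) following cases (i)-(v) of the value strategy.

    res1 / res2: set of resources containing cw1 / cw2, each a non-empty
    subset of {"WN", "WT"}; {"WN", "WT"} means the word is in WW.
    Mirrored cases not listed in the text are handled by symmetry (assumption).
    """
    both = {"WN", "WT"}
    if res1 == both and res2 == both:          # (i) both words in WW
        return 0.5, 0.5
    if res1 == res2:                           # (ii) same single resource
        return (1.0, 0.0) if res1 == {"WN"} else (0.0, 1.0)
    if both not in (res1, res2):               # (iii) one WN-only, one WT-only
        return 1.0, 0.0
    single = res1 if res2 == both else res2    # the word that is not in WW
    return (0.6, 0.4) if single == {"WN"} else (0.4, 0.6)  # (iv) / (v)


def combined_similarity(s1, s2, weights):
    """Weighted sum omega1 * s1 + omega2 * s2, with omega1 + omega2 = 1."""
    w1, w2 = weights
    return w1 * s1 + w2 * s2
```

For instance, a WN-only word compared with a WW word gets weights (0.6, 0.4), so partial similarities 0.8 and 0.6 combine to 0.72.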

Here, the calculation of word similarity based on WordNet is defined as follows:

In equation (4), s1(WW1, WW2) is the similarity calculated from the set of independent minimal semantic units, s2(WW1, WW2) is the similarity of the feature structure of the related minimal semantic units, and s3(WW1, WW2) is the similarity of the feature structure of the relational signs. The parameters δm (1 ≤ m ≤ 3) are adjustable and satisfy δ1 + δ2 + δ3 = 1. In this paper, δ1, δ2, and δ3 are set to 0.6, 0.25, and 0.15, respectively. Equation (5) obtains the word similarity from the sense similarities: when a word has multiple senses, it takes the maximum similarity over all combinations of senses as the similarity of the two words, where i is the sense index of cw1 and j is the sense index of cw2. The calculation of word similarity based on WT is defined as follows:

where dt(WW1, WW2) is the distance function between the word codes WW1 and WW2 in the tree structure, j is the total number of nodes in the branch layer, that is, the number of direct child nodes of the nearest common parent node of the two words, and di is the distance between the branches in which the two words are located under their nearest common parent node.
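The WordNet-based similarity of equations (4) and (5), as described above, is a δ-weighted sum of three partial similarities for each pair of senses, followed by a maximum over all sense combinations (i, j). The sketch below follows that description with δ = (0.6, 0.25, 0.15). It assumes that the partial similarities s1, s2, and s3 for a pair of senses are supplied by a caller-provided function (hypothetical here), since their computation is not detailed in this section.

```python
from itertools import product

DELTAS = (0.6, 0.25, 0.15)  # delta1, delta2, delta3 as set in the text


def sense_similarity(partials, deltas=DELTAS):
    """Weighted sum delta1*s1 + delta2*s2 + delta3*s3 for one pair of senses."""
    return sum(d * s for d, s in zip(deltas, partials))


def word_similarity(senses1, senses2, partial_fn, deltas=DELTAS):
    """Maximum sense similarity over all (i, j) sense combinations of two words.

    partial_fn(sense_a, sense_b) must return the tuple (s1, s2, s3); both sense
    lists must be non-empty. How the partial similarities are computed is
    outside this sketch.
    """
    return max(
        sense_similarity(partial_fn(a, b), deltas)
        for a, b in product(senses1, senses2)
    )
```

With two senses per word, only the best-matching sense pair determines the word similarity, which is the intent of taking the maximum in equation (5).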

Step 3. The correlation of association rules in rough data-deduction is defined as follows:

where cw1 and cw2 are two candidate keywords in the text, p(cw1, cw2) is the probability that cw1 and cw2 appear in the same sentence, and p(cw1) and p(cw2) are the probabilities of occurrence of cw1 and cw2, respectively. The correlation determines which candidate keywords are directly correlated: when PMI(cw1, cw2) ≠ 0, there is a direct correlation between cw1 and cw2, and cw1, cw2, and their correlation degree are stored in the correlation set. Meanwhile, the rough data-deduction relation D can be established from the correlation degrees. Then, by applying the rules of rough data-deduction, I obtain the correlations between the remaining candidate keywords across the different equivalence classes, and these words and their correlation degrees are also stored in the correlation set.
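Step 3 can be sketched in two parts: sentence-level PMI for the direct correlations, and a two-step chaining rule that mirrors the example in which cw1 ⇒E cw6 and cw6 ⇒E cw7 give cw1 ⇒E cw7. The sketch assumes the standard definition PMI(cw1, cw2) = log[p(cw1, cw2) / (p(cw1) p(cw2))], with each probability estimated as the fraction of sentences containing the word(s) and the natural logarithm. The weight given to a deduced pair (a hypothetical `decay` factor times the product of the two links) is purely illustrative, because this section does not specify how the contribution rate is computed; the equivalence-class handling is also omitted.

```python
import math
from collections import Counter
from itertools import combinations


def sentence_pmi(sentences):
    """Direct correlations: PMI(a, b) = log(p(a, b) / (p(a) * p(b))).

    Each probability is the fraction of sentences containing the word(s).
    Only pairs that share a sentence are scored (others would give log 0),
    and pairs with PMI exactly 0 are dropped, following the "PMI != 0" rule.
    """
    n = len(sentences)
    word_counts, pair_counts = Counter(), Counter()
    for sentence in sentences:
        words = sorted(set(sentence))
        word_counts.update(words)
        pair_counts.update(combinations(words, 2))
    correlation = {}
    for (a, b), c in pair_counts.items():
        pmi = math.log((c / n) / ((word_counts[a] / n) * (word_counts[b] / n)))
        if pmi != 0:
            correlation[(a, b)] = pmi
    return correlation


def deduce_indirect(direct, decay=0.5):
    """Deduce p-r from direct links p-q and q-r, as in cw1 => cw6 => cw7.

    Only positive direct weights are chained. The deduced weight
    decay * w(p, q) * w(q, r) is a hypothetical illustration.
    """
    adj = {}
    for (a, b), w in direct.items():
        if w > 0:
            adj.setdefault(a, {})[b] = w
            adj.setdefault(b, {})[a] = w
    deduced = {}
    for q, nbrs in adj.items():
        for p, r in combinations(sorted(nbrs), 2):
            if r in adj[p]:
                continue  # p and r are already directly correlated
            weight = decay * nbrs[p] * nbrs[r]
            deduced[(p, r)] = max(deduced.get((p, r), 0.0), weight)
    return deduced
```

For example, `deduce_indirect({("cw1", "cw6"): 1.0, ("cw6", "cw7"): 0.8})` returns `{("cw1", "cw7"): 0.4}`, a correlation that no co-occurrence window over cw1 and cw7 alone would have produced.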

Step 4. According to the correlation set obtained in Step 3, a weighted candidate keyword graph is constructed.

Then, using the TextRank update equation, the weight of each candidate keyword is calculated iteratively until convergence.
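Step 4 can be sketched as a weighted TextRank iteration over the correlation set, using the weighted update WS(v) = (1 − d) + d · Σ_u [w(u, v) / Σ_k w(u, k)] · WS(u). The sketch assumes an undirected graph given as a dictionary of positive pair weights (for example, positive PMI values plus deduced weights), d = 0.85, and a fixed tolerance; these choices and the function name are assumptions, not parameters reported in this paper.

```python
def weighted_textrank(edge_weights, d=0.85, max_iter=100, tol=1e-6):
    """Weighted TextRank on an undirected graph given as {(a, b): weight}.

    score(v) = (1 - d) + d * sum over neighbours u of
               w(u, v) / (total edge weight of u) * score(u).
    All weights are assumed to be positive.
    """
    neighbors = {}
    for (a, b), w in edge_weights.items():
        neighbors.setdefault(a, {})[b] = w
        neighbors.setdefault(b, {})[a] = w
    out_weight = {v: sum(nbrs.values()) for v, nbrs in neighbors.items()}
    scores = {v: 1.0 for v in neighbors}
    for _ in range(max_iter):
        # Each neighbour u passes on its score in proportion to the edge weight.
        new_scores = {
            v: (1 - d) + d * sum(w / out_weight[u] * scores[u] for u, w in nbrs.items())
            for v, nbrs in neighbors.items()
        }
        delta = max((abs(new_scores[v] - scores[v]) for v in scores), default=0.0)
        scores = new_scores
        if delta < tol:
            break
    return scores
```

The final keywords are then the top-N nodes by score; for a star graph centred on "stress", the hub receives the highest score.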

The experiment uses 26,300 questionnaires on the mental health management of undergraduates from Xinxiang University, covering 23 schools and 60 majors. The questionnaires cover psychological distress, depression, suicidal tendency, and a self-evaluation related to mental health of 300 to 1000 words. In particular, the respondents are uniformly distributed across grades. After excluding questionnaires whose self-evaluations are shorter than 300 words, 19,000 valid questionnaires are used to test the effect of the proposed method. The questionnaires use silver ink with a metal oxide [26]. Ten keywords are extracted from each questionnaire and ranked by importance. In this paper, ω1 and ω2 are set to 0.5.

In addition, for comparison, TextRank, keyword extraction using supervised cumulative TextRank (KESCT) [27], scientific research project TF-IDF (SRP-TF-IDF) [28], and high representation tags LDA (HRT-LDA) [29] are selected as baselines. Three evaluation indexes commonly used in classification are used to evaluate the quality of the experimental results: precision (P), recall (R), and F1 (F). P is the proportion of extracted keywords that are correct, R is the proportion of the correct keywords that are extracted, and F is a comprehensive evaluation index defined as the harmonic mean of P and R.
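The three indexes can be computed per document as in the sketch below, treating the extracted keywords and the reference (gold) keywords as sets. How the reference keywords were obtained is not described in this section, so the gold list in the example is hypothetical.

```python
def precision_recall_f1(extracted, gold):
    """Precision, recall, and F1 of extracted keywords against gold keywords."""
    extracted, gold = set(extracted), set(gold)
    hits = len(extracted & gold)
    p = hits / len(extracted) if extracted else 0.0
    r = hits / len(gold) if gold else 0.0
    # F1 is the harmonic mean of precision and recall.
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f
```

For extracted keywords ["anxiety", "exam", "sleep", "family"] and gold keywords ["anxiety", "sleep", "stress"], two keywords match, giving P = 0.5, R ≈ 0.667, and F = 4/7 ≈ 0.571.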

Results. The experiment shows that two parameters affect the keyword extraction results of the TextRank algorithm: the co-occurrence window size and the number of keywords. By contrast, the TF-IDF algorithm, which is based on statistical features, and the algorithm proposed in this paper are not affected by the co-occurrence window size. I set the number of extracted keywords to 10 and vary the co-occurrence window size within [4, 10]. The F1 under different co-occurrence window sizes is shown in Figure 1.

As Figure 1 shows, the extraction performance of the TextRank algorithm varies with the co-occurrence window size. On the same test set, the original TextRank algorithm achieves its best extraction performance, with the highest F value, when the co-occurrence window size is 5. Therefore, to ensure the effectiveness of the proposed algorithm, the co-occurrence window size is set to 5. With the window size fixed at 5, P, R, and F are calculated for numbers of keywords within [3, 10]. The results are shown in Table 1.

To compare the experimental results of the five algorithms conveniently, their P, R, and F values are plotted in Figures 2-4. Figure 2 shows how the precision of the five algorithms varies when extracting different numbers of mental health keywords. As Figure 2 shows, the precision of each algorithm decreases as the number of extracted mental health keywords increases, but the precision of the proposed algorithm is always higher than that of the other four baselines.

The TextRank algorithm based on rough data-deduction proposed in this paper integrates upper approximation information into the data-deduction process, so that deduction between data exhibits approximate entailment or imprecise association, and the potential associations between candidate keywords can be mined. Adding these potential associations to the iterative calculation of each candidate keyword's weight yields more accurate extraction results. Therefore, the proposed algorithm is expected to be more precise in theory, and its P value is indeed higher than those of the other four baselines.

Figure 3 shows the recall of the five algorithms when extracting different numbers of mental health keywords. In Figure 3, the recall of the proposed algorithm is higher than that of the other four baselines, and recall increases with the number of extracted mental health keywords. The SRP-TF-IDF algorithm relies too heavily on word frequency and does not use the correlation between words at all. The KESCT algorithm adopts the co-occurrence window principle; although it considers the relations between words, its limitations make it favor frequent words, so it may ignore important low-frequency words that describe the topics. In contrast, the rough data-deduction used in this paper expands the correlation range and improves the coverage of correctly correlated keywords, which raises the recall of the algorithm. Because the influence of word frequency decreases as the number of keywords increases, the advantage of the proposed algorithm becomes more obvious.

Figure 4 shows the F values of the five algorithms when extracting different numbers of mental health keywords. When evaluating the experimental results, both P and R should be as high as possible. However, in most cases the two values conflict, so the F value, which combines them, is used to reflect the effectiveness of the whole algorithm. Keyword extraction based on rough data-deduction can in theory mine the potential associations between candidate keywords, which enlarges the set of candidate words and the range of association. Because these potential associations are added to the iterative calculation of each candidate keyword's weight, the extraction results are more accurate, and the algorithm is more effective. According to the experimental results, the proposed algorithm has higher P and R than the other four baselines and therefore a higher F, which indicates its effectiveness. In conclusion, the precision, recall, and F1 of the proposed algorithm are higher than those of the four baselines, which indicates that the improved TextRank algorithm based on upper approximation rough data-deduction is more effective for mental health keyword extraction.

The test set of this paper consists of the self-evaluations in the undergraduates' mental health questionnaires. Their length is generally less than 500 words and is mainly concentrated in the range of 300-500 words. This paper divides the test set by the number of words in the self-evaluation, and 30 texts are randomly selected from each length group to compare the running time and physical memory occupation of the five algorithms.

As Figure 5 shows, when a text contains 300-400 words, few deductions and semantic calculations are needed, and the running time of the algorithm is short. The number of deductions and semantic calculations increases with the number of words. The running time of the proposed method remains shorter than that of the other three baselines and is close to that of TextRank.

As Figure 6 shows, the proposed method occupies little physical memory and is efficient. When a text contains 600-800 or 800-1000 words, SRP-TF-IDF and KESCT occupy nearly the same amount of physical memory.

This paper manages undergraduates' mental health through keyword extraction. The results show that academic problems, emotional problems, interpersonal problems, anxiety problems, sexual problems, and adaptation to college life are universal mental health issues among undergraduates. Dealing with mental crises is currently an urgent problem that colleges cannot avoid. The bounded area elimination algorithm proposed in [30] analyzes feature extraction, and its idea of feature extraction is similar to the TextRank keyword extraction algorithm proposed in this paper. A timely mental health crisis plan provides support and help to those who have experienced a personal crisis so that they can restore their mental balance and regain full confidence in life.

In a fast-paced society, undergraduates face growing competition and the dual pressure of enrollment and employment, so mental health is very important for them. Mental health is a necessary condition and foundation for everyone's all-round development in today's society, and it is also a necessary psychological quality for undergraduates. This paper introduces upper approximation-based rough data-deduction into the TextRank keyword extraction algorithm, and the extracted keywords are used in undergraduates' mental health management and plan. The comparison experiments show that the proposed algorithm outperforms four baselines in terms of precision, recall, F1, running time, and physical memory occupation.

Future work is as follows. (i) The deduction rules of rough data will be further refined and improved to obtain better extraction results. (ii) Words related to mental health may be missing from WordNet and Wikitext, which leads to unsatisfactory keyword extraction. Future research will consider using a mental health corpus for keyword extraction.

All data used to support the findings of the study are included within the article.

The author declares no conflicts of interest.

International experiences of the active period of COVID-19 -mental health care

Impact on mental health care and on mental health service users of the COVID-19 pandemic: a mixed methods survey of UK mental health care staff

Mental health problems and suicide risk: the impact of acute suicidal affective disturbance

Anxiety, depression, and health anxiety in undergraduate students living in initial US outbreak "hotspot" during COVID-19 pandemic

The mental impact of digital divide due to COVID-19 pandemic induced emergency online learning at undergraduate level: evidence from undergraduate students from Dhaka City

Keyword extraction: issues and methods

Yake! keyword extraction from single documents using multiple local features

sCAKE: semantic connectivity aware keyword extraction

Automatic keywords extraction based on co-occurrence and semantic relationships between words

Zone-based keyword spotting in Bangla and Devanagari documents

Undergraduate students survey their peers on mental health: perspectives and strategies for improving college counseling center outreach

The relationship between on-campus service utilization and common mental health concerns in undergraduate college students

Academic demands and mental health among undergraduate students in full-time employment: the moderating role of perceived social support

The impacts of air pollution on mental health: evidence from the Chinese university students

Automatic keyword extraction using TextRank

An empirical study of TextRank for keyword extraction

News keyword extraction algorithm based on semantic clustering and word graph model

Sentence-ranking-enhanced keywords extraction from Chinese patents

A new network model for extracting text keywords

Geoscience keyphrase extraction algorithm using enhanced word embedding

Effect of term weighting on keyword extraction in hierarchical category structure

Improved rapid automatic keyword extraction for voice-based mechanical arm control

Understanding on-site inspection of construction projects based on keyword extraction and topic modeling

FuzzyFeatureRank. Bringing order into fuzzy classifiers through fuzzy expressions

Enhancing learning efficiency of brain storm optimization via orthogonal learning design

Synthesis of a nano-silver metal ink for use in thick conductive film fabrication applied on a semiconductor package

Keyword extraction using supervised cumulative TextRank

Keyword extraction from scientific research projects based on SRP-TF-IDF

A novel tagging augmented LDA model for clustering

An artifacts removal post-processing for epiphyseal region-of-interest (EROI) localization in automated bone age assessment (BAA)