key: cord-0855185-91x1mh5w
authors: Yang, Guangze; Ouyang, Yong; Ye, Zhiwei; Gao, Rong; Zeng, Yawen
title: Social-path embedding-based transformer for graduation development prediction
date: 2022-03-03
journal: Appl Intell (Dordr)
DOI: 10.1007/s10489-022-03268-y
sha: 8e0e7a0d6974a68be31a01e0233db0f30df96a19
doc_id: 855185
cord_uid: 91x1mh5w

As the education of students attracts more and more attention, the task of graduation development prediction has gradually become a hot topic in academia and industry. The task of graduation development prediction aims to predict the employment category of students in advance via academic achievement data, which can help administrators understand students' learning status and set up a reasonable learning plan. However, existing research ignores the potential impact of social relationships on students' graduation development choices. To fully explore social relationships among students, we propose a Social-path Embedding-based Transformer Neural Network (SPE-TNN) for the task of graduation development prediction in this paper. Specifically, SPE-TNN is divided into the Social-path selection layer, the Social-path embedding layer, the Transformer layer, and the Multi-layer projection layer. Firstly, the Social-path selection layer is designed to find social relationships that impact graduation development, and the Social-path embedding layer embeds them into the student's performance features. Secondly, the Transformer layer is adopted to balance the weights of the students' features. Finally, the Multi-layer projection layer is used to achieve the student graduation development prediction. Experimental results on real-world datasets show that SPE-TNN outperforms existing popular approaches.

Since the beginning of the 21st century, with the continuous expansion of college and university enrollment, the employment rate of graduates has attracted growing attention. Recently, owing to the spread of COVID-19, college graduates have faced more severe challenges than ever before [1]. Choosing the right direction after graduation, whether entering employment directly or pursuing further study, shapes university students' future career orientation and is critical to their sustainable development [2, 3]. Only when students have a stable and clear goal can their career orientation gradually take shape during their university years and subsequent career exploration. Career guidance is therefore an indispensable part of university education, helping students better plan their life goals after graduation [4].

Providing graduates with advance guidance on their career choice and development after graduation is therefore crucial [5]. Although many colleges and universities have begun to offer career guidance, these efforts are often perfunctory and of little benefit to students [6]. In existing universities, job matching is based on psychological assessment questionnaires or graduates' subjective preferences. However, this approach still follows the traditional personality-job fit theory. It collects only students' subjective, point-in-time judgments and ignores the learning and social data that students accumulate over years of university study. By predicting students' graduation development in advance from these data, colleges and universities can provide more targeted employment guidance and better guide students toward a suitable development direction during their school years. They can also personalize and differentiate employment support and help address the difficulty graduates face in finding jobs [7, 8].

Many efforts have been made to predict students' graduation development, mainly with traditional machine learning algorithms and deep learning methods. Traditional machine learning algorithms include the Naive Bayes model [9], the C4.5 algorithm [10], and others. However, with the rise of educational data mining [11] and growing volumes of student data, traditional algorithms show weaknesses such as poor generalization and limited computational capability. Benefiting from the strengths of deep learning [12] [13] [14], neural networks have also been applied to the prediction of graduation development. These include the Back Propagation Neural Network (BP) [15], Generative Adversarial Networks (GAN) [16], Long Short-Term Memory (LSTM) [16], Graph Convolutional Networks (GCN) [8], and the Attention mechanism [8]. However, the work above predicts students' graduation development only from their grades [8] [9] [10] [16], credits [8, 9], or regular performance [10, 15]. It ignores the potential impact of college students' multi-dimensional social relationships on graduation development prediction.

Some scholars have pointed out that in-depth study of human behaviour data, including quantitative analysis of college students' campus behaviour, helps explain some complex social phenomena [17]. Humans are social creatures, and much of what they produce and do is accomplished through interaction with others. Interactions with different purposes tend to form different social circles. People in different circles exchange different information, emotions, or opinions through language, behaviour, and other forms of expression, and they influence one another differently. To better understand how social relationships affect individuals, many studies [18] [19] [20] have analyzed them. Lin et al. [20] noted that everyone has different characteristics in virtual social networks. Each person's relationships are rich and multi-dimensional, and each relationship influences the individual differently. In general, social relationships have the following properties:

1. The social relationships that each person has are so rich and complex that they are difficult to define and understand accurately.
2. Different social relationships are not equal; each has a different degree of impact on a person.

This provides a new perspective for predicting students' graduation development. As members of a small-scale society, college students have various social relationships that form different social circles among them. These circles have particular effects on students' studies, interpersonal relationships, and emotions [21]. Much work has analyzed students through their social relationships. For example, Kim et al. [22] analyzed the relationship between different types of social relationships and student burnout, and they found a negative correlation between the two. Jr [23] studied the influence of social relationships on students' grades by analyzing their social circles. That study found that some relationships improved grades while others had negative effects. Similarly, seniors facing graduation are usually influenced in their graduation development decisions by other students in their multi-dimensional social circles [21, 24]. Because these decisions depend heavily on social relationships, mining students' social relationships is crucial for predicting their graduation development.

To explore the conditions under which student social relationships arise, many scholars have studied the social relationships of college students on real-world datasets. For example, Eagle et al. [25] tracked students' mobile phones to mine their social network. In 2009 they were the first to propose that clustering (gathering together) is an essential condition for students to form social relationships in real life. Gathering refers to people exchanging information, emotions, or opinions in a particular place for some reason, through language, behaviour, and other forms of expression [26]. Crandall et al. [27] further showed that the more often different students gather together within a period, the more the probability of a social relationship between them increases, growing exponentially. Furthermore, Xu et al. [28] found that when two people gather in the same place for a long time because of coercive factors, they are more likely to form a social relationship. We therefore use the following two strategies to fully explore the social relationships between students.

The first is to find students' social relationships shaped by compulsory school arrangements. A university campus is relatively closed, so students live in dormitories for long periods as required by the school, and students in the same class often attend classes and take part in activities together. Students are thus compelled to gather in the same class or dormitory for long periods. This increases their contact opportunities and makes it easier for social circles to form among them.

The second is to find social relationships that students form through independent choices driven by shared individuality. Here we observed three points.

First, elective courses, which students arrange for themselves, can cultivate their individual interests and abilities. When different students choose several of the same elective courses, their shared interests and individuality lead them to form social relationships.

Second, self-study is an indispensable part of university life. Some students share a learning goal, such as applying to study abroad or preparing for the postgraduate entrance examination or a civil service examination. These students often go in and out of the library to study together, which leads to stable social relationships among them.

Third, the college canteen provides students' daily food and beverage service. Students often dine in groups, so students who frequently eat together are also relatively likely to have social relationships.

Based on this categorization, this paper defines the commonality clustering social relationship as the relationship formed by students who are compelled to gather in the same space for a long time through designated classes and dormitories. Conversely, some relationships form among students with similar individualities who choose several of the same elective courses, often study together in the library, or often eat together in the canteen. These are regarded as the individuality clustering social relationship. In addition, to better identify the factors that affect students' graduation development, this paper also takes into account students' scores in compulsory courses in their first two years of university.

However, the current research still has the following two challenges:

1. Each student has multi-dimensional social relationships, but each relationship affects graduation development differently, and not every relationship affects it at all. It is therefore important to find the social relationships that influence each student's graduation development.
2. After these relationships are found, each student still has multiple social relationships with different attributes. It is therefore critical to retain the unique information of each relationship and embed it into the individual student's representation.

To address these challenges, we propose the Social-path Embedding-based Transformer Neural Network (SPE-TNN) to predict students' graduation development, as shown in Fig. 1. Specifically, SPE-TNN is divided into the Social-path selection layer, the Social-path embedding layer, the Transformer layer, and the Multi-layer projection layer. First, the Social-path selection layer finds the social relationships that affect graduation development, and the Social-path embedding layer embeds them into the students' performance features. Second, the Transformer layer balances the weights of the students' features. Finally, the Multi-layer projection layer produces the graduation development prediction. SPE-TNN enables administrators and educators to provide students with more accurate career guidance. It can also help students choose a more appropriate development direction during their school years and reduce their risk of unemployment after graduation.

The main contributions of this work are summarized as follows:

-To the best of our knowledge, this is the first work that introduces commonality clustering and individuality clustering to explore the social relationships among students.
-We contribute an SPE-TNN framework, which is designed to capture social relationships between students. This model finds and embeds the Social-paths that impact the graduation development of students and uses the Transformer to balance feature weights.
-Extensive experiments on the public dataset are performed to validate the effectiveness of our proposed method.

Graduation development prediction forecasts the direction each student will take after graduation, based on previous academic achievement or daily performance. It helps education administrators provide more targeted guidance to graduates. It also helps students understand their learning status in advance and set reasonable learning goals. In recent years, many studies have used traditional machine learning methods to predict students' graduation development. For example, Hartatik et al. [9] used the Naive Bayes algorithm to analyze students' academic performance and thereby predict their graduation development. Putri et al. [10] adopted the C4.5 decision tree method to improve the accuracy of graduation development prediction based on student achievement and everyday performance. However, student features are complex and diverse, and some machine learning algorithms generalize poorly and extract features inadequately. As a result, their prediction performance becomes unsatisfactory as the amount of student data grows.

A'rifian et al. [29] used decision trees, logistic regression, and artificial neural networks to predict graduation development on graduate employment datasets, and artificial neural networks performed best. Compared with traditional machine learning methods, deep learning has stronger feature extraction capabilities and higher learning efficiency in many fields [12]. Researchers have therefore recently focused more on applying deep learning to student graduation development prediction. Chen et al. [15] used a Backpropagation Neural Network (BP) to analyze students' academic achievement and predict their employment. Guo et al. [16] used Generative Adversarial Networks (GAN) and Long Short-Term Memory networks (LSTM) to process student academic achievement data for employment prediction. Ouyang et al. [8] used students' compulsory and elective course data and the corresponding course credits to predict graduation development. Their approach was based on Graph Convolutional Networks (GCN) and Attention and achieved better results.

Fig. 1 (caption, partially recovered): …clustering social relationship, individuality clustering social relationship, and compulsory course scores. The red arrow from SPE-TNN points to the student's most likely destination as predicted by the model.

The Graph Convolutional Network (GCN) [30] is a neural network that operates on graph structure. It can process non-Euclidean data [31] such as social networks, knowledge graphs, and chemical molecular structures. Khayi and Rus [32] used GCN to process students' test answers and improve the accuracy of answer evaluation. In the online learning evaluation model of Karimi et al. [33], GCN was applied to extract and embed features of courses and students, achieving better results. However, GCN assumes that the graph structure of the dataset is given in advance and does not distinguish between different types of nodes and edges. Missing or false connections therefore significantly affect the results. University students belong to different colleges and living environments, so the social circles formed by their different social relationships also affect their graduation development differently. Not every social relationship affects graduation development, so we need to find the Social-paths that genuinely do. Yun et al. [34] proposed Graph Transformer Networks, which use a Graph Transformer Layer (GTLayer) to find Meta-Paths in heterogeneous graphs. These networks learn representations end to end and generate Meta-Paths automatically through soft selection.

In addition, GCN is not suited to multi-relational networks, because it embeds only graph nodes and ignores relation representations. For this reason, Vashishth et al. [35] proposed CompGCN, a model based on Graph Convolutional Networks that addresses two problems. First, training a GCN on a multi-relational graph leads to over-parameterization when there are many relation types. Second, GCN embeds only the nodes and ignores relation vectors in multi-relational graphs. To retain the unique information of the captured Social-paths during graph embedding, we therefore use CompGCN as the Social-path embedding layer of the model. CompGCN learns node and edge representations jointly and introduces edges in three directions: outgoing, incoming, and self-loop. It uses a separate weight matrix for each direction to update node vectors.

In recent years, the Attention mechanism [36] has been successful in many fields of deep learning. It is inspired by the way humans use a limited visual system to screen out high-value information when observing objects. The essence of Attention is to assign weights to different elements and highlight essential features, thereby improving network performance. For example, Zeng et al. [37] used High-Level Attention to measure the importance of different courses and improve academic anomaly prediction. Ouyang et al. [8] adopted a multi-layer Attention network to balance the weights of different academic achievement features and improve the accuracy of graduation development prediction.

The Transformer [38] is a model built entirely on Attention, and this design also speeds up training. It abandons the convolutional and recurrent structures of traditional CNNs and RNNs. More precisely, it consists only of self-Attention and feed-forward neural networks. Attention by itself ignores order, so the Transformer adds positional encodings (Position Embedding) to model the order and dependencies of sequence elements. This structure can be trained in parallel, which improves training speed. Its feature extraction is also more robust, which gives it better performance.

Inspired by the above research, this paper proposes a student graduation prediction model based on students' social relationships and grades in required courses. To better address students' employment problems, we have done the following work:

-Tapping into social relationships among students.

This article brings social relationships into the view of educational researchers. Social relationships are an essential aspect of student life and influence students' study, life, and graduation direction. Beyond graduation prediction, social relationships can therefore also be incorporated into other analyses of students. For example, the potential impact of a student's social relationships on academic performance can be considered in academic alert tasks. Researchers can also examine whether incorporating social relationships improves the accuracy of student performance prediction. Integrating students' social relationships into educational research can help administrators and educators better understand students' status at school and formulate better growth goals and long-term plans for them. It can also help universities use school information resources effectively to improve student management.
-A more accurate graduation prediction model is proposed. When students reach their junior year, administrators and educators input the students' required course data and social relationships from the first two years into the trained SPE-TNN model. The model then returns each student's most likely graduation development destination. Specifically, the SPE-TNN model incorporates students' social relationships into their grade features through CompGCN. It uses the Transformer to highlight the features that significantly impact graduation development.

The model predicts students' graduation direction more accurately than previous work. Based on the predicted development direction, administrators and educators can provide students with more targeted and personalized employment guidance, effectively alleviating the difficulty college students face in finding jobs. The graduation prediction model provides an important basis for school administrators' decisions on targeted services or interventions. It is also an essential path toward the precise employment services advocated by the Ministry of Education.

Social-path selection layer: This layer selects the social relationships that affect students' graduation development. It is inspired by the GTLayer of [34] and uses 1×1 convolution kernels to learn Social-paths in social networks from the given data, as shown in Fig. 2.

The GTLayer first uses the following formula to select two matrices, $Q_1$ and $Q_2$, from the adjacency matrices $A$ of the different edge types, where $Q$ is computed as:

$\phi$ is the convolutional layer and $W$ is its parameter. $Q_l$ can be expressed as $Q_l = \sum_{t_l \in \mathcal{T}^e} \alpha^{(l)}_{t_l} A_{t_l}$, where $\mathcal{T}^e$ is the set of edge types and $\alpha^{(l)}_{t_l}$ is the weight of edge type $t_l$ in the $l$-th layer.

Therefore, the adjacency matrix $A_P$ of a Social-path of length $l$ is composed of $l$ adjacency matrices and can be expressed as:
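For reference, the soft selection of GTN [34] that this layer adapts can be written as follows. This is a reconstruction from [34] and the definitions above, not a verbatim copy of the equations lost from this text. Normalising the 1×1 convolution weights $W_\phi$ with a softmax follows [34], and $W_\phi$ plays the role of $W$ above.

```latex
Q = F(A; W_\phi) = \phi\big(A;\ \mathrm{softmax}(W_\phi)\big),
\qquad
Q_l = \sum_{t_l \in \mathcal{T}^e} \alpha^{(l)}_{t_l} A_{t_l}
\\[4pt]
A_P = Q_1 Q_2 \cdots Q_l
    = \Big(\sum_{t_1 \in \mathcal{T}^e} \alpha^{(1)}_{t_1} A_{t_1}\Big)
      \Big(\sum_{t_2 \in \mathcal{T}^e} \alpha^{(2)}_{t_2} A_{t_2}\Big)
      \cdots
      \Big(\sum_{t_l \in \mathcal{T}^e} \alpha^{(l)}_{t_l} A_{t_l}\Big)
```

The softmax makes the weights $\alpha^{(l)}$ of each layer sum to one, so every $Q_l$ is a convex combination of the edge-type adjacency matrices. The matrix product then chains the $l$ selected edge types into paths of length $l$.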

Social-path embedding layer: We follow the CompGCN strategy in [35] to introduce the Social-path embedding layer, which embeds Social-path into the student's academic achievement.

CompGCN is a graph neural network framework that exploits multiple kinds of relationship information to learn node and relation representations jointly. A multi-relational graph is defined as $G = \{V, R, E, X, Z\}$, where $V$ is the set of nodes, $R$ is the set of relations, and $E$ is the set of edges. An edge $(u, v, r) \in E$ denotes a relation $r \in R$ between vertices $u$ and $v$. $X \in \mathbb{R}^{|V| \times d_0}$ contains the $d_0$-dimensional input feature of each node, and $Z \in \mathbb{R}^{|R| \times d_0}$ contains the initial relation features. CompGCN defines $h_v$ as the updated representation of node $v$, with the update formula:

$x_u$ and $z_r$ are the initial features of node $u$ and relation $r$, respectively. $\mathcal{N}(v)$ is the set of neighbours $(u, r)$ of node $v$ together with their relations. $W_{\lambda(r)} \in \mathbb{R}^{d_1 \times d_0}$ is a relation-type-specific parameter, and

To balance the weights of student features embedded with Social-paths, we apply the Transformer [38] to process them. The Transformer consists of Num identical encoder layers. Each layer has two sublayers, a multi-head attention mechanism and a fully connected feed-forward network. Each sublayer is wrapped with a residual connection and normalization, $\mathrm{LayerNorm}(X + \mathrm{Sublayer}(X))$, where $X$ is the input feature. The Transformer structure is shown in Fig. 3.

The multi-head attention mechanism consists of multiple self-attention heads running in parallel, which together capture relevant information from different representation subspaces. The main advantage of self-Attention is that it computes Attention weights directly between any two features, regardless of the distance between them, and thus learns the internal structure of the features. Self-attention uses three kinds of vectors: Query, Key, and Value. Multi-head attention is calculated as follows:

$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \mathrm{head}_2, \ldots, \mathrm{head}_h)\, W^{0} \qquad (4)$$

$Q$, $K$, and $V$ are obtained by linear transformations of the input vector. $\mathrm{head}_i$ is the output of the $i$-th attention head. The concatenated head outputs are multiplied by the network weight $W^{0}$ to obtain the final output of multi-head attention.

Fully connected feed-forward network: the output of the multi-head attention mechanism is passed through a fully connected network, as follows:

In (5), $W_1$, $W_2$, $b_1$, and $b_2$ are the weights and biases of the feed-forward network. $Z$ is the output of multi-head attention, and $Z'$ is the final output of the feed-forward network.

The Transformer continuously passes the input features through the Multi-head self-attention mechanism and Fully connected feed-forward network to obtain the final Attention-added feature expression.
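The encoder sub-layer structure described above, $\mathrm{LayerNorm}(Z + \mathrm{FFN}(Z))$, can be sketched in plain Python for a single feature vector. This is a minimal illustration, not the authors' implementation. It assumes the standard ReLU feed-forward network of [38], $\mathrm{FFN}(Z) = \max(0, ZW_1 + b_1)W_2 + b_2$. It stores the weights as lists of rows applied to column vectors, and it omits LayerNorm's learned gain and bias.

```python
import math


def layer_norm(v, eps=1e-5):
    """Normalise one feature vector to zero mean and unit variance
    (learned gain and bias omitted in this sketch)."""
    mean = sum(v) / len(v)
    var = sum((x - mean) ** 2 for x in v) / len(v)
    return [(x - mean) / math.sqrt(var + eps) for x in v]


def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def feed_forward(z, W1, b1, W2, b2):
    """Position-wise FFN: max(0, W1 z + b1) followed by W2 (.) + b2.
    Equivalent to z W1 + b1 in row-vector notation with W1 transposed."""
    hidden = [max(0.0, h + b) for h, b in zip(matvec(W1, z), b1)]  # ReLU
    return [o + b for o, b in zip(matvec(W2, hidden), b2)]


def add_and_norm(z, sublayer):
    """Residual connection followed by LayerNorm: LayerNorm(z + sublayer(z))."""
    return layer_norm([a + b for a, b in zip(z, sublayer(z))])
```

An encoder layer would apply `add_and_norm` twice, once around multi-head attention and once around `feed_forward`. Stacking Num such layers gives the encoder.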

To predict students' graduation development from social relationships and capture the influence of those relationships, we construct the relationships of the graph from two aspects. Specifically, some students are compelled to gather in the same space for a long time through designated classes and dormitories; the social relationship they form is regarded as the commonality clustering social relationship. Other students, owing to similar interests, choose several of the same elective courses, often study together in the library, or often eat together in the canteen. The social relationship they form is regarded as the individuality clustering social relationship.

As shown in Fig. 4, the SPE-TNN model uses social relationships and students' achievement to predict graduation development. It mainly contains:
1) Input layer;
2) Social-path selection layer, which automatically finds appropriate Social-paths;
3) Social-path embedding layer, which embeds the Social-paths into students' academic achievement;
4) Transformer layer, which computes the weights of each student's different features and balances the factors that affect graduation development;
5) Multi-layer projection layers, used for feature fusion and prediction.

We select social relationships and students' course grades as the student features for the graduation development prediction task.

Define the course grades of a college student $S_1$ as $S_1 = [T_1, T_2, \ldots, T_j]$, where $T$ denotes the grade of each course and $j$ is the number of courses. The score matrix of all students is $S = [S_1, S_2, \ldots, S_N]^T$, where $N$ is the total number of students. Five datasets are constructed following the rules of the commonality clustering and individuality clustering social relationships:
- the dormitory dataset $C_0$, from student dormitory information;
- the class dataset $C_1$, from class information;
- the elective course dataset $I_0$, from elective course records;
- the library dataset $I_1$, from library visit records on the campus card;
- the catering dataset $I_2$, from catering consumption records on the campus card.

The social relationship adjacency matrix $A = [A_0, A_1, A_2, A_3, A_4]$ is constructed from these five datasets, as shown in Fig. 5.
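The text does not spell out how each dataset becomes an adjacency matrix. One plausible rule is sketched below in plain Python as an assumption. It links two students when they share a dormitory or class ($C_0$, $C_1$). It also links them when they co-occur in at least a chosen number of events, such as elective courses, library visits, or canteen transactions ($I_0$, $I_1$, $I_2$). The function names, the event representation, and the `min_count` threshold are all hypothetical.

```python
from itertools import combinations


def group_adjacency(groups):
    """Link students that share a group id (e.g. the same dormitory or class).

    groups[i] is the group id of student i. Returns a symmetric 0/1 matrix
    with an empty diagonal.
    """
    n = len(groups)
    adj = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        if groups[i] == groups[j]:
            adj[i][j] = adj[j][i] = 1
    return adj


def co_occurrence_adjacency(events, n, min_count):
    """Link students that co-occur in at least `min_count` events.

    Each event is a set of student indices, e.g. students enrolled in the same
    elective course, or students whose campus-card records fall at the same
    library or canteen within one (assumed) time window.
    """
    counts = {}
    for event in events:
        for i, j in combinations(sorted(event), 2):
            counts[(i, j)] = counts.get((i, j), 0) + 1
    adj = [[0] * n for _ in range(n)]
    for (i, j), c in counts.items():
        if c >= min_count:
            adj[i][j] = adj[j][i] = 1
    return adj
```

Under this reading, $A_0$ and $A_1$ would come from `group_adjacency`, and $A_2$ to $A_4$ from `co_occurrence_adjacency`. The threshold would be tuned per dataset.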

In addition, following the categorization scheme of Ouyang et al. [8], each student's graduation development direction is assigned to one of six categories:
- continuing studies after graduation;
- studying abroad;
- normal employment under a contract with a company;
- admission to the civil service;
- starting a business;
- not having found a job after graduation.

We abbreviate these directions as postgraduate, abroad, employed, civil servant, pioneer, and unemployed, recorded as target = {a, b, c, d, e, f}.

Input: the student achievement matrix $S$ and the social relationship adjacency matrix $A$.
Output: the predicted development after graduation, $\mathrm{Score} = \arg\max f(s \mid S, A)$, where $s$ is the probability score of each employment category. The category with the highest probability is the post-graduation direction the student is most likely to choose.
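As a hedged illustration of the output rule, the sketch below maps a vector of six class scores to probabilities with a softmax and returns the arg-max category. The softmax output layer and the `predict_destination` helper are assumptions for illustration. The paper's Multi-layer projection layer may differ in detail.

```python
import math

# Labels a..f from the text, in order.
CATEGORIES = ["postgraduate", "abroad", "employed",
              "civil servant", "pioneer", "unemployed"]


def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_destination(logits):
    """Return the most probable category and the full probability vector s."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)  # argmax
    return CATEGORIES[best], probs
```

For example, `predict_destination([0.1, 2.0, 0.3, -1.0, 0.0, 0.5])` returns `"abroad"` as the label, because the second score is the largest.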

Each student's social relationships are rich and complex. To better predict students' graduation development through them, we follow [34] and apply the Social-path selection layer to find Social-paths in the social relationship graph. Its core idea is to use a multi-channel 1×1 convolutional layer to assign weights to the social relationship edge types. Edges with high weights are used as Social-paths. The specific steps are shown in Fig. 6.

and $A_I = [A_3, A_4, A_5]$ according to commonality and individuality, and send them into the Social-path selection layer. The layer uses (6) and (7) to find the Social-paths of the commonality and individuality clustering social relationships and obtain the corresponding adjacency matrices. We set the number of channels of the 1×1 convolutional layer to 3, as shown below:

$T_C$ is the set of edge types of the commonality clustering social relationship, and $T_I$ is the set of edge types of the individuality clustering social relationship. $\alpha^{(i)}_{t^C_i}$ and $\alpha^{(i)}_{t^I_i}$ are the weights of edge types $t^C_i$ and $t^I_i$ in the $i$-th Social-path selection layer, respectively. $A^{P_C}$ and $A^{P_I}$ are the adjacency matrices of the Social-paths of the commonality and individuality clustering social relationships, respectively.

Then $A^{P_C}$ and $A^{P_I}$ are concatenated according to (8) to obtain the final Social-path output $A^{P_{Re}} = \{A^{P_{Re}}_1, A^{P_{Re}}_2, \ldots, A^{P_{Re}}_{2ch}\}$.
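The selection and concatenation steps (6) to (8) can be illustrated with a small plain-Python sketch of GTN-style soft selection [34], under the following assumptions. Each channel holds two vectors of logits, standing in for the 1×1 convolution weights. A softmax turns each vector into edge-type weights $\alpha$, the weighted sums give $Q_1$ and $Q_2$, and their product is a length-2 Social-path. The helper names and the fixed path length of 2 are illustrative choices, not the authors' code.

```python
import math


def softmax(ws):
    """Turn one channel's logits into edge-type weights alpha that sum to 1."""
    m = max(ws)
    e = [math.exp(w - m) for w in ws]
    s = sum(e)
    return [x / s for x in e]


def weighted_sum(mats, weights):
    """Q = sum_t alpha_t * A_t, a soft selection over edge-type adjacency matrices."""
    n = len(mats[0])
    return [[sum(a * m[i][j] for a, m in zip(weights, mats)) for j in range(n)]
            for i in range(n)]


def matmul(A, B):
    """Plain matrix product of A (n x k) and B (k x p)."""
    n, k, p = len(A), len(B), len(B[0])
    return [[sum(A[i][t] * B[t][j] for t in range(k)) for j in range(p)]
            for i in range(n)]


def social_paths(mats, W1, W2):
    """One length-2 Social-path per channel: A_P = Q1 @ Q2.

    W1[c] and W2[c] are the hypothetical 1x1-convolution logits of channel c,
    one logit per edge type in `mats`.
    """
    out = []
    for w1, w2 in zip(W1, W2):
        q1 = weighted_sum(mats, softmax(w1))
        q2 = weighted_sum(mats, softmax(w2))
        out.append(matmul(q1, q2))
    return out
```

Concatenation (8) is list concatenation here. For example, `social_paths(A_C, W1_c, W2_c) + social_paths(A_I, W1_i, W2_i)` with three channels per group yields the six matrices of $A^{P_{Re}}$.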

After obtaining the Social-paths of students' social relationships from the Social-path selection layer, this study uses CompGCN [35] to embed them into students' academic achievement. CompGCN performs representation learning on Social-paths and student nodes simultaneously and thus better learns students' social relationship vectors. Suppose the Social-path adjacency matrices $A^{P_{Re}} = \{A^{P_{Re}}_1, A^{P_{Re}}_2, \ldots, A^{P_{Re}}_{2ch}\}$ and the students' academic achievement features $S = [S_1, S_2, \ldots, S_N]^T$ constitute the graph $G_S = \{V_S, R_S, E_S, X_S, Z_S\}$. Here $V_S$ is the set of student nodes, $R_S$ is the set of student relations, and $E_S$ is the set of edges, where $(u_s, v_s, r_s)$ denotes the relation $r_s \in R_S$ between student vertices $u_s$ and $v_s$. $X_S \in \mathbb{R}^{|V_S| \times j}$ contains the $j$-dimensional input feature of each student node, and $Z_S \in \mathbb{R}^{|R_S| \times j}$ contains the initial relation features. The edge set $E_S$ is then extended with inverse edges and self-loops by (9):

$$E_S = E_S \cup \{(v_s, u_s, r_s^{-1}) \mid (u_s, v_s, r_s) \in E_S\} \cup \{(u_s, u_s, \top) \mid u_s \in V_S\} \qquad (9)$$

The relation set $R_S$ is extended correspondingly with the inverse relations and the self-loop relation $\top$.

The update of student node $v_s$ in the first Social-path embedding layer is defined as $h_{v_s}$:

Here $x_{u_s}$ and $z_{r_s}$ are the initial features of student node $u_s$ and social relationship $r_s$, respectively; $(u_s, r_s) \in \mathrm{Num}(v_s)$ ranges over the neighbours of node $v_s$ and the relations connecting them; $W_{\lambda(r_s)} \in \mathbb{R}^{d_1 \times j}$ is a relation-type-specific parameter; and $\eta$ is the composition operator that combines a node feature with a relation feature. In (11), $\mu_{b r_s} \in \mathbb{R}$ is the learnable weight coefficient of relation $r_s$ on basis vector $v^{s}_{b}$. In addition, edges in different directions are assigned different weights, that is, $\lambda(r_s) = \mathrm{dir}(r_s)$, which is expressed as:

After the node embedding is updated, the relationship embedding is also transformed as follows:

Here $W_{\mathrm{rel}} \in \mathbb{R}^{d_1 \times j}$ is a learnable transformation matrix that projects each relation into the same space as the nodes so that it can be used in the next Social-path embedding layer.

The Social-path embedding layer stacks $k$ CompGCN layers: $H^{k+1}_{v_s}$ is the representation of student node $v_s$ after the $k$-th layer, and $H^{k+1}_{r_s}$ is the representation of relation $r_s$ after the $k$-th layer. They are computed by the following (14) and (15).
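A single CompGCN-style layer can be sketched in NumPy as below. The sketch assumes subtraction as the composition operator $\eta$ (one of the operators proposed for CompGCN), separate weights for original, inverse, and self-loop edges to realize $\lambda(r_s) = \mathrm{dir}(r_s)$, mean aggregation, and tanh as the nonlinearity; the edge list, sizes, and function names are hypothetical. It uses the row-vector convention, so weight shapes are transposed relative to the $d_1 \times j$ shapes in the text.

```python
import numpy as np

def expand_edges(edges, num_nodes, num_rels):
    """Add inverse edges and self-loops, as described for (9).

    edges: list of (u, v, r) triples with r in [0, num_rels).
    Inverse relations get ids r + num_rels; the self-loop relation gets 2 * num_rels.
    Returns (u, v, rel_id, direction) with direction in {"O", "I", "S"}.
    """
    out = [(u, v, r, "O") for u, v, r in edges]
    out += [(v, u, r + num_rels, "I") for u, v, r in edges]
    out += [(u, u, 2 * num_rels, "S") for u in range(num_nodes)]
    return out

def compgcn_layer(x, z, triples, W, W_rel):
    """One CompGCN-style layer.

    x: (num_nodes, j) node features; z: (num_rel_ids, j) relation features
    W: dict of direction-specific weights {"O", "I", "S"} -> (j, d1)
    W_rel: (j, d1) relation projection
    """
    n, d1 = x.shape[0], W["O"].shape[1]
    msg = np.zeros((n, d1))
    deg = np.zeros(n)
    for u, v, r, direction in triples:
        # Message u -> v: W_dir(r) applied to eta(x_u, z_r) = x_u - z_r (subtraction).
        msg[v] += (x[u] - z[r]) @ W[direction]
        deg[v] += 1  # every node has a self-loop, so deg >= 1
    h_nodes = np.tanh(msg / deg[:, None])  # mean aggregation, f = tanh
    h_rels = z @ W_rel                     # relation update h_r = W_rel z_r
    return h_nodes, h_rels

rng = np.random.default_rng(0)
num_nodes, num_rels, j, d1 = 5, 2, 4, 3
edges = [(0, 1, 0), (1, 2, 1), (3, 4, 0)]  # hypothetical student relations
triples = expand_edges(edges, num_nodes, num_rels)
x = rng.normal(size=(num_nodes, j))
z = rng.normal(size=(2 * num_rels + 1, j))
W = {k: rng.normal(size=(j, d1)) for k in "OIS"}
h_nodes, h_rels = compgcn_layer(x, z, triples, W, rng.normal(size=(j, d1)))
print(h_nodes.shape, h_rels.shape)  # (5, 3) (5, 3)
```

Stacking $k$ layers amounts to feeding `h_nodes` and `h_rels` back in as `x` and `z` with weights of shape `(d1, d1')`.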

This paper uses the Transformer [38] to calculate the weight of each student's different features and to balance the feature factors that affect students' graduation development.

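The procedure is shown in Fig. 7. The input of the Transformer layer is $S = [S_1, S_2, \ldots, S_N]$, where $N$ is the number of students. Multiplying $S$ by three different weight matrices $W^S_Q$, $W^S_K$, and $W^S_V$ yields $Q_S$, $K_S$, and $V_S$, which serve as the queries, keys, and values of the Multi-head Attention mechanism. Their dot product is scaled by $\sqrt{d_k}$ and normalized with the softmax function as in (16). Scaled dot-product attention is executed $h$ times to obtain $h$ heads, which are concatenated and multiplied by the network weight $W^S_0$ to give the final Multi-head Attention output $Z^S_0$, as in (16)-(18):

$$\mathrm{Attention}(Q_S, K_S, V_S) = \mathrm{softmax}\left(\frac{Q_S K_S^{\top}}{\sqrt{d_k}}\right) V_S \quad (16)$$

$$\mathrm{head}^S_i = \mathrm{Attention}\left(Q_S W^{Q}_{i}, K_S W^{K}_{i}, V_S W^{V}_{i}\right) \quad (17)$$

$$Z^S_0 = \mathrm{Concat}\left(\mathrm{head}^S_1, \ldots, \mathrm{head}^S_h\right) W^S_0 \quad (18)$$

where $W^{Q}_{i}$, $W^{K}_{i}$, and $W^{V}_{i}$ are the network weights of the $i$-th scaled dot-product attention head, $i = 1, \ldots, h$. The result is then passed to the fully connected feed-forward network, which applies the transformation in (19):

$$\mathrm{FFN}(Z^S_0) = \max\left(0, Z^S_0 W^S_1 + b^S_1\right) W^S_2 + b^S_2 \quad (19)$$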

Here $W^S_1$ and $W^S_2$ are the weights of the feed-forward network, and $b^S_1$ and $b^S_2$ are its biases. $Z^S_0$ is the output of the first encoder and is passed to the next encoder of the Transformer network. Stacking multiple encoders yields the final output $Z_S$. Setting $S = Z_S$ gives the attention-weighted student feature representation $S$.
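The attention and feed-forward computations of one encoder can be sketched in NumPy as follows. This is a minimal illustration of standard multi-head scaled dot-product attention with a ReLU feed-forward network, as in the original Transformer; it omits residual connections and layer normalization, treats each student as one token as the text describes, and uses hypothetical sizes.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(S, Wq, Wk, Wv, W0, h):
    """S: (N, d_model); Wq/Wk/Wv/W0: (d_model, d_model); d_model divisible by h."""
    N, d_model = S.shape
    d_k = d_model // h

    def split(M):
        # (N, d_model) -> (h, N, d_k): one slice per head.
        return M.reshape(N, h, d_k).transpose(1, 0, 2)

    Q, K, V = split(S @ Wq), split(S @ Wk), split(S @ Wv)
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_k)   # (h, N, N), scaled by sqrt(d_k)
    heads = softmax(scores, axis=-1) @ V               # (h, N, d_k)
    concat = heads.transpose(1, 0, 2).reshape(N, d_model)
    return concat @ W0                                 # output projection

def feed_forward(Z, W1, b1, W2, b2):
    # Position-wise FFN: two linear maps with a ReLU in between.
    return np.maximum(0, Z @ W1 + b1) @ W2 + b2

rng = np.random.default_rng(0)
N, d_model, h, d_ff = 8, 16, 4, 32
S = rng.normal(size=(N, d_model))
Wq, Wk, Wv, W0 = (rng.normal(size=(d_model, d_model)) * 0.1 for _ in range(4))
Z0 = multi_head_attention(S, Wq, Wk, Wv, W0, h)
Z = feed_forward(Z0, rng.normal(size=(d_model, d_ff)) * 0.1, np.zeros(d_ff),
                 rng.normal(size=(d_ff, d_model)) * 0.1, np.zeros(d_model))
print(Z.shape)  # (8, 16)
```

Projecting with one `(d_model, d_model)` matrix and then splitting it into heads is equivalent to using separate per-head matrices $W^Q_i$, $W^K_i$, $W^V_i$.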

The Multi-layer projection layer performs feature fusion and prediction. Each student feature $S_i \in S$ is processed as in (20):

where $\tanh(x)$ is the hyperbolic tangent activation function, $\theta$ is the network bias, and $w$ is the network weight.

The output $s_i$ of the Multi-layer projection layer is then normalized with the softmax function in (23); the resulting Score gives the probabilities of the six target categories, and the category with the maximum probability is the prediction result.
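The prediction step can be sketched as below. Since the exact layers are given by (20)-(22), this sketch only assumes that each hidden projection has the form $\tanh(w x + \theta)$, followed by a linear map to six logits, softmax normalization, and an argmax over the categories; the layer sizes are hypothetical.

```python
import numpy as np

# Category order as listed for the target set: postgraduate, abroad, employed,
# civil servant, entrepreneur, unemployed.
CATEGORIES = ["a", "b", "c", "d", "e", "f"]

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def project_and_predict(s, layers, w_out, theta_out):
    """s: (N, d) student features; layers: list of (w, theta) pairs (hypothetical)."""
    for w, theta in layers:
        s = np.tanh(s @ w + theta)           # hidden projection: tanh(w x + theta)
    score = softmax(s @ w_out + theta_out)   # (N, 6) category probabilities
    labels = [CATEGORIES[k] for k in score.argmax(axis=1)]
    return score, labels

rng = np.random.default_rng(0)
S = rng.normal(size=(4, 16))
layers = [(rng.normal(size=(16, 32)) * 0.1, np.zeros(32))]
score, labels = project_and_predict(S, layers, rng.normal(size=(32, 6)), np.zeros(6))
print(score.shape, labels)

# Sanity check: identity features with a diagonal output layer map row k to category k.
_, labels_id = project_and_predict(np.eye(6), [], 5 * np.eye(6), np.zeros(6))
print(labels_id)  # ['a', 'b', 'c', 'd', 'e', 'f']
```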

The network parameters are trained with the cross-entropy loss function Loss with L2 regularization, as shown in (24), and the model is optimized with the Adam algorithm.
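A minimal sketch of this objective and of one Adam update is given below. The regularization strength `lam` and the Adam hyperparameters are common defaults chosen for illustration, not values reported in the paper, and the exact form of (24) may differ (for example in how the penalty is scaled).

```python
import numpy as np

def cross_entropy_l2(probs, y, weights, lam=1e-4):
    """Mean cross-entropy plus an L2 penalty lam * sum ||w||^2.

    probs: (N, C) softmax outputs; y: (N,) integer class ids;
    weights: list of weight arrays to regularize; lam is a hypothetical value.
    """
    n = probs.shape[0]
    ce = -np.mean(np.log(probs[np.arange(n), y] + 1e-12))  # epsilon avoids log(0)
    l2 = sum(np.sum(w ** 2) for w in weights)
    return ce + lam * l2

def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2     # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias corrections
    v_hat = v / (1 - beta2 ** t)
    return param - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

probs = np.full((4, 6), 1 / 6)  # uniform predictions over the six categories
y = np.array([0, 2, 4, 5])
loss = cross_entropy_l2(probs, y, [np.ones((2, 2))])
print(round(loss, 4))  # 1.7922

p, m, v = adam_step(np.array([1.0]), np.array([2.0]), np.zeros(1), np.zeros(1), t=1)
```

With uniform predictions the cross-entropy term is $\ln 6 \approx 1.7918$, and the four unit weights add $10^{-4} \times 4$.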

The dataset used in this article includes data from a total of 84314 students in 22 departments from 2014 to 2017. Each student's record contains information about the student while in school and their destination after graduation. All data are statistics entered into the school system according to the students' actual situations. This paper uses the augmented version of the student dataset released in [8] and [37] to test the model. The dataset contains all the information about the students, including dormitories, classes, required and elective courses, campus card usage records, etc. The information valuable for this paper is filtered from the dataset and saved to the student dataset $U = \{P, B, Y, Q\}$, which comprises the student achievement dataset $P$, the dormitory dataset $B$, the campus card dataset $Y$, and the graduation development dataset $Q$. The student academic achievement dataset $P$ includes the fields CUR-NAME, TASK-NO, CURTYPE, CURDEP, CUR-CREDITH, GRADE, STUDEP, STUCLASS, STUID, STU-NAME, and STUSEX, which represent course name, course number, course type, course's affiliated college, credits, grade, student's college, class, student ID, student name, and gender. The student attributes mainly used in this article are course number, course type, student ID, class, college, and grade. Their distribution is shown in Table 1.

The student dormitory dataset $B$ includes building number, dormitory number, charges, student name, student ID, class, grade, major, student's college, student category, and length of study. The dormitory attributes mainly used in this article are building number, dormitory number, student ID, and college. Their distribution is shown in Table 2.

The student campus card dataset $Y$ includes student account ID, student account, swipe time, total cost, card balance, amount of consumption, consumption type, number of consumptions, trading terminal, access records, student name, student ID, and class. The campus card attributes mainly used in this article are student ID, consumption time, consumption location, consumption type, and library visit time. Their distribution is shown in Table 3.

The graduation development dataset $Q$ includes college, student number, name, graduation employment destination, company nature, company industry, employment status, unemployment category, specific company, location, gender, education level, major, length of study, nationality, political status, birthday, and remarks. The graduation development attributes mainly used in this article are college number, student ID, and graduation development direction. Their distribution is shown in Table 4.

In the graduation development dataset $Q$, students' graduation development directions fall into six types, $target = \{a, b, c, d, e, f\}$: postgraduate study, going abroad, employment, civil servant, entrepreneurship, and unemployment, distributed as shown in Table 5.

To obtain an input dataset suitable for the SPE-TNN model, the dataset must first be cleaned. The process consists of five steps:
- Cleaning missing data: determine the range of missing values, calculate the missing-value ratio of each field, and delete student records with a high missing ratio or with important fields missing, such as student number, class, or grade.
- Cleaning data with wrong formats and content: for example, data with inconsistent formats for times, dates, numeric values, or half-width characters; student fields containing characters that should not be there; and attribute values that do not match the expected content of the attribute.
- Cleaning logically incorrect data: remove duplicate student records caused by repeated entry, and remove unreasonable data, such as scores below 0 or above 100 and library access times outside library opening hours.
- Cleaning unneeded attributes: since this article uses fewer attributes than the dataset provides, the attributes that are not needed are removed. For details, refer to Tables 1 to 4.
- Relevance verification: after data cleaning, because this article uses multiple datasets, the records of each student in the achievement, dormitory, campus card, and graduation development datasets must be cross-checked to ensure that every student has all the required attribute fields.
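The first, third, and fifth cleaning steps can be sketched in plain Python as below. The field names (`student_id`, `class`, `grade`, `score`) and the dictionary-per-record layout are hypothetical; format normalization (step 2) and attribute pruning (step 4) are omitted.

```python
REQUIRED = ("student_id", "class", "grade")  # hypothetical key fields

def clean_records(records):
    """Drop records missing key fields, with out-of-range scores, or duplicated."""
    seen = set()
    cleaned = []
    for rec in records:
        if any(rec.get(k) in (None, "") for k in REQUIRED):
            continue                         # step 1: important field missing
        score = rec.get("score")
        if score is not None and not 0 <= score <= 100:
            continue                         # step 3: logically invalid score
        key = tuple(sorted(rec.items()))
        if key in seen:
            continue                         # step 3: duplicate entry
        seen.add(key)
        cleaned.append(rec)
    return cleaned

def common_students(*datasets):
    """Step 5: keep only student IDs that appear in every dataset."""
    ids = [set(r["student_id"] for r in ds) for ds in datasets]
    return set.intersection(*ids)

records = [
    {"student_id": "s1", "class": "c1", "grade": 2014, "score": 88},
    {"student_id": "s1", "class": "c1", "grade": 2014, "score": 88},   # duplicate
    {"student_id": "s2", "class": "", "grade": 2014, "score": 75},     # missing class
    {"student_id": "s3", "class": "c2", "grade": 2015, "score": 120},  # invalid score
]
cleaned = clean_records(records)
dorm = [{"student_id": "s1"}, {"student_id": "s3"}]
print(len(cleaned), common_students(cleaned, dorm))  # 1 {'s1'}
```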

After data cleaning, the dormitory dataset $C_0$ is constructed from the student number, building number, and dormitory number in the student dormitory dataset $B$. In the student achievement dataset $P$, the class dataset $C_1$ is constructed from the student number and class; courses whose type is elective are combined with student numbers to construct the elective course dataset $I_0$, and courses whose type is compulsory are combined with student numbers to construct $S$. In the student campus card dataset $Y$, the library dataset $I_1$ is constructed from the student ID and library visit time, and the catering dataset $I_2$ is constructed from the student ID, consumption time, consumption location, and consumption type.

In the student graduation development dataset Q, the student graduation development label target = {a, b, c, d, e, f } is constructed from the student number and graduation development.

Then the social relationship adjacency matrix set $A = [A_0, A_1, A_2, A_3, A_4]$ is constructed from the datasets $C_0$, $C_1$, $I_0$, $I_1$, and $I_2$, as shown in Fig. 5 in Section 4.
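For relations defined by shared membership, such as the same dormitory room ($C_0$) or the same class ($C_1$), an adjacency matrix can be built with a simple co-membership rule, sketched below. The rule, the student IDs, and the room and class labels are hypothetical; the library and catering relations ($I_1$, $I_2$) would additionally need a time-window co-occurrence rule that the text does not specify, so they are not covered here.

```python
import numpy as np

def group_adjacency(student_ids, membership):
    """Link two students when they share at least one group key.

    membership: dict student_id -> set of group keys (e.g. a dormitory room or class).
    Returns a symmetric (N, N) 0/1 matrix with a zero diagonal.
    """
    index = {s: i for i, s in enumerate(student_ids)}
    A = np.zeros((len(student_ids), len(student_ids)))
    groups = {}
    for s, keys in membership.items():     # invert: group key -> member indices
        for g in keys:
            groups.setdefault(g, []).append(index[s])
    for members in groups.values():
        for i in members:
            for j in members:
                if i != j:
                    A[i, j] = 1.0
    return A

students = ["s1", "s2", "s3", "s4"]
dorm = {"s1": {"B1-101"}, "s2": {"B1-101"}, "s3": {"B2-202"}, "s4": {"B2-202"}}
klass = {"s1": {"CS1"}, "s2": {"CS2"}, "s3": {"CS1"}, "s4": {"CS2"}}
A = np.stack([group_adjacency(students, dorm), group_adjacency(students, klass)])
print(A.shape)  # (2, 4, 4)
```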

We used 70% of the dataset as the training set, 10% as the validation set, and the remaining 20%, $S_{test}$, as the test set. The evaluation metrics are the overall accuracy and the per-class recall. Acc is the proportion of correct predictions, as in (25), and R is the prediction accuracy for a specific target, as in (26), where $T_u$ denotes the correct predictions of category $u \in target = \{a, b, c, d, e, f\}$.
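The split and the two metrics can be sketched as follows, assuming a random permutation of students for the 70/10/20 split and reading R as per-class recall (correct predictions of class $u$ divided by the number of true class-$u$ samples); the seed is arbitrary and the labels are toy values.

```python
import numpy as np

def split_indices(n, seed=0):
    """Random 70/10/20 train/validation/test split of n students."""
    idx = np.random.default_rng(seed).permutation(n)
    n_train, n_val = int(0.7 * n), int(0.1 * n)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

def accuracy(y_true, y_pred):
    # Acc: proportion of correctly predicted students.
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))

def per_class_recall(y_true, y_pred, classes):
    # R for class u: correct predictions of u divided by the number of true u samples.
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    out = {}
    for u in classes:
        mask = y_true == u
        out[u] = float(np.mean(y_pred[mask] == u)) if mask.any() else float("nan")
    return out

train, val, test = split_indices(100)
y_true = ["a", "a", "b", "c"]
y_pred = ["a", "b", "b", "c"]
print(len(train), len(val), len(test))  # 70 10 20
print(accuracy(y_true, y_pred))  # 0.75
print(per_class_recall(y_true, y_pred, ["a", "b", "c"]))  # {'a': 0.5, 'b': 1.0, 'c': 1.0}
```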

To verify the validity of the SPE-TNN model, it is compared with the following mainstream state-of-the-art models:

- KNN [39]. This model measures the differences between students' academic achievements in a high-dimensional space using nearest-neighbour distances. During testing, k was varied in sequence and the best value, k = 4, was selected.
- KNC [40]. Compared with KNN, KNC provides a weighted solution for unbalanced samples, which is simpler and more efficient. The best k is again 4, the same as for KNN.
- GBDT [41]. This representative ensemble learning method finds the best split points by constructing multiple decision trees. The best GBDT model is first obtained on the training data and then evaluated on the test set.
- EBP [42]. This model is essentially an enhanced BP neural network that extracts students' academic achievement features and continuously enhances the attributes based on the labelling results. To find the optimal depth, the number of network layers is set to each of [2, 4, 8, 16, 32]. The hidden layer size is set to 64, and the embedding size is the same as in SPE-TNN.
- APAMT [43]. This model combines a variant of Long Short-Term Memory (LSTM) with a soft-attention mechanism and learns student profile-aware representations from heterogeneous behaviour sequences. In the experiment, it is trained on the training data and then evaluated on the test set.
- NCF [44]. This model extends matrix factorization with deep learning to mine latent factors. The framework is used directly for training and testing.
- EESM [45]. This model uses a sequential neural network to predict students' graduation development. In the experiment, its output is fed directly into a softmax function to produce the normalized prediction.
- GAT [46]. This model combines a graph neural network with a self-attention mechanism. In the experiment, students are the nodes and social relationships are the edges of the graph passed to GAT, which outputs the predictions.
- HLVQ [47]. This model is a hybrid learning vector quantization (LVQ) neural network that combines the AdaBoost optimization technique with the LVQ algorithm to improve the accuracy of predicting student employment. The framework is used directly for training and testing.

To verify the effectiveness and accuracy of SPE-TNN, it is compared with the mainstream models above. The results of the comparison are shown in Table 6:

We use Acc for the overall evaluation of each model and R for the prediction accuracy on each specific target. The overall performance of SPE-TNN is superior to that of the other models, with an accuracy of 87.07%.

Among the other models, the prediction accuracy of KNN and KNC is low, only 57.32% and 59.38%. The accuracy of GBDT is also low, only 65.90%, because the data features are not distinctive. Although EBP improves the abstraction ability of the model, its accuracy of 71.74% is not ideal because an ordinary neural network cannot handle the complex score features well. APAMT combines an LSTM variant with a soft-attention mechanism, but its accuracy of 75.06% is also not ideal. NCF and EESM take into account the latent factors of students' academic achievement and perform better, at 77.12% and 78.83%, respectively. GAT combines a graph neural network with an attention mechanism and considers both students' social relationships and their academic achievement, reaching an accuracy of 80.43%. HLVQ, an improved LVQ algorithm that learns from labelled student samples, outperforms every model except ours, with an accuracy of 82.72%.

In addition, the prediction accuracy of SPE-TNN on each specific target is also higher than that of the other models. The accuracies for targets a (postgraduate study) and d (civil servant) are 83.64% and 85.51%, slightly lower than the overall accuracy; the accuracy for target b (going abroad), 87.10%, is essentially the same as the overall accuracy; and the predictions for categories c (employment), e (entrepreneurship), and f (unemployment) are relatively good, at 88.65%, 90.54%, and 89.74%, respectively.

SPE-TNN uses the Social-path selection layer to find the Social-paths that affect students' graduation development and uses CompGCN to represent student nodes and social relationship edges simultaneously. The social relationships are embedded into students' academic achievement, and the Transformer balances the weights of students' features. For these reasons, SPE-TNN achieves clearly higher prediction accuracy than the other models, which verifies the superiority of the proposed model.

The 1×1 convolution kernel is the core of the Social-path selection layer, and its number of channels affects how well the model captures Social-paths. To obtain the best training parameters, we tested the model's accuracy under different numbers of channels, setting the number of channels $O$ to each value in $\{1, 2, 3, 4, 5\}$ and training the model to convergence. The results are shown in Fig. 8, where different colours represent different numbers $O$ of 1×1 convolution kernel channels. The multi-dimensional adjacency matrix $A_S \in \mathbb{R}^{N \times N \times m}$ formed by the students' social relationships is transformed by the convolution kernel into $Q_S \in \mathbb{R}^{N \times N \times O}$, where $N$ is the total number of students and $m$ is the number of social relationship adjacency matrices. Fig. 8 shows that convergence slows gradually as the number of channels increases. The accuracy peaks at 87.11% with 3 channels and decreases slightly as the number of channels grows further. These experiments indicate that the model learns the Social-paths in social relationships best with 3 channels.

To verify that social relationships improve the prediction of students' graduation development, and that both the Social-path and Transformer components improve the model's predictions, we conduct an ablation study:

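- No Social-path: this variant does not use the Social-path selection layer to find Social-paths; social relationships are embedded directly by the Social-path embedding layer.
- No Transformer: this variant removes the Transformer layer, so no weights are learned to balance the students' features.
- No Embedding: this variant embeds no social relationships and predicts graduation development from students' academic achievement alone.
- No&&No: this variant neither embeds social relationships nor uses the Transformer.

All models are trained and tested on the same training and test sets. Their accuracy rates are shown in Fig. 9.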

The experimental results show that the complete SPE-TNN model performs best, which demonstrates the effectiveness of using the Social-path selection layer to find Social-paths and the Transformer to learn the weights of students' features. Firstly, compared with the original model, the No Transformer model performs slightly worse: without the Transformer balancing the weights of the students' features, the model cannot learn the students' features well, which reduces the accuracy. Secondly, the lower accuracy of the No Social-path model indicates that Social-path has a significant influence on the model, and that capturing, among students' many social relationships, the Social-paths that affect graduation development significantly improves accuracy. Thirdly, comparing the original model with the No Embedding model shows that, without embedding social relationships, the model predicts graduation development from academic achievement alone and its accuracy is lower. Finally, the No&&No model, which neither embeds social relationships nor uses the Transformer to weight the students' features, performs worst. Overall, the experiments show that adding social relationships improves the accuracy of graduation development prediction, and that searching for Social-paths with the Social-path selection layer and learning feature weights with the Transformer both significantly improve the model's predictions.

In this paper, we explore graduation development prediction by gathering social relationships through commonality and individuality groups among students. Specifically, we propose the SPE-TNN framework. The model uses the Social-path selection layer to find social relationships that affect graduation development and embeds them into students' academic achievement through the Social-path embedding layer. It then introduces the Transformer to balance the feature factors that affect students' graduation development. Finally, feature fusion and prediction are carried out by the Multi-layer projection layer. Experiments on public datasets show that SPE-TNN achieves higher accuracy than the compared models and can provide stronger support for graduates' employment guidance.

Because social relationships are rich and complex, the social relationships mined in this paper may contain many interfering factors that affect the prediction of graduation development. In future work, we will draw on more research literature to find more representative social relationships among students and further improve the accuracy of predicting graduation development from social relationships. This will help school administrators provide better career guidance to students and improve their employment rates.

Impact of COVID-19 pandemic on the employment of Chinese college graduates and countermeasures

Decision-Making Models and career guidance. International Handbook of Career Guidance

Novel coronavirus outbreak and career development: A narrative approach into the meaning for Italian University Graduates

IoT-school guidance: A holistic approach to vocational self-awareness & career path

Social media and social justice in the context of career guidance: is education enough

Smart school guidance and vocational guidance system through the internet of things

University smart guidance counselling

Elective future: the influence factor mining of students' graduation development based on hierarchical attention neural network model with graph

Prediction of student graduation with Naive Bayes algorithm. ICIC

Analysis of students graduation target based on academic data record using C4.5 algorithm case study: Information systems students of Telkom University

Data mining for education. International encyclopedia of education

Efficient processing of deep neural networks: a tutorial and survey

Constructing a prior-dependent graph for data clustering and dimension reduction in the edge of AIoT

Improving random walker segmentation using a nonlocal bipartite graph

Study on the BP neural network evaluation model of employability

Graduate employment prediction with bias

Statistical mechanics on temporal and spatial activities of human

Structural investigation of supply networks: a social network analysis approach

Social network changes and life events across the life span: a meta-analysis

Multi-Path relationship preserved social network embedding

Are students really connected? Predicting college adjustment from social network usage

Relationships between social support and student burnout: a meta-analytic approach

In-class social networks and academic performance: how good connections can improve grades

Motivation predictors of college student academic performance and retention

Inferring friendship network structure by using mobile phone data

Social touch and human development

Inferring social ties from geographic coincidences

Multivariate relations aggregation learning in social networks. Association for Computing Machinery

A comparative Study on Graduates' Employment in Malaysia by using Data Mining

Semi-Supervised Classification with graph convolutional networks

Research on common point problem of camber tangent plane in N-Dimension Euclidean spaces

Graph convolutional networks for student answers assessment

Online academic course performance prediction using relational graph convolutional neural network

Graph transformer networks. Neural Information Processing Systems

Composition-based Multi-Relational Graph Convolutional Networks

Effective approaches to attention-based neural machine translation

HHA: An attentive prediction model for academic abnormality

K-nearest Neighbors on Road networks: a journey in experimentation and in-memory implementation

Learning k for kNN Classification

Big Data Application in education: Dropout Prediction in edX MOOCs

Using an enhanced Feed-Forward BP network for predictive model building from student's data

Jointly modeling heterogeneous student behaviors and interactions among multiple prediction tasks

Exercise-Enhanced Sequential modeling for student performance prediction

Predictive analysis of student academic performance and employability chances using HLVQ algorithm

Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.