key: cord-0046964-94s7yvfs
authors: Lin, Jiayin; Sun, Geng; Shen, Jun; Pritchard, David; Cui, Tingru; Xu, Dongming; Li, Li; Beydoun, Ghassan; Chen, Shiping
title: Deep-Cross-Attention Recommendation Model for Knowledge Sharing Micro Learning Service
date: 2020-06-10
journal: Artificial Intelligence in Education
DOI: 10.1007/978-3-030-52240-7_31
sha: 4c2fe36cc12e4f62a30e17e9c42939b8aaa56f6c
doc_id: 46964
cord_uid: 94s7yvfs

Aiming to provide flexible, effective and personalized online learning services, micro learning has gained wide attention in recent years as more people use fragmented time to grasp fragmented knowledge. Widely available online knowledge sharing is one of the most representative approaches to micro learning, and it is well accepted by online learners. However, information overload challenges such personalized online learning services. In this paper, we propose a deep cross attention recommendation model to provide online users with personalized resources based on users’ profiles and historical online behaviours. The model jointly benefits from a deep neural network, feature crossing, and an attention mechanism. The experimental results show that the proposed model outperforms the state-of-the-art baselines.

As a novel online learning style, micro learning aims to utilize users' fragmented spare time by helping them to carry out effective personalized learning activities [1] [2] [3]. Such online learning activities can be formal, informal, or non-formal [4], and online knowledge sharing is one form of non-formal learning. Quora, Zhihu, and Stackoverflow are among the most representative and successful online knowledge platforms, where users share knowledge by asking and answering questions. Meanwhile, these platforms continuously recommend questions and topics to users based on their interests, background, and learning requirements.

As the key to a personalized online learning service, the recommendation strategy determines what information is finally delivered to the target user [5]. For a new online learning service in the big data era, conventional recommendation strategies, such as collaborative filtering and content-based filtering [6], are no longer suitable for catering to personalized learning requirements. A recommender system needs to handle and merge information of different types and formats, ranging from the user's profile to the resources' profiles. Moreover, higher-order feature interaction is crucial for good performance [7]. Weighting different features precisely is also vital for a recommender system, as different features carry different levels of importance for a personalized recommendation task [8].

In this paper, we propose a novel model that combines several advantages of different state-of-the-art recommender systems and offers them in a single, integrated model. The rest of this paper is organized as follows. Section 2 discusses prior related work on recommender systems used in micro learning. The proposed model is introduced and explained in Sect. 3. The experiments of this study are discussed and analysed in Sect. 4. The conclusions are presented in Sect. 5.

The recommendation problem has been investigated for many years in different domains. However, the recommendation task in online education always involves some unique requirements or characteristics [9, 10]. In one prior study [11], the ant colony optimization (ACO) algorithm was proposed to recommend personalized learning paths to users based on demographic information. An ontology-based method was used to add extra user profile information and relieve the cold-start problem for micro learning services [12, 13]. Another study [14] investigated learning path recommendation for micro learning services from an exploitation perspective. So far, there has been little effort on deep learning solutions to this problem.

Feature interaction means that the features involved in a recommendation task tend to influence each other in various combinations. The factorization machine (FM) [15] uses embedding techniques to model latent features in a low-dimensional space and represents pair-wise feature interactions by the inner product of the corresponding embeddings. It also shows satisfactory performance when the dataset is highly sparse, a setting in which SVMs fail [15]. However, due to the high computational complexity, in many cases only second-order feature interactions are involved in the FM.
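To make the pair-wise interaction term concrete, the following is a minimal sketch of a second-order FM score in plain Python. It is an illustration rather than the implementation used in any of the cited systems: the function names, the toy weights, and the input vector are hypothetical. The second function uses the well-known algebraic rewriting of the pair-wise sum, which avoids the explicit double loop over feature pairs.

```python
def fm_score_naive(x, w0, w, V):
    """Second-order FM score with an explicit loop over feature pairs."""
    n = len(x)
    linear = w0 + sum(w[i] * x[i] for i in range(n))
    pairwise = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            # Interaction weight is the inner product of the two embeddings.
            dot = sum(a * b for a, b in zip(V[i], V[j]))
            pairwise += dot * x[i] * x[j]
    return linear + pairwise


def fm_score_fast(x, w0, w, V):
    """Same score, using 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i (v_if x_i)^2]."""
    n, k = len(x), len(V[0])
    linear = w0 + sum(w[i] * x[i] for i in range(n))
    pairwise = 0.0
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        pairwise += 0.5 * (s * s - s_sq)
    return linear + pairwise


# Hypothetical toy input: three features, embedding size two.
x = [1.0, 0.0, 2.0]
w0, w = 0.1, [0.2, -0.3, 0.05]
V = [[0.5, -0.1], [0.3, 0.8], [-0.2, 0.4]]
naive = fm_score_naive(x, w0, w, V)
fast = fm_score_fast(x, w0, w, V)
```

Both functions return the same value up to floating-point error; the second one scales linearly in the number of features, whereas extending the explicit loop to interactions of order three or higher grows combinatorially.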

Deep learning has demonstrated its strength in modelling non-linear transformations in various AI tasks. Besides using deep neural networks for a recommendation task in isolation (for example [16]), many researchers argue that combining the advantages of deep neural networks (DNN) with classical methods such as linear models or FM could better learn sophisticated feature interactions [17] [18] [19].

In this study, we aim to combine the following functionalities in a single network: mining and generating high-order feature interactions, distinguishing the importance of both implicit and explicit features, and maintaining the original input information. To this end, we propose a new deep cross attention network (DCAN) model for the recommendation task of the online knowledge sharing service. The input of the model contains both user-side and question-side information, and the embedding layer maps this information onto a low-dimensional space. The embedding vectors are then passed into the DNN and the crossing network separately to mine latent information and high-order feature interactions. The outputs of the two networks are combined, and an attention network is used to distinguish the importance of the different features. Finally, the output layer makes predictions from the weighted features.
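The data flow described above can be sketched as a single forward pass in plain Python. This is only an illustrative sketch under several assumptions, since the text does not give the exact layer definitions: the crossing step borrows the cross-layer form of DCN [20], the attention step is a simple softmax gate over the combined features, and all names, sizes, and random weights (`init_params`, `dcan_forward`, `vocab_sizes`, and so on) are hypothetical.

```python
import math
import random


def matvec(W, v):
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in W]


def relu(v):
    return [max(0.0, a) for a in v]


def softmax(v):
    m = max(v)
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]


def cross_layer(x0, xl, w, b):
    # Assumed DCN-style explicit crossing: x_{l+1} = x0 * (xl . w) + b + xl
    s = sum(a * c for a, c in zip(xl, w))
    return [x0_i * s + b_i + xl_i for x0_i, b_i, xl_i in zip(x0, b, xl)]


def rand_list(rng, n):
    return [rng.uniform(-0.5, 0.5) for _ in range(n)]


def init_params(vocab_sizes=(5, 7), emb_dim=3, hidden=4, seed=0):
    # Hypothetical sizes: one user-side field and one question-side field.
    rng = random.Random(seed)
    d = emb_dim * len(vocab_sizes)   # width of the concatenated embeddings
    c = hidden + d                   # width after combining both branches
    return {
        "emb": [[rand_list(rng, emb_dim) for _ in range(n)] for n in vocab_sizes],
        "W_deep": [rand_list(rng, d) for _ in range(hidden)],
        "w_cross": rand_list(rng, d),
        "b_cross": [0.0] * d,
        "W_att": [rand_list(rng, c) for _ in range(c)],
        "w_out": rand_list(rng, c),
        "b_out": 0.0,
    }


def dcan_forward(field_ids, p):
    # 1. Embedding layer: look up and concatenate one vector per input field.
    x0 = [v for f, idx in enumerate(field_ids) for v in p["emb"][f][idx]]
    # 2. Two branches on the same embeddings: a (one-layer) DNN and a cross layer.
    deep = relu(matvec(p["W_deep"], x0))
    cross = cross_layer(x0, x0, p["w_cross"], p["b_cross"])
    # 3. Combine both outputs and weight each combined feature by attention.
    combined = deep + cross
    alpha = softmax(matvec(p["W_att"], combined))
    weighted = [a * h for a, h in zip(alpha, combined)]
    # 4. Output layer: sigmoid over a linear map of the weighted features.
    logit = sum(w * h for w, h in zip(p["w_out"], weighted)) + p["b_out"]
    return 1.0 / (1.0 + math.exp(-logit))


params = init_params()
prob = dcan_forward([1, 3], params)  # hypothetical user id 1, question id 3
```

The returned value is a probability in (0, 1) for the given <question, user> pair. Because the cross layer adds its input back to its output, the original embedding information is carried through to the attention step, which is one possible reading of "maintaining the original input information".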

Evaluation Metrics. Since the recommendation task is treated as binary classification, the first evaluation metric used is Area Under Curve (AUC), which indicates how well a model distinguishes the two labels. Another metric used in our experiments is mean squared error (MSE), which directly reflects the prediction error of the involved models. Moreover, we also compared the binary cross entropy of the involved models.
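For reference, the three metrics can be computed as in the following minimal sketch. It assumes that AUC refers to the area under the ROC curve, computed here through its pair-wise ranking interpretation, and the labels and predicted scores are made-up toy values, not results from the experiments.

```python
import math


def auc(y_true, y_score):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mse(y_true, y_score):
    return sum((y - s) ** 2 for y, s in zip(y_true, y_score)) / len(y_true)


def binary_cross_entropy(y_true, y_score, eps=1e-12):
    total = 0.0
    for y, s in zip(y_true, y_score):
        s = min(max(s, eps), 1.0 - eps)  # clip to avoid log(0)
        total += y * math.log(s) + (1 - y) * math.log(1.0 - s)
    return -total / len(y_true)


# Toy labels and predicted probabilities (hypothetical values).
y_true = [1, 0, 1, 0]
y_score = [0.9, 0.2, 0.4, 0.6]
print(auc(y_true, y_score))
print(mse(y_true, y_score))
print(binary_cross_entropy(y_true, y_score))
```

A higher AUC is better, while lower MSE and binary cross entropy are better. The pair-wise AUC above is quadratic in the number of samples, so for a dataset with millions of pairs a sort-based computation would be the more practical choice.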

Baselines. We compared our model with several state-of-the-art recommendation models: DeepFM [17], AutoInt [7], DCN [20], AFM [21], and FM [15]. The characteristics of these baselines are introduced in the previous sections.

The dataset is collected from an online knowledge-sharing platform, which contains around 1.8 million questions and users, and more than 4 million answers for the questions. Nearly 10 million <question, user> pairs are involved in this dataset.

Based on the experiment results in Table 1, FM and AFM have the lowest AUC values and the highest MSE scores. These two models only involve low-order feature interactions, while the others involve high-order feature interactions. Hence, high-order (complex) feature interactions are vital in online learning resource recommendation tasks.

According to Table 1, our proposed model and the AutoInt model achieve the two highest AUC scores. These two models refine the results of high-order feature interaction via the attention mechanism [22]. This performance improvement suggests that different features and feature combinations are not equally important for a personalized learning service, and that the attention mechanism can automatically distinguish the importance of the latent features or feature combinations generated by the prior layers of the network.

In this study, we proposed a deep cross attention network (DCAN) for recommending personalized online learning resources to online learners. The experiment results demonstrate that our model has potential in handling complex online learning recommendation problems. More specifically, according to the experiment results with authentic online knowledge sharing data, the strengths of DCAN can be summarized in two points: 1. the model can automatically mine and generate high-order feature interactions in both explicit and implicit ways; 2. the model can further distinguish the importance of different features.

[1] From ideal to reality: segmentation, annotation, and recommendation, the vital trajectory of intelligent micro learning

[2] A survey of segmentation, annotation, and recommendation techniques in micro learning for next generation of OER

[3] MLaaS: a cloud-based system for delivering adaptive micro learning in mobile MOOC learning

[4] Bridging in-school and out-of-school learning: formal, non-formal, and informal education

[5] Towards the readiness of learning analytics data for micro learning

[6] A framework for collaborative, content-based and demographic filtering

[7] AutoInt: automatic feature interaction learning via self-attentive neural networks

[8] FiBiNET: combining feature importance and bilinear feature interaction for click-through rate prediction

[9] A survey paper on e-learning recommender system

[10] A fuzzy tree matching-based personalized e-learning recommender system

[11] An improved ant colony optimization algorithm for recommendation of micro-learning path

[12] Ontological learner profile identification for cold start problem in micro learning resources delivery

[13] A heuristic approach for new-item cold start problem in recommendation of micro open education resources

[14] Exploitation of micro-learning for generating personalized learning paths

[15] Factorization machines

[16] Deep learning over multi-field categorical data

[17] DeepFM: a factorization-machine based neural network for CTR prediction

[18] xDeepFM: combining explicit and implicit feature interactions for recommender systems

[19] Wide & deep learning for recommender systems

[20] Deep & cross network for ad click predictions

[21] Attentional factorization machines: learning the weight of feature interactions via attention networks

[22] Attention is all you need

Acknowledgments. This research has been carried out with the support of the Australian Research Council Discovery Project, DP180101051, and Natural Science Foundation of China, no. 61877051, and UGPN RCF 2018-2019 project between University of Wollongong and University of Surrey. The work was also partially conducted during authors' collaborative visit to MIT and CSIRO.