key: cord-0046947-8wutd3gc
title: Learning Outcomes and Their Relatedness Under Curriculum Drift
authors: Mondal, Sneha; Dhamecha, Tejas I.; Pathak, Smriti; Mendoza, Red; Wijayarathna, Gayathri K.; Gagnon, Paul; Carlstedt-Duke, Jan
date: 2020-06-10
journal: Artificial Intelligence in Education
DOI: 10.1007/978-3-030-52240-7_39
sha: 22d648d299672d11b7e27c67f8da9aa28081808b
doc_id: 46947
cord_uid: 8wutd3gc

A typical medical curriculum is organized as a hierarchy of learning outcomes (LOs), where each LO is a short text that describes a medical concept. Machine learning models have been applied to predict relatedness between LOs. These models are trained on examples of LO relationships annotated by experts. However, medical curricula are periodically reviewed and revised, resulting in changes to the structure and content of LOs. This work addresses the problem of model adaptation under curriculum drift. First, we propose heuristics to generate reliable annotations for the revised curriculum, thus eliminating dependence on expert annotations. Second, starting with a model pre-trained on the old curriculum, we inject a task-specific transformation layer to capture nuances of the revised curriculum. Our approach makes significant progress towards reaching human-level performance.

The LO-relationship extraction task, recently introduced in [8], seeks to predict the degree of relatedness between learning outcomes (LOs) in a curriculum. The authors examine the curriculum of the Lee Kong Chian School of Medicine, which spans five years of education and covers about 4000 LOs; each LO is a short statement describing a concept that students are expected to master. A hierarchy, designed by curriculum experts, groups these LOs at different levels of granularity. A successful clinical encounter requires students to conceptually relate and marshal knowledge gained from several LOs, spread across years and across distant parts of the curriculum hierarchy. This underscores the need for an automatic LO-relationship extraction tool (hereafter called LReT). In our earlier work [8], this is abstracted as a classification task, where a pair of LOs is categorized as strongly related (high degree of conceptual similarity), weakly related (intermediate conceptual similarity), or unrelated (no conceptual similarity). An LReT is trained on annotated data obtained from subject matter experts (SMEs), who are both faculty and doctors.

However, this curriculum is periodically reviewed and revised. Modifications are made both to content (emphasising some LOs, dropping others, merging a few) and to organization (grouping LOs differently, re-evaluating classroom hours dedicated to each). Table 1 compares an old LO with its revised counterpart. Note that the textual formulation (and hence the underlying concept) of the LO has been modified. Additionally, the LO has been re-grouped under a new set of verticals (Longitudinal Course, Module, and Assessment Type), while doing away with Clinical Block, the only vertical in the previous version. As the curriculum drifts, so do relationships between its constituent LOs. An LReT trained on one version of the curriculum may not perform well on the revised version. Re-obtaining SME annotations carries appreciable cognitive and cost overheads, making it impractical to train an LReT from scratch. We present a systematic approach towards LO-relationship extraction under curriculum drift.
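To make the task formulation concrete, the sketch below shows one way an LO-pair relatedness problem of this kind could be set up as three-class classification over pairs of LO texts. The feature (a single TF-IDF cosine similarity), the logistic-regression classifier, and the example LO texts are illustrative assumptions, not the features, model, or data used in the paper.

```python
# Illustrative sketch: classify a pair of learning-outcome texts as
# Strong / Weak / None. Feature choice and classifier are assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical SME-annotated LO pairs: (LO text 1, LO text 2, label)
annotated_pairs = [
    ("Describe the anatomy of the heart", "Explain cardiac conduction pathways", "Strong"),
    ("Describe the anatomy of the heart", "Outline principles of medical ethics", "None"),
    ("Explain cardiac conduction pathways", "Interpret a normal ECG", "Weak"),
]

vectorizer = TfidfVectorizer().fit([t for pair in annotated_pairs for t in pair[:2]])

def pair_features(lo_a, lo_b):
    """One similarity feature per LO pair (illustrative only)."""
    va, vb = vectorizer.transform([lo_a]), vectorizer.transform([lo_b])
    return [cosine_similarity(va, vb)[0, 0]]

X = [pair_features(a, b) for a, b, _ in annotated_pairs]
y = [label for _, _, label in annotated_pairs]

clf = LogisticRegression().fit(X, y)  # a toy stand-in for an LReT
print(clf.predict([pair_features("Describe the anatomy of the heart",
                                 "Interpret a normal ECG")]))
```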
Beginning with the SME-labelled dataset on the old curriculum, we employ heuristics to create a pseudo-labelled dataset for the revised curriculum. With some supervision now available, we tune the existing pre-trained model to the nuances of the revised curriculum, and compare its efficacy against human performance. This aligns with existing work on domain adaptation and transfer learning [6, 10]; both study scenarios where training and test data do not derive from the same distribution. In contrast, we not only adapt the model to a modified domain, but also generate data pertinent to this domain, thus eliminating the need for human intervention. This bridges the gap between building a reliable LReT and deploying it against a changing curriculum landscape.

Starting with SME-annotated old LO pairs, which constitute the gold-standard dataset, we proceed in two steps. First, we define a mapping that links an LO p from the old curriculum (OC) to its closest matching counterpart in the revised curriculum (RC):

M(p) = argmax_{p' ∈ RC} sim(p, p')    (1)

where sim is an appropriate semantic textual similarity metric. Intuitively, the mapping score sim(p, M(p)) captures the extent of semantic drift in the content of an LO. Thereafter, we rely on pruning. Recall that the gold-standard dataset (D_old) consists of old LO pairs (p, q), each with an SME-annotated class label y. A silver-standard dataset for the revised curriculum (D_rev) is derived by thresholding the mapping scores of an old LO pair at a pre-defined value τ, while retaining its class label. Formally,

D_rev = { (M(p), M(q), y) : (p, q, y) ∈ D_old, sim(p, M(p)) ≥ τ and sim(q, M(q)) ≥ τ }    (2)

Effectively, we propagate the SME label from an LO pair in the old curriculum to its corresponding mapped pair in the revised curriculum only if both mapping scores exceed the threshold. These pseudo-labelled instances constitute the silver-standard dataset.

The base-model (Fig. 1(A)), trained on gold-standard LO pairs of the old curriculum, predicts posterior probabilities for the Strong, Weak, and None classes. As a comparative baseline, we train a model from scratch on the silver-standard dataset, without leveraging the base-model. We then explore three approaches to adapt the base-model. In Feature Mapping (MF), we manually map features from the revised curriculum to the old curriculum, and drop features that cannot be mapped (Fig. 1(C)); the resultant feature set can be fed to the base-model to predict LO relatedness in the new curriculum. In Feature Transformation (FT), a novel approach (Fig. 1(D)), we inject a fully connected layer that transforms the revised feature set into an approximate old feature set, which can then be fed to the base-model; the silver-standard dataset is used to train only this transformation layer, i.e. the base-model layers remain frozen. In Feature Transformation with Smoothing (FT-S), once the transformation weights have largely converged, we unfreeze the base-model parameters and train for a few epochs to allow fine-grained updates to the entire network.
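The silver-standard construction described above (Eqs. 1 and 2) can be sketched as follows. The paper only requires "an appropriate semantic textual similarity metric", so the use of a generic sentence encoder with cosine similarity is an assumption, as are the function names and the threshold value.

```python
# Sketch of silver-standard dataset generation under curriculum drift.
# Assumptions: `embed` is any text -> vector encoder, cosine similarity
# stands in for the unspecified sim metric, and tau is a tunable threshold.
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def build_silver_standard(d_old, old_to_text, revised_los, embed, tau=0.8):
    """
    d_old        : list of (old_lo_id_p, old_lo_id_q, label) SME-annotated pairs
    old_to_text  : dict old_lo_id -> LO text
    revised_los  : dict revised_lo_id -> LO text
    embed        : callable, LO text -> vector
    tau          : mapping-score threshold from Eq. (2)
    """
    rev_vecs = {r: embed(text) for r, text in revised_los.items()}

    def best_match(old_id):
        # Eq. (1): map an old LO to its closest revised counterpart.
        v = embed(old_to_text[old_id])
        scores = {r: cosine(v, rv) for r, rv in rev_vecs.items()}
        r_star = max(scores, key=scores.get)
        return r_star, scores[r_star]

    d_rev = []
    for p, q, label in d_old:
        mp, sp = best_match(p)
        mq, sq = best_match(q)
        # Eq. (2): propagate the SME label only if both mapping scores exceed tau.
        if sp >= tau and sq >= tau:
            d_rev.append((mp, mq, label))
    return d_rev
```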
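The feature-transformation adaptation (FT and FT-S) could be realized along the following lines. PyTorch, the layer sizes, and the training schedule are assumptions (the paper does not name a framework or hyperparameters); the intent is only to illustrate the injected fully connected layer, the initial phase with the base-model frozen (FT), and the subsequent brief unfreezing of the whole network (FT-S).

```python
# Sketch of Feature Transformation (FT) and FT with Smoothing (FT-S),
# assuming a PyTorch base model pre-trained on the old curriculum.
import torch
import torch.nn as nn

class AdaptedLReT(nn.Module):
    def __init__(self, base_model, rev_dim, old_dim):
        super().__init__()
        # Fully connected layer mapping revised-curriculum features to an
        # approximation of the old feature space the base model expects.
        self.transform = nn.Linear(rev_dim, old_dim)
        self.base_model = base_model

    def forward(self, rev_features):
        return self.base_model(self.transform(rev_features))

def adapt(model, silver_loader, ft_epochs=20, smooth_epochs=3, lr=1e-3):
    loss_fn = nn.CrossEntropyLoss()

    # Phase 1 (FT): train only the transformation layer; base model frozen.
    for p in model.base_model.parameters():
        p.requires_grad = False
    opt = torch.optim.Adam(model.transform.parameters(), lr=lr)
    for _ in range(ft_epochs):
        for x, y in silver_loader:  # pseudo-labelled (silver-standard) pairs
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()

    # Phase 2 (FT-S): unfreeze the base model and fine-tune the whole
    # network for a few epochs to allow fine-grained updates.
    for p in model.base_model.parameters():
        p.requires_grad = True
    opt = torch.optim.Adam(model.parameters(), lr=lr / 10)
    for _ in range(smooth_epochs):
        for x, y in silver_loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    return model
```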
Table 2a compares the model adaptation techniques outlined in Sect. 3. All approaches that leverage the base-model outperform training from scratch, to varying degrees. Feature Transformation with Smoothing (FT-S) yields the highest macro-F1, establishing that (a) the base-model encodes some task-specific information independent of the specific curriculum, (b) the revised feature set can be adequately modeled as a linear transformation of the old feature set, and (c) additional smoothing over the parameters of the base-model allows it to learn curriculum-specific nuances.

Furthermore, as shown in Table 2b, the high variance in model performance stems from the small size of the training and test sets in each cross-validation split; the macro-F1 score is sensitive to the samples in a specific test split. A paired t-test shows that, except for two pairs, FT vs MF (p = 6.8 × 10^-2) and FT vs FT-S (p = 6.6 × 10^-2), the differences between all other technique pairs are statistically significant at the 95% confidence level. Finally, for a small held-out set (n = 229), we obtain annotations separately from two SMEs and compute the inter-annotator agreement (71.7% macro-F1), which serves as a skyline. As shown in Table 2d, taking one SME as ground truth and comparing against FT-S's predictions, the human-machine agreement is 64.4%. Compared to human performance, our results are encouraging, though there remains further scope for improvement.

References
[1] Back to the future: sequential alignment of text representations
[2] Discovering correlated spatio-temporal changes in evolving graphs
[3] Discovering prerequisite structure of skills through probabilistic association rules mining
[4] Mathematics curriculum development
[5] Beyond knowledge tracing: modeling skill topologies with Bayesian networks
[6] A review of domain adaptation without target labels
[7] Beyond basic: a temporal study of curriculum changes in a first-year communication course
[8] Learning outcomes and their relatedness in a medical curriculum
[9] Distributional semantics resources for biomedical text processing
[10] Self-taught learning: transfer learning from unlabeled data
[11] Curriculum reform: why? what? how? and how will we know it works?
[12] Tex-sys model for building intelligent tutoring systems
[13] Building domain ontologies from text for educational purposes