key: cord-0780853-e2vtv89h
authors: Delgado, Pablo; Salmerón, Ladislao
title: The inattentive on-screen reading: Reading medium affects attention and reading comprehension under time pressure
date: 2020-09-02
journal: Learn Instr
DOI: 10.1016/j.learninstruc.2020.101396
sha: b4327c5c1714739b06ad77481170028ef154cf63
doc_id: 780853
cord_uid: e2vtv89h

This study explored the influence of reading media and reading time-frame on readers' on-task attention, metacognitive calibration, and reading comprehension. One hundred and forty undergraduates were allocated to one of four experimental conditions varying in reading medium (in print vs. on screen) and reading time-frame (free vs. pressured time). Readers' mindwandering while reading, their prediction of performance on a comprehension test, and their text comprehension were measured. In-print readers, but not on-screen readers, mindwandered less in the pressured than in the free time condition, indicating higher task adaptation in print. Accordingly, on-screen readers in the pressured condition comprehended less than the other three groups. Mindwandering and text comprehension were similar under free reading time regardless of medium. Lastly, there were no differences in readers' metacognitive calibration. The results support the hypothesis of shallow information processing when reading on screen under time constraints.

Major concerns about the utility of digital technologies in education have grown as their use becomes more and more pervasive. Scholars from different disciplines are raising worries about their potentially harmful impact on human cognition, with special emphasis on students' in-depth information processing and sustained attention capacity (e.g., Baron, 2015; Salmerón & Delgado, 2019; Wolf, 2018). Several empirical studies have reported that the use of digital technologies at school can lead to negative learning outcomes. In light of these considerations, it seems reasonable to deem digital technologies as not always suitable for academic reading and learning.

The conclusions of three recent meta-analyses on the medium effect on reading comprehension should be a matter of concern (Clinton, 2019; Delgado, Vargas, Ackerman, & Salmerón, 2018). Results showed that people comprehend the same texts less well on screen than on paper. While the overall effect sizes found by these studies were Hedges' g = −0.21 (Delgado et al., 2018) and −0.25 (Clinton, 2019), analyses of moderators identified three main qualifying factors. First, both Clinton (2019) and Delgado et al. (2018) found the on-screen inferiority to be clear in expository but not narrative texts, with g = −0.32 (vs. g = −0.04) and g = −0.27 (vs. g = 0.01), respectively. Second, the effect was significant only among studies in which participants read under time constraints (g = −0.26; Delgado et al., 2018). Finally, the effect of generation may also play a role, as the medium effect increased by 0.01 points each year from 2001 to 2017 (i.e., the more recent the study, the larger the on-screen inferiority; Delgado et al., 2018). Although such effects are small from the classical approach of Cohen (1988), educational researchers have recently emphasized the need to interpret effect sizes in context (Funder & Ozer, 2019).
Thus, as Delgado et al. (2018) argued, an effect size ranging from −0.21 to −0.32 is relevant in the reading comprehension field because it represents approximately two thirds of the yearly growth in reading comprehension during elementary school (Luyten, Merrell, & Tymms, 2017).

The fact that the on-screen inferiority particularly emerges in expository texts and that it increases under time constraints suggests that such an effect arises in cognitively demanding tasks. Although literary texts can be highly complex and difficult to fully understand, comprehension of expository texts (vs. narrative texts) is generally considered to demand increased cognitive effort, as they typically present academic knowledge by means of a large number of ideas, infrequent vocabulary, and complex text structures. For example, linguistic analyses of a random selection of 200 narrative and science seventh-grade texts from the TASA corpus found that narrative texts use more frequent words, more concrete nouns and verbs, more connectives, and higher causal cohesion (Graesser & McNamara, 2011). Moreover, increased efficiency is required when performing tasks under limited time. In such cases, in-depth processing in combination with time management becomes critical (Ackerman & Lauterman, 2012; Lauterman & Ackerman, 2014). Thus, Delgado et al. (2018) pointed to the shallowing hypothesis as an explanation for on-screen inferiority (Annisette & Lafreniere, 2017). This hypothesis holds that the daily, massive experience of reading on digital media promotes a superficial way of relating to textual information, which in turn is changing the way we process information. Although the hypothesis originally refers to the way we read on any type of medium, evidence suggests that such an effect is more salient when reading on screen. Building upon these empirical and theoretical backgrounds, our study seeks to disentangle the cognitive processes underlying shallow on-screen reading by analyzing undergraduate students' attention and metacognitive calibration while reading a lengthy printed or digital text with or without time pressure.

A major concern regarding the impact of digitalization on information processing is a decreasing ability to focus on task (Baron, 2015; Wolf, 2018). From this perspective, reading on screen is inherently distracting as a result of frequent reading experiences based on skimming and multitasking. For example, Daniel and Woody (2013) found that engaging with competing activities while reading at home was more frequent among participants who read electronic versions of a textbook than among those who read it in print. Nonetheless, to the best of our knowledge, no study has directly analyzed readers' attention while reading on screen relative to reading in print. Our study is designed to fill this gap.

On-task attention has been investigated by means of mindwandering measures. Mindwandering can be defined as unconstrained, self-generated mental activity characterized by thoughts that arise independently of the task being performed, which have been called task-unrelated thoughts (TUTs; Smallwood, 2013). Mindwandering is part of a general process that involves attentional shifts from external to internal experiences. The most widely used method to capture the presence of TUTs is the probe-caught technique, in which participants are periodically interrupted during the task and asked to report whether they were mindwandering.
This method is considered valid and informative for assessing the occurrence of TUTs (Smallwood & Schooler, 2006), and it has been used in reading research (e.g., Dixon & Bortolussi, 2013; Feng, D'Mello, & Graesser, 2013). Reading tasks are particularly well suited to studying mindwandering, as comprehending texts involves the construction of representations of the external environment (Smallwood & Schooler, 2006). In this type of task, the occurrence and maintenance of TUTs entail that top-down attention shifts from the text content to the individual's internal activity, causing a temporary mindless reading mode. Although recent approaches suggest that mindwandering could be beneficial for understanding some passages of literary texts that, in fact, require the reader's mind to wander (see Fabry & Kukkonen, 2019), its detrimental consequences for reading comprehension have been widely reported (Feng et al., 2013; Soemer & Schiefele, 2019; Unsworth & McMillan, 2013).

The present study tests the hypothesis that shallow processing on screen is related to inattentive reading. Examining how reading media affect readers' mindwandering would provide a direct explanation of the online processes responsible for shallow processing on screen. We expect that the effect of increased mindwandering may become more harmful when focused attention becomes more critical, such as when reading under time constraints (Ackerman & Lauterman, 2012; Delgado et al., 2018).

A different explanation for the shallower reading of digital texts is provided by the metacognitive deficit hypothesis. In one of the most relevant attempts to understand the underlying mechanisms of on-screen inferiority, Ackerman and colleagues studied people's metacognitive calibration (Ackerman & Lauterman, 2012; Lauterman & Ackerman, 2014; Sidi, Shpigelman, Zalmanov, & Ackerman, 2017). Calibration is deemed a product of self-regulated learning processes; it refers to a monitoring skill that reflects the accuracy of learners' perceptions of their own performance (Pieschl, 2009). Calibration tends to be poor, with learners often being overconfident (see Stone, 2002). In a series of studies, Ackerman and colleagues consistently found that participants' calibration accuracy was inferior when the experimental task was performed on a computer relative to printed materials, both when participants read texts to answer comprehension questions (Ackerman & Lauterman, 2012; Lauterman & Ackerman, 2014) and when they solved brief problems (Sidi et al., 2017). As a consequence of this heightened metacognitive inaccuracy, the authors argue, outcomes were poorer when performing the tasks on screen under time constraints, because the influence of self-monitoring processes becomes more relevant when time management is crucial. The authors also found that, under time pressure, participants' overconfidence in their own performance was larger on screen both when reading (Ackerman & Lauterman, 2012) and when solving short problems (Sidi et al., 2017).

Potentially, the relationship between a metacognitive monitoring deficit and inattentive reading when reading on screen may be bidirectional. On the one hand, lower on-task attention could hinder monitoring, as the occurrence of off-task periods will prevent readers from accurately judging their current level of understanding. Conversely, overconfidence in one's level of comprehension could free cognitive resources that could then be devoted to mindwandering (Smallwood & Schooler, 2006).
Whatever the nature of this relationship, difficulties in time management as a consequence of lessened on-task attention and the aforementioned metacognitive deficit will lead to increased overconfidence in one's performance, especially under time constraints.

The existence of a reading medium effect on text comprehension raises several theoretical and educational concerns. From a theoretical perspective, models of reading comprehension have accounted for a wide range of factors affecting comprehension processes, with special attention to the interaction between individuals' characteristics (e.g., decoding skills, attention capacity), task features (e.g., reading goals), and the text content and structure (e.g., text genre, complexity; see McNamara & Magliano, 2009). Nonetheless, in spite of the empirical evidence suggesting that medium affects reading comprehension (Clinton, 2019; Delgado et al., 2018), this factor has been consistently ignored in major theoretical models of comprehension. Building on the shallowing hypothesis (Annisette & Lafreniere, 2017) and the metacognitive deficit hypothesis (Ackerman & Lauterman, 2012; Sidi et al., 2017), we propose that screens themselves could activate an effortless cognitive style, characterized by lack of on-task attention, superficial processing, and lessened metacognitive monitoring. In this sense, a recent model of reading comprehension, the RESOLV model (Rouet, Britt, & Durik, 2017), assumes that contextual features play a decisive role in readers' engagement with the text. According to the RESOLV model, readers construct a context schema based on their interpretation of the physical and social context and on previous typical experiences within similar contexts. Reading on screen is generally characterized by quick and superficial reader-text interactions (e.g., Liu, 2005; Pernice, Whitenton, & Nielsen, 2014), which has been called the zapping attitude to text (van der Weel, 2011). Accordingly, a digital reading medium could activate a context schema that induces a particularly shallow processing style of the written information.

The influence of the reading medium on comprehension also has a substantial impact on education. If reading on screens prevents readers from fully engaging with the text, by hindering either their on-task attention or their metacognitive comprehension monitoring, screens should not be recommended as a main source of information. Furthermore, as the negative effect of on-screen reading especially arises when reading under time constraints (Ackerman & Lauterman, 2012; Delgado et al., 2018), special care should be taken when conducting timed exams on screens. Recent studies analysing data from the Program for International Student Assessment (PISA) have found that the change in presentation mode from print to computerized tests in PISA 2015 had a detrimental impact on students' PISA test scores in Germany, Sweden and Ireland (Jerrim, Micklewright, Heine, Sälzer, & McKeown, 2018; Robitzsch, Lüdtke, Goldhammer, Kroehne, & Köller, 2020). At a time when online education is becoming ubiquitous due to worldwide lockdowns associated with the COVID-19 pandemic, it is urgent to understand the underlying mechanisms of the medium effect as a first step to minimizing the negative educational effects of on-screen reading.

The present study aimed to replicate the on-screen reading inferiority effect under time constraints, as well as to shed light on the explanation for this effect.
Following the call for using more ecologically valid materials in reading research (Mangen, Olivier, & Velay, 2019), participants in our study read a text substantially longer than what is typical in this research field. Participants were randomly allocated to one of four experimental conditions, so that they read either in print or on screen, with or without time pressure. We measured text comprehension, mindwandering, and metacognitive calibration. In addition, we measured a comprehensive set of covariates to control for their potential influence on comprehension (Ackerman & Lauterman, 2012; Daneman & Merikle, 1996; Guthrie, Klauda, & Ho, 2013; Hidi, 2001; Naumann, 2015; Ozuru, Dempsey, & McNamara, 2009) and on mindwandering (Feng et al., 2013; Fulmer, D'Mello, Strain, & Graesser, 2015; Randall, Oswald, & Beier, 2014; Unsworth & McMillan, 2013; Xu & Metcalfe, 2016). Our hypotheses were:

1. Participants reading on screen will mindwander more than those reading the printed text, regardless of time pressure.
2. Participants reading on screen under time pressure will show poorer calibration of comprehension (i.e., increased overconfidence in their performance on a comprehension test) than the other groups.
3. Participants reading on screen under time pressure will comprehend less than the other groups.

One hundred and forty first- to fourth-year undergraduate students of pedagogy, teaching, and psychology at a large Spanish university volunteered in exchange for class credit. All participants had Spanish as their native language, and the mean age of the sample was 20.46 years (SD = 1.57). All participants provided informed consent, and they were debriefed after completing the study. As indicated by a priori power analyses (G*Power 3; Faul, Erdfelder, Lang, & Buchner, 2007), with alpha and beta levels set at 0.05 and 0.20 respectively, a 140-participant sample is appropriate to detect an interaction effect of medium and time-frame on reading comprehension equal to a partial eta-squared of 0.07 (minimum necessary sample size = 107) and on readers' calibration equal to a partial eta-squared of 0.06 (minimum necessary sample size = 125), the sizes of the interaction effects found with a similar experimental design by Ackerman and Lauterman (2012).

Text. We used a lengthy expository text on human learning and artificial intelligence that comprised two figures and 3010 words (including figure captions), distributed across four pages. We used authentic versions of the text published in the science-dissemination magazine Investigación y Ciencia, the Spanish edition of Scientific American. As can be seen in Fig. 1, in the in-print reading condition we provided the article in the actual magazine, while in the on-screen condition we provided its pdf version on a desktop computer (17-inch screen). The pdf initially presented one page per screen, but participants were allowed to set the zoom level as they wished. If they zoomed in, they had to scroll down the text using the mouse wheel. The Inflesz Scale for Spanish (Barrio-Cantalejo et al., 2008) considers the number of syllables per word and the number of sentences to estimate a text's readability. According to this index, the readability of the text was 46.69, indicating that it was "somewhat difficult," a readability category equivalent to scientific-dissemination texts or specialized press. We chose such a challenging text deliberately, as we expected it would allow for enough variation in students' cognitive engagement, which is critical to test the shallowing hypothesis. Moreover, a sample of 20 participants in a pilot study took 21.93 min on average to complete the reading task. Thus, we considered the text long enough to capture variability in participants' mindwandering while reading.
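As a cross-check of the a priori power analysis reported in the participants description above, the following sketch (ours, not the authors' G*Power computation) recovers the stated minimum sample sizes from the reported partial eta-squared values; it assumes the usual noncentral-F formulation with a noncentrality parameter of f² × N for the 2 × 2 interaction (numerator df = 1).

```python
# Rough re-computation of the a priori power analysis: the smallest total N at
# which a 2 x 2 between-subjects interaction (numerator df = 1) with a given
# partial eta-squared reaches power .80 at alpha = .05.
from scipy.stats import f as f_dist, ncf

def interaction_power(n_total, eta2p, alpha=0.05, n_groups=4):
    f2 = eta2p / (1 - eta2p)            # Cohen's f^2 from partial eta-squared
    df1, df2 = 1, n_total - n_groups    # interaction df and error df
    crit = f_dist.ppf(1 - alpha, df1, df2)
    return 1 - ncf.cdf(crit, df1, df2, f2 * n_total)

for eta2p in (0.07, 0.06):
    n = 8
    while interaction_power(n, eta2p) < 0.80:
        n += 1
    print(f"partial eta-squared = {eta2p}: minimum N = {n}")
# Prints minimum sample sizes of approximately 107 and 125, in line with the
# values reported above.
```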
Multiple-choice comprehension test. We constructed 21 four-alternative questions, including seven questions for each of the following three comprehension processes: text-based (i.e., a single idea explicitly stated in a single sentence), local inference (i.e., a bridging inference linking two adjacent sentences), and global inference (i.e., a bridging inference linking information located more than two sentences apart; an example of each type of question can be found in the Appendix). The four response options for each question included the target and three different distractors: near-miss (an idea located in the text that conceptually taps the target answer), thematic (a plausible answer but containing common misconceptions), and unrelated (an extremely improbable answer or one inconsistent with the text content; Ozuru, Best, Bell, Witherspoon, & McNamara, 2007). We conducted exploratory factor analysis (EFA) to examine the structure of the 21 questions. We tested the existence of three factors addressing the three aforementioned comprehension processes, respectively, using the maximum likelihood method. Given the binary nature of the items (i.e., correct/incorrect responses), these analyses were based on a polychoric-transformed correlation matrix of the dataset. The EFA did not yield any acceptable model in which each factor was uniquely loaded by one type of question. Therefore, we assumed that our test assessed a single construct (i.e., reading comprehension). We then conducted an EFA fixed to one factor and excluded seven questions whose factor loading was non-significant and lower than .30 (Costello & Osborne, 2005). Thus, the test finally consisted of 14 questions (factor loadings ranging from 0.31 to 0.72). Test reliability was measured using the omega coefficient (McDonald, 1999), also based on a polychoric-transformed correlation matrix, because this index is deemed more appropriate than Cronbach's alpha for dichotomous items.

Mindwandering probes. The frequency of TUTs was assessed by means of the probe-caught technique (Feng et al., 2013). While reading the article, either in print or on screen, the first author orally interrupted participants approximately every 99 s, and they immediately had to indicate whether they were experiencing a TUT at that moment. They had previously been instructed how to identify on-task thoughts ("Thoughts about the text content or about how well you are understanding it") and TUTs ("Thoughts about your daily stuff, a memory from the past, something in the future, your current state of being, or any other type of thought not related with the text content nor with the understanding of it"). This measure was completed on a separate sheet of paper by ticking yes (I was wandering) or no (I wasn't wandering) for each probe. The proportion of TUTs across probes was calculated for each participant, ranging from 0 to 1.

Metacognitive calibration. After reading, participants predicted their performance on the comprehension test by estimating the percentage of correct answers on a continuous 25-100% scale (Ackerman & Lauterman, 2012). Calibration for each participant was calculated by subtracting the percentage of correct answers in the comprehension test from their prediction of performance (POP), which allowed us to perform correlation analyses between this measure and the other measured variables. In addition, participants' POPs were statistically compared to their actual performance by means of repeated-measures analyses.
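The two derived indices just described, the TUT proportion and the calibration score, reduce to simple arithmetic on each participant's responses; a minimal sketch (ours, with illustrative values) is shown below.

```python
# Sketch of the two derived indices: the per-participant TUT proportion and the
# calibration (bias) score. Positive calibration values indicate overconfidence,
# negative values underconfidence. Variable names are illustrative only.
def tut_proportion(probe_responses):
    """probe_responses: list of booleans, True = 'I was mindwandering' at that probe."""
    return sum(probe_responses) / len(probe_responses)

def calibration(pop_percent, n_correct, n_items=14):
    """Prediction of performance (25-100%) minus actual percentage of correct answers."""
    return pop_percent - 100 * n_correct / n_items

example_probes = [False, True, False, False, True, False, False, False, True, False]
print(tut_proportion(example_probes))            # 0.3
print(calibration(pop_percent=75, n_correct=9))  # about +10.7, i.e., overconfident
```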
Working memory. We used the Letter-Number Sequencing Test from the Spanish version of the Wechsler Adult Intelligence Scale-IV (Wechsler, 2008) to measure working memory capacity. In this test, the evaluator reads aloud a series of alternating numbers and letters, and individuals report back the numbers from lowest to highest and then the letters in alphabetical order. Difficulty increases from 3-item to 8-item series. The application procedure was adapted for group administration, and participants wrote down their responses (cf. Macedo-Rouet et al., 2019). Two experiment administrators ensured that participants did not write down any digit or letter while the series were being read aloud. Moreover, when writing down their responses, participants were not allowed to start with the final elements of a series and then complete it with the other elements.

Prior topic knowledge. We constructed a self-reported 8-item questionnaire as an indicator of participants' prior topic knowledge. Participants rated their knowledge of four subtopics related to human learning (e.g., brain processes involved in human learning) and four related to artificial intelligence (e.g., computer programming), using a scale from 1 (I know nothing) to 10 (I am an expert). Cronbach's alpha was good for the items on human learning (α = .84) and acceptable for the items on artificial intelligence (α = .70). As can be seen in Table 3, participants' self-reported prior knowledge of human learning correlated with their scores on the comprehension test, whereas prior knowledge of artificial intelligence did not. These results support our decision to include participants' prior knowledge of each topic separately.

Topic interest. We constructed a self-reported 8-item questionnaire on participants' topic interest. They rated from 1 (not interested at all) to 10 (very interested) their interest in the same eight subtopics they had rated in the prior knowledge questionnaire. Reliability was good for the items on human learning (α = .87) and acceptable for the items on artificial intelligence (α = .75).

Medium preference and use. We constructed a 5-item (two reversed) questionnaire to measure participants' medium preference for reading to learn. For each item, participants rated from 1 (I totally disagree) to 10 (I totally agree) a statement regarding the use of printed vs. digital texts for learning purposes (e.g., I understand and memorize better when I study by reading an electronic text than when I read on paper). A mean score above 5 points indicated a preference for paper. This questionnaire showed a good reliability level (α = .81). In addition, participants indicated at what age they started to use digital devices regularly, and how many hours a day they use them for leisure and for educational/professional purposes.

Perceived text difficulty. Participants were asked to indicate, after reading, their perceived text difficulty from 1 (Very easy) to 10 (Very difficult) in a single Likert-scale item.

Situational interest.
Participants rated, after reading, their interest in the text content from 1 (Not interesting at all) to 10 (Very interesting) in a single Likert-scale item.

Tasks were completed in one small-group session (six participants maximum). All the participants in each session performed the experimental task under the same condition (i.e., the same reading time-frame and reading medium). Sessions were conducted in a silent room and lasted approximately 75 min. Participants first completed the self-reported questionnaires on prior knowledge and topic interest, followed by the working memory test. Then, participants were introduced to the reading task: "You are now going to read an article to learn as much as you can, because you will be asked to complete a test consisting of 21 four-alternative multiple-choice questions on the text content. Please note that you won't be allowed to go back to the text while answering the questions". The instructions for the free-time condition continued as follows: "You can read the article at your own pace. When you consider you have read enough, raise your hand and wait to be given the comprehension test. It is important that you do not disturb the other participants, so please do everything very silently". The pressured-time groups had only 16 min and 30 s to read the text, which represents 75 percent of the mean time that the participants in the pilot study took to read the article at their own pace. The instructions for this condition continued as follows: "You must keep in mind that you have little time to read the article. You only have 16 and a half minutes, which is 75 percent of the time that a group of people spent on average when reading at their own pace. I will let you know when you have gone through half the time, when there are 4 min left, and finally when there is only a minute and a half left. You will have to stop reading when you are told that the time is up, and you will then receive the questions". Afterwards, participants were instructed on how to perform the mindwandering probe-caught task, and they were reminded to be honest when answering the probes. Participants in the pressured-time condition were probed 10 times, the last one just before reading time was over, while the number of probes for participants in the free-time condition ranged from 10 to 19. When the reading task was finished, participants predicted their performance on the reading comprehension test and subsequently completed it. Then, they reported their perceived text difficulty and their interest in the text, and answered the medium preference and use questionnaire. Finally, those who read under time pressure were asked whether they had been able to finish reading the whole text and, if not, to indicate approximately at what point they had had to stop.

Two participants were excluded from the dataset because they did not perform the task properly. The distributions of the reading comprehension scores, TUT rate, and calibration were inspected before conducting the main analyses. Six participants were identified as outliers, as they scored on the reading comprehension test more than 2 SDs below their time-frame group mean (sizeable differences in this measure existed between participants in the free and pressured-time groups).
They were excluded from subsequent analyses, and therefore the final sample consisted of 132 participants, which is still appropriate according to the results of the power analyses reported above. Once the outliers had been removed, the sample's mean score on the reading comprehension test was 8.72 (SD = 2.63; range: 3-13). Table 1 shows the number of participants within each experimental group. ANOVA and chi-squared analyses indicated no significant differences between the experimental groups regarding participants' age, sex, grade year, and bachelor's degree (all ps > .12; see also Table 1). Approximately half of the participants who read under time pressure mentioned that they could not read the whole text. Those participants reported having had to stop at some point in the second half of the last page of the 4-page text. A chi-squared analysis showed that they were equally distributed across reading medium groups, χ²(1, N = 66) = 0.55, p = .46 (n = 14 in the in-print group; n = 17 in the on-screen group).

Table 2 includes descriptive information on all measured covariates for each group, and Table 3 shows Pearson correlations among all variables. Fig. 2 shows the correlation pattern among correlated variables. Outliers for each covariate were identified (±2 SD from the sample mean) and replaced by the next highest or lowest score that was not an outlier (i.e., winsorization; Field, 2013). The number of outliers for each covariate ranged from 2 to 6 cases, representing less than 5% of the data. As can be seen in Table 2, all the covariates were then normally distributed, as values for kurtosis and skewness were within the ±2 range (George & Mallery, 2010). Regarding correlations between covariates and dependent measures, scores on the comprehension test correlated positively with working memory and situational interest, and negatively with perceived text difficulty. The TUT proportion correlated positively with perceived text difficulty, and negatively with participants' topic interest in artificial intelligence and with situational interest. The remaining covariates did not correlate with text comprehension scores or the TUT proportion. Finally, participants' calibration index correlated with participants' working memory.

To ensure that groups were comparable, we performed a series of ANOVAs with reading medium and reading time as independent variables, and each possible covariate as dependent variable. Only two of them differed between groups. Participants under pressured time reported higher perceived text difficulty regardless of medium, F(1, 128) = 4.73, p = .03, η²p = .04, and self-reported prior knowledge of human learning was higher in the in-print groups than in the on-screen groups, F(1, 126) = 9.85, p < .01, η²p = .07. Based on the differences between groups and on the correlations between covariates and dependent measures, we included working memory, situational interest, and prior knowledge of human learning as covariates in the ANCOVA for text comprehension scores. The same covariates, as well as participants' topic interest in artificial intelligence, were included in the analysis of the TUT proportion. In this case, controlling for working memory was a decision driven theoretically by previous evidence (i.e., working memory and mindwandering generally correlate negatively; Unsworth & McMillan, 2013; Randall et al., 2014). Perceived text difficulty was not included as a covariate in either case due to its substantive dependence on the reading time-frame conditions (Miller & Chapman, 2001). Finally, participants' working memory was included as a covariate in the ANCOVA for metacognitive calibration (see Fig. 2).
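Before turning to the results, the outlier-handling rule applied to the covariates above (values beyond ±2 SD replaced by the nearest score that is not an outlier) can be sketched as follows; this is our illustration of the described rule, not the authors' code.

```python
# Sketch of the +/-2 SD winsorization described above: any value outside the
# +/-2 SD bounds is replaced by the most extreme remaining value that still
# lies inside the bounds (the "next highest or lowest score that is not an
# outlier").
import numpy as np

def winsorize_2sd(values):
    x = np.asarray(values, dtype=float)
    lo = x.mean() - 2 * x.std(ddof=1)
    hi = x.mean() + 2 * x.std(ddof=1)
    inside = x[(x >= lo) & (x <= hi)]              # non-outlying scores
    return np.clip(x, inside.min(), inside.max())  # outliers -> nearest non-outlier

scores = [4, 5, 5, 6, 6, 7, 7, 8, 19]              # 19 lies beyond +2 SD
print(winsorize_2sd(scores))                       # 19 is replaced by 8
```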
Means and standard deviations for the TUT proportion in each experimental condition can be seen in Table 4. ANCOVA assumptions with respect to normality, homogeneity of variances, and homogeneity of regression slopes between the three included covariates and the dependent variable were met. A two-way (medium × reading time) ANCOVA revealed no main effect of medium, F(1, 124) = 1.21, p = .27, or of reading time-frame, F(1, 124) = 2.09, p = .15, on the TUT proportion. However, an interaction effect between these two factors qualified their effects on mindwandering, F(1, 124) = 4.03, p = .047, η²p = .03. Post hoc Bonferroni-corrected pairwise comparisons indicated that, across time-frames, participants in the in-print group reported a significantly lower TUT proportion under time pressure than under free time, F(1, 124) = 5.56, p = .02, η²p = .04, whereas the TUT proportion did not differ significantly across reading-time groups in the on-screen condition, F < 1. Moreover, across reading media, the TUT proportion was significantly lower in the in-print group than in the on-screen group when reading under time pressure, F(1, 124) = 4.52, p = .03, η²p = .04. There was no difference between media under free reading time, F < 1 (see Fig. 3).

Participants' POPs for the comprehension test were compared to their actual performance to examine whether their metacognitive calibration differed across experimental groups. We conducted a two-way repeated-measures mixed ANOVA with medium and time-frame as between-participants factors, and calibration as a within-participants factor (i.e., participants' POPs vs. text comprehension scores). Given that calibration is our focus here, we report only results from the tests of within-participants effects. Results revealed no differences between POPs and actual performance in the whole sample, F(1, 124) = 2.88, p = .09, and no interaction effects between calibration and medium, F < 1, calibration and reading time-frame, F(1, 124) = 3.50, p = .07, or calibration, medium, and time-frame, F < 1 (see also Table 4). Thus, participants proved to be well calibrated regardless of the reading medium and the reading time-frame. The fact that some participants under time pressure could not finish reading the text might have affected their calibration. We therefore re-ran the mixed ANCOVA described above only in the pressured-time groups, with medium and whether the participants had finished reading the text as between-participants factors. There was no effect of finishing reading the text and no interaction between this factor and medium, both Fs < 1.

Finally, differences in text comprehension scores were examined by means of a two-way ANCOVA. Means and standard deviations for each group can be seen in Table 4. ANCOVA assumptions with respect to normality, homogeneity of variances, and homogeneity of regression slopes between the included covariates and the dependent variable were met in all cases except for participants' situational interest. Thus, we applied a blocking procedure (Tabachnick & Fidell, 2014) to include this covariate as an independent variable, which allowed us to focus on the effects of the independent variables of interest (i.e., medium and time-frame) once the variation of this covariate was removed from the estimated error. Following Tabachnick and Fidell's (2014) procedure, a new independent variable was created by categorizing situational interest values into three levels (low, medium, high) based on the 33rd and 66th percentiles. We then performed a three-way ANCOVA, with medium, time-frame, and situational interest as independent variables, and working memory and prior knowledge of human learning as covariates.
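To make the analytic model concrete, the following is a minimal sketch, in Python with statsmodels, of one reasonable way the three-way ANCOVA just described could be specified; it is our illustration rather than the authors' analysis script, and the data file and column names (reading_study.csv, comprehension, medium, timeframe, situational_interest, working_memory, prior_knowledge_hl) are hypothetical.

```python
# Sketch: three-way ANCOVA on comprehension scores with medium, time-frame, and
# the blocked situational-interest variable as factors, plus working memory and
# prior knowledge of human learning as covariates.
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

df = pd.read_csv("reading_study.csv")  # hypothetical per-participant data
df["interest_block"] = pd.qcut(df["situational_interest"],
                               q=[0, .33, .66, 1],
                               labels=["low", "medium", "high"])

model = smf.ols(
    "comprehension ~ C(medium, Sum) * C(timeframe, Sum) * C(interest_block, Sum)"
    " + working_memory + prior_knowledge_hl",
    data=df,
).fit()
print(anova_lm(model, typ=3))  # Type III tests, with sum-coded factors
```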
Results revealed no main effect of medium, F(1, 118) = 1.00, p = .32, a significant main effect of reading time-frame, F(1, 118) = 13.36, p < .001, η²p = .10, indicating higher scores in the free-time condition, and a significant interaction effect of medium and time-frame on comprehension outcomes, F(1, 118) = 4.82, p = .03, η²p = .04. Across time-frames, post hoc Bonferroni-corrected pairwise comparisons showed that, in the on-screen condition, participants who read under time pressure scored significantly lower than those who read at their own pace, F(1, 118) = 17.50, p < .01, η²p = .13. This was not the case in the in-print condition, where participants scored similarly regardless of the reading time-frame, F < 1. Furthermore, across reading media, participants who read under time pressure on screen scored significantly lower than those who read in print, F(1, 118) = 4.89, p = .03, η²p = .04, while no medium difference was found for participants reading with free time, F < 1 (see Fig. 3).

Given the expected negative correlation between participants' TUT proportion and reading comprehension outcomes (see Table 3 and Fig. 2), we further explored the role of mindwandering in the effects of medium and time-frame on text comprehension scores. To that end, we re-ran the ANCOVA described above introducing the TUT proportion as a covariate. The results showed that the interaction effect of medium and time-frame on comprehension scores lost significance, F(1, 117) = 3.22, p = .07.

The present investigation examined for the first time how reading medium affects readers' on-task attention while reading an authentic, lengthy expository text. It also contributed to the research efforts on how medium affects reading comprehension and metacognitive calibration. Controlling for a comprehensive set of covariates, our findings revealed that reading on screen prevented readers from reducing their mindwandering when the task requirements called for efficient reading. Accordingly, participants reading on screen under time pressure scored significantly lower on the reading comprehension test relative to those in the other three groups. Finally, contrary to our expectations, participants were equally well calibrated regardless of experimental group. Next, we discuss the implications of these results with respect to the influence of mindwandering on text comprehension, and how the lack of increased attention when reading on screen can explain the screen-inferiority effect.

Our experimental design provided a direct test of two potential factors underlying the screen-inferiority effect: inattentive reading and a metacognitive calibration deficit. With respect to our first hypothesis, we expected to observe inattentive reading (i.e., a higher TUT rate) when reading on screen, as compared to in-print reading, regardless of the time allocated to reading.
As a consequence of this increased mindwandering, we hypothesized that the disruption in reading comprehension due to mindwandering would be especially noticeable under time pressure. Yet inattentive on-screen reading, although confirmed by our results, emerged in a different manner than expected. When participants read at their own pace, they mindwandered to a similar extent regardless of the medium. But when they read under time constraints, only in-print participants reduced the frequency of TUTs. Previous evidence has shown that learners can control the occurrence of mindwandering as task demands call for it, especially those with greater working memory capacity (Rummel & Boywitt, 2014; Smallwood & Andrews-Hanna, 2013). Thus, given that in the present study participants' working memory did not differ across groups, we should also have observed reduced mindwandering when reading on screen under time pressure. Nevertheless, our results indicate that on-screen readers struggled to adjust to such high task demands. Although the size of this effect in our study was small according to Cohen's benchmarks (η²p = .04, equivalent to d = 0.41), it was larger than the effect of increased mindwandering when reading difficult (vs. easy) texts found in previous studies (OR = 1.24, equivalent to d = 0.12, in Feng et al., 2013; R² = .016, equivalent to d = 0.25, in Mills, D'Mello, & Kopp, 2015). Similar findings were recently reported by Latini, Bråten, Anmarkrud, and Salmerón (2019), who found that in-print readers were more able than on-screen readers to adapt to the learning demands of a multiple-document comprehension task. Specifically, when instructed to prepare for an exam, as opposed to reading for pleasure, in-print readers wrote longer essays and, indirectly, integrated the information from the different sources better. Such adaptation was not present among on-screen readers.

In line with previous findings (Feng et al., 2013; Soemer & Schiefele, 2019; Unsworth & McMillan, 2013), the TUT proportion in our study correlated negatively with text comprehension scores. Results from the in-print group indicated that participants reduced their mindwandering when reading under time pressure, as compared to when reading with free time. This accommodation of readers' attention could have counteracted the detrimental effect of time pressure, resulting in similar comprehension scores across the two in-print groups regardless of the time-frame. However, that was not the case for participants who read on screen. They did not adaptively decrease the frequency of TUTs when reading under time pressure, and their reading comprehension scores were significantly lower in this condition, which is in line with previous findings showing on-screen inferiority when reading under time constraints (Ackerman & Lauterman, 2012; Delgado et al., 2018; Lauterman & Ackerman, 2014). Moreover, the fact that the interaction effect of medium and time-frame on reading comprehension outcomes lost significance when controlling for participants' TUT proportion supports the idea that the on-screen reading inferiority in our study was driven, at least partially, by participants' failure to decrease their mindwandering on screen when the task demanded higher on-task attention. In sum, our findings partially support the inattentive reading hypothesis by pointing to the difficulties on-screen readers showed in meeting task demands that called for increased on-task attention. Accordingly, the fact that on-screen participants in the pressured-time group did not reduce their mindwandering provides direct evidence of shallow processing during on-screen reading (Annisette & Lafreniere, 2017).
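The effect-size equivalences quoted in this paragraph (and again in the concluding section) follow from standard approximate conversions between metrics; the short sketch below (ours, not the authors') recovers the reported figures to within rounding.

```python
# Sketch of the standard conversions behind the effect-size comparisons above.
import math

def d_from_eta2p(eta2p):
    """d ~= 2f, with f = sqrt(eta2p / (1 - eta2p))."""
    return 2 * math.sqrt(eta2p / (1 - eta2p))

def d_from_odds_ratio(odds_ratio):
    """d ~= ln(OR) * sqrt(3) / pi (logistic approximation)."""
    return math.log(odds_ratio) * math.sqrt(3) / math.pi

def d_from_r2(r2):
    """d = 2r / sqrt(1 - r^2), with r = sqrt(R^2)."""
    r = math.sqrt(r2)
    return 2 * r / math.sqrt(1 - r ** 2)

print(round(d_from_eta2p(0.04), 2))       # 0.41, the present interaction effect
print(round(d_from_odds_ratio(1.24), 2))  # 0.12 (Feng et al., 2013)
print(round(d_from_r2(0.016), 2))         # 0.26, close to the 0.25 cited for Mills et al. (2015)
```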
Unexpectedly, our results showed that undergraduates could accurately calibrate their reading comprehension regardless of reading medium and time-frame, contrary to previous findings yielding better calibration when reading in print (e.g., Ackerman & Lauterman, 2012). It was also unexpected that our participants were well calibrated in all the experimental conditions, as it has been widely reported that learners tend to be overconfident (Stone, 2002). The experimental procedure employed in the present study may have helped participants make accurate predictions about their level of performance. First, the available reading time in the pressured-time condition was certainly scarce, as indicated by the fact that approximately half of the participants in these groups could not reach the end of the text. Second, the probe-caught technique could have made participants aware of their distraction while reading. Therefore, although our results indicated that not having finished reading the entire text did not further explain participants' calibration, the time pressure, as well as the tracking of their own attention, might have served as cues that helped participants identify to what extent they could complete the reading assignment. The observed significant negative correlation between the TUT proportion and POPs supports this idea. These two circumstances could therefore have led participants to be cautious rather than overconfident in their POPs. Thus, in spite of our results, the metacognitive deficit observed in prior studies of on-screen reading cannot be discarded. So far, whether on-screen reading harms metacognitive calibration, and under what circumstances it does so, remains an open question (Singer Trakhman, Alexander, & Silverman, 2018).

The deficit in metacognitive monitoring when performing tasks on screen, especially under time pressure, reported in previous research (Ackerman & Lauterman, 2012; Clinton, 2019; Lauterman & Ackerman, 2014; Sidi et al., 2017) could be related to difficulties in reducing mindwandering. There is no reason to consider that such a deficit is constrained to metacognitive judgments. On the contrary, mindwandering could affect other monitoring processes related to task engagement, such as meta-consciousness. Several studies have examined the metacognitive status of mindwandering by tracking participants' awareness of their TUTs. In line with the meta-awareness hypothesis (Schooler, 2002), results showed that readers are often unaware of their mindwandering (Smallwood, 2013; Smallwood & Schooler, 2006). Thus, a broader metacognitive deficit could lead not only to increased calibration inaccuracy, but also to more frequent TUTs when reading on screen compared to printed texts. This possibility could be tested in future studies.

The present study is not exempt from limitations. Although using an authentic, lengthy text is a strength of our study, we cannot generalize our results to shorter texts. Furthermore, we can only ensure that the reported differences appeared when reading this particular type of text. Therefore, future research should test whether the inattentive reading hypothesis can explain screen inferiority in shorter learning tasks
(cf. Sidi et al., 2017), as well as across a wide variety of text structures, difficulty levels, and genres. In addition, although our study included a comprehensive set of covariate measures, other individual factors, such as participants' sustained attention capacity or a more exhaustive working memory measurement, could help to further explain mindwandering during on-screen reading. Future studies could investigate whether the on-screen inferiority effect depends on those individual differences, to the extent that they can be related to inattentive reading as an explanation (cf. Ben-Yehudah & Brann, 2019). Furthermore, this research line could be especially relevant among school-aged students, as they show larger variability with respect to individual differences.

It is also noteworthy that we cannot rule out the possibility that reading on screen is not inattentive in itself, but that its disruptive effect is caused by the fact that reading on desktop screens involves a body position that could hinder on-task attention. Reading is deemed a cognitive activity with an embodied nature, and the physical relationship between reader and text differs between these media (Mangen & van der Weel, 2016). In this vein, the regular body posture used while reading printed materials on a desk could facilitate immersion in the text, increasing on-task attention, as compared to reading on screen. A recent study by Mangen et al. (2019) investigated reading a lengthy literary text on a Kindle vs. in a printed book. They found that reading engagement was similar across both media, as were most of the measures of participants' comprehension of the text. However, those who read in print constructed a better mental representation of the story chronology. Given that readers' engagement with the text was similar regardless of the medium, the authors concluded that the difference found in this mental representation was due to differences in readers' sensorimotor experience with the reading devices. If this circumstance also somehow impacts comprehension of complex and lengthy expository texts, the effect could have been even larger in our study, because our participants read on a stationary computer screen that did not allow any physical interaction with the reading device. To rule out this possibility, our study could be replicated using tablets instead of desktop computers, given that handheld devices allow for a reader-text interaction more similar to that of reading on paper.

Another issue not addressed by our study is the interaction between learners' age and the impact of the reading medium on text comprehension. It is still unclear whether the negative influence of reading on screens varies over individuals' development. Most of the studies comparing comprehension outcomes and metacognitive calibration across reading media have been conducted with undergraduates (Clinton, 2019; Delgado et al., 2018). Future investigations should examine the possible relationship between readers' age and the impact of the reading medium on attention and metacognition, given that executive functions are not fully developed until adulthood (see Best, Miller, & Jones, 2009).

Finally, we should also note that the probe-caught technique used to measure participants' mindwandering was undoubtedly disruptive for their reading performance.
In this sense, although this circumstance was equally present in all the experimental groups, we cannot rule out the possibility that it affected each group differently. It is possible that the on-screen/time-pressure readers in our study were disrupted to a greater extent than those who read in print. If so, the conclusion would be that on-screen readers are more susceptible to the negative impact of extraneous tasks on attention. Further research should address this issue by using less invasive methods to measure readers' on-task attention, such as electroencephalography (e.g., Broadway, Franklin, & Schooler, 2015) or post-task retrospective questionnaires (e.g., Sanchez & Naylor, 2018, Experiment 2).

Our results show that reading on screen leads to inattentive reading, particularly when the task demands an increase in on-task attention for efficient information processing. We argue that this inattentive reading causes, at least in part, shallow information processing and lower comprehension. The findings support current concerns that digital technologies, under certain circumstances, hinder reading and learning (Baron, 2015; Salmerón & Delgado, 2019; Wolf, 2018). As argued above, although the size of the on-screen inferiority effect under time pressure found in our study was small according to Cohen's (1988) benchmarks (η²p = .04, equivalent to d = 0.41), analyzing it in context provides a richer picture. This effect size is located at the lower bound of Hattie's 'zone of desired effects' in educational contexts (i.e., a medium effect; Hattie, 2009), and it represents more than the yearly growth in reading comprehension during elementary school (0.32; Luyten, Merrell, & Tymms, 2017).

Our findings emphasize the need to incorporate the effect of reading medium into general models of text comprehension. Based on the RESOLV model (Rouet et al., 2017), we could argue that the medium is one of the contextual factors that influence readers' engagement with texts. Still, many factors regarding reading medium effects remain unknown. Future research should shed further light on how the reading medium interacts with other factors, such as individual characteristics (e.g., attentional capacity; Ben-Yehudah & Brann, 2019), task demands (e.g., task goals; Latini, Bråten, Anmarkrud, & Salmerón, 2019), text features (e.g., presence of illustrations; Latini, Bråten, & Salmerón, 2020), or additional contextual factors (e.g., reading in the classroom vs. at home; Daniel & Woody, 2013). A major challenge for future research is to clarify under what circumstances the medium effect becomes more salient and what factors could mitigate its consequences.

Furthermore, our results should be a matter of concern for educational practitioners and policy makers. There are clear educational scenarios where, in light of our results, on-screen reading should be avoided. For example, taking exams on screen could prevent students from fully demonstrating their knowledge and skills, because they may struggle to adjust their attentional focus to their full potential. Recent studies analyzing the change of mode (from print to computer) in the PISA tests support this conclusion, as indicated by lower scores on the computer-based tests (Jerrim et al., 2018; Robitzsch, Lüdtke, Goldhammer, Kroehne, & Köller, 2020). Moreover, mindwandering has been shown to exert a negative impact on academic performance (Hollis & Was, 2016; Seli, Wammes, Risko, & Smilek, 2016).
Thus, if performing learning activities on screens hinders students' ability to control the generation of task-unrelated thoughts, this will negatively impact their learning processes and outcomes. That being said, suggesting a ban on digital technologies for educational purposes would be naïve in the 21st century. They are here to stay, and they offer a wide range of educational possibilities. Nevertheless, we call on educational practitioners and policymakers to consider the fact that printed texts are more appropriate when it comes to in-depth reading, especially with lengthy texts. In this regard, educational systems should be especially cautious with recent campaigns supporting a complete shift from printed to e-textbooks. Instead, on the one hand, we should find an appropriate balance between the use of printed materials and digital technologies by means of evidence-based decisions. On the other hand, we find it necessary for academic curricula to include training in the appropriate use of digital devices as learning tools in order to help students fully benefit from them. In this regard, further research efforts should address how to overcome the on-screen reading inferiority. These are major goals for education in the Digital Age.

Pablo Delgado: Conceptualization, Investigation, Formal analysis, Writing - original draft. Ladislao Salmerón: Supervision, Writing - review & editing.

The authors do not have any interests that might be interpreted as influencing the research, and APA ethical standards were followed in the conduct of the study.

The questions used in the text comprehension test are listed below (translated from Spanish). To construct the questions and the response options we followed the guidelines of Ozuru, Best, Bell, Witherspoon, and McNamara (2007), and accordingly each question has four response options consisting of "one target and three type distractors: near-miss, thematic, and unrelated. Near-miss distractors have a large conceptual overlap with the target answer and the idea is located in the passage but in an inappropriate context. […] Thematic distractors are answers that are plausible but contain erroneous information based on common misconceptions (not located within the passage). […] Unrelated distractors are highly improbable, or inconsistent with the theme of the passage." (p. 408). As reported in the manuscript, we constructed three types of questions, depending on whether they addressed the comprehension of text-based information, local inferences, or global inferences (also following Ozuru et al., 2007). However, the exploratory factor analysis (EFA) did not support the assumption that our test measured three different constructs based on these three types of questions. Thus, after excluding seven questions based on the results of the EFA considering that the test measured only one factor, we assume that our test provided a general measure of participants' mental representation of the text, including both text-based and inferential comprehension, as well as their recall of what they had read in the text. Both the included and the excluded questions are listed below.

Questions included:

1. Which of the following statements best represents the main idea of the text? (Global inference)
a. Understanding how the human brain learns will allow us to develop robots that are as intelligent as people.
5. It has been proposed to develop robots whose artificial intelligence system would allow a better understanding of attention deficit hyperactivity disorder. To this end, such a system will be designed so that its predictive processing: (Global inference)
a. Imitates humans after learning by interacting with them. (Correct answer)
b. Prefers unpredictable stimulation. (Near-miss distractor)
c. Behaves according to what we know about said disorder. (Unrelated distractor)
d. Is not able to pay attention when interacting. (Thematic distractor)
6. In order to bring artificial intelligence systems closer to human intelligence, it is proposed that it would be necessary for their learning and development process to take place as a complex "waterfall system". This is to represent that artificial intelligence needs to acquire complex knowledge: (Local inference)

References

Taking reading comprehension exams on screen or on paper? A metacognitive analysis of learning texts under time pressure
Social media, texting, and personality: A test of the shallowing hypothesis
Words onscreen: The fate of reading in a digital world
Validación de la Escala INFLESZ para evaluar la legibilidad de los textos dirigidos a pacientes
Pay attention to digital text: The impact of the media on text comprehension and self-monitoring in higher-education students with ADHD
Executive functions after age 5: Changes and correlates
Early event-related brain potentials and hemispheric asymmetries reveal mind-wandering while reading and predict comprehension
Reading from paper compared to screens: A systematic review and meta-analysis
Statistical power analysis for the behavioral sciences
Best practices in exploratory factor analysis: Four recommendations for getting the most from your analysis
Working memory and language comprehension: A meta-analysis
E-textbooks at what cost? Performance and use of electronic v. print texts
Don't throw away your printed books: A meta-analysis on the effects of reading media on reading comprehension
Construction, integration, and mind wandering in reading
Reconsidering the mind-wandering reader: Predictive processing, probability designs, and enculturation
G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences
Mind wandering while reading easy and difficult texts
Discovering statistics using IBM SPSS Statistics
Interest-based text preference moderates the effect of text difficulty on engagement and learning
Evaluating effect size in psychological research: Sense and nonsense
SPSS for Windows step by step: A simple guide and reference
Computational analyses of multilevel discourse comprehension
Modelling the relationships among reading instruction, motivation, engagement, and achievement for adolescents
Visible learning: A synthesis of over 800 meta-analyses relating to achievement
Interest, reading, and learning: Theoretical and practical considerations
Mind wandering, control failures, and social media distractions in online learning. Learning and Instruction
PISA 2015: How big is the 'mode effect' and what has been done about it? Oxford Review of Education
What mind wandering reveals about executive-control abilities and failures
Investigating effects of reading medium and reading purpose on behavioral engagement and textual integration in a multiple text context
Does reading medium affect processing and integration of textual and pictorial information? A multimedia eye-tracking study
Overcoming screen inferiority in learning and calibration
Reading behavior in the digital environment: Changes in reading behavior over the past ten years
The contribution of schooling to learning gains of pupils in Years 1 to 6. School Effectiveness and School Improvement
How good is this page? Benefits and limits of prompting on adolescents' evaluation of Web information quality
Comparing comprehension of a long text read in print book and on Kindle: Where in the text and when in the story?
The evolution of reading in the age of digitisation: An integrative framework for reading research
Test theory: A unified treatment
Toward a comprehensive model of comprehension
Why does working memory capacity predict variation in reading comprehension? On the influence of mind wandering and executive attention
Misunderstanding analysis of covariance
The influence of consequence value and text difficulty on affect, attention, and learning while reading instructional texts. Learning and Instruction
A model of online reading engagement: Linking engagement, navigation, and performance in digital reading
Influence of question format and text availability on the assessment of expository text comprehension
Prior knowledge, reading skill, and text cohesion in the comprehension of science texts. Learning and Instruction
How people read on the web: The eyetracking evidence
Metacognitive calibration - an extended conceptualization and potential applications
Mindwandering, cognition, and performance: A theory-driven meta-analysis of attention regulation
Reanalysis of the German PISA data: A comparison of different approaches for trend estimation with a particular emphasis on mode effects
RESOLV: Readers' representation of reading contexts and tasks
Controlling the stream of thought: Working memory capacity predicts adjustment of mindwandering to situational demands
Critical analysis of the effects of digital technologies on reading and learning
Mindwandering while reading not only reduces science learning but also increases content misunderstandings
Re-representing consciousness: Dissociations between experience and meta-consciousness
On the relation between motivation and retention in educational contexts: The role of intentional and unintentional mind wandering
Understanding metacognitive inferiority on screen by exposing cues for depth of processing. Learning and Instruction
Profiling reading in print and digital mediums
Distinguishing how from why the mind wanders: A process-occurrence framework for self-generated mental activity
Not all minds that wander are lost: The importance of a balanced perspective on the mind-wandering state
The restless mind
Text difficulty, topic interest, and mind wandering during reading. Learning and Instruction
Exploring the relationship between calibration and self-regulated learning
Using multivariate statistics
Best alternatives to Cronbach's alpha reliability in realistic conditions: Congeneric and asymmetrical measurements
Mind wandering and reading comprehension: Examining the roles of working memory capacity, interest, motivation, and topic experience
A journey around alpha and omega to estimate internal consistency reliability
Changing our textual minds. Towards a digital order of knowledge
Reader, come home: The reading brain in a digital world
Studying in the region of proximal learning reduces mind wandering

This investigation has been funded by the project "Investigación neuroeducativa sobre el efecto de superioridad del papel - AYUDAS FUNDACIÓN BBVA A EQUIPOS DE INVESTIGACIÓN CIENTÍFICA 2018". We would like to thank all the participants, Laura Royo and Vittoria Lutzu for their assistance during data collection, Prof. Jukka Hyönä for his comments on the manuscript, and the publisher Prensa Científica S.A. for kindly providing the reading materials.

Supplementary data to this article can be found online at https://doi.org/10.1016/j.learninstruc.2020.101396.