key: cord-0058675-dec96jua
authors: Faria, Susana; Salgado, Carla
title: A Bivariate Multilevel Analysis of Portuguese Students
date: 2020-08-20
journal: Computational Science and Its Applications - ICCSA 2020
DOI: 10.1007/978-3-030-58808-3_34
sha: 296ae9219cacaf3168b5b9df7857f58f562c2ec0
doc_id: 58675
cord_uid: dec96jua

In this work, we illustrate how to perform a bivariate multilevel analysis in the complex setting of large-scale assessment surveys. The purpose of this study was to identify the relationship between students' mathematics and science test scores and the characteristics of the students and schools themselves. Data on 7325 Portuguese students and 246 Portuguese schools that participated in PISA 2015 were used to accomplish our objectives. The results obtained by this approach are in line with existing research: the student's index of socioeconomic status, being a male student, and the school's average index of student socioeconomic status positively influence students' performance in mathematics and science. On the other hand, grade repetition had a negative influence on the performance of Portuguese students in mathematics and science.

The Programme for International Student Assessment (PISA) has attracted the attention of many researchers and educational policy makers over the past few years. Every 3 years since 2000, PISA has provided an international view of the reading, science and mathematics achievement of 15-year-old students from countries of the Organisation for Economic Co-operation and Development (OECD). The PISA survey is a self-administered questionnaire that tests student skills and gathers information about students' family, home and school background. Since PISA 2015, unlike in previous cycles, the assessments of all three domains have mainly been conducted on computers [1]. Portugal, as a founding member of the OECD, has participated in all editions of PISA.
In the early 2000s, Portugal's performance in PISA was one of the lowest among OECD countries. However, Portugal is one of the few OECD member countries where there has been a trend of significant improvement in results in the three assessment areas. In PISA 2015, for the first time, Portuguese students scored significantly above the OECD average in reading and scientific literacy, and at the OECD average in mathematical literacy [4].

This study used data from the 2015 PISA Portuguese sample to investigate the student-level and school-level factors that affect the mathematics and science achievement of 15-year-old students. Given the hierarchical nature of the data (students nested within schools), a multilevel approach is adopted to investigate the impact of school resources and students' characteristics on performance [5]. Multilevel models simultaneously investigate relationships within and between the hierarchical levels of grouped data, which makes them more efficient at accounting for variance among variables at different levels than other existing analyses [6]. With this type of data, classical methods, such as ordinary least squares regression, would not produce correct standard errors [7].

Many papers in the literature analyse potential factors that influence student achievement using PISA data and multilevel analysis. An example of country-specific analysis using PISA results can be found in [8], which examines the relationships among affective characteristics-related variables at the student level, aggregated school-level variables, and mathematics performance using the 2012 PISA samples for Indonesia, Malaysia, and Thailand. Using PISA data, Giambona and Porcu studied factors affecting Italian students' achievement, paying attention to school size [9].
In [10], a multilevel analysis was applied to the OECD-PISA 2006 data with the aim of comparing factors affecting students' achievement across Italy and Spain. Using PISA 2012 data, Karakolidis et al. investigated the factors, at both the individual and school level, that were linked to the mathematics achievement of Greek students [11]. Wu et al. studied the relationship between principals' leadership and student achievement using PISA 2015 United States data [12]. The study in [13] investigated to what degree external factors, such as cultural and economic capital, parental pressure, and school choice, are related to 15-year-old students' achievement in digital reading and in overall reading at both the student level and the school level in Norway and Sweden, using PISA data from the two countries.

The paper is organised as follows. In Sect. 2, we provide a brief description of the methodological approach adopted in the paper, and in Sect. 3 we present the dataset. In Sect. 4, we present the main results arising from our data analysis, while Sect. 5 contains final remarks.

The model proposed for the empirical analysis is a bivariate two-level linear model in which students (level 1) are nested in schools (level 2) and the outcome variables are mathematics and science achievement. Several studies in the literature treat mathematics and science achievement separately, applying univariate multilevel models. With this bivariate model, it is possible to compare the associations between students' characteristics or school resources and performance in mathematics and in science.

Let $N = \sum_{j=1}^{J} n_j$ be the total number of students, where $n_j$, $j = 1, \ldots, J$, is the total number of students of the $j$-th school. For each school $j = 1, \ldots, J$ and student $i = 1, \ldots, n_j$, the model is

$$y_{ij} = \beta_0 + \sum_{k=1}^{K} \beta_k x_{kij} + \sum_{l=1}^{L} \alpha_l w_{lj} + b_j + \epsilon_{ij} \qquad (1)$$

where $y_{ij}$ is the bivariate outcome with the mathematics and science achievement of student $i$ in school $j$; $\beta = (\beta_0, \ldots, \beta_K)$ is the bivariate $(K + 1)$-dimensional vector of parameters; $x_{kij}$ is the value of the $k$-th predictor variable at the student level; $\alpha = (\alpha_1, \ldots, \alpha_L)$ is the bivariate $L$-dimensional vector of parameters; $w_{lj}$ is the value of the $l$-th predictor variable at the school level; $b_j \sim N_2(0, \Sigma)$ is the vector of bivariate random effects (mathematics and science) at the school level, where the covariance matrix of the random effects at the school level is given by

$$\Sigma = \begin{pmatrix} \sigma^2_{b1} & \sigma_{b12} \\ \sigma_{b12} & \sigma^2_{b2} \end{pmatrix} \qquad (2)$$

and $\epsilon_{ij} \sim N_2(0, W)$ is the vector of errors, where the covariance matrix of the errors is given by

$$W = \begin{pmatrix} \sigma^2_{\epsilon 1} & \sigma_{\epsilon 12} \\ \sigma_{\epsilon 12} & \sigma^2_{\epsilon 2} \end{pmatrix} \qquad (3)$$

In both matrices, the diagonal entries are the variances for mathematics and science and the off-diagonal entry is their covariance. We assume $b_j$ independent of $\epsilon_{ij}$.

The modelling procedure in this study has three steps. In the first step, the null model (with no independent variables) was fitted. This model is statistically equivalent to a one-way random effects analysis of variance [7], and its purpose is to partition the total variance in the outcome variable across the different levels in the data. In the second step, a student-level model was developed without variables at the school level. School variables were added to the student model in the third step.

The sample was drawn from the PISA 2015 data set (see [2] for an overview). Complex multi-stage stratified random sampling was used to sample the Portuguese 15-year-old student population. The first stage consisted of sampling individual schools in which 15-year-old students could be enrolled. Schools were sampled systematically with probabilities proportional to size, the measure of size being a function of the estimated number of eligible (15-year-old) students enrolled. The second stage of the selection process sampled students within sampled schools. Once schools were selected, a list of each sampled school's 15-year-old students was prepared. From this list, 42 students were then selected with equal probability (all 15-year-old students were selected if fewer than 42 were enrolled).
The number of students to be sampled per school could deviate from 42, but could not be less than 20 (see [3]). The Portuguese sample of PISA 2015 includes 7325 students nested in J = 246 schools. However, on analysing the data set, one school was found to lack information on many variables, so it was removed from the sample. In addition, all students who had missing values in some of the variables were removed. Thus, the final data set includes 6549 students nested in J = 222 schools.

Our outcomes of interest were mathematics and science achievement, which are scaled using item response theory to a mean of 500 and a standard deviation of 100. Because PISA features complex booklet designs, the methods used to estimate achievement are similar to multiple imputation and result in several plausible values (PVs) for each student. Following the recommendations for addressing PVs in international large-scale assessments (see [15]), we consider all 10 PVs simultaneously as the dependent variables for the purpose of obtaining unbiased and stable estimates. Note that in the PISA 2015 data, student performance in each domain is represented by 10 plausible values (compared with only 5 plausible values in previous cycles).

When data are based on complex survey designs, sampling weights have to be incorporated in the analysis, and any estimation procedure that does not take sampling weights into account provides biased results. To avoid bias in the parameter estimates and to produce nationally representative findings, the sampling weights for students and schools provided by the PISA database were included in our analysis.

Centering is an important issue in multilevel analysis. In this study, all predictors were centered on the grand mean at both the student and school levels. The purpose of this is to reduce multicollinearity among variables and bias in variance estimates so that a more meaningful interpretation can be made [16].
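
Grand-mean centering, as described above, can be illustrated with a minimal sketch. The ESCS values below are hypothetical and chosen only for illustration, and the sketch uses an unweighted mean; whether the centering in this study used weighted or unweighted means is not stated, so the function `grand_mean_center` should be read as an assumption rather than the procedure actually applied.

```python
from statistics import fmean


def grand_mean_center(values):
    """Subtract the overall (grand) mean from every observation.

    The mean is taken over all students pooled across schools,
    not separately within each school.
    """
    grand_mean = fmean(values)
    return [v - grand_mean for v in values]


# Hypothetical ESCS values for six students (illustrative only, not PISA data).
escs = [-1.2, -0.4, 0.3, 1.1, -0.9, 0.5]
escs_centered = grand_mean_center(escs)

# After centering, the values sum to (numerically) zero.
print(escs_centered)
```

After this transformation, the intercept of a regression on the centered predictor is the expected outcome for a student at the overall mean of that predictor, which is what makes the intercepts easier to interpret.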
We report some descriptive statistics for the PVs in Table 1 and Table 2. We can see that, although plausible values fluctuate within persons, the means and standard deviations of the ten distributions are very close to each other, as one would expect given that they are generated from the same distributions. Figure 1 shows the scatter plot of mathematics achievement versus science achievement (only for plausible value 1). It is immediately clear that there is a positive correlation between students' performance in the two domains. The correlation coefficients between the ten PVs of mathematics performance and science performance were computed, and the values ranged between 0.891 and 0.901.

With regard to explanatory variables at the student and school level, we include in our model some of the variables that have been most frequently identified as influential in the literature.

Student-level variables:
- Age: the age of the student, derived from the question referring to the month of birth;
- Gender: dummy variable that takes the value one if the student is male and the value zero if the student is female;
- Rep: dummy variable that takes the value one if the student has repeated at least one year of schooling and zero otherwise;
- Imi: dummy variable that takes the value one if the student is not native and zero otherwise;
- ESCS: student's socio-economic status. This variable is derived from several variables related to students' family background: parents' education, parents' occupations, a number of home possessions that can be taken as proxies for material wealth, and the number of books and other educational resources available at home. Greater values of ESCS represent a more advantaged social background, while smaller values represent a less advantaged social background. A negative value indicates that the socio-economic status is below the OECD mean.
School-level variables:
- TYPE: dummy variable that takes the value one if the school has private status and the value zero if the school is public;
- LOC: dummy variable that takes the value one if the school is located in a town and zero if the school is located in a village;
- RAP: student-teacher ratio;
- PRAP: proportion of girls at the school;
- MESCS: school average of students' socio-economic status.

Some related descriptive statistics are presented in Table 3 and Table 4. Regarding students' demographic characteristics, Table 3 shows that 50.2% of the 15-year-old students were male, only 5.6% of the students were immigrants and the percentage of students who reported grade repetition was 37.0%. In fact, Portugal is one of the OECD countries with a high repetition rate before the age of 15. Regarding school-level variables, 91.0% of the schools are public, corresponding to 95.7% of students (in Portugal the number of public schools is higher than the number of private schools). Table 3 also shows that 55.4% of the schools were located in towns, corresponding to 59.7% of students.

Table 4 shows that the average age of the students was 15.78 years (SD = 0.28, minimum = 15.33, maximum = 16.33). The average student's economic, social and cultural index was approximately −0.40 (SD = 1.15, minimum = −4.15, maximum = 3.08), which means that Portuguese students have a lower economic, social and cultural index than the average of students across all participating OECD countries in PISA, and greater variability. In general, students with ESCS equal to or greater than two are very socially and culturally advantaged. We see that the average proportion of girls in schools is close to 50% and ranged between 26.83% and 68.54%. The average number of students per teacher is 10.42, but there is great variability in the number of students per teacher across Portuguese schools.
In fact, there are schools with only 1.98 students per teacher and schools with 41.42 students per teacher. The school PISA index of the economic, social and cultural status values ranged between

This section contains the main results obtained by fitting the bivariate two-level model explained above to our dataset. All statistical analyses were performed using sampling weights to ensure that the sampled students adequately represent the total population analysed. We report the average of the results obtained with each one of the PVs.

First, the null model was estimated. This model allows us to explore the correlation structure of the two outcomes. Table 5 reports the estimates of the regression coefficients and variance-covariance parameters of the null model. Table 5 shows that 53.53/(53.53 + 37.08) = 59.07% of the total variance in mathematics achievement was accounted for at the school level and 51.80/(51.80 + 37.08) = 58.28% of the total variance in science achievement was accounted for at the school level. These results show that schools explain a relevant portion of the variability in achievement. Note that the two scores are highly correlated.

Table 6 reports the estimates of the regression coefficients and variance-covariance parameters after removing non-significant variables at the student and school level. These models, presented theoretically in Sect. 2, were fitted using the R package nlme (see [14]). With reference to the explanatory variables considered, the student-level covariates Age and Imi were not significant; that is, there is no relevant association between scores and age, and immigrant and non-immigrant students scored equally well in mathematics and science. At the school level, the only significant variable was MESCS (school-level average of student socio-economic status).
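
The school-level variance shares quoted above for the null model (Table 5) can be reproduced with a short sketch. The helper name `school_variance_share` is an assumption introduced here for illustration; the inputs are the figures quoted in the text, and the results agree with the reported percentages up to rounding of the last digit.

```python
def school_variance_share(between: float, within: float) -> float:
    """Share of total variance attributable to the school level.

    between: school-level (random-intercept) variance component
    within:  student-level (residual) variance component
    """
    total = between + within
    if total <= 0:
        raise ValueError("total variance must be positive")
    return between / total


# Figures quoted in the text for the null model (Table 5).
maths_share = school_variance_share(53.53, 37.08)
science_share = school_variance_share(51.80, 37.08)

print(f"mathematics: {maths_share:.2%}")
print(f"science:     {science_share:.2%}")
```

This ratio is the intraclass correlation of a two-level random-intercept model: the closer it is to one, the more of the variation in scores lies between schools rather than between students within the same school.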
In Table 6, the intercepts represent the average scores for the baseline student: female, never having repeated a year of schooling, and with all the other covariates set at their mean values. The performance of the baseline student is above the international mean of 500 in both outcomes, though the average score in mathematics is lower than the average score in science. In general, the coefficients have the expected signs. In line with existing studies with the same purpose as this work, being male is positively associated with better results in mathematics and science. Students who have repeated at least one year of schooling have a lower performance in mathematics and science, meaning that these students have more difficulties than students who have not repeated a year, especially in mathematics. ESCS is positively associated with achievement and has similar coefficients in both fields, suggesting that students with a high socio-economic level are educationally advantaged. Also, at the school level, attending a school with a higher mean ESCS helps to reach a higher score. Looking at the variance-covariance matrix of the random effects, we can see that the variability of the two random effects is similar (18.16 vs 17.76); therefore, attending a specific school has a similar influence on the results in mathematics and in science. The two effects are positively correlated.

The research was conducted to find out the relationship between students' achievement and the characteristics of students and schools themselves. The analysis relies on a bivariate multilevel model, thus accounting for both the bivariate nature of the outcome and the hierarchical structure of the data. Based on an analysis of PISA 2015 data, our findings are in line with previous studies in which univariate multilevel models were applied.
The use of a bivariate multilevel model allows us to test for differences in the regression coefficients, thus pointing out differential effects of the covariates on the two outcomes. As expected, the characteristics of students and schools are associated in the same way with the two outcomes: mathematics and science achievement.

What is PISA?
OECD: PISA databases
PISA 2015 - PORTUGAL. Volume I: Literacia Científica, Literacia de Leitura e Literacia Matemática
Hierarchical data modeling in the social sciences
An introduction to hierarchical linear modeling
Hierarchical Linear Models: Applications and Data Analysis Methods, 2nd edn
Affective characteristics and mathematics performance in Indonesia, Malaysia, and Thailand: what can PISA 2012 data tell us? Large-scale Assessments Educ
School size and students' achievement. Empirical evidences from PISA survey data
Educational disparities across regions: a multilevel analysis for Italy and Spain
Mathematics low achievement in Greece: a multilevel analysis of the Programme for International Student Assessment (PISA) 2012
Principal leadership effects on student achievement: a multilevel analysis using Programme for International Student Assessment 2015 data
A multilevel analysis of Swedish and Norwegian students' overall and digital reading performance with a focus on equity aspects of education. Large-scale Assessments Educ
Core Team. nlme: linear and Nonlinear Mixed effects models. R package version 3
International large-scale assessment data: issues in secondary analysis and reporting
Introducing Statistical Methods. Introducing Multilevel Modeling

Acknowledgements. This work was supported by the strategic programme UID-BIA-04050-2019 funded by national funds through the FCT.