Evaluating the achievements of computer engineering department of distance education students with data mining methods


 Procedia Technology   1  ( 2012 )  262 – 267 

2212-0173 © 2012 Published by Elsevier Ltd.
doi: 10.1016/j.protcy.2012.02.053 

Evaluating the achievements of computer engineering department of 
distance education students with data mining methods 

Baha Sena *, Emine Ucarb 

aKarabük University, BaliklarKayasi Mevkii, Karabük, 78050, TURKEY 
bMinistry of National Education, Bakanlıklar, Ankara, 06420, TURKEY 

 
Abstract 

Recently, the internet technology has become an indispensable part of life, a very useful application that cannot be earlier have 
made it possible. One of these is distance learning technologies. Due to limitations of traditional learning-teaching methods in 
classroom activities and practitioners who intend to conduct training activities in the absence of the possibility of communication 
and interaction among learners with special education units are prepared and provided a wide range of media center through a 
certain method of teaching. According to a further recognition of Distance Education, although far away from each other with the 
student who teaches the same time (synchronous) or different time (asynchronous) communications with a tool as training system 
established. The aim of this study is to compare the achievements of Computer Engineering Department students in Karabük 
University according to criteria such as age, gender, type of high school graduation and whether the students studying in distance 
education or regular education using data mining techniques. Also discussing the differences of the techniques according to the 
results and to make suggestions for which technique would be more effective. 
© 2011 Published by Elsevier Ltd. 
 
Keywords: Distance Education; Data Mining; Decision Trees; Artificial Neural Network 
 

1. Introduction 

Rapid developments in information societies has also changed interests and needs of today people while causing 
continuous changing and developments on lives of individuals and cultural, social and economic structure  of 
societies. Now, people who can reach, use and produce information are needed. Therefore, development of Distance 
Learning has been well accelerated. 
Distance Learning has many advantages. The most important of these are considered to be reproducible, 
distributable and accessible easily. Beside these advantages, integration of computer-aided systems, utilization of 
multimedia tools and techniques, reaching contents quickly and cost-efficiently over internet, increasing user 
interaction with help of new Technologies has provided the acceptance of distance learning sometimes as a support 
to formal education and sometimes as an education technique itself. 
One of the first studies on data mining applied in education was published in 1995 by Sanjeev and Zytkow. 
Researchers gathered the knowledge discovery as terms like “P pattern for data in the range R” from university 
database [1]. 
Another study on data mining applied in education was published in 2000 by Becker and his friends who are 
performed for defining and understanding the impact of changes in curriculum on students at a university in Brasil 
[2]. 

 
* Baha Sen. Tel.: +90-370-433-2021; fax: +90-370-433-3290. 
E-mail address: baha.sen@karabuk.edu.tr. 

Available online at www.sciencedirect.com

Open access under CC BY-NC-ND license.

Open access under CC BY-NC-ND license.

http://creativecommons.org/licenses/by-nc-nd/3.0/
http://creativecommons.org/licenses/by-nc-nd/3.0/


263 Baha Sen and Emine Ucar  /  Procedia Technology   1  ( 2012 )  262 – 267 

A data mining application in which defining of student characteristics are used for measuring the satisfaction of 
students at higher education was performed by Luan in 2002 [3]. 
Maltepe University students identifying characteristics had been clustered using K-means algorithm in 2005 by 
Erdoğan and Timor. In that study 722 students’ data was used and the relationship between the university entrance 
exam results and achievements was examined [4]. 
Vranić and  Skoćır  was examined  how to improve some aspects of educational quality with data mining algorithms 
and techniques by taking a specific course students as target audience in academic environments [5]. 
In the second part of this study traditional and distance education concepts were examined. In the third section a data 
mining application was developed with using data from the Karabük University Computer Engineering Department 
students. In the conclusion sharing the experiences and findings obtained from this application is intended. 
 
2. Formal and Distance Education 

Formal education is a regular education that uses programs prepared in accordance with a purpose for the same level 
of certain age group and individuals at a school building. Formal education includes institutions of preschool, 
primary, secondary and higher education [6]. Distance education is an education that is realized with educator and 
students without being in the same place. This feature of distance education provides opportunity of learning for 
anyone at any age, place, time and speed [7]. 
The most obvious difference between distance education and classical education is completing their education 
(primary, secondary and higher education) without going to school, leaving their jobs and leaving their private lives. 
 

3. Methodology 

Data mining is relatively a new technique to the world of information sciences. Successful implementation of this 
technique requires a sound methodology built on best practices. In this research study, we followed a popular data 
mining methodology called Cross Industry Standard Process for Data Mining (CRISP-DM), which is a six-step 
process [8]: 

 Problem description: Involves understanding project goals with business perspective, transforming this 
information into data mining problem description and making project plan to reach the related goals. 

 Understanding the data: Involves identifying the sources of data, obtaining an initial set of data to assess 
the information coverage of the data for the problem on hand. 

 Preparing the data: Involves pre-processing, cleaning, and transforming the relevant data into a form that 
can be used by data mining algorithms. 

 Creating the models: Involves developing a wide range of models using comparable analytical techniques 
(i.e., selecting the appropriate modeling technique and setting the parameters related to the model to 
optimal values). 

 Evaluating the models: Involves evaluating and assessing the validity and the utility of the models against 
each other and against the goals of the study. 

 Using the model: Involves in such activities as deploying the models for use in decision making processes 
(i.e., making it a part of the decision support system/process). 

A graphical representation of the methodology used in this study is shown in Figure 1. 


264   Baha Sen and Emine Ucar  /  Procedia Technology   1  ( 2012 )  262 – 267 

 
Fig.1. A graphical illustration of the methodology employed in this study 

3.1. Data 
In this study 3047 records were used which is taken by Karabük University Computer Engineering Department. 
Dataset have students' information such as age, gender, type of secondary school graduation, whether the students 
study in distance education or regular education and their lesson scores. And also dataset has information about the 
lesson taken by students in vocational lessons or cultural lessons. 
 

Table 1. The list of independent variables used in this study 
Variable Name Data Type Description 
Gender Text Students’ gender 
Age Number Students’ age 
Type of High School Graduation Text Students’ high school type 
Distance/Regular Education Text Students’ education type 
Lesson Type Text Type of lessons 

 
Scores of students which are studying in Karabük University are represented by the letter system. Score ranges of 
these letters are shown in Table 2. 

Table 2. The output variable used in the study 
Raw-Score Nominal Representation 

    90-100 A1 
    80-89 A2 
    70-79 B1 
    65-69 B2 
    60-64 C 
    0-60 F 

 
3.2. Data Mining Methods 
In this study, two popular prediction/classification methods are used (and compared to each other): artificial neural 
networks, and decision trees. These prediction methods are selected because of their superior capability of modeling 
classification type prediction problems and their popularity in recently published data mining literature. What 
follows is a brief description of these modeling techniques. 

 Artificial Neural Networks: Artificial neural networks (or NN, in short ) are commonly known as 
biologically inspired mathematical techniques, capable of modeling extremely complex nonlinear functions 
[9]. In this study, we used a popular NN architecture called multilayer perceptron (MLP) with back-
propagation type supervised-learning algorithm. MLP is capable of producing both classification and 
regression type prediction models, where the only difference is the output variable being nominal or 
numeric for classification or regression estimations. MLP is shown to be a strong function approximator for 


265 Baha Sen and Emine Ucar  /  Procedia Technology   1  ( 2012 )  262 – 267 

prediction problems, that is, given the right size and the structure, MLP is shown to be capable of learning 
highly complex nonlinear relationships between input and output variables [10].  

 Decision Trees: As the name implies, this technique recursively separates observations in branches to 
construct a tree for the purpose of achieving the highest possible prediction accuracy. In doing so, different 
mathematical algorithms (e.g., information gain, Gini index, Chisquare statistics, etc.) are used to identify a 
variable (from the available variable pool) and the corresponding threshold for that variable to split the pool 
of observations into two or more subgroups. This step is repeated at each leaf node until the complete tree 
is constructed. The most popular decision tree algorithms include Quinlan's ID3, C4.5, C5 and Breiman’s 
CART (Classification and Regression Trees) algorithms. In this study, we choose to use Quinlan’s C5 
algorithm, which is an improved version of C4.5 (a very popular decision tree algorithm used by 
researchers and practitioners since early 1990s) [11, 12, 13]. 
 

4. Results and Conclusions 

The prediction results of the two modeling methods are presented in Table 3. The results presented in Table 3 are 
the 10-fold cross validation results. Since the output variable had six nominal values, the confusion matrixes show 
6x6 square matrix. In the confusion matrixes the rows represent the actual and the columns represent the predictions. 
The right most columns show the prediction accuracies for each of the six output variable values. The overall 
accuracy of each model is presented at the bottom of the right most columns.  

As the results indicate, all of the classification methods performed reasonably well in predicting the six-value 
nominal variable. Among the two model types, decision tree algorithms produced the best prediction results with 
97.8107% overall accuracy on 10 fold holdout dataset. Decision tree models followed by artificial neural networks 
with an overall accuracy of 94,3752%. 

Table 3. Prediction results for classification methods (presented in confusion matrixes) 
Artificial Neural Network    

 A1 A2 B1 B2 C F Accuracy 
A1 171 21 0 0 0 0  
A2 24 334 15 0 0 0  
B1 0 19 536 22 0 0  
B2 0 0 16 322 13 0  
C 0 0 0 0 519 18  
F 0 0 0 0 19 920  
      Overall 94.3752% 

Decision Trees    
 A1 A2 B1 B2 C F Accuracy 

A1 197 10 0 0 0 0  
A2 12 353 4 0 0 0  
B1 0 8 555 13 0 0  
B2 0 0 5 333 2 0  
C 0 0 0 0 531 3  
F 0 0 0 0 8 935  
      Overall 97.8107% 

 
The students' ages range from 18-38 and the success chart of students based on the age is shown in Figure 2. As 
show in the graph students' success rate has inverse ratio with students’ age and the success score decreases with 
increasing age.  


266   Baha Sen and Emine Ucar  /  Procedia Technology   1  ( 2012 )  262 – 267 

 
Fig. 2. Success graphic based on age 

Figure 3 shows that the students' success is much better in the distance education or formal education. When we 
analyzed the graphic we can see that the students' scores between 65-80 are studying in the distance education and 
the students' scores between 80-100 are studying in the formal education. Also the students’ scores less than 60 are 
the most in the distance education. 

 
Fig. 3. Success graphic based on the type of education 

Looking at the students’ school type, the students which come from vocational high school are the 5% of total. 
Therefore, as shown in figure 4 students are more successful in the cultural lessons than the vocational lessons. 

 
267 Baha Sen and Emine Ucar  /  Procedia Technology   1  ( 2012 )  262 – 267 

Fig. 4. Success graphic based on the type of lesson 

References 
1. A. P. Sanjeev ve J. M, Zytkow. “Discovering Enrollment Knowledge in University Databases,” 1th Conference on KDD (Montreal. 

20-21 August 1995), 246. 
2. K. Becker, C. Ghedini ve E.L. Terra, “Using KDD to analyze the impact of curriculum revisions in a Brazilian university,” SPIE 14th 

Annual International Conference (Orlando. April 2000), 412. 
3. J. Luan, “Data Mining, Knowledge Management in Higher Education, Potential Applications”, 42nd Associate of Institutional 

Research International Conference (Toronto,Canada: 2002), 1. 
4. Ş.Erdoğan, M. Timor, “A Data Mining Application in a Student Database,” Havacılık ve Uzay Dergisi. Cilt No 2,Sayı 2: 57-64, (July 

2005), 57. 
5. M.Vranić, D. Pintar, Z.Skoćır, “The Use of Data Mining in Education Environment,” ConTEL 2007 (Zagrep 13-15 June 2007), 243. 
6. Internet: Eğitim Sisteminin Genel Yapısı, http://www.meb.gov.tr/Stats/Apk2002/3_2.htm 
7. H. E.Koçer, “Web tabanlı uzaktan eğitim”, Yüksek Lisans Tezi, Selçuk Üniversitesi Fen Bilimleri Enstitüsü, Konya, 1-100 (2001)  
8. C. Shearer, “The CRISP-DM model: The new blueprint for data mining” Journal of DataWarehousing, (2000). 5: 13-22. 
9. S. Haykin, Neural Networks and Learning Machines (3rd Ed.). (2008).  New Jersey: Prentice Hall. 
10. K.Hornik,, M.Stinchcombe and H.White, “Universal approximation of an unknown mapping and its derivatives using multilayer 

feedforward network” Neural Networks, (1990). 3: 359-366. 
11. L.Quinlan, C4.5: Programs for machine learning, Morgan Kaufmann, (1993).  San Mateo, CA. 
12. L. Quinlan, “Induction of decision trees” Machine Learning, (1986).  1: 81–106. 
13. L.Breiman, J.H.Friedman, , R.A. Olshenm and C.J.Stone, Classification and regression trees, Wadsworth & Brooks/Cole Advanced 

Books & Software, (1984). Monterey, CA.