eWorkbook: a Computer Aided Assessment System

Gennaro Costagliola, Filomena Ferrucci, Vittorio Fuccella, Rocco Oliveto
Dipartimento di Matematica e Informatica, Università di Salerno
Via Ponte Don Melillo, I-84084 Fisciano (SA)
{gcostagliola, fferrucci, vfuccella, roliveto}@unisa.it

Abstract

Computer Aided Assessment (CAA) tools are more and more widely adopted in academic environments alongside other assessment means. In this paper we present a CAA Web application, named eWorkbook, which can be used for evaluating learners' knowledge by letting tutors create, and learners take, on-line tests based on multiple choice, multiple response and true/false question types. Its use is suitable within the academic environment in a blended learning approach, providing tutors with an additional assessment tool and learners with a means for distance self-assessment. In the paper, the main characteristics of the tool are presented together with the rationale behind them and an outline of the architectural design of the system.

1 Introduction

In blended learning, electronic means are combined with traditional didactics in order to train and assess the learners. Learning Management Systems (LMS), enhanced with collaborative environment support, and Computer Aided Assessment (CAA) tools are more and more widely adopted in academia. At the University of Salerno some systems and platforms have been tested to support blended learning. Even though some good existing systems with LMS capabilities, like OpenUSS (OpenUSS, 2005), Chef (Chef, 2005), and Sakai (Sakai, 2005), have been used, none of the tested assessment tools satisfied all of our needs: we needed an advanced assessment tool which could help the lecturers speed up the onerous task of assessing a large number of learners and could be easily integrated with the LMS systems already in use in our department.

A state of the art analysis undertaken at our department, which involved several lecturers and students, allowed us to identify the following important requirements for an effective environment for developing and using assessment tests:
• High reusability of the authored content.
• Didactics organized in courses and classes.
• Flexible access control system to the tests.
• Quality tracking for the authored content.
• Rich reporting section.

A project for a comprehensive Web-based assessment system, named eWorkbook, was then started. The system can be used for evaluating a learner's knowledge by letting tutors create, and learners take, on-line tests based on multiple choice, multiple response and true/false question types. Even though eWorkbook allows the creation of on-line tests for both assessment and self-assessment, it was planned above all for summative purposes. The questions are kept in a hierarchical, tree-structured database, organized in the same way as the file system of an operating system. In such a structure, the files can be thought of as questions, while the directories can be thought of as macroareas, which are containers of questions usually dealing with the same subject. A macroarea can in turn contain other macroareas. The tutors are free to organize the tree as they wish, e.g. keeping the questions of the same course in a macroarea and further splitting it according to the chapters they cover. Every item (a macroarea or a question) has an owner, which is the tutor who authored it.
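To make this organization concrete, the following minimal Java sketch models a tree of macroareas and questions, each with an owner; the class and field names are illustrative assumptions for this description and are not taken from the actual eWorkbook code base.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: the real eWorkbook schema is not published in this paper.
abstract class Item {
    final String name;
    final String owner;          // the tutor who authored the item
    Item(String name, String owner) { this.name = name; this.owner = owner; }
}

class Question extends Item {    // a leaf of the tree
    enum Type { MULTIPLE_CHOICE, MULTIPLE_RESPONSE, TRUE_FALSE }
    final Type type;
    final String text;
    Question(String name, String owner, Type type, String text) {
        super(name, owner); this.type = type; this.text = text;
    }
}

class Macroarea extends Item {   // an internal node: contains questions and sub-macroareas
    final List<Item> children = new ArrayList<>();
    Macroarea(String name, String owner) { super(name, owner); }
    void add(Item child) { children.add(child); }
}

class RepositoryDemo {
    public static void main(String[] args) {
        Macroarea root = new Macroarea("/", "admin");
        Macroarea course = new Macroarea("OperatingSystems", "gcostagliola");
        course.add(new Question("q1", "gcostagliola",
                Question.Type.TRUE_FALSE, "A process can own more than one thread."));
        root.add(course);
        System.out.println(root.children.size() + " top-level macroarea(s)");
    }
}
```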
The tutors can choose whether to share their questions or not, assigning a value to the permissions associated to each item. Permissions are for reading, writing and using the items. Some other information about the questions is present in the database, such as: difficulty, quality, lan- guage, keywords, number of times the question was selected for a test and expected time for a learner to answer. The tests are composed of one or more sections. This structure facilitates the selection of the ques- tions from the database, but it is still useful for the assessment, where it can be important to estab- lish if one section is more important then another to determine the final grade for the test. There are two kinds of sections: static and dynamic. The difference between them is in the way they allow question selection. For both the static and the dynamic sections, a macroarea in the question database must be speci- fied. For a static section, the questions are chosen directly from the sub-tree located by the specified macroarea. For a dynamic section, some selection parameters must be further specified, leaving the system to choose the questions randomly across the sub-tree located by the specified macroarea whenever a learner takes a test. Didactics are organized into courses and classes: the tutors responsible for a course, manage its class and choose the tests that must be taken by the learners of that class. There are two different lists of tests within the course interface: the valuable and the self assessment test lists. Each test in the former list is used to determine the learner’s evaluation, while the latter list is just a guide for the learner to self train and assess. Prerequisites and a maximum number of attempts can be defined only for the tests in the valuable list. Different assessment strategies can be bound to a test, when it is selected for the insertion in the valuable or self-assessment list of a course. The choice of an assessment strategy affects the way in which some parameters concur to determine the grade of the test. The parameters are the following: the weight of a question in the test, the number of distractors for a question (only for multiple choice and true/false), the weight of the distractors (only for multiple response), bonus and penalty factors. An assessment strategy is a configuration, that is, an assignment of values for the parame- ters above. Some configurations are preloaded in the system and are referred to as predefined as- sessment strategies. Other configurations can be defined by the tutors and saved in his/her reserved area. We will refer to them as customized assessment strategies. A complete history of learners’ performance on tests of the valuable list is available to the tutor and to the learners themselves. Each record in the history contains the date and the time when the learner has joined a test, the amount of time needed to finish the test and some information about assessment (test score and state). The detail of the answers to each question can be seen as well and can be viewed in a printer-friendly format. The rest of the paper is organized as follows. In Section 2 the main features of the systems are de- scribed in detail. Section 3 is devoted to outlining the architecture of eWorkbook. An example of system use can be found in Section 4. In Section 5, a comparison is made with some interesting sys- tems related to ours. Some final remarks and a description of future work conclude the paper. 
2 The Main Features of eWorkbook In the following subsections we will outline the main characteristics of the eWorkbook system. It is worth noting that eWorkbook was intended to be used by a large number of users, so it has a typical LMS didactics organization, based on courses and classes. A course is a place in which the tutors can publish tests and the learners can take them. Learners can only view the tests published in the courses in which they are members. The tutor manages the class and can accept or deny learners’ affiliation requests and expel a learner from the course. 3 2.1 Question Management An important matter for CAA, and more generally for e-learning, in order to accelerate the teaching and the assessment processes, is the reusability of the authored content. The on-line material needs a huge initial effort to be created, while it can be easily modified and reused later on. Therefore it is very important that existing material can be easily found, modified and selected by a tutor who wants to use it for a lesson or a test. There are two main ways to boost the reuse of learning mate- rial: 1. Good organization of material kept in an e-learning platform or CAA system. 2. Interoperability among systems and platforms, to share and exchange material. Our system was designed to have a well organized question database to facilitate the tutor in the question management, share and reuse: the question database of eWorkbook has a hierarchical structure, similar to the directory tree of an operating system. Each item in our database is a disci- plinary macroarea (internal node) or a question (leaf). The membership of a question to a given macroarea is determined by its subject: each macroarea is a container of questions that holds items dealing with a specific subject. It can be further split in other sub-macroareas, which hold questions belonging to a more specific matter. The question types allowed are multiple choice, multiple re- sponse and true/false. The tutor can choose if a question should be used for assessment only, for self-assessment only or for both of them. An effort for the interoperability has been made supporting the IMS Question & Test Interoperabil- ity specification (IMS QTI, 2005): our system can import and export information regarding ques- tions and tests through this widely known and adopted XML-based format. 2.1.1 Permissions Author’s right protection is an important matter too. An e-learning system should offer the tutor the choice to share his own material or not. In eWorkbook, the owner (the tutor who authored the ques- tion) and a permission set are associated to each item. The owner establishes the values for each field of the permission set. A permission is a Boolean value that indicates whether other users be- yond the owner can perform the action associated to that permission. For a macroarea, the value for the following permissions must be set: • ReadPermission: the permission to read the property and the contents of this macroarea. • WritePermission: the permission to overwrite the property and manage this macroarea (add a sub-item to it, delete it). • UsePermission: the permission to select a question from this macroarea for a test. For a question, the permissions are the following: • ReadPermission: the permission to read the question. • WritePermission: the permission to delete and overwrite the question. • UsePermission: the permission to select this question for a test presentation. 
Its default value is the value of the UsePermission of the macroarea to which this question belongs.

It is worth noting that permissions are a good way to protect authors' rights and to prevent the material owned by a tutor from being modified or used without his/her consent. Other systems only give the possibility to share either all of a tutor's questions or none of them. A permission-based approach gives more flexibility to the system, allowing different degrees of item sharing.

2.1.2 Question Metadata

Each question in the database has a metadata set associated to it. Some of the parameters are decided by the tutor when he/she instantiates the metadata and can be updated later; others are inferred by the system during its use. Inferred metadata are updated whenever a learner submits a test. Metadata are used in question selection, in a way that will be made clear in the sequel. The following is a list of the metadata fields:
• Language: the human language in which the question is expressed.
• Keywords: a set of keywords that describe the content of the question.
• Use: the aims the question is intended for. It can be self-assessment, valuable or both.
• TestOccurrence: an inferred field, increased by one whenever this question is scheduled for a test.
• AverageAnswerTime: an inferred field. It can be computed because our system is able to track the time spent by the learner on each question.
• Difficulty: this field has both an inferred and a tutor-chosen value. It is a value between 0 and 1 that expresses a measure of the difficulty of the question, intended as the proportion of learners who get the question correct. The tutor can guess this value at question creation time and can update it during the question's lifecycle. The system calculates the inferred value with a simple formula.
• Quality: this field is an inferred one. Its value is a measure of how well this question discriminates between learners. A good question should give full marks to good learners and penalize bad ones. Starting from this information, a great deal of criteria can be adopted. A solution is proposed in (Lira et al., 1990): it identifies a good question as one which the better 20% of learners answer well and the worse 20% of learners answer incorrectly. We adopted a common solution applied in Item Analysis, calculating quality as the Pearson correlation between the score achieved on the question and the total score achieved on the test in which the question was scheduled. Its value is given by the following formula:

r = \frac{\sum (x - \bar{x})(y - \bar{y}) / (n - 1)}{\sqrt{\sum (x - \bar{x})^2 / (n - 1)} \, \sqrt{\sum (y - \bar{y})^2 / (n - 1)}}

where the following rules hold:
o -1 ≤ r ≤ 1,
o x is the series of the scores obtained on the question,
o y is the series of the scores obtained on the whole test.
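To make the two inferred indexes concrete, the following self-contained Java sketch computes a difficulty value (the proportion of correct answers, as defined above) and the quality index as the Pearson correlation between question scores and test scores; the method and variable names are our own illustrative choices, not eWorkbook identifiers.

```java
import java.util.Arrays;

public class ItemStatistics {

    /** Difficulty as defined above: the proportion of learners answering the question correctly. */
    static double difficulty(boolean[] correct) {
        long right = 0;
        for (boolean c : correct) if (c) right++;
        return (double) right / correct.length;
    }

    /** Quality as the Pearson correlation between question scores (x) and total test scores (y). */
    static double quality(double[] x, double[] y) {
        int n = x.length;
        double meanX = Arrays.stream(x).average().orElse(0);
        double meanY = Arrays.stream(y).average().orElse(0);
        double cov = 0, varX = 0, varY = 0;
        for (int i = 0; i < n; i++) {
            cov  += (x[i] - meanX) * (y[i] - meanY);
            varX += (x[i] - meanX) * (x[i] - meanX);
            varY += (y[i] - meanY) * (y[i] - meanY);
        }
        // The (n-1) factors in the formula cancel between numerator and denominator.
        return cov / Math.sqrt(varX * varY);
    }

    public static void main(String[] args) {
        boolean[] answeredCorrectly = {true, false, true, true, false, true};
        double[] questionScores     = {1, 0, 1, 1, 0, 1};        // score on the question
        double[] testScores         = {28, 12, 25, 30, 15, 22};  // total score on the test
        System.out.printf("difficulty = %.2f, quality = %.2f%n",
                difficulty(answeredCorrectly), quality(questionScores, testScores));
    }
}
```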
2.1.3 Question Quality Improvement Through the Question Lifecycle

In CAA systems it is important that the quality of the questions is kept high, so that the tutor can assess learners properly, using unambiguous questions that really distinguish between good learners and bad ones. eWorkbook adopts the statistical indexes from Item Analysis (Difficulty and Quality, described in the previous subsection) to obtain information about the effectiveness of the questions.

The improvement of question quality requires a process which allows the tutor to analyze the entire lifecycle of a question, including all its previous versions and the learners' answers to them. Our question database has a Version Control System that allows tutors to change some data of a question, e.g. its text, distractors or metadata, while keeping the previous versions: the upgrade of a question does not imply the erasing of the previous version. This is an important feature also for reasons bound to the history of learners' responses to the question: the question could already have been used in some tests before the upgrade, and the system has to remember which version of the question the learner answered. Above all, however, the Version Control System is important for reasons related to the quality of the questions: thanks to the tracking of the question lifecycle, the tutor gets feedback on the variation of the statistical indexes over time. In this way, the tutor can modulate the difficulty of the question and make sure that the changes he/she made to it (for instance, eliminating misspellings and ambiguities) positively affected the quality of the question.

Other information useful to establish the effectiveness of a question is available too: the tutor can easily inspect how many times it was selected to be presented in a test, the number and percentage of correct, incorrect and blank responses, and the average time needed to answer it.

In the light of the previous arguments, we can argue that the definition and use of questions from the hierarchical repository across more than one test session, combined with the Version Control System, allows the tutors to have a wide choice of high quality questions to select for their on-line tests.

2.2 Test Management

A test is composed of sections. eWorkbook has two ways of selecting the questions to be presented in a test: through a static creation-time choice or a dynamic run-time one. In the first case, the tutor chooses the questions directly during the creation of the test; in the latter case, he/she only has to specify some selection parameters, letting the system choose the questions randomly across the chosen macroareas whenever a learner takes a test.

Therefore, we have two kinds of sections: a static section is an explicit selection of the questions to present, performed at test creation time, while a dynamic section is a set of rules that performs a selection on the entire database. For a dynamic section, there are three kinds of selection rules:
1. Definition of a path in the tree. The path must start with a '/' character, which identifies the root of the tree. This rule limits the selection to the questions of the subtree identified by the path. A flag can be set that further restricts the selection to the questions at the first level of the subtree, without descending into sub-macroareas.
2. Definition of some keywords. This rule limits the selection to the questions that match the input keywords. Some logical connectors, in a search-engine style, can be used. By default, the questions which contain even one of the input keywords are selected. No relevance rate is associated to the results.
3. Definition of some assertions on metadata fields. They are of the form <metadata field> <comparison operator> <value>. As an example, for a section we can choose to use only those questions that have difficulty > 0.5.

The same three rules are also used to statically select the questions for a static section, through a wizard in the Web-based interface. The tutors can choose to use just one of them to select the questions, or to combine them to refine or enlarge the selection. The tutors can also choose whether to use only their own material or also the material shared by the other tutors. A sketch of how these rules can be combined on the question tree is given below.
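The following self-contained Java fragment sketches how the three kinds of selection rules could be combined into a single filter over a flattened view of the question tree; the paths, keywords and difficulty assertion mirror the rules above, while the record and method names are assumptions made for the example.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Self-contained illustration of the three dynamic-section rules described above.
// Field and method names are assumptions made for this sketch, not eWorkbook identifiers.
public class DynamicSectionDemo {

    record QuestionRecord(String path, Set<String> keywords, double difficulty) {}

    /** Rule 1: restrict to the subtree identified by a path such as "/EnglishTest". */
    static boolean inSubtree(QuestionRecord q, String macroareaPath) {
        return q.path().startsWith(macroareaPath);
    }

    /** Rule 2: keep questions containing at least one of the input keywords (logical OR). */
    static boolean matchesKeywords(QuestionRecord q, Set<String> wanted) {
        return wanted.stream().anyMatch(q.keywords()::contains);
    }

    /** Rule 3: an assertion on a metadata field, e.g. difficulty > 0.5. */
    static boolean matchesAssertion(QuestionRecord q, double minDifficulty) {
        return q.difficulty() > minDifficulty;
    }

    public static void main(String[] args) {
        List<QuestionRecord> db = List.of(
            new QuestionRecord("/EnglishTest/Passage3/q1", Set.of("reading", "grammar"), 0.7),
            new QuestionRecord("/EnglishTest/Passage3/q2", Set.of("vocabulary"), 0.3),
            new QuestionRecord("/Databases/q7", Set.of("sql"), 0.9));

        List<QuestionRecord> selected = db.stream()
            .filter(q -> inSubtree(q, "/EnglishTest"))
            .filter(q -> matchesKeywords(q, Set.of("reading", "vocabulary")))
            .filter(q -> matchesAssertion(q, 0.5))
            .collect(Collectors.toList());

        System.out.println(selected); // only q1 satisfies all three rules
    }
}
```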
These rules allowed us to overcome problems related to question selection: a different test can be generated for each learner while still obtaining an objective assessment, through the selection of ranges for the difficulty and the average answer time. We decided not to use the discrimination index in question selection assertions, in order to avoid low quality questions being systematically neglected. Our policy was rather to encourage the tutor to review low quality questions, in order to correct their anomalies and increase their quality.

2.3 Test Presentation

Two different lists of tests are presented to the learner within the course interface: the valuable and the self-assessment test lists. Each test in the former list is used to determine the learner's evaluation and is characterized by an access control specified by a prerequisite expression and a maximum number of attempts. The latter list is just a guide for the learner to self-train and self-assess: the tests in it have no access restrictions and do not affect the learner's evaluation.

Each test presented in a course is bound to some test execution options. These options allow the tutor to customize the test with further information which could not be available or decided at test creation time, so we chose not to hard-code them in the test. Test execution options include the following information:
• IP Limitation: an option through which the tutor can authorize or deny access to some clients, according to their IP. A selection of authorized IP lists must be chosen. This option can be particularly useful for official exams, whose tests are required to be taken only by the learners physically present in a laboratory. An IP list can be defined and selected for all the PCs of that laboratory. Wildcards and IP ranges can help to define IP lists.
• Assessment: a list of options that specify the numeric scale for the mark, the threshold to pass the test and the marking strategy. Details about marking strategies can be found in Section 2.4.
• Shuffle: this Boolean option can be checked if the tutor wants to randomize the sequence of the questions, to make it more difficult for the learners to cheat.
• Access Control: this section of options is valid only for valuable tests. The tutor can choose the maximum number of attempts allowed for the test and the prerequisites for accessing it. Prerequisites establish, through a simple yet powerful expression, the learner's right to access the test. If not fulfilled, prerequisites deny the learner access to the test. Prerequisites for a test are based on the learner's results on the previous tests in the valuable test list. The language supported for the expression is aicc_script; a string expressed in such a language has a Boolean value and is composed of the following elements:
o Identifiers: names that univocally identify a test in the valuable list.
o Constants: values that define the state of a test (passed, completed, browsed, failed, not attempted, incomplete).
o Logic, equality and inequality operators.
o A special syntax to define a set and to require at least n elements from a set.
As an example, the expression test1 & 2*{test2, test3, test4} is true if the state of test1 is passed or completed and at least two among test2, test3 and test4 are passed or completed. A simple visual interface helps the tutor to define the prerequisite string without knowing the aicc_script language. There is also an aicc_script-to-natural-language translator to help the learner better understand the prerequisites for a test. A better and more complete explanation of aicc_script can be found in (ADL, 2001); a sketch of how such an expression can be evaluated is given below.
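To illustrate the semantics of such prerequisite expressions, the following Java snippet evaluates the example expression test1 & 2*{test2, test3, test4} against a hypothetical map of test states; it sketches only the evaluation logic, not eWorkbook's actual aicc_script parser.

```java
import java.util.List;
import java.util.Map;

// Sketch of the evaluation of the example prerequisite expression
//   test1 & 2*{test2, test3, test4}
// The parser itself is omitted: the already-parsed structure is evaluated by hand.
public class PrerequisiteDemo {

    /** A test counts as satisfied when its state is "passed" or "completed". */
    static boolean satisfied(Map<String, String> states, String testId) {
        String s = states.get(testId);
        return "passed".equals(s) || "completed".equals(s);
    }

    /** The set form n*{a, b, c}: at least n of the listed tests must be satisfied. */
    static boolean atLeast(Map<String, String> states, int n, List<String> testIds) {
        return testIds.stream().filter(id -> satisfied(states, id)).count() >= n;
    }

    public static void main(String[] args) {
        Map<String, String> states = Map.of(
            "test1", "passed",
            "test2", "completed",
            "test3", "failed",
            "test4", "passed");

        boolean allowed = satisfied(states, "test1")
                && atLeast(states, 2, List.of("test2", "test3", "test4"));

        System.out.println("Access to the test allowed: " + allowed); // true
    }
}
```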
An instance of test execution options is a configuration that can be saved with a name and recalled at a later time, whenever a new test must be added.

2.4 Assessment Strategies

eWorkbook provides a wide choice of predefined assessment strategies and the possibility to define new customized assessment strategies. An assessment strategy is a set of choices of the values to give to some parameters taken into account during the test assessment process. The predefined strategies are preloaded in the system and cannot be changed; they are at the disposal of all the tutors. The customized strategies can be defined by a tutor, and they remain visible only in his/her reserved area.

All the strategies calculate the final mark on the test by summing the results achieved on the single questions. The maximum mark which can be obtained on a single question depends on the weight of the question. A weight is assigned by the tutor to each section of questions in a test, and the weight of a question is calculated by dividing the weight of its section by the number of questions in it. The customizable parameters are the following:
• Weighting: this parameter, if set, enables the weighted assessment for a test, that is, the maximum mark obtained on a question depends on its weight. If a tutor wants a section to be more important than the others, he/she has to give a higher weight to it during the test authoring, and he/she has to choose an assessment strategy with the weighting parameter on. If this parameter is not set, all the questions contribute equally to the mark on the whole test.
• BonusOnCorrect: this parameter, if set, allows the tutor to specify a positive real factor (bonus) by which the mark obtained on the correctly answered questions is multiplied during the assessment process.
• PenaltyOnIncorrect: this parameter, if set, allows the tutor to specify a negative real factor (penalty) by which the weight of the incorrectly answered questions is multiplied during the assessment process. If not set, the mark obtained on the questions answered incorrectly is zero. It is possible to choose a fair penalty, which gives the questions answered incorrectly a mark of -1/(NC-1), where NC is the number of choices for the question. The use of the fair penalty should set to zero the expected mark on a question guessed at random by a learner who does not know the right answer to it.
• PenaltyOnNotAnswered: this parameter, if set, allows the tutor to specify a negative real factor (penalty) by which the weight of the unanswered questions is multiplied during the assessment process. If not set, the mark obtained on the unanswered questions is zero.

The following table summarizes the values given to the parameters above for each predefined strategy.

Strategy Name               | Weighted | BonusOnCorrect | PenaltyOnIncorrect | PenaltyOnNotAnswered
NumberCorrect               | NO       | NO             | NO                 | NO
WeightedNumberCorrect       | YES      | NO             | NO                 | NO
GuessingPenalty             | NO       | NO             | YES (1)            | NO
WeightedGuessingPenalty     | YES      | NO             | YES (1)            | NO
GuessingFairPenalty         | NO       | NO             | Fair               | NO
WeightedGuessingFairPenalty | YES      | NO             | Fair               | NO

The names of the strategies have been taken from (IMS ASI, 2004). As can be seen, for each strategy there is a weighted version. None of the predefined strategies adopts a bonus on correct answers or a penalty on unanswered questions.

NumberCorrect is a 'plain' strategy: none of the parameters is set. Its name is due to the way in which it calculates the mark on the whole test: by simply summing the number of correct answers (and scaling the result to 30 or 100). GuessingPenalty and its weighted version WeightedGuessingPenalty use 1 as the factor for the PenaltyOnIncorrect parameter. This means that they subtract the entire weight of the incorrectly answered questions from the final mark on the test. GuessingFairPenalty and its weighted version WeightedGuessingFairPenalty use the fair penalty explained before.
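As an illustration of how these parameters interact, the following Java sketch scores a single question under a weighted strategy with the fair penalty enabled (roughly the behaviour of WeightedGuessingFairPenalty); the class and method names are ours, and the real assessment module may organize the computation differently.

```java
// Illustrative scoring of one question under a weighted, fair-penalty strategy.
// Not eWorkbook code: parameter handling is simplified to show the arithmetic only.
public class StrategyDemo {

    enum Outcome { CORRECT, INCORRECT, NOT_ANSWERED }

    /**
     * @param weight  weight of the question (section weight divided by questions in the section)
     * @param choices number of choices (NC), used by the fair penalty
     */
    static double score(Outcome outcome, double weight, int choices,
                        double bonusOnCorrect, boolean fairPenalty) {
        switch (outcome) {
            case CORRECT:
                return weight * bonusOnCorrect;          // bonus factor is 1.0 when unset
            case INCORRECT:
                return fairPenalty ? -weight / (choices - 1) : 0.0;
            default:
                return 0.0;                              // no penalty on unanswered questions
        }
    }

    public static void main(String[] args) {
        double weight = 2.5;  // e.g. a 10-point section containing 4 questions
        System.out.println(score(Outcome.CORRECT, weight, 4, 1.0, true));       //  2.5
        System.out.println(score(Outcome.INCORRECT, weight, 4, 1.0, true));     // -0.8333...
        System.out.println(score(Outcome.NOT_ANSWERED, weight, 4, 1.0, true));  //  0.0
    }
}
```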
2.5 History Tracking

A complete history of a learner's performance on the tests of the valuable list is available to the tutor and to the learner himself/herself. The tutor can view the results achieved by all the learners in his/her classes, while the learner's view is restricted to his/her own results. Each record in the history contains the date and time when the learner took a test, the amount of time needed to finish the test and information about the assessment (test score and state).

To consult the history, a search-engine-style form must be filled in. The fields of the form allow the seeker to select a course, a learner and a test whose instances must be shown. Further advanced parameters, which allow the search to be narrowed, are: the state (terminated, not terminated) and the result (passed, not passed) of the test, a date range during which the test was taken, and the number of results per page. Each instance present in the result pages has a link to a PDF file that contains a printable version of the test with all the learner's answers. A single PDF file for all the instances is available as well; in this way, all the tests can be saved or printed in one operation.

3 eWorkbook Architecture

As shown in Figure 1, eWorkbook has a layered architecture. The Jakarta Struts framework (Struts, 2005) has been used to support the Model 2 design paradigm, a variation of the classic Model View Controller (MVC) approach. Struts provides its own Controller component and integrates with other technologies to provide the Model and the View. In our design, Struts works with Java Server Pages (JSP, 2005) for the View, while it interacts with Hibernate (Hibernate, 2005), a powerful framework for object/relational persistence and query service for Java, for the Model.

The application is fully accessible with a Web browser. Navigation is facilitated by simple interfaces based on menus and navigation bars. User data entry is done through HTML forms, and some data integrity checks are performed on the forms using JavaScript code, to lighten the server-side processing. A significant effort was made to limit the use of client-side scripts to the standard ECMAScript language (ECMAScript, 2005). No browser plug-in installation is needed. It is worth noting that the system has been tested on recent versions of the most common browsers (i.e., Internet Explorer, Netscape Navigator, Firefox and Opera).

Figure 1 - Architecture of eWorkbook

The Web browser interacts with the Struts Servlet, which processes the request and dispatches it to the Action Class responsible for serving it, according to the predefined configuration. It is worth noting that the Struts Servlet uses the JSP pages to implement the user interfaces. The Action Classes interact with the modules of the Business Layer, responsible for the logic of the application.
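The request-handling flow just described can be pictured with a minimal Action class in the Struts 1.x style; this is a generic illustration of the Model 2 pattern, with a hypothetical ShowTestAction and a stubbed business-layer facade, and it is not code taken from eWorkbook.

```java
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import org.apache.struts.action.Action;
import org.apache.struts.action.ActionForm;
import org.apache.struts.action.ActionForward;
import org.apache.struts.action.ActionMapping;

// Hypothetical Action class in the Struts 1.x style described above.
// The business-layer facade (TestManager) and the forward name are illustrative only.
public class ShowTestAction extends Action {

    @Override
    public ActionForward execute(ActionMapping mapping, ActionForm form,
                                 HttpServletRequest request,
                                 HttpServletResponse response) throws Exception {
        String testId = request.getParameter("testId");

        // Delegate to the Business Layer, keeping the Action free of
        // presentation and persistence details.
        Object test = TestManager.getInstance().loadTest(testId);
        request.setAttribute("test", test);

        // The Struts Servlet forwards to the JSP page configured under "success".
        return mapping.findForward("success");
    }
}

// Fictitious business-layer facade, stubbed so the example is self-contained.
class TestManager {
    private static final TestManager INSTANCE = new TestManager();
    static TestManager getInstance() { return INSTANCE; }
    Object loadTest(String id) { return "test " + id; }
}
```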
The Business Layer accesses to the Data Layer, implemented through a Relational Data Base Manage- ment System (RDBMS), to persist the data across the functionalities provided by Hibernate frame- work. 3.1 Controller Layer This layer has many duties, among which are: getting client inputs, dispatching the request to the appropriate component and managing the view to return as a response to the client. Obviously, the Controller layer can have many other duties, but those mentioned above are the main ones. In our application, following the Struts architecture, the main component of the Controller layer is the Struts Servlet, which represents the centralized control point of the Web application. In particu- lar, the Struts Servlet processes each client request and delegates the management of the request to a helper class, that is able to execute the operation related to the required action. In Struts, the helper 9 class is implemented by an Action Class, that can be considered as a bridge between a client-side action and an operation of the business logic of the application. When the Action Class terminates its task, it returns the control to the Struts Servlet that performs a forward action to the appropriate JSP page, according to the predefined configuration. To reduce the effort to maintain and customize the application, we chose to limit the use of the JAVA code in the JSP pages, using as an alternative the Struts taglibs. In this way the Web design- ers are able to work on the page layouts without shouldering the programming aspects. Finally, thanks to the use of the Struts framework, eWorkbook has the complete support for the internation- alization of the Web-based interface. Even if, in its earlier releases, it only came with the English and Italian versions, the translation is quite an easy duty: to add a new language version all that our system needs is the translation of some phrases in a .properties (plain text) file. The Web pages are returned to Web browsers in the language specified in the header of the request. 3.2 Business Layer This layer contains the business logic of the application. In any medium-sized or big-sized Web ap- plication, it is very important to separate the presentation from the business logic, so that the appli- cation is not closely bound to a specific type of presentation. Adopting this trick, the effort to change the look & feel of eWorkbook is limited to the development of a new user interface (JSP pages), without affecting the implementation of the other components of the architecture. As mentioned before, every Action Class of the Controller Layer is able to execute an operation of the business logic of the application. To this aim, the Action Classes interact with four different subsystem of the Business Layer (see Figure 1). These subsystems are: 1. User Management Subsystem (UMS): this subsystem is responsible for user management. In particular, it provides insert, update and delete facilities. 2. Question Management Subsystem (QMS): this subsystem manages the question database of eWorkbook and controls access to it. It is composed of two modules: a. Question Database Manager: this module allows the management of the hierarchical structure of the question database. Each internal node in it is a disciplinary macroarea, while each leaf is a question. This module allows the insertion, update and deletion of a macroarea and/or a question from the database. b. Access Permission Manager: this module controls access to the question database. 
For each node of the question tree it is necessary to specify the owner (i.e., the tutor who authored the macroarea or the question) and a permission set. The owner establishes the value for each field of the permission set.
3. Test Management Subsystem (TMS): this subsystem manages the test repository of eWorkbook. To achieve this, we have divided this subsystem into four modules:
a. Authoring Manager: this module permits the creation of a new test, defining the questions that compose the test and the test execution options. The Authoring Manager also allows the publishing of an existing test in one or more courses;
b. Assessment Manager: this module performs the test evaluation and manages the assessment strategies;
c. Execution Manager: this module manages the test execution. To achieve this, the Execution Manager gets a test instance from the Authoring Manager and performs the necessary operations to present it to the user. At the end of the test execution this module passes control to the Assessment Manager to evaluate the test;
d. History Manager: this module manages the history of learners' performances and of test executions.
4. Course Management Subsystem: this subsystem manages the courses. In particular, it allows the insertion, update and deletion of a course.

It is worth noting that all the subsystems described above access one or more business objects to manipulate the information stored in the database. The Hibernate framework is used to manage those business objects, which access the data layer through an appropriate mapping. The aim of this mapping is to turn the relational database (stored in the data layer) into a light object-oriented database; in this way it is possible to manage the data exploiting the advantages provided by the OO paradigm.

3.3 Data Layer

This layer contains the information stored in an RDBMS. It is worth noting that eWorkbook is not closely bound to a specific RDBMS, but supports most of the popular RDBMSs (e.g., MySQL (MySQL, 2005), Firebird (Firebird, 2005), etc.). All that eWorkbook needs, to be used with a different RDBMS, is the modification of the connection URL in the Hibernate configuration file: the creation and initialization of the DB is an automatic process.
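The decoupling from the concrete RDBMS can be visualized with a generic Hibernate usage fragment; the Question entity, its mapping and the surrounding demo class are assumptions made for illustration, while the Configuration/SessionFactory/Session calls are the standard Hibernate API the paper refers to.

```java
import org.hibernate.Session;
import org.hibernate.SessionFactory;
import org.hibernate.Transaction;
import org.hibernate.cfg.Configuration;

// Generic Hibernate usage, in the spirit of the Data Layer described above.
// Only the connection URL in hibernate.cfg.xml has to change to move to another RDBMS.
public class PersistenceDemo {

    public static void main(String[] args) {
        // Reads hibernate.cfg.xml (dialect, connection URL, mapped classes) from the classpath.
        SessionFactory factory = new Configuration().configure().buildSessionFactory();

        Session session = factory.openSession();
        Transaction tx = session.beginTransaction();
        try {
            Question q = new Question();           // a mapped business object (illustrative)
            q.setText("Is eWorkbook released under the GNU GPL?");
            session.save(q);                       // Hibernate generates the proper SQL INSERT
            tx.commit();
        } catch (RuntimeException e) {
            tx.rollback();
            throw e;
        } finally {
            session.close();
        }
        factory.close();
    }
}

// Minimal illustrative POJO; the corresponding Hibernate mapping is assumed to exist.
class Question {
    private Long id;
    private String text;
    public Long getId() { return id; }
    public void setId(Long id) { this.id = id; }
    public String getText() { return text; }
    public void setText(String text) { this.text = text; }
}
```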
4 An Example: The English Knowledge Test

eWorkbook was installed on the Web server of the Faculty and successfully tested for the latest sessions of the English Knowledge Test, which is mandatory in our university. In our faculty, the system was used to replace the traditional oral exam with an on-line structured test, more suitable for assessing a large number of students.

The test is aimed at evaluating learners' reading comprehension. The syllabus of the exam is composed of twenty passages taken from the textbooks of some ordinary exams. On the day of the exam each learner takes a randomly chosen passage on which his/her test is based. The time to complete the test is fifteen minutes, during which the student has to answer twelve questions. A sixty-seat laboratory is available for the exams, an adequate number of users to test the system in a typical academic usage scenario.

4.1 Question and Test Authoring

In eWorkbook, the tutors can edit the question database through a simple visual Web-based interface, quite similar to the file browser that allows an operating system user to edit the file system structure. As shown in Figure 2, the interface is split in two views: one on the left, which shows the question database tree, and one on the right, which shows the attributes of the selected item in an HTML form, so that they can be easily changed. Every sub-tree in the left view can be expanded or collapsed using the '+' and '-' image controls next to the macroarea icon. A set of buttons, shown in a toolbar, allows the tutor to execute various tasks on the items. Each user views only the macroareas on which he/she has the UsePermission set to true. If an action is not allowed, the corresponding button is greyed out.

Figure 2 - A Screenshot of the Question Database Structure

The publication of a question in the database can be done through a wizard interface provided by our system. The wizard consists of a sequence of screens where the tutor must insert the question, the distractors to the question, the metadata and some assessment information. The publication of a whole question bank is possible too: it is done by importing the question definitions from a text file or an XML text expressed in an IMS QTI (IMS QTI, 2005) conformant format.

A new macroarea, named English Test, was added to the root of the tree. A new course with the same name was activated as well. In the macroarea English Test, twenty sub-macroareas (one for each passage) were added, and several questions were added to each of them. All the permissions for the newly added macroareas and questions were left at their default values.

A new test was created for each passage. Every test is composed of three sections of four questions each, with difficulty increasing across them: an easy section containing four multiple choice questions with difficulty between 0 and 0.5; a medium one containing four multiple choice questions with difficulty between 0.2 and 0.8; and a difficult one containing four multiple response questions with difficulty between 0.3 and 1. All the tests were added to the valuable list of the English Test course, limiting the execution of the tests to the computers with an IP address in the range of the laboratory in which the exam takes place. The same test list was also published in the self-assessment section. To encourage the students to train, a small part of the questions used for the exam was also used for the self-assessment tests. A screenshot summarizing the test's features is shown in Figure 3.

Figure 3 - A Screenshot of the Test Details

4.2 Assessment Policy and Test Results

The WeightedNumberCorrect assessment strategy has been chosen to evaluate the tests: the easy, medium and difficult sections have been given, respectively, 25%, 35% and 40% of the total score. The score is calculated on a /30 scale, with 18 as the passing threshold. In this way, we consider a student worthy of passing the exam if he/she gets all of the easy and medium questions and just one of the difficult ones.

Figure 4 - A Screenshot of the Test Execution

Figure 5 - The Test Pdf Format

All the students interested in taking the exam are asked to obtain an account on the system some days before the exam itself. Once the learner starts a test, a timer starts to measure the time he/she spends on that attempt. If he/she has not delivered it before, he/she must deliver the test when the timer expires. The time spent on each question is recorded as well. Once the test is delivered, a table summarizing the test results is shown.
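For concreteness, under this policy, and assuming that each section's share of the 30-point scale is split evenly over its four questions, the per-question marks and the score of a student who answers all easy and medium questions plus one difficult question are:

```latex
% Per-question marks under WeightedNumberCorrect on a /30 scale (illustrative breakdown)
\text{easy: } \tfrac{0.25 \cdot 30}{4} = 1.875, \qquad
\text{medium: } \tfrac{0.35 \cdot 30}{4} = 2.625, \qquad
\text{difficult: } \tfrac{0.40 \cdot 30}{4} = 3.0
% All easy and medium questions plus one difficult question answered correctly:
4 \cdot 1.875 + 4 \cdot 2.625 + 1 \cdot 3.0 = 7.5 + 10.5 + 3.0 = 21 \ge 18 \quad (\text{pass})
```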
Two screenshots of the test execution and some pages of the test pdf format are shown, respectively, in Figure 4 and Figure 5. At the moment, several exam sessions have been done. The mean pass rate is between 60% and 70% of the students. Some items with poor discrimination have been modified through the sessions. We finally got good discrimination on most of the questions. 5 Related Work Several different assessment tools and applications to support blended learning have been analyzed, starting from the most common Web-based e-learning platforms such as WebCT 4.1 Campus Edi- tion (WebCT, 2005), Blackboard 6 (Blackboard, 2005), Click2Learn Aspen 2.0 (Aspen, 2005), EduSystem (EduSystem, 2005), and The Learning Manager 3.2 (The Learning Manager, 2005). The analysis has been carried out both by exercising the systems and by studying literature surveys and benchmark analyses (EduTools, 2005). Special emphasis has been placed on evaluating the existing systems with respect to the requirements identified in the previous section. In the CAA literature we can find two main categories of assessment systems: those which auto- matically generate questions from the lecture material, and those which make use of a pre-populated question database from which questions are chosen randomly. The first kind of systems, often re- quires the prior creation of a knowledge structure, like a concept graph or an ontology, as for the system described in (McAlpine, 2005). Other systems of this type (Mitkov, 2003) use Natural Lan- guage Parsing to extract information from a text and generate the questions. Using these techniques, 14 it is hard to bet on the good quality or readability of the generated questions. Such drawbacks often relegate the use of this kind of systems only to experimental purposes. The systems which involve the tutor in the task of creating a set of questions to be stored in a data- base, prove to be more reliable and consequently are used more for official exams, in order to ob- tain an objective assessment. Those systems, such as the ones described in (Li & Sambasivam, 2003) and in (Lister & Jerram, 2001), sometimes use an XML test configuration file to define some rules for the question selection. In question database based systems, the challenge is to give a good organization to the database, to avoid question replication, and to use a good question selection pro- cedure in order to assess learners’ skills on the desired subjects. Some systems, like Claroline (Cla- roline, 2005) just use a plain container to keep questions. In Moodle (Moodle, 2005) and (Capuano et al., 2003), the question database is partitioned in sets, often called categories or macroareas, in order to have a per-subject organization of the questions. In (McGough et al., 2001) and (Gusev & Armenski, 2002) a hierarchically structured organization of the database is exploited. In (McGough et al., 2001), a tree is associated to a lesson and each of its branches is used for assessing learners on a part of the lesson. A leaf in this tree is a set of questions. In (Gusev & Armenski, 2002) a more complete but complex system is described, where questions are classified exploiting similarities among them. Only a few systems adopt some kinds of author’s right protection. Claroline and Moodle let the tu- tor choose whether to make his/her questions visible to other tutors or not. Few systems among the analyzed ones have some forms of quality control of the questions. 
An in- teresting feature is the opportunity to judge a question or a test analyzing the learners’ responses to it. Starting from this information, many criteria can be adopted. In particular, Hicks (Hicks, 2002), reporting his experience with a large class at the University of Newcastle upon Tyne, identifies a good question as the one to which the better 20% of learners answers well and the worse 20% of learners answers incorrectly. In (Lira et al., 1990) the degree of difficulty of a test is calculated us- ing the maximum possible (max) and minimum possible (min) score and the average score (avg) of the class according to the following formula: ((avg – min) / (max – avg)) * 100. eWorkbook has a complete tracking system to judge the quality of a question: every time a signifi- cant change is made to a question, a new version of it is generated. For each version of the question, all the history of the learner’s answers is kept. From a statistical analysis, explained in detail in sec- tion 3.1.4, we can guess the quality of the question and its improvement over time. The attempt to judge difficulty and quality of question items is not a new subject. Two main theories are notewor- thy: Item Analysis and Item Response Theory (Hambleton & Swaminathan, 1985). Unfortunately, it is quite uncommon to find an assessment system that uses one of the effectively. Some explanations and a comparison between them can be found in (Fan, 1998). As for question selection from a large database to compose tests, two algorithms were analyzed: the proposals of (Sun, 2000) and (Hwang et al., 2006).The former is aimed at constructing tests with similar difficulties. The difficulty is calculated using Item Response Theory model. The latter takes into account other parameters too, such as discrimination degree, length of the test time, number of test items and specified distribution of concept weights. Most of the analyzed systems are complete LMS. The assessment tool is an integral part of them. eWorkbook was thought to be used by the large number of users of our university, so we gave to the didactics an organization in courses and classes, to support multiple channels in which to publish the tests. As for a means for sequencing and control access to the tests, none of the tools analyzed has a flexi- ble system. The system described in (Li & Sambasivam., 2003) permits the learner to sit an exam many times, until a minimum acceptable score is achieved. In (McGough et al., 2001) the questions are grouped into sets, and the strategy to pass a set, and consequently access to the next, is to give the correct response to 3 answers in a row for that set. 15 6 Conclusions and Future Work In the paper, we have presented eWorkbook, a system for the creation and deployment of assess- ment and self-assessment tests. The proposed system can significantly accelerate the assessment process, thanks to the reusability of the authored content. We achieved reusability allowing the tu- tors to share their questions with other tutors and adopting a hierarchical subject-based question da- tabase. Such an organization makes it easier to find, modify and select the questions for the tests. The system is even able to interoperate with other CAA systems that support IMS QTI specification. The chance to mix fixed banks of questions with randomly chosen question sections, gives the tutor the chance to get the right compromise between an objective assessment and the sureness to include a wide coverage of subjects. 
Authors' rights are protected through the use of separate permissions for reading, writing and using the questions.

The use of eWorkbook can help tutors keep the quality of the assessment high, thanks to the Version Control System, which tells the tutor whether the changes he/she makes to the questions positively affect their quality. Other feedback information on the questions is available too. Our effort to make the application portable and usable makes it especially suitable for the academic use for which it was conceived, even though it remains a good choice in different environments. The wide choice of assessment strategies, and the possibility to extend that choice with new user-defined strategies, helps the tutor tailor the test evaluation to the competency and skill level of the class. The learner can self-assess and fully reap the benefits of blended learning. The definition of access rules, like prerequisites and attempt limitation, compels the learner to follow the right learning path. The reporting section is rich with information and is fitted out with charts and tables. The tutor can have complete and deep control over the performance of the class and of the single learners, even on a single macroarea, and over the effectiveness of the authored resources.

The system has been used for the English Knowledge Test by the students and the teachers of our faculty. The testing has shown that teachers, even those with very little technical skill, can easily use eWorkbook to create assessment tests, thus fully taking advantage of blended learning. Nevertheless, a more accurate evaluation of the effectiveness of the approach is foreseen for the current academic year. Moreover, future work will be devoted to testing the scalability of the system with a larger number of simultaneously on-line users. Other interesting developments are planned as future work. Although multiple choice, multiple response and true/false are the most common and widely adopted question types, and they are enough to arrange structured on-line tests, we are working to support other types of questions (e.g. fill-in, matching, performance, sequencing, likert, numeric) and questions based on external tools, like those proposed in (Hicks, 2002). Other efforts will be spent to introduce multimedia elements, like images, video and sound, and rich text capabilities in the rendering of the questions. Finally, case studies to consider the pedagogical implications of eWorkbook will also be carried out.

eWorkbook is distributed under the GNU GPL license: its source code is completely available to the community and can be downloaded from http://sourceforge.net/projects/eworkbook.

Acknowledgements

We would like to thank the anonymous reviewers for their detailed, constructive, and thoughtful comments that helped us to improve the presentation of the results in this paper.

References

ADL (2001). The SCORM Content Aggregation Model, Version 1.2. Advanced Distributed Learning Initiative, http://www.adlnet.gov
Aspen (2005), Click2Learn Aspen, http://home.click2learn.com/en/aspen/index.asp
Blackboard (2005), http://www.blackboard.com
Capuano, N., Gaeta, M., Micarelli, A., Sangineto, E. (2003). An intelligent web teacher system for learning personalisation and Semantic Web compatibility. Proceedings of the Eleventh International PEG Conference, St. Petersburg, Russia
Chef (2005), CHEF: CompreHensive collaborativE Framework, http://chefproject.org
Claroline (2005), http://www.claroline.net
Dublin Core (2005), Dublin Core Metadata Initiative, http://dublincore.org
ECMAScript (2005), Standard ECMA-262, ECMAScript Language Specification, http://www.ecma-international.org/publications/files/ECMA-ST/Ecma-262.pdf
EduSystem (2005), http://www.mtsystem.hu/edusystem/en
EduTools (2005), http://www.edutools.info/course/index.jsp
Fan, X. (1998). Item Response Theory and Classical Test Theory: An Empirical Comparison of Their Item/Person Statistics. Educational and Psychological Measurement, 58 (3), pp. 357-381
Firebird (2005), http://firebird.sourceforge.net/
Gusev, M., Armenski, G. (2002). onLine Learning and eTesting. In Proceedings of the 24th International Conference on Information Technology Interfaces, Cavtat, Croatia, pp. 147-152
Hambleton, R.K., Swaminathan, H. (1985). Item Response Theory - Principles and Applications. Kluwer Academic Publishers Group, Netherlands
Hibernate (2005), http://www.hibernate.org
Hicks, C. (2002). Delivery And Assessment Issues Involved in Very Large Group Teaching. In Proceedings of the IEE 2nd Annual Symposium on Engineering Education: Professional Engineering Scenarios, London, UK, pp. 21/1-21/4
Hwang, G.J., Lin, B.M.T., Lin, T.L. (2006). An effective approach for test-sheet composition with large-scale item banks. Computers & Education, Vol. 46 (2), pp. 122-139
IMS ASI (2004), IMS Global Learning Consortium, IMS Question & Test Interoperability: ASI Outcomes Processing, Final Specification Version 1.2, http://www.imsglobal.org/question/index.html
IMS QTI (2005), IMS Global Learning Consortium, IMS Question & Test Interoperability Specification, http://www.imsglobal.org/question/index.html
JSP (2005), JavaServer Pages Technology, http://java.sun.com/products/jsp
Li, T., Sambasivam, S.E. (2003). Question Difficulty Assessment in Intelligent Tutor Systems for Computer Architecture. Information Systems Education Journal, Vol. 1 (51)
Lira, P., Bronfman, M., Eyzaguirre, J. (1990). Multitest II - A Program for the Generation, Correction and Analysis of Multiple Choice Tests. IEEE Transactions on Education, Vol. 33 (4), pp. 320-325
Lister, R., Jerram, P. (2001). Design for web-based on-demand multiple choice exams using XML. In Proceedings of the IEEE International Conference on Advanced Learning Technologies, Madison, Wisconsin, USA, pp. 383-384
McAlpine, M. (2005). Principles of Assessment. CAA Centre, University of Luton, http://www.caacentre.ac.uk/dldocs/Bluepaper1.pdf
McGough, J., Mortensen, J., Johnson, J., Fadali, S. (2001). A Web-based Testing System with Dynamic Question Generation. In Proceedings of the 31st ASEE/IEEE Frontiers in Education Conference, Reno, NV, USA, pp. S3C-23-8, vol. 3
Mitkov, R. (2003). Computer-Aided Generation of Multiple-Choice Tests. Proceedings of the HLT-NAACL 2003 Workshop on Building Educational Applications Using Natural Language Processing, pp. 17-22
Moodle (2005), http://moodle.org
MySQL (2005), http://www.mysql.com
OpenUSS (2005), http://www.openuss.org
Protić, J., Bojić, D., Tartalja, I. (2001). test: Tools for Evaluation of Learners' Tests - A Development Experience. In Proceedings of the 31st ASEE/IEEE Frontiers in Education Conference, Reno, NV, USA, pp. F3A-6 - F3A-12, vol. 2
Sakai (2005), Sakai Project, http://www.sakaiproject.org
Struts (2005), The Apache Struts Web Application Framework, http://struts.apache.org
Sun, K. T. (2000). An Effective Item Selection Method for Educational Measurement. In Proceedings of the International Workshop on Advanced Learning Technologies, Palmerston North, New Zealand, pp. 105-106
The Learning Manager (2005), http://thelearningmanager.com
WebCT (2005), http://www.webct.com