About the Author(s)


Bertram Haskins
Department of Information Technology, Nelson Mandela Metropolitan University, South Africa

Reinhardt A. Botha
Department of Information Technology, Nelson Mandela Metropolitan University, South Africa

Citation


Haskins, B. & Botha, R.A., 2017, ‘Aligning mathematics with tutoring platform topics’, The Journal for Transdisciplinary Research in Southern Africa 13(1), a428. https://doi.org/10.4102/td.v13i1.428

Original Research

Aligning mathematics with tutoring platform topics

Bertram Haskins, Reinhardt A. Botha

Received: 09 Mar. 2017; Accepted: 02 June 2017; Published: 08 Aug. 2017

Copyright: © 2017. The Author(s). Licensee: AOSIS.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Dr Math was a mobile mathematics tutoring service, used by school learners across South Africa. The wealth of historic data available, with regard to the conversations between tutors and learners, may contain valuable insights as to which mathematics topics are most frequently encountered on the Dr Math service. This alignment may serve as an indicator of the utility of an online tutorial service as a reflection of the curriculum covered by learners, and as an extra avenue of support. This study makes use of automated means to rank the topics discussed on the Dr Math service and to align them with the topics encountered in the South African National Senior Certificate final examinations. The study finds that there is a close alignment with regard to the observations of the Department of Basic Education on factors influencing the performance of the learners. The topics most frequently discussed on the Dr Math service also align closely with the topics with which the learners have most difficulty in their final exams.

Introduction

Dr Math was a mathematics tutorial service accessible on cellular phones. This service allowed South African school learners to get in touch with human mathematics tutors. The tutoring service cost the learner no more than the data charges applicable to using an online chat platform. The mobile nature of the service also allowed the learners to gain access to the tutors from wherever they found themselves at a given moment.

The Dr Math service is, unfortunately, now defunct, but during its operation it stored all of the conversations between the tutors and learners as anonymous text files. These text files contain a wealth of information that can be mined. Using Dr Math and its historic system logs as a case study may reveal subtle alignments between the tutoring process and the real-world educational process.

Objectives of the study

Dr Math was by no means the only tutoring service available. Although the availability of tutoring services, such as Dr Math, may be seen as a blessing for providing much needed extra help and information to South African learners, it is yet to be determined whether the tutoring help provided by such a service aligns with the South African educational curriculum in any way. To that end, this study makes use of Dr Math as a case study.

There are many ways in which both to mine and to interpret the data from the learner–tutor conversations on Dr Math. As the Dr Math service was available to all South African learners, it provided a feasible means of gathering an overview of which mathematical topics are problematic to the school learners in South Africa. Gathering this information would make it possible to determine whether the learners on Dr Math requested help on the same topics with which learners are struggling during examinations. Techniques used in natural language processing (NLP) may be used to discover whether an alignment exists between these topics. If such an alignment exists, then it follows that these online tutoring services align with aspects of the South African mathematics curriculum and, as such, provide a useful extra avenue of support to school learners.

To address this hypothesis, the study proposes to answer a single research question, namely how closely do the mathematical topics that are most frequently discussed on the Dr Math service align with the performance of learners in National Senior Certificate (NSC) Paper I Mathematics examinations? In order to address this issue, it is necessary firstly to determine how learners perform, with regard to specific topics, in their NSC Mathematics final exams. Furthermore, it is necessary to determine which mathematical topics occur most frequently on the Dr Math service.

Research approach

This study has been conducted in five phases. The first phase constitutes the gathering of background information, by means of literature study. The background information consists of the historic performance data (in the form of rankings) from the South African NSC Mathematics final examinations, as well as information on how the conversations on the Dr Math service may be processed automatically to extract useful information.

This study aims to use the equations found in mathematical textbooks as a means to search through the historic Dr Math logs. By aligning these equations with the chapters and high school grades in which they are found, it is possible to align Dr Math conversations according to mathematical topics. In order to perform this task, several equations need to be processed. Thus, the second phase of the study discusses how the equations were captured and converted to a form that may be used as search terms.

In the third phase of the study, an automated process is used to extract the equations embedded in the conversations on the Dr Math service. These equations are saved in a list, to be aligned with the equations identified in the second phase. The Dr Math conversations are all recorded in anonymous text files which serve as source data for this study. Our study makes use of the Dr Math text logs for the years 2010–2013, which contain 248 993 individual lines of conversation, to provide a historical overview of the equations, and associated topics, discussed on the service.

In the fourth phase of the study, the equations from phases 2 and 3 are aligned. These alignments are used to determine in which grades an equation is encountered, as well as to provide a ranked list of the most frequently encountered topics on the Dr Math service. The final phase of the study attempts to draw an alignment between the NSC rankings created in phase 1 with the Dr Math rankings compiled in phase 4.

Background and related work

To place the conversations on the Dr Math service in context, this section, which serves as the first phase of the study, firstly discusses the performance of learners on the NSC examinations. Furthermore, an overview is provided of what the Dr Math service is as well as the means by which its historic conversations may be processed automatically.

Historic learner performance

The main goal of this study is to determine whether the topics addressed by the learner–tutor conversations on the Dr Math service reflect the topics with which the South African school learners struggle most on their exams. The South African Department of Basic Education publishes yearly diagnostic reports, which contain a breakdown of the issues faced by learners in the final NSC examination papers of all subjects.

The 2011 NSC examination results for Mathematics report that many of the errors made by learners have their origins in a poor understanding of the basic and foundational mathematical competencies, which have been taught in earlier grades (Department of Basic Education 2012). These concepts include algebraic manipulation, factorisation, the solution of equations and inequalities. The authors of the report make the observation that candidates struggle to form a conceptual understanding of the topics presented. This may stem from their only attempting to answer those mathematical questions and forms put to them in a classroom environment and may explain why they face difficulty when confronted with varying forms of the same concept. This aligns with the observations made by Mhlolo, Venkat and Schafer (2012).

In their study Mhlolo et al. (2012) conclude that many opportunities to have learners gain a deeper understanding of mathematical concepts or to relate these concepts to real-world concepts are missed because the teachers do not have the knowledge or capacity to use metaphors or analogies as teaching instruments. Many learners do not have the necessary background to form the connections with these abstract concepts without their being tied to real-world foundations. Setati (2008) lends further support to this argument by stating that one of the reasons for this lack of connection may be the language in which the learners are educated. If the language in which they are taught differs from their home language, most of their effort is focused on simply understanding what the teacher is saying and not specifically on the mathematical concepts being conveyed.

The main underlying conclusions of the 2012 (Department of Basic Education 2013) and 2013 (Department of Basic Education 2014) reports are very similar to those of the 2011 report, namely that the learners are struggling with the fundamental concepts of mathematics. Both reports comment that the learners should solve more non-routine problems and not just those found in the normal classroom setting, that is, those found in textbooks and historic question papers. The 2012 and 2013 reports each provide breakdowns of Paper I and Paper II according to learners’ average performance on each question and according to the content or mathematical concept covered in the questions. This study focused exclusively on the Paper I topics because, in the document used as the source for the 2012 data, the Paper II breakdown mistakenly replicated the Paper I averages. For the purposes of the study the Paper I question averages were reworked into rankings for both 2012 and 2013, using the average performances as an indicator of which topics were most problematic to the learners. The topic names for both years were reworked so that they could be aligned properly. These topic rankings, listed in Table 1, serve as the baseline for comparing the topics identified from the conversations on the Dr Math service.
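The reworking of question averages into rankings can be sketched as follows. This is only an illustration: the topic labels and percentages are made-up placeholders and are not taken from the diagnostic reports, and the function name is hypothetical.

```python
def rank_by_lowest_average(averages):
    """Rank topics so that rank 1 is the topic with the lowest average mark."""
    # Sort ascending by average; ties keep their input order (no special handling).
    ordered = sorted(averages.items(), key=lambda item: item[1])
    return {topic: rank for rank, (topic, _avg) in enumerate(ordered, start=1)}


# Placeholder data for illustration only, not figures from the reports.
hypothetical_averages = {"Topic A": 41.0, "Topic B": 35.5, "Topic C": 52.0}
ranking = rank_by_lowest_average(hypothetical_averages)
```

Under this sketch the topic with the weakest average performance receives rank 1, which is the ordering that Table 1 describes.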

TABLE 1: Paper I topics ranked according to lowest performance.

Dr Math

The Dr Math service was devised by the Meraka Institute of the South African Council for Scientific and Industrial Research. Initially, the service was an attempt to see if high school learners would use their own cellular phones to contact mathematical tutors. Eventually, the service grew to accommodate the requests of tens of thousands of school learners.

The tutors on the service were all volunteers, who lent their time freely. The popularity of the service ensured that there were generally many more learners accessing the service than available tutors. This led to availability issues which were partially addressed by providing a queuing system (Butgereit & Botha 2010) and methods for detecting the presence of equations in learner queries (Haskins & Botha 2012). The current study is not the first attempt at determining the topics in Dr Math conversations. A prior study by Butgereit and Botha (2011) used topic spotting to provide the Dr Math tutors with links to supporting documentation with regard to the specific topic identified. The free nature of the service has ensured that a wealth of data have been generated as a result of the conversations between the tutors and school learners.

The statements in Box 1 demonstrate the range of statements that are found in the text logs. Even though the text files consist of learner–tutor conversations regarding mathematics, identifying mathematical equations found in these texts may not always be a simple task. Scrutinising these texts manually, to determine the types of mathematical equations they contain, is not a feasible endeavour.

BOX 1: A few example queries taken from the historic logs of Dr Math.

Natural language processing

The conversations on the Dr Math service constitute a form of natural language, called microtext. Microtext may be defined as the short snippets of text used in modern digital forms of communication (Hovy et al. 2013). This form of text consists of various misspellings, informality, varied grammar and non-language forms, such as emoticons. An emoticon is a set of keyboard symbols used as an icon to represent an emotion or facial expression, such as a smiley face. A study by Xue et al. (2011) surmised that NLP tools and techniques may be applicable to microtext-based content.

Natural language processing is a diverse research field, concentrating on both textual and vocal user input. This field pursues the elusive question of how we understand the meaning of a sentence or a document (Feldman 1999). Natural language text-processing systems are thus concerned with the translation of potentially ambiguous natural language queries and texts into unambiguous internal representations on which matching and retrieval can take place (Liddy 1998). The Dr Math conversations represent such ambiguous statements which require translation to highlight or extract their hidden mathematical equations.

There have been other studies that used NLP techniques to address the problem of mathematics in text. Adeel, Cheung and Khiyal (2008) proposed a prototype search engine, enabling a user to search for mathematical formula content. To index and retrieve mathematics, they make use of a combination of regular expressions and keywords to perform template matching. Because mathematics makes use of specific operators and operands, it may be possible to devise patterns from known equations. This study attempts to make use of a similar principle, in that it may be possible to use NLP techniques to convert the Dr Math statements into equations and then use these patterns to classify these equations. There are many techniques that are applicable to matching or identifying strings. Similar to the study by Adeel et al., this study makes use of regular expressions for this purpose.

Regular expressions

Regular expressions are a means to identify valid strings. The strings are matched against a series of patterns. Each individual pattern is referred to as a regular expression. Regular expressions are made up of a series of wildcard and constant characters, examples of which are shown in Table 2. The wildcard characters are special characters, which are used to represent one or more varying (i.e. interchangeable) characters in a given regular expression.

TABLE 2: Example of regular expression wildcards and constants.

The constant characters in the pattern have to match explicitly, whereas the wildcard characters provide a degree of variance. Regular expressions have been used in text processing to remove HTML tags when processing web pages (Li 2011), to identify Chinese cultural terms in an online search (Zhenjun & Xiangyu 2009) and to aid in the automatic annotation of electronic documents (Djioua et al. 2006).

By using the basis of regular expressions, it is possible to build representative patterns of the equations that may be found in the Dr Math text logs. These patterns would not only be applicable to specific mathematical equations, but would also allow various equations to be identified using the same pattern, thereby extending the applicability of the patterns beyond the equations from which the patterns were initially generated. Figure 1 shows how a single regular expression could map to multiple equations.

FIGURE 1: A single regular expression and some of the equations to which it could map.

In Figure 1, the first ^ character denotes that the match must start from the beginning of the input string. The $ character specifies that the match must stop at the end of the input string or at the end of a line in the input string. The \( and \) statements denote single instances of opening and closing round brackets and the \^ statement signifies a single instance of the exponent operator. The [a-zA-Z0-9]+ statement signifies that the specific statement may be replaced by any lowercase or uppercase alphabetic character or numeric character. The associated plus symbol specifies that the match should extend for at least one character, but that it may be any number of characters, that is, the number 4 could match, but so too could the constant–variable combination of 45124z.
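To make these elements concrete, the short Python sketch below combines them into one pattern. The pattern itself is a hypothetical example assembled from the elements described above; it is not necessarily the expression shown in Figure 1, and the sample strings are likewise illustrative.

```python
import re

# Hypothetical pattern: "(operand+operand)^operand", anchored at both ends.
# It is assembled from the elements described in the text, not copied from Figure 1.
PATTERN = re.compile(r"^\([a-zA-Z0-9]+\+[a-zA-Z0-9]+\)\^[a-zA-Z0-9]+$")


def matches(equation: str) -> bool:
    """Return True when the entire equation fits the pattern."""
    return PATTERN.match(equation) is not None


examples = ["(x+3)^2", "(a+b)^n", "(45124z+4)^12", "(x+3)^2+1", "x+3^2"]
results = {equation: matches(equation) for equation in examples}
```

The first three strings share the same structure and therefore match the single pattern, whereas the last two are rejected because of the trailing term and the missing brackets, respectively.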

Finding representative equations

As part of the second phase of the study, it was necessary to source example equations as a means of comparison. To facilitate the equation capture, a software application was created to allow any equations from various sources to be captured manually. Notational guidelines were created to facilitate the capture of equations. This ensured that the equations were captured in the same format as equations created by the automated process. Five high school Mathematics textbooks were consulted as sources of equations, each representing a specific grade in high school (Carter et al. 2010; Goba et al. 2011; Goba & Van der Lith 2008; Van der Lith, 2008, 2010).

Every captured equation was saved and any incidental spacing removed. For each subsequent entry, the application checked the list of existing equations to ensure that no duplicates were captured. This technique yielded 3145 unique equations. Each of the equations was saved along with details regarding in which high school grade and chapter in the textbook it was first encountered. This study makes use of the chapter names from the textbooks as our general mathematics topics. Some of the chapter names, such as patterns, functions and algebra, are repeated across different grades. Using only the unique chapter names as topics yielded a total of 22 topics.
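The capture step described above can be sketched as follows. The class and function names are hypothetical, and the sketch assumes that equations are captured in ascending grade order, so that the first capture of an equation corresponds to where it is first encountered; the original capture application is not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CapturedEquation:
    text: str      # equation with all spacing removed
    grade: int     # grade in which it was first captured
    chapter: str   # textbook chapter name, used as the topic


def normalise(raw: str) -> str:
    """Remove any incidental spacing from a captured equation."""
    return "".join(raw.split())


class EquationStore:
    def __init__(self):
        self._by_text = {}

    def add(self, raw: str, grade: int, chapter: str) -> bool:
        """Store the equation unless it is empty or already captured."""
        text = normalise(raw)
        if not text or text in self._by_text:
            return False
        self._by_text[text] = CapturedEquation(text, grade, chapter)
        return True

    def equations(self):
        return list(self._by_text.values())

    def topics(self):
        """Unique chapter names across all captured equations."""
        return sorted({eq.chapter for eq in self._by_text.values()})


store = EquationStore()
first = store.add("3x - 2y = 5", 8, "patterns, functions and algebra")
duplicate = store.add("3x-2y=5", 10, "solving and graphing linear equations")
```

Here the second entry is rejected as a duplicate because the two strings are identical once spacing is removed.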

To provide some idea of the range of equations found in the textbooks and also the format in which they were captured, a few example equations are shown in Figure 2. It is interesting to note that the longest of the captured equations is 67 characters in length, but the average equation is only 13 characters in length. This may indicate that learners tend to query the tutors on simpler equations.

FIGURE 2: Example equations taken from mathematics textbooks.

The software application used to capture the mathematical equations was also used to generate, automatically, a regular expression matching each equation captured from the textbooks, using the wildcards and constants listed in Table 2. Character spacing is disregarded by the regular expressions. This process yielded 3145 regular expressions. As each of the 3145 equations was already tagged with information regarding the grade in which it was encountered, as well as the associated chapter name, it was possible to tag the regular expressions with this same information.
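One plausible way to derive such a regular expression from a captured equation is sketched below. It is an assumption rather than the study's actual generator: every run of letters and digits is generalised to the wildcard [a-zA-Z0-9]+, all other characters are kept as escaped constants, and the function name is hypothetical.

```python
import re

OPERAND = "[a-zA-Z0-9]+"  # wildcard for any constant or variable run


def equation_to_regex(equation: str) -> str:
    """Generalise an equation into an anchored regular expression."""
    compact = "".join(equation.split())  # spacing is disregarded
    parts = []
    # Split into alphanumeric runs (operands) and single other characters.
    for token in re.findall(r"[a-zA-Z0-9]+|.", compact):
        if re.fullmatch(r"[a-zA-Z0-9]+", token):
            parts.append(OPERAND)
        else:
            parts.append(re.escape(token))  # operators and brackets stay constant
    return "^" + "".join(parts) + "$"


linear = equation_to_regex("3x - 2y = 5")
power = equation_to_regex("(x+3)^2")
```

A pattern generated from 3x - 2y = 5 in this way would also match 7a-4b=12, which illustrates how one textbook equation can stand in for a family of structurally identical equations.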

Processing the Dr Math conversations

As part of a related study, an automated system was developed which is capable of processing an input statement from the Dr Math text files to extract any perceived mathematical equation and to structure it in a representative form. The processing of these text files serves as the third phase of this study. The process makes use of various concepts found in the fields of natural language and text processing, not only to extract equations that are explicitly stated, but also those that may be structured in an unexpected manner.

The equations may be structured in unexpected ways as the learners make use of the same forms of language to phrase their questions as they would in chatting with their peers, that is, by leaving out many words or using abbreviated forms. Another influence on the structure of these messages could be that the keypads of most feature (non-smart) mobile phones, which were still used by many school learners during the period 2010–2013, do not lend themselves to entering mathematical equations properly. Figure 3 contains a few examples of the statements found on the Dr Math service and the outputs provided by the automated process. The year 2013 was the first year in which smartphone sales overtook feature phone sales globally (Gartner 2014). Modern-day tutoring platforms would most probably be based on smartphones. It should thus be noted that the spelling and structure of the statements provided by the learners might look more structured as the learners would have access to the built-in dictionary functions of the phones. These properly spelled and grammatically correct messages would simplify the translation and equation extraction process, as the decreased variance in message spelling would shrink the search space of the problem.

FIGURE 3: Dr Math statements and their associated equations.

Determining the validity of the equations

Even though the automated process provides a convenient and quick way of processing the Dr Math text logs, the results of the process would only be of use if it could be proven that the process approximates similar judgement to that of a human performing the same translations. To that end, two tests were conducted using human volunteers. The first phase of the tests consisted of having three human volunteers scrutinise 1000 entries found in the Dr Math text logs. They had to perform a simple coding process, stating whether they believed the entry to be translatable or not. Coding is a process in which participants record data according to rules supplied as part of the study. These same entries were processed (coded) by an initial phase of the automated process. This coding process forms part of a simple content analysis task. Content analysis is a research technique for making replicable and valid inferences from texts to the contexts of their use (Krippendorff 2004:18). Three human participants were used because, if required, it allows a majority decision rule to be applied in case of coding discrepancies. This step was not necessary for the purposes of this study.

The level of agreement between the three human coders and the automated process was calculated using Krippendorff’s α. Krippendorff’s α is a very general measure of intercoder agreement, which allows for uniform reliability standards to be applied to a great diversity of data (Krippendorff 2004:221). The calculations result in values in the range of 0 to 1, with 1 signifying complete agreement between coders and 0 signifying complete disagreement. Krippendorff suggests that values between 0.667 and 0.8 may be used for drawing tentative conclusions (Krippendorff 2004:241). The results of the individual α calculations are listed in Figure 4. To show that the level of agreement reached by the coders is not random, Figure 4 includes the level of agreement that the individual coders may have reached by chance (θ). These results show that the automated process is able to reach a fairly high level of agreement with two of the human participants, but not with the third. The two human participants who were in close agreement with the automated process were also not able to reach a high level of agreement with the third participant. These initial results were encouraging in indicating that the automated process provides a fairly close approximation to the ability of human participants to identify Dr Math statements containing mathematical equations.
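For readers unfamiliar with the measure, the sketch below computes Krippendorff’s α for nominal codes with no missing values, using the coincidence-matrix formulation. The study does not state which variant or software was used, so this choice is an assumption; the function name and the two short coding sequences are hypothetical illustrations, not data from the study.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Alpha for nominal data; each unit is the sequence of codes it received."""
    coincidences = Counter()
    for codes in units:
        m = len(codes)
        if m < 2:
            continue  # a unit coded by a single coder carries no agreement information
        # Every ordered pair of codes from different coders, weighted by 1/(m - 1).
        for a, b in permutations(codes, 2):
            coincidences[(a, b)] += 1 / (m - 1)

    totals = Counter()
    for (a, _b), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())

    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(totals[a] * totals[b]
                   for a in totals for b in totals if a != b) / (n - 1)
    if expected == 0:
        raise ValueError("alpha is undefined when only one code value occurs")
    return 1 - observed / expected


# Hypothetical binary codes (1 = translatable, 0 = not) from two coders.
coder_1 = [0, 1, 0, 0, 0, 0, 0, 0, 1, 0]
coder_2 = [1, 1, 1, 0, 0, 1, 0, 0, 0, 0]
alpha = krippendorff_alpha_nominal(list(zip(coder_1, coder_2)))  # about 0.095
```

In this illustration the two coders agree on six of the ten entries, yet α is only about 0.095, because much of that raw agreement would be expected by chance given how often each code occurs.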

FIGURE 4: Comparison of calculations between the automated process and individual coders on agreement (α as the first value) and the level of agreement they could have reached by chance (θ as the value in brackets).

A further round of tests was conducted to validate whether the automated process could deliver acceptable translations. To that end, the two human participants who were in relatively high agreement in the first test phase were asked to translate a series of 250 Dr Math statements into mathematical equations. The automated process was tasked to do the same. The equations created by the automated process were compared to those created by the human participants across various metrics. The metric calculations between the two human participants were used as a baseline for the tests.

In all metric calculations, the automated process was able to meet or to exceed the results of the human participants. Furthermore, the metric results calculated between the two human participants showed a moderate level of correlation. This level of correlation was matched by the automated process. These results serve as validation that the automated process is able to identify and to extract equations at a level similar to that which may be expected of a human participant.

Extracting the list of equations

To facilitate the tests required for the current study, the statements found in the Dr Math text files were structured according to the year in which they were captured. This provided four separate sets of text files to process, for each of the years 2010 to 2013. The automated process was applied to each statement for a given year. If the process yielded an equation, the equation was saved to a list of equations for the given year. Table 3 lists how many statements were processed for each year and how many equations were extracted for the given year.

TABLE 3: Statements processed and equations identified per year.

In total, 248 993 unique statements were processed from the Dr Math text files. From these statements, a total of 36 141 equations were extracted. These equations serve as the source data for identifying the mathematical topics encountered on the Dr Math service.

Initial distributions and rankings

In the fourth phase of the study, the equations from the second and third phases were aligned by using the regular expressions generated in the second phase to match the equations generated in the third phase. Regular expressions may be used to form either partial or complete matches to strings. For this study, the regular expressions were used to identify equations generated from the 2010 to 2013 Dr Math text logs with which they formed a complete match. A specific equation may be aligned with various topics found in multiple grades. Although, during processing, all the possible topic associations of an equation were saved, this phase of the study only makes use of the lowest grade (and topic) with which an equation was aligned. This was done to illustrate the fundamental principles that are discussed during tutorial sessions. Table 4 illustrates the number of these unique matching equations for each topic across all four years of the data set.
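The complete-match alignment and the lowest-grade rule can be sketched as follows. The two tagged patterns are hypothetical stand-ins that reuse the grades and topic names mentioned in this article for the equation 3x - 2y = 5; they are not entries from the actual set of 3145 expressions, and the function names are illustrative.

```python
import re
from typing import NamedTuple


class TaggedPattern(NamedTuple):
    regex: str
    grade: int
    topic: str


# Hypothetical patterns; the same structure is tagged with two grades and topics.
PATTERNS = [
    TaggedPattern(r"[a-zA-Z0-9]+\-[a-zA-Z0-9]+=[a-zA-Z0-9]+", 10,
                  "solving and graphing linear equations"),
    TaggedPattern(r"[a-zA-Z0-9]+\-[a-zA-Z0-9]+=[a-zA-Z0-9]+", 8,
                  "patterns, functions and algebra"),
]


def all_alignments(equation, patterns):
    """Return every pattern that forms a complete match with the equation."""
    compact = "".join(equation.split())  # spacing is disregarded
    return [p for p in patterns if re.fullmatch(p.regex, compact)]


def lowest_grade_alignment(equation, patterns):
    """Keep only the alignment with the lowest grade, or None if nothing matches."""
    hits = all_alignments(equation, patterns)
    return min(hits, key=lambda p: p.grade) if hits else None


best = lowest_grade_alignment("3x - 2y = 5", PATTERNS)
```

All alignments remain available through all_alignments, mirroring the description that every possible association was saved even though only the lowest grade is used in this phase.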

TABLE 4: Number of equations matched to the 22 topics for the period 2010–2013.

From these alignments the chart shown in Figure 5 was generated. The chart illustrates the distribution of equations, per year, according to the grade when they were first encountered in the textbooks. All distributions in this section are expressed as a percentage of the total number of expressions processed for a given year. The results from the four years show a relatively similar pattern, with most of the equations from each year having first been encountered in Grade 8, followed by Grades 9, 12, 10 and 11. In all cases the fewest new equations are encountered in Grade 11.

FIGURE 5: Distribution of where equations are first encountered by grade for the years 2010–2013.

These results may be interpreted in two ways. The first interpretation is that most of the learners seeking guidance from the Dr Math tutors are in the lower high school grades, namely 8 and 9. The second interpretation is that the results are representative of the fact that most of the learners struggle with concepts of which the basic structure was covered in Grades 8 and 9 and that all other concepts are related to these concepts. This interpretation aligns with the observations made in the diagnostic reports created by the South African Department of Basic Education, as discussed above, that most of the learners who wrote the NSC examinations struggle with the basic concepts taught in Grades 8 and 9.

To further investigate this phenomenon, Figure 6 ranks the most frequently occurring topics for each year. Some of the topics were not encountered frequently across the Dr Math text logs; thus, Figure 6 only shows the 10 topics most frequently encountered. It is also important to note that there may be some overlap across some of the topics, but they were kept separate so as to have a direct alignment with the individual chapters of the textbooks used as the data source of the study.
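Ranking the topics for a given year amounts to a frequency count over the topic assigned to each matched equation. A minimal sketch, in which the function name and the sample topic list are hypothetical and the counts are not taken from the study's data:

```python
from collections import Counter


def rank_topics(topic_per_equation, top_n=10):
    """Return (topic, count, percentage) for the most frequent topics."""
    counts = Counter(topic_per_equation)
    total = sum(counts.values())
    return [(topic, count, 100 * count / total)
            for topic, count in counts.most_common(top_n)]


# Made-up sample: one topic label per matched equation in a single year.
sample = (["patterns, functions and algebra"] * 5
          + ["numbers, operations and relationships"] * 3
          + ["solving and graphing linear equations"] * 2)
ranked = rank_topics(sample, top_n=2)
```

The percentage is expressed relative to all matched equations for the year, so it remains comparable across years with different numbers of statements.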

FIGURE 6: The 10 most frequently encountered topics (as measured by the number of equations) for the years 2010 to 2013.

Across the data sets for all four years, the two most frequently encountered topics are those of patterns, functions and algebra, and numbers, operations and relationships. The topic of patterns, functions and algebra deals with such basic concepts as factors and exponents, whereas numbers, operations and relationships addresses the simplification of algebraic expressions, roots and fractions. As these topics are fairly basic, this further supports our earlier interpretation that they are flagged so frequently simply because a basic understanding of them is a requirement for other mathematical topics.

Alignment with National Senior Certificate results

For the first round of tests, the distributions and rankings were calculated according to the lowest possible grade in which a concept was encountered. From the tests it became clear that most of the concepts aligned with the basic concepts covered in Grades 8 and 9. For the second set of tests, it was necessary to determine whether some of the equations could also match to regular expressions representing higher level concepts. An example of this is the equation 3x − 2y = 5 which may map to equations linked to the topics of patterns, functions and algebra, encountered in Grade 8, and solving and graphing linear equations, encountered in Grade 10, in the textbooks.

To do this the data sets were reprocessed to include all possible matches for an equation on any grade or topic, but the equations matched to the topics of patterns, functions and algebra, and numbers, operations and relationships were relegated only to those instances when the equations could only be aligned to these specific two topics. From these alignments the chart shown in Figure 7 was generated.
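The reprocessing rule can be expressed compactly: an equation is credited to the two basic topics only when it matches nothing else. The sketch below assumes that topic names are compared as plain strings; the function name is hypothetical.

```python
BASIC_TOPICS = {
    "patterns, functions and algebra",
    "numbers, operations and relationships",
}


def topics_for_equation(matched_topics):
    """Drop the two basic topics unless they are the only alignments found."""
    matched = set(matched_topics)
    higher_level = matched - BASIC_TOPICS
    return higher_level if higher_level else matched


both = topics_for_equation([
    "patterns, functions and algebra",
    "solving and graphing linear equations",
])
basic_only = topics_for_equation(["patterns, functions and algebra"])
```

For the equation 3x - 2y = 5 discussed above, this rule would credit only the Grade 10 topic of solving and graphing linear equations.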

FIGURE 7: Distribution of equations by grade on all possible alignments for the years 2010–2013.

With these changes made it is apparent that, based on the equations, the learner queries are distributed fairly evenly between concepts first encountered in Grades 8, 10 and 12. This is to be expected as Grade 8 constitutes the initial contact of the learners with high school-level mathematics topics. New topics are introduced in Grade 10, as this is when the learners make their subject choices for the remainder of their high school career. Finally, Grade 12 represents the outcome-level for high school learning. As such, it is to be expected that a fair number of learners would require help in preparing for their final exams.

For the final comparison, and the final phase of the study, the 22 topics used to distinguish the Dr Math conversations were summarised into the 6 topics identified on the NSC Paper I examinations. Furthermore, for this comparison, only the results from the 2012 and 2013 Dr Math text files were used. This allowed a direct comparison with the NSC Paper I topic rankings for 2012 and 2013. The comparison between these rankings and the same topic rankings from the conversations in the Dr Math text logs is shown in Table 5.

TABLE 5: Ranked topics from the NSC examinations compared with the ranked topics from Dr Math.

The NSC rankings indicate the topics for which learners received the lowest average marks in the examinations, whereas the Dr Math rankings indicate which topics were most frequently encountered. There is a definite alignment between the first two ranked topics of calculus, and functions and graphs. Topics 3 and 4 are reversed in order between the NSC and Dr Math results. The same can be said for topics 5 and 6. This further demonstrates the closeness in rankings, as no topic is more than a single adjacent swap away from being aligned.
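The closeness of the two rankings can be quantified with a short sketch. Kendall's tau is used here purely as an illustration; it is not a measure reported in the study. The rank positions follow the description above (ranks 1 and 2 agree, 3 and 4 are swapped, 5 and 6 are swapped), and the labels `T1` to `T6` are hypothetical placeholders for the six NSC topics in NSC rank order.

```python
from itertools import combinations

# Topics are labelled by their NSC rank. Only T1 (calculus) and
# T2 (functions and graphs) are named in the text; the other labels
# are placeholders.
nsc_rank = {"T1": 1, "T2": 2, "T3": 3, "T4": 4, "T5": 5, "T6": 6}
dr_math_rank = {"T1": 1, "T2": 2, "T3": 4, "T4": 3, "T5": 6, "T6": 5}


def discordant_pairs(a, b):
    """Count topic pairs that the two rankings order differently."""
    return sum(
        1
        for x, y in combinations(a, 2)
        if (a[x] - a[y]) * (b[x] - b[y]) < 0
    )


def kendall_tau(a, b):
    """Kendall's tau for two rankings without ties."""
    n = len(a)
    total_pairs = n * (n - 1) // 2
    return 1 - 2 * discordant_pairs(a, b) / total_pairs


def max_displacement(a, b):
    """Largest difference in rank position for any single topic."""
    return max(abs(a[t] - b[t]) for t in a)


if __name__ == "__main__":
    print("discordant pairs:", discordant_pairs(nsc_rank, dr_math_rank))
    print("Kendall tau:", round(kendall_tau(nsc_rank, dr_math_rank), 3))
    print("max displacement:", max_displacement(nsc_rank, dr_math_rank))
```

With six topics there are 15 topic pairs. Only the two swapped pairs are ordered differently, so no topic moves by more than one position.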

Conclusion

This study set out to answer the question: How closely do the mathematical topics that are most frequently discussed on the Dr Math service align with the performance of learners in NSC Paper I Mathematics examinations? In order to answer this question, two objectives were addressed.

Firstly, the historic diagnostic reports from the South African Department of Basic Education were consulted, to determine which issues were most pressing. From these reports, the average performance of the students on specific questions (and their associated topics) was used to create a ranked list of six topics, ordered from the topic in which the learners performed worst to the topic in which they performed best.

Secondly, the historic text logs of Dr Math were processed by automated means to determine which topics occur most frequently. Initially, it was observed that the learners struggled with the basic mathematical concepts covered in Grades 8 and 9. This aligns with the observations made by the Department of Basic Education. The Department of Basic Education lists a lack of exercise on the topics, beyond that which may be found in textbooks and prior question papers, as one of the reasons that the students have such a low level of conceptual understanding of the topics.

Further processing of the data also revealed that the topics most frequently discussed on the Dr Math service aligned relatively closely with the topics which the learners found most problematic on their NSC final examinations. This serves to answer the research question of the study. Furthermore, the close alignment between the topics discussed on Dr Math and the topics of the examination shows that an online tutorial system has utility because it reflects aspects of what is covered in the South African Mathematics curriculum. With the highest tutorial focus being on aspects considered challenging in the NSC reports, this also has the effect of independently validating the results of these reports.

Although the 2011–2013 NSC diagnostic reports were consulted, only the 2012 and 2013 reports contained the average performance figures for each question and its associated topic. As the topics listed were very general, a future study may analyse the question papers themselves to derive a more detailed list of topics, which would provide greater insight into the alignment with the topics discussed on Dr Math. This kind of information could be of value to the Dr Math tutors, as it may allow them to present the learners proactively with questions on topics that they would not encounter in their daily classroom activities. In addition, it may address both the conceptual understanding of the learners and the concerns of the Department of Basic Education with regard to the amount of mathematical experience the learners gain outside of the classroom.

Acknowledgements

Competing interests

The authors declare that they have no financial or personal relationships which may have inappropriately influenced them in writing this article.

References

Adeel, M., Cheung, H.S. & Khiyal, S.H., 2008, ‘Math GO! Prototype of a content based mathematical formula search engine’, Journal of Theoretical and Applied Information Technology, 4(10), 1002–1012.

Butgereit, L. & Botha, R., 2010, ‘A busyness model for assigning tutors to pupils in a mobile, online tutoring system: A look at C3TO’, in Proceedings of the 2010 Annual Research Conference of the South African Institute of Computer Scientists and Information Technologists (SAICSIT), ACM, New York, 11–13 October, pp. 350–355.

Butgereit, L. & Botha, R., 2011, ‘A model to identify mathematics topics in MXit lingo to provide tutors quick access to supporting documentation’, Pythagoras, 32(2), 79–85. https://doi.org/10.4102/pythagoras.v32i2.59

Carter, P., Dunne, L., Morgan, H. & Smuts, C., 2010, Study and Master Mathematics Grade 9, 12th edn., Cambridge University Press, Cape Town, South Africa.

Department of Basic Education, 2012, Report on the national senior certificate examination 2011: National diagnostic report on learner performance, Department of Basic Education, Pretoria, South Africa.

Department of Basic Education, 2013, National senior certificate examination national diagnostic report on learner performance 2012, Department of Basic Education, Pretoria, South Africa.

Department of Basic Education, 2014, 2013 National senior certificate examination diagnostic report, Department of Basic Education, Pretoria, South Africa.

Djioua, B., Flores, J.G., Blais, A., Desclés, J.P., Guibert, G., Jackiewicz, A. et al., 2006, ‘EX-COM: An automatic annotation engine for semantic information’, in Proceedings of the 19th International Florida Artificial Intelligence Research Society Conference, AAAI Press, Burgess Drive, pp. 285–290.

Feldman, S., 1999, ‘NLP meets the Jabberwocky: Natural language processing in information retrieval’, Online-Weston Then Wilton, 23, 62–73.

Gartner, 2014, Gartner Says Annual Smartphone Sales Surpassed Sales of Feature Phones for the First Time in 2013, viewed 11 May 2017, from http://www.gartner.com/newsroom/id/2665715

Goba, B., Morgan, H., Press, K., Smuts, C. & Van der Walt, M., 2011, Study and Master Mathematics Grade 8, 9th edn., Cambridge University Press, Cape Town, South Africa.

Goba, B. & Van der Lith, D., 2008, Study and Master Mathematics Grade 10, 2nd edn., Cambridge University Press, Cape Town, South Africa.

Haskins, B. & Botha, R., 2012, ‘Identifying suitable mathematical translation candidates from the logs of Dr Math’, in A. de Waal (ed.), Proceedings of the 23rd Annual Symposium of the Pattern Recognition Association of South Africa, PRASA, Pretoria, South Africa, 29–30 November, pp. 16–23.

Hovy, E., Markman, V., Martell, C. & Uthus, D., 2013, ‘Preface’, in Proceedings of the 2013 AAAI Spring Symposium on Analyzing Microtext, AAAI, Palo Alto, CA, 25–27 March, p. vii.

Krippendorff, K., 2004, Content analysis: An introduction to its methodology, 2nd edn., Sage, Thousand Oaks, CA.

Li, F., 2011, ‘A web text extraction method based on regular expressions and text density’, in Proceedings of the 2011 International Conference on Information Management, Innovation Management and Industrial Engineering, vol. 1, IEEE Computer Society, Washington, DC, 26–27 November, pp. 287–290.

Liddy, E.D., 1998, ‘Enhanced text retrieval using natural language processing’, Bulletin of the American Society for Information Science and Technology, 24(4), 14–16. https://doi.org/10.1002/bult.91

Mhlolo, M.K., Venkat, H. & Schafer, M., 2012, ‘The nature and quality of the mathematical connections teachers make: Original research’, Pythagoras, 33(1), 1–9. https://doi.org/10.4102/pythagoras.v33i1.22

Setati, M., 2008, ‘Access to mathematics versus access to the language of power: The struggle in multilingual mathematics classrooms’, South African Journal of Education, 28(1), 103–116.

Van der Lith, D., 2008, Study and Master Mathematics Grade 11, 5th edn., Cambridge University Press, Cape Town, South Africa.

Van der Lith, D., 2010, Study and Master Mathematics Grade 12, 9th edn., Cambridge University Press, Cape Town, South Africa.

Xue, Z., Yin, D. & Davison, B.D., 2011, ‘Normalizing microtext’, in D.W. Aha, D.W. Oard, S. Ramachandran & D.C. Uthus (eds.), Analyzing microtext, vol. WS-11-05, pp. 74–79, AAAI, Menlo Park, CA.

Zhenjun, Y. & Xiangyu, J., 2009, ‘A simplified application of regular expressions: With the extraction of Chinese cultural terms as an example’, in ISECS International Colloquium on Computing, Communication, Control and Management, vol. 1, pp. 439–442, IEEE, Sanya, China, 8–9 August.