key: cord-0333131-kllqoipw
authors: Kadner, Florian; Keller, Yannik; Rothkopf, Constantin A.
title: AdaptiFont: Increasing Individuals' Reading Speed with a Generative Font Model and Bayesian Optimization
date: 2021-04-21
journal: nan
DOI: 10.1145/3411764.3445140
sha: 5a167ae22bd4649ddede27517e294739b6966fda
doc_id: 333131
cord_uid: kllqoipw

Abstract: Digital text has become one of the primary ways of exchanging knowledge, but text needs to be rendered to a screen to be read. We present AdaptiFont, a human-in-the-loop system aimed at interactively increasing the readability of text displayed on a monitor. To this end, we first learn a generative font space with non-negative matrix factorization from a set of classic fonts. In this space we generate new TrueType fonts through active learning, render texts with the new fonts, and measure individual users' reading speed. Bayesian optimization sequentially generates new fonts on the fly to progressively increase individuals' reading speed. The results of a user study show that this adaptive font generation system finds regions in the font space corresponding to high reading speeds, that these fonts significantly increase participants' reading speed, and that the found fonts differ significantly across individual readers.

The quick brown fox jumps over the lazy dog. What color was the fox?

1 Introduction
Language is arguably the most pervasive medium for exchanging knowledge between humans [1]. But spoken language or abstract text needs to be made visible in order to be read, e.g. in print or on screen. Traditionally, the transformation of text into actually visible letters follows a complex process involving artistic and design considerations in the creation of typefaces, the selection of specific fonts, and many typographic decisions regarding the spatial placement and arrangement of letters and words [2, 3].
A fundamental question concerning the rendering of text is whether the way text is displayed affects how it is read, processed, and understood. Scientifically, this question lies at the intersection of perceptual science and cognitive science. Numerous empirical investigations have measured how properties of written text influence its perception in reading, both on paper [4, 5, 6] and on electronic devices [7, 8, 9]. Nevertheless, the overall results of these empirical investigations on the relationship between reading speed or reading comprehension and font parameters are mixed and in part contradictory. More recently, due to the pervasive use of electronic text that can be read on different devices under different viewing conditions, adaptive methods for font rendering and text display have been developed [10, 11, 12]. Moreover, fully parameterizable fonts intended to generate new fonts have been developed for multiple purposes [13, 14, 15, 16]. Some of these fonts are intended to help in designing new fonts [13], some have been developed to enhance readability for individuals with low vision [14], and others have been developed to increase readability on small electronic devices [16]. Here, we close the loop by adaptively generating fonts and optimizing them progressively so as to increase individuals' reading speed. The system is based on a generative font space, which is learnt in a data-driven fashion using non-negative matrix factorization (NMF) of 25 classic fonts. We use this font space to generate new fonts and measure how a generated font affects reading speed for each individual user. Using Bayesian optimization, we sample new fonts from the generative font model to find those fonts that allow the reader to read texts faster. We demonstrate the feasibility of the approach and provide a user study showing that the found fonts indeed increase individual users' reading speed and that these fonts differ between individuals.
2 Related work
Traditionally, the development of a typeface is a complex creative process influenced by aesthetic, artistic, historic, economic, social, technological and other considerations [2]. As such, the design of typefaces and typography is subject to individual judgements, which may be highly variable between individuals and across time, see e.g. [17]. Nevertheless, readability has often been a central goal in the creation of typefaces, as expressed by Adrian Frutiger: "... optimum readability should always be foremost when developing a typeface" [18]. What constitutes readability in this context, and how to measure legibility and readability, has been debated extensively [19, 20]. The scientific and empirical investigation of the relationship between how rendered text looks and how it is processed lies at the intersection of cognitive science and perceptual science and is particularly relevant in psycholinguistics. The interested reader is referred to reviews providing an overview of this broad body of work [4, 5, 7, 6]. Of particular relevance for the present study are investigations addressing how font features such as size, boldface, serifs, or different typefaces affect text readability. In this context, readability has been operationalized in different ways, with subjects' reading speed, e.g. in words per minute, being the most common measure. Note that within typography, the effect of a typeface's design and the glyphs' shapes is usually termed legibility rather than readability. Perceptual science has long investigated how features of typefaces and individual fonts affect letter perception, e.g. with a series of experiments by Peterson and Tinker since the 1920s [21]. Since then, numerous features affecting the readability of text have been investigated, including font size [22], serifs versus sans-serif typefaces [23], lightness contrast [24], color contrast [25] and many others.
Perceptual science has also investigated how the spacing between adjacent letters influences letter recognition. Crowding describes the phenomenon that the perception of letters depends on their spatial separation as a function of foveal eccentricity [26]. It has been shown that crowding can account for the parafoveal reading rate [27]. Similarly, the effect of the vertical spacing of words on readability has been investigated [28]. A number of studies have particularly focused on how the readability of fonts changes with age [29] and on how to make fonts more readable for readers with medical conditions [30]. Because of the importance of text in electronic communication, the field of HCI has investigated the effect of fonts on readability, particularly when rendering text on monitors and other electronic devices. Such studies include investigations of the effect of font size and type on glance reading [31], measurements of user preferences and reading speed for different online fonts and sizes [32, 8], comparisons of the effect of font size and line spacing on online readability [33], quantifying the interaction of screen size and text layout [9], and investigating feature interactions such as color and typeface in on-screen displays [34]. Nevertheless, integrating all these results provides a mixed picture. Several studies concluded that increasing font size improves readability [33], while others reported that this holds only up to a critical print size [35]. While early studies reported improvements in readability with increased font weight [36], other studies reported no significant effect [37]. While the presence of serifs did not significantly affect the speed of reading [23], font type did, but only modestly [38]. Complicating the interpretation of some of these results is that most studies analyzed reading data across subjects, while some studies found more complex interactions of font features at the individual subject level [39].
Similarly, it remains unclear how reading text in print relates to reading on screen [40]. Finally, reading research has utilized many different reading tasks, ranging from RSVP letter detection to comprehension of meaningful text [41]. The introduction of digital fonts has generated new techniques and challenges for adaptively rendering and setting text, particularly on electronic devices, both from the point of view of the development of such algorithms [42] and from the point of view of the typographer [3]. Parametric tools intended to help designers develop new fonts have been introduced, e.g. [43]. Parametric techniques have been developed with the aim of increasing readability, both in print [10] and in rendering on screen; e.g. ClearType adjusts sub-pixel addressing in digital text to increase readability on small displays [11]. Knuth's Metafont goes a step further, as it can specify a font with one program per glyph, which in the case of the Computer Modern font family generates 72 fonts through variation of parameters. More recent developments include OpenType, which uses multiple glyph outlines that are rendered as variable parametric fonts [12], as well as adaptive fonts, which have recently gained popularity in responsive design for the web [44]. The ubiquity of hypertext has similarly led to adaptive grid-based layouts [45], which affect the readability of text. Some of these systems are based on optimization algorithms that try to formulate and include font parameters [46]. On the other hand, the development of computational and algorithmic art [47] and design [48] has extended the possibilities of creation in many fields. Parameterized generative font models allowing the parametric generation of new fonts have been created [49, 13]. Other approaches aimed at describing characters as an assembly of structural elements, resulting in a collection of parameterizable shape components [50].
A prototype parametric font program called Font Tailor was specifically designed to let users with visual impairments customize a font to increase readability [14]. The shapes of dynamic fonts [51] are instantiated every time a glyph is rendered, allowing for parameterized variability, e.g. in emulating handwriting [15], and fluid typography [52] changes glyphs' shapes on screen, thus blurring the lines between typography and animation. Some dynamic fonts have been designed specifically to increase readability on small portable electronic devices [16]. Our data-driven approach to obtaining a generative font space is most closely related to previous approaches of unsupervised learning of fonts [53] and to recent systems based on Generative Adversarial Networks (GANs), e.g. [54, 55, 56, 57, 58]. Unlike [53], which used a polyline representation of letters, we used a pixel-based representation, as the current study renders text only to a monitor up to a size of 40 pt. Recent work has explored the use of optimization methods from machine learning to evolve and adapt designs with explicit and implicit measures of users' preferences, which in general can be seen as part of interactive machine learning [59]. The difficulty lies in the availability or construction of an appropriate quantitative criterion that can be maximized and that captures humans' explicit or implicit preferences. Note that this is different from the supervised learning of selecting fonts from many design examples [60]. One such area is the automatic design of interface layouts. Some approaches optimize layouts with respect to hand-coded rules that are aimed at incorporating design criteria [61]. Off-line systems collect a large data set of user preferences or abilities and then approximate the user through a function, e.g. in ability-based user interfaces [62, 63]. More recent work trained a neural network to predict users' task performance from previously collected data [64].
Closed-loop adaptive systems, i.e. systems that parametrically change to optimize some interaction criterion on-line while the user is actively interacting with them, are much rarer and have been used predominantly in the context of game design [65, 66, 67]. One particularly attractive and powerful method for optimizing an unknown function is Bayesian optimization. It has a long tradition in many fields of research, including the optimal design of experiments [68] and the closed-loop design of experiments [69]. The power of Bayesian optimization lies in its ability to use statistical methods to model and efficiently optimize "black-box" functions. Bayesian optimization has been used in several recent HCI systems, both for open-loop optimization [70] and in closed-loop systems, e.g. in the context of computer game design [71]. Current research also includes the application of Bayesian optimization in the field of computational design involving crowdsourcing [72].

3 Preliminary study
In a preliminary user study we wanted to test whether reading speed was related to typefaces and font parameters at an individual subject's level. Seven subjects participated and read 250 texts each while their reading speed was measured. All subjects had normal or corrected-to-normal vision and were German native speakers. They were seated at a distance of approximately 50 cm in front of a 24-inch monitor, which had a resolution of 1080 pixels horizontally. Participants were instructed to read the texts attentively and as quickly as possible, but only once in total. The texts were tweets, i.e. messages or status updates on the microblogging site Twitter, which are limited to 280 characters, written in German. The tweets were randomly downloaded from the Twitter pages of major news platforms and cleaned of all mentions and hashtags. In order to ensure that subjects understood the content of the texts, they were instructed to categorize each tweet after reading.
To this end, eight categories were displayed in random order after subjects had completed reading a tweet. Participants were instructed to select as many categories applying to the content of the tweet as they thought appropriate. The tweets had previously been labelled independently by two of the authors of this study. A tweet was considered labelled correctly by a participant if at least one correct label was selected for it. A 5 × 5 × 2 factorial design was chosen to investigate the influence of different fonts on reading speed. Each trial consisted of a combination of a typeface (Arial, Cambria, Century Schoolbook, Helvetica and Times New Roman), a font size (10, 20, 30, 40, 50 pt) and a font weight (regular or bold). Each combination was presented exactly five times during the experiment, resulting in a total number of 5 × 5 × 2 × 5 = 250 trials. The measurement of reading time was started by subjects pressing a button, which also revealed a text, and stopped by a second button press, which also revealed the screen for the categorization. The elapsed time was used to compute the reading speed as the number of words read per minute. The distribution of all measured reading speeds can be found in Figure 7. To analyze the influence of the various factors, a Bayesian ANOVA was carried out on a subject-by-subject basis using JASP [73]. The Bayes Factors of the models of each subject are shown in Table 1. While there was substantial evidence in most subjects that font size influenced reading speed, font weight showed a substantial influence in only two subjects. For one subject, the analysis showed substantial evidence for an interaction between font size and boldface. These results provide evidence for individual differences in the magnitude and direction of the respective influencing factors, and therefore a first indication that the factors influencing reading speed may differ on an individual subject basis.
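As a concrete illustration, the trial grid and the dependent measure of the preliminary study can be sketched as follows. This is our own minimal reconstruction, not the authors' code; all names are hypothetical.

```python
from itertools import product

# Hypothetical reconstruction of the preliminary study's 5 x 5 x 2 design.
TYPEFACES = ["Arial", "Cambria", "Century Schoolbook", "Helvetica", "Times New Roman"]
SIZES_PT = [10, 20, 30, 40, 50]
WEIGHTS = ["regular", "bold"]
REPETITIONS = 5

# Every typeface/size/weight combination appears exactly five times.
trials = list(product(TYPEFACES, SIZES_PT, WEIGHTS)) * REPETITIONS
assert len(trials) == 250  # 5 x 5 x 2 x 5 trials

def words_per_minute(n_words: int, start_s: float, stop_s: float) -> float:
    """Reading speed from the two button presses bounding a text."""
    return n_words / ((stop_s - start_s) / 60.0)

print(words_per_minute(100, 0.0, 24.0))  # 100 words in 24 s -> 250.0 wpm
```

In the actual experiment the trial order would additionally be randomized per participant; the sketch only fixes the combinatorics and the words-per-minute measure.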
To generate fonts on a continuum, we first require a parametric generative font model. Here we used unsupervised learning to obtain a continuous font space. Specifically, we chose non-negative matrix factorization (NMF) [74] as a method for dimensionality reduction of fonts. The idea behind this procedure is to approximate a matrix X with the product of two matrices W, H so that the following relationship applies: X ≈ WH. This factorization is done under the constraints that the approximation minimizes the Frobenius norm ||X − WH||_F and that all entries of W and H are non-negative. The rows of H then represent the dimensional features, and W contains the weights to combine these features to reconstruct the rows of X. One advantage of the strictly positive components of NMF with respect to other methods such as Principal Component Analysis or transformer networks is that the basis functions can be thought of as printer's ink on paper and therefore as elements resembling actual glyphs. Another recent approach for font generation employs GANs, see e.g. [54, 55, 56, 57, 58]. Even though this technique gives good results at the letter and word level, current implementations suffer from problems in alignment and kerning in continuous texts. Due to these difficulties, we were not able to employ GANs that could generate qualitatively satisfactory texts, even though this approach will certainly be promising in the future. In order to cover a basic set of fonts including fonts with serifs and sans-serif fonts, we selected a list of classic and popular typefaces [75], containing a total of 25 classical fonts. The typeface names are shown in their respective font in Figure 2. To perform the NMF, we generated a grayscale image of size 2375 × 51 pixels containing all 26 letters in upper and lower case, the numbers from zero to nine, the German umlauts, brackets, question and exclamation mark, dot, comma, hyphen, colon, semicolon, slash and quotation marks.
Such an image was generated for each individual font using a font size of 40 pt. Letters were arranged side by side. The image data was then concatenated together with information about the alignment of each glyph in the font, obtained from the respective information in the TrueType file. The resulting font vectors form the rows of our matrix X, on which we performed the NMF. To choose the appropriate number of dimensions, cross-validation was performed. As recommended by Kanagal & Sindhwani [76], we masked our data using Wold holdouts and fitted the NMF to the rest of the data using weighted NMF with a binary weight matrix. The Wold holdouts are created by splitting the matrix into 16 blocks and randomly selecting four blocks, one from each row of blocks, to hold out. Ten random Wold holdouts were created, and for each we calculated the reconstruction error when choosing between one and five NMF components. The mean reconstruction error showed that a low testing error can be obtained with between one and three NMF components and that the model starts overfitting the data for four or more components. For our subsequent experiments we therefore decided to use three components to ensure a sufficiently rich font representation while also avoiding overfitting. The restriction to three dimensions had the additional advantage of allowing an individual font to be represented by a very low-dimensional vector with only three entries, which reduces the computational burden for the Bayesian optimization process described in the following. The representation of the 25 fonts in the three-dimensional NMF space is shown in Figure 3. Through this representation it is now possible to synthesize new fonts as linear combinations of the three learnt font basis vectors. Accordingly, a font can be represented by a point in this three-dimensional space. Figure 4 shows, as an example, the influence of the three dimensions for the upper-case letter K.
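A minimal sketch of the font-space learning step, assuming the flattened grayscale glyph images form the rows of X. Random data stands in for the real 25 font images, and the paper's weighted-NMF cross-validation with Wold holdouts is omitted here.

```python
import numpy as np
from sklearn.decomposition import NMF

# Sketch of learning the 3-D generative font space with NMF.
# Random data stands in for the 25 flattened font images.
rng = np.random.default_rng(0)
X = rng.random((25, 2375 * 51))  # 25 fonts, one 2375x51 px image each, flattened

model = NMF(n_components=3, init="nndsvda", max_iter=200, random_state=0)
W = model.fit_transform(X)   # per-font weights in the 3-D font space, shape (25, 3)
H = model.components_        # three non-negative basis "fonts", shape (3, 2375*51)

# Any non-negative linear combination of the basis rows is a new synthetic font:
w_new = np.array([4.0, 6.0, 3.0])  # a point in the 3-D font space
font_image = w_new @ H             # flattened glyph image of the new font
assert font_image.shape == (2375 * 51,)
assert (font_image >= 0).all()     # the "printer's ink" interpretation holds
```

Because both the weights and the basis are non-negative, every synthesized image is non-negative as well, which is what licenses reading the basis vectors as ink on paper.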
Inspecting the letter renderings suggests that the first and third dimensions are related to the scaling of letters in the vertical and horizontal directions, whereas the second dimension is related to the presence and strength of serifs. This generative model allows not only creating new fonts, but also interpolating between existing fonts. Figure 5 shows the changes in three characters (capital K, lower-case y, and the number 3) resulting from linearly moving in Euclidean space between the points corresponding to the fonts Frutiger and Sabon. Note that the changes are gradual and smooth. To be able to generate texts with a synthesized font, we generate TrueType font (TTF) files on the fly. A linear combination of the basis vectors obtained through NMF yields a vector containing the glyph images and alignment information of the new font, from which an SVG font is generated. The SVG font was then converted into the TTF format using FontForge [78]. At this point, the alignment information is directly inserted into the TTF file using the fontTools python package [79]. The entire process of font generation works automatically and in real time. To generate new fonts that increase participants' reading speed, we need to select an optimization technique that finds corresponding regions in the three-dimensional font space. The underlying assumption is that fonts change smoothly within the generative font space and that an individual's reading speed also changes smoothly across similar fonts. The objective function to be maximized is an individual's reading speed as a function of a specific font, which in this case is represented by a point in our three-dimensional font space. Since the font space is infinitely large and only a limited number of texts can be presented, we decided to use Bayesian optimization, because it is a well-known optimization technique for finding extrema of complex cost functions when the number of samples that can be drawn is limited [80].
Bayesian optimization starts with an a priori uncertainty across the three-dimensional font volume and selects successive points within that volume, for which the reading speed is evaluated through a reading experiment. Since the choice of this prior distribution is hard in general, a common choice is the Gaussian process prior due to its flexibility and tractability [81]. The central assumption of the Gaussian process is that the function values at any finite set of points follow a multivariate Gaussian distribution. Using the marginalization properties of the multivariate Gaussian distribution, it is then possible to compute marginal and conditional distributions of the unknown function given the observed data points. By obtaining the reading speed for a font through an experiment, the algorithm reduces the uncertainty at that location in font space. To select the next best font for testing, a balance has to be struck between exploration, i.e. selecting a region in font space for which the uncertainty about reading speed is large, and exploitation, i.e. selecting a region in font space where the reading speed is expected to be high, given previous reading experiments. This selection process is handled by the so-called acquisition function and its corresponding parameters. Thus, Bayesian optimization proceeds by sampling more densely in those regions where reading speed is high, and more evenly in those regions where uncertainty is high. The overall closed-loop logic of the experiment (see Figure 1) was to have the Bayesian optimization algorithm generate a font in the generative font space and use the reading speed of individual participants to generate new fonts that increase the reading speed. Eleven subjects (5 females, 6 males; age M = 24, SD = 2.64) participated in the experiment. All participants were German native speakers and had normal or corrected-to-normal vision.
Due to the COVID-19 pandemic, it was not possible to invite subjects to the laboratory, so they all performed the experiment on their private computers at home. Participants were recruited from graduate and undergraduate students of the research group and received the necessary materials for participation by e-mail. All subjects received detailed, multi-page instructions to keep influences such as sitting position, viewing distance, room lighting, etc. as similar as possible. In addition to the detailed instructions and information on the experiment, the subjects also received an executable file containing the experiment. Subjects were naive with respect to the mechanics of font generation and selection. Subjects read a total of 95 texts, which were taken from a German wiki of encyclopedia texts for children. These texts were chosen because they are easily understandable for adult native speakers, so that the content of the texts had minimal effect on reading speed. Furthermore, the texts were chosen to be comparable in length: the number of words for the first 94 texts was on average 99.7, with a minimum of 91 and a maximum of 109 words. Additionally, a single text with only 51 words was selected to check whether the reading speed deviated significantly depending on text length. The texts were presented in a random order for each participant. Subjects were instructed to read the texts attentively and as quickly as possible, but only once in total. To check that the texts were read and processed in terms of content, the subjects had the task of detecting words from previously specified categories while reading. Each individual trial started with the introduction of the category for the next text, e.g. before reading the text, it was indicated that words from the category animals should be detected subsequently. The category only referred to the next text, and a new category was selected for each trial.
Each text's category had previously been labelled independently by two of the authors of this study, and each subject was given the same category for a given text. For each text there were between one and ten words belonging to the instructed category (mean 3.07, SD 2.05). Once a subject had read the category for the next trial, they could use the space bar to display the text and begin reading. When the text was displayed, the time measurement for the reading speed also started. To avoid an accidental start, the task could only be initiated a few seconds after the category had been displayed (this was indicated to the participants by a red/green signal). Each time a term matching the previously specified category was detected, the participants pressed the space bar to indicate this. As an example, in the short sentence "The quick brown fox jumps over the lazy dog.", when reading the words fox and dog, the user had to press the space bar to indicate that the objects were recognized for the category animals. Once they had finished reading the text, they could press the Enter key to stop reading and thus also stop the timing. In addition, at eleven random trials throughout the experiment, a multiple-choice question was asked after reading, in order to additionally test subjects' text comprehension. The question always referred to the last text read, and six possible answers were given, of which exactly one was correct. After solving one of the multiple-choice questions, participants received feedback on their answer. This feedback consisted of the average reading speed, the number of correct detections, and the correctness of the multiple-choice question. These three components were combined into a score to further motivate participants to read quickly, but also correctly and attentively. In order to investigate whether regions of the generative font space exist which allow higher reading speeds, subjects read the 95 texts in different synthesized fonts.
Since the three-dimensional space is infinitely large and only a limited number of texts can be presented, Bayesian global optimization with a Gaussian process was used in order to sample only 95 fonts. The target function to be optimized is the reading speed as parameterized by the three dimensions. The process was implemented with [82] and the following configuration was selected: we chose the Upper Confidence Bound (UCB) as the acquisition function for balancing exploration and exploitation of new fonts [83]. The exploration parameter was set to κ = 5. We started with ten random initialization points and chose the noise handling parameter as α = 0.001 for the exploration strategy. We chose a Matern kernel with parameter ν = 2.5 for the covariance function and the L-BFGS-B algorithm for the optimization of the kernel parameters. Figure 6 shows an example of the sampling process through the Bayesian optimization algorithm. As combinations of NMF components with too high or too low magnitudes would certainly lead to unreadable fonts, we constrained the exploration space. If the sum of the three components is too low, the font is only faintly visible; if instead it is too high, the font is too bold and not readable either. Therefore, we empirically decided to constrain the magnitude of every dimension to be between 0 and 13 units and the sum of all magnitudes to be between 7 and 20 units. It could happen that, in spite of these constraints, generated fonts were not readable. In such cases, subjects had been instructed to reset the corresponding trial via a previously defined key. Afterwards, the same text was presented in a new font, ensuring that all participants read all 95 texts. This happened on 94 trials in total across all subjects (mean 8.54, SD 3.79). The Bayesian optimization algorithm nevertheless received the data that this linear combination returned a reading speed of zero words per minute.
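A sketch of a single optimization step with these settings (Matern ν = 2.5, α = 0.001, UCB with κ = 5, and the box and sum constraints). The toy objective below merely stands in for a participant's measured reading speed, and the rejection-sampled candidate set is our own simplification of the acquisition maximization; the actual study used the library cited as [82].

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(1)
KAPPA = 5.0  # exploration weight of the UCB acquisition

def sample_constrained_fonts(n: int) -> np.ndarray:
    """Rejection-sample font vectors with each weight in [0, 13] and sum in [7, 20]."""
    points = []
    while len(points) < n:
        x = rng.uniform(0.0, 13.0, size=3)
        if 7.0 <= x.sum() <= 20.0:
            points.append(x)
    return np.array(points)

def toy_reading_speed(x: np.ndarray) -> float:
    """Stand-in objective; in the experiment this is a measured reading speed."""
    return float(300.0 - ((x - np.array([4.0, 6.0, 3.0])) ** 2).sum())

X_obs = sample_constrained_fonts(10)  # ten random initialization points
y_obs = np.array([toy_reading_speed(x) for x in X_obs])

# sklearn optimizes the kernel parameters with L-BFGS-B by default.
gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-3,
                              normalize_y=True, random_state=0)
gp.fit(X_obs, y_obs)

# UCB acquisition: pick the constrained candidate maximizing mu + kappa * sigma.
candidates = sample_constrained_fonts(2000)
mu, sigma = gp.predict(candidates, return_std=True)
x_next = candidates[np.argmax(mu + KAPPA * sigma)]
assert 7.0 <= x_next.sum() <= 20.0
```

In the closed loop, `x_next` would be rendered as a TTF, the participant would read a text in it, and the measured words per minute would be appended to the observations before refitting the process.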
In order to rule out the possibility that the texts, despite the selected constraints, led to differences in reading speed in terms of content, the average reading speed in words per minute was determined for each text over all participants. A Bayesian ANOVA was used to check whether there was a significant difference in mean values and thus a dependence of the reading speed on the respective text. The Bayes Factor B01 = 360.36 gives decisive evidence for the null hypothesis that there was no significant difference between the texts. A second possible confounder relates to the length of the texts which participants read. By including a text with 51 words, we were able to check that the reading speed as measured in words per minute did not significantly depend on the chosen length of the texts. Indeed, the reading speed for the shorter text (mean: 257.37 words per minute) was within the 95% confidence interval of reading speeds of the 94 remaining texts for all individual participants (mean: 265.02, SD: 89.82). Another possible confounder concerns the detection task and the number of key presses required during a trial. In order to investigate whether the detections and the associated motor behavior had an influence on the measured time, the Bayesian Pearson correlation coefficient between the number of words belonging to the instructed category and the corresponding reading time was computed. For this purpose, the reading speeds were averaged over the number of expected detections to exclude the influence of all other factors affecting reading speed. No significant correlation was found (r = −0.223; BF10 = 0.47), confirming that the number of words belonging to the instructed category did not have a significant effect on reading speed. Finally, it might be argued that, although reading speeds improved with AdaptiFont over the course of the experiment, reading speeds might still be higher for traditional fonts.
To exclude this possibility, we compared the reading speeds over all texts and subjects between the preliminary study, in which classic typefaces were used, and the main study using AdaptiFont. A Kolmogorov-Smirnov test indicates that the two distributions are indeed significantly different (D = 0.367; p < .001), and a Bayesian two-sample t-test showed that the mean reading speeds for AdaptiFont were significantly higher than for the traditional typefaces (BF+0 = 6.782). Figure 7 shows the corresponding distributions (histograms of measured reading speeds for the preliminary study in orange and the main study in blue, with a kernel density estimator fitted to both histograms). Finally, we asked whether the optimization throughout the experiment actually reduced uncertainty about reading speed within the font space. For this, the point to be sampled next by the optimization process was considered. At this point, the variance of the Gaussian process was integrated within the neighborhood of a font with radius r = 0.225 before and after reading a text in the corresponding font. This radius is exactly half the distance between the two closest classic baseline fonts. For each iteration, the variance change in this region was then examined. Bayesian optimization indeed resulted in an average variance reduction of 943.65; the full distribution can be found in Figure A.1. To detect regions in the generative font space with higher reading speeds, a clustering method was used which includes the reading speed as a fourth dimension in addition to the three font dimensions for each individual participant. We chose the OPTICS algorithm [84] to cluster the data points, a density-based clustering procedure which orders the points of a data set linearly so that spatially nearest neighbours can be clustered together.
This has the advantage over other methods, such as k-means, that the number of clusters does not have to be specified in advance, and it allows detecting outliers that are not assigned to any cluster. The latter feature was useful in the analysis of our data, as it excluded trials with deviations in reading speed due to lapses in attention. As a free parameter of the algorithm, the minimum number of data points that must be present in a cluster has to be defined. Following the recommendations of [85], this parameter can be determined using the heuristic of twice the number of data dimensions minus one, i.e. (2 × dimensions) − 1, so in our case we set the parameter to n = 5. For visualization purposes, Figure 8 shows the best clusters, i.e. the clusters with the highest mean reading speed, for each of the 11 subjects in the generative font space. Each cluster is represented by an ellipsoid, centered at the mean of all associated data points, with the standard errors in all three dimensions as principal axes. To better compare the optimized synthesized fonts with the traditional fonts, we additionally considered their positions in font space relative to each other. The optimized fonts for all subjects lay within a region bounded by the traditional fonts Rockwell, Myriad, Optima and News Gothic. To obtain an indication of the similarity and difference between individuals' optimized fonts, we computed the mean pairwise distance between them. This distance is 3.99 units in the font space, comparable to the distance between Avantgarde and Univers. By comparison, the distance between the slowest and fastest fonts for each subject was on average 11.3 units.
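Distances of this kind are plain Euclidean distances between coordinates in the three-dimensional font space. A minimal sketch; the coordinates below are illustrative placeholders, not the learned coordinates from the study:

```python
import numpy as np
from itertools import combinations

# Illustrative 3D font-space coordinates for a few named fonts
# (placeholder values, NOT the actual learned embedding).
fonts = {
    "Helvetica": np.array([3.1, 7.2, 5.0]),
    "Univers":   np.array([3.3, 7.5, 4.8]),
    "Bembo":     np.array([14.0, 1.0, 16.5]),
    "Optima":    np.array([6.0, 9.0, 10.0]),
}

def dist(a, b):
    """Euclidean distance between two fonts in the generative space."""
    return float(np.linalg.norm(fonts[a] - fonts[b]))

# Mean pairwise distance over all font pairs.
pairs = list(combinations(fonts, 2))
mean_d = sum(dist(a, b) for a, b in pairs) / len(pairs)
```

With the study's learned coordinates, the same computation yields the reported smallest, largest, and mean distances between the classic fonts.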
In order to put these distances within the font space into perspective, we computed the distances between the classic fonts: the smallest distance (0.45) is between Helvetica and Univers, the largest (18.8) between Univers and Bembo, and the average distance (9.96) is comparable to the distance between Optima and Minion.

Figure 9: Found clusters for every subject based on the OPTICS algorithm and the corresponding mean reading speeds together with their standard errors. The mean reading speeds of all clusters were rank ordered for each subject. Note that the standard errors of reading speed were very small compared to the mean reading speeds for each cluster and subject.

To check whether the font clusters correspond to significantly different reading speeds, a Bayesian ANOVA was calculated. Figure 9 shows the mean reading speeds for each of the found font clusters together with the corresponding 95% credible interval for each subject. The Bayes factors are reported in Table 2; every Bayes factor gives decisive evidence against the null hypothesis, so the clusters differ significantly in their average reading speed. A MANOVA established statistically significant differences in reading speed between the clusters of highest reading speed across subjects, F(10, 75) = 3.24, p < .001; Pillai trace = 0.91. The corresponding ANOVA for every single dimension was performed, resulting in three significant differences.

Figure 7: The distribution generated with AdaptiFont is not only statistically significantly different; the reading speed with AdaptiFont is also statistically significantly faster on average. All subjects showed improved reading speed.

Finally, Figure 10 shows a pangram written in the respective font of the center of the cluster of highest reading speed for three exemplary subjects. The fonts are actual TrueType fonts and can be used as such.
Figure 10: The pangram "The quick brown fox jumps over the lazy dog." rendered in the optimized fonts of three exemplary subjects.

We presented AdaptiFont, a human-in-the-loop system to increase the readability of digitally rendered text. The system uses a learned generative font space and Bayesian optimization to generate successive fonts that are evaluated in terms of an individual user's reading speed. A preliminary study gave first indications that individual differences in readability exist in terms of different font features. With AdaptiFont, variation in font features is captured in a specific font space learned through NMF. Using this representation, we explore the reading speed of individual users with the help of Bayesian optimization. The results of the user study show the feasibility of the approach, both in terms of the data required to find fonts that increase an individual's reading speed and in terms of the statistically significant magnitude of the improvement in individuals' reading speeds. Finally, significant differences between subjects exist in the font-space regions that contain fonts associated with high legibility. The generated fonts are actual TrueType font files, which can be installed and used. Although the reading-speed-maximizing fonts found in this study were significantly different across individual subjects, it cannot be concluded that our system finds one single font that maximizes reading speed unchangeably for a single subject or across all subjects.
Rather, our system can be understood as dynamically and continuously creating fonts for an individual, which maximize the reading speed at the time of use. This may depend on the content of the text, on whether the reader is fatigued, or on the display device being used. The empirical data we obtained in the experiments provide clear evidence that AdaptiFont increased all participants' reading speeds. How these optimized fonts are related across subjects, or within subjects across days or viewing contexts, are interesting further empirical questions, which can be systematically addressed by utilizing AdaptiFont in future research. While the current study demonstrates the feasibility of interactively optimizing fonts through Bayesian optimization, it has a number of limitations, which should be addressed in further research. First, while reading speed is a natural target for optimizing fonts, other criteria are conceivable, such as text comprehension, aesthetic appeal, or memorability, either of texts or of fonts. Second, ample research has demonstrated cognitive and linguistic influences on reading speed, which were not taken into account here. Third, future generative font spaces, e.g. based on GANs, may provide more expressive font synthesis with smaller approximation errors to classic fonts and better alignment and kerning. Fourth, while the current study maximized reading speed at the individual-subject level, it is straightforward to use the system to maximize optimization criteria across multiple subjects. Fifth, the generative font space based on NMF may produce synthetic fonts that violate typographic rules. Including typographic constraints, or interactively including a typography expert in the font synthesis process, may yield an additional avenue for the development of new fonts.
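The closed-loop procedure evaluated in this study can be sketched as follows. This is a simplified illustration using scikit-learn's GaussianProcessRegressor with an upper-confidence-bound acquisition rule; the simulated read_speed function, the candidate grid, the kernel, and all hyperparameters are assumptions for demonstration, not the study's exact implementation.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(1)

def read_speed(font):
    # Stand-in for measuring one trial's reading speed (WPM) in this font;
    # the peak location and noise level are arbitrary.
    peak = np.array([8.0, 5.0, 12.0])
    return 265.0 - np.sum((font - peak) ** 2) + rng.normal(0.0, 5.0)

# Candidate fonts: random points in a 3D generative font space
# (bounds are illustrative).
cand = rng.uniform(0.0, 20.0, size=(500, 3))

X, y = [], []
for t in range(20):
    if t < 3:
        # A few random initial fonts before fitting the surrogate.
        x_next = cand[rng.integers(len(cand))]
    else:
        gp = GaussianProcessRegressor(kernel=RBF(length_scale=5.0),
                                      alpha=25.0,       # observation noise
                                      normalize_y=True).fit(X, y)
        mu, sd = gp.predict(cand, return_std=True)
        x_next = cand[np.argmax(mu + 2.0 * sd)]         # UCB acquisition
    X.append(x_next)
    y.append(read_speed(x_next))

best = X[int(np.argmax(y))]
```

In the actual system, read_speed would correspond to rendering a text in the candidate font and timing a participant's reading, and each selected point would be decoded into a TrueType font via the learned NMF components.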
While constant, variable, and parametric fonts have been designed and developed in the past, AdaptiFont is an interactive system that generates new fonts based on a user's interaction with text. In this paper, we used a generative font model and Bayesian optimization to generate fonts that progressively increased users' reading speeds. AdaptiFont can be understood as a system that dynamically and continuously creates fonts for an individual, which in this study maximized the reading speed at the time of use. Evaluation of the system in a user study showed the feasibility of the approach. AdaptiFont can be utilized to address further scientific questions, such as how the optimized fonts depend on sensory, linguistic, or contextual factors, or how optimized fonts are related across individuals. The system in no way detracts from other work on generating fonts, neither from the point of view of developing such algorithms nor from the point of view of the typographer, as developing typefaces can be guided by different motivations and with different goals. At https://github.com/RothkopfLab/AdaptiFont we provide a repository with additional material: usable fonts in the TrueType font (.ttf) format from the centroids of the clusters with highest reading speed for all subjects, as well as a script that can be used to generate fonts from the learned NMF components.

References:
- The science of words
- Typographic design: Form and communication
- Classical typography in the computer age
- Psychophysics of reading. I: Normal vision
- Typeface features and legibility research
- Visual factors in reading
- How physical text layout affects reading from screen
- A comparison of popular online fonts: Which size and type is best
- The influence of screen size and text layout on the study of text
- Dynamic optical scaling and variable-sized characters
- ClearType sub-pixel text rendering: Preference, legibility and reading performance
- Microsoft Corporation: OpenType® specification
- Synthesis of parametrisable fonts by shape components
- Adjustable typography: An approach to enhancing low vision text accessibility
- Random fonts for the simulation of handwriting
- Reading and learning smartfonts
- Typographic landscaping: Creativity, ideology, movement
- Adrian Frutiger, Typefaces: The Complete Works
- The concept of readability
- Legibility and large-scale digitization
- Studies of typographical factors influencing speed of reading
- Does print size matter for reading? A review of findings from vision science and typography
- Serifs and font legibility
- Letter recognition at low contrast levels: Effects of letter size
- Interaction effects in parafoveal letter recognition
- Crowding and eccentricity determine reading rate
- Reading speed benefits from increased vertical word spacing in normal peripheral vision. Optometry and Vision Science: official publication of the American Academy of
- The effect of age and font size on reading text on handheld computers
- The legibility of typefaces for readers with low vision: A research review
- Utilising psychophysical techniques to investigate the effects of age, typeface design, size and display polarity on glance legibility
- A study of fonts designed for screen display
- Make it big! The effect of font size and line spacing on online readability
- The effect of color and typeface on the readability of on-line text
- Psychophysics of reading. XVIII: The effect of print size on reading speed in normal peripheral vision
- Boldness as a factor in type-design and typography
- The effect of letter-stroke boldness on reading speed in central and peripheral vision
- Psychophysics of reading. XV: Font effects in normal and low vision
- Wider letter-spacing facilitates word processing but impairs reading rates of fast readers
- Reading from computer screen versus reading from paper: Does it still make a difference?
- A broader look at legibility
- Digital typography
- Feature-based design of fonts using constraints
- Better web typography for a better web
- Adaptive grid-based document layout
- Review of automatic document formatting
- The semiotic engine: Notes on the history of algorithmic images in Europe
- Aesthetic computing
- Infinifont: A parametric font generation system
- Parameterizable fonts based on shape components
- Dynamic fonts
- One form, many letters: Fluid and transient letterforms in screen-based typographic artefacts
- Learning a manifold of fonts
- Few-shot font generation with localized style representations and factorization
- Few-shot compositional font generation with dual memory
- Multi-content GAN for few-shot font style transfer
- Large-scale tag-based font retrieval with generative feature learning
- STEFANN: Scene text editor using font adaptive neural network
- Interactive machine learning
- Modeling fonts in context: Font prediction on web designs
- Creating personalized documents: An optimization approach
- Improving the performance of motor-impaired users with automatically-generated, ability-based interfaces
- Automatically generating personalized user interfaces with SUPPLE
- Optimizing user interface layouts via gradient descent
- Automatic playtesting for game parameter tuning via active learning
- Optimally designing games for cognitive science research
- Adapting interaction environments to diverse users through online action set selection
- Taking the human out of the loop: A review of Bayesian optimization
- The Automatic Neuroscientist: A framework for optimizing experimental design with closed-loop real-time fMRI
- Crowdsourcing interface feature design with Bayesian optimization
- Designing engaging games using Bayesian optimization
- Computational Design with Crowds
- JASP (Version 0.13.1) [Computer software]
- Generalized nonnegative matrix approximations with Bregman divergences
- 25 classic fonts that will last a whole design career
- Rank selection in low-rank matrix approximations: A study of cross-validation for NMFs
- George Williams and the FontForge Project contributors: FontForge
- A tutorial on Bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning
- Practical Bayesian optimization of machine learning algorithms
- Bayesian Optimization: Open source constrained global optimization tool for Python
- Gaussian process bandits without regret: An experimental design approach
- OPTICS: Ordering points to identify the clustering structure

We would like to thank Nils Neupärtl and David Hoppe for their constructive comments. The work is funded by the German Research Foundation (DFG, grant RO 4337/3-1; 409170253).