id: cord-020848-nypu4w9s
author: Morris, David
title: SlideImages: A Dataset for Educational Image Classification
date: 2020-03-24
pages:
extension: .txt
mime: text/plain
words: 2276
sentences: 145
flesch: 51
summary: Currently, many document analysis systems are trained in part on scene images due to the lack of large datasets of educational image data. In this paper, we address this issue and present SlideImages, a dataset for the task of classifying educational illustrations. SlideImages contains training data collected from various sources, e.g., Wikimedia Commons and the AI2D dataset, and test data collected from educational slides. Born-digital and educational images need further benchmarks on challenging information retrieval tasks in order to test generalization. While document scans and born-digital educational illustrations have materially different appearance, these papers show that the utility of deep neural networks is not limited to scene image tasks (Fig. 1). The related DocFigure dataset covers similar images and has much more data than SlideImages. In this paper, we have presented the task of classifying educational illustrations and images in slides and introduced a novel dataset SlideImages.
cache: ./cache/cord-020848-nypu4w9s.txt
txt: ./txt/cord-020848-nypu4w9s.txt