key: cord-0059550-2brmqu7s
authors: Singh, Shivam Kumar; Chakrabarti, Sujit Kumar; Jayagopi, Dinesh Babu
title: Automated Testing of Refreshable Braille Display
date: 2020-10-09
journal: Human-Centric Computing in a Data-Driven Society
DOI: 10.1007/978-3-030-62803-1_15
sha: 5c964307a5a09eb6858fe68e9a94d83c8ccb7966
doc_id: 59550
cord_uid: 2brmqu7s

A majority of the visually impaired population of India and other developing economies lives in poverty. Accessibility without affordability has little meaning for this population. Assistive technology has great potential to make education accessible to this population, e.g. through refreshable Braille display (RBD) devices. However, most existing solutions in this space remain out of reach for these users due to high cost. Innovation in data science and software engineering can play an important role in making assistive technological solutions affordable and accessible. In this paper, we present a machine-learning-based automated testing approach that has played an important role in enabling us to design one of the most affordable refreshable Braille display devices in the world. The key component of our approach is a visual inspection module (VIM) created using convolutional neural networks (CNNs). In our experiment, our model was able to detect malfunction of a refreshable Braille display with 97.3% accuracy. Our model is small enough to run on a battery-powered computer in real time. Such accurate automatic testing methods have the potential to significantly reduce the cost of RBDs.

Visual impairment is a global health issue. It is estimated that, globally, there are 441 million visually impaired people, encompassing a range of impairments from mild vision loss to blindness. Over 90% of them live in developing countries like India [5]. It is estimated that India has more than 62 million people with some form of visual impairment, of which more than 8 million suffer from permanent blindness [25]. Education and integration of visually impaired people is a fundamental challenge that we need to solve. In this quest, we have taken the help of many assistive technologies; one such technology is the refreshable Braille display (RBD).

An RBD is an electromechanical device that allows a visually impaired person to read the contents of a text file through a refreshable tactile feedback reader. One of the most important components of this device is the actuator, which drives a retractable pin up or down based on the electrical voltage supplied by the control unit. Eight of these retractable pins combine to make a cell, and multiple cells combine to make a complete display. Since every cell has eight mechanical components, cells are prone to degradation and failure both at manufacturing time and after heavy use. While manual testing is possible, it is very expensive and time-consuming, and it is possible only much later in the production cycle, after product integration. In the production stage, manual testing becomes a significant bottleneck in scaling up production. Therefore, it is very important that the testing process be automated if we want to scale up the production process and make the product economically viable.

This paper proposes a new method to test an RBD using a digital image processing technique based on deep learning. In this method, we capture an image of the cell being tested. After processing this image, we feed it to our convolutional neural network (CNN) [22], which predicts a value.
We compare this value to the input given to the cell; if the two match, we declare the cell to be error-free. We have tested both the traditional feature engineering and the modern feature learning approach, as explained in Sect. 4.1, and we have also experimented with multiple neural network architectures and hyper-parameters, as explained in Sect. 4.2. After performing these experiments, we chose the feature learning approach, with an architecture that offers excellent accuracy and acceptable inference time.

As a first step, we created a dataset by capturing images of different configurations of a cell. In all, 4096 photographs were taken, covering 256 different cell configurations under 16 different illumination conditions. The dataset was further augmented to over 12000 images using standard techniques. About two thirds of this dataset, labelled with the cell configurations the images corresponded to, was used to train a CNN model to identify the label for any given image; one third of the set was used as the validation set. We tested the model on 768 images. Our model identified the input correctly with 97.3% accuracy. Each identification took close to 0.06 s, so a scheme that takes multiple photographs of the same RBD input, say 50 such images, will take about 3 s, which is well within the acceptable limits for real-time deployment in a production line. We also benchmark our results against traditional feature-engineering-based approaches and show the efficacy of using a feature-learning-based approach (i.e. a CNN in our case).

The paper is structured as follows: in Sect. 2, we relate our work to the current literature. In Sect. 3, we explain our approach; in particular, we discuss the test architecture in Sect. 3.1 and the dataset creation process in Sect. 3.2. In Sect. 4, we discuss the experiments conducted to validate our approach, as well as our CNN model and its architecture. In Sect. 5, we conclude the paper with a summary and a discussion of future work.

Data science has come to the forefront in combating the challenges related to disability and inclusiveness through assistive technologies. A team at MIT, in collaboration with NUS, developed a self-driving wheelchair, and the design was improved by researchers from the College of Engineering and Computer Science, California State University at Northridge [1, 17]. Navigation assistance is another area that has seen a significant boost in the last five years, because data from multiple sources like GPS, accelerometers, gyroscopes, and cameras is helping solve this problem. Efforts have also gone into improving RBDs using data science, for example by integrating optical character recognition (OCR) into the machine so that it can recognize and display any text [9].

Reliability of the cells is a major concern in RBDs. Innovation in the design of reliable RBD cells has a long history, going as far back as the 1950s [7]. Many more attempts have been made along the line of design improvement [4, 16, 26, 28, 31-33, 35]. In the quality assurance of any product, a necessary complement to design is testing. Better, faster, cheaper, and more effective testing is central to the early discovery of faults, thus preventing client-site failures. This paper focuses on automated testing of RBD cells.

Automation of testing using visual inspection has been done in various industries, such as automobile, lumber, bottling, and textile, with great results since the 1980s [10]. Vision-based inspection of PCBs is also prevalent [18]. In the automobile sector, parts like brake cylinders, camshafts, and cylinder bores are inspected visually [23].
Automated Visual Inspection (AVI) has also been used for the analysis of radiographic images such as X-rays since as early as the 1970s and 1980s [36]. Inspection in the glass and ceramic industry and in the food and packaging industry is also vision-based [10]. With the advent of deep learning, testing based on visual inspection has become ubiquitous, as neural networks have become particularly good at feature extraction and pattern recognition, and many industries have recently adopted this type of testing. One of the major areas that has benefited is the transportation sector: visual inspections are used to find cracks and anomalies in bridges, tunnels, and railway tracks [15]. AVI has also found a fundamental place in healthcare, where it is used to complement doctors in verifying medical reports; AVI makes it easier to detect diseases like skin cancer [13] and Parkinson's syndrome [24]. AVI is used to automate the testing of printed circuit boards, and the use of this technology has proved to be highly efficient [34]. Classification of solder joints has also been done using visual inspection with the help of neural networks [20]. Visual-based automated testing is becoming prevalent now. Testing of printed circuit boards (PCBs) shares a lot of similarity with testing of an RBD; therefore, we decided to take this approach over other approaches for our case.

In this paper, we propose a visual-based automatic inspection method for RBDs. We systematically compare traditional feature engineering against modern feature-learning-based methods. After exploring the architecture and hyper-parameter space, we suggest an architecture with excellent accuracy and acceptable inference time.

In this section, we describe the test architecture of our proposed solution for visual-based automatic inspection of RBDs, and the dataset curation process towards building the model that predicts the bit pattern from the visual input.

Figure 1 shows the overall architecture of our test setup. The test setup has three main components: the RBD that is to be tested, the visual inspection module (VIM), and the comparator (X). At the time of testing, an input I (i.e. a byte corresponding to the character to be displayed) is given to the cell. The Braille character displayed by the cell as a result is the actual output of the system. Due to mechanical faults, there is a probability that this output may deviate from the input. It is the purpose of the testing system to find these deviations and to notify when and where they happen. The approach is explained below:

1. For each input, one photograph of the RBD module (consisting of two cells, as shown in Fig. 2) is taken by the digital camera mounted on the system.
2. The pre-trained CNN unit analyses each image and predicts the output that is displayed by the cell. This is the actual output O.
3. The comparator compares O with I. If they match, the cell is considered to be working as expected.

Our main objective is to design a reliable VIM that runs in real time. To design this VIM, we first created an image dataset and trained a CNN model on it. After reaching the desired accuracy, we use this model as our multi-class classifier to predict the value an RBD is displaying. Our comparator compares this predicted value with the original input (the ground truth) to determine whether the RBD is working correctly.
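To make this flow concrete, here is a minimal sketch of steps 1-3 in Python. The helper names display_on_rbd and capture_module_image are our assumptions for illustration and do not come from the released code; model stands for the trained VIM classifier described in Sect. 4.

    import numpy as np

    def test_module(model, display_on_rbd, capture_module_image, input_byte: int) -> bool:
        """One test cycle: display a pattern, photograph it, predict, compare.

        display_on_rbd and capture_module_image are hypothetical stand-ins for
        the RBD driver and the mounted camera; model is the trained VIM (a CNN
        that maps a 100x100 grayscale image to one of 256 pin patterns).
        """
        display_on_rbd(input_byte)                  # drive the cell with input I
        img = capture_module_image()                # step 1: 100x100 grayscale photograph
        x = img.astype("float32")[None, :, :, None] / 255.0
        output = int(np.argmax(model.predict(x, verbose=0)))  # step 2: VIM predicts O
        return output == input_byte                 # step 3: comparator X checks O == I

    def test_all_patterns(model, display_on_rbd, capture_module_image):
        """Exhaustive sweep over all 256 inputs; returns the failing patterns."""
        return [b for b in range(256)
                if not test_module(model, display_on_rbd, capture_module_image, b)]

A production harness would iterate this over all 256 inputs (or a T-wise subset, as discussed in Sect. 5) and flag any mismatching pin pattern.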
In Sects. 3.2 to 3.4, we explain the complete process of creating the dataset, and in Sect. 4 we explain our experiments with different CNN architectures.

A comprehensive dataset is needed in order to train a neural network. However, a dataset for RBDs did not exist, which meant that we had to create our own. To create the dataset, we needed to capture pictures of all the combinations of a Braille cell under different lighting conditions. One thing worth noting here is that in our design two cells combined form one module, i.e. each module has 16 pins. Even if only one cell of the module malfunctions, the whole module has to be replaced; therefore, both cells are given the same input and tested at the same time.

We came up with a comprehensive plan to efficiently capture all the images. There are 8 pins on one cell of an RBD, which means that a total of 256 (i.e. 2^8) configurations are possible. We need to capture images in various lighting conditions; hence we used 4 differently coloured LEDs in the four top corners of the enclosure. With 4 different LEDs we get 16 (i.e. 2^4) lighting conditions. The design of the data creation enclosure can be seen in Fig. 3. Taking 2^4 images for every configuration means that we get 2^12, i.e. 4096, images. However, in practice it was found that 4096 training examples were not enough to train a CNN, as the parameter space is huge; therefore, we needed to synthesize artificial data too. To create additional synthetic data, we rotated the images and changed their colour patterns. Each image was rotated by +5° and −5°, and slight variations were introduced in the colour pattern. These colour variations ensure that our dataset covers a diverse range of illumination conditions, and the rotations ensure that the model performs well even if images are captured from slightly different angles, making it rotation-invariant [14]. After doing this, we had 48 images for each configuration, bringing the total number of images to 12288. Figure 4 shows a captured image and a synthesised image.

In order to feed the data to a deep learning model, we need to make it as useful as possible, since real-world data is often incomplete, inconsistent, lacking in certain behaviours or trends, and likely to contain many errors [11]. In our case, however, the data was created in a controlled environment, so it did not need extensive pre-processing; nevertheless, a series of steps was taken to make it more refined and useful. All the images were converted to grayscale: when we tried the same experiment with the RGB scheme, the results were only marginally better but significantly more computationally intensive, so the grayscale conversion also sped up prediction considerably. A Gaussian blur was applied to make the images a little smoother [6]. Each captured image was of size 640 × 480 and was reduced to 100 × 100; a 100 × 100 image contains enough features for the network and is significantly faster to process than a 640 × 480 image (a short sketch of these steps appears at the end of this subsection).

We compared the feature engineering approach against the feature-learning-based approach. Our engineered features were extracted as follows: we used the Oriented FAST and Rotated BRIEF (ORB) [30] feature detector (an alternative to SIFT and SURF, which are patented). ORB uses the FAST keypoint detector to determine the keypoints [29]; a Harris corner measure is then applied to select the top points, after which the BRIEF descriptor is computed [8]. ORB can detect both corners and blobs, and it is rotation-invariant and resistant to noise.
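The preprocessing and augmentation steps above can be sketched with OpenCV as follows. The Gaussian kernel size is an assumed value (the paper does not state one), and the colour-pattern perturbation is omitted; only the stated grayscale conversion, blur, 100 × 100 resize, and ±5° rotations are shown.

    import cv2
    import numpy as np

    def preprocess(img_bgr: np.ndarray) -> np.ndarray:
        """Grayscale -> Gaussian blur -> 100x100, as described in the text.

        The (5, 5) kernel is an assumption; the paper does not give the size.
        """
        gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
        smooth = cv2.GaussianBlur(gray, (5, 5), 0)
        return cv2.resize(smooth, (100, 100))

    def rotate(img: np.ndarray, angle_deg: float) -> np.ndarray:
        """Rotate about the image centre (used with +5 and -5 degrees)."""
        h, w = img.shape[:2]
        m = cv2.getRotationMatrix2D((w / 2.0, h / 2.0), angle_deg, 1.0)
        return cv2.warpAffine(img, m, (w, h))

    def augment(img_bgr: np.ndarray) -> list:
        """Each captured 640x480 frame yields the base image plus two rotations."""
        base = preprocess(img_bgr)
        return [base, rotate(base, 5.0), rotate(base, -5.0)]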
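As a sketch of the engineered-features baseline, the snippet below extracts ORB descriptors and mean-pools them into a fixed-length vector for a shallow classifier. The pooling scheme and descriptor count are our simplifications: the paper does not specify how the ORB output was encoded for the LR and DT models.

    import cv2
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    orb = cv2.ORB_create(nfeatures=200)  # FAST keypoints, Harris ranking, BRIEF descriptors

    def orb_features(gray: np.ndarray, dim: int = 32) -> np.ndarray:
        """Mean-pool the 32-byte ORB descriptors into one fixed-length vector.

        Averaging is an assumed aggregation; the paper does not state one.
        """
        _, desc = orb.detectAndCompute(gray, None)
        if desc is None:                 # no keypoints found (e.g. a blank image)
            return np.zeros(dim, dtype="float32")
        return desc.astype("float32").mean(axis=0)

    def train_lr_baseline(images, labels):
        """Fit the logistic regression baseline on pooled ORB features."""
        X = np.stack([orb_features(im) for im in images])
        return LogisticRegression(max_iter=1000).fit(X, labels)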
Finally, we tried two classical ML multi-class classification models, namely logistic regression (LR) and a decision tree classifier (DT). For feature learning, we used a CNN on the same data; the exact details of the architecture are explained in the subsequent section. Overall, the results suggest that feature learning is better: our CNN model performs much better than the engineered features combined with shallow ML models. Details of the performance can be found in Table 1.

We tried multiple CNN architecture combinations before deciding on the final architecture. Hyper-parameters are the parameters that are fixed before training and determine the structure and learning behaviour of the model. In our model, the hyper-parameters are: the number of layers, the size of each layer, the kernel size of a convolution layer, the number of filters in a convolution layer, the max-pooling size, the activation functions, the batch size, and the number of epochs. We experimented with models with one and two convolution layers; kernel sizes were picked from {3, 5} and the number of filters from {32, 64}. We also experimented with the ReLU [12] and sigmoid [12] activation functions; since sigmoid suffers from the vanishing gradient problem, we used ReLU. We found that a model with two convolution layers, a kernel size of 3, 64 filters, and ReLU activation performs better than the other models. We trained our network on different combinations of hyper-parameters and observed what worked best, monitoring the loss on both the validation set and the training set with TensorBoard [2]. Table 2 shows the results from one such experiment: Model 1 is the chosen model, Model 2 had only 32 filters in the convolution layer, Model 3 had only one convolution layer, and Model 4 had 32 filters with a kernel size of 5 × 5. Model 1 performed the best.

In our final model (Model 1), the first and second layers are convolution layers with ReLU activation and max-pooling [12]; the third and fourth layers are fully connected, after flattening. We used Adam [21] as our optimiser with a batch size of 32. Figure 5 shows the architecture of the model (a reconstruction is sketched at the end of this section). After trying out many combinations, we found the best balance between prediction speed and accuracy with these hyper-parameters (Table 3).

After training the final model on a training set of 8068 images, a cross-validation set of 3258 images, and a test set of 768 images, we obtained a training accuracy of 99.6%, a validation accuracy of 99%, and a test accuracy of 97.3%. This shows that our algorithm has both low bias and low variance.

There are two important factors to consider at prediction time: confidence in the prediction and the time elapsed during prediction. While we have over 97% accuracy, we would still like more confidence, and for that we can take multiple samples and test them all. We also have to make sure that the process is as quick as possible, because this will let us test a cell at the assembly line itself. Our model takes 2.8 s to predict 50 images and compare them with the test input. This does not take into account the time taken by the camera to capture the 50 images, as the image capture task is easily parallelizable.
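From the description above (two 3 × 3 convolution layers with 64 filters, ReLU, max-pooling, two fully connected layers, Adam, batch size 32), Model 1 can be reconstructed roughly as follows in Keras. The 2 × 2 pooling size and the 128-unit width of the first dense layer are assumptions, as those values are given only in Table 3, which is not reproduced here.

    import tensorflow as tf
    from tensorflow.keras import layers, models

    def build_vim_model(num_classes: int = 256) -> tf.keras.Model:
        """Approximate reconstruction of Model 1 from Sect. 4.2.

        The 2x2 pooling and the 128-unit dense layer are assumed values;
        the conv-layer settings (64 filters, 3x3 kernels, ReLU) are stated.
        """
        model = models.Sequential([
            layers.Input(shape=(100, 100, 1)),                # 100x100 grayscale input
            layers.Conv2D(64, 3, activation="relu"),          # convolution layer 1
            layers.MaxPooling2D(2),
            layers.Conv2D(64, 3, activation="relu"),          # convolution layer 2
            layers.MaxPooling2D(2),
            layers.Flatten(),
            layers.Dense(128, activation="relu"),             # fully connected layer
            layers.Dense(num_classes, activation="softmax"),  # one unit per pin pattern
        ])
        model.compile(optimizer=tf.keras.optimizers.Adam(),
                      loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])
        return model

    # Training as described: batch size 32, labels are the 0-255 pin patterns.
    # model.fit(x_train, y_train, batch_size=32, validation_data=(x_val, y_val))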
Affordability is at the heart of accessibility. There are many examples reported in the literature where data science and computing are used in the design of accessibility features [1, 3, 17]. To the best of our knowledge, however, there is little reported work on how data science and computing can significantly bring down manufacturing cost, thus adding affordability to accessibility. In this paper, we have presented a method for automated testing of RBD cells based on deep learning that has played an important role in bringing down the manufacturing cost of an RBD. In our experiments, our method performed with an accuracy of 97.3%. We believe that our model can be used on any type of refreshable Braille display. Also, note that in a production environment, parameters like illumination and camera position/angle can be much more closely controlled; this makes it likely that the figures obtained in our experiments are more conservative in terms of accuracy and speed than those of a production environment.

Currently, our method tests one module at a time. We can extend the method to test a complete RBD consisting of 7 such modules at one time, which would significantly enhance the testing speed. There are two different approaches to this problem: semantic segmentation and the sliding-window method. State-of-the-art segmentation and detection models like Mask R-CNN and Faster R-CNN [27] are too slow (5-8 FPS) for our use case, as we need to test the modules on a moving conveyor belt. The sliding-window method does not work well without borders around each module; therefore, we would need to slightly change the design of the module itself by adding contrasting-colour borders. To test the entire RBD at a time, we would also have to create a dataset to train our neural network for testing multiple cells at once. However, this means that the number of input combinations grows to 2^112 (7 modules × 16 pins), which is well beyond the feasible range. Test generation techniques like T-wise coverage [19] can be used to bring the input space down to a feasible size.
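To illustrate the idea, the following is a generic greedy pairwise (2-wise) suite builder for binary pins, not the specific construction of [19]: it repeatedly adds the random candidate test that covers the most still-uncovered pin-pair value combinations, typically yielding a suite of a few dozen tests instead of 2^112 exhaustive inputs.

    import itertools
    import random

    def pairwise_suite(n_pins: int = 112, candidates: int = 50, seed: int = 0):
        """Greedily build a test suite with 2-wise coverage over binary pins.

        Every pair of pins (i, j) ends up exercised in all four joint states
        (0,0), (0,1), (1,0), (1,1) by at least one test in the suite.
        """
        rng = random.Random(seed)
        pairs = list(itertools.combinations(range(n_pins), 2))
        uncovered = {(i, j, vi, vj) for i, j in pairs for vi in (0, 1) for vj in (0, 1)}
        suite = []
        while uncovered:
            best, best_cov = None, set()
            for _ in range(candidates):          # sample random candidate tests
                t = tuple(rng.randint(0, 1) for _ in range(n_pins))
                cov = {(i, j, t[i], t[j]) for i, j in pairs} & uncovered
                if len(cov) > len(best_cov):
                    best, best_cov = t, cov
            if best is None:                     # rare: force one uncovered tuple
                i, j, vi, vj = next(iter(uncovered))
                t = [rng.randint(0, 1) for _ in range(n_pins)]
                t[i], t[j] = vi, vj
                best = tuple(t)
                best_cov = {(a, b, best[a], best[b]) for a, b in pairs} & uncovered
            suite.append(best)
            uncovered -= best_cov
        return suite

    print(len(pairwise_suite()))  # a few dozen tests instead of 2**112 inputs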
A. Dataset. The entire dataset has been released into the public domain and can be found at https://www.kaggle.com/shivam3376/refreshable-braille-display-cell.

B. Code. We are also releasing the code to train and deploy the model. It can be found at https://github.com/shivamkumarsingh114/Automated-testing-of-RBDs.git.

References

[1] Smart FM trials self-driving wheelchair
[2] TensorFlow: a system for large-scale machine learning
[3] TrailCare: an indoor and outdoor context-aware system to assist wheelchair users
[4] Refreshable braille now and in the years ahead
[5] Magnitude, temporal trends, and projections of the global prevalence of blindness and distance and near vision impairment: a systematic review and meta-analysis
[6] The OpenCV library. Dr Dobb's
[7] Reading apparatus (1950). US Patent 2
[8] BRIEF: binary robust independent elementary features
[9] An open source Tesseract-based tool for extracting text from images with application in braille translation for the visually impaired
[10] Automated visual inspection: a survey
[11] An overview on data preprocessing methods in data mining
[12] Data classification with deep learning using TensorFlow
[13] Dermatologist-level classification of skin cancer with deep neural networks
[14] Data augmentation: how to use deep learning when you have limited data
[15] Deep multitask learning for railway track inspection
[16] Reading and writing machine using raised patterns
[17] Human-machine interface for a smart wheelchair
[18] A method for automating the visual inspection of printed wiring boards
[19] Software Testing: A Craftsman's Approach
[20] Visual inspection system for the classification of solder joints
[21] Adam: a method for stochastic optimization
[22] ImageNet classification with deep convolutional neural networks
[23] The application of optics to the quality control of automotive components
[24] Automated diagnosis of parkinsonian syndromes by deep sparse filtering-based features
[25] Global estimates of visual impairment: 2010
[26] A compact electroactive polymer actuator suitable for refreshable braille display
[27] Faster R-CNN: towards real-time object detection with region proposal networks
[28] 49.2: rotating-wheel braille display for continuous refreshable braille
[29] Machine learning for high-speed corner detection
[30] ORB: an efficient alternative to SIFT or SURF
[31] EAP actuators aid the quest for the 'holy braille' of tactile displays
[32] Alphanumeric 'displays' for the blind: a technology search
[33] Electromechanical transducer for relief display panel
[34] Automatic optical inspection for surface mounting devices with IPC-A-610D compliance
[35] Conductive Electroactive Polymers: Intelligent Materials Systems
[36] Results of X-ray television inspection of electronic parts

Acknowledgement. This work was supported by Karnataka Innovation Technology Society, Dept. of IT, BT and ST, Govt. of Karnataka.