key: cord-0572246-3r607k4w
authors: Darapaneni, Narayana; Tanndalam, Arjun; Gupta, Mohit; Taneja, Neeta; Purushothaman, Prabu; Eswar, Swati; Paduri, Anwesh Reddy; Arichandrapandian, Thangaselvi
title: Banana Sub-Family Classification and Quality Prediction using Computer Vision
date: 2022-04-06
journal: nan
DOI: nan
sha: 889e248c3604cbf6153692887ce466421600c532
doc_id: 572246
cord_uid: 3r607k4w

India is the second largest producer of fruits and vegetables in the world, and one of the largest consumers of fruits like banana, papaya and mango through retail and e-commerce giants like BigBasket, Grofers and Amazon Fresh. However, adoption of technology in the supply chain and in retail stores is still low, and there is great potential to adopt computer-vision based technology for identification and classification of fruits. We have chosen the banana to build a computer-vision based model that carries out the following three use cases: (a) identify a banana in a given image, (b) determine the sub-family or variety of the banana, and (c) determine the quality of the banana. Successful execution of these use cases with a computer-vision model would greatly help with overall inventory management automation, quality control, and quick and efficient weighing and billing, all of which are currently manual and labor intensive. In this work, we propose a machine learning pipeline that combines CNNs, transfer learning, and data augmentation to improve banana sub-family and quality image classification. We built a basic CNN and then tuned a MobileNet banana classification model using a combination of self-curated and publicly available data totaling 3064 images. The results show an overall accuracy of 93.4% for sub-family/variety classification and 100% for quality classification.

Recognition of fruit type and quality, and its automation, is important in smart agriculture to increase production efficiency. As of 2018-19, total production stood at 313 million tons. Among fruits, India ranks first in the production of bananas (25.7%), papayas (43.6%) and mangoes (40.4%). This high production serves domestic consumption, a large volume of exports, and processed food. Domestic purchase and consumption are slowly being penetrated by e-commerce delivery giants like BigBasket, Grofers and Amazon Fresh, which rely on technology not just for the commerce platform but also across the entire supply chain, packaging and delivery. At the same time, retail shops are gradually adopting technology for quality checks, weighing and point of sale. Reducing manual intervention at various stages, both to save cost and to reduce manual handling, is gaining importance, more so after COVID-induced behaviors became the norm.

One of the important use cases in this technology adoption, for both the e-commerce and retail scenarios we are exploring, is identifying a fruit using computer vision, deciphering its sub-family (or variety), and predicting its quality. For this research we picked the banana, given its popularity and the number of varieties available. The three main use cases we have incorporated in this work are:

• Identify a banana from a given image
• Determine the sub-family or variety of the banana
• Determine the quality of the banana

Currently this is done completely manually by most of the large e-commerce players and at retail shops. We intend to solve this problem using available deep-learning techniques. A successful solution can be integrated with the existing ecosystem of products and methods currently in use. The idea can be extended to multiple fruits, vegetables and other inventory products, and to multiple products simultaneously. It can be a primary step towards overall inventory management automation, quality control, and quick and efficient weighing and billing, all of which are currently manual and labor intensive. The major scientific contributions we present in this paper are as follows:

Several approaches have been published for predicting the quality of fruits or vegetables from images. The key takeaways from the existing work referenced in this section were the model architectures and hyperparameters. The authors of [6] [5] tried to judge the quality of lemons using images from the Fruit 360 dataset, working with a publicly available dataset of 2690 images. Because of the lack of data, they used GANs to enrich their dataset. They proposed a pretrained VGG16 model with a small batch size so that it fits on commodity hardware. Appending a 4096-neuron fully connected layer to the convolutional layers led to an image classification accuracy of 83.77%. A Conditional Generative Adversarial Network was then trained on the training data for 2000 epochs, and it learned to generate relatively realistic images. Grad-CAM analysis of the model trained on real photographs shows that the synthetic images can exhibit classifiable characteristics such as shape, mould, and gangrene. A higher image classification accuracy of 88.75% was then attained by augmenting the training data with synthetic images.

Mazen [11] proposed an automatic computer vision system to identify the ripening stages of bananas. First, a four-class homemade database was prepared. Second, an artificial neural network-based framework that uses color, development of brown spots, and Tamura statistical texture features was employed to classify and grade the banana ripening stage. The performance of the proposed system was compared with techniques such as SVM, naive Bayes, KNN, decision tree, and discriminant analysis classifiers. The results reveal that the proposed system has the highest overall recognition rate among these techniques, at 97.75%. Al-daour and Alshawwa [20] (Classification of Banana Fruits Using Deep Learning, 2020) presented a machine learning based approach for identifying the type of banana with a dataset of 8,554 images, of which 4,488 were used for training, 1,928 for validation and 2,138 for testing. A deep learning technique widely applied to image recognition was used. Using 70% of the non-test images for training and 30% for validation, their trained model achieved an accuracy of 100% on the held-out test set, demonstrating the feasibility of this approach.

We explored using existing datasets such as Fruits 360 to reuse available data. However, on assessing existing open datasets, we realized that to make this useful for the Indian context and to address the quality-prediction use case, we needed new data collected from scratch. We addressed this by procuring different sub-families of bananas from the market, photographing them, and creating our own dataset for the project. We created more than 1500 banana images this way. The sub-families used are Elakki, Red Banana, Robusta, Nendram and Hill Banana. We further augmented this with about 800 banana images from the Fruits 360 dataset for Robusta and Red Banana. We also added negative examples of other fruits so that we can address the first use case of distinguishing bananas from other fruits. For this purpose, we photographed fruits such as sweet lime, pomegranate and apple and added them to the dataset as Other Fruits. All pictures were taken against a plain background to reduce noise during learning; the dataset is summarized in Table 1.

The dataset created for the project has a total of 3064 images in 6 classes. The number of images in each class is shown in the graph below. Some examples of the images from our dataset are also shown below in Figure 1 and Figure

We split the collected data into train and test folders so that we have clean, unseen data for testing. We followed a naming convention that lets us label the data using the respective folder names. We used the flow_from_directory method of Keras's ImageDataGenerator to read and label the data for all three sets: train, validation and test. Around 76% of the data was used for training, around 19% for validation and approximately 5% for testing.
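The split proportions above can be turned into approximate image counts with a few lines of Python. This is a minimal sketch: the fractions come from the text, while the helper name `split_counts` and the rounding rule are assumptions for illustration, so the exact counts used in the project may differ.

```python
def split_counts(total, train_frac=0.76, val_frac=0.19):
    """Return (train, val, test) image counts for a dataset of `total` images.

    Assumption: train and validation counts are rounded to the nearest
    integer and the test split receives whatever remains.
    """
    train = round(total * train_frac)
    val = round(total * val_frac)
    test = total - train - val
    return train, val, test


train_n, val_n, test_n = split_counts(3064)
print(train_n, val_n, test_n)
```

Under these assumptions the test split is roughly 5% of the 3064 images, consistent with the proportions stated above.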

Image augmentation techniques are applied at random to the images to add diversity and increase data volume. The operations performed are rotation, shifting and flipping (horizontal and vertical), applied before the images are provided to the model. Figure 1 shows samples of augmentation applied to the dataset.
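To make the augmentation operations concrete, the following sketch applies flips, a shift and a rotation to a tiny image stored as a nested list. It is an illustration only: the function names are hypothetical, the rotation is restricted to 90 degrees as a stand-in for arbitrary angles, and the project itself would rely on a library implementation rather than this code.

```python
import random


def flip_horizontal(img):
    """Mirror each row left to right."""
    return [row[::-1] for row in img]


def flip_vertical(img):
    """Reverse the order of the rows (top to bottom)."""
    return img[::-1]


def shift_right(img, pixels, fill=0):
    """Shift every row right by `pixels`, padding the left edge with `fill`."""
    width = len(img[0])
    pixels = max(0, min(pixels, width))
    return [[fill] * pixels + row[: width - pixels] for row in img]


def rotate_90(img):
    """Rotate clockwise by 90 degrees (stand-in for arbitrary-angle rotation)."""
    return [list(col) for col in zip(*img[::-1])]


def augment(img, rng):
    """Apply each operation with probability 0.5, in a fixed order."""
    ops = [flip_horizontal, flip_vertical, rotate_90,
           lambda im: shift_right(im, 1)]
    for op in ops:
        if rng.random() < 0.5:
            img = op(img)
    return img


sample = [[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]
augmented = augment(sample, random.Random(0))
```

Each call to `augment` with a different random state yields a different variant of the same source image, which is how augmentation increases the effective data volume.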

Our model development process has two stages: first we implemented a base CNN model with the architecture described below, and second we implemented transfer learning using pre-trained models such as MobileNet, VGGNet, EfficientNet, ResNet and InceptionNet. The following sections describe both stages in more detail. Additionally, we made the following observations during model building and tuning:
• Increasing the number of epochs improves the accuracy
• Early stopping helps to reduce the execution time
• Changing the batch size affects execution time and memory requirements
• Augmentation helps to avoid overfitting

We built a base CNN model using the TensorFlow Keras library. The image size is kept at 256 x 256 pixels. The model has 1 input layer, 4 convolution layers, 4 max-pooling layers, 3 batch-normalization layers, 1 dropout layer, 3 dense layers and finally 1 output layer with 6 classes using softmax for classification. We defined model checkpoints to store the weights after each run and stored the output in a history file. After training for 10 epochs, the model reached a validation accuracy of 75.1%.
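The effect of four convolution and max-pooling stages on a 256 x 256 input can be traced with a short calculation. The sketch below assumes 'same' padding with stride 1 for each convolution and 2 x 2 pooling; the filter counts are hypothetical, since the text states only the number of layers.

```python
def conv_pool_shapes(size, filters, pool=2):
    """Trace (height, width, channels) through conv + max-pool stages.

    Assumptions: each convolution uses 'same' padding with stride 1 (so it
    preserves height and width), and each max-pool divides them by `pool`.
    """
    shapes = []
    for n_filters in filters:
        size = size // pool
        shapes.append((size, size, n_filters))
    return shapes


# Hypothetical filter counts; only the number of conv layers (4) is from the text.
stages = conv_pool_shapes(256, filters=[32, 64, 128, 256])
flattened = stages[-1][0] * stages[-1][1] * stages[-1][2]
print(stages, flattened)
```

Under these assumptions the spatial size halves at every stage, so the dense layers would receive a 16 x 16 feature map per channel.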

To further improve the accuracy, we tried several transfer-learning models (MobileNet, VGGNet, EfficientNet, ResNet and InceptionNet); their performance is shown in the results section of this paper.

Transfer learning is a machine learning method where a model developed for a task is reused as the starting point for a model on a second task. It is a popular approach in deep learning where pre-trained models are used as the starting point on computer vision and natural language processing tasks.

Transfer learning was therefore the go-to strategy for improving model performance. The number of available architectures is very large, and comparing all of them with each other is difficult. After trying MobileNet, VGGNet, EfficientNet, ResNet and InceptionNet, we concluded that the InceptionNet, VGGNet and MobileNet architectures have properties that make them worth evaluating for this project.

MobileNet is a convolutional neural network architecture used for image classification. It is a mobile-first model that aims for high accuracy while consuming fewer computational resources. MobileNet architectures not only reduce the model size but also improve prediction speed by about 10x with comparable accuracy. They use a regular convolutional layer only once (at the beginning); all other layers use depthwise separable convolution. Depthwise separable convolution is a combination of two convolutional layers: a depthwise convolution and a pointwise convolution. The depthwise convolution applies a single filter to each input channel, and the pointwise convolution linearly combines the outputs of the depthwise convolution using 1 × 1 convolutions. The structure of MobileNet is shown in Figure 2.
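The saving from depthwise separable convolution can be illustrated by counting multiply-add operations for a single layer. The sketch below uses the standard cost expressions for the two layer types; the layer dimensions are illustrative assumptions and are not taken from this work.

```python
def standard_conv_cost(k, m, n, f):
    """Multiply-adds of a standard k x k convolution.

    k: kernel size, m: input channels, n: output channels,
    f: spatial size of the (square) output feature map.
    """
    return k * k * m * n * f * f


def separable_conv_cost(k, m, n, f):
    """Multiply-adds of a depthwise k x k convolution plus a 1 x 1 pointwise one."""
    depthwise = k * k * m * f * f
    pointwise = m * n * f * f
    return depthwise + pointwise


# Illustrative layer sizes (assumed, not taken from the paper).
k, m, n, f = 3, 32, 64, 112
ratio = separable_conv_cost(k, m, n, f) / standard_conv_cost(k, m, n, f)
print(f"separable / standard cost: {ratio:.3f}")
```

The ratio simplifies to 1/n + 1/k², so with 3 x 3 kernels the separable form needs roughly eight to nine times fewer multiply-adds than a standard convolution.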

In this research, we trained MobileNet with transfer learning, starting from weights pre-trained on ImageNet. We used the pre-trained MobileNet model and replaced the last layer with three additional dense layers of 1024, 512 and 256 neurons, followed by an output layer for classification into the 6 classes in our dataset. The weights of this model are also used for our other use case of classifying the quality of bananas for one of the varieties.
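As a rough sense of scale, the number of trainable parameters in the added dense layers can be counted directly. This is a sketch under one assumption that the text does not state: that the backbone output is pooled to a 1024-dimensional vector before the first dense layer.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


# Assumption: the backbone output is pooled to a 1024-dimensional vector.
# The remaining sizes (1024, 512, 256 and 6 classes) are from the text.
layer_sizes = [1024, 1024, 512, 256, 6]
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
head_total = sum(per_layer)
print(per_layer, head_total)
```

If the backbone output were flattened instead of pooled, the first layer would be considerably larger, so the total here should be read as a lower-end estimate.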

We set the first 20 layers as non-trainable and the layers from the 21st up to the last as trainable. The architecture is depicted in Figure 4. The network was trained for 16 epochs. The models are trained using TensorFlow [1], with the implementation of MobileNet provided by Keras. The standard Adam optimizer is used, with a learning rate of 1e-3. Table 2 lists the model attributes.

Table 2 Model attributes

Attribute    Value
Optimizer    Adam
Loss         Categorical Cross-Entropy
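The optimizer and loss named above can be illustrated with a small pure-Python sketch: one textbook Adam update on a scalar parameter, and a softmax followed by categorical cross-entropy. The learning rate of 1e-3 is from the text; the values of `beta1`, `beta2` and `eps`, and all example numbers, are assumptions, and a library implementation may differ in small details.

```python
import math


def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-7):
    """Apply one Adam update to a scalar parameter; `t` is the 1-based step."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_cross_entropy(one_hot, probs, eps=1e-12):
    """Cross-entropy between a one-hot label and predicted probabilities."""
    return -sum(y * math.log(max(p, eps)) for y, p in zip(one_hot, probs))


# One optimizer step on a made-up parameter and gradient.
theta, m, v = adam_step(theta=1.0, grad=2.0, m=0.0, v=0.0, t=1)

# Six made-up scores, one per class, with the first class as the true label.
probs = softmax([2.0, 0.5, 0.1, -1.0, 0.0, 0.3])
loss = categorical_cross_entropy([1, 0, 0, 0, 0, 0], probs)
```

On the first step Adam moves the parameter by approximately the learning rate regardless of the gradient magnitude, which is one reason a value such as 1e-3 is a common starting point.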

To improve on the accuracy of the base model, we tried several transfer-learning models. MobileNet gave a validation accuracy of 91.6% and a test accuracy of 93.4%. VGGNet gave a validation accuracy of 94.1% and a test accuracy of 91.4%. EfficientNet achieved a validation accuracy of 84.3% and a test accuracy of 80.1%. ResNet achieved a validation accuracy of 67.8% and a test accuracy of 70.1%, and InceptionNet achieved a validation accuracy of 91.4% and a test accuracy of 92.1%. Of the architectures tried, InceptionNet, VGGNet and MobileNet gave the best test results. We chose MobileNet as our model because it gave the highest test accuracy and has a smaller model size.

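Table 3 Validation accuracy with different models

Model           Validation Accuracy    Test Accuracy
MobileNet       91.6%                  93.4%
VGGNet          94.1%                  91.4%
EfficientNet    84.3%                  80.1%
ResNet          67.8%                  70.1%
InceptionNet    91.4%                  92.1%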
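The selection rule described above, choosing the model with the highest test accuracy, can be written as a short ranking over the reported numbers. The accuracy values are those given in the text; the variable names are illustrative.

```python
# (validation accuracy %, test accuracy %) as reported in the text above.
results = {
    "MobileNet": (91.6, 93.4),
    "VGGNet": (94.1, 91.4),
    "EfficientNet": (84.3, 80.1),
    "ResNet": (67.8, 70.1),
    "InceptionNet": (91.4, 92.1),
}

best_by_test = max(results, key=lambda name: results[name][1])
best_by_val = max(results, key=lambda name: results[name][0])
top_three = sorted(results, key=lambda name: results[name][1], reverse=True)[:3]
print(best_by_test, best_by_val, top_three)
```

Ranking by validation accuracy alone would favour VGGNet, so the choice of MobileNet rests on the test metric together with its smaller model size.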

As explained in the previous section, our model is based on the MobileNet architecture and successfully covers the following three use cases: i) identify a banana from a given image, ii) determine the sub-family or variety of the banana, and iii) determine the quality of the banana.

For the first and second use cases of classifying a given image, the base CNN model gave a validation accuracy of 91.2%. Among the transfer-learning models evaluated, our MobileNet-based model attained a validation accuracy of 91.6% and a test accuracy of 93.4%. Table 3 compares the model size and accuracy levels attained using the various models. The classification report and confusion matrix are given in Table 4 and Table 5, respectively.

Table 4 Classification report for sub-variety use case

To better visualize the performance of the model, we implemented the Grad-CAM technique. A Gradient-weighted Class Activation Map (Grad-CAM) for a particular category indicates the discriminative regions used by the CNN to identify that category. This technique makes CNN models more transparent by visualizing the regions of the input that are most important for the model's predictions. It uses the class-specific gradient information flowing into the final convolutional layer of a CNN to produce a coarse localization map of the important regions in the image.
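The core Grad-CAM computation described above can be sketched on toy data: the gradients of each feature map are averaged into a channel weight, the feature maps are combined with those weights, and a ReLU keeps the positive evidence. The activations and gradients below are made-up nested lists; in practice they would be extracted from the final convolutional layer of the trained network.

```python
def grad_cam(activations, gradients):
    """Compute a coarse Grad-CAM map.

    activations[k][i][j]: feature map k of the final convolutional layer.
    gradients[k][i][j]:   gradient of the class score w.r.t. that activation.
    """
    height, width = len(activations[0]), len(activations[0][0])
    # Channel weights: global-average-pool the gradients of each feature map.
    weights = [sum(sum(row) for row in g) / (height * width) for g in gradients]
    cam = [[0.0] * width for _ in range(height)]
    for w, fmap in zip(weights, activations):
        for i in range(height):
            for j in range(width):
                cam[i][j] += w * fmap[i][j]
    # ReLU keeps only regions that push the class score up.
    return [[max(0.0, value) for value in row] for row in cam]


# Toy 2-channel, 2x2 example with made-up numbers.
acts = [[[1.0, 0.0], [0.0, 2.0]],
        [[0.0, 3.0], [1.0, 0.0]]]
grads = [[[1.0, 1.0], [1.0, 1.0]],
         [[-1.0, -1.0], [-1.0, -1.0]]]
heatmap = grad_cam(acts, grads)
```

The resulting map has the spatial size of the feature maps, which is why it is coarse; it is typically upsampled to the input resolution before being overlaid on the image.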

At present, retail stores rely on cashiers or self-service checkout systems to process customers' purchases. While most products have barcodes that can be scanned, which already minimizes checkout time, fruits and vegetables are commonly processed manually. The cashier or the customer needs to physically identify and inspect the class of product being bought and look it up in the system before scanning. Visual inspection is labor intensive and prone to human error and variability.

By adding the ML approach to this process using the model developed by us in this project we can completely automate the process without the need of human intervention for any of the sub-tasks involved.

A short flow of the solution would be as follows: Place one product at a time from basket onto the conveyor belt → click a picture of the item and weigh the product simultaneously → backend model identifies the fruit → assign weight and corresponding price of the item and send to invoice/billing page → Pack and checkout.
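The billing step of this flow can be sketched as a small function that takes the predicted label and the measured weight and produces an invoice line. Everything here is hypothetical: the price list, the `classify` placeholder standing in for the trained model, and the field names of the invoice line.

```python
# Hypothetical price list (currency units per kg); not from the paper.
PRICE_PER_KG = {"Elakki": 80.0, "Robusta": 40.0, "Red Banana": 90.0}


def classify(image):
    """Placeholder for the trained model; always returns a fixed label here."""
    return "Robusta"


def bill_item(image, weight_kg, classifier=classify):
    """Identify the item, look up its price, and build an invoice line."""
    label = classifier(image)
    if label not in PRICE_PER_KG:
        raise KeyError(f"no price configured for {label!r}")
    amount = round(PRICE_PER_KG[label] * weight_kg, 2)
    return {"item": label, "weight_kg": weight_kg, "amount": amount}


line = bill_item(image=None, weight_kg=1.5)
print(line)
```

Passing the classifier as a parameter keeps the pricing logic independent of the model, so the same billing code could be reused if the model is retrained or replaced.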

We used raw images (with a white background) without any pre-processing, thereby saving a few more seconds of response time. In retail stores, the picture taken would have a black background (or whatever color the conveyor belt is) and can be used as is. A basic camera is sufficient as far as hardware requirements are concerned. This greatly reduces labor costs and checkout time at the billing counter while also providing a good customer experience.

We have also included the use case of the model identifying the quality of bananas as good or bad. This would help the procurement and quality check teams to classify and remove products that are over-ripe or not of optimum quality. This use case adds value by helping to decrease production costs and increase product quality. External defects such as surface and skin defects are among the most influential factors in the commercial quality of fruit, and this is the issue we addressed in our work.

Another feature is identifying and classifying bananas when they appear alongside other fruits. This would help if the consumer happens to place more than one product on the belt, or if two products are placed very close to each other, when the picture is taken during billing.

A problem directly related to the purchase of fruits in retail stores is that fruits can be inside a plastic bag, a paper bag or any other form of wrap or container. While we have addressed the identification of a single banana or a bunch of bananas, we have yet to handle images of fruit inside any form of wrapping. Since the regular practice is to bill first and then pack into bags, we have not explored this option; it could be explored as part of future work to make the solution more practical for use in the market. Further work could also include quality checks for varieties other than Elakki and testing with two or more varieties of bananas in the same frame. Another drawback is that our model would classify fake bananas, or any object with the shape and color of a banana, as real fruit. This is because we have assumed that only real fruit images would be fed into the model and have limited it to identifying skin or surface-level features and defects. Finally, model deployment and production, including creating a REST API, serving predictions and maintenance, are part of the future work of this paper.

For this project we explored various options for a computer-vision solution. Using transfer learning with advanced architectures such as VGGNet, ResNet, InceptionNet, EfficientNet and MobileNet, in addition to a base CNN model, and after rigorous testing, we converged on MobileNet as the best option for classification. There are three main points we would like to discuss as a closing note:
1) Hardware requirements - The model can be deployed on commodity hardware and is suitable for mid-size retail stores, which can use it without large setup costs.
2) Part of overall inventory management - The model can be used as part of overall inventory automation: first by the procurement team for quality checks, and then integrated with the billing counter system to automatically detect the type of banana and calculate the price by weight.
3) Scalability and future vision - The model can be scaled and trained to predict and classify other varieties of fruits, vegetables and other objects in retail outlets.

[1] Fruit classification for retail stores using deep learning

[2] VegFru: A domain-specific dataset for fine-grained visual categorization

[3] A hierarchical grocery store image dataset with visual and semantic labels

[4] Ripeness classification of bananas using an artificial neural network

[5] Fruit quality and defect image classification with Conditional GAN data augmentation

[6] Fruit classification using computer vision and feedforward neural network

[7] Implementation of Fruits Recognition Classifier using Convolutional Neural Network Algorithm for Observation of Accuracies for Various Hidden Layers

[8] Tomato quality evaluation with image processing: A review

[9] Classification of sweet onions based on internal defects using image processing and neural network techniques

[10] Fruit and Vegetable Identification Using Machine Learning for Retail Applications

[11] Ripeness Classification of Bananas Using an Artificial Neural Network

[12] Fruit recognition from images using deep learning

[13] Automatic fruit classification using deep learning for industrial applications

[14] Implementation of fruits recognition classifier using convolutional neural network algorithm for observation of accuracies for various hidden layers

[15] Banana classification using deep learning

[16] An Effective Pomegranate Fruit Classification Based On CNN-LSTM Deep Learning Models

[17] Convolutional Neural Networks (CNN) for Detecting Fruit Information Using Machine Learning Techniques

[18] Fruit recognition from images using deep learning 2

[19] Automatic Fruit Classification Using Deep Learning for Industrial Applications

[20] Classification of Banana Fruits Using Deep Learning