LinkedAI

This article shows how data-centric convolutional neural networks can be used to count grape clusters in vineyard images taken at 1 or 1.5 meters from the clusters, using the Embrapa Wine Grape Instance Segmentation Dataset (Embrapa WGISD). Because this dataset contains only 300 images in total, Python’s Flip library was used to create synthetic images with bunches of grapes and so increase the size of the data set.

For this task, the Faster R-CNN model with a ResNet-50 backbone provided by PyTorch was used with pre-trained weights and 2 classes, background and cluster. Training used the SGD optimizer with a learning rate of 0.005, which was decreased when the loss plateaued. In addition, 260 synthetic images were created with the backgrounds and objects presented in the following figure. To increase the variability of the images, rotation on the y-axis of both the objects and the backgrounds and a brightness change of 70% were allowed, keeping the parameter “force” set to False so that these changes are applied at random.

Finally, histogram equalization was applied to highlight the clusters and to increase the number of both original and synthetic images, as recommended in the work presented by Santos et al.
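
As an illustration of this step, the sketch below implements plain global histogram equalization for an 8-bit grayscale image with NumPy and appends the equalized copies to the image list. It rests on assumptions: the article does not say which channel or which equalization variant was used, and the function names `equalize_histogram` and `augment_with_equalized` are hypothetical.

```python
import numpy as np


def equalize_histogram(gray):
    """Global histogram equalization of an 8-bit grayscale image."""
    gray = np.asarray(gray)
    if gray.dtype != np.uint8:
        raise TypeError("expected an 8-bit (uint8) image")

    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]  # CDF value at the darkest level present
    total = gray.size

    if total == cdf_min:
        # Constant image: there is nothing to stretch.
        return gray.copy()

    # Map each gray level so that the cumulative distribution becomes
    # approximately linear over the range 0..255.
    lut = np.round((cdf - cdf_min) / (total - cdf_min) * 255)
    lut = np.clip(lut, 0, 255).astype(np.uint8)
    return lut[gray]


def augment_with_equalized(images):
    """Return the input images followed by their equalized copies."""
    return list(images) + [equalize_histogram(img) for img in images]
```

Because the equalized copies are added alongside the unprocessed images, this step doubles the number of images it is applied to.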

40 images were taken from the original data set without any processing to evaluate the final performance of the models, leaving a total of 540 original images for training. This total was kept the same in all experiments with synthetic images so that the results of the experiments are comparable, while the number of original images was varied to observe scenarios in which fewer original images are available and the data set is completed with synthetic images.

Training ran for 250 epochs on each of 5 different data sets:

The first used 100% original images.
The second used 100% synthetic images.
The third used a 50:50 ratio, that is, 260 original images and 260 synthetic images.
The fourth used a 70:30 ratio, with 70% synthetic images and 30% original images.
The fifth used a 30:70 ratio, with 30% synthetic images and 70% original images.
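
The five mixes can be expressed as a synthetic-image percentage applied to a fixed training-set size. The sketch below is only an illustration under assumptions: `mix_counts`, `build_mix` and `EXPERIMENTS` are hypothetical names, and how the article actually sampled the images is not stated.

```python
import random

# Percentage of synthetic images in each of the five experiments.
EXPERIMENTS = {
    "100% original": 0,
    "100% synthetic": 100,
    "50:50": 50,
    "70:30 (synthetic:original)": 70,
    "30:70 (synthetic:original)": 30,
}


def mix_counts(total, synthetic_pct):
    """Return (n_original, n_synthetic) for a training set of fixed size."""
    if not 0 <= synthetic_pct <= 100:
        raise ValueError("synthetic_pct must be between 0 and 100")
    # Integer arithmetic with round-half-up avoids float rounding surprises.
    n_synthetic = (total * synthetic_pct + 50) // 100
    return total - n_synthetic, n_synthetic


def build_mix(originals, synthetics, total, synthetic_pct, seed=0):
    """Sample a fixed-size training list from the two image pools."""
    n_original, n_synthetic = mix_counts(total, synthetic_pct)
    rng = random.Random(seed)  # fixed seed keeps each mix reproducible
    # random.sample raises ValueError if a pool is smaller than requested.
    return rng.sample(originals, n_original) + rng.sample(synthetics, n_synthetic)
```

For example, `mix_counts(520, 50)` gives 260 original and 260 synthetic images, which matches the 50:50 experiment described above.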

The results obtained are presented in the following table, together with examples of the qualitative results of each experiment. The metric reported is the average, over the test images, of the absolute difference between the number of detected clusters and the number of annotated clusters, divided by the total number of clusters present in the image.
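
One way to read this metric is as a mean relative counting error. The sketch below follows that reading; the function names and the 0.5 score threshold used to turn detections into a count are assumptions, not details given in the article.

```python
def count_detections(scores, threshold=0.5):
    """Count detections whose confidence is at least the threshold.

    The threshold value is an assumption; the article does not state one.
    """
    return sum(1 for score in scores if score >= threshold)


def mean_relative_count_error(predicted_counts, true_counts):
    """Average over images of |predicted - annotated| / annotated."""
    if len(predicted_counts) != len(true_counts):
        raise ValueError("both lists must have one entry per image")
    if not true_counts:
        raise ValueError("at least one image is required")

    errors = []
    for predicted, annotated in zip(predicted_counts, true_counts):
        if annotated <= 0:
            raise ValueError("each image must contain at least one cluster")
        errors.append(abs(predicted - annotated) / annotated)
    return sum(errors) / len(errors)
```

With this definition, a value of 0 means every image was counted exactly, and missed and spurious clusters are penalized equally.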

Results with predictions in red and annotations in blue.

As the results show, the model trained only on original images performs very well at counting clusters in the test data set, whereas the model trained on 100% synthetic images obtains an error 3 times greater. However, in the experiments with a 50:50 ratio or a larger share of synthetic images, the mean error is quite close to that of the original model. Finally, the experiment with 30% synthetic images obtained a lower error than the original model, that is, better performance at counting clusters in the test data set.

Therefore, synthetic data can contribute substantially to the performance of Deep Learning models: when only a small data set is available, its size can be increased with synthetic images while keeping the error close to that of a large data set of original images. Moreover, better performance is possible with a suitable combination of original and synthetic images.

Final results with predictions in red and annotations in blue.

Grape clusters detection using Deep Learning and synthetic images