May 19, 2023
Artificial Intelligence (AI) has made great strides in recent years, thanks to advances in deep learning, which have enabled the creation of increasingly complex models capable of identifying objects in images and performing tasks that were previously deemed impossible. However, these models require large amounts of data to be trained and improve their accuracy, which leads us to wonder: where do these data come from, and how can we ensure they are accurate? Although the task may seem daunting, the results obtained are impressive, and the potential of AI is enormous.
Deep learning models for synthetic data generation: GANs and Autoencoders
The technology is advancing by leaps and bounds. Until recently, synthetic data generation relied mainly on Generative Adversarial Networks (GANs) and Autoencoders, two types of deep learning models used in data generation tasks.
A GAN consists of two neural networks: a generator that produces synthetic samples and a discriminator that tries to distinguish between generated and real samples. Both networks are adversarially trained, meaning the generator tries to deceive the discriminator while the discriminator tries to correctly identify the samples. Over time, the generator learns to produce synthetic samples that are increasingly difficult to distinguish from real samples.
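The adversarial objective described above can be sketched with plain Python. This is only an illustration, not a full GAN: it assumes the discriminator outputs a probability that a sample is real, and it uses the common non-saturating generator loss. The function names are hypothetical.

```python
import math

def bce(prob, target):
    """Binary cross-entropy for one predicted probability and a 0/1 target."""
    eps = 1e-12
    prob = min(max(prob, eps), 1 - eps)  # avoid log(0)
    return -(target * math.log(prob) + (1 - target) * math.log(1 - prob))

def discriminator_loss(d_real, d_fake):
    """The discriminator wants real samples scored as 1 and generated ones as 0."""
    real_term = sum(bce(p, 1.0) for p in d_real) / len(d_real)
    fake_term = sum(bce(p, 0.0) for p in d_fake) / len(d_fake)
    return real_term + fake_term

def generator_loss(d_fake):
    """The generator wants its samples scored as real (non-saturating variant)."""
    return sum(bce(p, 1.0) for p in d_fake) / len(d_fake)
```

When the discriminator is fooled (it scores generated samples close to 1), `generator_loss` is small and `discriminator_loss` is large, which is the tug-of-war that drives training.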
On the other hand, an Autoencoder is a neural network that is trained to encode its input into a lower-dimensional representation (for an image, something like a much smaller image) and then decode that representation back into a reconstruction of the original input. The goal of training is to minimize the difference between the original input and the decoded output, forcing the model to learn an efficient representation of the input data. Once the autoencoder is trained, it can be used to generate new synthetic samples by feeding the model an image and decoding it into a synthetic output.
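To make the encode/decode idea concrete, here is a toy sketch in plain Python. It assumes a linear autoencoder with a single-number bottleneck, trained by gradient descent on made-up 2D points that lie on a line; real autoencoders for images are far larger and nonlinear, and every name here is hypothetical.

```python
# Toy data: 2D points on the line y = 2x, so one number is enough to describe each.
DATA = [(t, 2 * t) for t in (-1.0, -0.5, 0.0, 0.5, 1.0)]

def reconstruct(x, w, v):
    """Encode x into one number z, then decode z back into 2D."""
    z = sum(wk * xk for wk, xk in zip(w, x))   # encoder
    return z, [vj * z for vj in v]             # decoder

def mse(data, w, v):
    """Mean squared reconstruction error over the dataset."""
    total = 0.0
    for x in data:
        _, x_hat = reconstruct(x, w, v)
        total += sum((a - b) ** 2 for a, b in zip(x_hat, x))
    return total / len(data)

def train(data, steps=500, lr=0.05):
    """Gradient descent on the reconstruction error."""
    w, v = [0.5, 0.5], [0.5, 0.5]
    n = len(data)
    for _ in range(steps):
        grad_w, grad_v = [0.0, 0.0], [0.0, 0.0]
        for x in data:
            z, x_hat = reconstruct(x, w, v)
            err = [a - b for a, b in zip(x_hat, x)]
            back = sum(e * vj for e, vj in zip(err, v))  # error pushed back to z
            for j in range(2):
                grad_v[j] += 2 * err[j] * z / n
                grad_w[j] += 2 * back * x[j] / n
        w = [wk - lr * g for wk, g in zip(w, grad_w)]
        v = [vj - lr * g for vj, g in zip(v, grad_v)]
    return w, v
```

Calling `train(DATA)` and comparing `mse` before and after shows the reconstruction error shrinking as the model learns a compact representation.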
In summary, while GANs use adversarial competition between the generator and the discriminator to generate synthetic samples, Autoencoders use a neural network to encode and decode input data and generate new synthetic samples from the learned parameters. These two models have been widely used in recent years to generate additional samples for large datasets. However, throughout 2022, several models of a new type that also generate synthetic images were released: DALL-E, Imagen, Midjourney, and Stable Diffusion.
All of them are based on the diffusion model proposed in a 2015 paper. Each of these models can generate images from text inputs: you tell the model what you want to see. This has a high impact, since it makes it possible to generate an image dataset from scratch. Of these four models, the only one that can be used freely is Stable Diffusion, a machine learning model developed by the company Stability AI that generates high-quality digital images from natural language descriptions.
The model differs from other similar models, such as DALL-E, by being open-source, free, and not artificially limiting the images it produces. It can be used to generate text-guided image-to-image translations and enhance images.
How to use Stable Diffusion to generate synthetic images?
There are several ways to use Stable Diffusion; however, running it requires access to an NVIDIA graphics card (GPU). For free access, you can use the model directly through the following Google Colab notebook: just follow the steps described there and you can start generating images. If, on the other hand, you have an NVIDIA GPU locally, it is best to install the model so that you can run it and generate the desired images.
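If you are not sure whether your machine has a usable NVIDIA GPU, a quick check along these lines can help. It is a rough heuristic rather than a guarantee: it assumes that the `nvidia-smi` tool installed with the NVIDIA driver is on your PATH, and the function name is hypothetical.

```python
import shutil
import subprocess

def has_nvidia_gpu():
    """Heuristic: True when nvidia-smi exists and lists at least one GPU."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False
    try:
        # "nvidia-smi -L" prints one line per detected GPU.
        result = subprocess.run([exe, "-L"], capture_output=True,
                                text=True, timeout=10)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and "GPU" in result.stdout
```

If this returns `False`, the Colab notebook is probably the easier route.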
To install Stable Diffusion locally, the following steps are recommended:
22. Copy that URL and open it in your browser, for example in Google Chrome.
23. Now you can start entering text to generate images in the box above.
BONUS: Generating more synthetic data with Stable Diffusion and FLIP
Suppose you have an agriculture project where you need a dataset of strawberries to later label them and train an AI model that detects the location of all strawberries in images. However, there are not enough data available for this task. This is where Stable Diffusion comes into play to generate artificial strawberries, and then FLIP, an open-source library for computer vision tasks developed by LinkedAI, can further increase these data, creating more information for training.
Generating synthetic strawberry images with Stable Diffusion
Let’s open Stable Diffusion and enter the following text input: “a realistic photo of strawberries”.
This results in the following output:
The images are high quality and quite similar to real strawberries. Take a close look at each image and delete the ones that didn’t turn out well. Out of the 20 generated images, we will keep 17, which are the most realistic.
FLIP is a library that basically performs data augmentation using various computer vision techniques such as rotation, resizing, image combination, etc. To use it, you need to have images with the object cropped and a transparent background. You can use any software to separate the object from the background. In this case, I’m using the LinkedAI platform to achieve this.
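To give a feel for what this kind of augmentation does under the hood, here is a small sketch in plain Python. It is not FLIP's API: it assumes images are simple nested lists of pixels, and all function names are hypothetical. It pastes a cutout with a transparent background onto a background image at a random position and returns the bounding box that would serve as the label.

```python
import random

def paste_object(background, obj, top, left):
    """Alpha-composite obj (rows of (r, g, b, a)) onto background (rows of (r, g, b))."""
    out = [list(row) for row in background]
    for dy, obj_row in enumerate(obj):
        for dx, (r, g, b, a) in enumerate(obj_row):
            alpha = a / 255
            br, bg, bb = out[top + dy][left + dx]
            out[top + dy][left + dx] = (
                round(r * alpha + br * (1 - alpha)),
                round(g * alpha + bg * (1 - alpha)),
                round(b * alpha + bb * (1 - alpha)),
            )
    return out

def random_placement(bg_h, bg_w, obj_h, obj_w, rng):
    """Pick a top-left corner so the object fits fully inside the background."""
    return rng.randint(0, bg_h - obj_h), rng.randint(0, bg_w - obj_w)

def bounding_box(top, left, obj_h, obj_w):
    """Label for the pasted object as (x_min, y_min, x_max, y_max) in pixels."""
    return (left, top, left + obj_w, top + obj_h)

def augment(background, obj, rng):
    """Create one synthetic image plus its bounding-box label."""
    obj_h, obj_w = len(obj), len(obj[0])
    top, left = random_placement(len(background), len(background[0]),
                                 obj_h, obj_w, rng)
    return paste_object(background, obj, top, left), bounding_box(top, left, obj_h, obj_w)
```

Because the position is chosen by the code, the label comes for free: every generated image already knows where its object is, which is why the transparent cutouts matter.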
The idea is to generate cutouts of the objects in .PNG format as shown below.
Next, we create two folders in the FLIP repository (which should already be cloned on your computer), specifically in the path examples/data. The folders should be named “backgrounds” and “objects”. In the “objects” folder, we create a subfolder called “strawberry” and place all the strawberry .png cutouts inside.
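If you prefer to create the folders from a script, something like the following would do it. This is a small sketch: the function name is hypothetical, and `repo_root` is assumed to be wherever you cloned the FLIP repository.

```python
from pathlib import Path

def prepare_flip_folders(repo_root, label="strawberry"):
    """Create examples/data/backgrounds and examples/data/objects/<label>."""
    data_dir = Path(repo_root) / "examples" / "data"
    backgrounds = data_dir / "backgrounds"
    objects = data_dir / "objects" / label
    backgrounds.mkdir(parents=True, exist_ok=True)
    objects.mkdir(parents=True, exist_ok=True)
    return backgrounds, objects
```

After calling it with the path to your clone, copy the .png cutouts into the returned objects folder.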
Next, we open Stable Diffusion again and create images for the artificial backgrounds. In my case, I used the following prompts:
Positive: an empty chopping board, top view.
Negative: fruits, vegetables.
Negatives are used so that the model does not add things to the image that we do not need. We select the best images and move them to the “backgrounds” folder.
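The same positive and negative prompts can also be used from code. The sketch below is an alternative to the browser interface used in this post, not the method followed here: it assumes the Hugging Face `diffusers` and `torch` packages are installed, an NVIDIA GPU is available, and the model id shown (an assumption) can be downloaded. The helper names are hypothetical.

```python
def build_generation_kwargs(positive, negative="", num_images=1):
    """Collect the prompt settings passed to the pipeline call."""
    kwargs = {"prompt": positive, "num_images_per_prompt": num_images}
    if negative:
        # Things the model should keep out of the image.
        kwargs["negative_prompt"] = negative
    return kwargs

def generate_images(positive, negative="", num_images=1,
                    model_id="CompVis/stable-diffusion-v1-4"):  # assumed model id
    """Generate images with diffusers; requires a CUDA-capable GPU."""
    # Heavy imports stay local so the helper above works without them.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(model_id,
                                                   torch_dtype=torch.float16)
    pipe = pipe.to("cuda")
    result = pipe(**build_generation_kwargs(positive, negative, num_images))
    return result.images  # list of PIL images

# Prompts from this post for the background images.
background_kwargs = build_generation_kwargs(
    "an empty chopping board, top view", "fruits, vegetables", num_images=4)
```

Calling `generate_images("an empty chopping board, top view", "fruits, vegetables", num_images=4)` on a machine with a GPU would return four candidate backgrounds, which you can then save and review by hand as described above.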
We should have something like this in “objects/strawberry”:
And this is how it looks in “backgrounds”:
Finally, we run the FLIP example file, and we will achieve something like this.
In fact, we can have a file ready to train an object detection model in the Google Cloud Vision API.
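This post does not show the exact file that the FLIP example writes, so the following is only a sketch of what such a file can look like. It assumes the CSV layout used when importing object detection data into Google Cloud AutoML Vision (dataset split, image path, label, then normalized box corners with empty slots for the unused vertices). The bucket path and function name are hypothetical.

```python
def to_automl_row(split, image_uri, label, box, image_width, image_height):
    """Format one bounding box as an AutoML Vision-style CSV row.

    box is (x_min, y_min, x_max, y_max) in pixels; coordinates are
    normalized to the 0-1 range relative to the image size.
    """
    left, top, right, bottom = box
    x_min = left / image_width
    y_min = top / image_height
    x_max = right / image_width
    y_max = bottom / image_height
    return (f"{split},{image_uri},{label},"
            f"{x_min:.4f},{y_min:.4f},,,{x_max:.4f},{y_max:.4f},,")

# Hypothetical example: one strawberry in a 256x256 synthetic image.
row = to_automl_row("TRAIN", "gs://my-bucket/img_001.jpg", "strawberry",
                    (64, 32, 192, 160), 256, 256)
```

Writing one such row per pasted object, using the bounding boxes produced during augmentation, gives a labeled dataset without any manual annotation.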
In conclusion, synthetic data is becoming a promising solution for building the labeled image datasets that artificial intelligence models need, and deep learning models such as GANs and Autoencoders have been widely used to generate this data. Additionally, new models like DALL-E, Imagen, Midjourney, and Stable Diffusion have been developed, which can generate images from text inputs.
In particular, Stable Diffusion is an open-source and free machine learning model that enables high-quality digital image generation from natural language descriptions. The generation of synthetic data using these models can significantly reduce the cost and time required to label large datasets, which can improve the efficiency and accuracy of artificial intelligence models.