LinkedAI

One of the biggest challenges in building computer vision models is obtaining a large number of images and then manually labeling them, which can be a costly and time-consuming process. The generation of synthetic data has become a promising alternative that eases image labeling and reduces the associated costs. In this article, we explain why synthetic data matters for building labeled image datasets for AI models, as well as the different methods and techniques that can be used to generate such data.

Technology in this area is advancing by leaps and bounds. Until recently, synthetic data generation relied mainly on Generative Adversarial Networks (GANs) and Autoencoders, two types of deep learning models used in data generation tasks, and the field has since moved on to newer text-to-image models.

A GAN consists of two neural networks: a generator that produces synthetic samples and a discriminator that tries to distinguish between generated and real samples. Both networks are adversarially trained, meaning the generator tries to deceive the discriminator while the discriminator tries to correctly identify the samples. Over time, the generator learns to produce synthetic samples that are increasingly difficult to distinguish from real samples.
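The adversarial game described above is usually written as a single minimax objective. The formulation below is the standard one from the original GAN paper (Goodfellow et al., 2014), not something specific to this article; the symbols G (generator), D (discriminator), p_data (the real data distribution) and p_z (the noise distribution fed to the generator) are introduced here only for illustration.

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right]
```

The discriminator tries to push D(x) toward 1 on real samples and D(G(z)) toward 0 on generated ones, while the generator tries to make D(G(z)) as large as possible.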

On the other hand, an Autoencoder is a neural network trained to encode its input into a lower-dimensional representation (a compressed code) and then decode that representation back into a reconstruction of the input. The goal of training is to minimize the difference between the original input and the decoded output, forcing the model to learn an efficient representation of the input data. Once the autoencoder is trained, it can be used to generate new synthetic samples by feeding the model an image and decoding its encoded representation into a synthetic output.
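To make the encode, decode, and minimize-the-difference loop concrete, here is a toy sketch in plain Python. It is not an image model: it assumes 2-D points that lie on a line, a 1-D bottleneck, and a single weight vector shared by encoder and decoder (tied weights). All function names are hypothetical and chosen only for this illustration.

```python
def encode(w, x):
    """Compress a 2-D point into a single number (the bottleneck)."""
    return w[0] * x[0] + w[1] * x[1]


def decode(w, z):
    """Expand the 1-D code back into a 2-D point."""
    return (z * w[0], z * w[1])


def train_linear_autoencoder(points, epochs=500, lr=0.1):
    """Fit the shared weight vector by gradient descent on squared reconstruction error."""
    w = [1.0, 0.0]  # fixed starting point so the run is repeatable
    for _ in range(epochs):
        for x in points:
            z = encode(w, x)
            x_hat = decode(w, z)
            r = (x[0] - x_hat[0], x[1] - x_hat[1])  # reconstruction residual
            rw = r[0] * w[0] + r[1] * w[1]
            # Gradient of ||x - (w.x) w||^2 with respect to w.
            grad = (-2 * (rw * x[0] + z * r[0]), -2 * (rw * x[1] + z * r[1]))
            w = [w[0] - lr * grad[0], w[1] - lr * grad[1]]
    return w


def reconstruction_error(w, x):
    """Largest absolute coordinate difference between x and its reconstruction."""
    x_hat = decode(w, encode(w, x))
    return max(abs(x[0] - x_hat[0]), abs(x[1] - x_hat[1]))
```

After training on points such as (0.6t, 0.8t), calling decode(w, z) with a code z that never appeared in the training data yields a new point on the same line, which is the toy analogue of decoding a synthetic sample.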

In summary, while GANs use adversarial competition between the generator and the discriminator to generate synthetic samples, Autoencoders use a neural network to encode and decode input data and generate new synthetic samples from the learned parameters. Both have been widely used in recent years to generate additional samples for large data sets. Throughout 2022, however, several models of a new type were released for generating synthetic images:

DALL-E
Imagen (Google Brain)
Midjourney
Stable Diffusion

All of them are based on the diffusion model proposed in a 2015 paper. Each of these models can generate images from text inputs: you tell the model what you want to see. This has a high impact, since it makes it possible to generate an image data set from scratch. The only one of these four models that can be used freely is Stable Diffusion, a machine learning model developed by the company Stability AI that generates high-quality digital images from natural language descriptions.

The model differs from other similar models, such as DALL-E, by being open-source, free, and not artificially limiting the images it produces. It can be used to generate text-guided image-to-image translations and enhance images.

There are several ways to use Stable Diffusion; however, access to an NVIDIA graphics card (GPU) is required to run it. For free access, you can use the model directly in the following Google Colab notebook: just follow the steps described there and you can start generating images. Alternatively, if you have an NVIDIA GPU locally, it is best to install the model so you can run it yourself and generate the desired images.

To do this, follow these steps:

1. Install Pyenv on your operating system.
2. Create a folder on your computer.
3. Open Windows PowerShell and navigate to the created folder.
4. Update Pyenv with the following command: pyenv update
5. Install the required version of Python for Stable Diffusion using the command: pyenv install 3.10.6
6. Use the following command to use the Python version installed with Pyenv: pyenv shell 3.10.6
7. Use git to clone the following repository, which will be our graphical user interface (GUI): https://github.com/AUTOMATIC1111/stable-diffusion-webui
8. Open the link and download the model file (.ckpt). This model is version 2.1 of Stable Diffusion, trained on 768x768 pixel images. Save the model in the directory ./stable-diffusion-webui/models/Stable-diffusion of the GUI.
9. Use git to clone the following repository: https://github.com/Stability-AI/stablediffusion.git
10. Once cloned, open the folder ./configs/stable-diffusion.
11. Copy the file named v2-inference-v.yaml.
12. Paste the file into the GUI folder, in the directory ./stable-diffusion-webui/models/Stable-diffusion.
13. The .yaml file and the .ckpt file must have the same base name; rename them as needed without changing their extensions.
14. Open the directory /stable-diffusion-webui and find the file called webui-user.bat.
15. Right-click on the file and select the option to edit.
16. Find the line “set COMMANDLINE_ARGS” and replace it with “set COMMANDLINE_ARGS=--xformers”.
17. Save the file and close it.
18. Open Windows PowerShell and navigate to the GUI directory ./stable-diffusion-webui.
19. Run pyenv shell 3.10.6 again.
20. Finally, run pyenv exec webui-user.bat and wait a few minutes until all necessary libraries are installed.
21. If everything installs correctly, you will see something like this.

22. Copy that URL and open it in your browser, for example in Google Chrome.

23. Now you can start entering text to generate images in the box above.
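The two file-handling steps in the list above (giving the .yaml config the same base name as the .ckpt file, and setting COMMANDLINE_ARGS in webui-user.bat) are easy to get wrong by hand. The sketch below shows one way to script them with the Python standard library. The function names are hypothetical, and the paths you pass in depend on where you cloned the repositories and on the name of the checkpoint you downloaded.

```python
import shutil
from pathlib import Path


def pair_config_with_checkpoint(config_src, checkpoint):
    """Copy the .yaml config next to the checkpoint, reusing the checkpoint's base name."""
    checkpoint = Path(checkpoint)
    target = checkpoint.with_suffix(".yaml")  # same folder, same base name
    shutil.copyfile(config_src, target)
    return target


def set_commandline_args(bat_path, args="--xformers"):
    """Rewrite the 'set COMMANDLINE_ARGS' line of webui-user.bat in place."""
    bat_path = Path(bat_path)
    new_lines = []
    replaced = False
    for line in bat_path.read_text().splitlines():
        if line.strip().lower().startswith("set commandline_args"):
            new_lines.append(f"set COMMANDLINE_ARGS={args}")
            replaced = True
        else:
            new_lines.append(line)
    if not replaced:
        raise ValueError("no 'set COMMANDLINE_ARGS' line found")
    bat_path.write_text("\n".join(new_lines) + "\n")
```

For example, calling pair_config_with_checkpoint with the copied v2-inference-v.yaml and a checkpoint saved as my-model.ckpt (a made-up name) produces my-model.yaml in the same folder as the checkpoint.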

Suppose you have an agriculture project where you need a dataset of strawberries to label and then train an AI model that detects the location of all strawberries in images. However, there is not enough data available for this task. This is where Stable Diffusion comes into play to generate artificial strawberries; FLIP, an open-source library for computer vision tasks developed by LinkedAI, can then augment this data further, creating more information for training.

Let’s open Stable Diffusion and enter the following text input: “a realistic photo of strawberries”.

This produces the following output:

The images are high quality and quite similar to real strawberries. Take a close look at each image and delete the ones that didn’t turn out well. Out of the 20 generated images, we will keep 17, which are the most realistic.

FLIP is a library that performs data augmentation using various computer vision techniques such as rotation, resizing, and image combination. To use it, you need images in which the object is cropped out and the background is transparent. You can use any software to separate the object from the background. In this case, I’m using the LinkedAI platform to achieve this.

The idea is to generate cutouts of the objects in .PNG format as shown below.

Next, in the FLIP repository (which should already be cloned on your computer), we create two folders under the path examples/data. The folders should be named “backgrounds” and “objects”. In the “objects” folder, we create a subfolder called “strawberry” and place all the strawberry .png cutouts inside.
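The folder layout just described, plus a quick sanity check that each cutout really carries transparency, can be sketched with the standard library alone. The function names are hypothetical. The check only reads the PNG header and looks at its color type (4 and 6 mean an alpha channel is present), so it will not detect palette-based PNGs that store transparency in a separate chunk; treat it as a first filter rather than a full validator.

```python
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def create_flip_data_layout(repo_root, object_name="strawberry"):
    """Create examples/data/backgrounds and examples/data/objects/<object_name>."""
    data_dir = Path(repo_root) / "examples" / "data"
    backgrounds = data_dir / "backgrounds"
    objects = data_dir / "objects" / object_name
    backgrounds.mkdir(parents=True, exist_ok=True)
    objects.mkdir(parents=True, exist_ok=True)
    return backgrounds, objects


def png_has_alpha_channel(header):
    """Inspect the first 26 bytes of a PNG and report whether it has an alpha channel."""
    if len(header) < 26 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    color_type = header[25]  # 4 = grayscale + alpha, 6 = RGB + alpha
    return color_type in (4, 6)


def find_cutouts_without_alpha(folder):
    """Return the .png files in folder whose header reports no alpha channel."""
    missing = []
    for path in sorted(Path(folder).iterdir()):
        if path.suffix.lower() != ".png":
            continue
        with path.open("rb") as handle:
            header = handle.read(26)
        if not png_has_alpha_channel(header):
            missing.append(path)
    return missing
```

Any file returned by find_cutouts_without_alpha is worth re-exporting before augmentation, since a cutout without transparency would be pasted as an opaque rectangle.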

Next, we open Stable Diffusion again and create images for the artificial background. In my case, I used the following prompts:

Positive: an empty chopping board, top view.

Negative: fruits, vegetables.

Negative prompts are used so that the model does not add things to the image that we do not need. We select the best images and move them to the “backgrounds” folder.
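If you prefer to script the background generation instead of typing prompts into the GUI, the web UI can also expose an HTTP API. The sketch below rests on several assumptions that are not part of the walkthrough above: that the web UI was started with the --api flag added to COMMANDLINE_ARGS, that it listens on http://127.0.0.1:7860, and that its /sdapi/v1/txt2img endpoint returns base64-encoded images in an "images" field. The function names are hypothetical.

```python
import base64
import json
import urllib.request


def build_txt2img_payload(prompt, negative_prompt="", batch_size=1, steps=20):
    """Assemble the JSON body for a text-to-image request."""
    return {
        "prompt": prompt,
        "negative_prompt": negative_prompt,  # things the model should leave out
        "batch_size": batch_size,
        "steps": steps,
    }


def generate_images(payload, base_url="http://127.0.0.1:7860"):
    """POST the payload to a locally running web UI and return raw image bytes."""
    request = urllib.request.Request(
        f"{base_url}/sdapi/v1/txt2img",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # Drop a possible "data:image/png;base64," prefix before decoding.
    return [base64.b64decode(item.split(",", 1)[-1]) for item in body["images"]]
```

With the prompts above, the payload would be build_txt2img_payload("an empty chopping board, top view", "fruits, vegetables", batch_size=4), and each returned item could be written to a .png file for manual review before moving the best ones to the “backgrounds” folder.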

We should have something like this in “objects/strawberry”:

And this is how it looks in backgrounds.

Finally, we run the FLIP example file, and we will achieve something like this.

In fact, we can have a file ready to train an object detection model in the Google Cloud Vision API.
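FLIP's own API is not reproduced here. To illustrate the image combination idea behind this kind of augmentation, the sketch below pastes a transparent cutout onto a background with alpha blending and records where it landed, which is the information an object detection label needs. It is a plain-Python toy that represents images as nested lists of pixel tuples; the function names are hypothetical and the bounding box is simply the extent of the pasted cutout, not a tight box around its opaque pixels.

```python
import random


def random_placement(background, cutout, rng):
    """Pick a top-left corner so the cutout fits entirely inside the background."""
    max_left = len(background[0]) - len(cutout[0])
    max_top = len(background) - len(cutout)
    return rng.randint(0, max_left), rng.randint(0, max_top)


def paste_with_alpha(background, cutout, left, top):
    """Alpha-blend an RGBA cutout onto an RGB background.

    Returns the new image and the box (left, top, right, bottom) of the cutout.
    """
    result = [list(row) for row in background]  # copy so the input stays untouched
    for dy, row in enumerate(cutout):
        for dx, (r, g, b, a) in enumerate(row):
            alpha = a / 255
            base_r, base_g, base_b = result[top + dy][left + dx]
            result[top + dy][left + dx] = (
                round(r * alpha + base_r * (1 - alpha)),
                round(g * alpha + base_g * (1 - alpha)),
                round(b * alpha + base_b * (1 - alpha)),
            )
    box = (left, top, left + len(cutout[0]), top + len(cutout))
    return result, box
```

Repeating this with different cutouts, positions, and backgrounds (for instance with rng = random.Random(0) for repeatable placements) is what turns 17 strawberry images into a much larger labeled set.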

In conclusion, synthetic data is becoming a promising solution for building labeled image datasets for artificial intelligence models, and deep learning models such as GANs and Autoencoders have been widely used to generate this data. Additionally, new models like DALL-E, Imagen, Midjourney, and Stable Diffusion have been developed, which can generate images from text inputs.

In particular, Stable Diffusion is an open-source and free machine learning model that enables high-quality digital image generation from natural language descriptions. The generation of synthetic data using these models can significantly reduce the cost and time required to label large datasets, which can improve the efficiency and accuracy of artificial intelligence models.

Synthetic Data Generation: How Artificial Intelligence is Creating Images out of Thin Air