Generative Adversarial Network (GAN)

Deval A

Before we get to GANs, let's first go through neural networks.

Do you know what a neural network is?

Neural networks are a set of algorithms, modeled loosely after the human brain, that are designed to recognize patterns. They interpret sensory data through a kind of machine perception, labeling or clustering raw input. The patterns they recognize are numerical, contained in vectors, into which all real-world data, be it images, sound, text or time series, must be translated.

What is a Generative Adversarial Network (GAN)?

A GAN is an approach to generative modeling using deep learning methods, such as convolutional neural networks. Generative modeling is an unsupervised learning task in machine learning that involves automatically discovering and learning the regularities or patterns in input data in such a way that the model can be used to generate or output new examples that plausibly could have been drawn from the original dataset.

GANs are a clever way of training a generative model by framing the problem as a supervised learning problem with two sub-models: the generator model that we train to generate new examples, and the discriminator model that tries to classify examples as either real (from the domain) or fake (generated). The two models are trained together in an adversarial, zero-sum game until the discriminator model is fooled about half the time, meaning the generator model is generating plausible examples.

GANs are an exciting and rapidly changing field, delivering on the promise of generative models in their ability to generate realistic examples across a range of problem domains, most notably in image-to-image translation tasks such as translating photos of summer to winter or day to night, and in generating photorealistic photos of objects, scenes, and people that even humans cannot tell are fake.

How Does a GAN Work?

One neural network, called the generator, generates new data instances, while the other, the discriminator, evaluates them for authenticity; i.e. the discriminator decides whether each instance of data that it reviews belongs to the actual training dataset or not. The generator creates new, synthetic images that it passes to the discriminator, in the hope that they, too, will be deemed authentic, even though they are fake. The goal of the generator is to generate passable images: to lie without being caught. The goal of the discriminator is to identify images coming from the generator as fake.

Here are the steps a GAN takes:

  • The generator takes in random numbers and returns an image.
  • This generated image is fed into the discriminator alongside a stream of images taken from the actual, ground-truth dataset.
  • The discriminator takes in both real and fake images and returns probabilities, a number between 0 and 1, with 1 representing a prediction of authenticity and 0 representing fake.

So you have a double feedback loop:

  • The discriminator is in a feedback loop with the ground truth of the images, which we know.
  • The generator is in a feedback loop with the discriminator.

Architecture of the GAN Model

Now, let's define some notation to be used throughout this tutorial, starting with the discriminator. Let x be data representing an image. D(x) is the discriminator network, which outputs the (scalar) probability that x came from the training data rather than the generator. Here, since we are dealing with images, the input to D(x) is an image of size 3x64x64 (channels x height x width). Intuitively, D(x) should be HIGH when x comes from the training data and LOW when x comes from the generator. D(x) can also be thought of as a traditional binary classifier.

For the generator’s notation, let z be a latent space vector sampled from a standard normal distribution. G(z) represents the generator function, which maps the latent vector z to data-space. The goal of G is to estimate the distribution that the training data comes from, so it can generate fake samples from that estimated distribution.

So, D(G(z)) is the probability (scalar) that the output of the generator G is a real image. As described in Goodfellow’s paper, D and G play a minimax game in which D tries to maximize the probability that it correctly classifies reals and fakes (log D(x)), and G tries to minimize the probability that D will predict its outputs are fake (log(1 − D(G(z)))).
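Putting the two objectives together, the minimax value function from Goodfellow’s paper is:

min_G max_D V(D, G) = E_{x∼p_data(x)}[log D(x)] + E_{z∼p_z(z)}[log(1 − D(G(z)))]

where p_data is the distribution of the training data and p_z is the distribution the latent vectors are drawn from (a standard normal, as above).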

Let's start building the model!

Inputs

Let’s define some inputs for the run:

  • dataroot — the path to the root of the dataset folder. We will talk more about the dataset in the next section
  • workers — the number of worker threads for loading the data with the DataLoader
  • batch_size — the batch size used in training. The GAN paper uses a batch size of 128
  • image_size — the spatial size of the images used for training. This implementation defaults to 64x64. If another size is desired, the structures of D and G must be changed.
  • nc — number of color channels in the input images. For color images this is 3
  • nz — length of latent vector
  • ngf — relates to the depth of feature maps carried through the generator
  • ndf — sets the depth of feature maps propagated through the discriminator
  • num_epochs — number of training epochs to run. Training for longer will probably lead to better results but will also take much longer
  • lr — learning rate for training. As described in the GAN paper, this number should be 0.0002
  • beta1 — beta1 hyperparameter for Adam optimizers. As described in paper, this number should be 0.5
  • ngpu — number of GPUs available. If this is 0, code will run in CPU mode. If this number is greater than 0 it will run on that number of GPUs
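A minimal sketch of these inputs as Python assignments; the exact values for dataroot, workers, and num_epochs here are assumptions you can adjust:

# Input parameters for the run
dataroot = "celeba"   # assumed path to the dataset root (see the Data section)
workers = 2           # DataLoader worker threads (assumed)
batch_size = 128      # batch size from the GAN paper
image_size = 64       # training images are resized to 64x64
nc = 3                # number of color channels
nz = 100              # length of the latent vector
ngf = 64              # depth of feature maps in the generator
ndf = 64              # depth of feature maps in the discriminator
num_epochs = 5        # assumed; training longer usually gives better results
lr = 0.0002           # learning rate from the GAN paper
beta1 = 0.5           # beta1 for the Adam optimizers
ngpu = 1              # 0 means CPU mode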

Data

In this tutorial we will use the Celeb-A Faces dataset, which can be downloaded from its website or from Google Drive. The dataset downloads as a file named img_align_celeba.zip. Once downloaded, create a directory named celeba and extract the zip file into that directory. Then, set the dataroot input for this notebook to the celeba directory you just created. The resulting directory structure should be:

/path/to/celeba
    -> img_align_celeba
        -> 188242.jpg
        -> 173822.jpg
        -> 284702.jpg
        -> 537394.jpg

This is an important step because we will be using the ImageFolder dataset class, which requires there to be subdirectories in the dataset’s root folder. Now, we can create the dataset, create the dataloader, set the device to run on, and finally visualize some of the training data.
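One way to set this up in PyTorch, assuming the input parameters defined earlier:

import torch
import torchvision.datasets as dset
import torchvision.transforms as transforms

# ImageFolder requires at least one subdirectory under the root,
# which is why the images live in celeba/img_align_celeba
dataset = dset.ImageFolder(
    root=dataroot,
    transform=transforms.Compose([
        transforms.Resize(image_size),
        transforms.CenterCrop(image_size),
        transforms.ToTensor(),
        # normalize to [-1, 1], matching the tanh output of the generator
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
    ]),
)

dataloader = torch.utils.data.DataLoader(
    dataset, batch_size=batch_size, shuffle=True, num_workers=workers
)

# Run on GPU if one is available and ngpu > 0
device = torch.device("cuda:0" if (torch.cuda.is_available() and ngpu > 0) else "cpu")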

Implementation

With our input parameters set and the dataset prepared, we can now get into the implementation. We will start with the weight initialization strategy, then talk about the generator, discriminator, loss functions, and training loop in detail.

Weight Initialization

From the DCGAN paper, the authors specify that all model weights shall be randomly initialized from a Normal distribution with mean=0, stdev=0.02. The weights_init function takes an initialized model as input and reinitializes all convolutional, convolutional-transpose, and batch normalization layers to meet these criteria. This function is applied to the models immediately after initialization.
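A sketch of such a function:

import torch.nn as nn

def weights_init(m):
    # Reinitialize conv, conv-transpose, and batch-norm layers
    # from N(0, 0.02), as the DCGAN paper specifies
    classname = m.__class__.__name__
    if classname.find('Conv') != -1:
        nn.init.normal_(m.weight.data, 0.0, 0.02)
    elif classname.find('BatchNorm') != -1:
        nn.init.normal_(m.weight.data, 1.0, 0.02)
        nn.init.constant_(m.bias.data, 0)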

Generator

The generator, G, is designed to map the latent space vector z to data-space. Since our data are images, converting z to data-space ultimately means creating an RGB image with the same size as the training images (i.e. 3x64x64). In practice, this is accomplished through a series of strided two-dimensional convolutional-transpose layers, each paired with a 2D batch norm layer and a ReLU activation. The output of the generator is fed through a tanh function to return it to the input data range of [−1, 1]. It is worth noting the existence of the batch norm functions after the conv-transpose layers, as this is a critical contribution of the DCGAN paper. These layers help with the flow of gradients during training. A sketch of this generator is shown below.
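A minimal implementation of the layer stack just described, assuming nz, ngf, and nc from the Inputs section:

import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, ngpu):
        super().__init__()
        self.ngpu = ngpu
        self.main = nn.Sequential(
            # latent vector z: (nz) x 1 x 1 -> (ngf*8) x 4 x 4
            nn.ConvTranspose2d(nz, ngf * 8, 4, 1, 0, bias=False),
            nn.BatchNorm2d(ngf * 8),
            nn.ReLU(True),
            # (ngf*8) x 4 x 4 -> (ngf*4) x 8 x 8
            nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 4),
            nn.ReLU(True),
            # (ngf*4) x 8 x 8 -> (ngf*2) x 16 x 16
            nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 2),
            nn.ReLU(True),
            # (ngf*2) x 16 x 16 -> (ngf) x 32 x 32
            nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf),
            nn.ReLU(True),
            # (ngf) x 32 x 32 -> (nc) x 64 x 64, squashed to [-1, 1]
            nn.ConvTranspose2d(ngf, nc, 4, 2, 1, bias=False),
            nn.Tanh(),
        )

    def forward(self, x):
        return self.main(x)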

Now, we can instantiate the generator and apply the weights_init function. Check out the printed model to see how the generator object is structured.
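For example:

# Create the generator, move it to the device, and apply the DCGAN weight init
netG = Generator(ngpu).to(device)
netG.apply(weights_init)
print(netG)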

Discriminator

As mentioned, the discriminator, D, is a binary classification network that takes an image as input and outputs a scalar probability that the input image is real (as opposed to fake). Here, D takes a 3x64x64 input image, processes it through a series of Conv2d, BatchNorm2d, and LeakyReLU layers, and outputs the final probability through a Sigmoid activation function. This architecture can be extended with more layers if necessary for the problem, but there is significance to the use of the strided convolution, BatchNorm, and LeakyReLU. The DCGAN paper mentions it is good practice to use strided convolution rather than pooling to downsample because it lets the network learn its own pooling function. Also, the batch norm and leaky ReLU functions promote healthy gradient flow, which is critical for the learning process of both G and D.

Discriminator Code:
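A sketch mirroring the generator, with the strided Conv2d / BatchNorm2d / LeakyReLU stack described above:

class Discriminator(nn.Module):
    def __init__(self, ngpu):
        super().__init__()
        self.ngpu = ngpu
        self.main = nn.Sequential(
            # (nc) x 64 x 64 -> (ndf) x 32 x 32; strided conv instead of pooling
            nn.Conv2d(nc, ndf, 4, 2, 1, bias=False),
            nn.LeakyReLU(0.2, inplace=True),
            # (ndf) x 32 x 32 -> (ndf*2) x 16 x 16
            nn.Conv2d(ndf, ndf * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 2),
            nn.LeakyReLU(0.2, inplace=True),
            # (ndf*2) x 16 x 16 -> (ndf*4) x 8 x 8
            nn.Conv2d(ndf * 2, ndf * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 4),
            nn.LeakyReLU(0.2, inplace=True),
            # (ndf*4) x 8 x 8 -> (ndf*8) x 4 x 4
            nn.Conv2d(ndf * 4, ndf * 8, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 8),
            nn.LeakyReLU(0.2, inplace=True),
            # (ndf*8) x 4 x 4 -> a single scalar probability
            nn.Conv2d(ndf * 8, 1, 4, 1, 0, bias=False),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return self.main(x)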

Now, as with the generator, we can create the discriminator, apply the weights_init function, and print the model’s structure.
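For example:

netD = Discriminator(ngpu).to(device)
netD.apply(weights_init)
print(netD)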

Loss Functions and Optimizers

With D and G set up, we can specify how they learn through the loss functions and optimizers. We will use the Binary Cross Entropy loss (BCELoss) function, which is defined in PyTorch as:

ℓ(x, y) = L = {l_1, …, l_N}^⊤,  l_n = −[y_n · log(x_n) + (1 − y_n) · log(1 − x_n)]

Notice how this function provides the calculation of both log components in the objective function (i.e. log(D(x)) and log(1 − D(G(z)))). We can specify what part of the BCE equation to use with the y input. This is accomplished in the training loop which is coming up soon, but it is important to understand how we can choose which component we wish to calculate just by changing y (i.e. the GT labels).

Next, we define our real label as 1 and the fake label as 0. These labels will be used when calculating the losses of D and G, and this is also the convention used in the original GAN paper. Finally, we set up two separate optimizers, one for D and one for G. As specified in the GAN paper, both are Adam optimizers with learning rate 0.0002 and beta1 = 0.5. To keep track of the generator’s learning progression, we will generate a fixed batch of latent vectors drawn from a Gaussian distribution (i.e. fixed_noise). In the training loop, we will periodically input this fixed_noise into G, and over the iterations we will see images form out of the noise.
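A sketch of this setup (the size of the fixed_noise batch, 64 here, is an assumption):

import torch.optim as optim

# Binary cross entropy loss for both networks
criterion = nn.BCELoss()

# Fixed batch of latent vectors used to visualize G's progress
fixed_noise = torch.randn(64, nz, 1, 1, device=device)

# Label conventions from the original GAN paper
real_label = 1.
fake_label = 0.

# Separate Adam optimizers for D and G, with lr and beta1 from the paper
optimizerD = optim.Adam(netD.parameters(), lr=lr, betas=(beta1, 0.999))
optimizerG = optim.Adam(netG.parameters(), lr=lr, betas=(beta1, 0.999))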

Training

Finally, now that we have all of the parts of the GAN framework defined, we can train it.

Part 1: Training the Discriminator

First, we will construct a batch of real samples from the training set, forward pass it through D, calculate the loss (log(D(x))), then calculate the gradients in a backward pass. Second, we will construct a batch of fake samples with the current generator, forward pass this batch through D, calculate the loss (log(1−D(G(z)))), and accumulate the gradients with a backward pass. Now, with the gradients accumulated from both the all-real and all-fake batches, we call a step of the discriminator's optimizer. We want to maximize log(D(x)) + log(1−D(G(z))).
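A sketch of this step, assuming real_images comes from iterating over the dataloader inside the training loop:

# --- Part 1: update D; maximize log(D(x)) + log(1 - D(G(z))) ---
netD.zero_grad()

# All-real batch
b_size = real_images.size(0)
label = torch.full((b_size,), real_label, dtype=torch.float, device=device)
output = netD(real_images).view(-1)
errD_real = criterion(output, label)   # -log(D(x))
errD_real.backward()

# All-fake batch
noise = torch.randn(b_size, nz, 1, 1, device=device)
fake = netG(noise)
label.fill_(fake_label)
output = netD(fake.detach()).view(-1)  # detach so no gradients flow into G here
errD_fake = criterion(output, label)   # -log(1 - D(G(z)))
errD_fake.backward()                   # gradients accumulate with the real pass

errD = errD_real + errD_fake
optimizerD.step()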

Part 2: Training the Generator

We want to train the generator by minimizing log(1−D(G(z))) in an effort to generate better fakes. In the code we accomplish this by: classifying the generator output from Part 1 with the discriminator, computing G’s loss using the real labels as GT, computing G’s gradients in a backward pass, and finally updating G’s parameters with an optimizer step. It may seem counter-intuitive to use the real labels as GT labels for the loss function, but this allows us to use the log(x) part of the BCELoss (rather than the log(1−x) part), which is exactly what we want.
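Continuing the sketch from Part 1:

# --- Part 2: update G; maximize log(D(G(z))) ---
netG.zero_grad()
label.fill_(real_label)       # fake images get "real" labels for the generator's loss
output = netD(fake).view(-1)  # D was just updated, so classify the fakes again
errG = criterion(output, label)   # -log(D(G(z)))
errG.backward()
optimizerG.step()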

Finally, we will do some statistic reporting and at the end of each epoch we will push our fixed_noise batch through the generator to visually track the progress of G’s training. The training statistics reported are:

  1. Loss_D — discriminator loss calculated as the sum of losses for the all real and all fake batches
  2. Loss_G — generator loss calculated as log(D(G(z)))
  3. D(x) — the average output (across the batch) of the discriminator for the all real batch. This should start close to 1 then theoretically converge to 0.5 when G gets better. Think about why this is.
  4. D(G(z)) — average discriminator outputs for the all fake batch. The first number is before D is updated and the second number is after D is updated. These numbers should start near 0 and converge to 0.5 as G gets better. Think about why this is.

Note: This step might take a while, depending on how many epochs you run and if you removed some data from the dataset.

Results

Finally, let's check out how we did. Here, we will look at three different results. First, we will see how D's and G's losses changed during training. Second, we will visualize G's output on the fixed_noise batch for every epoch. And third, we will look at a batch of real data next to a batch of fake data from G.

Loss versus training iteration

Below is a plot of D & G’s losses versus training iterations.

Visualization of G’s progression

Remember how we saved the generator's output on the fixed_noise batch after every epoch of training? Now we can visualize the training progression of G as an animation.

Real Images vs. Fake Images

Finally, let's take a look at some real images and fake images side by side.

Conclusion:

We have reached the end of our journey, but there are several places you could go from here. The Generative Adversarial Network model was built successfully, but there is scope for improvement, especially in the generator: the plot of the loss function clearly shows that the discriminator was able to differentiate between the real and fake images, while the generator failed to fool the discriminator. You could:

  1. Train for longer to see how good the results get
  2. Modify this model to take a different dataset and possibly change the size of the images and the model architecture
  3. Improve the generator model with a better activation function and optimizer

# Path of the image folder
import os

IMAGE_PATH = 'img_align_celeba'
SAMPLE_PATH = '../'

# Number of GPUs
ngpu = 1
# Number of color channels
nc = 3
# Length of the latent vector
nz = 100
# Size of feature maps in the generator
ngf = 64
# Size of feature maps in the discriminator
ndf = 64

# Create the sample output directory if it doesn't exist
if not os.path.exists(SAMPLE_PATH):
    os.makedirs(SAMPLE_PATH)
