Photo-Realistic Single Image Super-Resolution using SRGAN

February 19, 2020
Posted by: vsinghal
Category: Computer Vision Deep Learning Generative Modeling

#CellStratAILab #disrupt4.0 #WeCreateAISuperstars #AlwaysUpskilling

Last Saturday (15th Feb 2020), our AI Lab Team Lead Abdul Azeez presented a superb hands-on workshop on Super-Resolution GANs, or SRGANs as they are called. SRGANs are used to increase the resolution of images, that is, to create super-resolved images.

Basic GAN :-

GANs, or Generative Adversarial Networks, are generative models useful for applications such as style transfer, inpainting, super-resolution and content generation.

**Applications of GANs**
*Image Source : Image System Laboratory*

As we know, a basic GAN has two neural networks: the Discriminator (D) and the Generator (G). The Generator attempts to generate images which look like real images. The Discriminator tries to distinguish the generated images from real images. The two networks are trained against each other, and training ideally reaches a state of equilibrium where the Discriminator can no longer distinguish the fake images produced by the Generator from real images.

**GAN Architecture**
*Image Source :https://www.slideshare.net/xavigiro/deep-learning-for-computer-vision-generative-models-and-adversarial-training-upc-2016*

The dueling nature of the Generator and the Discriminator may also be considered a minimax or zero-sum game. Initially the Discriminator is winning, but eventually the Generator sort of wins: it produces realistic images which the Discriminator fails to tell apart from real images.

The basic GAN objective function is formulated as :-
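In the usual notation, assumed here because the post does not define its symbols (x is a real image drawn from the data distribution, z is a noise vector, and D(·) is the Discriminator's estimated probability that its input is real), the standard minimax objective reads:

$$
\min_{G} \max_{D} \; V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_{z}(z)}\left[\log\left(1 - D(G(z))\right)\right]
$$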

Here the Discriminator is trying to maximize the objective function and the Generator is trying to minimize it.

The GAN gradient adjustments may be depicted as follows :-

Basic GANs do suffer from vanishing gradients and non-convergence, as well as mode collapse (the Generator fails to output diverse samples).

The highly challenging task of estimating a high-resolution (HR) image from its low-resolution (LR) counterpart is referred to as super-resolution (SR).

Single Image Super Resolution (SISR) CNNs :-

A basic SISR CNN might consist of :-

**Single Image Super Resolution CNN**
*Image Source :* *https://medium.com/@hirotoschwert/introduction-to-deep-super-resolution-c052d84ce8cf*

For evaluation, Peak Signal-to-Noise Ratio (PSNR, in decibels) and Structural Similarity index (SSIM) are used.
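As a rough illustration of the first metric, PSNR can be computed from the mean squared error between a reference image and its estimate. The sketch below assumes 8-bit grayscale images (peak value 255) stored as nested lists; the function name `psnr` and the tiny sample images are made up for this example. SSIM is more involved and is not shown.

```python
import math

def psnr(reference, estimate, peak=255.0):
    """Peak Signal-to-Noise Ratio in dB between two equally sized grayscale images."""
    # Flatten both images (lists of rows) into flat pixel sequences.
    ref = [p for row in reference for p in row]
    est = [p for row in estimate for p in row]
    if len(ref) != len(est):
        raise ValueError("images must have the same number of pixels")
    mse = sum((a - b) ** 2 for a, b in zip(ref, est)) / len(ref)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)

# Hypothetical 2 x 2 images: the original HR patch and a super-resolved estimate.
hr = [[52, 55], [61, 59]]
sr = [[50, 55], [63, 60]]
print(round(psnr(hr, sr), 2))  # MSE = 2.25, so about 44.61 dB
```

A higher PSNR means a smaller pixel-wise error, which is not always the same thing as a better-looking image.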

Very Deep Super Resolution (VDSR) employs a similar structure to SRCNN (Super-Resolution CNN), but goes deeper to achieve higher accuracy. Both techniques first upsample the input with bicubic interpolation, so they work on feature maps of the same size as the output.

Shi et al. proposed the Efficient Sub-Pixel Convolutional Neural Network (ESPCN) to make the early SRCNN more efficient. For upsampling, it uses sub-pixel convolution (a combination of a convolution and a ‘pixel shuffle’ operation). Pixel shuffle rearranges the elements of an H × W × C·r² tensor to form an rH × rW × C tensor, where r is the upscaling factor. The operation removes the handcrafted bicubic filter from the pipeline with little increase in computation.
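The rearrangement can be sketched in a few lines of plain Python. This is only an illustration on nested lists, not the ESPCN implementation: the function name `pixel_shuffle` is hypothetical, and the channel ordering (each output channel owns a contiguous group of r² input channels) is one common convention assumed here.

```python
def pixel_shuffle(t, r):
    """Rearrange an H x W x (C*r*r) nested list into an (r*H) x (r*W) x C one."""
    h, w, depth = len(t), len(t[0]), len(t[0][0])
    if depth % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    c = depth // (r * r)
    out = [[[None] * c for _ in range(w * r)] for _ in range(h * r)]
    for y in range(h * r):
        for x in range(w * r):
            for ch in range(c):
                # Assumed ordering: the r*r sub-pixels of output channel ch
                # are stored in consecutive input channels, row by row.
                src = ch * r * r + (y % r) * r + (x % r)
                out[y][x][ch] = t[y // r][x // r][src]
    return out

# A 1 x 2 x 4 tensor (H=1, W=2, C=1, r=2) becomes a 2 x 4 x 1 tensor.
lr_features = [[[0, 1, 2, 3], [4, 5, 6, 7]]]
shuffled = pixel_shuffle(lr_features, 2)
print([[px[0] for px in row] for row in shuffled])  # [[0, 1, 4, 5], [2, 3, 6, 7]]
```

Each low-resolution position contributes an r × r block of output pixels, so the upsampling filter is learned by the preceding convolution rather than fixed in advance.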

**Difference between SRCNN, VDSR, and ESPCN**
*Image Source :* *https://medium.com/@hirotoschwert/introduction-to-deep-super-resolution-c052d84ce8cf*

**Pixel-shuffle operation**
*Image Source :* *https://arxiv.org/pdf/1609.05158.pdf*

SRGANs :-

SRGANs specialize in improving the resolution of the images.

Deep convolutional networks can super-resolve images, but the finer texture details are often lost at large upscaling factors. An MSE-based objective function minimizes the pixel-wise content loss, but the results still lack high-frequency detail and look perceptually unsatisfying at higher resolutions.

The SRGAN paper (arXiv:1609.04802) discusses a GAN for Single Image Super Resolution which can infer photo-realistic natural images for 4× upscaling factors.

**Left – Super-resolved image. Right – Original image**
*Image Source :* *https://arxiv.org/pdf/1609.04802.pdf*

SRGAN achieves this with a perceptual loss function which is a combination of an adversarial loss and a content loss. The new content loss optimizes perceptual similarity instead of pixel-space similarity. The deep residual network of SRGAN is able to recover photo-realistic textures from heavily downsampled images. A mean-opinion-score (MOS) test, in which a human panel scores the images, shows significant gains in perceptual quality with SRGANs.

From left to right: bicubic interpolation, deep residual network optimized for MSE, deep residual generative adversarial network optimized for a loss more sensitive to human perception, original HR image.
*Image Source :* *https://arxiv.org/pdf/1609.04802.pdf*

The Generator has many residual blocks followed by some Convolutional blocks. The Discriminator tries to mark the generated images as 0 and high-res real images as 1.
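A common way to train against these 0/1 labels is binary cross-entropy. The following is a minimal sketch in plain Python, not the workshop's code; `discriminator_loss` and the sample scores are hypothetical, and each score stands for the Discriminator's output probability that an image is real.

```python
import math

def binary_cross_entropy(prob, label):
    """Cross-entropy between a predicted probability and a 0/1 label."""
    eps = 1e-12  # guards against log(0)
    prob = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))

def discriminator_loss(real_scores, fake_scores):
    """Average loss when real HR images are labelled 1 and generated images 0."""
    losses = [binary_cross_entropy(p, 1) for p in real_scores]
    losses += [binary_cross_entropy(p, 0) for p in fake_scores]
    return sum(losses) / len(losses)

# Hypothetical Discriminator outputs for two real and two generated images.
confident = discriminator_loss(real_scores=[0.9, 0.8], fake_scores=[0.1, 0.2])
fooled = discriminator_loss(real_scores=[0.9, 0.8], fake_scores=[0.7, 0.9])
print(confident < fooled)  # True: the loss grows when generated images pass as real
```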

**SRGAN Network**
*Image Source :* *https://arxiv.org/pdf/1609.04802.pdf*

We use Parametric ReLU (PReLU), whose slope for negative inputs is learned during training, whereas a Leaky ReLU activation uses a fixed slope.
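The difference between the two activations can be shown with scalars. This is a sketch under simple assumptions: the class name `PReLU`, the starting slope of 0.25, the learning rate and the upstream gradient are illustrative values, not settings taken from the workshop.

```python
def leaky_relu(x, slope=0.01):
    """Leaky ReLU: the negative-side slope is a fixed hyperparameter."""
    return x if x > 0 else slope * x

class PReLU:
    """Parametric ReLU: the negative-side slope `alpha` is a trainable parameter."""

    def __init__(self, alpha=0.25):
        self.alpha = alpha

    def forward(self, x):
        return x if x > 0 else self.alpha * x

    def grad_alpha(self, x):
        # d(output)/d(alpha) is x for non-positive inputs and 0 otherwise.
        return 0.0 if x > 0 else x

act = PReLU()
x = -2.0
before = act.forward(x)  # 0.25 * -2.0 = -0.5

# One hypothetical gradient-descent step on alpha, with upstream gradient 1.0.
learning_rate = 0.1
act.alpha -= learning_rate * 1.0 * act.grad_alpha(x)
after = act.forward(x)   # alpha is now 0.45, so the output is -0.9

print(leaky_relu(x), before, after)
```

The Leaky ReLU output for the same input never changes during training, while the PReLU output shifts as `alpha` is updated.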

The GAN-part of the objective function of an SRGAN is formulated as follows :-
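In the notation of the SRGAN paper (arXiv:1609.04802), where θ_G and θ_D are the Generator and Discriminator parameters and I^{HR}, I^{LR} correspond to I-HR and I-LR in the text, the adversarial min-max problem is written as:

$$
\min_{\theta_G} \max_{\theta_D} \; \mathbb{E}_{I^{HR} \sim p_{\text{train}}(I^{HR})}\left[\log D_{\theta_D}(I^{HR})\right] + \mathbb{E}_{I^{LR} \sim p_{G}(I^{LR})}\left[\log\left(1 - D_{\theta_D}\left(G_{\theta_G}(I^{LR})\right)\right)\right]
$$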

Here I-HR represents the high-resolution image and I-LR represents the low-resolution image (instead of Gaussian noise as in a traditional GAN, we pass the low-res image as input to the Generator). The rest of the objective function is similar to the traditional GAN.

The total loss for an SRGAN is referred to as Perceptual Loss and is formulated as :-
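In the notation of the SRGAN paper (arXiv:1609.04802), with l_X^{SR} the content loss and l_{Gen}^{SR} the adversarial loss, the weighted sum reads:

$$
l^{SR} = l_{X}^{SR} + 10^{-3} \, l_{Gen}^{SR}
$$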

The Adversarial Loss part is the traditional GAN loss of the Generator and is given a low weight of 0.001. The Content Loss is the new thing here, which can be calculated as the MSE loss or the VGG loss.
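How the two parts combine can be sketched with plain Python numbers. This is a toy illustration only: it uses the MSE variant of the content loss, the helper names are hypothetical, and the pixel values and Discriminator score are made up.

```python
import math

def mse_content_loss(hr, sr):
    """Mean squared error between two equally sized images (lists of rows)."""
    diffs = [(a - b) ** 2
             for row_hr, row_sr in zip(hr, sr)
             for a, b in zip(row_hr, row_sr)]
    return sum(diffs) / len(diffs)

def adversarial_loss(fake_scores):
    """Generator's GAN loss: sum of -log D(G(I-LR)) over the batch."""
    return sum(-math.log(p) for p in fake_scores)

def perceptual_loss(content, adversarial, adv_weight=1e-3):
    """Content loss plus the low-weighted adversarial loss."""
    return content + adv_weight * adversarial

# Hypothetical 2 x 2 images and one Discriminator score for the generated image.
hr = [[0.5, 0.2], [0.1, 0.9]]
sr = [[0.4, 0.2], [0.3, 0.9]]
content = mse_content_loss(hr, sr)  # (0.01 + 0 + 0.04 + 0) / 4 = 0.0125
adv = adversarial_loss([0.5])       # -log(0.5), about 0.693
total = perceptual_loss(content, adv)
print(content, adv, total)
```

With the 0.001 weight the adversarial term stays small next to the content term, so it nudges the Generator toward realistic textures without overriding content fidelity.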

The MSE Content Loss compares I-HR with the generated image pixel by pixel and takes the MSE over this difference.
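Written out in the notation of the SRGAN paper (r is the upscaling factor and W × H is the size of the low-resolution image, so the sums run over the rW × rH pixels of the high-resolution image), this pixel-wise loss is:

$$
l_{MSE}^{SR} = \frac{1}{r^{2} W H} \sum_{x=1}^{rW} \sum_{y=1}^{rH} \left( I_{x,y}^{HR} - G_{\theta_G}(I^{LR})_{x,y} \right)^{2}
$$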

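The VGG Content Loss is computed on the feature map (the phi-function) taken after the j-th convolution (after activation) but before the i-th maxpool layer within a VGG19 network. It measures the distance between the feature map of the generated image and that of I-HR.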
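In the paper's notation, with φ_{i,j} the VGG19 feature map for indices i and j as above and W_{i,j}, H_{i,j} its dimensions, the loss can be written as:

$$
l_{VGG/i,j}^{SR} = \frac{1}{W_{i,j} H_{i,j}} \sum_{x=1}^{W_{i,j}} \sum_{y=1}^{H_{i,j}} \left( \phi_{i,j}(I^{HR})_{x,y} - \phi_{i,j}\left(G_{\theta_G}(I^{LR})\right)_{x,y} \right)^{2}
$$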

The VGG content loss is more invariant to changes in pixel space compared to MSE loss, leading to better perceptual quality. Hence the SRGAN uses VGG loss for content loss.

The Adversarial Loss is the traditional Generator Loss :-
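In the notation of the SRGAN paper, summed over the N training samples, where D_{θ_D}(G_{θ_G}(I^{LR})) is the probability the Discriminator assigns to the generated image being a real HR image:

$$
l_{Gen}^{SR} = \sum_{n=1}^{N} -\log D_{\theta_D}\left(G_{\theta_G}(I^{LR})\right)
$$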

The entire SRGAN network may now be depicted as follows :-

**SRGAN Network Architecture. It is referred to as ‘SRResNet’ when only MSE loss is used.**
*Image Source :*
*https://medium.com/@hirotoschwert/introduction-to-deep-super-resolution-c052d84ce8cf*

The results of SRGAN are remarkable as shown below :-

*Image Source :* *https://arxiv.org/pdf/1609.04802.pdf*

*Image Source :* *https://github.com/leftthomas/SRGAN*

CellStrat AI Lab continues to create the benchmark in AI innovation in India.

To attend our next AI Lab meetup in Bengaluru, please RSVP below :-

BLR AI Lab meetup :-
Register : https://www.meetup.com/Disrupt-4-0/events/qqmxlrybcdbdc/
Topic : Graphical Neural Networks, Memory Networks
Date : Saturday 22nd Feb 2020, 10:30 AM – 5 PM
Presenters : Pushparaj M., Sujith Kamath

Did you know CellStrat is a leading global AI ML training provider ? Check out our signature AI ML training program, the Post Graduate Certificate in AI and ML, here –
https://www.meraevents.com/event/post-graduate-certificate-in-artificial-intelligence-self-paced-instructor-supported-?ucode=organizer. Learn with us for world-class advanced AI skilling.

See you this Saturday for the AI Lab meetup in BLR ! Let's disrupt the world with AI !

Questions ? Call me at +91-9742800566 !

Best Regards,

Vivek Singhal
Co-Founder & Chief Data Scientist, CellStrat
+91-9742800566

Vishal

February 20, 2020 at 12:57 am

It was very good presentation by Abdul. Even being a non-techie myself, I was able to grasp quite a bit of the concept.
