Oral Session 3B

December 22, 1:30 PM to 2:30 PM

Chair: Venkat Ramana Peddigari

4 Teaching GANs to Sketch in Vector Format
December 22,   13:30:00 to 13:45:00
Authors: Varshaneya V (Sri Sathya Sai Institute of Higher Learning)*; Balasubramanian S (SSSIHL); Vineeth Balasubramanian (Indian Institute of Technology Hyderabad)
Abstract: Sketching is a fundamental human cognitive ability. Deep Neural Networks (DNNs) have achieved state-of-the-art performance in recognition tasks such as image recognition and speech recognition, but have not made significant progress in generating stroke-based sketches, a.k.a. sketches in vector format. Though there are Variational Auto Encoders (VAEs) for generating sketches in vector format, there is no Generative Adversarial Network (GAN) architecture for the same. In this paper, we propose a standalone GAN architecture called SkeGAN and a hybrid VAE-GAN architecture called VASkeGAN, for sketch generation in vector format. SkeGAN is a stochastic policy in Reinforcement Learning (RL), capable of generating both multidimensional continuous and discrete outputs. VASkeGAN draws sketches by coupling the efficient representation of data by VAE with the powerful generating capabilities of GAN. We have validated that SkeGAN and VASkeGAN generate visually appealing sketches with minimal scribble effect and are comparable to a recent work titled Sketch-RNN.
Presenting Author: Varshaneya V
Lab/Author homepage: https://www.sssihl.edu.in/departments/mathematics-computer-science/#1580992880554-d1d68989-9aea
Code: https://github.com/varshaneya/teaching-GANS-to-sketch-in-vector-format
Paper: https://doi.org/10.1145/3490035.3490258
39 HDRVideo-GAN: Deep Generative HDR Video Reconstruction
December 22,   13:45:00 to 14:00:00
Authors: Mrinal Anand (IIT Gandhinagar)*; Nidhin Harilal (Indian Institute of Technology Gandhinagar); Chandan Kumar (Indian Institute of Technology Gandhinagar); Shanmuganathan Raman (Indian Institute of Technology (IIT) Gandhinagar)
Abstract: High dynamic range (HDR) videos provide a more visually realistic experience than standard low dynamic range (LDR) videos. Despite significant progress in HDR imaging, it is still a challenging task to capture high-quality HDR video with a conventional off-the-shelf camera. Existing approaches rely entirely on using dense optical flow between the neighboring LDR sequences to reconstruct an HDR frame. However, they lead to inconsistencies in color and exposure over time when applied to alternating exposures with noisy frames. In this paper, we propose an end-to-end GAN-based framework for HDR video reconstruction from LDR sequences with alternating exposures. We first extract clean LDR frames from noisy LDR video with alternating exposures using a denoising network trained in a self-supervised setting. Using optical flow, we then align the neighboring alternating-exposure frames to a reference frame and reconstruct high-quality HDR frames in a fully adversarial setting. To further improve the robustness and quality of generated frames, we incorporate a temporal-stability-based regularization term along with content and style-based losses in the cost function during training. Experimental results demonstrate that our framework achieves state-of-the-art performance and generates HDR video frames of superior quality compared with existing methods.
Presenting Author: Mrinal Anand
Paper: https://doi.org/10.1145/3490035.3490266
98 G3AN++: Exploring Wide GANs with Complementary Feature Learning for Video Generation
December 22,   14:00:00 to 14:15:00
Authors: Sonam Gupta (IIT Madras)*; Arti Keshari (Indian Institute of Technology, Madras); Sukhendu Das (Indian Institute of Technology, Madras)
Abstract: Video generation is a challenging problem that involves modelling complex real-world dynamics. Most existing methods have designed deep networks to tackle high-dimensional video data distributions. However, the utilization of wider networks is still under-explored. Inspired by the success of wide networks in image recognition literature, we present G3AN++, a three-stream generative adversarial network for video generation. The three streams are spatial, temporal and spatio-temporal processing branches. In pursuit of improving the quality of video generation, we make our network wider by splitting the spatial stream into two parallel identical branches learning complementary feature representations. We further introduce a novel adaptive masking branch to impose the complementary constraint. The masking branch encourages the parallel branches to learn distinct and richer visual features. Extensive quantitative and qualitative analysis demonstrates that our model outperforms the existing state-of-the-art methods by a significant margin on the Weizmann Action, UvA-Nemo Smile and UCF101 Action datasets. Additional exploration reveals that G3AN++ is capable of disentangling appearance and motion. We also show that the proposed method can be easily extended to solve the hard task of text-to-video generation.
Presenting Author: Arti Keshari
Lab/Author homepage: http://www.cse.iitm.ac.in/~vplab/
Code: https://github.com/GuptaSonam/G3ANpp-Exploring-Wide-GANs-with-Complementary-Feature-Learning-for-Video-Generation
Paper: https://doi.org/10.1145/3490035.3490282
160 Feature Generation for Long-tail Classification
December 22,   14:15:00 to 14:30:00
Authors: Rahul Vigneswaran K (Indian Institute of Technology, Hyderabad)*; Marc T Law (NVIDIA); Vineeth N Balasubramanian (Indian Institute of Technology, Hyderabad); Makarand Tapaswi (Wadhwani AI, IIIT Hyderabad)
Abstract: The visual world naturally exhibits an imbalance in the number of object or scene instances, resulting in a long-tailed distribution. This imbalance poses significant challenges for classification models based on deep learning. Oversampling instances of the tail classes attempts to solve this imbalance. However, the limited visual diversity results in a network with poor representation ability. A simple counter to this is decoupling the representation and classifier networks and using oversampling only to train the classifier. In this paper, instead of repeatedly re-sampling the same image (and thereby features), we explore a direction that attempts to generate meaningful features by estimating the tail category's distribution. Inspired by ideas from recent work on few-shot learning, we create calibrated distributions to sample additional features that are subsequently used to train the classifier. Through several experiments on the CIFAR-100-LT (long-tail) dataset with varying imbalance factors and on mini-ImageNet-LT (long-tail), we show the efficacy of our approach and establish a new state-of-the-art. We also present a qualitative analysis of generated features using t-SNE visualizations and analyze the nearest neighbors used to calibrate the tail class distributions. Our code is available at https://github.com/rahulvigneswaran/TailCalibX.
Presenting Author: Rahul Vigneswaran K
Lab/Author homepage: https://lab1055.github.io/
Code: https://github.com/rahulvigneswaran/TailCalibX
Paper: https://doi.org/10.1145/3490035.3490300