Oral Session 1A

December 20, 10:45 AM to 12:45 PM

Chair: Avinash Sharma

13 Iterative Gradient Encoding Network Using Feature Co-Occurrence Loss for Single Image Reflection Removal
December 20, 10:45:00 to 11:00:00
Authors: Sutanu Bera (Indian Institute of Technology Kharagpur)*; Prabir Kumar Biswas (IIT Kharagpur)
Abstract: Removing undesired reflections from a photo taken in front of glass is of great importance for enhancing the efficiency of visual computing systems. Previous learning-based approaches have produced visually plausible results for some reflection types; however, they fail to generalize to other reflection types. There is a dearth of efficient methods for single image reflection removal that generalize well across a large range of reflection types. In this study, we propose an iterative gradient encoding network for single image reflection removal. To further supervise the network in learning the correlation between the transmission layer features, we propose a feature co-occurrence loss. Extensive experiments on the public benchmark dataset SIR² demonstrate that our method removes reflections favorably against the existing state-of-the-art methods in all imaging settings, including diverse backgrounds. Moreover, as the reflection strength increases, our method can still remove reflections even where other state-of-the-art methods fail.
Presenting Author: Sutanu Bera
Paper: https://doi.org/10.1145/3490035.3490259
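The feature co-occurrence idea lends itself to a compact sketch. Below is a minimal, illustrative PyTorch version that penalizes differences in second-order feature statistics (Gram matrices) between the predicted and ground-truth transmission layers; the Gram-based formulation is an assumption here, not necessarily the authors' exact loss:

```python
import torch
import torch.nn.functional as F

def gram_matrix(feat):
    # feat: (B, C, H, W) encoder feature maps.
    b, c, h, w = feat.shape
    f = feat.view(b, c, h * w)
    # Channel-by-channel co-occurrence statistics, normalized by size.
    return torch.bmm(f, f.transpose(1, 2)) / (c * h * w)

def feature_co_occurrence_loss(pred_feats, gt_feats):
    # Match second-order feature statistics across all encoder levels.
    return sum(F.l1_loss(gram_matrix(p), gram_matrix(g))
               for p, g in zip(pred_feats, gt_feats))
```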
 
82 GPU-based Centroidal Voronoi Tessellation using Local Search on Thinnest Digital Surface
December 20, 11:00:00 to 11:15:00
Authors: Ashutosh Soni (IIT Kharagpur); Piyush Kanti Bhunre (Techno India University, Kolkata); Partha Bhowmick (IIT Kharagpur)*
Abstract: Voronoi tessellation is a classical geometric problem with different variants and natures of constraints. Centroidal Voronoi tessellation (CVT) is one such leading variant and is used in many applications. Although there exists a multitude of CVT techniques for 3D objects considered as discrete volumes, e.g., as sets of voxels, there is no significant work in the literature on CVT computation for voxelized (a.k.a. digital) surfaces. In this paper, we focus on this problem and propose a novel GPU-based algorithm for CVT computation on a digital surface. Its novelty rests on several fundamental ideas. Firstly, the digital surface is thinnest by construction; that is, while being voxelized from a triangulated surface, its every triangle is 2-minimal in the topological sense. As a result, each voxel of the digital surface has the smallest possible neighborhood, which eventually limits the search space and aids efficient computation. Secondly, as the optimization constraint, a novel formulation of Voronoi energy is used, which is easy to compute and hence leads to quick convergence of the algorithm. The GPU-based algorithm with related procedures and their complexity analysis is discussed in detail, and experimental results are furnished to adjudge the merit and usefulness of the proposed algorithm.
Presenting Author: Ashutosh Soni
Paper: https://doi.org/10.1145/3490035.3490279
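The core loop of a CVT computation is a Lloyd-style local search. The NumPy sketch below is a CPU stand-in for the GPU algorithm: it uses brute-force nearest-site assignment, whereas the paper restricts the search via the thinnest-surface neighborhood; the snapping-to-surface step and parameters are assumptions for illustration:

```python
import numpy as np

def cvt_on_digital_surface(voxels, n_sites, iters=50, seed=0):
    """Lloyd-style local search for CVT on a voxelized surface.

    voxels: (N, 3) integer coordinates of surface voxels.
    """
    rng = np.random.default_rng(seed)
    sites = voxels[rng.choice(len(voxels), n_sites, replace=False)].astype(float)
    for _ in range(iters):
        # Voronoi partition: assign every voxel to its nearest site.
        d = np.linalg.norm(voxels[:, None, :] - sites[None, :, :], axis=-1)
        labels = d.argmin(axis=1)
        # Centroidal step: move each site to its cell's centroid,
        # then snap it back onto the digital surface.
        for k in range(n_sites):
            cell = voxels[labels == k]
            if len(cell):
                c = cell.mean(axis=0)
                sites[k] = voxels[np.linalg.norm(voxels - c, axis=1).argmin()]
    return sites, labels
```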
 
104 Intelligent Video Editing: Incorporating Modern Talking Face Generation Algorithms in a Video Editor
December 20, 11:15:00 to 11:30:00
Authors: Anchit Gupta (IIIT Hyderabad); Faizan Farooq Khan (IIIT, Hyderabad); Rudrabha Mukhopadhyay (IIIT Hyderabad)*; Vinay Namboodiri (University of Bath); C.V. Jawahar (IIIT-Hyderabad)
Abstract: This paper proposes a video editor based on OpenShot with several state-of-the-art facial video editing algorithms as added functionalities. Our editor provides an easy-to-use interface to apply modern lip-syncing algorithms interactively. Apart from lip-syncing, the editor also uses audio and facial re-enactment to generate expressive talking faces. The manual control improves the overall experience of video editing without missing out on the benefits of modern synthetic video generation algorithms. This control enables us to lip-sync complex dubbed movie scenes, interviews, television shows, and other visual content. Furthermore, our editor provides features to automatically translate lectures: the spoken content, the professor's lip movements, and background content like slides. While doing so, we also tackle the critical aspect of synchronizing background content with the translated speech. We qualitatively evaluate the usefulness of the proposed editor by conducting human evaluations. Our evaluations show a clear improvement in the efficiency of human editors and in the quality of the generated videos. We attach demo videos with the supplementary material clearly explaining the tool and also showcasing multiple results.
Presenting Author: Anchit Gupta
Paper: https://doi.org/10.1145/3490035.3490284
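How a lip-sync model might be wired into an OpenShot-style timeline can be sketched as follows; `run_lipsync` is a hypothetical stand-in for the actual talking-face generator (e.g., a Wav2Lip-style model), not part of the editor's published API:

```python
from dataclasses import dataclass

@dataclass
class Clip:
    video_path: str
    audio_path: str
    start: float
    end: float

def run_lipsync(video_path: str, audio_path: str) -> str:
    """Hypothetical stand-in for a talking-face generator; returns
    the path of the re-rendered, lip-synced clip."""
    raise NotImplementedError("plug the actual model inference in here")

def apply_lipsync(clips):
    # Re-render only the clips the user marked for dubbing on the timeline.
    return [Clip(run_lipsync(c.video_path, c.audio_path),
                 c.audio_path, c.start, c.end) for c in clips]
```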
 
106 Automated Tree Generation Using Grammar & Particle System
December 20, 11:30:00 to 11:45:00
Authors: Aryamaan Jain (IIIT Hyderabad)*; Jyoti Sunkara (IIIT Hyderabad); Ishaan N Shah (International Institute of Information Technology, Hyderabad); Avinash Sharma (CVIT, IIIT-Hyderabad); K S Rajan (IIIT Hyderabad)
Abstract: Trees are an integral part of many outdoor scenes and are rendered in a wide variety of computer applications like computer games, movies, simulations, architectural models, AR and VR. This has led to increasing demand for realistic, intuitive, lightweight and easy-to-produce computer-generated trees. Current approaches to 3D tree generation using a library of trees lack variation in structure and are repetitive. This paper presents an extended grammar-based automated solution for 3D tree generation that can model a wide range of species, both Western and Indian. For the foliage, we adopt a particle system approach that models the leaf, its size, orientation and changes. The proposed solution additionally allows control over individual trees, thus modelling tree growth variations, changes in foliage across seasons, and leaf structure. This enables the generation of virtual forests with different tree compositions. In addition, a Blender add-on has been developed and will be released for public use.
Presenting Author: Aryamaan Jain
Paper: https://doi.org/10.1145/3490035.3490285
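The grammar side of such a pipeline can be illustrated with a stochastic L-system expander; the branching rules below are purely illustrative, not the paper's extended grammar:

```python
import random

def expand(axiom, rules, depth, rng=random.Random(42)):
    # Stochastic L-system: each symbol is rewritten by a randomly chosen rule;
    # symbols without a rule are copied through unchanged.
    s = axiom
    for _ in range(depth):
        s = "".join(rng.choice(rules.get(ch, [ch])) for ch in s)
    return s

# 'F' = grow a branch segment, '[' / ']' = push/pop turtle state,
# '+' / '-' = rotate the turtle.
rules = {"F": ["F[+F]F[-F]F", "F[+F]F"]}
print(expand("F", rules, depth=3))
```

A turtle-graphics interpreter then walks the expanded string to place branch geometry, with foliage particles emitted at the branch tips.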
 
133 Coarse-to-fine 3D Clothed Human Reconstruction using Peeled Semantic Segmentation Context
December 20, 11:45:00 to 12:00:00
Authors: Snehith G Routhu (International Institute of Information Technology Hyderabad); Sai Sagar (IIIT Hyderabad)*; Avinash Sharma (CVIT, IIIT-Hyderabad)
Abstract: 3D reconstruction of a human body model from a monocular image is an under-constrained and challenging yet much desired research problem in computer vision. Parametric body model based monocular shape reconstruction techniques infer the shape and pose parameters of a statistical body model, like SMPL. These methods cannot capture fine-grained body surface geometrical details and loose clothing. Non-parametric techniques like volumetric regression based methods are not computationally scalable and lack fine-grained geometrical details. Implicit function learning methods recover fine-grained geometrical details but are computationally expensive. The recently proposed multi-layered shape representation called PeeledHuman attempted a sparse non-parametric 2D representation that can handle severe self-occlusion. However, the key limitation of the PeeledHuman model is that the predicted depth maps of self-occluded parts are sparse and noisy, and hence after back-projection lead to distorted body parts, sometimes with discontinuities between them. In this work, we introduce a Peeled Segmentation map representation in a coarse-to-fine refinement framework which consists of a cascade of three networks, namely PeelGAN, PSegGAN and RefGAN. At first, we use the original PeeledHuman as a baseline model to predict an initial coarse estimate of the peeled depth maps from the input RGB image. These peeled maps are subsequently fed, along with the monocular RGB image, to our novel PSegGAN, which predicts Peeled Segmentation maps in a generative fashion. Finally, we feed these peeled segmentation maps as additional context, along with the monocular input image, to our RefGAN to predict the refined peeled RGB and depth maps. This also provides an additional output: a 3D semantic segmentation of the reconstructed shape. We perform a thorough empirical evaluation over three publicly available datasets to demonstrate the superiority of our model.
Presenting Author: Snehith Goud Routhu
Paper: https://doi.org/10.1145/3490035.3490293
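The peeled representation back-projects each depth layer into 3D with the shared camera intrinsics. A minimal NumPy sketch, where the pinhole model and the zero-depth-as-empty convention are assumptions:

```python
import numpy as np

def backproject_peeled_depth(depth_layers, K):
    """Back-project peeled depth maps (L, H, W) into one 3D point cloud.

    K: 3x3 camera intrinsics; depth 0 marks peeled-away (empty) pixels.
    """
    L, H, W = depth_layers.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T  # (3, H*W)
    rays = np.linalg.inv(K) @ pix           # per-pixel camera rays
    pts = []
    for layer in depth_layers:
        z = layer.reshape(-1)
        keep = z > 0
        pts.append((rays[:, keep] * z[keep]).T)
    return np.concatenate(pts, axis=0)
```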
 
148 Disparity Based Depth Estimation Using Light Field Camera
December 20, 12:00:00 to 12:15:00
Authors: Suresh Kumar Nehra (Indian Institute of Technology, Kharagpur)*; Tamal Das (National Institute of Technology Agartala); Simantini Chakraborty (National Institute of Technology Agartala); Prabir Kumar Biswas (Indian Institute of Technology Kharagpur); Jayanta Mukhopadhyay (IIT Kharagpur)
Abstract: Light field cameras have the unique feature of capturing the direction of light rays along with their intensity. This additional information can be used to estimate the depth of a 3D scene. Recently, a considerable amount of research has been done on depth estimation using light field data. However, these depth estimation methods rely heavily on iterative optimization techniques and do not fully exploit the inherent structure of light field data. In this paper, we present a novel three-step disparity based algorithm to estimate accurate depth maps from a light field image. First, an initial depth map of scene points is estimated from the principal views by estimating the disparity of segments in the central image. This initial depth helps in resolving ambiguity in the depth propagation of the refined depth map in epipolar plane images (EPIs). Second, refined depth is estimated at lines using the disparity vector in EPIs. Finally, the refined depth at lines in EPIs is propagated to other locations using the initial depth map. We also provide a synthetic dataset having the inherent characteristics of a light field. We have tested our approach on a variety of real-world scenes captured with a Lytro Illum camera, as well as on synthetic images. The proposed method outperforms several state-of-the-art algorithms.
Presenting Author: Suresh Kumar Nehra
Paper: https://doi.org/10.1145/3490035.3490297
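The EPI geometry underlying such methods is simple: a scene point traces a line across the sub-aperture views, and the line's slope is its disparity. A minimal sketch, assuming a pinhole light field with baseline b and focal length f (gradient-based slope estimation here is an illustration, not the paper's segment-based procedure):

```python
import numpy as np

def epi_slope(epi):
    """Per-pixel slope of iso-intensity lines in an EPI of shape (S, U):
    along a line, gs*ds + gu*du = 0, so disparity du/ds = -gs/gu."""
    gs, gu = np.gradient(epi.astype(float))
    return np.where(np.abs(gu) > 1e-6, -gs / gu, 0.0)

def depth_from_disparity(disparity, focal, baseline):
    # Standard multi-view relation: depth = f * b / disparity.
    disparity = np.asarray(disparity, dtype=float)
    return np.where(np.abs(disparity) > 1e-6,
                    focal * baseline / disparity, np.inf)
```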
 
154 Neural View Synthesis and Appearance Editing from Unstructured Images
December 20, 12:15:00 to 12:30:00
Authors: Pulkit Gera (CVIT, IIIT)*; Aakash KT (IIIT Hyderabad); Dhawal S (IIIT Hyderabad); P. J. Narayanan (IIIT-Hyderabad)
Abstract: We present a neural rendering framework for simultaneous view synthesis and appearance editing of a scene with known environmental illumination captured using a mobile camera. Existing approaches either achieve view synthesis alone or view synthesis along with relighting, without control over the scene's appearance. Our approach explicitly disentangles the appearance and learns a lighting representation that is independent of it. Specifically, we jointly learn the scene appearance and a lighting-only representation of the scene. Such disentanglement allows our approach to generalize to arbitrary changes in appearance while performing view synthesis. We show results of editing the appearance of real scenes in interesting and non-trivial ways. The performance of our view synthesis approach is on par with state-of-the-art approaches on both real and synthetic data.
Presenting Author: Pulkit Gera
Lab/Author homepage: https://cvit.iiit.ac.in/
Paper: https://doi.org/10.1145/3490035.3490299
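The disentanglement can be pictured as a two-branch field: a lighting-only branch kept separate from a learned appearance code, so that swapping the code edits appearance while the view-synthesis machinery is untouched. A toy PyTorch sketch (the architecture and dimensions are illustrative assumptions, not the authors' model):

```python
import torch
import torch.nn as nn

class DisentangledRadiance(nn.Module):
    """Toy two-branch radiance model: lighting features L(x, d) are computed
    independently of a per-scene appearance code; the head combines them."""
    def __init__(self, app_dim=16, hidden=128):
        super().__init__()
        self.lighting = nn.Sequential(nn.Linear(6, hidden), nn.ReLU(),
                                      nn.Linear(hidden, hidden))
        self.appearance = nn.Parameter(torch.zeros(app_dim))
        self.head = nn.Sequential(nn.Linear(hidden + app_dim, hidden), nn.ReLU(),
                                  nn.Linear(hidden, 3))

    def forward(self, x, d, appearance=None):
        # x, d: (N, 3) positions and view directions; pass a different
        # `appearance` code at test time to edit the scene's appearance.
        a = self.appearance if appearance is None else appearance
        h = self.lighting(torch.cat([x, d], dim=-1))
        a = a.unsqueeze(0).expand(h.shape[0], -1)
        return self.head(torch.cat([h, a], dim=-1))
```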
 
163 Attention Guided Complementary Feature Integration for Latent Image Recovery From Noisy/Blurry Pairs
December 20, 12:30:00 to 12:45:00
Authors: Green Rosh K S (Samsung Research Institute Bangalore)*; Sachin Lomte (Samsung Research Institute Bangalore); Nikhil Krishnan (Samsung Research India Bangalore); B H Pawan Prasad (Samsung Research)
Abstract: Low-light imaging using a hand-held camera is a challenging task due to excessive noise in the scene. Most existing methods try to address this problem either by denoising an image captured using high ISO or by deblurring an image captured using a long exposure time. However, these methods use a single image to estimate the latent scene and hence fail to leverage the complementary information available in the scene. In this paper, we propose a method to estimate the latent image using a pair of images, captured using high ISO and high exposure time respectively, to leverage the complementary information present in the two captures. We propose a novel deep learning based method to efficiently extract and integrate the information present in the two images. Contrary to other methods, we use separate filters to extract the complementary information from the two images. We also progressively integrate the extracted features using a novel attention-guided mechanism. Further, we address the spatially varying nature and localization of motion blur in real-life captures by using spatial attention layers. The proposed method achieves state-of-the-art performance against single-image as well as other noisy/blurry-pair approaches to the problem. We also show that the network learns spatial attention maps with a strong correlation to the blur in the scene; thus the proposed method is more interpretable and easier to analyze.
Presenting Author: Green Rosh K S
Paper: https://doi.org/10.1145/3490035.3490302
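The attention-guided integration of the two captures can be sketched compactly: separate filters extract features from the noisy and blurry inputs, and a learned spatial attention map decides, per pixel, which source to trust. A single-stage illustrative PyTorch block (the paper integrates features progressively across multiple stages):

```python
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Illustrative fusion block: separate encoders for the noisy and blurry
    images, blended by a learned per-pixel spatial attention map."""
    def __init__(self, ch=32):
        super().__init__()
        self.enc_noisy = nn.Conv2d(3, ch, 3, padding=1)
        self.enc_blurry = nn.Conv2d(3, ch, 3, padding=1)
        self.attn = nn.Sequential(nn.Conv2d(2 * ch, 1, 3, padding=1),
                                  nn.Sigmoid())

    def forward(self, noisy, blurry):
        fn, fb = self.enc_noisy(noisy), self.enc_blurry(blurry)
        a = self.attn(torch.cat([fn, fb], dim=1))  # (B, 1, H, W) in [0, 1]
        # Where blur dominates, a -> 1 favors the sharp-but-noisy features.
        return a * fn + (1 - a) * fb
```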

    