Oral Session 2A

December 21,   10:45 AM to 12:45 PM

Chair: Ravi Kiran S

18 MSDNet: A Novel Multi-Stage progressive image Dehazing Network
December 21,   10:45:00 to 11:00:00
Authors: Chippy M Manu (College of Engineering Trivandrum, Kerala, India)*; Sreeni K G (College of Engineering, Trivandrum)
Abstract: This paper presents a novel algorithm to dehaze a given hazy input image using a Multi-Stage progressive image Dehazing Network (MSDNet) architecture. The proposed multi-stage framework splits the challenge of image recovery into sub-tasks so that a degraded image is restored gradually. The network is trained with images from two benchmark datasets, RESIDE and NTIRE2021. Experimental evaluations have been conducted with several hazy images from different datasets. The performance of MSDNet is evaluated using metrics such as Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index Measurement (SSIM), Feature Similarity Metric (FSIM), Visibility Index (VI), Realness Index (RI), Fog Aware Density Evaluator (FADE), Naturalness Image Quality Evaluator (NIQE) and Blind/Referenceless Image Spatial Quality Evaluator (BRISQUE). Average PSNR values of 27.687, 36.25, 34.98, 20.52, and 20.97 were obtained with images from the ITS, SOTS indoor, SOTS outdoor, Dense-Haze, and NH-Haze datasets, respectively. The experimental results reveal the enhanced performance of MSDNet compared to other state-of-the-art techniques.
Presenting Author: Chippy M Manu
Paper: https://doi.org/10.1145/3490035.3490261
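For readers unfamiliar with the PSNR figures quoted in the abstract above, the following is a minimal sketch of how PSNR is conventionally computed (in decibels) between a reference image and a restored one. It is not the authors' evaluation code; the function name `psnr`, the flat-list image representation, and the 8-bit peak value of 255 are assumptions made for illustration.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences.

    Assumption: images are flattened to sequences of intensities and
    max_value is the largest possible intensity (255 for 8-bit images).
    """
    if len(reference) != len(restored):
        raise ValueError("images must have the same number of pixels")
    if not reference:
        raise ValueError("images must not be empty")
    # Mean squared error over all pixels.
    mse = sum((r - d) ** 2 for r, d in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    clean = [100, 100, 100, 100]
    dehazed = [110, 90, 110, 90]
    print(f"PSNR: {psnr(clean, dehazed):.2f} dB")
```

Higher values indicate a restored image closer to the reference; identical images give an infinite PSNR under this definition.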
29 Monocular Multi-Layer Layout Estimation for Warehouse Racks
December 21,   11:00:00 to 11:15:00
Authors: Meher Shashwat Nigam (IIIT Hyderabad)*; Avinash P Prabhu (IIIT Hyderabad); Anurag Sahu (IIIT Hyderabad); Tanvi Karandikar (IIIT Hyderabad); Puru Gupta (International Institute of Information Technology, Hyderabad); Sai Shankar Narasimhan (IIIT Hyderabad); Ravi Kiran Sarvadevabhatla (IIIT Hyderabad); Madhava Krishna (IIIT-Hyderabad)
Abstract: Given a monocular colour image of a warehouse rack, we aim to predict the bird's-eye view layout for each shelf in the rack, which we term multi-layer layout prediction. To this end, we present RackLay, a deep neural network for real-time shelf layout estimation from a single image. Unlike previous layout estimation methods, which provide a single layout for the dominant ground plane alone, RackLay estimates the top-view and front-view layout for each shelf in the considered rack populated with objects. RackLay's architecture and its variants are versatile and estimate accurate layouts for diverse scenes characterized by a varying number of visible shelves in an image, a large range in shelf occupancy factor, and varied background clutter. Given the extreme paucity of datasets in this space and the difficulty involved in acquiring real data from warehouses, we additionally release a flexible synthetic dataset generation pipeline, WareSynth, which allows users to control the generation process and tailor the dataset to the application at hand. The ablations across architectural variants and comparison with strong prior baselines vindicate the efficacy of RackLay as an apt architecture for the novel problem of multi-layered layout estimation. We also show that fusing the top-view and front-view enables 3D reasoning applications such as metric free space estimation for the considered rack.
Presenting Author: Anurag Sahu
Lab/Author homepage: https://robotics.iiit.ac.in/
Code: https://github.com/Avinash2468/Layout_Estimation
Dataset: https://anuragsahu.github.io/WareSynth/
Paper: https://doi.org/10.1145/3490035.3490263
34 Cross-View Kernel Similarity Metric Learning Using Pairwise Constraints for Person Re-identification
December 21,   11:15:00 to 11:30:00
Authors: T M Feroz Ali (Indian Institute of Technology Bombay, Mumbai)*; Subhasis Chaudhuri (Indian Institute of Technology Bombay)
Abstract: Person re-identification is the task of matching pedestrian images across non-overlapping cameras. In this paper, we propose an efficient kernel-based similarity metric learning method that learns non-linear features from small-scale training data for practical person re-ID systems. The method employs non-linear mappings combined with cross-view discriminative subspace learning and cross-view distance metric learning based on pairwise similarity constraints. It is a natural extension of Cross-view Quadratic Discriminant Analysis (XQDA) from a linear to a non-linear model using kernels. In addition to outperforming XQDA, the proposed method is computationally very efficient compared to its baselines. Extensive experiments on four benchmark datasets show that our method attains competitive performance against state-of-the-art methods. Our code is available at https://github.com/ferozalitm/Efficient-Kernel-XQDA.
Presenting Author: T M Feroz Ali
Lab/Author homepage: http://www.ee.iitb.ac.in/~viplab/
Code: https://github.com/ferozalitm/Efficient-Kernel-XQDA
Paper: https://doi.org/10.1145/3490035.3490264
40 A Novel Unsupervised Thresholding Technique for Landsat Image Change Detection
December 21,   11:30:00 to 11:45:00
Authors: Neha Gupta (VIT-AP University)*
Abstract: Thresholding is the most widely used change detection technique for identifying changes in remote sensing images. However, most thresholding methods generate isolated spots in the final change map, which are usually reduced by applying a postprocessing step. This paper proposes a novel thresholding technique that addresses this problem without any postprocessing operation. The key point of the proposed technique is that it uses two thresholds simultaneously to generate the change map. These thresholds are calculated from initial indicators that are generated by local neighborhood mutual information. In addition, the approach compares the local statistics of the pixel of interest, rather than the pixel itself, with the derived thresholds, which enhances the robustness of the technique. In particular, the method rests on three components: generation of initial indicators, simultaneous use of two thresholds, and comparison of local statistics with the thresholds. Finally, the method is tested on multitemporal multispectral images from different sensors, and experimental results validate the effectiveness of the proposed method.
Presenting Author: Neha Gupta
Paper: https://doi.org/10.1145/3490035.3490267
97 SCNet: A Generalized Attention-based Model for Crack Fault Segmentation
December 21,   11:45:00 to 12:00:00
Authors: Hrishikesh Sharma (Tata Consultancy Services Ltd.); Prakhar Pradhan (TCS Research)*; Balamuralidhar P (Tata Consultancy Services)
Abstract: Anomaly detection and localization is an important vision problem with multiple applications. Effective and generic semantic segmentation of anomalous regions on various surfaces, where most anomalous regions inherently do not have any obvious pattern, is an active research problem. Periodic health monitoring and fault (anomaly) detection in vast infrastructures, which is an important safety-related task, is one such application area of anomaly segmentation. However, this task is quite challenging due to large variations in surface faults, texture-less construction material/background, lighting conditions, etc. Cracks are critical and frequent surface faults that manifest as extremely zigzag-shaped, thin, elongated regions. They are among the hardest faults to detect. In this work, we address an open aspect of the crack segmentation problem, that of generalizing and improving the performance of segmentation across a variety of scenarios. We carefully study and abstract the sub-problems involved and solve them in a broader context, making our solution generic. On a variety of datasets related to surveillance of different infrastructures, under varying conditions, our model consistently outperforms the state-of-the-art algorithms by a significant margin, without any bells and whistles. The model is expected to show similarly improved performance on (indoor) material quality inspection tasks as well.
Presenting Author: Prakhar Pradhan
Paper: https://doi.org/10.1145/3490035.3490281
120 Enhancing Label Transfer in Non-Parametric Scene Parsing by Superpixel-Based Dense Alignment
December 21,   12:00:00 to 12:15:00
Authors: Alexy Bhowmick (Assam Don Bosco University)*; Sarat Saharia (Tezpur University); Shyamanta Hazarika (IIT Guwahati)
Abstract: Contemporary (parametric) scene parsing methods are learning-based and mostly operate in a closed-universe scenario. We introduce a non-parametric scene parsing framework that is model-free, data-driven, and scales naturally to growing data. The scene parsing performance of the non-parametric approach depends on reliable dense correspondence or alignment across scenes for label transfer. Incorrect correspondence is known to adversely affect the scene parsing results. We propose a label transfer approach that relies on the dense correspondence of superpixel pairs (in a query and a candidate image) matched by a homogeneous kernel map to guide semantic label transfer. Multiple labels are aggregated (fused) through a simple heuristic scheme (simple majority voting). The Markov Random Field (MRF) provides a principled probabilistic framework for combining the disparate information in the smoothing stage and ensures plausible labeling results. Evaluation results show that our non-parametric system obtains competitive scene parsing performance on the standard SIFT Flow and MSRC-21 datasets.
Presenting Author: Alexy Bhowmick
Lab/Author homepage: https://iitg.ac.in/lab/brail/index.php
Paper: https://doi.org/10.1145/3490035.3490290
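The aggregation step mentioned in the abstract above (simple majority voting over the labels transferred to a superpixel) can be illustrated with a short sketch. This is not the authors' implementation; the names `majority_vote` and `transfer_labels`, the dictionary layout, and the tie-breaking rule are assumptions, and the superpixel matching and MRF smoothing stages are not shown.

```python
from collections import Counter


def majority_vote(candidate_labels):
    """Return the most frequent label among those proposed for one superpixel.

    Ties are resolved in favour of the label seen first (an assumption;
    the abstract does not specify a tie-breaking rule).
    """
    if not candidate_labels:
        raise ValueError("no candidate labels to aggregate")
    return Counter(candidate_labels).most_common(1)[0][0]


def transfer_labels(matches):
    """Aggregate labels per query superpixel.

    `matches` maps a hypothetical query superpixel id to the list of labels
    carried by the candidate superpixels matched to it.
    """
    return {sp_id: majority_vote(labels) for sp_id, labels in matches.items()}


if __name__ == "__main__":
    matches = {0: ["road", "road", "car"], 1: ["sky", "tree", "sky"]}
    print(transfer_labels(matches))
```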
128 Chart Classification: An Empirical Comparative Study of Different Learning Models
December 21,   12:15:00 to 12:30:00
Authors: Jennil Thiyam (IIT Guwahati)*; Ranbir Singh Sanasam (Indian Institute of Technology Guwahati); Prabin Bora (IIT Guwahati)
Abstract: Charts are powerful tools for visualizing and comparing data. The use of charts to represent information keeps growing because of their simple and aesthetically attractive structure. With the increase in the number of documents with various chart types, chart classification has become an important task for downstream applications such as chart data recovery, chart replenishment, etc. Though various studies on chart classification using different classification methods have been reported in the literature, three important concerns remain: small dataset size, a small number of chart types, and inconsistencies in the performance reported in different studies. Motivated by these concerns, this paper curates a large dataset of real chart images (110k samples) with a large number of chart types (24 chart types) and evaluates 21 different machine learning models. To the best of our knowledge, this is the largest (in sample size and chart types) real chart dataset reported in the literature to date. We further study (i) the effect of dataset size on the classification model, (ii) the nature of chart noises and their influence on classification performance, and (iii) confusing chart pairs leading to misclassification.
Presenting Author: Jennil Thiyam
Paper: https://doi.org/10.1145/3490035.3490291
177 Few-shot Classification Without Forgetting of Event-Camera Data
December 21,   12:30:00 to 12:45:00
Authors: Anik Goyal (Indian Institute of Science); Soma Biswas (Indian Institute of Science, Bangalore)*
Abstract: Unlike traditional cameras, event-based cameras capture changes in brightness in the form of asynchronous events, and their wide range of applications has sparked tremendous interest. In this work, we address, for the first time in the literature, the task of few-shot classification of event data without forgetting the base classes on which the model was initially trained. This relaxes not only the constraint of data availability from all possible classes before the initial model is trained, but also the constraint of capturing large amounts of training data for each of the classes we want to classify. The proposed framework has three main stages. First, we train the base classifier by augmenting the original event data using a data mixing technique, so that the feature extractor can better generalize to unseen classes. We also utilize an adaptive semantic similarity between the classifier weights. This guarantees that the margin between similar classes is greater than that between dissimilar classes, which in turn reduces confusion between similar classes. Second, weight imprinting is employed to learn the initial classifier weights for the new classes with few examples. Finally, we finetune the entire framework using a class-imbalance aware loss in an end-to-end manner. This is accomplished by converting the event data via a series of differentiable operations, which are then fed into our network. Extensive experiments on few-shot versions of two standard event-camera datasets justify the effectiveness of the proposed framework. We believe that this study will serve as a solid foundation for future work in this critical field.
Presenting Author: Anik Goyal
Paper: https://doi.org/10.1145/3490035.3490304
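The second stage in the abstract above uses weight imprinting to initialise classifier weights for new classes. A minimal sketch of the usual formulation is given below: the weight vector of a new class is the L2-normalized mean of the L2-normalized embeddings of its few examples. This is an illustration under that assumption, not the authors' code; the function names and the plain-list embedding representation are hypothetical, and the feature extractor that would produce the embeddings is not shown.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


def imprint_weight(embeddings):
    """Compute an initial classifier weight for one new class.

    `embeddings` holds the feature vectors of the few labelled examples of
    that class (assumed to come from an already trained feature extractor).
    """
    if not embeddings:
        raise ValueError("at least one example embedding is required")
    unit = [l2_normalize(e) for e in embeddings]
    dim = len(unit[0])
    # Average the unit-length embeddings, then renormalize the mean.
    mean = [sum(u[i] for u in unit) / len(unit) for i in range(dim)]
    return l2_normalize(mean)


if __name__ == "__main__":
    few_shot_embeddings = [[2.0, 0.0], [0.0, 5.0]]
    print(imprint_weight(few_shot_embeddings))
```

With unit-length weights and embeddings, the classifier score for a class reduces to a cosine similarity, which is why both normalization steps are applied in this sketch.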
