publications
2022
- ECCV. S3C: Self-Supervised Stochastic Classifiers for Few-Shot Class-Incremental Learning. Jayateja Kalla and Soma Biswas. 2022.
Few-shot class-incremental learning (FSCIL) aims to learn progressively about new classes with very few labeled samples, without forgetting the knowledge of already learnt classes. FSCIL suffers from two major challenges: (i) over-fitting on the new classes due to the limited amount of data, and (ii) catastrophic forgetting of the old classes due to the unavailability of data from these classes in the incremental stages. In this work, we propose a self-supervised stochastic classifier (S3C) (code: https://github.com/JAYATEJAK/S3C) to counter both these challenges in FSCIL. The stochasticity of the classifier weights (or class prototypes) mitigates the adverse effect not only of the absence of a large number of samples of the new classes, but also of the absence of samples from previously learnt classes during the incremental steps. This is complemented by the self-supervision component, which helps to learn features from the base classes that generalize well to unseen classes encountered in the future, thus reducing catastrophic forgetting. Extensive evaluation on three benchmark datasets using multiple evaluation metrics shows the effectiveness of the proposed framework. We also experiment on two additional realistic scenarios of FSCIL, namely where the amount of annotated data available for each of the new classes can be different, and where the number of base classes is much smaller, and show that the proposed S3C performs significantly better than the state-of-the-art for all these challenging scenarios.
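The idea of stochastic class prototypes in the abstract above can be illustrated with a minimal sketch. It assumes each prototype is a Gaussian with a per-dimension mean and standard deviation, sampled by reparameterization and scored by cosine similarity. These choices, and every name below (`sample_prototype`, `stochastic_logits`, the toy numbers), are assumptions for illustration and are not taken from the S3C code.

```python
import math
import random


def sample_prototype(mean, std, rng):
    """Draw w = mean + std * eps with eps ~ N(0, 1), element-wise."""
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mean, std)]


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def stochastic_logits(feature, means, stds, rng):
    """Score a feature against one freshly sampled prototype per class."""
    return [cosine(feature, sample_prototype(m, s, rng))
            for m, s in zip(means, stds)]


# Hypothetical two-class toy setup.
rng = random.Random(0)
means = [[1.0, 0.0], [0.0, 1.0]]
stds = [[0.05, 0.05], [0.05, 0.05]]
logits = stochastic_logits([0.9, 0.1], means, stds, rng)
```

With all standard deviations set to zero, the sketch reduces to an ordinary cosine classifier on the prototype means.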
- ECCVW. Improved Cross-Dataset Facial Expression Recognition by Handling Data Imbalance and Feature Confusion. Manogna Sreenivas and Soma Biswas. 2022.
Facial Expression Recognition (FER) models trained on one dataset (source) usually do not perform well on a different dataset (target) due to the implicit domain shift between different datasets. In addition, FER data is naturally highly imbalanced, with a majority of the samples belonging to a few expressions like neutral and happy, and relatively fewer samples coming from expressions like disgust, fear, etc., which makes the FER task even more challenging. This class imbalance of the source and target data (which may be different), along with other factors like the similarity of a few expressions, can result in unsatisfactory target classification performance due to confusion between the different classes. In this work, we propose an integrated module, termed DIFC, which can handle not only the source Data Imbalance, but also the Feature Confusion of the target data for improved classification of the target expressions. We integrate this DIFC module with an existing Unsupervised Domain Adaptation (UDA) approach to handle the domain shift and show that the proposed simple yet effective module can result in significant performance improvement on four benchmark datasets for the Cross-Dataset FER (CD-FER) task. We also show that the proposed module works across different architectures and can be used with other UDA baselines to further boost their performance.
2021
- ICVGIP. Selective Mixing and Voting Network for Semi-Supervised Domain Generalization. Ahmad Arfeen, Titir Dutta, and Soma Biswas. In Proceedings of the Twelfth Indian Conference on Computer Vision, Graphics and Image Processing, 2021.
Domain generalization (DG) addresses the problem of generalizing classification performance across any unknown domain, by leveraging training samples from multiple source domains. Currently, the training process of the state-of-the-art DG-methods is dependent on a large amount of labeled data. This restricts the application of the models in many real-world scenarios, where collecting and annotating a large dataset is an expensive and difficult task. Thus, in this paper, we address the problem of Semi-supervised Domain Generalization (SSDG), where the training set contains only a few labeled samples, in addition to a large number of unlabeled samples from multiple domains. This is relatively unexplored in the literature and poses a considerable challenge to the state-of-the-art DG models, since their performance degrades under such conditions. To address this scenario, we propose a novel Selective Mixing and Voting Network (SMV-Net), which effectively extracts useful knowledge from the set of unlabeled training data available to the model. First, we propose a mixing strategy on selected unlabeled samples, for which the model is confident about the predicted class labels, to achieve a domain-invariant representation of the data that generalizes effectively across any unseen domain. Second, we propose a voting module, which not only improves the generalization capability of the classifier, but can also comment on the prediction of the test samples, using references from a few labeled training examples, despite their domain gap. Finally, we introduce a test-time mixing strategy to re-look at the top class-predictions and re-order them if required to further boost the classification performance. Extensive experiments on two popular DG-datasets demonstrate the usefulness of the proposed framework.
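The selective mixing step described above can be sketched roughly as follows: keep only the unlabeled samples whose top predicted probability clears a threshold, then form a convex combination of two of them. The threshold, the mixup-style combination, the pairing rule and all names are assumptions made for illustration; the abstract does not specify SMV-Net's exact procedure.

```python
def select_confident(probs, threshold=0.9):
    """Return (index, predicted_class) for samples whose top probability
    meets the threshold."""
    selected = []
    for i, p in enumerate(probs):
        top = max(range(len(p)), key=lambda c: p[c])
        if p[top] >= threshold:
            selected.append((i, top))
    return selected


def mix(x_a, x_b, lam):
    """Convex combination of two feature vectors (mixup-style)."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]


# Toy unlabeled batch: feature vectors and made-up softmax outputs.
features = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
probs = [[0.95, 0.05], [0.40, 0.60], [0.08, 0.92]]

confident = select_confident(probs, threshold=0.9)
(i, _), (j, _) = confident[0], confident[1]
mixed = mix(features[i], features[j], lam=0.5)
```

Here the second sample is dropped because its top probability (0.60) is below the threshold, so only the first and third samples are mixed.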
- ICVGIP. Few-Shot Classification without Forgetting of Event-Camera Data. Anik Goyal and Soma Biswas. In Proceedings of the Twelfth Indian Conference on Computer Vision, Graphics and Image Processing, 2021.
Event-based cameras can capture changes in brightness in the form of asynchronous events, unlike traditional cameras, which has sparked tremendous interest due to their wide range of applications. In this work, we address, for the first time in the literature, the task of few-shot classification of event data without forgetting the base classes on which the model has been initially trained. This relaxes not only the constraint of data availability from all possible classes before the initial model is trained, but also the constraint of capturing large amounts of training data for each of the classes we want to classify. The proposed framework has three main stages. First, we train the base classifier by augmenting the original event data using a data mixing technique, so that the feature extractor can better generalize to unseen classes. We also utilize an adaptive semantic similarity between the classifier weights. This guarantees that the margin between similar classes is greater than that between dissimilar classes, which in turn reduces confusion between similar classes. Second, weight imprinting is employed to learn the initial classifier weights for the new classes with few examples. Finally, we finetune the entire framework using a class-imbalance aware loss in an end-to-end manner. This is accomplished by converting the event data via a series of differentiable operations, which are then fed into our network. Extensive experiments on few-shot versions of two standard event-camera datasets justify the effectiveness of the proposed framework. We believe that this study will serve as a solid foundation for future work in this critical field.
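The weight imprinting stage mentioned above can be sketched in its commonly used form: the classifier weight for a new class is the re-normalized mean of the L2-normalized embeddings of its few examples, and prediction uses cosine similarity. This is a generic sketch with hypothetical names and toy embeddings; the data mixing, the adaptive margin and the fine-tuning loss of the paper are not shown.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def imprint_weight(embeddings):
    """Average the L2-normalized embeddings of a new class and re-normalize."""
    normed = [l2_normalize(e) for e in embeddings]
    dim = len(normed[0])
    mean = [sum(e[d] for e in normed) / len(normed) for d in range(dim)]
    return l2_normalize(mean)


def predict(weights, embedding):
    """Pick the class whose weight vector has the highest cosine similarity."""
    q = l2_normalize(embedding)
    scores = [sum(w_d * q_d for w_d, q_d in zip(w, q)) for w in weights]
    return max(range(len(scores)), key=lambda c: scores[c])


# Two hypothetical base-class weights plus one class imprinted from two shots.
base_weights = [l2_normalize([1.0, 0.0]), l2_normalize([-1.0, 0.0])]
new_class_shots = [[0.1, 2.0], [-0.1, 3.0]]
weights = base_weights + [imprint_weight(new_class_shots)]
```

Because the new weight is appended to the existing ones, the base classes keep their original weights, which is what lets the classifier cover old and new classes together before any fine-tuning.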
- ICVGIP. CT-DANN: Co-Teaching Meets DANN for Wild Unsupervised Domain Adaptation. Rahul Bansal and Soma Biswas. In Proceedings of the Twelfth Indian Conference on Computer Vision, Graphics and Image Processing, 2021.
Unsupervised domain adaptation aims at leveraging supervision from an annotated source domain for performing tasks like classification/segmentation on an unsupervised target domain. However, a large enough related dataset with clean annotations may not always be available in real scenarios, since annotations are usually obtained from crowdsourcing, and thus are noisy. Here, we consider a more realistic and challenging setting, wild unsupervised domain adaptation (WUDA), where the source domain annotations can be noisy. Standard domain adaptation approaches, which directly use these noisy source labels and the unlabeled targets for the domain adaptation task, perform poorly due to severe negative transfer from the noisy source domain. In this work, we propose a novel end-to-end framework, termed CT-DANN (Co-teaching meets DANN), which seamlessly integrates a state-of-the-art approach for handling noisy labels (Co-teaching) with a standard domain adaptation framework (DANN). CT-DANN effectively utilizes all the source samples after accounting for both their noisy labels and their transferability with respect to the target domain. Extensive experiments on three benchmark datasets with different types and levels of noise, and comparison with a state-of-the-art WUDA approach, justify the effectiveness of the proposed framework.
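For readers unfamiliar with Co-teaching, its core step can be sketched as below: two networks each rank the samples of a mini-batch by loss, and each network is updated on the small-loss samples selected by its peer. The loss values, the keep ratio and the function names are illustrative assumptions. How CT-DANN weighs transferability and couples this step with DANN is not shown.

```python
def small_loss_indices(losses, keep_ratio):
    """Indices of the keep_ratio fraction of samples with the smallest loss."""
    n_keep = max(1, int(keep_ratio * len(losses)))
    order = sorted(range(len(losses)), key=lambda i: losses[i])
    return sorted(order[:n_keep])


def co_teaching_exchange(losses_a, losses_b, keep_ratio):
    """Each network picks its small-loss samples; the peer trains on them."""
    train_b_on = small_loss_indices(losses_a, keep_ratio)  # chosen by A, used by B
    train_a_on = small_loss_indices(losses_b, keep_ratio)  # chosen by B, used by A
    return train_a_on, train_b_on


# Made-up per-sample losses of two networks on the same mini-batch of four.
losses_a = [0.2, 2.5, 0.1, 0.4]
losses_b = [0.3, 0.2, 1.9, 0.5]
train_a_on, train_b_on = co_teaching_exchange(losses_a, losses_b, keep_ratio=0.5)
```

The samples with large loss under one network (index 1 for A, index 2 for B) are treated as likely mislabeled and withheld from the peer's update.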
- PRL. SML: Semantic meta-learning for few-shot semantic segmentation. Ayyappa Kumar Pambala, Titir Dutta, and Soma Biswas. Pattern Recognition Letters, 2021.
The significant amount of training data required for training Convolutional Neural Networks has become a bottleneck for applications like semantic segmentation. Few-shot semantic segmentation algorithms address this problem, with the aim of achieving good performance in the low-data regime, with few annotated training images. Recent approaches based on class-prototypes computed from available training data have achieved immense success for this task. In this work, we propose a novel meta-learning framework, Semantic Meta-Learning (SML), which incorporates class-level semantic descriptions in the generated prototypes for this problem. In addition, we propose to use the well-established technique of ridge regression, not only to bring in the class-level semantic information, but also to effectively utilise the information available from multiple images present in the training data for prototype computation. This has a simple closed-form solution, and thus can be implemented easily and efficiently. Extensive experiments on the benchmark PASCAL-5i dataset under different experimental settings demonstrate the effectiveness of the proposed framework.
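The closed-form solution referred to above is the standard ridge regression estimate W = (XᵀX + λI)⁻¹XᵀY. The sketch below computes it on toy data using a small Gauss-Jordan solver so that it needs no external libraries. The inputs `x` and `y`, the value of `lam` and the helper names are assumptions; how SML builds its regression inputs from image features and semantic descriptions is not shown.

```python
def transpose(m):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * pv for v, pv in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ridge(x, y, lam):
    """Closed-form ridge solution W = (X^T X + lam * I)^{-1} X^T Y."""
    xt = transpose(x)
    gram = matmul(xt, x)
    for i in range(len(gram)):
        gram[i][i] += lam
    return solve(gram, matmul(xt, y))


# Toy data: three samples, two features, one target column.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [[1.0], [2.0], [3.0]]
w = ridge(x, y, lam=1.0)
```

Solving the regularized normal equations directly avoids forming an explicit inverse, and the regularizer λ keeps the system well conditioned when only a few training images are available.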