January 21, 2021

Top 3 Papers from NeurIPS 2020

The Neural Information Processing Systems annual meeting (better known as NeurIPS) is widely regarded as the world's most important AI conference. Every year, researchers working on neural information processing systems present their findings in areas related to the biological, technological, mathematical, and theoretical sciences. Here are our favorite papers presented at NeurIPS 2020.



Is normalization indispensable for training deep neural network?

When training a classifier or detector, batch normalization (batchnorm) is commonly applied to mitigate vanishing/exploding gradients. However, batchnorm can hurt model performance when training with a very small batch size, because the per-batch statistics become noisy. When limited GPU memory forces a small batch size, we would normally adopt synchronized batchnorm, which aggregates batchnorm statistics across multiple GPUs to mitigate the small-batch problem at the cost of longer training time: the synchronization between GPUs is not free, and it lowers GPU utilization.
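
To make the aggregation step concrete, here is a minimal sketch in plain Python of how per-device statistics can be combined into one global mean and variance. The function names and the toy batches are hypothetical, and a real implementation would exchange these sums between GPUs with a collective operation instead of a loop.

```python
def local_stats(values):
    """Return (count, sum, sum of squares) for one device's mini-batch."""
    return len(values), sum(values), sum(v * v for v in values)


def synced_mean_var(per_device_batches):
    """Combine per-device statistics into a global mean and biased variance."""
    count = total = total_sq = 0.0
    for batch in per_device_batches:
        c, s, sq = local_stats(batch)
        count += c
        total += s
        total_sq += sq
    mean = total / count
    var = total_sq / count - mean * mean  # E[x^2] - (E[x])^2
    return mean, var


# Four hypothetical devices, each holding a batch of only two activations.
batches = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
mean, var = synced_mean_var(batches)
print(mean, var)
```

Each device only has to share three numbers, but every normalization layer has to wait for that exchange on every forward pass, which is where the extra training time comes from. In PyTorch, for instance, `torch.nn.SyncBatchNorm.convert_sync_batchnorm(model)` swaps existing BatchNorm layers for synchronized ones.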

This paper tackles the vanishing/exploding gradient problem in a new way: a new residual operation, 'RescaleNet', which adds one more hyperparameter to stabilize ResNet-like layers. It redefines the residual connection with minimal changes to the original residual operation, and in the paper's detection/segmentation experiment tables RescaleNet performs better than models trained with BatchNorm.
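
The intuition behind rescaling a residual connection can be sketched with simple variance bookkeeping. The example below is an illustration under stated assumptions, not the paper's exact formulation: it assumes the residual branch preserves the variance of its input and is uncorrelated with the skip path, and it uses hypothetical weights `alpha` and `beta` with `alpha**2 + beta**2 == 1`, controlled by a single assumed scalar hyperparameter `c`. The precise coefficients should be checked against the paper.

```python
import math


def rescale_weights(k, c=1.0):
    """Hypothetical weights for block k (1-indexed); alpha^2 + beta^2 = 1."""
    alpha = math.sqrt((k - 1 + c) / (k + c))
    beta = math.sqrt(1.0 / (k + c))
    return alpha, beta


def propagate_variance(num_blocks, rescaled, c=1.0):
    """Track activation variance through a stack of residual blocks."""
    var = 1.0
    for k in range(1, num_blocks + 1):
        # Assumption: the branch output has the same variance as its input
        # and is uncorrelated with the skip path.
        branch_var = var
        if rescaled:
            alpha, beta = rescale_weights(k, c)
            var = alpha ** 2 * var + beta ** 2 * branch_var
        else:
            var = var + branch_var  # plain residual: x + F(x)
    return var


plain = propagate_variance(10, rescaled=False)
scaled = propagate_variance(10, rescaled=True)
print(plain, scaled)
```

Under these assumptions the plain residual stack doubles the variance at every block, while the rescaled stack keeps it constant without any normalization layer.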

Uncertainty-aware Self-training for Few-shot Text Classification

Pseudo-labeling has been widely used for model training in semi-supervised learning. Given a small amount of labeled data for each class, a good pre-trained base model, and a large pool of unlabeled data, the goal is to make better use of all the data through self-training. The authors propose three key ideas: better uncertainty estimation on the pre-trained (teacher) model, better sample selection based on the teacher's uncertainty estimates, and confident learning on the student side, which places more emphasis on low-variance examples.

The authors use Monte Carlo Dropout for uncertainty estimation, which gives deep neural network inference a Bayesian interpretation: dropout stays active at inference time, and the spread of predictions across several stochastic forward passes serves as the uncertainty estimate. Example selection then uses these estimates to choose the most and least confused examples for self-training.
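
As a rough illustration of Monte Carlo Dropout, the sketch below runs a toy logistic "teacher" several times with dropout left on and uses the variance of its predictions to rank unlabeled examples. The model, its weights, the example pool, and the ranking rule are all hypothetical stand-ins; the paper works with pre-trained language models and its own selection criteria.

```python
import math
import random


def dropout_forward(x, weights, bias, p, rng):
    """One stochastic pass: drop each input with probability p (inverted dropout)."""
    logit = bias
    for xi, wi in zip(x, weights):
        if rng.random() >= p:
            logit += wi * xi / (1.0 - p)
    return 1.0 / (1.0 + math.exp(-logit))


def mc_dropout(x, weights, bias, p=0.5, passes=50, seed=0):
    """Return mean and variance of the predicted probability over several passes."""
    rng = random.Random(seed)
    probs = [dropout_forward(x, weights, bias, p, rng) for _ in range(passes)]
    mean = sum(probs) / passes
    var = sum((q - mean) ** 2 for q in probs) / passes
    return mean, var


# Hypothetical toy teacher and unlabeled pool.
weights, bias = [1.5, -2.0, 0.7], 0.1
pool = {"a": [0.0, 0.0, 0.0], "b": [1.0, 1.0, 1.0], "c": [0.2, 0.1, 0.0]}

scores = {name: mc_dropout(x, weights, bias) for name, x in pool.items()}
# Rank from least to most uncertain (lowest to highest predictive variance).
ranked = sorted(scores, key=lambda name: scores[name][1])
print(ranked)
```

Examples at either end of `ranked` correspond to the least and most confused cases, and the per-example variance could also be used to down-weight noisy pseudo-labels when training the student.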

Though the proposed method has only been evaluated on text models, we think it has the potential to make a difference in computer vision tasks as well.

Distribution Matching for Crowd Counting

Crowd counting, if viewed as a workflow application, can be done by detecting and then counting. However, this does not lead to good performance because detectors are not robust to heavy occlusion. If it is instead viewed as a use case that needs a purpose-built model, there are multiple ways to tackle it: a regression model that takes in pixels and outputs a count, density map estimation, or distribution matching.

This paper views crowd counting as a distribution matching problem: the model takes in an image and outputs a map of density values, and summing the density map gives the final count estimate. The approach described by the authors can be applied to any network architecture. By combining the counting loss, the optimal transport loss, and the total variation loss, the authors report that the model outperforms state-of-the-art methods by a large margin, especially on large-scale and challenging datasets.
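
To show how a density map turns into a count and why a counting term alone is not enough, here is a minimal sketch of two of the three losses on a toy 2x2 map. The function names and the example maps are hypothetical, the loss definitions are our reading of the general idea and may differ in detail from the paper, and the optimal transport term is omitted because it needs an iterative solver.

```python
def count_from_density(density):
    """The count estimate is the sum of all density values."""
    return sum(sum(row) for row in density)


def counting_loss(pred, target):
    """Absolute difference between predicted and ground-truth counts."""
    return abs(count_from_density(pred) - count_from_density(target))


def total_variation_loss(pred, target):
    """Half the L1 distance between the two maps after normalizing each to sum to 1.

    Assumes both maps have a non-zero total.
    """
    p_total = count_from_density(pred)
    t_total = count_from_density(target)
    diff = 0.0
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            diff += abs(p / p_total - t / t_total)
    return 0.5 * diff


# Toy example: the ground truth has two people, one per marked cell.
pred = [[0.5, 0.5], [1.0, 0.0]]
target = [[0.0, 1.0], [1.0, 0.0]]

count = count_from_density(pred)
c_loss = counting_loss(pred, target)
tv_loss = total_variation_loss(pred, target)
print(count, c_loss, tv_loss)
```

Here the predicted count is exactly right, so the counting loss is zero, yet the density is partly in the wrong place. The distribution-level term still penalizes that, which is the motivation for matching distributions instead of matching counts alone.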