VALSE2020重要主题年度进展回顾二:视觉生成GAN,细粒度识别,自监督学习

2020/08/14 VALSE2020 共 3014 字,约 9 分钟

VALSE2020 重要主题年度进展回顾 (APR)

视觉生成GAN

视觉生成背景介绍

视觉生成的应用:老电影着色,破损照片修复,真人变身漫画,角色AR化,医学研究

判别模型(CNN, SVM) vs 生成模型(VAE, GAN)

视觉生成典型问题及进展

噪声–>图像

图像–>图像

2D图像–>3D图像

文本–>3D图像

图像–>视频

视频–>视频

文本–>视频

视觉生成挑战及未来趋势

讲者(华南理工大学Mingkui Tan)相关工作


细粒度视觉识别

1. By localization-classification subnetworks

先定位(localize) pattern region,然后通过 pattern-level subnetworks 进行 feature fusion,从而更加鲁棒地细粒度特征表示。

  • Mask-CNN (Xiushen Wei)
  • Selective sparse sampling for fine-grained image recognition (ICCV2019, 中科院)
  • Filtration and Distillation (AAAI2020, 中科院)
  • Graph-propagation based correlation learning for weakly supervised fine-grained image classification (AAAI2020, 大连理工)
  • weakly supervised fine-grained image classification via gaussian mixture model oriented discriminative learning (CVPR2020, 大连理工)

2. By end-to-end feature encoding

网络内部的特征交互(cross-layer),针对某些特定任务设计特定的损失函数,从而实现端到端的细粒度识别模型。

  • Bilinear CNN
  • Learning deep bilinear transformation for fine-grained image representation (NIPS2019)
  • cross-X learning for fine-grained visual categorization (ICCV2019)
  • fine-grained recognition: accounting for subtle differences between similar classes (AAAI2020) 通过 diversification block + gradient boosting 使网络聚焦在易错分的类别上
  • fine-grained image-to-image transformation towards visual recognition (CVPR2020) 细粒度图像的GAN

3. By leveraging attention mechanisms

  • look closer to see better (CVPR2017) 多尺度注意力机制
  • learning a mixture of granularity-specific experts for fine-grained categorization (ICCV2019) 多粒度的 experts 系统。
  • attention convolutional binary neural tree for fine-grained visual categorization (CVPR2020) 注意力机制 + 树结构

4. By contrastive learning manners

  • learning attentive pairwise interaction for fine-grained classification (AAAI2020) 输入为一对图片,mutual vector learning + gate vector generation + pairwise interaction

  • channel interaction networks for fine-grained image categorization (AAAI2020)

5. Recognition with web data

  • web-supervised network with softly update-drop training for fine-grained visual classification (AAAI2020) 通过交叉熵检测出噪声大的图像

6. Recognition with limited data

  • piecewise classifier mapping: learning fine-grained learners for novel categories with few examples
  • multi-attention meta learning for few-shot fine-grained image recognition (IJCAI2020)
  • revisiting pose-normalization for fine-grained few-shot recognition (CVPR2020)

7. Fine-grained datasets

  • RPC: A large-scale retial product checkout dataset (arxiv2019)

自监督学习

Background: Self-supervised

预训练模型本身没有直接的应用价值,不管是监督/无监督的自监督学习算法,都是为了下游任务。

自监督好处:

  • human designed labels may be sub-optimal
  • unleash the power of untapped data
  • how human may learn

Self-supervised learning on ImageNet

预训练模型在ImageNet上进行图像分类几乎达到监督学习的精度。

个体/样本分类:对单一图片进行图像增强,对于具有1M张图片的数据集,我们拥有了1MxN张图片,对应了1M个类别。

Self-supervised > Supervised

上图中,自监督学习得到的预训练模型在下游任务中更有效!

上图文章给出的三个结论:

  1. 迁移学习迁移的主要是底层特征而不是高层的语义特征。
  2. 无监督学习可以保持更好的空间位置关系。有监督的预训练模型会损失空间信息。
  3. 提出了更好的监督学习方法

Self-supervised learning via GAN

  • BigBiGAN 2019 (imagenet top1 55.6%)
  • Image GPT 2020 (imagenet top1 69.0%)

Self-supervised learning on Videos

Self-supervised learning on multi-modalities

采样牛的图片和声音,不同的牛虽然RGB图片不同但是叫声是相似的,所以可以通过音频信息辅助视觉信息的学习。

时序上不同的audio和RGB都是负标签。

Future Directions

  1. designing better pretrained models for downstream tasks (as opposed to focusing on ImageNet classification)
  2. multi-modality self-supervised pretraining and applications
  3. smarter ways to automatically collect useful data/labels

文档信息

Search

    Table of Contents