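VALSE2020 Annual Progress Review (APR) of Important Topics

Visual generation with GANs

Background on visual generation

Applications of visual generation: colorizing old films, restoring damaged photos, turning real people into cartoon characters, bringing characters into AR, medical research

Discriminative models (CNN, SVM) vs generative models (VAE, GAN)

Typical problems and progress in visual generation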

Noise -> image

Image -> image

2D image -> 3D image

Text -> 3D image

Image -> video

Video -> video

Text -> video

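Challenges and future trends in visual generation

Related work by the speaker (Mingkui Tan, South China University of Technology)

Fine-grained visual recognition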

1. By localization-classification subnetworks

First localize the pattern regions, then fuse features through pattern-level subnetworks to obtain a more robust fine-grained feature representation.

Mask-CNN (Xiushen Wei)
Selective sparse sampling for fine-grained image recognition (ICCV2019, Chinese Academy of Sciences)
Filtration and Distillation (AAAI2020, Chinese Academy of Sciences)
Graph-propagation based correlation learning for weakly supervised fine-grained image classification (AAAI2020, Dalian University of Technology)
Weakly supervised fine-grained image classification via Gaussian mixture model oriented discriminative learning (CVPR2020, Dalian University of Technology)

2. By end-to-end feature encoding

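These methods rely on feature interactions inside the network (cross-layer) and on loss functions designed for specific tasks, which yields end-to-end fine-grained recognition models.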

Bilinear CNN
Learning deep bilinear transformation for fine-grained image representation (NeurIPS2019)
Cross-X learning for fine-grained visual categorization (ICCV2019)
Fine-grained recognition: accounting for subtle differences between similar classes (AAAI2020): uses a diversification block + gradient boosting to focus the network on easily confused classes
Fine-grained image-to-image transformation towards visual recognition (CVPR2020): a GAN for fine-grained images
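To make the feature-interaction idea behind Bilinear CNN concrete, here is a minimal sketch of bilinear pooling in plain Python: the outer product of two feature streams is summed over spatial locations, followed by a signed square root and L2 normalization. The function name `bilinear_pool` and the toy inputs are illustrative assumptions, not code from any of the papers listed above.

```python
import math

def bilinear_pool(feats_a, feats_b):
    """Bilinear pooling sketch (hypothetical helper, not a library API).

    feats_a: per-location feature vectors from stream A (each of length Ca)
    feats_b: per-location feature vectors from stream B (each of length Cb)
    Returns a flat descriptor of length Ca * Cb.
    """
    if len(feats_a) != len(feats_b):
        raise ValueError("both streams must cover the same locations")
    ca, cb = len(feats_a[0]), len(feats_b[0])
    pooled = [0.0] * (ca * cb)
    # Sum the outer product fa * fb^T over all spatial locations.
    for fa, fb in zip(feats_a, feats_b):
        for i in range(ca):
            for j in range(cb):
                pooled[i * cb + j] += fa[i] * fb[j]
    # Signed square root, then L2 normalization.
    pooled = [math.copysign(math.sqrt(abs(v)), v) for v in pooled]
    norm = math.sqrt(sum(v * v for v in pooled))
    return [v / norm for v in pooled] if norm > 0 else pooled

# Toy example: 2 locations, stream A has 2 channels, stream B has 3.
descriptor = bilinear_pool(
    [[1.0, 2.0], [0.0, 1.0]],
    [[1.0, 0.0, -1.0], [2.0, 2.0, 0.0]],
)
```

In a real network the two streams would be CNN feature maps and the descriptor would feed a classifier; the descriptor size grows as Ca * Cb, which is one reason later work looks for more compact bilinear forms.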

3. By leveraging attention mechanisms

Look closer to see better (CVPR2017): multi-scale attention mechanism
Learning a mixture of granularity-specific experts for fine-grained categorization (ICCV2019): a system of experts at multiple granularities
Attention convolutional binary neural tree for fine-grained visual categorization (CVPR2020): attention mechanism + tree structure

4. By contrastive learning manners

Learning attentive pairwise interaction for fine-grained classification (AAAI2020): takes a pair of images as input; mutual vector learning + gate vector generation + pairwise interaction
Channel interaction networks for fine-grained image categorization (AAAI2020)

5. Recognition with web data

Web-supervised network with softly update-drop training for fine-grained visual classification (AAAI2020): uses cross-entropy to detect highly noisy images
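As a rough illustration of using cross-entropy to spot noisy web images, the sketch below ranks samples by their per-sample cross-entropy against the web label and flags the highest-loss fraction. It does not reproduce the paper's softly update-drop training; the function `flag_noisy` and the `drop_ratio` parameter are hypothetical, chosen only for this example.

```python
import math

def cross_entropy(probs, label):
    """Per-sample cross-entropy of predicted probabilities against a label."""
    return -math.log(max(probs[label], 1e-12))

def flag_noisy(pred_probs, web_labels, drop_ratio=0.25):
    """Flag the drop_ratio fraction of samples with the highest loss.

    Assumption for this sketch: a high cross-entropy against the web label
    suggests the label (or the image) is noisy.
    """
    losses = [cross_entropy(p, y) for p, y in zip(pred_probs, web_labels)]
    n_drop = int(len(losses) * drop_ratio)
    ranked = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    return sorted(ranked[:n_drop]), losses

# Toy predictions for 4 web images over 2 classes.
pred_probs = [[0.9, 0.1], [0.2, 0.8], [0.05, 0.95], [0.6, 0.4]]
web_labels = [0, 1, 0, 0]
noisy, losses = flag_noisy(pred_probs, web_labels)
```

Here the third image is flagged, because the model assigns only 0.05 probability to its web label.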

6. Recognition with limited data

Piecewise classifier mapping: learning fine-grained learners for novel categories with few examples
Multi-attention meta learning for few-shot fine-grained image recognition (IJCAI2020)
Revisiting pose-normalization for fine-grained few-shot recognition (CVPR2020)

7. Fine-grained datasets

RPC: A large-scale retail product checkout dataset (arXiv2019)

Self-supervised learning

Background: Self-supervised

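A pretrained model has no direct application value by itself. Whether the pretraining is supervised or unsupervised (self-supervised), its purpose is to serve downstream tasks.

Benefits of self-supervision:

human-designed labels may be sub-optimal
unleash the power of untapped data
how humans may learn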

Self-supervised learning on ImageNet

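On ImageNet image classification, self-supervised pretrained models come close to the accuracy of supervised learning.

Instance (sample) classification: apply data augmentation to each image individually. For a dataset of 1M images, N augmentations per image give 1M x N images that belong to 1M classes, one class per original image.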
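The instance-classification setup can be sketched as follows: every original image becomes its own class, and each augmented view inherits the index of its source image as the label. The helper names and the toy flip augmentation are assumptions for illustration; real pipelines use much richer augmentations (crops, color jitter, and so on).

```python
import random

def random_flip(image, rng):
    """Toy augmentation: flip the image horizontally with probability 0.5."""
    if rng.random() < 0.5:
        return [row[::-1] for row in image]
    return [row[:] for row in image]

def build_instance_dataset(images, n_views, seed=0):
    """Return (augmented_view, instance_label) pairs.

    With M images and n_views augmentations each, this yields M * n_views
    samples spread over M classes (one class per original image).
    """
    rng = random.Random(seed)
    samples = []
    for idx, image in enumerate(images):
        for _ in range(n_views):
            samples.append((random_flip(image, rng), idx))
    return samples

# Toy dataset: 5 tiny 2x2 "images", 3 views each.
images = [[[i, i + 1], [i + 2, i + 3]] for i in range(5)]
samples = build_instance_dataset(images, n_views=3)
```

A classifier (or a contrastive objective) trained on these pairs has to pull views of the same image together, which is what makes the labels free.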

Self-supervised > Supervised

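In the figure above, the pretrained model obtained through self-supervised learning is more effective on downstream tasks!

The paper behind the figure above draws three conclusions:

Transfer learning mainly transfers low-level features rather than high-level semantic features.
Unsupervised learning preserves spatial location relationships better, while supervised pretrained models lose spatial information.
It proposes a better supervised learning method.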

Self-supervised learning via GAN

BigBiGAN 2019 (ImageNet top-1 55.6%)
Image GPT 2020 (ImageNet top-1 69.0%)

Self-supervised learning on Videos

Self-supervised learning on multi-modalities

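Take sampled images and sounds of cows as an example: different cows look different in RGB images, but their calls are similar, so audio information can assist the learning of visual information.

Audio and RGB taken from different points in time are treated as negatives.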
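A minimal sketch of how such audio-visual training pairs might be labeled: a frame and an audio clip from the same time step form a positive pair, and any frame paired with audio from a different time step is a negative. The function `make_av_pairs` and the index-based representation are assumptions for illustration; the talk did not specify a concrete sampling scheme.

```python
def make_av_pairs(num_steps):
    """Enumerate (frame_time, audio_time, label) triples for one video.

    label is 1 when frame and audio come from the same time step (positive)
    and 0 when they come from different time steps (negative).
    """
    pairs = []
    for t_frame in range(num_steps):
        for t_audio in range(num_steps):
            label = 1 if t_frame == t_audio else 0
            pairs.append((t_frame, t_audio, label))
    return pairs

# Toy video with 3 time steps.
pairs = make_av_pairs(3)
positives = [p for p in pairs if p[2] == 1]
```

In practice negatives are usually subsampled, since the number of mismatched pairs grows quadratically with the number of time steps.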

Future Directions

designing better pretrained models for downstream tasks (as opposed to focusing on ImageNet classification)
multi-modality self-supervised pretraining and applications
smarter ways to automatically collect useful data/labels

Document information

Author: Mengqi Cao
Link: https://rogercmq.github.io//2020/08/14/%E8%AE%B2%E5%BA%A7%E7%AC%94%E8%AE%B0-VALSE2020%E9%87%8D%E8%A6%81%E4%B8%BB%E9%A2%98%E5%B9%B4%E5%BA%A6%E8%BF%9B%E5%B1%95%E5%9B%9E%E9%A1%BE%E4%BA%8C/
Copyright: free to share, non-commercial, no derivatives, attribution required (Creative Commons 3.0 license)

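VALSE2020 Annual Progress Review of Important Topics, Part 2: visual generation with GANs, fine-grained recognition, self-supervised learning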