VGGNet

Abstract

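The network is made considerably deeper; in the ImageNet Challenge 2014 (ILSVRC-2014) it took first place in the localization task and second place in the classification task.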

Introduction

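The best-performing ILSVRC-2013 submissions used a smaller receptive window size and a smaller stride in the first convolutional layer.
This paper uses smaller filters throughout, which allows the depth of the network (the number of convolutional layers) to be increased steadily.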

Architecture

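Basic architecture

224×224 input with the mean RGB value subtracted
3×3 and 1×1 convolutions with padding that preserves the spatial size; some convolutional layers are followed by 2×2 max-pooling
Three fully connected layers (the first two with 4096 channels each, the third with 1000) followed by one softmax layer
All hidden layers use ReLU as the activation function
LRN was dropped because it brought no improvement
Figures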
![vggNet Architecture][1]
![number of parameters][2]
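
To make the architecture and parameter figures concrete, here is a minimal sketch that traces the spatial size and counts the parameters of one configuration. It assumes the 16-layer configuration (configuration D in the paper), 3×3 convolutions with padding 1, and bias terms on every layer; the names `VGG16_CFG`, `FC_SIZES`, and `trace_vgg` are made up for this illustration.

```python
# Configuration D (VGG-16), assumed here for illustration:
# integers are 3x3 conv output channels, "M" is a 2x2 max-pool with stride 2.
VGG16_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512, "M"]
FC_SIZES = [4096, 4096, 1000]


def trace_vgg(cfg, input_size=224, in_channels=3, fc_sizes=FC_SIZES):
    """Return (final spatial side, final channels, total parameter count)."""
    size, channels, params = input_size, in_channels, 0
    for item in cfg:
        if item == "M":
            size //= 2  # 2x2 max-pool with stride 2 halves each spatial side
        else:
            # A 3x3 conv with padding 1 keeps the spatial size unchanged.
            params += 3 * 3 * channels * item + item  # weights + biases
            channels = item
    features = size * size * channels  # flattened input to the first FC layer
    for out_features in fc_sizes:
        params += features * out_features + out_features
        features = out_features
    return size, channels, params


size, channels, params = trace_vgg(VGG16_CFG)
print(size, channels, params)  # 7 512 138357544
```

Under these assumptions the final feature map is 7×7×512 and the total is roughly 138 million parameters, most of them in the first fully connected layer.
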
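Discussion
A stack of three 3×3 layers has the same 7×7 receptive field as a single 7×7 layer, while using fewer parameters and adding more non-linearities
1×1 convolutions keep the number of channels unchanged and add non-linearity without affecting the receptive field, which increases the capacity of the model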
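
The receptive-field and parameter claims above can be checked with a few lines of arithmetic. This is a small sketch, assuming stride-1 convolutions, the same number of channels `C` in and out of every layer, and no bias terms; the function names and the value of `C` are chosen only for illustration.

```python
def stacked_receptive_field(kernel_sizes):
    """Receptive field of a stack of stride-1 convolutions."""
    field = 1
    for k in kernel_sizes:
        field += k - 1  # each layer widens the field by (k - 1)
    return field


def conv_weights(kernel_size, channels):
    """Weight count of one conv layer with `channels` in and out (no bias)."""
    return kernel_size * kernel_size * channels * channels


C = 512  # hypothetical channel count, used only for illustration
stack = 3 * conv_weights(3, C)  # three 3x3 layers: 27 * C^2 weights
single = conv_weights(7, C)     # one 7x7 layer:    49 * C^2 weights

print(stacked_receptive_field([3, 3, 3]))  # 7
print(stack, single)                       # 7077888 12845056
```

The stack needs 27/49 (about 55%) of the weights of the single 7×7 layer, and it applies three non-linearities instead of one.
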

Classification Task

Training is similar to AlexNet and was stopped after 74 epochs.

We conjecture that in spite of the larger number of parameters and the greater depth of our nets compared to (Krizhevsky et al., 2012), the nets required less epochs to converge due to (a) implicit regularisation imposed by greater depth and smaller conv. filter sizes; (b) pre-initialisation of certain layers.

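Initialization:
A shallower network is first trained from random initialization; its trained weights (the first four convolutional layers and the last three fully connected layers) are then used to initialize the deeper networks, whose remaining layers are initialized randomly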

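Image preprocessing:
isotropically-rescaled: the image is scaled uniformly in both directions, so the aspect ratio is preserved
The image is first rescaled so that its shorter side equals S, with S in [256, 512], and a random 224×224 region is then cropped from it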
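
The rescale-then-crop step can be sketched as plain size arithmetic, without touching pixel data. The sketch below assumes S is drawn uniformly from [256, 512] and that the rescaled size is rounded to the nearest integer; the function names are hypothetical.

```python
import random


def isotropic_rescale_size(width, height, s):
    """Size after rescaling so the shorter side equals s, keeping aspect ratio."""
    scale = s / min(width, height)
    # max(s, ...) guards against a float rounding the shorter side below s.
    return max(s, round(width * scale)), max(s, round(height * scale))


def random_crop_box(width, height, crop=224, rng=random):
    """Pick a random crop x crop window; returns (left, top, right, bottom)."""
    left = rng.randint(0, width - crop)
    top = rng.randint(0, height - crop)
    return left, top, left + crop, top + crop


def sample_training_crop(width, height, s_min=256, s_max=512, rng=random):
    """Draw a training scale S, rescale isotropically, then pick a 224x224 crop."""
    s = rng.randint(s_min, s_max)  # assumption: S is sampled uniformly
    new_w, new_h = isotropic_rescale_size(width, height, s)
    return s, (new_w, new_h), random_crop_box(new_w, new_h, rng=rng)


print(isotropic_rescale_size(500, 375, 256))  # (341, 256)
print(sample_training_crop(500, 375, rng=random.Random(0)))
```

The returned box can be handed to any image library's crop routine once the image has been resized to the computed size.
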

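Testing:
The image is rescaled to a test scale Q
The fully connected layers are converted to convolutional layers, so the network produces an n×n×1000 output; this is equivalent to applying the original network at n×n positions of the image at scale Q
This saves computation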

Author: Hao Zhang
Link: https://www.zhihu.com/question/53420266/answer/180001976
Source: Zhihu
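The only difference between a convolutional layer and a fully connected layer is that the neurons of a convolutional layer are connected locally to the input, and the neurons within the same channel share weights. Both layers compute dot products, so they have the same functional form. A convolutional layer can therefore be converted into a corresponding fully connected layer, and a fully connected layer can be converted into a corresponding convolutional layer.
In VGGNet, for example, the first fully connected layer takes a 7×7×512 input and produces 4096 outputs. This can be expressed equivalently as a convolutional layer with a 7×7 kernel, stride 1, no padding, and 4096 output channels; its output is 1×1×4096, which is equivalent to the fully connected layer. The subsequent fully connected layers can be replaced equivalently by 1×1 convolutions.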
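
The equivalence described above can be checked numerically on a toy example. This is a minimal pure-Python sketch, not the original implementation: it uses made-up sizes (a 2×2×3 input and 4 outputs in place of 7×7×512 and 4096), assumes the input is flattened in (row, column, channel) order, and all function names are hypothetical.

```python
import math
import random


def flatten(x):
    """Flatten x[h][w][c] in (row, column, channel) order."""
    return [v for row in x for pixel in row for v in pixel]


def fc_forward(x_flat, weights, bias):
    """Fully connected layer: one dot product per output unit."""
    return [sum(w * v for w, v in zip(row, x_flat)) + b
            for row, b in zip(weights, bias)]


def fc_to_conv_kernels(weights, k, channels):
    """Reshape each FC weight row into a k x k x channels kernel."""
    return [[[row[(i * k + j) * channels:(i * k + j + 1) * channels]
              for j in range(k)] for i in range(k)] for row in weights]


def conv_valid(x, kernels, bias):
    """Stride-1 convolution without padding; returns out[i][j][o]."""
    k = len(kernels[0])
    out = []
    for i in range(len(x) - k + 1):
        out_row = []
        for j in range(len(x[0]) - k + 1):
            scores = []
            for kern, b in zip(kernels, bias):
                acc = b
                for di in range(k):
                    for dj in range(k):
                        for c, w in enumerate(kern[di][dj]):
                            acc += w * x[i + di][j + dj][c]
                scores.append(acc)
            out_row.append(scores)
        out.append(out_row)
    return out


rng = random.Random(0)  # fixed seed so the demo is repeatable
K, C, OUT = 2, 3, 4     # toy sizes standing in for 7, 512, 4096
weights = [[rng.uniform(-1, 1) for _ in range(K * K * C)] for _ in range(OUT)]
bias = [rng.uniform(-1, 1) for _ in range(OUT)]
kernels = fc_to_conv_kernels(weights, K, C)


def random_input(side):
    return [[[rng.uniform(-1, 1) for _ in range(C)]
             for _ in range(side)] for _ in range(side)]


x = random_input(K)
fc_out = fc_forward(flatten(x), weights, bias)
conv_out = conv_valid(x, kernels, bias)[0][0]  # the single 1x1 output position
print(all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(fc_out, conv_out)))  # True

big = random_input(3)  # a larger input yields a map of outputs
dense = conv_valid(big, kernels, bias)
print(len(dense), len(dense[0]))  # 2 2
```

On the larger input, each position of the output map equals what the fully connected layer would produce on the corresponding window, which is what makes sliding-window evaluation possible.
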
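In short, the rule for converting a fully connected layer into a convolutional layer is to set the kernel size to the spatial size of the input. The benefit is that a convolutional layer places no restriction on the input size, so sliding-window prediction can be carried out efficiently on test images. For example, a 224×224 training image yields a 7×7×512 feature map, whereas a 384×384 test image yields a 12×12×512 feature map; after the three convolutional layers converted from the fully connected layers, the output is 6×6×1000, which holds the class score vectors at 36 spatial positions of the test image. Compared with applying the original CNN separately at each of the 36 positions, the converted CNN shares a large amount of computation, so this scheme is far more efficient while achieving the same result.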
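
The sizes in the example above follow from simple arithmetic. A small sketch, assuming five 2×2 max-pooling layers with stride 2 and a first fully connected layer converted to a 7×7 convolution; the function name `score_map_side` is made up for this illustration.

```python
def score_map_side(q, num_pools=5, fc_kernel=7):
    """Side length n of the n x n class score map for a q x q test image."""
    feature_side = q // (2 ** num_pools)  # each pooling layer halves the side
    return feature_side - fc_kernel + 1   # 7x7 conv, stride 1, no padding


for q in (224, 384):
    n = score_map_side(q)
    print(q, n, n * n)
# 224 1 1
# 384 6 36
```

For inputs smaller than 224×224 the result is not positive, which reflects that the converted network needs at least a 7×7 feature map.
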
References
[1] Simonyan, Karen, and Andrew Zisserman. “Very deep convolutional networks for large-scale image recognition.” arXiv preprint arXiv:1409.1556 (2014).
[2] Lin, Min, Qiang Chen, and Shuicheng Yan. “Network in network.” arXiv preprint arXiv:1312.4400 (2013).
[3] Sermanet, Pierre, et al. “Overfeat: Integrated recognition, localization and detection using convolutional networks.” arXiv preprint arXiv:1312.6229 (2013).
[4] Long, Jonathan, Evan Shelhamer, and Trevor Darrell. “Fully convolutional networks for semantic segmentation.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.
[5] Fei-Fei Li, Andrej Karpathy, and Justin Johnson. CS231n: Convolutional Neural Networks for Visual Recognition. Stanford. 2016.

Evaluation

![evaluation results][3]
![multi-scale evaluation][4]
![dense&crop evaluation][5]
![convNet fusion evaluation][6]

VGGNet generalizes better than GoogLeNet.

[1]: http://upload-images.jianshu.io/upload_images/3232548-bcc2ded7859ee146.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240
[2]: http://upload-images.jianshu.io/upload_images/3232548-42a791044eec4ff3.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240
[3]: http://upload-images.jianshu.io/upload_images/3232548-0bf25f504335c186.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240
[4]: http://upload-images.jianshu.io/upload_images/3232548-5efba64ff7d5c399.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240
[5]: http://upload-images.jianshu.io/upload_images/3232548-076a5b039b016b20.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240
[6]: http://upload-images.jianshu.io/upload_images/3232548-396bedeb565e428c.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240