{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from google.colab import drive\n",
"\n",
"drive.mount('/content/drive', force_remount=True)\n",
"\n",
"# 输入daseCV所在的路径\n",
"# 'daseCV' 文件夹包括 '.py', 'classifiers' 和'datasets'文件夹\n",
"# 例如 'CV/assignments/assignment1/daseCV/'\n",
"FOLDERNAME = None\n",
"\n",
"assert FOLDERNAME is not None, \"[!] Enter the foldername.\"\n",
"\n",
"%cd drive/My\\ Drive\n",
"%cp -r $FOLDERNAME ../../\n",
"%cd ../../\n",
"%cd daseCV/datasets/\n",
"!bash get_datasets.sh\n",
"%cd ../../"
]
},
{
"cell_type": "markdown",
"metadata": {
"tags": [
"pdf-title"
]
},
"source": [
"# 卷积网络\n",
"到目前为止,我们已经成功使用深层全连接网络,并使用它们来探索不同的优化策略和网络结构。全连接网络是很好的实验平台,因为它们的计算效率很高,但实际上,所有最新结果都使用卷积网络。\n",
"\n",
"首先,你将实现几个在卷积网络中使用的层类型。然后,您将使用这些层在CIFAR-10数据集上训练卷积网络。"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"pdf-ignore"
]
},
"outputs": [],
"source": [
"# As usual, a bit of setup\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"from daseCV.classifiers.cnn import *\n",
"from daseCV.data_utils import get_CIFAR10_data\n",
"from daseCV.gradient_check import eval_numerical_gradient_array, eval_numerical_gradient\n",
"from daseCV.layers import *\n",
"from daseCV.fast_layers import *\n",
"from daseCV.solver import Solver\n",
"\n",
"%matplotlib inline\n",
"plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots\n",
"plt.rcParams['image.interpolation'] = 'nearest'\n",
"plt.rcParams['image.cmap'] = 'gray'\n",
"\n",
"# for auto-reloading external modules\n",
"# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython\n",
"%load_ext autoreload\n",
"%autoreload 2\n",
"\n",
"def rel_error(x, y):\n",
" \"\"\" returns relative error \"\"\"\n",
" return np.max(np.abs(x - y) / (np.maximum(1e-8, np.abs(x) + np.abs(y))))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"pdf-ignore"
]
},
"outputs": [],
"source": [
"# Load the (preprocessed) CIFAR10 data.\n",
"\n",
"data = get_CIFAR10_data()\n",
"for k, v in data.items():\n",
" print('%s: ' % k, v.shape)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 卷积:简单的正向传播\n",
"卷积网络的核心是卷积运算。在文件 `daseCV/layers.py` 中的函数`conv_forward_naive`里实现卷积层的正向传播。\n",
"\n",
"此时,你不必太担心效率。只需以你最清楚的方式编写代码即可。\n",
"\n",
"您可以通过运行以下cell来测试你的代码:"
]
},
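{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a point of reference, here is a minimal loop-based sketch of the kind of implementation `conv_forward_naive` asks for (the `stride`/`pad` convention and shapes are the ones used by the test below). It is an illustrative sketch, not necessarily the reference solution:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def conv_forward_naive_sketch(x, w, b, conv_param):\n",
"    # x: (N, C, H, W), w: (F, C, HH, WW), b: (F,)\n",
"    stride, pad = conv_param['stride'], conv_param['pad']\n",
"    N, C, H, W = x.shape\n",
"    F, _, HH, WW = w.shape\n",
"    H_out = 1 + (H + 2 * pad - HH) // stride\n",
"    W_out = 1 + (W + 2 * pad - WW) // stride\n",
"    # Zero-pad the spatial dimensions only\n",
"    x_pad = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)), mode='constant')\n",
"    out = np.zeros((N, F, H_out, W_out))\n",
"    for n in range(N):\n",
"        for f in range(F):\n",
"            for i in range(H_out):\n",
"                for j in range(W_out):\n",
"                    h0, w0 = i * stride, j * stride\n",
"                    window = x_pad[n, :, h0:h0 + HH, w0:w0 + WW]\n",
"                    out[n, f, i, j] = np.sum(window * w[f]) + b[f]\n",
"    cache = (x, w, b, conv_param)\n",
"    return out, cache"
]
},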
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"x_shape = (2, 3, 4, 4)\n",
"w_shape = (3, 3, 4, 4)\n",
"x = np.linspace(-0.1, 0.5, num=np.prod(x_shape)).reshape(x_shape)\n",
"w = np.linspace(-0.2, 0.3, num=np.prod(w_shape)).reshape(w_shape)\n",
"b = np.linspace(-0.1, 0.2, num=3)\n",
"\n",
"conv_param = {'stride': 2, 'pad': 1}\n",
"out, _ = conv_forward_naive(x, w, b, conv_param)\n",
"correct_out = np.array([[[[-0.08759809, -0.10987781],\n",
" [-0.18387192, -0.2109216 ]],\n",
" [[ 0.21027089, 0.21661097],\n",
" [ 0.22847626, 0.23004637]],\n",
" [[ 0.50813986, 0.54309974],\n",
" [ 0.64082444, 0.67101435]]],\n",
" [[[-0.98053589, -1.03143541],\n",
" [-1.19128892, -1.24695841]],\n",
" [[ 0.69108355, 0.66880383],\n",
" [ 0.59480972, 0.56776003]],\n",
" [[ 2.36270298, 2.36904306],\n",
" [ 2.38090835, 2.38247847]]]])\n",
"\n",
"# Compare your output to ours; difference should be around e-8\n",
"print('Testing conv_forward_naive')\n",
"print('difference: ', rel_error(out, correct_out))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 补充:通过卷积对进行图像处理\n",
"\n",
"为了检查你的代码以及更好的理解卷积层可以实现的操作类型,我们将设置一个包含两个图像的输入,并手动设置执行常见图像处理操作(灰度转换和边缘检测)的滤镜。卷积的正向传播会将这些操作应用于每个输入图像。然后,我们可以将结果可视化以此检查准确性。"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"pdf-ignore-input"
]
},
"outputs": [],
"source": [
"from imageio import imread\n",
"from PIL import Image\n",
"\n",
"kitten = imread('notebook_images/kitten.jpg')\n",
"puppy = imread('notebook_images/puppy.jpg')\n",
"# kitten is wide, and puppy is already square\n",
"d = kitten.shape[1] - kitten.shape[0]\n",
"kitten_cropped = kitten[:, d//2:-d//2, :]\n",
"\n",
"img_size = 200 # Make this smaller if it runs too slow\n",
"resized_puppy = np.array(Image.fromarray(puppy).resize((img_size, img_size)))\n",
"resized_kitten = np.array(Image.fromarray(kitten_cropped).resize((img_size, img_size)))\n",
"x = np.zeros((2, 3, img_size, img_size))\n",
"x[0, :, :, :] = resized_puppy.transpose((2, 0, 1))\n",
"x[1, :, :, :] = resized_kitten.transpose((2, 0, 1))\n",
"\n",
"# Set up a convolutional weights holding 2 filters, each 3x3\n",
"w = np.zeros((2, 3, 3, 3))\n",
"\n",
"# The first filter converts the image to grayscale.\n",
"# Set up the red, green, and blue channels of the filter.\n",
"w[0, 0, :, :] = [[0, 0, 0], [0, 0.3, 0], [0, 0, 0]]\n",
"w[0, 1, :, :] = [[0, 0, 0], [0, 0.6, 0], [0, 0, 0]]\n",
"w[0, 2, :, :] = [[0, 0, 0], [0, 0.1, 0], [0, 0, 0]]\n",
"\n",
"# Second filter detects horizontal edges in the blue channel.\n",
"w[1, 2, :, :] = [[1, 2, 1], [0, 0, 0], [-1, -2, -1]]\n",
"\n",
"# Vector of biases. We don't need any bias for the grayscale\n",
"# filter, but for the edge detection filter we want to add 128\n",
"# to each output so that nothing is negative.\n",
"b = np.array([0, 128])\n",
"\n",
"# Compute the result of convolving each input in x with each filter in w,\n",
"# offsetting by b, and storing the results in out.\n",
"out, _ = conv_forward_naive(x, w, b, {'stride': 1, 'pad': 1})\n",
"\n",
"def imshow_no_ax(img, normalize=True):\n",
" \"\"\" Tiny helper to show images as uint8 and remove axis labels \"\"\"\n",
" if normalize:\n",
" img_max, img_min = np.max(img), np.min(img)\n",
" img = 255.0 * (img - img_min) / (img_max - img_min)\n",
" plt.imshow(img.astype('uint8'))\n",
" plt.gca().axis('off')\n",
"\n",
"# Show the original images and the results of the conv operation\n",
"plt.subplot(2, 3, 1)\n",
"imshow_no_ax(puppy, normalize=False)\n",
"plt.title('Original image')\n",
"plt.subplot(2, 3, 2)\n",
"imshow_no_ax(out[0, 0])\n",
"plt.title('Grayscale')\n",
"plt.subplot(2, 3, 3)\n",
"imshow_no_ax(out[0, 1])\n",
"plt.title('Edges')\n",
"plt.subplot(2, 3, 4)\n",
"imshow_no_ax(kitten_cropped, normalize=False)\n",
"plt.subplot(2, 3, 5)\n",
"imshow_no_ax(out[1, 0])\n",
"plt.subplot(2, 3, 6)\n",
"imshow_no_ax(out[1, 1])\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 卷积:简单的反向传播\n",
"在文件`daseCV/layers.py`的`conv_backward_naive`函数中实现卷积操作的反向传播。同样,你不必太担心计算效率。\n",
"\n",
"完成后,运行以下cell来检查你的反向传播的正确性。"
]
},
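{
"cell_type": "markdown",
"metadata": {},
"source": [
"One way to structure the backward pass, mirroring the forward sketch above: each upstream gradient `dout[n, f, i, j]` is scattered back through the same window it was computed from. Again, this is an illustrative sketch rather than the reference solution:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def conv_backward_naive_sketch(dout, cache):\n",
"    # cache is (x, w, b, conv_param), as stored by the forward sketch\n",
"    x, w, b, conv_param = cache\n",
"    stride, pad = conv_param['stride'], conv_param['pad']\n",
"    N, C, H, W = x.shape\n",
"    F, _, HH, WW = w.shape\n",
"    _, _, H_out, W_out = dout.shape\n",
"    x_pad = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)), mode='constant')\n",
"    dx_pad = np.zeros_like(x_pad)\n",
"    dw = np.zeros_like(w)\n",
"    db = dout.sum(axis=(0, 2, 3))  # the bias touches every output location\n",
"    for n in range(N):\n",
"        for f in range(F):\n",
"            for i in range(H_out):\n",
"                for j in range(W_out):\n",
"                    h0, w0 = i * stride, j * stride\n",
"                    window = x_pad[n, :, h0:h0 + HH, w0:w0 + WW]\n",
"                    dw[f] += window * dout[n, f, i, j]\n",
"                    dx_pad[n, :, h0:h0 + HH, w0:w0 + WW] += w[f] * dout[n, f, i, j]\n",
"    dx = dx_pad[:, :, pad:pad + H, pad:pad + W]  # strip the zero padding\n",
"    return dx, dw, db"
]
},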
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"np.random.seed(231)\n",
"x = np.random.randn(4, 3, 5, 5)\n",
"w = np.random.randn(2, 3, 3, 3)\n",
"b = np.random.randn(2,)\n",
"dout = np.random.randn(4, 2, 5, 5)\n",
"conv_param = {'stride': 1, 'pad': 1}\n",
"\n",
"dx_num = eval_numerical_gradient_array(lambda x: conv_forward_naive(x, w, b, conv_param)[0], x, dout)\n",
"dw_num = eval_numerical_gradient_array(lambda w: conv_forward_naive(x, w, b, conv_param)[0], w, dout)\n",
"db_num = eval_numerical_gradient_array(lambda b: conv_forward_naive(x, w, b, conv_param)[0], b, dout)\n",
"\n",
"out, cache = conv_forward_naive(x, w, b, conv_param)\n",
"dx, dw, db = conv_backward_naive(dout, cache)\n",
"\n",
"# Your errors should be around e-8 or less.\n",
"print('Testing conv_backward_naive function')\n",
"print('dx error: ', rel_error(dx, dx_num))\n",
"print('dw error: ', rel_error(dw, dw_num))\n",
"print('db error: ', rel_error(db, db_num))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 最大池化: 简单的正向传播\n",
"在文件`daseCV/layers.py`中的`max_pool_forward_naive`函数里实现最大池化操作的正向传播。同样,不必太担心计算效率。\n",
"\n",
"通过运行以下cell检查你的代码:"
]
},
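{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of what this forward pass can look like; since the max is taken independently for each sample and channel, two spatial loops suffice:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def max_pool_forward_naive_sketch(x, pool_param):\n",
"    # x: (N, C, H, W)\n",
"    ph, pw = pool_param['pool_height'], pool_param['pool_width']\n",
"    stride = pool_param['stride']\n",
"    N, C, H, W = x.shape\n",
"    H_out = 1 + (H - ph) // stride\n",
"    W_out = 1 + (W - pw) // stride\n",
"    out = np.zeros((N, C, H_out, W_out))\n",
"    for i in range(H_out):\n",
"        for j in range(W_out):\n",
"            h0, w0 = i * stride, j * stride\n",
"            # Max over each pooling window, for all samples and channels at once\n",
"            out[:, :, i, j] = x[:, :, h0:h0 + ph, w0:w0 + pw].max(axis=(2, 3))\n",
"    cache = (x, pool_param)\n",
"    return out, cache"
]
},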
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"x_shape = (2, 3, 4, 4)\n",
"x = np.linspace(-0.3, 0.4, num=np.prod(x_shape)).reshape(x_shape)\n",
"pool_param = {'pool_width': 2, 'pool_height': 2, 'stride': 2}\n",
"\n",
"out, _ = max_pool_forward_naive(x, pool_param)\n",
"\n",
"correct_out = np.array([[[[-0.26315789, -0.24842105],\n",
" [-0.20421053, -0.18947368]],\n",
" [[-0.14526316, -0.13052632],\n",
" [-0.08631579, -0.07157895]],\n",
" [[-0.02736842, -0.01263158],\n",
" [ 0.03157895, 0.04631579]]],\n",
" [[[ 0.09052632, 0.10526316],\n",
" [ 0.14947368, 0.16421053]],\n",
" [[ 0.20842105, 0.22315789],\n",
" [ 0.26736842, 0.28210526]],\n",
" [[ 0.32631579, 0.34105263],\n",
" [ 0.38526316, 0.4 ]]]])\n",
"\n",
"# Compare your output with ours. Difference should be on the order of e-8.\n",
"print('Testing max_pool_forward_naive function:')\n",
"print('difference: ', rel_error(out, correct_out))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 最大池化: 简单的反向传播\n",
"在文件`daseCV/layers.py`中的`max_pool_backward_naive`函数里实现最大池化操作的反向传播。同样,不必太担心计算效率。\n",
"\n",
"通过运行以下cell检查你的代码:"
]
},
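{
"cell_type": "markdown",
"metadata": {},
"source": [
"For the backward pass, the gradient flows only to the element that achieved the max in each pooling window. A sketch (with ties, every maximal element receives gradient, which is fine for random data):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def max_pool_backward_naive_sketch(dout, cache):\n",
"    x, pool_param = cache\n",
"    ph, pw = pool_param['pool_height'], pool_param['pool_width']\n",
"    stride = pool_param['stride']\n",
"    N, C, H_out, W_out = dout.shape\n",
"    dx = np.zeros_like(x)\n",
"    for n in range(N):\n",
"        for c in range(C):\n",
"            for i in range(H_out):\n",
"                for j in range(W_out):\n",
"                    h0, w0 = i * stride, j * stride\n",
"                    window = x[n, c, h0:h0 + ph, w0:w0 + pw]\n",
"                    mask = (window == window.max())\n",
"                    # Route the upstream gradient to the argmax position(s)\n",
"                    dx[n, c, h0:h0 + ph, w0:w0 + pw] += mask * dout[n, c, i, j]\n",
"    return dx"
]
},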
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"np.random.seed(231)\n",
"x = np.random.randn(3, 2, 8, 8)\n",
"dout = np.random.randn(3, 2, 4, 4)\n",
"pool_param = {'pool_height': 2, 'pool_width': 2, 'stride': 2}\n",
"\n",
"dx_num = eval_numerical_gradient_array(lambda x: max_pool_forward_naive(x, pool_param)[0], x, dout)\n",
"\n",
"out, cache = max_pool_forward_naive(x, pool_param)\n",
"dx = max_pool_backward_naive(dout, cache)\n",
"\n",
"# Your error should be on the order of e-12\n",
"print('Testing max_pool_backward_naive function:')\n",
"print('dx error: ', rel_error(dx, dx_num))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Fast layers\n",
"让卷积和池化层更快可能有点难度。为了减轻你的痛苦,我们在文件`daseCV/fast_layers.py`中为卷积和池化层提供了正向和反向传播的快速版本。\n",
"\n",
"快速卷积的实现依赖于Cython扩展。要编译它,你需要在`daseCV`目录中运行以下命令:\n",
"\n",
"```bash\n",
"python setup.py build_ext --inplace\n",
"```\n",
"\n",
"卷积和池化层的快速版本的API与你在之前实现的完全相同:正向传播接收数据、权重和参数,并产生输出和缓存对象;反向传播接收返回的导数和缓存对象,并针对数据和权重生成梯度。\n",
"\n",
"**提示:** 只有当池化区域不重叠并对输入进行平铺时,池化的快速实现才能表现出最好的性能。如果不满足这些条件,那么快速池化将不会比原来的的实现快很多。\n",
"\n",
"您可以通过运行以下代码和之前的版本之间进行性能的比较:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"# Rel errors should be around e-9 or less\n",
"from daseCV.fast_layers import conv_forward_fast, conv_backward_fast\n",
"from time import time\n",
"np.random.seed(231)\n",
"x = np.random.randn(100, 3, 31, 31)\n",
"w = np.random.randn(25, 3, 3, 3)\n",
"b = np.random.randn(25,)\n",
"dout = np.random.randn(100, 25, 16, 16)\n",
"conv_param = {'stride': 2, 'pad': 1}\n",
"\n",
"t0 = time()\n",
"out_naive, cache_naive = conv_forward_naive(x, w, b, conv_param)\n",
"t1 = time()\n",
"out_fast, cache_fast = conv_forward_fast(x, w, b, conv_param)\n",
"t2 = time()\n",
"\n",
"print('Testing conv_forward_fast:')\n",
"print('Naive: %fs' % (t1 - t0))\n",
"print('Fast: %fs' % (t2 - t1))\n",
"print('Speedup: %fx' % ((t1 - t0) / (t2 - t1)))\n",
"print('Difference: ', rel_error(out_naive, out_fast))\n",
"\n",
"t0 = time()\n",
"dx_naive, dw_naive, db_naive = conv_backward_naive(dout, cache_naive)\n",
"t1 = time()\n",
"dx_fast, dw_fast, db_fast = conv_backward_fast(dout, cache_fast)\n",
"t2 = time()\n",
"\n",
"print('\\nTesting conv_backward_fast:')\n",
"print('Naive: %fs' % (t1 - t0))\n",
"print('Fast: %fs' % (t2 - t1))\n",
"print('Speedup: %fx' % ((t1 - t0) / (t2 - t1)))\n",
"print('dx difference: ', rel_error(dx_naive, dx_fast))\n",
"print('dw difference: ', rel_error(dw_naive, dw_fast))\n",
"print('db difference: ', rel_error(db_naive, db_fast))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Relative errors should be close to 0.0\n",
"from daseCV.fast_layers import max_pool_forward_fast, max_pool_backward_fast\n",
"np.random.seed(231)\n",
"x = np.random.randn(100, 3, 32, 32)\n",
"dout = np.random.randn(100, 3, 16, 16)\n",
"pool_param = {'pool_height': 2, 'pool_width': 2, 'stride': 2}\n",
"\n",
"t0 = time()\n",
"out_naive, cache_naive = max_pool_forward_naive(x, pool_param)\n",
"t1 = time()\n",
"out_fast, cache_fast = max_pool_forward_fast(x, pool_param)\n",
"t2 = time()\n",
"\n",
"print('Testing pool_forward_fast:')\n",
"print('Naive: %fs' % (t1 - t0))\n",
"print('fast: %fs' % (t2 - t1))\n",
"print('speedup: %fx' % ((t1 - t0) / (t2 - t1)))\n",
"print('difference: ', rel_error(out_naive, out_fast))\n",
"\n",
"t0 = time()\n",
"dx_naive = max_pool_backward_naive(dout, cache_naive)\n",
"t1 = time()\n",
"dx_fast = max_pool_backward_fast(dout, cache_fast)\n",
"t2 = time()\n",
"\n",
"print('\\nTesting pool_backward_fast:')\n",
"print('Naive: %fs' % (t1 - t0))\n",
"print('fast: %fs' % (t2 - t1))\n",
"print('speedup: %fx' % ((t1 - t0) / (t2 - t1)))\n",
"print('dx difference: ', rel_error(dx_naive, dx_fast))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 卷积 \"sandwich\" 层\n",
"之前,我们引入了“sandwich”层的概念,该层将多种操作组合成常用的模式。在文件`daseCV/layer_utils.py`中,您会找到一些实现卷积网络常用模式的sandwich层。运行下面的cell以检查它们是否正常工作。"
]
},
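{
"cell_type": "markdown",
"metadata": {},
"source": [
"Conceptually, a sandwich layer is just function composition with each primitive's cache threaded through to the backward pass. The sketch below shows how `conv_relu_pool_forward`/`conv_relu_pool_backward` might be composed; it assumes `relu_forward`/`relu_backward` exist in `daseCV/layers.py` as in the previous assignments (the real versions live in `daseCV/layer_utils.py`):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def conv_relu_pool_forward_sketch(x, w, b, conv_param, pool_param):\n",
"    a, conv_cache = conv_forward_fast(x, w, b, conv_param)\n",
"    s, relu_cache = relu_forward(a)\n",
"    out, pool_cache = max_pool_forward_fast(s, pool_param)\n",
"    cache = (conv_cache, relu_cache, pool_cache)\n",
"    return out, cache\n",
"\n",
"def conv_relu_pool_backward_sketch(dout, cache):\n",
"    # Unwind the composition in reverse order\n",
"    conv_cache, relu_cache, pool_cache = cache\n",
"    ds = max_pool_backward_fast(dout, pool_cache)\n",
"    da = relu_backward(ds, relu_cache)\n",
"    dx, dw, db = conv_backward_fast(da, conv_cache)\n",
"    return dx, dw, db"
]
},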
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from daseCV.layer_utils import conv_relu_pool_forward, conv_relu_pool_backward\n",
"np.random.seed(231)\n",
"x = np.random.randn(2, 3, 16, 16)\n",
"w = np.random.randn(3, 3, 3, 3)\n",
"b = np.random.randn(3,)\n",
"dout = np.random.randn(2, 3, 8, 8)\n",
"conv_param = {'stride': 1, 'pad': 1}\n",
"pool_param = {'pool_height': 2, 'pool_width': 2, 'stride': 2}\n",
"\n",
"out, cache = conv_relu_pool_forward(x, w, b, conv_param, pool_param)\n",
"dx, dw, db = conv_relu_pool_backward(dout, cache)\n",
"\n",
"dx_num = eval_numerical_gradient_array(lambda x: conv_relu_pool_forward(x, w, b, conv_param, pool_param)[0], x, dout)\n",
"dw_num = eval_numerical_gradient_array(lambda w: conv_relu_pool_forward(x, w, b, conv_param, pool_param)[0], w, dout)\n",
"db_num = eval_numerical_gradient_array(lambda b: conv_relu_pool_forward(x, w, b, conv_param, pool_param)[0], b, dout)\n",
"\n",
"# Relative errors should be around e-8 or less\n",
"print('Testing conv_relu_pool')\n",
"print('dx error: ', rel_error(dx_num, dx))\n",
"print('dw error: ', rel_error(dw_num, dw))\n",
"print('db error: ', rel_error(db_num, db))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from daseCV.layer_utils import conv_relu_forward, conv_relu_backward\n",
"np.random.seed(231)\n",
"x = np.random.randn(2, 3, 8, 8)\n",
"w = np.random.randn(3, 3, 3, 3)\n",
"b = np.random.randn(3,)\n",
"dout = np.random.randn(2, 3, 8, 8)\n",
"conv_param = {'stride': 1, 'pad': 1}\n",
"\n",
"out, cache = conv_relu_forward(x, w, b, conv_param)\n",
"dx, dw, db = conv_relu_backward(dout, cache)\n",
"\n",
"dx_num = eval_numerical_gradient_array(lambda x: conv_relu_forward(x, w, b, conv_param)[0], x, dout)\n",
"dw_num = eval_numerical_gradient_array(lambda w: conv_relu_forward(x, w, b, conv_param)[0], w, dout)\n",
"db_num = eval_numerical_gradient_array(lambda b: conv_relu_forward(x, w, b, conv_param)[0], b, dout)\n",
"\n",
"# Relative errors should be around e-8 or less\n",
"print('Testing conv_relu:')\n",
"print('dx error: ', rel_error(dx_num, dx))\n",
"print('dw error: ', rel_error(dw_num, dw))\n",
"print('db error: ', rel_error(db_num, db))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 三层卷积网络\n",
"现在,你已经实现了所有必需的层,我们可以将它们组合成一个简单的卷积网络。\n",
"\n",
"打开文件`daseCV/classifiers/cnn.py`,并完成`ThreeLayerConvNet`类。请记住,您可以使用fast/sandwich层(以及提供给你)。运行以下cell以帮助你调试:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 检查loss\n",
"建立新网络后,您应该做的第一件事就是检查损失。当我们使用softmax损失时,对于`C`个类别我们期望随机权重的损失(没有正则化)大约为`log(C)`。当我们添加正则化时,损失应该会略有增加。"
]
},
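{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick reference for the check below: with random weights a softmax classifier assigns roughly uniform probability `1/C` to the correct class, so for CIFAR-10 (`C = 10`) the expected initial loss is `-log(1/10) = log(10) ≈ 2.3`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Expected initial softmax loss for random weights: -log(1/C) = log(C)\n",
"print('Expected initial loss for C=10 classes: ', np.log(10))"
]
},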
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"model = ThreeLayerConvNet()\n",
"\n",
"N = 50\n",
"X = np.random.randn(N, 3, 32, 32)\n",
"y = np.random.randint(10, size=N)\n",
"\n",
"loss, grads = model.loss(X, y)\n",
"print('Initial loss (no regularization): ', loss)\n",
"\n",
"model.reg = 0.5\n",
"loss, grads = model.loss(X, y)\n",
"print('Initial loss (with regularization): ', loss)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 梯度检查\n",
"在损失看起来合理之后,请使用数值梯度检查来确保您的反向传播是正确的。使用数值梯度检查时,应在每一层使用少量的人工数据和少量的神经元。注意:正确的实现可能仍然会出现相对误差,最高可达e-2。"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"num_inputs = 2\n",
"input_dim = (3, 16, 16)\n",
"reg = 0.0\n",
"num_classes = 10\n",
"np.random.seed(231)\n",
"X = np.random.randn(num_inputs, *input_dim)\n",
"y = np.random.randint(num_classes, size=num_inputs)\n",
"\n",
"model = ThreeLayerConvNet(num_filters=3, filter_size=3,\n",
" input_dim=input_dim, hidden_dim=7,\n",
" dtype=np.float64)\n",
"loss, grads = model.loss(X, y)\n",
"# Errors should be small, but correct implementations may have\n",
"# relative errors up to the order of e-2\n",
"for param_name in sorted(grads):\n",
" f = lambda _: model.loss(X, y)[0]\n",
" param_grad_num = eval_numerical_gradient(f, model.params[param_name], verbose=False, h=1e-6)\n",
" e = rel_error(param_grad_num, grads[param_name])\n",
" print('%s max relative error: %e' % (param_name, rel_error(param_grad_num, grads[param_name])))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 小样本的过拟合\n",
"一个不错的技巧是仅用少量训练样本来训练模型。您应该能够过度拟合较小的数据集,这将得到非常高的训练准确度和相对较低的验证准确度。"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"np.random.seed(231)\n",
"\n",
"num_train = 100\n",
"small_data = {\n",
" 'X_train': data['X_train'][:num_train],\n",
" 'y_train': data['y_train'][:num_train],\n",
" 'X_val': data['X_val'],\n",
" 'y_val': data['y_val'],\n",
"}\n",
"\n",
"model = ThreeLayerConvNet(weight_scale=1e-2)\n",
"\n",
"solver = Solver(model, small_data,\n",
" num_epochs=15, batch_size=50,\n",
" update_rule='adam',\n",
" optim_config={\n",
" 'learning_rate': 1e-3,\n",
" },\n",
" verbose=True, print_every=1)\n",
"solver.train()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Plotting the loss, training accuracy, and validation accuracy should show clear overfitting:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"plt.subplot(2, 1, 1)\n",
"plt.plot(solver.loss_history, 'o')\n",
"plt.xlabel('iteration')\n",
"plt.ylabel('loss')\n",
"\n",
"plt.subplot(2, 1, 2)\n",
"plt.plot(solver.train_acc_history, '-o')\n",
"plt.plot(solver.val_acc_history, '-o')\n",
"plt.legend(['train', 'val'], loc='upper left')\n",
"plt.xlabel('epoch')\n",
"plt.ylabel('accuracy')\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 训练网络\n",
"将三层卷积网络训练一个epoch,在训练集上将达到40%以上的准确度:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"model = ThreeLayerConvNet(weight_scale=0.001, hidden_dim=500, reg=0.001)\n",
"\n",
"solver = Solver(model, data,\n",
" num_epochs=1, batch_size=50,\n",
" update_rule='adam',\n",
" optim_config={\n",
" 'learning_rate': 1e-3,\n",
" },\n",
" verbose=True, print_every=20)\n",
"solver.train()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 可视化过滤器\n",
"You can visualize the first-layer convolutional filters from the trained network by running the following:\n",
"您可以通过运行以下命令可视化训练好的第一层卷积过滤器:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from daseCV.vis_utils import visualize_grid\n",
"\n",
"grid = visualize_grid(model.params['W1'].transpose(0, 2, 3, 1))\n",
"plt.imshow(grid.astype('uint8'))\n",
"plt.axis('off')\n",
"plt.gcf().set_size_inches(5, 5)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 空间批量归一化\n",
"我们已经看到,对于训练深层的全连接网络来说批量归一化是非常有用的技术。如论文(`BatchNormalization.ipynb`中的链接)中所建议的,批处理归一化也可以用于卷积网络,但是我们需要对其进行一些调整,该修改将称为“空间批量归一化”。\n",
"\n",
"通常,当我们对维数为`N`的最小批进行批归一化时接受的形状为 `(N, D)`的输入,之后生成形状为`(N, D)`的输出。对于来自卷积层的数据,批归一化需要接受形状为`(N, C, H, W)`的输入,并产生形状为`(N, C, H, W)`的输出,其中`N`维度为最小批大小而 `(H, W)` 维度是特征图的大小。\n",
"\n",
"如果特征图是使用卷积生成的,那么我们期望每个特征通道的两个不同图像以及同一图像内不同位置之间的统计信息例如均值、方差相对一致。毕竟每个特征通道都是由相同的卷积滤波器产生的!因此,空间批量归一化通过计算最小批维度`N`以及空间维度 `H` 和`W`的统计信息,为每个 `C`特征通道计算均值和方差。\n",
"\n",
"[1] [Sergey Ioffe and Christian Szegedy, \"Batch Normalization: Accelerating Deep Network Training by Reducing\n",
"Internal Covariate Shift\", ICML 2015.](https://arxiv.org/abs/1502.03167)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 空间批量归一化:正向传播\n",
"\n",
"在文件 `daseCV/layers.py`中的`spatial_batchnorm_forward`函数里实现空间批归一化的正向传播。通过运行以下命令检查您的代码:"
]
},
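{
"cell_type": "markdown",
"metadata": {},
"source": [
"Because the statistics are shared across `N`, `H`, and `W`, spatial batch normalization can be reduced to the vanilla version by reshaping. A sketch of this reduction, assuming a `batchnorm_forward(x, gamma, beta, bn_param)` for `(N, D)` inputs is available in `daseCV/layers.py` (as in `BatchNormalization.ipynb`):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def spatial_batchnorm_forward_sketch(x, gamma, beta, bn_param):\n",
"    N, C, H, W = x.shape\n",
"    # Move channels last and flatten: each channel becomes one feature and\n",
"    # the N*H*W positions become the effective minibatch.\n",
"    x_flat = x.transpose(0, 2, 3, 1).reshape(-1, C)\n",
"    out_flat, cache = batchnorm_forward(x_flat, gamma, beta, bn_param)\n",
"    out = out_flat.reshape(N, H, W, C).transpose(0, 3, 1, 2)\n",
"    return out, cache"
]
},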
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"np.random.seed(231)\n",
"# Check the training-time forward pass by checking means and variances\n",
"# of features both before and after spatial batch normalization\n",
"\n",
"N, C, H, W = 2, 3, 4, 5\n",
"x = 4 * np.random.randn(N, C, H, W) + 10\n",
"\n",
"print('Before spatial batch normalization:')\n",
"print(' Shape: ', x.shape)\n",
"print(' Means: ', x.mean(axis=(0, 2, 3)))\n",
"print(' Stds: ', x.std(axis=(0, 2, 3)))\n",
"\n",
"# Means should be close to zero and stds close to one\n",
"gamma, beta = np.ones(C), np.zeros(C)\n",
"bn_param = {'mode': 'train'}\n",
"out, _ = spatial_batchnorm_forward(x, gamma, beta, bn_param)\n",
"print('After spatial batch normalization:')\n",
"print(' Shape: ', out.shape)\n",
"print(' Means: ', out.mean(axis=(0, 2, 3)))\n",
"print(' Stds: ', out.std(axis=(0, 2, 3)))\n",
"\n",
"# Means should be close to beta and stds close to gamma\n",
"gamma, beta = np.asarray([3, 4, 5]), np.asarray([6, 7, 8])\n",
"out, _ = spatial_batchnorm_forward(x, gamma, beta, bn_param)\n",
"print('After spatial batch normalization (nontrivial gamma, beta):')\n",
"print(' Shape: ', out.shape)\n",
"print(' Means: ', out.mean(axis=(0, 2, 3)))\n",
"print(' Stds: ', out.std(axis=(0, 2, 3)))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"np.random.seed(231)\n",
"# Check the test-time forward pass by running the training-time\n",
"# forward pass many times to warm up the running averages, and then\n",
"# checking the means and variances of activations after a test-time\n",
"# forward pass.\n",
"N, C, H, W = 10, 4, 11, 12\n",
"\n",
"bn_param = {'mode': 'train'}\n",
"gamma = np.ones(C)\n",
"beta = np.zeros(C)\n",
"for t in range(50):\n",
" x = 2.3 * np.random.randn(N, C, H, W) + 13\n",
" spatial_batchnorm_forward(x, gamma, beta, bn_param)\n",
"bn_param['mode'] = 'test'\n",
"x = 2.3 * np.random.randn(N, C, H, W) + 13\n",
"a_norm, _ = spatial_batchnorm_forward(x, gamma, beta, bn_param)\n",
"\n",
"# Means should be close to zero and stds close to one, but will be\n",
"# noisier than training-time forward passes.\n",
"print('After spatial batch normalization (test-time):')\n",
"print(' means: ', a_norm.mean(axis=(0, 2, 3)))\n",
"print(' stds: ', a_norm.std(axis=(0, 2, 3)))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 空间批量归一化:反向传播\n",
"在文件`daseCV/layers.py`中的函数`spatial_batchnorm_backward`里实现空间批量归一化的反向传播。运行以下命令以检查您的代码:"
]
},
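{
"cell_type": "markdown",
"metadata": {},
"source": [
"The backward pass can mirror the same reshape, again assuming a vanilla `batchnorm_backward(dout, cache)` for `(N, D)` inputs is available in `daseCV/layers.py`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def spatial_batchnorm_backward_sketch(dout, cache):\n",
"    N, C, H, W = dout.shape\n",
"    dout_flat = dout.transpose(0, 2, 3, 1).reshape(-1, C)\n",
"    dx_flat, dgamma, dbeta = batchnorm_backward(dout_flat, cache)\n",
"    dx = dx_flat.reshape(N, H, W, C).transpose(0, 3, 1, 2)\n",
"    return dx, dgamma, dbeta"
]
},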
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"np.random.seed(231)\n",
"N, C, H, W = 2, 3, 4, 5\n",
"x = 5 * np.random.randn(N, C, H, W) + 12\n",
"gamma = np.random.randn(C)\n",
"beta = np.random.randn(C)\n",
"dout = np.random.randn(N, C, H, W)\n",
"\n",
"bn_param = {'mode': 'train'}\n",
"fx = lambda x: spatial_batchnorm_forward(x, gamma, beta, bn_param)[0]\n",
"fg = lambda a: spatial_batchnorm_forward(x, gamma, beta, bn_param)[0]\n",
"fb = lambda b: spatial_batchnorm_forward(x, gamma, beta, bn_param)[0]\n",
"\n",
"dx_num = eval_numerical_gradient_array(fx, x, dout)\n",
"da_num = eval_numerical_gradient_array(fg, gamma, dout)\n",
"db_num = eval_numerical_gradient_array(fb, beta, dout)\n",
"\n",
"#You should expect errors of magnitudes between 1e-12~1e-06\n",
"_, cache = spatial_batchnorm_forward(x, gamma, beta, bn_param)\n",
"dx, dgamma, dbeta = spatial_batchnorm_backward(dout, cache)\n",
"print('dx error: ', rel_error(dx_num, dx))\n",
"print('dgamma error: ', rel_error(da_num, dgamma))\n",
"print('dbeta error: ', rel_error(db_num, dbeta))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 组归一化\n",
"在之前的notebook中,我们提到了“层归一化”是一种替代的归一化技术,它减轻了“批归一化”的批大小限制。但是,正如 [2] 的作者所观察到的,当与卷积层一起使用时,层归一化的性能不如批归一化:\n",
"\n",
">With fully connected layers, all the hidden units in a layer tend to make similar contributions to the final prediction, and re-centering and rescaling the summed inputs to a layer works well. However, the assumption of similar contributions is no longer true for convolutional neural networks. The large number of the hidden units whose\n",
"receptive fields lie near the boundary of the image are rarely turned on and thus have very different\n",
"statistics from the rest of the hidden units within the same layer.\n",
"\n",
"[3] 的作者提出了一种中间技术。与“层归一化”相反,在“层归一化”中您对每个数据点的整个特征进行归一化,他们建议将每个数据点一致的特征划分为G组,然后对每个组的每个数据点进行归一化。\n",
"\n",
"![Comparison of normalization techniques discussed so far](notebook_images/normalization.png)\n",
"<center>**Visual comparison of the normalization techniques discussed so far (image edited from [3])**</center>\n",
"\n",
"尽管在每一组中仍然存在贡献相等的假设,但作者假设这不是问题,因为在视觉识别的特征中出现了天生的分组。他们用来说明这一点的一个例子是,在传统的计算机视觉中,许多高性能的传统的特征都有明确分组在一起的术语。以Histogram of Oriented Gradients[4]为例——在计算每个空间局部块的直方图后,对每个块的直方图进行归一化处理,然后拼接在一起形成最终的特征向量。\n",
"\n",
"现在,你将实现组归一化。请注意,你将在以下cell中实现的这种归一化技术是在2018年引入并发布到ECCV的,这是是一个正在进行且激动人心的研究领域!\n",
"\n",
"[2] [Ba, Jimmy Lei, Jamie Ryan Kiros, and Geoffrey E. Hinton. \"Layer Normalization.\" stat 1050 (2016): 21.](https://arxiv.org/pdf/1607.06450.pdf)\n",
"\n",
"\n",
"[3] [Wu, Yuxin, and Kaiming He. \"Group Normalization.\" arXiv preprint arXiv:1803.08494 (2018).](https://arxiv.org/abs/1803.08494)\n",
"\n",
"\n",
"[4] [N. Dalal and B. Triggs. Histograms of oriented gradients for\n",
"human detection. In Computer Vision and Pattern Recognition\n",
"(CVPR), 2005.](https://ieeexplore.ieee.org/abstract/document/1467360/)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 组归一化:正向传播\n",
"\n",
"在文件`daseCV/layers.py`中的`spatial_groupnorm_forward`函数里实现组归一化的正向传播。通过运行以下命令检查您的代码:"
]
},
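{
"cell_type": "markdown",
"metadata": {},
"source": [
"A sketch of the forward pass implied by the grouping described above: reshape so that each row holds one group of one datapoint, normalize per row, then apply the per-channel scale and shift. Reading `eps` from `gn_param` is an assumed convention here:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def spatial_groupnorm_forward_sketch(x, gamma, beta, G, gn_param):\n",
"    eps = gn_param.get('eps', 1e-5)\n",
"    N, C, H, W = x.shape\n",
"    # Each row holds one group of one datapoint: (C//G) * H * W values\n",
"    x_g = x.reshape(N * G, -1)\n",
"    mu = x_g.mean(axis=1, keepdims=True)\n",
"    var = x_g.var(axis=1, keepdims=True)\n",
"    x_hat = ((x_g - mu) / np.sqrt(var + eps)).reshape(N, C, H, W)\n",
"    out = gamma * x_hat + beta  # gamma, beta have shape (1, C, 1, 1)\n",
"    cache = (x_hat, gamma, x_g, mu, var, eps, G)\n",
"    return out, cache"
]
},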
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"np.random.seed(231)\n",
"# Check the training-time forward pass by checking means and variances\n",
"# of features both before and after spatial batch normalization\n",
"\n",
"N, C, H, W = 2, 6, 4, 5\n",
"G = 2\n",
"x = 4 * np.random.randn(N, C, H, W) + 10\n",
"x_g = x.reshape((N*G,-1))\n",
"print('Before spatial group normalization:')\n",
"print(' Shape: ', x.shape)\n",
"print(' Means: ', x_g.mean(axis=1))\n",
"print(' Stds: ', x_g.std(axis=1))\n",
"\n",
"# Means should be close to zero and stds close to one\n",
"gamma, beta = np.ones((1,C,1,1)), np.zeros((1,C,1,1))\n",
"bn_param = {'mode': 'train'}\n",
"\n",
"out, _ = spatial_groupnorm_forward(x, gamma, beta, G, bn_param)\n",
"out_g = out.reshape((N*G,-1))\n",
"print('After spatial group normalization:')\n",
"print(' Shape: ', out.shape)\n",
"print(' Means: ', out_g.mean(axis=1))\n",
"print(' Stds: ', out_g.std(axis=1))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 空间组归一化:反向传播\n",
"在文件 `daseCV/layers.py`中的`spatial_groupnorm_backward`函数里实现空间批量归一化的反向传播。运行以下命令以检查您的代码:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"np.random.seed(231)\n",
"N, C, H, W = 2, 6, 4, 5\n",
"G = 2\n",
"x = 5 * np.random.randn(N, C, H, W) + 12\n",
"gamma = np.random.randn(1,C,1,1)\n",
"beta = np.random.randn(1,C,1,1)\n",
"dout = np.random.randn(N, C, H, W)\n",
"\n",
"gn_param = {}\n",
"fx = lambda x: spatial_groupnorm_forward(x, gamma, beta, G, gn_param)[0]\n",
"fg = lambda a: spatial_groupnorm_forward(x, gamma, beta, G, gn_param)[0]\n",
"fb = lambda b: spatial_groupnorm_forward(x, gamma, beta, G, gn_param)[0]\n",
"\n",
"dx_num = eval_numerical_gradient_array(fx, x, dout)\n",
"da_num = eval_numerical_gradient_array(fg, gamma, dout)\n",
"db_num = eval_numerical_gradient_array(fb, beta, dout)\n",
"\n",
"_, cache = spatial_groupnorm_forward(x, gamma, beta, G, gn_param)\n",
"dx, dgamma, dbeta = spatial_groupnorm_backward(dout, cache)\n",
"#You should expect errors of magnitudes between 1e-12~1e-07\n",
"print('dx error: ', rel_error(dx_num, dx))\n",
"print('dgamma error: ', rel_error(da_num, dgamma))\n",
"print('dbeta error: ', rel_error(db_num, dbeta))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"# 重要\n",
"\n",
"这里是作业的结尾处,请执行以下步骤:\n",
"\n",
"1. 点击`File -> Save`或者用`control+s`组合键,确保你最新的的notebook的作业已经保存到谷歌云。\n",
"2. 执行以下代码确保 `.py` 文件保存回你的谷歌云。"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"FOLDER_TO_SAVE = os.path.join('drive/My Drive/', FOLDERNAME)\n",
"FILES_TO_SAVE = ['daseCV/classifiers/cnn.py', 'daseCV/classifiers/fc_net.py']\n",
"\n",
"for files in FILES_TO_SAVE:\n",
" with open(os.path.join(FOLDER_TO_SAVE, '/'.join(files.split('/')[1:])), 'w') as f:\n",
" f.write(''.join(open(files).readlines()))"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.0"
}
},
"nbformat": 4,
"nbformat_minor": 4
}