@ -0,0 +1,918 @@ | |||
{ | |||
"cells": [ | |||
{ | |||
"cell_type": "code", | |||
"execution_count": null, | |||
"metadata": {}, | |||
"outputs": [], | |||
"source": [ | |||
"from google.colab import drive\n", | |||
"\n", | |||
"drive.mount('/content/drive', force_remount=True)\n", | |||
"\n", | |||
"# 输入daseCV所在的路径\n", | |||
"# 'daseCV' 文件夹包括 '.py', 'classifiers' 和'datasets'文件夹\n", | |||
"# 例如 'CV/assignments/assignment1/daseCV/'\n", | |||
"FOLDERNAME = None\n", | |||
"\n", | |||
"assert FOLDERNAME is not None, \"[!] Enter the foldername.\"\n", | |||
"\n", | |||
"%cd drive/My\\ Drive\n", | |||
"%cp -r $FOLDERNAME ../../\n", | |||
"%cd ../../\n", | |||
"%cd daseCV/datasets/\n", | |||
"!bash get_datasets.sh\n", | |||
"%cd ../../" | |||
] | |||
}, | |||
{ | |||
"cell_type": "markdown", | |||
"metadata": { | |||
"tags": [ | |||
"pdf-title" | |||
] | |||
}, | |||
"source": [ | |||
"# Batch Normalization\n", | |||
"One way to make deep networks easier to train is to use more sophisticated optimization procedures such as SGD+momentum, RMSProp, or Adam. Another strategy is to change the architecture of the network to make it easier to train. \n", | |||
"One idea along these lines is batch normalization which was proposed by [1] in 2015.\n", | |||
"\n", | |||
"The idea is relatively straightforward. Machine learning methods tend to work better when their input data consists of uncorrelated features with zero mean and unit variance. When training a neural network, we can preprocess the data before feeding it to the network to explicitly decorrelate its features; this will ensure that the first layer of the network sees data that follows a nice distribution. However, even if we preprocess the input data, the activations at deeper layers of the network will likely no longer be decorrelated and will no longer have zero mean or unit variance since they are output from earlier layers in the network. Even worse, during the training process the distribution of features at each layer of the network will shift as the weights of each layer are updated.\n", | |||
"\n", | |||
"The authors of [1] hypothesize that the shifting distribution of features inside deep neural networks may make training deep networks more difficult. To overcome this problem, [1] proposes to insert batch normalization layers into the network. At training time, a batch normalization layer uses a minibatch of data to estimate the mean and standard deviation of each feature. These estimated means and standard deviations are then used to center and normalize the features of the minibatch. A running average of these means and standard deviations is kept during training, and at test time these running averages are used to center and normalize features.\n", | |||
"\n", | |||
"It is possible that this normalization strategy could reduce the representational power of the network, since it may sometimes be optimal for certain layers to have features that are not zero-mean or unit variance. To this end, the batch normalization layer includes learnable shift and scale parameters for each feature dimension.\n", | |||
"\n", | |||
"[1] [Sergey Ioffe and Christian Szegedy, \"Batch Normalization: Accelerating Deep Network Training by Reducing\n", | |||
"Internal Covariate Shift\", ICML 2015.](https://arxiv.org/abs/1502.03167)" | |||
] | |||
}, | |||
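{ | |||
"cell_type": "markdown", | |||
"metadata": {}, | |||
"source": [ | |||
"The train/test behavior described above, as a minimal numpy sketch (a sketch only, assuming default `eps` and `momentum` values; the `batchnorm_forward` you will implement additionally returns a cache needed for the backward pass):\n", | |||
"\n", | |||
"```python\n", | |||
"import numpy as np\n", | |||
"\n", | |||
"def batchnorm_forward_sketch(x, gamma, beta, bn_param):\n", | |||
"    mode = bn_param['mode']\n", | |||
"    eps = bn_param.get('eps', 1e-5)\n", | |||
"    momentum = bn_param.get('momentum', 0.9)\n", | |||
"    running_mean = bn_param.get('running_mean', np.zeros(x.shape[1]))\n", | |||
"    running_var = bn_param.get('running_var', np.zeros(x.shape[1]))\n", | |||
"\n", | |||
"    if mode == 'train':\n", | |||
"        mu = x.mean(axis=0)                    # per-feature minibatch mean\n", | |||
"        var = x.var(axis=0)                    # per-feature minibatch variance\n", | |||
"        x_hat = (x - mu) / np.sqrt(var + eps)  # center and normalize\n", | |||
"        # keep exponential running averages for use at test time\n", | |||
"        bn_param['running_mean'] = momentum * running_mean + (1 - momentum) * mu\n", | |||
"        bn_param['running_var'] = momentum * running_var + (1 - momentum) * var\n", | |||
"    else:\n", | |||
"        # test time: normalize with the running averages instead\n", | |||
"        x_hat = (x - running_mean) / np.sqrt(running_var + eps)\n", | |||
"\n", | |||
"    return gamma * x_hat + beta                # learnable scale and shift\n", | |||
"```" | |||
] | |||
}, | |||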
{ | |||
"cell_type": "code", | |||
"execution_count": null, | |||
"metadata": { | |||
"tags": [ | |||
"pdf-ignore" | |||
] | |||
}, | |||
"outputs": [], | |||
"source": [ | |||
"# As usual, a bit of setup\n", | |||
"import time\n", | |||
"import numpy as np\n", | |||
"import matplotlib.pyplot as plt\n", | |||
"from daseCV.classifiers.fc_net import *\n", | |||
"from daseCV.data_utils import get_CIFAR10_data\n", | |||
"from daseCV.gradient_check import eval_numerical_gradient, eval_numerical_gradient_array\n", | |||
"from daseCV.solver import Solver\n", | |||
"\n", | |||
"%matplotlib inline\n", | |||
"plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots\n", | |||
"plt.rcParams['image.interpolation'] = 'nearest'\n", | |||
"plt.rcParams['image.cmap'] = 'gray'\n", | |||
"\n", | |||
"# for auto-reloading external modules\n", | |||
"# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython\n", | |||
"%load_ext autoreload\n", | |||
"%autoreload 2\n", | |||
"\n", | |||
"\n", | |||
"def rel_error(x, y):\n", | |||
" \"\"\" returns relative error \"\"\"\n", | |||
" return np.max(np.abs(x - y) / (np.maximum(1e-8, np.abs(x) + np.abs(y))))\n", | |||
"\n", | |||
"\n", | |||
"def print_mean_std(x, axis=0):\n", | |||
" print(' means: ', x.mean(axis=axis))\n", | |||
" print(' stds: ', x.std(axis=axis))\n", | |||
" print()" | |||
] | |||
}, | |||
{ | |||
"cell_type": "code", | |||
"execution_count": null, | |||
"metadata": { | |||
"tags": [ | |||
"pdf-ignore" | |||
] | |||
}, | |||
"outputs": [], | |||
"source": [ | |||
"# Load the (preprocessed) CIFAR10 data.\n", | |||
"data = get_CIFAR10_data()\n", | |||
"for k, v in data.items():\n", | |||
" print('%s: ' % k, v.shape)" | |||
] | |||
}, | |||
{ | |||
"cell_type": "markdown", | |||
"metadata": {}, | |||
"source": [ | |||
"## Batch normalization: forward\n", | |||
"\n", | |||
"在文件 `daseCV/layers` 中实现 `batchnorm_forward` 函数完成batch normalization的前向传播。然后运行以下代码测试你的实现是否准确。\n", | |||
"\n", | |||
"上面参考论文[1]可能会对你有帮助" | |||
] | |||
}, | |||
{ | |||
"cell_type": "code", | |||
"execution_count": null, | |||
"metadata": {}, | |||
"outputs": [], | |||
"source": [ | |||
"# Check the training-time forward pass by checking means and variances\n", | |||
"# of features both before and after batch normalization \n", | |||
"\n", | |||
"# Simulate the forward pass for a two-layer network\n", | |||
"np.random.seed(231)\n", | |||
"N, D1, D2, D3 = 200, 50, 60, 3\n", | |||
"X = np.random.randn(N, D1)\n", | |||
"W1 = np.random.randn(D1, D2)\n", | |||
"W2 = np.random.randn(D2, D3)\n", | |||
"a = np.maximum(0, X.dot(W1)).dot(W2)\n", | |||
"\n", | |||
"print('Before batch normalization:')\n", | |||
"print_mean_std(a,axis=0)\n", | |||
"\n", | |||
"gamma = np.ones((D3,))\n", | |||
"beta = np.zeros((D3,))\n", | |||
"# Means should be close to zero and stds close to one\n", | |||
"print('After batch normalization (gamma=1, beta=0)')\n", | |||
"a_norm, _ = batchnorm_forward(a, gamma, beta, {'mode': 'train'})\n", | |||
"print_mean_std(a_norm,axis=0)\n", | |||
"\n", | |||
"gamma = np.asarray([1.0, 2.0, 3.0])\n", | |||
"beta = np.asarray([11.0, 12.0, 13.0])\n", | |||
"# Now means should be close to beta and stds close to gamma\n", | |||
"print('After batch normalization (gamma=', gamma, ', beta=', beta, ')')\n", | |||
"a_norm, _ = batchnorm_forward(a, gamma, beta, {'mode': 'train'})\n", | |||
"print_mean_std(a_norm,axis=0)" | |||
] | |||
}, | |||
{ | |||
"cell_type": "code", | |||
"execution_count": null, | |||
"metadata": {}, | |||
"outputs": [], | |||
"source": [ | |||
"# Check the test-time forward pass by running the training-time\n", | |||
"# forward pass many times to warm up the running averages, and then\n", | |||
"# checking the means and variances of activations after a test-time\n", | |||
"# forward pass.\n", | |||
"\n", | |||
"np.random.seed(231)\n", | |||
"N, D1, D2, D3 = 200, 50, 60, 3\n", | |||
"W1 = np.random.randn(D1, D2)\n", | |||
"W2 = np.random.randn(D2, D3)\n", | |||
"\n", | |||
"bn_param = {'mode': 'train'}\n", | |||
"gamma = np.ones(D3)\n", | |||
"beta = np.zeros(D3)\n", | |||
"\n", | |||
"for t in range(50):\n", | |||
" X = np.random.randn(N, D1)\n", | |||
" a = np.maximum(0, X.dot(W1)).dot(W2)\n", | |||
" batchnorm_forward(a, gamma, beta, bn_param)\n", | |||
"\n", | |||
"bn_param['mode'] = 'test'\n", | |||
"X = np.random.randn(N, D1)\n", | |||
"a = np.maximum(0, X.dot(W1)).dot(W2)\n", | |||
"a_norm, _ = batchnorm_forward(a, gamma, beta, bn_param)\n", | |||
"\n", | |||
"# Means should be close to zero and stds close to one, but will be\n", | |||
"# noisier than training-time forward passes.\n", | |||
"print('After batch normalization (test-time):')\n", | |||
"print_mean_std(a_norm,axis=0)" | |||
] | |||
}, | |||
{ | |||
"cell_type": "markdown", | |||
"metadata": {}, | |||
"source": [ | |||
"## Batch normalization: backward\n", | |||
"在 `batchnorm_backward` 中实现batch normalization的反向传播\n", | |||
"\n", | |||
"要想得到反向传播的公式,你应该写出batch normalization的计算图,并且对每个中间节点求反向传播公式。一些中间节点可能有多个传出分支;注意要在反向传播中对这些分支的梯度求和。\n", | |||
"\n", | |||
"一旦你实现了该功能,请运行下面的代码进行梯度数值检测。" | |||
] | |||
}, | |||
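{ | |||
"cell_type": "markdown", | |||
"metadata": {}, | |||
"source": [ | |||
"The sum-over-branches rule in a tiny, hypothetical example (not batch normalization itself): when a variable feeds two branches of the graph, its gradient is the sum of the gradients coming back along each branch.\n", | |||
"\n", | |||
"```python\n", | |||
"# f(x, y) = x * y + x : the input x feeds two branches of the graph\n", | |||
"x, y = 3.0, 4.0\n", | |||
"a = x * y            # branch 1\n", | |||
"f = a + x            # branch 2 reuses x directly\n", | |||
"\n", | |||
"df = 1.0             # upstream gradient dL/df\n", | |||
"da = df              # through the add gate\n", | |||
"dx_branch1 = y * da  # gradient of x via a = x * y\n", | |||
"dx_branch2 = df      # gradient of x via the direct add\n", | |||
"dx = dx_branch1 + dx_branch2  # sum over branches: 4.0 + 1.0 = 5.0\n", | |||
"dy = x * da                   # 3.0\n", | |||
"```" | |||
] | |||
}, | |||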
{ | |||
"cell_type": "code", | |||
"execution_count": null, | |||
"metadata": {}, | |||
"outputs": [], | |||
"source": [ | |||
"# Gradient check batchnorm backward pass\n", | |||
"np.random.seed(231)\n", | |||
"N, D = 4, 5\n", | |||
"x = 5 * np.random.randn(N, D) + 12\n", | |||
"gamma = np.random.randn(D)\n", | |||
"beta = np.random.randn(D)\n", | |||
"dout = np.random.randn(N, D)\n", | |||
"\n", | |||
"bn_param = {'mode': 'train'}\n", | |||
"fx = lambda x: batchnorm_forward(x, gamma, beta, bn_param)[0]\n", | |||
"fg = lambda a: batchnorm_forward(x, a, beta, bn_param)[0]\n", | |||
"fb = lambda b: batchnorm_forward(x, gamma, b, bn_param)[0]\n", | |||
"\n", | |||
"dx_num = eval_numerical_gradient_array(fx, x, dout)\n", | |||
"da_num = eval_numerical_gradient_array(fg, gamma.copy(), dout)\n", | |||
"db_num = eval_numerical_gradient_array(fb, beta.copy(), dout)\n", | |||
"\n", | |||
"_, cache = batchnorm_forward(x, gamma, beta, bn_param)\n", | |||
"dx, dgamma, dbeta = batchnorm_backward(dout, cache)\n", | |||
"#You should expect to see relative errors between 1e-13 and 1e-8\n", | |||
"print('dx error: ', rel_error(dx_num, dx))\n", | |||
"print('dgamma error: ', rel_error(da_num, dgamma))\n", | |||
"print('dbeta error: ', rel_error(db_num, dbeta))" | |||
] | |||
}, | |||
{ | |||
"cell_type": "markdown", | |||
"metadata": {}, | |||
"source": [ | |||
"## Batch normalization: alternative backward\n", | |||
"\n", | |||
"课堂上我们讨论过两种求sigmoid反向传播公式的方法,第一种是写出计算图,然后对计算图中的每一个中间变量求导;另一种方法是在纸上计算好最终的梯度,得到一个很简单的公式。打个比方,你可以先在纸上算出sigmoid的反向传播公式,然后直接实现就可以了,不需要算中间变量的梯度。\n", | |||
"\n", | |||
"BN也有这个性质,你可以自己推一波公式!(接下来不翻译了,自己看)\n", | |||
"\n", | |||
"In the forward pass, given a set of inputs $X=\\begin{bmatrix}x_1\\\\x_2\\\\...\\\\x_N\\end{bmatrix}$, \n", | |||
"\n", | |||
"we first calculate the mean $\\mu$ and variance $v$.\n", | |||
"With $\\mu$ and $v$ calculated, we can calculate the standard deviation $\\sigma$ and normalized data $Y$.\n", | |||
"The equations and graph illustration below describe the computation ($y_i$ is the i-th element of the vector $Y$).\n", | |||
"\n", | |||
"\\begin{align}\n", | |||
"& \\mu=\\frac{1}{N}\\sum_{k=1}^N x_k & v=\\frac{1}{N}\\sum_{k=1}^N (x_k-\\mu)^2 \\\\\n", | |||
"& \\sigma=\\sqrt{v+\\epsilon} & y_i=\\frac{x_i-\\mu}{\\sigma}\n", | |||
"\\end{align}" | |||
] | |||
}, | |||
{ | |||
"cell_type": "markdown", | |||
"metadata": {}, | |||
"source": [ | |||
"<img src=\"notebook_images/batchnorm_graph.png\" width=691 height=202>" | |||
] | |||
}, | |||
{ | |||
"cell_type": "markdown", | |||
"metadata": { | |||
"tags": [ | |||
"pdf-ignore" | |||
] | |||
}, | |||
"source": [ | |||
"The meat of our problem during backpropagation is to compute $\\frac{\\partial L}{\\partial X}$, given the upstream gradient we receive, $\\frac{\\partial L}{\\partial Y}.$ To do this, recall the chain rule in calculus gives us $\\frac{\\partial L}{\\partial X} = \\frac{\\partial L}{\\partial Y} \\cdot \\frac{\\partial Y}{\\partial X}$.\n", | |||
"\n", | |||
"The unknown/hart part is $\\frac{\\partial Y}{\\partial X}$. We can find this by first deriving step-by-step our local gradients at \n", | |||
"$\\frac{\\partial v}{\\partial X}$, $\\frac{\\partial \\mu}{\\partial X}$,\n", | |||
"$\\frac{\\partial \\sigma}{\\partial v}$, \n", | |||
"$\\frac{\\partial Y}{\\partial \\sigma}$, and $\\frac{\\partial Y}{\\partial \\mu}$,\n", | |||
"and then use the chain rule to compose these gradients (which appear in the form of vectors!) appropriately to compute $\\frac{\\partial Y}{\\partial X}$.\n", | |||
"\n", | |||
"If it's challenging to directly reason about the gradients over $X$ and $Y$ which require matrix multiplication, try reasoning about the gradients in terms of individual elements $x_i$ and $y_i$ first: in that case, you will need to come up with the derivations for $\\frac{\\partial L}{\\partial x_i}$, by relying on the Chain Rule to first calculate the intermediate $\\frac{\\partial \\mu}{\\partial x_i}, \\frac{\\partial v}{\\partial x_i}, \\frac{\\partial \\sigma}{\\partial x_i},$ then assemble these pieces to calculate $\\frac{\\partial y_i}{\\partial x_i}$. \n", | |||
"\n", | |||
"You should make sure each of the intermediary gradient derivations are all as simplified as possible, for ease of implementation. \n", | |||
"\n", | |||
"\n", | |||
"算好之后,在 `batchnorm_backward_alt` 函数中实现简化版的batch normalization的反向传播公式,然后分别运行两种反向传播实现并比较结果,你的结果应该是一致的,但是简化版的实现应该会更快一点。" | |||
] | |||
}, | |||
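{ | |||
"cell_type": "markdown", | |||
"metadata": {}, | |||
"source": [ | |||
"For reference, one commonly derived closed form (a sketch to check your own derivation against, using the notation above where $y_i=(x_i-\\mu)/\\sigma$ and $\\frac{\\partial L}{\\partial y_i}$ is the upstream gradient with respect to the normalized values; for the full layer the incoming gradient is first multiplied elementwise by $\\gamma$):\n", | |||
"\n", | |||
"\\begin{align}\n", | |||
"\\frac{\\partial L}{\\partial x_i} = \\frac{1}{N\\sigma}\\left( N\\frac{\\partial L}{\\partial y_i} - \\sum_{j=1}^N \\frac{\\partial L}{\\partial y_j} - y_i \\sum_{j=1}^N \\frac{\\partial L}{\\partial y_j}\\, y_j \\right)\n", | |||
"\\end{align}\n", | |||
"\n", | |||
"The gradients for the scale and shift parameters follow directly by summing over the batch: $\\frac{\\partial L}{\\partial \\gamma_j} = \\sum_i \\frac{\\partial L}{\\partial o_{ij}}\\, \\hat{x}_{ij}$ and $\\frac{\\partial L}{\\partial \\beta_j} = \\sum_i \\frac{\\partial L}{\\partial o_{ij}}$, where $o$ is the layer output and $\\hat{x}$ the normalized input." | |||
] | |||
}, | |||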
{ | |||
"cell_type": "code", | |||
"execution_count": null, | |||
"metadata": {}, | |||
"outputs": [], | |||
"source": [ | |||
"np.random.seed(231)\n", | |||
"N, D = 100, 500\n", | |||
"x = 5 * np.random.randn(N, D) + 12\n", | |||
"gamma = np.random.randn(D)\n", | |||
"beta = np.random.randn(D)\n", | |||
"dout = np.random.randn(N, D)\n", | |||
"\n", | |||
"bn_param = {'mode': 'train'}\n", | |||
"out, cache = batchnorm_forward(x, gamma, beta, bn_param)\n", | |||
"\n", | |||
"t1 = time.time()\n", | |||
"dx1, dgamma1, dbeta1 = batchnorm_backward(dout, cache)\n", | |||
"t2 = time.time()\n", | |||
"dx2, dgamma2, dbeta2 = batchnorm_backward_alt(dout, cache)\n", | |||
"t3 = time.time()\n", | |||
"\n", | |||
"print('dx difference: ', rel_error(dx1, dx2))\n", | |||
"print('dgamma difference: ', rel_error(dgamma1, dgamma2))\n", | |||
"print('dbeta difference: ', rel_error(dbeta1, dbeta2))\n", | |||
"print('speedup: %.2fx' % ((t2 - t1) / (t3 - t2)))" | |||
] | |||
}, | |||
{ | |||
"cell_type": "markdown", | |||
"metadata": {}, | |||
"source": [ | |||
"## Fully Connected Nets with Batch Normalization\n", | |||
"\n", | |||
"现在你已经实现了Batch Normalization,请在`daseCV/classifiers/fc_net.py`中的`FullyConnectedNet`上添加Batch Norm。\n", | |||
"\n", | |||
"具体来说,当在构造函数中`normalization`标记设置为`batchnorm`时,应该在每个ReLU激活层之前插入一个Batch Norm层。网络最后一层的输出不应该加Batch Norm。\n", | |||
"\n", | |||
"当你完成该功能,运行以下代码进行梯度检查。\n", | |||
"\n", | |||
"HINT: You might find it useful to define an additional helper layer similar to those in the file `daseCV/layer_utils.py`. If you decide to do so, do it in the file `daseCV/classifiers/fc_net.py`." | |||
] | |||
}, | |||
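{ | |||
"cell_type": "markdown", | |||
"metadata": {}, | |||
"source": [ | |||
"One possible shape for the helper hinted at above, mirroring `affine_relu_forward` in `daseCV/layer_utils.py`. This is only a sketch under the assumption that `affine_forward/backward`, `batchnorm_forward/backward`, and `relu_forward/backward` behave as defined in `daseCV/layers.py`; the helper name itself is made up.\n", | |||
"\n", | |||
"```python\n", | |||
"def affine_bn_relu_forward(x, w, b, gamma, beta, bn_param):\n", | |||
"    a, fc_cache = affine_forward(x, w, b)                        # affine\n", | |||
"    an, bn_cache = batchnorm_forward(a, gamma, beta, bn_param)   # batch norm\n", | |||
"    out, relu_cache = relu_forward(an)                           # ReLU\n", | |||
"    return out, (fc_cache, bn_cache, relu_cache)\n", | |||
"\n", | |||
"def affine_bn_relu_backward(dout, cache):\n", | |||
"    fc_cache, bn_cache, relu_cache = cache\n", | |||
"    dan = relu_backward(dout, relu_cache)\n", | |||
"    da, dgamma, dbeta = batchnorm_backward(dan, bn_cache)\n", | |||
"    dx, dw, db = affine_backward(da, fc_cache)\n", | |||
"    return dx, dw, db, dgamma, dbeta\n", | |||
"```" | |||
] | |||
}, | |||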
{ | |||
"cell_type": "code", | |||
"execution_count": null, | |||
"metadata": {}, | |||
"outputs": [], | |||
"source": [ | |||
"np.random.seed(231)\n", | |||
"N, D, H1, H2, C = 2, 15, 20, 30, 10\n", | |||
"X = np.random.randn(N, D)\n", | |||
"y = np.random.randint(C, size=(N,))\n", | |||
"\n", | |||
"# You should expect losses between 1e-4~1e-10 for W, \n", | |||
"# losses between 1e-08~1e-10 for b,\n", | |||
"# and losses between 1e-08~1e-09 for beta and gammas.\n", | |||
"for reg in [0, 3.14]:\n", | |||
" print('Running check with reg = ', reg)\n", | |||
" model = FullyConnectedNet([H1, H2], input_dim=D, num_classes=C,\n", | |||
" reg=reg, weight_scale=5e-2, dtype=np.float64,\n", | |||
" normalization='batchnorm')\n", | |||
"\n", | |||
" loss, grads = model.loss(X, y)\n", | |||
" print('Initial loss: ', loss)\n", | |||
"\n", | |||
" for name in sorted(grads):\n", | |||
" f = lambda _: model.loss(X, y)[0]\n", | |||
" grad_num = eval_numerical_gradient(f, model.params[name], verbose=False, h=1e-5)\n", | |||
" print('%s relative error: %.2e' % (name, rel_error(grad_num, grads[name])))\n", | |||
" if reg == 0: print()" | |||
] | |||
}, | |||
{ | |||
"cell_type": "markdown", | |||
"metadata": {}, | |||
"source": [ | |||
"# Batchnorm for deep networks\n", | |||
"\n", | |||
"运行以下代码,在1000个样本的子集上训练一个六层网络,包括有和没有Batch Norm的版本。" | |||
] | |||
}, | |||
{ | |||
"cell_type": "code", | |||
"execution_count": null, | |||
"metadata": {}, | |||
"outputs": [], | |||
"source": [ | |||
"np.random.seed(231)\n", | |||
"# Try training a very deep net with batchnorm\n", | |||
"hidden_dims = [100, 100, 100, 100, 100]\n", | |||
"\n", | |||
"num_train = 1000\n", | |||
"small_data = {\n", | |||
" 'X_train': data['X_train'][:num_train],\n", | |||
" 'y_train': data['y_train'][:num_train],\n", | |||
" 'X_val': data['X_val'],\n", | |||
" 'y_val': data['y_val'],\n", | |||
"}\n", | |||
"\n", | |||
"weight_scale = 2e-2\n", | |||
"bn_model = FullyConnectedNet(hidden_dims, weight_scale=weight_scale, normalization='batchnorm')\n", | |||
"model = FullyConnectedNet(hidden_dims, weight_scale=weight_scale, normalization=None)\n", | |||
"\n", | |||
"print('Solver with batch norm:')\n", | |||
"bn_solver = Solver(bn_model, small_data,\n", | |||
" num_epochs=10, batch_size=50,\n", | |||
" update_rule='adam',\n", | |||
" optim_config={\n", | |||
" 'learning_rate': 1e-3,\n", | |||
" },\n", | |||
" verbose=True,print_every=20)\n", | |||
"bn_solver.train()\n", | |||
"\n", | |||
"print('\\nSolver without batch norm:')\n", | |||
"solver = Solver(model, small_data,\n", | |||
" num_epochs=10, batch_size=50,\n", | |||
" update_rule='adam',\n", | |||
" optim_config={\n", | |||
" 'learning_rate': 1e-3,\n", | |||
" },\n", | |||
" verbose=True, print_every=20)\n", | |||
"solver.train()" | |||
] | |||
}, | |||
{ | |||
"cell_type": "markdown", | |||
"metadata": {}, | |||
"source": [ | |||
"运行以下命令来可视化上面训练的两个网络的结果。你会发现,使用Batch Norm有助于网络更快地收敛。" | |||
] | |||
}, | |||
{ | |||
"cell_type": "code", | |||
"execution_count": null, | |||
"metadata": { | |||
"tags": [ | |||
"pdf-ignore-input" | |||
] | |||
}, | |||
"outputs": [], | |||
"source": [ | |||
"def plot_training_history(title, label, baseline, bn_solvers, plot_fn, bl_marker='.', bn_marker='.', labels=None):\n", | |||
" \"\"\"utility function for plotting training history\"\"\"\n", | |||
" plt.title(title)\n", | |||
" plt.xlabel(label)\n", | |||
" bn_plots = [plot_fn(bn_solver) for bn_solver in bn_solvers]\n", | |||
" bl_plot = plot_fn(baseline)\n", | |||
" num_bn = len(bn_plots)\n", | |||
" for i in range(num_bn):\n", | |||
" label='with_norm'\n", | |||
" if labels is not None:\n", | |||
" label += str(labels[i])\n", | |||
" plt.plot(bn_plots[i], bn_marker, label=label)\n", | |||
" label='baseline'\n", | |||
" if labels is not None:\n", | |||
" label += str(labels[0])\n", | |||
" plt.plot(bl_plot, bl_marker, label=label)\n", | |||
" plt.legend(loc='lower center', ncol=num_bn+1) \n", | |||
"\n", | |||
" \n", | |||
"plt.subplot(3, 1, 1)\n", | |||
"plot_training_history('Training loss','Iteration', solver, [bn_solver], \\\n", | |||
" lambda x: x.loss_history, bl_marker='o', bn_marker='o')\n", | |||
"plt.subplot(3, 1, 2)\n", | |||
"plot_training_history('Training accuracy','Epoch', solver, [bn_solver], \\\n", | |||
" lambda x: x.train_acc_history, bl_marker='-o', bn_marker='-o')\n", | |||
"plt.subplot(3, 1, 3)\n", | |||
"plot_training_history('Validation accuracy','Epoch', solver, [bn_solver], \\\n", | |||
" lambda x: x.val_acc_history, bl_marker='-o', bn_marker='-o')\n", | |||
"\n", | |||
"plt.gcf().set_size_inches(15, 15)\n", | |||
"plt.show()" | |||
] | |||
}, | |||
{ | |||
"cell_type": "markdown", | |||
"metadata": {}, | |||
"source": [ | |||
"# Batch normalization and initialization\n", | |||
"\n", | |||
"我们将进行一个小实验来研究Batch Norm和权值初始化之间的相互关系。\n", | |||
"\n", | |||
"下面代码将训练8层网络,分别使用不同规模的权重初始化进行Batch Norm和不进行Batch Norm。\n", | |||
"然后绘制训练精度、验证集精度、训练损失。" | |||
] | |||
}, | |||
{ | |||
"cell_type": "code", | |||
"execution_count": null, | |||
"metadata": { | |||
"tags": [ | |||
"pdf-ignore-input" | |||
] | |||
}, | |||
"outputs": [], | |||
"source": [ | |||
"np.random.seed(231)\n", | |||
"# Try training a very deep net with batchnorm\n", | |||
"hidden_dims = [50, 50, 50, 50, 50, 50, 50]\n", | |||
"num_train = 1000\n", | |||
"small_data = {\n", | |||
" 'X_train': data['X_train'][:num_train],\n", | |||
" 'y_train': data['y_train'][:num_train],\n", | |||
" 'X_val': data['X_val'],\n", | |||
" 'y_val': data['y_val'],\n", | |||
"}\n", | |||
"\n", | |||
"bn_solvers_ws = {}\n", | |||
"solvers_ws = {}\n", | |||
"weight_scales = np.logspace(-4, 0, num=20)\n", | |||
"for i, weight_scale in enumerate(weight_scales):\n", | |||
" print('Running weight scale %d / %d' % (i + 1, len(weight_scales)))\n", | |||
" bn_model = FullyConnectedNet(hidden_dims, weight_scale=weight_scale, normalization='batchnorm')\n", | |||
" model = FullyConnectedNet(hidden_dims, weight_scale=weight_scale, normalization=None)\n", | |||
"\n", | |||
" bn_solver = Solver(bn_model, small_data,\n", | |||
" num_epochs=10, batch_size=50,\n", | |||
" update_rule='adam',\n", | |||
" optim_config={\n", | |||
" 'learning_rate': 1e-3,\n", | |||
" },\n", | |||
" verbose=False, print_every=200)\n", | |||
" bn_solver.train()\n", | |||
" bn_solvers_ws[weight_scale] = bn_solver\n", | |||
"\n", | |||
" solver = Solver(model, small_data,\n", | |||
" num_epochs=10, batch_size=50,\n", | |||
" update_rule='adam',\n", | |||
" optim_config={\n", | |||
" 'learning_rate': 1e-3,\n", | |||
" },\n", | |||
" verbose=False, print_every=200)\n", | |||
" solver.train()\n", | |||
" solvers_ws[weight_scale] = solver" | |||
] | |||
}, | |||
{ | |||
"cell_type": "code", | |||
"execution_count": null, | |||
"metadata": { | |||
"tags": [ | |||
"pdf-ignore-input" | |||
] | |||
}, | |||
"outputs": [], | |||
"source": [ | |||
"# Plot results of weight scale experiment\n", | |||
"best_train_accs, bn_best_train_accs = [], []\n", | |||
"best_val_accs, bn_best_val_accs = [], []\n", | |||
"final_train_loss, bn_final_train_loss = [], []\n", | |||
"\n", | |||
"for ws in weight_scales:\n", | |||
" best_train_accs.append(max(solvers_ws[ws].train_acc_history))\n", | |||
" bn_best_train_accs.append(max(bn_solvers_ws[ws].train_acc_history))\n", | |||
" \n", | |||
" best_val_accs.append(max(solvers_ws[ws].val_acc_history))\n", | |||
" bn_best_val_accs.append(max(bn_solvers_ws[ws].val_acc_history))\n", | |||
" \n", | |||
" final_train_loss.append(np.mean(solvers_ws[ws].loss_history[-100:]))\n", | |||
" bn_final_train_loss.append(np.mean(bn_solvers_ws[ws].loss_history[-100:]))\n", | |||
" \n", | |||
"plt.subplot(3, 1, 1)\n", | |||
"plt.title('Best val accuracy vs weight initialization scale')\n", | |||
"plt.xlabel('Weight initialization scale')\n", | |||
"plt.ylabel('Best val accuracy')\n", | |||
"plt.semilogx(weight_scales, best_val_accs, '-o', label='baseline')\n", | |||
"plt.semilogx(weight_scales, bn_best_val_accs, '-o', label='batchnorm')\n", | |||
"plt.legend(ncol=2, loc='lower right')\n", | |||
"\n", | |||
"plt.subplot(3, 1, 2)\n", | |||
"plt.title('Best train accuracy vs weight initialization scale')\n", | |||
"plt.xlabel('Weight initialization scale')\n", | |||
"plt.ylabel('Best training accuracy')\n", | |||
"plt.semilogx(weight_scales, best_train_accs, '-o', label='baseline')\n", | |||
"plt.semilogx(weight_scales, bn_best_train_accs, '-o', label='batchnorm')\n", | |||
"plt.legend()\n", | |||
"\n", | |||
"plt.subplot(3, 1, 3)\n", | |||
"plt.title('Final training loss vs weight initialization scale')\n", | |||
"plt.xlabel('Weight initialization scale')\n", | |||
"plt.ylabel('Final training loss')\n", | |||
"plt.semilogx(weight_scales, final_train_loss, '-o', label='baseline')\n", | |||
"plt.semilogx(weight_scales, bn_final_train_loss, '-o', label='batchnorm')\n", | |||
"plt.legend()\n", | |||
"plt.gca().set_ylim(1.0, 3.5)\n", | |||
"\n", | |||
"plt.gcf().set_size_inches(15, 15)\n", | |||
"plt.show()" | |||
] | |||
}, | |||
{ | |||
"cell_type": "markdown", | |||
"metadata": { | |||
"tags": [ | |||
"pdf-inline" | |||
] | |||
}, | |||
"source": [ | |||
"## Inline Question 1:\n", | |||
"描述一下这个实验的结果。权重初始化的规模如何影响 带有/没有Batch Norm的模型,为什么?\n", | |||
"\n", | |||
"## Answer:\n", | |||
"[FILL THIS IN]\n" | |||
] | |||
}, | |||
{ | |||
"cell_type": "markdown", | |||
"metadata": {}, | |||
"source": [ | |||
"# Batch normalization and batch size\n", | |||
"\n", | |||
"我们将进行一个小实验来研究Batch Norm和batch size之间的相互关系。\n", | |||
"\n", | |||
"下面的代码将使用不同的batch size来训练带有/没有Batch Norm的6层网络。\n", | |||
"然后将绘制随时间变化的训练准确率和验证集的准确率。" | |||
] | |||
}, | |||
{ | |||
"cell_type": "code", | |||
"execution_count": null, | |||
"metadata": { | |||
"tags": [ | |||
"pdf-ignore-input" | |||
] | |||
}, | |||
"outputs": [], | |||
"source": [ | |||
"def run_batchsize_experiments(normalization_mode):\n", | |||
" np.random.seed(231)\n", | |||
" # Try training a very deep net with batchnorm\n", | |||
" hidden_dims = [100, 100, 100, 100, 100]\n", | |||
" num_train = 1000\n", | |||
" small_data = {\n", | |||
" 'X_train': data['X_train'][:num_train],\n", | |||
" 'y_train': data['y_train'][:num_train],\n", | |||
" 'X_val': data['X_val'],\n", | |||
" 'y_val': data['y_val'],\n", | |||
" }\n", | |||
" n_epochs=10\n", | |||
" weight_scale = 2e-2\n", | |||
" batch_sizes = [5,10,50]\n", | |||
" lr = 10**(-3.5)\n", | |||
" solver_bsize = batch_sizes[0]\n", | |||
"\n", | |||
" print('No normalization: batch size = ',solver_bsize)\n", | |||
" model = FullyConnectedNet(hidden_dims, weight_scale=weight_scale, normalization=None)\n", | |||
" solver = Solver(model, small_data,\n", | |||
" num_epochs=n_epochs, batch_size=solver_bsize,\n", | |||
" update_rule='adam',\n", | |||
" optim_config={\n", | |||
" 'learning_rate': lr,\n", | |||
" },\n", | |||
" verbose=False)\n", | |||
" solver.train()\n", | |||
" \n", | |||
" bn_solvers = []\n", | |||
" for i in range(len(batch_sizes)):\n", | |||
" b_size=batch_sizes[i]\n", | |||
" print('Normalization: batch size = ',b_size)\n", | |||
" bn_model = FullyConnectedNet(hidden_dims, weight_scale=weight_scale, normalization=normalization_mode)\n", | |||
" bn_solver = Solver(bn_model, small_data,\n", | |||
" num_epochs=n_epochs, batch_size=b_size,\n", | |||
" update_rule='adam',\n", | |||
" optim_config={\n", | |||
" 'learning_rate': lr,\n", | |||
" },\n", | |||
" verbose=False)\n", | |||
" bn_solver.train()\n", | |||
" bn_solvers.append(bn_solver)\n", | |||
" \n", | |||
" return bn_solvers, solver, batch_sizes\n", | |||
"\n", | |||
"batch_sizes = [5,10,50]\n", | |||
"bn_solvers_bsize, solver_bsize, batch_sizes = run_batchsize_experiments('batchnorm')" | |||
] | |||
}, | |||
{ | |||
"cell_type": "code", | |||
"execution_count": null, | |||
"metadata": {}, | |||
"outputs": [], | |||
"source": [ | |||
"plt.subplot(2, 1, 1)\n", | |||
"plot_training_history('Training accuracy (Batch Normalization)','Epoch', solver_bsize, bn_solvers_bsize, \\\n", | |||
" lambda x: x.train_acc_history, bl_marker='-^', bn_marker='-o', labels=batch_sizes)\n", | |||
"plt.subplot(2, 1, 2)\n", | |||
"plot_training_history('Validation accuracy (Batch Normalization)','Epoch', solver_bsize, bn_solvers_bsize, \\\n", | |||
" lambda x: x.val_acc_history, bl_marker='-^', bn_marker='-o', labels=batch_sizes)\n", | |||
"\n", | |||
"plt.gcf().set_size_inches(15, 10)\n", | |||
"plt.show()" | |||
] | |||
}, | |||
{ | |||
"cell_type": "markdown", | |||
"metadata": { | |||
"tags": [ | |||
"pdf-inline" | |||
] | |||
}, | |||
"source": [ | |||
"## Inline Question 2:\n", | |||
"描述一下这个实验的结果。请问Batch Norm和batch size之间的又什么关系?为什么会出现这种关系?\n", | |||
"\n", | |||
"## Answer:\n", | |||
"[FILL THIS IN]\n" | |||
] | |||
}, | |||
{ | |||
"cell_type": "markdown", | |||
"metadata": {}, | |||
"source": [ | |||
"# Layer Normalization\n", | |||
"\n", | |||
"(这里大概讲的是batch norm受限于batch size的取值,但是受限于硬件资源,batch size不能取太大,所以提出了layer norm,对一个样本的特征向量进行归一化,均值和方差由该样本的特征向量的所有元素算出来,具体的自己看英文和论文。)\n", | |||
"\n", | |||
"Batch normalization has proved to be effective in making networks easier to train, but the dependency on batch size makes it less useful in complex networks which have a cap on the input batch size due to hardware limitations. \n", | |||
"\n", | |||
"Several alternatives to batch normalization have been proposed to mitigate this problem; one such technique is Layer Normalization [2]. Instead of normalizing over the batch, we normalize over the features. In other words, when using Layer Normalization, each feature vector corresponding to a single datapoint is normalized based on the sum of all terms within that feature vector.\n", | |||
"\n", | |||
"[2] [Ba, Jimmy Lei, Jamie Ryan Kiros, and Geoffrey E. Hinton. \"Layer Normalization.\" stat 1050 (2016): 21.](https://arxiv.org/pdf/1607.06450.pdf)" | |||
] | |||
}, | |||
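{ | |||
"cell_type": "markdown", | |||
"metadata": {}, | |||
"source": [ | |||
"The difference in which axis the statistics are computed over, as a rough numpy sketch (learnable scale/shift and `eps` are omitted for clarity):\n", | |||
"\n", | |||
"```python\n", | |||
"import numpy as np\n", | |||
"\n", | |||
"x = np.random.randn(4, 6)  # a minibatch of shape (N, D)\n", | |||
"\n", | |||
"# Batch normalization: statistics per feature, computed across the N samples.\n", | |||
"bn = (x - x.mean(axis=0)) / x.std(axis=0)   # each column -> mean 0, std 1\n", | |||
"\n", | |||
"# Layer normalization: statistics per sample, computed across the D features.\n", | |||
"ln = (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, keepdims=True)\n", | |||
"# each row -> mean 0, std 1, independent of the other samples in the batch\n", | |||
"```" | |||
] | |||
}, | |||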
{ | |||
"cell_type": "markdown", | |||
"metadata": { | |||
"tags": [ | |||
"pdf-inline" | |||
] | |||
}, | |||
"source": [ | |||
"## Inline Question 3:\n", | |||
"\n", | |||
"下面的数据预处理步骤中,哪些类似于Batch Norm,哪些类似于Layer Norm?\n", | |||
"\n", | |||
"1. Scaling each image in the dataset, so that the RGB channels for each row of pixels within an image sums up to 1.\n", | |||
"2. Scaling each image in the dataset, so that the RGB channels for all pixels within an image sums up to 1. \n", | |||
"3. Subtracting the mean image of the dataset from each image in the dataset.\n", | |||
"4. Setting all RGB values to either 0 or 1 depending on a given threshold.\n", | |||
"\n", | |||
"## Answer:\n", | |||
"[FILL THIS IN]\n" | |||
] | |||
}, | |||
{ | |||
"cell_type": "markdown", | |||
"metadata": {}, | |||
"source": [ | |||
"# Layer Normalization: Implementation\n", | |||
"\n", | |||
"现在你要实现layer normalization。这步应该相对简单,因为在概念上,layer norm的实现几乎与batch norm一样。不过一个重要的区别是,对于layer norm,我们使用moments,并且测试阶段与训练阶段是相同的,每个数据样本直接计算平均值和方差。\n", | |||
"\n", | |||
"你要完成下面的工作\n", | |||
"\n", | |||
"* 实现 `daseCV/layers.py` 中的`layernorm_forward`。 \n", | |||
"\n", | |||
"运行下面第一个cell检查你的结果\n", | |||
"\n", | |||
"* 实现 `daseCV/layers.py` 中的`layernorm_backward`。\n", | |||
"运行下面第二个cell检查你的结果\n", | |||
"\n", | |||
"* 修改 `daseCV/classifiers/fc_net.py`,在`FullyConnectedNet`上增加layer normalization。当构造函数中的`normalization`标记为`\"layernorm\"`时,你应该在每个ReLU层前插入layer normalization层。\n", | |||
"\n", | |||
"运行下面第三个cell进行关于在layer normalization上的batch size的实验。" | |||
] | |||
}, | |||
{ | |||
"cell_type": "code", | |||
"execution_count": null, | |||
"metadata": {}, | |||
"outputs": [], | |||
"source": [ | |||
"# Check the training-time forward pass by checking means and variances\n", | |||
"# of features both before and after layer normalization \n", | |||
"\n", | |||
"# Simulate the forward pass for a two-layer network\n", | |||
"np.random.seed(231)\n", | |||
"N, D1, D2, D3 =4, 50, 60, 3\n", | |||
"X = np.random.randn(N, D1)\n", | |||
"W1 = np.random.randn(D1, D2)\n", | |||
"W2 = np.random.randn(D2, D3)\n", | |||
"a = np.maximum(0, X.dot(W1)).dot(W2)\n", | |||
"\n", | |||
"print('Before layer normalization:')\n", | |||
"print_mean_std(a,axis=1)\n", | |||
"\n", | |||
"gamma = np.ones(D3)\n", | |||
"beta = np.zeros(D3)\n", | |||
"# Means should be close to zero and stds close to one\n", | |||
"print('After layer normalization (gamma=1, beta=0)')\n", | |||
"a_norm, _ = layernorm_forward(a, gamma, beta, {'mode': 'train'})\n", | |||
"print_mean_std(a_norm,axis=1)\n", | |||
"\n", | |||
"gamma = np.asarray([3.0,3.0,3.0])\n", | |||
"beta = np.asarray([5.0,5.0,5.0])\n", | |||
"# Now means should be close to beta and stds close to gamma\n", | |||
"print('After layer normalization (gamma=', gamma, ', beta=', beta, ')')\n", | |||
"a_norm, _ = layernorm_forward(a, gamma, beta, {'mode': 'train'})\n", | |||
"print_mean_std(a_norm,axis=1)" | |||
] | |||
}, | |||
{ | |||
"cell_type": "code", | |||
"execution_count": null, | |||
"metadata": {}, | |||
"outputs": [], | |||
"source": [ | |||
"# Gradient check batchnorm backward pass\n", | |||
"np.random.seed(231)\n", | |||
"N, D = 4, 5\n", | |||
"x = 5 * np.random.randn(N, D) + 12\n", | |||
"gamma = np.random.randn(D)\n", | |||
"beta = np.random.randn(D)\n", | |||
"dout = np.random.randn(N, D)\n", | |||
"\n", | |||
"ln_param = {}\n", | |||
"fx = lambda x: layernorm_forward(x, gamma, beta, ln_param)[0]\n", | |||
"fg = lambda a: layernorm_forward(x, a, beta, ln_param)[0]\n", | |||
"fb = lambda b: layernorm_forward(x, gamma, b, ln_param)[0]\n", | |||
"\n", | |||
"dx_num = eval_numerical_gradient_array(fx, x, dout)\n", | |||
"da_num = eval_numerical_gradient_array(fg, gamma.copy(), dout)\n", | |||
"db_num = eval_numerical_gradient_array(fb, beta.copy(), dout)\n", | |||
"\n", | |||
"_, cache = layernorm_forward(x, gamma, beta, ln_param)\n", | |||
"dx, dgamma, dbeta = layernorm_backward(dout, cache)\n", | |||
"\n", | |||
"#You should expect to see relative errors between 1e-12 and 1e-8\n", | |||
"print('dx error: ', rel_error(dx_num, dx))\n", | |||
"print('dgamma error: ', rel_error(da_num, dgamma))\n", | |||
"print('dbeta error: ', rel_error(db_num, dbeta))" | |||
] | |||
}, | |||
{ | |||
"cell_type": "markdown", | |||
"metadata": {}, | |||
"source": [ | |||
"# Layer Normalization and batch size\n", | |||
"\n", | |||
"我们将使用layer norm来进行前面的batch size实验。与之前的实验相比,batch size对训练精度的影响要小得多!" | |||
] | |||
}, | |||
{ | |||
"cell_type": "code", | |||
"execution_count": null, | |||
"metadata": {}, | |||
"outputs": [], | |||
"source": [ | |||
"ln_solvers_bsize, solver_bsize, batch_sizes = run_batchsize_experiments('layernorm')\n", | |||
"\n", | |||
"plt.subplot(2, 1, 1)\n", | |||
"plot_training_history('Training accuracy (Layer Normalization)','Epoch', solver_bsize, ln_solvers_bsize, \\\n", | |||
" lambda x: x.train_acc_history, bl_marker='-^', bn_marker='-o', labels=batch_sizes)\n", | |||
"plt.subplot(2, 1, 2)\n", | |||
"plot_training_history('Validation accuracy (Layer Normalization)','Epoch', solver_bsize, ln_solvers_bsize, \\\n", | |||
" lambda x: x.val_acc_history, bl_marker='-^', bn_marker='-o', labels=batch_sizes)\n", | |||
"\n", | |||
"plt.gcf().set_size_inches(15, 10)\n", | |||
"plt.show()" | |||
] | |||
}, | |||
{ | |||
"cell_type": "markdown", | |||
"metadata": { | |||
"tags": [ | |||
"pdf-inline" | |||
] | |||
}, | |||
"source": [ | |||
"## Inline Question 4:\n", | |||
"什么时候layer normalization可能不工作(不起作用),为什么?\n", | |||
"\n", | |||
"1. 在非常深的网络上使用\n", | |||
"2. 特征的维度非常的小\n", | |||
"3. 有非常高的正则化项\n", | |||
"\n", | |||
"\n", | |||
"## Answer:\n", | |||
"[FILL THIS IN]\n" | |||
] | |||
}, | |||
{ | |||
"cell_type": "markdown", | |||
"metadata": {}, | |||
"source": [ | |||
"---\n", | |||
"# 重要\n", | |||
"\n", | |||
"这里是作业的结尾处,请执行以下步骤:\n", | |||
"\n", | |||
"1. 点击`File -> Save`或者用`control+s`组合键,确保你最新的的notebook的作业已经保存到谷歌云。\n", | |||
"2. 执行以下代码确保 `.py` 文件保存回你的谷歌云。" | |||
] | |||
}, | |||
{ | |||
"cell_type": "code", | |||
"execution_count": null, | |||
"metadata": {}, | |||
"outputs": [], | |||
"source": [ | |||
"import os\n", | |||
"\n", | |||
"FOLDER_TO_SAVE = os.path.join('drive/My Drive/', FOLDERNAME)\n", | |||
"FILES_TO_SAVE = ['daseCV/classifiers/cnn.py', 'daseCV/classifiers/fc_net.py']\n", | |||
"\n", | |||
"for files in FILES_TO_SAVE:\n", | |||
" with open(os.path.join(FOLDER_TO_SAVE, '/'.join(files.split('/')[1:])), 'w') as f:\n", | |||
" f.write(''.join(open(files).readlines()))" | |||
] | |||
} | |||
], | |||
"metadata": { | |||
"kernelspec": { | |||
"display_name": "Python 3", | |||
"language": "python", | |||
"name": "python3" | |||
}, | |||
"language_info": { | |||
"codemirror_mode": { | |||
"name": "ipython", | |||
"version": 3 | |||
}, | |||
"file_extension": ".py", | |||
"mimetype": "text/x-python", | |||
"name": "python", | |||
"nbconvert_exporter": "python", | |||
"pygments_lexer": "ipython3", | |||
"version": "3.7.0" | |||
} | |||
}, | |||
"nbformat": 4, | |||
"nbformat_minor": 4 | |||
} |
@ -0,0 +1,972 @@ | |||
{ | |||
"cells": [ | |||
{ | |||
"cell_type": "code", | |||
"execution_count": null, | |||
"metadata": {}, | |||
"outputs": [], | |||
"source": [ | |||
"from google.colab import drive\n", | |||
"\n", | |||
"drive.mount('/content/drive', force_remount=True)\n", | |||
"\n", | |||
"# 输入daseCV所在的路径\n", | |||
"# 'daseCV' 文件夹包括 '.py', 'classifiers' 和'datasets'文件夹\n", | |||
"# 例如 'CV/assignments/assignment1/daseCV/'\n", | |||
"FOLDERNAME = None\n", | |||
"\n", | |||
"assert FOLDERNAME is not None, \"[!] Enter the foldername.\"\n", | |||
"\n", | |||
"%cd drive/My\\ Drive\n", | |||
"%cp -r $FOLDERNAME ../../\n", | |||
"%cd ../../\n", | |||
"%cd daseCV/datasets/\n", | |||
"!bash get_datasets.sh\n", | |||
"%cd ../../" | |||
] | |||
}, | |||
{ | |||
"cell_type": "markdown", | |||
"metadata": { | |||
"tags": [ | |||
"pdf-title" | |||
] | |||
}, | |||
"source": [ | |||
"# 卷积网络\n", | |||
"到目前为止,我们已经成功使用深层全连接网络,并使用它们来探索不同的优化策略和网络结构。全连接网络是很好的实验平台,因为它们的计算效率很高,但实际上,所有最新结果都使用卷积网络。\n", | |||
"\n", | |||
"首先,你将实现几个在卷积网络中使用的层类型。然后,您将使用这些层在CIFAR-10数据集上训练卷积网络。" | |||
] | |||
}, | |||
{ | |||
"cell_type": "code", | |||
"execution_count": null, | |||
"metadata": { | |||
"tags": [ | |||
"pdf-ignore" | |||
] | |||
}, | |||
"outputs": [], | |||
"source": [ | |||
"# As usual, a bit of setup\n", | |||
"import numpy as np\n", | |||
"import matplotlib.pyplot as plt\n", | |||
"from daseCV.classifiers.cnn import *\n", | |||
"from daseCV.data_utils import get_CIFAR10_data\n", | |||
"from daseCV.gradient_check import eval_numerical_gradient_array, eval_numerical_gradient\n", | |||
"from daseCV.layers import *\n", | |||
"from daseCV.fast_layers import *\n", | |||
"from daseCV.solver import Solver\n", | |||
"\n", | |||
"%matplotlib inline\n", | |||
"plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots\n", | |||
"plt.rcParams['image.interpolation'] = 'nearest'\n", | |||
"plt.rcParams['image.cmap'] = 'gray'\n", | |||
"\n", | |||
"# for auto-reloading external modules\n", | |||
"# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython\n", | |||
"%load_ext autoreload\n", | |||
"%autoreload 2\n", | |||
"\n", | |||
"def rel_error(x, y):\n", | |||
" \"\"\" returns relative error \"\"\"\n", | |||
" return np.max(np.abs(x - y) / (np.maximum(1e-8, np.abs(x) + np.abs(y))))" | |||
] | |||
}, | |||
{ | |||
"cell_type": "code", | |||
"execution_count": null, | |||
"metadata": { | |||
"tags": [ | |||
"pdf-ignore" | |||
] | |||
}, | |||
"outputs": [], | |||
"source": [ | |||
"# Load the (preprocessed) CIFAR10 data.\n", | |||
"\n", | |||
"data = get_CIFAR10_data()\n", | |||
"for k, v in data.items():\n", | |||
" print('%s: ' % k, v.shape)" | |||
] | |||
}, | |||
{ | |||
"cell_type": "markdown", | |||
"metadata": {}, | |||
"source": [ | |||
"# 卷积:简单的正向传播\n", | |||
"卷积网络的核心是卷积运算。在文件 `daseCV/layers.py` 中的函数`conv_forward_naive`里实现卷积层的正向传播。\n", | |||
"\n", | |||
"此时,你不必太担心效率。只需以你最清楚的方式编写代码即可。\n", | |||
"\n", | |||
"您可以通过运行以下cell来测试你的代码:" | |||
] | |||
}, | |||
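{ | |||
"cell_type": "markdown", | |||
"metadata": {}, | |||
"source": [ | |||
"One possible loop structure for the naive forward pass, as a sketch only (the real `conv_forward_naive` in `daseCV/layers.py` must also return a cache for the backward pass):\n", | |||
"\n", | |||
"```python\n", | |||
"import numpy as np\n", | |||
"\n", | |||
"def conv_forward_sketch(x, w, b, conv_param):\n", | |||
"    N, C, H, W = x.shape\n", | |||
"    F, _, HH, WW = w.shape\n", | |||
"    stride, pad = conv_param['stride'], conv_param['pad']\n", | |||
"    H_out = 1 + (H + 2 * pad - HH) // stride\n", | |||
"    W_out = 1 + (W + 2 * pad - WW) // stride\n", | |||
"    x_pad = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)), mode='constant')\n", | |||
"    out = np.zeros((N, F, H_out, W_out))\n", | |||
"    for n in range(N):                     # every image\n", | |||
"        for f in range(F):                 # every filter\n", | |||
"            for i in range(H_out):         # every output row\n", | |||
"                for j in range(W_out):     # every output column\n", | |||
"                    h0, w0 = i * stride, j * stride\n", | |||
"                    window = x_pad[n, :, h0:h0 + HH, w0:w0 + WW]\n", | |||
"                    out[n, f, i, j] = np.sum(window * w[f]) + b[f]\n", | |||
"    return out\n", | |||
"```" | |||
] | |||
}, | |||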
{ | |||
"cell_type": "code", | |||
"execution_count": null, | |||
"metadata": {}, | |||
"outputs": [], | |||
"source": [ | |||
"x_shape = (2, 3, 4, 4)\n", | |||
"w_shape = (3, 3, 4, 4)\n", | |||
"x = np.linspace(-0.1, 0.5, num=np.prod(x_shape)).reshape(x_shape)\n", | |||
"w = np.linspace(-0.2, 0.3, num=np.prod(w_shape)).reshape(w_shape)\n", | |||
"b = np.linspace(-0.1, 0.2, num=3)\n", | |||
"\n", | |||
"conv_param = {'stride': 2, 'pad': 1}\n", | |||
"out, _ = conv_forward_naive(x, w, b, conv_param)\n", | |||
"correct_out = np.array([[[[-0.08759809, -0.10987781],\n", | |||
" [-0.18387192, -0.2109216 ]],\n", | |||
" [[ 0.21027089, 0.21661097],\n", | |||
" [ 0.22847626, 0.23004637]],\n", | |||
" [[ 0.50813986, 0.54309974],\n", | |||
" [ 0.64082444, 0.67101435]]],\n", | |||
" [[[-0.98053589, -1.03143541],\n", | |||
" [-1.19128892, -1.24695841]],\n", | |||
" [[ 0.69108355, 0.66880383],\n", | |||
" [ 0.59480972, 0.56776003]],\n", | |||
" [[ 2.36270298, 2.36904306],\n", | |||
" [ 2.38090835, 2.38247847]]]])\n", | |||
"\n", | |||
"# Compare your output to ours; difference should be around e-8\n", | |||
"print('Testing conv_forward_naive')\n", | |||
"print('difference: ', rel_error(out, correct_out))" | |||
] | |||
}, | |||
{ | |||
"cell_type": "markdown", | |||
"metadata": {}, | |||
"source": [ | |||
"# 补充:通过卷积对进行图像处理\n", | |||
"\n", | |||
"为了检查你的代码以及更好的理解卷积层可以实现的操作类型,我们将设置一个包含两个图像的输入,并手动设置执行常见图像处理操作(灰度转换和边缘检测)的滤镜。卷积的正向传播会将这些操作应用于每个输入图像。然后,我们可以将结果可视化以此检查准确性。" | |||
] | |||
}, | |||
{ | |||
"cell_type": "code", | |||
"execution_count": null, | |||
"metadata": { | |||
"tags": [ | |||
"pdf-ignore-input" | |||
] | |||
}, | |||
"outputs": [], | |||
"source": [ | |||
"from imageio import imread\n", | |||
"from PIL import Image\n", | |||
"\n", | |||
"kitten = imread('notebook_images/kitten.jpg')\n", | |||
"puppy = imread('notebook_images/puppy.jpg')\n", | |||
"# kitten is wide, and puppy is already square\n", | |||
"d = kitten.shape[1] - kitten.shape[0]\n", | |||
"kitten_cropped = kitten[:, d//2:-d//2, :]\n", | |||
"\n", | |||
"img_size = 200 # Make this smaller if it runs too slow\n", | |||
"resized_puppy = np.array(Image.fromarray(puppy).resize((img_size, img_size)))\n", | |||
"resized_kitten = np.array(Image.fromarray(kitten_cropped).resize((img_size, img_size)))\n", | |||
"x = np.zeros((2, 3, img_size, img_size))\n", | |||
"x[0, :, :, :] = resized_puppy.transpose((2, 0, 1))\n", | |||
"x[1, :, :, :] = resized_kitten.transpose((2, 0, 1))\n", | |||
"\n", | |||
"# Set up a convolutional weights holding 2 filters, each 3x3\n", | |||
"w = np.zeros((2, 3, 3, 3))\n", | |||
"\n", | |||
"# The first filter converts the image to grayscale.\n", | |||
"# Set up the red, green, and blue channels of the filter.\n", | |||
"w[0, 0, :, :] = [[0, 0, 0], [0, 0.3, 0], [0, 0, 0]]\n", | |||
"w[0, 1, :, :] = [[0, 0, 0], [0, 0.6, 0], [0, 0, 0]]\n", | |||
"w[0, 2, :, :] = [[0, 0, 0], [0, 0.1, 0], [0, 0, 0]]\n", | |||
"\n", | |||
"# Second filter detects horizontal edges in the blue channel.\n", | |||
"w[1, 2, :, :] = [[1, 2, 1], [0, 0, 0], [-1, -2, -1]]\n", | |||
"\n", | |||
"# Vector of biases. We don't need any bias for the grayscale\n", | |||
"# filter, but for the edge detection filter we want to add 128\n", | |||
"# to each output so that nothing is negative.\n", | |||
"b = np.array([0, 128])\n", | |||
"\n", | |||
"# Compute the result of convolving each input in x with each filter in w,\n", | |||
"# offsetting by b, and storing the results in out.\n", | |||
"out, _ = conv_forward_naive(x, w, b, {'stride': 1, 'pad': 1})\n", | |||
"\n", | |||
"def imshow_no_ax(img, normalize=True):\n", | |||
" \"\"\" Tiny helper to show images as uint8 and remove axis labels \"\"\"\n", | |||
" if normalize:\n", | |||
" img_max, img_min = np.max(img), np.min(img)\n", | |||
" img = 255.0 * (img - img_min) / (img_max - img_min)\n", | |||
" plt.imshow(img.astype('uint8'))\n", | |||
" plt.gca().axis('off')\n", | |||
"\n", | |||
"# Show the original images and the results of the conv operation\n", | |||
"plt.subplot(2, 3, 1)\n", | |||
"imshow_no_ax(puppy, normalize=False)\n", | |||
"plt.title('Original image')\n", | |||
"plt.subplot(2, 3, 2)\n", | |||
"imshow_no_ax(out[0, 0])\n", | |||
"plt.title('Grayscale')\n", | |||
"plt.subplot(2, 3, 3)\n", | |||
"imshow_no_ax(out[0, 1])\n", | |||
"plt.title('Edges')\n", | |||
"plt.subplot(2, 3, 4)\n", | |||
"imshow_no_ax(kitten_cropped, normalize=False)\n", | |||
"plt.subplot(2, 3, 5)\n", | |||
"imshow_no_ax(out[1, 0])\n", | |||
"plt.subplot(2, 3, 6)\n", | |||
"imshow_no_ax(out[1, 1])\n", | |||
"plt.show()" | |||
] | |||
}, | |||
{ | |||
"cell_type": "markdown", | |||
"metadata": {}, | |||
"source": [ | |||
"# 卷积:简单的反向传播\n", | |||
"在文件`daseCV/layers.py`的`conv_backward_naive`函数中实现卷积操作的反向传播。同样,你不必太担心计算效率。\n", | |||
"\n", | |||
"完成后,运行以下cell来检查你的反向传播的正确性。" | |||
] | |||
}, | |||
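{ | |||
"cell_type": "markdown", | |||
"metadata": {}, | |||
"source": [ | |||
"A sketch of one way to structure the naive backward pass, reusing the same window indexing as the forward pass (it assumes the forward cache stores `(x, w, b, conv_param)`; your own cache layout may differ):\n", | |||
"\n", | |||
"```python\n", | |||
"import numpy as np\n", | |||
"\n", | |||
"def conv_backward_sketch(dout, cache):\n", | |||
"    x, w, b, conv_param = cache\n", | |||
"    stride, pad = conv_param['stride'], conv_param['pad']\n", | |||
"    N, C, H, W = x.shape\n", | |||
"    F, _, HH, WW = w.shape\n", | |||
"    _, _, H_out, W_out = dout.shape\n", | |||
"\n", | |||
"    x_pad = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)), mode='constant')\n", | |||
"    dx_pad = np.zeros_like(x_pad)\n", | |||
"    dw = np.zeros_like(w)\n", | |||
"    db = dout.sum(axis=(0, 2, 3))          # bias gradient: sum over N, H_out, W_out\n", | |||
"\n", | |||
"    for n in range(N):\n", | |||
"        for f in range(F):\n", | |||
"            for i in range(H_out):\n", | |||
"                for j in range(W_out):\n", | |||
"                    h0, w0 = i * stride, j * stride\n", | |||
"                    window = x_pad[n, :, h0:h0 + HH, w0:w0 + WW]\n", | |||
"                    dw[f] += window * dout[n, f, i, j]\n", | |||
"                    dx_pad[n, :, h0:h0 + HH, w0:w0 + WW] += w[f] * dout[n, f, i, j]\n", | |||
"\n", | |||
"    dx = dx_pad[:, :, pad:pad + H, pad:pad + W]   # strip the padding\n", | |||
"    return dx, dw, db\n", | |||
"```" | |||
] | |||
}, | |||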
{ | |||
"cell_type": "code", | |||
"execution_count": null, | |||
"metadata": {}, | |||
"outputs": [], | |||
"source": [ | |||
"np.random.seed(231)\n", | |||
"x = np.random.randn(4, 3, 5, 5)\n", | |||
"w = np.random.randn(2, 3, 3, 3)\n", | |||
"b = np.random.randn(2,)\n", | |||
"dout = np.random.randn(4, 2, 5, 5)\n", | |||
"conv_param = {'stride': 1, 'pad': 1}\n", | |||
"\n", | |||
"dx_num = eval_numerical_gradient_array(lambda x: conv_forward_naive(x, w, b, conv_param)[0], x, dout)\n", | |||
"dw_num = eval_numerical_gradient_array(lambda w: conv_forward_naive(x, w, b, conv_param)[0], w, dout)\n", | |||
"db_num = eval_numerical_gradient_array(lambda b: conv_forward_naive(x, w, b, conv_param)[0], b, dout)\n", | |||
"\n", | |||
"out, cache = conv_forward_naive(x, w, b, conv_param)\n", | |||
"dx, dw, db = conv_backward_naive(dout, cache)\n", | |||
"\n", | |||
"# Your errors should be around e-8 or less.\n", | |||
"print('Testing conv_backward_naive function')\n", | |||
"print('dx error: ', rel_error(dx, dx_num))\n", | |||
"print('dw error: ', rel_error(dw, dw_num))\n", | |||
"print('db error: ', rel_error(db, db_num))" | |||
] | |||
}, | |||
{ | |||
"cell_type": "markdown", | |||
"metadata": {}, | |||
"source": [ | |||
"# 最大池化: 简单的正向传播\n", | |||
"在文件`daseCV/layers.py`中的`max_pool_forward_naive`函数里实现最大池化操作的正向传播。同样,不必太担心计算效率。\n", | |||
"\n", | |||
"通过运行以下cell检查你的代码:" | |||
] | |||
}, | |||
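{ | |||
"cell_type": "markdown", | |||
"metadata": {}, | |||
"source": [ | |||
"One possible loop structure for the naive max-pooling forward pass, as a sketch only (the real `max_pool_forward_naive` must also return a cache):\n", | |||
"\n", | |||
"```python\n", | |||
"import numpy as np\n", | |||
"\n", | |||
"def max_pool_forward_sketch(x, pool_param):\n", | |||
"    N, C, H, W = x.shape\n", | |||
"    ph, pw = pool_param['pool_height'], pool_param['pool_width']\n", | |||
"    stride = pool_param['stride']\n", | |||
"    H_out = 1 + (H - ph) // stride\n", | |||
"    W_out = 1 + (W - pw) // stride\n", | |||
"    out = np.zeros((N, C, H_out, W_out))\n", | |||
"    for i in range(H_out):\n", | |||
"        for j in range(W_out):\n", | |||
"            h0, w0 = i * stride, j * stride\n", | |||
"            window = x[:, :, h0:h0 + ph, w0:w0 + pw]\n", | |||
"            out[:, :, i, j] = window.max(axis=(2, 3))  # max over each pooling window\n", | |||
"    return out\n", | |||
"```" | |||
] | |||
}, | |||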
{ | |||
"cell_type": "code", | |||
"execution_count": null, | |||
"metadata": {}, | |||
"outputs": [], | |||
"source": [ | |||
"x_shape = (2, 3, 4, 4)\n", | |||
"x = np.linspace(-0.3, 0.4, num=np.prod(x_shape)).reshape(x_shape)\n", | |||
"pool_param = {'pool_width': 2, 'pool_height': 2, 'stride': 2}\n", | |||
"\n", | |||
"out, _ = max_pool_forward_naive(x, pool_param)\n", | |||
"\n", | |||
"correct_out = np.array([[[[-0.26315789, -0.24842105],\n", | |||
" [-0.20421053, -0.18947368]],\n", | |||
" [[-0.14526316, -0.13052632],\n", | |||
" [-0.08631579, -0.07157895]],\n", | |||
" [[-0.02736842, -0.01263158],\n", | |||
" [ 0.03157895, 0.04631579]]],\n", | |||
" [[[ 0.09052632, 0.10526316],\n", | |||
" [ 0.14947368, 0.16421053]],\n", | |||
" [[ 0.20842105, 0.22315789],\n", | |||
" [ 0.26736842, 0.28210526]],\n", | |||
" [[ 0.32631579, 0.34105263],\n", | |||
" [ 0.38526316, 0.4 ]]]])\n", | |||
"\n", | |||
"# Compare your output with ours. Difference should be on the order of e-8.\n", | |||
"print('Testing max_pool_forward_naive function:')\n", | |||
"print('difference: ', rel_error(out, correct_out))" | |||
] | |||
}, | |||
{ | |||
"cell_type": "markdown", | |||
"metadata": {}, | |||
"source": [ | |||
"# 最大池化: 简单的反向传播\n", | |||
"在文件`daseCV/layers.py`中的`max_pool_backward_naive`函数里实现最大池化操作的反向传播。同样,不必太担心计算效率。\n", | |||
"\n", | |||
"通过运行以下cell检查你的代码:" | |||
] | |||
}, | |||
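{ | |||
"cell_type": "markdown", | |||
"metadata": {}, | |||
"source": [ | |||
"A sketch of the key idea in the backward pass: each upstream gradient is routed only to the position that achieved the maximum in its pooling window (it assumes the forward cache stores `(x, pool_param)`; your own cache layout may differ):\n", | |||
"\n", | |||
"```python\n", | |||
"import numpy as np\n", | |||
"\n", | |||
"def max_pool_backward_sketch(dout, cache):\n", | |||
"    x, pool_param = cache\n", | |||
"    N, C, H, W = x.shape\n", | |||
"    ph, pw = pool_param['pool_height'], pool_param['pool_width']\n", | |||
"    stride = pool_param['stride']\n", | |||
"    _, _, H_out, W_out = dout.shape\n", | |||
"    dx = np.zeros_like(x)\n", | |||
"    for n in range(N):\n", | |||
"        for c in range(C):\n", | |||
"            for i in range(H_out):\n", | |||
"                for j in range(W_out):\n", | |||
"                    h0, w0 = i * stride, j * stride\n", | |||
"                    window = x[n, c, h0:h0 + ph, w0:w0 + pw]\n", | |||
"                    mask = (window == window.max())   # 1 at the argmax, 0 elsewhere\n", | |||
"                    dx[n, c, h0:h0 + ph, w0:w0 + pw] += mask * dout[n, c, i, j]\n", | |||
"    return dx\n", | |||
"```" | |||
] | |||
}, | |||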
{ | |||
"cell_type": "code", | |||
"execution_count": null, | |||
"metadata": {}, | |||
"outputs": [], | |||
"source": [ | |||
"np.random.seed(231)\n", | |||
"x = np.random.randn(3, 2, 8, 8)\n", | |||
"dout = np.random.randn(3, 2, 4, 4)\n", | |||
"pool_param = {'pool_height': 2, 'pool_width': 2, 'stride': 2}\n", | |||
"\n", | |||
"dx_num = eval_numerical_gradient_array(lambda x: max_pool_forward_naive(x, pool_param)[0], x, dout)\n", | |||
"\n", | |||
"out, cache = max_pool_forward_naive(x, pool_param)\n", | |||
"dx = max_pool_backward_naive(dout, cache)\n", | |||
"\n", | |||
"# Your error should be on the order of e-12\n", | |||
"print('Testing max_pool_backward_naive function:')\n", | |||
"print('dx error: ', rel_error(dx, dx_num))" | |||
] | |||
}, | |||
{ | |||
"cell_type": "markdown", | |||
"metadata": {}, | |||
"source": [ | |||
"# Fast layers\n", | |||
"让卷积和池化层更快可能有点难度。为了减轻你的痛苦,我们在文件`daseCV/fast_layers.py`中为卷积和池化层提供了正向和反向传播的快速版本。\n", | |||
"\n", | |||
"快速卷积的实现依赖于Cython扩展。要编译它,你需要在`daseCV`目录中运行以下命令:\n", | |||
"\n", | |||
"```bash\n", | |||
"python setup.py build_ext --inplace\n", | |||
"```\n", | |||
"\n", | |||
"卷积和池化层的快速版本的API与你在之前实现的完全相同:正向传播接收数据、权重和参数,并产生输出和缓存对象;反向传播接收返回的导数和缓存对象,并针对数据和权重生成梯度。\n", | |||
"\n", | |||
"**提示:** 只有当池化区域不重叠并对输入进行平铺时,池化的快速实现才能表现出最好的性能。如果不满足这些条件,那么快速池化将不会比原来的的实现快很多。\n", | |||
"\n", | |||
"您可以通过运行以下代码和之前的版本之间进行性能的比较:" | |||
] | |||
}, | |||
{ | |||
"cell_type": "code", | |||
"execution_count": null, | |||
"metadata": { | |||
"scrolled": true | |||
}, | |||
"outputs": [], | |||
"source": [ | |||
"# Rel errors should be around e-9 or less\n", | |||
"from daseCV.fast_layers import conv_forward_fast, conv_backward_fast\n", | |||
"from time import time\n", | |||
"np.random.seed(231)\n", | |||
"x = np.random.randn(100, 3, 31, 31)\n", | |||
"w = np.random.randn(25, 3, 3, 3)\n", | |||
"b = np.random.randn(25,)\n", | |||
"dout = np.random.randn(100, 25, 16, 16)\n", | |||
"conv_param = {'stride': 2, 'pad': 1}\n", | |||
"\n", | |||
"t0 = time()\n", | |||
"out_naive, cache_naive = conv_forward_naive(x, w, b, conv_param)\n", | |||
"t1 = time()\n", | |||
"out_fast, cache_fast = conv_forward_fast(x, w, b, conv_param)\n", | |||
"t2 = time()\n", | |||
"\n", | |||
"print('Testing conv_forward_fast:')\n", | |||
"print('Naive: %fs' % (t1 - t0))\n", | |||
"print('Fast: %fs' % (t2 - t1))\n", | |||
"print('Speedup: %fx' % ((t1 - t0) / (t2 - t1)))\n", | |||
"print('Difference: ', rel_error(out_naive, out_fast))\n", | |||
"\n", | |||
"t0 = time()\n", | |||
"dx_naive, dw_naive, db_naive = conv_backward_naive(dout, cache_naive)\n", | |||
"t1 = time()\n", | |||
"dx_fast, dw_fast, db_fast = conv_backward_fast(dout, cache_fast)\n", | |||
"t2 = time()\n", | |||
"\n", | |||
"print('\\nTesting conv_backward_fast:')\n", | |||
"print('Naive: %fs' % (t1 - t0))\n", | |||
"print('Fast: %fs' % (t2 - t1))\n", | |||
"print('Speedup: %fx' % ((t1 - t0) / (t2 - t1)))\n", | |||
"print('dx difference: ', rel_error(dx_naive, dx_fast))\n", | |||
"print('dw difference: ', rel_error(dw_naive, dw_fast))\n", | |||
"print('db difference: ', rel_error(db_naive, db_fast))" | |||
] | |||
}, | |||
{ | |||
"cell_type": "code", | |||
"execution_count": null, | |||
"metadata": {}, | |||
"outputs": [], | |||
"source": [ | |||
"# Relative errors should be close to 0.0\n", | |||
"from daseCV.fast_layers import max_pool_forward_fast, max_pool_backward_fast\n", | |||
"np.random.seed(231)\n", | |||
"x = np.random.randn(100, 3, 32, 32)\n", | |||
"dout = np.random.randn(100, 3, 16, 16)\n", | |||
"pool_param = {'pool_height': 2, 'pool_width': 2, 'stride': 2}\n", | |||
"\n", | |||
"t0 = time()\n", | |||
"out_naive, cache_naive = max_pool_forward_naive(x, pool_param)\n", | |||
"t1 = time()\n", | |||
"out_fast, cache_fast = max_pool_forward_fast(x, pool_param)\n", | |||
"t2 = time()\n", | |||
"\n", | |||
"print('Testing pool_forward_fast:')\n", | |||
"print('Naive: %fs' % (t1 - t0))\n", | |||
"print('fast: %fs' % (t2 - t1))\n", | |||
"print('speedup: %fx' % ((t1 - t0) / (t2 - t1)))\n", | |||
"print('difference: ', rel_error(out_naive, out_fast))\n", | |||
"\n", | |||
"t0 = time()\n", | |||
"dx_naive = max_pool_backward_naive(dout, cache_naive)\n", | |||
"t1 = time()\n", | |||
"dx_fast = max_pool_backward_fast(dout, cache_fast)\n", | |||
"t2 = time()\n", | |||
"\n", | |||
"print('\\nTesting pool_backward_fast:')\n", | |||
"print('Naive: %fs' % (t1 - t0))\n", | |||
"print('fast: %fs' % (t2 - t1))\n", | |||
"print('speedup: %fx' % ((t1 - t0) / (t2 - t1)))\n", | |||
"print('dx difference: ', rel_error(dx_naive, dx_fast))" | |||
] | |||
}, | |||
{ | |||
"cell_type": "markdown", | |||
"metadata": {}, | |||
"source": [ | |||
"# 卷积 \"sandwich\" 层\n", | |||
"之前,我们引入了“sandwich”层的概念,该层将多种操作组合成常用的模式。在文件`daseCV/layer_utils.py`中,您会找到一些实现卷积网络常用模式的sandwich层。运行下面的cell以检查它们是否正常工作。" | |||
] | |||
}, | |||
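{ | |||
"cell_type": "markdown", | |||
"metadata": {}, | |||
"source": [ | |||
"How a sandwich layer is typically composed, as a sketch (the provided `conv_relu_forward`/`conv_relu_backward` in `daseCV/layer_utils.py` are expected to follow this pattern; here we assume `relu_forward`/`relu_backward` from `daseCV/layers.py` and the fast conv functions from `daseCV/fast_layers.py`):\n", | |||
"\n", | |||
"```python\n", | |||
"def conv_relu_forward_sketch(x, w, b, conv_param):\n", | |||
"    a, conv_cache = conv_forward_fast(x, w, b, conv_param)  # convolution\n", | |||
"    out, relu_cache = relu_forward(a)                        # nonlinearity\n", | |||
"    return out, (conv_cache, relu_cache)\n", | |||
"\n", | |||
"def conv_relu_backward_sketch(dout, cache):\n", | |||
"    conv_cache, relu_cache = cache\n", | |||
"    da = relu_backward(dout, relu_cache)\n", | |||
"    dx, dw, db = conv_backward_fast(da, conv_cache)\n", | |||
"    return dx, dw, db\n", | |||
"```" | |||
] | |||
}, | |||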
{ | |||
"cell_type": "code", | |||
"execution_count": null, | |||
"metadata": {}, | |||
"outputs": [], | |||
"source": [ | |||
"from daseCV.layer_utils import conv_relu_pool_forward, conv_relu_pool_backward\n", | |||
"np.random.seed(231)\n", | |||
"x = np.random.randn(2, 3, 16, 16)\n", | |||
"w = np.random.randn(3, 3, 3, 3)\n", | |||
"b = np.random.randn(3,)\n", | |||
"dout = np.random.randn(2, 3, 8, 8)\n", | |||
"conv_param = {'stride': 1, 'pad': 1}\n", | |||
"pool_param = {'pool_height': 2, 'pool_width': 2, 'stride': 2}\n", | |||
"\n", | |||
"out, cache = conv_relu_pool_forward(x, w, b, conv_param, pool_param)\n", | |||
"dx, dw, db = conv_relu_pool_backward(dout, cache)\n", | |||
"\n", | |||
"dx_num = eval_numerical_gradient_array(lambda x: conv_relu_pool_forward(x, w, b, conv_param, pool_param)[0], x, dout)\n", | |||
"dw_num = eval_numerical_gradient_array(lambda w: conv_relu_pool_forward(x, w, b, conv_param, pool_param)[0], w, dout)\n", | |||
"db_num = eval_numerical_gradient_array(lambda b: conv_relu_pool_forward(x, w, b, conv_param, pool_param)[0], b, dout)\n", | |||
"\n", | |||
"# Relative errors should be around e-8 or less\n", | |||
"print('Testing conv_relu_pool')\n", | |||
"print('dx error: ', rel_error(dx_num, dx))\n", | |||
"print('dw error: ', rel_error(dw_num, dw))\n", | |||
"print('db error: ', rel_error(db_num, db))" | |||
] | |||
}, | |||
{ | |||
"cell_type": "code", | |||
"execution_count": null, | |||
"metadata": {}, | |||
"outputs": [], | |||
"source": [ | |||
"from daseCV.layer_utils import conv_relu_forward, conv_relu_backward\n", | |||
"np.random.seed(231)\n", | |||
"x = np.random.randn(2, 3, 8, 8)\n", | |||
"w = np.random.randn(3, 3, 3, 3)\n", | |||
"b = np.random.randn(3,)\n", | |||
"dout = np.random.randn(2, 3, 8, 8)\n", | |||
"conv_param = {'stride': 1, 'pad': 1}\n", | |||
"\n", | |||
"out, cache = conv_relu_forward(x, w, b, conv_param)\n", | |||
"dx, dw, db = conv_relu_backward(dout, cache)\n", | |||
"\n", | |||
"dx_num = eval_numerical_gradient_array(lambda x: conv_relu_forward(x, w, b, conv_param)[0], x, dout)\n", | |||
"dw_num = eval_numerical_gradient_array(lambda w: conv_relu_forward(x, w, b, conv_param)[0], w, dout)\n", | |||
"db_num = eval_numerical_gradient_array(lambda b: conv_relu_forward(x, w, b, conv_param)[0], b, dout)\n", | |||
"\n", | |||
"# Relative errors should be around e-8 or less\n", | |||
"print('Testing conv_relu:')\n", | |||
"print('dx error: ', rel_error(dx_num, dx))\n", | |||
"print('dw error: ', rel_error(dw_num, dw))\n", | |||
"print('db error: ', rel_error(db_num, db))" | |||
] | |||
}, | |||
{ | |||
"cell_type": "markdown", | |||
"metadata": {}, | |||
"source": [ | |||
"# 三层卷积网络\n", | |||
"现在,你已经实现了所有必需的层,我们可以将它们组合成一个简单的卷积网络。\n", | |||
"\n", | |||
"打开文件`daseCV/classifiers/cnn.py`,并完成`ThreeLayerConvNet`类。请记住,您可以使用fast/sandwich层(以及提供给你)。运行以下cell以帮助你调试:" | |||
] | |||
}, | |||
{ | |||
"cell_type": "markdown", | |||
"metadata": {}, | |||
"source": [ | |||
"## 检查loss\n", | |||
"建立新网络后,您应该做的第一件事就是检查损失。当我们使用softmax损失时,对于`C`个类别我们期望随机权重的损失(没有正则化)大约为`log(C)`。当我们添加正则化时,损失应该会略有增加。" | |||
] | |||
}, | |||
{ | |||
"cell_type": "code", | |||
"execution_count": null, | |||
"metadata": {}, | |||
"outputs": [], | |||
"source": [ | |||
"model = ThreeLayerConvNet()\n", | |||
"\n", | |||
"N = 50\n", | |||
"X = np.random.randn(N, 3, 32, 32)\n", | |||
"y = np.random.randint(10, size=N)\n", | |||
"\n", | |||
"loss, grads = model.loss(X, y)\n", | |||
"print('Initial loss (no regularization): ', loss)\n", | |||
"\n", | |||
"model.reg = 0.5\n", | |||
"loss, grads = model.loss(X, y)\n", | |||
"print('Initial loss (with regularization): ', loss)" | |||
] | |||
}, | |||
{ | |||
"cell_type": "markdown", | |||
"metadata": {}, | |||
"source": [ | |||
"## 梯度检查\n", | |||
"在损失看起来合理之后,请使用数值梯度检查来确保您的反向传播是正确的。使用数值梯度检查时,应在每一层使用少量的人工数据和少量的神经元。注意:正确的实现可能仍然会出现相对误差,最高可达e-2。" | |||
] | |||
}, | |||
{ | |||
"cell_type": "code", | |||
"execution_count": null, | |||
"metadata": {}, | |||
"outputs": [], | |||
"source": [ | |||
"num_inputs = 2\n", | |||
"input_dim = (3, 16, 16)\n", | |||
"reg = 0.0\n", | |||
"num_classes = 10\n", | |||
"np.random.seed(231)\n", | |||
"X = np.random.randn(num_inputs, *input_dim)\n", | |||
"y = np.random.randint(num_classes, size=num_inputs)\n", | |||
"\n", | |||
"model = ThreeLayerConvNet(num_filters=3, filter_size=3,\n", | |||
" input_dim=input_dim, hidden_dim=7,\n", | |||
" dtype=np.float64)\n", | |||
"loss, grads = model.loss(X, y)\n", | |||
"# Errors should be small, but correct implementations may have\n", | |||
"# relative errors up to the order of e-2\n", | |||
"for param_name in sorted(grads):\n", | |||
" f = lambda _: model.loss(X, y)[0]\n", | |||
" param_grad_num = eval_numerical_gradient(f, model.params[param_name], verbose=False, h=1e-6)\n", | |||
" e = rel_error(param_grad_num, grads[param_name])\n", | |||
" print('%s max relative error: %e' % (param_name, rel_error(param_grad_num, grads[param_name])))" | |||
] | |||
}, | |||
{ | |||
"cell_type": "markdown", | |||
"metadata": {}, | |||
"source": [ | |||
"## 小样本的过拟合\n", | |||
"一个不错的技巧是仅用少量训练样本来训练模型。您应该能够过度拟合较小的数据集,这将得到非常高的训练准确度和相对较低的验证准确度。" | |||
] | |||
}, | |||
{ | |||
"cell_type": "code", | |||
"execution_count": null, | |||
"metadata": {}, | |||
"outputs": [], | |||
"source": [ | |||
"np.random.seed(231)\n", | |||
"\n", | |||
"num_train = 100\n", | |||
"small_data = {\n", | |||
" 'X_train': data['X_train'][:num_train],\n", | |||
" 'y_train': data['y_train'][:num_train],\n", | |||
" 'X_val': data['X_val'],\n", | |||
" 'y_val': data['y_val'],\n", | |||
"}\n", | |||
"\n", | |||
"model = ThreeLayerConvNet(weight_scale=1e-2)\n", | |||
"\n", | |||
"solver = Solver(model, small_data,\n", | |||
" num_epochs=15, batch_size=50,\n", | |||
" update_rule='adam',\n", | |||
" optim_config={\n", | |||
" 'learning_rate': 1e-3,\n", | |||
" },\n", | |||
" verbose=True, print_every=1)\n", | |||
"solver.train()" | |||
] | |||
}, | |||
{ | |||
"cell_type": "markdown", | |||
"metadata": {}, | |||
"source": [ | |||
"Plotting the loss, training accuracy, and validation accuracy should show clear overfitting:" | |||
] | |||
}, | |||
{ | |||
"cell_type": "code", | |||
"execution_count": null, | |||
"metadata": {}, | |||
"outputs": [], | |||
"source": [ | |||
"plt.subplot(2, 1, 1)\n", | |||
"plt.plot(solver.loss_history, 'o')\n", | |||
"plt.xlabel('iteration')\n", | |||
"plt.ylabel('loss')\n", | |||
"\n", | |||
"plt.subplot(2, 1, 2)\n", | |||
"plt.plot(solver.train_acc_history, '-o')\n", | |||
"plt.plot(solver.val_acc_history, '-o')\n", | |||
"plt.legend(['train', 'val'], loc='upper left')\n", | |||
"plt.xlabel('epoch')\n", | |||
"plt.ylabel('accuracy')\n", | |||
"plt.show()" | |||
] | |||
}, | |||
{ | |||
"cell_type": "markdown", | |||
"metadata": {}, | |||
"source": [ | |||
"## 训练网络\n", | |||
"将三层卷积网络训练一个epoch,在训练集上将达到40%以上的准确度:" | |||
] | |||
}, | |||
{ | |||
"cell_type": "code", | |||
"execution_count": null, | |||
"metadata": {}, | |||
"outputs": [], | |||
"source": [ | |||
"model = ThreeLayerConvNet(weight_scale=0.001, hidden_dim=500, reg=0.001)\n", | |||
"\n", | |||
"solver = Solver(model, data,\n", | |||
" num_epochs=1, batch_size=50,\n", | |||
" update_rule='adam',\n", | |||
" optim_config={\n", | |||
" 'learning_rate': 1e-3,\n", | |||
" },\n", | |||
" verbose=True, print_every=20)\n", | |||
"solver.train()" | |||
] | |||
}, | |||
{ | |||
"cell_type": "markdown", | |||
"metadata": {}, | |||
"source": [ | |||
"## 可视化过滤器\n", | |||
"You can visualize the first-layer convolutional filters from the trained network by running the following:\n", | |||
"您可以通过运行以下命令可视化训练好的第一层卷积过滤器:" | |||
] | |||
}, | |||
{ | |||
"cell_type": "code", | |||
"execution_count": null, | |||
"metadata": {}, | |||
"outputs": [], | |||
"source": [ | |||
"from daseCV.vis_utils import visualize_grid\n", | |||
"\n", | |||
"grid = visualize_grid(model.params['W1'].transpose(0, 2, 3, 1))\n", | |||
"plt.imshow(grid.astype('uint8'))\n", | |||
"plt.axis('off')\n", | |||
"plt.gcf().set_size_inches(5, 5)\n", | |||
"plt.show()" | |||
] | |||
}, | |||
{ | |||
"cell_type": "markdown", | |||
"metadata": {}, | |||
"source": [ | |||
"# 空间批量归一化\n", | |||
"我们已经看到,对于训练深层的全连接网络来说批量归一化是非常有用的技术。如论文(`BatchNormalization.ipynb`中的链接)中所建议的,批处理归一化也可以用于卷积网络,但是我们需要对其进行一些调整,该修改将称为“空间批量归一化”。\n", | |||
"\n", | |||
"通常,当我们对维数为`N`的最小批进行批归一化时接受的形状为 `(N, D)`的输入,之后生成形状为`(N, D)`的输出。对于来自卷积层的数据,批归一化需要接受形状为`(N, C, H, W)`的输入,并产生形状为`(N, C, H, W)`的输出,其中`N`维度为最小批大小而 `(H, W)` 维度是特征图的大小。\n", | |||
"\n", | |||
"如果特征图是使用卷积生成的,那么我们期望每个特征通道的两个不同图像以及同一图像内不同位置之间的统计信息例如均值、方差相对一致。毕竟每个特征通道都是由相同的卷积滤波器产生的!因此,空间批量归一化通过计算最小批维度`N`以及空间维度 `H` 和`W`的统计信息,为每个 `C`特征通道计算均值和方差。\n", | |||
"\n", | |||
"[1] [Sergey Ioffe and Christian Szegedy, \"Batch Normalization: Accelerating Deep Network Training by Reducing\n", | |||
"Internal Covariate Shift\", ICML 2015.](https://arxiv.org/abs/1502.03167)" | |||
] | |||
}, | |||
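{
"cell_type": "markdown",
"metadata": {},
"source": [
"The following is a minimal train-time sketch of the per-channel statistics described above. It is illustrative only: the assignment version should also maintain running averages for test time, and a common shortcut is to reshape `(N, C, H, W)` to `(N*H*W, C)` and reuse the vanilla batch-normalization forward pass.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def spatial_batchnorm_train_sketch(x, gamma, beta, eps=1e-5):\n",
"    # x: (N, C, H, W); gamma, beta: (C,). Statistics are per channel,\n",
"    # computed over the minibatch and both spatial dimensions.\n",
"    mean = x.mean(axis=(0, 2, 3), keepdims=True)  # shape (1, C, 1, 1)\n",
"    var = x.var(axis=(0, 2, 3), keepdims=True)\n",
"    x_hat = (x - mean) / np.sqrt(var + eps)\n",
"    return gamma.reshape(1, -1, 1, 1) * x_hat + beta.reshape(1, -1, 1, 1)\n",
"```"
]
},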
{ | |||
"cell_type": "markdown", | |||
"metadata": {}, | |||
"source": [ | |||
"## 空间批量归一化:正向传播\n", | |||
"\n", | |||
"在文件 `daseCV/layers.py`中的`spatial_batchnorm_forward`函数里实现空间批归一化的正向传播。通过运行以下命令检查您的代码:" | |||
] | |||
}, | |||
{ | |||
"cell_type": "code", | |||
"execution_count": null, | |||
"metadata": {}, | |||
"outputs": [], | |||
"source": [ | |||
"np.random.seed(231)\n", | |||
"# Check the training-time forward pass by checking means and variances\n", | |||
"# of features both before and after spatial batch normalization\n", | |||
"\n", | |||
"N, C, H, W = 2, 3, 4, 5\n", | |||
"x = 4 * np.random.randn(N, C, H, W) + 10\n", | |||
"\n", | |||
"print('Before spatial batch normalization:')\n", | |||
"print(' Shape: ', x.shape)\n", | |||
"print(' Means: ', x.mean(axis=(0, 2, 3)))\n", | |||
"print(' Stds: ', x.std(axis=(0, 2, 3)))\n", | |||
"\n", | |||
"# Means should be close to zero and stds close to one\n", | |||
"gamma, beta = np.ones(C), np.zeros(C)\n", | |||
"bn_param = {'mode': 'train'}\n", | |||
"out, _ = spatial_batchnorm_forward(x, gamma, beta, bn_param)\n", | |||
"print('After spatial batch normalization:')\n", | |||
"print(' Shape: ', out.shape)\n", | |||
"print(' Means: ', out.mean(axis=(0, 2, 3)))\n", | |||
"print(' Stds: ', out.std(axis=(0, 2, 3)))\n", | |||
"\n", | |||
"# Means should be close to beta and stds close to gamma\n", | |||
"gamma, beta = np.asarray([3, 4, 5]), np.asarray([6, 7, 8])\n", | |||
"out, _ = spatial_batchnorm_forward(x, gamma, beta, bn_param)\n", | |||
"print('After spatial batch normalization (nontrivial gamma, beta):')\n", | |||
"print(' Shape: ', out.shape)\n", | |||
"print(' Means: ', out.mean(axis=(0, 2, 3)))\n", | |||
"print(' Stds: ', out.std(axis=(0, 2, 3)))" | |||
] | |||
}, | |||
{ | |||
"cell_type": "code", | |||
"execution_count": null, | |||
"metadata": {}, | |||
"outputs": [], | |||
"source": [ | |||
"np.random.seed(231)\n", | |||
"# Check the test-time forward pass by running the training-time\n", | |||
"# forward pass many times to warm up the running averages, and then\n", | |||
"# checking the means and variances of activations after a test-time\n", | |||
"# forward pass.\n", | |||
"N, C, H, W = 10, 4, 11, 12\n", | |||
"\n", | |||
"bn_param = {'mode': 'train'}\n", | |||
"gamma = np.ones(C)\n", | |||
"beta = np.zeros(C)\n", | |||
"for t in range(50):\n", | |||
" x = 2.3 * np.random.randn(N, C, H, W) + 13\n", | |||
" spatial_batchnorm_forward(x, gamma, beta, bn_param)\n", | |||
"bn_param['mode'] = 'test'\n", | |||
"x = 2.3 * np.random.randn(N, C, H, W) + 13\n", | |||
"a_norm, _ = spatial_batchnorm_forward(x, gamma, beta, bn_param)\n", | |||
"\n", | |||
"# Means should be close to zero and stds close to one, but will be\n", | |||
"# noisier than training-time forward passes.\n", | |||
"print('After spatial batch normalization (test-time):')\n", | |||
"print(' means: ', a_norm.mean(axis=(0, 2, 3)))\n", | |||
"print(' stds: ', a_norm.std(axis=(0, 2, 3)))" | |||
] | |||
}, | |||
{ | |||
"cell_type": "markdown", | |||
"metadata": {}, | |||
"source": [ | |||
"## 空间批量归一化:反向传播\n", | |||
"在文件`daseCV/layers.py`中的函数`spatial_batchnorm_backward`里实现空间批量归一化的反向传播。运行以下命令以检查您的代码:" | |||
] | |||
}, | |||
{ | |||
"cell_type": "code", | |||
"execution_count": null, | |||
"metadata": {}, | |||
"outputs": [], | |||
"source": [ | |||
"np.random.seed(231)\n", | |||
"N, C, H, W = 2, 3, 4, 5\n", | |||
"x = 5 * np.random.randn(N, C, H, W) + 12\n", | |||
"gamma = np.random.randn(C)\n", | |||
"beta = np.random.randn(C)\n", | |||
"dout = np.random.randn(N, C, H, W)\n", | |||
"\n", | |||
"bn_param = {'mode': 'train'}\n", | |||
"fx = lambda x: spatial_batchnorm_forward(x, gamma, beta, bn_param)[0]\n", | |||
"fg = lambda a: spatial_batchnorm_forward(x, gamma, beta, bn_param)[0]\n", | |||
"fb = lambda b: spatial_batchnorm_forward(x, gamma, beta, bn_param)[0]\n", | |||
"\n", | |||
"dx_num = eval_numerical_gradient_array(fx, x, dout)\n", | |||
"da_num = eval_numerical_gradient_array(fg, gamma, dout)\n", | |||
"db_num = eval_numerical_gradient_array(fb, beta, dout)\n", | |||
"\n", | |||
"#You should expect errors of magnitudes between 1e-12~1e-06\n", | |||
"_, cache = spatial_batchnorm_forward(x, gamma, beta, bn_param)\n", | |||
"dx, dgamma, dbeta = spatial_batchnorm_backward(dout, cache)\n", | |||
"print('dx error: ', rel_error(dx_num, dx))\n", | |||
"print('dgamma error: ', rel_error(da_num, dgamma))\n", | |||
"print('dbeta error: ', rel_error(db_num, dbeta))" | |||
] | |||
}, | |||
{ | |||
"cell_type": "markdown", | |||
"metadata": {}, | |||
"source": [ | |||
"# 组归一化\n", | |||
"在之前的notebook中,我们提到了“层归一化”是一种替代的归一化技术,它减轻了“批归一化”的批大小限制。但是,正如 [2] 的作者所观察到的,当与卷积层一起使用时,层归一化的性能不如批归一化:\n", | |||
"\n", | |||
">With fully connected layers, all the hidden units in a layer tend to make similar contributions to the final prediction, and re-centering and rescaling the summed inputs to a layer works well. However, the assumption of similar contributions is no longer true for convolutional neural networks. The large number of the hidden units whose\n", | |||
"receptive fields lie near the boundary of the image are rarely turned on and thus have very different\n", | |||
"statistics from the rest of the hidden units within the same layer.\n", | |||
"\n", | |||
"[3] 的作者提出了一种中间技术。与“层归一化”相反,在“层归一化”中您对每个数据点的整个特征进行归一化,他们建议将每个数据点一致的特征划分为G组,然后对每个组的每个数据点进行归一化。\n", | |||
"\n", | |||
"![Comparison of normalization techniques discussed so far](notebook_images/normalization.png)\n", | |||
"<center>**Visual comparison of the normalization techniques discussed so far (image edited from [3])**</center>\n", | |||
"\n", | |||
"尽管在每一组中仍然存在贡献相等的假设,但作者假设这不是问题,因为在视觉识别的特征中出现了天生的分组。他们用来说明这一点的一个例子是,在传统的计算机视觉中,许多高性能的传统的特征都有明确分组在一起的术语。以Histogram of Oriented Gradients[4]为例——在计算每个空间局部块的直方图后,对每个块的直方图进行归一化处理,然后拼接在一起形成最终的特征向量。\n", | |||
"\n", | |||
"现在,你将实现组归一化。请注意,你将在以下cell中实现的这种归一化技术是在2018年引入并发布到ECCV的,这是是一个正在进行且激动人心的研究领域!\n", | |||
"\n", | |||
"[2] [Ba, Jimmy Lei, Jamie Ryan Kiros, and Geoffrey E. Hinton. \"Layer Normalization.\" stat 1050 (2016): 21.](https://arxiv.org/pdf/1607.06450.pdf)\n", | |||
"\n", | |||
"\n", | |||
"[3] [Wu, Yuxin, and Kaiming He. \"Group Normalization.\" arXiv preprint arXiv:1803.08494 (2018).](https://arxiv.org/abs/1803.08494)\n", | |||
"\n", | |||
"\n", | |||
"[4] [N. Dalal and B. Triggs. Histograms of oriented gradients for\n", | |||
"human detection. In Computer Vision and Pattern Recognition\n", | |||
"(CVPR), 2005.](https://ieeexplore.ieee.org/abstract/document/1467360/)" | |||
] | |||
}, | |||
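{
"cell_type": "markdown",
"metadata": {},
"source": [
"As an illustration of the grouping idea, here is a minimal train-time sketch (not the required `daseCV` implementation, and it ignores the cache needed for the backward pass):\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def groupnorm_forward_sketch(x, gamma, beta, G, eps=1e-5):\n",
"    # x: (N, C, H, W); gamma, beta: (1, C, 1, 1); C must be divisible by G.\n",
"    # Each of the G channel groups is normalized separately for every sample.\n",
"    N, C, H, W = x.shape\n",
"    x_g = x.reshape(N, G, C // G, H, W)\n",
"    mean = x_g.mean(axis=(2, 3, 4), keepdims=True)\n",
"    var = x_g.var(axis=(2, 3, 4), keepdims=True)\n",
"    x_hat = ((x_g - mean) / np.sqrt(var + eps)).reshape(N, C, H, W)\n",
"    return gamma * x_hat + beta\n",
"```"
]
},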
{ | |||
"cell_type": "markdown", | |||
"metadata": {}, | |||
"source": [ | |||
"## 组归一化:正向传播\n", | |||
"\n", | |||
"在文件`daseCV/layers.py`中的`spatial_groupnorm_forward`函数里实现组归一化的正向传播。通过运行以下命令检查您的代码:" | |||
] | |||
}, | |||
{ | |||
"cell_type": "code", | |||
"execution_count": null, | |||
"metadata": {}, | |||
"outputs": [], | |||
"source": [ | |||
"np.random.seed(231)\n", | |||
"# Check the training-time forward pass by checking means and variances\n", | |||
"# of features both before and after spatial batch normalization\n", | |||
"\n", | |||
"N, C, H, W = 2, 6, 4, 5\n", | |||
"G = 2\n", | |||
"x = 4 * np.random.randn(N, C, H, W) + 10\n", | |||
"x_g = x.reshape((N*G,-1))\n", | |||
"print('Before spatial group normalization:')\n", | |||
"print(' Shape: ', x.shape)\n", | |||
"print(' Means: ', x_g.mean(axis=1))\n", | |||
"print(' Stds: ', x_g.std(axis=1))\n", | |||
"\n", | |||
"# Means should be close to zero and stds close to one\n", | |||
"gamma, beta = np.ones((1,C,1,1)), np.zeros((1,C,1,1))\n", | |||
"bn_param = {'mode': 'train'}\n", | |||
"\n", | |||
"out, _ = spatial_groupnorm_forward(x, gamma, beta, G, bn_param)\n", | |||
"out_g = out.reshape((N*G,-1))\n", | |||
"print('After spatial group normalization:')\n", | |||
"print(' Shape: ', out.shape)\n", | |||
"print(' Means: ', out_g.mean(axis=1))\n", | |||
"print(' Stds: ', out_g.std(axis=1))" | |||
] | |||
}, | |||
{ | |||
"cell_type": "markdown", | |||
"metadata": {}, | |||
"source": [ | |||
"## 空间组归一化:反向传播\n", | |||
"在文件 `daseCV/layers.py`中的`spatial_groupnorm_backward`函数里实现空间批量归一化的反向传播。运行以下命令以检查您的代码:" | |||
] | |||
}, | |||
{ | |||
"cell_type": "code", | |||
"execution_count": null, | |||
"metadata": {}, | |||
"outputs": [], | |||
"source": [ | |||
"np.random.seed(231)\n", | |||
"N, C, H, W = 2, 6, 4, 5\n", | |||
"G = 2\n", | |||
"x = 5 * np.random.randn(N, C, H, W) + 12\n", | |||
"gamma = np.random.randn(1,C,1,1)\n", | |||
"beta = np.random.randn(1,C,1,1)\n", | |||
"dout = np.random.randn(N, C, H, W)\n", | |||
"\n", | |||
"gn_param = {}\n", | |||
"fx = lambda x: spatial_groupnorm_forward(x, gamma, beta, G, gn_param)[0]\n", | |||
"fg = lambda a: spatial_groupnorm_forward(x, gamma, beta, G, gn_param)[0]\n", | |||
"fb = lambda b: spatial_groupnorm_forward(x, gamma, beta, G, gn_param)[0]\n", | |||
"\n", | |||
"dx_num = eval_numerical_gradient_array(fx, x, dout)\n", | |||
"da_num = eval_numerical_gradient_array(fg, gamma, dout)\n", | |||
"db_num = eval_numerical_gradient_array(fb, beta, dout)\n", | |||
"\n", | |||
"_, cache = spatial_groupnorm_forward(x, gamma, beta, G, gn_param)\n", | |||
"dx, dgamma, dbeta = spatial_groupnorm_backward(dout, cache)\n", | |||
"#You should expect errors of magnitudes between 1e-12~1e-07\n", | |||
"print('dx error: ', rel_error(dx_num, dx))\n", | |||
"print('dgamma error: ', rel_error(da_num, dgamma))\n", | |||
"print('dbeta error: ', rel_error(db_num, dbeta))" | |||
] | |||
}, | |||
{ | |||
"cell_type": "markdown", | |||
"metadata": {}, | |||
"source": [ | |||
"---\n", | |||
"# 重要\n", | |||
"\n", | |||
"这里是作业的结尾处,请执行以下步骤:\n", | |||
"\n", | |||
"1. 点击`File -> Save`或者用`control+s`组合键,确保你最新的的notebook的作业已经保存到谷歌云。\n", | |||
"2. 执行以下代码确保 `.py` 文件保存回你的谷歌云。" | |||
] | |||
}, | |||
{ | |||
"cell_type": "code", | |||
"execution_count": null, | |||
"metadata": {}, | |||
"outputs": [], | |||
"source": [ | |||
"import os\n", | |||
"\n", | |||
"FOLDER_TO_SAVE = os.path.join('drive/My Drive/', FOLDERNAME)\n", | |||
"FILES_TO_SAVE = ['daseCV/classifiers/cnn.py', 'daseCV/classifiers/fc_net.py']\n", | |||
"\n", | |||
"for files in FILES_TO_SAVE:\n", | |||
" with open(os.path.join(FOLDER_TO_SAVE, '/'.join(files.split('/')[1:])), 'w') as f:\n", | |||
" f.write(''.join(open(files).readlines()))" | |||
] | |||
} | |||
], | |||
"metadata": { | |||
"kernelspec": { | |||
"display_name": "Python 3", | |||
"language": "python", | |||
"name": "python3" | |||
}, | |||
"language_info": { | |||
"codemirror_mode": { | |||
"name": "ipython", | |||
"version": 3 | |||
}, | |||
"file_extension": ".py", | |||
"mimetype": "text/x-python", | |||
"name": "python", | |||
"nbconvert_exporter": "python", | |||
"pygments_lexer": "ipython3", | |||
"version": "3.7.0" | |||
} | |||
}, | |||
"nbformat": 4, | |||
"nbformat_minor": 4 | |||
} |
@ -0,0 +1,369 @@
{ | |||
"cells": [ | |||
{ | |||
"cell_type": "code", | |||
"execution_count": null, | |||
"metadata": {}, | |||
"outputs": [], | |||
"source": [ | |||
"from google.colab import drive\n", | |||
"\n", | |||
"drive.mount('/content/drive', force_remount=True)\n", | |||
"\n", | |||
"# 输入daseCV所在的路径\n", | |||
"# 'daseCV' 文件夹包括 '.py', 'classifiers' 和'datasets'文件夹\n", | |||
"# 例如 'CV/assignments/assignment1/daseCV/'\n", | |||
"FOLDERNAME = None\n", | |||
"\n", | |||
"assert FOLDERNAME is not None, \"[!] Enter the foldername.\"\n", | |||
"\n", | |||
"%cd drive/My\\ Drive\n", | |||
"%cp -r $FOLDERNAME ../../\n", | |||
"%cd ../../\n", | |||
"%cd daseCV/datasets/\n", | |||
"!bash get_datasets.sh\n", | |||
"%cd ../../" | |||
] | |||
}, | |||
{ | |||
"cell_type": "markdown", | |||
"metadata": { | |||
"tags": [ | |||
"pdf-title" | |||
] | |||
}, | |||
"source": [ | |||
"# Dropout\n", | |||
"Dropout [1] 是一种通过在正向传播中将一些输出随机设置为零,神经网络正则化的方法。在这个练习中,你将实现一个dropout层,并修改你的全连接网络使其可选择的使用dropout\n", | |||
"\n", | |||
"[1] [Geoffrey E. Hinton et al, \"Improving neural networks by preventing co-adaptation of feature detectors\", arXiv 2012](https://arxiv.org/abs/1207.0580)" | |||
] | |||
}, | |||
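{
"cell_type": "markdown",
"metadata": {},
"source": [
"For intuition, here is a minimal NumPy sketch of inverted dropout (illustrative only; it assumes `p` is the probability of *keeping* a unit, which may differ from the convention used in `daseCV/layers.py`):\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def inverted_dropout_sketch(x, p, mode):\n",
"    # Train time: zero out units with probability 1 - p and rescale the\n",
"    # survivors by 1 / p so the expected activation is unchanged.\n",
"    if mode == 'train':\n",
"        mask = (np.random.rand(*x.shape) < p) / p\n",
"        return x * mask\n",
"    # Test time: no masking and no extra scaling are needed.\n",
"    return x\n",
"```"
]
},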
{ | |||
"cell_type": "code", | |||
"execution_count": null, | |||
"metadata": { | |||
"tags": [ | |||
"pdf-ignore" | |||
] | |||
}, | |||
"outputs": [], | |||
"source": [ | |||
"# As usual, a bit of setup\n", | |||
"from __future__ import print_function\n", | |||
"import time\n", | |||
"import numpy as np\n", | |||
"import matplotlib.pyplot as plt\n", | |||
"from daseCV.classifiers.fc_net import *\n", | |||
"from daseCV.data_utils import get_CIFAR10_data\n", | |||
"from daseCV.gradient_check import eval_numerical_gradient, eval_numerical_gradient_array\n", | |||
"from daseCV.solver import Solver\n", | |||
"\n", | |||
"%matplotlib inline\n", | |||
"plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots\n", | |||
"plt.rcParams['image.interpolation'] = 'nearest'\n", | |||
"plt.rcParams['image.cmap'] = 'gray'\n", | |||
"\n", | |||
"# for auto-reloading external modules\n", | |||
"# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython\n", | |||
"%load_ext autoreload\n", | |||
"%autoreload 2\n", | |||
"\n", | |||
"\n", | |||
"def rel_error(x, y):\n", | |||
" \"\"\" returns relative error \"\"\"\n", | |||
" return np.max(np.abs(x - y) / (np.maximum(1e-8, np.abs(x) + np.abs(y))))\n" | |||
] | |||
}, | |||
{ | |||
"cell_type": "code", | |||
"execution_count": null, | |||
"metadata": { | |||
"tags": [ | |||
"pdf-ignore" | |||
] | |||
}, | |||
"outputs": [], | |||
"source": [ | |||
"# Load the (preprocessed) CIFAR10 data.\n", | |||
"\n", | |||
"data = get_CIFAR10_data()\n", | |||
"for k, v in data.items():\n", | |||
" print('%s: ' % k, v.shape)" | |||
] | |||
}, | |||
{ | |||
"cell_type": "markdown", | |||
"metadata": {}, | |||
"source": [ | |||
"# Dropout 正向传播\n", | |||
"在文件 `daseCV/layers.py` 中完成dropout的正向传播过程。由于dropout在训练和测试期间的行为是不同的,因此请确保两种模式下都实现完成。\n", | |||
"\n", | |||
"完成此操作后,运行下面的cell以测试你的代码。" | |||
] | |||
}, | |||
{ | |||
"cell_type": "code", | |||
"execution_count": null, | |||
"metadata": {}, | |||
"outputs": [], | |||
"source": [ | |||
"np.random.seed(231)\n", | |||
"x = np.random.randn(500, 500) + 10\n", | |||
"\n", | |||
"for p in [0.25, 0.4, 0.7]:\n", | |||
" out, _ = dropout_forward(x, {'mode': 'train', 'p': p})\n", | |||
" out_test, _ = dropout_forward(x, {'mode': 'test', 'p': p})\n", | |||
"\n", | |||
" print('Running tests with p = ', p)\n", | |||
" print('Mean of input: ', x.mean())\n", | |||
" print('Mean of train-time output: ', out.mean())\n", | |||
" print('Mean of test-time output: ', out_test.mean())\n", | |||
" print('Fraction of train-time output set to zero: ', (out == 0).mean())\n", | |||
" print('Fraction of test-time output set to zero: ', (out_test == 0).mean())\n", | |||
" print()" | |||
] | |||
}, | |||
{ | |||
"cell_type": "markdown", | |||
"metadata": {}, | |||
"source": [ | |||
"# Dropout 反向传播\n", | |||
"在文件 `daseCV/layers.py` 中完成dropout的反向传播。完成之后运行以下cell以对你的实现代码进行梯度检查。" | |||
] | |||
}, | |||
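{
"cell_type": "markdown",
"metadata": {},
"source": [
"Conceptually this backward pass is short: train-time inverted dropout is an elementwise multiplication by a fixed (already rescaled) mask, so the gradient simply reuses that mask. A sketch, assuming the forward pass cached the mask:\n",
"\n",
"```python\n",
"def dropout_backward_sketch(dout, mask, mode):\n",
"    # Train time: the gradient of (x * mask) with respect to x is the mask.\n",
"    if mode == 'train':\n",
"        return dout * mask\n",
"    # Test time: the forward pass was the identity.\n",
"    return dout\n",
"```"
]
},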
{ | |||
"cell_type": "code", | |||
"execution_count": null, | |||
"metadata": {}, | |||
"outputs": [], | |||
"source": [ | |||
"np.random.seed(231)\n", | |||
"x = np.random.randn(10, 10) + 10\n", | |||
"dout = np.random.randn(*x.shape)\n", | |||
"\n", | |||
"dropout_param = {'mode': 'train', 'p': 0.2, 'seed': 123}\n", | |||
"out, cache = dropout_forward(x, dropout_param)\n", | |||
"dx = dropout_backward(dout, cache)\n", | |||
"dx_num = eval_numerical_gradient_array(lambda xx: dropout_forward(xx, dropout_param)[0], x, dout)\n", | |||
"\n", | |||
"# Error should be around e-10 or less\n", | |||
"print('dx relative error: ', rel_error(dx, dx_num))" | |||
] | |||
}, | |||
{ | |||
"cell_type": "markdown", | |||
"metadata": { | |||
"tags": [ | |||
"pdf-inline" | |||
] | |||
}, | |||
"source": [ | |||
"## 问题 1:\n", | |||
"如果我们不利用inverted dropout,在训练的时候直接将dropout后的值除以 `p`,会发生什么?为什么会这样呢?\n", | |||
"\n", | |||
"\n", | |||
"\n", | |||
"## 回答:\n", | |||
"[FILL THIS IN]\n" | |||
] | |||
}, | |||
{ | |||
"cell_type": "markdown", | |||
"metadata": {}, | |||
"source": [ | |||
"# 全连接网络的Dropout\n", | |||
"\n", | |||
"修改`daseCV/classifiers/fc_net.py`文件完成使用dropout的部分。具体来说,如果网络的构造函数收到的`dropout`参数值不为1,则应在每个ReLU之后添加一个dropout层。完成之后,运行以下命令以对你的代码进行梯度检查。" | |||
] | |||
}, | |||
{ | |||
"cell_type": "code", | |||
"execution_count": null, | |||
"metadata": {}, | |||
"outputs": [], | |||
"source": [ | |||
"np.random.seed(231)\n", | |||
"N, D, H1, H2, C = 2, 15, 20, 30, 10\n", | |||
"X = np.random.randn(N, D)\n", | |||
"y = np.random.randint(C, size=(N,))\n", | |||
"\n", | |||
"for dropout in [1, 0.75, 0.5]:\n", | |||
" print('Running check with dropout = ', dropout)\n", | |||
" model = FullyConnectedNet([H1, H2], input_dim=D, num_classes=C,\n", | |||
" weight_scale=5e-2, dtype=np.float64,\n", | |||
" dropout=dropout, seed=123)\n", | |||
"\n", | |||
" loss, grads = model.loss(X, y)\n", | |||
" print('Initial loss: ', loss)\n", | |||
" \n", | |||
" # Relative errors should be around e-6 or less; Note that it's fine\n", | |||
" # if for dropout=1 you have W2 error be on the order of e-5.\n", | |||
" for name in sorted(grads):\n", | |||
" f = lambda _: model.loss(X, y)[0]\n", | |||
" grad_num = eval_numerical_gradient(f, model.params[name], verbose=False, h=1e-5)\n", | |||
" print('%s relative error: %.2e' % (name, rel_error(grad_num, grads[name])))\n", | |||
" print()" | |||
] | |||
}, | |||
{ | |||
"cell_type": "markdown", | |||
"metadata": {}, | |||
"source": [ | |||
"# 正则化实验\n", | |||
"作为实验,我们将在500个样本上训练一对双层网络:一个不使用dropout,另一个使用概率为0.25的dropout。之后,我们将可视化这两个网络训练和验证的准确度。" | |||
] | |||
}, | |||
{ | |||
"cell_type": "code", | |||
"execution_count": null, | |||
"metadata": { | |||
"scrolled": false | |||
}, | |||
"outputs": [], | |||
"source": [ | |||
"# Train two identical nets, one with dropout and one without\n", | |||
"np.random.seed(231)\n", | |||
"num_train = 500\n", | |||
"small_data = {\n", | |||
" 'X_train': data['X_train'][:num_train],\n", | |||
" 'y_train': data['y_train'][:num_train],\n", | |||
" 'X_val': data['X_val'],\n", | |||
" 'y_val': data['y_val'],\n", | |||
"}\n", | |||
"\n", | |||
"solvers = {}\n", | |||
"dropout_choices = [1, 0.25]\n", | |||
"for dropout in dropout_choices:\n", | |||
" model = FullyConnectedNet([500], dropout=dropout)\n", | |||
" print(dropout)\n", | |||
"\n", | |||
" solver = Solver(model, small_data,\n", | |||
" num_epochs=25, batch_size=100,\n", | |||
" update_rule='adam',\n", | |||
" optim_config={\n", | |||
" 'learning_rate': 5e-4,\n", | |||
" },\n", | |||
" verbose=True, print_every=100)\n", | |||
" solver.train()\n", | |||
" solvers[dropout] = solver\n", | |||
" print()" | |||
] | |||
}, | |||
{ | |||
"cell_type": "code", | |||
"execution_count": null, | |||
"metadata": {}, | |||
"outputs": [], | |||
"source": [ | |||
"# Plot train and validation accuracies of the two models\n", | |||
"\n", | |||
"train_accs = []\n", | |||
"val_accs = []\n", | |||
"for dropout in dropout_choices:\n", | |||
" solver = solvers[dropout]\n", | |||
" train_accs.append(solver.train_acc_history[-1])\n", | |||
" val_accs.append(solver.val_acc_history[-1])\n", | |||
"\n", | |||
"plt.subplot(3, 1, 1)\n", | |||
"for dropout in dropout_choices:\n", | |||
" plt.plot(solvers[dropout].train_acc_history, 'o', label='%.2f dropout' % dropout)\n", | |||
"plt.title('Train accuracy')\n", | |||
"plt.xlabel('Epoch')\n", | |||
"plt.ylabel('Accuracy')\n", | |||
"plt.legend(ncol=2, loc='lower right')\n", | |||
" \n", | |||
"plt.subplot(3, 1, 2)\n", | |||
"for dropout in dropout_choices:\n", | |||
" plt.plot(solvers[dropout].val_acc_history, 'o', label='%.2f dropout' % dropout)\n", | |||
"plt.title('Val accuracy')\n", | |||
"plt.xlabel('Epoch')\n", | |||
"plt.ylabel('Accuracy')\n", | |||
"plt.legend(ncol=2, loc='lower right')\n", | |||
"\n", | |||
"plt.gcf().set_size_inches(15, 15)\n", | |||
"plt.show()" | |||
] | |||
}, | |||
{ | |||
"cell_type": "markdown", | |||
"metadata": { | |||
"tags": [ | |||
"pdf-inline" | |||
] | |||
}, | |||
"source": [ | |||
"## 问题 2:\n", | |||
"对比有无dropout的验证和训练的精度,你对使用dropout作为正则化有何建议?\n", | |||
"\n", | |||
"## 回答:\n", | |||
"[FILL THIS IN]\n" | |||
] | |||
}, | |||
{ | |||
"cell_type": "markdown", | |||
"metadata": { | |||
"tags": [ | |||
"pdf-inline" | |||
] | |||
}, | |||
"source": [ | |||
"## 问题三 3:\n", | |||
"假设我们正在训练一个深层的全连接网络用以进行图像分类,并隐层之后dropout(通过使用概率p进行参数化)。如果我们担心过度拟合而决定减小隐层的大小(即每层中的节点数)时,应该如何修改p(如果有的话)?\n", | |||
"\n", | |||
"## 回答:\n", | |||
"[FILL THIS IN]\n" | |||
] | |||
}, | |||
{ | |||
"cell_type": "markdown", | |||
"metadata": {}, | |||
"source": [ | |||
"---\n", | |||
"# 重要\n", | |||
"\n", | |||
"这里是作业的结尾处,请执行以下步骤:\n", | |||
"\n", | |||
"1. 点击`File -> Save`或者用`control+s`组合键,确保你最新的的notebook的作业已经保存到谷歌云。\n", | |||
"2. 执行以下代码确保 `.py` 文件保存回你的谷歌云。" | |||
] | |||
}, | |||
{ | |||
"cell_type": "code", | |||
"execution_count": null, | |||
"metadata": {}, | |||
"outputs": [], | |||
"source": [ | |||
"import os\n", | |||
"\n", | |||
"FOLDER_TO_SAVE = os.path.join('drive/My Drive/', FOLDERNAME)\n", | |||
"FILES_TO_SAVE = ['daseCV/classifiers/cnn.py', 'daseCV/classifiers/fc_net.py']\n", | |||
"\n", | |||
"for files in FILES_TO_SAVE:\n", | |||
" with open(os.path.join(FOLDER_TO_SAVE, '/'.join(files.split('/')[1:])), 'w') as f:\n", | |||
" f.write(''.join(open(files).readlines()))" | |||
] | |||
} | |||
], | |||
"metadata": { | |||
"kernelspec": { | |||
"display_name": "Python 3", | |||
"language": "python", | |||
"name": "python3" | |||
}, | |||
"language_info": { | |||
"codemirror_mode": { | |||
"name": "ipython", | |||
"version": 3 | |||
}, | |||
"file_extension": ".py", | |||
"mimetype": "text/x-python", | |||
"name": "python", | |||
"nbconvert_exporter": "python", | |||
"pygments_lexer": "ipython3", | |||
"version": "3.7.0" | |||
} | |||
}, | |||
"nbformat": 4, | |||
"nbformat_minor": 2 | |||
} |
@ -0,0 +1,48 @@
#!/bin/bash | |||
#NOTE: DO NOT EDIT THIS FILE-- MAY RESULT IN INCOMPLETE SUBMISSIONS | |||
NOTEBOOKS="FullyConnectedNets.ipynb | |||
BatchNormalization.ipynb | |||
Dropout.ipynb | |||
ConvolutionalNetworks.ipynb | |||
PyTorch.ipynb | |||
TensorFlow.ipynb" | |||
CODE="cs231n/layers.py | |||
cs231n/classifiers/fc_net.py | |||
cs231n/optim.py | |||
cs231n/classifiers/cnn.py" | |||
REMOTE_DIR="cs231n-2019-assignment2" | |||
ZIP_FILENAME="a2.zip" | |||
FILES="${NOTEBOOKS} ${CODE}" | |||
for FILE in ${FILES} | |||
do | |||
if [ ! -f ${FILE} ]; then | |||
echo "Required file ${FILE} not found, Exiting." | |||
exit 0 | |||
fi | |||
done | |||
if [ -d ${REMOTE_DIR} ]; then | |||
rm -r ${REMOTE_DIR} | |||
fi | |||
mkdir -p ${REMOTE_DIR} | |||
cp ${FILES} ${REMOTE_DIR} | |||
echo "### Zipping file ###" | |||
zip -r ${REMOTE_DIR}/${ZIP_FILENAME} . -x "*.git*" "*cs231n/datasets*" "*.ipynb_checkpoints*" "*README.md" "collectSubmission.sh" "*requirements.txt" "*__pycache__*" ".env/*" > assignment_zip.log | |||
echo "" | |||
echo "### Submitting to myth ###" | |||
echo "Type in your Stanford student ID (alphanumeric, *not* the 8-digit ID):" | |||
read -p "Student ID: " SUID | |||
echo "" | |||
echo "### Copying to ${SUID}@myth.stanford.edu:${REMOTE_DIR} ###" | |||
echo "Note: if myth is under heavy use, this may hang: If this happens, rerun the script." | |||
scp -r ${REMOTE_DIR} ${SUID}@myth.stanford.edu:~/ | |||
echo "" | |||
echo "### Running remote submission script from ${SUID}@myth.stanford.edu:${REMOTE_DIR} ###" | |||
ssh ${SUID}@myth.stanford.edu "cd ${REMOTE_DIR} && /afs/ir/class/cs231n/grading/submit_a2 && exit" |
@ -0,0 +1,15 @@
#!/bin/bash | |||
# what real Python executable to use | |||
#PYVER=2.7 | |||
#PATHTOPYTHON=/usr/local/bin/ | |||
#PYTHON=${PATHTOPYTHON}python${PYVER} | |||
PYTHON=$(which $(readlink .env/bin/python)) # only works with python3 | |||
# find the root of the virtualenv, it should be the parent of the dir this script is in | |||
ENV=`$PYTHON -c "import os; print(os.path.abspath(os.path.join(os.path.dirname(\"$0\"), '..')))"` | |||
# now run Python with the virtualenv set as Python's HOME | |||
export PYTHONHOME=$ENV | |||
exec $PYTHON "$@" |
@ -0,0 +1,73 @@
absl-py==0.7.1 | |||
astor==0.7.1 | |||
attrs==19.1.0 | |||
backcall==0.1.0 | |||
bleach==3.1.0 | |||
cycler==0.10.0 | |||
Cython==0.29.7 | |||
decorator==4.4.0 | |||
defusedxml==0.6.0 | |||
entrypoints==0.3 | |||
future==0.17.1 | |||
gast==0.2.2 | |||
google-pasta==0.1.5 | |||
grpcio==1.20.0 | |||
h5py==2.9.0 | |||
imageio==2.5.0 | |||
ipykernel==5.1.0 | |||
ipython==7.4.0 | |||
ipython-genutils==0.2.0 | |||
ipywidgets==7.4.2 | |||
jedi==0.13.3 | |||
Jinja2==2.10.1 | |||
jsonschema==3.0.1 | |||
jupyter==1.0.0 | |||
jupyter-client==5.2.4 | |||
jupyter-console==6.0.0 | |||
jupyter-core==4.4.0 | |||
Keras==2.2.4 | |||
Keras-Applications==1.0.7 | |||
Keras-Preprocessing==1.0.9 | |||
kiwisolver==1.0.1 | |||
Markdown==3.1 | |||
MarkupSafe==1.1.1 | |||
matplotlib==3.0.3 | |||
mistune==0.8.4 | |||
nbconvert==5.4.1 | |||
nbformat==4.4.0 | |||
notebook==5.7.8 | |||
numexpr==2.6.9 | |||
numpy==1.16.2 | |||
pandocfilters==1.4.2 | |||
parso==0.4.0 | |||
pexpect==4.7.0 | |||
pickleshare==0.7.5 | |||
Pillow==6.0.0 | |||
prometheus-client==0.6.0 | |||
prompt-toolkit==2.0.9 | |||
protobuf==3.7.1 | |||
ptyprocess==0.6.0 | |||
Pygments==2.3.1 | |||
pyparsing==2.4.0 | |||
pyrsistent==0.14.11 | |||
python-dateutil==2.8.0 | |||
PyYAML==5.1 | |||
pyzmq==18.0.1 | |||
qtconsole==4.4.3 | |||
scipy==1.2.1 | |||
Send2Trash==1.5.0 | |||
six==1.12.0 | |||
# Add this line if you want GPU support for tensorflow! | |||
# tensorflow-gpu==2.0.0a0 | |||
tensorflow==2.0.0a0 | |||
termcolor==1.1.0 | |||
terminado==0.8.2 | |||
testpath==0.4.2 | |||
torch==1.0.1.post2 | |||
torchvision==0.2.2.post3 | |||
tornado==6.0.2 | |||
traitlets==4.3.2 | |||
wcwidth==0.1.7 | |||
webencodings==0.5.1 | |||
Werkzeug==0.15.2 | |||
widgetsnbextension==3.4.2 |
@ -0,0 +1,4 @@
# Assume the virtualenv is called .env | |||
cp frameworkpython .env/bin | |||
.env/bin/frameworkpython -m IPython notebook |