DaSE-Computer-Vision-2021

{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from google.colab import drive\n",
"\n",
"drive.mount('/content/drive', force_remount=True)\n",
"\n",
"# Enter the path to the daseCV folder\n",
"# The 'daseCV' folder contains the '.py' files and the 'classifiers' and 'datasets' folders\n",
"# e.g. 'CV/assignments/assignment1/daseCV/'\n",
"FOLDERNAME = None\n",
"\n",
"assert FOLDERNAME is not None, \"[!] Enter the foldername.\"\n",
"\n",
"%cd drive/My\\ Drive\n",
"%cp -r $FOLDERNAME ../../\n",
"%cd ../../\n",
"%cd daseCV/datasets/\n",
"!bash get_datasets.sh\n",
"%cd ../../"
]
},
{
"cell_type": "markdown",
"metadata": {
"tags": [
"pdf-title"
]
},
"source": [
"# Fully-Connected Neural Networks\n",
"\n",
"In a previous assignment you implemented a two-layer fully-connected neural network on CIFAR-10. That implementation was simple, but not very modular, since the loss and gradient were computed inside a single function. This is manageable for a simple two-layer network, but it becomes impractical once we move to larger models. Ideally we want to build networks with a more modular design, so that we can implement different types of layers in isolation and then snap them together into models with different architectures."
]
},
{
"cell_type": "markdown",
"metadata": {
"tags": [
"pdf-ignore"
]
},
"source": [
"In this exercise we will implement fully-connected networks using a more modular approach. For each layer we will implement a `forward` and a `backward` function. The `forward` function receives inputs, weights, and other parameters, and returns both an output and a `cache` object storing data needed for the backward pass, like this:\n",
"\n",
"```python\n",
"def layer_forward(x, w):\n",
"    \"\"\" Receive inputs x and weights w \"\"\"\n",
"    # Do some computations ...\n",
"    z = # ... some intermediate value\n",
"    # Do some more computations ...\n",
"    out = # the output\n",
"\n",
"    cache = (x, w, z, out) # Values we need to compute gradients\n",
"\n",
"    return out, cache\n",
"```\n",
"\n",
"The backward pass will receive upstream gradients and the `cache` object, and will return gradients with respect to the inputs and weights:\n",
"\n",
"```python\n",
"def layer_backward(dout, cache):\n",
"    \"\"\"\n",
"    Receive dout (derivative of loss with respect to outputs) and cache,\n",
"    and compute derivative with respect to inputs.\n",
"    \"\"\"\n",
"    # Unpack cache values\n",
"    x, w, z, out = cache\n",
"\n",
"    # Use values in cache to compute derivatives\n",
"    dx = # Derivative of loss with respect to x\n",
"    dw = # Derivative of loss with respect to w\n",
"\n",
"    return dx, dw\n",
"```\n",
"\n",
"After implementing a number of layers this way, we will be able to easily combine them to build classifiers with different architectures.\n",
"\n",
"In addition to implementing fully-connected networks of arbitrary depth, we will also explore different update rules for optimization, and introduce Dropout as a regularizer and Batch/Layer Normalization as tools to optimize networks more effectively."
]
},
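{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the pattern above concrete, here is a minimal, runnable sketch of one layer written in this style. The elementwise-multiply layer is invented purely for illustration and is not part of the assignment API:\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def multiply_forward(x, w):\n",
"    \"\"\"Forward pass: elementwise product of x and w.\"\"\"\n",
"    out = x * w\n",
"    cache = (x, w)  # keep the inputs for the backward pass\n",
"    return out, cache\n",
"\n",
"def multiply_backward(dout, cache):\n",
"    \"\"\"Backward pass: chain rule through the elementwise product.\"\"\"\n",
"    x, w = cache\n",
"    dx = dout * w  # d(out)/dx = w\n",
"    dw = dout * x  # d(out)/dw = x\n",
"    return dx, dw\n",
"\n",
"x, w = np.random.randn(3, 4), np.random.randn(3, 4)\n",
"out, cache = multiply_forward(x, w)\n",
"dx, dw = multiply_backward(np.ones_like(out), cache)\n",
"```"
]
},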
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"pdf-ignore"
]
},
"outputs": [],
"source": [
"# As usual, a bit of setup\n",
"from __future__ import print_function\n",
"import time\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"from daseCV.classifiers.fc_net import *\n",
"from daseCV.data_utils import get_CIFAR10_data\n",
"from daseCV.gradient_check import eval_numerical_gradient, eval_numerical_gradient_array\n",
"from daseCV.solver import Solver\n",
"\n",
"%matplotlib inline\n",
"plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots\n",
"plt.rcParams['image.interpolation'] = 'nearest'\n",
"plt.rcParams['image.cmap'] = 'gray'\n",
"\n",
"# for auto-reloading external modules\n",
"# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython\n",
"%load_ext autoreload\n",
"%autoreload 2\n",
"\n",
"def rel_error(x, y):\n",
"    \"\"\" returns relative error \"\"\"\n",
"    return np.max(np.abs(x - y) / (np.maximum(1e-8, np.abs(x) + np.abs(y))))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"pdf-ignore"
]
},
"outputs": [],
"source": [
"# Load the (preprocessed) CIFAR10 data.\n",
"\n",
"data = get_CIFAR10_data()\n",
"for k, v in list(data.items()):\n",
"    print(('%s: ' % k, v.shape))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Affine layer: forward\n",
"Open the file `daseCV/layers.py` and implement the `affine_forward` function.\n",
"\n",
"Once you are done, you can check your implementation with the test code below."
]
},
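{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a hint, the affine (fully-connected) forward pass is a single matrix multiply plus a bias. A minimal sketch, assuming inputs of shape `(N, d_1, ..., d_k)` that must be flattened to rows before multiplying:\n",
"\n",
"```python\n",
"out = x.reshape(x.shape[0], -1).dot(w) + b  # (N, D) x (D, M) -> (N, M)\n",
"```"
]
},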
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Test the affine_forward function\n",
"\n",
"num_inputs = 2\n",
"input_shape = (4, 5, 6)\n",
"output_dim = 3\n",
"\n",
"input_size = num_inputs * np.prod(input_shape)\n",
"weight_size = output_dim * np.prod(input_shape)\n",
"\n",
"x = np.linspace(-0.1, 0.5, num=input_size).reshape(num_inputs, *input_shape)\n",
"w = np.linspace(-0.2, 0.3, num=weight_size).reshape(np.prod(input_shape), output_dim)\n",
"b = np.linspace(-0.3, 0.1, num=output_dim)\n",
"\n",
"\n",
"out, _ = affine_forward(x, w, b)\n",
"correct_out = np.array([[ 1.49834967, 1.70660132, 1.91485297],\n",
"                        [ 3.25553199, 3.5141327,  3.77273342]])\n",
"\n",
"# Compare your output with ours. The error should be around e-9 or less.\n",
"print('Testing affine_forward function:')\n",
"print('difference: ', rel_error(out, correct_out))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Affine layer: backward\n",
"Implement the `affine_backward` function, and test your implementation using numeric gradient checking."
]
},
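{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a reminder of the math (for the flattened input $X \\in \\mathbb{R}^{N \\times D}$, weights $W \\in \\mathbb{R}^{D \\times M}$, and upstream gradient $\\partial L / \\partial Y$, where $Y = XW + b$):\n",
"\n",
"$$\\frac{\\partial L}{\\partial X} = \\frac{\\partial L}{\\partial Y} W^\\top, \\qquad \\frac{\\partial L}{\\partial W} = X^\\top \\frac{\\partial L}{\\partial Y}, \\qquad \\frac{\\partial L}{\\partial b} = \\sum_{i} \\left(\\frac{\\partial L}{\\partial Y}\\right)_{i,:}$$\n",
"\n",
"(`dx` must then be reshaped back to the original shape of `x`.)"
]
},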
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Test the affine_backward function\n",
"np.random.seed(231)\n",
"x = np.random.randn(10, 2, 3)\n",
"w = np.random.randn(6, 5)\n",
"b = np.random.randn(5)\n",
"dout = np.random.randn(10, 5)\n",
"\n",
"dx_num = eval_numerical_gradient_array(lambda x: affine_forward(x, w, b)[0], x, dout)\n",
"dw_num = eval_numerical_gradient_array(lambda w: affine_forward(x, w, b)[0], w, dout)\n",
"db_num = eval_numerical_gradient_array(lambda b: affine_forward(x, w, b)[0], b, dout)\n",
"\n",
"_, cache = affine_forward(x, w, b)\n",
"dx, dw, db = affine_backward(dout, cache)\n",
"\n",
"# The error should be around e-10 or less\n",
"print('Testing affine_backward function:')\n",
"print('dx error: ', rel_error(dx_num, dx))\n",
"print('dw error: ', rel_error(dw_num, dw))\n",
"print('db error: ', rel_error(db_num, db))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# ReLU activation: forward\n",
"\n",
"Implement the forward pass for the ReLU activation function in the `relu_forward` function, then check your implementation with the test code below."
]
},
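{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a hint, the ReLU forward pass is essentially one line of NumPy (a sketch only; the cache is left to you):\n",
"\n",
"```python\n",
"out = np.maximum(0, x)  # elementwise max(0, x)\n",
"```"
]
},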
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Test the relu_forward function\n",
"\n",
"x = np.linspace(-0.5, 0.5, num=12).reshape(3, 4)\n",
"\n",
"out, _ = relu_forward(x)\n",
"correct_out = np.array([[ 0.,         0.,         0.,         0.,       ],\n",
"                        [ 0.,         0.,         0.04545455, 0.13636364,],\n",
"                        [ 0.22727273, 0.31818182, 0.40909091, 0.5,       ]])\n",
"\n",
"# Compare your output with ours. The error should be on the order of e-8\n",
"print('Testing relu_forward function:')\n",
"print('difference: ', rel_error(out, correct_out))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# ReLU activation: backward\n",
"\n",
"Implement the backward pass for the ReLU activation function in the `relu_backward` function, and test your implementation using numeric gradient checking."
]
},
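{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a hint, ReLU passes the upstream gradient through only where the input was positive. A sketch, assuming `x` was stored in `cache` by `relu_forward`:\n",
"\n",
"```python\n",
"dx = dout * (x > 0)  # zero gradient wherever the input was clipped\n",
"```"
]
},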
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"np.random.seed(231)\n",
"x = np.random.randn(10, 10)\n",
"dout = np.random.randn(*x.shape)\n",
"\n",
"dx_num = eval_numerical_gradient_array(lambda x: relu_forward(x)[0], x, dout)\n",
"\n",
"_, cache = relu_forward(x)\n",
"dx = relu_backward(dout, cache)\n",
"\n",
"# The error should be on the order of e-12\n",
"print('Testing relu_backward function:')\n",
"print('dx error: ', rel_error(dx_num, dx))"
]
},
{
"cell_type": "markdown",
"metadata": {
"tags": [
"pdf-inline"
]
},
"source": [
"## Inline Question 1: \n",
"\n",
"This assignment only asks you to implement ReLU, but neural networks can use many different activation functions, each with its pros and cons. One common problem with activation functions is getting zero (or close to zero) gradient flow during backpropagation. Which of the following activation functions have this problem? If you consider these functions in the one-dimensional case, what kinds of input would cause this behaviour?\n",
"1. Sigmoid\n",
"2. ReLU\n",
"3. Leaky ReLU\n",
"\n",
"## Answer:\n",
"[FILL THIS IN]\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# \"Sandwich\" layers\n",
"\n",
"Some patterns of layers are used very frequently in neural networks. For example, affine layers are often followed by a ReLU layer. To simplify these common patterns, we define several convenience layers in the file `daseCV/layer_utils.py`.\n",
"\n",
"Take a look at the `affine_relu_forward` and `affine_relu_backward` functions, and run the numeric gradient check below:"
]
},
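{
"cell_type": "markdown",
"metadata": {},
"source": [
"The convenience layers are just compositions of the primitives you implemented above; `affine_relu_forward` looks roughly like this sketch (check `daseCV/layer_utils.py` for the actual code):\n",
"\n",
"```python\n",
"def affine_relu_forward(x, w, b):\n",
"    a, fc_cache = affine_forward(x, w, b)  # affine pre-activation\n",
"    out, relu_cache = relu_forward(a)      # elementwise ReLU\n",
"    cache = (fc_cache, relu_cache)         # both caches for the backward pass\n",
"    return out, cache\n",
"```"
]
},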
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from daseCV.layer_utils import affine_relu_forward, affine_relu_backward\n",
"np.random.seed(231)\n",
"x = np.random.randn(2, 3, 4)\n",
"w = np.random.randn(12, 10)\n",
"b = np.random.randn(10)\n",
"dout = np.random.randn(2, 10)\n",
"\n",
"out, cache = affine_relu_forward(x, w, b)\n",
"dx, dw, db = affine_relu_backward(dout, cache)\n",
"\n",
"dx_num = eval_numerical_gradient_array(lambda x: affine_relu_forward(x, w, b)[0], x, dout)\n",
"dw_num = eval_numerical_gradient_array(lambda w: affine_relu_forward(x, w, b)[0], w, dout)\n",
"db_num = eval_numerical_gradient_array(lambda b: affine_relu_forward(x, w, b)[0], b, dout)\n",
"\n",
"# Relative error should be around e-10 or less\n",
"print('Testing affine_relu_forward and affine_relu_backward:')\n",
"print('dx error: ', rel_error(dx_num, dx))\n",
"print('dw error: ', rel_error(dw_num, dw))\n",
"print('db error: ', rel_error(db_num, db))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Loss layers: Softmax and SVM\n",
"\n",
"You already implemented these loss functions in the last assignment, so you don't have to do it again here; we give them to you for free. You should still make sure you understand how they work by looking at the implementations in `daseCV/layers.py`.\n",
"\n",
"You can make sure the implementations are correct by running the following:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"np.random.seed(231)\n",
"num_classes, num_inputs = 10, 50\n",
"x = 0.001 * np.random.randn(num_inputs, num_classes)\n",
"y = np.random.randint(num_classes, size=num_inputs)\n",
"\n",
"dx_num = eval_numerical_gradient(lambda x: svm_loss(x, y)[0], x, verbose=False)\n",
"loss, dx = svm_loss(x, y)\n",
"\n",
"# Test svm_loss function. Loss should be around 9 and dx error should be around the order of e-9\n",
"print('Testing svm_loss:')\n",
"print('loss: ', loss)\n",
"print('dx error: ', rel_error(dx_num, dx))\n",
"\n",
"dx_num = eval_numerical_gradient(lambda x: softmax_loss(x, y)[0], x, verbose=False)\n",
"loss, dx = softmax_loss(x, y)\n",
"\n",
"# Test softmax_loss function. Loss should be close to 2.3 and dx error should be around e-8\n",
"print('\\nTesting softmax_loss:')\n",
"print('loss: ', loss)\n",
"print('dx error: ', rel_error(dx_num, dx))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Two-layer network\n",
"\n",
"In a previous assignment you implemented a simple two-layer neural network. Now that you have modular implementations of some layers, you will reimplement the two-layer network using these modules.\n",
"\n",
"Open the file `daseCV/classifiers/fc_net.py` and complete the implementation of the `TwoLayerNet` class. This class will serve as a building block for the other networks in this assignment, so read through it to make sure you understand the API.\n",
"You can run the cell below to test your implementation.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"np.random.seed(231)\n",
"N, D, H, C = 3, 5, 50, 7\n",
"X = np.random.randn(N, D)\n",
"y = np.random.randint(C, size=N)\n",
"\n",
"std = 1e-3\n",
"model = TwoLayerNet(input_dim=D, hidden_dim=H, num_classes=C, weight_scale=std)\n",
"\n",
"print('Testing initialization ... ')\n",
"W1_std = abs(model.params['W1'].std() - std)\n",
"b1 = model.params['b1']\n",
"W2_std = abs(model.params['W2'].std() - std)\n",
"b2 = model.params['b2']\n",
"assert W1_std < std / 10, 'First layer weights do not seem right'\n",
"assert np.all(b1 == 0), 'First layer biases do not seem right'\n",
"assert W2_std < std / 10, 'Second layer weights do not seem right'\n",
"assert np.all(b2 == 0), 'Second layer biases do not seem right'\n",
"\n",
"print('Testing test-time forward pass ... ')\n",
"model.params['W1'] = np.linspace(-0.7, 0.3, num=D*H).reshape(D, H)\n",
"model.params['b1'] = np.linspace(-0.1, 0.9, num=H)\n",
"model.params['W2'] = np.linspace(-0.3, 0.4, num=H*C).reshape(H, C)\n",
"model.params['b2'] = np.linspace(-0.9, 0.1, num=C)\n",
"X = np.linspace(-5.5, 4.5, num=N*D).reshape(D, N).T\n",
"scores = model.loss(X)\n",
"correct_scores = np.asarray(\n",
"  [[11.53165108, 12.2917344,  13.05181771, 13.81190102, 14.57198434, 15.33206765, 16.09215096],\n",
"   [12.05769098, 12.74614105, 13.43459113, 14.1230412,  14.81149128, 15.49994135, 16.18839143],\n",
"   [12.58373087, 13.20054771, 13.81736455, 14.43418138, 15.05099822, 15.66781506, 16.2846319 ]])\n",
"scores_diff = np.abs(scores - correct_scores).sum()\n",
"assert scores_diff < 1e-6, 'Problem with test-time forward pass'\n",
"\n",
"print('Testing training loss (no regularization)')\n",
"y = np.asarray([0, 5, 1])\n",
"loss, grads = model.loss(X, y)\n",
"correct_loss = 3.4702243556\n",
"assert abs(loss - correct_loss) < 1e-10, 'Problem with training-time loss'\n",
"\n",
"model.reg = 1.0\n",
"loss, grads = model.loss(X, y)\n",
"correct_loss = 26.5948426952\n",
"assert abs(loss - correct_loss) < 1e-10, 'Problem with regularization loss'\n",
"\n",
"# Errors should be around e-7 or less\n",
"for reg in [0.0, 0.7]:\n",
"    print('Running numeric gradient check with reg = ', reg)\n",
"    model.reg = reg\n",
"    loss, grads = model.loss(X, y)\n",
"\n",
"    for name in sorted(grads):\n",
"        f = lambda _: model.loss(X, y)[0]\n",
"        grad_num = eval_numerical_gradient(f, model.params[name], verbose=False)\n",
"        print('%s relative error: %.2e' % (name, rel_error(grad_num, grads[name])))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Solver\n",
"\n",
"In previous assignments, the training logic was coupled to the model itself. Following a more modular design, in this assignment we split the training logic into a separate class.\n",
"\n",
"Open the file `daseCV/solver.py` and read through it to familiarize yourself with the API. Then use a `Solver` instance to train a `TwoLayerNet` that achieves at least `50%` accuracy on the validation set."
]
},
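{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a hint, constructing and running a `Solver` generally looks like the minimal sketch below. It only uses the constructor arguments that appear elsewhere in this notebook; see the docstring in `daseCV/solver.py` for the full list of options, and treat the hyperparameter values as placeholders to tune:\n",
"\n",
"```python\n",
"model = TwoLayerNet()\n",
"solver = Solver(model, data,\n",
"                update_rule='sgd',\n",
"                optim_config={'learning_rate': 1e-3},\n",
"                num_epochs=10, batch_size=100,\n",
"                print_every=100)\n",
"solver.train()\n",
"```"
]
},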
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"model = TwoLayerNet()\n",
"solver = None\n",
"\n",
"##############################################################################\n",
"# TODO: Use a Solver instance to train a TwoLayerNet that achieves at least #\n",
"# 50% accuracy on the validation set.                                       #\n",
"##############################################################################\n",
"# *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****\n",
"\n",
"pass\n",
"\n",
"# *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****\n",
"##############################################################################\n",
"#                             END OF YOUR CODE                               #\n",
"##############################################################################"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Run this cell to visualize training loss and train / val accuracy\n",
"\n",
"plt.subplot(2, 1, 1)\n",
"plt.title('Training loss')\n",
"plt.plot(solver.loss_history, 'o')\n",
"plt.xlabel('Iteration')\n",
"\n",
"plt.subplot(2, 1, 2)\n",
"plt.title('Accuracy')\n",
"plt.plot(solver.train_acc_history, '-o', label='train')\n",
"plt.plot(solver.val_acc_history, '-o', label='val')\n",
"plt.plot([0.5] * len(solver.val_acc_history), 'k--')\n",
"plt.xlabel('Epoch')\n",
"plt.legend(loc='lower right')\n",
"plt.gcf().set_size_inches(15, 12)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Multilayer network\n",
"\n",
"Next, implement a fully-connected network with an arbitrary number of hidden layers.\n",
"\n",
"Read through the `FullyConnectedNet` class in `daseCV/classifiers/fc_net.py`.\n",
"\n",
"Implement the initialization, the forward pass, and the backward pass. For now, don't worry about implementing dropout or batch/layer normalization; we will add those features later."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Initial loss and gradient check\n",
"\n",
"As a sanity check, run the code below to check the initial loss and to gradient check the network both with and without regularization. Does the initial loss seem reasonable?\n",
"\n",
"For the gradient check, you should expect errors of 1e-7 or less."
]
},
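{
"cell_type": "markdown",
"metadata": {},
"source": [
"(A reference point for the sanity check: with small random weights, a softmax classifier over `C` classes produces roughly uniform probabilities, so the expected initial loss is about `ln(C)`; for `C = 10` here, `np.log(10)` is roughly 2.3. Adding regularization should push the initial loss somewhat higher.)"
]
},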
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"np.random.seed(231)\n",
"N, D, H1, H2, C = 2, 15, 20, 30, 10\n",
"X = np.random.randn(N, D)\n",
"y = np.random.randint(C, size=(N,))\n",
"\n",
"for reg in [0, 3.14]:\n",
"    print('Running check with reg = ', reg)\n",
"    model = FullyConnectedNet([H1, H2], input_dim=D, num_classes=C,\n",
"                              reg=reg, weight_scale=5e-2, dtype=np.float64)\n",
"\n",
"    loss, grads = model.loss(X, y)\n",
"    print('Initial loss: ', loss)\n",
"\n",
"    # Most of the errors should be on the order of e-7 or smaller.\n",
"    # NOTE: It is fine however to see an error for W2 on the order of e-5\n",
"    # for the check when reg = 0.0\n",
"    for name in sorted(grads):\n",
"        f = lambda _: model.loss(X, y)[0]\n",
"        grad_num = eval_numerical_gradient(f, model.params[name], verbose=False, h=1e-5)\n",
"        print('%s relative error: %.2e' % (name, rel_error(grad_num, grads[name])))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As another sanity check, make sure you can overfit a small dataset of 50 images. First we will try a three-layer network with 100 units in each hidden layer. In the code below, tweak the **learning rate** and **weight initialization scale** to overfit, reaching 100% training accuracy within 20 epochs."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# TODO: Use a three-layer Net to overfit 50 training examples by\n",
"# tweaking just the learning rate and initialization scale.\n",
"\n",
"num_train = 50\n",
"small_data = {\n",
"    'X_train': data['X_train'][:num_train],\n",
"    'y_train': data['y_train'][:num_train],\n",
"    'X_val': data['X_val'],\n",
"    'y_val': data['y_val'],\n",
"}\n",
"\n",
"weight_scale = 1e-2  # Experiment with this!\n",
"learning_rate = 1e-2  # Experiment with this!\n",
"model = FullyConnectedNet([100, 100],\n",
"                          weight_scale=weight_scale, dtype=np.float64)\n",
"solver = Solver(model, small_data,\n",
"                print_every=10, num_epochs=20, batch_size=25,\n",
"                update_rule='sgd',\n",
"                optim_config={\n",
"                    'learning_rate': learning_rate,\n",
"                }\n",
"                )\n",
"solver.train()\n",
"\n",
"plt.plot(solver.loss_history, 'o')\n",
"plt.title('Training loss history')\n",
"plt.xlabel('Iteration')\n",
"plt.ylabel('Training loss')\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now try a five-layer network with 100 units in each layer, trained on the same 50 images. Again you will tune the learning rate and weight initialization scale; you should be able to reach 100% training accuracy within 20 epochs."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# TODO: Use a five-layer Net to overfit 50 training examples by\n",
"# tweaking just the learning rate and initialization scale.\n",
"\n",
"num_train = 50\n",
"small_data = {\n",
"    'X_train': data['X_train'][:num_train],\n",
"    'y_train': data['y_train'][:num_train],\n",
"    'X_val': data['X_val'],\n",
"    'y_val': data['y_val'],\n",
"}\n",
"\n",
"weight_scale = 1e-1  # Experiment with this!\n",
"learning_rate = 2e-3  # Experiment with this!\n",
"model = FullyConnectedNet([100, 100, 100, 100],\n",
"                          weight_scale=weight_scale, dtype=np.float64)\n",
"solver = Solver(model, small_data,\n",
"                print_every=10, num_epochs=20, batch_size=25,\n",
"                update_rule='sgd',\n",
"                optim_config={\n",
"                    'learning_rate': learning_rate,\n",
"                }\n",
"                )\n",
"solver.train()\n",
"\n",
"plt.plot(solver.loss_history, 'o')\n",
"plt.title('Training loss history')\n",
"plt.xlabel('Iteration')\n",
"plt.ylabel('Training loss')\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {
"tags": [
"pdf-inline"
]
},
"source": [
"## Inline Question 2: \n",
"Did you notice any difference in difficulty between training the three-layer net and training the five-layer net? In your experience, which network was more sensitive to the initialization scale, and why do you think that is?\n",
"\n",
"## Answer:\n",
"[FILL THIS IN]\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Update rules\n",
"\n",
"So far we have used vanilla stochastic gradient descent (SGD) as our update rule. More sophisticated update rules can make it easier to train deep networks. We will implement a few of the most commonly used update rules and compare them to vanilla SGD."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# SGD+Momentum\n",
"\n",
"Stochastic gradient descent with momentum is a widely used update rule that tends to make deep networks converge faster than vanilla stochastic gradient descent. See the Momentum Update section at http://cs231n.github.io/neural-networks-3/#sgd for more information.\n",
"\n",
"Open the file `daseCV/optim.py` and read the documentation at the top of the file to make sure you understand the API. Implement the SGD+momentum update rule in the function `sgd_momentum` and run the code below to check your implementation. You should see errors less than e-8."
]
},
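{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference, one common formulation of the momentum update (the variant described in the cs231n notes linked above; check it against the docstring in `daseCV/optim.py` for the exact conventions) keeps a velocity `v` that accumulates a running mean of gradients:\n",
"\n",
"```python\n",
"v = mu * v - learning_rate * dw  # mu is the momentum coefficient, e.g. 0.9\n",
"w += v\n",
"```"
]
},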
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from daseCV.optim import sgd_momentum\n",
"\n",
"N, D = 4, 5\n",
"w = np.linspace(-0.4, 0.6, num=N*D).reshape(N, D)\n",
"dw = np.linspace(-0.6, 0.4, num=N*D).reshape(N, D)\n",
"v = np.linspace(0.6, 0.9, num=N*D).reshape(N, D)\n",
"\n",
"config = {'learning_rate': 1e-3, 'velocity': v}\n",
"next_w, _ = sgd_momentum(w, dw, config=config)\n",
"\n",
"expected_next_w = np.asarray([\n",
"  [ 0.1406,     0.20738947, 0.27417895, 0.34096842, 0.40775789],\n",
"  [ 0.47454737, 0.54133684, 0.60812632, 0.67491579, 0.74170526],\n",
"  [ 0.80849474, 0.87528421, 0.94207368, 1.00886316, 1.07565263],\n",
"  [ 1.14244211, 1.20923158, 1.27602105, 1.34281053, 1.4096    ]])\n",
"expected_velocity = np.asarray([\n",
"  [ 0.5406,     0.55475789, 0.56891579, 0.58307368, 0.59723158],\n",
"  [ 0.61138947, 0.62554737, 0.63970526, 0.65386316, 0.66802105],\n",
"  [ 0.68217895, 0.69633684, 0.71049474, 0.72465263, 0.73881053],\n",
"  [ 0.75296842, 0.76712632, 0.78128421, 0.79544211, 0.8096    ]])\n",
"\n",
"# Should see relative errors around e-8 or less\n",
"print('next_w error: ', rel_error(next_w, expected_next_w))\n",
"print('velocity error: ', rel_error(expected_velocity, config['velocity']))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Once you have done so, run the following to train a six-layer network with both SGD and SGD+momentum. You should see the SGD+momentum update rule converge faster.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"num_train = 4000\n",
"small_data = {\n",
"    'X_train': data['X_train'][:num_train],\n",
"    'y_train': data['y_train'][:num_train],\n",
"    'X_val': data['X_val'],\n",
"    'y_val': data['y_val'],\n",
"}\n",
"\n",
"solvers = {}\n",
"\n",
"for update_rule in ['sgd', 'sgd_momentum']:\n",
"    print('running with ', update_rule)\n",
"    model = FullyConnectedNet([100, 100, 100, 100, 100], weight_scale=5e-2)\n",
"\n",
"    solver = Solver(model, small_data,\n",
"                    num_epochs=5, batch_size=100,\n",
"                    update_rule=update_rule,\n",
"                    optim_config={\n",
"                        'learning_rate': 5e-3,\n",
"                    },\n",
"                    verbose=True)\n",
"    solvers[update_rule] = solver\n",
"    solver.train()\n",
"    print()\n",
"\n",
"plt.subplot(3, 1, 1)\n",
"plt.title('Training loss')\n",
"plt.xlabel('Iteration')\n",
"\n",
"plt.subplot(3, 1, 2)\n",
"plt.title('Training accuracy')\n",
"plt.xlabel('Epoch')\n",
"\n",
"plt.subplot(3, 1, 3)\n",
"plt.title('Validation accuracy')\n",
"plt.xlabel('Epoch')\n",
"\n",
"for update_rule, solver in solvers.items():\n",
"    plt.subplot(3, 1, 1)\n",
"    plt.plot(solver.loss_history, 'o', label=\"loss_%s\" % update_rule)\n",
"\n",
"    plt.subplot(3, 1, 2)\n",
"    plt.plot(solver.train_acc_history, '-o', label=\"train_acc_%s\" % update_rule)\n",
"\n",
"    plt.subplot(3, 1, 3)\n",
"    plt.plot(solver.val_acc_history, '-o', label=\"val_acc_%s\" % update_rule)\n",
"\n",
"for i in [1, 2, 3]:\n",
"    plt.subplot(3, 1, i)\n",
"    plt.legend(loc='upper center', ncol=4)\n",
"plt.gcf().set_size_inches(15, 15)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# RMSProp and Adam\n",
"\n",
"RMSProp [1] and Adam [2] are two more update rules that set per-parameter learning rates using running averages of the second moments of the gradients.\n",
"\n",
"Implement the `rmsprop` and `adam` functions in the file `daseCV/optim.py` and check your implementations using the code below.\n",
"\n",
"[1] Tijmen Tieleman and Geoffrey Hinton. \"Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude.\" COURSERA: Neural Networks for Machine Learning 4 (2012).\n",
"\n",
"[2] Diederik Kingma and Jimmy Ba, \"Adam: A Method for Stochastic Optimization\", ICLR 2015."
]
},
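{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference, minimal sketches of the two update rules as described in the cs231n notes (`decay_rate`, `beta1`, `beta2`, and `eps` are hyperparameters; compare against the docstrings in `daseCV/optim.py` for the exact conventions, e.g. how Adam's timestep `t` is tracked in `config`):\n",
"\n",
"```python\n",
"# RMSProp: leaky running average of squared gradients\n",
"cache = decay_rate * cache + (1 - decay_rate) * dw**2\n",
"w += -learning_rate * dw / (np.sqrt(cache) + eps)\n",
"\n",
"# Adam: momentum on the gradient plus RMSProp-style scaling, with bias correction\n",
"t += 1\n",
"m = beta1 * m + (1 - beta1) * dw\n",
"v = beta2 * v + (1 - beta2) * dw**2\n",
"mt = m / (1 - beta1**t)  # bias-corrected first moment\n",
"vt = v / (1 - beta2**t)  # bias-corrected second moment\n",
"w += -learning_rate * mt / (np.sqrt(vt) + eps)\n",
"```"
]
},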
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Test RMSProp implementation\n",
"from daseCV.optim import rmsprop\n",
"\n",
"N, D = 4, 5\n",
"w = np.linspace(-0.4, 0.6, num=N*D).reshape(N, D)\n",
"dw = np.linspace(-0.6, 0.4, num=N*D).reshape(N, D)\n",
"cache = np.linspace(0.6, 0.9, num=N*D).reshape(N, D)\n",
"\n",
"config = {'learning_rate': 1e-2, 'cache': cache}\n",
"next_w, _ = rmsprop(w, dw, config=config)\n",
"\n",
"expected_next_w = np.asarray([\n",
"  [-0.39223849, -0.34037513, -0.28849239, -0.23659121, -0.18467247],\n",
"  [-0.132737,   -0.08078555, -0.02881884,  0.02316247,  0.07515774],\n",
"  [ 0.12716641,  0.17918792,  0.23122175,  0.28326742,  0.33532447],\n",
"  [ 0.38739248,  0.43947102,  0.49155973,  0.54365823,  0.59576619]])\n",
"expected_cache = np.asarray([\n",
"  [ 0.5976,     0.6126277,  0.6277108,  0.64284931, 0.65804321],\n",
"  [ 0.67329252, 0.68859723, 0.70395734, 0.71937285, 0.73484377],\n",
"  [ 0.75037008, 0.7659518,  0.78158892, 0.79728144, 0.81302936],\n",
"  [ 0.82883269, 0.84469141, 0.86060554, 0.87657507, 0.8926    ]])\n",
"\n",
"# You should see relative errors around e-7 or less\n",
"print('next_w error: ', rel_error(expected_next_w, next_w))\n",
"print('cache error: ', rel_error(expected_cache, config['cache']))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Test Adam implementation\n",
"from daseCV.optim import adam\n",
"\n",
"N, D = 4, 5\n",
"w = np.linspace(-0.4, 0.6, num=N*D).reshape(N, D)\n",
"dw = np.linspace(-0.6, 0.4, num=N*D).reshape(N, D)\n",
"m = np.linspace(0.6, 0.9, num=N*D).reshape(N, D)\n",
"v = np.linspace(0.7, 0.5, num=N*D).reshape(N, D)\n",
"\n",
"config = {'learning_rate': 1e-2, 'm': m, 'v': v, 't': 5}\n",
"next_w, _ = adam(w, dw, config=config)\n",
"\n",
"expected_next_w = np.asarray([\n",
"  [-0.40094747, -0.34836187, -0.29577703, -0.24319299, -0.19060977],\n",
"  [-0.1380274,  -0.08544591, -0.03286534,  0.01971428,  0.0722929 ],\n",
"  [ 0.1248705,   0.17744702,  0.23002243,  0.28259667,  0.33516969],\n",
"  [ 0.38774145,  0.44031188,  0.49288093,  0.54544852,  0.59801459]])\n",
"expected_v = np.asarray([\n",
"  [ 0.69966,    0.68908382, 0.67851319, 0.66794809, 0.65738853,],\n",
"  [ 0.64683452, 0.63628604, 0.6257431,  0.61520571, 0.60467385,],\n",
"  [ 0.59414753, 0.58362676, 0.57311152, 0.56260183, 0.55209767,],\n",
"  [ 0.54159906, 0.53110598, 0.52061845, 0.51013645, 0.49966,   ]])\n",
"expected_m = np.asarray([\n",
"  [ 0.48,       0.49947368, 0.51894737, 0.53842105, 0.55789474],\n",
"  [ 0.57736842, 0.59684211, 0.61631579, 0.63578947, 0.65526316],\n",
"  [ 0.67473684, 0.69421053, 0.71368421, 0.73315789, 0.75263158],\n",
"  [ 0.77210526, 0.79157895, 0.81105263, 0.83052632, 0.85      ]])\n",
"\n",
"# You should see relative errors around e-7 or less\n",
"print('next_w error: ', rel_error(expected_next_w, next_w))\n",
"print('v error: ', rel_error(expected_v, config['v']))\n",
"print('m error: ', rel_error(expected_m, config['m']))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Once you have finished the RMSProp and Adam functions above, run the code below to train a pair of networks, one with each of these update rules:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"learning_rates = {'rmsprop': 1e-4, 'adam': 1e-3}\n",
"for update_rule in ['adam', 'rmsprop']:\n",
"    print('running with ', update_rule)\n",
"    model = FullyConnectedNet([100, 100, 100, 100, 100], weight_scale=5e-2)\n",
"\n",
"    solver = Solver(model, small_data,\n",
"                    num_epochs=5, batch_size=100,\n",
"                    update_rule=update_rule,\n",
"                    optim_config={\n",
"                        'learning_rate': learning_rates[update_rule]\n",
"                    },\n",
"                    verbose=True)\n",
"    solvers[update_rule] = solver\n",
"    solver.train()\n",
"    print()\n",
"\n",
"plt.subplot(3, 1, 1)\n",
"plt.title('Training loss')\n",
"plt.xlabel('Iteration')\n",
"\n",
"plt.subplot(3, 1, 2)\n",
"plt.title('Training accuracy')\n",
"plt.xlabel('Epoch')\n",
"\n",
"plt.subplot(3, 1, 3)\n",
"plt.title('Validation accuracy')\n",
"plt.xlabel('Epoch')\n",
"\n",
"for update_rule, solver in list(solvers.items()):\n",
"    plt.subplot(3, 1, 1)\n",
"    plt.plot(solver.loss_history, 'o', label=update_rule)\n",
"\n",
"    plt.subplot(3, 1, 2)\n",
"    plt.plot(solver.train_acc_history, '-o', label=update_rule)\n",
"\n",
"    plt.subplot(3, 1, 3)\n",
"    plt.plot(solver.val_acc_history, '-o', label=update_rule)\n",
"\n",
"for i in [1, 2, 3]:\n",
"    plt.subplot(3, 1, i)\n",
"    plt.legend(loc='upper center', ncol=4)\n",
"plt.gcf().set_size_inches(15, 15)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {
"tags": [
"pdf-inline"
]
},
"source": [
"## Inline Question 3:\n",
"\n",
"AdaGrad, like Adam, is a per-parameter optimization method that uses the following update rule:\n",
"\n",
"```\n",
"cache += dw**2\n",
"w += - learning_rate * dw / (np.sqrt(cache) + eps)\n",
"```\n",
"\n",
"When a network is trained with AdaGrad, the updates can become very small, and the network can end up learning very slowly. Using your knowledge of the AdaGrad update rule, explain why the updates become very small. Would Adam have the same issue?\n",
"\n",
"## Answer: \n",
"[FILL THIS IN]\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Train a good model!\n",
"\n",
"Train the best fully-connected model that you can on CIFAR-10, storing your best model in the `best_model` variable. We require you to get at least 50% accuracy on the validation set.\n",
"\n",
"If you are careful it should be possible to get an accuracy above 55%, but we don't insist on such a high accuracy here. Later in this course we will ask you to train the best convolutional network you can on CIFAR-10, and we would rather you spend your effort on convolutional networks than on fully-connected networks.\n",
"\n",
"It may be helpful to complete `BatchNormalization.ipynb` and `Dropout.ipynb` before working on this part, since those techniques can help you train powerful models."
]
},
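{
"cell_type": "markdown",
"metadata": {},
"source": [
"One possible starting point (a sketch only, not a tuned recipe; it assumes the `Solver` and `FullyConnectedNet` APIs used earlier in this notebook, and the learning rates and architecture are placeholder guesses) is a small search over learning rates, keeping whichever model validates best:\n",
"\n",
"```python\n",
"best_val = -1\n",
"for lr in [1e-4, 5e-4, 1e-3]:\n",
"    model = FullyConnectedNet([100, 100, 100], weight_scale=5e-2)\n",
"    solver = Solver(model, data,\n",
"                    num_epochs=10, batch_size=100,\n",
"                    update_rule='adam',\n",
"                    optim_config={'learning_rate': lr},\n",
"                    verbose=False)\n",
"    solver.train()\n",
"    # keep the model with the best validation accuracy seen so far\n",
"    if max(solver.val_acc_history) > best_val:\n",
"        best_val = max(solver.val_acc_history)\n",
"        best_model = model\n",
"```"
]
},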
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"best_model = None\n",
"################################################################################\n",
"# TODO: Train the best FullyConnectedNet that you can on CIFAR-10. You might   #\n",
"# find batch/layer normalization and dropout useful. Store your best model in  #\n",
"# the best_model variable.                                                     #\n",
"################################################################################\n",
"# *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****\n",
"\n",
"pass\n",
"\n",
"# *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****\n",
"################################################################################\n",
"#                              END OF YOUR CODE                                #\n",
"################################################################################"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Test your model!\n",
"\n",
"Run your best model on the validation and test sets. You should achieve at least 50% accuracy on the validation set."
]
},
  987. "cell_type": "code",
  988. "execution_count": null,
  989. "metadata": {},
  990. "outputs": [],
  991. "source": [
  992. "y_test_pred = np.argmax(best_model.loss(data['X_test']), axis=1)\n",
  993. "y_val_pred = np.argmax(best_model.loss(data['X_val']), axis=1)\n",
  994. "print('Validation set accuracy: ', (y_val_pred == data['y_val']).mean())\n",
  995. "print('Test set accuracy: ', (y_test_pred == data['y_test']).mean())"
  996. ]
  997. },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"# Important\n",
"\n",
"This is the end of the assignment. Please do the following:\n",
"\n",
"1. Click `File -> Save` or use the `control+s` shortcut to make sure the latest version of this notebook is saved to your Google Drive.\n",
"2. Execute the cell below to save the `.py` files back to your Google Drive."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"FOLDER_TO_SAVE = os.path.join('drive/My Drive/', FOLDERNAME)\n",
"FILES_TO_SAVE = ['daseCV/classifiers/cnn.py', 'daseCV/classifiers/fc_net.py']\n",
"\n",
"for files in FILES_TO_SAVE:\n",
"    with open(os.path.join(FOLDER_TO_SAVE, '/'.join(files.split('/')[1:])), 'w') as f:\n",
"        f.write(''.join(open(files).readlines()))"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.0"
}
},
"nbformat": 4,
"nbformat_minor": 4
}