DaSE-Computer-Vision-2021

{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from google.colab import drive\n",
"\n",
"drive.mount('/content/drive', force_remount=True)\n",
"\n",
"# Enter the path where daseCV is located\n",
"# The 'daseCV' folder contains the '.py' files and the 'classifiers' and 'datasets' folders\n",
"# e.g. 'CV/assignments/assignment1/daseCV/'\n",
"FOLDERNAME = None\n",
"\n",
"assert FOLDERNAME is not None, \"[!] Enter the foldername.\"\n",
"\n",
"%cd drive/My\\ Drive\n",
"%cp -r $FOLDERNAME ../../\n",
"%cd ../../\n",
"%cd daseCV/datasets/\n",
"!bash get_datasets.sh\n",
"%cd ../../"
]
},
{
"cell_type": "markdown",
"metadata": {
"tags": [
"pdf-title"
]
},
"source": [
"# Fully-Connected Neural Networks\n",
"\n",
"In the previous assignment you implemented a two-layer fully-connected neural network on CIFAR-10. That implementation was simple but not very modular, since the loss and gradients were computed inside a single function. This is still manageable for a simple two-layer network, but hand-managing losses and gradients becomes impractical as models grow larger. Ideally we want to build networks using a more modular design, so that we can implement different types of layers in isolation and then integrate them into models with different architectures."
]
},
{
"cell_type": "markdown",
"metadata": {
"tags": [
"pdf-ignore"
]
},
"source": [
"In this exercise we will implement fully-connected networks using a more modular approach. For each layer we will implement a `forward` and a `backward` function. The `forward` function receives the inputs, weights, and other parameters, and returns both an output and a `cache` object storing the data needed for the backward pass, like this:\n",
"\n",
"```python\n",
"def layer_forward(x, w):\n",
"    \"\"\" Receive inputs x and weights w \"\"\"\n",
"    # Do some computations ...\n",
"    z = # ... some intermediate value\n",
"    # Do some more computations ...\n",
"    out = # the output\n",
"\n",
"    cache = (x, w, z, out) # Values we need to compute gradients\n",
"\n",
"    return out, cache\n",
"```\n",
"\n",
"The backward pass will receive upstream derivatives and the `cache` object, and will return gradients with respect to the inputs and weights:\n",
"\n",
"```python\n",
"def layer_backward(dout, cache):\n",
"    \"\"\"\n",
"    Receive dout (derivative of loss with respect to outputs) and cache,\n",
"    and compute derivative with respect to inputs.\n",
"    \"\"\"\n",
"    # Unpack cache values\n",
"    x, w, z, out = cache\n",
"\n",
"    # Use values in cache to compute derivatives\n",
"    dx = # Derivative of loss with respect to x\n",
"    dw = # Derivative of loss with respect to w\n",
"\n",
"    return dx, dw\n",
"```\n",
"\n",
"After implementing a handful of layers this way, we will be able to combine them easily to build classifiers with different architectures.\n",
"\n",
"In addition to implementing fully-connected networks of arbitrary depth, we will also explore different update rules for optimization, and we will introduce Dropout as a regularizer and Batch/Layer Normalization as tools to optimize networks more effectively.\n",
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"pdf-ignore"
]
},
"outputs": [],
"source": [
"# As usual, a bit of setup\n",
"from __future__ import print_function\n",
"import time\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"from daseCV.classifiers.fc_net import *\n",
"from daseCV.data_utils import get_CIFAR10_data\n",
"from daseCV.gradient_check import eval_numerical_gradient, eval_numerical_gradient_array\n",
"from daseCV.solver import Solver\n",
"\n",
"%matplotlib inline\n",
"plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots\n",
"plt.rcParams['image.interpolation'] = 'nearest'\n",
"plt.rcParams['image.cmap'] = 'gray'\n",
"\n",
"# for auto-reloading external modules\n",
"# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython\n",
"%load_ext autoreload\n",
"%autoreload 2\n",
"\n",
"def rel_error(x, y):\n",
"    \"\"\" returns relative error \"\"\"\n",
"    return np.max(np.abs(x - y) / (np.maximum(1e-8, np.abs(x) + np.abs(y))))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"pdf-ignore"
]
},
"outputs": [],
"source": [
"# Load the (preprocessed) CIFAR10 data.\n",
"\n",
"data = get_CIFAR10_data()\n",
"for k, v in list(data.items()):\n",
"    print(('%s: ' % k, v.shape))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Affine layer: forward\n",
"Open `daseCV/layers.py` and implement the `affine_forward` function.\n",
"\n",
"Once you are done, you can test your implementation with the code below.\n",
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Test the affine_forward function\n",
"\n",
"num_inputs = 2\n",
"input_shape = (4, 5, 6)\n",
"output_dim = 3\n",
"\n",
"input_size = num_inputs * np.prod(input_shape)\n",
"weight_size = output_dim * np.prod(input_shape)\n",
"\n",
"x = np.linspace(-0.1, 0.5, num=input_size).reshape(num_inputs, *input_shape)\n",
"w = np.linspace(-0.2, 0.3, num=weight_size).reshape(np.prod(input_shape), output_dim)\n",
"b = np.linspace(-0.3, 0.1, num=output_dim)\n",
"\n",
"\n",
"out, _ = affine_forward(x, w, b)\n",
"correct_out = np.array([[ 1.49834967, 1.70660132, 1.91485297],\n",
"                        [ 3.25553199, 3.5141327,  3.77273342]])\n",
"\n",
"# Compare your output with ours. The error should be around e-9 or less.\n",
"print('Testing affine_forward function:')\n",
"print('difference: ', rel_error(out, correct_out))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Affine layer: backward\n",
"Implement the `affine_backward` function and test your implementation using numeric gradient checking.\n",
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Test the affine_backward function\n",
"np.random.seed(231)\n",
"x = np.random.randn(10, 2, 3)\n",
"w = np.random.randn(6, 5)\n",
"b = np.random.randn(5)\n",
"dout = np.random.randn(10, 5)\n",
"\n",
"dx_num = eval_numerical_gradient_array(lambda x: affine_forward(x, w, b)[0], x, dout)\n",
"dw_num = eval_numerical_gradient_array(lambda w: affine_forward(x, w, b)[0], w, dout)\n",
"db_num = eval_numerical_gradient_array(lambda b: affine_forward(x, w, b)[0], b, dout)\n",
"\n",
"_, cache = affine_forward(x, w, b)\n",
"dx, dw, db = affine_backward(dout, cache)\n",
"\n",
"# The error should be around e-10 or less\n",
"print('Testing affine_backward function:')\n",
"print('dx error: ', rel_error(dx_num, dx))\n",
"print('dw error: ', rel_error(dw_num, dw))\n",
"print('db error: ', rel_error(db_num, db))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# ReLU activation: forward\n",
"\n",
"Implement the forward pass for the ReLU activation function in the `relu_forward` function and test your implementation using the following code:\n",
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Test the relu_forward function\n",
"\n",
"x = np.linspace(-0.5, 0.5, num=12).reshape(3, 4)\n",
"\n",
"out, _ = relu_forward(x)\n",
"correct_out = np.array([[ 0.,         0.,         0.,         0.,        ],\n",
"                        [ 0.,         0.,         0.04545455, 0.13636364,],\n",
"                        [ 0.22727273, 0.31818182, 0.40909091, 0.5,       ]])\n",
"\n",
"# Compare your output with ours. The error should be on the order of e-8\n",
"print('Testing relu_forward function:')\n",
"print('difference: ', rel_error(out, correct_out))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# ReLU activation: backward\n",
"\n",
"Implement the backward pass for the ReLU activation function in the `relu_backward` function and test your implementation using numeric gradient checking.\n",
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"np.random.seed(231)\n",
"x = np.random.randn(10, 10)\n",
"dout = np.random.randn(*x.shape)\n",
"\n",
"dx_num = eval_numerical_gradient_array(lambda x: relu_forward(x)[0], x, dout)\n",
"\n",
"_, cache = relu_forward(x)\n",
"dx = relu_backward(dout, cache)\n",
"\n",
"# The error should be on the order of e-12\n",
"print('Testing relu_backward function:')\n",
"print('dx error: ', rel_error(dx_num, dx))"
]
},
{
"cell_type": "markdown",
"metadata": {
"tags": [
"pdf-inline"
]
},
"source": [
"## Inline Question 1: \n",
"\n",
"This assignment only asks you to implement ReLU, but there are a number of different activation functions that neural networks can use, each with its pros and cons. One common problem with activation functions, however, is getting zero (or close to zero) gradient flow during backpropagation. Which of the following activation functions have this problem? If you consider these functions in the one-dimensional case, what kinds of input would cause this behavior?\n",
"1. Sigmoid\n",
"2. ReLU\n",
"3. Leaky ReLU\n",
"\n",
"## Answer:\n",
"[FILL THIS IN]\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# \"Sandwich\" layers\n",
"\n",
"Some patterns of layers are used very frequently in neural networks. For example, affine layers are often followed by a ReLU layer. To make these common patterns easier to use, we define several convenience layers in the file `daseCV/layer_utils.py`.\n",
"\n",
"Take a look at the `affine_relu_forward` and `affine_relu_backward` functions, and run the following code to numerically gradient check them:\n",
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from daseCV.layer_utils import affine_relu_forward, affine_relu_backward\n",
"np.random.seed(231)\n",
"x = np.random.randn(2, 3, 4)\n",
"w = np.random.randn(12, 10)\n",
"b = np.random.randn(10)\n",
"dout = np.random.randn(2, 10)\n",
"\n",
"out, cache = affine_relu_forward(x, w, b)\n",
"dx, dw, db = affine_relu_backward(dout, cache)\n",
"\n",
"dx_num = eval_numerical_gradient_array(lambda x: affine_relu_forward(x, w, b)[0], x, dout)\n",
"dw_num = eval_numerical_gradient_array(lambda w: affine_relu_forward(x, w, b)[0], w, dout)\n",
"db_num = eval_numerical_gradient_array(lambda b: affine_relu_forward(x, w, b)[0], b, dout)\n",
"\n",
"# Relative error should be around e-10 or less\n",
"print('Testing affine_relu_forward and affine_relu_backward:')\n",
"print('dx error: ', rel_error(dx_num, dx))\n",
"print('dw error: ', rel_error(dw_num, dw))\n",
"print('db error: ', rel_error(db_num, db))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Loss layers: Softmax and SVM\n",
"\n",
"You implemented these loss functions in the last assignment, so we give them to you for free this time. You should still make sure you understand how they work by looking at the implementations in `daseCV/layers.py`.\n",
"\n",
"You can make sure the implementations are correct by running the following:\n",
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"np.random.seed(231)\n",
"num_classes, num_inputs = 10, 50\n",
"x = 0.001 * np.random.randn(num_inputs, num_classes)\n",
"y = np.random.randint(num_classes, size=num_inputs)\n",
"\n",
"dx_num = eval_numerical_gradient(lambda x: svm_loss(x, y)[0], x, verbose=False)\n",
"loss, dx = svm_loss(x, y)\n",
"\n",
"# Test svm_loss function. Loss should be around 9 and dx error should be around the order of e-9\n",
"print('Testing svm_loss:')\n",
"print('loss: ', loss)\n",
"print('dx error: ', rel_error(dx_num, dx))\n",
"\n",
"dx_num = eval_numerical_gradient(lambda x: softmax_loss(x, y)[0], x, verbose=False)\n",
"loss, dx = softmax_loss(x, y)\n",
"\n",
"# Test softmax_loss function. Loss should be close to 2.3 and dx error should be around e-8\n",
"print('\\nTesting softmax_loss:')\n",
"print('loss: ', loss)\n",
"print('dx error: ', rel_error(dx_num, dx))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Two-layer network\n",
"\n",
"In a previous assignment you implemented a simple two-layer neural network. Now that you have modular implementations of the necessary layers, you will reimplement the two-layer network using these modules.\n",
"\n",
"Open the file `daseCV/classifiers/fc_net.py` and complete the implementation of the `TwoLayerNet` class. This class will serve as a model for the other networks in this assignment, so read through it to make sure you understand the API.\n",
"You can run the cell below to test your implementation.\n",
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"np.random.seed(231)\n",
"N, D, H, C = 3, 5, 50, 7\n",
"X = np.random.randn(N, D)\n",
"y = np.random.randint(C, size=N)\n",
"\n",
"std = 1e-3\n",
"model = TwoLayerNet(input_dim=D, hidden_dim=H, num_classes=C, weight_scale=std)\n",
"\n",
"print('Testing initialization ... ')\n",
"W1_std = abs(model.params['W1'].std() - std)\n",
"b1 = model.params['b1']\n",
"W2_std = abs(model.params['W2'].std() - std)\n",
"b2 = model.params['b2']\n",
"assert W1_std < std / 10, 'First layer weights do not seem right'\n",
"assert np.all(b1 == 0), 'First layer biases do not seem right'\n",
"assert W2_std < std / 10, 'Second layer weights do not seem right'\n",
"assert np.all(b2 == 0), 'Second layer biases do not seem right'\n",
"\n",
"print('Testing test-time forward pass ... ')\n",
"model.params['W1'] = np.linspace(-0.7, 0.3, num=D*H).reshape(D, H)\n",
"model.params['b1'] = np.linspace(-0.1, 0.9, num=H)\n",
"model.params['W2'] = np.linspace(-0.3, 0.4, num=H*C).reshape(H, C)\n",
"model.params['b2'] = np.linspace(-0.9, 0.1, num=C)\n",
"X = np.linspace(-5.5, 4.5, num=N*D).reshape(D, N).T\n",
"scores = model.loss(X)\n",
"correct_scores = np.asarray(\n",
"  [[11.53165108, 12.2917344,  13.05181771, 13.81190102, 14.57198434, 15.33206765, 16.09215096],\n",
"   [12.05769098, 12.74614105, 13.43459113, 14.1230412,  14.81149128, 15.49994135, 16.18839143],\n",
"   [12.58373087, 13.20054771, 13.81736455, 14.43418138, 15.05099822, 15.66781506, 16.2846319 ]])\n",
"scores_diff = np.abs(scores - correct_scores).sum()\n",
"assert scores_diff < 1e-6, 'Problem with test-time forward pass'\n",
"\n",
"print('Testing training loss (no regularization)')\n",
"y = np.asarray([0, 5, 1])\n",
"loss, grads = model.loss(X, y)\n",
"correct_loss = 3.4702243556\n",
"assert abs(loss - correct_loss) < 1e-10, 'Problem with training-time loss'\n",
"\n",
"model.reg = 1.0\n",
"loss, grads = model.loss(X, y)\n",
"correct_loss = 26.5948426952\n",
"assert abs(loss - correct_loss) < 1e-10, 'Problem with regularization loss'\n",
"\n",
"# Errors should be around e-7 or less\n",
"for reg in [0.0, 0.7]:\n",
"    print('Running numeric gradient check with reg = ', reg)\n",
"    model.reg = reg\n",
"    loss, grads = model.loss(X, y)\n",
"\n",
"    for name in sorted(grads):\n",
"        f = lambda _: model.loss(X, y)[0]\n",
"        grad_num = eval_numerical_gradient(f, model.params[name], verbose=False)\n",
"        print('%s relative error: %.2e' % (name, rel_error(grad_num, grads[name])))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Solver\n",
"\n",
"In the previous assignment, the logic for training models was coupled to the model itself. Following a more modular design, for this assignment we split the training logic into a separate class.\n",
"\n",
"Open the file `daseCV/solver.py` and read through it to familiarize yourself with the API. Then use a `Solver` instance to train a `TwoLayerNet` that achieves at least `50%` accuracy on the validation set.\n",
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"model = TwoLayerNet()\n",
"solver = None\n",
"\n",
"##############################################################################\n",
"# TODO: Use a Solver instance to train a TwoLayerNet that achieves at least  #\n",
"# 50% accuracy on the validation set.                                        #\n",
"##############################################################################\n",
"# *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****\n",
"\n",
"pass\n",
"\n",
"# *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****\n",
"##############################################################################\n",
"#                             END OF YOUR CODE                               #\n",
"##############################################################################"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Run this cell to visualize training loss and train / val accuracy\n",
"\n",
"plt.subplot(2, 1, 1)\n",
"plt.title('Training loss')\n",
"plt.plot(solver.loss_history, 'o')\n",
"plt.xlabel('Iteration')\n",
"\n",
"plt.subplot(2, 1, 2)\n",
"plt.title('Accuracy')\n",
"plt.plot(solver.train_acc_history, '-o', label='train')\n",
"plt.plot(solver.val_acc_history, '-o', label='val')\n",
"plt.plot([0.5] * len(solver.val_acc_history), 'k--')\n",
"plt.xlabel('Epoch')\n",
"plt.legend(loc='lower right')\n",
"plt.gcf().set_size_inches(15, 12)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Multilayer network\n",
"\n",
"Next you will implement a fully-connected network with an arbitrary number of hidden layers.\n",
"\n",
"Read through the `FullyConnectedNet` class in `daseCV/classifiers/fc_net.py`.\n",
"\n",
"Implement the initialization, the forward pass, and the backward pass. For the moment don't worry about implementing dropout or batch/layer normalization; we will add those features soon.\n",
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Initial loss and gradient check\n",
"\n",
"As a sanity check, run the following code to check the initial loss and to gradient check the network both with and without regularization. Does the initial loss seem reasonable? (Hint: with 10 classes and small random weights, the initial softmax loss should be close to ln(10) ≈ 2.3.)\n",
"\n",
"For gradient checking, you should expect to see errors around 1e-7 or less."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"np.random.seed(231)\n",
"N, D, H1, H2, C = 2, 15, 20, 30, 10\n",
"X = np.random.randn(N, D)\n",
"y = np.random.randint(C, size=(N,))\n",
"\n",
"for reg in [0, 3.14]:\n",
"    print('Running check with reg = ', reg)\n",
"    model = FullyConnectedNet([H1, H2], input_dim=D, num_classes=C,\n",
"                              reg=reg, weight_scale=5e-2, dtype=np.float64)\n",
"\n",
"    loss, grads = model.loss(X, y)\n",
"    print('Initial loss: ', loss)\n",
"\n",
"    # Most of the errors should be on the order of e-7 or smaller.\n",
"    # NOTE: It is fine however to see an error for W2 on the order of e-5\n",
"    # for the check when reg = 0.0\n",
"    for name in sorted(grads):\n",
"        f = lambda _: model.loss(X, y)[0]\n",
"        grad_num = eval_numerical_gradient(f, model.params[name], verbose=False, h=1e-5)\n",
"        print('%s relative error: %.2e' % (name, rel_error(grad_num, grads[name])))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As another sanity check, make sure you can overfit a small dataset of 50 images. First we will try a three-layer network with 100 units in each hidden layer. In the following cell, tweak the **learning rate** and **weight initialization scale** to overfit and achieve 100% training accuracy within 20 epochs."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# TODO: Use a three-layer Net to overfit 50 training examples by \n",
"# tweaking just the learning rate and initialization scale.\n",
"\n",
"num_train = 50\n",
"small_data = {\n",
"    'X_train': data['X_train'][:num_train],\n",
"    'y_train': data['y_train'][:num_train],\n",
"    'X_val': data['X_val'],\n",
"    'y_val': data['y_val'],\n",
"}\n",
"\n",
"weight_scale = 1e-2  # Experiment with this!\n",
"learning_rate = 1e-2  # Experiment with this!\n",
"model = FullyConnectedNet([100, 100],\n",
"                          weight_scale=weight_scale, dtype=np.float64)\n",
"solver = Solver(model, small_data,\n",
"                print_every=10, num_epochs=20, batch_size=25,\n",
"                update_rule='sgd',\n",
"                optim_config={\n",
"                    'learning_rate': learning_rate,\n",
"                }\n",
"               )\n",
"solver.train()\n",
"\n",
"plt.plot(solver.loss_history, 'o')\n",
"plt.title('Training loss history')\n",
"plt.xlabel('Iteration')\n",
"plt.ylabel('Training loss')\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now try to use a five-layer network with 100 units on each layer to overfit 50 training examples. Again, you will have to adjust the learning rate and weight initialization scale, but you should be able to achieve 100% training accuracy within 20 epochs."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# TODO: Use a five-layer Net to overfit 50 training examples by \n",
"# tweaking just the learning rate and initialization scale.\n",
"\n",
"num_train = 50\n",
"small_data = {\n",
"    'X_train': data['X_train'][:num_train],\n",
"    'y_train': data['y_train'][:num_train],\n",
"    'X_val': data['X_val'],\n",
"    'y_val': data['y_val'],\n",
"}\n",
"\n",
"weight_scale = 1e-1  # Experiment with this!\n",
"learning_rate = 2e-3  # Experiment with this!\n",
"model = FullyConnectedNet([100, 100, 100, 100],\n",
"                          weight_scale=weight_scale, dtype=np.float64)\n",
"solver = Solver(model, small_data,\n",
"                print_every=10, num_epochs=20, batch_size=25,\n",
"                update_rule='sgd',\n",
"                optim_config={\n",
"                    'learning_rate': learning_rate,\n",
"                }\n",
"               )\n",
"solver.train()\n",
"\n",
"plt.plot(solver.loss_history, 'o')\n",
"plt.title('Training loss history')\n",
"plt.xlabel('Iteration')\n",
"plt.ylabel('Training loss')\n",
"plt.show()"
]
},
  638. },
  639. {
  640. "cell_type": "markdown",
  641. "metadata": {
  642. "tags": [
  643. "pdf-inline"
  644. ]
  645. },
  646. "source": [
  647. "#### Inline Question 2: \n",
  648. "你注意到训练三层网和训练五层网难度的区别了吗?根据你的经验,哪个网络对initalization scale更敏感?为什么会这样呢?\n",
  649. "\n",
  650. "## Answer:\n",
  651. "[FILL THIS IN]\n"
  652. ]
  653. },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Update rules\n",
"\n",
"So far we have used vanilla stochastic gradient descent (SGD) as our update rule. More sophisticated update rules can make it easier to train deep networks. We will implement a few of the most commonly used update rules and compare them to vanilla SGD."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# SGD+Momentum\n",
"\n",
"Stochastic gradient descent with momentum is a widely used update rule that tends to make deep networks converge faster than vanilla stochastic gradient descent. See the Momentum Update section at http://cs231n.github.io/neural-networks-3/#sgd for more information.\n",
"\n",
"Open the file `daseCV/optim.py` and read the documentation at the top of the file to make sure you understand the API. Implement the SGD+momentum update rule in the function `sgd_momentum` and run the following to check your implementation. You should see errors less than e-8.\n",
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from daseCV.optim import sgd_momentum\n",
"\n",
"N, D = 4, 5\n",
"w = np.linspace(-0.4, 0.6, num=N*D).reshape(N, D)\n",
"dw = np.linspace(-0.6, 0.4, num=N*D).reshape(N, D)\n",
"v = np.linspace(0.6, 0.9, num=N*D).reshape(N, D)\n",
"\n",
"config = {'learning_rate': 1e-3, 'velocity': v}\n",
"next_w, _ = sgd_momentum(w, dw, config=config)\n",
"\n",
"expected_next_w = np.asarray([\n",
"  [ 0.1406,     0.20738947, 0.27417895, 0.34096842, 0.40775789],\n",
"  [ 0.47454737, 0.54133684, 0.60812632, 0.67491579, 0.74170526],\n",
"  [ 0.80849474, 0.87528421, 0.94207368, 1.00886316, 1.07565263],\n",
"  [ 1.14244211, 1.20923158, 1.27602105, 1.34281053, 1.4096    ]])\n",
"expected_velocity = np.asarray([\n",
"  [ 0.5406,     0.55475789, 0.56891579, 0.58307368, 0.59723158],\n",
"  [ 0.61138947, 0.62554737, 0.63970526, 0.65386316, 0.66802105],\n",
"  [ 0.68217895, 0.69633684, 0.71049474, 0.72465263, 0.73881053],\n",
"  [ 0.75296842, 0.76712632, 0.78128421, 0.79544211, 0.8096    ]])\n",
"\n",
"# Should see relative errors around e-8 or less\n",
"print('next_w error: ', rel_error(next_w, expected_next_w))\n",
"print('velocity error: ', rel_error(expected_velocity, config['velocity']))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Once you have done so, run the following to train a six-layer network with both SGD and SGD+momentum. You should see the SGD+momentum update rule converge faster.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"num_train = 4000\n",
"small_data = {\n",
"    'X_train': data['X_train'][:num_train],\n",
"    'y_train': data['y_train'][:num_train],\n",
"    'X_val': data['X_val'],\n",
"    'y_val': data['y_val'],\n",
"}\n",
"\n",
"solvers = {}\n",
"\n",
"for update_rule in ['sgd', 'sgd_momentum']:\n",
"    print('running with ', update_rule)\n",
"    model = FullyConnectedNet([100, 100, 100, 100, 100], weight_scale=5e-2)\n",
"\n",
"    solver = Solver(model, small_data,\n",
"                    num_epochs=5, batch_size=100,\n",
"                    update_rule=update_rule,\n",
"                    optim_config={\n",
"                        'learning_rate': 5e-3,\n",
"                    },\n",
"                    verbose=True)\n",
"    solvers[update_rule] = solver\n",
"    solver.train()\n",
"    print()\n",
"\n",
"plt.subplot(3, 1, 1)\n",
"plt.title('Training loss')\n",
"plt.xlabel('Iteration')\n",
"\n",
"plt.subplot(3, 1, 2)\n",
"plt.title('Training accuracy')\n",
"plt.xlabel('Epoch')\n",
"\n",
"plt.subplot(3, 1, 3)\n",
"plt.title('Validation accuracy')\n",
"plt.xlabel('Epoch')\n",
"\n",
"for update_rule, solver in solvers.items():\n",
"    plt.subplot(3, 1, 1)\n",
"    plt.plot(solver.loss_history, 'o', label=\"loss_%s\" % update_rule)\n",
"\n",
"    plt.subplot(3, 1, 2)\n",
"    plt.plot(solver.train_acc_history, '-o', label=\"train_acc_%s\" % update_rule)\n",
"\n",
"    plt.subplot(3, 1, 3)\n",
"    plt.plot(solver.val_acc_history, '-o', label=\"val_acc_%s\" % update_rule)\n",
"\n",
"for i in [1, 2, 3]:\n",
"    plt.subplot(3, 1, i)\n",
"    plt.legend(loc='upper center', ncol=4)\n",
"plt.gcf().set_size_inches(15, 15)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# RMSProp and Adam\n",
"\n",
"RMSProp [1] and Adam [2] are two more update rules that set per-parameter learning rates by using running averages of the second moments of the gradients.\n",
"\n",
"Implement the `rmsprop` and `adam` functions in the file `daseCV/optim.py` and check your implementations using the code below.\n",
"\n",
"[1] Tijmen Tieleman and Geoffrey Hinton. \"Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude.\" COURSERA: Neural Networks for Machine Learning 4 (2012).\n",
"\n",
"[2] Diederik Kingma and Jimmy Ba, \"Adam: A Method for Stochastic Optimization\", ICLR 2015.\n",
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Test RMSProp implementation\n",
"from daseCV.optim import rmsprop\n",
"\n",
"N, D = 4, 5\n",
"w = np.linspace(-0.4, 0.6, num=N*D).reshape(N, D)\n",
"dw = np.linspace(-0.6, 0.4, num=N*D).reshape(N, D)\n",
"cache = np.linspace(0.6, 0.9, num=N*D).reshape(N, D)\n",
"\n",
"config = {'learning_rate': 1e-2, 'cache': cache}\n",
"next_w, _ = rmsprop(w, dw, config=config)\n",
"\n",
"expected_next_w = np.asarray([\n",
"  [-0.39223849, -0.34037513, -0.28849239, -0.23659121, -0.18467247],\n",
"  [-0.132737,   -0.08078555, -0.02881884,  0.02316247,  0.07515774],\n",
"  [ 0.12716641,  0.17918792,  0.23122175,  0.28326742,  0.33532447],\n",
"  [ 0.38739248,  0.43947102,  0.49155973,  0.54365823,  0.59576619]])\n",
"expected_cache = np.asarray([\n",
"  [ 0.5976,     0.6126277,  0.6277108,  0.64284931, 0.65804321],\n",
"  [ 0.67329252, 0.68859723, 0.70395734, 0.71937285, 0.73484377],\n",
"  [ 0.75037008, 0.7659518,  0.78158892, 0.79728144, 0.81302936],\n",
"  [ 0.82883269, 0.84469141, 0.86060554, 0.87657507, 0.8926    ]])\n",
"\n",
"# You should see relative errors around e-7 or less\n",
"print('next_w error: ', rel_error(expected_next_w, next_w))\n",
"print('cache error: ', rel_error(expected_cache, config['cache']))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Test Adam implementation\n",
"from daseCV.optim import adam\n",
"\n",
"N, D = 4, 5\n",
"w = np.linspace(-0.4, 0.6, num=N*D).reshape(N, D)\n",
"dw = np.linspace(-0.6, 0.4, num=N*D).reshape(N, D)\n",
"m = np.linspace(0.6, 0.9, num=N*D).reshape(N, D)\n",
"v = np.linspace(0.7, 0.5, num=N*D).reshape(N, D)\n",
"\n",
"config = {'learning_rate': 1e-2, 'm': m, 'v': v, 't': 5}\n",
"next_w, _ = adam(w, dw, config=config)\n",
"\n",
"expected_next_w = np.asarray([\n",
"  [-0.40094747, -0.34836187, -0.29577703, -0.24319299, -0.19060977],\n",
"  [-0.1380274,  -0.08544591, -0.03286534,  0.01971428,  0.0722929 ],\n",
"  [ 0.1248705,   0.17744702,  0.23002243,  0.28259667,  0.33516969],\n",
"  [ 0.38774145,  0.44031188,  0.49288093,  0.54544852,  0.59801459]])\n",
"expected_v = np.asarray([\n",
"  [ 0.69966,    0.68908382, 0.67851319, 0.66794809, 0.65738853,],\n",
"  [ 0.64683452, 0.63628604, 0.6257431,  0.61520571, 0.60467385,],\n",
"  [ 0.59414753, 0.58362676, 0.57311152, 0.56260183, 0.55209767,],\n",
"  [ 0.54159906, 0.53110598, 0.52061845, 0.51013645, 0.49966,   ]])\n",
"expected_m = np.asarray([\n",
"  [ 0.48,       0.49947368, 0.51894737, 0.53842105, 0.55789474],\n",
"  [ 0.57736842, 0.59684211, 0.61631579, 0.63578947, 0.65526316],\n",
"  [ 0.67473684, 0.69421053, 0.71368421, 0.73315789, 0.75263158],\n",
"  [ 0.77210526, 0.79157895, 0.81105263, 0.83052632, 0.85      ]])\n",
"\n",
"# You should see relative errors around e-7 or less\n",
"print('next_w error: ', rel_error(expected_next_w, next_w))\n",
"print('v error: ', rel_error(expected_v, config['v']))\n",
"print('m error: ', rel_error(expected_m, config['m']))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Once you have finished the RMSProp and Adam functions above, run the following to train a pair of deep networks using these two update rules:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"learning_rates = {'rmsprop': 1e-4, 'adam': 1e-3}\n",
"for update_rule in ['adam', 'rmsprop']:\n",
"    print('running with ', update_rule)\n",
"    model = FullyConnectedNet([100, 100, 100, 100, 100], weight_scale=5e-2)\n",
"\n",
"    solver = Solver(model, small_data,\n",
"                    num_epochs=5, batch_size=100,\n",
"                    update_rule=update_rule,\n",
"                    optim_config={\n",
"                        'learning_rate': learning_rates[update_rule]\n",
"                    },\n",
"                    verbose=True)\n",
"    solvers[update_rule] = solver\n",
"    solver.train()\n",
"    print()\n",
"\n",
"plt.subplot(3, 1, 1)\n",
"plt.title('Training loss')\n",
"plt.xlabel('Iteration')\n",
"\n",
"plt.subplot(3, 1, 2)\n",
"plt.title('Training accuracy')\n",
"plt.xlabel('Epoch')\n",
"\n",
"plt.subplot(3, 1, 3)\n",
"plt.title('Validation accuracy')\n",
"plt.xlabel('Epoch')\n",
"\n",
"for update_rule, solver in list(solvers.items()):\n",
"    plt.subplot(3, 1, 1)\n",
"    plt.plot(solver.loss_history, 'o', label=update_rule)\n",
"\n",
"    plt.subplot(3, 1, 2)\n",
"    plt.plot(solver.train_acc_history, '-o', label=update_rule)\n",
"\n",
"    plt.subplot(3, 1, 3)\n",
"    plt.plot(solver.val_acc_history, '-o', label=update_rule)\n",
"\n",
"for i in [1, 2, 3]:\n",
"    plt.subplot(3, 1, i)\n",
"    plt.legend(loc='upper center', ncol=4)\n",
"plt.gcf().set_size_inches(15, 15)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {
"tags": [
"pdf-inline"
]
},
"source": [
"## Inline Question 3:\n",
"\n",
"AdaGrad, like Adam, is a per-parameter optimization method that uses the following update rule:\n",
"\n",
"```\n",
"cache += dw**2\n",
"w += - learning_rate * dw / (np.sqrt(cache) + eps)\n",
"```\n",
"\n",
"When a network is trained with AdaGrad, the updates become very small and the network learns very slowly. Using your knowledge of the AdaGrad update rule, explain why the updates become very small. Would Adam have the same problem?\n",
"\n",
"## Answer: \n",
"[FILL THIS IN]\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Train a good model!\n",
"\n",
"Train the best fully-connected model that you can on CIFAR-10, storing your best model in the `best_model` variable. We require you to get at least 50% accuracy on the validation set.\n",
"\n",
"If you are careful it should be possible to get accuracies above 55%, but we don't require such a high accuracy for this part. Later in the assignment we will ask you to train the best convolutional network you can on CIFAR-10, and we would prefer you to spend your effort on the convolutional network rather than the fully-connected one.\n",
"\n",
"It might be helpful to complete `BatchNormalization.ipynb` and `Dropout.ipynb` before doing this part of the assignment, since those techniques can help you train powerful models."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"best_model = None\n",
"################################################################################\n",
"# TODO: Train the best FullyConnectedNet that you can on CIFAR-10. You might   #\n",
"# find batch/layer normalization and dropout useful. Store your best model in  #\n",
"# the best_model variable.                                                     #\n",
"################################################################################\n",
"# *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****\n",
"\n",
"pass\n",
"\n",
"# *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****\n",
"################################################################################\n",
"#                              END OF YOUR CODE                                #\n",
"################################################################################"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Test your model!\n",
"\n",
"Run your best model on the validation and test sets. You should achieve above 50% accuracy on the validation set."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"y_test_pred = np.argmax(best_model.loss(data['X_test']), axis=1)\n",
"y_val_pred = np.argmax(best_model.loss(data['X_val']), axis=1)\n",
"print('Validation set accuracy: ', (y_val_pred == data['y_val']).mean())\n",
"print('Test set accuracy: ', (y_test_pred == data['y_test']).mean())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"# Important\n",
"\n",
"This is the end of the assignment. Please do the following:\n",
"\n",
"1. Click `File -> Save`, or use the `control+s` shortcut, to make sure your latest notebook work is saved to Google Drive.\n",
"2. Run the following cell to make sure the `.py` files are saved back to your Google Drive."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"FOLDER_TO_SAVE = os.path.join('drive/My Drive/', FOLDERNAME)\n",
"FILES_TO_SAVE = ['daseCV/classifiers/cnn.py', 'daseCV/classifiers/fc_net.py']\n",
"\n",
"for files in FILES_TO_SAVE:\n",
"    with open(os.path.join(FOLDER_TO_SAVE, '/'.join(files.split('/')[1:])), 'w') as f:\n",
"        f.write(''.join(open(files).readlines()))"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.0"
}
},
"nbformat": 4,
"nbformat_minor": 4
}