DaSE-Computer-Vision-2021

  1. {
  2. "cells": [
  3. {
  4. "cell_type": "code",
  5. "execution_count": null,
  6. "metadata": {},
  7. "outputs": [],
  8. "source": [
  9. "from google.colab import drive\n",
  10. "\n",
  11. "drive.mount('/content/drive', force_remount=True)\n",
  12. "\n",
  13. "# Enter the path to the daseCV folder\n",
  14. "# The 'daseCV' folder contains the '.py' files and the 'classifiers' and 'datasets' folders\n",
  15. "# e.g. 'CV/assignments/assignment1/daseCV/'\n",
  16. "FOLDERNAME = None\n",
  17. "\n",
  18. "assert FOLDERNAME is not None, \"[!] Enter the foldername.\"\n",
  19. "\n",
  20. "%cd drive/My\\ Drive\n",
  21. "%cp -r $FOLDERNAME ../../\n",
  22. "%cd ../../\n",
  23. "%cd daseCV/datasets/\n",
  24. "!bash get_datasets.sh\n",
  25. "%cd ../../"
  26. ]
  27. },
  28. {
  29. "cell_type": "markdown",
  30. "metadata": {
  31. "tags": [
  32. "pdf-title"
  33. ]
  34. },
  35. "source": [
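  36. "# Implementing a Neural Network\n",
  37. "\n",
  38. "In this exercise we will develop a neural network with fully-connected layers to perform classification, and test it on the CIFAR-10 dataset."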
  39. ]
  40. },
  41. {
  42. "cell_type": "code",
  43. "execution_count": null,
  44. "metadata": {
  45. "tags": [
  46. "pdf-ignore"
  47. ]
  48. },
  49. "outputs": [],
  50. "source": [
  51. "# A bit of setup\n",
  52. "\n",
  53. "import numpy as np\n",
  54. "import matplotlib.pyplot as plt\n",
  55. "\n",
  56. "from daseCV.classifiers.neural_net import TwoLayerNet\n",
  57. "\n",
  58. "%matplotlib inline\n",
  59. "plt.rcParams['figure.figsize'] = (10.0, 8.0) # set the default plot size\n",
  60. "plt.rcParams['image.interpolation'] = 'nearest'\n",
  61. "plt.rcParams['image.cmap'] = 'gray'\n",
  62. "\n",
  63. "# For details on auto-reloading external modules, see the link below\n",
  64. "# http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython\n",
  65. "%load_ext autoreload\n",
  66. "%autoreload 2\n",
  67. "\n",
  68. "def rel_error(x, y):\n",
  69. "    \"\"\" returns relative error \"\"\"\n",
  70. "    return np.max(np.abs(x - y) / (np.maximum(1e-8, np.abs(x) + np.abs(y))))"
  71. ]
  72. },
  73. {
  74. "cell_type": "markdown",
  75. "metadata": {
  76. "tags": [
  77. "pdf-ignore"
  78. ]
  79. },
  80. "source": [
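  81. "We will use the class `TwoLayerNet` in the file `daseCV/classifiers/neural_net.py` to represent instances of our network. The network parameters are stored in the instance variable `self.params`, where the keys are parameter names and the values are numpy arrays.\n",
  82. "Below, we initialize toy data and a toy model that we will use to develop the implementation."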
  83. ]
  84. },
  85. {
  86. "cell_type": "code",
  87. "execution_count": null,
  88. "metadata": {
  89. "tags": [
  90. "pdf-ignore"
  91. ]
  92. },
  93. "outputs": [],
  94. "source": [
  95. "# Create a small net and some toy data\n",
  96. "# Note that we set the random seeds for repeatable experiments.\n",
  97. "\n",
  98. "input_size = 4\n",
  99. "hidden_size = 10\n",
  100. "num_classes = 3\n",
  101. "num_inputs = 5\n",
  102. "\n",
  103. "def init_toy_model():\n",
  104. "    np.random.seed(0)\n",
  105. "    return TwoLayerNet(input_size, hidden_size, num_classes, std=1e-1)\n",
  106. "\n",
  107. "def init_toy_data():\n",
  108. "    np.random.seed(1)\n",
  109. "    X = 10 * np.random.randn(num_inputs, input_size)\n",
  110. "    y = np.array([0, 1, 2, 2, 1])\n",
  111. "    return X, y\n",
  112. "\n",
  113. "net = init_toy_model()\n",
  114. "X, y = init_toy_data()"
  115. ]
  116. },
  117. {
  118. "cell_type": "markdown",
  119. "metadata": {},
  120. "source": [
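  121. "# Forward pass: compute scores\n",
  122. "\n",
  123. "Open the file `daseCV/classifiers/neural_net.py` and look at the method `TwoLayerNet.loss`. This function is very similar to the loss functions you wrote for the SVM and Softmax exercises: it takes the data and weights and computes the class scores, the loss, and the gradients on the parameters.\n",
  124. "\n",
  125. "Implement the first part of the forward pass, which uses the weights and biases to compute the scores for all inputs."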
  126. ]
  127. },
  128. {
  129. "cell_type": "code",
  130. "execution_count": null,
  131. "metadata": {},
  132. "outputs": [],
  133. "source": [
  134. "scores = net.loss(X)\n",
  135. "print('Your scores:')\n",
  136. "print(scores)\n",
  137. "print()\n",
  138. "print('correct scores:')\n",
  139. "correct_scores = np.asarray([\n",
  140. " [-0.81233741, -1.27654624, -0.70335995],\n",
  141. " [-0.17129677, -1.18803311, -0.47310444],\n",
  142. " [-0.51590475, -1.01354314, -0.8504215 ],\n",
  143. " [-0.15419291, -0.48629638, -0.52901952],\n",
  144. " [-0.00618733, -0.12435261, -0.15226949]])\n",
  145. "print(correct_scores)\n",
  146. "print()\n",
  147. "\n",
  148. "# The difference should be very small. We get < 1e-7\n",
  149. "print('Difference between your scores and correct scores:')\n",
  150. "print(np.sum(np.abs(scores - correct_scores)))"
  151. ]
  152. },
  153. {
  154. "cell_type": "markdown",
  155. "metadata": {},
  156. "source": [
  157. "# Forward pass: compute loss\n",
  158. "\n",
  159. "In the same function, implement the second part, which computes the loss."
  160. ]
  161. },
  162. {
  163. "cell_type": "code",
  164. "execution_count": null,
  165. "metadata": {},
  166. "outputs": [],
  167. "source": [
  168. "loss, _ = net.loss(X, y, reg=0.05)  # regularization strength is 0.05\n",
  169. "correct_loss = 1.30378789133\n",
  170. "\n",
  171. "# should be very small, we get < 1e-12\n",
  172. "print('Difference between your loss and correct loss:')\n",
  173. "print(np.sum(np.abs(loss - correct_loss)))"
  174. ]
  175. },
  176. {
  177. "cell_type": "markdown",
  178. "metadata": {},
  179. "source": [
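  180. "# Backward pass\n",
  181. "\n",
  182. "Implement the rest of the function. This computes the gradient of the loss with respect to the variables `W1`, `b1`, `W2`, and `b2`. Now that you (hopefully!) have a correctly implemented forward pass, you can debug your backward pass using a numeric gradient check:"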
  183. ]
  184. },
  185. {
  186. "cell_type": "code",
  187. "execution_count": null,
  188. "metadata": {},
  189. "outputs": [],
  190. "source": [
  191. "from daseCV.gradient_check import eval_numerical_gradient\n",
  192. "\n",
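  193. "# Use numeric gradient checking to check your implementation of the backward pass.\n",
  194. "# If your implementation is correct, the difference between the numeric and\n",
  195. "# analytic gradients should be less than 1e-8 for each of W1, W2, b1, and b2.\n",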
  196. "\n",
  197. "loss, grads = net.loss(X, y, reg=0.05)\n",
  198. "\n",
  199. "# these should all be less than 1e-8 or so\n",
  200. "for param_name in grads:\n",
  201. "    f = lambda W: net.loss(X, y, reg=0.05)[0]\n",
  202. "    param_grad_num = eval_numerical_gradient(f, net.params[param_name], verbose=False)\n",
  203. "    print('%s max relative error: %e' % (param_name, rel_error(param_grad_num, grads[param_name])))"
  204. ]
  205. },
  206. {
  207. "cell_type": "markdown",
  208. "metadata": {},
  209. "source": [
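  210. "# Train the network\n",
  211. "\n",
  212. "To train the network we will use stochastic gradient descent (SGD), as with the SVM and Softmax classifiers. Look at the function `TwoLayerNet.train` and fill in the missing sections of the training code. This is very similar to the training procedure you used for the SVM and Softmax classifiers. You will also have to implement `TwoLayerNet.predict`, because the training process periodically makes predictions to keep track of the network's accuracy over time.\n",
  213. "\n",
  214. "Once you have implemented this function, run the code below to train a two-layer network on the toy data. You should achieve a training loss of less than 0.02."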
  215. ]
  216. },
  217. {
  218. "cell_type": "code",
  219. "execution_count": null,
  220. "metadata": {},
  221. "outputs": [],
  222. "source": [
  223. "net = init_toy_model()\n",
  224. "stats = net.train(X, y, X, y,\n",
  225. " learning_rate=1e-1, reg=5e-6,\n",
  226. " num_iters=100, verbose=False)\n",
  227. "\n",
  228. "print('Final training loss: ', stats['loss_history'][-1])\n",
  229. "\n",
  230. "# plot the loss history\n",
  231. "plt.plot(stats['loss_history'])\n",
  232. "plt.xlabel('iteration')\n",
  233. "plt.ylabel('training loss')\n",
  234. "plt.title('Training Loss history')\n",
  235. "plt.show()"
  236. ]
  237. },
  238. {
  239. "cell_type": "markdown",
  240. "metadata": {},
  241. "source": [
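  242. "# Load the data\n",
  243. "\n",
  244. "Now that you have implemented a two-layer network that passes the gradient checks and works on the toy data, it is time to load our favorite CIFAR-10 data (not my favorite (╯‵□′)╯︵┴─┴ ) so we can train a classifier on a real dataset."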
  245. ]
  246. },
  247. {
  248. "cell_type": "code",
  249. "execution_count": null,
  250. "metadata": {
  251. "tags": [
  252. "pdf-ignore"
  253. ]
  254. },
  255. "outputs": [],
  256. "source": [
  257. "from daseCV.data_utils import load_CIFAR10\n",
  258. "\n",
  259. "def get_CIFAR10_data(num_training=49000, num_validation=1000, num_test=1000):\n",
  260. "    \"\"\"\n",
  261. "    Load the CIFAR-10 dataset from disk and perform preprocessing to prepare\n",
  262. "    it for the two-layer neural net classifier. These are the same steps as\n",
  263. "    we used for the SVM, but condensed to a single function.\n",
  264. "    \"\"\"\n",
  265. "    # Load the raw CIFAR-10 data\n",
  266. "    cifar10_dir = 'daseCV/datasets/cifar-10-batches-py'\n",
  267. "\n",
  268. "    # Clear variables to avoid loading the data multiple times (which may cause memory issues)\n",
  269. "    try:\n",
  270. "        del X_train, y_train\n",
  271. "        del X_test, y_test\n",
  272. "        print('Clear previously loaded data.')\n",
  273. "    except NameError:\n",
  274. "        pass\n",
  275. "\n",
  276. "    X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir)\n",
  277. "\n",
  278. "    # Subsample the data\n",
  279. "    mask = list(range(num_training, num_training + num_validation))\n",
  280. "    X_val = X_train[mask]\n",
  281. "    y_val = y_train[mask]\n",
  282. "    mask = list(range(num_training))\n",
  283. "    X_train = X_train[mask]\n",
  284. "    y_train = y_train[mask]\n",
  285. "    mask = list(range(num_test))\n",
  286. "    X_test = X_test[mask]\n",
  287. "    y_test = y_test[mask]\n",
  288. "\n",
  289. "    # Normalize the data: subtract the mean image\n",
  290. "    mean_image = np.mean(X_train, axis=0)\n",
  291. "    X_train -= mean_image\n",
  292. "    X_val -= mean_image\n",
  293. "    X_test -= mean_image\n",
  294. "\n",
  295. "    # Reshape data to rows\n",
  296. "    X_train = X_train.reshape(num_training, -1)\n",
  297. "    X_val = X_val.reshape(num_validation, -1)\n",
  298. "    X_test = X_test.reshape(num_test, -1)\n",
  299. "\n",
  300. "    return X_train, y_train, X_val, y_val, X_test, y_test\n",
  301. "\n",
  302. "\n",
  303. "# Invoke the above function to get our data.\n",
  304. "X_train, y_train, X_val, y_val, X_test, y_test = get_CIFAR10_data()\n",
  305. "print('Train data shape: ', X_train.shape)\n",
  306. "print('Train labels shape: ', y_train.shape)\n",
  307. "print('Validation data shape: ', X_val.shape)\n",
  308. "print('Validation labels shape: ', y_val.shape)\n",
  309. "print('Test data shape: ', X_test.shape)\n",
  310. "print('Test labels shape: ', y_test.shape)"
  311. ]
  312. },
  313. {
  314. "cell_type": "markdown",
  315. "metadata": {},
  316. "source": [
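  317. "# Train a network\n",
  318. "\n",
  319. "To train our network we will use SGD. In addition, we apply an exponential learning rate decay schedule during training: the learning rate is reduced by multiplying it by a decay rate."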
  320. ]
  321. },
  322. {
  323. "cell_type": "code",
  324. "execution_count": null,
  325. "metadata": {
  326. "tags": [
  327. "code"
  328. ]
  329. },
  330. "outputs": [],
  331. "source": [
  332. "input_size = 32 * 32 * 3\n",
  333. "hidden_size = 50\n",
  334. "num_classes = 10\n",
  335. "net = TwoLayerNet(input_size, hidden_size, num_classes)\n",
  336. "\n",
  337. "# Train the network\n",
  338. "stats = net.train(X_train, y_train, X_val, y_val,\n",
  339. " num_iters=1000, batch_size=200,\n",
  340. " learning_rate=1e-4, learning_rate_decay=0.95,\n",
  341. " reg=0.25, verbose=True)\n",
  342. "\n",
  343. "# Predict on the validation set\n",
  344. "val_acc = (net.predict(X_val) == y_val).mean()\n",
  345. "print('Validation accuracy: ', val_acc)\n"
  346. ]
  347. },
  348. {
  349. "cell_type": "markdown",
  350. "metadata": {},
  351. "source": [
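  352. "# Debug the training\n",
  353. "\n",
  354. "With the default parameters, you should get a validation accuracy of about 0.29. This isn't very good.\n",
  355. "\n",
  356. "One strategy for getting insight into what's wrong is to plot the loss function and the accuracies on the training and validation sets during optimization.\n",
  357. "\n",
  358. "Another strategy is to visualize the weights that were learned in the first layer of the network. In most neural networks trained on visual data, the first-layer weights typically show some visible structure when visualized."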
  359. ]
  360. },
  361. {
  362. "cell_type": "code",
  363. "execution_count": null,
  364. "metadata": {},
  365. "outputs": [],
  366. "source": [
  367. "# Plot the loss function and train / validation accuracies\n",
  368. "plt.subplot(2, 1, 1)\n",
  369. "plt.plot(stats['loss_history'])\n",
  370. "plt.title('Loss history')\n",
  371. "plt.xlabel('Iteration')\n",
  372. "plt.ylabel('Loss')\n",
  373. "\n",
  374. "plt.subplot(2, 1, 2)\n",
  375. "plt.plot(stats['train_acc_history'], label='train')\n",
  376. "plt.plot(stats['val_acc_history'], label='val')\n",
  377. "plt.title('Classification accuracy history')\n",
  378. "plt.xlabel('Epoch')\n",
  379. "plt.ylabel('Classification accuracy')\n",
  380. "plt.legend()\n",
  381. "plt.show()"
  382. ]
  383. },
  384. {
  385. "cell_type": "code",
  386. "execution_count": null,
  387. "metadata": {},
  388. "outputs": [],
  389. "source": [
  390. "from daseCV.vis_utils import visualize_grid\n",
  391. "\n",
  392. "# Visualize the weights of the network\n",
  393. "\n",
  394. "def show_net_weights(net):\n",
  395. "    W1 = net.params['W1']\n",
  396. "    W1 = W1.reshape(32, 32, 3, -1).transpose(3, 0, 1, 2)\n",
  397. "    plt.imshow(visualize_grid(W1, padding=3).astype('uint8'))\n",
  398. "    plt.gca().axis('off')\n",
  399. "    plt.show()\n",
  400. "\n",
  401. "show_net_weights(net)"
  402. ]
  403. },
  404. {
  405. "cell_type": "markdown",
  406. "metadata": {},
  407. "source": [
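  408. "# Tune your hyperparameters\n",
  409. "\n",
  410. "**What's wrong?**. Looking at the visualizations above, we see that the loss is decreasing more or less linearly, which seems to suggest that the learning rate may be too low. Moreover, there is no gap between the training and validation accuracy, suggesting that the model we used has low capacity and that we should increase its size. On the other hand, with a very large model we would expect to see more overfitting, which would show up as a very large gap between the training and validation accuracy.\n",
  411. "\n",
  412. "**Tuning**. Tuning the hyperparameters and understanding how they affect the final performance is a large part of using neural networks, so we want you to get a lot of practice. Below, you should experiment with different values of the various hyperparameters, including hidden layer size, learning rate, number of training epochs, and regularization strength. You might also consider tuning the learning rate decay, but the default value should give good performance in this experiment.\n",
  413. "\n",
  414. "**Approximate results**. You should aim for a classification accuracy above 48% on the validation set. Our best model gets over 52% on the validation set.\n",
  415. "\n",
  416. "**Experiment**: Your goal in this exercise is to get as good a result on CIFAR-10 as you can with a fully-connected neural network (52% can serve as a reference). Feel free to implement your own techniques (e.g. PCA to reduce dimensionality, adding dropout, adding features, etc.)."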
  417. ]
  418. },
  419. {
  420. "cell_type": "markdown",
  421. "metadata": {
  422. "tags": [
  423. "pdf-inline"
  424. ]
  425. },
  426. "source": [
  427. "**Explain your hyperparameter search process below**\n",
  428. "\n",
  429. "$\\color{blue}{\\textit Your Answer:}$"
  430. ]
  431. },
  432. {
  433. "cell_type": "code",
  434. "execution_count": null,
  435. "metadata": {
  436. "tags": [
  437. "code"
  438. ]
  439. },
  440. "outputs": [],
  441. "source": [
  442. "best_net = None # store the best model into this \n",
  443. "\n",
  444. "#################################################################################\n",
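  445. "# TODO: Tune hyperparameters using the validation set. Store your best model in best_net.\n",
  446. "# The visualizations used above may help you debug your network.\n",
  447. "# They should look clearly different from those of the poorly tuned network above.\n",
  448. "# Tweaking hyperparameters by hand can be fun, but you may find it useful to write code that sweeps through possible combinations automatically.\n",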
  449. "#################################################################################\n",
  450. "# *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****\n",
  451. "\n",
  452. "pass\n",
  453. "\n",
  454. "# *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****\n"
  455. ]
  456. },
  457. {
  458. "cell_type": "code",
  459. "execution_count": null,
  460. "metadata": {},
  461. "outputs": [],
  462. "source": [
  463. "# visualize the weights of the best network\n",
  464. "show_net_weights(best_net)"
  465. ]
  466. },
  467. {
  468. "cell_type": "markdown",
  469. "metadata": {},
  470. "source": [
  471. "# Run on the test set\n",
  472. "\n",
  473. "When you are done experimenting, you can evaluate your final model on the test set; you should get above 48% accuracy."
  474. ]
  475. },
  476. {
  477. "cell_type": "code",
  478. "execution_count": null,
  479. "metadata": {},
  480. "outputs": [],
  481. "source": [
  482. "test_acc = (best_net.predict(X_test) == y_test).mean()\n",
  483. "print('Test accuracy: ', test_acc)"
  484. ]
  485. },
  486. {
  487. "cell_type": "markdown",
  488. "metadata": {
  489. "tags": [
  490. "pdf-inline"
  491. ]
  492. },
  493. "source": [
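  494. "**Question 2**\n",
  495. "\n",
  496. "\n",
  497. "Now that you have trained a neural network classifier, you may find that your test accuracy is much lower than your training accuracy. In what ways can we decrease this gap? Select all that apply.\n",
  498. "\n",
  499. "1. Train on a larger dataset\n",
  500. "2. Add more hidden units\n",
  501. "3. Increase the regularization strength\n",
  502. "4. Other\n",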
  503. "\n",
  504. "$\\color{blue}{\\textit Your Answer:}$\n",
  505. "\n",
  506. "$\\color{blue}{\\textit Your Explanation:}$\n"
  507. ]
  508. },
  509. {
  510. "cell_type": "markdown",
  511. "metadata": {},
  512. "source": [
  513. "---\n",
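  514. "# Important\n",
  515. "\n",
  516. "This is the end of the assignment. Please do the following:\n",
  517. "\n",
  518. "1. Click `File -> Save` or press `control+s` to make sure the latest version of your notebook is saved to Google Drive.\n",
  519. "2. Run the following code to make sure the `.py` files are saved back to your Google Drive."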
  520. ]
  521. },
  522. {
  523. "cell_type": "code",
  524. "execution_count": null,
  525. "metadata": {},
  526. "outputs": [],
  527. "source": [
  528. "import os\n",
  529. "\n",
  530. "FOLDER_TO_SAVE = os.path.join('drive/My Drive/', FOLDERNAME)\n",
  531. "FILES_TO_SAVE = ['daseCV/classifiers/neural_net.py']\n",
  532. "\n",
  533. "for files in FILES_TO_SAVE:\n",
  534. "    with open(os.path.join(FOLDER_TO_SAVE, '/'.join(files.split('/')[1:])), 'w') as f:\n",
  535. "        f.write(''.join(open(files).readlines()))"
  536. ]
  537. }
  538. ],
  539. "metadata": {
  540. "kernelspec": {
  541. "display_name": "Python 3",
  542. "language": "python",
  543. "name": "python3"
  544. },
  545. "language_info": {
  546. "codemirror_mode": {
  547. "name": "ipython",
  548. "version": 3
  549. },
  550. "file_extension": ".py",
  551. "mimetype": "text/x-python",
  552. "name": "python",
  553. "nbconvert_exporter": "python",
  554. "pygments_lexer": "ipython3",
  555. "version": "3.7.0"
  556. }
  557. },
  558. "nbformat": 4,
  559. "nbformat_minor": 1
  560. }