DaSE-Computer-Vision-2021

  1. {
  2. "cells": [
  3. {
  4. "cell_type": "code",
  5. "execution_count": null,
  6. "metadata": {},
  7. "outputs": [],
  8. "source": [
  9. "from google.colab import drive\n",
  10. "\n",
  11. "drive.mount('/content/drive', force_remount=True)\n",
  12. "\n",
  13. "# Enter the path to the daseCV folder.\n",
  14. "# The 'daseCV' folder contains the '.py' files and the 'classifiers' and 'datasets' folders,\n",
  15. "# for example 'CV/assignments/assignment1/daseCV/'\n",
  16. "FOLDERNAME = None\n",
  17. "\n",
  18. "assert FOLDERNAME is not None, \"[!] Enter the foldername.\"\n",
  19. "\n",
  20. "%cd drive/My\\ Drive\n",
  21. "%cp -r $FOLDERNAME ../../\n",
  22. "%cd ../../\n",
  23. "%cd daseCV/datasets/\n",
  24. "!bash get_datasets.sh\n",
  25. "%cd ../../"
  26. ]
  27. },
  28. {
  29. "cell_type": "markdown",
  30. "metadata": {
  31. "tags": [
  32. "pdf-title"
  33. ]
  34. },
  35. "source": [
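  36. "# k-Nearest Neighbor (kNN) exercise\n",
  37. "\n",
  38. "*Fill in and complete this exercise.*\n",
  39. "\n",
  40. "The kNN classifier consists of two stages, plus a tuning step for k:\n",
  41. "\n",
  42. "- During training, the classifier takes the training data and simply remembers it.\n",
  43. "- During testing, kNN classifies each test image by comparing it with all training images and using the labels of the k most similar training examples.\n",
  44. "- The value of k is chosen by cross-validation.\n",
  45. "\n",
  46. "In this exercise you will implement these steps and gain an understanding of basic image classification, cross-validation, and writing efficient, vectorized code."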
  47. ]
  48. },
  49. {
  50. "cell_type": "code",
  51. "execution_count": null,
  52. "metadata": {
  53. "tags": [
  54. "pdf-ignore"
  55. ]
  56. },
  57. "outputs": [],
  58. "source": [
  59. "# Run some setup code for this notebook.\n",
  60. "\n",
  61. "import random\n",
  62. "import numpy as np\n",
  63. "from daseCV.data_utils import load_CIFAR10\n",
  64. "import matplotlib.pyplot as plt\n",
  65. "\n",
  66. "# Show matplotlib figures inline in the notebook rather than in a new window.\n",
  67. "%matplotlib inline\n",
  68. "plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots\n",
  69. "plt.rcParams['image.interpolation'] = 'nearest'\n",
  70. "plt.rcParams['image.cmap'] = 'gray'\n",
  71. "\n",
  72. "# Some more magic so that the notebook reloads external Python modules;\n",
  73. "# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython\n",
  74. "%load_ext autoreload\n",
  75. "%autoreload 2"
  76. ]
  77. },
  78. {
  79. "cell_type": "code",
  80. "execution_count": null,
  81. "metadata": {
  82. "tags": [
  83. "pdf-ignore"
  84. ]
  85. },
  86. "outputs": [],
  87. "source": [
  88. "# Load the raw CIFAR-10 data.\n",
  89. "cifar10_dir = 'daseCV/datasets/cifar-10-batches-py'\n",
  90. "\n",
  91. "# Clear previously loaded variables so the data is not loaded multiple times (which may cause memory issues).\n",
  92. "try:\n",
  93. "    del X_train, y_train\n",
  94. "    del X_test, y_test\n",
  95. "    print('Clear previously loaded data.')\n",
  96. "except NameError:\n",
  97. "    pass\n",
  98. "\n",
  99. "X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir)\n",
  100. "\n",
  101. "# As a sanity check, print the shapes of the training and test data.\n",
  102. "print('Training data shape: ', X_train.shape)\n",
  103. "print('Training labels shape: ', y_train.shape)\n",
  104. "print('Test data shape: ', X_test.shape)\n",
  105. "print('Test labels shape: ', y_test.shape)"
  106. ]
  107. },
  108. {
  109. "cell_type": "code",
  110. "execution_count": null,
  111. "metadata": {
  112. "tags": [
  113. "pdf-ignore"
  114. ]
  115. },
  116. "outputs": [],
  117. "source": [
  118. "# Visualize some examples from the dataset.\n",
  119. "# We show a few examples of training images from each class.\n",
  120. "classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']\n",
  121. "num_classes = len(classes)\n",
  122. "samples_per_class = 7\n",
  123. "for y, cls in enumerate(classes):\n",
  124. "    idxs = np.flatnonzero(y_train == y)  # indices of all training examples whose label is y\n",
  125. "    idxs = np.random.choice(idxs, samples_per_class, replace=False)  # replace=False samples without repetition\n",
  126. "    for i, idx in enumerate(idxs):\n",
  127. "        plt_idx = i * num_classes + y + 1\n",
  128. "        plt.subplot(samples_per_class, num_classes, plt_idx)\n",
  129. "        plt.imshow(X_train[idx].astype('uint8'))\n",
  130. "        plt.axis('off')\n",
  131. "        if i == 0:\n",
  132. "            plt.title(cls)\n",
  133. "plt.show()"
  134. ]
  135. },
  136. {
  137. "cell_type": "code",
  138. "execution_count": null,
  139. "metadata": {
  140. "tags": [
  141. "pdf-ignore"
  142. ]
  143. },
  144. "outputs": [],
  145. "source": [
  146. "# Subsample the data so the code in this exercise runs faster.\n",
  147. "num_training = 5000\n",
  148. "mask = list(range(num_training))\n",
  149. "X_train = X_train[mask]\n",
  150. "y_train = y_train[mask]\n",
  151. "\n",
  152. "num_test = 500\n",
  153. "mask = list(range(num_test))\n",
  154. "X_test = X_test[mask]\n",
  155. "y_test = y_test[mask]\n",
  156. "\n",
  157. "# Reshape the image data into rows.\n",
  158. "X_train = np.reshape(X_train, (X_train.shape[0], -1))\n",
  159. "X_test = np.reshape(X_test, (X_test.shape[0], -1))\n",
  160. "print(X_train.shape, X_test.shape)"
  161. ]
  162. },
  163. {
  164. "cell_type": "code",
  165. "execution_count": null,
  166. "metadata": {
  167. "tags": [
  168. "pdf-ignore"
  169. ]
  170. },
  171. "outputs": [],
  172. "source": [
  173. "from daseCV.classifiers import KNearestNeighbor\n",
  174. "\n",
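  175. "# Create a kNN classifier instance.\n",
  176. "# Remember that training a kNN classifier is a no-op:\n",
  177. "# the classifier simply remembers the data and does no further processing.\n",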
  178. "classifier = KNearestNeighbor()\n",
  179. "classifier.train(X_train, y_train)"
  180. ]
  181. },
  182. {
  183. "cell_type": "markdown",
  184. "metadata": {},
  185. "source": [
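  186. "We would now like to classify the test data with the kNN classifier. Recall that we can break this process down into two steps:\n",
  187. "\n",
  188. "1. First, we must compute the distances between all test examples and all training examples.\n",
  189. "2. Given these distances, for each test example we find the k nearest training examples and have them vote for the label.\n",
  190. "\n",
  191. "Let's begin by computing the distance matrix between all training and test examples. With **Ntr** training examples and **Nte** test examples, this stage produces an **Nte x Ntr** matrix in which element (i,j) is the distance between the i-th test example and the j-th training example.\n",
  192. "\n",
  193. "**Note: for the three distance computations in this notebook, do not use the np.linalg.norm() function that numpy provides.**\n",
  194. "\n",
  195. "First, open `daseCV/classifiers/k_nearest_neighbor.py` and implement the function `compute_distances_two_loops`, which computes the distance matrix with a (very inefficient) double loop."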
  196. ]
  197. },
  198. {
  199. "cell_type": "code",
  200. "execution_count": null,
  201. "metadata": {},
  202. "outputs": [],
  203. "source": [
  204. "# Open daseCV/classifiers/k_nearest_neighbor.py and implement\n",
  205. "# compute_distances_two_loops.\n",
  206. "\n",
  207. "# Test your implementation:\n",
  208. "dists = classifier.compute_distances_two_loops(X_test)\n",
  209. "print(dists.shape)"
  210. ]
  211. },
  212. {
  213. "cell_type": "code",
  214. "execution_count": null,
  215. "metadata": {},
  216. "outputs": [],
  217. "source": [
  218. "# Visualize the distance matrix: each row holds the distances from one test example to the training examples.\n",
  219. "plt.imshow(dists, interpolation='none')\n",
  220. "plt.show()"
  221. ]
  222. },
  223. {
  224. "cell_type": "markdown",
  225. "metadata": {
  226. "tags": [
  227. "pdf-inline"
  228. ]
  229. },
  230. "source": [
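  231. "**Question 1**\n",
  232. "\n",
  233. "Notice the structured patterns in the distance matrix, where some rows or columns are visibly brighter. (Note that with the default color scheme, black indicates low distances and white indicates high distances.)\n",
  234. "\n",
  235. "- What in the data causes the distinctly bright rows?\n",
  236. "- What causes the bright columns?\n",
  237. "\n",
  238. "$\\color{blue}{\\textit Your answer:}$ *write your answer here*\n",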
  239. "\n"
  240. ]
  241. },
  242. {
  243. "cell_type": "code",
  244. "execution_count": null,
  245. "metadata": {},
  246. "outputs": [],
  247. "source": [
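  248. "# Now implement the function predict_labels and run the code below.\n",
  249. "# We use k = 1 (which is plain nearest neighbor).\n",
  250. "y_test_pred = classifier.predict_labels(dists, k=1)\n",
  251. "\n",
  252. "# Compute and print the fraction of correctly predicted examples\n",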
  253. "num_correct = np.sum(y_test_pred == y_test)\n",
  254. "accuracy = float(num_correct) / num_test\n",
  255. "print('Got %d / %d correct => accuracy: %f' % (num_correct, num_test, accuracy))"
  256. ]
  257. },
  258. {
  259. "cell_type": "markdown",
  260. "metadata": {},
  261. "source": [
  262. "You should expect to see an accuracy of approximately `27%`. Now let's try a larger `k`, say `k = 5`:"
  263. ]
  264. },
  265. {
  266. "cell_type": "code",
  267. "execution_count": null,
  268. "metadata": {},
  269. "outputs": [],
  270. "source": [
  271. "y_test_pred = classifier.predict_labels(dists, k=5)\n",
  272. "num_correct = np.sum(y_test_pred == y_test)\n",
  273. "accuracy = float(num_correct) / num_test\n",
  274. "print('Got %d / %d correct => accuracy: %f' % (num_correct, num_test, accuracy))"
  275. ]
  276. },
  277. {
  278. "cell_type": "markdown",
  279. "metadata": {},
  280. "source": [
  281. "You should see a slightly better result than with `k = 1`."
  282. ]
  283. },
  284. {
  285. "cell_type": "markdown",
  286. "metadata": {
  287. "tags": [
  288. "pdf-inline"
  289. ]
  290. },
  291. "source": [
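  292. "**Question 2**\n",
  293. "\n",
  294. "We can also use other distance metrics, such as the L1 distance.\n",
  295. "\n",
  296. "Let $p_{ij}^{(k)}$ denote the pixel value at location $(i,j)$ of image $I_k$.\n",
  297. "\n",
  298. "The mean $\\mu$ over all pixels of all images is\n",
  299. "\n",
  300. "$$\\mu=\\frac{1}{nhw}\\sum_{k=1}^n\\sum_{i=1}^{h}\\sum_{j=1}^{w}p_{ij}^{(k)}$$\n",
  301. "\n",
  302. "and the pixel-wise mean $\\mu_{ij}$ across all images is\n",
  303. "\n",
  304. "$$\\mu_{ij}=\\frac{1}{n}\\sum_{k=1}^np_{ij}^{(k)}.$$\n",
  305. "\n",
  306. "The overall standard deviation $\\sigma$ and the pixel-wise standard deviation $\\sigma_{ij}$ are defined similarly.\n",
  307. "\n",
  308. "Which of the following preprocessing steps will not change the performance of a nearest-neighbor classifier that uses L1 distance? Select all that apply.\n",
  309. "1. Subtracting the mean $\\mu$ ($\\tilde{p}_{ij}^{(k)}=p_{ij}^{(k)}-\\mu$.)\n",
  310. "2. Subtracting the pixel-wise mean $\\mu_{ij}$ ($\\tilde{p}_{ij}^{(k)}=p_{ij}^{(k)}-\\mu_{ij}$.)\n",
  311. "3. Subtracting the mean $\\mu$ and then dividing by the standard deviation $\\sigma$.\n",
  312. "4. Subtracting the pixel-wise mean $\\mu_{ij}$ and dividing by the pixel-wise standard deviation $\\sigma_{ij}$.\n",
  313. "5. Rotating the coordinate axes of the data.\n",
  314. "\n",
  315. "$\\color{blue}{\\textit Your answer:}$\n",
  316. "\n",
  317. "\n",
  318. "$\\color{blue}{\\textit Your explanation:}$\n"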
  319. ]
  320. },
  321. {
  322. "cell_type": "code",
  323. "execution_count": null,
  324. "metadata": {
  325. "tags": [
  326. "pdf-ignore-input"
  327. ]
  328. },
  329. "outputs": [],
  330. "source": [
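  331. "# Now speed up the distance matrix computation with partial vectorization and a single loop.\n",
  332. "# Implement the function compute_distances_one_loop and run the code below:\n",
  333. "\n",
  334. "dists_one = classifier.compute_distances_one_loop(X_test)\n",
  335. "\n",
  336. "# To make sure the vectorized implementation is correct, we check that it agrees with the naive implementation.\n",
  337. "# There are many ways to decide whether two matrices are similar; one of the simplest is the Frobenius norm.\n",
  338. "# In case you have not seen it before, the Frobenius norm of the difference between two matrices is the square root of the sum of the squared differences of all their elements;\n",
  339. "# in other words, reshape the matrices into vectors and compute the Euclidean distance between them.\n",
  340. "\n",
  341. "difference = np.linalg.norm(dists - dists_one, ord='fro')\n",
  342. "print('One loop difference was: %f' % (difference, ))\n",
  343. "if difference < 0.001:\n",
  344. "    print('Good! The distance matrices are the same')\n",
  345. "else:\n",
  346. "    print('Uh-oh! The distance matrices are different')"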
  347. ]
  348. },
  349. {
  350. "cell_type": "code",
  351. "execution_count": null,
  352. "metadata": {
  353. "scrolled": true,
  354. "tags": [
  355. "pdf-ignore-input"
  356. ]
  357. },
  358. "outputs": [],
  359. "source": [
  360. "# Now implement the fully vectorized version in compute_distances_no_loops and run the code\n",
  361. "dists_two = classifier.compute_distances_no_loops(X_test)\n",
  362. "\n",
  363. "# Check that the distance matrix agrees with the one we computed before:\n",
  364. "difference = np.linalg.norm(dists - dists_two, ord='fro')\n",
  365. "print('No loop difference was: %f' % (difference, ))\n",
  366. "if difference < 0.001:\n",
  367. "    print('Good! The distance matrices are the same')\n",
  368. "else:\n",
  369. "    print('Uh-oh! The distance matrices are different')"
  370. ]
  371. },
  372. {
  373. "cell_type": "code",
  374. "execution_count": null,
  375. "metadata": {
  376. "tags": [
  377. "pdf-ignore-input"
  378. ]
  379. },
  380. "outputs": [],
  381. "source": [
  382. "# Let's compare the speed of the three implementations\n",
  383. "def time_function(f, *args):\n",
  384. "    \"\"\"\n",
  385. "    Call a function f with args and return the time (in seconds) that it took to execute.\n",
  386. "    \"\"\"\n",
  387. "    import time\n",
  388. "    tic = time.time()\n",
  389. "    f(*args)\n",
  390. "    toc = time.time()\n",
  391. "    return toc - tic\n",
  392. "\n",
  393. "two_loop_time = time_function(classifier.compute_distances_two_loops, X_test)\n",
  394. "print('Two loop version took %f seconds' % two_loop_time)\n",
  395. "\n",
  396. "one_loop_time = time_function(classifier.compute_distances_one_loop, X_test)\n",
  397. "print('One loop version took %f seconds' % one_loop_time)\n",
  398. "\n",
  399. "no_loop_time = time_function(classifier.compute_distances_no_loops, X_test)\n",
  400. "print('No loop version took %f seconds' % no_loop_time)\n",
  401. "\n",
  402. "# You should see significantly faster performance with the fully vectorized implementation!\n",
  403. "\n",
  404. "# NOTE: on some machines you may not see a speedup when going from\n",
  405. "# two loops to one loop, and you may even see a slowdown."
  406. ]
  407. },
  408. {
  409. "cell_type": "markdown",
  410. "metadata": {},
  411. "source": [
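  412. "### Cross-validation\n",
  413. "\n",
  414. "We have implemented the kNN classifier and tried it with k = 5. We will now determine the best value of this hyperparameter with cross-validation."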
  415. ]
  416. },
  417. {
  418. "cell_type": "code",
  419. "execution_count": null,
  420. "metadata": {
  421. "tags": [
  422. "code"
  423. ]
  424. },
  425. "outputs": [],
  426. "source": [
  427. "num_folds = 5\n",
  428. "k_choices = [1, 3, 5, 8, 10, 12, 15, 20, 50, 100]\n",
  429. "\n",
  430. "X_train_folds = []\n",
  431. "y_train_folds = []\n",
  432. "################################################################################\n",
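  433. "# TODO:\n",
  434. "# Split up the training data into folds. After splitting, X_train_folds and y_train_folds should each be a list of length num_folds,\n",
  435. "# where y_train_folds[i] is the label vector for the points in X_train_folds[i].\n",
  436. "# Hint: look up numpy's array_split function.\n",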
  437. "################################################################################\n",
  438. "# *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****\n",
  439. "\n",
  440. "X_train_folds, y_train_folds = np.array_split(X_train, num_folds), np.array_split(y_train, num_folds)\n",
  441. "\n",
  442. "# *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****\n",
  443. "\n",
  444. "# A dictionary holding the accuracies for different values of k that we find when running cross-validation.\n",
  445. "# After running cross-validation, k_to_accuracies[k] should be a list of length num_folds\n",
  446. "# holding the accuracy values found when using that value of k.\n",
  447. "k_to_accuracies = {}\n",
  448. "\n",
  449. "\n",
  450. "################################################################################\n",
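  451. "# TODO:\n",
  452. "# Perform cross-validation over k to find the best value of k.\n",
  453. "# For each possible value of k, run the k-nearest-neighbor algorithm num_folds times,\n",
  454. "# where in each run you use all but one of the folds as training data and the remaining fold as the validation set.\n",
  455. "# Store all of the resulting accuracies in k_to_accuracies[k].\n",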
  456. "################################################################################\n",
  457. "# *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****\n",
  458. "\n",
  459. "# Cross-validation. When the training set is small (and the validation set therefore even smaller),\n",
  460. "# people sometimes use a more involved technique called cross-validation. Rather than holding out\n",
  461. "# one fixed set of validation images, we split the training set into 5 equal folds, train on 4 of\n",
  462. "# them and validate on the remaining one. We then cycle through which fold is held out and take\n",
  463. "# the average of the 5 validation results as the validation performance.\n",
  464. "\n",
  465. "for k in k_choices:\n",
  466. "    k_to_accuracies[k] = []\n",
  467. "    for i in range(num_folds):\n",
  468. "        # prepare training data for the current fold\n",
  469. "        X_train_fold = np.concatenate([fold for j, fold in enumerate(X_train_folds) if i != j])\n",
  470. "        y_train_fold = np.concatenate([fold for j, fold in enumerate(y_train_folds) if i != j])\n",
  471. "\n",
  472. "        # use of k-nearest-neighbor algorithm\n",
  473. "        classifier.train(X_train_fold, y_train_fold)\n",
  474. "        y_pred_fold = classifier.predict(X_train_folds[i], k=k, num_loops=0)\n",
  475. "\n",
  476. "        # Compute the fraction of correctly predicted examples\n",
  477. "        num_correct = np.sum(y_pred_fold == y_train_folds[i])\n",
  478. "        accuracy = float(num_correct) / X_train_folds[i].shape[0]\n",
  479. "        k_to_accuracies[k].append(accuracy)\n",
  480. "\n",
  481. "# *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****\n",
  482. "\n",
  483. "# Print out the computed accuracies\n",
  484. "for k in sorted(k_to_accuracies):\n",
  485. "    for accuracy in k_to_accuracies[k]:\n",
  486. "        print('k = %d, accuracy = %f' % (k, accuracy))"
  487. ]
  488. },
  489. {
  490. "cell_type": "code",
  491. "execution_count": null,
  492. "metadata": {
  493. "tags": [
  494. "pdf-ignore-input"
  495. ]
  496. },
  497. "outputs": [],
  498. "source": [
  499. "# Plot the raw observations\n",
  500. "for k in k_choices:\n",
  501. "    accuracies = k_to_accuracies[k]\n",
  502. "    plt.scatter([k] * len(accuracies), accuracies)\n",
  503. "\n",
  504. "# Plot the trend line with error bars that correspond to the standard deviation\n",
  505. "accuracies_mean = np.array([np.mean(v) for k,v in sorted(k_to_accuracies.items())])\n",
  506. "accuracies_std = np.array([np.std(v) for k,v in sorted(k_to_accuracies.items())])\n",
  507. "plt.errorbar(k_choices, accuracies_mean, yerr=accuracies_std)\n",
  508. "plt.title('Cross-validation on k')\n",
  509. "plt.xlabel('k')\n",
  510. "plt.ylabel('Cross-validation accuracy')\n",
  511. "plt.show()"
  512. ]
  513. },
  514. {
  515. "cell_type": "code",
  516. "execution_count": null,
  517. "metadata": {},
  518. "outputs": [],
  519. "source": [
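  520. "# Based on the cross-validation results above, choose the best value for k, retrain the classifier on all the training data,\n",
  521. "# and evaluate it on the test data. You should be able to get above 28% accuracy on the test data.\n",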
  522. "\n",
  523. "best_k = k_choices[accuracies_mean.argmax()]\n",
  524. "\n",
  525. "classifier = KNearestNeighbor()\n",
  526. "classifier.train(X_train, y_train)\n",
  527. "y_test_pred = classifier.predict(X_test, k=best_k)\n",
  528. "\n",
  529. "# Compute and display the accuracy\n",
  530. "num_correct = np.sum(y_test_pred == y_test)\n",
  531. "accuracy = float(num_correct) / num_test\n",
  532. "print('Got %d / %d correct => accuracy: %f' % (num_correct, num_test, accuracy))"
  533. ]
  534. },
  535. {
  536. "cell_type": "markdown",
  537. "metadata": {
  538. "tags": [
  539. "pdf-inline"
  540. ]
  541. },
  542. "source": [
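  543. "**Question 3**\n",
  544. "\n",
  545. "Which of the following statements about $k$-NN are true in a classification setting, and for all $k$? Select all that apply.\n",
  546. "\n",
  547. "1. The decision boundary of the k-NN classifier is linear.\n",
  548. "2. The training error of 1-NN will always be lower than that of 5-NN.\n",
  549. "3. The test error of 1-NN will always be lower than that of 5-NN.\n",
  550. "4. The time needed to classify a test example with the k-NN classifier grows with the size of the training set.\n",
  551. "5. None of the above.\n",
  552. "\n",
  553. "$\\color{blue}{\\textit Your answer:}$\n",
  554. "\n",
  555. "\n",
  556. "$\\color{blue}{\\textit Your explanation:}$\n",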
  557. "\n"
  558. ]
  559. },
  560. {
  561. "cell_type": "markdown",
  562. "metadata": {},
  563. "source": [
  564. "---\n",
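  565. "# Important\n",
  566. "\n",
  567. "This is the end of the assignment. Please do the following:\n",
  568. "\n",
  569. "1. Click `File -> Save` or press `control+s` to make sure the latest version of your notebook is saved to Google Drive.\n",
  570. "2. Run the cell below to make sure the `.py` files are saved back to your Google Drive."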
  571. ]
  572. },
  573. {
  574. "cell_type": "code",
  575. "execution_count": null,
  576. "metadata": {},
  577. "outputs": [],
  578. "source": [
  579. "import os\n",
  580. "\n",
  581. "FOLDER_TO_SAVE = os.path.join('drive/My Drive/', FOLDERNAME)\n",
  582. "FILES_TO_SAVE = ['daseCV/classifiers/k_nearest_neighbor.py']\n",
  583. "\n",
  584. "for files in FILES_TO_SAVE:\n",
  585. "    with open(os.path.join(FOLDER_TO_SAVE, '/'.join(files.split('/')[1:])), 'w') as f:\n",
  586. "        f.write(''.join(open(files).readlines()))"
  587. ]
  588. }
  589. ],
  590. "metadata": {
  591. "kernelspec": {
  592. "display_name": "Python 3",
  593. "language": "python",
  594. "name": "python3"
  595. },
  596. "language_info": {
  597. "codemirror_mode": {
  598. "name": "ipython",
  599. "version": 3
  600. },
  601. "file_extension": ".py",
  602. "mimetype": "text/x-python",
  603. "name": "python",
  604. "nbconvert_exporter": "python",
  605. "pygments_lexer": "ipython3",
  606. "version": "3.7.0"
  607. }
  608. },
  609. "nbformat": 4,
  610. "nbformat_minor": 1
  611. }