DaSE-Computer-Vision-2021


  1. {
  2. "cells": [
  3. {
  4. "cell_type": "code",
  5. "execution_count": null,
  6. "metadata": {},
  7. "outputs": [],
  8. "source": [
  9. "from google.colab import drive\n",
  10. "\n",
  11. "drive.mount('/content/drive', force_remount=True)\n",
  12. "\n",
  13. "# Enter the path to the daseCV folder\n",
  14. "# The 'daseCV' folder contains the '.py' files and the 'classifiers' and 'datasets' folders\n",
  15. "# e.g. 'CV/assignments/assignment1/daseCV/'\n",
  16. "FOLDERNAME = None\n",
  17. "\n",
  18. "assert FOLDERNAME is not None, \"[!] Enter the foldername.\"\n",
  19. "\n",
  20. "%cd drive/My\\ Drive\n",
  21. "%cp -r $FOLDERNAME ../../\n",
  22. "%cd ../../\n",
  23. "%cd daseCV/datasets/\n",
  24. "!bash get_datasets.sh\n",
  25. "%cd ../../"
  26. ]
  27. },
  28. {
  29. "cell_type": "markdown",
  30. "metadata": {
  31. "tags": [
  32. "pdf-title"
  33. ]
  34. },
  35. "source": [
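  36. "# Image Features Exercise\n",
  37. "*Fill in and complete this exercise.*\n",
  38. "\n",
  39. "We have seen that training a linear classifier on the pixels of the input image achieves reasonable performance on an image classification task. In this exercise we will show that classification performance can be improved by training the linear classifier not on raw pixels but on features computed from the raw pixels.\n",
  40. "\n",
  41. "You will do all of the work for this exercise in this notebook."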
  42. ]
  43. },
  44. {
  45. "cell_type": "code",
  46. "execution_count": null,
  47. "metadata": {
  48. "tags": [
  49. "pdf-ignore"
  50. ]
  51. },
  52. "outputs": [],
  53. "source": [
  54. "import random\n",
  55. "import numpy as np\n",
  56. "from daseCV.data_utils import load_CIFAR10\n",
  57. "import matplotlib.pyplot as plt\n",
  58. "\n",
  59. "\n",
  60. "%matplotlib inline\n",
  61. "plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots\n",
  62. "plt.rcParams['image.interpolation'] = 'nearest'\n",
  63. "plt.rcParams['image.cmap'] = 'gray'\n",
  64. "\n",
  65. "# for auto-reloading external modules\n",
  66. "# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython\n",
  67. "%load_ext autoreload\n",
  68. "%autoreload 2"
  69. ]
  70. },
  71. {
  72. "cell_type": "markdown",
  73. "metadata": {
  74. "tags": [
  75. "pdf-ignore"
  76. ]
  77. },
  78. "source": [
  79. "## Load data\n",
  80. "As in previous exercises, we will load the CIFAR-10 data from disk."
  81. ]
  82. },
  83. {
  84. "cell_type": "code",
  85. "execution_count": null,
  86. "metadata": {
  87. "tags": [
  88. "pdf-ignore"
  89. ]
  90. },
  91. "outputs": [],
  92. "source": [
  93. "from daseCV.features import color_histogram_hsv, hog_feature\n",
  94. "\n",
  95. "def get_CIFAR10_data(num_training=49000, num_validation=1000, num_test=1000):\n",
  96. "    # Load the raw CIFAR-10 data\n",
  97. "    cifar10_dir = 'daseCV/datasets/cifar-10-batches-py'\n",
  98. "\n",
  99. "    # Cleaning up variables to prevent loading data multiple times (which may cause memory issue)\n",
  100. "    try:\n",
  101. "        del X_train, y_train\n",
  102. "        del X_test, y_test\n",
  103. "        print('Clear previously loaded data.')\n",
  104. "    except:\n",
  105. "        pass\n",
  106. "\n",
  107. "    X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir)\n",
  108. "    \n",
  109. "    # Subsample the data\n",
  110. "    mask = list(range(num_training, num_training + num_validation))\n",
  111. "    X_val = X_train[mask]\n",
  112. "    y_val = y_train[mask]\n",
  113. "    mask = list(range(num_training))\n",
  114. "    X_train = X_train[mask]\n",
  115. "    y_train = y_train[mask]\n",
  116. "    mask = list(range(num_test))\n",
  117. "    X_test = X_test[mask]\n",
  118. "    y_test = y_test[mask]\n",
  119. "    \n",
  120. "    return X_train, y_train, X_val, y_val, X_test, y_test\n",
  121. "\n",
  122. "X_train, y_train, X_val, y_val, X_test, y_test = get_CIFAR10_data()"
  123. ]
  124. },
  125. {
  126. "cell_type": "markdown",
  127. "metadata": {
  128. "tags": [
  129. "pdf-ignore"
  130. ]
  131. },
  132. "source": [
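  133. "## Extract Features\n",
  134. "For each image we will compute a Histogram of Oriented Gradients (HOG) as well as a color histogram using the hue channel in HSV color space.\n",
  135. "\n",
  136. "Roughly speaking, HOG captures the texture of the image while ignoring color information, and the color histogram captures the color of the image while ignoring texture.\n",
  137. "We therefore expect that using both together will work better than using either alone. Testing this idea is an interesting exercise in its own right.\n",
  138. "\n",
  139. "The `hog_feature` and `color_histogram_hsv` functions both operate on a single image and return a feature vector for that image.\n",
  140. "The `extract_features` function takes a set of images and a list of feature functions, evaluates each feature function on each image,\n",
  141. "and stores the results in a matrix in which each row is the concatenation of all feature vectors for a single image."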
  142. ]
  143. },
  144. {
  145. "cell_type": "code",
  146. "execution_count": null,
  147. "metadata": {
  148. "scrolled": true,
  149. "tags": [
  150. "pdf-ignore"
  151. ]
  152. },
  153. "outputs": [],
  154. "source": [
  155. "from daseCV.features import *\n",
  156. "\n",
  157. "num_color_bins = 10 # Number of bins in the color histogram\n",
  158. "feature_fns = [hog_feature, lambda img: color_histogram_hsv(img, nbin=num_color_bins)]\n",
  159. "X_train_feats = extract_features(X_train, feature_fns, verbose=True)\n",
  160. "X_val_feats = extract_features(X_val, feature_fns)\n",
  161. "X_test_feats = extract_features(X_test, feature_fns)\n",
  162. "\n",
  163. "# Preprocessing: Subtract the mean feature\n",
  164. "mean_feat = np.mean(X_train_feats, axis=0, keepdims=True)\n",
  165. "X_train_feats -= mean_feat\n",
  166. "X_val_feats -= mean_feat\n",
  167. "X_test_feats -= mean_feat\n",
  168. "\n",
  169. "# Preprocessing: Divide by standard deviation. This ensures that each feature\n",
  170. "# has roughly the same scale.\n",
  171. "std_feat = np.std(X_train_feats, axis=0, keepdims=True)\n",
  172. "X_train_feats /= std_feat\n",
  173. "X_val_feats /= std_feat\n",
  174. "X_test_feats /= std_feat\n",
  175. "\n",
  176. "# Preprocessing: Add a bias dimension\n",
  177. "X_train_feats = np.hstack([X_train_feats, np.ones((X_train_feats.shape[0], 1))])\n",
  178. "X_val_feats = np.hstack([X_val_feats, np.ones((X_val_feats.shape[0], 1))])\n",
  179. "X_test_feats = np.hstack([X_test_feats, np.ones((X_test_feats.shape[0], 1))])"
  180. ]
  181. },
  182. {
  183. "cell_type": "markdown",
  184. "metadata": {},
  185. "source": [
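  186. "## Train SVM on features\n",
  187. "Using the multiclass SVM code completed in the earlier assignment, train an SVM on the features extracted above. This should achieve better results than training the SVM directly on the raw data."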
  188. ]
  189. },
  190. {
  191. "cell_type": "code",
  192. "execution_count": null,
  193. "metadata": {
  194. "tags": [
  195. "code"
  196. ]
  197. },
  198. "outputs": [],
  199. "source": [
  200. "# Use the validation set to tune the learning rate and regularization strength\n",
  201. "\n",
  202. "from daseCV.classifiers.linear_classifier import LinearSVM\n",
  203. "\n",
  204. "learning_rates = [1e-9, 1e-8, 1e-7]\n",
  205. "regularization_strengths = [5e4, 5e5, 5e6]\n",
  206. "\n",
  207. "results = {}\n",
  208. "best_val = -1\n",
  209. "best_svm = None\n",
  210. "\n",
  211. "################################################################################\n",
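  212. "# TODO: \n",
  213. "# Use the validation set to set the learning rate and regularization strength.\n",
  214. "# This should be identical to the validation that you did for the SVM;\n",
  215. "# save the best trained classifier in best_svm.\n",
  216. "# You might also want to try different numbers of bins in the color histogram.\n",
  217. "# If you are careful you should be able to get close to 0.44 accuracy on the validation set. \n",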
  218. "################################################################################\n",
  219. "# *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****\n",
  220. "\n",
  221. "pass\n",
  222. "\n",
  223. "# *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****\n",
  224. "\n",
  225. "# Print out results.\n",
  226. "for lr, reg in sorted(results):\n",
  227. "    train_accuracy, val_accuracy = results[(lr, reg)]\n",
  228. "    print('lr %e reg %e train accuracy: %f val accuracy: %f' % (\n",
  229. "                lr, reg, train_accuracy, val_accuracy))\n",
  230. "    \n",
  231. "print('best validation accuracy achieved during cross-validation: %f' % best_val)"
  232. ]
  233. },
  234. {
  235. "cell_type": "code",
  236. "execution_count": null,
  237. "metadata": {},
  238. "outputs": [],
  239. "source": [
  240. "# Evaluate your trained SVM on the test set\n",
  241. "y_test_pred = best_svm.predict(X_test_feats)\n",
  242. "test_accuracy = np.mean(y_test == y_test_pred)\n",
  243. "print(test_accuracy)"
  244. ]
  245. },
  246. {
  247. "cell_type": "code",
  248. "execution_count": null,
  249. "metadata": {},
  250. "outputs": [],
  251. "source": [
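  252. "# An important way to gain intuition about how an algorithm works is to visualize the mistakes that it makes.\n",
  253. "# In this visualization, we show examples of images that are misclassified by our current system.\n",
  254. "# The first column shows images that our system labeled as 'plane' but whose true label is not 'plane'.\n",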
  255. "\n",
  256. "examples_per_class = 8\n",
  257. "classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']\n",
  258. "for cls, cls_name in enumerate(classes):\n",
  259. "    idxs = np.where((y_test != cls) & (y_test_pred == cls))[0]\n",
  260. "    idxs = np.random.choice(idxs, examples_per_class, replace=False)\n",
  261. "    for i, idx in enumerate(idxs):\n",
  262. "        plt.subplot(examples_per_class, len(classes), i * len(classes) + cls + 1)\n",
  263. "        plt.imshow(X_test[idx].astype('uint8'))\n",
  264. "        plt.axis('off')\n",
  265. "        if i == 0:\n",
  266. "            plt.title(cls_name)\n",
  267. "plt.show()"
  268. ]
  269. },
  270. {
  271. "cell_type": "markdown",
  272. "metadata": {
  273. "tags": [
  274. "pdf-inline"
  275. ]
  276. },
  277. "source": [
  278. "**Question 1:**\n",
  279. "\n",
  280. "Describe the misclassification results that you see. Do they make sense?\n",
  281. "\n",
  282. "$\\color{blue}{\\textit{Answer:}}$ **Write your answer here**"
  283. ]
  284. },
  285. {
  286. "cell_type": "markdown",
  287. "metadata": {},
  288. "source": [
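  289. "## Neural Network on image features\n",
  290. "In the previous exercises we saw that training a two-layer neural network on raw pixels achieved better classification accuracy than a linear classifier. Here we have seen that a linear classifier on image features outperforms a linear classifier on raw pixels.\n",
  291. "For completeness, we should also try training a neural network on image features. This approach should outperform all previous approaches: you should easily be able to achieve over 55% classification accuracy on the test set; our best model achieves about 60% accuracy."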
  292. ]
  293. },
  294. {
  295. "cell_type": "code",
  296. "execution_count": null,
  297. "metadata": {
  298. "tags": [
  299. "pdf-ignore"
  300. ]
  301. },
  302. "outputs": [],
  303. "source": [
  304. "# Preprocessing: Remove the bias dimension\n",
  305. "# Make sure to run this cell only ONCE\n",
  306. "print(X_train_feats.shape)\n",
  307. "X_train_feats = X_train_feats[:, :-1]\n",
  308. "X_val_feats = X_val_feats[:, :-1]\n",
  309. "X_test_feats = X_test_feats[:, :-1]\n",
  310. "\n",
  311. "print(X_train_feats.shape)"
  312. ]
  313. },
  314. {
  315. "cell_type": "code",
  316. "execution_count": null,
  317. "metadata": {
  318. "tags": [
  319. "code"
  320. ]
  321. },
  322. "outputs": [],
  323. "source": [
  324. "from daseCV.classifiers.neural_net import TwoLayerNet\n",
  325. "\n",
  326. "input_dim = X_train_feats.shape[1]\n",
  327. "hidden_dim = 500\n",
  328. "num_classes = 10\n",
  329. "best_acc = 0.0\n",
  330. "\n",
  331. "net = TwoLayerNet(input_dim, hidden_dim, num_classes)\n",
  332. "best_net = None\n",
  333. "\n",
  334. "################################################################################\n",
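  335. "# TODO: Train a two-layer neural network on image features.\n",
  336. "# You may want to cross-validate various parameters as in the previous section.\n",
  337. "# Store your best model in the best_net variable. \n",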
  338. "################################################################################\n",
  339. "# *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****\n",
  340. "\n",
  341. "pass\n",
  342. "\n",
  343. "# *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****\n"
  344. ]
  345. },
  346. {
  347. "cell_type": "code",
  348. "execution_count": null,
  349. "metadata": {},
  350. "outputs": [],
  351. "source": [
  352. "# Run your best neural net classifier on the test set. You should be able to get more than 55% accuracy.\n",
  353. "\n",
  354. "test_acc = (best_net.predict(X_test_feats) == y_test).mean()\n",
  355. "print(test_acc)"
  356. ]
  357. },
  358. {
  359. "cell_type": "markdown",
  360. "metadata": {},
  361. "source": [
  362. "---\n",
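  363. "# IMPORTANT\n",
  364. "\n",
  365. "This is the end of this assignment. Please do the following:\n",
  366. "\n",
  367. "1. Click `File -> Save` or press `control+s` to make sure the latest version of your notebook is saved to Google Drive.\n",
  368. "2. Run the cell below to make sure your `.py` files are saved back to Google Drive."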
  369. ]
  370. },
  371. {
  372. "cell_type": "code",
  373. "execution_count": null,
  374. "metadata": {},
  375. "outputs": [],
  376. "source": [
  377. "import os\n",
  378. "\n",
  379. "FOLDER_TO_SAVE = os.path.join('drive/My Drive/', FOLDERNAME)\n",
  380. "FILES_TO_SAVE = []\n",
  381. "\n",
  382. "for files in FILES_TO_SAVE:\n",
  383. "    with open(os.path.join(FOLDER_TO_SAVE, '/'.join(files.split('/')[1:])), 'w') as f:\n",
  384. "        f.write(''.join(open(files).readlines()))"
  385. ]
  386. }
  387. ],
  388. "metadata": {
  389. "kernelspec": {
  390. "display_name": "Python 3",
  391. "language": "python",
  392. "name": "python3"
  393. },
  394. "language_info": {
  395. "codemirror_mode": {
  396. "name": "ipython",
  397. "version": 3
  398. },
  399. "file_extension": ".py",
  400. "mimetype": "text/x-python",
  401. "name": "python",
  402. "nbconvert_exporter": "python",
  403. "pygments_lexer": "ipython3",
  404. "version": "3.7.0"
  405. }
  406. },
  407. "nbformat": 4,
  408. "nbformat_minor": 1
  409. }
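
The feature-extraction and preprocessing steps in the notebook (one concatenated feature vector per image, mean subtraction, division by the standard deviation, then an appended bias column) can be sketched on toy data as follows. This is only an illustration under stated assumptions: `extract_features_sketch`, `mean_color`, and `intensity_histogram` are hypothetical stand-ins, not the `extract_features`, `hog_feature`, or `color_histogram_hsv` functions from `daseCV.features`, and the random images stand in for CIFAR-10.

```python
import numpy as np


def extract_features_sketch(images, feature_fns):
    """Apply every feature function to every image; returns one row per image."""
    rows = []
    for img in images:
        # Each function returns a 1-D vector; concatenate them for this image.
        rows.append(np.concatenate([np.ravel(fn(img)) for fn in feature_fns]))
    return np.stack(rows).astype(np.float64)


# Hypothetical feature functions, used here only for illustration.
def mean_color(img):
    # Average value of each of the 3 color channels.
    return img.reshape(-1, img.shape[-1]).mean(axis=0)


def intensity_histogram(img, nbin=4):
    # Normalized histogram of per-pixel mean intensity.
    hist, _ = np.histogram(img.mean(axis=-1), bins=nbin, range=(0, 255))
    return hist / hist.sum()


# Random 32x32 RGB images standing in for the real dataset.
rng = np.random.default_rng(0)
train_imgs = rng.integers(0, 256, size=(20, 32, 32, 3)).astype(np.float64)
val_imgs = rng.integers(0, 256, size=(5, 32, 32, 3)).astype(np.float64)

feature_fns = [mean_color, intensity_histogram]
train_feats = extract_features_sketch(train_imgs, feature_fns)
val_feats = extract_features_sketch(val_imgs, feature_fns)

# Statistics come from the training split only and are reused for validation.
mean_feat = train_feats.mean(axis=0, keepdims=True)
std_feat = train_feats.std(axis=0, keepdims=True)
train_feats = (train_feats - mean_feat) / std_feat
val_feats = (val_feats - mean_feat) / std_feat

# Append a constant bias column, as the notebook does before training the SVM.
train_feats = np.hstack([train_feats, np.ones((train_feats.shape[0], 1))])
val_feats = np.hstack([val_feats, np.ones((val_feats.shape[0], 1))])

print(train_feats.shape)  # (20, 8): 3 color means + 4 histogram bins + 1 bias
```

Computing the mean and standard deviation on the training split and applying them unchanged to the other splits mirrors what the notebook does, so validation and test data do not influence the preprocessing.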