{
"cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from google.colab import drive\n",
    "\n",
    "drive.mount('/content/drive', force_remount=True)\n",
    "\n",
    "# Enter the path to the daseCV folder.\n",
    "# The 'daseCV' folder contains the '.py' files and the 'classifiers' and 'datasets' folders.\n",
    "# For example: 'CV/assignments/assignment1/daseCV/'\n",
    "FOLDERNAME = None\n",
    "\n",
    "assert FOLDERNAME is not None, \"[!] Enter the foldername.\"\n",
    "\n",
    "%cd drive/My\\ Drive\n",
    "%cp -r $FOLDERNAME ../../\n",
    "%cd ../../\n",
    "%cd daseCV/datasets/\n",
    "!bash get_datasets.sh\n",
    "%cd ../../"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": [
     "pdf-title"
    ]
   },
   "source": [
    "# Dropout\n",
    "Dropout [1] is a technique for regularizing neural networks by randomly setting some outputs to zero during the forward pass. In this exercise you will implement a dropout layer and modify your fully connected network so that it can optionally use dropout.\n",
    "\n",
    "[1] [Geoffrey E. Hinton et al, \"Improving neural networks by preventing co-adaptation of feature detectors\", arXiv 2012](https://arxiv.org/abs/1207.0580)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": [
     "pdf-ignore"
    ]
   },
   "outputs": [],
   "source": [
    "# As usual, a bit of setup\n",
    "from __future__ import print_function\n",
    "import time\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "from daseCV.classifiers.fc_net import *\n",
    "from daseCV.data_utils import get_CIFAR10_data\n",
    "from daseCV.gradient_check import eval_numerical_gradient, eval_numerical_gradient_array\n",
    "from daseCV.solver import Solver\n",
    "\n",
    "%matplotlib inline\n",
    "plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots\n",
    "plt.rcParams['image.interpolation'] = 'nearest'\n",
    "plt.rcParams['image.cmap'] = 'gray'\n",
    "\n",
    "# for auto-reloading external modules\n",
    "# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython\n",
    "%load_ext autoreload\n",
    "%autoreload 2\n",
    "\n",
    "\n",
    "def rel_error(x, y):\n",
    "    \"\"\" returns relative error \"\"\"\n",
    "    return np.max(np.abs(x - y) / (np.maximum(1e-8, np.abs(x) + np.abs(y))))\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": [
     "pdf-ignore"
    ]
   },
   "outputs": [],
   "source": [
    "# Load the (preprocessed) CIFAR10 data.\n",
    "\n",
    "data = get_CIFAR10_data()\n",
    "for k, v in data.items():\n",
    "    print('%s: ' % k, v.shape)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Dropout forward pass\n",
    "In the file `daseCV/layers.py`, implement the forward pass for dropout. Because dropout behaves differently during training and testing, make sure to implement both modes.\n",
    "\n",
    "Once you have done so, run the cell below to test your implementation."
   ]
  },
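  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To illustrate why the two modes differ, here is a minimal sketch of inverted dropout. It assumes `p` is the probability of keeping a unit, which matches the convention in this notebook that `dropout=1` means no dropout. The name `sketch_dropout_forward`, its signature, and the use of `np.random.default_rng` are hypothetical choices for this illustration; they are not the interface expected in `daseCV/layers.py`.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "\n",
    "def sketch_dropout_forward(x, p, mode, rng):\n",
    "    # Hypothetical helper, not the daseCV API. p is the keep probability.\n",
    "    if mode == 'train':\n",
    "        # Keep each unit with probability p and scale the survivors by 1/p,\n",
    "        # so the expected value of the output equals the input.\n",
    "        mask = (rng.random(x.shape) < p) / p\n",
    "        return x * mask, mask\n",
    "    # Test mode is the identity: nothing is dropped and nothing is rescaled.\n",
    "    return x, None\n",
    "\n",
    "\n",
    "rng = np.random.default_rng(0)\n",
    "x = rng.standard_normal((500, 500)) + 10\n",
    "out, mask = sketch_dropout_forward(x, 0.4, 'train', rng)\n",
    "out_test, _ = sketch_dropout_forward(x, 0.4, 'test', rng)\n",
    "\n",
    "print('Mean of input: ', x.mean())\n",
    "print('Mean of train-time output: ', out.mean())\n",
    "print('Fraction of train-time output set to zero: ', (out == 0).mean())\n",
    "```\n",
    "\n",
    "Under these assumptions the train-time mean should stay close to the input mean, and the fraction of zeros should be close to `1 - p`."
   ]
  },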
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "np.random.seed(231)\n",
    "x = np.random.randn(500, 500) + 10\n",
    "\n",
    "for p in [0.25, 0.4, 0.7]:\n",
    "    out, _ = dropout_forward(x, {'mode': 'train', 'p': p})\n",
    "    out_test, _ = dropout_forward(x, {'mode': 'test', 'p': p})\n",
    "\n",
    "    print('Running tests with p = ', p)\n",
    "    print('Mean of input: ', x.mean())\n",
    "    print('Mean of train-time output: ', out.mean())\n",
    "    print('Mean of test-time output: ', out_test.mean())\n",
    "    print('Fraction of train-time output set to zero: ', (out == 0).mean())\n",
    "    print('Fraction of test-time output set to zero: ', (out_test == 0).mean())\n",
    "    print()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Dropout backward pass\n",
    "In the file `daseCV/layers.py`, implement the backward pass for dropout. Once you have done so, run the following cell to numerically gradient-check your implementation."
   ]
  },
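  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The backward pass follows from the fact that the train-time output is the input multiplied elementwise by a fixed mask. The sketch below assumes the forward pass saved a keep mask already scaled by `1/p`; `sketch_dropout_backward` is a hypothetical name, not the function the assignment expects. It also compares the result against centered finite differences computed with the same mask, which is only meaningful when the mask does not change between forward calls.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "\n",
    "def sketch_dropout_backward(dout, mask):\n",
    "    # Hypothetical helper: mask is the scaled keep mask from the forward pass.\n",
    "    # Since out = x * mask, the elementwise derivative of out w.r.t. x is mask.\n",
    "    return dout * mask\n",
    "\n",
    "\n",
    "rng = np.random.default_rng(0)\n",
    "p = 0.2  # keep probability (assumed convention)\n",
    "x = rng.standard_normal((10, 10)) + 10\n",
    "dout = rng.standard_normal(x.shape)\n",
    "mask = (rng.random(x.shape) < p) / p\n",
    "\n",
    "dx = sketch_dropout_backward(dout, mask)\n",
    "\n",
    "# Centered finite differences, with the mask held fixed.\n",
    "h = 1e-5\n",
    "dx_num = np.zeros_like(x)\n",
    "for idx in np.ndindex(*x.shape):\n",
    "    x_plus, x_minus = x.copy(), x.copy()\n",
    "    x_plus[idx] += h\n",
    "    x_minus[idx] -= h\n",
    "    dx_num[idx] = np.sum((x_plus * mask - x_minus * mask) * dout) / (2 * h)\n",
    "\n",
    "print('max abs difference: ', np.max(np.abs(dx - dx_num)))\n",
    "```\n",
    "\n",
    "Dropped positions receive a gradient of exactly zero, and kept positions receive `dout / p`."
   ]
  },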
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "np.random.seed(231)\n",
    "x = np.random.randn(10, 10) + 10\n",
    "dout = np.random.randn(*x.shape)\n",
    "\n",
    "dropout_param = {'mode': 'train', 'p': 0.2, 'seed': 123}\n",
    "out, cache = dropout_forward(x, dropout_param)\n",
    "dx = dropout_backward(dout, cache)\n",
    "dx_num = eval_numerical_gradient_array(lambda xx: dropout_forward(xx, dropout_param)[0], x, dout)\n",
    "\n",
    "# Error should be around e-10 or less\n",
    "print('dx relative error: ', rel_error(dx, dx_num))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": [
     "pdf-inline"
    ]
   },
   "source": [
    "## Question 1:\n",
    "What happens if we do not use inverted dropout, that is, if we do not divide the values that pass through dropout by `p` at training time? Why does that happen?\n",
    "\n",
    "## Answer:\n",
    "[FILL THIS IN]\n"
   ]
  },
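  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Fully connected nets with dropout\n",
    "\n",
    "Modify `daseCV/classifiers/fc_net.py` to support dropout. Specifically, if the constructor of the network receives a value other than 1 for the `dropout` parameter, the network should add a dropout layer after every ReLU. Once you are done, run the following cell to numerically gradient-check your implementation."
   ]
  },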
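  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a rough picture of where the dropout layer sits, the sketch below runs a single hidden layer as affine, then ReLU, then optional dropout. It assumes `keep_prob` plays the role of the `dropout` argument and that a value of 1 disables the layer. The function `sketch_hidden_layer` and its arguments are hypothetical; the real `FullyConnectedNet` may organize its layers and caches differently.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "\n",
    "def sketch_hidden_layer(x, W, b, keep_prob, mode, rng):\n",
    "    # Hypothetical helper: affine -> ReLU -> optional inverted dropout.\n",
    "    h = np.maximum(0, x @ W + b)\n",
    "    if keep_prob != 1 and mode == 'train':\n",
    "        # Dropout is applied to the ReLU output, only in training mode.\n",
    "        mask = (rng.random(h.shape) < keep_prob) / keep_prob\n",
    "        h = h * mask\n",
    "    return h\n",
    "\n",
    "\n",
    "rng = np.random.default_rng(0)\n",
    "x = rng.standard_normal((4, 15))\n",
    "W = 0.05 * rng.standard_normal((15, 20))\n",
    "b = np.zeros(20)\n",
    "\n",
    "h_plain = sketch_hidden_layer(x, W, b, 1, 'train', rng)\n",
    "h_train = sketch_hidden_layer(x, W, b, 0.5, 'train', rng)\n",
    "h_test = sketch_hidden_layer(x, W, b, 0.5, 'test', rng)\n",
    "\n",
    "print('Output shape: ', h_plain.shape)\n",
    "print('Test mode matches no-dropout output: ', np.array_equal(h_plain, h_test))\n",
    "```\n",
    "\n",
    "With `keep_prob=0.5`, each train-time activation is either zero or twice its no-dropout value, while the test-time output is unchanged."
   ]
  },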
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "np.random.seed(231)\n",
    "N, D, H1, H2, C = 2, 15, 20, 30, 10\n",
    "X = np.random.randn(N, D)\n",
    "y = np.random.randint(C, size=(N,))\n",
    "\n",
    "for dropout in [1, 0.75, 0.5]:\n",
    "    print('Running check with dropout = ', dropout)\n",
    "    model = FullyConnectedNet([H1, H2], input_dim=D, num_classes=C,\n",
    "                              weight_scale=5e-2, dtype=np.float64,\n",
    "                              dropout=dropout, seed=123)\n",
    "\n",
    "    loss, grads = model.loss(X, y)\n",
    "    print('Initial loss: ', loss)\n",
    "\n",
    "    # Relative errors should be around e-6 or less; Note that it's fine\n",
    "    # if for dropout=1 you have W2 error be on the order of e-5.\n",
    "    for name in sorted(grads):\n",
    "        f = lambda _: model.loss(X, y)[0]\n",
    "        grad_num = eval_numerical_gradient(f, model.params[name], verbose=False, h=1e-5)\n",
    "        print('%s relative error: %.2e' % (name, rel_error(grad_num, grads[name])))\n",
    "    print()"
   ]
  },
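  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Regularization experiment\n",
    "As an experiment, we will train a pair of two-layer networks on 500 training examples: one without dropout and one with a keep probability of 0.25. We will then visualize the training and validation accuracies of the two networks."
   ]
  },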
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "# Train two identical nets, one with dropout and one without\n",
    "np.random.seed(231)\n",
    "num_train = 500\n",
    "small_data = {\n",
    "    'X_train': data['X_train'][:num_train],\n",
    "    'y_train': data['y_train'][:num_train],\n",
    "    'X_val': data['X_val'],\n",
    "    'y_val': data['y_val'],\n",
    "}\n",
    "\n",
    "solvers = {}\n",
    "dropout_choices = [1, 0.25]\n",
    "for dropout in dropout_choices:\n",
    "    model = FullyConnectedNet([500], dropout=dropout)\n",
    "    print(dropout)\n",
    "\n",
    "    solver = Solver(model, small_data,\n",
    "                    num_epochs=25, batch_size=100,\n",
    "                    update_rule='adam',\n",
    "                    optim_config={\n",
    "                        'learning_rate': 5e-4,\n",
    "                    },\n",
    "                    verbose=True, print_every=100)\n",
    "    solver.train()\n",
    "    solvers[dropout] = solver\n",
    "    print()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Plot train and validation accuracies of the two models\n",
    "\n",
    "train_accs = []\n",
    "val_accs = []\n",
    "for dropout in dropout_choices:\n",
    "    solver = solvers[dropout]\n",
    "    train_accs.append(solver.train_acc_history[-1])\n",
    "    val_accs.append(solver.val_acc_history[-1])\n",
    "\n",
    "plt.subplot(3, 1, 1)\n",
    "for dropout in dropout_choices:\n",
    "    plt.plot(solvers[dropout].train_acc_history, 'o', label='%.2f dropout' % dropout)\n",
    "plt.title('Train accuracy')\n",
    "plt.xlabel('Epoch')\n",
    "plt.ylabel('Accuracy')\n",
    "plt.legend(ncol=2, loc='lower right')\n",
    "\n",
    "plt.subplot(3, 1, 2)\n",
    "for dropout in dropout_choices:\n",
    "    plt.plot(solvers[dropout].val_acc_history, 'o', label='%.2f dropout' % dropout)\n",
    "plt.title('Val accuracy')\n",
    "plt.xlabel('Epoch')\n",
    "plt.ylabel('Accuracy')\n",
    "plt.legend(ncol=2, loc='lower right')\n",
    "\n",
    "plt.gcf().set_size_inches(15, 15)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": [
     "pdf-inline"
    ]
   },
   "source": [
    "## Question 2:\n",
    "Compare the validation and training accuracies with and without dropout. What do your results suggest about using dropout as a regularizer?\n",
    "\n",
    "## Answer:\n",
    "[FILL THIS IN]\n"
   ]
  },
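  {
   "cell_type": "markdown",
   "metadata": {
    "tags": [
     "pdf-inline"
    ]
   },
   "source": [
    "## Question 3:\n",
    "Suppose we are training a deep fully connected network for image classification, with dropout after the hidden layers (parameterized by the keep probability `p`). If we are worried about overfitting and decide to decrease the size of the hidden layers (that is, the number of nodes in each layer), how should we modify `p`, if at all?\n",
    "\n",
    "## Answer:\n",
    "[FILL THIS IN]\n"
   ]
  },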
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "# Important\n",
    "\n",
    "This is the end of the assignment. Please do the following:\n",
    "\n",
    "1. Click `File -> Save` or press `control+s` to make sure the latest version of your notebook is saved to Google Drive.\n",
    "2. Run the following cell to make sure your `.py` files are saved back to your Google Drive."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "FOLDER_TO_SAVE = os.path.join('drive/My Drive/', FOLDERNAME)\n",
    "# layers.py is listed because this notebook asks you to edit it.\n",
    "FILES_TO_SAVE = ['daseCV/layers.py', 'daseCV/classifiers/cnn.py', 'daseCV/classifiers/fc_net.py']\n",
    "\n",
    "# Copy each file back to Drive, dropping the leading 'daseCV/' from its path.\n",
    "for files in FILES_TO_SAVE:\n",
    "    target = os.path.join(FOLDER_TO_SAVE, '/'.join(files.split('/')[1:]))\n",
    "    with open(files) as src, open(target, 'w') as dst:\n",
    "        dst.write(src.read())"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}