First experiment with sklearn + PyTorch

This commit is contained in:
yinkanglong_lab
2021-03-25 10:20:16 +08:00
parent f0760dcad4
commit b4eccabfb9
14 changed files with 1239 additions and 84 deletions


@@ -23,7 +23,7 @@
* [2.2. Manifold learning](21.md)
* [2.3. Clustering](22.md)
* [2.4. Biclustering](23.md)
* [2.5. Decomposing signals in components (matrix factorization problems)](24.md)
* [2.5. Dimensionality reduction by decomposition (matrix factorization problems)](24.md)
* [2.6. Covariance estimation](25.md)
* [2.7. Novelty and outlier detection](26.md)
* [2.8. Density estimation](27.md)

13
Sklearn/说明.md Normal file

@@ -0,0 +1,13 @@
## References
> Supplementary sklearn classification experiments, indexed by hands-on project
* cookbook introductory tutorials, indexed by workflow
* official sklearn tutorials, indexed by feature
* sklearn API reference, indexed by library name
## Understanding sklearn
1. By workflow (cookbook): data loading, data processing, classification, result validation, model use (see the sketch after this list)
2. By official documentation (doc), i.e. by feature: data loading, feature selection, supervised learning, unsupervised learning, scoring
3. By API documentation: the individual classes and libraries
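A minimal sketch of that workflow (load → preprocess → classify → validate → use), using scikit-learn's built-in iris dataset and a `LogisticRegression` classifier purely as stand-ins:

```
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# 1. data loading
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 2-3. data processing + classification, chained in one pipeline
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=200))
model.fit(X_train, y_train)

# 4. result validation
print(accuracy_score(y_test, model.predict(X_test)))

# 5. model use
print(model.predict(X_test[:5]))
```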


@@ -2,27 +2,145 @@
> Original: <https://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html>
**Author**: [Soumith Chintala](http://soumith.ch)
<https://www.youtube.com/embed/u7x8RXwLKcA>
## What is PyTorch?
PyTorch is a Python scientific computing framework built for two purposes:
* To be a drop-in replacement for NumPy that uses the power of GPUs to accelerate neural networks.
* To make implementing neural networks easier through automatic differentiation.
## Goals of this tutorial:
* Understand PyTorch's tensors in depth and learn how to build neural networks with PyTorch.
* Train a small neural network yourself to classify images.
## Table of contents
## 1 Table of contents
* [02 PyTorch](02Pytorch.md)
* [03 Tensors](03Tensor.md)
* [04 autograd](04Autograd.md)
* [05 Neural networks](05NN.md)
* [06 Image classifier](06Classification.md)
## 2 Overview
### What is PyTorch?
PyTorch is a Python scientific computing framework built for two purposes:
* To be a drop-in replacement for NumPy that uses the power of GPUs to accelerate neural networks.
* To make implementing neural networks easier through automatic differentiation.
### Goals of this tutorial:
* Understand PyTorch's tensors in depth and learn how to build neural networks with PyTorch.
* Train a small neural network yourself to classify images.
## 3 Modules
> The modules worth knowing for now
1. torch
2. torch.Tensor
3. torch.nn.*
4. torch.nn.functional
5. torch.optim
6. torch.utils.data
7. torch.utils.tensorboard
## 3.1 The torch module
```
import torch
import numpy as np

torch.randn(2, 3)                    # random normal tensor of shape (2, 3)
torch.from_numpy(np.arange(6))       # tensor that shares memory with a NumPy array
torch.linspace(0, 1, steps=5)        # 5 evenly spaced values in [0, 1]
torch.ones(2, 2)                     # tensor filled with ones
torch.eye(3)                         # 3 x 3 identity matrix
```
Contains the data structure for multi-dimensional tensors and the many mathematical operations defined on them, plus utilities such as efficient serialization of tensors and other types. Concretely, it covers creating PyTorch tensors and operating on them (arithmetic, slicing, concatenation, and so on), the activation functions commonly used in neural networks such as sigmoid, relu and tanh, and interoperation with NumPy.
* This module defines a large number of **functions** that operate on tensors and return tensors.
## 3.2 The torch.Tensor class
```
import torch

a = torch.tensor([1.0, 2.0])
# out-of-place vs. in-place variants (the trailing _ modifies a directly)
a.exp()     # returns a new tensor, a is unchanged
a.exp_()    # computes exp in place and overwrites a
```
NumPy is the go-to third-party library for data analysis in Python and is much faster than the built-in math module. PyTorch has an analogous structure, the Tensor; the Tensor is, so to speak, the NumPy of the neural-network world.
* Defines the operations available on a **tensor object** itself, i.e. methods on the tensor.
## 3.3 torch.sparse
NLP tasks typically produce sparse feature matrices. The torch.sparse module defines sparse tensors in COO format: a long tensor records the positions of the non-zero elements and a float tensor records their values. Sparse tensors support element-wise addition, subtraction, multiplication and division as well as matrix multiplication, so tensors whose elements are mostly zero can be stored and processed efficiently.
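A small sketch of the COO layout described above: a long tensor of positions plus a float tensor of values, followed by a sparse-dense matrix product.

```
import torch

# COO format: a 2 x N long tensor of indices plus N values
indices = torch.tensor([[0, 1, 2],    # row indices of the non-zero entries
                        [2, 0, 1]])   # column indices
values = torch.tensor([3.0, 4.0, 5.0])
sparse = torch.sparse_coo_tensor(indices, values, size=(3, 3))

dense = torch.eye(3)
print(torch.sparse.mm(sparse, dense))  # sparse-dense matrix multiplication
print((sparse + sparse).to_dense())    # element-wise ops between sparse tensors
```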
## 3.4 torch.cuda
This module defines the functions related to CUDA, for example checking whether CUDA is available on the system, querying which GPU the current process is using when several GPUs are present, clearing the GPU cache, setting CUDA streams, and synchronizing all kernels running on the GPU.
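A few of those calls, guarded so the snippet still runs on a CPU-only machine:

```
import torch

print(torch.cuda.is_available())        # is a CUDA device usable?
if torch.cuda.is_available():
    print(torch.cuda.device_count())    # number of visible GPUs
    print(torch.cuda.current_device())  # index of the GPU used by this process
    torch.cuda.empty_cache()            # release cached GPU memory back to the driver
    torch.cuda.synchronize()            # wait for all queued kernels to finish
```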
## 3.5 The torch.nn.* modules
torch.nn is the core of PyTorch's modular approach to neural networks. It contains many submodules, including convolutional layers (nn.ConvNd) and linear, i.e. fully connected, layers (nn.Linear). To build a deep learning model, you subclass nn.Module and override its forward method to define a new network (a minimal example follows the list below). torch.nn also defines a family of loss functions, including the mean squared error loss (torch.nn.MSELoss) and the cross-entropy loss (torch.nn.CrossEntropyLoss).
* Defines a set of object-style (stateful) operators, i.e. layers. Each layer object holds its parameters as tensors and may contain submodules; an object represents one layer's inputs, outputs and operation, whereas the function-style form only describes the input-output operation along a single path.
* **nn.Module**: the base class for network objects; provides forward()/parameters()/modules()/zero_grad()
* **nn.Sequential**: runs a sequence of modules in order
* **nn.Conv1d/nn.Conv2d/nn.Conv3d**: convolutional layers
* **nn.MaxPool1d/nn.MaxPool2d/nn.MaxPool3d**: max-pooling layers
* **nn.AvgPool1d/nn.AvgPool2d/nn.AvgPool3d**: average-pooling layers
* **nn.ReLU/nn.ELU/nn.Sigmoid/nn.Tanh/nn.LogSigmoid**: activation layers
* **nn.Softmin/nn.Softmax/nn.Softshrink/nn.Softsign/nn.Softplus**: softmax-style activation layers
* **nn.BatchNorm1d/nn.BatchNorm2d/nn.BatchNorm3d**: batch-normalization layers
* **nn.RNN**: recurrent layer
* **nn.functional**: the function-style operators; see the next section
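A minimal sketch of the subclassing pattern with the layer classes listed above; `TinyNet` is a made-up example sized for 28×28 single-channel inputs:

```
import torch
from torch import nn

class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Flatten(),
            nn.Linear(8 * 14 * 14, 10),
        )

    def forward(self, x):
        return self.body(x)

net = TinyNet()
out = net(torch.randn(4, 1, 28, 28))                     # forward pass on a dummy batch
loss = nn.CrossEntropyLoss()(out, torch.randint(0, 10, (4,)))
net.zero_grad()
loss.backward()
print(sum(p.numel() for p in net.parameters()))          # parameter count via .parameters()
```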
## 3.6 The torch.nn.functional module
This module defines function-style neural network operations such as convolution and pooling functions. The classes in torch.nn generally call into torch.nn.functional; for example nn.ConvNd calls torch.nn.functional.convNd. torch.nn.functional also defines some less common activation functions, such as torch.nn.functional.relu6 and torch.nn.functional.elu.
* Defines the function-style (stateless) operators, which take the parameter tensors explicitly. You can think of it as a companion to torch.nn that allows a more fine-grained, lower-level way of building networks.
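The same convolution written with the stateless API, where the weight tensor is passed in explicitly instead of being stored on a layer object:

```
import torch
import torch.nn.functional as F

x = torch.randn(4, 1, 28, 28)
w = torch.randn(8, 1, 3, 3, requires_grad=True)   # weights managed by hand

h = F.conv2d(x, w, padding=1)   # the stateless counterpart of nn.Conv2d
h = F.relu(h)
h = F.max_pool2d(h, 2)
print(h.shape)
```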
## 3.7 The torch.nn.init module
This module defines weight-initialization schemes for neural networks, including uniform initialization (torch.nn.init.uniform_) and normal (Gaussian) initialization (torch.nn.init.normal_). Note that in PyTorch a function or method whose name ends with an underscore operates in place: it modifies the tensor it is applied to directly and also returns the modified tensor.
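For example, re-initializing an `nn.Linear` layer in place (note the trailing underscores):

```
import torch
from torch import nn

layer = nn.Linear(128, 64)
nn.init.uniform_(layer.weight, a=-0.1, b=0.1)    # in-place uniform initialization
nn.init.normal_(layer.bias, mean=0.0, std=0.01)  # in-place normal initialization
print(layer.weight.min().item(), layer.weight.max().item())
```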
## 3.8 The torch.optim module
torch.optim defines a family of optimizers such as **torch.optim.SGD, torch.optim.Adagrad, torch.optim.RMSprop and torch.optim.Adam**. It also contains the learning-rate scheduling module torch.optim.lr_scheduler, including step decay (torch.optim.lr_scheduler.StepLR) and cosine annealing (torch.optim.lr_scheduler.CosineAnnealingLR). A usage sketch follows the list below.
* torch.optim.SGD
* torch.optim.Adagrad
* torch.optim.RMSprop
* torch.optim.Adam
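A minimal optimizer-plus-scheduler loop on a throwaway linear model (shapes and hyperparameters are arbitrary):

```
import torch
from torch import nn, optim

model = nn.Linear(10, 1)
opt = optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
scheduler = optim.lr_scheduler.StepLR(opt, step_size=10, gamma=0.5)  # halve the lr every 10 epochs

for epoch in range(30):
    x, y = torch.randn(32, 10), torch.randn(32, 1)
    loss = nn.functional.mse_loss(model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
    scheduler.step()          # advance the learning-rate schedule once per epoch
print(scheduler.get_last_lr())
```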
## 3.9 The torch.autograd module
This is PyTorch's automatic differentiation module. It defines functions such as torch.autograd.backward, used to back-propagate gradients once the loss has been computed, and torch.autograd.grad, which differentiates a scalar tensor (a tensor with a single element) with respect to another tensor; it also lets you mark parts of the code as excluded from differentiation. In addition, the module provides numerical-gradient utilities and a checker that verifies the autograd engine produces correct results.
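Both entry points on a toy scalar function, plus the numerical gradient checker:

```
import torch

x = torch.tensor([2.0, 3.0], requires_grad=True)
y = (x ** 2).sum()                     # scalar output

# Option 1: backward() accumulates gradients into x.grad
y.backward()
print(x.grad)                          # tensor([4., 6.])

# Option 2: torch.autograd.grad returns the gradient directly
x2 = torch.tensor([2.0, 3.0], requires_grad=True)
(g,) = torch.autograd.grad((x2 ** 2).sum(), x2)
print(g)

# gradcheck numerically verifies gradients (double precision recommended)
print(torch.autograd.gradcheck(lambda t: (t ** 2).sum(),
                               torch.randn(3, dtype=torch.double, requires_grad=True)))
```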
## 3.10 The torch.distributed module
torch.distributed is PyTorch's distributed-computing module; it provides the environment for running PyTorch in parallel, with MPI, Gloo and NCCL as the main supported backends. Distributed PyTorch works by launching several parallel processes, each holding a copy of the model; different training data is fed to the processes, each computes its loss and does back-propagation independently, and finally the gradients of the weight tensors are reduced across all processes. The backend is used mainly for broadcast and gather operations: broadcast sends data from one process to the others (for example propagating the reduced gradient tensors), while gather collects data from other processes onto the current one (for example moving gradient tensors to one particular process so they can be averaged). Beyond wrapping the backends, the module also offers several ways to launch the processes, including via TCP, environment variables, or a shared file system.
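A single-process Gloo sketch that only shows the shape of the API; real use launches one process per GPU or node, and the address/port values here are placeholders:

```
import os
import torch
import torch.distributed as dist

# single-process setup with the Gloo backend, just to exercise the calls
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group(backend="gloo", rank=0, world_size=1)

t = torch.ones(3)
dist.all_reduce(t, op=dist.ReduceOp.SUM)   # sum the tensor across all processes
dist.broadcast(t, src=0)                   # broadcast from rank 0 to every process
dist.destroy_process_group()
print(t)
```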
## 3.11 The torch.distributions module
This module provides classes for sampling from different probability distributions and for building the computation graph of the sampling process. In some applications, for example reinforcement learning, a deep learning model is used to represent the policy taken under different environment states, and its output is a probability over actions. Once the model outputs those probabilities, the policy is simulated by sampling from the distribution, and gradient descent is then used to make the probability of the best policy as large as possible; this is the policy gradient algorithm. Because the sampled outputs are discrete, they cannot be differentiated directly, so back-propagation cannot be applied to the samples themselves; instead one works with the distribution object, for example the torch.distributions.Categorical class for categorical sampling. PyTorch supports other distributions as well: torch.distributions.Normal, for instance, samples from a continuous normal distribution and can be used for continuous reinforcement-learning policies.
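The usual policy-gradient building blocks: sample an action from a `Categorical`, differentiate its log-probability, and note that `Normal` plays the same role for continuous policies (the reward of 1.0 is a placeholder):

```
import torch
from torch.distributions import Categorical, Normal

probs = torch.tensor([0.1, 0.6, 0.3], requires_grad=True)
dist = Categorical(probs=probs)
action = dist.sample()                 # the discrete sample itself is not differentiable
log_prob = dist.log_prob(action)       # but its log-probability is, which policy gradient needs
loss = -log_prob * 1.0                 # placeholder reward of 1.0
loss.backward()
print(probs.grad)

normal = Normal(loc=0.0, scale=1.0)    # continuous distribution for continuous policies
print(normal.sample(), normal.log_prob(torch.tensor(0.5)))
```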
## 3.12 The torch.hub module
This module provides a set of pretrained models for users. For example, torch.hub.list retrieves the model entry points available from a model repository, and torch.hub.load loads a pretrained model; a loaded model can be saved locally, and you can inspect the methods supported by its class.
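For example, pulling `resnet18` from the `pytorch/vision` hub; both calls need network access, and the `pretrained` flag reflects the hub API at the time this note was written:

```
import torch

print(torch.hub.list("pytorch/vision"))                        # available entry points
model = torch.hub.load("pytorch/vision", "resnet18", pretrained=True)
model.eval()
print(model(torch.randn(1, 3, 224, 224)).shape)                # torch.Size([1, 1000])
```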
## 3.13 The torch.jit module
This is PyTorch's just-in-time compiler module. Its purpose is to turn PyTorch's dynamic graph into a static graph that can be optimized and serialized. It works by tracing the construction of the dynamic graph with predefined example tensors and converting the resulting graph into a static one. A static graph obtained through the JIT can be saved and used from PyTorch's other front ends, such as the C++ front end, and the JIT can also emit other network description formats such as ONNX. torch.jit supports two modes, script mode (ScriptModule) and tracing mode. Both build static graphs; the difference is that script mode supports control flow while tracing does not, but script mode covers fewer network modules than tracing.
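Tracing and scripting a small throwaway model, then saving and reloading the static graph (the file name is arbitrary):

```
import torch
from torch import nn

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

traced = torch.jit.trace(model, torch.randn(1, 4))   # tracing mode: record ops for one example input
scripted = torch.jit.script(model)                   # script mode: compile, keeping control flow

traced.save("model_traced.pt")                       # the static graph can be serialized...
reloaded = torch.jit.load("model_traced.pt")         # ...and loaded, e.g. from the C++ front end
print(reloaded(torch.randn(1, 4)))
```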
## 3.14 The torch.multiprocessing module
This module defines PyTorch's multiprocessing API. It can launch multiple processes, each running a different deep learning model, and share tensors between them; the shared tensors can live on the CPU or on the GPU. The multiprocessing API also mirrors Python's native multiprocessing library, providing the same primitives such as Lock and Queue.
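A toy example that shares a CPU tensor between two spawned workers; the concurrent in-place writes are not synchronized, so this only illustrates the API:

```
import torch
import torch.multiprocessing as mp

def worker(rank, shared):
    # each spawned process adds its rank to the shared tensor
    shared += rank

if __name__ == "__main__":
    t = torch.zeros(4)
    t.share_memory_()                          # move the tensor into shared memory
    mp.spawn(worker, args=(t,), nprocs=2, join=True)
    print(t)                                   # updates from both workers are visible here
```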
## 3.15 The torch.random module
This module provides methods to save and set the state of the random number generator: get_rng_state reads the current RNG state, set_rng_state restores it, manual_seed sets the random seed, and initial_seed returns the seed the program started with. Training a neural network is a stochastic process, since both the order of the inputs and the weight initialization involve randomness, so fixing a single seed makes it much easier to compare different networks and to debug their architectures.
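Seeding, snapshotting and restoring the RNG state:

```
import torch

torch.manual_seed(42)                    # fix the global seed for reproducibility
state = torch.get_rng_state()            # snapshot the RNG state
a = torch.randn(3)
torch.set_rng_state(state)               # restore it...
b = torch.randn(3)
print(torch.equal(a, b))                 # ...so the same numbers are drawn again
print(torch.initial_seed())              # the seed the program started with
```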
## 3.16 The torch.onnx module
This module lets PyTorch export and load deep learning models described in the ONNX format. ONNX exists so that models can be exchanged between different deep learning frameworks; with this module, PyTorch models can be exported for use in other frameworks, and models built in other frameworks can be brought into PyTorch.
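A minimal export of a throwaway model; the file name, tensor names and opset version are illustrative:

```
import torch
from torch import nn

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
dummy = torch.randn(1, 4)                # ONNX export traces the model with an example input

torch.onnx.export(
    model, dummy, "model.onnx",
    input_names=["input"], output_names=["output"],
    opset_version=11,
)
```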
## 3.17 The torch.utils module
This module provides a set of utilities that help with training, testing and architecture tuning of neural networks. It mainly contains the following six submodules:
### 1 torch.utils.bottleneck
This submodule measures how long the parts of a deep learning model take to run, so you can locate the modules that form the performance bottleneck; optimizing those modules then improves the performance of the whole model.
### 2 torch.utils.checkpoint
This submodule helps reduce the memory a model needs during training. As discussed earlier, back-propagation requires the intermediate results of the computation graph to be kept around, and those intermediates account for much of the memory deep learning consumes. To cut memory use and allow larger mini-batches (which improves performance and the stability of optimization), this submodule records how the intermediate data were computed, discards them, and recomputes them when they are needed again. The core idea is to trade compute time for memory; used well, it can substantially improve what a deep learning model can achieve.
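A sketch of checkpointing one block: the activations inside `block` are dropped after the forward pass and recomputed during backward:

```
import torch
from torch import nn
from torch.utils.checkpoint import checkpoint

block = nn.Sequential(nn.Linear(256, 256), nn.ReLU(), nn.Linear(256, 256))
x = torch.randn(32, 256, requires_grad=True)

y = checkpoint(block, x)     # forward runs normally, but intermediate activations are not stored
loss = y.sum()
loss.backward()              # the block's forward is recomputed here to obtain the gradients
print(x.grad.shape)
```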
### 3 torch.utils.cpp_extension
This submodule defines PyTorch's C++ extensions. It mainly provides two classes: CppExtension, which describes the source files of an extension module written in C++, and CUDAExtension, which describes the source files of a C++/CUDA extension. In some cases you may want to implement tensor operations or network components in C++, for example when PyTorch lacks a module with the required functionality or the existing module is too slow; this submodule lets Python call deep learning extensions written in C++/CUDA. Under the hood it uses pybind11, which keeps the interface lightweight and makes PyTorch easy to extend.
### 4 torch.utils.data
This submodule introduces the concepts of the **Dataset** and the **DataLoader**. The former represents a dataset in which any particular example can be fetched by index; the latter wraps a dataset so that it can be **shuffled** and **sampled**, yielding a stream of shuffled mini-batches.
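`Dataset` plus `DataLoader` on random stand-in data:

```
import torch
from torch.utils.data import TensorDataset, DataLoader

x = torch.randn(100, 3)
y = torch.randint(0, 2, (100,))

ds = TensorDataset(x, y)                           # Dataset: indexable, has __len__ and __getitem__
print(len(ds), ds[0])

dl = DataLoader(ds, batch_size=16, shuffle=True)   # DataLoader: shuffled mini-batches
for xb, yb in dl:
    print(xb.shape, yb.shape)
    break
```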
### 5 torch.utils.dlpack
This submodule defines conversions between PyTorch tensors and the DLPack tensor storage format, used to exchange tensor data between different frameworks.
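The round trip through a DLPack capsule (the reconstructed tensor shares memory with the original):

```
import torch
from torch.utils.dlpack import to_dlpack, from_dlpack

t = torch.arange(4.0)
capsule = to_dlpack(t)        # wrap the tensor in a DLPack capsule
t2 = from_dlpack(capsule)     # rebuild a tensor that shares the same storage
print(t2)
```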
### 6 torch.utils.tensorboard
This submodule is PyTorch's support for the TensorBoard visualization tool. TensorBoard originally shipped with TensorFlow; it can display the loss, histograms of tensor weights, and the text, images and video produced while a deep learning model trains. TensorBoard is very capable and is built as an interactive web page: through its built-in views you can inspect specific details of training, such as the weight histogram of a particular layer or the loss over a particular time window. With TensorBoard support in PyTorch, you can conveniently watch intermediate tensors during training and debug deep learning models much more easily.
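Logging a scalar curve and a histogram; this assumes the `tensorboard` package is installed, and the run directory name is arbitrary (view the result with `tensorboard --logdir runs`):

```
import torch
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir="runs/demo")
for step in range(100):
    writer.add_scalar("train/loss", 1.0 / (step + 1), step)   # scalar curve over steps
writer.add_histogram("weights", torch.randn(1000), 0)         # histogram of values
writer.close()
```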


@@ -400,22 +400,3 @@ inputs, labels = data[0].to(device), data[1].to(device)
## Training on multiple GPUs
If you want an even bigger speedup by using all of your GPUs, check out [Optional: Data Parallelism](data_parallel_tutorial.html).
## Where do I go next?
* [Train neural nets to play video games](../../intermediate/reinforcement_q_learning.html)
* [Train a state-of-the-art ResNet network on ImageNet](https://github.com/pytorch/examples/tree/master/imagenet)
* [Train a face generator using generative adversarial networks](https://github.com/pytorch/examples/tree/master/dcgan)
* [Train a word-level language model using recurrent LSTM networks](https://github.com/pytorch/examples/tree/master/word_language_model)
* [More examples](https://github.com/pytorch/examples)
* [More tutorials](https://github.com/pytorch/tutorials)
* [Discuss PyTorch on the forums](https://discuss.pytorch.org/)
* [Chat with other users on Slack](https://pytorch.slack.com/messages/beginner/)
**Total running time of the script**: (2 minutes 39.965 seconds)
[Download Python source code: `cifar10_tutorial.py`](https://pytorch.org/tutorials/_downloads/ba100c1433c3c42a16709bb6a2ed0f85/cifar10_tutorial.py)
[Download Jupyter notebook: `cifar10_tutorial.ipynb`](https://pytorch.org/tutorials/_downloads/17a7c7cb80916fcdf921097825a0f562/cifar10_tutorial.ipynb)
Gallery generated by [Sphinx-Gallery](https://sphinx-gallery.readthedocs.io)


@@ -9,7 +9,7 @@ PyTorch provides the elegantly designed modules and classes [`torch.nn`](https://pytorch.org/docs/s
**This tutorial assumes you already have PyTorch installed and are familiar with the basics of tensor operations.** (If you are familiar with NumPy array operations, you will find the PyTorch tensor operations used here nearly identical.)
## MNIST dataset
## 0 MNIST dataset
We will use the classic [MNIST](http://deeplearning.net/data/mnist/) dataset, which consists of black-and-white images of hand-drawn digits (between 0 and 9).
@@ -95,7 +95,7 @@ tensor(0) tensor(9)
```
## Neural net from scratch (no `torch.nn`)
## 1 Neural net from scratch (no `torch.nn`)
First, we create a model using nothing but PyTorch tensor operations. We assume you are already familiar with the basics of neural networks. (If you are not, you can learn them at [course.fast.ai](https://course.fast.ai).)
@@ -253,7 +253,7 @@ tensor(0.0811, grad_fn=<NegBackward>) tensor(1.)
```
## Using `torch.nn.functional`
## 2 Using `torch.nn.functional`
We will now refactor our code so that it does the same thing as before, only we start taking advantage of PyTorch's `nn` classes to make it more concise and flexible. At each step from here, we should be making our code one or more of: shorter, more understandable, and/or more flexible.
@@ -285,7 +285,7 @@ tensor(0.0811, grad_fn=<NllLossBackward>) tensor(1.)
```
## Refactor using `nn.Module`
## 3 Refactor using `nn.Module`
Next, we will use `nn.Module` and `nn.Parameter` for a clearer and more concise training loop. We subclass `nn.Module` (which is itself a class and able to keep track of state). In this case, we want to create a class that holds our weights, bias, and method for the forward step. `nn.Module` has a number of attributes and methods (such as `.parameters()` and `.zero_grad()`) that we will be using.
@@ -385,7 +385,7 @@ tensor(0.0808, grad_fn=<NllLossBackward>)
```
## Refactor using `nn.Linear`
## 4 Refactor using `nn.Linear`
We continue to refactor our code. Instead of manually defining and initializing `self.weights` and `self.bias` and computing `xb @ self.weights + self.bias`, we will use the PyTorch class [`nn.Linear`](https://pytorch.org/docs/stable/nn.html#linear-layers) for the linear layer, which does all of that for us. PyTorch has many kinds of predefined layers that can greatly simplify our code, and often make it faster too.
@@ -431,7 +431,7 @@ tensor(0.0824, grad_fn=<NllLossBackward>)
```
## Refactor using `optim`
## 5 Refactor using `optim`
PyTorch also has a package with various optimization algorithms, `torch.optim`. We can use the `step` method of our optimizer to take an optimization step, instead of updating each parameter manually.
@@ -494,7 +494,7 @@ tensor(0.0823, grad_fn=<NllLossBackward>)
```
## Refactor using `Dataset`
## 6 Refactor using `Dataset`
PyTorch has an abstract `Dataset` class. A Dataset can be anything that has a `__len__` function (called by Python's standard `len` function) and a `__getitem__` function as a way of indexing into it. [This tutorial](https://pytorch.org/tutorials/beginner/data_loading_tutorial.html) walks through a nice example of creating a custom `FacialLandmarkDataset` class as a subclass of `Dataset`.
@@ -551,7 +551,7 @@ tensor(0.0819, grad_fn=<NllLossBackward>)
```
## Refactor using `DataLoader`
## 7 Refactor using `DataLoader`
PyTorch's `DataLoader` is responsible for batch management. You can create a `DataLoader` from any `Dataset`, and it makes iterating over batches much easier. Rather than having to use `train_ds[i*bs : i*bs+bs]`, the `DataLoader` gives us each mini-batch automatically.
@@ -605,7 +605,7 @@ tensor(0.0821, grad_fn=<NllLossBackward>)
Thanks to PyTorch's `nn.Module`, `nn.Parameter`, `Dataset`, and `DataLoader`, our training loop is now dramatically smaller and easier to understand. Let's now try to add the basic features needed to create effective models in practice.
## Add validation
## 8 Add validation
In section 1, we were just trying to get a reasonable training loop set up for our training data. In reality, you should **always** also have a [validation set](https://www.fast.ai/2017/11/13/validation-sets/) in order to tell whether you are overfitting.
@@ -655,7 +655,7 @@ for epoch in range(epochs):
```
## Create `fit()` and `get_data()`
## 9 Create `fit()` and `get_data()`
Now we will do a little refactoring of our own. Since we go through a similar process twice, computing the loss for both the training set and the validation set, we turn it into its own function, `loss_batch`, which computes the loss for one batch.
@@ -726,7 +726,7 @@ fit(epochs, model, loss_func, opt, train_dl, valid_dl)
You can train a wide variety of models with these basic 3 lines of code. Let's see if we can use them to train a convolutional neural network (CNN)!
## Switch to CNN
## 10 Switch to CNN
We are now going to build a neural network with three convolutional layers. Because none of the functions in the previous section assumes anything about the model's form, we can use them to train a CNN without any modification.
@@ -819,7 +819,7 @@ fit(epochs, model, loss_func, opt, train_dl, valid_dl)
```
## Wrapping `DataLoader`
## 11 Wrapping `DataLoader`
Our CNN is fairly concise, but it only works with MNIST, because:
@@ -886,7 +886,7 @@ fit(epochs, model, loss_func, opt, train_dl, valid_dl)
```
## Using your GPU
## 12 Using your GPU
If you are lucky enough to have access to a CUDA-capable GPU (you can rent one for about $0.50/hour from most cloud providers), you can use it to speed up your code. First check that your GPU is working in PyTorch:
@@ -959,12 +959,4 @@ fit(epochs, model, loss_func, opt, train_dl, valid_dl)
> * `functional`: a module (usually imported into the `F` namespace by convention) that contains activation functions, loss functions, etc., as well as stateless versions of layers such as convolutional and linear layers.
> * `torch.optim`: contains optimizers such as `SGD`, which update the weights of `Parameter`s during the backward step.
> * `Dataset`: an abstract interface of objects with `__len__` and `__getitem__`, including classes provided with PyTorch such as `TensorDataset`.
> * `DataLoader`: takes any `Dataset` and creates an iterator that returns batches of data.
**Total running time of the script**: (0 minutes 57.062 seconds)
[Download Python source code: `nn_tutorial.py`](../_downloads/a6246751179fbfb7cad9222ef1c16617/nn_tutorial.py)
[Download Jupyter notebook: `nn_tutorial.ipynb`](../_downloads/5ddab57bb7482fbcc76722617dd47324/nn_tutorial.ipynb)
Gallery generated by [Sphinx-Gallery](https://sphinx-gallery.readthedocs.io)
> * `DataLoader`: takes any `Dataset` and creates an iterator that returns batches of data.

File diff suppressed because one or more lines are too long


@@ -0,0 +1,914 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%matplotlib inline"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\nWhat is `torch.nn` *really*?\n============================\nby Jeremy Howard, `fast.ai <https://www.fast.ai>`_. Thanks to Rachel Thomas and Francisco Ingham.\n\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We recommend running this tutorial as a notebook, not a script. To download the notebook (.ipynb) file,\nclick the link at the top of the page.\n\nPyTorch provides the elegantly designed modules and classes `torch.nn <https://pytorch.org/docs/stable/nn.html>`_ ,\n`torch.optim <https://pytorch.org/docs/stable/optim.html>`_ ,\n`Dataset <https://pytorch.org/docs/stable/data.html?highlight=dataset#torch.utils.data.Dataset>`_ ,\nand `DataLoader <https://pytorch.org/docs/stable/data.html?highlight=dataloader#torch.utils.data.DataLoader>`_\nto help you create and train neural networks.\nIn order to fully utilize their power and customize\nthem for your problem, you need to really understand exactly what they're\ndoing. To develop this understanding, we will first train basic neural net\non the MNIST data set without using any features from these models; we will\ninitially only use the most basic PyTorch tensor functionality. Then, we will\nincrementally add one feature from ``torch.nn``, ``torch.optim``, ``Dataset``, or\n``DataLoader`` at a time, showing exactly what each piece does, and how it\nworks to make the code either more concise, or more flexible.\n\n**This tutorial assumes you already have PyTorch installed, and are familiar\nwith the basics of tensor operations.** (If you're familiar with Numpy array\noperations, you'll find the PyTorch tensor operations used here nearly identical).\n\nMNIST data setup\n----------------\n\nWe will use the classic `MNIST <http://deeplearning.net/data/mnist/>`_ dataset,\nwhich consists of black-and-white images of hand-drawn digits (between 0 and 9).\n\nWe will use `pathlib <https://docs.python.org/3/library/pathlib.html>`_\nfor dealing with paths (part of the Python 3 standard library), and will\ndownload the dataset using\n`requests <http://docs.python-requests.org/en/master/>`_. We will only\nimport modules when we use them, so you can see exactly what's being\nused at each point.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"from pathlib import Path\nimport requests\n\nDATA_PATH = Path(\"data\")\nPATH = DATA_PATH / \"mnist\"\n\nPATH.mkdir(parents=True, exist_ok=True)\n\nURL = \"https://github.com/pytorch/tutorials/raw/master/_static/\"\nFILENAME = \"mnist.pkl.gz\"\n\nif not (PATH / FILENAME).exists():\n content = requests.get(URL + FILENAME).content\n (PATH / FILENAME).open(\"wb\").write(content)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This dataset is in numpy array format, and has been stored using pickle,\na python-specific format for serializing data.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import pickle\nimport gzip\n\nwith gzip.open((PATH / FILENAME).as_posix(), \"rb\") as f:\n ((x_train, y_train), (x_valid, y_valid), _) = pickle.load(f, encoding=\"latin-1\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Each image is 28 x 28, and is being stored as a flattened row of length\n784 (=28x28). Let's take a look at one; we need to reshape it to 2d\nfirst.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"from matplotlib import pyplot\nimport numpy as np\n\npyplot.imshow(x_train[0].reshape((28, 28)), cmap=\"gray\")\nprint(x_train.shape)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"PyTorch uses ``torch.tensor``, rather than numpy arrays, so we need to\nconvert our data.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import torch\n\nx_train, y_train, x_valid, y_valid = map(\n torch.tensor, (x_train, y_train, x_valid, y_valid)\n)\nn, c = x_train.shape\nx_train, x_train.shape, y_train.min(), y_train.max()\nprint(x_train, y_train)\nprint(x_train.shape)\nprint(y_train.min(), y_train.max())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Neural net from scratch (no torch.nn)\n---------------------------------------------\n\nLet's first create a model using nothing but PyTorch tensor operations. We're assuming\nyou're already familiar with the basics of neural networks. (If you're not, you can\nlearn them at `course.fast.ai <https://course.fast.ai>`_).\n\nPyTorch provides methods to create random or zero-filled tensors, which we will\nuse to create our weights and bias for a simple linear model. These are just regular\ntensors, with one very special addition: we tell PyTorch that they require a\ngradient. This causes PyTorch to record all of the operations done on the tensor,\nso that it can calculate the gradient during back-propagation *automatically*!\n\nFor the weights, we set ``requires_grad`` **after** the initialization, since we\ndon't want that step included in the gradient. (Note that a trailing ``_`` in\nPyTorch signifies that the operation is performed in-place.)\n\n<div class=\"alert alert-info\"><h4>Note</h4><p>We are initializing the weights here with\n `Xavier initialisation <http://proceedings.mlr.press/v9/glorot10a/glorot10a.pdf>`_\n (by multiplying with 1/sqrt(n)).</p></div>\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import math\n\nweights = torch.randn(784, 10) / math.sqrt(784)\nweights.requires_grad_()\nbias = torch.zeros(10, requires_grad=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Thanks to PyTorch's ability to calculate gradients automatically, we can\nuse any standard Python function (or callable object) as a model! So\nlet's just write a plain matrix multiplication and broadcasted addition\nto create a simple linear model. We also need an activation function, so\nwe'll write `log_softmax` and use it. Remember: although PyTorch\nprovides lots of pre-written loss functions, activation functions, and\nso forth, you can easily write your own using plain python. PyTorch will\neven create fast GPU or vectorized CPU code for your function\nautomatically.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"def log_softmax(x):\n return x - x.exp().sum(-1).log().unsqueeze(-1)\n\ndef model(xb):\n return log_softmax(xb @ weights + bias)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In the above, the ``@`` stands for the dot product operation. We will call\nour function on one batch of data (in this case, 64 images). This is\none *forward pass*. Note that our predictions won't be any better than\nrandom at this stage, since we start with random weights.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"bs = 64 # batch size\n\nxb = x_train[0:bs] # a mini-batch from x\npreds = model(xb) # predictions\npreds[0], preds.shape\nprint(preds[0], preds.shape)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As you see, the ``preds`` tensor contains not only the tensor values, but also a\ngradient function. We'll use this later to do backprop.\n\nLet's implement negative log-likelihood to use as the loss function\n(again, we can just use standard Python):\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"def nll(input, target):\n return -input[range(target.shape[0]), target].mean()\n\nloss_func = nll"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's check our loss with our random model, so we can see if we improve\nafter a backprop pass later.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"yb = y_train[0:bs]\nprint(loss_func(preds, yb))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's also implement a function to calculate the accuracy of our model.\nFor each prediction, if the index with the largest value matches the\ntarget value, then the prediction was correct.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"def accuracy(out, yb):\n preds = torch.argmax(out, dim=1)\n return (preds == yb).float().mean()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's check the accuracy of our random model, so we can see if our\naccuracy improves as our loss improves.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"print(accuracy(preds, yb))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can now run a training loop. For each iteration, we will:\n\n- select a mini-batch of data (of size ``bs``)\n- use the model to make predictions\n- calculate the loss\n- ``loss.backward()`` updates the gradients of the model, in this case, ``weights``\n and ``bias``.\n\nWe now use these gradients to update the weights and bias. We do this\nwithin the ``torch.no_grad()`` context manager, because we do not want these\nactions to be recorded for our next calculation of the gradient. You can read\nmore about how PyTorch's Autograd records operations\n`here <https://pytorch.org/docs/stable/notes/autograd.html>`_.\n\nWe then set the\ngradients to zero, so that we are ready for the next loop.\nOtherwise, our gradients would record a running tally of all the operations\nthat had happened (i.e. ``loss.backward()`` *adds* the gradients to whatever is\nalready stored, rather than replacing them).\n\n.. tip:: You can use the standard python debugger to step through PyTorch\n code, allowing you to check the various variable values at each step.\n Uncomment ``set_trace()`` below to try it out.\n\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"from IPython.core.debugger import set_trace\n\nlr = 0.5 # learning rate\nepochs = 2 # how many epochs to train for\n\nfor epoch in range(epochs):\n for i in range((n - 1) // bs + 1):\n # set_trace()\n start_i = i * bs\n end_i = start_i + bs\n xb = x_train[start_i:end_i]\n yb = y_train[start_i:end_i]\n pred = model(xb)\n loss = loss_func(pred, yb)\n\n loss.backward()\n with torch.no_grad():\n weights -= weights.grad * lr\n bias -= bias.grad * lr\n weights.grad.zero_()\n bias.grad.zero_()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"That's it: we've created and trained a minimal neural network (in this case, a\nlogistic regression, since we have no hidden layers) entirely from scratch!\n\nLet's check the loss and accuracy and compare those to what we got\nearlier. We expect that the loss will have decreased and accuracy to\nhave increased, and they have.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"print(loss_func(model(xb), yb), accuracy(model(xb), yb))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Using torch.nn.functional\n------------------------------\n\nWe will now refactor our code, so that it does the same thing as before, only\nwe'll start taking advantage of PyTorch's ``nn`` classes to make it more concise\nand flexible. At each step from here, we should be making our code one or more\nof: shorter, more understandable, and/or more flexible.\n\nThe first and easiest step is to make our code shorter by replacing our\nhand-written activation and loss functions with those from ``torch.nn.functional``\n(which is generally imported into the namespace ``F`` by convention). This module\ncontains all the functions in the ``torch.nn`` library (whereas other parts of the\nlibrary contain classes). As well as a wide range of loss and activation\nfunctions, you'll also find here some convenient functions for creating neural\nnets, such as pooling functions. (There are also functions for doing convolutions,\nlinear layers, etc, but as we'll see, these are usually better handled using\nother parts of the library.)\n\nIf you're using negative log likelihood loss and log softmax activation,\nthen Pytorch provides a single function ``F.cross_entropy`` that combines\nthe two. So we can even remove the activation function from our model.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import torch.nn.functional as F\n\nloss_func = F.cross_entropy\n\ndef model(xb):\n return xb @ weights + bias"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that we no longer call ``log_softmax`` in the ``model`` function. Let's\nconfirm that our loss and accuracy are the same as before:\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"print(loss_func(model(xb), yb), accuracy(model(xb), yb))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Refactor using nn.Module\n-----------------------------\nNext up, we'll use ``nn.Module`` and ``nn.Parameter``, for a clearer and more\nconcise training loop. We subclass ``nn.Module`` (which itself is a class and\nable to keep track of state). In this case, we want to create a class that\nholds our weights, bias, and method for the forward step. ``nn.Module`` has a\nnumber of attributes and methods (such as ``.parameters()`` and ``.zero_grad()``)\nwhich we will be using.\n\n<div class=\"alert alert-info\"><h4>Note</h4><p>``nn.Module`` (uppercase M) is a PyTorch specific concept, and is a\n class we'll be using a lot. ``nn.Module`` is not to be confused with the Python\n concept of a (lowercase ``m``) `module <https://docs.python.org/3/tutorial/modules.html>`_,\n which is a file of Python code that can be imported.</p></div>\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"from torch import nn\n\nclass Mnist_Logistic(nn.Module):\n def __init__(self):\n super().__init__()\n self.weights = nn.Parameter(torch.randn(784, 10) / math.sqrt(784))\n self.bias = nn.Parameter(torch.zeros(10))\n\n def forward(self, xb):\n return xb @ self.weights + self.bias"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since we're now using an object instead of just using a function, we\nfirst have to instantiate our model:\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"model = Mnist_Logistic()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we can calculate the loss in the same way as before. Note that\n``nn.Module`` objects are used as if they are functions (i.e they are\n*callable*), but behind the scenes Pytorch will call our ``forward``\nmethod automatically.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"print(loss_func(model(xb), yb))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Previously for our training loop we had to update the values for each parameter\nby name, and manually zero out the grads for each parameter separately, like this:\n::\n with torch.no_grad():\n weights -= weights.grad * lr\n bias -= bias.grad * lr\n weights.grad.zero_()\n bias.grad.zero_()\n\n\nNow we can take advantage of model.parameters() and model.zero_grad() (which\nare both defined by PyTorch for ``nn.Module``) to make those steps more concise\nand less prone to the error of forgetting some of our parameters, particularly\nif we had a more complicated model:\n::\n with torch.no_grad():\n for p in model.parameters(): p -= p.grad * lr\n model.zero_grad()\n\n\nWe'll wrap our little training loop in a ``fit`` function so we can run it\nagain later.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"def fit():\n for epoch in range(epochs):\n for i in range((n - 1) // bs + 1):\n start_i = i * bs\n end_i = start_i + bs\n xb = x_train[start_i:end_i]\n yb = y_train[start_i:end_i]\n pred = model(xb)\n loss = loss_func(pred, yb)\n\n loss.backward()\n with torch.no_grad():\n for p in model.parameters():\n p -= p.grad * lr\n model.zero_grad()\n\nfit()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's double-check that our loss has gone down:\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"print(loss_func(model(xb), yb))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Refactor using nn.Linear\n-------------------------\n\nWe continue to refactor our code. Instead of manually defining and\ninitializing ``self.weights`` and ``self.bias``, and calculating ``xb @\nself.weights + self.bias``, we will instead use the Pytorch class\n`nn.Linear <https://pytorch.org/docs/stable/nn.html#linear-layers>`_ for a\nlinear layer, which does all that for us. Pytorch has many types of\npredefined layers that can greatly simplify our code, and often makes it\nfaster too.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"class Mnist_Logistic(nn.Module):\n def __init__(self):\n super().__init__()\n self.lin = nn.Linear(784, 10)\n\n def forward(self, xb):\n return self.lin(xb)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We instantiate our model and calculate the loss in the same way as before:\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"model = Mnist_Logistic()\nprint(loss_func(model(xb), yb))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We are still able to use our same ``fit`` method as before.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"fit()\n\nprint(loss_func(model(xb), yb))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Refactor using optim\n------------------------------\n\nPytorch also has a package with various optimization algorithms, ``torch.optim``.\nWe can use the ``step`` method from our optimizer to take a forward step, instead\nof manually updating each parameter.\n\nThis will let us replace our previous manually coded optimization step:\n::\n with torch.no_grad():\n for p in model.parameters(): p -= p.grad * lr\n model.zero_grad()\n\nand instead use just:\n::\n opt.step()\n opt.zero_grad()\n\n(``optim.zero_grad()`` resets the gradient to 0 and we need to call it before\ncomputing the gradient for the next minibatch.)\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"from torch import optim"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We'll define a little function to create our model and optimizer so we\ncan reuse it in the future.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"def get_model():\n model = Mnist_Logistic()\n return model, optim.SGD(model.parameters(), lr=lr)\n\nmodel, opt = get_model()\nprint(loss_func(model(xb), yb))\n\nfor epoch in range(epochs):\n for i in range((n - 1) // bs + 1):\n start_i = i * bs\n end_i = start_i + bs\n xb = x_train[start_i:end_i]\n yb = y_train[start_i:end_i]\n pred = model(xb)\n loss = loss_func(pred, yb)\n\n loss.backward()\n opt.step()\n opt.zero_grad()\n\nprint(loss_func(model(xb), yb))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Refactor using Dataset\n------------------------------\n\nPyTorch has an abstract Dataset class. A Dataset can be anything that has\na ``__len__`` function (called by Python's standard ``len`` function) and\na ``__getitem__`` function as a way of indexing into it.\n`This tutorial <https://pytorch.org/tutorials/beginner/data_loading_tutorial.html>`_\nwalks through a nice example of creating a custom ``FacialLandmarkDataset`` class\nas a subclass of ``Dataset``.\n\nPyTorch's `TensorDataset <https://pytorch.org/docs/stable/_modules/torch/utils/data/dataset.html#TensorDataset>`_\nis a Dataset wrapping tensors. By defining a length and way of indexing,\nthis also gives us a way to iterate, index, and slice along the first\ndimension of a tensor. This will make it easier to access both the\nindependent and dependent variables in the same line as we train.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"from torch.utils.data import TensorDataset"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Both ``x_train`` and ``y_train`` can be combined in a single ``TensorDataset``,\nwhich will be easier to iterate over and slice.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"train_ds = TensorDataset(x_train, y_train)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Previously, we had to iterate through minibatches of x and y values separately:\n::\n xb = x_train[start_i:end_i]\n yb = y_train[start_i:end_i]\n\n\nNow, we can do these two steps together:\n::\n xb,yb = train_ds[i*bs : i*bs+bs]\n\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"model, opt = get_model()\n\nfor epoch in range(epochs):\n for i in range((n - 1) // bs + 1):\n xb, yb = train_ds[i * bs: i * bs + bs]\n pred = model(xb)\n loss = loss_func(pred, yb)\n\n loss.backward()\n opt.step()\n opt.zero_grad()\n\nprint(loss_func(model(xb), yb))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Refactor using DataLoader\n------------------------------\n\nPytorch's ``DataLoader`` is responsible for managing batches. You can\ncreate a ``DataLoader`` from any ``Dataset``. ``DataLoader`` makes it easier\nto iterate over batches. Rather than having to use ``train_ds[i*bs : i*bs+bs]``,\nthe DataLoader gives us each minibatch automatically.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"from torch.utils.data import DataLoader\n\ntrain_ds = TensorDataset(x_train, y_train)\ntrain_dl = DataLoader(train_ds, batch_size=bs)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Previously, our loop iterated over batches (xb, yb) like this:\n::\n for i in range((n-1)//bs + 1):\n xb,yb = train_ds[i*bs : i*bs+bs]\n pred = model(xb)\n\nNow, our loop is much cleaner, as (xb, yb) are loaded automatically from the data loader:\n::\n for xb,yb in train_dl:\n pred = model(xb)\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"model, opt = get_model()\n\nfor epoch in range(epochs):\n for xb, yb in train_dl:\n pred = model(xb)\n loss = loss_func(pred, yb)\n\n loss.backward()\n opt.step()\n opt.zero_grad()\n\nprint(loss_func(model(xb), yb))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Thanks to Pytorch's ``nn.Module``, ``nn.Parameter``, ``Dataset``, and ``DataLoader``,\nour training loop is now dramatically smaller and easier to understand. Let's\nnow try to add the basic features necessary to create effective models in practice.\n\nAdd validation\n-----------------------\n\nIn section 1, we were just trying to get a reasonable training loop set up for\nuse on our training data. In reality, you **always** should also have\na `validation set <https://www.fast.ai/2017/11/13/validation-sets/>`_, in order\nto identify if you are overfitting.\n\nShuffling the training data is\n`important <https://www.quora.com/Does-the-order-of-training-data-matter-when-training-neural-networks>`_\nto prevent correlation between batches and overfitting. On the other hand, the\nvalidation loss will be identical whether we shuffle the validation set or not.\nSince shuffling takes extra time, it makes no sense to shuffle the validation data.\n\nWe'll use a batch size for the validation set that is twice as large as\nthat for the training set. This is because the validation set does not\nneed backpropagation and thus takes less memory (it doesn't need to\nstore the gradients). We take advantage of this to use a larger batch\nsize and compute the loss more quickly.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"train_ds = TensorDataset(x_train, y_train)\ntrain_dl = DataLoader(train_ds, batch_size=bs, shuffle=True)\n\nvalid_ds = TensorDataset(x_valid, y_valid)\nvalid_dl = DataLoader(valid_ds, batch_size=bs * 2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We will calculate and print the validation loss at the end of each epoch.\n\n(Note that we always call ``model.train()`` before training, and ``model.eval()``\nbefore inference, because these are used by layers such as ``nn.BatchNorm2d``\nand ``nn.Dropout`` to ensure appropriate behaviour for these different phases.)\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"model, opt = get_model()\n\nfor epoch in range(epochs):\n model.train()\n for xb, yb in train_dl:\n pred = model(xb)\n loss = loss_func(pred, yb)\n\n loss.backward()\n opt.step()\n opt.zero_grad()\n\n model.eval()\n with torch.no_grad():\n valid_loss = sum(loss_func(model(xb), yb) for xb, yb in valid_dl)\n\n print(epoch, valid_loss / len(valid_dl))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Create fit() and get_data()\n----------------------------------\n\nWe'll now do a little refactoring of our own. Since we go through a similar\nprocess twice of calculating the loss for both the training set and the\nvalidation set, let's make that into its own function, ``loss_batch``, which\ncomputes the loss for one batch.\n\nWe pass an optimizer in for the training set, and use it to perform\nbackprop. For the validation set, we don't pass an optimizer, so the\nmethod doesn't perform backprop.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"def loss_batch(model, loss_func, xb, yb, opt=None):\n loss = loss_func(model(xb), yb)\n\n if opt is not None:\n loss.backward()\n opt.step()\n opt.zero_grad()\n\n return loss.item(), len(xb)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"``fit`` runs the necessary operations to train our model and compute the\ntraining and validation losses for each epoch.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import numpy as np\n\ndef fit(epochs, model, loss_func, opt, train_dl, valid_dl):\n for epoch in range(epochs):\n model.train()\n for xb, yb in train_dl:\n loss_batch(model, loss_func, xb, yb, opt)\n\n model.eval()\n with torch.no_grad():\n losses, nums = zip(\n *[loss_batch(model, loss_func, xb, yb) for xb, yb in valid_dl]\n )\n val_loss = np.sum(np.multiply(losses, nums)) / np.sum(nums)\n\n print(epoch, val_loss)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"``get_data`` returns dataloaders for the training and validation sets.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"def get_data(train_ds, valid_ds, bs):\n return (\n DataLoader(train_ds, batch_size=bs, shuffle=True),\n DataLoader(valid_ds, batch_size=bs * 2),\n )"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, our whole process of obtaining the data loaders and fitting the\nmodel can be run in 3 lines of code:\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"train_dl, valid_dl = get_data(train_ds, valid_ds, bs)\nmodel, opt = get_model()\nfit(epochs, model, loss_func, opt, train_dl, valid_dl)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can use these basic 3 lines of code to train a wide variety of models.\nLet's see if we can use them to train a convolutional neural network (CNN)!\n\nSwitch to CNN\n-------------\n\nWe are now going to build our neural network with three convolutional layers.\nBecause none of the functions in the previous section assume anything about\nthe model form, we'll be able to use them to train a CNN without any modification.\n\nWe will use Pytorch's predefined\n`Conv2d <https://pytorch.org/docs/stable/nn.html#torch.nn.Conv2d>`_ class\nas our convolutional layer. We define a CNN with 3 convolutional layers.\nEach convolution is followed by a ReLU. At the end, we perform an\naverage pooling. (Note that ``view`` is PyTorch's version of numpy's\n``reshape``)\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"class Mnist_CNN(nn.Module):\n def __init__(self):\n super().__init__()\n self.conv1 = nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1)\n self.conv2 = nn.Conv2d(16, 16, kernel_size=3, stride=2, padding=1)\n self.conv3 = nn.Conv2d(16, 10, kernel_size=3, stride=2, padding=1)\n\n def forward(self, xb):\n xb = xb.view(-1, 1, 28, 28)\n xb = F.relu(self.conv1(xb))\n xb = F.relu(self.conv2(xb))\n xb = F.relu(self.conv3(xb))\n xb = F.avg_pool2d(xb, 4)\n return xb.view(-1, xb.size(1))\n\nlr = 0.1"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`Momentum <https://cs231n.github.io/neural-networks-3/#sgd>`_ is a variation on\nstochastic gradient descent that takes previous updates into account as well\nand generally leads to faster training.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"model = Mnist_CNN()\nopt = optim.SGD(model.parameters(), lr=lr, momentum=0.9)\n\nfit(epochs, model, loss_func, opt, train_dl, valid_dl)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"nn.Sequential\n------------------------\n\n``torch.nn`` has another handy class we can use to simplify our code:\n`Sequential <https://pytorch.org/docs/stable/nn.html#torch.nn.Sequential>`_ .\nA ``Sequential`` object runs each of the modules contained within it, in a\nsequential manner. This is a simpler way of writing our neural network.\n\nTo take advantage of this, we need to be able to easily define a\n**custom layer** from a given function. For instance, PyTorch doesn't\nhave a `view` layer, and we need to create one for our network. ``Lambda``\nwill create a layer that we can then use when defining a network with\n``Sequential``.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"class Lambda(nn.Module):\n def __init__(self, func):\n super().__init__()\n self.func = func\n\n def forward(self, x):\n return self.func(x)\n\n\ndef preprocess(x):\n return x.view(-1, 1, 28, 28)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The model created with ``Sequential`` is simply:\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"model = nn.Sequential(\n Lambda(preprocess),\n nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1),\n nn.ReLU(),\n nn.Conv2d(16, 16, kernel_size=3, stride=2, padding=1),\n nn.ReLU(),\n nn.Conv2d(16, 10, kernel_size=3, stride=2, padding=1),\n nn.ReLU(),\n nn.AvgPool2d(4),\n Lambda(lambda x: x.view(x.size(0), -1)),\n)\n\nopt = optim.SGD(model.parameters(), lr=lr, momentum=0.9)\n\nfit(epochs, model, loss_func, opt, train_dl, valid_dl)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Wrapping DataLoader\n-----------------------------\n\nOur CNN is fairly concise, but it only works with MNIST, because:\n - It assumes the input is a 28\\*28 long vector\n - It assumes that the final CNN grid size is 4\\*4 (since that's the average\npooling kernel size we used)\n\nLet's get rid of these two assumptions, so our model works with any 2d\nsingle channel image. First, we can remove the initial Lambda layer by\nmoving the data preprocessing into a generator:\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"def preprocess(x, y):\n return x.view(-1, 1, 28, 28), y\n\n\nclass WrappedDataLoader:\n def __init__(self, dl, func):\n self.dl = dl\n self.func = func\n\n def __len__(self):\n return len(self.dl)\n\n def __iter__(self):\n batches = iter(self.dl)\n for b in batches:\n yield (self.func(*b))\n\ntrain_dl, valid_dl = get_data(train_ds, valid_ds, bs)\ntrain_dl = WrappedDataLoader(train_dl, preprocess)\nvalid_dl = WrappedDataLoader(valid_dl, preprocess)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, we can replace ``nn.AvgPool2d`` with ``nn.AdaptiveAvgPool2d``, which\nallows us to define the size of the *output* tensor we want, rather than\nthe *input* tensor we have. As a result, our model will work with any\nsize input.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"model = nn.Sequential(\n nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1),\n nn.ReLU(),\n nn.Conv2d(16, 16, kernel_size=3, stride=2, padding=1),\n nn.ReLU(),\n nn.Conv2d(16, 10, kernel_size=3, stride=2, padding=1),\n nn.ReLU(),\n nn.AdaptiveAvgPool2d(1),\n Lambda(lambda x: x.view(x.size(0), -1)),\n)\n\nopt = optim.SGD(model.parameters(), lr=lr, momentum=0.9)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's try it out:\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"fit(epochs, model, loss_func, opt, train_dl, valid_dl)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Using your GPU\n---------------\n\nIf you're lucky enough to have access to a CUDA-capable GPU (you can\nrent one for about $0.50/hour from most cloud providers) you can\nuse it to speed up your code. First check that your GPU is working in\nPytorch:\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"print(torch.cuda.is_available())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And then create a device object for it:\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"dev = torch.device(\n \"cuda\") if torch.cuda.is_available() else torch.device(\"cpu\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's update ``preprocess`` to move batches to the GPU:\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"def preprocess(x, y):\n return x.view(-1, 1, 28, 28).to(dev), y.to(dev)\n\n\ntrain_dl, valid_dl = get_data(train_ds, valid_ds, bs)\ntrain_dl = WrappedDataLoader(train_dl, preprocess)\nvalid_dl = WrappedDataLoader(valid_dl, preprocess)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, we can move our model to the GPU.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"model.to(dev)\nopt = optim.SGD(model.parameters(), lr=lr, momentum=0.9)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You should find it runs faster now:\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"fit(epochs, model, loss_func, opt, train_dl, valid_dl)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Closing thoughts\n-----------------\n\nWe now have a general data pipeline and training loop which you can use for\ntraining many types of models using Pytorch. To see how simple training a model\ncan now be, take a look at the `mnist_sample` sample notebook.\n\nOf course, there are many things you'll want to add, such as data augmentation,\nhyperparameter tuning, monitoring training, transfer learning, and so forth.\nThese features are available in the fastai library, which has been developed\nusing the same design approach shown in this tutorial, providing a natural\nnext step for practitioners looking to take their models further.\n\nWe promised at the start of this tutorial we'd explain through example each of\n``torch.nn``, ``torch.optim``, ``Dataset``, and ``DataLoader``. So let's summarize\nwhat we've seen:\n\n - **torch.nn**\n\n + ``Module``: creates a callable which behaves like a function, but can also\n contain state(such as neural net layer weights). It knows what ``Parameter`` (s) it\n contains and can zero all their gradients, loop through them for weight updates, etc.\n + ``Parameter``: a wrapper for a tensor that tells a ``Module`` that it has weights\n that need updating during backprop. Only tensors with the `requires_grad` attribute set are updated\n + ``functional``: a module(usually imported into the ``F`` namespace by convention)\n which contains activation functions, loss functions, etc, as well as non-stateful\n versions of layers such as convolutional and linear layers.\n - ``torch.optim``: Contains optimizers such as ``SGD``, which update the weights\n of ``Parameter`` during the backward step\n - ``Dataset``: An abstract interface of objects with a ``__len__`` and a ``__getitem__``,\n including classes provided with Pytorch such as ``TensorDataset``\n - ``DataLoader``: Takes any ``Dataset`` and creates an iterator which returns batches of data.\n\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.12"
}
},
"nbformat": 4,
"nbformat_minor": 0
}

0
pytorch/实战/9.ipynb Normal file


@@ -11,7 +11,7 @@
> - Static data processing
> - Dynamic data processing
### Machine learning algorithm practice (four weeks)
### **Machine learning algorithm practice (four weeks)**
> Try out the various machine learning algorithms, mainly through the mainstream frameworks. Find online tutorials to cover this part (there are also plenty of books). Machine learning should be learned from shallow to deep: not, as before, trying to first master all of the theory, implement it by hand, then redo it with other people's frameworks and improve on them. The order should now be reversed: start by using the tools, gradually understand the underlying details, and then make appropriate modifications.
>
@@ -37,15 +37,30 @@
- [ ] Learn PySyft
- [ ] Implement federated learning with PySyft
- FATE
- [ ] Learn how to use FATE
- [ ] Complete a development task with FATE
- Review
- Python-series review
- [ ] python
- [ ] numpy
- [ ] matplotlib
- [ ] pandas
- [ ] sklearn review
- [ ] pytorch review
### Reproducing federated learning papers (four weeks)
> There is too much material and it is too scattered, so a second round of review feels necessary.
> 1. First, work through all the material until it is roughly usable and finish a first round of development.
> 2. Then read papers for a second round of review and a second round of development. The second review round mainly uses XMind to organize and plan the knowledge; the second development round reproduces the methods described in the papers.
### **Reproducing federated learning papers (four weeks)**
> Read the latest federated learning papers and reproduce the federated learning process using other people's code. Review and reproduce.
- [ ] DeepAMD
- [ ] CIC
### Malware data processing (four weeks)
### **Malware data processing (four weeks)**
> Apply machine learning algorithms to malware. Try to solve the malware problems on a single machine or with federated learning.
@@ -53,7 +68,7 @@
- [ ] Static data processing
- [ ] Dynamic data processing
### Task schedule
## Schedule
1. One small sklearn machine learning algorithm every day
2. One federated learning framework every day
@@ -61,7 +76,7 @@
4. One small step of dataset processing every day.
## Machine learning goals
## Goals
### Tutorials on machine learning theory and practice
@@ -86,5 +101,7 @@
- Deep learning basics
- [ ] TensorFlow
- [ ] tensorflow federated
- [ ] pytorch
- [ ] pysyft
## Takeaways


@@ -0,0 +1,10 @@
## Plan
- [ ] Dynamic data classification with pytorch
- [ ] Static data classification with sklearn
- [ ] Static data classification with pytorch
- [ ] Static data preprocessing script
- [ ] Dynamic data preprocessing script
## Takeaways


@@ -0,0 +1,90 @@
{
"metadata": {
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.0-final"
},
"orig_nbformat": 2,
"kernelspec": {
"name": "python3",
"display_name": "Python 3.8.0 64-bit",
"metadata": {
"interpreter": {
"hash": "38740d3277777e2cd7c6c2cc9d8addf5118fdf3f82b1b39231fd12aeac8aee8b"
}
}
}
},
"nbformat": 4,
"nbformat_minor": 2,
"cells": [
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"word\n2\n4\n"
]
}
],
"source": [
"import csv\n",
"data = csv.reader(open('test.csv'))\n",
"\n",
"# 智能按行读取Pythonlist\n",
"for row in data:\n",
" print(row[1])"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"[[1. 2.]\n [3. 4.]]\n"
]
}
],
"source": [
"import numpy as np \n",
"data = np.loadtxt('test.csv',delimiter=',',dtype=float,skiprows=1)\n",
"print(data)"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
" hello word\n0 1.0 2.0\n1 3.0 4.0\n"
]
}
],
"source": [
"import pandas as pd \n",
"data = pd.read_csv('test.csv',delimiter=',',skiprows=0,dtype=float)\n",
"print(data)"
]
}
]
}


@@ -0,0 +1,3 @@
hello,word
1,2
3,4