# Theano 在 Windows 上的配置
注意:不建议在 `windows` 进行 `theano` 的配置。
务必确认你的显卡支持 `CUDA`。
我个人的电脑搭载的是 `Windows 10 x64` 系统,显卡是 `Nvidia GeForce GTX 850M`。
## 安装 theano
首先是用 `anaconda` 安装 `theano`:
```py
conda install mingw libpython
pip install theano
```
## 安装 VS 和 CUDA
按顺序安装这两个软件:
* 安装 Visual Studio 2010/2012/2013
* 安装 对应的 x64 或 x86 CUDA
Cuda 的版本与电脑的显卡兼容。
我安装的是 Visual Studio 2012 和 CUDA v7.0v。
## 配置环境变量
`CUDA` 会自动帮你添加一个 `CUDA_PATH` 环境变量(环境变量在 控制面板->系统与安全->系统->高级系统设置 中),表示你的 `CUDA` 安装位置,我的电脑上为:
* `CUDA_PATH`
* `C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.0`
我们配置两个相关变量:
* `CUDA_BIN_PATH`
* `%CUDA_PATH%\bin`
* `CUDA_LIB_PATH`
* `%CUDA_PATH%\lib\Win32`
接下来在 `Path` 环境变量的后面加上:
* `Minicoda` 中关于 `mingw` 的项:
* `C:\Miniconda\MinGW\bin;`
* `C:\Miniconda\MinGW\x86_64-w64-mingw32\lib;`
* `VS` 中的 `cl` 编译命令:
* `C:\Program Files (x86)\Microsoft Visual Studio 11.0\VC\bin;`
* `C:\Program Files (x86)\Microsoft Visual Studio 11.0\Common7\IDE;`
生成测试文件:
In [1]:
```py
%%file test_theano.py
from theano import config
print 'using device:', config.device
```
```py
Writing test_theano.py
```
我们可以通过临时设置环境变量 `THEANO_FLAGS` 来改变 `theano` 的运行模式,在 linux 下,临时环境变量直接用:
```py
THEANO_FLAGS=xxx
```
就可以完成,设置完成之后,该环境变量只在当前的命令窗口有效,你可以这样运行你的代码:
```py
THEANO_FLAGS=xxx python .py
```
在 `Windows` 下,需要使用 `set` 命令来临时设置环境变量,所以运行方式为:
```py
set THEANO_FLAGS=xxx && python .py
```
In [2]:
```py
import sys
if sys.platform == 'win32':
!set THEANO_FLAGS=mode=FAST_RUN,device=cpu,floatX=float32 && python test_theano.py
else:
!THEANO_FLAGS=mode=FAST_RUN,device=cpu,floatX=float32 python test_theano.py
```
```py
using device: cpu
```
In [3]:
```py
if sys.platform == 'win32':
!set THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 && python test_theano.py
else:
!THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 python test_theano.py
```
```py
Using gpu device 0: Tesla C2075 (CNMeM is disabled)
using device: gpu
```
测试 `CPU` 和 `GPU` 的差异:
In [4]:
```py
%%file test_theano.py
from theano import function, config, shared, sandbox
import theano.tensor as T
import numpy
import time
vlen = 10 * 30 * 768 # 10 x #cores x # threads per core
iters = 1000
rng = numpy.random.RandomState(22)
x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
f = function([], T.exp(x))
t0 = time.time()
for i in xrange(iters):
r = f()
t1 = time.time()
print("Looping %d times took %f seconds" % (iters, t1 - t0))
print("Result is %s" % (r,))
if numpy.any([isinstance(x.op, T.Elemwise) for x in f.maker.fgraph.toposort()]):
print('Used the cpu')
else:
print('Used the gpu')
```
```py
Overwriting test_theano.py
```
In [5]:
```py
if sys.platform == 'win32':
!set THEANO_FLAGS=mode=FAST_RUN,device=cpu,floatX=float32 && python test_theano.py
else:
!THEANO_FLAGS=mode=FAST_RUN,device=cpu,floatX=float32 python test_theano.py
```
```py
Looping 1000 times took 3.498123 seconds
Result is [ 1.23178029 1.61879337 1.52278066 ..., 2.20771813 2.29967761
1.62323284]
Used the cpu
```
In [6]:
```py
if sys.platform == 'win32':
!set THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 && python test_theano.py
else:
!THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 python test_theano.py
```
```py
Using gpu device 0: Tesla C2075 (CNMeM is disabled)
Looping 1000 times took 0.847006 seconds
Result is [ 1.23178029 1.61879349 1.52278066 ..., 2.20771813 2.29967761
1.62323296]
Used the gpu
```
可以看到 `GPU` 明显要比 `CPU` 快。
使用 `GPU` 模式的 `T.exp(x)` 可以获得更快的加速效果:
In [7]:
```py
%%file test_theano.py
from theano import function, config, shared, sandbox
import theano.sandbox.cuda.basic_ops
import theano.tensor as T
import numpy
import time
vlen = 10 * 30 * 768 # 10 x #cores x # threads per core
iters = 1000
rng = numpy.random.RandomState(22)
x = shared(numpy.asarray(rng.rand(vlen), 'float32'))
f = function([], sandbox.cuda.basic_ops.gpu_from_host(T.exp(x)))
t0 = time.time()
for i in xrange(iters):
r = f()
t1 = time.time()
print("Looping %d times took %f seconds" % (iters, t1 - t0))
print("Result is %s" % (r,))
print("Numpy result is %s" % (numpy.asarray(r),))
if numpy.any([isinstance(x.op, T.Elemwise) for x in f.maker.fgraph.toposort()]):
print('Used the cpu')
else:
print('Used the gpu')
```
```py
Overwriting test_theano.py
```
In [8]:
```py
if sys.platform == 'win32':
!set THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 && python test_theano.py
else:
!THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 python test_theano.py
```
```py
Using gpu device 0: Tesla C2075 (CNMeM is disabled)
Looping 1000 times took 0.318359 seconds
Result is
Numpy result is [ 1.23178029 1.61879349 1.52278066 ..., 2.20771813 2.29967761
1.62323296]
Used the gpu
```
In [9]:
```py
!rm test_theano.py
```
## 配置 .theanorc.txt
我们可以在个人文件夹下配置 .theanorc.txt 文件来省去每次都使用环境变量设置的麻烦:
例如我现在的 .theanorc.txt 配置为:
```py
[global]
device = gpu
floatX = float32
[nvcc]
fastmath = True
flags = -LC:\Miniconda\libs
compiler_bindir=C:\Program Files (x86)\Microsoft Visual Studio 11.0\VC\bin
[gcc]
cxxflags = -LC:\Miniconda\MinGW
```
具体这些配置有什么作用之后可以查看官网上的教程。