# Configuring Theano on Windows

Note: configuring `theano` on `Windows` is not recommended.

Make sure your graphics card supports `CUDA`.

My own machine runs `Windows 10 x64` with an `Nvidia GeForce GTX 850M` graphics card.

## Installing theano

First, install `theano` using `anaconda`:

```py
conda install mingw libpython
pip install theano
```

## Installing VS and CUDA

Install these two packages in order:

* Install Visual Studio 2010/2012/2013
* Install the matching x64 or x86 CUDA

Make sure the CUDA version is compatible with your graphics card.

I installed Visual Studio 2012 and CUDA v7.0.

## Configuring environment variables

The `CUDA` installer automatically adds a `CUDA_PATH` environment variable (environment variables are under Control Panel -> System and Security -> System -> Advanced system settings) that points to your `CUDA` install location. On my machine it is:

* `CUDA_PATH`
    * `C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.0`

Add two related variables:

* `CUDA_BIN_PATH`
    * `%CUDA_PATH%\bin`
* `CUDA_LIB_PATH`
    * `%CUDA_PATH%\lib\Win32`

Then append the following to the `Path` environment variable:

* The `mingw` entries from `Miniconda`:
    * `C:\Miniconda\MinGW\bin;`
    * `C:\Miniconda\MinGW\x86_64-w64-mingw32\lib;`
* The `cl` compiler from `VS`:
    * `C:\Program Files (x86)\Microsoft Visual Studio 11.0\VC\bin;`
    * `C:\Program Files (x86)\Microsoft Visual Studio 11.0\Common7\IDE;`

Generate a test file:

In [1]:

```py
%%file test_theano.py
from theano import config
# Formatted print works the same under Python 2 and Python 3
print('using device: %s' % config.device)
```

```py
Writing test_theano.py
```

We can change how `theano` runs by temporarily setting the `THEANO_FLAGS` environment variable. On Linux, a temporary environment variable is set simply with:

```py
THEANO_FLAGS=xxx
```

Placed in front of a command, the assignment applies only to that command, so you can run your code like this:

```py
THEANO_FLAGS=xxx python your_script.py
```

On `Windows`, you need the `set` command to set the variable temporarily (it lasts for the current command prompt window), so the command becomes:

```py
set THEANO_FLAGS=xxx && python your_script.py
```

In [2]:

```py
import sys

if sys.platform == 'win32':
    !set THEANO_FLAGS=mode=FAST_RUN,device=cpu,floatX=float32 && python test_theano.py
else:
    !THEANO_FLAGS=mode=FAST_RUN,device=cpu,floatX=float32 python test_theano.py
```

```py
using device: cpu
```

In [3]:

```py
if sys.platform == 'win32':
    !set THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 && python test_theano.py
else:
    !THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 python test_theano.py
```

```py
Using gpu device 0: Tesla C2075 (CNMeM is disabled)
using device: gpu
```

Test the difference between `CPU` and `GPU`:

In [4]:

```py
%%file test_theano.py
from theano import function, config, shared, sandbox
import theano.tensor as T
import numpy
import time

vlen = 10 * 30 * 768  # 10 x #cores x # threads per core
iters = 1000

rng = numpy.random.RandomState(22)
x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
f = function([], T.exp(x))

# Time repeated evaluation of the compiled function
t0 = time.time()
for i in range(iters):
    r = f()
t1 = time.time()
print("Looping %d times took %f seconds" % (iters, t1 - t0))
print("Result is %s" % (r,))

# A plain Elemwise op in the compiled graph means the computation stayed on the CPU
if numpy.any([isinstance(node.op, T.Elemwise) for node in f.maker.fgraph.toposort()]):
    print('Used the cpu')
else:
    print('Used the gpu')
```

```py
Overwriting test_theano.py
```

In [5]:

```py
if sys.platform == 'win32':
    !set THEANO_FLAGS=mode=FAST_RUN,device=cpu,floatX=float32 && python test_theano.py
else:
    !THEANO_FLAGS=mode=FAST_RUN,device=cpu,floatX=float32 python test_theano.py
```

```py
Looping 1000 times took 3.498123 seconds
Result is [ 1.23178029  1.61879337  1.52278066 ...,  2.20771813  2.29967761
  1.62323284]
Used the cpu
```

In [6]:

```py
if sys.platform == 'win32':
    !set THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 && python test_theano.py
else:
    !THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 python test_theano.py
```

```py
Using gpu device 0: Tesla C2075 (CNMeM is disabled)
Looping 1000 times took 0.847006 seconds
Result is [ 1.23178029  1.61879349  1.52278066 ...,  2.20771813  2.29967761
  1.62323296]
Used the gpu
```

The `GPU` run is clearly faster than the `CPU` run: about 0.85 seconds versus about 3.5 seconds for 1000 iterations.

Keeping the result of `T.exp(x)` on the `GPU` with `gpu_from_host`, instead of copying it back to the host on every call, gives a further speedup:

In [7]:

```py
%%file test_theano.py
from theano import function, config, shared, sandbox
import theano.sandbox.cuda.basic_ops
import theano.tensor as T
import numpy
import time

vlen = 10 * 30 * 768  # 10 x #cores x # threads per core
iters = 1000

rng = numpy.random.RandomState(22)
x = shared(numpy.asarray(rng.rand(vlen), 'float32'))
# gpu_from_host makes the function return a GPU array rather than a numpy array
f = function([], sandbox.cuda.basic_ops.gpu_from_host(T.exp(x)))

t0 = time.time()
for i in range(iters):
    r = f()
t1 = time.time()
print("Looping %d times took %f seconds" % (iters, t1 - t0))
print("Result is %s" % (r,))
# numpy.asarray copies the GPU result back to host memory
print("Numpy result is %s" % (numpy.asarray(r),))

if numpy.any([isinstance(node.op, T.Elemwise) for node in f.maker.fgraph.toposort()]):
    print('Used the cpu')
else:
    print('Used the gpu')
```

```py
Overwriting test_theano.py
```

In [8]:

```py
if sys.platform == 'win32':
    !set THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 && python test_theano.py
else:
    !THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 python test_theano.py
```

```py
Using gpu device 0: Tesla C2075 (CNMeM is disabled)
Looping 1000 times took 0.318359 seconds
Result is 
Numpy result is [ 1.23178029  1.61879349  1.52278066 ...,  2.20771813  2.29967761
  1.62323296]
Used the gpu
```

In [9]:

```py
!rm test_theano.py
```

## Configuring .theanorc.txt

To avoid setting the environment variable on every run, we can put the settings in a `.theanorc.txt` file in the user's home folder.

For example, my current `.theanorc.txt` is:

```py
[global]
device = gpu
floatX = float32

[nvcc]
fastmath = True
flags = -LC:\Miniconda\libs
compiler_bindir=C:\Program Files (x86)\Microsoft Visual Studio 11.0\VC\bin

[gcc]
cxxflags = -LC:\Miniconda\MinGW
```

See the tutorials on the official Theano website for what each of these options does.
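
For one-off runs that should not touch `.theanorc.txt`, the platform difference between `set ... &&` on Windows and the `VAR=... command` form on Linux can also be sidestepped by launching the script from Python with a modified environment. The following is only a minimal sketch using the standard library: the helper names `build_theano_flags` and `run_with_flags` are hypothetical and not part of `theano`, and the only thing taken from the text above is the `THEANO_FLAGS` variable and its comma-separated `key=value` format.

```python
import os
import subprocess
import sys


def build_theano_flags(**options):
    """Join options into a THEANO_FLAGS string such as 'device=cpu,floatX=float32'.

    Hypothetical helper: keys are sorted so the result is deterministic.
    """
    return ",".join("%s=%s" % (key, value) for key, value in sorted(options.items()))


def run_with_flags(script, **options):
    """Run `script` in a child Python process with THEANO_FLAGS set for that child only.

    Hypothetical helper: the parent process environment is left unchanged.
    Returns the child's exit code.
    """
    env = dict(os.environ)  # copy, so the current process is not modified
    env["THEANO_FLAGS"] = build_theano_flags(**options)
    return subprocess.call([sys.executable, script], env=env)


# Example call (assumes test_theano.py exists in the current directory):
# run_with_flags("test_theano.py", mode="FAST_RUN", device="cpu", floatX="float32")
```

Because the variable is passed through the `env` argument rather than a shell, the same call works unchanged on Windows and Linux, and nothing is left set in the calling session afterwards.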