Tuesday, March 7, 2017

Installing CUDA 8.0 on CentOS 7


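Today I found an Nvidia graphics card at the office, a GeForce GTX 680. According to Nvidia's official site, this card supports CUDA, which means it can be used for machine learning training. It felt like finding treasure!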
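
Once the card is installed, whether the system actually sees it can also be checked from the command line. The sketch below assumes the lspci utility (from the pciutils package) is available. nvidia_gpus is a hypothetical helper name, and filtering on "VGA compatible controller" or "3D controller" is an assumption about how the card is labelled in lspci output.

```shell
# Read lspci output on stdin and print only the lines that describe
# NVIDIA display devices (the card's HDMI audio function is filtered out).
# nvidia_gpus is a hypothetical helper name used only for this sketch.
nvidia_gpus() {
    grep -i 'nvidia' | grep -Ei 'vga compatible controller|3d controller'
}

# Usage on a real machine:
#   lspci | nvidia_gpus
```

If nothing is printed, the card is probably not seated properly or not detected by the system.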

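I installed it in a PC, replacing the original AMD graphics card, and the machine booted normally. The next job was to install the CUDA driver and the cuDNN library. The steps are as follows: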
  1. sudo yum update
  2. sudo yum install gcc gcc-c++
  3. sudo yum install kernel-devel-$(uname -r) kernel-headers-$(uname -r)
  4. Download the CUDA Toolkit 8.0 rpm (local) installer from https://developer.nvidia.com/cuda-downloads
  5. sudo rpm -i cuda-repo-rhel7-8-0-local-ga2-8.0.61-1.x86_64.rpm
  6. sudo yum clean all
  7. sudo yum install cuda
  8. reboot
  9. sudo vi /etc/profile (append the two lines below, then log out and back in, or run: source /etc/profile)
    export PATH=/usr/local/cuda-8.0/bin:$PATH
    export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64:/usr/local/cuda-8.0/extras/CUPTI/lib64:$LD_LIBRARY_PATH
    
  10. mkdir ~/cuda-test
  11. cd ~/cuda-test
  12. cuda-install-samples-8.0.sh .
  13. cd NVIDIA_CUDA-8.0_Samples/1_Utilities/deviceQuery
  14. make
  15. ./deviceQuery
    CUDA Device Query (Runtime API) version (CUDART static linking)
    
    Detected 1 CUDA Capable device(s)
    
    Device 0: "GeForce GTX 680"
      CUDA Driver Version / Runtime Version          8.0 / 8.0
      CUDA Capability Major/Minor version number:    3.0
      Total amount of global memory:                 1998 MBytes (2095382528 bytes)
      ( 8) Multiprocessors, (192) CUDA Cores/MP:     1536 CUDA Cores
      GPU Max Clock rate:                            1058 MHz (1.06 GHz)
      Memory Clock rate:                             3004 Mhz
      Memory Bus Width:                              256-bit
      L2 Cache Size:                                 524288 bytes
      Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
      Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
      Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
      Total amount of constant memory:               65536 bytes
      Total amount of shared memory per block:       49152 bytes
      Total number of registers available per block: 65536
      Warp size:                                     32
      Maximum number of threads per multiprocessor:  2048
      Maximum number of threads per block:           1024
      Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
      Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
      Maximum memory pitch:                          2147483647 bytes
      Texture alignment:                             512 bytes
      Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
      Run time limit on kernels:                     No
      Integrated GPU sharing Host Memory:            No
      Support host page-locked memory mapping:       Yes
      Alignment requirement for Surfaces:            Yes
      Device has ECC support:                        Disabled
      Device supports Unified Addressing (UVA):      Yes
      Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
      Compute Mode:
      < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
    
    deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 8.0, CUDA Runtime Version = 8.0, NumDevs = 1, Device0 = GeForce GTX 680
    Result = PASS
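
The checks in the steps above can also be scripted, which helps when repeating the install on another machine. The following is only a minimal sketch: path_contains, device_query_passed, and compute_capability are hypothetical helper names (not part of the CUDA toolkit), and the parsing assumes deviceQuery prints its output in the same format as shown above.

```shell
# Post-install sanity checks for the CUDA 8.0 setup described above.
# All function names are hypothetical helpers, not CUDA toolkit commands.

# Return 0 if a colon-separated list (such as $PATH) contains the directory.
path_contains() {
    local list="$1" dir="$2"
    case ":${list}:" in
        *":${dir}:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Read deviceQuery output on stdin; return 0 only if it reports "Result = PASS".
device_query_passed() {
    grep -q '^[[:space:]]*Result = PASS[[:space:]]*$'
}

# Read deviceQuery output on stdin and print the compute capability, e.g. 3.0.
compute_capability() {
    sed -n 's|^[[:space:]]*CUDA Capability Major/Minor version number:[[:space:]]*\([0-9][0-9]*\.[0-9][0-9]*\).*$|\1|p' | head -n 1
}

# Usage on a real machine:
#   path_contains "$PATH" /usr/local/cuda-8.0/bin || echo "CUDA bin directory is not on PATH"
#   path_contains "$LD_LIBRARY_PATH" /usr/local/cuda-8.0/lib64 || echo "CUDA lib64 is not on LD_LIBRARY_PATH"
#   ./deviceQuery | device_query_passed && echo "GPU is usable"
#   ./deviceQuery | compute_capability
```

The compute capability is worth recording because machine learning frameworks typically state a minimum supported value, so that number, rather than the card's name, decides whether the GPU can be used with a given framework.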