Nvidia Rapids

Installing Rapids

Installation is straightforward: visit the official website and pick the install method that fits your setup. Link: https://rapids.ai/start.html#get-rapids

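One thing to watch out for: when installing with conda, Rapids has to pull packages from nvidia's own channel, which can conflict with a mainland-China mirror configuration and leave conda unable to connect. In that case, add the nvidia channel to .condarc as well. My .condarc looks like this: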

channels:
  - defaults
show_channel_urls: true
channel_alias: https://mirrors.tuna.tsinghua.edu.cn/anaconda
default_channels:
  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/r
  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/pro
  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/msys2
custom_channels:
  conda-forge: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  msys2: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  bioconda: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  menpo: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  pytorch: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  simpleitk: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  statiskit: https://conda.anaconda.org/
  rapidsai: https://conda.anaconda.org/
  nvidia: https://conda.anaconda.org/
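
To see why the last three entries matter, here is a minimal sketch of how a channel name is turned into a URL under this configuration. It is a simplified model, not conda's actual implementation: the assumption is that a name listed in custom_channels resolves to that base URL plus the channel name, and any other name falls back to channel_alias. The helper resolve_channel is hypothetical and only covers this case.

```python
def resolve_channel(name, custom_channels, channel_alias):
    """Return the base URL for a channel name (simplified model, see assumptions above)."""
    # Names listed in custom_channels use their own base URL;
    # everything else falls back to channel_alias.
    base = custom_channels.get(name, channel_alias)
    return base.rstrip("/") + "/" + name


# A subset of the .condarc shown above.
custom_channels = {
    "conda-forge": "https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud",
    "rapidsai": "https://conda.anaconda.org/",
    "nvidia": "https://conda.anaconda.org/",
}
channel_alias = "https://mirrors.tuna.tsinghua.edu.cn/anaconda"

for name in ("conda-forge", "rapidsai", "nvidia"):
    print(name, "->", resolve_channel(name, custom_channels, channel_alias))
```

Under this model, without the rapidsai and nvidia entries both names would be looked up under the mirror's channel_alias instead of conda.anaconda.org.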

Troubleshooting notes

LinkerError: [222] Call to cuLinkAddData results in UNKNOWN_CUDA_ERROR

After installing Rapids, running a simple replace operation failed with an error!

Code:

import cudf

df = cudf.DataFrame({'a': [1.0, 2.0, 3.0], 'b': [0.0, 2.0, 4.0]})
df = df.replace([0.0, 1.0], 99.0)

Error message:

...
  File "/mnt/nfsroot/lixf/miniconda3/envs/rapids-21.10/lib/python3.8/site-packages/numba/cuda/compiler.py", line 590, in bind
    self._func.get()
  File "/mnt/nfsroot/lixf/miniconda3/envs/rapids-21.10/lib/python3.8/site-packages/numba/cuda/compiler.py", line 441, in get
    linker.add_ptx(ptx)
  File "/mnt/nfsroot/lixf/miniconda3/envs/rapids-21.10/lib/python3.8/site-packages/numba/cuda/cudadrv/driver.py", line 2163, in add_ptx
    raise LinkerError("%s\n%s" % (e, self.error_log))
numba.cuda.cudadrv.driver.LinkerError: [222] Call to cuLinkAddData results in UNKNOWN_CUDA_ERROR
ptxas application ptx input, line 9; fatal   : Unsupported .version 7.4; current version is '7.0'
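
The last line of the trace is the useful one: the PTX being linked declares version 7.4, but the component linking it only accepts up to 7.0. PTX ISA 7.0 through 7.4 were introduced with CUDA 11.0 through 11.4 respectively, so the message can be read as a CUDA version mismatch. A small sketch that extracts this from the message (the helper name explain_ptx_error and the lookup table are my own, and the table covers only the CUDA 11.0 to 11.4 range):

```python
import re

# PTX ISA version -> CUDA release that introduced it (11.0 to 11.4 only).
PTX_TO_CUDA = {"7.0": "11.0", "7.1": "11.1", "7.2": "11.2", "7.3": "11.3", "7.4": "11.4"}


def explain_ptx_error(message):
    """Pull the requested and supported PTX versions out of a ptxas error line."""
    m = re.search(
        r"Unsupported \.version (\d+\.\d+); current version is '(\d+\.\d+)'",
        message,
    )
    if m is None:
        return None  # not the error this helper knows about
    requested, supported = m.groups()
    return {
        "requested_ptx": requested,
        "supported_ptx": supported,
        # None when the PTX version is outside the small table above.
        "requested_cuda": PTX_TO_CUDA.get(requested),
        "supported_cuda": PTX_TO_CUDA.get(supported),
    }


line = ("ptxas application ptx input, line 9; fatal   : "
        "Unsupported .version 7.4; current version is '7.0'")
print(explain_ptx_error(line))
```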

Diagnosis:

$ conda list |grep "^cuda"

cudatoolkit               11.0.221             h6bb024c_0    defaults
cudatoolkit-dev           11.4.0           py38h497a2fe_2    conda-forge

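The versions of cudatoolkit-dev and cudatoolkit do not match (11.4.0 vs. 11.0.221). This lines up with the error message, since PTX 7.4 is the version introduced with CUDA 11.4 and PTX 7.0 the one introduced with CUDA 11.0. The mismatch was probably caused by other Python packages I had installed in the same conda environment.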
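
The same check can be scripted. The sketch below parses text in the format printed by conda list and flags cuda* packages whose major.minor versions differ. The function names are hypothetical, and comparing only major.minor is an assumption on my part about what counts as a mismatch here.

```python
def cuda_package_versions(conda_list_output):
    """Map each package whose name starts with 'cuda' to its version string."""
    versions = {}
    for line in conda_list_output.splitlines():
        parts = line.split()
        # Columns are: name, version, build, channel.
        if len(parts) >= 2 and parts[0].startswith("cuda"):
            versions[parts[0]] = parts[1]
    return versions


def major_minor(version):
    """'11.0.221' -> (11, 0)"""
    return tuple(int(p) for p in version.split(".")[:2])


def mismatched(versions):
    """True if the packages do not all share the same major.minor version."""
    return len({major_minor(v) for v in versions.values()}) > 1


# The output from the environment above.
sample = """\
cudatoolkit               11.0.221             h6bb024c_0    defaults
cudatoolkit-dev           11.4.0           py38h497a2fe_2    conda-forge
"""
versions = cuda_package_versions(sample)
print(versions, "mismatch:", mismatched(versions))
```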

Fix:

$ conda search -c defaults -c conda-forge cudatoolkit-dev 
cudatoolkit-dev             10.1.243      h516909a_3  conda-forge         
cudatoolkit-dev               11.0.3      h7f98852_0  conda-forge         
cudatoolkit-dev               11.0.3  py36h8f6f2f9_2  conda-forge         
cudatoolkit-dev               11.0.3  py37h5e8e339_2  conda-forge       
cudatoolkit-dev               11.0.3  py38h7f98852_1  conda-forge         
...

$ conda install -c defaults -c conda-forge cudatoolkit-dev=11.0.3=py38h7f98852_1
...

Re-running the Python code above then succeeds.