Nvidia Rapids
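Installing Rapids
Installation is straightforward: go to the official site at https://rapids.ai/start.html#get-rapids and choose the installation method that fits your setup.
One thing to watch out for: when installing with conda, Rapids pulls packages from NVIDIA's own channels, which can conflict with a domestic mirror configuration and leave conda unable to connect. In that case, add the nvidia channel (and, as in my file, rapidsai) to .condarc so that they point at https://conda.anaconda.org/ instead of the mirror. My .condarc looks like this: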
channels:
  - defaults
show_channel_urls: true
channel_alias: https://mirrors.tuna.tsinghua.edu.cn/anaconda
default_channels:
  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/r
  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/pro
  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/msys2
custom_channels:
  conda-forge: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  msys2: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  bioconda: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  menpo: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  pytorch: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  simpleitk: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  statiskit: https://conda.anaconda.org/
  rapidsai: https://conda.anaconda.org/
  nvidia: https://conda.anaconda.org/
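A quick way to confirm that a .condarc really lists the channels Rapids needs is to scan its custom_channels block. The sketch below is only an illustration: the helper names custom_channels and missing_channels are made up for this post, and the parser assumes each entry is an indented "name: url" line rather than handling full YAML.

```python
def custom_channels(condarc_text):
    """Return the names listed under `custom_channels:` in a .condarc text.

    Deliberately simple (hypothetical helper): it assumes every entry is an
    indented `name: url` line and does not implement the full YAML syntax.
    """
    names = []
    in_block = False
    for line in condarc_text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and comments
        if not line[0].isspace():
            # An unindented line is a top-level key; it opens or closes the block.
            in_block = line.split(":", 1)[0].strip() == "custom_channels"
            continue
        if in_block and ":" in line:
            names.append(line.split(":", 1)[0].strip())
    return names


def missing_channels(condarc_text, required=("rapidsai", "nvidia")):
    """Return the required channel names that are absent from custom_channels."""
    present = set(custom_channels(condarc_text))
    return [name for name in required if name not in present]


# Shortened sample config: the nvidia entry has been left out on purpose.
sample = """\
channels:
  - defaults
custom_channels:
  conda-forge: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  rapidsai: https://conda.anaconda.org/
"""

print(missing_channels(sample))  # ['nvidia']
```

To check a real file, read it with open() and pass the text to missing_channels; an empty list means both channels are present.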
Troubleshooting log
LinkerError: [222] Call to cuLinkAddData results in UNKNOWN_CUDA_ERROR
After installing Rapids, a simple replace operation unexpectedly failed with an error!
Code:
import cudf
df = cudf.DataFrame({'a': [1.0, 2.0, 3.0], 'b': [0.0, 2.0, 4.0]})
df = df.replace([0.0, 1.0], 99.0)
Error message:
...
File "/mnt/nfsroot/lixf/miniconda3/envs/rapids-21.10/lib/python3.8/site-packages/numba/cuda/compiler.py", line 590, in bind
self._func.get()
File "/mnt/nfsroot/lixf/miniconda3/envs/rapids-21.10/lib/python3.8/site-packages/numba/cuda/compiler.py", line 441, in get
linker.add_ptx(ptx)
File "/mnt/nfsroot/lixf/miniconda3/envs/rapids-21.10/lib/python3.8/site-packages/numba/cuda/cudadrv/driver.py", line 2163, in add_ptx
raise LinkerError("%s\n%s" % (e, self.error_log))
numba.cuda.cudadrv.driver.LinkerError: [222] Call to cuLinkAddData results in UNKNOWN_CUDA_ERROR
ptxas application ptx input, line 9; fatal : Unsupported .version 7.4; current version is '7.0'
Diagnosis:
$ conda list |grep "^cuda"
cudatoolkit 11.0.221 h6bb024c_0 defaults
cudatoolkit-dev 11.4.0 py38h497a2fe_2 conda-forge
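cudatoolkit-dev and cudatoolkit turn out to be at different versions (11.4.0 vs 11.0.221), probably because some other Python packages installed in my conda environment pulled in a different version. The mismatch fits the error message: PTX 7.4 is the version that shipped with CUDA 11.4, while PTX 7.0 corresponds to CUDA 11.0.
Fix: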
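The same check can be scripted. The sketch below takes the ptxas line from the traceback and the conda list output shown above, maps the PTX versions to CUDA releases, and flags packages whose major.minor versions disagree. The helper names ptx_versions and cuda_package_versions are invented for this example, and the lookup table only covers CUDA 11.0 through 11.4.

```python
import re

# PTX ISA versions and the CUDA 11.x releases they shipped with (partial table).
PTX_TO_CUDA = {"7.0": "11.0", "7.1": "11.1", "7.2": "11.2", "7.3": "11.3", "7.4": "11.4"}


def ptx_versions(error_text):
    """Extract (requested, supported) PTX versions from a ptxas error line."""
    match = re.search(
        r"Unsupported \.version (\d+\.\d+); current version is '(\d+\.\d+)'",
        error_text,
    )
    return match.groups() if match else None


def cuda_package_versions(conda_list_text):
    """Map each package whose name starts with 'cuda' to its major.minor version."""
    versions = {}
    for line in conda_list_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0].startswith("cuda"):
            versions[fields[0]] = ".".join(fields[1].split(".")[:2])
    return versions


error_line = (
    "ptxas application ptx input, line 9; fatal : "
    "Unsupported .version 7.4; current version is '7.0'"
)
conda_list = """\
cudatoolkit 11.0.221 h6bb024c_0 defaults
cudatoolkit-dev 11.4.0 py38h497a2fe_2 conda-forge
"""

requested, supported = ptx_versions(error_line)
print(PTX_TO_CUDA[requested], PTX_TO_CUDA[supported])  # 11.4 11.0

versions = cuda_package_versions(conda_list)
print(versions)  # {'cudatoolkit': '11.0', 'cudatoolkit-dev': '11.4'}
print(len(set(versions.values())) > 1)  # True: the two packages disagree
```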
$ conda search -c defaults -c conda-forge cudatoolkit-dev
cudatoolkit-dev 10.1.243 h516909a_3 conda-forge
cudatoolkit-dev 11.0.3 h7f98852_0 conda-forge
cudatoolkit-dev 11.0.3 py36h8f6f2f9_2 conda-forge
cudatoolkit-dev 11.0.3 py37h5e8e339_2 conda-forge
cudatoolkit-dev 11.0.3 py38h7f98852_1 conda-forge
...
$ conda install -c defaults -c conda-forge cudatoolkit-dev=11.0.3=py38h7f98852_1
...
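Picking the build to pin can also be done programmatically. The following is a rough sketch, not part of conda itself: pick_build is a hypothetical helper that scans conda search output for the package, keeps builds whose version belongs to the wanted CUDA series and whose build string starts with the wanted Python tag, and returns the spec with the highest version and build number. It assumes purely numeric version components and a build string ending in "_<number>".

```python
def pick_build(search_text, package, cuda_series, py_tag):
    """Return 'package=version=build' for the newest matching build, or None.

    Hypothetical helper: expects whitespace-separated `name version build channel`
    lines, as printed by `conda search`.
    """
    candidates = []
    for line in search_text.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[0] != package:
            continue  # skips headers, "..." lines, and other packages
        version, build = fields[1], fields[2]
        if not version.startswith(cuda_series + "."):
            continue
        if not build.startswith(py_tag):
            continue
        # The build number is the part after the last underscore.
        build_number = int(build.rsplit("_", 1)[-1])
        version_key = tuple(int(part) for part in version.split("."))
        candidates.append((version_key, build_number, version, build))
    if not candidates:
        return None
    _, _, version, build = max(candidates)
    return f"{package}={version}={build}"


search_output = """\
cudatoolkit-dev 10.1.243 h516909a_3 conda-forge
cudatoolkit-dev 11.0.3 h7f98852_0 conda-forge
cudatoolkit-dev 11.0.3 py36h8f6f2f9_2 conda-forge
cudatoolkit-dev 11.0.3 py37h5e8e339_2 conda-forge
cudatoolkit-dev 11.0.3 py38h7f98852_1 conda-forge
"""

print(pick_build(search_output, "cudatoolkit-dev", "11.0", "py38"))
# cudatoolkit-dev=11.0.3=py38h7f98852_1
```

The returned string has the same form as the spec passed to conda install above.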
After that, running the Python code above again succeeds.