Thank you for that.
I was able to set the folder mapping. No issue here. The problem I have is with installing nvidia-docker.
Specifically, I am customizing planet with a Dockerfile and trying to install nvidia-docker. However, nvidia-docker requires docker-ce as a dependency, and it seems the planet image has an unconventional Docker install: docker does not show up when running apt list --installed.
The result is that when I try to run apt-get install -y nvidia-docker2 I get:
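One way to confirm that the docker binary was not installed through apt is to ask dpkg which package, if any, owns it. The following is a minimal sketch; the helper name describe_binary is hypothetical, and it assumes a Debian-based image where dpkg is available.

```shell
#!/bin/sh
# Hypothetical helper: report whether a binary on PATH is owned by a dpkg package.
describe_binary() {
    name="$1"
    # command -v fails when the name cannot be resolved on PATH
    path="$(command -v "$name" 2>/dev/null)" || { echo "$name not found"; return 0; }
    # dpkg -S prints "<package>: <path>" when a package owns the file
    if command -v dpkg >/dev/null 2>&1 && owner="$(dpkg -S "$path" 2>/dev/null)"; then
        echo "$path owned by ${owner%%:*}"
    else
        echo "$path not owned by any package"
    fi
}

describe_binary docker
```

If the binary is reported as not owned by any package, apt has no record of it, which would be consistent with the dependency error on nvidia-docker2.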
The following packages have unmet dependencies:
 nvidia-docker2 : Depends: docker-ce (>= 18.06.0~ce~3-0~debian) but it is not installable or
                           docker-ee (>= 18.06.0~ce~3-0~debian) but it is not installable or
                           docker.io (>= 18.06.0) but it is not installable
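"Not installable" usually means that none of the configured apt sources offers a candidate version for the package. A rough way to check this for each alternative is to inspect the output of apt-cache policy. This is only a sketch: the helper has_candidate is hypothetical, and it assumes the usual "Candidate:" line in the policy output.

```shell
#!/bin/sh
# Hypothetical helper: reads `apt-cache policy <pkg>` output on stdin and
# succeeds only if a candidate version other than "(none)" is listed.
has_candidate() {
    awk '/^[[:space:]]*Candidate:/ { found = ($2 != "(none)") }
         END { exit (found ? 0 : 1) }'
}

# Check the three alternatives named in the dependency error.
for pkg in docker-ce docker-ee docker.io; do
    if command -v apt-cache >/dev/null 2>&1 &&
       apt-cache policy "$pkg" 2>/dev/null | has_candidate; then
        echo "$pkg: candidate available"
    else
        echo "$pkg: no candidate"
    fi
done
```

If all three report no candidate, the apt sources in the image simply do not provide any of them, regardless of which Docker binary is already present.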
I have tried various workarounds, none of which worked:
(1) Looked up the Docker version bundled with planet and ran apt-get install -y nvidia-docker2=2.0.3+docker18.09.5-3. This did not work.
(2) Installed docker 18 or 19 “over” the current planet docker before installing nvidia-docker. The installation of nvidia-docker then succeeds, but tele build runai/app.yaml --overwrite --debug then crashes:
ERRO Command failed. error:[
ERROR REPORT:
Original Error: syscall.Errno operation not permitted
Stack Trace:
/gopath/src/github.com/gravitational/gravity/lib/app/docker/runtime.go:122 github.com/gravitational/gravity/lib/app/docker.TranslateRuntimeImage
/gopath/src/github.com/gravitational/gravity/lib/app/service/vendor.go:430 github.com/gravitational/gravity/lib/app/service.(*vendorer).translateRuntimeImages
/gopath/src/github.com/gravitational/gravity/lib/app/resources/resourcefiles.go:139 github.com/gravitational/gravity/lib/app/resources.(*ResourceFiles).RewriteManifest
/gopath/src/github.com/gravitational/gravity/lib/app/service/vendor.go:285 github.com/gravitational/gravity/lib/app/service.(*vendorer).VendorDir
/gopath/src/github.com/gravitational/gravity/lib/builder/builder.go:318 github.com/gravitational/gravity/lib/builder.(*Builder).Vendor
/gopath/src/github.com/gravitational/gravity/lib/builder/build.go:89 github.com/gravitational/gravity/lib/builder.Build
/gopath/src/github.com/gravitational/gravity/tool/tele/cli/build.go:67 github.com/gravitational/gravity/tool/tele/cli.build
/gopath/src/github.com/gravitational/gravity/tool/tele/cli/run.go:54 github.com/gravitational/gravity/tool/tele/cli.Run
/gopath/src/github.com/gravitational/gravity/tool/tele/main.go:44 main.run
/gopath/src/github.com/gravitational/gravity/tool/tele/main.go:35 main.main
/go/src/runtime/proc.go:200 runtime.main
/go/src/runtime/asm_amd64.s:1337 runtime.goexit
User Message: operation not permitted
] tele/main.go:36
[ERROR]: operation not permitted
(3) I tried to trick nvidia-docker into thinking that docker-ce exists. There are a couple of ways to do that (you can see one in the commented-out lines of the Dockerfile below). They get the install to pass, but it does not end well at runtime.
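For reference, the equivs approach in (3) relies on a control file describing a dummy package. The contents of dockerce.control are not shown in this post, so the following is only a guess at what such a file could look like. The package name and version are taken from the dependency error; every other field value is an assumption.

```shell
#!/bin/sh
# Write a hypothetical equivs control file for a dummy docker-ce package.
# Only the Package and Version values come from the post; the rest is assumed.
cat > dockerce.control <<'EOF'
Section: misc
Priority: optional
Standards-Version: 3.9.2

Package: docker-ce
Version: 18.06.0~ce~3-0~debian
Maintainer: Nobody <nobody@example.com>
Architecture: all
Description: dummy package standing in for docker-ce
 Satisfies the nvidia-docker2 dependency without installing Docker.
EOF

# Next steps, matching the commented-out Dockerfile lines:
#   equivs-build dockerce.control
#   dpkg -i docker-ce_18.06.0~ce~3-0~debian_all.deb
echo "wrote dockerce.control"
```

A dummy package like this only satisfies the package manager. It does not make the bundled Docker daemon aware of the NVIDIA runtime, which may be why the install passes and things still fail at runtime.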
Thanks
Yaron
Below are the Dockerfile details. They represent several attempts, but should hopefully be clear enough.
FROM quay.io/gravitational/planet:6.3.3-11700
RUN echo 'export PATH=$PATH:/usr/local/nvidia/bin:/usr/local/nvidia/lib' >> ~/.bashrc
RUN echo 'export LD_LIBRARY_PATH=/usr/local/nvidia/lib' >> ~/.bashrc
RUN /bin/bash -c "source ~/.bashrc"
RUN chmod 777 /tmp && \
    mkdir -p /var/cache/apt/archives/partial && \
    apt-get update
#TRICK TO SIMULATE DOCKER CE EXISTENCE
#RUN sudo apt-get install -y equivs
#COPY dockerce.control .
#RUN equivs-build dockerce.control && sudo dpkg -i docker-ce_18.06.0~ce~3-0~debian_all.deb
#OVERRIDE DOCKER to get docker 19
#RUN apt-get remove -y docker docker-engine docker.io containerd runc
RUN apt-get install -y apt-transport-https ca-certificates curl gnupg2 software-properties-common
RUN curl -fsSL https://download.docker.com/linux/debian/gpg | sudo apt-key add -
RUN add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/debian $(lsb_release -cs) stable"
RUN apt-get update
#RUN apt-get install -y docker-ce docker-ce-cli containerd.io
RUN apt-get install -y docker-ce=5:18.09.5~3-0~debian-stretch docker-ce-cli=5:18.09.5~3-0~debian-stretch containerd.io
#INSTALL NVIDIA DOCKER/CONTAINER TOOLKIT
RUN distribution=$(. /etc/os-release;echo $ID$VERSION_ID) && \
    curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - && \
    curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
RUN apt-get update && apt-get install -y nvidia-docker2
#RUN systemctl restart docker
#NVIDIA-DOCKER DEPRECATED, use this instead?
#RUN apt-get update && apt-get install -y nvidia-container-toolkit
ENV NVIDIA_VISIBLE_DEVICES all
ENV NVIDIA_DRIVER_CAPABILITIES compute,utility