# Kubeflow ์„ค์น˜ - Ubuntu 18.04(nvidia-docker)


Ubuntu 18.04์— Kubeflow๋ฅผ ์„ค์น˜ํ•˜๊ณ  pipelines SDK๋ฅผ ์‚ฌ์šฉํ•ด๋ณด๋ ค๊ณ  ํ•œ๋‹ค.

์™ ์ง€ ๋ชจ๋ฅด๊ฒ ๋Š”๋ฐ 16.04๋Š” ์ž˜ ์•ˆ๋จ

ย 

์ฐธ๊ณ ํ•œ ๋ธ”๋กœ๊ทธ

https://www.kangwoo.kr/category/machine-learning/kubeflow/page/5/

ย 

img

๋ชฉ์ฐจ

Ubuntu 18.04 LTS ์„ค์น˜
- Nvidia driver ์„ค์น˜
Docker ์„ค์น˜
nvidia-docker ์„ค์น˜

ย 

์„ค์น˜ ๋ฒ„์ „ ์ •๋ฆฌ

  • Ubuntu 18.04 LTS
  • Nvidia driver 435
  • docker-CE 18.09
  • nvidia-docker
  • kubernetes 1.15.10
  • cilium 1.6
  • nvidia-device-plugin-daemonset 1.12
  • kubeflow 1.0RC4 with istio 1.3

ย 

Ubuntu 18.04 LTS ์„ค์น˜ํ•˜๊ธฐ

1. ์„ค์น˜ ์ด๋ฏธ์ง€๋ฅผ ๋‹ค์šด๋กœ๋“œ

2. ๋ถ€ํŒ… USB๋ฅผ ๋งŒ๋“ค๊ธฐ

3. GIGABYTE USB ๋ถ€ํŒ…

ย 

์•„๋ž˜ ๋ธ”๋กœ๊ทธ๋ฅผ ์ฐธ๊ณ ํ–ˆ๋‹ค.

https://hiseon.me/linux/ubuntu/install-ubuntu-18-04/

ย 

(1) Nvidia driver ์„ค์น˜

์šฐ๋ถ„ํˆฌ 18.04 ํ™˜๊ฒฝ์—์„œ nvidia ๊ทธ๋ž˜ํ”ฝ ์นด๋“œ๋ฅผ ์‚ฌ์šฉํ•  ๊ฒฝ์šฐ ๋ฌธ์ œ๊ฐ€ ๋ฐœ์ƒํ•œ๋‹ค.

์•„๋ž˜ ๋ธ”๋กœ๊ทธ๋ฅผ ์ฐธ๊ณ ํ–ˆ๋‹ค.

https://www.kangwoo.kr/category/machine-learning/kubeflow/page/5/

ย 

nvidia ๋“œ๋ผ์ด๋ฒ„ ์„ค์น˜๋ฅผ ์œ„ํ•ด nouveau๋ฅผ ์ œ๊ฑฐํ•  ํ•„์š”๊ฐ€ ์žˆ๋‹ค.

ย 

1. nouveau ์„ค์น˜ ํ™•์ธ ํ›„ ์ œ๊ฑฐํ•˜๊ธฐ

$ lsmod | grep nouveau

๋ช…๋ น์–ด๋ฅผ ์ž…๋ ฅํ•˜๋ฉด nouveau๊ฐ€ ์„ค์น˜๋˜์–ด์žˆ๋Š”์ง€ ํ™•์ธํ•  ์ˆ˜ ์žˆ๋‹ค.

img

/etc/modprobe.d/ ๊ฒฝ๋กœ์— blacklist ํŒŒ์ผ์„ ์ƒ์„ฑํ•˜์ž.

ย 

vi ํŽธ์ง‘๊ธฐ๋ฅผ ์ด์šฉํ•ด ์•„๋ž˜ ๋‚ด์šฉ์„ ๋„ฃ๊ณ  esc+wq๋กœ ์ €์žฅํ•˜์ž.

$ sudo vi /etc/modprobe.d/blacklist-nouveau.conf

blacklist nouveau options nouveau modset=0

๋‹ค์Œ ๋ช…๋ น์–ด๋ฅผ ์‹คํ–‰ํ•œ ํ›„, ์žฌ๋ถ€ํŒ… ํ•˜์ž.

$ sudo update-initramfs -u
$ sudo service gdm stop

img

์žฌ๋ถ€ํŒ…ํ•˜๋ฉด ๋ชจ๋‹ˆํ„ฐ ํ•ด์ƒ๋„๋ฅผ ์ •์ƒ์ ์œผ๋กœ ์ธ์‹ํ•˜์ง€ ๋ชปํ•˜๋Š” ๊ฒฝ์šฐ๊ฐ€ ์žˆ๋‹ค.

Nvidia ๋“œ๋ผ์ด๋ฒ„๋ฅผ ์„ค์น˜ํ•˜๋ฉด ๊ดœ์ฐฎ์•„์ง„๋‹ค!

ย 

2. Nvidia ๋“œ๋ผ์ด๋ฒ„ ์„ค์น˜ํ•˜๊ธฐ

์ปจํ…Œ์ด๋„ˆ(Container)๋ฅผ ์ด์šฉํ•ด์„œ GPU๋ฅผ ์‚ฌ์šฉํ•  ์˜ˆ์ •์ด๊ธฐ ๋•Œ๋ฌธ์—, Nvidia ๋“œ๋ผ์ด๋ฒ„๊ฐ€ ์„ค์น˜ํ•˜์ž.

$ sudo add-apt-repository ppa:graphics-drivers/ppa
$ sudo apt-get update

$ sudo apt-get install nvidia-driver-435 $ sudo reboot

์žฌ๋ถ€ํŒ… ํ›„,nvidia-smi๋ช…๋ น์–ด๋ฅผ ์‹คํ–‰ํ•ด์„œ, ๋“œ๋ผ์ด๋ฒ„๊ฐ€ ์ •์ƒ์ ์œผ๋กœ ์„ค์น˜๋˜์–ด ์žˆ๋Š”์ง€ ํ™•์ธํ•ด ๋ณผ ์ˆ˜ ์žˆ๋‹ค.

$ nvidia-smi

img


Kubeflow ์„ค์น˜ํ•˜๊ธฐ

(1) ์ตœ์†Œ ์‹œ์Šคํ…œ ์š”๊ตฌ์‚ฌํ•ญ

https://www.kubeflow.org/docs/started/k8s/overview/

๊ณต์‹ ๋ฌธ์„œ์— ๋”ฐ๋ฅด๋ฉด kubeflow ์ตœ์†Œ ์‹œ์Šคํ…œ ์š”๊ตฌ์‚ฌํ•ญ์€ ์ด๋ ‡๋‹ค.

- RAM 12GB ์ด์ƒ

- CPU 4 core ์ด์ƒ

- Storage 50GB ์ด์ƒ

ย 

๊ทธ๋ž˜์„œ ๋‚ด PC ์‚ฌ์–‘์„ ์•Œ์•„๋ณด์•˜๋‹ค.

RAM 16 GB
CPU core 12
Storage 256 GB
CPU Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
Graphic Card GeForce RTX 2070 - Nvidia

Kubernetes์™€ Kubeflow์˜ ๋ฒ„์ „๋„ ํ™•์ธํ•ด๋ณด์ž.

img

Kubeflow 1.0์„ ์‚ฌ์šฉํ•˜๊ธฐ ์œ„ํ•ด Kubernetes ๋ฒ„์ „์€ 1.15๋ฅผ ์„ค์น˜ํ•˜์ž.

ย 


(2) Docker ์„ค์น˜ํ•˜๊ธฐ

apt๊ฐ€ https ์ €์žฅ์†Œ๋ฅผ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ๋„๋ก ํŒจํ‚ค์ง€ ์ถ”๊ฐ€

$ sudo apt-get update
$ sudo apt-get install -y apt-transport-https \
    ca-certificates \
    curl \
    gnupg-agent \
    software-properties-common

docker์˜ GPGํ‚ค ์ถ”๊ฐ€

$ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -

ย 

์ €์žฅ์†Œ๋ฅผ ์ถ”๊ฐ€

$ sudo add-apt-repository \
   "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
   $(lsb_release -cs) stable"

apt ํŒจํ‚ค์ง€์˜ ์ธ๋ฑ์Šค๋ฅผ ์—…๋ฐ์ดํŠธ

$ sudo apt-get update

docker-ce๋Š” 18.09 ๋ฒ„์ „์„ ๋‹ค์šด๋กœ๋“œ ๋ฐ›๊ณ  ํŒจํ‚ค์ง€๋ฅผ ๊ณ ์ •ํ•˜์ž.

$ sudo apt-get install -y docker-ce=5:18.09.9~3-0~ubuntu-bionic docker-ce-cli=5:18.09.9~3-0~ubuntu-bionic containerd.io

$ sudo apt-mark hold docker-ce docker-ce-cli

docker๋ฅผ 18.09 ๋ฒ„์ „์œผ๋กœ ๋‹ค์šด๋ฐ›๋Š” ์ด์œ ?

๋”๋ณด๊ธฐ

19.03 ๋ฒ„์ „ ๋ถ€ํ„ฐ GPU ๊ด€๋ จ ๋‚ด์šฉ์ด ๋ณ€๊ฒฝ๋˜์—ˆ๋‹ค. kubernetes์—์„œ GPU ์ž‘์—…์„ ํ•˜๋ ค๋ฉด,ย k8s-device-plugin์ด ํ•„์š”ํ•œ๋ฐ, ์•„์ง(20.06.10) 19.03 ๋ฒ„์ „์„ ์ง€์›ํ•˜์ง€ ์•Š๋Š” ๊ฒƒ ๊ฐ™๋‹ค.

์ฐธ๊ณ  - https://www.kangwoo.kr/category/machine-learning/kubeflow/page/5/

๋„์ปค๊ฐ€ ์ •์ƒ์ ์œผ๋กœ ์„ค์น˜๋˜์—ˆ๋Š”์ง€ hello-world ์ด๋ฏธ์ง€๋ฅผ ์‹คํ–‰ํ•ด๋ณด์ž

$ sudo docker run hello-world

์•„๋ž˜์™€ ๊ฐ™์ด ์ถœ๋ ฅ๋˜๋ฉด ์ •์ƒ์ ์œผ๋กœ ์„ค์น˜๋œ ๊ฒƒ์ด๋‹ค.

Unable to find image 'hello-world:latest' locally
latest: Pulling from library/hello-world
0e03bdcc26d7: Pull complete
Digest: sha256:6a65f928fb91fcfbc963f7aa6d57c8eeb426ad9a20c7ee045538ef34847f44f1
Status: Downloaded newer image for hello-world:latest

Hello from Docker! This message shows that your installation appears to be working correctly.

...

For more examples and ideas, visit: https://docs.docker.com/get-started/

Sudo ์—†์ด Docker ์‹คํ–‰ํ•˜๊ธฐ

Docker๋ฅผ ์‚ฌ์šฉํ•˜๋ ค๋ฉด ๊ธฐ๋ณธ์ ์œผ๋กœ ๋ฃจํŠธ ๊ถŒํ•œ์ด ํ•„์š”ํ•˜๋‹ค.

๋ฒˆ๊ฑฐ๋กœ์šฐ๋‹ˆ๊นŒ Docker๋ฅผ sudo ๊ถŒํ•œ์œผ๋กœ ๋“ฑ๋ก์‹œ์ผœ์ฃผ์ž.

$ sudo groupadd docker
$ sudo usermod -aG docker ${USER}
$ newgrp docker
$ docker run hello-world

(3) nvidia-docker ์„ค์น˜ํ•˜๊ธฐ

๋„์ปค ์ปจํ…Œ์ด๋„ˆ์˜ GPU ๋ฆฌ์†Œ์Šค ์‚ฌ์šฉ์„ ์œ„ํ•ด nvidia-docker๋ฅผ ์„ค์น˜ํ•ด์•ผํ•œ๋‹ค.

์•„๋ž˜ distribution์ด ์ •์ƒ์ ์œผ๋กœ ๋™์ž‘ํ•˜์ง€ ์•Š๋Š” ๊ฒƒ ๊ฐ™๋‹ค!

# Add the package repositories
$ distribution=$(. /etc/os-release;echo VERSION_ID)
$ curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
$ curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

$ sudo apt-get update

๊ทธ๋Œ€๋กœ ํ–ˆ์ง€๋งŒ ์˜ค๋ฅ˜๊ฐ€ ๋ฐœ์ƒํ–ˆ๋‹ค.

img

https://nvidia.github.io/nvidia-docker/ย ๋งํฌ๋ฅผ ๋”ฐ๋ผ๊ฐ€ ํ™•์ธํ•ด๋ณด์•˜๋‹ค.

์ฐธ๊ณ ํ•œ ๋ธ”๋กœ๊ทธ ๋‚ด์šฉ๊ณผ distribution ๋ช…๋ น์–ด ๋‚ด์šฉ์ด ๋‹ฌ๋ž๋‹ค

๋งํฌ์˜ ๋‚ด์šฉ๋Œ€๋กœ ์ž…๋ ฅํ•ด๋ณด์•˜๋‹ค.

$ curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | \
    sudo apt-key add -

$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)

$ curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list |
sudo tee /etc/apt/sources.list.d/nvidia-docker.list

$ sudo apt-get update

img

์ด์ œ nvidia-docker๋ฅผ ์„ค์น˜ํ•˜์ž

$ sudo apt-get install nvidia-docker2
$ sudo systemctl restart docker

ย 

nvidia-docker๊ฐ€ ์ •์ƒ์ ์œผ๋กœ ์„ค์น˜๋˜์—ˆ๋Š”์ง€ ํ™•์ธํ•ด ๋ณด๊ธฐ ์œ„ํ•ด์„œ, ๋‹ค์Œ ๋ช…๋ น์–ด๋ฅผ ์‹คํ–‰ํ•ด๋ณด์ž.

$ sudo docker run --runtime nvidia nvidia/cuda:10.0-base nvidia-smi

img

์ •์ƒ์ ์œผ๋กœ ์„ค์น˜๊ฐ€ ๋˜์—ˆ๋‹ค๋ฉด ๋…ธ๋“œ์— ์„ค์น˜๋˜์–ด ์žˆ๋Š” GPU์˜ ๋ฆฌ์ŠคํŠธ์™€ ๊ฐ์ข… ์ƒํƒœ ๋ฐ ์ˆ˜ํ•ด์˜ค๋””๋Š” ํ”„๋กœ์„ธ์Šค๋ฅผ ์•Œ ์ˆ˜ ์žˆ๋Š” ์ •๋ณด๋ฅผ ํ™•์ธ ํ•  ์ˆ˜ ์žˆ๋‹ค.

ย 

ย 

๋„์ปค์˜ ๊ธฐ๋ณธ ๋Ÿฐํƒ€์ž„์„ ๋ณ€๊ฒฝํ•ด์ฃผ์ž

์ƒ์„ฑ๋œ /etc/docker/daemon.json ํŒŒ์ผ์—์„œ "default-runtime": "nvidia"์„ ์ถ”๊ฐ€ํ•ด์ฃผ๋ฉด ๋œ๋‹ค.

ย 

$ sudo vi /etc/docker/daemon.json

    {
      "default-runtime": "nvidia", 
      "runtimes": {
        "nvidia": {
          "path": "nvidia-container-runtime",
          "runtimeArgs": []
        }
      }
    }
    

ํŒŒ์ผ์„ ์ˆ˜์ •ํ•œ ํ›„, ๋„์ปค๋ฅผ ์žฌ์‹œ์ž‘ํ•˜์ž.

$ sudo systemctl restart docker

ย 

ย 

Reference

ย 

[1] ์ง€๊ตฌ๋ณ„ ์—ฌํ–‰์ž ๋ธ”๋กœ๊ทธ - https://www.kangwoo.kr/category/machine-learning/kubeflow/page/5

[2] EGAS ๋ธ”๋กœ๊ทธ - http://ghcksdk.com/kubeflow-installation/

https://hub.docker.com/r/tensorflow/tensorflow/

https://www.tensorflow.org/guide/gpu

https://docs.docker.com/install/linux/docker-ce/ubuntu/

https://github.com/NVIDIA/nvidia-docker

https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/

https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/

https://blusky10.tistory.com/359

ย 

Last Updated: 6/18/2023, 2:13:15 PM