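When working with a k8s cluster, beginners may run into a rather strange problem: in a single cluster, when deploying a service that uses a private image registry (such as Harbor), the pods on a few nodes deploy successfully while all the others fail. The cause of the failure is that the image cannot be pulled, as shown below: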

# kubectl get pods -A -owide | grep jenkins-demo
devlopment   jenkins-demo-67d4f9d666-2fh8k   1/1   Running            0   27m     10.244.2.40   local-k8s-nd02
devlopment   jenkins-demo-dbc9f5b6b-h78tx    0/1   ImagePullBackOff   0   6m4s    10.244.6.93   local-k8s-nd03
production   jenkins-demo-dbc9f5b6b-tnkfs    1/1   Running            0   5m47s   10.244.2.44   local-k8s-nd02
qatest       jenkins-demo-67d4f9d666-hb22t   1/1   Running            0   27m     10.244.2.41   local-k8s-nd02
qatest       jenkins-demo-dbc9f5b6b-d6txr    0/1   ImagePullBackOff   0   6m      10.244.6.94   local-k8s-nd03

Check the failure details:

# kubectl describe pods -n qatest jenkins-demo-6cbfb64844-79n8l
..........
Events:
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Scheduled  78s                default-scheduler  Successfully assigned qatest/jenkins-demo-6cbfb64844-79n8l to local-k8s-nd03
  Normal   Pulling    37s (x3 over 78s)  kubelet            Pulling image "dev-hub.jiatuiyun.net/zeng/my-demo:429d9c1"
  Warning  Failed     36s (x3 over 77s)  kubelet            Failed to pull image "dev-hub.jiatuiyun.net/zeng/my-demo:429d9c1": rpc error: code = Unknown desc = Error response from daemon: pull access denied for dev-hub.jiatuiyun.net/zeng/my-demo, repository does not exist or may require docker login: denied: requested access to the resource is denied
  Warning  Failed     36s (x3 over 77s)  kubelet            Error: ErrImagePull
  Normal   BackOff    6s (x5 over 77s)   kubelet            Back-off pulling image "dev-hub.jiatuiyun.net/zeng/my-demo:429d9c1"
  Warning  Failed     6s (x5 over 77s)   kubelet            Error: ImagePullBackOff

然后我們去鏡像拉取失敗的機器上,直接用命令拉取,竟然是ok的

# docker pull dev-hub.jiatuiyun.net/zeng/my-demo:eb7ec1d
eb7ec1d: Pulling from zeng/my-demo
4fe2ade4980c: Already exists
2e793f0ebe8a: Already exists
77995fba1918: Already exists
4495499e856d: Already exists
0ff8f8e34aa6: Already exists
6c24ea7b9085: Pull complete
c07b8e5ec47b: Pull complete
Digest: sha256:95077089b59358820c4c763ae8bc390e470c62ac3d212abfe38292ff6389c7bb
Status: Downloaded newer image for dev-hub.jiatuiyun.net/zeng/my-demo:eb7ec1d
dev-hub.jiatuiyun.net/zeng/my-demo:eb7ec1d

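Same cluster, same image from the same registry; the only difference is the node. A few nodes pull the image fine while the rest fail. Why? The cause is that the service's deployment manifest does not configure a secret for pulling images. Without it, the kubelet has no registry credentials to send, so the pull only succeeds on nodes that do not need them: most likely nodes that already have the image cached (with imagePullPolicy: IfNotPresent no pull is attempted) or that have registry credentials configured at node level. The successful manual docker pull does not contradict this, because it uses the docker CLI's own login, which the kubelet does not necessarily use. The field that configures the pull secret in the manifest is imagePullSecrets: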

.....
spec:
  # Secret in the pod's own namespace that holds the private-registry credentials
  imagePullSecrets:
  - name: registry-pull-secret
  containers:
  - image: dev-hub.xxxxx.net/zeng/my-demo:
    imagePullPolicy: IfNotPresent
    name: jenkins-demo
.....

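Note: if the service is deployed across several namespaces, the secret must be created in every one of those namespaces, because imagePullSecrets only refers to secrets in the pod's own namespace. How to create the secret is not covered in detail here; there is plenty of reference material online.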
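For reference, here is a minimal sketch of creating the secret once per namespace. It is written under several assumptions: the environment variables REGISTRY_SERVER, REGISTRY_USER and REGISTRY_PASS are hypothetical names you would set yourself; the namespace list is copied from the kubectl get pods output above; and the secret name matches registry-pull-secret in imagePullSecrets. The script only prints Secret manifests of type kubernetes.io/dockerconfigjson, so you can inspect them before applying anything.

```shell
#!/bin/sh
# Sketch: print one kubernetes.io/dockerconfigjson Secret per namespace.
# Assumptions (hypothetical names): REGISTRY_SERVER, REGISTRY_USER and REGISTRY_PASS
# are set by you; credentials contain no characters that need JSON escaping (" or \).
set -eu

# Base64-encode stdin on one line (GNU base64 wraps long output at 76 columns).
b64() {
  base64 | tr -d '\n'
}

# make_pull_secret NAMESPACE SERVER USER PASSWORD [SECRET_NAME]
make_pull_secret() {
  local ns="$1" server="$2" user="$3" pass="$4" name="${5:-registry-pull-secret}"
  local auth config
  # "auth" is base64("user:password"), the same field docker login writes to config.json.
  auth=$(printf '%s:%s' "$user" "$pass" | b64)
  # The Secret's data value is the whole docker config JSON, base64-encoded again.
  config=$(printf '{"auths":{"%s":{"username":"%s","password":"%s","auth":"%s"}}}' \
    "$server" "$user" "$pass" "$auth" | b64)
  cat <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: ${name}
  namespace: ${ns}
type: kubernetes.io/dockerconfigjson
data:
  .dockerconfigjson: ${config}
EOF
}

# imagePullSecrets only resolves Secrets in the pod's own namespace, so emit one per namespace.
if [ -n "${REGISTRY_USER:-}" ] && [ -n "${REGISTRY_PASS:-}" ]; then
  for target_ns in devlopment production qatest; do
    echo '---'
    make_pull_secret "$target_ns" "${REGISTRY_SERVER:-registry.example.com}" \
      "$REGISTRY_USER" "$REGISTRY_PASS"
  done
else
  echo 'REGISTRY_USER and REGISTRY_PASS are not set; no Secret generated' >&2
fi
```

Assuming kubectl already points at the cluster, the output can be applied with something like `REGISTRY_SERVER=... REGISTRY_USER=... REGISTRY_PASS=... sh make-pull-secrets.sh | kubectl apply -f -` (the script file name is hypothetical). The usual one-liner, `kubectl create secret docker-registry registry-pull-secret --docker-server=... --docker-username=... --docker-password=... -n <namespace>`, does the same job for one namespace at a time. Generating manifests instead keeps the per-namespace loop reviewable before anything touches the cluster.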