
Troubleshooting your Kubernetes Cluster

When faced with a cluster that’s not doing what you expect it to do, how do you find out why it’s misbehaving?

Assume that you have (tried to) configure your cluster along these lines: the ingress listens on port 80 for traffic meant for the domain name example.com, passes that traffic to a service, and the service passes it on to a pod, which runs your container. The first job is then to check that the ingress is receiving the traffic intended for the domain.

Start by looking at all the pods

$ kubectl get pods --all-namespaces
NAMESPACE       NAME                                        READY   STATUS      RESTARTS        AGE
ingress-nginx   ingress-nginx-admission-create-ghh8z        0/1     Completed   0               23h
ingress-nginx   ingress-nginx-admission-patch-pxf2q         0/1     Completed   0               23h
ingress-nginx   ingress-nginx-controller-7c6974c4d8-2x5vq   1/1     Running     1 (5h39m ago)   23h
kube-system     coredns-5dd5756b68-rn4td                    1/1     Running     1 (19h ago)     30h
kube-system     etcd-minikube                               1/1     Running     1 (19h ago)     30h
kube-system     kube-apiserver-minikube                     1/1     Running     1 (5h39m ago)   30h
kube-system     kube-controller-manager-minikube            1/1     Running     1 (19h ago)     30h
kube-system     kube-proxy-kv2xd                            1/1     Running     1 (19h ago)     30h
kube-system     kube-scheduler-minikube                     1/1     Running     1 (19h ago)     30h
kube-system     storage-provisioner                         1/1     Running     3 (5h38m ago)   30h

Then check the status of the pods in your namespace

$ kubectl get pods -n shane
NAME                             READY   STATUS              RESTARTS     AGE
accountserver-56b469dbbd-9q492   0/1     ErrImageNeverPull   0            12h
apiserver-6687c679f-tzgsw        1/1     Running             1 (9h ago)   14h

We can see in that snippet that there’s a problem with the accountserver image, but the apiserver is claiming to be happy.
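
In a busier namespace it can help to filter the listing down to the pods that aren’t healthy. A minimal sketch, assuming the default column layout of kubectl get pods (STATUS in the third column); the function name unhealthy_pods is made up for this example:

```shell
# Hypothetical helper: list pods in a namespace that are neither Running nor Completed.
# Assumes the default `kubectl get pods` columns: NAME READY STATUS RESTARTS AGE.
#   $1 = namespace
unhealthy_pods() {
  kubectl get pods -n "$1" --no-headers |
    awk '$3 != "Running" && $3 != "Completed" { print $1, $3 }'
}
```

Run as unhealthy_pods shane against the listing above, this would print only the accountserver pod and its ErrImageNeverPull status. That status usually means the pod’s imagePullPolicy is Never and the image isn’t present on the node; kubectl describe pod on the failing pod shows the related events.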

Assuming that it’s the apiserver we’re checking, we continue.

Check that the ingress is receiving traffic (requires two shells, one to send the command, one to log what’s happening)

The logging shell:

$ kubectl logs -n ingress-nginx ingress-nginx-controller-7c6974c4d8-2x5vq -f

The command shell: First I want the address that the ingress has

$ kubectl get ingress -n shane
shane   <none>   example.com   80      3h15m

Test if that IP is receiving traffic

--2023-12-04 15:17:53--
Connecting to connected.
HTTP request sent, awaiting response... 404 Not Found
2023-12-04 15:17:53 ERROR 404: Not Found.

Well, wget reports that it connected OK, but then it got a 404.
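
A 404 on the bare IP is probably what you’d expect if the ingress only has a rule for example.com: without a matching Host header the controller has nothing to route the request to. One way to check that without touching /etc/hosts is to send the Host header yourself. A sketch, assuming curl is installed; ingress_status is a made-up name, and the address is whatever kubectl get ingress reported for you:

```shell
# Hypothetical helper: print only the HTTP status code for a request
# sent to a given address while presenting a given Host header.
#   $1 = ingress address (as reported by `kubectl get ingress`)
#   $2 = host name the ingress rule is written for
ingress_status() {
  curl -s -o /dev/null -w '%{http_code}\n' -H "Host: $2" "http://$1/"
}
```

Calling it with the ingress address and example.com, and then again with some other host name, shows whether the response depends on the Host header.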

So, let’s try with the domain name (for the record, I pointed the domain I’m pretending to have at the ingress IP in /etc/hosts)

$ wget example.com
--2023-12-04 15:27:03--  http://example.com/
Resolving example.com (example.com)...
Connecting to example.com (example.com)||:80... connected.
HTTP request sent, awaiting response... 502 Bad Gateway
2023-12-04 15:27:03 ERROR 502: Bad Gateway.

This tells us that the ingress is listening, and that it will only respond to requests for example.com, which is fine. But it’s returning a 502 error, so let’s see what the logs say (in the other shell)

2023/12/04 04:27:03 [error] 577#577: *342388 connect() failed (111: Connection refused) while connecting to upstream, client:, server: example.com, request: "GET / HTTP/1.1", upstream: "", host: "example.com"
2023/12/04 04:27:03 [error] 577#577: *342388 connect() failed (111: Connection refused) while connecting to upstream, client:, server: example.com, request: "GET / HTTP/1.1", upstream: "", host: "example.com"
2023/12/04 04:27:03 [error] 577#577: *342388 connect() failed (111: Connection refused) while connecting to upstream, client:, server: example.com, request: "GET / HTTP/1.1", upstream: "", host: "example.com"
- - [04/Dec/2023:04:27:03 +0000] "GET / HTTP/1.1" 502 150 "-" "Wget/1.21.2" 124 0.001 [example-api-service-8080] [],, 0, 0, 0 0.001, 0.000, 0.000 502, 502, 502 3cee74a2a4e97297f6ef379689bda677

The ingress is telling us that its connection attempt to the upstream was refused.

We now turn our attention to the pod that the ingress is trying to connect to.

What do the logs for apiserver say?

$ kubectl logs apiserver-86c595ccfc-lpd2x -n shane

Ooo, empty, that’s curious.

From here it’s time to look at the configuration in Kubernetes as well as in the container. It’s clear that Kubernetes has directed traffic from the ingress to the apiserver service, but the connection has not been accepted.

Upon inspection I found that the port configured in the service definition did not match the port configured in the container. I have made a permanent fix: an environment variable, passed from the service to the container, now defines the port, so the two will henceforth have the same port number.
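
A quick way to spot this kind of mismatch is to print the two port numbers side by side. A rough sketch, assuming the container port is declared in a Deployment’s pod template and that the first port entry on each side is the one of interest; ports_match is a made-up name, and the service and deployment names you pass in are placeholders rather than names from the cluster above:

```shell
# Hypothetical helper: compare a Service's targetPort with the containerPort
# declared by a Deployment. Only looks at the first port on each side.
#   $1 = namespace, $2 = service name, $3 = deployment name
ports_match() {
  target=$(kubectl get service "$2" -n "$1" \
    -o jsonpath='{.spec.ports[0].targetPort}')
  declared=$(kubectl get deployment "$3" -n "$1" \
    -o jsonpath='{.spec.template.spec.containers[0].ports[0].containerPort}')
  if [ "$target" = "$declared" ]; then
    echo "match: $target"
  else
    echo "mismatch: service targetPort=$target, containerPort=$declared"
    return 1
  fi
}
```

Two caveats: a targetPort given as a port name rather than a number will show up as a mismatch here, and containerPort is only a declaration, so the process inside the container can still be listening on a different port than the one it declares.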

