Kubernetes Pod Restart Times

Embrace Failures: find the start times of Kubernetes pods


Cloud Native means embracing failures. At Agilicus, our strategy for security is Defense in Depth. In a nutshell, assume bad things will happen and have a fallback position, rather than dying on the hill of the first line. Similarly, for reliability we rely on Strength in Numbers. Rather than spending large amounts of time and money on a single infinitely reliable thing, we assume each component will fail, and have a strategy to make that failure invisible.

A big part of our strategy is the pre-emptible node. This means that the underlying machines our software runs on can be powered off without notice. This might sound like a bad thing, but consider: kernels panic, hardware fails, it’s going to happen anyway. Would you rather make it infrequent enough that you don’t know what to do when it happens? Or would you rather make it part of a normal workday and have a solution? Embrace Failure.

The Agilicus strategy for embracing failure involves Kubernetes, Istio, retries, and more. Many pieces. Sometimes one of those needs improving or investigating, and the first question is always: “what restarted when?” For this, you would think “kubectl -n namespace get pods”, right? Well, it turns out that kubectl is a liar. It misses events like a node panic, showing an age that doesn’t correlate to when the pod actually started running.

But all is not lost: kubectl is capable of telling you the truth, it’s just not the default. So the small script below was invented. It uses the jsonpath output and a wee bit of formatting to make it beautifully readable.

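$ cat ~/bin/p-times
#!/bin/bash
# Usage: p-times <namespace>
# Prints each pod's name, node, pod startTime, and the startedAt time
# of its first container (containerStatuses[0]) if it is running.
ns=$1

(
# Header row, tab-separated to match the jsonpath output below.
printf 'Pod\tnodeName\tstartTime\tstartedAt\n'
kubectl -n "$ns" get pods -o=jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.nodeName}{"\t"}{.status.startTime}{"\t"}{.status.containerStatuses[0].state.running.startedAt}{"\n"}{end}'
) | column -t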

Running it gives the output below. Adding the nodeName makes it quite wide, so feel free to remove that. We use it to correlate to node logs in Elasticsearch if needed.

$ p-times istio-system
Pod                                      nodeName                                startTime             startedAt
istio-citadel-849cc4bfb6-ksklm           gke-noc-noc-preempt-pool-356e72e3-heye  2021-01-15T15:13:55Z  2021-01-15T15:15:13Z
istio-citadel-849cc4bfb6-l5cqs           gke-noc-noc-preempt-pool-bc2cfcf6-gmcm  2021-01-14T05:59:18Z  2021-01-14T21:59:04Z
istio-galley-6476d5f4df-qx5q5            gke-noc-noc-preempt-pool-bc2cfcf6-gmcm  2021-01-14T05:59:15Z  2021-01-14T21:56:13Z
istio-galley-6476d5f4df-x5vp5            gke-noc-noc-preempt-pool-356e72e3-heye  2021-01-15T15:13:54Z  2021-01-15T15:14:35Z
istio-ingressgateway-6d7d9cc94-2cr62     gke-noc-noc-preempt-pool-356e72e3-heye  2021-01-15T15:13:54Z  2021-01-15T15:14:47Z
istio-ingressgateway-6d7d9cc94-dc2g6     gke-noc-noc-preempt-pool-bc2cfcf6-lw40  2021-01-14T05:59:18Z  2021-01-15T01:37:03Z
istio-pilot-89497d7c9-kndft              gke-noc-noc-preempt-pool-bc2cfcf6-lw40  2021-01-14T05:59:18Z  2021-01-15T01:37:45Z
istio-pilot-89497d7c9-pjgjp              gke-noc-noc-preempt-pool-356e72e3-heye  2021-01-15T15:13:58Z  2021-01-15T15:15:28Z
istio-policy-7589588588-b8dhk            gke-noc-noc-preempt-pool-bc2cfcf6-lw40  2021-01-14T05:59:19Z  2021-01-15T01:37:58Z
istio-policy-7589588588-vkc4s            gke-noc-noc-preempt-pool-356e72e3-heye  2021-01-15T15:13:53Z  2021-01-15T15:14:21Z
istio-sidecar-injector-67d9c85d6f-l52hr  gke-noc-noc-preempt-pool-bc2cfcf6-gmcm  2021-01-14T05:59:15Z  2021-01-14T22:00:42Z
istio-sidecar-injector-67d9c85d6f-pcll7  gke-noc-noc-preempt-pool-356e72e3-heye  2021-01-15T15:13:58Z  2021-01-15T15:15:33Z
istio-telemetry-6f6fb677bd-jj7h8         gke-noc-noc-preempt-pool-356e72e3-heye  2021-01-15T15:13:54Z  2021-01-15T15:14:50Z
istio-telemetry-6f6fb677bd-jt4dg         gke-noc-noc-preempt-pool-bc2cfcf6-gmcm  2021-01-14T05:59:15Z  2021-01-14T22:01:00Z
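
In the output above, some pods have a startedAt only a minute or so after their startTime, while others show a gap of many hours, which suggests the container was started again long after the pod was first scheduled. As a rough sketch of how to pick those out automatically, the filter below reads p-times output and keeps only the rows where the gap exceeds a threshold. The function name `restarted` and the 300-second default are our own inventions for illustration, and it assumes GNU date (for `date -d`), so treat it as a starting point rather than a finished tool.

```shell
#!/bin/bash
# restarted: hypothetical helper, not part of the original p-times script.
# Reads p-times output on stdin and prints the rows whose container
# startedAt is more than N seconds (first argument, default 300) after
# the pod startTime.
# Assumes GNU date, which can parse timestamps like 2021-01-15T15:13:55Z.
restarted() {
  local threshold=${1:-300}
  local pod node start started s1 s2
  while read -r pod node start started; do
    # Skip the header row and any row missing a startedAt value.
    [ "$pod" = "Pod" ] && continue
    [ -z "$started" ] && continue
    # Convert both timestamps to epoch seconds; skip rows that do not parse.
    s1=$(date -u -d "$start" +%s 2>/dev/null) || continue
    s2=$(date -u -d "$started" +%s 2>/dev/null) || continue
    if [ $(( s2 - s1 )) -gt "$threshold" ]; then
      printf '%s\t%s\t%s\t%s\n' "$pod" "$node" "$start" "$started"
    fi
  done
}
```

With the function sourced into your shell, something like `p-times istio-system | restarted 600 | column -t` would list only the pods whose first container started more than ten minutes after the pod did. The threshold is arbitrary; pick one that comfortably exceeds your normal image pull and startup time.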