Completely Complex Cloud Cluster Capacity Crisis: Cool as a Cucumber in Kubernetes


So… capacity. There’s never enough. This is why people like cloud computing: you can expand for some extra cash. There are different ways to expand: scale out (add more of the same) and scale up (make the same things bigger). Normally in cloud you focus on scale-out, but, well, you need big enough pieces to make that reasonable.
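In GKE terms (a rough sketch, using the cluster name k8s and zone that show up later in this post): scale-out is just resizing an existing pool, while scale-up means swapping in a node pool with a bigger machine type, which is what the rest of this post is about.

gcloud container clusters resize k8s --node-pool default-pool --num-nodes 5 --zone northamerica-northeast1-a
# scale-up instead: create a new pool with a bigger machine type and migrate onto it (see below)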

When I set up my Kubernetes cluster on GKE, I used n1-standard-2 machines (2 vCPU, 7.5GB RAM), 3 of them to make the initial cluster. And it was OK, it got the job done. But as soon as we started using more CI pipelines (gitlab-runner), well, it left something to be desired. So a fourth node was pressed into service, and then a fifth. At this stage I looked at it and said, well, I’d rather have 3 bigger machines than 5 little ones: better over-subscription, faster CI, and it’s cheaper. Now, this should be easy, right? Hmm, it was a bit tough, let me share the recipe with you.
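For context, the original cluster was nothing exotic; roughly, a cluster like that comes from something like this (a sketch, not necessarily the exact original command):

gcloud container clusters create k8s \
--zone northamerica-northeast1-a \
--machine-type n1-standard-2 \
--num-nodes 3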

First, I needed to create a new pool of n1-standard-4 machines (4 vCPU, 15GB RAM). I did that like this:

gcloud container node-pools create pool-n1std4 \
--zone northamerica-northeast1-a \
--cluster k8s \
--machine-type n1-standard-4 \
--image-type gci \
--disk-size=100 \
--num-nodes 3
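As an aside, you can see what pools a cluster has at any point with something like this (a sketch, same cluster and zone as above):

gcloud container node-pools list --cluster k8s --zone northamerica-northeast1-a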

OK, the pool create kept complaining that an upgrade was in progress. So I looked, and sure enough the ‘add fifth node’ operation never finished properly, it was hung. Grumble. Rebooted it, still the same. Dug into it, and it’s complaining about DaemonSets (calico) and not enough capacity. Hmm. So I used

gcloud container operations list 
gcloud beta container operations cancel operation-# --region northamerica-northeast1-a
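(If you want to see why an operation is wedged before you cancel it, you can also describe it; a sketch, where operation-# is whatever ID the list command showed:)

gcloud container operations describe operation-# --zone northamerica-northeast1-a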

And now the upgrade is ‘finished’ 🙂

So at this stage I am able to do the above ‘create pool’. Huh, what’s this? Everything resets and goes Pending. Panic sets in. The world is deleted, time to jump from the high window? OK, it’s just getting a new master; I don’t know why everything was reset and down and is now Pending, but, well, the master is there.
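While waiting it out, a couple of quick checks reassure you the control plane is actually answering and show how much is stuck (a sketch; componentstatuses was still a thing on 1.10):

kubectl get componentstatuses
kubectl get pods --all-namespaces | grep -c Pending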

Now let’s drain the 5 ‘small’ ones:

kubectl drain gke-k8s-default-pool-XXX-XXX --delete-local-data --force --ignore-daemonsets
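With five nodes to drain, a small loop saves some copy-paste (a sketch, assuming the old nodes all carry the default-pool prefix in their names):

for node in $(kubectl get nodes -o name | grep default-pool | cut -d/ -f2); do
  kubectl drain "$node" --delete-local-data --force --ignore-daemonsets
done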

I had to use --ignore-daemonsets cuz calico wouldn’t evict without it. OK, now we should be able to delete the old default-pool:

gcloud container node-pools delete default-pool --zone northamerica-northeast1-a --cluster k8s

Now, panic starts to set in again:

$ kubectl get nodes
NAME                          STATUS                     ROLES    AGE   VERSION
gke-k8s-pool-n1std4-XXX-XXX   Ready,SchedulingDisabled   <none>   21m   v1.10.5-gke.0
gke-k8s-pool-n1std4-XXX-XXX   Ready,SchedulingDisabled   <none>   21m   v1.10.5-gke.0
gke-k8s-pool-n1std4-XXX-XXX   Ready,SchedulingDisabled   <none>   21m   v1.10.5-gke.0

Indeed, the entire world is down, and everything is Pending again.

So let’s uncordon:

kubectl uncordon gke-k8s-pool-n1std4-XXXX-XXX
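Same trick as the drain, looped over the new pool (a sketch, keying off the pool-n1std4 prefix):

for node in $(kubectl get nodes -o name | grep pool-n1std4 | cut -d/ -f2); do
  kubectl uncordon "$node"
done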

Great, they are now starting to accept load again, and the Kubernetes master is scheduling its little heart out; containers are being pulled (and getting ImagePullBackOff cuz the registry is not up yet). OK, the registry is up… And we are all back to where we were, but 2x bigger per node. Faster CI, bigger cloud, roughly the same cost.
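If you’d rather watch it settle than refresh by hand, something like this shows only the pods that aren’t happy yet:

watch 'kubectl get pods --all-namespaces | grep -v Running'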