October 29, 2020 | Cloud, Kubernetes, Open Source
While working with a client recently, we experienced some issues when attempting to use NLB external load balancer services on AWS EKS. I wanted to investigate whether these issues had been fixed in the upstream Kubernetes GitHub repository, or whether I could fix them myself, contributing back to the community in the process.
The particular Kubernetes code in question is in the part of the repository responsible for communicating with the AWS EC2 APIs. This code is entwined with the kube-controller-manager, which runs on the masters. This meant I wasn’t able to use EKS directly to test the changes, as EKS master nodes cannot be controlled or upgraded by users. Instead I needed to run the masters myself and essentially simulate EKS. As long as these simulated masters resided somewhere in AWS, I would be able to accurately test the AWS integration. I thus needed to build, package, deploy & test Kubernetes for AWS from the Kubernetes Git repository – this blog records the steps (and travails) required to do this.
After a lot of searching which turned up little information, this blog post serves as an answer to anyone else needing or wanting to achieve the same thing.
The Kubernetes community is currently in the process of moving cloud vendor specific code into their own cloud provider repositories. This will allow each cloud provider to be released on a different cadence to Kubernetes itself. Unfortunately, the process isn’t trivial as the kubelet currently relies on being able to ask the cloud providers for various pieces of information at startup such as which availability zone it is running in. At the moment the code for the integration with AWS lives at staging/src/k8s.io/legacy-cloud-providers/aws within the Kubernetes repository. The AWS cloud provider code is going to be moved to cloud-provider-aws.
The basic flow for developing the AWS cloud provider is as follows:
1. Make your changes to the cloud provider code in your local checkout of the Kubernetes repository.
2. Build Kubernetes and its container images, and push the images up to ECR.
3. Stand up a test cluster in AWS using Terraform.
4. Manually test your changes against that cluster.
5. Tear the cluster down, and repeat from step 1 as needed.
The Terraform code used to manage the infrastructure changes for this testing is available at https://github.com/opencredo/hacking-k8s-on-mac.
You will need:
- A Mac (or similar workstation) with Docker installed
- An AWS account, with the AWS CLI installed and configured
- Terraform
- kubectl
- Git, along with local clones of the Kubernetes repository and of the repository accompanying this blog post
Set up your local Docker machine so that it has 50GB of disk space and 10GB of memory.
Create the ECR repositories that will be used to store the Docker images that you will build; the Terraform for this is contained in the repositories directory of the https://github.com/opencredo/hacking-k8s-on-mac repository. We will be building the images for kube-apiserver, kube-controller-manager, kube-proxy & kube-scheduler shortly but the pause image must be manually pushed up.
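Assuming a standard Terraform workflow, creating the repositories is just an apply from that directory (run terraform init first if you have not already done so):
cd repositories && terraform apply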
When we run kubeadm, it verifies and uses a pause image hosted in the same container image registry as the other images; nothing I could configure would change this. If the cluster is unable to download the pause image, this usually manifests as kubeadm timing out while waiting for the kubelet to boot the control plane. Run these commands to download the pause image and push it to the correct ECR repository.
aws ecr get-login-password | docker login --username AWS --password-stdin $(aws sts get-caller-identity --query 'Account' --output text).dkr.ecr.$(aws configure get region).amazonaws.com
docker pull k8s.gcr.io/pause:3.2
docker tag k8s.gcr.io/pause:3.2 $(aws sts get-caller-identity --query Account --output text).dkr.ecr.$(aws configure get region).amazonaws.com/pause:3.2
docker push $(aws sts get-caller-identity --query Account --output text).dkr.ecr.$(aws configure get region).amazonaws.com/pause:3.2
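To confirm that the push worked, you can query ECR for the image; this assumes the repository created by the Terraform is simply named pause, matching the tag used above:
aws ecr describe-images --repository-name pause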
These steps are performed within the Kubernetes Git repository that you should clone to your local machine.
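If you do not yet have a local checkout, clone the upstream repository and work from there; the AWS cloud provider code discussed above lives under staging/src/k8s.io/legacy-cloud-providers/aws within it.
git clone https://github.com/kubernetes/kubernetes.git && cd kubernetes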
KUBE_BUILD_PLATFORMS=linux/amd64 build/run.sh make
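The dockerized build writes the resulting binaries into _output/ within your checkout; a quick sanity check that it produced what you expect (the exact path may vary between Kubernetes versions):
ls _output/dockerized/bin/linux/amd64/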
KUBE_BUILD_PLATFORMS=linux/amd64 \
KUBE_DOCKER_REGISTRY=$(aws sts get-caller-identity --query Account --output text).dkr.ecr.$(aws configure get region).amazonaws.com \
KUBE_BUILD_CONFORMANCE=n \
build/release-images.sh
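The image tarballs are written to _output/release-images/amd64/, which is the path the push loop in the next step reads from; you can confirm they are present with:
ls _output/release-images/amd64/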
aws ecr get-login-password | docker login --username AWS --password-stdin $(aws sts get-caller-identity --query 'Account' --output text).dkr.ecr.$(aws configure get region).amazonaws.com
for i in _output/release-images/amd64/kube*.tar; do
  tag=$(docker load < "${i}" | grep 'Loaded image' | grep -v k8s.gcr | sed 's/^Loaded image: //')
  newTag=$(echo "${tag}" | sed -E 's|^(.*)-amd64([^_]*)_(.*)$|\1\2\3|')
  docker tag "${tag}" "${newTag}"
  docker push "${newTag}"
done
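The tag portion of each newTag (everything after the colon) is the value you will pass as kubernetes_version when creating the cluster in the next section, so it is worth noting it down. You can also confirm the images arrived in ECR; for example, for kube-controller-manager (one of the repositories created earlier):
aws ecr list-images --repository-name kube-controller-manager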
These steps take place within the Git repository created for this blog post that you should clone to your local machine.
cd cluster && terraform apply -var kubernetes_directory=<location where Kubernetes was checked out to> -var kubernetes_version=<image tag that the new images were pushed up with>
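As a purely illustrative example, with both values hypothetical (use your own checkout path and the tag produced by the push step above):
cd cluster && terraform apply -var kubernetes_directory=$HOME/src/kubernetes -var kubernetes_version=v1.20.0-beta.1.15_0123456789abcd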
cd cluster && $(terraform output master)
watch kubectl get nodes
kubectl -n kube-system logs --tail=-1 -l component=kube-controller-manager
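Since the original motivation was NLB load balancer services, one way to exercise the relevant cloud provider code path is to create a Service of type LoadBalancer carrying the NLB annotation, and watch the kube-controller-manager logs while it is reconciled. The deployment name and image below are purely illustrative:
kubectl create deployment echo --image=k8s.gcr.io/echoserver:1.4
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Service
metadata:
  name: echo
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
spec:
  type: LoadBalancer
  selector:
    app: echo
  ports:
  - port: 80
    targetPort: 8080
EOF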
cd cluster && terraform destroy
If your changes have passed manual testing, then you can move on to the next section – otherwise go back to step 1.
Once you’re happy with the changes, you are now on the road to raising a PR against Kubernetes. Note that you need to:
- Sign the CNCF Contributor License Agreement (CLA) before your PR can be accepted
- Follow the Kubernetes contributor guide, including its conventions for commit messages and PR descriptions
- Add or update unit tests covering your change, and make sure the existing tests still pass
Using the workflow outlined above, I’ve managed to raise one PR so far although it is currently awaiting review before it can be merged.
Note that the per-cloud controllers are in the process of being separated out from kube-controller-manager, although this work is in its early stages at the moment. Currently cloud-provider-aws makes use of the staging/src/k8s.io/legacy-cloud-providers/aws code within the Kubernetes repository, but that is due to change as the migration continues. Once the cloud providers have been successfully calved off, some of the work in this blog post will be replaced by a simple deployment.
If you experience failures when building Kubernetes with a message similar to /usr/local/go/pkg/tool/linux_amd64/link: signal: killed, then increase the memory allocated to Docker.
I hope this blog can help those of you trying to do something similar! My final bit of parting advice is for the scenario where Docker runs out of disk space. If this happens, try cleaning out stopped Docker containers and unused volumes, as I found that building Kubernetes on a Mac leaked both.
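The following Docker commands remove stopped containers and unused volumes respectively:
docker container prune
docker volume prune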