You have probably heard of Infrastructure-as-Code (IaC) and the many benefits it offers over manually creating and updating resources.
For starters, IaC gives you the ability to reproduce deployments across environments (development, staging, production) and accounts (different regions, different clients). This reproducibility is crucial for disaster recovery and paves the way for faster, more automated scaling of your infrastructure.
IaC files additionally serve as documentation, helping everyone to understand how and why resources have been deployed, as well as bringing your entire infrastructure under version control. By reducing the risk of human errors (a frequent occurrence when manual steps are used for new deployments), IaC is accepted as the de facto approach underpinning reliable, scalable solutions.
Infrastructure-as-Code is typically implemented using one or more dedicated tools such as Terraform, CloudFormation or Ansible. If you use Kubernetes to run your workloads, there is an additional option available to you: native cloud resources can also be declared and managed directly from your Kubernetes manifest files using custom resource definitions (CRDs). These CRDs are backed by controllers running inside your Kubernetes cluster that will, in turn, manage (i.e., create, update, or delete) those resources in the cloud for you. This combination of CRDs and controllers is known as the Operator pattern.
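To make the controller's job more concrete, here is a minimal sketch of the comparison at the heart of the Operator pattern: look at what has been declared, look at what exists, and work out what to create, update or delete. It is an illustration only; the `Action` type, the `reconcile` function and the resource names are hypothetical, and a real controller would watch the Kubernetes API and call the cloud vendor's API rather than compare in-memory dictionaries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    verb: str  # "create", "update" or "delete"
    name: str  # name of the resource the action applies to


def reconcile(desired: dict, actual: dict) -> list:
    """Compare declared resources with existing ones and list the changes needed."""
    actions = []
    for name, spec in desired.items():
        if name not in actual:
            actions.append(Action("create", name))
        elif actual[name] != spec:
            actions.append(Action("update", name))
    for name in actual:
        if name not in desired:
            actions.append(Action("delete", name))
    return actions


if __name__ == "__main__":
    # Hypothetical state: what the manifests declare vs. what exists in the cloud
    desired = {"orders-db": {"size": "small"}, "assets-bucket": {"region": "eu"}}
    actual = {"orders-db": {"size": "tiny"}, "old-queue": {}}
    for action in reconcile(desired, actual):
        print(action.verb, action.name)
```

A controller typically runs this kind of comparison repeatedly, so a change that fails once is simply retried on the next pass.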
The three leading cloud vendors (Amazon Web Services, Microsoft Azure, and Google Cloud Platform) now offer such facilities. Of course, as with everything related to Kubernetes, Google leads the way with an offering covering some 60+ GCP services. Not to be outdone, AWS and Azure each document an equivalent offering of their own.
For the Big 3 cloud vendors, the way this ‘IaC-by-Kubernetes’ works is similar. You first need to install additional components in your Kubernetes cluster, which each vendor names differently.
For this article, let’s call them ‘operators’ (since they follow the operator pattern). These operators will implement custom resource definitions for various cloud resources, so you will need to load those CRDs in your Kubernetes cluster as well. In addition to the operators from cloud vendors, there are operators that cover a large array of different technologies.
Once the operators are deployed in your Kubernetes cluster, you will be able to declare cloud resources alongside your usual Kubernetes objects, such as Deployments and Services. The operators will then create/update/delete the corresponding cloud resources as required.
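As a rough illustration of what declaring a cloud resource next to a regular Kubernetes object can look like, the sketch below builds both as plain dictionaries and prints them as a single JSON document, which `kubectl apply -f` accepts just as it accepts YAML. The `Bucket` kind, its `cloud.example.com/v1alpha1` API group and its `spec` fields are hypothetical placeholders; the real kinds and fields depend entirely on the operator you install.

```python
import json


def bucket_manifest(name: str, location: str) -> dict:
    # Hypothetical custom resource: API group, kind and spec fields are invented
    # for illustration and must be replaced by those of your operator's CRDs.
    return {
        "apiVersion": "cloud.example.com/v1alpha1",
        "kind": "Bucket",
        "metadata": {"name": name},
        "spec": {"location": location},
    }


def deployment_manifest(name: str, image: str, bucket: str) -> dict:
    # A standard apps/v1 Deployment that receives the bucket name as an env var
    labels = {"app": name}
    container = {
        "name": name,
        "image": image,
        "env": [{"name": "BUCKET_NAME", "value": bucket}],
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [container]},
            },
        },
    }


def as_list(*items: dict) -> dict:
    # A v1 List groups several objects into one document
    return {"apiVersion": "v1", "kind": "List", "items": list(items)}


if __name__ == "__main__":
    manifests = as_list(
        bucket_manifest("assets", "EU"),
        deployment_manifest("web", "nginx:1.25", "assets"),
    )
    print(json.dumps(manifests, indent=2))
```

The point is that the cloud resource and the workload share the same shape (`apiVersion`, `kind`, `metadata`, `spec`) and can live in the same repository and pipeline.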
Please note that, as usual with Kubernetes custom resources, you will need to deploy the operators first and then deploy the manifest files describing the cloud resources you want to manage. This is because the CRDs must be in place before you can use them in your manifest files, so it is not possible to deploy everything with a single command.
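One way to handle this ordering in a deployment script is to split your manifests into two phases before applying them. The sketch below is an assumption about how you might do that, not a feature of any operator: `split_into_phases` is a hypothetical helper that puts every object whose kind is defined by a CRD in the same set into the second phase, and everything else (the CRDs themselves, the operator's own Deployment, ordinary objects) into the first.

```python
def defined_kinds(manifests: list) -> set:
    """Collect the (group, kind) pairs introduced by CRDs in the given manifests."""
    kinds = set()
    for manifest in manifests:
        if manifest.get("kind") == "CustomResourceDefinition":
            spec = manifest["spec"]
            kinds.add((spec["group"], spec["names"]["kind"]))
    return kinds


def api_group(manifest: dict) -> str:
    """Return the API group of an object ("" for the core group, e.g. "v1")."""
    group, separator, _version = manifest.get("apiVersion", "").partition("/")
    return group if separator else ""


def split_into_phases(manifests: list) -> tuple:
    """Return (first, second): prerequisites first, custom resources second."""
    custom = defined_kinds(manifests)
    first, second = [], []
    for manifest in manifests:
        key = (api_group(manifest), manifest.get("kind"))
        (second if key in custom else first).append(manifest)
    return first, second


if __name__ == "__main__":
    # Hypothetical manifests, trimmed to the fields the helper inspects
    crd = {
        "apiVersion": "apiextensions.k8s.io/v1",
        "kind": "CustomResourceDefinition",
        "metadata": {"name": "buckets.cloud.example.com"},
        "spec": {"group": "cloud.example.com", "names": {"kind": "Bucket"}},
    }
    operator = {"apiVersion": "apps/v1", "kind": "Deployment", "metadata": {"name": "operator"}}
    bucket = {"apiVersion": "cloud.example.com/v1alpha1", "kind": "Bucket", "metadata": {"name": "assets"}}
    first, second = split_into_phases([bucket, crd, operator])
    print("phase 1:", [m["metadata"]["name"] for m in first])
    print("phase 2:", [m["metadata"]["name"] for m in second])
```

In practice you would apply the first phase, wait until the CRDs are registered and the operator is running, and only then apply the second.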
The main advantage of declaring your cloud resources in your Kubernetes manifest files in this way is that it allows you to use a consistent approach to manage most, if not all, aspects of your workload. If your engineers are used to the syntax of Kubernetes manifest files, it will be easier for them to manage the cloud resources using such files than to learn a new language and tool, such as Terraform or CloudFormation.
In the typical Kubernetes way, the Kubernetes objects and cloud resources will be eventually consistent. The initial application of changes can therefore be made very quickly, while the operators and Kubernetes work in the background to ensure that your workload eventually matches the declarations in your manifest files. Provided your code can handle this kind of situation, this method can significantly increase the speed of your deployments compared to traditional IaC tools. The declarative nature also makes it a natural fit for GitOps-style deployment pipelines, helping to bridge and smooth the application and infrastructure workflows.
Finally, you can manage permissions for your cloud resource operators using role-based access control (RBAC), as you would for any other Kubernetes object.
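As an example of what this can mean in practice, the sketch below builds a standard `rbac.authorization.k8s.io/v1` Role and RoleBinding that let one group of developers manage one kind of cloud resource in their own namespace. The `buckets` resource and the `cloud.example.com` API group are hypothetical stand-ins for whatever your operator's CRDs define, and the namespace and group names are invented for illustration.

```python
import json

RBAC_GROUP = "rbac.authorization.k8s.io"


def bucket_editor_role(namespace: str) -> dict:
    # Role granting full management of the (hypothetical) "buckets" custom resource
    return {
        "apiVersion": f"{RBAC_GROUP}/v1",
        "kind": "Role",
        "metadata": {"name": "bucket-editor", "namespace": namespace},
        "rules": [
            {
                "apiGroups": ["cloud.example.com"],  # hypothetical CRD group
                "resources": ["buckets"],            # hypothetical CRD plural name
                "verbs": ["get", "list", "watch", "create", "update", "patch", "delete"],
            }
        ],
    }


def bind_role_to_group(role: dict, group_name: str) -> dict:
    # RoleBinding attaching the Role to a group of users in the same namespace
    meta = role["metadata"]
    return {
        "apiVersion": f"{RBAC_GROUP}/v1",
        "kind": "RoleBinding",
        "metadata": {"name": f"{meta['name']}-{group_name}", "namespace": meta["namespace"]},
        "subjects": [{"kind": "Group", "name": group_name, "apiGroup": RBAC_GROUP}],
        "roleRef": {"apiGroup": RBAC_GROUP, "kind": "Role", "name": meta["name"]},
    }


if __name__ == "__main__":
    role = bucket_editor_role("team-a")
    binding = bind_role_to_group(role, "team-a-devs")
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": [role, binding]}, indent=2))
```

Because custom resources are addressed like any other API resource, the same rules can equally restrict a team to read-only verbs.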
This technology is still relatively new and, as such, comes with some warnings. For example, both AWS and Azure warn you that this technology is not yet suitable for production deployments. Software in a preview-like state commonly breaks backward compatibility, so you should be prepared to rework your code to keep up with the latest changes if you want access to new features and functionality. Additionally, managing RBAC rules will be an extra burden for your team, on top of managing the permissions for the cloud vendor itself.
Kubernetes’ eventual consistency might be a disadvantage as well, requiring you to carefully write your application code so that it is resilient to failure, for example, by implementing retries. Let’s say you declare an RDS instance in AWS to provide you with a MySQL database, but it typically takes around 10 minutes to be up and running. Your code needs to cater to that type of situation. Alternatively, you might want to stage your deployments by deploying the manifest files related to your resources first, followed by the manifest files related to your apps.
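A retry with exponential backoff is one common way to make application code tolerate a dependency that is not ready yet. The following is a minimal sketch under stated assumptions: `retry` and `flaky_connect` are hypothetical names, the operation is assumed to raise `ConnectionError` while the database is unavailable, and the delays are arbitrary values you would tune to your own start-up times.

```python
import time


def retry(operation, attempts=5, base_delay=1.0, max_delay=60.0, sleep=time.sleep):
    """Call operation until it succeeds, doubling the wait after each failure."""
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except ConnectionError:
            if attempt == attempts:
                raise  # out of attempts: let the caller see the failure
            sleep(delay)
            delay = min(delay * 2, max_delay)


if __name__ == "__main__":
    calls = {"count": 0}

    def flaky_connect():
        # Stand-in for opening a database connection: fails on the first two calls
        calls["count"] += 1
        if calls["count"] < 3:
            raise ConnectionError("database not ready yet")
        return "connected"

    # Short delay so the demonstration finishes quickly
    print(retry(flaky_connect, base_delay=0.1))
```

The `sleep` parameter is injectable so that the waiting behaviour can be checked in tests without actually pausing.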
As with virtually all other IaC tools, this technology does not provide a unified interface for similar resources on various cloud vendors. Consequently, switching to a new cloud vendor will require that you rewrite all the manifest files declaring your cloud resources.
By switching to this new technology, you will lose some functionality tied to specific IaC tools. For example, CloudFormation supports signals that let an EC2 instance notify it once the instance is ready to process requests, at which point CloudFormation resumes the stack’s deployment. On the Terraform side of things, you won’t have access to the many publicly available modules, so you will need to reinvent the wheel with Kubernetes manifest files.
There are also security implications with managing infrastructure from within a Kubernetes cluster. There will now exist a collection of pods in your cluster with, potentially, great power over your infrastructure. With appropriate RBAC controls and network policies, these security concerns can be managed, but this will need to be kept in mind.
This new technology is just a different way of performing the same function as existing IaC tools. Consequently, a relatively deep knowledge of the cloud resources in question is still required. It can be argued that existing IaC tools are better at handling IaC and offer more functionality.
While it might seem there isn’t a use for these operators, they may prove helpful to certain organisations practising a “you build it, you run it” approach. With this approach, developers are responsible for deploying and running the parts of the infrastructure that belong to their service (databases, Elasticsearch clusters, Redis caches, etc.).
These teams are often stream-aligned and paired with a platform team responsible for the entire platform. To hand responsibility for service-specific infrastructure over to them, you would normally need to build complex pipelines that run service-specific snippets of IaC code. Using one of these infrastructure operators can significantly simplify this, allowing development teams to manage their own infrastructure from their Kubernetes manifests or Helm charts.
In conclusion, this new approach to IaC is interesting and has its uses but should be used in the right context. It is certainly not suitable for deploying all of your infrastructure, but it can help deliver specific infrastructure services to developers.