September 14, 2015 | Cloud, DevOps

Terraform Infrastructure Design Patterns

If you are operating in the programmable infrastructure space, you will hopefully have come across Terraform, a tool from HashiCorp which is primarily used to manage infrastructure resources such as virtual machines, DNS names and firewall settings across a number of public and private providers (AWS, GCP, Azure, …).

WRITTEN BY

Bart Spaans

From the outside Terraform looks similar to other cloud-related tools such as Amazon’s CloudFormation and Google’s gcloud, but when it comes to programmatically and consistently managing your infrastructure, there are a number of reasons to prefer Terraform. One of them is that it allows us to seamlessly combine resources from multiple providers (a feature highlighted in one of our other blog posts). In this post, however, we’ll explore another: Terraform’s ability to grow with the size of your infrastructure through its slightly hidden metaprogramming capabilities.

From single file to modules

Single file

Most Terraform projects start with a single file defining only a few resources. We set up the network, start a few instances, give them sensible names, and off we go. In this example I’m defining ten private Google Compute Engine instances, each getting a DNS name: web-0.example.com, web-1.example.com, …

variable "disk_image" {}
variable "public_key" {}

resource "google_compute_instance" "web" {
  count        = "10"
  name         = "web-${count.index}"
  zone         = "europe-west1-b"
  tags         = ["docker", "no-ip"]
  machine_type = "n1-standard-1"

  disk {
    image = "${var.disk_image}"
  }
  metadata {
    sshKeys = "${var.public_key}"
  }
  network_interface {
    network = "default"
  }
}

resource "google_dns_record_set" "web" {
  count        = "10"
  managed_zone = "example.com"
  name         = "web-${count.index}.example.com"
  type         = "A"
  ttl          = 5
  rrdatas      = ["${element(google_compute_instance.web.*.network_interface.0.address, count.index)}"]
}

Although it’s admittedly slightly hard to make out, the last field of our DNS record set definition, rrdatas, gets its value from the corresponding compute instance.

In other words, the ["${element(google_compute_instance.web.*.network_interface.0.address, count.index)}"] reference translates to: create a list consisting of data loaded from each google_compute_instance resource defined under the name web, specifically the first (zero-based counting) IP address allocated and made available via the network_interface definition, and then select an element from that list based on the current count.index. count.index is a magic variable that Terraform increments for each copy created by the count looping feature.
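If we wanted to inspect the full list that the splat expression produces, a hypothetical output could surface it after an apply (outputs are plain strings at the time of writing, hence the join):

# Hypothetical output that surfaces the whole splat list.
output "web_addresses" {
  value = "${join(",", google_compute_instance.web.*.network_interface.0.address)}"
}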

This is a small example of the sort of programming we can do.

Multiple files

As your infrastructure grows, so does your Terraform file. To alleviate the resulting maintenance burden we can group common resources together into separate files, because Terraform reads all the .tf files in a directory. We could, for example, move to a structure like this:

vm.tf
firewall.tf
routes.tf
dns.tf

Splitting on resource type can be helpful, but it makes it harder to see the logical pieces of your infrastructure. For instance, to find all the resources relating to our database servers we would have to dig through all the files looking for relevant definitions. We could split the resources out logically instead:

database.tf
web.tf
site.tf # common resources like DNS zones

…but then we end up with the opposite problem and have resource types scattered around multiple files.

Modules

Modules give us a way out of this. Using modules we can logically group infrastructure components, whilst still being able to easily find resources by type. A layout I commonly use for medium-sized projects looks something like this:

# Shared resources go here. Networks, dns zones, ...
modules/site/network.tf
modules/site/dns.tf
modules/site/variables.tf

# Each logical component becomes a module
modules/database/vm.tf
modules/database/dns.tf
modules/database/network.tf
modules/database/variables.tf

modules/web/vm.tf
modules/web/dns.tf
modules/web/network.tf
modules/web/variables.tf

# Leaving the root relatively clean
site.tf
variables.tf
terraform.tfvars

As the inclusion of a variables.tf file in each module suggests, we can make the modules configurable. That is, variables.tf defines the variables the module expects to be supplied in order to function. We can also import a module multiple times with different configurations, which makes it easy to build mirrored prod and pre-prod environments, for example, as sketched below.
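A minimal sketch of that idea, with illustrative names rather than anything from a real project:

# modules/web/variables.tf: declare what the module needs; a default
# makes a variable optional.
variable "environment"  {}
variable "machine_type" {
  default = "n1-standard-1"
}

# Root site.tf: import the same module twice with different settings.
module "web_prod" {
  source       = "./modules/web"
  environment  = "prod"
  machine_type = "n1-standard-4"
}

module "web_preprod" {
  source      = "./modules/web"
  environment = "preprod"
}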

An added benefit of grouping your files like this is that site.tf in the root now gives you a clean, logical breakdown of your infrastructure.

As an example, below is a snippet of the site.tf I use to run the environment for our “Containers and Schedulers” course:

module "mgt" {
  source     = "./modules/mgt"
  course_key = "${var.course_key}"
  dns_domain = "${var.dns_domain}"
}

module "attendees" {
  source          = "./modules/attendees"
  course_key      = "${var.course_key}"
  nr_of_attendees = "${var.nr_of_attendees}"
  dns_zone_name   = "${module.mgt.dns_zone_name}"
  dns_domain      = "${var.dns_domain}"
}

I use the output of the mgt module as an input to the attendees module. This is another example of using the output of one resource as the input to another, but this time across modules.
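For this to work the mgt module has to declare the output explicitly. A minimal sketch of what that could look like, assuming the zone is created by a google_dns_managed_zone resource named site inside the module:

# modules/mgt/dns.tf (sketch): expose the managed zone's name so that
# other modules can reference it as module.mgt.dns_zone_name.
output "dns_zone_name" {
  value = "${google_dns_managed_zone.site.name}"
}

In the next sections we’ll see how these features enable some powerful metaprogramming abstractions.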

Pattern Modules

In small development environments we often deploy build servers and artefact stores as single machines. Both machines get a fitting DNS name and are configured to be accessible to developers; their setup is identical barring a detail or two. In production we might be running multiple applications, each in its own load-balanced pool on a separate network. Once again, driven by DRY, there is a pattern we would like to extract. And these are just two examples: we may have coordinator/worker or primary/secondary setups, clusters, a DMZ, blue/green deployments, etc.

Now, my favourite Terraform feature is that we can use modules to create blueprints for these infrastructure patterns, and that we can do this for each supported provider. The pattern modules are organised like regular modules:

modules/single_instance/vm.tf
modules/single_instance/dns.tf
modules/single_instance/variables.tf

modules/load_balanced_pool/vm.tf
modules/load_balanced_pool/dns.tf
modules/load_balanced_pool/variables.tf

But the resources defined inside these files are fairly abstract and depend almost entirely on configuration. Below is an example of what the single_instance module could look like:

variable "name"          {}
variable "zone"          {}
variable "tags"          {}
variable "machine_type"  {}
variable "disk_image"    {}
variable "public_key"    {}
variable "network"       {}
variable "dns_zone_name" {}
variable "dns_domain"    {}

resource "google_compute_instance" "single_instance" {
  name         = "${var.name}"
  zone         = "${var.zone}"
  # Lists can't be passed into modules directly at the time of
  # writing, so the tags arrive as a single ";"-separated string.
  tags         = "${split(";", var.tags)}"
  machine_type = "${var.machine_type}"

  disk {
    image = "${var.disk_image}"
  }
  metadata {
    sshKeys = "${var.public_key}"
  }
  network_interface {
    network = "${var.network}"
  }
}

resource "google_dns_record_set" "single_instance_record" {
  managed_zone = "${var.dns_zone_name}"
  name         = "${var.name}.${var.dns_domain}"
  type         = "A"
  ttl          = 5
  rrdatas      = ["${google_compute_instance.single_instance.network_interface.0.address}"]
}

This looks almost pointless, as the code above merely wraps existing Terraform resources, but crucially it composes them in Terraform code itself. We can think of these pattern modules as implementations of higher-level resources on top of the lower-level resources offered by the infrastructure providers. This is an example of how a tool like Terraform can transcend the abilities of simple cloud APIs.

By using default values for the commonalities, our logical components simply turn into the set of configuration values that make them unique:

module "postgresql" {
  src          = "modules/single_instance/"
  name         = "postgresql"
  machine_type = "n1-standard-4"
}
module "mysql" {
  src          = "modules/single_instance/"
  name         = "mysql"
  machine_type = "n1-standard-32"
}
module "web" {
  src          = "modules/load_balanced_pool/"
  name         = "web"
  machine_type = "n1-standard-4"
  instances    = 10
}

We can either change the variables in the base modules to include the needed defaults, or we can put them in yet another module that provides a specialisation of the base implementation, since modules can import other modules. The latter has the advantage that the highly abstract “meta” modules could be maintained by the community, providing libraries of common infrastructure patterns.
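As a sketch of the specialisation approach (the database module and the values pinned here are illustrative):

# modules/database/vm.tf (sketch): specialise single_instance by
# pinning the shared values and passing through only what varies.
variable "name"          {}
variable "machine_type"  {}
variable "public_key"    {}
variable "dns_zone_name" {}
variable "dns_domain"    {}

module "instance" {
  source        = "../single_instance"
  name          = "${var.name}"
  machine_type  = "${var.machine_type}"
  public_key    = "${var.public_key}"
  dns_zone_name = "${var.dns_zone_name}"
  dns_domain    = "${var.dns_domain}"

  # Defaults baked into the specialisation:
  zone          = "europe-west1-b"
  tags          = "docker;no-ip"
  disk_image    = "debian-7-wheezy"
  network       = "default"
}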

Composing Pattern Modules

Since modules can import other modules, we can also import patterns from other patterns and compose them into even bigger patterns.

As an example: we could have built the load_balanced_pool above by importing and configuring a new multiple_instance module and adding only a load balancer resource, as sketched below. We could then also rewrite the single_instance module to be a specialisation of the multiple_instance module.
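A rough sketch of that composition; the multiple_instance module, its instance_links output and the variables are all assumptions:

# modules/load_balanced_pool/vm.tf (sketch)
variable "name"      {}
variable "instances" {}

module "instances" {
  source    = "../multiple_instance"
  name      = "${var.name}"
  instances = "${var.instances}"
}

resource "google_compute_target_pool" "pool" {
  name = "${var.name}-pool"

  # Module outputs are strings at the time of writing, so the instance
  # self links travel out joined together and are split again here.
  instances = ["${split(",", module.instances.instance_links)}"]
}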

The load_balanced_pool could in turn be a tiny cog in a production module, or even an end_to_end_delivery module. These possibilities are interesting to think about and are reminiscent of similar discussions in software engineering around decoupling, encapsulation and re-use.

Conclusion

Terraform is a great tool that grows with your infrastructure: from a single file containing a few resources, all the way to reusable design patterns that build entire enterprise-grade cloud architectures.

At the time of writing, the ideas in the later part of this post do mean pushing Terraform’s programming model to its limits, as witnessed by the slight “split string” hack needed to get a list of tags into the single_instance module. Sometimes you have to repeat yourself and resort to a bit of stringly typed programming, but generally you can take it quite far. I’m sure the metaprogramming capabilities will keep improving over time, as they have done in the past, and I’m excited to see where they go.

 
