March 2, 2016 | Microservices
Microservice-style software architectures have many benefits: loose coupling, independent scalability, localised failures, and the freedom to use polyglot data persistence tools or multiple programming languages.
However, they also introduce other challenges. A major one is the fact that the end-user functionality of the system will ultimately emerge as a composition of multiple services. This significantly increases the complexity of deploying the system. In addition, because we lose the concept of “versions” of the system, it becomes harder to answer questions like “what capabilities are in production?” and “when is a new feature considered ‘done’?”.
In this blogpost I intend to describe a mechanism which can help control the deployment of a microservice-based system and give more visibility about what code is actually deployed.
Let me illustrate with a typical example. Assume that we have a client organisation for whom we have developed a system using a microservice architecture. Usually, after the initial launch of the product, further changes will be required. Often the actual delivery of these features will span multiple services. And we will almost certainly have more than one such requirement being worked on at any given time.
As the development process goes on, the team would break down the individual feature requests into smaller stories, identify what needs to be modified in each microservice, start working on them, and deliver the necessary changes in the system.
Most likely, these modifications will have some kind of dependency between them. Our user-facing new function will rely on changes in a background service. Therefore, before deploying and delivering the complete feature, we’ll need to make sure that all the dependencies are in place.
In an ideal situation, one method of achieving this would be a continuous deployment mechanism: once something is ready, you push it immediately into production. This would mean shipping the dependencies first, then, eventually, completing the required function.
However, while truly continuous deployment is highly desirable, it might not always be achievable, often because of non-technical reasons. Just to list a few possible cases:
In addition, whenever we make a change in the system, a natural expectation is to have reasonable confidence that we are not breaking any existing functionality and that the new features work as expected, both from a functional and a non-functional standpoint.
One natural way of achieving this is having a deployment pipeline that has “phases” corresponding to the various levels of confidence that our system as a whole is “good enough”. Typically, this means at least one, but possibly several, environments where various automated tests (functional or non-functional) are executed; an environment where the client can sign off changes (UAT), possibly also used for demos; perhaps a few more for other specific purposes (performance or security testing); and, ultimately, production.
The problem with the proliferation of environments is two-fold. It requires running a larger infrastructure, thus driving up costs. In addition, it becomes more difficult to track which piece of functionality has reached which phase (and in the end, it’s harder to answer the question: what functionality are we releasing now?).
The process I describe below aims to help solve these problems.
The fundamental idea is having a small repository in a source control system that serves as our “environment tracker”. The contents of this repository should describe what versions of the various services we want to have deployed in each environment. Our preferred source control system is git, therefore I will use it for illustration.
The repository should have a very simple structure; for example each of the services could be described in individual files. A possible layout is illustrated below:
    /
    +-- services/
        |
        +-- service-a.version
        +-- service-b.version
Each file should contain, at the very least, one line with the version of the deployable artifact of that service. E.g.
    $ cat services/service-a.version
    1.0.0
Afterwards, automated deployment scripts can pick up the contents of the tracker repository and deploy it into their target environment. This should be a very lightweight process – you only need to actually deploy those services that have had a version change.
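As a minimal sketch of the “only deploy what changed” idea, the function below compares the versions in a tracker checkout with the versions last deployed to an environment. The function name `changed_services` and the directory holding the last deployed versions are assumptions made for illustration; your deployment tooling may record that state quite differently.

```shell
# Print "<service> <version>" for every service whose desired version in the
# tracker checkout differs from the version last deployed.
# Both layouts are assumptions made for this sketch:
#   $1  tracker checkout containing services/<name>.version
#   $2  directory holding the last deployed <name>.version files
changed_services() {
    tracker_dir=$1
    deployed_dir=$2
    for file in "$tracker_dir"/services/*.version; do
        [ -e "$file" ] || continue      # the glob matched nothing
        name=$(basename "$file" .version)
        wanted=$(head -n 1 "$file")
        current=""
        if [ -f "$deployed_dir/$name.version" ]; then
            current=$(head -n 1 "$deployed_dir/$name.version")
        fi
        # A service that is new or has a different version needs deploying.
        if [ "$wanted" != "$current" ]; then
            printf '%s %s\n' "$name" "$wanted"
        fi
    done
}
```

A deployment script could read this output line by line and hand each service and version to whatever actually performs the deployment.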
This repository is able to track more than one environment. Git branches are a good way of dealing with the task. Each branch would correspond to a phase in the deployment pipeline, and by definition the corresponding environment(s) will have the state of the system described by HEAD in each branch deployed.
The promotion of the system between the environments would then manifest itself as git-merges between the appropriate branches. This has the following benefits:
One important consequence of the tracker repository is that we do not actually need all the environments running all the time.
Consider a scenario where we have a fast delivery and client feedback loop, but we are not actually allowed to move changes to production at arbitrary times.
Such a situation often results in a back-pressure in the deployment pipeline. Because we cannot get the latest changes deployed to the live system, we are also reluctant (or unable) to change the lower environments because of the desire to keep things stable until we can release a set of stable and signed-off changes. Eventually this could mean that we completely stop making code changes because of a pending deployment.
The solution feels straightforward enough: let’s add more environments to the pipeline to relieve the pressure, e.g. a “staging” area which by definition holds production-ready code waiting for deployment.
However, these phases may not have any actual dedicated function, besides holding the code. But, because without them it is generally difficult to tell what piece of code has actually reached a phase, the corresponding environments are often kept running all the time, thus requiring additional infrastructure and increasing overall costs.
Having a “staging” branch in the environment tracker repository, on the other hand, has no associated cost at all. Yet it fulfils the same purpose as an actual staging environment would.
In fact, by leveraging modern cloud-based infrastructure together with provisioning and configuration management tools, we can easily create these environments on demand, only when they are needed. At any given time we know their exact expected state from the tracker repository.
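To make “knowing the expected state” concrete, here is a small sketch that lists the versions recorded on a branch without checking it out. The function name `expected_state` is hypothetical, and the sketch assumes the `services/<name>.version` layout described earlier, with the version on the first line of each file.

```shell
# Print "<service> <version>" for every version file on a branch of the
# tracker repository, without touching the working tree.
#   $1  path to a clone of the tracker repository
#   $2  branch that represents the environment, e.g. staging
expected_state() {
    repo=$1
    branch=$2
    git -C "$repo" ls-tree -r --name-only "$branch" services/ |
    while read -r entry; do
        service=$(basename "$entry" .version)
        # <branch>:<path> addresses the file as it exists on that branch.
        version=$(git -C "$repo" show "$branch:$entry" | head -n 1)
        printf '%s %s\n' "$service" "$version"
    done
}
```

The output could be fed to a provisioning script when an on-demand environment is created for that phase.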
Using the tracker repository will probably make it very tempting to start treating your microservice system as a “distributed” monolith. It’s easy to see a strategy of versioning the contents (using git-tags) and promoting only these “versions”.
A conscious effort should be made not to do this. The pattern itself does not automatically mean that you can only promote a “version” of the system. So far I have described an environment promotion as a merge from a branch representing a lower phase to one higher in the deployment pipeline. However, it does not have to be a full merge at all. Git gives us tools to take only part of what is on another branch. Two examples: use git-cherry-pick, or just synchronise some selected files from the other branch:
    $ git checkout uat
    $ git checkout qa -- services/service-b.version
    $ git commit -m "Promoted service-b"
What this will do is copy the state of the file services/service-b.version from branch qa to branch uat and then commit it to uat. Note that this will not cause a merge commit to appear in your git history, yet it effectively results in promoting service-b from QA to UAT.
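The other option mentioned above, git-cherry-pick, could be wrapped in a small helper like the one below. This is only a sketch: the function name `promote_commit` is hypothetical, and it assumes that each version bump was committed on its own in the lower branch, so that a single commit corresponds to a single service change.

```shell
# Promote one tracker commit to a target branch with git cherry-pick.
#   $1  path to a clone of the tracker repository
#   $2  target branch, e.g. uat
#   $3  commit to promote, e.g. the qa commit that bumped one service
promote_commit() {
    git -C "$1" checkout -q "$2" &&
    git -C "$1" cherry-pick "$3" >/dev/null
}
```

Unlike the file checkout above, a cherry-pick carries over the original commit message, which can help when reading the history of the higher branch later. If the picked commit does not apply cleanly, git stops with a conflict that has to be resolved by hand.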
So what does a continuous delivery pipeline with a tracker repository look like end-to-end?
At the beginning of it, nothing is different. You write your microservice code and push it to a source code repository server, from where a build tool picks it up and runs some automated module-level and contract tests on your code. If these are successful, it builds and publishes a versioned artefact into a repository.
This artefact then automatically gets deployed into the first integrated environment where you have a chance of testing how it behaves when interacting with other services.
Higher-level automated tests (of the system regression type) are then run, and if all is well, the environment’s state is snapshotted into the tracker repository’s master branch.
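The snapshot step might look something like the sketch below. It assumes that the deployment tooling can report the environment’s current state as “name version” lines on standard input; that input format and the function name `snapshot_versions` are illustrative assumptions, not part of the pattern itself.

```shell
# Write one services/<name>.version file per "<name> <version>" line read
# from standard input.
#   $1  working copy of the tracker repository (master checked out)
snapshot_versions() {
    tracker_dir=$1
    mkdir -p "$tracker_dir/services"
    while read -r name version; do
        [ -n "$name" ] || continue      # skip blank lines
        printf '%s\n' "$version" > "$tracker_dir/services/$name.version"
    done
}
```

After the files are written, the build tool would commit them to master with an ordinary git add and git commit, giving the later phases a known-good state to promote from.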
At this point, you have reasonable confidence that your changes didn’t break existing functionality. You can now promote the changes using the tracker repository, run more tests, get client signoff and everything else that is in your process, and, ultimately, release to production.
A tracker repository is a useful way to capture the state of a single system that is made up of several collaborating services. If you have many microservices in your organisation, forming several conceptually separate systems, it is not necessary (and almost certainly not desirable) to have a single repository describing all of them. If it makes sense, there is nothing preventing you from having a number of separate tracker repositories, each responsible for a certain collection of services.
In some cases it may be necessary to have different versions of the same microservice running at the same time in an environment. This can easily be solved by listing every version of a service that should be deployed in the corresponding tracker file.
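One possible file convention for this, purely as an assumption for illustration, is one version per line, with blank lines and lines starting with `#` ignored. A deployment script could then read the file with a helper such as the hypothetical `list_versions` below:

```shell
# Print every version listed in a tracker file, one per line.
# Blank lines and '#' comment lines are skipped (an assumed convention).
list_versions() {
    grep -v -e '^[[:space:]]*$' -e '^[[:space:]]*#' "$1"
}
```

Each line of output then corresponds to one instance of the service that should be running in the environment.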
In the examples I have used so far, I only captured the version of each service that identifies a deployable artefact. However it is perfectly feasible to have more complex information captured, for example deployment descriptors or references to the state of your configuration management tools that were used to provision the infrastructure under your environment.
Using the tracker repository you gain the ability to create an environment that is precisely the same as any other. This can be especially useful if you need to replicate a production-like environment.
The fundamental idea of the tracker repository is that our deployed constellation of services is no longer purely incidental, but has a more formal definition. The gain is all the power that git as a source control tool can give us across the whole system. In this post, I have used simple examples to illustrate what is possible, but it is easy to extend these by relying on the various git merging strategies, so that the actual promotion and deployment process meets the organisational needs.
The environment tracker repository pattern described above is a result of an attempt to solve a very specific problem: having a software product with the benefits and flexibility of a microservice architecture, but balancing it with the desire or need to have more control over change.
It is by no means a solution that would fit all use-cases. If your organisation is ready for truly continuous software deployment, then it may not be suitable for you. However, in many cases it is a useful way to bridge the gap: it grants a better understanding of, and greater control over, your deployment process, and generally helps to move the organisation towards the desired goal.