Open Credo

March 2, 2016 | Microservices

Versioning a Microservice System with git

Microservice-style software architectures have many benefits: loose coupling, independent scalability, localised failures, and the freedom to use polyglot persistence or multiple programming languages.

However, they also introduce other challenges. A major one is the fact that the end-user functionality of the system will ultimately emerge as a composition of multiple services. This significantly increases the complexity of deploying the system. In addition, because we lose the concept of “versions” of the system, it becomes harder to answer questions like “what capabilities are in production?” and “when is a new feature considered ‘done’?”.

WRITTEN BY

David Borsos

Motivation

In this blog post I describe a mechanism that helps control the deployment of a microservice-based system and gives more visibility into what code is actually deployed.

Microservice deployment pipelines

Let me illustrate with a typical example. Assume that we have a client organisation for whom we have developed a system using a microservice architecture. Usually, after the initial launch of the product, further changes will be required. Often the actual delivery of these features will span multiple services [1]. And we will almost certainly have more than one such requirement being worked on at any given time.

As development goes on, the team breaks down the individual feature requests into smaller stories, identifies what needs to be modified in each microservice, starts working on them, and delivers the necessary changes to the system.

Most likely, these modifications will have some kind of dependency between them. Our user-facing new function will rely on changes in a background service. Therefore, before deploying and delivering the complete feature, we’ll need to make sure that all the dependencies are in place.

In an ideal situation, one method of achieving this would be a continuous deployment mechanism: once something is ready, you push it immediately into production. This would mean shipping the dependencies first, then, eventually, completing the required function [2].

However, while truly continuous deployment is highly desirable [3], it might not always be achievable, often for non-technical reasons. Just to list a few possible cases:

  • The client wants to communicate any change to the end-users
  • A more controlled, scheduled release procedure is a requirement
  • We need an explicit sign-off from the client when certain changes (user facing content or functionality) are rolled out
  • Regulatory constraints

In addition, whenever we make a change in the system, a natural expectation is to have reasonable confidence that we have not broken any existing functionality and that the new features work as expected, both from a functional and a non-functional standpoint.

One natural way of gaining this confidence is a deployment pipeline with “phases” corresponding to the various levels of confidence that our system as a whole [4] is “good enough”. Typically, this means at least one, but possibly several, environments where various automated tests (functional or non-functional) are executed; an environment where the client can sign off changes (UAT), possibly also used for demos; perhaps a few more for other specific purposes (performance or security testing); and, ultimately, production.

The problem with the proliferation of environments is two-fold. It requires running a larger infrastructure, which drives up costs. In addition, it becomes more difficult to track which piece of functionality has reached which phase (and, in the end, it is harder to answer the question: what functionality are we releasing now?).

The process I describe below aims to help solve these problems.

The environment tracker repository

The fundamental idea is having a small repository in a source control system that serves as our “environment tracker”. The contents of this repository should describe what versions of the various services we want to have deployed in each environment [5]. Our preferred source control system is git, therefore I will use it for illustration.

The repository should have a very simple structure; for example each of the services could be described in individual files. A possible layout is illustrated below:
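A minimal sketch of such a layout, assuming a services directory holding one .version file per service (the directory name and the three service names are hypothetical):

```shell
# Build a hypothetical tracker layout in a scratch directory:
# one file per service, all under services/.
tracker="$(mktemp -d)"
mkdir "$tracker/services"
touch "$tracker/services/service-a.version" \
      "$tracker/services/service-b.version" \
      "$tracker/services/service-c.version"

# Show the resulting layout, relative to the tracker root.
(cd "$tracker" && find services -type f | sort)
```

Keeping one file per service means that a version change for one service touches exactly one path, which keeps diffs and merges between branches easy to read.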

Each file should contain, at the very least, one line with the version of the deployable artifact of that service. E.g.
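As an illustration, assuming the version is stored as a single plain-text line (the file name and the version number 1.4.2 are made up):

```shell
# Write a hypothetical version file for service-a and print it back.
workdir="$(mktemp -d)"
mkdir -p "$workdir/services"
printf '1.4.2\n' > "$workdir/services/service-a.version"
cat "$workdir/services/service-a.version"
```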

Afterwards, automated deployment scripts can pick up the contents of the tracker repository and deploy it into their target environment. This should be a very lightweight process – you only need to actually deploy those services that have had a version change.
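One possible way to find the services that need redeploying is to compare the tracker commit that was last deployed with the one about to be deployed. The sketch below assumes the services/<name>.version layout and that the deployment script remembers the last deployed commit; the function name changed_services and the echo standing in for the deploy step are hypothetical.

```shell
# List the services whose version file was added or modified between
# two tracker commits. $1 = last deployed commit, $2 = commit to deploy.
changed_services() {
  git diff --name-only --diff-filter=AM "$1" "$2" -- services/ |
    sed -e 's|^services/||' -e 's|\.version$||'
}

# Demo in a throwaway repository (all names and versions are made up).
repo="$(mktemp -d)" && cd "$repo" && git init -q
git config user.name "Example" && git config user.email "example@example.com"
mkdir services
echo "1.4.2" > services/service-a.version
echo "2.0.1" > services/service-b.version
git add services && git commit -q -m "Initial state"
last_deployed="$(git rev-parse HEAD)"

echo "1.5.0" > services/service-a.version
git commit -q -am "Bump service-a to 1.5.0"

# Only service-a changed, so only service-a is handed to the deploy step.
for service in $(changed_services "$last_deployed" HEAD); do
  version="$(cat "services/$service.version")"
  echo "would deploy $service $version"   # placeholder for the real deploy step
done
```

Removed version files are deliberately filtered out here; undeploying a service would need its own handling.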

Having more than one environment

This repository is able to track more than one environment. Git branches are a good way of dealing with the task. Each branch corresponds to a phase in the deployment pipeline, and by definition the corresponding environment(s) will have deployed the state of the system described by the HEAD of that branch.

A simple tracker repository layout with 3 environments

The promotion of the system between the environments would then manifest itself as git-merges between the appropriate branches. This has the following benefits:

  • Each branch describes the current state of an environment (including prod), therefore it’s trivial to tell what is actually running where
  • If we don’t do fast-forward merges, the merge commits will give us a precise log of when something was promoted from one environment to the other
  • Each branch gives us the exact history of the state of the system in the corresponding environment
  • Having a history also gives us the technical capability of easily rolling back a change if necessary: just do a git revert
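A promotion and a rollback could look like the sketch below. It assumes branches named QA and UAT and the services/<name>.version layout; all names and version numbers are made up.

```shell
# Throwaway tracker repository with two environment branches.
repo="$(mktemp -d)" && cd "$repo" && git init -q
git config user.name "Example" && git config user.email "example@example.com"
mkdir services
echo "1.4.2" > services/service-a.version
git add services && git commit -q -m "Initial state"
git branch QA && git branch UAT

# A new version of service-a arrives in QA.
git checkout -q QA
echo "1.5.0" > services/service-a.version
git commit -q -am "service-a 1.5.0"

# Promote QA to UAT; --no-ff forces a merge commit that records the promotion.
git checkout -q UAT
git merge -q --no-ff -m "Promote QA to UAT" QA
cat services/service-a.version   # prints 1.5.0

# Roll the promotion back; -m 1 treats the previous UAT state as the mainline.
git revert --no-edit -m 1 HEAD
cat services/service-a.version   # prints 1.4.2
```

Note that after reverting a merge, merging the same QA commits again will not bring the change back; the revert itself has to be reverted first.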

Reducing the infrastructure

One important consequence of the tracker repository is that we do not actually need all the environments running all the time.

Consider a scenario where we have a fast delivery and client feedback loop, but we are not actually allowed to move changes to production at arbitrary times.

Such a situation often results in a back-pressure in the deployment pipeline. Because we cannot get the latest changes deployed to the live system, we are also reluctant (or unable) to change the lower environments because of the desire to keep things stable until we can release a set of stable and signed-off changes. Eventually this could mean that we completely stop making code changes because of a pending deployment.

The solution feels straightforward enough: let’s add more environments to the pipeline to relieve the pressure, e.g. a “staging” area which by definition holds production-ready code waiting for deployment.

However, these phases may not have any dedicated function besides holding the code. Because without them it is generally difficult to tell what piece of code has actually reached a phase, the corresponding environments are often kept running all the time, requiring additional infrastructure and increasing overall costs.

Having a “staging” branch in the environment tracker repository, on the other hand, has no associated cost at all. Yet it fulfils the same purpose as an actual staging environment would.

In fact, by leveraging modern cloud infrastructure, provisioning and configuration management tools, we can easily create these environments on demand, only when they are needed. At any given time we know their exact expected state from the tracker repository.
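The expected state of an environment can be read straight from its branch, without checking it out and without the environment running. A sketch, assuming a branch named staging and the services/<name>.version layout (the function name expected_state is hypothetical):

```shell
# Print "<service> <version>" for every tracker file on the given branch.
expected_state() {
  git ls-tree -r --name-only "$1" services/ | while read -r path; do
    printf '%s %s\n' "$(basename "$path" .version)" "$(git show "$1:$path")"
  done
}

# Demo in a throwaway repository (names and versions are made up).
repo="$(mktemp -d)" && cd "$repo" && git init -q
git config user.name "Example" && git config user.email "example@example.com"
mkdir services
echo "1.4.2" > services/service-a.version
echo "2.0.1" > services/service-b.version
git add services && git commit -q -m "Initial state"
git branch staging

# This list is what a provisioning tool would need to recreate staging.
expected_state staging
```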

On-demand provisioning of the staging environment: it does not need to be always up

Avoiding the “distributed monolith”

Using the tracker repository will probably make it very tempting to start treating your microservice system as a “distributed” monolith. It’s easy to see a strategy of versioning the contents (using git-tags) and promoting only these “versions”.

A conscious effort should be made not to do this. The pattern itself does not mean that you can only promote a “version” of the system. So far I have described an environment promotion as a merge from a branch representing a lower phase to one higher in the deployment pipeline. However, it does not have to be a full merge at all. Git gives us tools to take only part of another branch. Two examples: use git cherry-pick, or just synchronise some selected files from the other branch:
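A sketch of the file synchronisation variant, assuming branches named QA and UAT and the file services/service-a.version (the second service and all version numbers are made up for the demo):

```shell
# Throwaway tracker repository with two services and two environment branches.
repo="$(mktemp -d)" && cd "$repo" && git init -q
git config user.name "Example" && git config user.email "example@example.com"
mkdir services
echo "1.4.2" > services/service-a.version
echo "2.0.1" > services/service-b.version
git add services && git commit -q -m "Initial state"
git branch QA && git branch UAT

# Both services move forward in QA.
git checkout -q QA
echo "1.5.0" > services/service-a.version
echo "2.1.0" > services/service-b.version
git commit -q -am "service-a 1.5.0, service-b 2.1.0"

# Promote only service-a: take its file from QA and commit it on UAT.
git checkout -q UAT
git checkout QA -- services/service-a.version
git commit -q -m "Promote service-a to UAT from QA"

cat services/service-a.version   # prints 1.5.0
cat services/service-b.version   # prints 2.0.1, service-b was not promoted
```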

What this will do is copy the state of file services/service-a.version from branch QA to branch UAT and then commit it to UAT. Note that this will not cause a merge commit to appear in your git history, yet effectively results in promoting service-a to UAT from QA.

Selective promotion of a service

Putting it all together

So what does a continuous delivery pipeline with a tracker repository look like end-to-end?

At the beginning, nothing is different. You write your microservice code and push it to a source code repository server, from where a build tool picks it up and runs some automated module-level and contract tests on it. If these are successful, it builds and publishes a versioned artefact into a repository.

This artefact then automatically gets deployed into the first integrated environment where you have a chance of testing how it behaves when interacting with other services.

Higher-level automated (system regression type) tests are then run, and if all is well, the environment’s state is snapshotted into the tracker repository’s master branch.
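The snapshot step could be as small as the sketch below. It assumes the services/<name>.version layout and that the pipeline knows the name and version of the service it has just tested; the function name record_version is hypothetical.

```shell
# Record a tested service version on the currently checked-out branch
# (the master branch in this article). Commits only if something changed.
record_version() {
  echo "$2" > "services/$1.version"
  git add "services/$1.version"
  git diff --cached --quiet || git commit -q -m "Snapshot: $1 $2"
}

# Demo in a throwaway repository (names and versions are made up).
repo="$(mktemp -d)" && cd "$repo" && git init -q
git config user.name "Example" && git config user.email "example@example.com"
mkdir services

record_version service-a 1.4.2
record_version service-a 1.4.2   # same version again: no new commit
record_version service-a 1.5.0

git log --format=%s
```

Skipping the commit when nothing changed keeps the history of the branch equal to the history of actual changes in the environment.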

At this point, you have reasonable confidence that your changes didn’t break existing functionality. You can now promote the changes using the tracker repository, run more tests, get client signoff and everything else that is in your process, and, ultimately, release to production.

Additional thoughts

Scaling the approach

A tracker repository is a useful way to capture the state of a single system that consists of several collaborating services. If you have many microservices in your organisation, forming several conceptually separate systems, it is not necessary (and almost certainly not desirable) to have a single repository describing all of them. If it makes sense, there is nothing preventing you from having a number of separate tracker repositories, each responsible for a certain collection of services.

Multiple versions of a service

In some cases it may be necessary to have different versions of the same microservice running at the same time in an environment. This can be solved easily by listing every version of the service that should be deployed in the corresponding tracker file.
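For instance, assuming one version per line in the tracker file (this format, the file name and the version numbers are assumptions for illustration), a deployment script could simply loop over the lines:

```shell
# Hypothetical tracker file listing two versions of service-a.
workdir="$(mktemp -d)"
printf '1.4.2\n1.5.0\n' > "$workdir/service-a.version"

# Handle every listed version; echo stands in for the real deploy step.
while read -r version; do
  echo "would deploy service-a $version"
done < "$workdir/service-a.version"
```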

Capturing additional information

In the examples I have used so far, I only captured the version of each service that identifies a deployable artefact. However it is perfectly feasible to have more complex information captured, for example deployment descriptors or references to the state of your configuration management tools that were used to provision the infrastructure under your environment.

Creating equivalent environments

Using the tracker repository you gain the ability to create an environment that is precisely the same as any other. This can be especially useful if you need to replicate a production-like environment.

Extending the pattern

The fundamental idea of the tracker repository is that our deployed constellation of services is no longer purely incidental, but has a more formal definition. The gain is all the power that git as a source control tool can give us across the whole system. In this post, I have used simple examples to illustrate what is possible, but it is easy to extend these by relying on the various git merging strategies, so that the actual promotion and deployment process meets the organisation’s needs.

Conclusion

The environment tracker repository pattern described above is a result of an attempt to solve a very specific problem: having a software product with the benefits and flexibility of a microservice architecture, but balancing it with the desire or need to have more control over change.

It is by no means a solution that would fit all use-cases. If your organisation is ready for truly continuous software deployment, then it may not be suitable for you. However, in many cases it is useful for bridging the gap: it grants better understanding and greater control of your deployment process, and generally helps to move the organisation towards the desired goal.

Notes

  1. An argument can be made that if a client requirement spans multiple services, then the service boundaries have been incorrectly drawn up in the first place. In a realistic situation this is not always true. Clients will (and should) think in terms of business processes and end-user functionality, and while these concepts ideally map to the system’s microservice structure, the boundaries in the real world are not always as well-defined as the boundaries of the microservices.
    Quoting Martin Fowler: “you can expect many single service changes to only require that service to be redeployed. That’s not an absolute, some changes will change service interfaces resulting in some coordination, but the aim of a good microservice architecture is to minimize these”.
  2. There are several potential mechanisms that can help. You can explicitly declare versions and dependencies that your services need, you can use feature flagging to enable a piece of functionality only when everything is ready. These are all valid solutions, and should be part of your microservice toolkit, but each of them is going to shift the complexity somewhere else (e.g. feature flagging will result in more complex code that needs to work correctly in both cases).
  3. More about why continuous delivery is important:
    http://martinfowler.com/articles/microservice-trade-offs.html#deployment
    http://martinfowler.com/bliki/MicroservicePrerequisites.html
  4. Ultimately, the client cares about whether the whole product / system works as expected, and not that the individual microservices fulfill their service contracts correctly.
  5. Note that a single service may have multiple versions deployed at the same time. The pattern is easily modifiable to accommodate this; see the section “Multiple versions of a service”.
