Many of our clients are currently implementing applications using a ‘microservice’-based architecture. Increasingly we are hearing from organisations that are part way through a migration to microservices, and they want our help with validating and improving their current solution. These ‘microservices checkup’ projects have revealed some interesting patterns, and because we have experience of working in a wide range of industries (and also bring ‘fresh eyes’ to a project), we are often able to work alongside teams to make significant improvements and create a strategic roadmap for future improvements.
We’ve summarised our findings below, with the goal of providing inspiration if you are thinking of performing your own checkup of your current microservices implementation.
In 2014 Martin Fowler stated that the prerequisites for a successful microservices implementation included rapid provisioning, basic monitoring and rapid application deployment. The OpenCredo team have discussed this internally at great length, and not only do we broadly agree with this statement, but we have also seen it validated within several projects.
Rapid provisioning is becoming somewhat less important in 2016, with the arrival of microservice platforms and microservice-compatible Platform as a Service (PaaS) and Containers as a Service (CaaS) offerings like Kubernetes, Mesos and Docker Datacenter. These platforms typically provide a homogenised pool of computing resource (or some other abstraction over it) to which microservices can be deployed. However, some organisations choose to deploy onto Infrastructure as a Service (IaaS), such as that offered by Amazon Web Services (AWS), Google Cloud Platform (GCP) and Microsoft Azure, and here the absence of rapid provisioning is a frequent cause of contention between development and operations.
As part of a microservice checkup project we typically look for warning signs like unrepeatable builds, manual assembly of infrastructure, and configuration drift (the dreaded ‘snowflake server’). The team here at OpenCredo are supporters of container technology and associated orchestration platforms. We are also fans of HashiCorp tooling, and regularly work with (and contribute to) Packer and Terraform. As much as HashiCorp are aiming for world domination, we of course realise that other configuration management tooling does exist, and we work a lot with Ansible, Puppet and Chef.
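To make the ‘configuration drift’ check concrete, the core idea can be sketched in a few lines of Python: compare what a server actually reports against what is declared in source control, and flag every difference. The package names, versions and the `find_drift` function are purely illustrative, not part of any particular tool.

```python
# Minimal sketch of configuration drift detection: compare the package
# versions a server actually reports against the versions declared in
# source control. All names and versions here are illustrative.

DECLARED = {"nginx": "1.10.3", "openjdk-8-jre": "8u121", "collectd": "5.7.1"}

def find_drift(declared, actual):
    """Return {package: (declared_version, actual_version)} for every
    mismatch, including packages missing from either side."""
    drift = {}
    for pkg in set(declared) | set(actual):
        want, got = declared.get(pkg), actual.get(pkg)
        if want != got:
            drift[pkg] = (want, got)
    return drift

# A 'snowflake server' shows up as unexplained differences:
actual = {"nginx": "1.11.8", "openjdk-8-jre": "8u121", "htop": "2.0.1"}
print(find_drift(DECLARED, actual))
```

In practice tools like Terraform, Puppet or Chef perform this reconciliation for you; the value of the checkup is confirming that the declared state really does live in version control and is the only path to production.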
The ability to monitor a microservices implementation is truly vital. As the number of services deployed within an application increases, so does the complexity of their interactions, and in turn emergent behaviour becomes a reality (the sign of a truly complex system). As part of a microservice checkup we look for warning signs like undetected infrastructure failures (e.g. running out of disk space), alert storming (where so many alerts are triggered that it is impossible to identify the root cause of an issue), and production outages.
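One common mitigation for alert storming is to group related alerts before they reach the on-call engineer. The sketch below shows the idea in Python, collapsing alerts by host within a time window; the alert structure and timings are invented for the example, and real alert managers offer far richer grouping.

```python
from collections import defaultdict

# Illustrative sketch (not a real monitoring API): collapse a storm of
# alerts into one group per affected host within a time window, so the
# on-call engineer sees likely causes rather than hundreds of symptoms.

def group_alerts(alerts, window_seconds=60):
    """alerts: list of (timestamp, host, message) tuples. Returns
    {(window_start, host): [messages]} so related alerts arrive together."""
    grouped = defaultdict(list)
    for ts, host, message in alerts:
        window_start = ts - (ts % window_seconds)
        grouped[(window_start, host)].append(message)
    return dict(grouped)

storm = [
    (1000, "db-1", "disk 95% full"),
    (1002, "db-1", "write latency high"),
    (1005, "db-1", "replication lagging"),
    (1130, "web-3", "5xx rate elevated"),
]
for key, messages in group_alerts(storm).items():
    print(key, "->", len(messages), "alert(s)")
```

Here four raw alerts become two notifications: three symptoms on `db-1` (almost certainly one disk-space root cause) and one on `web-3`.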
We regularly work with clients to help integrate and configure SaaS monitoring solutions such as Datadog, Sysdig and Ruxit, and have also worked with self-hosted monitoring solutions like Prometheus. We are also strong proponents of synthetic transactions and critical smoke tests, and have used tooling such as JMeter and Serenity BDD for this purpose. Often when we show this to project managers and other business stakeholders, they wonder how they ever managed without this visibility into the application!
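The essence of a synthetic transaction is simple: regularly exercise a critical user journey end to end, and record pass/fail plus latency. A minimal Python sketch follows; `check_out_basket` is a hypothetical placeholder for a scripted journey against your own application, and the timeout value is arbitrary.

```python
import time

# Hedged sketch of a synthetic transaction runner. 'journey' is any
# callable that scripts a critical user path (log in, add item, pay);
# a journey fails if it raises or exceeds the latency budget.

def run_synthetic_transaction(journey, timeout_seconds=2.0):
    """Run one scripted user journey; return (ok, elapsed_seconds)."""
    start = time.monotonic()
    try:
        journey()
        ok = True
    except Exception:
        ok = False
    elapsed = time.monotonic() - start
    return ok and elapsed <= timeout_seconds, elapsed

def check_out_basket():
    # Placeholder: real code would drive the application's public API.
    pass

ok, elapsed = run_synthetic_transaction(check_out_basket)
print("PASS" if ok else "FAIL", f"({elapsed:.3f}s)")
```

Run on a schedule and fed into the monitoring system, results like these give business stakeholders a continuous answer to “can customers actually buy things right now?”, which is precisely the visibility mentioned above.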
The topic of rapid application deployment, and the associated theme of continuous delivery, deserve their own blog posts. However, in the context of a microservices checkup, we often find this is the cause of most contention within an organisation adopting microservices. Many of us are used to working with continuous integration and deployment tooling like Jenkins, Go CD and Spinnaker, but we often see problems emerge when the build pipeline that worked for one team is applied wholesale to another team within the organisation. If builds can’t be reliably and repeatedly deployed across an organisation, or there are frequent production outages, then we look further into building support for scalable rapid application deployment. Frequently the issues associated with this problem must be addressed at both the technical and the organisational level.
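Reliable, repeatable deployment ultimately comes down to gating logic: release, verify, and roll back automatically when verification fails. The Python sketch below shows that shape; the `deploy`, `smoke_test` and `rollback` hooks are placeholders rather than any real pipeline API.

```python
# Sketch of the gate behind rapid, repeatable deployment: release a new
# version, run the smoke tests, and roll back automatically on failure.
# The hooks are hypothetical stand-ins for real pipeline steps.

def deploy_with_rollback(version, deploy, smoke_test, rollback):
    deploy(version)
    if smoke_test(version):
        return "released " + version
    rollback(version)
    return "rolled back " + version

log = []
result = deploy_with_rollback(
    "1.4.2",
    deploy=lambda v: log.append("deploy " + v),
    smoke_test=lambda v: False,          # simulate a failing smoke test
    rollback=lambda v: log.append("rollback " + v),
)
print(result)   # rolled back 1.4.2
print(log)
```

The point is that the gate is identical for every team and every service: when one team’s pipeline encodes this logic in scripts only they understand, applying it to a second team is where we see things break down.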
Another common pattern we see is that initial microservice implementations or proof-of-concepts don’t integrate with the ‘legacy’ (money-making) systems, and so the pain of implementing continuous delivery within this context isn’t experienced. We often work with the SpectoLabs team to overcome this problem, as they have extensive experience with solutions for this issue, such as Service Virtualisation and API Simulation.
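At its simplest, API simulation means standing in for a legacy system’s endpoints with canned responses, so a microservice can be built and tested without touching the real system. A minimal in-process sketch in Python (the paths and payloads are invented for the example; real tools record and replay traffic far more faithfully):

```python
# Illustrative sketch of API simulation: answer requests exactly as the
# recorded legacy system would, without calling the real system. The
# endpoint paths and payloads below are invented for this example.

CANNED_RESPONSES = {
    ("GET", "/accounts/42"): (200, {"id": 42, "balance": 105.50}),
    ("POST", "/payments"): (201, {"status": "accepted"}),
}

def simulated_legacy_api(method, path):
    """Return (status_code, body) from the recorded canned responses."""
    return CANNED_RESPONSES.get((method, path), (404, {"error": "not simulated"}))

status, body = simulated_legacy_api("GET", "/accounts/42")
print(status, body)
```

Because the simulation is deterministic, continuous delivery pipelines can exercise integration paths against it on every build, long before the new services are wired into the real legacy estate.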
For those playing microservices bingo, this is where you check off “Conway’s Law”. However, as we’ve discussed in other articles and presentations, the reality of the situation is that Conway was telling the truth…
Often the first signs of organisational design struggles appear as a team moves from the proof-of-concept to the implementation phase. As discussed in the ‘Rapid application deployment’ section above, as soon as the implementation expands beyond one team, the complexity of interactions increases. This can often be challenging on a technical level, but it is almost always challenging on an organisational level. We look for red flags like queues of work, long delays or ‘sign offs’ on work as it moves around an organisation, and teams pulling in different directions (or using competing technologies).
We have helped several companies review their current approach to goal-setting and organisational structure, from the development teams to the product teams, and also further upwards into the management level. We often work with the C*O level within organisations, as unless alignment and buy-in are achieved at this level, any changes made within the rest of the organisation can easily unravel.
Although the term ‘DevOps’ has become somewhat overused (and still isn’t truly defined), we believe that the concepts behind it are vital for success within a microservices architecture: (1) a shared understanding and responsibility across development and operations; (2) automation, with principles and practices driving tooling, not the other way around; and (3) creating signals for rapid feedback. As the number of application components is typically higher within a microservice-based application (in comparison with more traditional architectures), we often see problems emerge rapidly where teams are misaligned on goals and solve the same problem in multiple different ways, where automation is cargo-culted or off-the-shelf ‘DevOps tooling’ is used incorrectly, and where situational awareness is absent.
The technical aspects of DevOps have been covered already within this article, but as with the previous section focusing on organisational design, the organisational and people aspects of DevOps are equally (if not more) important. We have seen DevOps implementations create fear within organisations, and we have also seen suboptimal processes being automated (automated failure is still failure!). Based on our experiences we have developed programmes to assist with the transformation to a more ‘DevOps’ way of working. We also believe that the concepts and goals behind the ‘DevOps’ movement are vital in the wider business context and the current economic climate, where time-to-market and speed of innovation are clear competitive advantages.
As the cliché goes, at OpenCredo no two engagements are the same, as we tailor our services to each client and project. We aim to make systemic changes and provide as much knowledge transfer as possible, so that any implemented solutions are sustainable in the long term. Having said this, we have identified several broad categories within the context of microservices with which we repeatedly help clients.
We are happy to provide guidance on any of the above themes, and can focus as widely or as narrowly as you require. Please contact us with any questions, or request a free consultation below!