TL;DR - Computational Governance in Data Mesh with OPA

OpenCredo

April 21, 2023


In this TL;DR video, our Lead Consultant Mateus Pimenta demonstrates how federated computational governance could be implemented using Open Policy Agent (OPA) and policy-as-code to support a successful Data Mesh architecture.

Computational Governance in Data Mesh with OPA - Mateus Pimenta

Mateus introduces the use of Open Policy Agent (OPA) to implement computational governance in Data Mesh and provides some insights for those who want to implement it in their own projects.

In Data Mesh, the organisation is designed using Domain-Driven Design (DDD), which divides the business into multiple domain teams. Each team is responsible for all the systems and data in its domain, so data ownership sits with that team. Any data the domain shares with consumers is exposed through what are called data products.

Data products are curated data sets aimed at facilitating the consumption of that domain data. They offer a strong contract, schemas, and higher quality data and are designed for easier consumption. As a result of this model, teams have much more autonomy to make decisions related to their own domains.

With this autonomy, organisations can scale up more easily without creating bottlenecks. Ultimately, they can create a more efficient data ecosystem, making it easier to process and share data and to derive insights from it.

However, creating a secure, trusted, and usable environment for data, with a good consumer experience, requires a certain level of data governance. There's no one model for data governance. It is better seen as a spectrum of options, from a fully controlled model to a completely decentralised one. Data Mesh tries to push the organisation as far as possible towards decentralisation, because teams need as much autonomy as possible to contribute to the mesh independently. At the same time, it's important to enforce some key controls to make the mesh valuable, interoperable, and useful for the business.

The question becomes: where is the sweet spot? How do we enforce those controls without creating extra coordination or potential bottlenecks?

The key approach in Data Mesh is to use federated computational governance. That is, teams have the independence to make some decisions, but some controls are still defined by a central body.

Crucially, those controls are enforced through policy-as-code. So even though this centralised control reduces some autonomy, it does so without creating human bottlenecks along the way.

Mateus provides an example of how policy-as-code works in a development process to create, deploy, and serve data products to consumers. Policies are created as executable code by governance and served to the development process, which can use them to control different parts of the process.

For example, developers can execute policies at their workstations to verify their code, API, and other elements of data products. Policies can also be executed by a CI system at deployment time and on the data itself.
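As a rough illustration of the same check running at a workstation and in CI, here is a minimal sketch of a gate that returns a non-zero exit code when a data product descriptor violates policy. Everything in it is an assumption made for illustration: the `DataProduct` fields, the three rules, and the function names are hypothetical, and the rules are written in Python only to keep the example self-contained. In the approach described in the video, the rules would live in OPA rather than in the pipeline code.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataProduct:
    """Hypothetical data product descriptor; the field names are assumptions."""
    name: str
    owner: str = ""
    schema_version: str = ""
    pii_columns: List[str] = field(default_factory=list)
    pii_approved: bool = False


def check(product: DataProduct) -> List[str]:
    """Return a list of violation messages (an empty list means compliant).

    These rules are illustrative stand-ins for centrally defined policies.
    """
    violations = []
    if not product.owner:
        violations.append("owner must be set")
    if not product.schema_version:
        violations.append("schema_version must be set")
    if product.pii_columns and not product.pii_approved:
        violations.append("PII columns require approval")
    return violations


def ci_gate(product: DataProduct) -> int:
    """Print each violation and return a process exit code (0 = pass, 1 = fail)."""
    violations = check(product)
    for message in violations:
        print(f"DENY {product.name}: {message}")
    return 1 if violations else 0
```

A developer could call `ci_gate` locally before pushing, and a CI job could pass its return value to `sys.exit` so that a failing check stops the deployment.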

Mateus notes that policy-as-code can be used anywhere in the process, and a flexible tool that can integrate with any system is required. This is where OPA comes into play. OPA is the tool to write and serve policies.

In this model, the process that needs to implement a check calls OPA with some inputs. OPA evaluates those inputs against the policies, drawing on any extra external data it has been given, and returns the result of the evaluation. The caller then uses that result to decide whether to continue or not.
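The call described above can be sketched with OPA's REST Data API, where a `POST` to `/v1/data/<policy path>` with a JSON body of the form `{"input": ...}` returns `{"result": ...}`. This is a minimal sketch, not the setup shown in the video: the policy path `datamesh/deploy/allow`, the input fields, and the function names are hypothetical, and the base URL assumes an OPA server listening locally on its default port 8181. Loading external data into OPA is a separate step and is not shown.

```python
import json
import urllib.request
from typing import Any, Callable, Dict


def build_request(base_url: str, policy_path: str,
                  input_doc: Dict[str, Any]) -> urllib.request.Request:
    """Build a POST request for OPA's Data API: /v1/data/<policy_path>."""
    url = f"{base_url.rstrip('/')}/v1/data/{policy_path.strip('/')}"
    body = json.dumps({"input": input_doc}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def http_send(request: urllib.request.Request) -> Dict[str, Any]:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=5) as response:
        return json.load(response)


def is_allowed(
    input_doc: Dict[str, Any],
    send: Callable[[urllib.request.Request], Dict[str, Any]] = http_send,
    base_url: str = "http://localhost:8181",    # assumed local OPA server
    policy_path: str = "datamesh/deploy/allow",  # hypothetical policy path
) -> bool:
    """Ask OPA for a decision and return True only on an explicit allow."""
    response = send(build_request(base_url, policy_path, input_doc))
    # OPA omits "result" when the rule is undefined; treat that as a deny.
    return response.get("result", False) is True
```

The `send` parameter is injectable so the function can be exercised without a running server, for example `is_allowed({"owner": "sales"}, send=lambda request: {"result": True})`. Treating a missing `result` as a deny is a design choice in this sketch, made so that a mistyped policy path fails closed.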

The key thing to notice in this model is that it implements the principle of policy decoupling, which means that any policy can be modified or enhanced in OPA without requiring changes to the caller code.
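To make policy decoupling concrete, here is a small sketch in which the caller's code stays the same while the policy behind it changes. The `PolicyEngine` class is a hypothetical in-process stand-in for OPA, used only so the example runs on its own; the policy path, the two policy versions, and the input fields are likewise assumptions.

```python
from typing import Any, Callable, Dict

Policy = Callable[[Dict[str, Any]], bool]


class PolicyEngine:
    """In-process stand-in for OPA: holds the current policy for each path."""

    def __init__(self) -> None:
        self._policies: Dict[str, Policy] = {}

    def publish(self, path: str, policy: Policy) -> None:
        """Governance publishes or replaces the policy at a given path."""
        self._policies[path] = policy

    def evaluate(self, path: str, input_doc: Dict[str, Any]) -> bool:
        """Evaluate the input against whichever policy is current."""
        return self._policies[path](input_doc)


def deploy(engine: PolicyEngine, product: Dict[str, Any]) -> str:
    """Caller code: it knows only the policy path and the input shape."""
    allowed = engine.evaluate("datamesh/deploy/allow", product)
    return "deployed" if allowed else "rejected"


def policy_v1(product: Dict[str, Any]) -> bool:
    """Version 1 (hypothetical): every data product needs an owner."""
    return bool(product.get("owner"))


def policy_v2(product: Dict[str, Any]) -> bool:
    """Version 2 (hypothetical): additionally require a schema version."""
    return bool(product.get("owner")) and bool(product.get("schema_version"))
```

Publishing `policy_v2` at the same path changes the outcome of `deploy` for a product without a schema version, even though `deploy` itself was never edited.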

This is crucial for Data Mesh because it allows governance to be as light-touch as possible, and any change can be smoothly rolled out to everyone without requiring the domain team to do any work for policy updates apart from starting to execute the new version.