A data mesh is a decentralized architecture devised by Zhamak Dehghani, director of Next Tech Incubation, principal consultant at Thoughtworks, and a member of its technology Advisory Board.
It is an intentionally designed distributed data architecture, under centralized governance and standardization for interoperability, enabled by a shared and harmonized self-serve data infrastructure.
Key uses for a data mesh
Data mesh’s key aim is to help you get value from your analytical data and historical facts at scale. The approach is especially useful when the data landscape changes frequently, data sources proliferate, and data transformation and processing use cases are diverse.
There is a plethora of use cases for it, including:
- Building virtual data catalogs from dispersed sources
- Giving developers and DevOps teams a straightforward way to run queries across a wide variety of sources
- Allowing data teams to introduce a universal, domain-agnostic, automated approach to data standardization, thanks to the data mesh’s self-serve infrastructure as a platform
Data mesh is built on four key principles.
The four principles of data mesh
With that background in place, let’s look at each of them in turn.
1. Domain-oriented decentralized data ownership and architecture
The trend toward decentralized architecture started decades ago, driven first by service-oriented architecture and then by microservices. Decentralization provides more flexibility, scales more easily, makes parallel work easier, and allows functionality to be reused. Compared with old-fashioned monolithic data lakes and data warehouses (DWHs), data meshes offer a far more flexible approach to data management.
Decentralizing data has its own history. Various approaches have been documented in the past, including decentralized and federated DWHs; even Kimball’s data marts (the heart of his DWH approach) are domain-oriented and are supported and implemented by separate departments. Here at ELEKS, we apply this approach when multiple software engineering teams are working collaboratively and the overall complexity is high.
During one of our financial consulting projects, our client’s analytical department was split into teams based on the finance area each covered. This meant that most decision-making and analytical dataset creation could be done within each team, while team members could still read global datasets, use common toolsets, and follow the same data quality, presentation, and release best practices.
2. Data as a product
This simply means applying product thinking to data and, in doing so, making data a first-class citizen, with an owner and a development team behind it who support its operation.
Creating a dataset and guaranteeing its quality isn’t enough to produce a data product. It also needs to be easy for users to locate, read, and understand, and it must conform to global rules on matters such as versioning, monitoring, and security.
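To make this more concrete, here is a minimal sketch of a data product that describes itself so users can find and understand it, together with a check against a few global rules. The `DataProduct` fields, the specific rules (semantic versioning, mandatory description, monitoring, and an access policy), and the `search` helper are hypothetical illustrations, not the API of any particular data mesh tool.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Hypothetical global rule: versions follow MAJOR.MINOR.PATCH.
SEMVER = re.compile(r"^\d+\.\d+\.\d+$")


@dataclass
class DataProduct:
    """Hypothetical self-describing data product (metadata only)."""
    name: str
    domain: str                  # owning domain team
    owner: str                   # accountable contact
    version: str
    description: str             # helps users understand the data
    schema: dict[str, str]       # column name -> type
    tags: list[str] = field(default_factory=list)
    monitored: bool = False
    access_policy: str | None = None

    def violations(self) -> list[str]:
        """Return the global rules this product breaks (empty if compliant)."""
        problems = []
        if not SEMVER.match(self.version):
            problems.append("version must be MAJOR.MINOR.PATCH")
        if not self.description.strip():
            problems.append("description is required")
        if not self.monitored:
            problems.append("monitoring must be enabled")
        if self.access_policy is None:
            problems.append("an access policy is required")
        return problems


def search(catalog: list[DataProduct], term: str) -> list[str]:
    """Find products whose name, description, or tags mention the term."""
    term = term.lower()
    return [
        p.name for p in catalog
        if term in p.name.lower()
        or term in p.description.lower()
        or any(term in t.lower() for t in p.tags)
    ]


payments = DataProduct(
    name="payments_daily",
    domain="payments",
    owner="payments-data@example.com",
    version="1.2.0",
    description="Daily aggregated card payments per merchant.",
    schema={"merchant_id": "string", "date": "date", "amount": "decimal"},
    tags=["finance", "payments"],
    monitored=True,
    access_policy="finance-readers",
)
draft = DataProduct(
    name="fx_rates", domain="treasury", owner="treasury@example.com",
    version="v2", description="", schema={"pair": "string", "rate": "float"},
)

print(payments.violations())                   # []
print(len(draft.violations()))                 # 4
print(search([payments, draft], "finance"))    # ['payments_daily']
```

The point of the sketch is that discoverability (description, tags) and compliance (versioning, monitoring, access) travel with the product itself, so a platform can index and validate every product the same way.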
3. Self-serve data infrastructure as a platform
A data platform is really an extension of the platform businesses use to run, maintain, and monitor their services, but it uses a vastly different technology stack. The goal of a self-serve infrastructure is to provide tools and user-friendly interfaces so that generalist developers can build analytical data products, something the sheer range of operational platforms previously made incredibly difficult.
ELEKS has implemented self-service architecture for analytical end users (self-service BI using Power BI or Tableau) as well as for development teams and power users. This has included the self-service creation of different types of cloud resources.
4. Federated computational governance
This is an inevitable consequence of the first principle. Wherever you deploy decentralized services (microservices, for example), it’s essential to introduce overarching rules and regulations to govern their operation. As Dehghani puts it, it’s crucial to “maintain an equilibrium between centralization and decentralization”.
In essence, this means that there’s a “common ground” for the whole platform, where all data products conform to a shared set of rules where necessary, while leaving enough space for autonomous decision-making. It’s this last point that is the key difference between decentralized and centralized approaches.
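The “computational” part is often understood as expressing policies as code that the platform evaluates automatically. The minimal sketch below, under our own assumptions, illustrates the balance described above: a small set of global policies that every data product must satisfy, plus optional rules that each domain adds for itself. The policy functions, metadata keys, and `evaluate` are hypothetical names chosen for illustration.

```python
from typing import Any, Callable, Dict, List, Optional

Metadata = Dict[str, Any]
# A policy inspects a product's metadata and returns an error message, or None if it passes.
Policy = Callable[[Metadata], Optional[str]]


def requires_owner(meta: Metadata) -> Optional[str]:
    return None if meta.get("owner") else "every product needs an owner"


def pii_must_be_masked(meta: Metadata) -> Optional[str]:
    masked = meta.get("masked_columns", [])
    unmasked = [c for c in meta.get("pii_columns", []) if c not in masked]
    return f"unmasked PII columns: {unmasked}" if unmasked else None


def refreshes_daily(meta: Metadata) -> Optional[str]:
    return None if meta.get("refresh") == "daily" else "finance data must refresh daily"


# Common ground: defined once, enforced on every product.
GLOBAL_POLICIES: List[Policy] = [requires_owner, pii_must_be_masked]

# Autonomy: a domain may add its own rules, but cannot remove the global ones.
DOMAIN_POLICIES: Dict[str, List[Policy]] = {"finance": [refreshes_daily]}


def evaluate(meta: Metadata) -> List[str]:
    """Run global policies plus the product's domain policies; collect failures."""
    policies = GLOBAL_POLICIES + DOMAIN_POLICIES.get(meta.get("domain", ""), [])
    results = (policy(meta) for policy in policies)
    return [msg for msg in results if msg is not None]


ledger = {"domain": "finance", "owner": "gl-team", "pii_columns": ["iban"],
          "masked_columns": ["iban"], "refresh": "weekly"}
clicks = {"domain": "marketing", "owner": "", "pii_columns": ["email"], "masked_columns": []}

print(evaluate(ledger))  # ['finance data must refresh daily']
print(evaluate(clicks))  # ['every product needs an owner', "unmasked PII columns: ['email']"]
```

Keeping domain rules additive is one design choice among several; it guarantees the shared rules always apply while letting each team decide what else matters for its own data.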
The challenges of data mesh
Although data mesh allows much more room to flex and scale, like every other paradigm it shouldn’t be considered a perfect-fit solution for every single scenario. As with all decentralized data architectures, there are a few common challenges, including:
- Ensuring that toolsets and approaches are unified (where applicable) across teams.
- Minimizing the duplication of workload and data between different teams; centralized data management is often incredibly hard to implement company-wide.
- Harmonizing data and unifying presentation. A user who reads interconnected data across several data products should be able to map it correctly.
- Making data products easy to find and understand, through a comprehensive documentation process.
- Establishing consistent monitoring, alerting, and logging practices.
- Safeguarding data access controls, especially where a many-to-many relationship exists between data products.
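To illustrate the harmonization challenge, here is a minimal sketch in which two domain-owned products identify the same customer with differently formatted keys. A shared, globally agreed convention lets a consumer map one to the other. The `canonical_customer_id` format, the sample rows, and the assumption that `cust_42` and `00042` denote the same customer are all hypothetical.

```python
from typing import Dict, List


def canonical_customer_id(raw: str) -> str:
    """Hypothetical shared convention: 'CUST-' plus the numeric part, zero-padded to 8 digits."""
    digits = "".join(ch for ch in raw if ch.isdigit())
    if not digits:
        raise ValueError(f"no numeric customer id in {raw!r}")
    return f"CUST-{int(digits):08d}"


# Two products from different domains that key customers differently.
billing = [{"customer": "cust_42", "balance": 120.0},
           {"customer": "cust_7", "balance": 0.0}]
support = [{"client_id": "00042", "open_tickets": 3}]


def join_on_customer(billing: List[Dict], support: List[Dict]) -> List[Dict]:
    """Join billing rows with support ticket counts via the canonical id."""
    tickets = {canonical_customer_id(r["client_id"]): r["open_tickets"] for r in support}
    joined = []
    for row in billing:
        cid = canonical_customer_id(row["customer"])
        joined.append({"customer": cid, "balance": row["balance"],
                       "open_tickets": tickets.get(cid, 0)})
    return joined


print(join_on_customer(billing, support))
```

Without such a convention, each consumer would have to reverse-engineer every product's key format, which is exactly the kind of duplicated effort the challenges above warn about.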
According to this blog, data mesh in many ways represents a completely new approach to data. While it is certainly prescriptive about how technology should be leveraged to implement data mesh principles, perhaps the bigger implementation challenge is the organizational and cultural change it requires. Overcoming the inertia of decades of centralized, monolithic architecture will not be easy for most companies.
Nevertheless, we think that the four principles of data mesh address significant issues that have long plagued data and analytics applications, and therefore there is real value in thinking about them and gleaning what we can—regardless of whether your organization ever goes “full data mesh”.