DataMesh: A Web Netting


Enterprise Data Challenges

Global data creation is projected to exceed 180 zettabytes in the next five years.

Current data platforms have several architectural shortcomings that hinder enterprise data processing and inhibit business growth.

How Is Today’s Enterprise Data Managed?

Today’s technology and organizational design divide data into two categories: operational data and analytical data.

Operational data is transactional. It is typically stored in an RDBMS at the backend and supports the applications that run the business, keeping the current state of the data accurate and consistent.

Analytical data is an aggregated view of the facts of the business over time. It is often designed to provide retrospective or forward-looking insights, and it is used to train ML models or feed analytical reports.

The analytical data plane itself has been divided into two main architectures: the data lake and the data warehouse.

The data lake supports data science access patterns, while the data warehouse supports analytical and business intelligence (BI) reporting access patterns.

This architecture keeps the management of bulk data integrated yet separate. In practice, continuously failing ETL (Extract, Transform, Load) jobs and an ever-growing labyrinth of data pipelines have made it fragile.

The concept of data mesh emerged in response to the challenges of the existing analytical data architecture.

The architectural differences in managing these two archetypes of data should not lead to the separation of the organizations, teams, and people working on them.

Limitations of Current Data Platforms

#1: Enterprises have been using a centralization strategy, moving large volumes of data from many locations into one place for processing. This is time-consuming and expensive.

#2: Global data volumes are continuously increasing, and the centralized data model fails to respond at that scale. The resulting slower response times hurt business agility.

#3: Data migration is often prohibited in some geographies or legal jurisdictions, for example when data stored in an EU country needs to be accessed by a user in North America. Complying with data governance regulations is time-consuming and tedious, and it can significantly delay data processing and analysis.

What is DataMesh?

Data mesh is a new way of thinking about how to use data to create organizational value and how to leverage that data at scale across an organizational ecosystem.

Data mesh allows businesses to escape the trap of monolithic data architectures and avoid massive operational and storage costs.

This new distributed approach aims to clear the data access bottlenecks of centralized data ownership by giving data management and ownership to domain-specific business teams.
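To make the ownership shift concrete, here is a minimal Python sketch, assuming a simple in-memory catalog. The names `DataProduct`, `MeshCatalog`, `register`, `owner_of`, and `products_of` are hypothetical and not taken from any data mesh product; the point is only that the catalog indexes data products, while each domain team registers them and remains accountable for its own data.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataProduct:
    """A dataset published and owned by exactly one business domain (hypothetical model)."""
    name: str
    owner_domain: str
    description: str


@dataclass
class MeshCatalog:
    """Hypothetical discovery catalog: it indexes data products but does not own their data."""
    products: dict[str, DataProduct] = field(default_factory=dict)

    def register(self, product: DataProduct) -> None:
        # The owning domain team registers its own product; no central data team is in the path.
        if product.name in self.products:
            raise ValueError(f"data product {product.name!r} is already registered")
        self.products[product.name] = product

    def owner_of(self, name: str) -> str:
        # Consumers look up which domain is accountable for a dataset and go to it directly.
        return self.products[name].owner_domain

    def products_of(self, domain: str) -> list[str]:
        # Everything a given domain currently publishes, sorted for stable output.
        return sorted(p.name for p in self.products.values() if p.owner_domain == domain)


catalog = MeshCatalog()
catalog.register(DataProduct("orders.daily_totals", "sales", "Daily order totals per store"))
catalog.register(DataProduct("shipments.delays", "logistics", "Late shipments per carrier"))
print(catalog.owner_of("orders.daily_totals"))  # sales
print(catalog.products_of("logistics"))  # ['shipments.delays']
```

In this sketch, a consumer looking for order data learns that the sales domain owns it and goes to that team directly, instead of filing a request with a central data team.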

DataMesh Core Principles

Data mesh embodies four core principles that aim to achieve scalability while delivering data quality and integrity:

1) Domain-Driven Distributed Architecture

2) Data as a Product

3) Self-Serve Data Infrastructure as a Platform

4) Federated Computational Governance
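One way to see how the four principles fit together is a publish-time check. The Python sketch below is hypothetical (`DataProductSpec`, `GLOBAL_POLICIES`, and `publish` are illustrative names, not a standard API): a domain team describes the data product it owns (principles 1 and 2), the governance policies are agreed once across domains and expressed as code (principle 4), and a self-serve platform could run them automatically whenever any domain publishes (principle 3).

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class DataProductSpec:
    """Hypothetical descriptor a domain team submits when publishing a data product."""
    name: str
    owner_domain: str            # principle 1: a business domain owns the data
    schema: dict[str, str]       # principle 2: a documented interface consumers can rely on
    pii_columns: frozenset[str]  # columns holding personal data
    masked_columns: frozenset[str]


# Principle 4: a policy inspects a spec and returns a list of violations (empty means compliant).
Policy = Callable[[DataProductSpec], list]


def has_owner(spec: DataProductSpec) -> list[str]:
    return [] if spec.owner_domain else ["missing owner domain"]


def schema_is_documented(spec: DataProductSpec) -> list[str]:
    return [] if spec.schema else ["empty schema"]


def pii_is_masked(spec: DataProductSpec) -> list[str]:
    unmasked = sorted(spec.pii_columns - spec.masked_columns)
    return [f"PII column {col!r} is not masked" for col in unmasked]


# Agreed once across domains, enforced by code rather than by manual review.
GLOBAL_POLICIES: tuple[Policy, ...] = (has_owner, schema_is_documented, pii_is_masked)


def publish(spec: DataProductSpec, policies: tuple[Policy, ...] = GLOBAL_POLICIES) -> list[str]:
    """Principle 3: the self-serve platform runs every policy automatically at publish time."""
    violations: list[str] = []
    for policy in policies:
        violations.extend(policy(spec))
    return violations


draft = DataProductSpec(
    name="customers.profile",
    owner_domain="crm",
    schema={"id": "int", "email": "string"},
    pii_columns=frozenset({"email"}),
    masked_columns=frozenset(),
)
print(publish(draft))  # ["PII column 'email' is not masked"]
```

Here the `crm` domain's product is reported as violating the masking policy until its `email` column is masked. The policy list can evolve centrally while each domain keeps ownership of its own product.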

DataMesh Architecture Added Value

1. The distributed architecture of data mesh treats data as a product, with each business unit owning the data of its own domain.

2. Data mesh delegates dataset ownership from a central team to the domains to enable business agility and change at scale. Data mesh architecture steers enterprises towards real-time decision-making by closing the time and space gap between an event happening and its consumption and processing for analysis.

3. In decentralized data management, the domains are responsible for the quality, security, and transfer of their data products. Data mesh provides a connectivity layer that enables direct access to datasets where they reside, avoiding costly data transfers and data residency concerns.
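The connectivity idea above can be sketched as aggregation pushdown. The Python below assumes each domain's data stays in its own store; `DomainNode` and `federated_sum` are hypothetical stand-ins, not a real connectivity-layer API. The query runs on each node where the data resides and only small partial results are merged, so raw rows are never copied to a central warehouse. A real mesh would more likely rely on a federated query engine than on hand-written merging.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DomainNode:
    """Hypothetical stand-in for one domain's data store, which stays where it is."""
    domain: str
    region: str        # where the raw rows physically reside (illustrative only)
    rows: list[dict]

    def aggregate(self, key: str, value: str) -> dict[str, float]:
        # Executed where the data resides: only per-key sums leave the node, never raw rows.
        totals: dict[str, float] = {}
        for row in self.rows:
            totals[row[key]] = totals.get(row[key], 0.0) + row[value]
        return totals


def federated_sum(nodes: list[DomainNode], key: str, value: str) -> dict[str, float]:
    """Push the aggregation down to every node, then merge the small partial results."""
    merged: dict[str, float] = {}
    for node in nodes:
        for k, partial in node.aggregate(key, value).items():
            merged[k] = merged.get(k, 0.0) + partial
    return merged


eu = DomainNode("sales-eu", "eu-west", [{"product": "A", "amount": 10.0}, {"product": "B", "amount": 5.0}])
us = DomainNode("sales-us", "us-east", [{"product": "A", "amount": 7.5}])
print(federated_sum([eu, us], "product", "amount"))  # {'A': 17.5, 'B': 5.0}
```

In this example, a North American consumer gets per-product totals while the EU node's individual rows stay in the EU region, which is the kind of residency concern the connectivity layer is meant to ease.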

DataMesh Benefits

Improved Agility and Scalability

Data mesh enables decentralized data operations, independent teams, and data infrastructure provided as a self-service platform, resulting in improved time-to-market, scalability, and business domain agility. It reduces process complexity and IT backlog, which lowers operating and storage costs.

Faster Access and Accurate Data Delivery

Data mesh offers an easily governable, shared infrastructure based on a self-service model that hides the underlying complexity, enabling faster data access and accurate delivery. Businesses can query data from anywhere, for example with SQL, at lower latency.

Supporting Vendor-Agnostic Businesses

Enterprises adopting data mesh architecture can become vendor-agnostic, avoiding lock-in to a single data platform.

Data Security

The decentralized framework allows cloud applications to connect to sensitive on-premises data, whether that data is streaming live or resides on devices. Data mesh queries and compiles data analytics where the data resides, instead of requiring users to make a copy and route it through a public network to a data warehouse.

DataMesh architecture reduces the risk of data breaches or information loss, improving security, and reduces data latency, improving overall performance in use cases such as live streaming, online gaming, and financial trading, through platform connectivity in a distributed model.

Data Governance for End-to-End Compliance

Distributed architecture reconciles data ingestion with its sources, formats, and volumes to allow businesses to control their security at the source system. The decentralized data operations simplify compliance with global data governance guidelines for quality data delivery and ease of data access.

Improved Transparency among Cross-Functional Teams

The centralized data ownership of traditional data platforms isolates expert teams, creates a lack of transparency, and fails to provide contingency against data control/ownership loss. Data mesh decentralizes data ownership by distributing it among cross-functional domain teams, including domain experts, business teams, IT, and agile virtual teams through its domain-oriented approach for improved transparency and data quality.

Conclusion

Data mesh addresses the problems of large, complex, monolithic data architectures. It opens up many possibilities for businesses in various consumption scenarios, including behavior modeling, analytics, and data-intensive applications. Its distributed architecture enables easy data access and faster delivery without vendor lock-in to an expensive enterprise warehouse.

DataMesh on Technology Radar: Trial (October 2020)

DataMesh marks a welcome architectural and organizational paradigm shift in how we manage big analytical data. In October 2020 it moved into the Trial ring of the Thoughtworks Technology Radar: https://www.thoughtworks.com/en-in/radar/techniques/data-mesh


Written by 

Works as Vice President, Engineering at Knoldus Inc. A results-driven techno-functional professional with over 20 years of extensive IT experience in project management, IT delivery operations, and team management and leadership. She is a Sun Certified Enterprise Architect (Java) with core expertise in managing and executing real-time trading, foreign exchange, capital market, and investment banking projects using Java/J2EE/cloud technologies.