Presented by Barbara Eckman
As Zhamak Dehghani defines it in her seminal book “Data Mesh: Delivering Data-Driven Value at Scale,” data mesh is based on four core principles:
- Decentralized domain ownership
- Data as a product
- Self-serve data platform
- Federated computational governance
She writes, “The data products created by each domain team should be discoverable, addressable, trustworthy, possess self-describing semantics and syntax, be interoperable, secure, and governed by global standards and access controls.” Together, these principles aim to make data more reliable and easier for consumers to use while maintaining strong data protection.
In this talk, I will outline an approach to building an enterprise data mesh that honors these principles. Our data mesh spans a wide variety of data products, hosted both on-premises and in the public cloud and accessed through SQL, NoSQL, and API-only interfaces. We support federated queries across these data products using Presto/Trino and Spark SQL. Fine-grained role-, tag-, and attribute-based access control is provided through extensions of Apache Atlas and Ranger and is fully automated once the initial metadata specification is in place. Both self-service discovery and access control operate at the column/attribute level.
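To make tag-based, column-level access control concrete, here is a minimal Python sketch. It is not the Atlas or Ranger API: the `Column`, `TagPolicy`, and `visible_columns` names, the tag names, and the role names are all hypothetical. It assumes each column has already been classified with tags (for example, from an initial metadata specification) and that each tag maps to the set of roles allowed to read columns carrying it.

```python
from dataclasses import dataclass, field


@dataclass
class Column:
    name: str
    # Hypothetical metadata classifications, e.g. "PII" or "FINANCE".
    tags: set = field(default_factory=set)


@dataclass
class TagPolicy:
    tag: str
    # Roles permitted to read any column carrying this tag.
    allowed_roles: set


def visible_columns(columns, policies, user_roles):
    """Return the names of the columns the user may read.

    A column is readable only if, for every tag on it that has a policy,
    the user holds at least one role that the policy allows. Tags with no
    policy, and untagged columns, are readable by default in this sketch.
    """
    policy_by_tag = {p.tag: p for p in policies}
    readable = []
    for col in columns:
        allowed = all(
            # A non-empty intersection means the user holds a permitted role.
            policy_by_tag[tag].allowed_roles & user_roles
            for tag in col.tags
            if tag in policy_by_tag
        )
        if allowed:
            readable.append(col.name)
    return readable


columns = [
    Column("customer_id"),
    Column("email", {"PII"}),
    Column("salary", {"PII", "FINANCE"}),
]
policies = [
    TagPolicy("PII", {"privacy_officer", "support"}),
    TagPolicy("FINANCE", {"finance_analyst"}),
]

print(visible_columns(columns, policies, {"support"}))  # ['customer_id', 'email']
print(visible_columns(columns, policies, {"analyst"}))  # ['customer_id']
```

Because access decisions are keyed on tags rather than on individual tables or columns, tagging a new column is enough to bring it under the existing policies. The sketch allows untagged columns by default; a stricter deployment might deny by default instead.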
Our data mesh vision is not yet fully realized. I’ll share the progress we’ve made toward this vision and the lessons we’ve learned along the way.