Federated Data-Mesh Quality Scoring with Great Expectations and Apache Atlas Lineage
DOI:
https://doi.org/10.60087/jklst.v4.n2.008
Abstract
The proliferation of decentralized data architectures like data-mesh introduces challenges in maintaining consistent data quality across federated domains. This research proposes an integrated framework for federated data quality scoring by leveraging Great Expectations (GE) for declarative data validation and Apache Atlas for lineage-driven impact analysis. The solution enables domain teams to autonomously define quality rules using GE, while Apache Atlas captures end-to-end lineage to propagate quality scores across interconnected datasets. This lineage-aware approach quantifies quality degradation risks downstream, providing a holistic view of data health in a decentralized ecosystem. Experimental results demonstrate a 40% reduction in root-cause analysis time and a 35% improvement in cross-domain trust scores. The framework supports scalable, domain-agnostic quality monitoring without central oversight, aligning with data-mesh principles of decentralization and domain ownership.
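The lineage-aware propagation described above can be illustrated with a minimal sketch. The abstract does not state the scoring formula, so two assumptions are made here: a dataset's local score is the fraction of its validation checks that passed, and its effective score is that local score capped by the weakest effective score among its upstream datasets. The dataset names, the `local_score` and `propagate_scores` helpers, and the hand-written inputs are hypothetical; in a real deployment the check results would presumably come from Great Expectations validation runs and the upstream map from Apache Atlas lineage, neither of which is called here.

```python
from graphlib import TopologicalSorter


def local_score(results):
    """Fraction of validation checks that passed (1.0 if no checks ran).

    `results` is a list of booleans, one per check. This pass-fraction
    rule is an assumption, not the paper's stated metric.
    """
    if not results:
        return 1.0
    return sum(1 for ok in results if ok) / len(results)


def propagate_scores(local, upstream):
    """Return an effective score for every dataset.

    `local` maps dataset name -> local score in [0, 1].
    `upstream` maps dataset name -> iterable of direct upstream datasets.
    Every dataset named in `upstream` must also have an entry in `local`.
    The min-rule used here is an assumed propagation policy.
    """
    # TopologicalSorter expects {node: predecessors}; static_order()
    # yields each dataset only after all of its upstream datasets.
    graph = {name: tuple(upstream.get(name, ())) for name in local}
    effective = {}
    for name in TopologicalSorter(graph).static_order():
        parents = [effective[p] for p in graph.get(name, ())]
        effective[name] = min([local[name], *parents])
    return effective


# Hypothetical check outcomes per dataset (True = check passed).
checks = {
    "orders_raw": [True, True, True, False],
    "customers": [True, True],
    "orders_enriched": [True, True, True, True],
    "revenue_report": [True] * 9 + [False],
}

# Hypothetical lineage: each dataset lists its direct upstream datasets.
lineage = {
    "orders_enriched": ["orders_raw", "customers"],
    "revenue_report": ["orders_enriched"],
}

local = {name: local_score(results) for name, results in checks.items()}
effective = propagate_scores(local, lineage)

for name in checks:
    print(f"{name}: local={local[name]:.2f} effective={effective[name]:.2f}")
```

In this sketch `orders_enriched` passes all of its own checks yet inherits the lower score of `orders_raw`, which is the kind of downstream degradation signal the framework aims to surface. Other policies, such as multiplying scores along lineage edges or weighting by column-level lineage, would fit the same structure.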
Copyright (c) 2025 Journal of Knowledge Learning and Science Technology ISSN: 2959-6386 (online)

This work is licensed under a Creative Commons Attribution 4.0 International License.