GenAI-Driven Semantic ETL: Synthesizing Self-Optimizing SQL & PL/SQL

Authors

  • Vasudevan Ananthakrishnan, Yakshna Solutions, USA
  • Dharmeesh Kondaveeti, Conglomerate IT Services Inc, USA
  • Abdul Samad Mohammed, Dominos, USA

DOI:

https://doi.org/10.60087/jklst.v4.n2.003

Abstract

The growing complexity of data ecosystems demands intelligent, adaptive ETL (Extract, Transform, Load) solutions. This research introduces GenAI-Driven Semantic ETL, a framework that uses Generative Artificial Intelligence (GenAI) to automate the synthesis of self-optimizing SQL and PL/SQL code for ETL workflows. By integrating a semantic understanding of data schemas, transformation rules, and performance objectives, the system generates code that autonomously refines its execution strategy based on runtime statistics, workload patterns, and data evolution. The key innovations are (1) a context-aware GenAI engine that translates natural-language requirements into optimized procedural logic and (2) a feedback-driven optimization loop that continuously adapts the generated code. Evaluations on enterprise datasets show 40–65% reductions in ETL latency and 30–50% lower resource consumption compared to manually tuned pipelines, with minimal human intervention. This work pioneers the fusion of generative AI with semantic reasoning to realize truly autonomous, efficient, and future-proof data integration systems. 
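The feedback-driven optimization loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a simple explore-then-exploit policy over a fixed set of pre-generated candidate query variants, represented here by placeholder labels instead of real SQL or PL/SQL text. The names `VariantStats`, `FeedbackOptimizer`, `min_trials`, and the simulated latencies are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from statistics import mean


@dataclass
class VariantStats:
    """Observed runtimes (seconds) for one candidate query variant (hypothetical)."""
    sql: str
    runtimes: list[float] = field(default_factory=list)

    def mean_runtime(self) -> float:
        # Only called after every variant has at least one observation (see choose()).
        return mean(self.runtimes)


class FeedbackOptimizer:
    """Hypothetical feedback loop: run each candidate at least `min_trials`
    times, then route executions to the variant with the lowest mean runtime."""

    def __init__(self, candidates: list[str], min_trials: int = 2) -> None:
        if min_trials < 1:
            raise ValueError("min_trials must be >= 1")
        self.stats = [VariantStats(sql) for sql in candidates]
        self.min_trials = min_trials

    def choose(self) -> VariantStats:
        # Exploration: any variant with too few observations is run first.
        for s in self.stats:
            if len(s.runtimes) < self.min_trials:
                return s
        # Exploitation: pick the variant with the lowest mean observed runtime.
        return min(self.stats, key=VariantStats.mean_runtime)

    def record(self, variant: VariantStats, runtime_s: float) -> None:
        # Feed runtime statistics back into the loop.
        variant.runtimes.append(runtime_s)


if __name__ == "__main__":
    # Simulated per-variant latencies stand in for real runtime statistics.
    simulated = {"variant_a": 4.0, "variant_b": 2.5, "variant_c": 3.1}
    opt = FeedbackOptimizer(list(simulated), min_trials=2)
    for _ in range(10):
        v = opt.choose()
        opt.record(v, simulated[v.sql])
    print(opt.choose().sql)  # prints: variant_b
```

A production system would also need to re-explore periodically, since the abstract notes that data and workloads evolve; the fixed `min_trials` warm-up here is only the simplest policy that closes the feedback loop.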

Published

05-06-2025

How to Cite

Ananthakrishnan, V., Kondaveeti, D., & Mohammed, A. S. (2025). GenAI-Driven Semantic ETL: Synthesizing Self-Optimizing SQL & PL/SQL. Journal of Knowledge Learning and Science Technology ISSN: 2959-6386 (online), 4(2), 29-43. https://doi.org/10.60087/jklst.v4.n2.003
