GenAI-Driven Semantic ETL:
Synthesizing Self-Optimizing SQL & PL/SQL
DOI: https://doi.org/10.60087/jklst.v4.n2.003
Abstract
The growing complexity of data ecosystems demands intelligent, adaptive ETL (Extract, Transform, Load) solutions. This research introduces GenAI-Driven Semantic ETL, a framework that uses Generative Artificial Intelligence (GenAI) to automate the synthesis of self-optimizing SQL and PL/SQL code for ETL workflows. By integrating a semantic understanding of data schemas, transformation rules, and performance objectives, the system generates code that refines its own execution strategy based on runtime statistics, workload patterns, and changes in the underlying data. Key innovations include a context-aware GenAI engine that translates natural-language requirements into optimized procedural logic, and a feedback-driven optimization loop that continuously adapts the generated code. Evaluations on enterprise datasets show 40–65% reductions in ETL latency and 30–50% lower resource consumption compared to manually tuned pipelines, with less human intervention. This work combines generative AI with semantic reasoning as a step toward autonomous, efficient, and adaptable data integration systems.
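The abstract does not specify how the feedback-driven optimization loop works. As a minimal sketch, assuming a generator has already emitted several semantically equivalent candidate queries, the Python below first checks that the candidates return identical results and then uses an epsilon-greedy policy to converge on the lower-latency variant from measured runtimes. Epsilon-greedy is a standard multi-armed bandit technique swapped in here for illustration; it is not presented as the authors' method. The tables, the candidate SQL, the `FeedbackOptimizer` class, `optimize`, and the `epsilon` value are all hypothetical, and SQLite stands in for the target database, so PL/SQL is not shown.

```python
import random
import sqlite3
import time
from typing import Dict, List, Tuple

# Hypothetical, semantically equivalent candidates a generator might emit for
# "total order amount per customer". Names and SQL are illustrative only.
CANDIDATES: Dict[str, str] = {
    "correlated_subquery": (
        "SELECT c.id, (SELECT COALESCE(SUM(o.amount), 0) FROM orders o "
        "WHERE o.customer_id = c.id) AS total FROM customers c ORDER BY c.id"
    ),
    "join_group_by": (
        "SELECT c.id, COALESCE(SUM(o.amount), 0) AS total FROM customers c "
        "LEFT JOIN orders o ON o.customer_id = c.id GROUP BY c.id ORDER BY c.id"
    ),
}


def build_db(n_customers: int = 200, n_orders: int = 2000, seed: int = 0) -> sqlite3.Connection:
    """Create a small in-memory test database with integer amounts (no float rounding)."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY)")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount INTEGER)")
    conn.executemany("INSERT INTO customers (id) VALUES (?)", [(i,) for i in range(n_customers)])
    conn.executemany(
        "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
        [(rng.randrange(n_customers), rng.randrange(1, 10_000)) for _ in range(n_orders)],
    )
    return conn


class FeedbackOptimizer:
    """Hypothetical epsilon-greedy selector driven by observed query latency."""

    def __init__(self, candidates: Dict[str, str], epsilon: float = 0.1, seed: int = 0):
        self.candidates = candidates
        self.epsilon = epsilon  # probability of exploring a random candidate
        self.rng = random.Random(seed)
        self.runs = {name: 0 for name in candidates}
        self.mean_latency = {name: 0.0 for name in candidates}

    def choose(self) -> str:
        # Try every candidate once before exploiting the current best.
        for name, count in self.runs.items():
            if count == 0:
                return name
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.candidates))
        return self.best()

    def record(self, name: str, latency: float) -> None:
        # Incremental running mean, so individual observations need not be stored.
        self.runs[name] += 1
        self.mean_latency[name] += (latency - self.mean_latency[name]) / self.runs[name]

    def best(self) -> str:
        tried = [n for n in self.candidates if self.runs[n] > 0]
        return min(tried, key=lambda n: self.mean_latency[n])


def run_once(conn: sqlite3.Connection, sql: str) -> Tuple[List[tuple], float]:
    start = time.perf_counter()
    rows = conn.execute(sql).fetchall()
    return rows, time.perf_counter() - start


def optimize(conn: sqlite3.Connection, candidates: Dict[str, str],
             iterations: int = 30, seed: int = 0) -> FeedbackOptimizer:
    # Guard: candidates must agree on results before their latency is compared.
    reference = None
    for name, sql in candidates.items():
        rows, _ = run_once(conn, sql)
        if reference is None:
            reference = rows
        elif rows != reference:
            raise ValueError(f"candidate {name!r} does not match the reference result")
    opt = FeedbackOptimizer(candidates, seed=seed)
    for _ in range(iterations):
        name = opt.choose()
        _, latency = run_once(conn, candidates[name])
        opt.record(name, latency)
    return opt


if __name__ == "__main__":
    conn = build_db()
    opt = optimize(conn, CANDIDATES)
    print("preferred:", opt.best(), {n: round(v * 1000, 3) for n, v in opt.mean_latency.items()})
```

Checking result equality before comparing latency matters because a faster but incorrect variant would otherwise win. Equality on one sample database is only a weak check, not a proof that two generated queries are equivalent.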
License
Copyright (c) 2025 Journal of Knowledge Learning and Science Technology ISSN: 2959-6386 (online)

This work is licensed under a Creative Commons Attribution 4.0 International License.
©2024 All rights reserved by the respective authors and JKLST.