Building Trustworthy, Sustainable Data Engineering and MLOps Competence
Keywords:
data engineering, MLOps, trustworthy AI, cloud-native systems, sustainable computing, curriculum framework, reproducibility, cybersecurity
Abstract
Data engineering has become a core competency in modern computing and increasingly underpins operational and strategic decision-making in organisations. This trend has extended the responsibilities of data engineers beyond database management, extract-transform-load operations, and code implementation. Modern data engineering practice demands an understanding of system reliability, cybersecurity, privacy, reproducibility, governance, performance engineering, MLOps, and sustainable computing. This desk study synthesises recent academic literature, curriculum frameworks, standards, and policy documents published between 2021 and 2026 to create a curriculum and governance framework for trustworthy data engineering education. The study draws on competency-based computing curriculum guidance, artificial intelligence risk management frameworks, reproducibility badging standards, cloud and serverless data pipeline research, MLOps education evidence, and analyses of the energy demand of artificial intelligence infrastructure. The results support the recommendation that data engineering education be refocused from a purely pipeline-based approach towards an integrated trustworthiness-pipeline model. Under this model, students are evaluated not just on whether data systems work, but also on whether those systems are secure, reliable, reproducible, auditable, governable, energy-conscious, and deployable in real-world settings. The proposed framework comprises cloud-native architecture, data pipeline orchestration, security and privacy engineering, reliability and performance engineering, MLOps lifecycle governance, reproducible artifacts, and sustainable computing. The paper concludes that data engineering degree programs need to prepare graduates to be accountable for production systems rather than to act only as isolated technical implementers.
References
Association for Computing Machinery, IEEE Computer Society, & AAAI. (2023). Computer science curricula 2023. ACM, IEEE Computer Society, and AAAI.
European Union. (2024). Regulation (EU) 2024/1689 of the European Parliament and of the Council laying down harmonised rules on artificial intelligence (Artificial Intelligence Act). Official Journal of the European Union.
International Energy Agency. (2025). Energy demand from AI. International Energy Agency.
Lanubile, F., Martínez-Fernández, S., & Quaranta, L. (2024). Training future ML engineers: A project-based course on MLOps.
National Information Standards Organization. (2021). NISO RP-31-2021: Reproducibility badging and definitions. National Information Standards Organization.
National Institute of Standards and Technology. (2023). Artificial Intelligence Risk Management Framework (AI RMF 1.0) (NIST AI 100-1). U.S. Department of Commerce.
National Institute of Standards and Technology. (2024). Artificial Intelligence Risk Management Framework: Generative Artificial Intelligence Profile (NIST AI 600-1). U.S. Department of Commerce.
Rad, Z. S., & Ghobaei-Arani, M. (2024). Data pipeline approaches in serverless computing: A taxonomy, review, and research trends. Journal of Big Data, 11, Article 82.
World Bank. (2021). World Development Report 2021: Data for better lives. World Bank.