MLOps approach in the cloud-native data pipeline design
DOI: https://doi.org/10.14513/actatechjaur.00581
Keywords: MLOps, Machine learning, data pipeline, cloud-native
Abstract
Data modeling is a challenging process that involves forming hypotheses and running trials. In industry, a workflow has been built around data modeling. The proposed modernized workflow aims to use the cloud’s full capabilities through cloud-native services. For a big data project to succeed, the organization needs both analytics and information-technology know-how. The MLOps approach concentrates on modeling while closing the personnel and technology gap in deployment. In this article, the paradigm is verified with a case study that composes a data pipeline in the cloud-native ecosystem. Based on the analysis, this strategy is the recommended way to design data pipelines.
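The abstract describes composing a data pipeline whose stages run from data ingestion through model deployment. The Python sketch below illustrates that idea under stated assumptions. The `Pipeline` class, the stage names (`ingest`, `train`, `evaluate`, `deploy_gate`), the toy data, and the MSE-based deployment threshold are all hypothetical. They do not reproduce the authors' case study or the API of any cloud-native service. In a cloud-native setting each stage might instead run as its own container; here the stages are plain in-process functions for brevity.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical types: a shared context dict flows through every stage.
Context = Dict[str, Any]
Stage = Callable[[Context], Context]


@dataclass
class Pipeline:
    """Hypothetical container that runs named stages in order."""
    stages: List[Tuple[str, Stage]] = field(default_factory=list)

    def add(self, name: str, stage: Stage) -> "Pipeline":
        self.stages.append((name, stage))
        return self  # allow chaining of add() calls

    def run(self, ctx: Context) -> Context:
        for name, stage in self.stages:
            ctx = stage(ctx)
            ctx.setdefault("log", []).append(name)  # record executed stages
        return ctx


def ingest(ctx: Context) -> Context:
    # Stand-in for reading from cloud storage: toy (x, y) pairs (hypothetical data).
    ctx["data"] = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2)]
    return ctx


def train(ctx: Context) -> Context:
    # Least-squares slope of a line through the origin: w = sum(x*y) / sum(x*x).
    pairs = ctx["data"]
    ctx["model"] = sum(x * y for x, y in pairs) / sum(x * x for x, _ in pairs)
    return ctx


def evaluate(ctx: Context) -> Context:
    # Mean squared error of the fitted slope on the same data.
    w = ctx["model"]
    errors = [(w * x - y) ** 2 for x, y in ctx["data"]]
    ctx["mse"] = sum(errors) / len(errors)
    return ctx


def deploy_gate(ctx: Context, threshold: float = 0.1) -> Context:
    # Hypothetical automated gate: "deploy" only if the error is below a threshold.
    ctx["deployed"] = ctx["mse"] < threshold
    return ctx


def build_pipeline() -> Pipeline:
    return (Pipeline()
            .add("ingest", ingest)
            .add("train", train)
            .add("evaluate", evaluate)
            .add("deploy_gate", deploy_gate))


if __name__ == "__main__":
    result = build_pipeline().run({})
    print(result["log"], round(result["mse"], 4), result["deployed"])
```

Keeping each stage a small function with a uniform signature is what lets the same sequence be rerun automatically whenever data or code changes, which is the kind of repeatable train-and-deploy loop that MLOps aims to provide.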
License
Copyright (c) 2021 Acta Technica Jaurinensis
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.