DOI: 10.18178/wcse.2025.06.004
Towards Efficient Multimodal AI: A Flexible Framework for Distributed Training
Abstract— The rapid advancement of artificial intelligence has led to increasing demand for systems capable of processing and understanding multimodal information. However, developing such systems poses challenges in integrating heterogeneous data, model architectures, and computational resources. To address these issues, this study proposes a flexible and efficient framework for distributed multimodal training and model deployment, designed to streamline the development process. The framework facilitates seamless interaction among key components, optimizing resource allocation and training efficiency. To evaluate its effectiveness, we conduct experiments on a multimodal anomaly detection task, demonstrating the framework's ability to handle complex data representations. Our study also explores distributed training strategies to enhance the scalability of large-scale multimodal models. Through performance analysis, we identify and mitigate key bottlenecks, leading to improved computational efficiency and reduced training time. Experimental results confirm that our framework not only supports efficient multimodal learning but also enhances system performance in large-scale AI applications.
Index Terms— Multimodal Learning, Anomaly Detection, Performance Optimization
Weiyu Chen, Yen Han Chiang
NCHC (National Center for High-Performance Computing)
NYCU (National Yang Ming Chiao Tung University)
Vincent S. Tseng
NYCU (National Yang Ming Chiao Tung University)
Cite: Weiyu Chen, Yen Han Chiang, Vincent S. Tseng, "Towards Efficient Multimodal AI: A Flexible Framework for Distributed Training", 2025 the 15th International Workshop on Computer Science and Engineering (WCSE 2025), pp. 21-26, Jeju Island, South Korea, June 28-30, 2025.
