Parallel Computation and Implementation of Matrix Multiplication on Supercomputers

Cite this article

Wu Xiangjun, Huang Liping. IMPLEMENTATION OF MATRICES-MULTIPLICATION ON SUPERCOMPUTER[J]. Journal of Applied Meteorological Science, 2005, 16(1): 121-128.

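Wu Xiangjun, Huang Liping

Chinese Academy of Meteorological Sciences, Beijing 100081

Received 2003-09-08; revised manuscript received 2004-02-11.

Funding: Supported by the project "Innovative Research on China's Numerical Weather Prediction Technology" (2001BA607B) of the National Key Science and Technology Program of the Tenth Five-Year Plan.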

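Abstract: Matrix multiplication is used frequently in numerical weather prediction systems. On distributed supercomputers such as the IBM-SP, parallel matrix multiplication requires considerable data movement, so efficient data transfer is critical to its implementation. This paper discusses two parallel matrix multiplication algorithms: one based on a column-row partition of the matrices and one based on a mesh (grid) partition. Experiments on an IBM-SP show that the mesh-partitioned algorithm has lower communication overhead and higher parallel efficiency, and that its parallel speedup is about 10% better than that of the column-row algorithm.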

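Keywords: numerical weather prediction; matrix multiplication; parallel computing; distributed parallel computer; data communication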

IMPLEMENTATION OF MATRICES-MULTIPLICATION ON SUPERCOMPUTER

Wu Xiangjun, Huang Liping

Chinese Academy of Meteorological Sciences, Beijing 100081

Abstract: Matrix multiplication is often used in numerical weather prediction (NWP). On distributed systems such as the IBM-SP, multiplying two matrices requires data transposition, and efficient data communication is crucial to its performance. Two parallel algorithms are presented, one based on column-row decomposition and the other on mesh partitioning, and the implementation and communication time of the two methods are discussed. Results on the IBM-SP show that the mesh algorithm requires less communication and improves speedup by up to 10%.

Key words: NWP; matrix multiplication; parallel computing; distributed system; data communication
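
The column-row decomposition named in the abstract can be illustrated with a short serial simulation in Python. This is only a sketch under stated assumptions, not the authors' implementation: the function names `matmul` and `column_row_multiply`, the ring-shift schedule for the column strips, and the requirement that the process count `p` divides the matrix order are all hypothetical choices made for illustration. The paper's actual code for the IBM-SP is not reproduced in this excerpt.

```python
def matmul(a, b):
    """Plain triple-loop product of two list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def column_row_multiply(a, b, p):
    """Serially simulate p processes (hypothetical schedule, for illustration).

    Process r owns row strip r of A and starts with column strip r of B.
    At each step the column strips shift one place around a ring, so after
    p steps every process has seen every column strip.
    Assumes p divides the matrix order n.
    Returns (C, number of matrix elements each process receives).
    """
    n = len(a)
    w = n // p  # strip width
    row_strips = [a[r * w:(r + 1) * w] for r in range(p)]                    # w x n each
    col_strips = [[row[r * w:(r + 1) * w] for row in b] for r in range(p)]   # n x w each
    c = [[0] * n for _ in range(n)]
    received = 0
    for step in range(p):
        for r in range(p):
            owner = (r + step) % p  # column strip held by process r at this step
            blk = matmul(row_strips[r], col_strips[owner])  # w x w block of C
            for i in range(w):
                for j in range(w):
                    c[r * w + i][owner * w + j] = blk[i][j]
        if step < p - 1:
            received += n * w  # one n x w strip arrives per shift
    return c, received
```

In this model each process receives p - 1 column strips of n × (n/p) elements, which is the kind of data movement the abstract identifies as the cost to be reduced.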





Figure 1. Mesh-partitioned matrix multiplication on a 3×3 processor grid
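
The block structure behind a mesh partition on a 3×3 processor grid can be sketched as a serial simulation in Python. This is a minimal sketch under stated assumptions rather than the paper's algorithm: the names `block`, `block_multiply_add`, and `mesh_multiply` are hypothetical, the grid size `q` is assumed to divide the matrix order, and the way blocks are actually routed between processors in the paper is not shown in this excerpt. The sketch only counts how many elements a processor must obtain from others.

```python
def block(m, i, j, s):
    """Return the s x s block at block-row i, block-column j of matrix m."""
    return [row[j * s:(j + 1) * s] for row in m[i * s:(i + 1) * s]]


def block_multiply_add(acc, x, y):
    """In place: acc += x * y for square list-of-lists blocks."""
    s = len(x)
    for i in range(s):
        for j in range(s):
            acc[i][j] += sum(x[i][k] * y[k][j] for k in range(s))


def mesh_multiply(a, b, q=3):
    """Serially simulate a q x q processor grid (illustrative assumption).

    Processor (i, j) owns blocks A[i][j] and B[i][j] and computes
    C[i][j] = sum over k of A[i][k] * B[k][j].
    Assumes q divides the matrix order n.
    Returns (C, number of matrix elements each processor must receive).
    """
    n = len(a)
    s = n // q  # block size
    c = [[0] * n for _ in range(n)]
    for i in range(q):
        for j in range(q):
            acc = [[0] * s for _ in range(s)]
            for k in range(q):
                block_multiply_add(acc, block(a, i, k, s), block(b, k, j, s))
            for r in range(s):
                c[i * s + r][j * s:(j + 1) * s] = acc[r]
    # Each processor needs the other q - 1 blocks of its A row and its B column.
    words_per_process = 2 * (q - 1) * s * s
    return c, words_per_process
```

For a 6×6 matrix on the 3×3 grid, each simulated processor holds 2×2 blocks and needs 2 × (3 - 1) × 4 = 16 elements from other processors in this model.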