{"id":316,"date":"2024-08-08T17:13:09","date_gmt":"2024-08-08T09:13:09","guid":{"rendered":"https:\/\/www.ndnlab.com\/?p=316"},"modified":"2024-08-08T17:13:10","modified_gmt":"2024-08-08T09:13:10","slug":"crux-gpu-efficient-communication-scheduling-for-deep-learning-training","status":"publish","type":"post","link":"https:\/\/www.ndnlab.com\/?p=316","title":{"rendered":"Crux: GPU-Efficient\u00a0Communication Scheduling for Deep\u00a0Learning Training"},"content":{"rendered":"\n<p>Crux: GPU-Efficient Communication Scheduling for Deep Learning Training<\/p>\n\n\n\n<p>Authors: Alibaba Cloud team<\/p>\n\n\n\n<p>Author names: Jiamin Cao, Yu Guan, Kun Qian, Jiaqi Gao, Wencong Xiao, Jianbo Dong, Binzhang Fu, Dennis Cai, Ennan Zhai<\/p>\n\n\n\n<p><strong>Original abstract:<\/strong> Deep learning training (DLT), e.g., large language model (LLM) training, has become one of the most important services in multi-tenant cloud computing. By deeply studying in-production DLT jobs, we observed that communication contention among different DLT jobs seriously influences the overall GPU computation utilization, resulting in the low efficiency of the training cluster. In this paper, we present Crux, a communication scheduler that aims to maximize GPU computation utilization by mitigating the communication contention among DLT jobs. Maximizing GPU computation utilization for DLT, nevertheless, is NP-Complete; thus, we formulate and prove a novel theorem to approach this goal by GPU intensity-aware communication scheduling. Then, we propose an approach that prioritizes the DLT flows with high GPU computation intensity, reducing potential communication contention. Our 96-GPU testbed experiments show that Crux improves 8.3% to 14.8% GPU computation utilization. 
The large-scale production trace-based simulation further shows that Crux increases GPU computation utilization by up to 23% compared with alternatives including Sincronia, TACCL, and CASSINI.<\/p>\n\n\n\n<p><strong>Abstract (translated from the Chinese):<\/strong> Deep learning training (DLT), such as large language model (LLM) training, has become one of the most important services in multi-tenant cloud computing. Through an in-depth study of production DLT jobs, we found that <span style=\"background: #ff0;\">communication contention among DLT jobs severely degrades overall GPU computation utilization, leaving the training cluster inefficient<\/span>. This paper presents Crux, a communication scheduler that aims to maximize GPU computation utilization by reducing communication contention among DLT jobs. Maximizing GPU computation utilization for DLT, however, is NP-complete; we therefore formulate and prove a new theorem <span style=\"background: #ff0;\">that approaches this goal through GPU-intensity-aware communication scheduling<\/span>. On this basis, we propose a method that prioritizes DLT flows with higher GPU computation intensity, reducing potential communication contention. Experiments on our 96-GPU testbed show that Crux improves GPU computation utilization by 8.3% to 14.8%. Simulations based on large-scale production traces further show that Crux improves GPU computation utilization by up to 23% compared with alternatives such as Sincronia, TACCL, and CASSINI.<\/p>\n\n\n\n<p><strong>Research question and key problems:<\/strong> The paper asks <span style=\"background: #ff0;\">how to optimize communication scheduling in multi-tenant deep learning training (DLT) clusters to raise GPU computation utilization, and thereby improve training efficiency and cluster revenue<\/span>.<\/p>\n\n\n\n<p>Specifically, the research question has two parts:<\/p>\n\n\n\n<p>1. Analyzing communication contention among DLT jobs: study the contention observed among DLT jobs in production, analyze its causes and impact, and propose a solution.<\/p>\n\n\n\n<p>2. Designing an efficient communication scheduling algorithm: building on the concept of GPU intensity, design an efficient communication scheduling algorithm that prioritizes jobs with high GPU intensity, thereby maximizing GPU utilization.<\/p>\n\n\n\n<p><strong>Research motivation:<\/strong><\/p>\n\n\n\n<p>1. Communication contention among DLT jobs is widespread in production: as deep learning models keep growing, DLT jobs running <span style=\"background: #ff0;\">in shared GPU clusters<\/span> contend with one another for communication, which lowers GPU utilization, slows training, and reduces cluster revenue.<\/p>\n\n\n\n<p>2. Existing communication scheduling methods do not resolve this contention: <span style=\"background: #ff0;\">existing DLT communication scheduling methods focus mainly on scheduling within a single job<\/span> and ignore contention between jobs, so they cannot effectively improve overall GPU utilization.<\/p>\n\n\n\n<p>3. Raising GPU utilization is critical for DLT clusters: GPU utilization is a key performance metric for a DLT cluster and directly affects training efficiency and cluster revenue.<\/p>\n\n\n\n<p><strong>Research significance:<\/strong><\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Higher DLT training efficiency: <span style=\"background: #ff0;\">optimizing communication scheduling in DLT clusters<\/span> reduces communication contention and raises GPU utilization, which shortens training time and improves training efficiency.<\/li>\n\n\n\n<li>Higher cluster revenue: the higher the GPU utilization, the greater the cluster's throughput and the more training jobs it can handle, which <span style=\"background: #ff0;\">increases cluster revenue<\/span>.<\/li>\n\n\n\n<li>Advancing DLT clusters: better communication scheduling helps DLT clusters evolve to meet ever-growing compute demand.<\/li>\n<\/ol>\n\n\n\n<p><strong>Research content (algorithms, methods, techniques, models)<\/strong>:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>The GPU intensity concept:<\/li>\n<\/ol>\n\n\n\n<p>GPU intensity is defined to measure a job's impact on GPU utilization and serves as the basis for communication scheduling.<\/p>\n\n\n\n<p>The higher a job's GPU intensity, the greater its impact on GPU utilization, so it should be given priority in communication scheduling.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"426\" height=\"194\" src=\"https:\/\/www.ndnlab.com\/wp-content\/uploads\/2024\/08\/image.png\"  class=\"wp-image-317\" style=\"width:600px\" srcset=\"https:\/\/www.ndnlab.com\/wp-content\/uploads\/2024\/08\/image.png 426w, https:\/\/www.ndnlab.com\/wp-content\/uploads\/2024\/08\/image-300x137.png 300w\" sizes=\"auto, (max-width: 426px) 100vw, 426px\" title=\"Crux: GPU-Efficient\u00a0Communication Scheduling for Deep\u00a0Learning 
Training\u63d2\u56fe\" alt=\"Crux: GPU-Efficient\u00a0Communication Scheduling for Deep\u00a0Learning Training\u63d2\u56fe\" \/><\/figure>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Path selection algorithm:<\/li>\n<\/ul>\n\n\n\n<p>Crux's path selection algorithm picks paths that interfere least with high-GPU-intensity jobs, avoiding communication contention.<\/p>\n\n\n\n<p>The algorithm selects paths based on each job's GPU intensity and the network topology, so that high-GPU-intensity jobs get first use of the paths with the most bandwidth.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Priority assignment algorithm:<\/li>\n<\/ul>\n\n\n\n<p>The priority assignment algorithm takes DLT job characteristics into account (such as iterative execution and computation-communication overlap) and assigns priorities so that jobs with high GPU intensity are scheduled first.<\/p>\n\n\n\n<p>It assigns priorities based on factors such as each job's GPU intensity, iteration time, and degree of computation-communication overlap, so that high-GPU-intensity jobs obtain network resources first.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"467\" height=\"425\" src=\"https:\/\/www.ndnlab.com\/wp-content\/uploads\/2024\/08\/image-1.png\"  class=\"wp-image-318\" style=\"width:600px\" srcset=\"https:\/\/www.ndnlab.com\/wp-content\/uploads\/2024\/08\/image-1.png 467w, https:\/\/www.ndnlab.com\/wp-content\/uploads\/2024\/08\/image-1-300x273.png 300w\" sizes=\"auto, (max-width: 
467px) 100vw, 467px\" title=\"Crux: GPU-Efficient\u00a0Communication Scheduling for Deep\u00a0Learning Training\u63d2\u56fe1\" alt=\"Crux: GPU-Efficient\u00a0Communication Scheduling for Deep\u00a0Learning Training\u63d2\u56fe1\" \/><\/figure>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Priority compression algorithm:<\/li>\n<\/ul>\n\n\n\n<p>The priority compression algorithm maps the assigned priorities onto a limited number of priority levels while minimizing the loss in GPU utilization.<\/p>\n\n\n\n<p>It compresses priorities based on factors such as each job's GPU intensity and the network topology, so that high-GPU-intensity jobs still obtain network resources first while contention among low-GPU-intensity jobs is kept as small as possible.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"468\" height=\"483\" src=\"https:\/\/www.ndnlab.com\/wp-content\/uploads\/2024\/08\/image-2.png\"  class=\"wp-image-319\" style=\"width:600px\" srcset=\"https:\/\/www.ndnlab.com\/wp-content\/uploads\/2024\/08\/image-2.png 468w, https:\/\/www.ndnlab.com\/wp-content\/uploads\/2024\/08\/image-2-291x300.png 291w\" sizes=\"auto, (max-width: 468px) 100vw, 468px\" title=\"Crux: GPU-Efficient\u00a0Communication Scheduling for Deep\u00a0Learning Training\u63d2\u56fe2\" alt=\"Crux: GPU-Efficient\u00a0Communication Scheduling for Deep\u00a0Learning Training\u63d2\u56fe2\" \/><\/figure>\n\n\n\n<p><strong>Main contributions<\/strong><\/p>\n\n\n\n<ol 
class=\"wp-block-list\">\n<li>Our analysis of our multi-tenant production training cluster shows that 36.3% of DLT jobs may experience communication contention with other jobs, wasting a large amount of GPU capacity. We argue that inter-job communication scheduling is necessary to improve GPU utilization. We have released our dataset at https:\/\/github.com\/alibaba\/alibaba-lingjun-dataset-2023.<\/li>\n\n\n\n<li>We transform GPU utilization maximization, an NP-complete (NPC) problem, into a GPU-intensity-aware communication scheduling problem, and design a system, Crux, to optimize GPU utilization in DLT clusters. Crux introduces (1) a path selection algorithm that mitigates communication contention by choosing the least congested paths for jobs with higher GPU intensity, (2) a priority assignment algorithm that accounts for DLT characteristics such as multiple iterations and communication-computation overlap, and (3) an efficient priority compression algorithm that fits the limited number of priority levels available on real NICs and switches.<\/li>\n\n\n\n<li>Experiments on our testbed of 96 Nvidia A100 GPUs show that Crux improves GPU utilization by 8.3% to 14.8% on real-world models (e.g., GPT, BERT, and ResNet). Our simulation based on production traces (2,000+ GPUs, 5,000+ jobs) shows that Crux improves GPU utilization by 5% to 23% over state-of-the-art solutions (Sincronia, CASSINI, and TACCL) across a variety of cluster network architectures.<\/li>\n<\/ol>\n\n\n\n<p><strong>Innovations and novelty<\/strong>:<\/p>\n\n\n\n<p>The paper introduces the concept of GPU intensity and builds a communication scheduling method on it, effectively addressing communication contention among jobs in DLT clusters and improving GPU utilization and training efficiency.<\/p>\n\n\n\n<p><strong>Technical challenges<\/strong>:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>The complexity of communication contention, and how to assess it accurately and schedule around it.<\/li>\n\n\n\n<li>Computing GPU intensity, and doing so efficiently.<\/li>\n\n\n\n<li>Designing the scheduling algorithm, and making it efficient.<\/li>\n\n\n\n<li>System scalability and robustness, and how to improve both.<\/li>\n<\/ol>\n\n\n\n<p><strong>Directions for further research (Future Work)<\/strong>:<\/p>\n\n\n\n<p>The GPU intensity concept and communication scheduling method proposed in the Crux paper offer a new angle on performance optimization for DLT clusters. Future research could explore the following directions:<\/p>\n\n\n\n<p>GPU 
intensity refinement: take data transfer patterns, data types, communication protocols, and similar factors into account to design a finer-grained way of computing GPU intensity that reflects a job's communication demand more accurately.<\/p>\n\n\n\n<p>Multi-objective optimization: beyond maximizing GPU utilization, consider further objectives such as minimizing job completion time, improving fairness across jobs, and maximizing throughput, and design a multi-objective optimization algorithm to find the best scheduling policy.<\/p>\n\n\n\n<p>Automated scheduling: use machine learning or reinforcement learning to automate scheduling, reducing manual intervention, improving scheduling efficiency, and adapting better to dynamically changing DLT cluster environments.<\/p>\n\n\n\n<p><strong>Personal summary:<\/strong><\/p>\n\n\n\n<p>Crux introduces the concept of <span style=\"background: #ff0;\">GPU intensity<\/span> (i.e., a job-specific metric) to measure a job's impact on GPU utilization. Its scheduling decisions are not based on the traffic pattern of a single job, as in traditional approaches; they account for contention across different jobs. Using GPU intensity, Crux selects paths and assigns priorities for different jobs to mitigate communication contention.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Crux: GPU-Efficient Communication Scheduling for Deep Learning Training Authors: Alibaba Cloud team Author names: Jiamin Cao, Yu Guan, Kun Qian, Jiaqi Gao, Wencong Xiao, Jianbo Dong, Binzhang Fu, Dennis Cai, Ennan Zhai Original abstract: Deep learning training (DLT), e.g., large language model (LLM) training, has become one of th &hellip; <a href=\"https:\/\/www.ndnlab.com\/?p=316\">Continue reading <span 
class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":317,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[6],"tags":[],"class_list":["post-316","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-weilaiwangluo"],"_links":{"self":[{"href":"https:\/\/www.ndnlab.com\/index.php?rest_route=\/wp\/v2\/posts\/316","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.ndnlab.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.ndnlab.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.ndnlab.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.ndnlab.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=316"}],"version-history":[{"count":2,"href":"https:\/\/www.ndnlab.com\/index.php?rest_route=\/wp\/v2\/posts\/316\/revisions"}],"predecessor-version":[{"id":331,"href":"https:\/\/www.ndnlab.com\/index.php?rest_route=\/wp\/v2\/posts\/316\/revisions\/331"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.ndnlab.com\/index.php?rest_route=\/wp\/v2\/media\/317"}],"wp:attachment":[{"href":"https:\/\/www.ndnlab.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=316"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.ndnlab.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=316"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.ndnlab.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=316"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}