For high-performance, large-scale model training, this combination makes its debut
2022-08-05 12:00:00 [Microsoft Technology Stack]
In recent years, large-scale Transformer-based deep learning models trained on massive amounts of data have achieved strong results on many cognitive tasks, and they now power new products and features that further augment human capabilities. Over the past five years these models have grown by orders of magnitude, from the millions of parameters of the original Transformer model to the latest 530-billion-parameter Megatron-Turing model (MT-NLG 530B), as shown in the figure. Customer demand for training and fine-tuning large models at unprecedented scale keeps growing.

Landscape of large model sizes and hardware capabilities
Azure Machine Learning (AzureML) offers a wide range of state-of-the-art GPUs connected by InfiniBand interconnects for large-scale AI training. We have trained Megatron/Turing and GPT-3 models on Azure. Previously, training such models required users to set up and maintain a complex distributed training infrastructure, often involving several error-prone manual steps, which led to a poor experience in both usability and performance.
Today, we are proud to announce a breakthrough in our software stack: using DeepSpeed on 1,024 A100 GPUs, we scaled training to a 2-trillion-parameter model and delivered a streamlined user experience at 1K+ GPU scale. We will bring these software innovations to you through AzureML, including a fully optimized PyTorch environment that provides strong performance and an easy-to-use interface for training at scale.
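To give a feel for why a 2-trillion-parameter model calls for on the order of a thousand GPUs, here is a back-of-the-envelope memory estimate. It is a sketch under stated assumptions, not a description of how the Azure runs were configured: it assumes the ZeRO paper's accounting of 16 bytes of model state per parameter for mixed-precision training with Adam, assumes the states are split evenly across all GPUs (the idea behind ZeRO stage 3), and ignores activations, buffers, and any CPU/NVMe offload.

```python
# Back-of-the-envelope estimate of model-state memory for mixed-precision
# training with Adam. Assumption: ZeRO-style accounting of 16 bytes per
# parameter; activations, temporary buffers and fragmentation are ignored.
HALF_WEIGHTS = 2        # bytes per parameter (fp16/bf16 weights)
HALF_GRADS = 2          # bytes per parameter (fp16/bf16 gradients)
FP32_OPTIMIZER = 12     # fp32 master weights + Adam momentum + Adam variance
BYTES_PER_PARAM = HALF_WEIGHTS + HALF_GRADS + FP32_OPTIMIZER


def model_state_bytes(num_params: int) -> int:
    """Total bytes for weights, gradients and optimizer states."""
    return num_params * BYTES_PER_PARAM


def per_gpu_bytes_fully_sharded(num_params: int, num_gpus: int) -> float:
    """Bytes per GPU when all model states are split evenly across GPUs
    (the idea behind ZeRO stage 3); no CPU/NVMe offload is modeled."""
    return model_state_bytes(num_params) / num_gpus


if __name__ == "__main__":
    params = 2 * 10**12  # 2 trillion parameters (from the article)
    for gpus in (512, 1024):
        per_gpu_gb = per_gpu_bytes_fully_sharded(params, gpus) / 1e9
        print(f"{gpus} GPUs: {per_gpu_gb:.2f} GB of model states per GPU")
    print(f"total: {model_state_bytes(params) / 1e12:.0f} TB")
```

Under these assumptions the model states alone come to 32 TB, which is 62.50 GB per GPU on 512 GPUs and 31.25 GB per GPU on 1,024 GPUs. This is only a lower bound on memory and says nothing about the actual partitioning strategy used in the announced runs.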
As shown in the figure below, Microsoft takes a full-stack optimization approach: the hardware, operating system, VM image, Docker image (with optimized PyTorch, DeepSpeed, ONNX Runtime, and other Python packages), and user-facing AzureML APIs are all optimized, integrated, and tested for performance and scalability.

Microsoft's full-stack optimization for scalable distributed training on Azure
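On the DeepSpeed layer of this stack, training is typically driven by a configuration dictionary or JSON file. The sketch below builds one with ZeRO stage 3 and bfloat16 enabled and re-checks the batch-size relation DeepSpeed enforces. All values are hypothetical: the article does not publish its configuration, and the sketch assumes every GPU is a data-parallel rank (no tensor or pipeline parallelism).

```python
import json

# Hypothetical DeepSpeed settings for illustration only; the article does not
# publish its actual configuration. Assumes every GPU is a data-parallel rank.


def make_ds_config(world_size: int, micro_batch: int, grad_accum: int) -> dict:
    """Build a DeepSpeed-style config dict with ZeRO stage 3 and bf16."""
    return {
        # DeepSpeed requires train_batch_size to equal
        # micro batch per GPU x gradient accumulation steps x data-parallel size.
        "train_batch_size": micro_batch * grad_accum * world_size,
        "train_micro_batch_size_per_gpu": micro_batch,
        "gradient_accumulation_steps": grad_accum,
        "bf16": {"enabled": True},          # A100 GPUs support bfloat16
        "zero_optimization": {
            "stage": 3,                     # partition weights, grads and optimizer states
            "overlap_comm": True,           # overlap communication with computation
            "contiguous_gradients": True,
        },
    }


def batch_sizes_consistent(cfg: dict, dp_world_size: int) -> bool:
    """Re-check the batch-size relation that DeepSpeed validates."""
    expected = (cfg["train_micro_batch_size_per_gpu"]
                * cfg["gradient_accumulation_steps"]
                * dp_world_size)
    return cfg["train_batch_size"] == expected


if __name__ == "__main__":
    cfg = make_ds_config(world_size=1024, micro_batch=1, grad_accum=4)
    assert batch_sizes_consistent(cfg, 1024)
    print(json.dumps(cfg, indent=2))
```

Recent DeepSpeed releases accept such a dict (or a path to the equivalent JSON file) through the `config` argument of `deepspeed.initialize`.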
This optimized stack enables us to efficiently scale the training of large models with DeepSpeed on Azure. Compared with the data published by other cloud vendors, we support 2x larger models (2 trillion vs. 1 trillion parameters), scale to 2x as many GPUs (1,024 vs. 512), and deliver up to 1.8x higher compute throughput per GPU (150 TFLOPs vs. 81 TFLOPs).
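The comparison figures can be checked with a few lines of arithmetic. The aggregate-throughput function is an idealization added for illustration (it assumes every GPU sustains the quoted per-GPU figure) and is not a number reported in the article.

```python
# Figures quoted in the article (Azure + DeepSpeed vs. other cloud vendors'
# published data).
AZURE = {"params": 2e12, "gpus": 1024, "tflops_per_gpu": 150}
OTHER = {"params": 1e12, "gpus": 512, "tflops_per_gpu": 81}


def ratios(a: dict, b: dict) -> dict:
    """Element-wise ratio a/b for each metric."""
    return {key: a[key] / b[key] for key in a}


def aggregate_pflops(run: dict) -> float:
    """Aggregate throughput in PFLOPs, assuming every GPU sustains the
    quoted per-GPU figure (an idealization, not a number from the article)."""
    return run["gpus"] * run["tflops_per_gpu"] / 1000


if __name__ == "__main__":
    for metric, value in ratios(AZURE, OTHER).items():
        print(f"{metric}: {value:.2f}x")
    print(f"aggregate: {aggregate_pflops(AZURE):.1f} vs "
          f"{aggregate_pflops(OTHER):.1f} PFLOPs")
```

The per-GPU ratio is 150 / 81 ≈ 1.85, which the article rounds down to "up to 1.8x"; the model-size and GPU-count ratios are exactly 2.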