Megatron-LM scales up neural networks

Very large models can be difficult to train due to memory constraints. Megatron-LM addresses this with a simple, efficient intra-layer model parallel approach that enables training transformer models with billions of parameters. The approach requires no new compiler or library changes; it can be implemented by inserting a few communication operations into native PyTorch.
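
To make the idea concrete, here is a minimal single-process sketch of the intra-layer split for a transformer MLP block: the first linear layer's weight is partitioned by columns and the second by rows, so each shard applies the nonlinearity independently and only one reduction is needed at the end. Names and shapes are illustrative, not Megatron-LM's actual API, and the final sum stands in for the all-reduce a real multi-GPU run would perform (e.g., via torch.distributed.all_reduce).

```python
import torch

torch.manual_seed(0)
world_size = 2          # number of simulated model-parallel ranks
hidden, ffn = 8, 32     # hidden size and feed-forward size

x = torch.randn(4, hidden)    # input activations (batch, hidden)
A = torch.randn(hidden, ffn)  # first linear layer weight
B = torch.randn(ffn, hidden)  # second linear layer weight

# Reference computation as it would run on a single device.
reference = torch.relu(x @ A) @ B

# Intra-layer split: A by columns, B by rows.
A_shards = A.chunk(world_size, dim=1)  # each rank: hidden x (ffn/world_size)
B_shards = B.chunk(world_size, dim=0)  # each rank: (ffn/world_size) x hidden

# Each rank applies the nonlinearity to its own shard with no
# synchronization, because the column split keeps the elementwise
# activation local; partial outputs are then reduced once.
partials = [torch.relu(x @ A_i) @ B_i for A_i, B_i in zip(A_shards, B_shards)]
output = sum(partials)  # stands in for all_reduce(sum) across ranks

print(torch.allclose(output, reference, atol=1e-5))  # True
```

Splitting the first GEMM by columns is what keeps the approach simple: if it were split by rows instead, the shards would have to synchronize before the nonlinearity, adding an extra communication step per MLP block.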

Links
https://arxiv.org/abs/1909.08053 (Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism)