Tensor and Parallel AI Training in Colossal-AI

Colossal-AI is a "colossal" collection of tools for parallel, distributed training in large data centers, and it relies heavily on parallelism. Its most important features are:

  • Data Parallelism
  • Pipeline Parallelism
  • 1D, 2D, 2.5D, 3D tensor parallelism
  • Sequence parallelism
  • Friendly trainer and engine
  • Extensible for new parallelism
  • Mixed Precision Training
  • Zero Redundancy Optimizer (ZeRO)
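
To make the tensor-parallelism item above more concrete, here is a minimal sketch of the 1D (column-wise) idea in plain Python. It does not use the Colossal-AI API: the helper names (matmul, split_columns, column_parallel_linear) are hypothetical, and the "devices" are simulated by a simple loop. The weight matrix is split into column blocks, each block is multiplied by the same input independently, and the partial results are concatenated, which should reproduce the unsplit product.

```python
def matmul(a, b):
    """Plain-Python product of a (m x k) and b (k x n), both lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]


def split_columns(matrix, num_shards):
    """Split a matrix into num_shards column blocks of equal width."""
    n_cols = len(matrix[0])
    if n_cols % num_shards != 0:
        raise ValueError("column count must be divisible by num_shards")
    width = n_cols // num_shards
    return [[row[s * width:(s + 1) * width] for row in matrix]
            for s in range(num_shards)]


def column_parallel_linear(x, weight, num_shards):
    """Compute x @ weight with the weight columns split across simulated devices."""
    shards = split_columns(weight, num_shards)
    # Each partial product would run on its own device in a real setup.
    partials = [matmul(x, shard) for shard in shards]
    # Stand-in for an all-gather: concatenate the partial outputs column-wise.
    return [sum((p[i] for p in partials), []) for i in range(len(x))]


x = [[1.0, 2.0], [3.0, 4.0]]
weight = [[1.0, 0.0, 2.0, 1.0], [0.0, 1.0, 1.0, 3.0]]

y_parallel = column_parallel_linear(x, weight, num_shards=2)
y_reference = matmul(x, weight)
print(y_parallel == y_reference)
```

In this scheme every simulated device holds only a slice of the weights, which is what lets a layer too large for one device be spread over several. The 2D, 2.5D and 3D variants partition the matrices along more dimensions and are not shown here.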

The toolset includes support for models such as GPT-3, an autoregressive language model that produces human-like text, and BERT, a bidirectional encoder model.

Links

https://analyticsindiamag.com/a-guide-to-parallel-deep-learning-with-colossal-ai/