NVIDIA releases NeMo AutoModel for accelerated Transformers fine-tuning
NVIDIA has released NeMo AutoModel, a library that accelerates Transformers fine-tuning, delivering 3.4-3.7x higher training throughput while using 29-32% less GPU memory than native Transformers v5. NeMo AutoModel builds on Transformers v5 and adds Expert Parallelism, DeepEP fused all-to-all dispatch, and TransformerEngine kernels. The library is part of the NVIDIA NeMo framework for building custom generative AI models at scale. It supports a broad set of model families and relies on v5's dynamic weight loading to apply its optimizations.
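
To put the headline numbers in practical terms, the short Python sketch below converts the reported ranges into wall-clock time and memory for a single run. The 10-hour baseline and 80 GB memory footprint are hypothetical values chosen only for illustration, and the helper names are not part of NeMo AutoModel or Transformers. The sketch also assumes training time scales inversely with throughput, which ignores fixed costs such as checkpoint loading.

```python
# Back-of-the-envelope reading of the reported ranges.
# BASELINE_HOURS and BASELINE_GB are hypothetical, not figures from NVIDIA.

def time_fraction(speedup: float) -> float:
    """Fraction of the baseline wall-clock time needed at a given
    throughput speedup, assuming time scales as 1 / throughput."""
    return 1.0 / speedup


def memory_after_saving(baseline_gb: float, saving: float) -> float:
    """GPU memory left in use after a fractional saving (0.29 means 29% less)."""
    return baseline_gb * (1.0 - saving)


BASELINE_HOURS = 10.0  # hypothetical fine-tuning run on native Transformers v5
BASELINE_GB = 80.0     # hypothetical per-GPU memory use of that run

for speedup in (3.4, 3.7):
    hours = BASELINE_HOURS * time_fraction(speedup)
    print(f"{speedup}x throughput: about {hours:.2f} h instead of {BASELINE_HOURS:.0f} h")

for saving in (0.29, 0.32):
    gb = memory_after_saving(BASELINE_GB, saving)
    print(f"{saving:.0%} less memory: about {gb:.1f} GB instead of {BASELINE_GB:.0f} GB")
```

Under these assumptions, a 3.4-3.7x throughput gain corresponds to roughly 27-29% of the original training time, and a 29-32% memory reduction leaves roughly 68-71% of the original footprint.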