We publish practical research on workload optimization and orchestration across heterogeneous
GPUs. Find papers, reproducible benchmarks, grants, and media coverage.

Peer-reviewed papers and preprints on efficient training, model offloading, inference latency, and GPU scheduling.
Highly efficient training and inference of billion-scale AI models on affordable GPUs
ZeRO-Offload and Sentinel for transformers
DyNN-Offload for Mixture-of-Experts (MoE)
TECO-Offload on disaggregated memory
Billion-scale graph neural networks
AI training based on parallelism management
Runtime Concurrency Control and Operation Scheduling
Tree-structure-aware high-performance inference engine
Decentralized AI Computing Operating System for Accessible and Cost-Effective AI


Coverage of our research, open-source releases, and product launches.