We publish practical research on workload optimization and orchestration across heterogeneous
GPUs. Find papers, reproducible benchmarks, grants, and media coverage.

Peer-reviewed papers and preprints on efficient training, model offloading, inference latency, and GPU scheduling.
Highly efficient training and inference of billion-scale AI models on affordable GPUs
ZeRO-Offload and Sentinel for transformers
DyNN-Offload for Mixture-of-Experts (MoE)
TECO-Offload on disaggregated memory
Billion-scale graph neural networks
AI training based on parallelism management
Runtime Concurrency Control and Operation Scheduling
Tree-structure-aware high-performance inference engine
Decentralized AI Computing Operating System for Accessible and Cost-Effective AI


Coverage of our research, open-source releases, and product launches.