Efficient ML & Distributed GPU Orchestration
We publish practical research on workload optimization and orchestration across heterogeneous GPUs. Find papers, reproducible benchmarks, grants, and media coverage.
PUBLICATIONS
Peer-reviewed papers and preprints on efficient training, model offloading, inference latency, and GPU scheduling.
Highly efficient training and inference of billion-scale AI models on affordable GPUs
ZeRO-Offload and Sentinel for transformers
DyNN-Offload for Mixture-of-Experts (MoE)
TECO-Offload on disaggregated memory
Billion-scale graph neural networks
AI training based on parallelism management
Runtime Concurrency Control and Operation Scheduling
Tree-structure-aware, high-performance inference engine
AI training using novel hardware
Energy-efficient training on GPU-FPGA accelerators
Processing-in-memory for energy-efficient DNNs
AWARD
Decentralized AI Computing Operating System for Accessible and Cost-Effective AI
National Science Foundation (NSF)