PUBLICATION
Highly efficient training and inference of billion-scale AI models on affordable GPUs
ZeRO-Offload and Sentinel for transformers
DyNN-Offload for Mixture-of-Experts (MoE)
TECO-Offload on disaggregated memory
Billion-scale graph neural networks
AI training based on parallelism management
Runtime Concurrency Control and Operation Scheduling
Tree-structure-aware, high-performance inference engine
AI training using novel hardware
Energy-efficient training on GPU-FPGA accelerators
Processing-in-memory for energy-efficient DNNs