1. Spatial AI & Computing
[ICML'26]
Turbo4DGen: Ultra-Fast Acceleration for 4D Generation
[ICML'26]
WhisperSplat: Lossless Steganography in 3D Gaussian Splatting
[Preprint'26]
Gaussians on a Diet: High-Quality Memory-Bounded 3D Gaussian Splatting Training
[CVPR'25]
GaussianSpa: An “Optimizing-Sparsifying” Simplification Framework for Compact and High-Quality 3D Gaussian Splatting
[NeurIPS'23]
GraphMP: Graph Neural Network-based Motion Planning with Efficient Graph Search
[IROS'22]
Robot Motion Planning as Video Prediction: A Spatio-Temporal Neural Network-based Motion Planner
2. AI for Scientific Computing & Data Management
[Preprint'25]
FLARE: A Dataflow-Aware and Scalable Hardware Architecture for Neural-Hybrid Scientific Lossy Compression
[HPDC'25]
Advancing Scientific Data Compression via Cross-Field Prediction
[ICS'25]
NeurLZ: An Online Neural Learning-based Method to Enhance Scientific Lossy Compression
[DRBSD@SC'24]
Enhancing Lossy Compression Through Cross-Field Information for Scientific Applications
[FlexScience@HPDC'24]
GWLZ: A Group-wise Learning-based Lossy Compression Framework for Scientific Data
3. Multimodal Media Intelligence
[CVPR'25 Highlight]
AdaCM2: On Understanding Extremely Long-Term Video with Adaptive Cross-Modality Memory Reduction
4. AI/ML Systems & Hardware
[ASPLOS'26]
FlashMem: Supporting Modern DNN Workloads on Mobile with GPU Memory Hierarchy Optimizations
[NeurIPS'24]
Real-time Core-Periphery Guided ViT with Smart Data Layout Selection on Mobile Devices
[ASPLOS'24]
SmartMem: Layout Transformation Elimination and Adaptation for Efficient DNN Execution on Mobile
[ISCA'23]
ETTE: Efficient Tensor-Train-based Computing Engine for Deep Neural Networks
[PPoPP'23]
TDC: Towards Extremely Efficient CNNs on GPUs via Hardware-Aware Tucker Decomposition
[TC'22]
Algorithm and Hardware Co-Design of Energy-Efficient LSTM Networks for Video Recognition with Hierarchical Tucker Tensor Decomposition
[ICCAD'19]
(Invited Paper) High-performance Hardware Architecture for Tensor Singular Value Decomposition
5. AI/ML Model Optimization, Compression, and Adaptation
[ICLR'26]
LeSTD: LLM Compression via Learning-based Sparse Tensor Decomposition
[Preprint'25]
AdaRing: Towards Ultra-Light Vision-Language Adaptation via Cross-Layer Tensor Ring Decomposition
[EMNLP'24 Findings]
MoE-I2: Compressing Mixture of Experts Models through Inter-Expert Pruning and Intra-Expert Low-Rank Decomposition
[ICML'23]
COMCAT: Towards Efficient Compression and Customization of Attention-Based Vision Models
[AAAI'23]
CSTAR: Towards Compact and Structured Deep Neural Networks with Adversarial Robustness
[AAAI'23]
GOHSP: A Unified Framework of Graph and Optimization-based Heterogeneous Structured Pruning for Vision Transformer
[CVPR'22]
HODEC: Towards Efficient High-Order DEcomposed Convolutional Neural Networks
[AAAI'22]
BATUDE: Budget-Aware Neural Network Compression Based on Tucker Decomposition
[NeurIPS'21]
CHIP: CHannel Independence based Pruning for Compact Neural Networks
[CVPR'21]
Towards Efficient Tensor Decomposition-Based DNN Model Compression with Optimization Framework
[CVPR'21]
Towards Extremely Compact RNNs for Video Recognition with Fully Decomposed Hierarchical Tucker Structure