TL;DR: Turbo4DGen removes redundant computation in SCM attention at multiple granularities (token, block, and chain), achieving a 9.7x speedup in 4D generation.


How it works

Turbo4DGen identifies and removes redundant computation in the SCM attention mechanism at multiple granularities, namely the token, block, and chain levels, using a purpose-built rolling cache and an adaptive bypassing mechanism, while maintaining high generation quality. To the best of our knowledge, this is the first systematic framework for accelerating 4D generation.
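Below is a minimal, self-contained PyTorch sketch of the rolling-cache idea: keep the most recent attention output per block, and reuse it when the block's input latent has barely changed across denoising steps. The class name, the cosine-similarity reuse test, and the 0.95 threshold are illustrative assumptions, not the released Turbo4DGen implementation.

    import torch
    import torch.nn.functional as F

    class RollingCache:
        """Caches the latest attention output per block and decides, via a
        cheap similarity test on the inputs, whether it can be reused.
        (Illustrative sketch; not the released Turbo4DGen API.)"""

        def __init__(self, threshold: float = 0.95):
            self.threshold = threshold   # reuse only if inputs are this similar
            self.last_input = {}         # block_id -> latent at the cached step
            self.last_output = {}        # block_id -> cached attention output

        def try_reuse(self, block_id, latent):
            cached = self.last_input.get(block_id)
            if cached is None or cached.shape != latent.shape:
                return None
            # Cosine similarity between the flattened latents of two steps.
            sim = F.cosine_similarity(cached.flatten(), latent.flatten(), dim=0)
            return self.last_output[block_id] if sim.item() >= self.threshold else None

        def update(self, block_id, latent, output):
            self.last_input[block_id] = latent.detach()
            self.last_output[block_id] = output.detach()

In a denoising loop, try_reuse would run before each attention block; on a miss, the full attention runs and update refreshes the cache, so the cache always "rolls" forward with the most recent step.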

[Figure: Turbo4DGen framework overview]




Performance and efficiency

The latency, speedup, and peak memory evaluations are presented below, where missing entries indicate out-of-memory (OOM) errors, and F and V denote the numbers of generated frames and views, respectively. Methods marked with "*" are not directly applicable to 4D generation; their code was modified for experimental evaluation.

[Table: latency, speedup, and peak memory comparison]




🎥 Novel View Video Synthesis Results (4D)





Method overview

[Figure: method overview]
During the 4D generation process, Turbo4DGen adopts a multi-level acceleration scheme across denoising steps. Specifically, the attention-level computations for the noise latent Z_{t-1} are skipped by reusing attention outputs from the rolling cache. Then, we identify token-level redundancy to prune the computations for the camera and motion blocks at the next timestep. Furthermore, as the denoising process proceeds, the entire SCM chain is adaptively bypassed for further acceleration.
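To make the three levels concrete, here is a hedged end-to-end sketch of the loop described in the caption, building on the RollingCache sketch above. toy_attention, the magnitude-based token score, keep_ratio, and bypass_after are placeholders standing in for the paper's actual components and schedules.

    import torch

    def toy_attention(z, keep_idx):
        """Stand-in for SCM attention, restricted to the kept tokens."""
        idx = keep_idx.unsqueeze(-1).expand(-1, -1, z.shape[-1])  # (B, k, D)
        q = torch.gather(z, 1, idx)                               # kept tokens
        w = torch.softmax(q @ q.transpose(-1, -2) / q.shape[-1] ** 0.5, dim=-1)
        out = torch.zeros_like(z)
        out.scatter_(1, idx, w @ q)     # write results back to kept positions
        return out

    def denoise(z_t, num_steps, cache, keep_ratio=0.7, bypass_after=0.8):
        for i in range(num_steps):
            # Chain level: late in denoising, adaptively bypass the whole
            # SCM chain (the rest of the pipeline would still run).
            if i / num_steps >= bypass_after:
                continue

            # Attention level: reuse the cached output when the latent has
            # barely changed since the step that produced it.
            out = cache.try_reuse("scm", z_t)
            if out is None:
                # Token level: keep only the highest-magnitude tokens
                # (a stand-in importance score) for the camera/motion blocks.
                scores = z_t.abs().mean(dim=-1)                   # (B, N)
                k = max(1, int(keep_ratio * scores.shape[-1]))
                keep_idx = scores.topk(k, dim=-1).indices         # (B, k)
                out = toy_attention(z_t, keep_idx)
                cache.update("scm", z_t, out)

            z_t = z_t + out   # residual update toward Z_{t-1}
        return z_t

    # Example: denoise(torch.randn(1, 64, 32), num_steps=50, cache=RollingCache())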



                    
Citation

    @article{2025turbo4dgen,
      title={Turbo4DGen: Ultra-Fast Acceleration for 4D Generation},
      author={Yuanbin Man and Ying Huang and Zhile Ren and Miao Yin},
      journal={arXiv},
      year={2025}
    }