Differentially private (DP) optimization is the standard paradigm for learning large neural networks that are both accurate and privacy-preserving. The computational cost of DP deep learning, however, is notoriously heavy due to per-sample gradient clipping. Existing DP implementations are