Step Distillation

Step distillation is a key optimization technique in LightX2V. A distilled model cuts the number of inference steps from the original 40-50 to 4, which greatly improves inference speed while maintaining video quality. LightX2V combines step distillation with CFG distillation, so classifier-free guidance is no longer needed at inference time and inference becomes faster still.

🔍 Technical Principle

DMD Distillation

The core technique behind step distillation is DMD (Distribution Matching Distillation). The DMD distillation framework is shown in the following diagram:

[Figure: DMD Distillation Framework]

The core idea of DMD distillation is to minimize the KL divergence between the output distributions of the distilled model and the original model:

\[\begin{split} \begin{aligned} D_{KL}\left(p_{\text{fake}} \; \| \; p_{\text{real}} \right) &= \mathbb{E}_{x\sim p_\text{fake}}\left(\log\left(\frac{p_\text{fake}(x)}{p_\text{real}(x)}\right)\right)\\ &= \mathbb{E}_{\substack{ z \sim \mathcal{N}(0; \mathbf{I}) \\ x = G_\theta(z) }}-\big(\log~p_\text{real}(x) - \log~p_\text{fake}(x)\big). \end{aligned} \end{split}\]

Since directly computing the probability density is nearly impossible, DMD distillation instead computes the gradient of this KL divergence:

\[\begin{split} \begin{aligned} \nabla_\theta D_{KL} &= \mathbb{E}_{\substack{ z \sim \mathcal{N}(0; \mathbf{I}) \\ x = G_\theta(z) } } \Big[- \big( s_\text{real}(x) - s_\text{fake}(x)\big) \hspace{.5mm} \frac{dG}{d\theta} \Big], \end{aligned} \end{split}\]

where \(s_\text{real}(x) =\nabla_{x} \log~p_\text{real}(x)\) and \(s_\text{fake}(x) =\nabla_{x} \log~p_\text{fake}(x)\) are score functions. A diffusion model can estimate the score of the distribution it was trained on, so DMD distillation maintains three models in total:

  • real_score, computes the score of the real distribution; since the real distribution is fixed, DMD distillation uses the original model with fixed weights as its score function;

  • fake_score, computes the score of the fake distribution; since the fake distribution is constantly updated, DMD distillation initializes it with the original model and fine-tunes it to learn the output distribution of the generator;

  • generator, the student model; it is updated with the gradient of the KL divergence, computed from the difference between the outputs of real_score and fake_score.
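To make the gradient formula above concrete, here is a minimal one-dimensional sketch in plain Python. It is a toy illustration, not LightX2V's training code: the real distribution is assumed to be N(real_mean, 1), the generator is assumed to be G_theta(z) = z + theta, and both scores are written analytically instead of being predicted by networks. The names score_gaussian, dmd_gradient and train_generator are hypothetical.

```python
import random


def score_gaussian(x, mean):
    """Score of N(mean, 1): d/dx log p(x) = -(x - mean)."""
    return -(x - mean)


def dmd_gradient(theta, real_mean, num_samples=256, rng=None):
    """Monte Carlo estimate of E[-(s_real(x) - s_fake(x)) * dG/dtheta]."""
    rng = rng or random.Random(0)
    total = 0.0
    for _ in range(num_samples):
        z = rng.gauss(0.0, 1.0)
        x = z + theta  # x = G_theta(z), so x ~ N(theta, 1)
        s_real = score_gaussian(x, real_mean)  # stands in for the frozen real_score model
        s_fake = score_gaussian(x, theta)      # stands in for fake_score tracking the generator
        dG_dtheta = 1.0                        # derivative of z + theta w.r.t. theta
        total += -(s_real - s_fake) * dG_dtheta
    return total / num_samples


def train_generator(real_mean=3.0, theta=0.0, lr=0.1, steps=200):
    """Plain gradient descent on theta using the DMD gradient."""
    rng = random.Random(0)
    for _ in range(steps):
        theta -= lr * dmd_gradient(theta, real_mean, rng=rng)
    return theta


if __name__ == "__main__":
    print(train_generator())  # theta moves toward real_mean
```

In this toy setting the gradient reduces to theta - real_mean, so descent pulls the generator's output distribution onto the real one. In actual DMD training, fake_score is a network that has to be fine-tuned continually because the generator's distribution keeps changing.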

Self-Forcing

DMD distillation was originally designed for image generation. Step distillation in LightX2V builds on Self-Forcing. The overall implementation of Self-Forcing is similar to DMD, but, following DMD2, it drops the regression loss and uses ODE initialization instead. Self-Forcing also adds an important optimization for video generation tasks:

Existing DMD-based methods struggle to generate videos in a single step. In each training iteration, Self-Forcing selects one timestep to optimize, and the generator computes gradients only at that step. This significantly speeds up training and improves denoising quality at intermediate timesteps, which in turn improves the quality of the distilled model.
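The timestep selection can be sketched as a small planning function. This is only an illustration of the control flow described above, not the Self-Forcing code: the helper plan_rollout is hypothetical, the timestep values are borrowed from the example configuration later on this page, and stopping the rollout at the selected step is an assumption of this sketch.

```python
import random


def plan_rollout(step_list, rng):
    """Pick one timestep to optimize and mark which steps track gradients.

    Returns (timestep, track_gradients) pairs for the steps from the start
    of the rollout up to and including the selected one. Steps before the
    selected one run without gradient tracking.
    """
    selected = rng.randrange(len(step_list))  # index of the step to optimize
    return [(t, i == selected) for i, t in enumerate(step_list[: selected + 1])]


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        print(plan_rollout([1000, 750, 500, 250], rng))
```

Because only one step per iteration keeps its computation graph, memory use and backward time stay close to those of a single denoising step, while over many iterations every timestep in the list gets trained.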

LightX2V

Self-Forcing performs step distillation and CFG distillation on 1.3B autoregressive models. LightX2V extends it with a series of enhancements:

  1. Larger Models: Supports step distillation training for 14B models;

  2. More Model Types: Supports standard bidirectional models and I2V model step distillation training;

  3. Better Results: LightX2V trains on high-quality prompts drawn from approximately 50,000 data entries.

For detailed implementation, refer to Self-Forcing-Plus.

🎯 Technical Features

  • Inference Acceleration: Reduces inference steps from 40-50 to 4 steps without CFG, achieving approximately 20-24x speedup

  • Quality Preservation: Maintains original video generation quality through distillation techniques

  • Strong Compatibility: Supports both T2V and I2V tasks

  • Flexible Usage: Supports loading complete step distillation models or loading step distillation LoRA on top of native models; compatible with int8/fp8 model quantization
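The 20-24x figure in the list above can be sanity-checked with a back-of-the-envelope count of denoiser calls. The sketch below assumes that classifier-free guidance needs two forward passes per step (one conditional, one unconditional) and that the distilled model needs one; it ignores every fixed cost outside the denoising loop. The function names are hypothetical.

```python
def forward_passes(steps, use_cfg):
    """Denoiser calls for one sample; CFG is assumed to need two calls per step."""
    return steps * (2 if use_cfg else 1)


def ideal_speedup(baseline_steps, distilled_steps=4):
    """Ratio of denoiser calls: baseline with CFG vs. distilled without CFG."""
    baseline = forward_passes(baseline_steps, use_cfg=True)
    distilled = forward_passes(distilled_steps, use_cfg=False)
    return baseline / distilled


if __name__ == "__main__":
    for steps in (40, 50):
        print(steps, ideal_speedup(steps))
```

Under these assumptions the ratio is 20 for a 40-step baseline and 25 for a 50-step baseline. These are upper bounds on the end-to-end gain, since work outside the denoising loop is not reduced, which is in line with the approximate 20-24x stated above.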

🛠️ Configuration Files

Basic Configuration Files

Multiple configuration options are provided in the configs/distill/ directory:

| Configuration File | Purpose | Model Address |
| --- | --- | --- |
| wan_t2v_distill_4step_cfg.json | Load T2V 4-step distillation complete model | hugging-face |
| wan_i2v_distill_4step_cfg.json | Load I2V 4-step distillation complete model | hugging-face |
| wan_t2v_distill_4step_cfg_lora.json | Load Wan-T2V model and step distillation LoRA | hugging-face |
| wan_i2v_distill_4step_cfg_lora.json | Load Wan-I2V model and step distillation LoRA | hugging-face |

Key Configuration Parameters

  • Since DMD distillation trains only a few fixed timesteps, we recommend using the LCM Scheduler for inference. WanStepDistillScheduler already uses the LCM Scheduler internally, so no user configuration is needed.

  • infer_steps, denoising_step_list and sample_shift are set to match the values used during training; changing them is generally not recommended.

  • enable_cfg must be set to false (equivalent to setting sample_guide_scale = 1), otherwise the video may become completely blurred.

  • lora_configs supports merging multiple LoRAs with different strengths. When lora_configs is not empty, the original Wan2.1 model is loaded by default. Therefore, to use step distillation together with lora_configs, include the path and strength of the step distillation LoRA.

{
  "infer_steps": 4,                              // Inference steps
  "denoising_step_list": [1000, 750, 500, 250],  // Denoising timestep list
  "sample_shift": 5,                             // Scheduler timestep shift
  "enable_cfg": false,                           // Disable CFG for speed improvement
  "lora_configs": [                              // LoRA weights path (optional)
    {
      "path": "path/to/distill_lora.safetensors",
      "strength": 1.0
    }
  ]
}
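The example above uses // comments for explanation, which Python's standard json module does not accept. The following sketch strips such comments and then checks a config against the rules listed above. It is a hypothetical helper, not part of LightX2V; in particular, requiring infer_steps to equal the length of denoising_step_list is an assumption inferred from the example, not a documented rule.

```python
import json


def strip_line_comments(text):
    """Remove // comments that appear outside of JSON string literals."""
    cleaned = []
    for line in text.splitlines():
        in_string = False
        escaped = False
        cut = len(line)
        for i, ch in enumerate(line):
            if in_string:
                if escaped:
                    escaped = False
                elif ch == "\\":
                    escaped = True
                elif ch == '"':
                    in_string = False
            elif ch == '"':
                in_string = True
            elif line.startswith("//", i):
                cut = i  # comment starts here; drop the rest of the line
                break
        cleaned.append(line[:cut])
    return "\n".join(cleaned)


def check_distill_config(cfg):
    """Return a list of problems found in a step-distillation config dict."""
    problems = []
    steps = cfg.get("denoising_step_list", [])
    if cfg.get("infer_steps") != len(steps):  # assumption, see lead-in
        problems.append("infer_steps does not match denoising_step_list")
    if cfg.get("enable_cfg", False):
        problems.append("enable_cfg must be false for distilled models")
    for i, lora in enumerate(cfg.get("lora_configs", [])):
        if "path" not in lora or "strength" not in lora:
            problems.append(f"lora_configs[{i}] needs both path and strength")
    return problems


EXAMPLE = """
{
  "infer_steps": 4,                              // Inference steps
  "denoising_step_list": [1000, 750, 500, 250],  // Denoising timestep list
  "sample_shift": 5,
  "enable_cfg": false,
  "lora_configs": [{"path": "path/to/distill_lora.safetensors", "strength": 1.0}]
}
"""

if __name__ == "__main__":
    config = json.loads(strip_line_comments(EXAMPLE))
    print(check_distill_config(config))
```

The comment stripper tracks string state character by character, so a // inside a quoted value, such as a URL, is left untouched.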

📜 Usage

Model Preparation

Complete Model: Place the downloaded model (distill_model.pt or distill_model.safetensors) in the distill_models/ folder under the Wan model root directory:

  • For T2V: Wan2.1-T2V-14B/distill_models/

  • For I2V-480P: Wan2.1-I2V-14B-480P/distill_models/
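Before launching inference, it can help to confirm that the weights sit where the layout above expects them. The helper below is a hypothetical convenience sketch, not a LightX2V utility; it only looks for the two file names mentioned above inside the distill_models/ folder of a given model root.

```python
from pathlib import Path

# File names taken from the description above.
WEIGHT_NAMES = ("distill_model.pt", "distill_model.safetensors")


def find_distill_weights(model_root):
    """Return the distilled weight files found in <model_root>/distill_models/."""
    folder = Path(model_root) / "distill_models"
    return [folder / name for name in WEIGHT_NAMES if (folder / name).is_file()]


if __name__ == "__main__":
    found = find_distill_weights("Wan2.1-T2V-14B")
    print(found if found else "no distilled weights found")
```

An empty result means either the folder is missing or the file has a different name than the two listed.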

LoRA:

  1. Place the downloaded LoRA in any location

  2. Set the path field under lora_configs in the configuration file to the location where the LoRA is stored

Inference Scripts

T2V Complete Model:

bash scripts/wan/run_wan_t2v_distill_4step_cfg.sh

I2V Complete Model:

bash scripts/wan/run_wan_i2v_distill_4step_cfg.sh

Step Distillation LoRA Inference Scripts

T2V LoRA:

bash scripts/wan/run_wan_t2v_distill_4step_cfg_lora.sh

I2V LoRA:

bash scripts/wan/run_wan_i2v_distill_4step_cfg_lora.sh

🔧 Service Deployment

Start Distillation Model Service

Modify the startup command in scripts/server/start_server.sh:

python -m lightx2v.api_server \
  --model_cls wan2.1_distill \
  --task t2v \
  --model_path $model_path \
  --config_json ${lightx2v_path}/configs/distill/wan_t2v_distill_4step_cfg.json \
  --port 8000 \
  --nproc_per_node 1

Run the service startup script:

bash scripts/server/start_server.sh

For more details, see Service Deployment.

Usage in Gradio Interface

See Gradio Documentation.