moe_gate_dispatch_and_quant

paddle.incubate.nn.functional.moe_gate_dispatch_and_quant(x: Tensor, gate_logits: Tensor, corr_bias: Tensor, k: int, capacity: int, use_pad: bool, use_pow2_scale: bool) → tuple[Tensor, Tensor, Tensor, Tensor, Tensor, Tensor]

Dispatches input tokens to their top-k experts according to the gate logits and quantizes the dispatched outputs to FP8.
Parameters
  • x – Input tensor of shape [batch_size, seq_len, hidden_dim].

  • gate_logits – Gate logits used to select experts, of shape [batch_size, seq_len, num_experts].

  • corr_bias – Optional correction bias added to adjust the gate logits.

  • k – Number of top-k experts selected per token.

  • capacity – The maximum number of tokens an expert can handle.

  • use_pad – Whether padding is applied to produce a uniform input length.

  • use_pow2_scale – Whether power-of-two scaling is applied during quantization.

Returns

  • fp8_out: Processed output tensor in FP8 format.

  • scale: Scaling factors used for the FP8 quantization.

  • combine_weights: Weights for combining experts’ outputs.

  • scatter_index: Indices for scattering inputs to experts.

  • expert_offset: Offset indices for each expert.

  • expert_id: IDs of selected experts for each position.

Return type

Tuple of Tensors containing (fp8_out, scale, combine_weights, scatter_index, expert_offset, expert_id).
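Examples

A minimal usage sketch, not an official example: the tensor shapes, the bfloat16 input dtype, and the [num_experts] shape assumed for corr_bias are illustrative assumptions; only the function name and its parameters come from this reference.

import paddle
import paddle.incubate.nn.functional as F

batch_size, seq_len, hidden_dim, num_experts = 2, 128, 512, 8
k, capacity = 2, 64  # top-2 routing, at most 64 tokens per expert

# Assumed shapes and dtypes, for illustration only.
x = paddle.randn([batch_size, seq_len, hidden_dim]).astype('bfloat16')
gate_logits = paddle.randn([batch_size, seq_len, num_experts])
corr_bias = paddle.zeros([num_experts])  # assumed per-expert bias shape

fp8_out, scale, combine_weights, scatter_index, expert_offset, expert_id = (
    F.moe_gate_dispatch_and_quant(
        x,
        gate_logits,
        corr_bias,
        k=k,
        capacity=capacity,
        use_pad=True,
        use_pow2_scale=True,
    )
)

# fp8_out holds the dispatched tokens in FP8 and scale the matching quantization
# scales; combine_weights, scatter_index, expert_offset, and expert_id describe the
# routing so the experts' outputs can later be gathered and combined.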