moe_gate_dispatch_and_quant

paddle.incubate.nn.functional.moe_gate_dispatch_and_quant(x: Tensor, gate_logits: Tensor, corr_bias: Tensor, k: int, capacity: int, use_pad: bool, use_pow2_scale: bool) → tuple[Tensor, Tensor, Tensor, Tensor, Tensor, Tensor]

Dispatches input tokens to their top-k experts according to the gate logits and quantizes the dispatched outputs to FP8.
Parameters
  • x – Input tensor of shape [batch_size, seq_len, hidden_dim].

  • gate_logits – Gate logits used to select experts, of shape [batch_size, seq_len, num_experts].

  • corr_bias – Optional correction bias added to adjust the gate logits.

  • k – Number of top-k experts selected per token.

  • capacity – The maximum number of tokens an expert can handle.

  • use_pad – Whether padding is applied to produce a uniform input length.

  • use_pow2_scale – Whether power-of-two scaling is applied during quantization.

Returns

  • fp8_out: Processed output tensor in FP8 format.

  • scale: Scaling factors used for the FP8 quantization.

  • combine_weights: Weights for combining experts’ outputs.

  • scatter_index: Indices for scattering inputs to experts.

  • expert_offset: Offset indices for each expert.

  • expert_id: IDs of selected experts for each position.

Return type

Tuple of Tensors containing (fp8_out, scale, combine_weights, scatter_index, expert_offset, expert_id).
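Examples

A minimal usage sketch, not an official example: the tensor shapes, the bfloat16 input dtype, and the [num_experts] shape assumed for corr_bias are illustrative assumptions; only the function name and its parameters come from this reference.

import paddle
import paddle.incubate.nn.functional as F

batch_size, seq_len, hidden_dim, num_experts = 2, 128, 512, 8
k, capacity = 2, 64  # top-2 routing, at most 64 tokens per expert

# Assumed shapes and dtypes, for illustration only.
x = paddle.randn([batch_size, seq_len, hidden_dim]).astype('bfloat16')
gate_logits = paddle.randn([batch_size, seq_len, num_experts])
corr_bias = paddle.zeros([num_experts])  # assumed per-expert bias shape

fp8_out, scale, combine_weights, scatter_index, expert_offset, expert_id = (
    F.moe_gate_dispatch_and_quant(
        x,
        gate_logits,
        corr_bias,
        k=k,
        capacity=capacity,
        use_pad=True,
        use_pow2_scale=True,
    )
)

# fp8_out holds the dispatched tokens in FP8 and scale the matching quantization
# scales; combine_weights, scatter_index, expert_offset, and expert_id describe the
# routing so the experts' outputs can later be gathered and combined.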