moe_gate_dispatch_permute

paddle.incubate.nn.functional.moe_gate_dispatch_permute(x: Tensor, gate_logits: Tensor, corr_bias: Tensor, k: int, capacity: int, world_size: int, name: str | None = None) → tuple[Tensor, Tensor, Tensor, Tensor, Tensor]

Dispatches input tokens to their top-k experts according to the gate logits and permutes them for Mixture of Experts (MoE) computation.

Parameters

x – Input tensor of shape [batch_size, seq_len, hidden_dim].
gate_logits – Gate logits used to choose experts, of shape [batch_size, seq_len, num_experts].
corr_bias – Optional correction bias used to adjust the gate logits.
k – Number of top-scoring experts to select.
capacity – Maximum number of tokens each expert can handle.
world_size – Number of distributed processes.
name – Optional name for the operation.
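
To make the roles of k and capacity concrete, the following is a minimal NumPy sketch of top-k routing with a per-expert capacity limit. It is a conceptual illustration only, not the Paddle kernel: the helper name topk_route is hypothetical, the input is assumed to be already flattened to [num_tokens, num_experts], tokens over capacity are assumed to be dropped in token order, and corr_bias and world_size are ignored.

```python
import numpy as np

def topk_route(gate_logits, k, capacity):
    """Toy top-k routing with a per-expert capacity limit (illustrative only)."""
    num_tokens, num_experts = gate_logits.shape
    # Softmax over the expert axis gives per-token routing probabilities.
    shifted = gate_logits - gate_logits.max(axis=-1, keepdims=True)
    probs = np.exp(shifted)
    probs /= probs.sum(axis=-1, keepdims=True)
    # Indices of the k highest-probability experts per token, best first.
    expert_id = np.argsort(-probs, axis=-1, kind="stable")[:, :k]
    weights = np.take_along_axis(probs, expert_id, axis=-1)
    # Assumption: capacity is enforced in token order, so once an expert
    # is full, later assignments to it are dropped.
    load = np.zeros(num_experts, dtype=np.int64)
    kept = np.zeros(expert_id.shape, dtype=bool)
    for t in range(num_tokens):
        for j in range(k):
            e = expert_id[t, j]
            if load[e] < capacity:
                kept[t, j] = True
                load[e] += 1
    # Dropped assignments get a combine weight of zero.
    return expert_id, weights * kept, kept, load

# 4 tokens, 3 experts, top-2 routing, at most 2 tokens per expert.
logits = np.array([[2.0, 1.0, 0.0],
                   [2.0, 0.0, 1.0],
                   [3.0, 1.0, 0.5],
                   [0.0, 1.0, 2.0]])
expert_id, weights, kept, load = topk_route(logits, k=2, capacity=2)
print(expert_id.tolist())  # [[0, 1], [0, 2], [0, 1], [2, 1]]
print(kept.tolist())       # [[True, True], [True, True], [False, True], [True, False]]
```

In this toy run, the third token's first choice (expert 0) is dropped because the first two tokens already filled that expert's capacity.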

Returns

y: Output tensor after dispatch and permute.
combine_weights: Weights for combining experts’ outputs.
scatter_index: Indices for scattering inputs to experts.
expert_offset: Offset indices for each expert.
expert_id: IDs of selected experts for each position.

Return type

Tuple of five Tensors: (y, combine_weights, scatter_index, expert_offset, expert_id).
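
One way to picture how the permuted output, the scatter indices, and the per-expert offsets relate to each other is the NumPy sketch below. The helper permute_by_expert is hypothetical, and the layouts it produces (rows grouped by expert, cumulative end offsets, -1 for dropped assignments) are assumptions chosen for illustration; the actual tensor layouts returned by the Paddle operator may differ.

```python
import numpy as np

def permute_by_expert(x, expert_id, kept, num_experts):
    """Group kept (token, slot) assignments by expert (illustrative only)."""
    # Kept assignments in row-major (token, slot) order.
    token_idx, slot_idx = np.nonzero(kept)
    experts = expert_id[token_idx, slot_idx]
    # Stable sort by expert keeps the original token order within each expert.
    order = np.argsort(experts, kind="stable")
    y = x[token_idx[order]]
    # Assumed layout: expert_offset[e] is the end row of expert e's block in y.
    counts = np.bincount(experts, minlength=num_experts)
    expert_offset = np.cumsum(counts)
    # Assumed layout: scatter_index[t, j] is the row of y that holds token t's
    # j-th choice, or -1 if that assignment was dropped.
    scatter_index = np.full(expert_id.shape, -1, dtype=np.int64)
    scatter_index[token_idx[order], slot_idx[order]] = np.arange(order.size)
    return y, scatter_index, expert_offset

# 4 tokens with hidden size 2, routed to 3 experts with top-2 selection.
x = np.arange(8, dtype=np.float64).reshape(4, 2)
expert_id = np.array([[0, 1], [0, 2], [0, 1], [2, 1]])
kept = np.array([[True, True], [True, True], [False, True], [True, False]])
y, scatter_index, expert_offset = permute_by_expert(x, expert_id, kept, 3)
print(scatter_index.tolist())  # [[0, 2], [1, 4], [-1, 3], [5, -1]]
print(expert_offset.tolist())  # [2, 4, 6]
```

Under these assumptions, rows 0 to 1 of y belong to expert 0, rows 2 to 3 to expert 1, and rows 4 to 5 to expert 2, so each expert can process a contiguous slice.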