moe_gate_dispatch_permute

paddle.incubate.nn.functional.moe_gate_dispatch_permute(x: Tensor, gate_logits: Tensor, corr_bias: Tensor, k: int, capacity: int, world_size: int, name: str | None = None) -> tuple[Tensor, Tensor, Tensor, Tensor, Tensor]

Dispatch and permute for Mixture of Experts (MoE).

Parameters
  • x – Input tensor of shape [batch_size, seq_len, hidden_dim].

  • gate_logits – Gate logits used to choose experts, of shape [batch_size, seq_len, num_experts].

  • corr_bias – Optional correction bias to adjust gate logits.

  • k – Number of top-scoring experts selected for each token.

  • capacity – The maximum number of tokens an expert can handle.

  • world_size – Number of distributed processes.

  • name – Optional name for the operation.
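The roles of gate_logits, corr_bias, k, and capacity can be illustrated with a minimal pure-Python sketch of top-k routing under a capacity limit. This is not the implementation behind moe_gate_dispatch_permute. The helper name topk_dispatch_sketch, the softmax normalization, the way the bias is applied (to expert selection only), and the first-come, first-served capacity policy are all assumptions made for illustration.

```python
import math


def topk_dispatch_sketch(gate_logits, k, capacity, corr_bias=None):
    """Illustrative top-k routing with a per-expert capacity limit.

    gate_logits: list of rows, one per token, each of length num_experts.
    Returns (expert_id, combine_weights, kept), each with k entries per token.
    """
    num_experts = len(gate_logits[0])
    bias = corr_bias if corr_bias is not None else [0.0] * num_experts
    load = [0] * num_experts  # tokens accepted so far by each expert
    expert_id, combine_weights, kept = [], [], []

    for row in gate_logits:
        # Numerically stable softmax over the experts.
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        probs = [e / total for e in exps]

        # Assumption: the bias only changes which experts are picked.
        scores = [p + b for p, b in zip(probs, bias)]
        chosen = sorted(range(num_experts), key=lambda e: -scores[e])[:k]

        row_kept, row_weights = [], []
        for e in chosen:
            accepted = load[e] < capacity
            if accepted:
                load[e] += 1
            row_kept.append(accepted)
            # Assumption: an assignment dropped for capacity gets weight zero.
            row_weights.append(probs[e] if accepted else 0.0)

        expert_id.append(chosen)
        kept.append(row_kept)
        combine_weights.append(row_weights)

    return expert_id, combine_weights, kept


logits = [[2.0, 1.0, 0.0],
          [2.0, 0.0, 1.0],
          [3.0, 0.5, 0.1]]
expert_id, combine_weights, kept = topk_dispatch_sketch(logits, k=2, capacity=2)
```

In this toy input all three tokens rank expert 0 first, so with capacity=2 the third token's assignment to expert 0 is dropped while its second choice is still accepted.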

Returns

Tuple of Tensors containing:

  • y: Output tensor after dispatch and permute.

  • combine_weights: Weights for combining experts’ outputs.

  • scatter_index: Indices for scattering inputs to experts.

  • expert_offset: Offset indices for each expert.

  • expert_id: IDs of selected experts for each position.

Return type

tuple[Tensor, Tensor, Tensor, Tensor, Tensor]
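As a rough illustration of what "weights for combining experts' outputs" means, the sketch below merges per-token expert results with a weighted sum. The helper combine_sketch and its nested-list layout are hypothetical and chosen for readability; the layout of the tensors returned by moe_gate_dispatch_permute is not described on this page and may differ.

```python
def combine_sketch(expert_outputs, combine_weights):
    """Weighted sum of expert outputs for each token.

    expert_outputs[t][s] is the hidden vector produced for token t by its
    s-th selected expert; combine_weights[t][s] is the matching weight.
    """
    combined = []
    for outputs, weights in zip(expert_outputs, combine_weights):
        hidden_dim = len(outputs[0])
        token_out = [0.0] * hidden_dim
        for vec, w in zip(outputs, weights):
            for d in range(hidden_dim):
                token_out[d] += w * vec[d]
        combined.append(token_out)
    return combined


# One token, two selected experts, hidden_dim = 2.
outputs = [[[1.0, 2.0], [3.0, 4.0]]]
weights = [[0.75, 0.25]]
result = combine_sketch(outputs, weights)
```

A weight of zero removes that expert's contribution for the token, which is how an assignment dropped for capacity reasons would typically be ignored.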