softmax

paddle.compat.softmax(input: Tensor, dim: int | None = None, dtype: DTypeLike | None = None, *, out: Tensor | None = None) → Tensor [source]

This operator implements paddle.compat.softmax. The calculation process is as follows:

  1. The dimension dim of input is permuted to the last position.

  2. The input is then logically flattened to a 2-D matrix. The matrix's second dimension (row length) equals the size of dimension dim of input, and its first dimension (column length) is the product of all the other dimensions of input. For each row of this matrix, the softmax operator squashes the K-dimensional vector of arbitrary real values (K is the row length, i.e. the size of input's dimension dim) into a K-dimensional vector of real values in the range [0, 1] that sum to 1.

  3. After the softmax operation, the inverse of steps 1 and 2 is applied to restore the result to the same shape as input.
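The three steps above can be sketched in pure Python. This is a minimal illustration for a 2-D input with dim=0 (where step 1 reduces to a transpose), not the actual Paddle kernel:

```python
import math

# A [2, 3] input; we apply softmax along dim=0.
x = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]

# Step 1: permute dim 0 to the last position (transpose for 2-D).
xt = [list(row) for row in zip(*x)]          # shape [3, 2]

# Step 2: the result is a 2-D matrix whose row length equals the size
# of input's dim; apply softmax row by row.
def row_softmax(row):
    m = max(row)                             # subtract max for stability
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]

yt = [row_softmax(row) for row in xt]

# Step 3: invert the permutation to restore the original layout.
y = [list(row) for row in zip(*yt)]
```

Each column of y (the entries along dim=0) now sums to 1.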

For each vector along dimension dim, the operator computes the exponential of every entry and the sum of those exponentials; each output entry is the ratio of its exponential to that sum.

For each row \(i\) and each column \(j\) in the matrix, we have:

\[softmax[i, j] = \frac{\exp(input[i, j])}{\sum_j \exp(input[i, j])}\]
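As a quick sanity check of this formula, a pure-Python version for a single row (subtracting the row maximum before exponentiating, a standard stability trick that does not change the result) reproduces the first row of Case 1 below:

```python
import math

def softmax_row(row):
    # Subtract the max before exponentiating for numerical stability;
    # the constant cancels in the ratio, so the result is unchanged.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]

result = softmax_row([2.0, 3.0, 4.0, 5.0])
# ≈ [0.0320586, 0.08714432, 0.23688282, 0.64391426]
```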

Example:

Case 1:
  Input:
    input.shape = [2, 3, 4]
    input.data = [[[2.0, 3.0, 4.0, 5.0],
                   [3.0, 4.0, 5.0, 6.0],
                   [7.0, 8.0, 8.0, 9.0]],
                  [[1.0, 2.0, 3.0, 4.0],
                   [5.0, 6.0, 7.0, 8.0],
                   [6.0, 7.0, 8.0, 9.0]]]

  Attrs:
    dim = -1

  Output:
    out.shape = [2, 3, 4]
    out.data = [[[0.0320586 , 0.08714432, 0.23688282, 0.64391426],
                 [0.0320586 , 0.08714432, 0.23688282, 0.64391426],
                 [0.07232949, 0.19661193, 0.19661193, 0.53444665]],
                [[0.0320586 , 0.08714432, 0.23688282, 0.64391426],
                 [0.0320586 , 0.08714432, 0.23688282, 0.64391426],
                 [0.0320586 , 0.08714432, 0.23688282, 0.64391426]]]

Case 2:
  Input:
    input.shape = [2, 3, 4]
    input.data = [[[2.0, 3.0, 4.0, 5.0],
                   [3.0, 4.0, 5.0, 6.0],
                   [7.0, 8.0, 8.0, 9.0]],
                  [[1.0, 2.0, 3.0, 4.0],
                   [5.0, 6.0, 7.0, 8.0],
                   [6.0, 7.0, 8.0, 9.0]]]
  Attrs:
    dim = 1

  Output:
    out.shape = [2, 3, 4]
    out.data = [[[0.00657326, 0.00657326, 0.01714783, 0.01714783],
                 [0.01786798, 0.01786798, 0.04661262, 0.04661262],
                 [0.97555875, 0.97555875, 0.93623955, 0.93623955]],
                [[0.00490169, 0.00490169, 0.00490169, 0.00490169],
                 [0.26762315, 0.26762315, 0.26762315, 0.26762315],
                 [0.72747516, 0.72747516, 0.72747516, 0.72747516]]]
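Case 2 can be reproduced with a short pure-Python check that applies softmax along dim=1, i.e. over the three entries x[i][j][k] for fixed batch i and column k:

```python
import math

x = [[[2.0, 3.0, 4.0, 5.0],
      [3.0, 4.0, 5.0, 6.0],
      [7.0, 8.0, 8.0, 9.0]],
     [[1.0, 2.0, 3.0, 4.0],
      [5.0, 6.0, 7.0, 8.0],
      [6.0, 7.0, 8.0, 9.0]]]

out = [[[0.0] * 4 for _ in range(3)] for _ in range(2)]
for i in range(2):            # batch dimension
    for k in range(4):        # last dimension
        col = [x[i][j][k] for j in range(3)]   # entries along dim=1
        m = max(col)                           # max-subtraction for stability
        exps = [math.exp(v - m) for v in col]
        s = sum(exps)
        for j in range(3):
            out[i][j][k] = exps[j] / s
```

Every slice out[i][:][k] sums to 1, matching the table above.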

Parameters
  • input (Tensor) – The input Tensor with data type bfloat16, float16, float32, float64.

  • dim (int, optional) – The dimension along which to perform softmax calculations. It should be in the range [-D, D), where D is the rank of input. If dim < 0, it works the same way as \(dim + D\). Default is None.

  • dtype (str, optional) – The data type of the output tensor; can be bfloat16, float16, float32, float64. Default is None, in which case the output keeps input's data type.

  • out (Tensor, optional) – The output Tensor.
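The range check and negative-dim normalization described for the dim parameter can be illustrated with a hypothetical helper (normalize_dim is not a Paddle API, only a sketch of the documented behavior):

```python
def normalize_dim(dim: int, rank: int) -> int:
    # Valid dims lie in [-rank, rank); negative dims wrap as dim + rank.
    if not -rank <= dim < rank:
        raise ValueError(f"dim {dim} out of range [-{rank}, {rank})")
    return dim + rank if dim < 0 else dim

print(normalize_dim(-1, 3))  # -> 2 (last dim of a rank-3 tensor)
print(normalize_dim(1, 3))   # -> 1
```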

Returns

A Tensor with the same shape as input, and the same data type as input (or dtype, if it is specified).

Examples

>>> import paddle

>>> x = paddle.to_tensor([[[2.0, 3.0, 4.0, 5.0],
...                        [3.0, 4.0, 5.0, 6.0],
...                        [7.0, 8.0, 8.0, 9.0]],
...                       [[1.0, 2.0, 3.0, 4.0],
...                        [5.0, 6.0, 7.0, 8.0],
...                        [6.0, 7.0, 8.0, 9.0]]], dtype='float32')
>>> out1 = paddle.compat.softmax(x, -1)
>>> out2 = paddle.compat.softmax(x, -1, dtype='float64')
>>> # out1's data type is float32; out2's data type is float64
>>> # out1 and out2's values are as follows:
>>> print(out1)
Tensor(shape=[2, 3, 4], dtype=float32, place=Place(cpu), stop_gradient=True,
[[[0.03205860, 0.08714432, 0.23688284, 0.64391428],
  [0.03205860, 0.08714432, 0.23688284, 0.64391428],
  [0.07232949, 0.19661194, 0.19661194, 0.53444666]],
 [[0.03205860, 0.08714432, 0.23688284, 0.64391428],
  [0.03205860, 0.08714432, 0.23688284, 0.64391428],
  [0.03205860, 0.08714432, 0.23688284, 0.64391428]]])
>>> print(out2)
Tensor(shape=[2, 3, 4], dtype=float64, place=Place(cpu), stop_gradient=True,
[[[0.03205860, 0.08714432, 0.23688282, 0.64391426],
  [0.03205860, 0.08714432, 0.23688282, 0.64391426],
  [0.07232949, 0.19661193, 0.19661193, 0.53444665]],
 [[0.03205860, 0.08714432, 0.23688282, 0.64391426],
  [0.03205860, 0.08714432, 0.23688282, 0.64391426],
  [0.03205860, 0.08714432, 0.23688282, 0.64391426]]])