ConvParallel
class paddle.distributed.ConvParallel [source]
A strategy for enabling spatial parallelism on paddle.nn.Conv2D layers by sharding the input tensor along its Width (W) dimension.

When this ConvParallel configuration is applied to a Conv2D layer, the layer's input tensor has its width dimension split across the devices in the model-parallel group. This can help reduce the memory used by activations, especially for inputs with a large width.

To enable width-wise input sharding correctly, make sure your Conv2D layer satisfies the following conditions along the width dimension (a small validation sketch follows the list):
- Dilation must be set to 1.
- If no width padding is used:
  - The input width must be evenly divisible by the stride width.
  - The stride width must be equal to the kernel width.
- If width padding is used:
  - The stride width must be 1.
  - The total input width must be at least half the kernel width.
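Because these conditions are simple arithmetic on the width-dimension hyperparameters, they can be checked before applying the plan. The helper below is a hypothetical sketch, not part of the Paddle API; the function name check_width_shardable and its scalar parameters are illustrative only.

>>> # Hypothetical helper (not part of paddle.distributed): checks the
>>> # width-dimension conditions listed above for a Conv2D configuration.
>>> def check_width_shardable(input_w, kernel_w, stride_w, padding_w, dilation_w):
...     if dilation_w != 1:
...         return False  # dilation must be 1 along the width dimension
...     if padding_w == 0:
...         # No width padding: input width divisible by stride, stride == kernel.
...         return input_w % stride_w == 0 and stride_w == kernel_w
...     # Width padding used: stride must be 1, input width >= half the kernel width.
...     return stride_w == 1 and input_w >= kernel_w / 2
...
>>> check_width_shardable(input_w=224, kernel_w=3, stride_w=1, padding_w=1, dilation_w=1)
True
>>> check_width_shardable(input_w=224, kernel_w=2, stride_w=2, padding_w=0, dilation_w=1)
True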
Examples
>>> import paddle
>>> import paddle.nn as nn
>>> import paddle.distributed as dist
>>> class SimpleConvNet(nn.Layer):
...     def __init__(self, data_format="NCHW"):
...         super().__init__()
...         self.conv1 = nn.Conv2D(
...             3, 8, kernel_size=3, padding=1, data_format=data_format
...         )
...         self.relu = nn.ReLU()
...     def forward(self, x):
...         x = self.conv1(x)
...         return self.relu(x)
...
>>> model = SimpleConvNet(data_format="NCHW")
>>> # Shard conv1's input along the width (W) dimension across the
>>> # model-parallel group.
>>> mp_config = {
...     "parallelize_plan": {
...         "conv1": dist.ConvParallel()
...     }
... }
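A plan like mp_config is typically handed to Paddle's auto-parallel entry point together with the model. The sketch below assumes that paddle.distributed.parallelize accepts the plan under the "mp_config" key of its config argument, as with the related ColWiseParallel/RowWiseParallel strategies; verify this against the Paddle release you are using, and note that it must be run in a multi-device job (for example, launched via paddle.distributed.launch).

>>> # Assumed usage (check your Paddle release): apply the plan through the
>>> # auto-parallel intermediate API, which rewrites the model in place.
>>> opt = paddle.optimizer.AdamW(parameters=model.parameters())
>>> model, opt = dist.parallelize(model, opt, config={"mp_config": mp_config})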