visual_backbone

class Conv2d(*args, **kwargs)[source]

Bases: paddle.nn.layer.conv.Conv2D

forward(x)[source]

Defines the computation performed at every call. Should be overridden by all subclasses.

Parameters
  • *inputs (tuple) – unpacked tuple arguments

  • **kwargs (dict) – unpacked dict arguments

class CNNBlockBase(in_channels, out_channels, stride)[source]

Bases: paddle.fluid.dygraph.layers.Layer

ResNetBlockBase

alias of paddlenlp.transformers.layoutxlm.visual_backbone.CNNBlockBase

class ShapeSpec(channels=None, height=None, width=None, stride=None)[source]

Bases: paddlenlp.transformers.layoutxlm.visual_backbone._ShapeSpec

get_norm(norm, out_channels)[source]
Parameters
  • norm (str or callable) – either one of BN, SyncBN, FrozenBN, GN; or a callable that takes a channel number and returns the normalization layer as a nn.Layer.

  • out_channels (int) – out_channels

Returns

the normalization layer

Return type

nn.Layer or None

class FrozenBatchNorm(num_channels)[source]

Bases: paddle.fluid.dygraph.nn.BatchNorm

class Backbone[source]

Bases: paddle.fluid.dygraph.layers.Layer

abstract forward(*args)[source]

Defines the computation performed at every call. Should be overridden by all subclasses.

Parameters
  • *inputs (tuple) – unpacked tuple arguments

  • **kwargs (dict) – unpacked dict arguments

class BasicBlock(in_channels, out_channels, *, stride=1, norm='BN')[source]

Bases: paddlenlp.transformers.layoutxlm.visual_backbone.CNNBlockBase

The basic residual block for ResNet-18 and ResNet-34 defined in the ResNet paper, with two 3x3 conv layers and a projection shortcut if needed.

class BottleneckBlock(in_channels, out_channels, *, bottleneck_channels, stride=1, num_groups=1, norm='BN', stride_in_1x1=False, dilation=1)[source]

Bases: paddlenlp.transformers.layoutxlm.visual_backbone.CNNBlockBase

The standard bottleneck residual block used by ResNet-50, 101 and 152 defined in the ResNet paper. It contains 3 conv layers with kernels 1x1, 3x3, 1x1, and a projection shortcut if needed.
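The 1x1, 3x3, 1x1 layout can be made concrete with a small pure-Python sketch. It only lists the (kernel size, input channels, output channels, stride) of each conv and builds no Paddle layers. The helper names are hypothetical and not part of paddlenlp. Where the stride is applied (on the first 1x1 conv when stride_in_1x1 is true, otherwise on the 3x3 conv) and when a projection shortcut is created are assumptions inferred from the parameter names and the description above.

```python
def bottleneck_conv_specs(in_channels, out_channels, bottleneck_channels,
                          stride=1, stride_in_1x1=False):
    """Return (kernel_size, in_ch, out_ch, stride) for the three main convs.

    Hypothetical helper for illustration only; not part of paddlenlp.
    """
    # Assumption: the block stride is applied exactly once, either on the
    # first 1x1 conv or on the 3x3 conv, depending on stride_in_1x1.
    stride_1x1, stride_3x3 = (stride, 1) if stride_in_1x1 else (1, stride)
    return [
        (1, in_channels, bottleneck_channels, stride_1x1),          # reduce
        (3, bottleneck_channels, bottleneck_channels, stride_3x3),  # spatial
        (1, bottleneck_channels, out_channels, 1),                  # expand
    ]


def needs_projection_shortcut(in_channels, out_channels):
    # Assumption: the identity shortcut only works when the channel counts
    # match, so a projection is needed whenever they differ.
    return in_channels != out_channels


specs = bottleneck_conv_specs(256, 512, bottleneck_channels=128, stride=2)
```

With stride_in_1x1 left at its default, the 3x3 conv in this sketch carries the stride of 2 and both 1x1 convs keep stride 1.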

forward(x)[source]

Defines the computation performed at every call. Should be overridden by all subclasses.

Parameters
  • *inputs (tuple) – unpacked tuple arguments

  • **kwargs (dict) – unpacked dict arguments

class DeformBottleneckBlock(in_channels, out_channels, *, bottleneck_channels, stride=1, num_groups=1, norm='BN', stride_in_1x1=False, dilation=1, deform_modulated=False, deform_num_groups=1)[source]

Bases: paddlenlp.transformers.layoutxlm.visual_backbone.CNNBlockBase

Similar to BottleneckBlock, but with deformable conv in the 3x3 convolution.

class BasicStem(in_channels=3, out_channels=64, norm='BN')[source]

Bases: paddlenlp.transformers.layoutxlm.visual_backbone.CNNBlockBase

The standard ResNet stem (layers before the first residual block), with a conv, relu and max_pool.

forward(x)[source]

Defines the computation performed at every call. Should be overridden by all subclasses.

Parameters
  • *inputs (tuple) – unpacked tuple arguments

  • **kwargs (dict) – unpacked dict arguments

class ResNet(stem, stages, num_classes=None, out_features=None, freeze_at=0)[source]

Bases: paddlenlp.transformers.layoutxlm.visual_backbone.Backbone

forward(x)[source]
Parameters

x – Tensor of shape (N,C,H,W). H and W must each be a multiple of self.size_divisibility.

Returns

names and the corresponding features

Return type

dict[str->Tensor]
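The divisibility requirement on H and W usually means inputs are padded before the call. The sketch below computes the smallest valid size for a given image. The function name padded_size and the divisor 32 are illustrative assumptions; the actual value of size_divisibility depends on how the backbone is configured.

```python
def padded_size(height, width, divisibility):
    """Smallest (H, W) not below the input that is a multiple of divisibility.

    Hypothetical helper for illustration only; not part of paddlenlp.
    """
    if divisibility <= 1:
        # Nothing to do when no divisibility constraint applies.
        return height, width

    def round_up(value):
        # Integer ceiling division, then scale back to a multiple.
        return -(-value // divisibility) * divisibility

    return round_up(height), round_up(width)


# Example: a 223 x 500 image with an assumed divisor of 32.
target = padded_size(223, 500, 32)
```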

static make_stage(block_class, num_blocks, *, in_channels, out_channels, **kwargs)[source]

Create a list of blocks of the same type that forms one ResNet stage.

Parameters
  • block_class (type) – a subclass of CNNBlockBase that’s used to create all blocks in this stage. A module of this type must not change spatial resolution of inputs unless its stride != 1.

  • num_blocks (int) – number of blocks in this stage

  • in_channels (int) – input channels of the entire stage.

  • out_channels (int) – output channels of every block in the stage.

  • kwargs – other arguments passed to the constructor of block_class. If the argument name is “xx_per_block”, the argument is a list of values to be passed to each block in the stage. Otherwise, the same argument is passed to every block in the stage.

Returns

a list of block modules.

Return type

list[CNNBlockBase]

Examples:

stage = ResNet.make_stage(
    BottleneckBlock, 3, in_channels=16, out_channels=64,
    bottleneck_channels=16, num_groups=1,
    stride_per_block=[2, 1, 1],
    dilation_per_block=[1, 1, 2]
)

Usually, layers that produce the same feature map spatial size are defined as one “stage” (in the FPN paper). Under this definition, stride_per_block[1:] should all be 1.
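The “xx_per_block” rule described for kwargs can be sketched in plain Python. The function below does not construct any blocks; it only returns the keyword arguments each block would receive. The name plan_stage_kwargs is hypothetical, and the chaining of in_channels (every block after the first takes out_channels as its input) is an assumption inferred from the parameter descriptions above.

```python
def plan_stage_kwargs(num_blocks, in_channels, out_channels, **kwargs):
    """Return one kwargs dict per block, expanding '*_per_block' arguments.

    Hypothetical helper for illustration only; not part of paddlenlp.
    """
    suffix = "_per_block"
    plans = []
    for i in range(num_blocks):
        current = {"in_channels": in_channels, "out_channels": out_channels}
        for key, value in kwargs.items():
            if key.endswith(suffix):
                if len(value) != num_blocks:
                    raise ValueError(
                        f"{key} needs {num_blocks} values, got {len(value)}"
                    )
                # Strip the suffix and pick the i-th value for this block.
                current[key[: -len(suffix)]] = value[i]
            else:
                # Shared arguments are passed unchanged to every block.
                current[key] = value
        plans.append(current)
        # Assumption: later blocks consume the previous block's output.
        in_channels = out_channels
    return plans


plans = plan_stage_kwargs(
    3, in_channels=16, out_channels=64,
    bottleneck_channels=16, stride_per_block=[2, 1, 1],
)
```

Because the suffix is simply stripped, the remaining name (here stride) has to be an argument that block_class accepts.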

static make_default_stages(depth, block_class=None, **kwargs)[source]

Create a list of ResNet stages from a pre-defined depth (one of 18, 34, 50, 101, 152). If it doesn’t create the ResNet variant you need, use make_stage() instead for fine-grained customization.

Parameters
  • depth (int) – depth of ResNet

  • block_class (type) – the CNN block class. Has to accept the bottleneck_channels argument for depth >= 50. By default it is BasicBlock or BottleneckBlock, based on the depth.

  • kwargs – other arguments to pass to make_stage. Should not contain stride and channels, as they are predefined for each depth.

Returns

modules in all stages; see arguments of ResNet.__init__.

Return type

list[list[CNNBlockBase]]
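To make "pre-defined depth" concrete, the sketch below tabulates the standard ResNet configurations from the ResNet paper (blocks per stage and channel widths). It is an assumption that this module uses exactly these values internally. The names BLOCKS_PER_STAGE and default_stage_plan are hypothetical, and no Paddle layers are created.

```python
# Blocks per stage for each standard ResNet depth (from the ResNet paper).
BLOCKS_PER_STAGE = {
    18: (2, 2, 2, 2),
    34: (3, 4, 6, 3),
    50: (3, 4, 6, 3),
    101: (3, 4, 23, 3),
    152: (3, 8, 36, 3),
}


def default_stage_plan(depth):
    """Describe the four stages of a standard ResNet of the given depth.

    Hypothetical helper for illustration only; not part of paddlenlp.
    """
    if depth not in BLOCKS_PER_STAGE:
        raise ValueError(f"unsupported depth: {depth}")
    bottleneck = depth >= 50
    if bottleneck:
        in_channels = (64, 256, 512, 1024)
        out_channels = (256, 512, 1024, 2048)
    else:
        in_channels = (64, 64, 128, 256)
        out_channels = (64, 128, 256, 512)
    # Assumption: only the first block of stages 2 to 4 downsamples.
    first_strides = (1, 2, 2, 2)
    block = "BottleneckBlock" if bottleneck else "BasicBlock"
    return [
        {"block": block, "num_blocks": n, "first_stride": s,
         "in_channels": i, "out_channels": o}
        for n, s, i, o in zip(BLOCKS_PER_STAGE[depth], first_strides,
                              in_channels, out_channels)
    ]


plan50 = default_stage_plan(50)
```

Each entry corresponds to one inner list of the returned list[list[CNNBlockBase]], which is why stride and channel arguments should not be passed again through kwargs.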

class LastLevelMaxPool[source]

Bases: paddle.fluid.dygraph.layers.Layer

This module is used in the original FPN to generate a downsampled P6 feature from P5.
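As a rough illustration of what "downsampled" means here, the sketch below halves the resolution of a single-channel feature map stored as a nested list. It assumes the downsampling is a max pool with kernel size 1 and stride 2, which reduces to keeping every second row and column; the source does not state the kernel size, and the function name downsample_p5 is hypothetical.

```python
def downsample_p5(feature_map):
    """Keep every second row and column of a 2-D list.

    Equivalent to a max pool with kernel size 1 and stride 2 (an assumption).
    Hypothetical helper for illustration only; not part of paddlenlp.
    """
    return [row[::2] for row in feature_map[::2]]


p5 = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
p6 = downsample_p5(p5)
```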

forward(x)[source]

Defines the computation performed at every call. Should be overridden by all subclasses.

Parameters
  • *inputs (tuple) – unpacked tuple arguments

  • **kwargs (dict) – unpacked dict arguments

class FPN(bottom_up, in_features, out_channels, norm='', top_block=None, fuse_type='sum')[source]

Bases: paddlenlp.transformers.layoutxlm.visual_backbone.Backbone

forward(x)[source]
Parameters

x (dict[str->Tensor]) – mapping feature map name (e.g., “res5”) to feature map tensor for each feature level in high to low resolution order.

Returns

mapping from feature map name to FPN feature map tensor in high to low resolution order. Returned feature names follow the FPN paper convention: “p<stage>”, where stage has stride = 2 ** stage, e.g., [“p2”, “p3”, …, “p6”].

Return type

dict[str->Tensor]
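The naming convention for the returned features can be checked with a few lines of Python. The helper fpn_feature_name is hypothetical and only encodes the rule stated above (stride = 2 ** stage); the list of strides in the example is an illustrative assumption.

```python
def fpn_feature_name(stride):
    """Map a feature stride to its FPN name, e.g. 4 -> 'p2'.

    Hypothetical helper for illustration only; not part of paddlenlp.
    """
    # The stride must be a positive power of two for the rule to apply.
    if stride <= 0 or stride & (stride - 1) != 0:
        raise ValueError(f"stride must be a power of two, got {stride}")
    stage = stride.bit_length() - 1  # exact log2 for powers of two
    return f"p{stage}"


# Listed from high to low resolution, matching the documented order.
names = [fpn_feature_name(s) for s in (4, 8, 16, 32, 64)]
```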

make_stage(*args, **kwargs)[source]

Deprecated alias for backward compatibility.

build_resnet_backbone(cfg, input_shape=None)[source]

Create a ResNet instance from config.

Returns

a ResNet instance.

Return type

ResNet

class VisualBackbone(config)[source]

Bases: paddle.fluid.dygraph.layers.Layer

forward(images)[source]

Defines the computation performed at every call. Should be overridden by all subclasses.

Parameters
  • *inputs (tuple) – unpacked tuple arguments

  • **kwargs (dict) – unpacked dict arguments