grl.neural_network

ConcatenateLayer

class grl.neural_network.ConcatenateLayer[source]
Overview:

Concatenate the input tensors along the last dimension.

Interface:

__init__, forward

__init__()[source]
Overview:

Initialize the concatenate layer.

forward(*x)[source]
Overview:

Return the concatenated tensor.

Parameters:

x (torch.Tensor) – The input tensors to concatenate.

Return type:

Tensor
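
Example: a minimal usage sketch (the shapes below are illustrative, not from the library):

import torch
from grl.neural_network import ConcatenateLayer

layer = ConcatenateLayer()
a = torch.randn(8, 4)  # batch of 8, 4 features
b = torch.randn(8, 6)  # batch of 8, 6 features
out = layer(a, b)      # concatenated along the last dimension
assert out.shape == (8, 10)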

MultiLayerPerceptron

class grl.neural_network.MultiLayerPerceptron(hidden_sizes, output_size, activation, dropout=None, layernorm=False, final_activation=None, scale=None, shrink=None)[source]
Overview:

Multi-layer perceptron using fully-connected layers with activation, dropout, and layernorm. x -> fc1 -> act1 -> dropout -> layernorm -> … -> fcn -> actn -> out

Interface:

__init__, forward

__init__(hidden_sizes, output_size, activation, dropout=None, layernorm=False, final_activation=None, scale=None, shrink=None)[source]
Overview:

Initialize the multi-layer perceptron.

Parameters:
  • hidden_sizes (List[int]) – The list of hidden sizes.

  • output_size (int) – The number of channels in the output tensor.

  • activation – The optional activation function.

  • dropout (float, optional) – The probability of an element being zeroed by dropout. Default is None.

  • layernorm (bool) – Whether to use layernorm in the fully-connected block. Default is False.

  • final_activation – The optional activation function of the final layer. Default is None.

  • scale (float, optional) – The scale of the output tensor. Default is None.

  • shrink (float, optional) – The shrinkage factor of the output tensor. Default is None.

forward(x)[source]
Overview:

Return the output of the multi-layer perceptron.

Parameters:

x (torch.Tensor) – The input tensor.

Return type:

Tensor
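
Example: a minimal sketch. Two assumptions to verify against the implementation: that activation accepts a string name such as "relu", and that the first entry of hidden_sizes is the input width:

import torch
from grl.neural_network import MultiLayerPerceptron

mlp = MultiLayerPerceptron(
    hidden_sizes=[16, 64, 64],  # assumed: first entry is the input width
    output_size=2,
    activation="relu",          # assumed: string-keyed activation
    dropout=0.1,
    layernorm=True,
)
x = torch.randn(8, 16)
y = mlp(x)  # expected shape: (8, 2)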

ConcatenateMLP

class grl.neural_network.ConcatenateMLP(**kwargs)[source]
Overview:

Concatenate the input tensors along the last dimension and then pass through a multi-layer perceptron.

Interface:

__init__, forward

__init__(**kwargs)[source]
Overview:

Initialize the concatenate MLP.

Parameters:

**kwargs – The keyword arguments forwarded to the multi-layer perceptron.

forward(*x)[source]
Overview:

Return the output of the concatenate MLP.

Parameters:

x (torch.Tensor) – The input tensors to concatenate.
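
Example: a sketch of the common critic-style use, where several tensors are concatenated before the MLP. The keyword arguments below follow MultiLayerPerceptron, and the first hidden size is assumed to match the concatenated width (4 + 6 = 10):

import torch
from grl.neural_network import ConcatenateMLP

net = ConcatenateMLP(hidden_sizes=[10, 32], output_size=1, activation="relu")
state = torch.randn(8, 4)
action = torch.randn(8, 6)
value = net(state, action)  # concatenation, then the MLP; expected shape: (8, 1)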

TemporalSpatialResidualNet

class grl.neural_network.TemporalSpatialResidualNet(hidden_sizes, output_dim, t_dim, input_dim=None, condition_dim=None, condition_hidden_dim=None, t_condition_hidden_dim=None)[source]
Overview:

Temporal spatial residual network built from multiple TemporalSpatialResBlock modules.

Interface:

__init__, forward

__init__(hidden_sizes, output_dim, t_dim, input_dim=None, condition_dim=None, condition_hidden_dim=None, t_condition_hidden_dim=None)[source]
Overview:

Initialize the temporal spatial residual network.

Parameters:
  • hidden_sizes (List[int]) – The list of hidden sizes.

  • output_dim (int) – The number of channels in the output tensor.

  • t_dim (int) – The dimension of the temporal input.

  • input_dim (int, optional) – The number of channels in the input tensor. Default is None.

  • condition_dim (int, optional) – The number of channels in the condition tensor. Default is None.

  • condition_hidden_dim (int, optional) – The number of channels in the hidden condition tensor. Default is None.

  • t_condition_hidden_dim (int, optional) – The number of channels in the hidden temporal condition tensor. Default is None.

forward(t, x, condition=None)[source]
Overview:

Return the output of the temporal spatial residual network.

Parameters:
  • t (torch.Tensor) – The temporal input tensor.

  • x (torch.Tensor) – The input tensor.

  • condition (torch.Tensor, optional) – The condition tensor. Default is None.

Return type:

Tensor
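
Example: a sketch with illustrative sizes. The calling convention for t (raw per-sample timesteps, embedded internally to width t_dim) is an assumption to verify against the implementation:

import torch
from grl.neural_network import TemporalSpatialResidualNet

net = TemporalSpatialResidualNet(
    hidden_sizes=[256, 256],
    output_dim=8,
    t_dim=32,  # width of the temporal embedding
)
t = torch.rand(16)      # one diffusion timestep per batch element (assumed)
x = torch.randn(16, 8)  # samples being denoised
out = net(t, x)         # expected shape: (16, 8)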

DiT

class grl.neural_network.DiT(input_size=32, patch_size=2, in_channels=4, hidden_size=1152, depth=28, num_heads=16, mlp_ratio=4.0, class_dropout_prob=0.1, num_classes=1000, learn_sigma=True)[source]
Overview:

Diffusion model with a Transformer backbone. This follows the official implementation from the GitHub repo: https://github.com/facebookresearch/DiT/blob/main/models.py

Interfaces:

__init__, forward

__init__(input_size=32, patch_size=2, in_channels=4, hidden_size=1152, depth=28, num_heads=16, mlp_ratio=4.0, class_dropout_prob=0.1, num_classes=1000, learn_sigma=True)[source]
Overview:

Initialize the DiT model.

Parameters:
  • input_size (int, defaults to 32) – The input size.

  • patch_size (int, defaults to 2) – The patch size.

  • in_channels (int, defaults to 4) – The number of input channels.

  • hidden_size (int, defaults to 1152) – The hidden size.

  • depth (int, defaults to 28) – The depth.

  • num_heads (int, defaults to 16) – The number of attention heads.

  • mlp_ratio (float, defaults to 4.0) – The ratio of the MLP hidden size to the attention hidden size.

  • class_dropout_prob (float, defaults to 0.1) – The class dropout probability.

  • num_classes (int, defaults to 1000) – The number of classes.

  • learn_sigma (bool, defaults to True) – Whether to learn sigma.

forward(t, x, condition=None)[source]
Overview:

Forward pass of DiT.

Parameters:
  • t (torch.Tensor) – Tensor of diffusion timesteps.

  • x (torch.Tensor) – Tensor of spatial inputs (images or latent representations of images).

  • condition (Union[torch.Tensor, TensorDict], optional) – The input condition, such as class labels.

forward_with_cfg(t, x, condition=None, cfg_scale=1.0)[source]
Overview:

Forward pass of DiT, but also batches the unconditional forward pass for classifier-free guidance.

Parameters:
  • t (torch.Tensor) – Tensor of diffusion timesteps.

  • x (torch.Tensor) – Tensor of spatial inputs (images or latent representations of images).

  • condition (Union[torch.Tensor, TensorDict], optional) – The input condition, such as class labels.

  • cfg_scale (float, defaults to 1.0) – The scale for classifier-free guidance.

initialize_weights()[source]
Overview:

Initialize the weights of the model.

unpatchify(x)[source]
Overview:

Unpatchify the input tensor.

Parameters:

x (torch.Tensor) – The input tensor.

Returns:

The output tensor.

Return type:

imgs (torch.Tensor)

Shapes:

x (torch.Tensor): (N, T, patch_size**2 * C)

imgs (torch.Tensor): (N, H, W, C)
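
Example: a sketch using small, illustrative model sizes rather than the defaults. With learn_sigma=True the output is expected to carry 2 * in_channels channels (predicted noise and sigma), as in the official DiT implementation:

import torch
from grl.neural_network import DiT

model = DiT(
    input_size=32, patch_size=2, in_channels=4,
    hidden_size=384, depth=2, num_heads=6,  # small sizes for illustration
    num_classes=10,
)
x = torch.randn(2, 4, 32, 32)        # (N, in_channels, input_size, input_size)
t = torch.randint(0, 1000, (2,))     # diffusion timesteps
labels = torch.randint(0, 10, (2,))  # class labels as the condition
out = model(t, x, condition=labels)  # expected shape: (2, 8, 32, 32)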

DiT1D

class grl.neural_network.DiT1D(token_size, in_channels=4, hidden_size=1152, depth=28, num_heads=16, mlp_ratio=4.0, condition_embedder=None)[source]
Overview:

Transformer backbone of a diffusion model for 1D data.

Interfaces:

__init__, forward

__init__(token_size, in_channels=4, hidden_size=1152, depth=28, num_heads=16, mlp_ratio=4.0, condition_embedder=None)[source]
Overview:

Initialize the DiT model.

Parameters:
  • token_size (int) – The token size of the 1D input.

  • in_channels (Union[int, List[int], Tuple[int]]) – The number of input channels, defaults to 4.

  • hidden_size (int) – The hidden size of the attention layer, defaults to 1152.

  • depth (int) – The depth of the transformer, defaults to 28.

  • num_heads (int) – The number of attention heads, defaults to 16.

  • mlp_ratio (float) – The ratio of the MLP hidden size to the attention hidden size, defaults to 4.0.

  • condition_embedder (nn.Module, optional) – The module used to embed the condition, defaults to None.

forward(t, x, condition=None)[source]
Overview:

Forward pass of DiT1D for 1D data.

Parameters:
  • t (torch.Tensor) – Tensor of diffusion timesteps.

  • x (torch.Tensor) – Tensor of 1D inputs (originally, at t=0, the data samples or their latent representations).

  • condition (Union[torch.Tensor, TensorDict], optional) – The input condition, such as class labels.

initialize_weights()[source]
Overview:

Initialize the weights of the model.
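
Example: a sketch with small illustrative sizes. The layout of x used here, (N, token_size, in_channels), is an assumption to check against the implementation:

import torch
from grl.neural_network import DiT1D

model = DiT1D(token_size=16, in_channels=4, hidden_size=128, depth=2, num_heads=4)
t = torch.rand(2)
x = torch.randn(2, 16, 4)  # assumed layout: (N, token_size, in_channels)
out = model(t, x)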

DiT2D

grl.neural_network.DiT2D

alias of DiT

DiT3D

class grl.neural_network.DiT3D(patch_block_size=[10, 32, 32], patch_size=2, in_channels=4, hidden_size=1152, depth=28, num_heads=16, mlp_ratio=4.0, learn_sigma=True, convolved=False)[source]
Overview:

Transformer backbone of a diffusion model for data of 3D shape.

Interfaces:

__init__, forward

__init__(patch_block_size=[10, 32, 32], patch_size=2, in_channels=4, hidden_size=1152, depth=28, num_heads=16, mlp_ratio=4.0, learn_sigma=True, convolved=False)[source]
Overview:

Initialize the DiT model.

Parameters:
  • patch_block_size (Union[List[int], Tuple[int]]) – The size of the patch block, defaults to [10, 32, 32].

  • patch_size (Union[int, List[int], Tuple[int]]) – The patch size of each token in the attention layer, defaults to 2.

  • in_channels (Union[int, List[int], Tuple[int]]) – The number of input channels, defaults to 4.

  • hidden_size (int) – The hidden size of the attention layer, defaults to 1152.

  • depth (int) – The depth of the transformer, defaults to 28.

  • num_heads (int) – The number of attention heads, defaults to 16.

  • mlp_ratio (float) – The ratio of the MLP hidden size to the attention hidden size, defaults to 4.0.

  • learn_sigma (bool) – Whether to learn sigma, defaults to True.

  • convolved (bool) – Whether to use a fully-connected layer across all channels, defaults to False.

forward(t, x, condition=None)[source]
Overview:

Forward pass of DiT3D for 3D data.

Parameters:
  • t (torch.Tensor) – Tensor of diffusion timesteps.

  • x (torch.Tensor) – Tensor of inputs with spatial information (originally at t=0 it is tensor of videos or latent representations of videos).

  • condition (Union[torch.Tensor, TensorDict], optional) – The input condition, such as class labels.

initialize_weights()[source]
Overview:

Initialize the weights of the model.

unpatchify(x)[source]
Overview:

Unpatchify the output tensor of the attention layer.

Parameters:

x (torch.Tensor) – The input tensor of shape (N, total_patches = T’ * H’ * W’, patch_size[0] * patch_size[1] * patch_size[2] * C)

Returns:

The output tensor of shape (N, T, C, H, W).

Return type:

x (torch.Tensor)
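
Example: a sketch with small illustrative sizes; x follows the (N, T, C, H, W) layout implied by the documented output shape of unpatchify:

import torch
from grl.neural_network import DiT3D

model = DiT3D(
    patch_block_size=[4, 8, 8],  # (T, H, W) of the patch block
    patch_size=2,
    in_channels=4,
    hidden_size=128, depth=2, num_heads=4,  # small sizes for illustration
)
t = torch.rand(2)
x = torch.randn(2, 4, 4, 8, 8)  # (N, T, C, H, W)
out = model(t, x)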