grl.neural_network

ConcatenateLayer

class grl.neural_network.ConcatenateLayer[source]
Overview:

Concatenate the input tensors along the last dimension.

Interface:

__init__, forward

__init__()[source]
Overview:

Initialize the concatenate layer.

forward(*x)[source]
Overview:

Return the concatenated tensor.

Parameters:

x (torch.Tensor) – The input tensors to concatenate.

Return type:

Tensor
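
Example: a minimal usage sketch (the shapes below are illustrative, not from the library):

import torch
from grl.neural_network import ConcatenateLayer

layer = ConcatenateLayer()
a = torch.randn(8, 4)  # batch of 8, 4 features
b = torch.randn(8, 6)  # batch of 8, 6 features
out = layer(a, b)      # concatenated along the last dimension
assert out.shape == (8, 10)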

MultiLayerPerceptron

class grl.neural_network.MultiLayerPerceptron(hidden_sizes, output_size, activation, dropout=None, layernorm=False, final_activation=None, scale=None, shrink=None)[source]
Overview:

Multi-layer perceptron using fully-connected layers with activation, dropout, and layernorm. x -> fc1 -> act1 -> dropout -> layernorm -> … -> fcn -> actn -> out

Interface:

__init__, forward

__init__(hidden_sizes, output_size, activation, dropout=None, layernorm=False, final_activation=None, scale=None, shrink=None)[source]
Overview:

Initialize the multi-layer perceptron.

Parameters:
  • hidden_sizes (List[int]) – The list of hidden sizes.

  • output_size (int) – The number of channels in the output tensor.

  • activation – The optional activation function.

  • dropout (float, optional) – The probability of an element being zeroed by dropout. Default is None.

  • layernorm (bool) – Whether to use layernorm in the fully-connected block. Default is False.

  • final_activation – The optional activation function of the final layer. Default is None.

  • scale (float, optional) – The scale of the output tensor. Default is None.

  • shrink (float, optional) – The shrinkage factor of the output tensor. Default is None.

forward(x)[source]
Overview:

Return the output of the multi-layer perceptron.

Parameters:

x (torch.Tensor) – The input tensor.

Return type:

Tensor
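
Example: a minimal sketch. Two assumptions to verify against the implementation: that activation accepts a string name such as "relu", and that the first entry of hidden_sizes is the input width:

import torch
from grl.neural_network import MultiLayerPerceptron

mlp = MultiLayerPerceptron(
    hidden_sizes=[16, 64, 64],  # assumed: first entry is the input width
    output_size=2,
    activation="relu",          # assumed: string-keyed activation
    dropout=0.1,
    layernorm=True,
)
x = torch.randn(8, 16)
y = mlp(x)  # expected shape: (8, 2)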

ConcatenateMLP

class grl.neural_network.ConcatenateMLP(**kwargs)[source]
Overview:

Concatenate the input tensors along the last dimension and then pass through a multi-layer perceptron.

Interface:

__init__, forward

__init__(**kwargs)[source]
Overview:

Initialize the concatenate MLP.

Parameters:

**kwargs – The keyword arguments forwarded to the multi-layer perceptron.

forward(*x)[source]
Overview:

Return the output of the concatenate MLP.

Parameters:

x (torch.Tensor) – The input tensors to concatenate.
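
Example: a sketch of the common critic-style use, where several tensors are concatenated before the MLP. The keyword arguments below follow MultiLayerPerceptron, and the first hidden size is assumed to match the concatenated width (4 + 6 = 10):

import torch
from grl.neural_network import ConcatenateMLP

net = ConcatenateMLP(hidden_sizes=[10, 32], output_size=1, activation="relu")
state = torch.randn(8, 4)
action = torch.randn(8, 6)
value = net(state, action)  # concatenation, then the MLP; expected shape: (8, 1)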

TemporalSpatialResidualNet

class grl.neural_network.TemporalSpatialResidualNet(hidden_sizes, output_dim, t_dim, input_dim=None, condition_dim=None, condition_hidden_dim=None, t_condition_hidden_dim=None)[source]
Overview:

Temporal spatial residual network built from multiple TemporalSpatialResBlock modules.

Interface:

__init__, forward

__init__(hidden_sizes, output_dim, t_dim, input_dim=None, condition_dim=None, condition_hidden_dim=None, t_condition_hidden_dim=None)[source]
Overview:

Initialize the temporal spatial residual network.

Parameters:
  • hidden_sizes (List[int]) – The list of hidden sizes.

  • output_dim (int) – The number of channels in the output tensor.

  • t_dim (int) – The dimension of the temporal input.

  • input_dim (int, optional) – The number of channels in the input tensor. Default is None.

  • condition_dim (int, optional) – The number of channels in the condition tensor. Default is None.

  • condition_hidden_dim (int, optional) – The number of channels in the hidden condition tensor. Default is None.

  • t_condition_hidden_dim (int, optional) – The number of channels in the hidden temporal condition tensor. Default is None.

forward(t, x, condition=None)[source]
Overview:

Return the output of the temporal spatial residual network.

Parameters:
  • t (torch.Tensor) – The temporal input tensor.

  • x (torch.Tensor) – The input tensor.

  • condition (torch.Tensor, optional) – The condition tensor. Default is None.

Return type:

Tensor
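
Example: a sketch with illustrative sizes. The calling convention for t (raw per-sample timesteps, embedded internally to width t_dim) is an assumption to verify against the implementation:

import torch
from grl.neural_network import TemporalSpatialResidualNet

net = TemporalSpatialResidualNet(
    hidden_sizes=[256, 256],
    output_dim=8,
    t_dim=32,  # width of the temporal embedding
)
t = torch.rand(16)      # one diffusion timestep per batch element (assumed)
x = torch.randn(16, 8)  # samples being denoised
out = net(t, x)         # expected shape: (16, 8)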

DiT

class grl.neural_network.DiT(input_size=32, patch_size=2, in_channels=4, hidden_size=1152, depth=28, num_heads=16, mlp_ratio=4.0, class_dropout_prob=0.1, num_classes=1000, learn_sigma=True)[source]
Overview:

Diffusion model with a Transformer backbone. This follows the official implementation from the GitHub repo: https://github.com/facebookresearch/DiT/blob/main/models.py

Interfaces:

__init__, forward

__init__(input_size=32, patch_size=2, in_channels=4, hidden_size=1152, depth=28, num_heads=16, mlp_ratio=4.0, class_dropout_prob=0.1, num_classes=1000, learn_sigma=True)[source]
Overview:

Initialize the DiT model.

Parameters:
  • input_size (int, defaults to 32) – The input size.

  • patch_size (int, defaults to 2) – The patch size.

  • in_channels (int, defaults to 4) – The number of input channels.

  • hidden_size (int, defaults to 1152) – The hidden size.

  • depth (int, defaults to 28) – The depth.

  • num_heads (int, defaults to 16) – The number of attention heads.

  • mlp_ratio (float, defaults to 4.0) – The ratio of the MLP hidden size to the attention hidden size.

  • class_dropout_prob (float, defaults to 0.1) – The class dropout probability.

  • num_classes (int, defaults to 1000) – The number of classes.

  • learn_sigma (bool, defaults to True) – Whether to learn sigma.

forward(t, x, condition=None)[source]
Overview:

Forward pass of DiT.

Parameters:
  • t (torch.Tensor) – Tensor of diffusion timesteps.

  • x (torch.Tensor) – Tensor of spatial inputs (images or latent representations of images).

  • condition (Union[torch.Tensor, TensorDict], optional) – The input condition, such as class labels.

forward_with_cfg(t, x, condition=None, cfg_scale=1.0)[source]
Overview:

Forward pass of DiT, but also batches the unconditional forward pass for classifier-free guidance.

Parameters:
  • t (torch.Tensor) – Tensor of diffusion timesteps.

  • x (torch.Tensor) – Tensor of spatial inputs (images or latent representations of images).

  • condition (Union[torch.Tensor, TensorDict], optional) – The input condition, such as class labels.

  • cfg_scale (float, defaults to 1.0) – The scale for classifier-free guidance.

initialize_weights()[source]
Overview:

Initialize the weights of the model.

unpatchify(x)[source]
Overview:

Unpatchify the input tensor.

Parameters:

x (torch.Tensor) – The input tensor.

Returns:

The output tensor.

Return type:

imgs (torch.Tensor)

Shapes:

x (torch.Tensor): (N, T, patch_size**2 * C)

imgs (torch.Tensor): (N, H, W, C)
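
Example: a sketch using small, illustrative model sizes rather than the defaults. With learn_sigma=True the output is expected to carry 2 * in_channels channels (predicted noise and sigma), as in the official DiT implementation:

import torch
from grl.neural_network import DiT

model = DiT(
    input_size=32, patch_size=2, in_channels=4,
    hidden_size=384, depth=2, num_heads=6,  # small sizes for illustration
    num_classes=10,
)
x = torch.randn(2, 4, 32, 32)        # (N, in_channels, input_size, input_size)
t = torch.randint(0, 1000, (2,))     # diffusion timesteps
labels = torch.randint(0, 10, (2,))  # class labels as the condition
out = model(t, x, condition=labels)  # expected shape: (2, 8, 32, 32)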

DiT1D

class grl.neural_network.DiT1D(token_size, in_channels=4, hidden_size=1152, depth=28, num_heads=16, mlp_ratio=4.0, condition_embedder=None)[source]
Overview:

Transformer backbone of a diffusion model for 1D data.

Interfaces:

__init__, forward

__init__(token_size, in_channels=4, hidden_size=1152, depth=28, num_heads=16, mlp_ratio=4.0, condition_embedder=None)[source]
Overview:

Initialize the DiT model.

Parameters:
  • token_size (int) – The token size of the 1D input.

  • in_channels (Union[int, List[int], Tuple[int]]) – The number of input channels, defaults to 4.

  • hidden_size (int) – The hidden size of the attention layer, defaults to 1152.

  • depth (int) – The depth of the transformer, defaults to 28.

  • num_heads (int) – The number of attention heads, defaults to 16.

  • mlp_ratio (float) – The ratio of the MLP hidden size to the attention hidden size, defaults to 4.0.

  • condition_embedder (nn.Module, optional) – The module used to embed the condition, defaults to None.

forward(t, x, condition=None)[source]
Overview:

Forward pass of DiT1D for 1D data.

Parameters:
  • t (torch.Tensor) – Tensor of diffusion timesteps.

  • x (torch.Tensor) – Tensor of 1D inputs (originally, at t=0, the data samples or their latent representations).

  • condition (Union[torch.Tensor, TensorDict], optional) – The input condition, such as class labels.

initialize_weights()[source]
Overview:

Initialize the weights of the model.
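
Example: a sketch with small illustrative sizes. The layout of x used here, (N, token_size, in_channels), is an assumption to check against the implementation:

import torch
from grl.neural_network import DiT1D

model = DiT1D(token_size=16, in_channels=4, hidden_size=128, depth=2, num_heads=4)
t = torch.rand(2)
x = torch.randn(2, 16, 4)  # assumed layout: (N, token_size, in_channels)
out = model(t, x)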

DiT2D

grl.neural_network.DiT2D

alias of DiT

DiT3D

class grl.neural_network.DiT3D(patch_block_size=[10, 32, 32], patch_size=2, in_channels=4, hidden_size=1152, depth=28, num_heads=16, mlp_ratio=4.0, learn_sigma=True, convolved=False)[source]
Overview:

Transformer backbone of a diffusion model for data of 3D shape.

Interfaces:

__init__, forward

__init__(patch_block_size=[10, 32, 32], patch_size=2, in_channels=4, hidden_size=1152, depth=28, num_heads=16, mlp_ratio=4.0, learn_sigma=True, convolved=False)[source]
Overview:

Initialize the DiT model.

Parameters:
  • patch_block_size (Union[List[int], Tuple[int]]) – The size of the patch block, defaults to [10, 32, 32].

  • patch_size (Union[int, List[int], Tuple[int]]) – The patch size of each token in the attention layer, defaults to 2.

  • in_channels (Union[int, List[int], Tuple[int]]) – The number of input channels, defaults to 4.

  • hidden_size (int) – The hidden size of the attention layer, defaults to 1152.

  • depth (int) – The depth of the transformer, defaults to 28.

  • num_heads (int) – The number of attention heads, defaults to 16.

  • mlp_ratio (float) – The ratio of the MLP hidden size to the attention hidden size, defaults to 4.0.

  • learn_sigma (bool) – Whether to learn sigma, defaults to True.

  • convolved (bool) – Whether to use a fully-connected layer across all channels, defaults to False.

forward(t, x, condition=None)[source]
Overview:

Forward pass of DiT3D for 3D data.

Parameters:
  • t (torch.Tensor) – Tensor of diffusion timesteps.

  • x (torch.Tensor) – Tensor of inputs with spatial information (originally at t=0 it is tensor of videos or latent representations of videos).

  • condition (Union[torch.Tensor, TensorDict], optional) – The input condition, such as class labels.

initialize_weights()[source]
Overview:

Initialize the weights of the model.

unpatchify(x)[source]
Overview:

Unpatchify the output tensor of the attention layer.

Parameters:

x (torch.Tensor) – The input tensor of shape (N, total_patches = T’ * H’ * W’, patch_size[0] * patch_size[1] * patch_size[2] * C)

Returns:

The output tensor of shape (N, T, C, H, W).

Return type:

x (torch.Tensor)
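
Example: a sketch with small illustrative sizes; x follows the (N, T, C, H, W) layout implied by the documented output shape of unpatchify:

import torch
from grl.neural_network import DiT3D

model = DiT3D(
    patch_block_size=[4, 8, 8],  # (T, H, W) of the patch block
    patch_size=2,
    in_channels=4,
    hidden_size=128, depth=2, num_heads=4,  # small sizes for illustration
)
t = torch.rand(2)
x = torch.randn(2, 4, 4, 8, 8)  # (N, T, C, H, W)
out = model(t, x)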