grl.datasets

QGPOD4RLDataset

class grl.datasets.QGPOD4RLDataset(env_id, device=None)
Overview:

Dataset for the QGPO algorithm. QGPO training is based on contrastive energy prediction, which requires both true actions and fake actions. True actions are sampled from the dataset, and fake actions are sampled from the action support generated by the behaviour policy.

Interface:

__init__, __getitem__, __len__.

__init__(env_id, device=None)
Overview:

Initialization method of the QGPOD4RLDataset class.

Parameters:
  • env_id (str) – The environment id

  • device (str) – The device on which the dataset is stored
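
The overview above describes items that pair a true action with fake actions, exposed through __getitem__ and __len__. The following is a minimal sketch of that item layout in plain Python. It is not GRL's implementation: the class name ToyContrastiveDataset, the dictionary keys "s", "a" and "fake_a", and the precomputed fake actions are all assumptions made for illustration.

```python
class ToyContrastiveDataset:
    """Hypothetical stand-in for illustration; not grl.datasets.QGPOD4RLDataset."""

    def __init__(self, states, actions, fake_actions):
        # fake_actions[i] holds the candidate actions for states[i]. In QGPO
        # these would come from the behaviour policy's action support; here
        # they are simply passed in precomputed (an assumption of this sketch).
        if not (len(states) == len(actions) == len(fake_actions)):
            raise ValueError("states, actions and fake_actions must have equal length")
        self.states = states
        self.actions = actions
        self.fake_actions = fake_actions

    def __len__(self):
        # Number of stored transitions.
        return len(self.states)

    def __getitem__(self, index):
        # One item: the state, its true (dataset) action, and its fake actions.
        return {
            "s": self.states[index],
            "a": self.actions[index],
            "fake_a": self.fake_actions[index],
        }


toy_states = [[0.0, 1.0], [0.5, -0.5], [1.0, 0.0]]
toy_actions = [0.2, -0.7, 0.9]
toy_fake_actions = [[0.1, -0.3], [0.8, 0.4], [-0.9, 0.0]]

contrastive_dataset = ToyContrastiveDataset(toy_states, toy_actions, toy_fake_actions)
contrastive_item = contrastive_dataset[1]
print(len(contrastive_dataset), contrastive_item["a"], contrastive_item["fake_a"])
```

Keeping the fake actions aligned by index with the true action lets a contrastive loss compare both sets of actions for the same state.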

QGPODataset

class grl.datasets.QGPODataset
Overview:

Dataset for the QGPO algorithm. QGPO training is based on contrastive energy prediction, which requires both true actions and fake actions. True actions are sampled from the dataset, and fake actions are sampled from the action support generated by the behaviour policy.

Interface:

__init__, __getitem__, __len__.

__init__()
Overview:

Initialization method of the QGPODataset class.

GPD4RLDataset

class grl.datasets.GPD4RLDataset(env_id, device=None)
Overview:

D4RL dataset for the Generative Policy algorithm.

Interface:

__init__, __getitem__, __len__.

__init__(env_id, device=None)
Overview:

Initialization method of the GPD4RLDataset class.

Parameters:
  • env_id (str) – The environment id

  • device (str) – The device on which the dataset is stored
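
To illustrate how step-aligned D4RL-style arrays can be exposed through __getitem__ and __len__, here is a minimal plain-Python sketch. It assumes a dictionary with the keys observations, actions, rewards, next_observations and terminals, the layout commonly used for D4RL Q-learning data. The class name ToyTransitionDataset is hypothetical, and the fields that GPD4RLDataset actually stores are not specified in this reference. The sketch also omits device placement.

```python
class ToyTransitionDataset:
    """Hypothetical stand-in for illustration; not grl.datasets.GPD4RLDataset."""

    # Assumed field names, following the usual D4RL Q-learning layout.
    KEYS = ("observations", "actions", "rewards", "next_observations", "terminals")

    def __init__(self, data):
        # Every field must describe the same number of environment steps.
        lengths = {len(data[key]) for key in self.KEYS}
        if len(lengths) != 1:
            raise ValueError("all fields must have the same number of steps")
        self.data = {key: list(data[key]) for key in self.KEYS}
        self.size = lengths.pop()

    def __len__(self):
        # Number of stored transitions.
        return self.size

    def __getitem__(self, index):
        # One transition, gathered field by field at the same step index.
        return {key: self.data[key][index] for key in self.KEYS}


toy_raw = {
    "observations": [[0.0], [1.0], [2.0]],
    "actions": [[0.5], [0.5], [-1.0]],
    "rewards": [0.0, 0.0, 1.0],
    "next_observations": [[1.0], [2.0], [1.0]],
    "terminals": [False, False, True],
}

transition_dataset = ToyTransitionDataset(toy_raw)
last_transition = transition_dataset[len(transition_dataset) - 1]
print(last_transition["rewards"], last_transition["terminals"])
```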

GPDataset

class grl.datasets.GPDataset
Overview:

Dataset for the Generative Policy algorithm. Training a Generative Policy algorithm sometimes requires both true actions and fake actions. True actions are sampled from the dataset, and fake actions are sampled from the behaviour policy as a form of data augmentation.

Interface:

__init__, __getitem__, __len__.

__init__()
Overview:

Initialization method of the GPDataset class.
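
The GPDataset overview notes that fake actions are only sometimes needed and act as data augmentation. One way to picture that optional step is sketched below in plain Python. Everything here is an assumption for illustration, not GRL's code: the class name ToyGPDataset, the policy, num_fake and seed arguments, the "fake_a" key, and the placeholder gaussian_behaviour_policy function.

```python
import random


def gaussian_behaviour_policy(state, rng):
    # Placeholder behaviour policy (hypothetical): a scalar action centred on
    # the mean of the state vector, with a small amount of Gaussian noise.
    return sum(state) / len(state) + rng.gauss(0.0, 0.1)


class ToyGPDataset:
    """Hypothetical stand-in for illustration; not grl.datasets.GPDataset."""

    def __init__(self, states, actions, policy=None, num_fake=0, seed=0):
        if len(states) != len(actions):
            raise ValueError("states and actions must have equal length")
        self.states = states
        self.actions = actions
        self.fake_actions = None
        # Augmentation is optional: fake actions are drawn once, up front,
        # and only when a behaviour policy is supplied.
        if policy is not None and num_fake > 0:
            rng = random.Random(seed)
            self.fake_actions = [
                [policy(state, rng) for _ in range(num_fake)] for state in states
            ]

    def __len__(self):
        return len(self.states)

    def __getitem__(self, index):
        item = {"s": self.states[index], "a": self.actions[index]}
        if self.fake_actions is not None:
            item["fake_a"] = self.fake_actions[index]
        return item


gp_states = [[0.0, 1.0], [1.0, 1.0]]
gp_actions = [0.4, 0.9]

plain_dataset = ToyGPDataset(gp_states, gp_actions)
augmented_dataset = ToyGPDataset(
    gp_states, gp_actions, policy=gaussian_behaviour_policy, num_fake=3
)
print(sorted(plain_dataset[0]), sorted(augmented_dataset[0]))
```

Drawing the fake actions once at construction keeps __getitem__ cheap. Resampling them on every access is an equally plausible design when fresh negatives are wanted each epoch.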