deepbots.supervisor.wrappers

class deepbots.supervisor.wrappers.keyboard_printer.KeyboardPrinter(*args: Any, **kwargs: Any)[source]

Bases: DeepbotsSupervisorEnv

step(action)[source]

On each timestep, the agent chooses an action for the previous observation, state_t, and the environment returns the next observation, state_t+1, the reward and whether the episode is done or not.

Each of the values returned is produced by implementations of other abstract methods defined below.

observation: The next observation from the environment
reward: The amount of reward awarded on this step
is_done: Whether the episode is done
info: Diagnostic information mostly useful for debugging

Parameters: action – The agent’s action
Returns: tuple, (observation, reward, is_done, info)
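As a rough illustration of how the returned tuple is assembled from the other methods, the sketch below uses a stand-in class rather than the real DeepbotsSupervisorEnv, which needs a running Webots simulation. The class name SketchEnv, its step counter, and the reward rule are hypothetical; only the method names come from this page.

```python
class SketchEnv:
    """Hypothetical stand-in for a supervisor environment (not the deepbots class)."""

    def __init__(self, max_steps=3):
        self.max_steps = max_steps
        self.steps_taken = 0

    def get_observations(self):
        # Stand-in observation: the number of steps taken so far.
        return [self.steps_taken]

    def get_reward(self, action):
        # Stand-in reward: 1.0 when the action equals 1, otherwise 0.0.
        return 1.0 if action == 1 else 0.0

    def is_done(self):
        return self.steps_taken >= self.max_steps

    def get_info(self):
        return {"steps_taken": self.steps_taken}

    def step(self, action):
        # Advance the pretend simulation by one step, then build the
        # tuple from the other methods.
        self.steps_taken += 1
        return (
            self.get_observations(),
            self.get_reward(action),
            self.is_done(),
            self.get_info(),
        )


env = SketchEnv()
observation, reward, is_done, info = env.step(1)
print(observation, reward, is_done, info)
```

Keeping step() as a thin composition of the four methods means a use case only has to supply those methods, not the stepping logic itself.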

is_done()[source]

Used to inform the agent that the problem is solved.

This method is use-case specific and needs to be implemented by the user.

Returns: bool, True if the episode is done

get_observations()[source]

Returns the observations of the robot, for example metrics from sensors or a camera image.

This method is use-case specific and needs to be implemented by the user.

Returns: An object of observations

get_reward(action)[source]

Calculates and returns the reward for this step.

This method is use-case specific and needs to be implemented by the user.

Parameters: action – The agent’s action
Returns: The amount of reward awarded on this step

get_info()[source]: This method can be implemented to return any diagnostic information on each step, e.g. for debugging purposes.

reset()[source]

Used to reset the world to an initial state.

Default, problem-agnostic implementation of the reset method, using Webots-provided methods.

Note that this works properly only with Webots versions >R2020b and must be overridden with a custom reset method when using earlier versions. It is backwards compatible because the new reset method gets overridden by whatever the user has previously implemented, so an old supervisor can be migrated easily to use this class.

Returns: default observation provided by get_default_observation()
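The backwards-compatibility point relies on ordinary Python method overriding. The sketch below shows this with stand-in classes: DefaultResetEnv, LegacySupervisor, and the reset_kind attribute are hypothetical, and the default reset here is only a placeholder for the Webots-based one.

```python
class DefaultResetEnv:
    """Hypothetical stand-in; the real default reset uses Webots-provided methods."""

    def get_default_observation(self):
        return [0.0, 0.0]

    def reset(self):
        # Placeholder for the simulator-level reset done by the default.
        self.reset_kind = "default"
        return self.get_default_observation()


class LegacySupervisor(DefaultResetEnv):
    """Toy supervisor that already ships its own reset."""

    def reset(self):
        # Custom reset logic would go here (hypothetical).
        self.reset_kind = "custom"
        return self.get_default_observation()


for env in (DefaultResetEnv(), LegacySupervisor()):
    observation = env.reset()
    print(type(env).__name__, env.reset_kind, observation)
```

A supervisor that defines its own reset keeps using it after migration, while one that does not picks up the default.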

class deepbots.supervisor.wrappers.tensorboard_wrapper.TensorboardLogger(*args: Any, **kwargs: Any)[source]

Bases: DeepbotsSupervisorEnv

step(action)[source]

On each timestep, the agent chooses an action for the previous observation, state_t, and the environment returns the next observation, state_t+1, the reward and whether the episode is done or not.

Each of the values returned is produced by implementations of other abstract methods defined below.

observation: The next observation from the environment
reward: The amount of reward awarded on this step
is_done: Whether the episode is done
info: Diagnostic information mostly useful for debugging

Parameters: action – The agent’s action
Returns: tuple, (observation, reward, is_done, info)
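From the agent's side, the tuple returned by step() is typically consumed in an episode loop. The following is a minimal sketch of such a loop; CountdownEnv, run_episode, and the constant policy are hypothetical and stand in for a real supervisor and agent.

```python
class CountdownEnv:
    """Hypothetical toy environment exposing reset() and step(action)."""

    def __init__(self, start=3):
        self.start = start
        self.remaining = start

    def reset(self):
        self.remaining = self.start
        return [self.remaining]

    def step(self, action):
        self.remaining -= 1
        observation = [self.remaining]
        reward = float(action)
        is_done = self.remaining <= 0
        info = {}
        return observation, reward, is_done, info


def run_episode(env, policy):
    """Run one episode and return the total reward collected."""
    observation = env.reset()
    total_reward = 0.0
    is_done = False
    while not is_done:
        # The agent picks an action for the latest observation.
        action = policy(observation)
        observation, reward, is_done, info = env.step(action)
        total_reward += reward
    return total_reward


total = run_episode(CountdownEnv(start=3), policy=lambda obs: 1)
print(total)
```

The loop stops as soon as is_done is True, which is why is_done() must eventually return True for the episode to end.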

is_done()[source]

Used to inform the agent that the problem is solved.

This method is use-case specific and needs to be implemented by the user.

Returns: bool, True if the episode is done

get_observations()[source]

Returns the observations of the robot, for example metrics from sensors or a camera image.

This method is use-case specific and needs to be implemented by the user.

Returns: An object of observations

get_reward(action)[source]

Calculates and returns the reward for this step.

This method is use-case specific and needs to be implemented by the user.

Parameters: action – The agent’s action
Returns: The amount of reward awarded on this step

get_info()[source]: This method can be implemented to return any diagnostic information on each step, e.g. for debugging purposes.

reset()[source]

Used to reset the world to an initial state.

Default, problem-agnostic implementation of the reset method, using Webots-provided methods.

Note that this works properly only with Webots versions >R2020b and must be overridden with a custom reset method when using earlier versions. It is backwards compatible because the new reset method gets overridden by whatever the user has previously implemented, so an old supervisor can be migrated easily to use this class.

Returns: default observation provided by get_default_observation()

flush()[source]

close()[source]

Override close in your subclass to perform any necessary cleanup.

Environments will automatically close() themselves when garbage collected or when the program exits.
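A minimal sketch of overriding close() for cleanup follows. It assumes, purely for illustration, that flush() writes out pending log lines; this page does not document what flush() does. The class LoggingEnv and its buffer, written, and closed attributes are hypothetical.

```python
class LoggingEnv:
    """Hypothetical environment that owns a resource needing cleanup."""

    def __init__(self):
        self.buffer = []    # pending log lines
        self.written = []   # stands in for a file or event writer
        self.closed = False

    def log(self, line):
        self.buffer.append(line)

    def flush(self):
        # Assumed behavior: move pending lines to the "writer".
        self.written.extend(self.buffer)
        self.buffer.clear()

    def close(self):
        # Idempotent: safe to call more than once.
        if self.closed:
            return
        self.flush()
        self.closed = True


env = LoggingEnv()
try:
    env.log("episode 0 reward 1.5")
finally:
    # Explicit close, rather than relying on garbage collection.
    env.close()
print(env.written, env.closed)
```

Calling close() explicitly in a finally block makes the cleanup point predictable instead of depending on when the object is collected.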