deepbots.supervisor.controllers

class deepbots.supervisor.DeepbotsSupervisorEnv(*args: Any, **kwargs: Any)[source]

Bases: Supervisor, Env

This class sits at the top of the deepbots class hierarchy and inherits from both the Webots Supervisor controller and the basic gym.Env.

Refer to gym.Env documentation on how to implement a custom gym.Env for additional functionality.

This class contains abstract methods that guide the development process for users who want to implement a simple environment.

This class is not intended to be used directly. It provides a common interface for all provided supervisor classes and makes them compatible with reinforcement learning agents that work with the gym interface. It also provides a problem-agnostic reset method. Your own class should inherit from one of the child supervisor classes, such as RobotSupervisorEnv. Advanced users can still inherit from this class directly to create their own supervisor classes.

step(action)[source]

On each timestep, the agent chooses an action for the previous observation, state_t, and the environment returns the next observation, state_t+1, the reward and whether the episode is done or not.

Each of the values returned is produced by implementations of other abstract methods defined below.

observation: The next observation from the environment
reward: The amount of reward awarded on this step
is_done: Whether the episode is done
info: Diagnostic information mostly useful for debugging

Parameters: action – The agent’s action
Returns: tuple, (observation, reward, is_done, info)
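The way step() assembles its return value from the other methods can be sketched without Webots. The ToyEnv class below is a hypothetical stand-in, not the deepbots implementation: it does not inherit from Supervisor or gym.Env and only mirrors the method names documented here, so that the control flow and a typical agent loop are visible.

```python
class ToyEnv:
    """Hypothetical stand-in that mirrors the documented method names.

    It needs no simulator: the "world" is a single integer position.
    """

    def __init__(self, target=3):
        self.target = target
        self.position = 0

    def get_default_observation(self):
        return [0]

    def reset(self):
        self.position = 0
        return self.get_default_observation()

    def get_observations(self):
        return [self.position]

    def get_reward(self, action):
        return 1.0 if self.position == self.target else 0.0

    def is_done(self):
        return self.position >= self.target

    def get_info(self):
        return {"position": self.position}

    def step(self, action):
        # Apply the action, then build the 4-tuple from the other methods.
        self.position += action
        return (
            self.get_observations(),
            self.get_reward(action),
            self.is_done(),
            self.get_info(),
        )


def run_episode(env, policy, max_steps=10):
    """Run one episode and return the total reward collected."""
    observation = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(observation)
        observation, reward, done, info = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward
```

Any agent that follows the gym interface interacts with the environment through the same reset() and step() calls that run_episode uses.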

reset()[source]

Used to reset the world to an initial state.

Default, problem-agnostic implementation of the reset method, using Webots-provided methods.

Note: this works properly only with Webots versions later than R2020b and must be overridden with a custom reset method when using earlier versions. It is backwards compatible because any reset method the user has already implemented overrides this default, so an existing supervisor can be migrated to this class easily.

Returns: default observation provided by get_default_observation()

get_default_observation()[source]

This method should be implemented to return a default/starting observation that is use-case dependent. It is used by the reset implementation above.

Returns: list-like, contains default agent observation
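As an illustration, a common choice for the default observation is a zero vector with the same length as a regular observation. The sketch below is written as a free function for brevity; OBSERVATION_SIZE is a hypothetical constant, and in a real environment this would be a method of the user's class.

```python
OBSERVATION_SIZE = 4  # hypothetical: number of values in one observation


def get_default_observation():
    """Return a zero-filled starting observation of the expected length."""
    return [0.0 for _ in range(OBSERVATION_SIZE)]
```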

get_observations()[source]

Return the observations of the robot. For example, metrics from sensors, a camera image, etc.

This method is use-case specific and needs to be implemented by the user.

Returns: An object of observations

get_reward(action)[source]

Calculates and returns the reward for this step.

This method is use-case specific and needs to be implemented by the user.

Parameters: action – The agent’s action
Returns: The amount of reward awarded on this step

is_done()[source]

Used to inform the agent that the episode has ended, for example because the problem is solved.

This method is use-case specific and needs to be implemented by the user.

Returns: bool, True if the episode is done

get_info()[source]: This method can be implemented to return any diagnostic information on each step, e.g. for debugging purposes.

class deepbots.supervisor.RobotSupervisorEnv(*args: Any, **kwargs: Any)[source]

Bases: DeepbotsSupervisorEnv

The RobotSupervisorEnv class implements both a robot controller and a supervisor RL environment, referred to as the Robot-Supervisor scheme.

This class can be used when there is no need to separate the Robot from the Supervisor, or when the observations of the robot are too big to be packaged in messages. High-resolution camera images, for example, introduce a bottleneck and reduce performance significantly when sent as messages.

Controllers that inherit this class must run on Robot nodes that have supervisor privileges.

The user needs to implement the regular environment methods from DeepbotsSupervisorEnv, such as get_reward(), get_observations() and get_default_observation(), according to their use-case, in addition to the apply_action() method introduced here.

apply_action(): Similar to use_message_data() of CSVRobot, this method takes an action argument and translates it to a robot action, e.g. motor speeds. Note that apply_action() is called during step().

get_timestep()[source]

property timestep

Getter of the _timestep field. The timestep is defined in milliseconds.

Returns: The timestep of the controller in milliseconds

step(action)[source]

The basic step method that steps the controller, calls the method that applies the action on the robot and returns the (observations, reward, done, info) object.

Parameters: action (Defined by the implementation of apply_action) – Whatever the use-case uses as an action, e.g. an integer representing discrete actions
Returns: tuple, (observations, reward, done, info) as provided by the corresponding methods as implemented for the use-case

apply_action(action)[source]

This method should be implemented to apply whatever actions the action argument contains on the robot, depending on the use-case. This method is called by the step() method which provides the action argument.

For example, if the action argument is in the form of an integer value, 0 could mean the action move forward. In this case, motor speeds should be set here accordingly so the robot moves forward.

Parameters: action – list, containing action data
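The integer-to-motion example above could look roughly like the following sketch. MAX_SPEED, the ACTION_TO_SPEEDS table and the action_to_wheel_speeds helper are all hypothetical names chosen for illustration and are not part of deepbots. The helper only computes target speeds; an apply_action() implementation would then pass them to the robot's motor devices.

```python
MAX_SPEED = 6.28  # hypothetical maximum wheel speed in rad/s

# Hypothetical mapping from a discrete action to (left, right) wheel speeds.
ACTION_TO_SPEEDS = {
    0: (MAX_SPEED, MAX_SPEED),    # move forward
    1: (-MAX_SPEED, MAX_SPEED),   # turn left in place
    2: (MAX_SPEED, -MAX_SPEED),   # turn right in place
}


def action_to_wheel_speeds(action):
    """Translate an action (int or one-element list) into wheel speeds."""
    index = int(action[0]) if isinstance(action, (list, tuple)) else int(action)
    if index not in ACTION_TO_SPEEDS:
        raise ValueError(f"Unknown action: {index}")
    return ACTION_TO_SPEEDS[index]
```

Keeping the mapping in a table separate from the device calls makes it testable without a running simulation. The one-element list case covers the documented list form of the action argument.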

class deepbots.supervisor.EmitterReceiverSupervisorEnv(*args: Any, **kwargs: Any)[source]

Bases: DeepbotsSupervisorEnv

This is the base class for the emitter-receiver scheme.

Subclasses implement a variety of communication formats such as CSV messages.

initialize_comms(emitter_name, receiver_name)[source]

Initializes the emitter and receiver devices with the names provided.

Parameters:
emitter_name – The name of the emitter device on the supervisor node
receiver_name – The name of the receiver device on the supervisor node
Returns: The initialized emitter and receiver references

step(action)[source]

The basic step method that steps the controller, calls the method that sends the action through the emitter and returns the (observations, reward, done, info) object.

Parameters: action (Defined by the implementation of handle_emitter) – Whatever the use-case uses as an action, e.g. an integer representing discrete actions
Returns: (observations, reward, done, info) as provided by the corresponding methods as implemented for the use-case
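The interplay between step(), handle_emitter() and handle_receiver() can be sketched with in-memory queues in place of the Webots devices. LoopbackSupervisor and its fake robot are hypothetical stand-ins, not deepbots code, and the ordering inside step() is simplified: in a real simulation the controller is stepped and the robot replies through its own controller.

```python
from collections import deque


class LoopbackSupervisor:
    """Hypothetical stand-in for an emitter-receiver supervisor.

    Two in-memory queues replace the Webots Emitter and Receiver devices,
    and a tiny fake robot echoes back twice the action it receives.
    """

    def __init__(self):
        self.to_robot = deque()    # plays the role of the emitter channel
        self.from_robot = deque()  # plays the role of the receiver channel

    def handle_emitter(self, action):
        self.to_robot.append(action)

    def handle_receiver(self):
        # Return None when nothing has arrived yet.
        return self.from_robot.popleft() if self.from_robot else None

    def _fake_robot_tick(self):
        # Stand-in for the robot controller running in the simulator.
        while self.to_robot:
            self.from_robot.append(self.to_robot.popleft() * 2)

    def get_observations(self):
        return self.handle_receiver()

    def get_reward(self, action):
        return 0.0

    def is_done(self):
        return False

    def get_info(self):
        return {}

    def step(self, action):
        self.handle_emitter(action)   # send the action to the robot
        self._fake_robot_tick()       # the simulation would advance here
        return (
            self.get_observations(),
            self.get_reward(action),
            self.is_done(),
            self.get_info(),
        )
```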

handle_emitter(action)[source]

This method is implemented by subclasses depending on the communication format used.

Parameters: action – The action that is sent through the emitter device to the robot, e.g. an integer representing discrete actions

handle_receiver()[source]: This method is implemented by subclasses depending on the communication format used.

get_timestep()[source]

property timestep

Getter of the _timestep field. The timestep is defined in milliseconds.

Returns: The timestep of the controller in milliseconds

class deepbots.supervisor.CSVSupervisorEnv(*args: Any, **kwargs: Any)[source]

Bases: EmitterReceiverSupervisorEnv

This class implements the emitter-receiver scheme using Comma Separated Values.

handle_emitter(action)[source]

Implementation of the handle_emitter method expecting an iterable with Comma Separated Values (CSV).

Parameters: action (Iterable, for multiple values the CSV format is required, e.g. [0, 1] for two actions) – Whatever the use-case uses as an action, e.g. an integer representing discrete actions
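To make the CSV format concrete, the sketch below shows one way an iterable action could be turned into a single message. encode_csv_action is a hypothetical helper for illustration and is not claimed to be the library's exact code. The resulting bytes are what would be handed to the emitter device.

```python
def encode_csv_action(action):
    """Join the values of an iterable action into one UTF-8 CSV message."""
    return ",".join(map(str, action)).encode("utf-8")
```

For example, the two-action list [0, 1] becomes the message b"0,1". A single action must still be wrapped in an iterable, such as [0].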

handle_receiver()[source]

Implementation of the handle_receiver method expecting an iterable with Comma Separated Values (CSV).

Returns: The message received from the robot, or None if no message is received
Return type: List of string values
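Because the values arrive as strings, they usually need converting before they are used as an observation. The sketch below assumes a message that is either None, bytes or str. decode_csv_message and to_floats are hypothetical helpers for illustration, not part of deepbots.

```python
def decode_csv_message(message):
    """Split a CSV message into a list of strings.

    Returns None when there is no message.
    """
    if message is None:
        return None
    if isinstance(message, bytes):
        message = message.decode("utf-8")
    return message.split(",")


def to_floats(values):
    """Convert the list of strings into floats for use as an observation."""
    return [float(value) for value in values]
```

A get_observations() implementation could call handle_receiver(), check the result for None, and then apply a conversion like to_floats.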