deepbots.supervisor.controllers
- class deepbots.supervisor.DeepbotsSupervisorEnv(*args: Any, **kwargs: Any)[source]
Bases: Supervisor, Env
This class is the highest class in the deepbots class hierarchy, inheriting from both the Webots Supervisor controller and the basic gym.Env.
Refer to gym.Env documentation on how to implement a custom gym.Env for additional functionality.
This class contains abstract methods that guide the development process for users that want to implement a simple environment.
This class is not intended to be used directly. It provides a common interface for all provided supervisor classes and makes them compatible with reinforcement learning agents that work with the gym interface. It also provides a problem-agnostic reset method. To build your own environment, inherit from one of the child supervisor classes, such as RobotSupervisorEnv. Advanced users can still inherit this class directly to create their own supervisor classes.
- step(action)[source]
On each timestep, the agent chooses an action for the previous observation, state_t, and the environment returns the next observation, state_t+1, the reward and whether the episode is done or not.
Each of the values returned is produced by implementations of other abstract methods defined below.
observation: The next observation from the environment
reward: The amount of reward awarded on this step
is_done: Whether the episode is done
info: Diagnostic information mostly useful for debugging
- Parameters
action – The agent’s action
- Returns
tuple, (observation, reward, is_done, info)
- reset()[source]
Used to reset the world to an initial state.
Default, problem-agnostic, implementation of reset method, using Webots-provided methods.
Note that this works properly only with Webots versions later than R2020b and must be overridden with a custom reset method when using earlier versions. It remains backwards compatible because a reset method that the user has already implemented overrides this one, so an old supervisor can be migrated to this class easily.
- Returns
default observation provided by get_default_observation()
- get_default_observation()[source]
This method should be implemented to return a default/starting observation that is use-case dependent. It is used by the reset implementation above.
- Returns
list-like, contains default agent observation
- get_observations()[source]
Return the observations of the robot. For example, metrics from sensors, a camera image, etc.
This method is use-case specific and needs to be implemented by the user.
- Returns
An object of observations
- get_reward(action)[source]
Calculates and returns the reward for this step.
This method is use-case specific and needs to be implemented by the user.
- Parameters
action – The agent’s action
- Returns
The amount of reward awarded on this step
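The way step() assembles its return value from the user-implemented methods can be sketched without Webots. The following is a minimal stand-in, not the deepbots implementation: SketchEnv, CountingEnv and run_episode are hypothetical names, the base class here does not inherit from Supervisor or gym.Env, and is_done() and get_info() are assumed helper methods that supply the is_done and info values returned by step().

```python
class SketchEnv:
    """Hypothetical stand-in for the interface; not the deepbots base class."""

    def step(self, action):
        # Assemble the gym-style 4-tuple from the user-implemented methods.
        return (
            self.get_observations(),
            self.get_reward(action),
            self.is_done(),
            self.get_info(),
        )

    def reset(self):
        # The real class also resets the Webots world; that part is omitted.
        return self.get_default_observation()

    # Use-case specific methods, to be implemented by subclasses.
    def get_default_observation(self):
        raise NotImplementedError

    def get_observations(self):
        raise NotImplementedError

    def get_reward(self, action):
        raise NotImplementedError

    def is_done(self):  # assumed helper, see lead-in
        raise NotImplementedError

    def get_info(self):  # assumed helper, see lead-in
        raise NotImplementedError


class CountingEnv(SketchEnv):
    """Toy use-case: the episode ends after three steps."""

    def __init__(self):
        self.steps_taken = 0

    def step(self, action):
        self.steps_taken += 1
        return super().step(action)

    def reset(self):
        self.steps_taken = 0
        return super().reset()

    def get_default_observation(self):
        return [0]

    def get_observations(self):
        return [self.steps_taken]

    def get_reward(self, action):
        # Reward action 1, give nothing for anything else.
        return 1.0 if action == 1 else 0.0

    def is_done(self):
        return self.steps_taken >= 3

    def get_info(self):
        return {}


def run_episode(env, policy):
    """Run one episode and return the last observation and the total reward."""
    observation = env.reset()
    total_reward = 0.0
    done = False
    while not done:
        action = policy(observation)
        observation, reward, done, info = env.step(action)
        total_reward += reward
    return observation, total_reward
```

An agent that works with the gym interface would take the place of the policy callable here, for example run_episode(CountingEnv(), lambda observation: 1).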
- class deepbots.supervisor.RobotSupervisorEnv(*args: Any, **kwargs: Any)[source]
Bases: DeepbotsSupervisorEnv
The RobotSupervisorEnv class implements both a robot controller and a supervisor RL environment, referred to as Robot-Supervisor scheme.
This class can be used when there is no need to separate the Robot from the Supervisor, or the observations of the robot are too big to be packaged in messages, e.g. high resolution images from a camera, that introduce a bottleneck and reduce performance significantly.
Controllers that inherit this class must run on Robot nodes that have supervisor privileges.
The user needs to implement the regular environment methods from DeepbotsSupervisorEnv, such as get_reward(), get_observations() and get_default_observation(), according to their use-case, in addition to the apply_action() method introduced here.
apply_action(): (similar to use_message_data() of CSVRobot) This method takes an action argument and translates it to a robot action, e.g. motor speeds. Note that apply_action() is called during step().
- property timestep
Getter of _timestep field. Timestep is defined in milliseconds
- Returns
The timestep of the controller in milliseconds
- step(action)[source]
The basic step method that steps the controller, calls the method that applies the action on the robot and returns the (observations, reward, done, info) object.
- Parameters
action (Defined by the implementation of apply_action) – Whatever the use-case uses as an action, e.g. an integer representing discrete actions
- Returns
tuple, (observations, reward, done, info) as provided by the corresponding methods as implemented for the use-case
- apply_action(action)[source]
This method should be implemented to apply whatever actions the action argument contains on the robot, depending on the use-case. This method is called by the step() method which provides the action argument.
For example, if the action argument is in the form of an integer value, 0 could mean the action move forward. In this case, motor speeds should be set here accordingly so the robot moves forward.
- Parameters
action – list, containing action data
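As an illustration of the integer-to-motor-speed mapping described above, here is a minimal sketch that runs without Webots. FakeMotor, DifferentialDriveSketch and the ACTION_TO_SPEEDS table are hypothetical. Only the setVelocity() name mirrors the Webots Motor method. In a real controller, the motors would be devices obtained from the robot, and the class would inherit RobotSupervisorEnv.

```python
class FakeMotor:
    """Hypothetical stand-in for a Webots motor; records the last velocity set."""

    def __init__(self):
        self.velocity = 0.0

    def setVelocity(self, velocity):
        self.velocity = velocity


# Assumed mapping from a discrete action to (left, right) wheel speeds.
ACTION_TO_SPEEDS = {
    0: (5.0, 5.0),    # move forward
    1: (-2.0, 2.0),   # turn left
    2: (2.0, -2.0),   # turn right
}


class DifferentialDriveSketch:
    """Hypothetical class showing only the apply_action() part of an environment."""

    def __init__(self, left_motor, right_motor):
        self.left_motor = left_motor
        self.right_motor = right_motor

    def apply_action(self, action):
        # Accept either a bare integer or a list whose first element is the action.
        index = action[0] if isinstance(action, (list, tuple)) else action
        left_speed, right_speed = ACTION_TO_SPEEDS[int(index)]
        self.left_motor.setVelocity(left_speed)
        self.right_motor.setVelocity(right_speed)
```

Keeping the mapping in a table rather than in if/else branches makes it easy to keep the action space definition and the motor commands in sync.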
- class deepbots.supervisor.EmitterReceiverSupervisorEnv(*args: Any, **kwargs: Any)[source]
Bases: DeepbotsSupervisorEnv
This is the base class for the emitter-receiver scheme.
Subclasses implement a variety of communication formats such as CSV messages.
- initialize_comms(emitter_name, receiver_name)[source]
Initializes the emitter and receiver devices with the names provided.
- Parameters
emitter_name – The name of the emitter device on the supervisor node
receiver_name – The name of the receiver device on the supervisor node
- Returns
The initialized emitter and receiver references
- step(action)[source]
The basic step method that steps the controller, calls the method that sends the action through the emitter and returns the (observations, reward, done, info) object.
- Parameters
action (Defined by the implementation of handle_emitter) – Whatever the use-case uses as an action, e.g. an integer representing discrete actions
- Returns
(observations, reward, done, info) as provided by the corresponding methods as implemented for the use-case
- handle_emitter(action)[source]
This method is implemented by subclasses depending on the communication format used.
- Parameters
action – The action that is sent through the emitter device to the robot, e.g. an integer representing discrete actions
- handle_receiver()[source]
This method is implemented by subclasses depending on the communication format used.
- property timestep
Getter of _timestep field. Timestep is defined in milliseconds
- Returns
The timestep of the controller in milliseconds
- class deepbots.supervisor.CSVSupervisorEnv(*args: Any, **kwargs: Any)[source]
Bases: EmitterReceiverSupervisorEnv
This class implements the emitter-receiver scheme using Comma Separated Values.
- handle_emitter(action)[source]
Implementation of the handle_emitter method expecting an iterable with Comma Separated Values (CSV).
- Parameters
action (Iterable, for multiple values the CSV format is required, e.g. [0, 1] for two actions) – Whatever the use-case uses as an action, e.g. an integer representing discrete actions
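The CSV format expected for the action can be illustrated with a small sketch. The helper names encode_csv_action and decode_csv_message are hypothetical, and the UTF-8 byte encoding is an assumption about how a message might be prepared for the emitter. The actual deepbots implementation may differ.

```python
def encode_csv_action(action):
    """Join the values of an iterable action into a UTF-8 encoded CSV message."""
    return ",".join(str(value) for value in action).encode("utf-8")


def decode_csv_message(message):
    """Split a UTF-8 encoded CSV message back into a list of strings."""
    return message.decode("utf-8").split(",")


# Two actions, as in the [0, 1] example from the parameter description.
encoded = encode_csv_action([0, 1])
decoded = decode_csv_message(encoded)
```

Note that decoding yields strings, so the receiving side would need to convert each value back to the numeric type it expects.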