5. Environment Classes

5.1. Environment Base Class Reference

class rlgraph.environments.environment.Environment(state_space, action_space, seed=None)[source]

Bases: rlgraph.utils.specifiable.Specifiable

The base Environment class used to run experiment-based RL.

render()[source]

Renders the Environment in its current state; implementation is optional for subclasses.

reset()[source]

Resets the state of the environment, returning an initial observation.

Returns:
tuple: The Env’s state after the reset.

seed(seed=None)[source]

Sets the random seed of the environment to the given value.

Args:
seed (int): The seed to use (default: current epoch seconds).
Returns:
int: The seed actually used.

step(**kwargs)[source]

Run one time step of the environment’s dynamics. When the end of an episode is reached, reset() should be called to reset the environment’s internal state.

Args:
kwargs (any): The action(s) to be executed by the environment. Actions have to be members of this Environment’s action_space (a call to self.action_space.contains(action) must return True).
Returns:
tuple:
  • The state s’ after executing the given action(s).
  • The reward received after taking action a in state s.
  • Whether s’ is a terminal state.
  • Some Environment-specific info.

terminate()[source]

Clean-up operation; implementation is optional for subclasses.
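
The methods above form the standard agent-environment loop: reset() starts an episode, step() advances it until a terminal state is reached, and terminate() releases any resources. A minimal sketch of that loop follows; policy is a hypothetical callable (not part of this API) that maps a state to an action inside the environment’s action_space:

    def run_episode(env, policy, max_steps=1000):
        """Run a single episode on any Environment subclass instance."""
        state = env.reset()
        total_reward = 0.0
        for _ in range(max_steps):
            action = policy(state)  # must satisfy env.action_space.contains(action)
            state, reward, terminal, info = env.step(actions=action)
            total_reward += reward
            if terminal:
                break
        env.terminate()  # optional clean-up, see terminate() above
        return total_reward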

5.2. Random Environment

class rlgraph.environments.random_env.RandomEnv(state_space, action_space, reward_space=None, terminal_prob=0.1, deterministic=False)[source]

Bases: rlgraph.environments.environment.Environment

An Environment producing random states regardless of the incoming actions.

reset()[source]

Resets the state of the environment, returning an initial observation.

Returns:
tuple: The Env’s state after the reset.

reset_for_env_stepper()[source]

seed(seed=None)[source]

Sets the random seed of the environment to the given value.

Args:
seed (int): The seed to use (default: current epoch seconds).
Returns:
int: The seed actually used.

step(actions=None)[source]

Run one time step of the environment’s dynamics. When the end of an episode is reached, reset() should be called to reset the environment’s internal state.

Args:
actions (any): The action(s) to be executed by the environment. Actions have to be members of this Environment’s action_space (a call to self.action_space.contains(action) must return True).
Returns:
tuple:
  • The state s’ after executing the given action(s).
  • The reward received after taking action a in state s.
  • Whether s’ is a terminal state.
  • Some Environment-specific info.

step_for_env_stepper(actions=None)[source]
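
A short usage sketch for RandomEnv follows. It assumes the IntBox and FloatBox space classes from rlgraph.spaces and a sample() helper on Space objects, neither of which is documented in this section:

    from rlgraph.environments.random_env import RandomEnv
    from rlgraph.spaces import FloatBox, IntBox  # assumed space classes

    # RandomEnv ignores the actual action values and emits random states,
    # ending each step with probability terminal_prob.
    env = RandomEnv(
        state_space=FloatBox(shape=(4,)),
        action_space=IntBox(2),
        terminal_prob=0.1,
        deterministic=False,
    )
    state = env.reset()
    for _ in range(10):
        action = env.action_space.sample()  # assumed sampling helper
        state, reward, terminal, info = env.step(actions=action)
        if terminal:
            state = env.reset()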

5.3. GridWorld Environments

class rlgraph.environments.grid_world.GridWorld(world='4x4', save_mode=False, reward_function='sparse', state_representation='discr')[source]

Bases: rlgraph.environments.environment.Environment

A classic grid world where the action space is up, down, left, right and the field types are:
‘S’ : starting point
‘ ‘ : free space
‘W’ : wall (blocks)
‘H’ : hole (terminates episode; to be replaced by ‘W’ in save-mode)
‘F’ : fire (usually causing negative reward)
‘G’ : goal state (terminates episode)
TODO: Create an option to introduce a continuous action space.

MAPS = {'16x16': ['S H ', ' HH ', ' FF W W', ' W ', 'WWW FF H ', ' W ', ' FFFF W ', ' H H ', ' H ', ' H HH ', 'WWWW WWWWWWW', ' H W W ', ' FF W H W ', 'WWWW WW W ', ' FF W ', ' H H G'], '2x2': ['SH', ' G'], '4x4': ['S ', ' H H', ' H', 'H G'], '8x16': ['S H ', ' H HH ', ' FF WWWWWWW', ' H W ', ' FF W H ', ' W ', ' FF W ', ' H H G'], '8x8': ['S ', ' ', ' H ', ' H ', ' H ', ' HH H ', ' H H H ', ' H G'], 'chain': ['G S F G']}
get_discrete_pos(x, y)[source]

Returns a single, discrete int-value. Calculated by walking down the rows of the grid first (starting in upper left corner), then along the col-axis.

Args:
x (int): The x-coordinate.
y (int): The y-coordinate.
Returns:
int: The discrete pos value corresponding to the given x and y.

get_dist_to_goal()[source]

get_possible_next_positions(discrete_pos, action)[source]

Given a discrete position value and an action, returns a list of possible next states and their probabilities. Only next states with non-zero probabilities are returned. For now, this is implemented as a deterministic MDP.

Args:
discrete_pos (int): The discrete position to return possible next states for.
action (int): The action choice.
Returns:
List[Tuple[int,float]]: A list of tuples (s’, p(s’|s,a)), where s’ is the next discrete position and p(s’|s,a) is the probability of ending up in that position when in state s and taking action a.

refresh_state()[source]

render()[source]

Renders the Environment in its current state; implementation is optional for subclasses.

reset(randomize=False)[source]

Args:
randomize (bool): Whether to start the new episode in a random position (instead of “S”). This could be an empty space (” “), the default start (“S”) or a fire field (“F”).

seed(seed=None)[source]

Sets the random seed of the environment to the given value.

Args:
seed (int): The seed to use (default: current epoch seconds).
Returns:
int: The seed actually used.

step(actions, set_discrete_pos=None)[source]

Action map: 0 = up, 1 = right, 2 = down, 3 = left.

Args:
actions (int): An integer 0-3 that describes the next action.
set_discrete_pos (Optional[int]): An integer to set the current discrete position to before acting.
Returns:
tuple: State Space (Space), reward (float), is_terminal (bool), info (usually None).

update_cam_pixels()[source]

x
y
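
A short usage sketch for the default 4x4 world follows, using the action map from step() above; the uniformly random policy is purely illustrative:

    import random

    from rlgraph.environments.grid_world import GridWorld

    env = GridWorld(world="4x4", reward_function="sparse", state_representation="discr")
    state = env.reset()

    terminal = False
    while not terminal:
        action = random.randint(0, 3)  # 0=up, 1=right, 2=down, 3=left
        state, reward, terminal, info = env.step(actions=action)

    # Transition lookup (deterministic MDP for now): a list of
    # (next_discrete_pos, probability) tuples; 0 is the upper-left start field.
    transitions = env.get_possible_next_positions(discrete_pos=0, action=1)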

5.4. OpenAI Gym Environments

class rlgraph.environments.openai_gym.OpenAIGymEnv(gym_env, frameskip=None, max_num_noops=0, noop_action=0, episodic_life=False, fire_reset=False, monitor=None, monitor_safe=False, monitor_video=0, visualize=False, **kwargs)[source]

Bases: rlgraph.environments.environment.Environment

OpenAI Gym adapter for RLgraph: https://gym.openai.com/.

episodic_reset()[source]

noop_reset()[source]

Steps through reset and warm-start.

render()[source]

Renders the Environment in its current state; implementation is optional for subclasses.

reset()[source]

Resets the state of the environment, returning an initial observation.

Returns:
tuple: The Env’s state after the reset.

reset_for_env_stepper()[source]

seed(seed=None)[source]

Sets the random seed of the environment to the given value.

Args:
seed (int): The seed to use (default: current epoch seconds).
Returns:
int: The seed actually used.

step(actions)[source]

Run one time step of the environment’s dynamics. When the end of an episode is reached, reset() should be called to reset the environment’s internal state.

Args:
actions (any): The action(s) to be executed by the environment. Actions have to be members of this Environment’s action_space (a call to self.action_space.contains(action) must return True).
Returns:
tuple:
  • The state s’ after executing the given action(s).
  • The reward received after taking action a in state s.
  • Whether s’ is a terminal state.
  • Some Environment-specific info.

step_for_env_stepper(actions)[source]

terminate()[source]

Clean-up operation; implementation is optional for subclasses.

static translate_space(space, dtype=None)[source]

Translates OpenAI Gym spaces into RLgraph Space classes.

Args:
space (gym.spaces.Space): The OpenAI Gym Space to be translated.
Returns:
Space: The translated rlgraph Space.
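
A usage sketch for the adapter follows; "CartPole-v0" is only an illustrative Gym environment id, the gym_env parameter is assumed to accept such an id string, and action_space.sample() is an assumed Space helper rather than part of this reference:

    import gym

    from rlgraph.environments.openai_gym import OpenAIGymEnv

    env = OpenAIGymEnv(gym_env="CartPole-v0", visualize=False)

    state = env.reset()
    terminal = False
    while not terminal:
        action = env.action_space.sample()  # assumed sampling helper
        state, reward, terminal, info = env.step(action)
    env.terminate()

    # translate_space() maps a Gym space to the corresponding RLgraph Space.
    cart_pole_action_space = OpenAIGymEnv.translate_space(gym.spaces.Discrete(2))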

5.5. DeepMind Lab Environments