5. Environment Classes¶
5.1. Environment Base Class Reference¶
class rlgraph.environments.environment.Environment(state_space, action_space, seed=None)[source]¶
Bases: rlgraph.utils.specifiable.Specifiable

An Env class used to run experiment-based RL.
reset()[source]¶
Resets the state of the environment, returning an initial observation.

Returns:
    tuple: The Env’s state after the reset.
seed(seed=None)[source]¶
Sets the random seed of the environment to the given value.

Args:
    seed (int): The seed to use (default: current epoch seconds).

Returns:
    int: The seed actually used.
step(**kwargs)[source]¶
Runs one time step of the environment’s dynamics. When the end of an episode is reached, reset() should be called to reset the environment’s internal state.

Args:
    kwargs (any): The action(s) to be executed by the environment. Actions have to be members of this Environment’s action_space (a call to self.action_space.contains(action) must return True).

Returns:
    tuple:
    - The state s’ after(!) executing the given action(s).
    - The reward received after taking action a in state s.
    - Whether s’ is a terminal state.
    - Some Environment-specific info.
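The reset/step/seed contract above can be sketched with a tiny stand-in environment and a plain interaction loop. `CoinFlipEnv` is hypothetical (not part of RLgraph); it only illustrates the 4-tuple returned by step() and the role of reset() at episode boundaries.

```python
import random

class CoinFlipEnv:
    """Hypothetical stand-in env following the same reset()/step()/seed()
    interface as rlgraph's Environment base class."""

    def __init__(self):
        self.rng = random.Random()
        self.t = 0

    def seed(self, seed=None):
        # Mirrors Environment.seed: sets the RNG seed, returns the seed used.
        self.rng.seed(seed)
        return seed

    def reset(self):
        # Mirrors Environment.reset: returns the initial state.
        self.t = 0
        return 0

    def step(self, actions):
        # Mirrors Environment.step's contract:
        # (next state s', reward, terminal flag, info).
        self.t += 1
        reward = 1.0 if actions == 1 else 0.0
        terminal = self.t >= 5
        return self.t, reward, terminal, None

env = CoinFlipEnv()
env.seed(42)
state = env.reset()
total_reward = 0.0
terminal = False
while not terminal:
    action = 1  # a fixed policy, purely for illustration
    state, reward, terminal, info = env.step(action)
    total_reward += reward
print(total_reward)  # 5 steps of reward 1.0 each -> 5.0
```

Once step() reports a terminal state, the loop must call reset() again before further stepping; that is the only way to start a new episode.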
5.2. Random Environment¶
class rlgraph.environments.random_env.RandomEnv(state_space, action_space, reward_space=None, terminal_prob=0.1, deterministic=False)[source]¶
Bases: rlgraph.environments.environment.Environment

An Env producing random states no matter what actions come in.
reset()[source]¶
Resets the state of the environment, returning an initial observation.

Returns:
    tuple: The Env’s state after the reset.
seed(seed=None)[source]¶
Sets the random seed of the environment to the given value.

Args:
    seed (int): The seed to use (default: current epoch seconds).

Returns:
    int: The seed actually used.
step(actions=None)[source]¶
Runs one time step of the environment’s dynamics. When the end of an episode is reached, reset() should be called to reset the environment’s internal state.

Args:
    actions (any): The action(s) to be executed by the environment. Actions have to be members of this Environment’s action_space (a call to self.action_space.contains(action) must return True).

Returns:
    tuple:
    - The state s’ after(!) executing the given action(s).
    - The reward received after taking action a in state s.
    - Whether s’ is a terminal state.
    - Some Environment-specific info.
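The behavior described above can be sketched in a few lines of pure Python. `RandomEnvSketch` is not the actual rlgraph.environments.random_env code; it only illustrates the idea that state, reward, and the terminal flag are sampled independently of the incoming action, with termination governed by `terminal_prob`.

```python
import random

class RandomEnvSketch:
    """Hypothetical sketch of a RandomEnv-like environment: emits random
    states regardless of the action and terminates each step with
    probability terminal_prob."""

    def __init__(self, terminal_prob=0.1, seed=None):
        self.terminal_prob = terminal_prob
        self.rng = random.Random(seed)

    def reset(self):
        # A fresh random state to start the episode.
        return self.rng.random()

    def step(self, actions=None):
        # The action is ignored entirely: state, reward, and terminal
        # flag are all sampled independently of it.
        state = self.rng.random()
        reward = self.rng.random()
        terminal = self.rng.random() < self.terminal_prob
        return state, reward, terminal, None

env = RandomEnvSketch(terminal_prob=0.5, seed=7)
env.reset()
s, r, t, info = env.step(actions=3)  # the action value changes nothing
```

Environments like this are useful for smoke-testing an agent's plumbing: any learning signal an agent extracts from them is, by construction, noise.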
5.3. GridWorld Environments¶
class rlgraph.environments.grid_world.GridWorld(world='4x4', save_mode=False, reward_function='sparse', state_representation='discr')[source]¶
Bases: rlgraph.environments.environment.Environment

A classic grid world whose action space is up, down, left, right and whose field types are:
‘S’ : starting point
‘ ’ : free space
‘W’ : wall (blocks movement)
‘H’ : hole (terminates the episode; replaced by ‘W’ in save-mode)
‘F’ : fire (usually causes negative reward)
‘G’ : goal state (terminates the episode)

TODO: Create an option to introduce a continuous action space.
MAPS = {'16x16': ['S      H        ', '           HH   ', '    FF     W  W ', '         W      ', 'WWW  FF      H  ', '         W      ', ' FFFF        W  ', '  H      H      ', '       H        ', '   H       HH   ', 'WWWW   WWWWWWW  ', '  H      W    W ', ' FF   W  H    W ', 'WWWW   WW  W    ', '  FF       W    ', '   H   H      G'], '2x2': ['SH', ' G'], '4x4': ['S   ', ' H H', '   H', 'H  G'], '8x16': ['S      H        ', '   H       HH   ', '    FF     WWWWW', '  H      W      ', '    FF   W  H   ', '         W      ', ' FF      W      ', '  H    H      G'], '8x8': ['S       ', '        ', '    H   ', '     H  ', '  H     ', ' HH   H ', ' H  H H ', '   H   G'], 'chain': ['G    S  F G']}¶
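Each MAPS entry is simply a list of row strings using the field characters documented above. A small helper (hypothetical, not part of rlgraph) can locate field types in such a map, with (0, 0) in the upper-left corner:

```python
# The '4x4' map from MAPS above, written out as row strings.
MAP_4x4 = ['S   ', ' H H', '   H', 'H  G']

def find_fields(rows, field_type):
    """Return all (x, y) coordinates holding the given field character;
    x grows to the right, y grows downward from the upper-left corner."""
    return [(x, y) for y, row in enumerate(rows)
            for x, ch in enumerate(row) if ch == field_type]

start = find_fields(MAP_4x4, 'S')  # [(0, 0)]
goal = find_fields(MAP_4x4, 'G')   # [(3, 3)]
holes = find_fields(MAP_4x4, 'H')  # four hole fields
```

The same helper works on any MAPS entry, since they all share the one-character-per-cell encoding.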
get_discrete_pos(x, y)[source]¶
Returns a single, discrete int-value. Calculated by walking down the rows of the grid first (starting in the upper left corner), then along the col-axis.

Args:
    x (int): The x-coordinate.
    y (int): The y-coordinate.

Returns:
    int: The discrete pos value corresponding to the given x and y.
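One consistent reading of this docstring is that positions are counted down each column of rows first, so every full column occupies a contiguous block of values. The helper below is a sketch under that assumption (the `n_rows` parameter is introduced here for illustration, not part of the actual method signature):

```python
def get_discrete_pos(x, y, n_rows):
    """Sketch of the down-the-rows-first numbering described in the
    docstring: each column of the grid occupies n_rows consecutive
    values, starting from the upper-left corner at position 0."""
    return x * n_rows + y

# In a 4x4 grid: (0,0) -> 0, (0,1) -> 1, (1,0) -> 4, (3,3) -> 15.
```

If the actual implementation numbers cells row-major instead, the formula would be `y * n_cols + x`; the docstring's wording suggests the column-major variant shown here.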
get_possible_next_positions(discrete_pos, action)[source]¶
Given a discrete position value and an action, returns a list of possible next states and their probabilities. Only next states with non-zero probabilities are returned. For now, this is implemented as a deterministic MDP.

Args:
    discrete_pos (int): The discrete position to return possible next states for.
    action (int): The action choice.

Returns:
    List[Tuple[int,float]]: A list of tuples (s’, p(s’|s,a)), where s’ is the next discrete position and p(s’|s,a) is the probability of ending up in that position when in state s and taking action a.
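A sketch of the deterministic case, using the 0:up / 1:right / 2:down / 3:left action map from step() below and the down-the-rows-first position numbering (both the helper and the blocking rule are illustrative assumptions, not the actual rlgraph code): moving off the grid or into a wall ‘W’ leaves the position unchanged, and exactly one next state is returned with probability 1.0.

```python
# The '2x2' map from MAPS above: 'S' at (0,0), 'H' at (1,0), 'G' at (1,1).
MAP_2x2 = ['SH', ' G']

def get_possible_next_positions(rows, discrete_pos, action):
    """Deterministic-MDP sketch: returns [(s', 1.0)] for the single
    reachable next position under the given action."""
    n_rows = len(rows)
    # Invert the assumed down-the-rows-first numbering: pos = x*n_rows + y.
    x, y = divmod(discrete_pos, n_rows)
    # Action map from GridWorld.step: 0=up, 1=right, 2=down, 3=left
    # (y grows downward, so "up" decrements y).
    dx, dy = [(0, -1), (1, 0), (0, 1), (-1, 0)][action]
    nx, ny = x + dx, y + dy
    in_bounds = 0 <= ny < n_rows and 0 <= nx < len(rows[0])
    if not in_bounds or rows[ny][nx] == 'W':
        nx, ny = x, y  # blocked by border or wall: stay put
    return [(nx * n_rows + ny, 1.0)]

# From 'S' (pos 0), moving right lands on the hole 'H' at pos 2.
next_states = get_possible_next_positions(MAP_2x2, 0, 1)
```

In a stochastic variant (e.g. slippery tiles), the returned list would contain several `(s', p)` tuples whose probabilities sum to 1.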
reset(randomize=False)[source]¶

Args:
    randomize (bool): Whether to start the new episode in a random position (instead of “S”). This could be an empty space (“ ”), the default start (“S”), or a fire field (“F”).
seed(seed=None)[source]¶
Sets the random seed of the environment to the given value.

Args:
    seed (int): The seed to use (default: current epoch seconds).

Returns:
    int: The seed actually used.
step(actions, set_discrete_pos=None)[source]¶
Action map: 0 = up, 1 = right, 2 = down, 3 = left.

Args:
    actions (int): An integer 0-3 that describes the next action.
    set_discrete_pos (Optional[int]): An integer to set the current discrete position to before acting.

Returns:
    tuple: state (Space), reward (float), is_terminal (bool), info (usually None).
x¶

y¶
5.4. OpenAI Gym Environments¶
class rlgraph.environments.openai_gym.OpenAIGymEnv(gym_env, frameskip=None, max_num_noops=0, noop_action=0, episodic_life=False, fire_reset=False, monitor=None, monitor_safe=False, monitor_video=0, visualize=False, **kwargs)[source]¶
Bases: rlgraph.environments.environment.Environment

OpenAI Gym adapter for RLgraph: https://gym.openai.com/.
reset()[source]¶
Resets the state of the environment, returning an initial observation.

Returns:
    tuple: The Env’s state after the reset.
seed(seed=None)[source]¶
Sets the random seed of the environment to the given value.

Args:
    seed (int): The seed to use (default: current epoch seconds).

Returns:
    int: The seed actually used.
step(actions)[source]¶
Runs one time step of the environment’s dynamics. When the end of an episode is reached, reset() should be called to reset the environment’s internal state.

Args:
    actions (any): The action(s) to be executed by the environment. Actions have to be members of this Environment’s action_space (a call to self.action_space.contains(action) must return True).

Returns:
    tuple:
    - The state s’ after(!) executing the given action(s).
    - The reward received after taking action a in state s.
    - Whether s’ is a terminal state.
    - Some Environment-specific info.
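The `max_num_noops`/`noop_action` constructor arguments above follow a common Atari-evaluation convention: after a reset, a random number of no-op actions runs before the agent takes over, randomizing the start state. The wrapper below is a hypothetical sketch of that idea (not the actual OpenAIGymEnv code), paired with a minimal stand-in env so it runs on its own:

```python
import random

class NoopResetWrapper:
    """Hypothetical sketch of the max_num_noops/noop_action behavior:
    after the wrapped env resets, execute between 1 and max_num_noops
    no-op steps before handing control to the agent."""

    def __init__(self, env, max_num_noops=0, noop_action=0, seed=None):
        self.env = env
        self.max_num_noops = max_num_noops
        self.noop_action = noop_action
        self.rng = random.Random(seed)

    def reset(self):
        state = self.env.reset()
        if self.max_num_noops > 0:
            for _ in range(self.rng.randint(1, self.max_num_noops)):
                state, _, terminal, _ = self.env.step(self.noop_action)
                if terminal:  # the no-ops themselves ended the episode
                    state = self.env.reset()
        return state

    def step(self, actions):
        return self.env.step(actions)

class CountingEnv:
    """Minimal stand-in env: the state counts steps taken since reset."""
    def __init__(self):
        self.t = 0
    def reset(self):
        self.t = 0
        return self.t
    def step(self, actions):
        self.t += 1
        return self.t, 0.0, False, None

wrapped = NoopResetWrapper(CountingEnv(), max_num_noops=5, noop_action=0, seed=1)
state = wrapped.reset()  # some number of no-op steps in [1, 5] have run
```

The other constructor flags (`episodic_life`, `fire_reset`, `frameskip`) name further Atari-style preprocessing steps that are conventionally layered as wrappers in the same way.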