5. Environment Classes¶
5.1. Environment Base Class Reference¶
class rlgraph.environments.environment.Environment(state_space, action_space, seed=None)[source]¶
Bases: rlgraph.utils.specifiable.Specifiable

An Env class used to run experiment-based RL.
reset()[source]¶
Resets the state of the environment, returning an initial observation.

Returns:
    tuple: The Env’s state after the reset.
seed(seed=None)[source]¶
Sets the random seed of the environment to the given value.

Args:
    seed (int): The seed to use (default: current epoch seconds).

Returns:
    int: The seed actually used.
step(**kwargs)[source]¶
Runs one time step of the environment’s dynamics. When the end of an episode is reached, reset() should be called to reset the environment’s internal state.

Args:
    kwargs (any): The action(s) to be executed by the environment. Actions have to be members of this Environment’s action_space (a call to self.action_space.contains(action) must return True).

Returns:
    tuple:
    - The state s’ after(!) executing the given action(s).
    - The reward received after taking action a in state s.
    - Whether s’ is a terminal state.
    - Some Environment-specific info.
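The reset/step/seed contract above can be sketched with a tiny stand-in environment and a plain interaction loop. `CoinFlipEnv` is hypothetical (not part of RLgraph); it only illustrates the 4-tuple returned by step() and the role of reset() at episode boundaries.

```python
import random

class CoinFlipEnv:
    """Hypothetical stand-in env following the same reset()/step()/seed()
    interface as rlgraph's Environment base class."""

    def __init__(self):
        self.rng = random.Random()
        self.t = 0

    def seed(self, seed=None):
        # Mirrors Environment.seed: sets the RNG seed, returns the seed used.
        self.rng.seed(seed)
        return seed

    def reset(self):
        # Mirrors Environment.reset: returns the initial state.
        self.t = 0
        return 0

    def step(self, actions):
        # Mirrors Environment.step's contract:
        # (next state s', reward, terminal flag, info).
        self.t += 1
        reward = 1.0 if actions == 1 else 0.0
        terminal = self.t >= 5
        return self.t, reward, terminal, None

env = CoinFlipEnv()
env.seed(42)
state = env.reset()
total_reward = 0.0
terminal = False
while not terminal:
    action = 1  # a fixed policy, purely for illustration
    state, reward, terminal, info = env.step(action)
    total_reward += reward
print(total_reward)  # 5 steps of reward 1.0 each -> 5.0
```

Once step() reports a terminal state, the loop must call reset() again before further stepping; that is the only way to start a new episode.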
5.2. Random Environment¶
class rlgraph.environments.random_env.RandomEnv(state_space, action_space, reward_space=None, terminal_prob=0.1, deterministic=False)[source]¶
Bases: rlgraph.environments.environment.Environment

An Env producing random states no matter what actions come in.
reset()[source]¶
Resets the state of the environment, returning an initial observation.

Returns:
    tuple: The Env’s state after the reset.
seed(seed=None)[source]¶
Sets the random seed of the environment to the given value.

Args:
    seed (int): The seed to use (default: current epoch seconds).

Returns:
    int: The seed actually used.
step(actions=None)[source]¶
Runs one time step of the environment’s dynamics. When the end of an episode is reached, reset() should be called to reset the environment’s internal state.

Args:
    actions (any): The action(s) to be executed by the environment. Actions have to be members of this Environment’s action_space (a call to self.action_space.contains(action) must return True).

Returns:
    tuple:
    - The state s’ after(!) executing the given action(s).
    - The reward received after taking action a in state s.
    - Whether s’ is a terminal state.
    - Some Environment-specific info.
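The behavior described above can be sketched in a few lines of pure Python. `RandomEnvSketch` is not the actual rlgraph.environments.random_env code; it only illustrates the idea that state, reward, and the terminal flag are sampled independently of the incoming action, with termination governed by `terminal_prob`.

```python
import random

class RandomEnvSketch:
    """Hypothetical sketch of a RandomEnv-like environment: emits random
    states regardless of the action and terminates each step with
    probability terminal_prob."""

    def __init__(self, terminal_prob=0.1, seed=None):
        self.terminal_prob = terminal_prob
        self.rng = random.Random(seed)

    def reset(self):
        # A fresh random state to start the episode.
        return self.rng.random()

    def step(self, actions=None):
        # The action is ignored entirely: state, reward, and terminal
        # flag are all sampled independently of it.
        state = self.rng.random()
        reward = self.rng.random()
        terminal = self.rng.random() < self.terminal_prob
        return state, reward, terminal, None

env = RandomEnvSketch(terminal_prob=0.5, seed=7)
env.reset()
s, r, t, info = env.step(actions=3)  # the action value changes nothing
```

Environments like this are useful for smoke-testing an agent's plumbing: any learning signal an agent extracts from them is, by construction, noise.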
5.3. GridWorld Environments¶
class rlgraph.environments.grid_world.GridWorld(world='4x4', save_mode=False, reward_function='sparse', state_representation='discr')[source]¶
Bases: rlgraph.environments.environment.Environment

A classic grid world whose action space is up, down, left, right and whose field types are:
‘S’ : starting point
‘ ’ : free space
‘W’ : wall (blocks movement)
‘H’ : hole (terminates the episode; replaced by ‘W’ in save-mode)
‘F’ : fire (usually causes negative reward)
‘G’ : goal state (terminates the episode)

TODO: Create an option to introduce a continuous action space.
MAPS = {'16x16': ['S      H        ', '           HH   ', '    FF     W  W ', '         W      ', 'WWW  FF      H  ', '         W      ', ' FFFF        W  ', '  H      H      ', '       H        ', '   H       HH   ', 'WWWW   WWWWWWW  ', '  H      W    W ', ' FF   W  H    W ', 'WWWW   WW  W    ', '  FF       W    ', '   H   H      G'], '2x2': ['SH', ' G'], '4x4': ['S   ', ' H H', '   H', 'H  G'], '8x16': ['S      H        ', '   H       HH   ', '    FF     WWWWW', '  H      W      ', '    FF   W  H   ', '         W      ', ' FF      W      ', '  H    H      G'], '8x8': ['S       ', '        ', '    H   ', '     H  ', '  H     ', ' HH   H ', ' H  H H ', '   H   G'], 'chain': ['G    S  F G']}¶
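Each MAPS entry is simply a list of row strings using the field characters documented above. A small helper (hypothetical, not part of rlgraph) can locate field types in such a map, with (0, 0) in the upper-left corner:

```python
# The '4x4' map from MAPS above, written out as row strings.
MAP_4x4 = ['S   ', ' H H', '   H', 'H  G']

def find_fields(rows, field_type):
    """Return all (x, y) coordinates holding the given field character;
    x grows to the right, y grows downward from the upper-left corner."""
    return [(x, y) for y, row in enumerate(rows)
            for x, ch in enumerate(row) if ch == field_type]

start = find_fields(MAP_4x4, 'S')  # [(0, 0)]
goal = find_fields(MAP_4x4, 'G')   # [(3, 3)]
holes = find_fields(MAP_4x4, 'H')  # four hole fields
```

The same helper works on any MAPS entry, since they all share the one-character-per-cell encoding.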
get_discrete_pos(x, y)[source]¶
Returns a single, discrete int-value. Calculated by walking down the rows of the grid first (starting in the upper left corner), then along the col-axis.

Args:
    x (int): The x-coordinate.
    y (int): The y-coordinate.

Returns:
    int: The discrete pos value corresponding to the given x and y.
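One consistent reading of this docstring is that positions are counted down each column of rows first, so every full column occupies a contiguous block of values. The helper below is a sketch under that assumption (the `n_rows` parameter is introduced here for illustration, not part of the actual method signature):

```python
def get_discrete_pos(x, y, n_rows):
    """Sketch of the down-the-rows-first numbering described in the
    docstring: each column of the grid occupies n_rows consecutive
    values, starting from the upper-left corner at position 0."""
    return x * n_rows + y

# In a 4x4 grid: (0,0) -> 0, (0,1) -> 1, (1,0) -> 4, (3,3) -> 15.
```

If the actual implementation numbers cells row-major instead, the formula would be `y * n_cols + x`; the docstring's wording suggests the column-major variant shown here.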
get_possible_next_positions(discrete_pos, action)[source]¶
Given a discrete position value and an action, returns a list of possible next states and their probabilities. Only next states with non-zero probabilities are returned. For now, this is implemented as a deterministic MDP.

Args:
    discrete_pos (int): The discrete position to return possible next states for.
    action (int): The action choice.

Returns:
    List[Tuple[int,float]]: A list of tuples (s’, p(s’|s,a)), where s’ is the next discrete position and p(s’|s,a) is the probability of ending up in that position when in state s and taking action a.
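A sketch of the deterministic case, using the 0:up / 1:right / 2:down / 3:left action map from step() below and the down-the-rows-first position numbering (both the helper and the blocking rule are illustrative assumptions, not the actual rlgraph code): moving off the grid or into a wall ‘W’ leaves the position unchanged, and exactly one next state is returned with probability 1.0.

```python
# The '2x2' map from MAPS above: 'S' at (0,0), 'H' at (1,0), 'G' at (1,1).
MAP_2x2 = ['SH', ' G']

def get_possible_next_positions(rows, discrete_pos, action):
    """Deterministic-MDP sketch: returns [(s', 1.0)] for the single
    reachable next position under the given action."""
    n_rows = len(rows)
    # Invert the assumed down-the-rows-first numbering: pos = x*n_rows + y.
    x, y = divmod(discrete_pos, n_rows)
    # Action map from GridWorld.step: 0=up, 1=right, 2=down, 3=left
    # (y grows downward, so "up" decrements y).
    dx, dy = [(0, -1), (1, 0), (0, 1), (-1, 0)][action]
    nx, ny = x + dx, y + dy
    in_bounds = 0 <= ny < n_rows and 0 <= nx < len(rows[0])
    if not in_bounds or rows[ny][nx] == 'W':
        nx, ny = x, y  # blocked by border or wall: stay put
    return [(nx * n_rows + ny, 1.0)]

# From 'S' (pos 0), moving right lands on the hole 'H' at pos 2.
next_states = get_possible_next_positions(MAP_2x2, 0, 1)
```

In a stochastic variant (e.g. slippery tiles), the returned list would contain several `(s', p)` tuples whose probabilities sum to 1.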
reset(randomize=False)[source]¶

Args:
    randomize (bool): Whether to start the new episode in a random position (instead of “S”). This could be an empty space (“ ”), the default start (“S”), or a fire field (“F”).
seed(seed=None)[source]¶
Sets the random seed of the environment to the given value.

Args:
    seed (int): The seed to use (default: current epoch seconds).

Returns:
    int: The seed actually used.
step(actions, set_discrete_pos=None)[source]¶
Action map: 0 = up, 1 = right, 2 = down, 3 = left.

Args:
    actions (int): An integer 0-3 that describes the next action.
    set_discrete_pos (Optional[int]): An integer to set the current discrete position to before acting.

Returns:
    tuple: state (Space), reward (float), is_terminal (bool), info (usually None).
x¶

y¶
5.4. OpenAI Gym Environments¶
class rlgraph.environments.openai_gym.OpenAIGymEnv(gym_env, frameskip=None, max_num_noops=0, noop_action=0, episodic_life=False, fire_reset=False, monitor=None, monitor_safe=False, monitor_video=0, visualize=False, **kwargs)[source]¶
Bases: rlgraph.environments.environment.Environment

OpenAI Gym adapter for RLgraph: https://gym.openai.com/.
reset()[source]¶
Resets the state of the environment, returning an initial observation.

Returns:
    tuple: The Env’s state after the reset.
seed(seed=None)[source]¶
Sets the random seed of the environment to the given value.

Args:
    seed (int): The seed to use (default: current epoch seconds).

Returns:
    int: The seed actually used.
step(actions)[source]¶
Runs one time step of the environment’s dynamics. When the end of an episode is reached, reset() should be called to reset the environment’s internal state.

Args:
    actions (any): The action(s) to be executed by the environment. Actions have to be members of this Environment’s action_space (a call to self.action_space.contains(action) must return True).

Returns:
    tuple:
    - The state s’ after(!) executing the given action(s).
    - The reward received after taking action a in state s.
    - Whether s’ is a terminal state.
    - Some Environment-specific info.
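The `max_num_noops`/`noop_action` constructor arguments above follow a common Atari-evaluation convention: after a reset, a random number of no-op actions runs before the agent takes over, randomizing the start state. The wrapper below is a hypothetical sketch of that idea (not the actual OpenAIGymEnv code), paired with a minimal stand-in env so it runs on its own:

```python
import random

class NoopResetWrapper:
    """Hypothetical sketch of the max_num_noops/noop_action behavior:
    after the wrapped env resets, execute between 1 and max_num_noops
    no-op steps before handing control to the agent."""

    def __init__(self, env, max_num_noops=0, noop_action=0, seed=None):
        self.env = env
        self.max_num_noops = max_num_noops
        self.noop_action = noop_action
        self.rng = random.Random(seed)

    def reset(self):
        state = self.env.reset()
        if self.max_num_noops > 0:
            for _ in range(self.rng.randint(1, self.max_num_noops)):
                state, _, terminal, _ = self.env.step(self.noop_action)
                if terminal:  # the no-ops themselves ended the episode
                    state = self.env.reset()
        return state

    def step(self, actions):
        return self.env.step(actions)

class CountingEnv:
    """Minimal stand-in env: the state counts steps taken since reset."""
    def __init__(self):
        self.t = 0
    def reset(self):
        self.t = 0
        return self.t
    def step(self, actions):
        self.t += 1
        return self.t, 0.0, False, None

wrapped = NoopResetWrapper(CountingEnv(), max_num_noops=5, noop_action=0, seed=1)
state = wrapped.reset()  # some number of no-op steps in [1, 5] have run
```

The other constructor flags (`episodic_life`, `fire_reset`, `frameskip`) name further Atari-style preprocessing steps that are conventionally layered as wrappers in the same way.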