smarts.env.gymnasium.hiway_env_v1 module

class smarts.env.gymnasium.hiway_env_v1.HiWayEnvV1(*args: Any, **kwargs: Any)[source]

A generic environment for various driving tasks simulated by SMARTS.

Parameters:
  • scenarios (Sequence[str]) – A list of scenario directories that will be simulated.

  • agent_interfaces (Dict[str, AgentInterface]) – Specification of the agents' needs that will be used to configure the environment.

  • sim_name (str, optional) – Simulation name. Defaults to None.

  • scenarios_order (ScenarioOrder, optional) – Configures the order of scenarios provided over successive resets. See ScenarioOrder.

  • headless (bool, optional) – If True, disables visualization in Envision. Defaults to False.

  • visdom (bool) – Deprecated. Use SMARTS_VISDOM_ENABLED.

  • fixed_timestep_sec (float, optional) – Step duration for all components of the simulation. May be None if time deltas are externally-driven. Defaults to None.

  • seed (int, optional) – Random number generator seed. Defaults to 42.

  • sumo_options (SumoOptions, Dict[str, Any]) – The configuration for the SUMO instance. A dictionary with the same fields may be used instead. See SumoOptions.

  • visualization_client_builder – A method that constructs an object following the Envision interface. Allows tapping into a direct data stream from the simulation.

  • observation_options (ObservationOptions, str) – Defines how observations are formatted to match the observation space. The string name of the option can be used instead. See ObservationOptions. Defaults to default.

  • action_options (ActionOptions, str) – Defines how actions are formatted to match the action space. The string name of the option can be used instead. See ActionOptions. Defaults to default.

  • environment_return_mode (EnvReturnMode, str) – Selects between environment-level step return information (i.e. reward is a single environment reward) and per-agent step return information (i.e. reward is a mapping of rewards keyed by agent id). Defaults to per_agent.
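
Example (a minimal construction sketch; the scenario directory and agent id below are illustrative placeholders, not part of this API):

    from smarts.core.agent_interface import AgentInterface, AgentType
    from smarts.env.gymnasium.hiway_env_v1 import HiWayEnvV1

    # One agent with a simple lane-following interface (assumed example setup).
    agent_interfaces = {
        "Agent-0": AgentInterface.from_type(AgentType.Laner, max_episode_steps=150),
    }
    env = HiWayEnvV1(
        scenarios=["scenarios/sumo/loop"],  # placeholder scenario directory
        agent_interfaces=agent_interfaces,
        headless=True,                      # disable Envision visualization
        seed=42,
    )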

action_space: gymnasium.spaces.Space
property agent_ids: Set[str]

Agent ids of all agents that may potentially be in the environment.

Returns:

Agent ids.

Return type:

(Set[str])

property agent_interfaces: Dict[str, AgentInterface]

Agent interfaces used for the environment.

Returns:

Agent interfaces defining each agent's effect on the observation and action spaces of this environment.

Return type:

(Dict[str, AgentInterface])

close()[source]

After the user has finished using the environment, close contains the code necessary to “clean up” the environment. This is critical for closing rendering windows, database or HTTP connections.

metadata = {'render_modes': ['rgb_array']}

Metadata for gym’s use.

property np_random: Generator

Returns the environment’s internal random number generator; if it has not been set, it will be initialized with a random seed.

Returns:

The internal instance of np.random.Generator.

observation_space: gymnasium.spaces.Space
render() gymnasium.core.RenderFrame | List[gymnasium.core.RenderFrame] | None[source]

Compute the render frames as specified by render_mode during the initialization of the environment. The environment’s metadata render modes (env.metadata[“render_modes”]) should contain the possible ways to implement the render modes. In addition, list versions for most render modes are achieved through gymnasium.make, which automatically applies a wrapper to collect rendered frames.

Note

As the render_mode is known during __init__, the objects used to render the environment state should be initialized in __init__.

By convention, if the render_mode is:
  • None (default): no render is computed.

  • human: The environment is continuously rendered in the current display or terminal, usually for human consumption. This rendering should occur during step() and render() doesn’t need to be called. Returns None.

  • rgb_array: Return a single frame representing the current state of the environment. A frame is a np.ndarray with shape (x, y, 3) representing RGB values for an x-by-y pixel image.

  • ansi: Return a string (str) or StringIO.StringIO containing a terminal-style text representation for each time step. The text can include newlines and ANSI escape sequences (e.g. for colors).

  • rgb_array_list and ansi_list: List-based versions of the render modes are possible (except human) through the wrapper gymnasium.wrappers.RenderCollection, which is automatically applied during gymnasium.make(..., render_mode="rgb_array_list"). The collected frames are popped after render() or reset() is called.

Note

Make sure that your class’s metadata "render_modes" key includes the list of supported modes.
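
A generic sketch of the rgb_array convention described above; how rendering is enabled for this environment at construction is not covered on this page, so treat that part as an assumption:

    # Assumes the environment was created with rendering enabled.
    assert "rgb_array" in env.metadata["render_modes"]
    frame = env.render()
    if frame is not None:
        height, width, channels = frame.shape  # RGB pixel array, channels == 3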

render_mode: str | None = None
reset(*, seed: int | None = None, options: Dict[str, Any] | None = None) Tuple[gymnasium.core.ObsType, Dict[str, Any]][source]

Resets the environment to an initial internal state, returning an initial observation and info. This method generates a new starting state, often with some randomness, to ensure that the agent explores the state space and learns a generalized policy about the environment. This randomness can be controlled with the seed parameter; otherwise, if the environment already has a random number generator and reset() is called with seed=None, the RNG is not reset. Therefore, reset() should (in the typical use case) be called with a seed right after initialization and then never again.

Parameters:
  • seed (int, optional) – The seed that is used to initialize the environment’s PRNG (np_random). If the environment does not already have a PRNG and seed=None (the default option) is passed, a seed will be chosen from some source of entropy (e.g. timestamp or /dev/urandom). However, if the environment already has a PRNG and seed=None is passed, the PRNG will not be reset. If you pass an integer, the PRNG will be reset even if it already exists. Usually, you want to pass an integer right after the environment has been initialized and then never again.

  • options (dict, optional) – Additional information to specify how the environment is reset (optional, depending on the specific environment). Forwards to reset(). Supported keys:

      - "scenario" (Scenario): An explicit scenario to reset to. The default is a scenario from the scenario iterator.

      - "start_time" (float): Forwards the start time of the current scenario. The default is 0.

Returns:

  • observation. Observation of the initial state. This will be an element of observation_space and is analogous to the observation returned by step().

  • info. This dictionary contains auxiliary information complementing observation. It is analogous to the info returned by step().

Return type:

(ObsType, dict)
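
A short sketch of the typical reset pattern (seed once after construction, then reset without a seed); the start_time value is illustrative:

    observation, info = env.reset(seed=42)                        # seeds the PRNG and starts an episode
    observation, info = env.reset(options={"start_time": 0.0})    # later resets reuse the existing PRNG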

reward_range = (-inf, inf)
property scenario_log: Dict[str, float | str]

Simulation steps log.

Returns:

A dictionary with the following keys.

fixed_timestep_sec - Simulation time-step.
scenario_map - Name of the current scenario.
scenario_traffic - Traffic spec(s) used.
mission_hash - Hash identifier for the current scenario.

Return type:

(Dict[str, Union[float,str]])
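
For example, the documented keys can be read directly from the returned dictionary:

    log = env.scenario_log
    print(log["scenario_map"], log["fixed_timestep_sec"])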

property seed

Returns the environment seed.

Returns:

Environment seed.

Return type:

int

property smarts

Gives access to the underlying simulator. Use this carefully.

Returns:

The smarts simulator instance.

Return type:

smarts.core.smarts.SMARTS

spec: gymnasium.envs.registration.EnvSpec | None = None
step(action: gymnasium.core.ActType) Tuple[Dict[str, Any], SupportsFloat, bool, bool, Dict[str, Any]] | Tuple[Dict[str, Any], Dict[str, float], Dict[str, bool], Dict[str, bool], Dict[str, Any]][source]

Run one time-step of the environment’s dynamics using the agent actions.

When the end of an episode is reached (terminated or truncated), it is necessary to call reset() to reset this environment’s state for the next episode.

Parameters:

action (ActType) – an action provided by the agent to update the environment state.

Returns:

  • observation. An element of the environment’s observation_space as the next observation due to the agent actions. This observation will change based on the provided agent_interfaces. Check observation_space after initialization.

  • reward. The reward as a result of taking the action.

  • terminated. Whether the agent reaches the terminal state (as defined under the MDP of the task), which can be positive or negative. An example is reaching the goal state. If true, the user needs to call reset().

  • truncated. Whether the truncation condition outside the scope of the MDP is satisfied. Typically, this is a time-limit, but could also be used to indicate an agent physically going out of bounds. Can be used to end the episode prematurely before a terminal state is reached. If true, the user needs to call reset().

  • info. Contains auxiliary diagnostic information (helpful for debugging, learning, and logging). This might, for instance, contain: metrics that describe the agent’s performance state, variables that are hidden from observations, or individual reward terms that are combined to produce the total reward.

Return type:

(dict, SupportsFloat, bool, bool, dict)
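
A rollout sketch under the default per-agent return mode, where each element of the returned tuple is a mapping keyed by agent id; the aggregation of the per-agent done flags below is an assumption, not a documented key:

    observations, infos = env.reset(seed=42)
    done = False
    while not done:
        # Placeholder policy: sample a dict of actions from the formatted action space.
        actions = env.action_space.sample()
        observations, rewards, terminateds, truncateds, infos = env.step(actions)
        # Assumed aggregation: episode ends when every agent is terminated or truncated.
        done = all(terminateds.values()) or all(truncateds.values())
    env.close()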

property unwrapped: gymnasium.Env[gymnasium.core.ObsType, gymnasium.core.ActType]

Returns the base non-wrapped environment.

Returns:

The base non-wrapped gymnasium.Env instance

Return type:

gym.Env