smarts.env.gymnasium.hiway_env_v1 module
- class smarts.env.gymnasium.hiway_env_v1.HiWayEnvV1(*args: Any, **kwargs: Any)[source]
A generic environment for various driving tasks simulated by SMARTS.
- Parameters:
scenarios (Sequence[str]) – A list of scenario directories that will be simulated.
agent_interfaces (Dict[str, AgentInterface]) – Specification of the agents' needs, used to configure the environment.
sim_name (str, optional) – Simulation name. Defaults to None.
scenarios_order (ScenarioOrder, optional) – Configures the order of scenarios provided over successive resets. See ScenarioOrder.
headless (bool, optional) – If True, disables visualization in Envision. Defaults to False.
visdom (bool) – Deprecated. Use SMARTS_VISDOM_ENABLED.
fixed_timestep_sec (float, optional) – Step duration for all components of the simulation. May be None if time deltas are externally driven. Defaults to None.
seed (int, optional) – Random number generator seed. Defaults to 42.
sumo_options (SumoOptions, Dict[str, Any]) – The configuration for the SUMO instance. A dictionary with the same fields can be used instead. See SumoOptions.
visualization_client_builder – A method that must construct an object that follows the Envision interface. Allows tapping into a direct data stream from the simulation.
observation_options (ObservationOptions, str) – Defines the options for how the formatting matches the observation space. The string version can be used instead. See ObservationOptions. Defaults to default.
action_options (ActionOptions, str) – Defines the options for how the formatting matches the action space. The string version can be used instead. See ActionOptions. Defaults to default.
environment_return_mode (EnvReturnMode, str) – Selects between environment-level step returns (i.e. reward is a single environment reward) and per-agent step returns (i.e. reward is a key-value mapping of rewards per agent). Defaults to per_agent.
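A minimal sketch of assembling constructor arguments, assuming only the parameter names and defaults listed above. `build_env_kwargs` is a hypothetical helper that is not part of SMARTS, the scenario path is a placeholder, and `object()` stands in for a real `AgentInterface`.

```python
from typing import Any, Dict, Optional, Sequence


def build_env_kwargs(
    scenarios: Sequence[str],
    agent_interfaces: Dict[str, Any],
    *,
    headless: bool = False,
    seed: int = 42,
    fixed_timestep_sec: Optional[float] = None,
    environment_return_mode: str = "per_agent",
) -> Dict[str, Any]:
    """Collect keyword arguments using the documented names and defaults.

    Hypothetical helper: it only validates and packs values and never
    imports or calls SMARTS.
    """
    if not scenarios:
        raise ValueError("at least one scenario directory is required")
    if not agent_interfaces:
        raise ValueError("at least one agent interface is required")
    return {
        "scenarios": list(scenarios),
        "agent_interfaces": dict(agent_interfaces),
        "headless": headless,
        "seed": seed,
        "fixed_timestep_sec": fixed_timestep_sec,
        "environment_return_mode": environment_return_mode,
    }


# Placeholder values: the path and the interface object are illustrative only.
kwargs = build_env_kwargs(
    ["path/to/scenario"],
    {"agent_0": object()},  # stand-in for a real AgentInterface
    headless=True,
)
# With SMARTS installed, the environment would then be created with:
#   from smarts.env.gymnasium.hiway_env_v1 import HiWayEnvV1
#   env = HiWayEnvV1(**kwargs)
```

Keeping the arguments in a plain dictionary makes it easy to log or compare configurations before the simulator is started.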
- action_space: gymnasium.spaces.Space
- property agent_ids: Set[str]
Agent ids of all agents that potentially will be in the environment.
- Returns:
Agent ids.
- Return type:
(Set[str])
- property agent_interfaces: Dict[str, AgentInterface]
Agent interfaces used for the environment.
- Returns:
- Agent interfaces defining each agent's effect on the observation and action spaces of this environment.
- Return type:
(Dict[str, AgentInterface])
- close()[source]
After the user has finished using the environment, close contains the code necessary to “clean up” the environment. This is critical for closing rendering windows, database or HTTP connections.
- metadata = {'render_modes': ['rgb_array']}
Metadata for gym’s use.
- property np_random: Generator
Returns the environment’s internal random number generator, which is initialized with a random seed if it has not been set.
- Returns:
The internal instance of np.random.Generator.
- observation_space: gymnasium.spaces.Space
- render() -> gymnasium.core.RenderFrame | List[gymnasium.core.RenderFrame] | None [source]
Compute the render frames as specified by render_mode during the initialization of the environment.
The environment’s metadata render modes (env.metadata["render_modes"]) should contain the possible ways to implement the render modes. In addition, list versions of most render modes are achieved through gymnasium.make, which automatically applies a wrapper to collect rendered frames.
Note
As the render_mode is known during __init__, the objects used to render the environment state should be initialized in __init__.
By convention, if the render_mode is:
- None (default): no render is computed.
- rgb_array: Return a single frame representing the current state of the environment. A frame is a np.ndarray with shape (x, y, 3) representing RGB values for an x-by-y pixel image.
- ansi: Return a string (str) or StringIO.StringIO containing a terminal-style text representation for each time step. The text can include newlines and ANSI escape sequences (e.g. for colors).
- rgb_array_list and ansi_list: List-based versions of render modes are possible (except Human) through the wrapper gymnasium.wrappers.RenderCollection, which is automatically applied during gymnasium.make(..., render_mode="rgb_array_list"). The frames collected are popped after render() or reset() is called.
Note
Make sure that your class’s metadata "render_modes" key includes the list of supported modes.
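As a small illustration of the metadata convention described above, the following sketch checks whether a requested render mode is listed before relying on it. `supports_render_mode` is a hypothetical helper, and the dictionary simply repeats the `metadata` value documented for this class.

```python
from typing import Any, Dict, Optional


def supports_render_mode(metadata: Dict[str, Any], mode: Optional[str]) -> bool:
    """Return True if `mode` is None (no rendering) or listed in metadata."""
    if mode is None:
        return True
    return mode in metadata.get("render_modes", [])


# The metadata value documented for this class.
metadata = {"render_modes": ["rgb_array"]}
print(supports_render_mode(metadata, "rgb_array"))  # True
print(supports_render_mode(metadata, "ansi"))       # False
```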
- render_mode: str | None = None
- reset(*, seed: int | None = None, options: Dict[str, Any] | None = None) -> Tuple[gymnasium.core.ObsType, Dict[str, Any]] [source]
Resets the environment to an initial internal state, returning an initial observation and info.
This method generates a new starting state, often with some randomness, to ensure that the agent explores the state space and learns a generalized policy about the environment. This randomness can be controlled with the seed parameter. Otherwise, if the environment already has a random number generator and reset() is called with seed=None, the RNG is not reset.
Therefore, reset() should (in the typical use case) be called with a seed right after initialization and then never again.
- Parameters:
seed (int, optional) – The seed that is used to initialize the environment’s PRNG (np_random). If the environment does not already have a PRNG and seed=None (the default option) is passed, a seed will be chosen from some source of entropy (e.g. timestamp or /dev/urandom). However, if the environment already has a PRNG and seed=None is passed, the PRNG will not be reset. If you pass an integer, the PRNG will be reset even if it already exists. Usually, you want to pass an integer right after the environment has been initialized and then never again.
options (dict, optional) – Additional information to specify how the environment is reset (optional, depending on the specific environment). Forwards to reset().
- "scenario" (Scenario): An explicit scenario to reset to. The default is a scenario from the scenario iterator.
- "start_time" (float): Forwards the start time of the current scenario. The default is 0.
- Returns:
- observation. Observation of the initial state. This will be an element of observation_space and is analogous to the observation returned by step().
- dict: info. This dictionary contains auxiliary information complementing observation. It should be analogous to the info returned by step().
- Return type:
dict
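The seeding rules for reset() can be illustrated with a small stand-in. This is a sketch under stated assumptions, not the SMARTS implementation: `SeededStub` is hypothetical, it uses the standard-library `random.Random` in place of `np.random.Generator`, and its observation is just a random float.

```python
import random
from typing import Any, Dict, Optional, Tuple


class SeededStub:
    """Stand-in that mimics only the documented seeding rules of reset()."""

    def __init__(self) -> None:
        self._rng: Optional[random.Random] = None

    def reset(
        self, *, seed: Optional[int] = None, options: Optional[Dict[str, Any]] = None
    ) -> Tuple[float, Dict[str, Any]]:
        if seed is not None:
            # An integer seed always re-creates the generator.
            self._rng = random.Random(seed)
        elif self._rng is None:
            # No generator yet and no seed: seed from system entropy.
            self._rng = random.Random()
        # Otherwise the existing generator keeps its state.
        start_time = (options or {}).get("start_time", 0.0)
        observation = self._rng.random()
        return observation, {"start_time": start_time}


env_a, env_b = SeededStub(), SeededStub()
# Seed once, right after construction, then reset without a seed.
first_a, _ = env_a.reset(seed=42)
first_b, _ = env_b.reset(seed=42)
second_a, _ = env_a.reset()
second_b, info = env_b.reset(options={"start_time": 0.0})
```

Two stubs seeded identically produce the same sequence of observations across later unseeded resets, which is the reproducibility property the recommended "seed once" pattern relies on.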
- reward_range = (-inf, inf)
- property scenario_log: Dict[str, float | str]
Simulation steps log.
- Returns:
- A dictionary with the following keys.
- fixed_timestep_sec: Simulation time-step.
- scenario_map: Name of the current scenario.
- scenario_traffic: Traffic spec(s) used.
- mission_hash: Hash identifier for the current scenario.
- Return type:
(Dict[str, Union[float,str]])
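A short sketch of turning such a log into a single line for logging. `format_scenario_log` is a hypothetical helper, and the sample values are made up for illustration; a real dictionary would come from the `scenario_log` property.

```python
from typing import Dict, Union


def format_scenario_log(log: Dict[str, Union[float, str]]) -> str:
    """Render the documented scenario_log keys as one `key=value` line."""
    keys = ("fixed_timestep_sec", "scenario_map", "scenario_traffic", "mission_hash")
    # Missing keys are shown as "n/a" rather than raising.
    return " ".join(f"{key}={log.get(key, 'n/a')}" for key in keys)


# Made-up values for illustration only.
sample = {
    "fixed_timestep_sec": 0.1,
    "scenario_map": "loop",
    "scenario_traffic": "basic",
    "mission_hash": "abc123",
}
line = format_scenario_log(sample)
```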
- property seed
Returns the environment seed.
- Returns:
Environment seed.
- Return type:
int
- property smarts
Gives access to the underlying simulator. Use this carefully.
- Returns:
The smarts simulator instance.
- Return type:
- spec: gymnasium.envs.registration.EnvSpec | None = None
- step(action: gymnasium.core.ActType) -> Tuple[Dict[str, Any], SupportsFloat, bool, bool, Dict[str, Any]] | Tuple[Dict[str, Any], Dict[str, float], Dict[str, bool], Dict[str, bool], Dict[str, Any]] [source]
Run one time-step of the environment’s dynamics using the agent actions.
When the end of an episode is reached (terminated or truncated), it is necessary to call reset() to reset this environment’s state for the next episode.
- Parameters:
action (ActType) – an action provided by the agent to update the environment state.
- Returns:
- observation. An element of the environment’s observation_space as the next observation due to the agent actions. This observation will change based on the provided agent_interfaces. Check observation_space after initialization.
- reward. The reward as a result of taking the action.
- terminated. Whether the agent reaches the terminal state (as defined under the MDP of the task), which can be positive or negative. An example is reaching the goal state. If true, the user needs to call reset().
- truncated. Whether the truncation condition outside the scope of the MDP is satisfied. Typically, this is a time limit, but it could also be used to indicate an agent physically going out of bounds. Can be used to end the episode prematurely before a terminal state is reached. If true, the user needs to call reset().
- info. Contains auxiliary diagnostic information (helpful for debugging, learning, and logging). This might, for instance, contain: metrics that describe the agent’s performance state, variables that are hidden from observations, or individual reward terms that are combined to produce the total reward.
- Return type:
(dict, SupportsFloat, bool, bool, dict)
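Because step() can return either scalar flags or per-agent dictionaries, a rollout loop needs to handle both shapes. The sketch below is one way to do so, under stated assumptions: `episode_over` and `PerAgentStub` are hypothetical, the stub is not SMARTS, the rule that an episode ends once every listed agent is terminated or truncated is an assumption, and the action string is a placeholder.

```python
from typing import Any, Dict, Tuple, Union

Flag = Union[bool, Dict[str, bool]]


def episode_over(terminated: Flag, truncated: Flag) -> bool:
    """Collapse either return shape into a single end-of-episode flag.

    Assumption: in per-agent mode the episode ends once every agent
    present in the dictionaries is terminated or truncated.
    """
    if isinstance(terminated, dict) and isinstance(truncated, dict):
        agents = set(terminated) | set(truncated)
        return bool(agents) and all(
            terminated.get(a, False) or truncated.get(a, False) for a in agents
        )
    return bool(terminated) or bool(truncated)


class PerAgentStub:
    """Hypothetical two-agent environment that ends after three steps."""

    agent_ids = {"agent_0", "agent_1"}

    def __init__(self) -> None:
        self._steps = 0

    def reset(self, *, seed=None, options=None) -> Tuple[Dict[str, Any], Dict[str, Any]]:
        self._steps = 0
        return {a: 0 for a in self.agent_ids}, {}

    def step(self, action: Dict[str, Any]):
        self._steps += 1
        done = self._steps >= 3
        obs = {a: self._steps for a in self.agent_ids}
        rewards = {a: 1.0 for a in self.agent_ids}
        terminated = {a: done for a in self.agent_ids}
        truncated = {a: False for a in self.agent_ids}
        return obs, rewards, terminated, truncated, {}


env = PerAgentStub()
observations, info = env.reset(seed=42)
total_reward, finished = 0.0, False
while not finished:
    # One action per agent id; the string "keep_lane" is a placeholder.
    actions = {agent_id: "keep_lane" for agent_id in observations}
    observations, rewards, terminated, truncated, info = env.step(actions)
    total_reward += sum(rewards.values())
    finished = episode_over(terminated, truncated)
```

Once `finished` is true, reset() must be called before stepping again, as described above.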
- property unwrapped: gymnasium.Env[gymnasium.core.ObsType, gymnasium.core.ActType]
Returns the base non-wrapped environment.
- Returns:
The base non-wrapped gymnasium.Env instance.
- Return type:
gym.Env