Observation, Action, and Reward

Observation

The complete set of possible observations returned by the SMARTS environment is shown below.

| Observation | Type | Remarks |
|---|---|---|
| dt | float | Amount of simulation time the last step took. |
| step_count | int | Number of steps taken by SMARTS thus far in the current scenario. |
| steps_completed | int | Number of steps this agent has taken within SMARTS. |
| elapsed_sim_time | float | Amount of simulation time elapsed for the current scenario. |
| events | Events | Classified observations that can trigger agent done status. |
| ego_vehicle_state | EgoVehicleObservation | Ego vehicle status. |
| under_this_agent_control | bool | Whether this agent currently has control of the vehicle. |
| neighborhood_vehicle_states | Optional[List[VehicleObservation]] | List of neighborhood vehicle states. |
| waypoint_paths | Optional[List[List[Waypoint]]] | Dynamic evenly-spaced points on the road ahead of the vehicle. |
| distance_travelled | float | Road distance driven by the vehicle. |
| lidar_point_cloud | | Lidar point cloud consisting of [points, hits, (ray_origin, ray_vector)]. |
| drivable_area_grid_map | Optional[DrivableAreaGridMap] | Drivable area map. |
| occupancy_grid_map | Optional[OccupancyGridMap] | Occupancy map. |
| top_down_rgb | Optional[TopDownRGB] | RGB camera observation. |
| road_waypoints | Optional[RoadWaypoints] | Per-road waypoints information. |
| via_data | Vias | Listing of nearby collectible ViaPoints and ViaPoints collected in the last step. |
| signals | Optional[List[SignalObservation]] | List of nearby traffic signal (light) states on this timestep. |

Note

The occupancy_grid_map is recommended when using scenarios with pedestrians. A higher resolution is preferable to ensure pedestrians are visible.

Note

Some observations, such as occupancy_grid_map, drivable_area_grid_map, and top_down_rgb, require optional image-rendering packages; install them via pip install -e .[camera-obs].
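As a rough illustration of how these fields are accessed, the sketch below pulls a few commonly used values out of a single agent's observation. It assumes obs is the Observation object returned for one agent by the environment's reset() or step(); the helper name and the selected fields are only illustrative, not part of the SMARTS API.

```python
# Illustrative helper (assumption: `obs` is the Observation object SMARTS
# returns for one agent; field names follow the table above).
def summarize_observation(obs) -> dict:
    """Pull a few commonly used fields out of a SMARTS Observation."""
    ego = obs.ego_vehicle_state
    summary = {
        "sim_time": obs.elapsed_sim_time,
        "steps_completed": obs.steps_completed,
        "ego_speed": ego.speed,
        "ego_position": ego.position,
        "num_neighbours": len(obs.neighborhood_vehicle_states or []),
        "reached_goal": obs.events.reached_goal,
        "collided": len(obs.events.collisions) > 0,
    }
    # Optional camera-based observations are None unless the corresponding
    # sensors are enabled in the agent's interface (and the camera-obs extra
    # is installed).
    if obs.occupancy_grid_map is not None:
        summary["ogm_shape"] = obs.occupancy_grid_map.data.shape
    return summary
```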

Reward

The default reward from the SMARTS environment is a function of the distance travelled. Here, x is the distance travelled, in meters, since the last time step at which a non-zero reward was given.

\[
reward(x) =
\begin{cases}
x, & \text{if } |x| > 0.5 \\
0, & \text{otherwise}
\end{cases}
\]
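Expressed in code, the piecewise definition above amounts to the following; this is a paraphrase of the formula for illustration, not the environment's actual implementation:

```python
def default_reward(distance_travelled: float) -> float:
    """Distance travelled, in meters, since the last non-zero reward."""
    # Mirrors the piecewise definition above: the reward equals the distance
    # travelled when its magnitude exceeds 0.5 m, and is 0 otherwise.
    return distance_travelled if abs(distance_travelled) > 0.5 else 0.0
```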

Action

Prior to a simulation, an agent's action type, and the policy that produces compliant actions, can be configured via its agent specification, an instance of AgentSpec. Refer to Agent for details.

An agent can be configured to emit any one of the following action types from ActionSpaceType.

Tip

Depending on the agent's policy, the ActuatorDynamic action type might allow the agent to learn faster than the Continuous action type, because learning to correct steering could be simpler than learning a mapping to all absolute steering angle values.
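For instance, a minimal sketch of configuring an agent with the ActuatorDynamic action type might look like the following. The module paths, the AgentType preset, and the toy policy are assumptions that may differ between SMARTS versions; treat this as an outline of the configuration flow rather than drop-in code.

```python
# Sketch only: module paths, the AgentType preset, and the policy below are
# illustrative and may differ between SMARTS versions.
from smarts.core.agent import Agent
from smarts.core.agent_interface import AgentInterface, AgentType
from smarts.core.controllers import ActionSpaceType
from smarts.zoo.agent_spec import AgentSpec


class KeepStraightAgent(Agent):
    """Toy policy emitting (throttle, brake, steering_rate) actions."""

    def act(self, obs):
        # Gentle throttle, no braking, no change in steering rate.
        return (0.3, 0.0, 0.0)


agent_spec = AgentSpec(
    # Start from a preset interface, then override the action space type.
    interface=AgentInterface.from_type(
        AgentType.Full,                     # preset choice is an assumption
        max_episode_steps=500,
        action=ActionSpaceType.ActuatorDynamic,
    ),
    agent_builder=KeepStraightAgent,
)
```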