Observation, Action, and Reward

Observation

The complete set of possible observations returned by the SMARTS environment is shown below.

| Observation | Type | Remarks |
|---|---|---|
| dt | float | Amount of simulation time the last step took. |
| step_count | int | Number of steps taken by SMARTS thus far in the current scenario. |
| steps_completed | int | Number of steps this agent has taken within SMARTS. |
| elapsed_sim_time | float | Amount of simulation time elapsed for the current scenario. |
| events | Events | Classified observations that can trigger agent done status. |
| ego_vehicle_state | EgoVehicleObservation | Ego vehicle status. |
| under_this_agent_control | bool | Whether this agent currently has control of the vehicle. |
| neighborhood_vehicle_states | Optional[List[VehicleObservation]] | List of neighborhood vehicle states. |
| waypoint_paths | Optional[List[List[Waypoint]]] | Dynamic evenly-spaced points on the road ahead of the vehicle. |
| distance_travelled | float | Road distance driven by the vehicle. |
| lidar_point_cloud | | Lidar point cloud consisting of [points, hits, (ray_origin, ray_vector)]. |
| drivable_area_grid_map | Optional[DrivableAreaGridMap] | Drivable area map. |
| occupancy_grid_map | Optional[OccupancyGridMap] | Occupancy map. |
| top_down_rgb | Optional[TopDownRGB] | RGB camera observation. |
| road_waypoints | Optional[RoadWaypoints] | Per-road waypoints information. |
| via_data | Vias | Listing of nearby collectible ViaPoints and ViaPoints collected in the last step. |
| signals | Optional[List[SignalObservation]] | List of nearby traffic signal (light) states on this timestep. |

Note

The occupancy_grid_map is recommended when using scenarios with pedestrians. A higher resolution is preferable to ensure pedestrians are visible.

Note

Some observations, such as occupancy_grid_map, drivable_area_grid_map, and top_down_rgb, require optional image-rendering packages; install them via pip install -e .[camera-obs].
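As a rough illustration of how these fields are accessed, the sketch below pulls a few commonly used values out of a single agent's observation. It assumes obs is the Observation object returned for one agent by the environment's reset() or step(); the helper name and the selected fields are only illustrative, not part of the SMARTS API.

```python
# Illustrative helper (assumption: `obs` is the Observation object SMARTS
# returns for one agent; field names follow the table above).
def summarize_observation(obs) -> dict:
    """Pull a few commonly used fields out of a SMARTS Observation."""
    ego = obs.ego_vehicle_state
    summary = {
        "sim_time": obs.elapsed_sim_time,
        "steps_completed": obs.steps_completed,
        "ego_speed": ego.speed,
        "ego_position": ego.position,
        "num_neighbours": len(obs.neighborhood_vehicle_states or []),
        "reached_goal": obs.events.reached_goal,
        "collided": len(obs.events.collisions) > 0,
    }
    # Optional camera-based observations are None unless the corresponding
    # sensors are enabled in the agent's interface (and the camera-obs extra
    # is installed).
    if obs.occupancy_grid_map is not None:
        summary["ogm_shape"] = obs.occupancy_grid_map.data.shape
    return summary
```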

Reward

The default reward from the SMARTS environment is a function of the distance travelled. Here, x is the distance travelled, in meters, since the last time step at which a non-zero reward was given.

\[
reward(x) =
\begin{cases}
x, & \text{if } |x| > 0.5 \\
0, & \text{otherwise}
\end{cases}
\]
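Expressed in code, the piecewise definition above amounts to the following; this is a paraphrase of the formula for illustration, not the environment's actual implementation:

```python
def default_reward(distance_travelled: float) -> float:
    """Distance travelled, in meters, since the last non-zero reward."""
    # Mirrors the piecewise definition above: the reward equals the distance
    # travelled when its magnitude exceeds 0.5 m, and is 0 otherwise.
    return distance_travelled if abs(distance_travelled) > 0.5 else 0.0
```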

Action

Prior to a simulation, an agent's action type, and the policy that produces compliant actions, can be configured via its agent specification, an instance of AgentSpec. Refer to Agent for details.

An agent can be configured to emit any one of the following action types from ActionSpaceType.

Tip

Depending on the agent's policy, the ActuatorDynamic action type might allow the agent to learn faster than the Continuous action type, because learning to correct steering could be simpler than learning a mapping to all absolute steering angle values.
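For instance, a minimal sketch of configuring an agent with the ActuatorDynamic action type might look like the following. The module paths, the AgentType preset, and the toy policy are assumptions that may differ between SMARTS versions; treat this as an outline of the configuration flow rather than drop-in code.

```python
# Sketch only: module paths, the AgentType preset, and the policy below are
# illustrative and may differ between SMARTS versions.
from smarts.core.agent import Agent
from smarts.core.agent_interface import AgentInterface, AgentType
from smarts.core.controllers import ActionSpaceType
from smarts.zoo.agent_spec import AgentSpec


class KeepStraightAgent(Agent):
    """Toy policy emitting (throttle, brake, steering_rate) actions."""

    def act(self, obs):
        # Gentle throttle, no braking, no change in steering rate.
        return (0.3, 0.0, 0.0)


agent_spec = AgentSpec(
    # Start from a preset interface, then override the action space type.
    interface=AgentInterface.from_type(
        AgentType.Full,                     # preset choice is an assumption
        max_episode_steps=500,
        action=ActionSpaceType.ActuatorDynamic,
    ),
    agent_builder=KeepStraightAgent,
)
```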