1. Project Overview
This project implements a fully autonomous ground robot system capable of exploring unknown indoor environments, building real-time occupancy maps, and detecting semantic hazards — all without any human teleoperation or pre-loaded maps.
The platform is a TurtleBot4 Lite running ROS 2 Jazzy on Ubuntu 24.04. The robot is equipped with an RPLIDAR A1M8 360° LiDAR for mapping and navigation, an OAK-D spatial AI stereo camera for depth sensing and hazard classification, a Create3 mobile base with onboard sensors (cliff, bump, IMU, wheel encoders), and a Raspberry Pi 4B as the onboard computer.
The complete system consists of five custom ROS 2 nodes communicating over topics and action servers, plus SLAM Toolbox for mapping. The project spans three course milestones over Spring 2026, from initial architecture design to validated empirical benchmarking.
2. Motivation & Goals
The core problem we wanted to solve: how can a robot safely enter an unknown, dangerous environment and give first responders a complete picture of what's inside before they enter?
Real-world applications include post-disaster search and rescue (collapsed buildings, fire scenes), industrial inspection (chemical plants, confined spaces), and nuclear facility assessment — all environments where human entry carries significant risk before a situation assessment is complete.
Project Goals
- Autonomous map-building in unknown environments using SLAM (no pre-loaded map required)
- Systematic frontier-based exploration — explore the nearest unknown areas first
- Collision-free navigation without Nav2 (custom A* planner)
- Semantic hazard detection: HAZMAT signs (13 classes), fire, cliffs, potholes, narrow corridors, glass walls
- Real-time RViz2 visualization of map, robot pose, path, and hazard markers
- Unified launch for both simulation and real hardware via a single flag
3. System Architecture
The system is built as five custom ROS 2 nodes communicating over a well-defined topic interface. The design principle was clear separation of concerns — each node has one job and communicates via standard ROS 2 interfaces.
ROS 2 Topic Graph
| Publisher | Topic | Message Type | Subscriber |
|---|---|---|---|
| slam_toolbox | /map | OccupancyGrid | frontier_explorer, navigation_planner |
| frontier_explorer | /frontier_goals | PoseArray | behavior_coordinator |
| behavior_coordinator | /navigate_to_pose (action) | NavigateToPose | navigation_planner |
| navigation_planner | /cmd_vel_unstamped | Twist | Create3 base |
| semantic_hazard | /hazard_alert | String | behavior_coordinator |
| semantic_hazard | /exploration/stop | Bool | behavior_coordinator, navigation_planner |
| semantic_hazard | /hazard_markers | MarkerArray | RViz2 |
| Create3 | /hazard_detection | HazardDetectionVector | semantic_hazard |
| scan_relay | /scan_fixed | LaserScan | slam_toolbox (sim only) |
Hardware Setup
- Robot: TurtleBot4 Lite — Create3 base + Raspberry Pi 4B + OAK-D camera + RPLIDAR A1M8
- Compute (development): Laptop for simulation — NVIDIA RTX 5070 Ti, CUDA 13.1, Ubuntu 24.04
- Networking: ROS 2 Jazzy with Zenoh discovery server at 192.168.50.93:11811
- Simulation: Gazebo Sim 8 (Harmonic), turtlebot4_gz_bringup, maze world
4. SLAM & Mapping
We use slam_toolbox in async mapping mode. The robot builds a 2D occupancy grid at 0.05m/cell resolution as it moves. The SLAM node processes LiDAR scans and publishes a map→odom TF transform that the entire navigation stack depends on.
Configuration
Key parameters in slam_sim.yaml: scan_topic: /scan_fixed (simulation), resolution: 0.05, map_update_interval: 2.0, transform_timeout: 0.2, tf_buffer_duration: 20.0.
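Laid out in the standard slam_toolbox nesting, these parameters would appear in slam_sim.yaml roughly as follows (a sketch: only the values listed above come from the report, the use_sim_time entry is an assumption for the simulation profile, and other required slam_toolbox settings are omitted):

```yaml
slam_toolbox:
  ros__parameters:
    use_sim_time: true          # assumption: simulation profile
    scan_topic: /scan_fixed     # relayed topic with corrected frame_id
    resolution: 0.05            # 0.05 m per occupancy-grid cell
    map_update_interval: 2.0
    transform_timeout: 0.2
    tf_buffer_duration: 20.0
```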
Real vs Simulation Difference
On real hardware, SLAM starts and builds maps immediately — the RPLIDAR A1M8 publishes in rplidar_link frame which matches exactly what SLAM expects. In simulation, the LiDAR publishes in turtlebot4/rplidar_link/rplidar frame (a known Gazebo naming quirk) which required a scan relay node to fix the frame_id before SLAM could process it.
5. Frontier Exploration
Frontier exploration is the core algorithm for autonomous navigation in unknown environments. A frontier is defined as a free cell adjacent to an unknown cell — the boundary between what the robot knows and doesn't know.
Algorithm
The frontier explorer runs every time a new map is received (approximately 1Hz). It scans all free cells, identifies those adjacent to unknown cells (and not adjacent to occupied cells), groups them into clusters via 4-connected flood fill, discards clusters with fewer than 10 cells (noise), and scores each cluster by inverse distance from the robot — nearest first.
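The detection-and-clustering pass above can be sketched as follows, assuming the usual occupancy encoding (-1 unknown, 0 free, 100 occupied) and grid-cell coordinates; function and parameter names are illustrative, not the project's actual API:

```python
import numpy as np
from collections import deque

FREE, UNKNOWN, OCC = 0, -1, 100

def find_frontiers(grid, robot_rc, min_size=10):
    """Return frontier-cluster centroids, sorted nearest-first.
    grid: 2D int array (-1 unknown, 0 free, 100 occupied)."""
    h, w = grid.shape

    def nbrs(r, c):
        # 4-connected neighbourhood, clipped to the grid
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < h and 0 <= cc < w:
                yield rr, cc

    # A frontier cell is free, touches unknown, and touches no obstacle.
    frontier = set()
    for r in range(h):
        for c in range(w):
            if grid[r, c] != FREE:
                continue
            vals = [grid[rr, cc] for rr, cc in nbrs(r, c)]
            if UNKNOWN in vals and OCC not in vals:
                frontier.add((r, c))

    # Group frontier cells into clusters with a 4-connected flood fill,
    # discarding clusters smaller than min_size (noise).
    clusters, seen = [], set()
    for cell in frontier:
        if cell in seen:
            continue
        queue, comp = deque([cell]), []
        seen.add(cell)
        while queue:
            r, c = queue.popleft()
            comp.append((r, c))
            for n in nbrs(r, c):
                if n in frontier and n not in seen:
                    seen.add(n)
                    queue.append(n)
        if len(comp) >= min_size:
            clusters.append(comp)

    # Nearest-first scoring: sort centroids by distance from the robot.
    cents = [tuple(np.array(c, dtype=float).mean(axis=0)) for c in clusters]
    cents.sort(key=lambda p: (p[0] - robot_rc[0]) ** 2 + (p[1] - robot_rc[1]) ** 2)
    return cents
```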
Scoring Evolution
We went through two scoring approaches during development:
- Original: score = (α × cluster_size + β × unknown_nearby) / (dist + ε), a weighted combination of frontier size, nearby unknown area, and distance. This caused the robot to chase large unknown areas rather than exploring systematically.
- Final (nearest-first wavefront): score = 1.0 / (dist + ε), i.e. simply go to the closest frontier. Combined with the proximity lock in the behavior coordinator, this produces systematic nearest-first exploration.
7. Semantic Hazard Detection
The hazard classifier is the most complex node. It fuses five sensor layers into a single hazard alert and publishes RViz2 visualization markers.
5-Layer Architecture
- L1 — Hardware (Create3): Subscribes to /hazard_detection. Cliff sensors trigger an immediate E-stop. Response time <10ms.
- L2a — YOLO Fire (YOLOv5): Custom-trained fire_best.pt model with a 0.45 confidence threshold. Detects fire from a phone screen at 63% confidence; printed fire images score below the threshold.
- L2b — HSV Vision: Fixed thresholds for fire (H∈[0,15]∪[165,179], S>150, V>200), calibrated under ASU ISTB lab fluorescent lighting. Water detection was disabled because it caused false positives on blue walls.
- L3 — Depth Analysis (OAK-D): NaN ratio >50% in floor region → GLASS. Depth drop >0.20m → POTHOLE. Pothole implemented but not yet tested on real hardware.
- L4 — LiDAR Sector: Minimum ranges in 3 sectors (front, left, right) after correcting for the +90° RPLIDAR mounting offset in the URDF. NARROW_CORRIDOR if any sector <0.30m.
- L5 — DeepHAZMAT: YOLOv3-tiny model with 13 HAZMAT classes (explosive, flammable, toxic, radioactive, etc.). 7M parameters, 15.8 GFLOPs. ~30ms inference on RTX 5070 Ti.
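The layer-2b HSV rule can be sketched in pure NumPy, assuming OpenCV-style HSV ranges (H in [0,179], S and V in [0,255]); the min_ratio trigger threshold below is a hypothetical addition, not a value from the project:

```python
import numpy as np

def fire_mask(hsv):
    """Apply the fixed fire thresholds to an OpenCV-style HSV image:
    H in [0,15] or [165,179], S > 150, V > 200."""
    h, s, v = hsv[..., 0], hsv[..., 1], hsv[..., 2]
    hue_ok = (h <= 15) | (h >= 165)
    return hue_ok & (s > 150) & (v > 200)

def fire_detected(hsv, min_ratio=0.01):
    """Flag fire if at least min_ratio of pixels match.
    (min_ratio is an illustrative choice, not from the report.)"""
    return fire_mask(hsv).mean() >= min_ratio
```

In the real node the input would come from cv2.cvtColor(frame, cv2.COLOR_BGR2HSV) on the OAK-D RGB stream.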
YOLO Worker Thread
A critical design decision: YOLO inference runs in a dedicated background thread with a 1-second cooldown (YOLO_INTERVAL_SEC = 1.0). This prevents YOLO from blocking the main ROS 2 executor. A teammate (Princess) running the older version without this thread saw a 30-second camera backlog, because inference was blocking the main executor.
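The worker-thread pattern can be sketched as follows; class and method names are illustrative, and the real node would call its YOLO model inside infer_fn. The key properties are that the camera callback never blocks, and newer frames overwrite older ones instead of queueing up:

```python
import threading

class YoloWorker:
    """Run heavy inference off the ROS executor thread, at most once
    per cooldown period (mirrors YOLO_INTERVAL_SEC = 1.0)."""

    def __init__(self, infer_fn, cooldown_sec=1.0):
        self.infer_fn = infer_fn
        self.cooldown = cooldown_sec
        self._frame = None
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def submit(self, frame):
        # Called from the image callback: O(1), never blocks on inference.
        with self._lock:
            self._frame = frame  # newer frames silently replace older ones

    def _loop(self):
        while not self._stop.is_set():
            with self._lock:
                frame, self._frame = self._frame, None
            if frame is not None:
                self.infer_fn(frame)        # heavy work, off the executor
            self._stop.wait(self.cooldown)  # cooldown between inferences

    def stop(self):
        self._stop.set()
        self._thread.join()
```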
LiDAR Angle Correction
The RPLIDAR A1M8 is mounted with a +90° rotation offset in the TurtleBot4 Lite URDF (rpy="0 0 π/2"). Without correcting for this, the sector analysis computes wrong angles. Fix: θ_robot = θ_lidar − π/2 before all sector computations.
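A sketch of the sector analysis with the mounting correction applied. Only the θ_robot = θ_lidar − π/2 correction and the 0.30 m threshold come from the report; the sector boundaries and function names are illustrative assumptions:

```python
import math

def sector_minima(ranges, angle_min, angle_inc, mount_offset=math.pi / 2):
    """Minimum range per sector (front/left/right) after undoing the
    RPLIDAR's +90° URDF mounting rotation: theta_robot = theta_lidar - pi/2."""
    mins = {"front": float("inf"), "left": float("inf"), "right": float("inf")}
    for i, r in enumerate(ranges):
        if not math.isfinite(r):
            continue
        theta = angle_min + i * angle_inc - mount_offset       # apply correction
        theta = math.atan2(math.sin(theta), math.cos(theta))   # wrap to (-pi, pi]
        if abs(theta) < math.pi / 6:                           # illustrative widths
            key = "front"
        elif math.pi / 6 <= theta < math.pi / 2:
            key = "left"
        elif -math.pi / 2 < theta <= -math.pi / 6:
            key = "right"
        else:
            continue  # rear beams ignored in this sketch
        mins[key] = min(mins[key], r)
    return mins

def narrow_corridor(mins, thresh=0.30):
    # NARROW_CORRIDOR if any sector minimum is below 0.30 m
    return any(v < thresh for v in mins.values())
```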
8. Behavior Coordinator
The behavior coordinator is the "brain" that orchestrates the exploration. It manages the lifecycle from startup to completion, handles goal selection, tracks failures, and implements stability mechanisms.
6-State FSM
States: IDLE → SELECTING → PENDING → NAVIGATING → ARRIVED → DONE. The robot starts in SELECTING, picks the nearest frontier, sends a navigation goal, tracks success/failure, and continues until all frontiers are explored or battery is low.
Proximity Lock (Key Innovation)
Without the proximity lock, the robot would constantly switch targets as the SLAM map updated and new frontiers appeared closer. The lock rule: if the robot is within 2 × PROXIMITY_LOCK_RADIUS (6 m with the final value) of its current goal anchor, ignore all frontier updates. PROXIMITY_LOCK_RADIUS was tuned from 1.0m → 2.0m → 3.0m through testing. At 3.0m, the robot commits to its frontier long enough to actually reach it before the map changes enough to warrant switching.
Frontier Similarity Radius
Frontiers drift as the map updates — the same physical area gets slightly different centroid coordinates each update. Rather than cancelling a goal every time the centroid drifts by a few cells, we check if any current frontier is within 0.50m of the original goal anchor. If yes, update the goal coordinates silently (no cancel/replan).
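Both stability mechanisms can be sketched together; the function name and the keep/update/replan return convention are illustrative, only the two radii come from the report:

```python
import math

PROXIMITY_LOCK_RADIUS = 3.0  # m, final tuned value
SIMILARITY_RADIUS = 0.50     # m

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def update_goal(robot_xy, goal_anchor, frontiers):
    """Decide what to do with a fresh frontier list while navigating.
    Returns ('keep', anchor), ('update', xy), or ('replan', best)."""
    # Proximity lock: close to the anchor -> ignore frontier churn entirely.
    if dist(robot_xy, goal_anchor) < 2 * PROXIMITY_LOCK_RADIUS:
        return ("keep", goal_anchor)
    # Similarity: same physical frontier drifted a few cells -> silent
    # coordinate update, no cancel/replan.
    for f in frontiers:
        if dist(f, goal_anchor) < SIMILARITY_RADIUS:
            return ("update", f)
    # Otherwise cancel and replan to the nearest frontier.
    best = min(frontiers, key=lambda f: dist(robot_xy, f))
    return ("replan", best)
```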
9. Simulation Debugging
Getting the simulation working took significant effort. Multiple independent bugs needed to be found and fixed before the system worked end-to-end in Gazebo.
Bug 1: LiDAR sees only itself
Root cause: Gazebo Sim 8 uses Ogre1 as the default GPU renderer for ray sensors. The TurtleBot4 self-collision geometry caused all 640 LiDAR rays to hit the robot's own mesh at minimum range, producing a LiDAR that saw nothing but itself.
Fix: Change <render_engine>ogre</render_engine> to ogre2 in /opt/ros/jazzy/share/irobot_create_description/urdf/create3.urdf.xacro. This required sudo because the system file is read-only; it is tracked as upstream GitHub issue #676.

```shell
sudo sed -i 's/<render_engine>ogre<\/render_engine>/<render_engine>ogre2<\/render_engine>/' \
  /opt/ros/jazzy/share/irobot_create_description/urdf/create3.urdf.xacro
```
Bug 2: SLAM builds no map (frame_id mismatch)
Root cause: Gazebo publishes the LiDAR scan in frame turtlebot4/rplidar_link/rplidar (Gazebo's full entity-path naming), while slam_toolbox expected frame rplidar_link. The mismatch caused SLAM to silently drop every scan message.
Fix: Created scan_relay_node.py, a simple ROS 2 node that subscribes to /scan and republishes to /scan_fixed with the corrected frame_id. SLAM is configured to use /scan_fixed via slam_sim.yaml.
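The core of the relay can be sketched without a ROS installation by using stand-in message types; in the real scan_relay_node.py the same logic would live in an rclpy subscription callback on /scan, with the result published on /scan_fixed:

```python
from copy import deepcopy
from dataclasses import dataclass, field

# Stand-ins for std_msgs/Header and sensor_msgs/LaserScan, so the relay
# logic is testable outside ROS. The real node uses the actual messages.
@dataclass
class Header:
    frame_id: str = ""

@dataclass
class LaserScan:
    header: Header = field(default_factory=Header)
    ranges: list = field(default_factory=list)

def relay_scan(msg, target_frame="rplidar_link"):
    """Copy a scan and rewrite Gazebo's entity-path frame_id
    (turtlebot4/rplidar_link/rplidar) to the frame SLAM expects."""
    fixed = deepcopy(msg)  # keep ranges/metadata, change only the frame
    fixed.header.frame_id = target_frame
    return fixed
```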
Bug 3: Custom scan_topic silently ignored
Root cause: The turtlebot4_navigation SLAM launch file uses RewrittenYaml, which programmatically overwrites scan_topic to namespace/scan, ignoring any custom yaml we passed in.
Fix: Bypass the turtlebot4 SLAM launch entirely and run slam_toolbox directly with our slam_sim.yaml. This also required manual lifecycle activation: ros2 lifecycle set /slam_toolbox configure && ros2 lifecycle set /slam_toolbox activate.
Bug 4: Robot not moving (wall clock vs sim time)
Root cause: The navigation planner uses wall-clock time (time.time()) for its control loop, but Gazebo runs on sim time. The diffdrive_controller checks command timestamps and rejects any command older than 0.5 seconds, so wall-clock timestamps appeared 0.5s+ "old" from the sim clock's perspective.
Fix: ros2 param set /diffdrive_controller cmd_vel_timeout 2.0, and ensure all custom nodes run with use_sim_time:=true.
Bug 5: slam_toolbox stuck unconfigured
Root cause: When slam_toolbox is launched as an unmanaged node via our direct invocation, it initializes but does not auto-configure/activate; it sits in the unconfigured lifecycle state, waiting.
Fix: After launching SLAM, manually trigger ros2 lifecycle set /slam_toolbox configure, then ros2 lifecycle set /slam_toolbox activate. The run_exploration.sh script handles this automatically.
Bug 6: Every frontier goal rejected as "surrounded by obstacles"
Root cause: Frontier centroids land on the exact boundary between free and unknown space, which in the costmap often falls inside the wall inflation zone. The planner rejected these goals as impassable.
Fix: Increased SNAP_SEARCH_RADIUS from 10 to 20 cells. The planner now searches up to 20 cells around the goal for the nearest free cell before giving up.
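The snap behaviour can be sketched as a ring search of growing radius around the rejected goal (grid encoding and helper name are illustrative; only the 20-cell radius comes from the report):

```python
def snap_to_free(grid, goal_rc, max_radius=20, free=0):
    """Find the free cell nearest to goal_rc, searching rings of
    increasing Chebyshev radius up to max_radius (SNAP_SEARCH_RADIUS).
    grid: 2D occupancy values (-1 unknown, 0 free, 100 occupied)."""
    h, w = len(grid), len(grid[0])
    gr, gc = goal_rc
    if 0 <= gr < h and 0 <= gc < w and grid[gr][gc] == free:
        return goal_rc  # goal already free: nothing to do
    for rad in range(1, max_radius + 1):
        best, best_d2 = None, None
        for r in range(gr - rad, gr + rad + 1):
            for c in range(gc - rad, gc + rad + 1):
                if max(abs(r - gr), abs(c - gc)) != rad:
                    continue  # visit only the current ring
                if 0 <= r < h and 0 <= c < w and grid[r][c] == free:
                    d2 = (r - gr) ** 2 + (c - gc) ** 2
                    if best_d2 is None or d2 < best_d2:
                        best, best_d2 = (r, c), d2
        if best is not None:
            return best  # closest free cell on the smallest ring that has one
    return None  # no free cell within the search radius: give up
```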
10. Hardware Testing
The system was tested on the real TurtleBot4 Lite in the ASU lab environment. Key observations from hardware runs:
What worked well on hardware
- SLAM builds maps immediately — the real RPLIDAR publishes in the correct frame, no relay node needed
- DeepHAZMAT detected Explosive 1.1 at 92% confidence and Radioactive I at 99% confidence at ~0.5m range
- Cliff sensor E-stop worked reliably: tested on a table edge, it triggered in under 10ms
- Navigation planner navigated corridors without bumping walls with the tuned parameters
- YOLOv5 fire detection worked at 63% confidence on phone screen video
Issues on hardware
- Camera processing backlog (~30s delay) on Princess's machine — fixed by pulling the latest YOLO worker thread version
- Marker placement offset — HAZMAT markers are placed at a fixed 0.70m ahead of the robot rather than at the actual depth-estimated distance. Known limitation; future fix: use OAK-D depth to project the marker position.
- WiFi latency causes occasional topic dropouts — Zenoh discovery server at 192.168.50.93:11811 must be reachable
Hardware Run Order
```shell
source ~/mobile-robotics-frontier-exploration/open_ros_env.sh
source ~/mobile-robotics-frontier-exploration/ros2_ws/install/setup.bash

# Terminal 1: Launch everything
ros2 launch frontier_exploration_mapping exploration.launch.py use_sim_time:=false

# Terminal 2: Debug monitor
python3 ~/debug_monitor.py
```
11. Issues Found (Complete List)
Every problem we encountered and how we resolved it:
| # | Issue | Root Cause | Resolution |
|---|---|---|---|
| 1 | LiDAR all 0.164m in sim | Ogre1 renderer + self-collision | Change render_engine to ogre2 |
| 2 | SLAM not building map (sim) | frame_id mismatch Gazebo vs ROS2 | scan_relay_node.py |
| 3 | SLAM ignoring custom scan_topic | turtlebot4 launch uses RewrittenYaml | Bypass turtlebot4 SLAM launch, run slam_toolbox directly |
| 4 | Robot not moving (sim) | diffdrive_controller timestamp mismatch | cmd_vel_timeout 2.0 + use_sim_time |
| 5 | SLAM stuck unconfigured | Lifecycle not auto-activated | Manual lifecycle set configure + activate |
| 6 | All frontier goals → "surrounded by obstacles" | Frontiers in wall inflation zone | SNAP_SEARCH_RADIUS = 20 cells |
| 7 | Robot oscillating between frontiers | Score changes every map update | PROXIMITY_LOCK_RADIUS = 3.0m |
| 8 | ALIGN_THRESHOLD too tight → oscillation | 0.10 rad → robot kept rotating | ALIGN_THRESHOLD = 0.20 rad |
| 9 | Camera 30s backlog (Princess) | Old YOLO blocking main thread | Threaded YOLO worker + cooldown 1Hz |
| 10 | LiDAR sector angles wrong | +90° URDF mounting offset ignored | θ_robot = θ_lidar − π/2 |
| 11 | Water detection false positives | HSV water range matched blue walls | Disabled water HSV detection |
| 12 | Multiple scan_relay nodes running | Old inline Python scripts + new node | Kill all scan_relay before fresh start |
| 13 | Root-level build/install folders | colcon build run from wrong directory | Always build from ros2_ws/ directory |
| 14 | Hazard marker placement offset | Fixed 0.70m ahead, no depth projection | Known limitation — future fix with OAK-D depth |
12. What We Learned
Technical Lessons
- Simulation ≠ hardware. Every bug we fixed in simulation was a different bug than what exists on hardware. The Ogre2 LiDAR fix, the frame_id relay, the lifecycle management — none of these exist on real hardware. Always test on real hardware as early as possible.
- Simple algorithms often beat complex ones. The original frontier scoring formula (weighted combination of size, unknown area, and distance) sounded more principled than nearest-first, but it caused the robot to oscillate. score = 1/(dist+ε) with a proximity lock is simpler and works better.
- Simulation time is subtle. The diffdrive_controller timestamp bug took hours to debug. Wall clock vs sim clock mismatches can manifest in unexpected ways — always add use_sim_time to every node.
- Threading matters for real-time systems. YOLO inference at 30ms/frame is fast enough, but blocking the main ROS executor with it caused a 30-second backlog. Always run heavy computation in a separate thread.
- ROS 2 lifecycle nodes need explicit activation. slam_toolbox being a lifecycle node that requires manual configure+activate is not obvious from the documentation.
- Check your URDF. The LiDAR angle correction (+90°) was buried in the TurtleBot4 URDF and took time to find. Always visualize TF trees when sensor data looks wrong.
Process Lessons
- Use the debug monitor from day one — having real-time visibility into map size, frontier count, and LiDAR status saved hours of debugging.
- Push frequently and use git properly — being able to find exactly when a bug was introduced via git bisect is invaluable.
- Document issues as you find them — this report section was written from memory and git commit messages. Future projects should log issues in real time.
- Split simulation debugging from hardware testing explicitly — they have completely different failure modes.
What We Would Do Differently
- Start with real hardware testing earlier — we spent too long on simulation before testing on the robot
- Use Nav2's lifecycle manager properly or build our own instead of working around it
- Add depth-based marker placement from the start instead of fixed offset
- Implement a proper rosbag-based regression test suite so parameter changes don't break existing functionality
13. Future Work
- Depth-based hazard marker placement: Use OAK-D depth frame to project the 2D bounding box center through camera intrinsics → 3D world position → transform to odom frame. Would eliminate the fixed-offset placement error.
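The proposed back-projection is standard pinhole geometry. A minimal sketch, assuming the intrinsics (fx, fy, cx, cy) come from the camera's CameraInfo; the remaining step, transforming the camera-frame point into odom, would use a TF lookup and is omitted here:

```python
def pixel_to_camera_point(u, v, depth_m, fx, fy, cx, cy):
    """Back-project a pixel (u, v) plus its depth through the pinhole
    intrinsics to a 3D point in the optical camera frame
    (x right, y down, z forward)."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)
```

A detection's bounding-box center plus the median depth inside the box would replace the fixed 0.70 m offset.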
- Multi-floor exploration: Current system assumes flat ground. Adding floor-change detection and staircase navigation would significantly expand applicability.
- Semantic costmap integration: HAZMAT and fire detections currently publish markers only. Integrating semantic hazards into the navigation costmap (marking detected hazard zones as high-cost) would prevent the robot from navigating through detected danger zones.
- Hardware pothole testing: The pothole depth analysis is implemented but not tested on real hardware. Need to create a physical test environment with a small drop (>0.20m).
- Predictive hazard propagation: Mark adjacent cells as high-risk before the robot reaches a detected hazard, closing the temporal vulnerability window between frontier selection and hazard classification.
- Next-Best-View frontier scoring: Replace nearest-first with information-gain-per-meter-traveled scoring (NBV). Each frontier would be evaluated for how much new unknown area it would reveal, weighted by travel cost.
- Multi-robot coordination: Partition the environment between multiple robots to reduce exploration time. Each robot claims frontiers and communicates claimed areas to prevent duplicate work.
- 3D mapping: Extend from 2D OccupancyGrid to 3D voxel map using OctoMap, enabling exploration of multi-level environments.