1. Project Overview
This project implements a fully autonomous ground robot system capable of exploring unknown indoor environments, building real-time occupancy maps, and detecting semantic hazards — all without any human teleoperation or pre-loaded maps.
The platform is a TurtleBot4 Lite running ROS 2 Jazzy on Ubuntu 24.04. The robot is equipped with an RPLIDAR A1M8 360° LiDAR for mapping and navigation, an OAK-D spatial AI stereo camera for depth sensing and hazard classification, a Create3 mobile base with onboard sensors (cliff, bump, IMU, wheel encoders), and a Raspberry Pi 4B as the onboard computer.
The complete system consists of five custom ROS 2 nodes communicating over topics and action servers, plus SLAM Toolbox for mapping. The project spans three course milestones over Spring 2026, from initial architecture design to validated empirical benchmarking.
2. Motivation & Goals
The core problem we wanted to solve: how can a robot safely enter an unknown, dangerous environment and give first responders a complete picture of what's inside before they enter?
Real-world applications include post-disaster search and rescue (collapsed buildings, fire scenes), industrial inspection (chemical plants, confined spaces), and nuclear facility assessment — all environments where human entry carries significant risk before a situation assessment is complete.
Project Goals
- Autonomous map-building in unknown environments using SLAM (no pre-loaded map required)
- Systematic frontier-based exploration — explore the nearest unknown areas first
- Collision-free navigation without Nav2 (custom A* planner)
- Semantic hazard detection: HAZMAT signs (13 classes), fire, cliffs, potholes, narrow corridors, glass walls
- Real-time RViz2 visualization of map, robot pose, path, and hazard markers
- Unified launch for both simulation and real hardware via a single flag
3. System Architecture
The system is built as five custom ROS 2 nodes communicating over a well-defined topic interface. The design principle was clear separation of concerns — each node has one job and communicates via standard ROS 2 interfaces.
ROS 2 Topic Graph
| Publisher | Topic | Message Type | Subscriber |
|---|---|---|---|
| slam_toolbox | /map | OccupancyGrid | frontier_explorer, navigation_planner |
| frontier_explorer | /frontier_goals | PoseArray | behavior_coordinator |
| behavior_coordinator | /navigate_to_pose (action) | NavigateToPose | navigation_planner |
| navigation_planner | /cmd_vel_unstamped | Twist | Create3 base |
| semantic_hazard | /hazard_alert | String | behavior_coordinator |
| semantic_hazard | /exploration/stop | Bool | behavior_coordinator, navigation_planner |
| semantic_hazard | /hazard_markers | MarkerArray | RViz2 |
| Create3 | /hazard_detection | HazardDetectionVector | semantic_hazard |
| scan_relay | /scan_fixed | LaserScan | slam_toolbox (sim only) |
Hardware Setup
- Robot: TurtleBot4 Lite — Create3 base + Raspberry Pi 4B + OAK-D camera + RPLIDAR A1M8
- Compute (development): Laptop for simulation — NVIDIA RTX 5070 Ti, CUDA 13.1, Ubuntu 24.04
- Networking: ROS 2 Jazzy with Zenoh discovery server at 192.168.50.93:11811
- Simulation: Gazebo Sim 8 (Harmonic), turtlebot4_gz_bringup, maze world
4. SLAM & Mapping
We use slam_toolbox in async mapping mode. The robot builds a 2D occupancy grid at 0.05m/cell resolution as it moves. The SLAM node processes LiDAR scans and publishes a map→odom TF transform that the entire navigation stack depends on.
Configuration
Key parameters in slam_sim.yaml: scan_topic: /scan_fixed (simulation), resolution: 0.05, map_update_interval: 2.0, transform_timeout: 0.2, tf_buffer_duration: 20.0.
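Laid out in the standard slam_toolbox nesting, these parameters would appear in slam_sim.yaml roughly as follows (a sketch: only the values listed above come from the report, the use_sim_time entry is an assumption for the simulation profile, and other required slam_toolbox settings are omitted):

```yaml
slam_toolbox:
  ros__parameters:
    use_sim_time: true          # assumption: simulation profile
    scan_topic: /scan_fixed     # relayed topic with corrected frame_id
    resolution: 0.05            # 0.05 m per occupancy-grid cell
    map_update_interval: 2.0
    transform_timeout: 0.2
    tf_buffer_duration: 20.0
```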
Real vs Simulation Difference
On real hardware, SLAM starts and builds maps immediately — the RPLIDAR A1M8 publishes in rplidar_link frame which matches exactly what SLAM expects. In simulation, the LiDAR publishes in turtlebot4/rplidar_link/rplidar frame (a known Gazebo naming quirk) which required a scan relay node to fix the frame_id before SLAM could process it.
5. Frontier Exploration
Frontier exploration is the core algorithm for autonomous navigation in unknown environments. A frontier is defined as a free cell adjacent to an unknown cell — the boundary between what the robot knows and doesn't know.
Algorithm
The frontier explorer runs every time a new map is received (approximately 1Hz). It scans all free cells, identifies those adjacent to unknown cells (and not adjacent to occupied cells), groups them into clusters via 4-connected flood fill, discards clusters with fewer than 10 cells (noise), and scores each cluster by inverse distance from the robot — nearest first.
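The detection-and-clustering pass above can be sketched as follows, assuming the usual occupancy encoding (-1 unknown, 0 free, 100 occupied) and grid-cell coordinates; function and parameter names are illustrative, not the project's actual API:

```python
import numpy as np
from collections import deque

FREE, UNKNOWN, OCC = 0, -1, 100

def find_frontiers(grid, robot_rc, min_size=10):
    """Return frontier-cluster centroids, sorted nearest-first.
    grid: 2D int array (-1 unknown, 0 free, 100 occupied)."""
    h, w = grid.shape

    def nbrs(r, c):
        # 4-connected neighbourhood, clipped to the grid
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < h and 0 <= cc < w:
                yield rr, cc

    # A frontier cell is free, touches unknown, and touches no obstacle.
    frontier = set()
    for r in range(h):
        for c in range(w):
            if grid[r, c] != FREE:
                continue
            vals = [grid[rr, cc] for rr, cc in nbrs(r, c)]
            if UNKNOWN in vals and OCC not in vals:
                frontier.add((r, c))

    # Group frontier cells into clusters with a 4-connected flood fill,
    # discarding clusters smaller than min_size (noise).
    clusters, seen = [], set()
    for cell in frontier:
        if cell in seen:
            continue
        queue, comp = deque([cell]), []
        seen.add(cell)
        while queue:
            r, c = queue.popleft()
            comp.append((r, c))
            for n in nbrs(r, c):
                if n in frontier and n not in seen:
                    seen.add(n)
                    queue.append(n)
        if len(comp) >= min_size:
            clusters.append(comp)

    # Nearest-first scoring: sort centroids by distance from the robot.
    cents = [tuple(np.array(c, dtype=float).mean(axis=0)) for c in clusters]
    cents.sort(key=lambda p: (p[0] - robot_rc[0]) ** 2 + (p[1] - robot_rc[1]) ** 2)
    return cents
```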
Scoring Evolution
We went through two scoring approaches during development:
- Original: score = (α × cluster_size + β × unknown_nearby) / (dist + ε), a weighted combination of frontier size, nearby unknown area, and distance. This caused the robot to chase large unknown areas rather than exploring systematically.
- Final (nearest-first wavefront): score = 1.0 / (dist + ε), i.e. simply go to the closest frontier. Combined with the proximity lock in the behavior coordinator, this produces systematic nearest-first exploration.
7. Semantic Hazard Detection
The hazard classifier is the most complex node. It fuses five sensor layers into a single hazard alert and publishes RViz2 visualization markers.
5-Layer Architecture
- L1 — Hardware (Create3): Subscribes to /hazard_detection. Cliff sensors trigger an immediate E-stop. Response time <10ms.
- L2a — YOLO Fire (YOLOv5): Custom-trained fire_best.pt model with a 0.45 confidence threshold. Detects fire from a phone screen at 63% confidence; printed fire images score below the threshold.
- L2b — HSV Vision: Fixed thresholds for fire (H∈[0,15]∪[165,179], S>150, V>200), calibrated under ASU ISTB lab fluorescent lighting. Water detection was disabled because it caused false positives on blue walls.
- L3 — Depth Analysis (OAK-D): NaN ratio >50% in floor region → GLASS. Depth drop >0.20m → POTHOLE. Pothole implemented but not yet tested on real hardware.
- L4 — LiDAR Sector: Minimum ranges in 3 sectors (front, left, right) after correcting for the +90° RPLIDAR mounting offset in the URDF. NARROW_CORRIDOR if any sector <0.30m.
- L5 — DeepHAZMAT: YOLOv3-tiny model with 13 HAZMAT classes (explosive, flammable, toxic, radioactive, etc.). 7M parameters, 15.8 GFLOPs. ~30ms inference on RTX 5070 Ti.
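The layer-2b HSV rule can be sketched in pure NumPy, assuming OpenCV-style HSV ranges (H in [0,179], S and V in [0,255]); the min_ratio trigger threshold below is a hypothetical addition, not a value from the project:

```python
import numpy as np

def fire_mask(hsv):
    """Apply the fixed fire thresholds to an OpenCV-style HSV image:
    H in [0,15] or [165,179], S > 150, V > 200."""
    h, s, v = hsv[..., 0], hsv[..., 1], hsv[..., 2]
    hue_ok = (h <= 15) | (h >= 165)
    return hue_ok & (s > 150) & (v > 200)

def fire_detected(hsv, min_ratio=0.01):
    """Flag fire if at least min_ratio of pixels match.
    (min_ratio is an illustrative choice, not from the report.)"""
    return fire_mask(hsv).mean() >= min_ratio
```

In the real node the input would come from cv2.cvtColor(frame, cv2.COLOR_BGR2HSV) on the OAK-D RGB stream.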
YOLO Worker Thread
A critical design decision: YOLO inference runs in a dedicated background thread with a 1-second cooldown (YOLO_INTERVAL_SEC = 1.0). This prevents YOLO from blocking the main ROS 2 executor. A teammate (Princess) running the older version without this thread saw a 30-second camera backlog, because inference was blocking the main executor.
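The worker-thread pattern can be sketched as follows; class and method names are illustrative, and the real node would call its YOLO model inside infer_fn. The key properties are that the camera callback never blocks, and newer frames overwrite older ones instead of queueing up:

```python
import threading

class YoloWorker:
    """Run heavy inference off the ROS executor thread, at most once
    per cooldown period (mirrors YOLO_INTERVAL_SEC = 1.0)."""

    def __init__(self, infer_fn, cooldown_sec=1.0):
        self.infer_fn = infer_fn
        self.cooldown = cooldown_sec
        self._frame = None
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def submit(self, frame):
        # Called from the image callback: O(1), never blocks on inference.
        with self._lock:
            self._frame = frame  # newer frames silently replace older ones

    def _loop(self):
        while not self._stop.is_set():
            with self._lock:
                frame, self._frame = self._frame, None
            if frame is not None:
                self.infer_fn(frame)        # heavy work, off the executor
            self._stop.wait(self.cooldown)  # cooldown between inferences

    def stop(self):
        self._stop.set()
        self._thread.join()
```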
LiDAR Angle Correction
The RPLIDAR A1M8 is mounted with a +90° rotation offset in the TurtleBot4 Lite URDF (rpy="0 0 π/2"). Without correcting for this, the sector analysis computes wrong angles. Fix: θ_robot = θ_lidar − π/2 before all sector computations.
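A sketch of the sector analysis with the mounting correction applied. Only the θ_robot = θ_lidar − π/2 correction and the 0.30 m threshold come from the report; the sector boundaries and function names are illustrative assumptions:

```python
import math

def sector_minima(ranges, angle_min, angle_inc, mount_offset=math.pi / 2):
    """Minimum range per sector (front/left/right) after undoing the
    RPLIDAR's +90° URDF mounting rotation: theta_robot = theta_lidar - pi/2."""
    mins = {"front": float("inf"), "left": float("inf"), "right": float("inf")}
    for i, r in enumerate(ranges):
        if not math.isfinite(r):
            continue
        theta = angle_min + i * angle_inc - mount_offset       # apply correction
        theta = math.atan2(math.sin(theta), math.cos(theta))   # wrap to (-pi, pi]
        if abs(theta) < math.pi / 6:                           # illustrative widths
            key = "front"
        elif math.pi / 6 <= theta < math.pi / 2:
            key = "left"
        elif -math.pi / 2 < theta <= -math.pi / 6:
            key = "right"
        else:
            continue  # rear beams ignored in this sketch
        mins[key] = min(mins[key], r)
    return mins

def narrow_corridor(mins, thresh=0.30):
    # NARROW_CORRIDOR if any sector minimum is below 0.30 m
    return any(v < thresh for v in mins.values())
```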
8. Behavior Coordinator
The behavior coordinator is the "brain" that orchestrates the exploration. It manages the lifecycle from startup to completion, handles goal selection, tracks failures, and implements stability mechanisms.
6-State FSM
States: IDLE → SELECTING → PENDING → NAVIGATING → ARRIVED → DONE. The robot starts in SELECTING, picks the nearest frontier, sends a navigation goal, tracks success/failure, and continues until all frontiers are explored or battery is low.
Proximity Lock (Key Innovation)
Without the proximity lock, the robot would constantly switch targets as the SLAM map updated and new frontiers appeared closer. The lock rule: if the robot is within 2 × PROXIMITY_LOCK_RADIUS (6 m with the final value) of its current goal anchor, ignore all frontier updates. PROXIMITY_LOCK_RADIUS was tuned from 1.0m → 2.0m → 3.0m through testing. At 3.0m, the robot commits to its frontier long enough to actually reach it before the map changes enough to warrant switching.
Frontier Similarity Radius
Frontiers drift as the map updates — the same physical area gets slightly different centroid coordinates each update. Rather than cancelling a goal every time the centroid drifts by a few cells, we check if any current frontier is within 0.50m of the original goal anchor. If yes, update the goal coordinates silently (no cancel/replan).
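Both stability mechanisms can be sketched together; the function name and the keep/update/replan return convention are illustrative, only the two radii come from the report:

```python
import math

PROXIMITY_LOCK_RADIUS = 3.0  # m, final tuned value
SIMILARITY_RADIUS = 0.50     # m

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def update_goal(robot_xy, goal_anchor, frontiers):
    """Decide what to do with a fresh frontier list while navigating.
    Returns ('keep', anchor), ('update', xy), or ('replan', best)."""
    # Proximity lock: close to the anchor -> ignore frontier churn entirely.
    if dist(robot_xy, goal_anchor) < 2 * PROXIMITY_LOCK_RADIUS:
        return ("keep", goal_anchor)
    # Similarity: same physical frontier drifted a few cells -> silent
    # coordinate update, no cancel/replan.
    for f in frontiers:
        if dist(f, goal_anchor) < SIMILARITY_RADIUS:
            return ("update", f)
    # Otherwise cancel and replan to the nearest frontier.
    best = min(frontiers, key=lambda f: dist(robot_xy, f))
    return ("replan", best)
```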
9. Simulation Debugging
Getting the simulation working took significant effort. Multiple independent bugs needed to be found and fixed before the system worked end-to-end in Gazebo.
Bug 1: LiDAR sees only itself
Root cause: Gazebo Sim 8 uses Ogre1 as the default GPU renderer for ray sensors. The TurtleBot4 self-collision geometry caused all 640 LiDAR rays to hit the robot's own mesh at minimum range, producing a LiDAR that saw nothing but itself.
Fix: Change <render_engine>ogre</render_engine> to ogre2 in /opt/ros/jazzy/share/irobot_create_description/urdf/create3.urdf.xacro. This required sudo because the system file is read-only; it is tracked as upstream GitHub issue #676.

```shell
sudo sed -i 's/<render_engine>ogre<\/render_engine>/<render_engine>ogre2<\/render_engine>/' \
  /opt/ros/jazzy/share/irobot_create_description/urdf/create3.urdf.xacro
```
Bug 2: SLAM builds no map (frame_id mismatch)
Root cause: Gazebo publishes the LiDAR scan in frame turtlebot4/rplidar_link/rplidar (Gazebo's full entity-path naming), while slam_toolbox expected frame rplidar_link. The mismatch caused SLAM to silently drop every scan message.
Fix: Created scan_relay_node.py, a simple ROS 2 node that subscribes to /scan and republishes to /scan_fixed with the corrected frame_id. SLAM is configured to use /scan_fixed via slam_sim.yaml.
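The core of the relay can be sketched without a ROS installation by using stand-in message types; in the real scan_relay_node.py the same logic would live in an rclpy subscription callback on /scan, with the result published on /scan_fixed:

```python
from copy import deepcopy
from dataclasses import dataclass, field

# Stand-ins for std_msgs/Header and sensor_msgs/LaserScan, so the relay
# logic is testable outside ROS. The real node uses the actual messages.
@dataclass
class Header:
    frame_id: str = ""

@dataclass
class LaserScan:
    header: Header = field(default_factory=Header)
    ranges: list = field(default_factory=list)

def relay_scan(msg, target_frame="rplidar_link"):
    """Copy a scan and rewrite Gazebo's entity-path frame_id
    (turtlebot4/rplidar_link/rplidar) to the frame SLAM expects."""
    fixed = deepcopy(msg)  # keep ranges/metadata, change only the frame
    fixed.header.frame_id = target_frame
    return fixed
```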
Bug 3: Custom scan_topic silently ignored
Root cause: The turtlebot4_navigation SLAM launch file uses RewrittenYaml, which programmatically overwrites scan_topic to namespace/scan, ignoring any custom yaml we passed in.
Fix: Bypass the turtlebot4 SLAM launch entirely and run slam_toolbox directly with our slam_sim.yaml. This also required manual lifecycle activation: ros2 lifecycle set /slam_toolbox configure && ros2 lifecycle set /slam_toolbox activate.
Bug 4: Robot not moving (wall clock vs sim time)
Root cause: The navigation planner uses wall-clock time (time.time()) for its control loop, but Gazebo runs on sim time. The diffdrive_controller checks command timestamps and rejects any command older than 0.5 seconds, so wall-clock timestamps appeared 0.5s+ "old" from the sim clock's perspective.
Fix: ros2 param set /diffdrive_controller cmd_vel_timeout 2.0, and ensure all custom nodes run with use_sim_time:=true.
Bug 5: slam_toolbox stuck unconfigured
Root cause: When slam_toolbox is launched as an unmanaged node via our direct invocation, it initializes but does not auto-configure/activate; it sits in the unconfigured lifecycle state, waiting.
Fix: After launching SLAM, manually trigger ros2 lifecycle set /slam_toolbox configure, then ros2 lifecycle set /slam_toolbox activate. The run_exploration.sh script handles this automatically.
Bug 6: Every frontier goal rejected as "surrounded by obstacles"
Root cause: Frontier centroids land on the exact boundary between free and unknown space, which in the costmap often falls inside the wall inflation zone. The planner rejected these goals as impassable.
Fix: Increased SNAP_SEARCH_RADIUS from 10 to 20 cells. The planner now searches up to 20 cells around the goal for the nearest free cell before giving up.
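The snap behaviour can be sketched as a ring search of growing radius around the rejected goal (grid encoding and helper name are illustrative; only the 20-cell radius comes from the report):

```python
def snap_to_free(grid, goal_rc, max_radius=20, free=0):
    """Find the free cell nearest to goal_rc, searching rings of
    increasing Chebyshev radius up to max_radius (SNAP_SEARCH_RADIUS).
    grid: 2D occupancy values (-1 unknown, 0 free, 100 occupied)."""
    h, w = len(grid), len(grid[0])
    gr, gc = goal_rc
    if 0 <= gr < h and 0 <= gc < w and grid[gr][gc] == free:
        return goal_rc  # goal already free: nothing to do
    for rad in range(1, max_radius + 1):
        best, best_d2 = None, None
        for r in range(gr - rad, gr + rad + 1):
            for c in range(gc - rad, gc + rad + 1):
                if max(abs(r - gr), abs(c - gc)) != rad:
                    continue  # visit only the current ring
                if 0 <= r < h and 0 <= c < w and grid[r][c] == free:
                    d2 = (r - gr) ** 2 + (c - gc) ** 2
                    if best_d2 is None or d2 < best_d2:
                        best, best_d2 = (r, c), d2
        if best is not None:
            return best  # closest free cell on the smallest ring that has one
    return None  # no free cell within the search radius: give up
```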
10. Hardware Testing
The system was tested on the real TurtleBot4 Lite in the ASU lab environment. Key observations from hardware runs:
What worked well on hardware
- SLAM builds maps immediately — the real RPLIDAR publishes in the correct frame, no relay node needed
- DeepHAZMAT detected Explosive 1.1 at 92% confidence and Radioactive I at 99% confidence at ~0.5m range
- Cliff sensor E-stop worked reliably: tested on a table edge, it triggered in under 10ms
- Navigation planner navigated corridors without bumping walls with the tuned parameters
- YOLOv5 fire detection worked at 63% confidence on phone screen video
Issues on hardware
- Camera processing backlog (~30s delay) on Princess's machine — fixed by pulling the latest YOLO worker thread version
- Marker placement offset — HAZMAT markers are placed at a fixed 0.70m ahead of the robot rather than at the actual depth-estimated distance. Known limitation; future fix: use OAK-D depth to project the marker position.
- WiFi latency causes occasional topic dropouts — Zenoh discovery server at 192.168.50.93:11811 must be reachable
Hardware Run Order
```shell
source ~/mobile-robotics-frontier-exploration/open_ros_env.sh
source ~/mobile-robotics-frontier-exploration/ros2_ws/install/setup.bash

# Terminal 1: Launch everything
ros2 launch frontier_exploration_mapping exploration.launch.py use_sim_time:=false

# Terminal 2: Debug monitor
python3 ~/debug_monitor.py
```
11. Issues Found (Complete List)
Every problem we encountered and how we resolved it:
| # | Issue | Root Cause | Resolution |
|---|---|---|---|
| 1 | LiDAR all 0.164m in sim | Ogre1 renderer + self-collision | Change render_engine to ogre2 |
| 2 | SLAM not building map (sim) | frame_id mismatch Gazebo vs ROS2 | scan_relay_node.py |
| 3 | SLAM ignoring custom scan_topic | turtlebot4 launch uses RewrittenYaml | Bypass turtlebot4 SLAM launch, run slam_toolbox directly |
| 4 | Robot not moving (sim) | diffdrive_controller timestamp mismatch | cmd_vel_timeout 2.0 + use_sim_time |
| 5 | SLAM stuck unconfigured | Lifecycle not auto-activated | Manual lifecycle set configure + activate |
| 6 | All frontier goals → "surrounded by obstacles" | Frontiers in wall inflation zone | SNAP_SEARCH_RADIUS = 20 cells |
| 7 | Robot oscillating between frontiers | Score changes every map update | PROXIMITY_LOCK_RADIUS = 3.0m |
| 8 | ALIGN_THRESHOLD too tight → oscillation | 0.10 rad → robot kept rotating | ALIGN_THRESHOLD = 0.20 rad |
| 9 | Camera 30s backlog (Princess) | Old YOLO blocking main thread | Threaded YOLO worker + cooldown 1Hz |
| 10 | LiDAR sector angles wrong | +90° URDF mounting offset ignored | θ_robot = θ_lidar − π/2 |
| 11 | Water detection false positives | HSV water range matched blue walls | Disabled water HSV detection |
| 12 | Multiple scan_relay nodes running | Old inline Python scripts + new node | Kill all scan_relay before fresh start |
| 13 | Root-level build/install folders | colcon build run from wrong directory | Always build from ros2_ws/ directory |
| 14 | Hazard marker placement offset | Fixed 0.70m ahead, no depth projection | Known limitation — future fix with OAK-D depth |
12. What We Learned
Technical Lessons
- Simulation ≠ hardware. Every bug we fixed in simulation was a different bug than what exists on hardware. The Ogre2 LiDAR fix, the frame_id relay, the lifecycle management — none of these exist on real hardware. Always test on real hardware as early as possible.
- Simple algorithms often beat complex ones. The original frontier scoring formula (weighted combination of size, unknown area, and distance) sounded more principled than nearest-first, but it caused the robot to oscillate. score = 1/(dist+ε) with a proximity lock is simpler and works better.
- Simulation time is subtle. The diffdrive_controller timestamp bug took hours to debug. Wall clock vs sim clock mismatches can manifest in unexpected ways — always add use_sim_time to every node.
- Threading matters for real-time systems. YOLO inference at 30ms/frame is fast enough, but blocking the main ROS executor with it caused a 30-second backlog. Always run heavy computation in a separate thread.
- ROS 2 lifecycle nodes need explicit activation. slam_toolbox being a lifecycle node that requires manual configure+activate is not obvious from the documentation.
- Check your URDF. The LiDAR angle correction (+90°) was buried in the TurtleBot4 URDF and took time to find. Always visualize TF trees when sensor data looks wrong.
Process Lessons
- Use the debug monitor from day one — having real-time visibility into map size, frontier count, and LiDAR status saved hours of debugging.
- Push frequently and use git properly — being able to find exactly when a bug was introduced via git bisect is invaluable.
- Document issues as you find them — this report section was written from memory and git commit messages. Future projects should log issues in real time.
- Split simulation debugging from hardware testing explicitly — they have completely different failure modes.
What We Would Do Differently
- Start with real hardware testing earlier — we spent too long on simulation before testing on the robot
- Use Nav2's lifecycle manager properly or build our own instead of working around it
- Add depth-based marker placement from the start instead of fixed offset
- Implement a proper rosbag-based regression test suite so parameter changes don't break existing functionality
13. Future Work
- Depth-based hazard marker placement: Use OAK-D depth frame to project the 2D bounding box center through camera intrinsics → 3D world position → transform to odom frame. Would eliminate the fixed-offset placement error.
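The proposed back-projection is standard pinhole geometry. A minimal sketch, assuming the intrinsics (fx, fy, cx, cy) come from the camera's CameraInfo; the remaining step, transforming the camera-frame point into odom, would use a TF lookup and is omitted here:

```python
def pixel_to_camera_point(u, v, depth_m, fx, fy, cx, cy):
    """Back-project a pixel (u, v) plus its depth through the pinhole
    intrinsics to a 3D point in the optical camera frame
    (x right, y down, z forward)."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)
```

A detection's bounding-box center plus the median depth inside the box would replace the fixed 0.70 m offset.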
- Multi-floor exploration: Current system assumes flat ground. Adding floor-change detection and staircase navigation would significantly expand applicability.
- Semantic costmap integration: HAZMAT and fire detections currently publish markers only. Integrating semantic hazards into the navigation costmap (marking detected hazard zones as high-cost) would prevent the robot from navigating through detected danger zones.
- Hardware pothole testing: The pothole depth analysis is implemented but not tested on real hardware. Need to create a physical test environment with a small drop (>0.20m).
- Predictive hazard propagation: Mark adjacent cells as high-risk before the robot reaches a detected hazard, closing the temporal vulnerability window between frontier selection and hazard classification.
- Next-Best-View frontier scoring: Replace nearest-first with information-gain-per-meter-traveled scoring (NBV). Each frontier would be evaluated for how much new unknown area it would reveal, weighted by travel cost.
- Multi-robot coordination: Partition the environment between multiple robots to reduce exploration time. Each robot claims frontiers and communicates claimed areas to prevent duplicate work.
- 3D mapping: Extend from 2D OccupancyGrid to 3D voxel map using OctoMap, enabling exploration of multi-level environments.