Voice-Interactive Tour Guide Robot Enhances Museum Experience

In an era where automation and artificial intelligence are reshaping public services, a new advancement in service robotics is capturing attention across the global tech community. Researchers from Lanzhou University of Technology have unveiled an innovative voice-interactive tour guide robot designed to revolutionize the way visitors engage with cultural and educational spaces. This cutting-edge robot, developed by graduate students Cheng Qichao and Zhou Jiawu, integrates advanced AI-driven voice interaction, robust navigation systems, and seamless device connectivity to deliver a dynamic, intelligent, and immersive experience in public venues such as museums, exhibition halls, and science centers.

The research, published in the June 2021 issue of Computer & Digital Engineering, introduces a robot that transcends traditional automated guides by combining human-like interaction with precise environmental awareness and responsive control capabilities. Unlike earlier models that primarily focused on pre-recorded audio tours or basic motion control, this new system emphasizes real-time engagement, adaptive behavior, and integration with surrounding smart infrastructure—features that align closely with the evolving expectations of modern users.

At the heart of the robot’s functionality is its integration of iFLYTEK’s AIUI platform, a speech-interaction solution known for high-accuracy voice recognition and natural language processing. This technology enables the robot to understand spoken commands, respond in context, and even recognize the direction of a speaker’s voice using a six-microphone array. This spatial awareness allows the robot’s head to turn toward the person speaking, simulating natural human interaction and enhancing the sense of presence during conversations.
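
The paper does not reproduce the firmware, but the head-steering idea can be sketched in C: pick the loudest of the six microphone channels, treat its 60-degree sector as the speaker's direction, and clamp the result to the head's pan range. The per-channel energy inputs and the sector mapping below are assumptions made for illustration; the 45-degree clamp matches the head's described pan range.

```c
/* Hypothetical sketch: pick the loudest of six microphones arranged in a
 * circle (60-degree sectors) and convert that sector into a head pan target.
 * Channel energies and the sector layout are assumptions for illustration. */
#include <stdio.h>

#define NUM_MICS 6
#define PAN_LIMIT_DEG 45   /* the head pans at most 45 degrees each way */

/* Map a microphone index (0..5) to the azimuth of its sector centre,
 * with 0 degrees straight ahead and positive angles to the right. */
static int sector_azimuth_deg(int mic)
{
    int az = mic * 60;            /* 0, 60, ..., 300 degrees */
    if (az > 180) az -= 360;      /* fold into -180..180 */
    return az;
}

/* Choose a pan target from per-channel energy estimates. */
static int head_pan_target(const int energy[NUM_MICS])
{
    int best = 0;
    for (int i = 1; i < NUM_MICS; i++)
        if (energy[i] > energy[best]) best = i;

    int az = sector_azimuth_deg(best);

    /* The head can only pan so far; clamp to the mechanical range. */
    if (az > PAN_LIMIT_DEG)  az = PAN_LIMIT_DEG;
    if (az < -PAN_LIMIT_DEG) az = -PAN_LIMIT_DEG;
    return az;
}

int main(void)
{
    int energy[NUM_MICS] = { 10, 12, 80, 30, 9, 11 };  /* mic 2 is loudest */
    printf("pan head to %d degrees\n", head_pan_target(energy));
    return 0;
}
```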

What sets this robot apart is not just its ability to listen and speak, but how it processes and acts upon user input. The AIUI system supports customizable wake-up words and dialogue scripts, allowing institutions to tailor interactions to their specific branding or educational goals. For example, a science museum could program the robot to respond with age-appropriate explanations or fun facts when asked about planetary motion, while an art gallery might configure it to describe the historical context of a painting. This level of personalization transforms the robot from a mere information dispenser into a true interactive companion.
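
The dialogue scripts themselves are configured on the AIUI platform and are not detailed in the paper. Purely to illustrate the idea of venue-specific responses, a keyword-to-reply table on the robot side might look like the following sketch; the keywords and replies are invented and do not reflect the AIUI scripting format.

```c
/* Illustrative only: a tiny venue-specific lookup that maps a recognised
 * keyword to a scripted reply. The real dialogue scripts live in the AIUI
 * platform configuration; these entries are invented examples. */
#include <stdio.h>
#include <string.h>

struct dialogue_entry {
    const char *keyword;   /* word expected in the recognised utterance */
    const char *reply;     /* scripted answer for this venue */
};

static const struct dialogue_entry script[] = {
    { "planet",  "The planets orbit the Sun because of gravity..." },
    { "rocket",  "A rocket lifts off when its thrust exceeds its weight." },
    { "opening", "The science hall is open from 9 a.m. to 5 p.m." },
};

static const char *lookup_reply(const char *utterance)
{
    for (size_t i = 0; i < sizeof script / sizeof script[0]; i++)
        if (strstr(utterance, script[i].keyword))
            return script[i].reply;
    return "Sorry, I do not have an answer for that yet.";
}

int main(void)
{
    printf("%s\n", lookup_reply("how does a rocket take off"));
    return 0;
}
```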

Beyond voice, the physical design of the robot contributes significantly to its appeal. Shaped like a stylized astronaut and rocket—capable of transforming between two forms—the robot uses motorized arms and a dynamically moving head to express actions that complement its narration. These movements are not arbitrary; they are synchronized with the content being delivered, adding a performative dimension to the tour. When explaining the launch of a spacecraft, for instance, the arms extend upward as if mimicking liftoff, while the head tilts skyward. Such choreographed gestures make the experience more engaging, particularly for younger audiences.

The transformation mechanism itself is powered by dual DC geared motors—one controlling arm extension and retraction, the other managing vertical movement. Optical sensors and limit switches ensure precise positioning and prevent mechanical overtravel, contributing to both safety and reliability. The head, equipped with two additional motors, can pan left and right up to 45 degrees and tilt up 15 degrees or down 5 degrees, enabling lifelike scanning motions as it follows sound sources or gestures during interaction.
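
The pan and tilt ranges above lend themselves to simple software guarding alongside the hardware limit switches. The sketch below shows the idea under stated assumptions: the angle limits come from the design description, while the limit-switch polarity, the helper names, and the single-step control style are placeholders for the real motor driver code.

```c
/* Minimal sketch of limit-guarded head positioning. The pan and tilt ranges
 * (+/-45, +15/-5 degrees) come from the design description; the hardware
 * hooks below are placeholders, not the robot's actual driver functions. */
#include <stdbool.h>

#define PAN_MAX_DEG   45
#define PAN_MIN_DEG  (-45)
#define TILT_MAX_DEG  15
#define TILT_MIN_DEG (-5)

/* Placeholder hardware hooks (would wrap GPIO reads and PWM on the robot). */
static bool limit_switch_hit(int axis) { (void)axis; return false; }
static void motor_step_toward(int axis, int target_deg) { (void)axis; (void)target_deg; }

static int clamp(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Drive one head axis toward a requested angle, respecting both the
 * software range and the hardware limit switches. */
void move_head_axis(int axis, int requested_deg, bool is_pan)
{
    int target = is_pan ? clamp(requested_deg, PAN_MIN_DEG, PAN_MAX_DEG)
                        : clamp(requested_deg, TILT_MIN_DEG, TILT_MAX_DEG);

    if (limit_switch_hit(axis))
        return;                      /* never push past a tripped switch */

    motor_step_toward(axis, target); /* one control-loop step toward target */
}
```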

Navigation is another critical component of the robot’s autonomy. Rather than relying solely on vision-based systems like cameras or LiDAR—which can be disrupted by lighting changes, reflective surfaces, or crowded environments—the team opted for a hybrid approach combining magnetic guidance and RFID technology. A magnetic tape embedded in the floor serves as the primary path, which the robot follows using an eight-point magnetic sensor array mounted on its base. By detecting deviations from the centerline of the tape, the system adjusts the differential speed of the two drive wheels to maintain alignment, ensuring smooth and accurate trajectory tracking.
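
One plausible way to realize this correction, sketched below in C, is to estimate the tape offset from the eight sensor readings and apply a proportional differential correction to the two wheels. The weighting scheme, base speed, and gain are illustrative assumptions; the published system describes the principle rather than the exact algorithm.

```c
/* Illustrative line-following step: estimate the tape offset from an
 * eight-point magnetic sensor array and apply a proportional differential
 * correction to the two drive wheels. Gains and speeds are assumptions. */
#include <stdio.h>

#define NUM_SENSORS 8
#define BASE_SPEED  100   /* nominal wheel speed, arbitrary units */
#define KP          6     /* proportional gain, tuned in practice */

/* Deviation from the tape centreline, measured in half sensor spacings:
 * negative = tape is to the left, positive = tape is to the right.
 * Each sensor reports 1 when it sees the magnetic tape, 0 otherwise. */
static int tape_deviation(const int sensor[NUM_SENSORS])
{
    int sum = 0, count = 0;
    for (int i = 0; i < NUM_SENSORS; i++) {
        if (sensor[i]) {
            sum += 2 * i - (NUM_SENSORS - 1);   /* positions -7,-5,...,+7 */
            count++;
        }
    }
    return count ? sum / count : 0;   /* 0 if the tape is not detected */
}

static void follow_line_step(const int sensor[NUM_SENSORS],
                             int *left_speed, int *right_speed)
{
    int dev = tape_deviation(sensor);
    *left_speed  = BASE_SPEED + KP * dev;   /* tape to the right: speed up left wheel */
    *right_speed = BASE_SPEED - KP * dev;
}

int main(void)
{
    int sensor[NUM_SENSORS] = { 0, 0, 0, 0, 1, 1, 0, 0 };  /* tape slightly right of centre */
    int l, r;
    follow_line_step(sensor, &l, &r);
    printf("left=%d right=%d\n", l, r);
    return 0;
}
```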

Complementing this is a low-power RFID reader that detects tags placed beneath the magnetic path at key locations—such as exhibit entrances or interactive zones. Each tag contains encoded data corresponding to a specific point in the tour, including not only positional identification but also instructions for synchronized actions. Upon reading a tag, the robot knows exactly where it is and what content to deliver, eliminating the drift and ambiguity often associated with GPS-denied indoor navigation.
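
In firmware terms, each tag ID can simply index a waypoint table that bundles the narration, gestures, and equipment commands for that stop. The sketch below illustrates the idea; the concrete fields, IDs, and file names are invented for illustration, not taken from the paper.

```c
/* Hypothetical waypoint table keyed by RFID tag ID. The idea of tags
 * identifying a position and its associated actions comes from the design;
 * the concrete fields and values below are invented for illustration. */
#include <stdio.h>
#include <stdint.h>

struct waypoint {
    uint32_t    tag_id;        /* ID read from the RFID tag under the tape */
    const char *audio_clip;    /* narration segment to play here */
    int         gesture_id;    /* predefined arm/head gesture to perform */
    int         device_cmd;    /* command for nearby exhibit equipment, 0 = none */
};

static const struct waypoint tour[] = {
    { 0x1001, "welcome.wav",      1, 0 },
    { 0x1002, "rocket_hall.wav",  3, 42 },   /* e.g. start the launch animation */
    { 0x1003, "climate_zone.wav", 2, 57 },   /* e.g. dim lights, start projection */
};

static const struct waypoint *find_waypoint(uint32_t tag_id)
{
    for (size_t i = 0; i < sizeof tour / sizeof tour[0]; i++)
        if (tour[i].tag_id == tag_id)
            return &tour[i];
    return NULL;    /* unknown tag: keep driving */
}

int main(void)
{
    const struct waypoint *wp = find_waypoint(0x1002);
    if (wp)
        printf("play %s, gesture %d, device cmd %d\n",
               wp->audio_clip, wp->gesture_id, wp->device_cmd);
    return 0;
}
```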

This dual-mode system offers superior stability compared to purely visual or wireless triangulation methods. It operates effectively in environments with high ambient light, moving crowds, or electromagnetic interference—common challenges in busy public spaces. Moreover, because the infrastructure is passive (magnetic tape and RFID tags require no power), maintenance costs remain low, and deployment is relatively straightforward.

One of the most significant innovations lies in the robot’s ability to interact with venue infrastructure. Through a multi-protocol wireless control unit, it communicates with demonstration equipment, lighting systems, multimedia displays, HVAC units, and other connected devices. When the robot reaches a designated exhibit, it sends commands via Wi-Fi, Bluetooth, infrared, or RF signals to trigger coordinated effects—such as dimming lights, starting a video, activating a model animation, or adjusting room temperature.
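
Internally this amounts to routing an abstract device command to the right wireless backend. The sketch below shows that dispatch structure; the supported links are those named above, but the enum values, addresses, and send_* helpers are placeholders rather than the robot's actual driver interface.

```c
/* Sketch of a command dispatcher for venue equipment. The supported links
 * (Wi-Fi, Bluetooth, infrared, RF) are those described for the robot; the
 * enum values and the send_* placeholder functions are assumptions. */
#include <stdio.h>
#include <stdint.h>

enum link_type { LINK_WIFI, LINK_BLUETOOTH, LINK_INFRARED, LINK_RF };

struct device_cmd {
    enum link_type link;     /* which wireless backend to use */
    uint8_t        address;  /* device address on that link */
    uint8_t        action;   /* e.g. "start video", "dim lights" */
};

/* Placeholder backends: on the real robot these wrap the radio/IR drivers. */
static void send_wifi(uint8_t addr, uint8_t act)      { printf("wifi -> %u:%u\n", addr, act); }
static void send_bluetooth(uint8_t addr, uint8_t act) { printf("bt   -> %u:%u\n", addr, act); }
static void send_infrared(uint8_t addr, uint8_t act)  { printf("ir   -> %u:%u\n", addr, act); }
static void send_rf(uint8_t addr, uint8_t act)        { printf("rf   -> %u:%u\n", addr, act); }

static void dispatch(const struct device_cmd *cmd)
{
    switch (cmd->link) {
    case LINK_WIFI:      send_wifi(cmd->address, cmd->action);      break;
    case LINK_BLUETOOTH: send_bluetooth(cmd->address, cmd->action); break;
    case LINK_INFRARED:  send_infrared(cmd->address, cmd->action);  break;
    case LINK_RF:        send_rf(cmd->address, cmd->action);        break;
    }
}

int main(void)
{
    struct device_cmd dim_lights = { LINK_INFRARED, 7, 2 };
    dispatch(&dim_lights);
    return 0;
}
```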

This level of integration elevates the entire visitor experience from passive observation to active participation. Imagine entering a climate exhibit where, as the robot begins discussing global warming trends, the thermostat subtly increases the room temperature, fans turn off, and a time-lapse projection of melting glaciers appears on the wall—all initiated automatically by the robot. Such multisensory synchronization deepens comprehension and emotional engagement, turning abstract concepts into tangible experiences.

The communication architecture enabling these interactions is built around the Modbus RTU protocol, a widely adopted industrial standard known for its simplicity and reliability. Running over an RS-485 physical layer, Modbus ensures stable data exchange even in electrically noisy environments. The system employs a master-slave configuration, with the main STM32F4-based controller on the robot's base acting as the master, polling subordinate modules (head control, arm mechanism, display unit, audio processor) before issuing commands. Each module has a unique device address, and data frames include CRC checksums for error detection, minimizing the risk of miscommunication.
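
As a concrete illustration of the frame format, the sketch below builds a standard Modbus RTU "read holding registers" request with the usual CRC-16 (polynomial 0xA001, low byte transmitted first). The CRC routine is the standard Modbus algorithm; the slave address and register numbers are placeholders, not the robot's actual register map.

```c
/* Standard Modbus RTU CRC-16 and an example function-03 request frame.
 * The slave address and registers are illustrative placeholders. */
#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

static uint16_t modbus_crc16(const uint8_t *data, size_t len)
{
    uint16_t crc = 0xFFFF;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int b = 0; b < 8; b++)
            crc = (crc & 1) ? (crc >> 1) ^ 0xA001 : crc >> 1;
    }
    return crc;
}

/* Build a function-03 (read holding registers) request into buf.
 * Returns the frame length. Modbus sends the CRC low byte first. */
static size_t build_read_request(uint8_t *buf, uint8_t slave,
                                 uint16_t start_reg, uint16_t count)
{
    buf[0] = slave;
    buf[1] = 0x03;                      /* function: read holding registers */
    buf[2] = (uint8_t)(start_reg >> 8);
    buf[3] = (uint8_t)(start_reg & 0xFF);
    buf[4] = (uint8_t)(count >> 8);
    buf[5] = (uint8_t)(count & 0xFF);
    uint16_t crc = modbus_crc16(buf, 6);
    buf[6] = (uint8_t)(crc & 0xFF);     /* CRC low byte */
    buf[7] = (uint8_t)(crc >> 8);       /* CRC high byte */
    return 8;
}

int main(void)
{
    uint8_t frame[8];
    size_t n = build_read_request(frame, 0x02, 0x0000, 2);  /* poll slave 2 */
    for (size_t i = 0; i < n; i++)
        printf("%02X ", frame[i]);
    printf("\n");
    return 0;
}
```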

All onboard processing is handled by three STM32F4 microcontrollers distributed across the robot: one in the head for voice and motion control, one in the torso for managing the transformation mechanism, and one in the base for sensor fusion, navigation, and overall system coordination. This modular design enhances fault tolerance and simplifies software development, as each subsystem can be updated or debugged independently.

The main control logic is written in C using a modular programming paradigm, improving code readability and maintainability. Upon startup, the robot performs a self-diagnostic routine, checking communication links, verifying actuator positions, and initializing displays. Once cleared, it enters standby mode, where it remains responsive to voice commands or touchscreen inputs. A large 8-inch LCD touchscreen on the robot’s midsection provides a visual interface for users who prefer non-verbal interaction, supporting touch-based menu navigation and content selection.
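
The paper does not publish this startup code, but the self-diagnostic sequence it describes (communication links, actuator positions, display) can be sketched as a simple table of checks run before entering standby. The check functions below are placeholders for the real hardware tests.

```c
/* Sketch of the described power-on self-check: verify inter-module
 * communication, actuator home positions and the display, then enter
 * standby. The individual check functions are placeholders. */
#include <stdio.h>
#include <stdbool.h>

static bool check_comm_links(void) { return true; }  /* poll head/torso/base modules */
static bool check_actuators(void)  { return true; }  /* arms retracted, head centred */
static bool check_display(void)    { return true; }  /* touchscreen responds */

enum robot_state { STATE_FAULT, STATE_STANDBY };

static enum robot_state power_on_self_test(void)
{
    struct { const char *name; bool (*test)(void); } checks[] = {
        { "communication links", check_comm_links },
        { "actuator positions",  check_actuators },
        { "display",             check_display },
    };

    for (size_t i = 0; i < sizeof checks / sizeof checks[0]; i++) {
        if (!checks[i].test()) {
            printf("self-test failed: %s\n", checks[i].name);
            return STATE_FAULT;
        }
    }
    printf("self-test passed, entering standby\n");
    return STATE_STANDBY;
}

int main(void)
{
    return power_on_self_test() == STATE_STANDBY ? 0 : 1;
}
```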

Behind the scenes, the software manages a complex state machine that transitions between idle, navigation, presentation, and interaction phases. During a guided tour, the robot moves along the magnetic path, periodically pausing at RFID-marked waypoints. At each stop, it delivers a narrated segment—pre-recorded or dynamically generated—while executing predefined gestures. To avoid interruptions, the AIUI voice recognition system is temporarily disabled during playback. However, once the narration concludes, the robot prompts visitors to ask questions, re-enabling voice detection and activating real-time sound source tracking.

If no query is detected within 10 seconds, or if an ongoing conversation exceeds two minutes, the system assumes the interaction phase has ended and resumes the tour. This balance between responsiveness and flow ensures that the experience remains structured yet flexible, accommodating spontaneous inquiries without derailing the overall schedule.
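
Putting the last two paragraphs together, the tour logic can be sketched as a small state machine ticked once per second. The states, the muting of recognition during playback, and the 10-second and two-minute limits come from the description above; the tick-based structure and helper hooks are assumptions made for the sketch.

```c
/* Sketch of the tour state machine described in the article. The states and
 * the 10-second / 2-minute interaction limits come from the description;
 * the tick structure and placeholder hooks are assumptions. */
#include <stdint.h>
#include <stdbool.h>

enum tour_state { IDLE, NAVIGATION, PRESENTATION, INTERACTION };

#define NO_QUERY_TIMEOUT_S   10
#define MAX_INTERACTION_S   120

/* Placeholder hooks into the rest of the firmware. */
static bool waypoint_reached(void)    { return false; }
static bool narration_finished(void)  { return false; }
static bool query_in_progress(void)   { return false; }
static void set_voice_enabled(bool e) { (void)e; }

/* Called once per second by the base controller. */
enum tour_state tour_tick(enum tour_state s,
                          uint32_t *seconds_in_state,
                          uint32_t *silent_seconds)
{
    (*seconds_in_state)++;

    switch (s) {
    case NAVIGATION:
        if (waypoint_reached()) {
            set_voice_enabled(false);   /* mute recognition during playback */
            *seconds_in_state = 0;
            return PRESENTATION;
        }
        break;
    case PRESENTATION:
        if (narration_finished()) {
            set_voice_enabled(true);    /* invite visitor questions */
            *seconds_in_state = *silent_seconds = 0;
            return INTERACTION;
        }
        break;
    case INTERACTION:
        *silent_seconds = query_in_progress() ? 0 : *silent_seconds + 1;
        if (*silent_seconds >= NO_QUERY_TIMEOUT_S ||
            *seconds_in_state >= MAX_INTERACTION_S) {
            *seconds_in_state = 0;
            return NAVIGATION;          /* resume the tour */
        }
        break;
    case IDLE:
        break;
    }
    return s;
}
```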

Safety is embedded throughout the design. In addition to mechanical limit switches, the robot is equipped with ultrasonic sensors that scan up to 120 degrees in front of it, detecting obstacles within a range of several meters. Depending on proximity, the robot will either slow down or come to a complete stop, issuing verbal warnings to alert nearby staff. An infrared pyroelectric sensor detects human presence, allowing the robot to initiate greetings when someone approaches—adding a welcoming touch without being intrusive.
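
The graded response to obstacles can be expressed as a simple distance-to-action mapping. The slow-down and stop behaviours are those described above; the specific thresholds in the sketch are assumptions, since the article only states a range of several meters.

```c
/* Illustrative obstacle response: map the nearest ultrasonic reading to a
 * drive action. The thresholds here are assumptions, not published values. */
#include <stdio.h>

enum drive_action { DRIVE_NORMAL, DRIVE_SLOW, DRIVE_STOP };

#define SLOW_DISTANCE_CM 150   /* assumed: begin slowing here */
#define STOP_DISTANCE_CM  50   /* assumed: stop and issue a verbal warning */

static enum drive_action obstacle_response(int nearest_cm)
{
    if (nearest_cm <= STOP_DISTANCE_CM) return DRIVE_STOP;
    if (nearest_cm <= SLOW_DISTANCE_CM) return DRIVE_SLOW;
    return DRIVE_NORMAL;
}

int main(void)
{
    int readings[] = { 300, 120, 40 };
    for (int i = 0; i < 3; i++)
        printf("%d cm -> action %d\n", readings[i], obstacle_response(readings[i]));
    return 0;
}
```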

For emergency situations, a physical e-stop button is located on the robot’s back, providing immediate manual override. The entire system runs on a 24V lithium battery pack, with voltage regulation circuits supplying appropriate levels to different components. Power management strategies help extend operational time, making multi-hour tours feasible without recharging.

User experience testing conducted in a laboratory environment demonstrated the robot’s stability, responsiveness, and intuitive operation. Visitors reported high satisfaction with both the clarity of the audio output and the naturalness of the robot’s movements. The combination of expressive features, reliable navigation, and intelligent interaction created a perception of personality and intentionality—qualities that foster emotional connection and enhance memorability.

While the current prototype shows strong performance, the researchers acknowledge areas for improvement. Motion precision, particularly in the arm actuators, could be enhanced to reduce mechanical backlash and noise. Future iterations may explore the use of brushless motors or harmonic drives for smoother, quieter operation. Additionally, expanding the robot’s environmental perception—perhaps through supplementary depth sensors or SLAM algorithms—could enable limited off-path navigation or obstacle detouring, increasing flexibility in dynamic spaces.

From a broader perspective, this project reflects a growing trend in robotics: the shift from task-specific automation to socially integrated assistants. As public institutions seek to improve accessibility, engagement, and operational efficiency, intelligent robots like this one offer scalable solutions. They can provide consistent, multilingual commentary, assist visitors with disabilities, and collect anonymized usage data to inform exhibit design and crowd management.

Moreover, the open architecture and use of standard protocols make the system adaptable to various settings. With minimal modifications, the same platform could serve as a retail assistant in a shopping mall, a guide in a hospital, or an educator in a school. The modular hardware and software framework allows developers to swap out components or add new functionalities—such as facial expression rendering via OLED screens in the robot’s visor or integration with cloud-based knowledge bases for real-time information retrieval.

The implications extend beyond user experience. By reducing reliance on human staff for routine guidance tasks, such robots free up personnel for more complex, high-value interactions—such as personalized consultations or hands-on demonstrations. They also operate continuously without fatigue, ensuring consistent service quality throughout the day.

In the context of global robotics development, this work underscores China’s growing role in applied AI and service automation. While much of the international spotlight focuses on industrial or military robotics, innovations in consumer-facing technologies demonstrate a maturing ecosystem capable of addressing real-world societal needs. The collaboration between academic institutions and technology providers—such as the use of iFLYTEK’s AIUI platform—highlights a model of innovation that combines theoretical research with practical engineering.

As cities become smarter and venues more digitized, the demand for intelligent, interactive interfaces will only grow. Robots like the one developed by Cheng Qichao and Zhou Jiawu represent a bridge between digital content and physical space, transforming static environments into responsive, adaptive experiences. Their work exemplifies how thoughtful integration of voice, motion, navigation, and connectivity can create systems that are not only functional but also meaningful.

Looking ahead, future developments may include swarm coordination—where multiple robots manage different visitor groups simultaneously—or integration with augmented reality applications, allowing users to view additional digital content through mobile devices triggered by the robot’s location. Machine learning algorithms could also enable the robot to adapt its behavior based on visitor demographics, preferences, or engagement patterns, further personalizing the experience.

For now, the successful demonstration of this voice-interactive tour guide robot marks a significant step forward in the evolution of public service robotics. It proves that with careful system design, attention to user needs, and strategic use of existing technologies, it is possible to build machines that inform, entertain, and connect with people in authentic ways.

Source: Cheng Qichao and Zhou Jiawu, Lanzhou University of Technology. Published in Computer & Digital Engineering, June 2021. DOI: 10.3969/j.issn.1672-9722.2021.06.040