
Building Physical AI Agents to Spark Dialogue

The Candytron 4000 demonstrates how AI systems 'listen,' 'interpret,' and 'act' as an integrated whole, making AI tangible and understandable through a playful multimodal demonstration.

January 1, 2025 | State of AI 2025 Report | Page 23

At RISE, we believe it’s valuable to make AI tangible through regular demos, bringing it closer to collaborators, potential clients, and wider society. One example is the Candytron 4000: a playful AI-agent demonstration in which an AI system listens to your voice, interprets your request, and instructs a robot arm to deliver a piece of candy.

The Technology Stack

Behind this simple gesture is a complete multimodal AI pipeline running in real time:

  • Computer Vision: YOLO for object detection
  • Speech Recognition: Whisper for understanding voice commands
  • Speech Synthesis: Piper for generating responses
  • Reasoning Model: Gemma 3 coordinating the robot arm
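The components above can be pictured as one listen, interpret, act loop. The sketch below illustrates that structure under stated assumptions and is not Candytron's actual code. The article does not describe its interfaces, so `Detection`, `Action`, `plan_action`, and `run_turn` are hypothetical names. Each model is passed in as a plain function, which would let real Whisper, YOLO, Piper, and robot-arm bindings be plugged in. A simple keyword match stands in for Gemma 3 so the sketch runs without model weights.

```python
from dataclasses import dataclass
from typing import Callable, Optional


# Hypothetical data types: the article does not describe Candytron's internals.
@dataclass
class Detection:
    label: str  # object class, as an object detector such as YOLO might report it
    x: float    # position in the robot's workspace (assumed units)
    y: float


@dataclass
class Action:
    target: Optional[Detection]  # what the arm should pick up, or None
    reply: str                   # text to be spoken back to the visitor


def plan_action(request: str, detections: list[Detection]) -> Action:
    """Stand-in for the reasoning model (Gemma 3 in the article).

    A case-insensitive keyword match replaces the language model here,
    so this is an assumption, not how the real demo interprets requests.
    """
    text = request.lower()
    for det in detections:
        if det.label.lower() in text:
            return Action(target=det, reply=f"Here is your {det.label}!")
    return Action(target=None, reply="Sorry, I can't see that candy.")


def run_turn(
    transcribe: Callable[[bytes], str],     # "listen": e.g. a Whisper wrapper
    detect: Callable[[], list[Detection]],  # "see": e.g. a YOLO wrapper
    speak: Callable[[str], None],           # "respond": e.g. a Piper wrapper
    move_arm: Callable[[Detection], None],  # "act": hypothetical arm driver
    audio: bytes,
) -> Action:
    """Run one listen -> interpret -> act cycle of a hypothetical agent loop."""
    request = transcribe(audio)
    action = plan_action(request, detect())
    if action.target is not None:
        move_arm(action.target)
    speak(action.reply)
    return action


if __name__ == "__main__":
    # Fake stages stand in for the real models and hardware.
    result = run_turn(
        transcribe=lambda audio: "Could I have a lollipop, please?",
        detect=lambda: [Detection("chocolate", 0.1, 0.2), Detection("lollipop", 0.4, 0.3)],
        speak=print,
        move_arm=lambda det: print(f"Moving arm to ({det.x}, {det.y})"),
        audio=b"",
    )
```

Injecting each stage as a callable keeps the orchestration separate from any particular model, so the loop can be exercised with fakes, as in the `__main__` block, before connecting microphones, cameras, or the arm.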

Making AI Understandable

The Candytron isn’t about showcasing cutting-edge robotics; it’s about making AI understandable. By combining several models into one functioning agent, the demo reveals how modern AI systems “listen,” “interpret,” and “act” as an integrated whole. It gives visitors an intuitive, hands-on way to grasp what an AI agent actually is.

The Value of Demos

When technology becomes concrete, curiosity follows; and when curiosity opens the door, meaningful innovation can begin. Demos like this help spark conversations and lower the barrier to engagement for people of any technical background.
