Building Physical AI Agents to Spark Dialogue
The Candytron 4000 demonstrates how AI systems “listen,” “interpret,” and “act” as an integrated whole, making AI tangible and understandable through a playful multimodal demonstration.
At RISE, we believe in regularly making AI tangible through demos that bring it closer to collaborators, potential clients, and wider society. One example is the Candytron 4000: a playful AI-agent demonstration in which an AI system listens to your voice, understands your request, and instructs a robot arm to deliver a piece of candy.
The Technology Stack
Behind this simple gesture is a complete multimodal AI pipeline running in real time:
- Computer Vision: YOLO for object detection
- Speech Recognition: Whisper for understanding voice commands
- Speech Synthesis: Piper for generating responses
- Reasoning Model: Gemma 3 coordinating the robot arm
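To make the flow between these components concrete, here is a minimal sketch of a single interaction turn. It is not the Candytron's actual code: every name in it (`Detection`, `Turn`, `run_turn`, `choose_by_keyword`) is hypothetical, and each stage is passed in as a plain function so that simple stubs can stand in for YOLO, Whisper, Piper, and Gemma 3. The keyword matcher in particular is only a placeholder for the reasoning model.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Detection:
    """One candy found by the vision stage (hypothetical structure)."""
    label: str
    x: float
    y: float


@dataclass
class Turn:
    """Everything that happened in one request/response cycle."""
    transcript: str
    reply: str
    target: Optional[Detection]


def choose_by_keyword(
    transcript: str, detections: List[Detection]
) -> Tuple[Optional[Detection], str]:
    """Placeholder for the reasoning model: pick the first detected
    candy whose label appears in the spoken request."""
    words = transcript.lower()
    for det in detections:
        if det.label.lower() in words:
            return det, f"Here comes the {det.label}."
    return None, "Sorry, I cannot see that candy."


def run_turn(
    audio: bytes,
    frame: object,
    transcribe: Callable[[bytes], str],
    detect: Callable[[object], List[Detection]],
    choose: Callable[[str, List[Detection]], Tuple[Optional[Detection], str]],
    speak: Callable[[str], None],
    move_arm: Callable[[Detection], None],
) -> Turn:
    transcript = transcribe(audio)                  # listen (speech recognition)
    detections = detect(frame)                      # see (object detection)
    target, reply = choose(transcript, detections)  # interpret (reasoning)
    speak(reply)                                    # respond (speech synthesis)
    if target is not None:
        move_arm(target)                            # act (robot arm)
    return Turn(transcript, reply, target)


# Usage with stubbed stages; the real models would replace the lambdas.
spoken: List[str] = []
moves: List[Detection] = []
turn = run_turn(
    audio=b"",
    frame=None,
    transcribe=lambda _audio: "Could I have a lollipop?",
    detect=lambda _frame: [
        Detection("toffee", 0.1, 0.2),
        Detection("lollipop", 0.4, 0.3),
    ],
    choose=choose_by_keyword,
    speak=spoken.append,
    move_arm=moves.append,
)
```

Passing the stages in as functions keeps the sketch independent of any particular model library, which is also why the arm only moves when the interpreting stage returns a target.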
Making AI Understandable
The Candytron isn’t about showcasing cutting-edge robotics; it’s about making AI understandable. By combining several models into one functioning agent, the demo reveals how modern AI systems “listen,” “interpret,” and “act” as an integrated whole. It gives visitors an intuitive, hands-on way to grasp what an AI agent actually is.
The Value of Demos
When technology becomes concrete, curiosity follows, and when curiosity opens the door, meaningful innovation begins. Demos like this help spark conversations and lower the threshold for engagement, regardless of technical background.


