We introduce MotionCanvas, a spatial art creation tool that integrates Generative AI and motion capture technologies into artistic interactive experiences.
Existing interactive platforms provide engaging environments and let users interact through touch or gestures. However, they are often limited to predefined interactive content and gesture sets, and cannot respond to the full range of users' movements.
We combine Generative AI models with motion capture technology to create novel forms of interactive art, with three primary objectives: 1) ensuring that the generated content aligns with the intended aesthetic, 2) advancing the possibilities for creative collaboration, and 3) minimizing latency in the interaction pipeline. The results of our user study confirm that the system enhances user engagement and provides a dynamic, immersive experience.
Users engage with the system and initiate a motion sequence. The motion capture system records their movements and publishes the raw data to MQTT topic one. A Python program performs the initial processing of the data from topic one and publishes the refined information to MQTT topic two. The Unity program consumes the processed data from topic two and transforms the 3D motion data into simplified 2D drawing inputs. These drawing inputs are forwarded to the Image Generation API, where a dedicated server generates elaborate flower images from the provided data. Finally, the image data is sent back to the Unity program, and the images are displayed on the screen.
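The thesis does not publish the code for the intermediate processing stage, but its role can be sketched in Python. The snippet below is a minimal, illustrative sketch only: it assumes a simple moving-average filter for smoothing raw joint positions and an orthographic projection for reducing 3D motion data to 2D drawing inputs (the MQTT subscribe/publish glue between topics one and two is omitted); the class and function names are hypothetical.

```python
from collections import deque
from typing import Deque, List, Tuple

Point3D = Tuple[float, float, float]
Point2D = Tuple[float, float]


class JointSmoother:
    """Moving-average filter over the last `window` frames of one joint.

    Hypothetical stand-in for the smoothing done by the Python stage
    between MQTT topic one (raw) and topic two (refined).
    """

    def __init__(self, window: int = 5) -> None:
        self.history: Deque[Point3D] = deque(maxlen=window)

    def update(self, p: Point3D) -> Point3D:
        self.history.append(p)
        n = len(self.history)
        return (
            sum(q[0] for q in self.history) / n,
            sum(q[1] for q in self.history) / n,
            sum(q[2] for q in self.history) / n,
        )


def project_to_canvas(p: Point3D) -> Point2D:
    """Orthographic projection: drop the depth axis to get a 2D drawing input."""
    x, y, _z = p
    return (x, y)


# Example: three noisy frames of a single tracked joint.
smoother = JointSmoother(window=3)
frames: List[Point3D] = [(0.0, 1.0, 2.0), (0.2, 1.2, 2.0), (0.4, 0.8, 2.0)]
drawing_points = [project_to_canvas(smoother.update(f)) for f in frames]
```

A real deployment would run `update` once per incoming motion-capture frame and publish each resulting 2D point to topic two.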
The flower generation pipeline first receives point data from Unity, followed by a line-processing step that smooths the strokes and incorporates geometric shapes to enrich the base input image. The processed image is then converted into a depth map by a depth-estimation model. Together with textual prompts, this depth map is fed into the fine-tuned image generation module. Finally, the BiRefNet model improves visual clarity by removing background colors from the generated images.
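The text does not specify which smoothing algorithm the line-processing step uses; one common choice for smoothing a polyline of drawing points is Chaikin corner cutting, sketched below as an assumed, illustrative implementation.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def chaikin_smooth(points: List[Point], iterations: int = 2) -> List[Point]:
    """Chaikin corner cutting: each pass replaces every segment (p, q) with
    two points at 1/4 and 3/4 along it, rounding off sharp corners while
    keeping the endpoints fixed. One plausible choice for the smoothing
    step; the thesis does not name the actual algorithm used."""
    for _ in range(iterations):
        if len(points) < 3:
            return points
        smoothed: List[Point] = [points[0]]  # keep the first endpoint
        for (x0, y0), (x1, y1) in zip(points, points[1:]):
            smoothed.append((0.75 * x0 + 0.25 * x1, 0.75 * y0 + 0.25 * y1))
            smoothed.append((0.25 * x0 + 0.75 * x1, 0.25 * y0 + 0.75 * y1))
        smoothed.append(points[-1])  # keep the last endpoint
        points = smoothed
    return points


# A sharp corner at (1, 1) gets rounded into intermediate points.
line: List[Point] = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
smooth = chaikin_smooth(line, iterations=1)
```

Each iteration roughly doubles the point count, so two or three passes are usually enough before the result looks visually smooth.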
The whole generation process takes about one second on a single NVIDIA GeForce RTX 4090 (24 GB) GPU.
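The stage ordering and the latency measurement can be illustrated with a small orchestration sketch. The model calls below are placeholder stubs, not the thesis code: the real stages (a depth-estimation model, the fine-tuned image generator, and BiRefNet background removal) are not published, so the stubs only show the data flow and how per-stage timings could be collected.

```python
import time
from typing import Dict, List, Tuple

Point = Tuple[float, float]


# --- Placeholder stages (hypothetical; they stand in for the real models) ---
def estimate_depth(drawing: List[Point]) -> str:
    return "depth_map"            # real system: a depth-estimation model


def generate_flower(depth_map: str, prompt: str) -> str:
    return f"flower({prompt})"    # real system: the fine-tuned image generator


def remove_background(image: str) -> str:
    return image + "+alpha"       # real system: BiRefNet background removal


def run_pipeline(drawing: List[Point], prompt: str):
    """Run the three stages in order, recording how long each one takes."""
    timings: Dict[str, float] = {}

    start = time.perf_counter()
    depth_map = estimate_depth(drawing)
    timings["depth"] = time.perf_counter() - start

    start = time.perf_counter()
    image = generate_flower(depth_map, prompt)
    timings["generate"] = time.perf_counter() - start

    start = time.perf_counter()
    result = remove_background(image)
    timings["matting"] = time.perf_counter() - start

    return result, timings


result, timings = run_pipeline([(0.0, 0.0), (1.0, 1.0)], "pink lotus")
```

With the real models in place, summing `timings.values()` would reproduce the ~1 s end-to-end figure reported above and show which stage dominates the latency budget.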
We express our deep gratitude to our supervisors, Sangxia Huang and Günter Alce, for their invaluable guidance, support, and advice throughout this thesis. Their insights and encouragement were crucial in the successful completion of this work.