We introduce MotionCanvas, a spatial art creation tool that integrates Generative AI and motion capture technologies into artistic interactive experiences.
Existing interactive platforms provide engaging environments and let users interact through touch or gestures. However, they are often limited to predefined interactive content and gesture sets, and cannot respond to the full range of users' movements.
We combine Generative AI models with motion capture technology to create novel forms of interactive art, with three primary objectives: 1) ensuring that the generated content aligns with the intended aesthetic, 2) advancing the possibilities for creative collaboration, and 3) minimizing latency in the interaction pipeline. The results of our user study confirm that the system enhances user engagement and provides a dynamic, immersive experience.
Users engage with the system and initiate a motion sequence. The motion capture system records their movements and publishes the raw data to MQTT topic one. A Python program performs the initial processing of the data from topic one and publishes the refined information to MQTT topic two. The Unity program consumes the processed data from topic two and transforms the 3D motion data into simplified 2D drawing inputs. These drawing inputs are forwarded to the Image Generation API, where a dedicated server generates elaborate flower images from the provided data. Finally, the image data is sent back to the Unity program, and the images are displayed on the screen.
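The thesis does not publish the code for the intermediate processing stage, but its role can be sketched in Python. The snippet below is a minimal, illustrative sketch only: it assumes a simple moving-average filter for smoothing raw joint positions and an orthographic projection for reducing 3D motion data to 2D drawing inputs (the MQTT subscribe/publish glue between topics one and two is omitted); the class and function names are hypothetical.

```python
from collections import deque
from typing import Deque, List, Tuple

Point3D = Tuple[float, float, float]
Point2D = Tuple[float, float]


class JointSmoother:
    """Moving-average filter over the last `window` frames of one joint.

    Hypothetical stand-in for the smoothing done by the Python stage
    between MQTT topic one (raw) and topic two (refined).
    """

    def __init__(self, window: int = 5) -> None:
        self.history: Deque[Point3D] = deque(maxlen=window)

    def update(self, p: Point3D) -> Point3D:
        self.history.append(p)
        n = len(self.history)
        return (
            sum(q[0] for q in self.history) / n,
            sum(q[1] for q in self.history) / n,
            sum(q[2] for q in self.history) / n,
        )


def project_to_canvas(p: Point3D) -> Point2D:
    """Orthographic projection: drop the depth axis to get a 2D drawing input."""
    x, y, _z = p
    return (x, y)


# Example: three noisy frames of a single tracked joint.
smoother = JointSmoother(window=3)
frames: List[Point3D] = [(0.0, 1.0, 2.0), (0.2, 1.2, 2.0), (0.4, 0.8, 2.0)]
drawing_points = [project_to_canvas(smoother.update(f)) for f in frames]
```

A real deployment would run `update` once per incoming motion-capture frame and publish each resulting 2D point to topic two.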
The flower generation pipeline first receives point data from Unity, followed by a line-processing step that smooths the strokes and incorporates geometric shapes to enrich the base input image. The processed image is then converted into a depth map by a depth-estimation model. Together with textual prompts, this depth map is fed into the fine-tuned image generation module. Finally, the BiRefNet model improves visual clarity by removing background colors from the generated images.
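The text does not specify which smoothing algorithm the line-processing step uses; one common choice for smoothing a polyline of drawing points is Chaikin corner cutting, sketched below as an assumed, illustrative implementation.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def chaikin_smooth(points: List[Point], iterations: int = 2) -> List[Point]:
    """Chaikin corner cutting: each pass replaces every segment (p, q) with
    two points at 1/4 and 3/4 along it, rounding off sharp corners while
    keeping the endpoints fixed. One plausible choice for the smoothing
    step; the thesis does not name the actual algorithm used."""
    for _ in range(iterations):
        if len(points) < 3:
            return points
        smoothed: List[Point] = [points[0]]  # keep the first endpoint
        for (x0, y0), (x1, y1) in zip(points, points[1:]):
            smoothed.append((0.75 * x0 + 0.25 * x1, 0.75 * y0 + 0.25 * y1))
            smoothed.append((0.25 * x0 + 0.75 * x1, 0.25 * y0 + 0.75 * y1))
        smoothed.append(points[-1])  # keep the last endpoint
        points = smoothed
    return points


# A sharp corner at (1, 1) gets rounded into intermediate points.
line: List[Point] = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
smooth = chaikin_smooth(line, iterations=1)
```

Each iteration roughly doubles the point count, so two or three passes are usually enough before the result looks visually smooth.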
The whole generation process takes about one second on a single NVIDIA GeForce RTX 4090 (24 GB) GPU.
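The stage ordering and the latency measurement can be illustrated with a small orchestration sketch. The model calls below are placeholder stubs, not the thesis code: the real stages (a depth-estimation model, the fine-tuned image generator, and BiRefNet background removal) are not published, so the stubs only show the data flow and how per-stage timings could be collected.

```python
import time
from typing import Dict, List, Tuple

Point = Tuple[float, float]


# --- Placeholder stages (hypothetical; they stand in for the real models) ---
def estimate_depth(drawing: List[Point]) -> str:
    return "depth_map"            # real system: a depth-estimation model


def generate_flower(depth_map: str, prompt: str) -> str:
    return f"flower({prompt})"    # real system: the fine-tuned image generator


def remove_background(image: str) -> str:
    return image + "+alpha"       # real system: BiRefNet background removal


def run_pipeline(drawing: List[Point], prompt: str):
    """Run the three stages in order, recording how long each one takes."""
    timings: Dict[str, float] = {}

    start = time.perf_counter()
    depth_map = estimate_depth(drawing)
    timings["depth"] = time.perf_counter() - start

    start = time.perf_counter()
    image = generate_flower(depth_map, prompt)
    timings["generate"] = time.perf_counter() - start

    start = time.perf_counter()
    result = remove_background(image)
    timings["matting"] = time.perf_counter() - start

    return result, timings


result, timings = run_pipeline([(0.0, 0.0), (1.0, 1.0)], "pink lotus")
```

With the real models in place, summing `timings.values()` would reproduce the ~1 s end-to-end figure reported above and show which stage dominates the latency budget.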
We express our deep gratitude to our supervisors, Sangxia Huang and Günter Alce, for their invaluable guidance, support, and advice throughout this thesis. Their insights and encouragement were crucial in the successful completion of this work.