While visual generation has seen tremendous advances, integrating touch remains an open challenge. Vision provides global information about an object’s appearance and geometry, whereas touch captures fine-grained details of texture and material properties. In this work, we bridge these two modalities to achieve texture synthesis that is globally consistent yet locally high-resolution.
We focus on learning the correspondence between visual and tactile signals, enabling the synthesis of realistic textures for novel objects conditioned on text or sketch inputs. Our approach begins with texture synthesis for rigid objects and extends to estimating material properties for deformable objects. By combining visual and tactile information, we aim to enhance generative models with richer multimodal representations, opening new possibilities for realistic virtual environments, haptic rendering, and robotics applications.
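For concreteness, the sketch below shows one way such a visual-tactile correspondence could be learned: a CLIP-style symmetric contrastive objective that aligns embeddings of co-located visual and tactile patches. The `PatchEncoder`, embedding dimension, and InfoNCE loss are illustrative assumptions rather than the specific architecture used in this work.

```python
# Minimal sketch (assumed, not the paper's method): align co-located visual
# and tactile patches in a shared embedding space with a symmetric InfoNCE loss.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PatchEncoder(nn.Module):
    """Maps an image-like patch (B, C, H, W) to a unit-norm embedding."""

    def __init__(self, in_channels: int, embed_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, embed_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.normalize(self.net(x), dim=-1)


def contrastive_loss(z_vis: torch.Tensor, z_tac: torch.Tensor,
                     temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE: matched visual/tactile patch pairs are positives,
    all other pairs in the batch serve as negatives."""
    logits = z_vis @ z_tac.t() / temperature                 # (B, B) similarities
    targets = torch.arange(z_vis.size(0), device=z_vis.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))


if __name__ == "__main__":
    vis_enc = PatchEncoder(in_channels=3)   # RGB patch encoder
    tac_enc = PatchEncoder(in_channels=3)   # tactile image encoder (e.g. GelSight-style)
    vis = torch.randn(16, 3, 64, 64)        # dummy co-located patch pairs
    tac = torch.randn(16, 3, 64, 64)
    loss = contrastive_loss(vis_enc(vis), tac_enc(tac))
    loss.backward()
    print(f"contrastive loss: {loss.item():.3f}")
```

A joint embedding trained this way could then condition a texture generator, so that text- or sketch-driven synthesis remains consistent with plausible tactile detail; the conditioning mechanism itself is left unspecified here.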