Google releases Gemini Canvas: a multimodal model for visual-to-UI translation

Gemini Canvas is a multimodal model from Google that interprets visual input (screenshots, wireframes, or hand sketches) and outputs structured UI artifacts: component trees, layout constraints, and CSS-like properties. The model emphasizes semantic mapping: it identifies the role of each region, such as a navigation bar or a product card, and assigns a matching component type.
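To make the idea of a component tree concrete, here is a minimal Python sketch of what such a structure could look like. The `ComponentNode` class, its field names, and the example values are all assumptions made for illustration; they are not Gemini Canvas's actual output schema, which this article does not specify.

```python
import json
from collections import Counter
from dataclasses import asdict, dataclass, field


@dataclass
class ComponentNode:
    """One node in a hypothetical component tree (not the model's real schema)."""

    type: str  # semantic role, e.g. "nav" or "product_card"
    layout: dict[str, str] = field(default_factory=dict)  # layout constraints
    style: dict[str, str] = field(default_factory=dict)  # CSS-like properties
    children: list["ComponentNode"] = field(default_factory=list)


def count_types(node: ComponentNode) -> Counter:
    """Count how many nodes of each semantic type the tree contains."""
    counts = Counter({node.type: 1})
    for child in node.children:
        counts += count_types(child)
    return counts


# A hand-written example tree: a page with a nav bar and two product cards.
page = ComponentNode(
    type="page",
    layout={"direction": "column"},
    children=[
        ComponentNode(type="nav", layout={"height": "64px"}),
        ComponentNode(type="product_card", style={"border-radius": "8px"}),
        ComponentNode(type="product_card", style={"border-radius": "8px"}),
    ],
)

# asdict() recurses into nested dataclasses, so the whole tree serializes.
tree_json = json.dumps(asdict(page), indent=2)
```

Keeping the semantic type separate from layout and style means a downstream tool could, in principle, swap in its own design-system component for each type without touching the geometry.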

The model offers several export formats: JSON component trees compatible with common design systems, React component skeletons, and Figma import packages that preserve auto-layout and design tokens. Gemini Canvas can also convert annotations, such as arrows or notes, into interaction hints, so it produces rough interaction prototypes alongside static layouts.
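As a rough illustration of how a JSON component tree might map onto a React component skeleton, the Python sketch below walks a tree and emits nested JSX tags. The tree shape (`type` and `children` keys), the snake_case-to-PascalCase naming rule, and the helper names `pascal_case` and `render_jsx` are assumptions made for this example, not the exporter Google describes.

```python
def pascal_case(name: str) -> str:
    """Turn a snake_case or kebab-case type into a React-style component name."""
    parts = name.replace("-", "_").split("_")
    return "".join(part.capitalize() for part in parts if part)


def render_jsx(node: dict, depth: int = 0) -> str:
    """Render a (hypothetical) component-tree node as an indented JSX skeleton."""
    indent = "  " * depth
    tag = pascal_case(node["type"])
    children = node.get("children", [])
    if not children:
        # Leaf nodes become self-closing tags.
        return f"{indent}<{tag} />"
    inner = "\n".join(render_jsx(child, depth + 1) for child in children)
    return f"{indent}<{tag}>\n{inner}\n{indent}</{tag}>"


# Example input in the assumed tree shape.
tree = {
    "type": "page",
    "children": [
        {"type": "nav"},
        {"type": "product_card"},
    ],
}

skeleton = render_jsx(tree)
```

For the example tree, `skeleton` holds a `<Page>` element wrapping `<Nav />` and `<ProductCard />`. Props, layout, and styling are deliberately left out; a real exporter would have to decide how those map onto each component.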

Google highlighted use cases in design-system migration, rapid prototyping from whiteboard sessions, and accessibility remediation for legacy interfaces. Privacy controls include on-device inference for mobile sketches and a managed-cloud option with enterprise data governance for teams that opt into server-side processing.