OpenWeights Consortium publishes standardized tokenized design dataset
The dataset, curated from permissively licensed design systems and contributor-submitted components, tokenizes layouts into structured records: component type, spacing tokens, color tokens, state variants, and accessibility metadata. The consortium's goal is to reduce the field's reliance on noisy, ad hoc datasets, which have hindered reproducible research on UI-focused generative models.
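The article does not reproduce the schema itself. The sketch below therefore only illustrates what one tokenized record could look like, assuming one flat record per component. Every field name and token string in it (`component_type`, `space.300`, `color.brand.primary`, and so on) is hypothetical and is not taken from the published schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Optional


# Hypothetical shape of one tokenized record; field names are illustrative
# assumptions, not the consortium's published reference schema.
@dataclass
class AccessibilityMeta:
    role: str                    # e.g. an ARIA-style role such as "button"
    label: Optional[str] = None  # accessible name, if the component has one


@dataclass
class ComponentRecord:
    component_type: str
    spacing_tokens: Dict[str, str]  # layout slot -> spacing token name
    color_tokens: Dict[str, str]    # color slot -> color token name
    state_variants: List[str] = field(default_factory=list)
    accessibility: Optional[AccessibilityMeta] = None


button = ComponentRecord(
    component_type="button",
    spacing_tokens={"padding-x": "space.300", "padding-y": "space.150"},
    color_tokens={"background": "color.brand.primary", "text": "color.text.inverse"},
    state_variants=["default", "hover", "disabled"],
    accessibility=AccessibilityMeta(role="button", label="Submit"),
)

# asdict() recurses into nested dataclasses, so the record serializes to plain JSON.
print(json.dumps(asdict(button), indent=2))
```

Storing token names such as `space.300` instead of resolved values such as `24px` is what would let a later check tell whether a model kept the design-system reference or replaced it with a hard-coded literal.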
Alongside the dataset, the consortium published a reference schema and evaluation metrics for layout coherence, token preservation, and accessibility compliance. These metrics help researchers and engineers measure whether a model preserves design-system constraints during generation and whether it introduces accessibility regressions.
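To show the kind of thing a token-preservation score could measure, here is a minimal sketch. It assumes the score is the share of a reference component's token slots whose values survive unchanged in a model's output. This definition and the function name `token_preservation` are assumptions made for illustration. They are not the consortium's published metric.

```python
from typing import Mapping


def token_preservation(reference: Mapping[str, str], generated: Mapping[str, str]) -> float:
    # Hypothetical metric: share of the reference's token slots (e.g. "padding-x")
    # whose token value appears unchanged in the generated output.
    if not reference:
        return 1.0  # nothing to preserve; convention chosen for this sketch
    kept = sum(1 for slot, token in reference.items() if generated.get(slot) == token)
    return kept / len(reference)


ref = {"padding-x": "space.300", "padding-y": "space.150", "background": "color.brand.primary"}
gen = {"padding-x": "space.300", "padding-y": "space.200", "background": "color.brand.primary"}
print(round(token_preservation(ref, gen), 3))  # → 0.667
```

Returning 1.0 for a component with no tokens keeps token-free components from dragging down an average. A real benchmark might instead exclude such components entirely.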
The project is open to contributions and includes scripts that convert popular design exports (Figma JSON, Sketch files) into the tokenized schema. Consortium maintainers hope the dataset will become a benchmark for UI-model research and spur more interoperable tooling across the design ML ecosystem.