Meta releases Llama‑3.1 with layout‑aware attention for UI and document tasks

Llama‑3.1 extends the transformer architecture with 2D positional encodings and attention biases derived from object bounding boxes. The change is meant to help the model reason about spatial relationships between elements such as buttons, labels, and images.
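
As an illustration of the general idea, and not Meta's actual implementation, the Python sketch below derives a 2D positional encoding from each element's bounding-box center and adds a distance-based bias to attention scores. The function names, the sinusoidal encoding, the center-distance bias, and the `scale` parameter are all assumptions made for this example; the article does not describe how Llama‑3.1 computes either quantity.

```python
import math

# Hypothetical sketch: boxes are (x0, y0, x1, y1) in normalized page coordinates.

def box_center(box):
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2.0, (y0 + y1) / 2.0)

def sinusoidal(value, dim):
    """Encode a scalar as `dim` sine/cosine features (dim must be even)."""
    out = []
    for i in range(dim // 2):
        freq = 1.0 / (10000 ** (2 * i / dim))
        out.append(math.sin(value * freq))
        out.append(math.cos(value * freq))
    return out

def positional_encoding_2d(box, dim=8):
    """Concatenate encodings of the box center's x and y coordinates."""
    if dim % 4 != 0:
        raise ValueError("dim must be divisible by 4")
    cx, cy = box_center(box)
    return sinusoidal(cx, dim // 2) + sinusoidal(cy, dim // 2)

def spatial_bias(boxes, scale=1.0):
    """Assumed bias: more negative the farther apart two box centers are."""
    centers = [box_center(b) for b in boxes]
    return [[-scale * math.dist(ci, cj) for cj in centers] for ci in centers]

def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def biased_attention_weights(scores, bias):
    """Add the bias to raw attention scores, then normalize each row."""
    return [
        softmax([s + b for s, b in zip(score_row, bias_row)])
        for score_row, bias_row in zip(scores, bias)
    ]

# Illustrative layout: a button, a label next to it, and an image far below.
boxes = [
    (0.00, 0.00, 0.20, 0.10),  # button
    (0.25, 0.00, 0.45, 0.10),  # label
    (0.50, 0.80, 0.90, 1.00),  # image
]
scores = [[0.0] * len(boxes) for _ in boxes]  # uniform content scores
weights = biased_attention_weights(scores, spatial_bias(boxes, scale=4.0))
```

With uniform content scores, the button's attention row favors itself first, then the adjacent label, then the distant image, which is the kind of spatial preference a layout-aware bias is intended to introduce.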

Benchmarks show double‑digit percentage gains over Llama‑3.0 on three layout tasks: screenshot captioning, HTML reconstruction, and component extraction. Meta released fine‑tuned checkpoints for two use cases: web UI generation and PDF-to-structured-data extraction.

Meta also published a set of evaluation tools for layout understanding and urged the community to use the model responsibly, noting risks such as overfitting to common web templates. The new model is available through Meta's API and select partner integrations.
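
The article does not say which metrics Meta's evaluation tools compute. As a hedged illustration only, a common way to score a task like component extraction is intersection over union (IoU) between a predicted bounding box and a reference box. The function name and box format below are assumptions for this sketch.

```python
def iou(a, b):
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    # Overlap width and height, clamped to zero when the boxes do not intersect.
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0

predicted = (0.0, 0.0, 2.0, 2.0)
reference = (1.0, 1.0, 3.0, 3.0)
score = iou(predicted, reference)  # overlap area 1, union area 7
```

A score of 1.0 means the predicted box matches the reference exactly, and 0.0 means the two do not overlap at all.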