Apple launches CoreML-Transformers 2 with on-device sparse attention for designers

Tech · 5 min read

Apple has announced CoreML-Transformers 2, a transformer runtime optimized for Apple silicon, including the chips in iOS devices. Key features include sparse attention kernels, dynamic quantization, and improved memory management for multimodal tasks such as image-to-text generation and layout parsing.
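The announcement does not say which sparsity pattern Apple's kernels use. As a generic illustration of the idea, and not the CoreML-Transformers 2 API, the sketch below implements one common pattern, causal sliding-window attention. Each position attends only to itself and the previous `window - 1` positions, so the work per sequence grows with n × window instead of n². The function name `sliding_window_attention` and its parameters are hypothetical, chosen for this example.

```python
import math

def sliding_window_attention(q, k, v, window):
    """Hypothetical sketch of causal sliding-window (sparse) attention.

    q, k, v: lists of n vectors (lists of floats); window: positions each
    query may see, counting itself. Not Apple's implementation.
    """
    d = len(q[0])
    out = []
    for i in range(len(q)):
        start = max(0, i - window + 1)
        allowed = range(start, i + 1)
        # Scaled dot-product scores, computed only for keys inside the window.
        scores = [sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d)
                  for j in allowed]
        # Numerically stable softmax over the windowed scores.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Output is the weighted sum of the value vectors in the window.
        out.append([sum(w * v[j][c] for w, j in zip(weights, allowed))
                    for c in range(len(v[0]))])
    return out

# With identical (zero) queries and keys, each output is the mean of the
# values inside its window.
q = k = [[0.0]] * 4
v = [[1.0], [2.0], [3.0], [4.0]]
print(sliding_window_attention(q, k, v, window=2))
```

Setting `window` to the sequence length recovers ordinary causal attention. Smaller windows trade long-range context for lower compute and memory, which is the kind of trade-off on-device runtimes weigh.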

Apple says the update lets app developers and design-tool vendors run more sophisticated models on-device without sacrificing battery life or responsiveness. The company also highlighted the Secure Enclave and privacy-preserving APIs that let models run without sending raw user data to cloud services.

For designers, this means prototyping assistants and real-time asset suggestions can run directly on the devices used in user research or on-location testing. Apple also released developer samples showing how to integrate the runtime with popular creative frameworks and vector-editing toolkits.

Tool vendors applauded the performance gains, but some warned that matching cloud-model quality on-device will remain a moving target. They added that careful UX design is needed when model behavior differs across device classes.