OpenPrompt Launches On-Device Low-Latency LLM for Mobile Developers

AI · 4 min read

OpenPrompt has released a new model family optimized for on-device performance, along with an SDK for iOS and Android developers. The models run quantized on-device, using specialized kernels that exploit NPU and GPU capabilities, and the company says they deliver conversational responses in under 50 ms on recent hardware.
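
For readers unfamiliar with quantization, the sketch below shows the general idea using symmetric 8-bit quantization in plain Python. It is an illustration of the generic technique only: OpenPrompt has not published its quantization scheme, and the function names `quantize_int8` and `dequantize_int8` are hypothetical, not part of its SDK.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127].

    Illustrative only; not OpenPrompt's actual scheme.
    """
    max_abs = max((abs(w) for w in weights), default=0.0)
    # One shared scale factor for the whole tensor; fall back to 1.0 for all-zero input.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize_int8(quantized: List[int], scale: float) -> List[float]:
    """Recover approximate float values from the stored integers."""
    return [q * scale for q in quantized]


weights = [0.5, -1.0, 0.25, 0.0]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)

# Each restored value differs from the original by at most about half a quantization step.
max_error = max(abs(r - w) for r, w in zip(restored, weights))
print(q, scale, max_error)
```

Storing each weight as an 8-bit integer instead of a 32-bit float cuts weight storage to roughly a quarter, at the cost of a small rounding error per weight. That trade-off is what makes quantized models attractive on phones with limited memory and bandwidth.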

The SDK includes secure on-device fine-tuning primitives that keep user data local, along with privacy-preserving personalization features that synthesize compact user embeddings. OpenPrompt positions the product as ideal for assistants, keyboard prediction, and AR UX layers where latency and privacy are critical.

The company is partnering with a handful of major mobile app developers for early integration and will offer an enterprise bundle that includes model hosting, on-device deployment tools, and analytics for personalization quality. CEO Nikhil Rao stressed the goal of reducing network dependency while preserving model responsiveness.