Google expands Gemini API with multimodal fine-tuning and latency tiers

AI · 5 min read

Gemini's new multimodal fine-tuning lets teams upload paired datasets of images, audio, and structured text to build models tuned for domain-specific tasks such as visual search or guided design generation. Fine-tuned endpoints can be provisioned privately and secured with organization-level IAM policies.
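
As a rough illustration of what a "paired dataset" might look like in practice, the sketch below serializes examples to JSON Lines, a common interchange format for training data. The announcement does not specify an upload schema, so the record fields (`image_uri`, `audio_uri`, `structured_text`, `target`), the bucket path, and the validation rule are all assumptions made for this example, not Google's format.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class PairedExample:
    # All field names are hypothetical; the real upload schema is not
    # described in the announcement.
    image_uri: Optional[str]
    audio_uri: Optional[str]
    structured_text: dict
    target: str


def to_jsonl(examples):
    """Serialize examples to JSON Lines, one record per line."""
    lines = []
    for ex in examples:
        # Assumed rule: a multimodal example carries at least one media input.
        if ex.image_uri is None and ex.audio_uri is None:
            raise ValueError("each example needs an image or audio input")
        lines.append(json.dumps(asdict(ex), sort_keys=True))
    return "\n".join(lines)


examples = [
    PairedExample(
        image_uri="gs://example-bucket/chair_001.png",  # placeholder path
        audio_uri=None,
        structured_text={"category": "chair", "material": "oak"},
        target="mid-century oak armchair",
    ),
]
manifest = to_jsonl(examples)
```

Validating records before upload, as the sketch does, is one way to catch incomplete pairs early, whatever the final schema turns out to be.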

To address production needs, Google introduced latency tiers backed by SLAs, along with replenishable compute reservations, so developers can choose a predictable performance profile for interactive apps. Pricing and quota controls were also updated to support consistent performance during traffic peaks.
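
To make the tier trade-off concrete, here is a minimal sketch of how an application might choose the cheapest tier that still fits its latency budget. The tier names, p95 targets, and cost multipliers are invented for illustration; the announcement does not list the actual tiers or their SLA figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LatencyTier:
    name: str             # hypothetical tier label
    p95_ms: int           # hypothetical p95 latency target in milliseconds
    relative_cost: float  # hypothetical cost multiplier


# Placeholder values, not published figures.
TIERS = [
    LatencyTier("batch", 5000, 1.0),
    LatencyTier("standard", 1500, 1.5),
    LatencyTier("interactive", 400, 3.0),
]


def pick_tier(budget_ms, tiers=TIERS):
    """Return the cheapest tier whose p95 target fits within budget_ms."""
    eligible = [t for t in tiers if t.p95_ms <= budget_ms]
    if not eligible:
        raise ValueError(f"no tier meets a {budget_ms} ms budget")
    return min(eligible, key=lambda t: t.relative_cost)


chat_tier = pick_tier(500)      # a tight budget for an interactive app
report_tier = pick_tier(10000)  # a loose budget for offline work
```

With these placeholder numbers, the tight budget selects the "interactive" tier and the loose budget falls back to the cheapest option, "batch".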

Additional SDKs for web and mobile include safety filters and a moderation pipeline that can be customized per endpoint. Google also published best-practice patterns for integrating multimodal outputs into UX while preserving clarity and editability for users.