Google Cloud releases VertexML for composable multimodal model hosting

AI · 5 min read

VertexML is designed around model chains and handlers: teams can stitch together specialized models for OCR, vision, speech, and reasoning in a hosted graph that handles batching, routing, and fallbacks. Google highlighted native composability as a way to reduce glue code while maintaining performance at scale.
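To make the chain-and-fallback idea concrete, here is a minimal Python sketch of the general pattern. It is not VertexML's API, which the announcement does not detail; `Stage`, `run_chain`, and the toy handlers are hypothetical names used only for illustration, and a hosted graph would add batching and routing on top of this.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Sequence

# Hypothetical type: a handler takes one payload and returns the next payload.
Handler = Callable[[Any], Any]


@dataclass
class Stage:
    """One node in a linear model chain, with an optional fallback handler."""
    name: str
    primary: Handler
    fallback: Optional[Handler] = None

    def run(self, payload: Any) -> Any:
        try:
            return self.primary(payload)
        except Exception:
            # No fallback configured: surface the original error.
            if self.fallback is None:
                raise
            return self.fallback(payload)


def run_chain(stages: Sequence[Stage], payload: Any) -> Any:
    """Feed the output of each stage into the next one."""
    for stage in stages:
        payload = stage.run(payload)
    return payload


# Toy stand-ins for real models (assumptions, not real services).
def flaky_ocr(image_bytes: bytes) -> str:
    raise TimeoutError("primary OCR model unavailable")


def backup_ocr(image_bytes: bytes) -> str:
    return image_bytes.decode("utf-8")


def reasoner(text: str) -> dict:
    return {"text": text, "word_count": len(text.split())}


chain = [
    Stage("ocr", flaky_ocr, fallback=backup_ocr),
    Stage("reason", reasoner),
]
result = run_chain(chain, b"invoice total 42")
```

In this sketch the primary OCR handler fails, the stage falls back to the backup handler, and the reasoning stage still receives text. That failure handling is the kind of glue code a hosted graph is meant to absorb.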

Operational features include per-edge latency SLAs, automatic warmers for cold starts, and integrated A/B rollout support. VertexML’s observability tooling surfaces token-level traces, resource attribution, and drift alerts tied to model inputs, so operators can identify bottlenecks in multimodal pipelines.
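As a rough illustration of two of these ideas, the Python sketch below shows one common way to split traffic for an A/B rollout (deterministic hash bucketing) and one way to check a single edge against a latency budget. This is an assumption about how such features can work in general, not a description of VertexML's implementation; `assign_variant` and `timed_edge` are hypothetical helper names.

```python
import hashlib
import time
from typing import Any, Callable, Tuple


def assign_variant(request_id: str, candidate_percent: int) -> str:
    """Deterministically route a request to 'candidate' or 'baseline'.

    Hashing the request ID keeps the assignment stable across retries.
    """
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # value in 0..99
    return "candidate" if bucket < candidate_percent else "baseline"


def timed_edge(
    handler: Callable[[Any], Any], payload: Any, budget_ms: float
) -> Tuple[Any, float, bool]:
    """Run one edge of the graph and report whether it met its latency budget."""
    start = time.perf_counter()
    output = handler(payload)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return output, elapsed_ms, elapsed_ms <= budget_ms


variant = assign_variant("request-123", candidate_percent=10)
output, elapsed_ms, within_budget = timed_edge(str.upper, "hello", budget_ms=1000.0)
```

With `candidate_percent=10`, roughly one request in ten lands on the candidate model, and the same request ID always maps to the same variant. A production system would typically export `elapsed_ms` to its tracing backend rather than return it to the caller.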

Security features include encrypted model artifacts at rest, fine-grained IAM for model endpoints, and a model registry with versioned schemas. Google positioned VertexML as an opinionated platform to accelerate productionization of multimodal applications across enterprise use cases.