Meta opens up Llama 4 chat embeddings API with privacy-first hosted options

AI · 5 min read

Meta's new embeddings API is aimed at search, recommendation, and retrieval use cases. The hosted option provides ephemeral context handling, where vectors are invalidated after a configurable retention period to reduce data residency risks.
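To make the retention idea concrete, here is a minimal sketch of time-based vector invalidation. It is not Meta's implementation, which the announcement does not describe; the class name `EphemeralVectorStore`, the `ttl_seconds` parameter, and the in-memory dictionary are all assumptions chosen for illustration.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple


class EphemeralVectorStore:
    """Hypothetical in-memory store that stops serving a vector once its TTL elapses."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_seconds = ttl_seconds  # assumed name for the configurable retention period
        self._clock = clock             # injectable so expiry can be tested without sleeping
        self._items: Dict[str, Tuple[float, List[float]]] = {}

    def put(self, key: str, vector: List[float]) -> None:
        # Store a copy of the vector together with its absolute expiry time.
        self._items[key] = (self._clock() + self.ttl_seconds, list(vector))

    def get(self, key: str) -> Optional[List[float]]:
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, vector = entry
        if self._clock() >= expires_at:
            # Retention period is over: drop the vector and report it as missing.
            del self._items[key]
            return None
        return vector
```

A caller would construct the store with the retention period it wants, for example `EphemeralVectorStore(ttl_seconds=3600)`, and treat a `None` result from `get` as a signal to re-embed the source text.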

For enterprises with strict data policies, Meta offers a managed on-prem connector that runs model inference inside the customer's VPC and streams only billing metadata back to Meta. The API also includes quantized runtime variants to lower costs for high-volume embedding workloads.
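The announcement does not say which quantization scheme the runtime variants use. As a generic illustration of why quantization lowers cost, the sketch below applies symmetric int8 quantization to a list of floats: each value is stored as a small integer code plus one shared scale, at the price of a bounded rounding error. The function names are hypothetical and not part of any Meta API.

```python
from typing import List, Tuple


def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Map floats to integer codes in [-127, 127] using a single shared scale."""
    max_abs = max((abs(v) for v in values), default=0.0)
    if max_abs == 0.0:
        # All zeros (or empty input): any scale works, so use 1.0.
        return [0] * len(values), 1.0
    scale = max_abs / 127.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize_int8(codes: List[int], scale: float) -> List[float]:
    """Recover approximate floats from the integer codes."""
    return [c * scale for c in codes]
```

Each recovered value differs from the original by at most half of `scale`, which is the trade-off a quantized variant makes in exchange for smaller, cheaper arithmetic.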

Meta emphasized tooling for model evaluation and fairness, shipping a validation suite that helps teams detect bias in semantic retrieval and rank ordering. The company also published performance baselines across common vector similarity libraries.
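Baselines for similarity libraries are usually judged against an exact search, so a small reference implementation is a useful starting point for teams reproducing such numbers. The following is a minimal sketch of brute-force cosine ranking in plain Python; it is not taken from Meta's validation suite, and the function names and the dictionary-based corpus are assumptions.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: List[float], corpus: Dict[str, List[float]],
          k: int) -> List[Tuple[str, float]]:
    """Score every document against the query and return the k best, highest first."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

Because this scans every vector, it only suits small corpora, but its output can serve as ground truth when checking whether an approximate index or a retrieval pipeline ranks results in the expected order.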