Databricks (Mosaic) AI Gateway Explained: Features & Fit
If your data and machine learning workloads already live inside Databricks, the Databricks AI Gateway (sometimes called Mosaic AI Gateway or Databricks Model Serving Gateway) is the governance and routing layer that comes with the platform. It lets you call external LLM providers and Databricks-hosted models through a unified, policy-controlled endpoint without leaving your lakehouse. This article explains what the Databricks AI Gateway is, what it does well, what to weigh before relying on it, and where a standalone, provider-agnostic gateway fits for teams outside the Databricks ecosystem. For a category primer, see "What is an AI gateway?"
What is the Databricks AI Gateway?
The Databricks AI Gateway (also surfaced under the Mosaic AI umbrella in Databricks' product naming) is a governance and access layer built into the Databricks platform. It provides a single endpoint through which authorized users and applications can reach both external LLM providers — such as OpenAI, Anthropic, and others — and Databricks-hosted models served via Databricks Model Serving. Rather than wiring every notebook, pipeline, or application directly to each provider's API, teams configure routes through the gateway and manage access, rate limits, and usage in one place.
The mental model is a governance and access control layer, not a standalone router or a token reseller. Databricks does not sell you inference tokens through the gateway in place of a provider; the value lies in unified access control, audit logging, payload logging, rate limiting, and usage tracking, all anchored in the Databricks platform and integrated with Unity Catalog for permissions. That framing clarifies who the product is built for and what it optimizes for.
How the Databricks (Mosaic) AI Gateway works
The gateway is configured inside your Databricks workspace:
- Route definitions. You create routes that map a logical name to an upstream endpoint — an external provider (credentials stored in Databricks Secrets) or a Databricks-hosted model on Model Serving.
- Unified endpoint. Applications call the gateway instead of individual provider APIs; the gateway resolves the route, authenticates to the upstream, and returns the response.
- Governance controls. Each route carries rate limits (requests and tokens per minute), Unity Catalog permissions, and usage tracking for team-level chargebacks.
- Payload logging. Request and response payloads can be logged to a Delta table in your lakehouse for audit trails and fine-tuning datasets.
For current configuration steps, supported providers, and API shapes, consult Databricks' official documentation — these evolve with platform releases.
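To make the route-and-limit model concrete, here is a hypothetical sketch of how a route table with per-route permissions and per-minute limits might be enforced. It is not the Databricks API: `Route`, `Gateway`, `admit`, the upstream labels, the group allow-list standing in for Unity Catalog permissions, and the fixed one-minute window are all assumptions for illustration. In Databricks itself, routes and limits are configured in the workspace, not in application code.

```python
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Route:
    """A logical route mapped to one upstream (hypothetical model, not the Databricks API)."""
    name: str
    upstream: str                # hypothetical label, e.g. "external:openai" or "hosted:my-model"
    requests_per_minute: int
    tokens_per_minute: int
    allowed_groups: set[str] = field(default_factory=set)  # stand-in for real permissions


class RateLimitExceeded(Exception):
    pass


class Gateway:
    def __init__(self, routes: list[Route], clock: Callable[[], float] = time.monotonic):
        self.routes = {r.name: r for r in routes}
        self.clock = clock
        # Per-route fixed-window counters: route name -> (window_start, requests, tokens).
        self.windows: dict[str, tuple[float, int, int]] = {}

    def admit(self, route_name: str, group: str, estimated_tokens: int) -> str:
        """Check permissions and per-minute limits, then return the upstream to call."""
        route = self.routes[route_name]  # raises KeyError for an unknown route
        if group not in route.allowed_groups:
            raise PermissionError(f"{group!r} may not use route {route_name!r}")

        now = self.clock()
        start, reqs, toks = self.windows.get(route_name, (now, 0, 0))
        if now - start >= 60:  # the previous one-minute window has expired
            start, reqs, toks = now, 0, 0
        if reqs + 1 > route.requests_per_minute or toks + estimated_tokens > route.tokens_per_minute:
            raise RateLimitExceeded(route_name)

        self.windows[route_name] = (start, reqs + 1, toks + estimated_tokens)
        return route.upstream
```

A fixed window is the simplest limiter to reason about; a production gateway might instead use sliding windows or token buckets, and would reconcile the estimate against the token counts the provider actually reports.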
Strengths of the Databricks AI Gateway
- Deep lakehouse integration. Payload logging lands in Delta tables, making AI usage data queryable alongside your operational and analytics data with no ETL step.
- Unity Catalog permissions. Fine-grained access control and attribute-based policies for model endpoints, consistent with how data assets are governed across the workspace.
- Usage tracking and chargebacks. Per-route, per-team attribution of token consumption for cost allocation without building a separate accounting layer.
- Managed credentials. Provider API keys and secrets are stored in Databricks' secure secrets store, not distributed to individual developers or notebooks.
- Single endpoint for hosted + external models. Applications do not need to distinguish between a Databricks-served fine-tuned model and a call to a third-party provider — the gateway abstracts that.
- No additional vendor. If Databricks is already your data and ML platform, the gateway adds zero new infrastructure or contract surface.
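Per-team chargeback is, at its core, an aggregation over usage rows. The sketch below assumes hypothetical in-memory `UsageRecord` rows and placeholder per-million-token prices. It does not reflect any Databricks table schema or any provider's actual rates; in a Databricks workspace the equivalent would more likely be a query over logged usage data.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRecord:
    """One hypothetical usage row: which team called which route, and token counts."""
    team: str
    route: str
    input_tokens: int
    output_tokens: int


# Placeholder prices in USD per 1M tokens as (input, output); NOT real provider rates.
PRICES_PER_MILLION = {
    "chat-large": (5.00, 15.00),
    "chat-small": (0.50, 1.50),
}


def chargeback(records: list[UsageRecord]) -> dict[str, float]:
    """Sum the estimated dollar cost of each team's calls, keyed by team name."""
    totals: dict[str, float] = defaultdict(float)
    for r in records:
        in_price, out_price = PRICES_PER_MILLION[r.route]
        totals[r.team] += (r.input_tokens * in_price + r.output_tokens * out_price) / 1_000_000
    return dict(totals)
```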
Considerations before committing
The Databricks AI Gateway is a well-engineered fit for a specific context. Before treating it as a general-purpose LLM gateway, weigh these points:
- Databricks-centric by design. The product is built to be used within a Databricks workspace. If your application stack does not live in Databricks — a standalone Node.js backend, a Python service on a different cloud, a local development environment — integration requires additional work and may not feel native.
- Routing is access-control-oriented, not cost- or latency-optimized. The gateway routes requests to a pre-configured endpoint; it does not dynamically select the cheapest or fastest provider for a given request, race multiple providers, or A/B-test model quality on live traffic. These are different jobs from governance and access control.
- Pricing lives inside Databricks billing. The cost of using the gateway is part of your Databricks platform cost. For exact current pricing, consult Databricks' pricing pages directly — do not rely on third-party articles for numbers, since these change.
- External provider support. Which external providers are supported, and how, evolves over time. Verify the current provider list and any restrictions in the official docs before assuming a specific provider is supported.
- Not a standalone tool. You cannot adopt the Databricks AI Gateway without adopting Databricks itself. For teams outside the Databricks ecosystem, the entry cost of the platform makes it impractical as a gateway-only solution.
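The difference between a fixed route and dynamic provider selection can be stated in a few lines. In this sketch, `fixed_route` always returns the configured upstream, while `dynamic_route` picks the cheapest or fastest provider that supports every required capability. The provider names, prices, latencies, and capability tags are hypothetical placeholders, and neither function is Databricks or flo2 code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provider:
    name: str
    usd_per_million_tokens: float  # placeholder blended price, not a real rate
    p50_latency_ms: float          # placeholder observed median latency
    capabilities: frozenset[str]


# Fixed routing: a logical route name always resolves to the same configured upstream.
FIXED_ROUTES = {"chat": "provider-a"}


def fixed_route(route_name: str) -> str:
    return FIXED_ROUTES[route_name]


def dynamic_route(providers: list[Provider], required: set[str], optimize: str = "cost") -> str:
    """Return the cheapest (optimize="cost") or fastest (optimize="latency") qualifying provider."""
    if optimize not in ("cost", "latency"):
        raise ValueError(f"unknown optimization target: {optimize}")
    # A provider qualifies only if it supports every required capability.
    qualifying = [p for p in providers if required <= p.capabilities]
    if not qualifying:
        raise LookupError(f"no provider supports {sorted(required)}")
    if optimize == "cost":
        return min(qualifying, key=lambda p: p.usd_per_million_tokens).name
    return min(qualifying, key=lambda p: p.p50_latency_ms).name
```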
Databricks AI Gateway vs. a standalone, provider-agnostic gateway
The table below captures the structural differences. Always verify current feature details against each product's documentation.
| Dimension | Databricks AI Gateway | Standalone BYOK gateway (e.g. flo2) |
|---|---|---|
| What it is | Governance + access layer inside Databricks | Developer-first LLM router/proxy, any stack |
| Ecosystem requirement | Requires Databricks workspace | Any HTTP, OpenAI, or Anthropic client |
| Primary design goal | Governance, access control, audit, payload logging | Routing, fallback, racing, cost accounting |
| Token markup | Part of Databricks platform billing | Zero markup — you pay providers directly (BYOK) |
| Dynamic routing | Fixed route to configured endpoint | Cheapest/fastest routing, fallback chains, racing |
| Logging and accounting | Payload logging to Delta tables in your lakehouse | Per-call cost accounting at provider list prices; no Delta payload logging |
| API compatibility | Databricks-native; check docs for OpenAI-compatible access | Drop-in OpenAI- and Anthropic-compatible key |
Where flo2 fits for teams outside the Databricks ecosystem
If your stack does not revolve around Databricks — or you want a routing layer that works regardless of your ML platform — the needs look different. You want a gateway that treats intelligent routing and honest cost accounting as first-class concerns, not as by-products of a larger platform's governance layer.
flo2 is a developer-first LLM gateway built around exactly that job. You bring your own keys for providers like OpenAI, Anthropic, Google Gemini, Groq, Cerebras, DeepInfra, Mistral, and xAI, and flo2 exposes a single key that is drop-in compatible with both the OpenAI API and the Anthropic API. There is zero token markup: flo2 never sits in the money path, and you pay each provider directly at their published rates. Behind that single key, flo2 provides:
- Smart routing that sends each request to the cheapest or fastest qualifying provider for the task, so a quick summarization does not hit a flagship model at flagship cost.
- Fallback chains that degrade gracefully across providers when one is down or rate-limited, without the call failing.
- Racing — fire the same prompt at multiple providers and take the fastest acceptable response back, useful when latency is the constraint.
- A/B testing with a judge to measure model–task fit on real traffic, so you pick the winning model on evidence rather than assumption.
- True per-call cost accounting in real dollars per request and per model, not aggregate token tallies you have to interpret.
- Opt-in response caching to eliminate spend on repeated identical calls.
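Fallback chains and racing are both small control-flow patterns wrapped around provider calls. The sketch below is a generic `asyncio` illustration, not flo2's implementation: providers are modeled as hypothetical async callables (`fake` stands in for a real SDK call), and `ProviderError` is an assumed error type for outages and rate limits.

```python
import asyncio
from typing import Awaitable, Callable

# A provider is modeled as an async function: prompt -> response text.
Provider = Callable[[str], Awaitable[str]]


class ProviderError(Exception):
    """Assumed error type for a provider that is down or rate-limited."""


def fake(delay: float, reply: str, fail: bool = False) -> Provider:
    """Hypothetical stand-in for a real SDK call: waits `delay` seconds, then replies or fails."""
    async def call(prompt: str) -> str:
        await asyncio.sleep(delay)
        if fail:
            raise ProviderError("rate limited")
        return f"{reply}: {prompt}"
    return call


async def with_fallback(chain: list[tuple[str, Provider]], prompt: str) -> tuple[str, str]:
    """Try providers in order, moving on after a ProviderError; return (name, response)."""
    errors: list[str] = []
    for name, call in chain:
        try:
            return name, await call(prompt)
        except ProviderError as exc:
            errors.append(f"{name}: {exc}")  # record the failure and move down the chain
    raise ProviderError("all providers failed: " + "; ".join(errors))


async def race(providers: list[tuple[str, Provider]], prompt: str) -> tuple[str, str]:
    """Send the prompt to every provider at once; return the first successful (name, response)."""
    async def tagged(name: str, call: Provider) -> tuple[str, str]:
        return name, await call(prompt)

    pending = {asyncio.create_task(tagged(name, call)) for name, call in providers}
    try:
        while pending:
            done, pending = await asyncio.wait(pending, return_when=asyncio.FIRST_COMPLETED)
            # Calling exception() on every finished task also marks failures as handled.
            winners = [t.result() for t in done if t.exception() is None]
            if winners:
                return winners[0]
        raise ProviderError("every raced provider failed")
    finally:
        for task in pending:
            task.cancel()  # cancel any providers still running
```

Racing trades spend for latency: each raced provider may bill for whatever work it did before cancellation, so it suits latency-bound paths better than bulk traffic.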
flo2 does not offer Unity Catalog integration, payload logging to Delta tables, or Databricks-specific governance. If that is your need, Databricks is the right product. If you need a portable, zero-markup routing layer that drops into any codebase with a key swap, the Databricks gateway is not the right shape for the job.
How to decide
- Databricks is your data and ML home — use the Databricks AI Gateway. Unity Catalog, Delta payload logs, and managed credentials are compelling when everything is already there.
- You are outside the Databricks ecosystem — a platform-embedded gateway adds overhead without payoff. A standalone BYOK gateway is portable and keeps you at provider list prices.
- You need intelligent routing, racing, or cost optimization — even Databricks shops sometimes want a separate layer for dynamic provider selection and economics. These are separable concerns.
For a broader comparison, see our best LLM gateway comparison. If you are evaluating a standalone gateway, flo2 is free during its Beta — point an existing OpenAI or Anthropic SDK at it and compare routing, cost, and reliability against your current setup with no contract and no token markup.