Field notes · 2026-05-25 · 6 min read

Hosted agents across Anthropic, Google, and OpenAI, which one for your business

Three labs, three hosted agent platforms, and a real decision framework. How we pick across Anthropic, Google Vertex AI, and OpenAI Assistants for production work.

We are a multi-provider studio by deliberate choice. Smile PreVue runs on Google Vertex AI Gemini 3 Pro because the BAA matters. Watch DOGS attribution runs on Anthropic Claude because the orchestration is intricate. Our internal tooling uses OpenAI gpt-image-2 because the text rendering is unmatched. Three projects, three labs, one studio.

The most common question from a buyer in 2026 is not "should we use AI." It is "which lab." Here is the decision framework we walk every client through.

The three platforms

Anthropic Hosted Agents sits on top of Claude. Anthropic runs the model loop, the tool dispatch, and the state. The orchestration model is the cleanest of the three, sub-agents, skills, MCP-native tool wiring, and the API surface is the most opinionated. If your problem looks like an agent that needs to reason carefully and call a small set of tools, this is the strongest default.

Google Vertex AI Agent Builder sits on top of Gemini. Same shape as Anthropic, managed runtime, tools, state, but the compliance posture is several years more mature. BAA, FedRAMP, ISO 27001, SOC 2, data residency controls per region. The latency for image and multimodal workloads is the best of the three because the model serving is local to Google's data infrastructure.

OpenAI Assistants and the Responses API sits on top of GPT and gpt-image. The ecosystem is the broadest, connector marketplace, code interpreter, file search, function calling, vision, image generation, voice, all under one roof. If your agent needs to do many different things and you do not want to wire each one separately, OpenAI's surface area is the easiest entry point.

All three are good. All three are fast. The choice is not about quality; it is about fit.

How we pick, five real constraints

1. Compliance

Default to Google Vertex if any of the following are true: HIPAA, FedRAMP, regulated finance, healthcare PHI, FINRA-touching workflows, EU GDPR with data-residency commitments, or anything that needs a BAA.

Smile PreVue runs on Vertex AI Gemini 3 Pro under a BAA specifically because dental practices need a signed agreement on how patient photos are handled. The BAA is a contractual artifact, not a technical one. Google has signed it. Anthropic and OpenAI are both moving toward the same posture but, as of this quarter, the production-grade BAA for healthcare lives on Vertex.

If compliance is not a constraint, this dimension does not matter.

2. Orchestration complexity

Default to Anthropic if the agent needs to do any of the following: sub-agents calling sub-agents, dynamic skill loading per tenant, complex tool chaining where order matters, custom retry policies, or long-running deliberation with intermediate state.

Claude's tool-calling architecture and the way Anthropic structures sub-agents and skills make this category of work cleaner. We built the AI attribution engine for the Watch DOGS platform on Claude because the orchestration was the moat. The model has to read a partial customer record, match it against a multi-tenant organization directory, decide between auto-attributing or queueing for human review, and explain its reasoning. That is several decision layers in one agent. Claude handles it best.

3. Multimodality and image-first workflows

Default to Google for image-heavy production workloads. Default to OpenAI for image generation that needs text rendering.

Smile PreVue takes a patient's intra-oral photo and renders a treatment preview. That entire pipeline is Gemini 3 Pro on Vertex AI. Latency for the image transformation is roughly six seconds end to end. We tested the equivalent on OpenAI gpt-image and on a self-hosted multimodal model. Gemini was both faster and more accurate for the dental use case.

That said, when we needed to render a social-share card with a perfectly legible wordmark and tagline (the OG card on this website), we used OpenAI gpt-image-2. The text rendering is sharper than any of the alternatives this quarter. Different image use cases, different defaults.

4. Latency

If you need sub-200ms first-token latency or sub-second total response time, you almost certainly want to self-host with a smaller model. Hosted agents from any of the three labs add network hops and cannot guarantee that envelope.

For everything else, workflows where 1-to-6 seconds is fine, the three labs are within 30% of each other. Anthropic is currently slightly behind on raw model latency but ahead on tool-call efficiency. Google is fastest for multimodal. OpenAI is fastest for plain-text completions.

For our work, we rarely choose by latency unless we are deep into clinical or in-store real-time use cases.

5. Ecosystem and connector availability

Default to OpenAI if your team is already operating on the ChatGPT ecosystem. Defaults are sticky. If your buyers, developers, and users are already in the GPT muscle memory, switching providers introduces friction that often is not worth the marginal benefit.

Default to Anthropic if you are starting fresh and want the most opinionated, cleanest stack. The MCP standard, the skills system, and the Claude Code dev tools are tight enough that a greenfield team will move faster on Anthropic.

Default to Google if you are already in the Google Cloud ecosystem. The integration with BigQuery, Cloud Storage, Cloud SQL, and the rest of GCP is friction-free. Spinning up Vertex AI alongside the rest of your data stack is the lowest-overhead option.

Decision matrix

A working table we use internally:

| Constraint | Anthropic | Google | OpenAI | |---|---|---|---| | HIPAA / BAA | Roadmap | Default | Roadmap | | Sub-agents / complex orchestration | Default | Strong | Workable | | Image generation with text | Workable | Good | Default | | Multimodal vision in production | Strong | Default | Strong | | Image transformation pipelines | Workable | Default | Strong | | Greenfield, opinionated stack | Default | Strong | Workable | | Already in GCP | Strong | Default | Workable | | Already in GPT ecosystem | Workable | Workable | Default | | Low total cost on simple workflows | Strong | Strong | Strong |

None of the three is a wrong answer. The wrong answer is picking one because of a blog post you read in 2024 without re-evaluating the landscape.

What we built, and which lab we picked

  • Smile PreVue (live on App Store). Google Vertex AI Gemini 3 Pro under BAA. Compliance was the decisive constraint.
  • Howdy Dispatch (paying fleets in production). Anthropic Claude for the agent that classifies driver photos and routes them through the right workflow. Orchestration was the decisive constraint.
  • Watch DOGS multi-tenant platform (stealth, national scale). Anthropic Claude for AI-powered order attribution. The need for nuanced reasoning across multi-tenant ambiguity was the decisive constraint.
  • RunLink (B2B pilot in motion). OpenAI Assistants for the lightweight chat coach inside the iOS app. Speed of iteration and the existing dev team's familiarity were the decisive constraints.
  • This website. OpenAI gpt-image-2 for the OG card and the four case study heroes. Text rendering quality was the decisive constraint.

The studio is multi-provider because the problems are multi-shape.

What this means for you, today

If you are about to commit to a single lab for a multi-quarter AI program, do the comparison first. Spend the $250 on an hour with us. We will look at your actual constraints, compliance, latency, orchestration, ecosystem, and tell you which of the three labs is the right default for the next 12 months, and which one to keep an eye on.

If we are already wrong because the labs shipped something new yesterday, we will update the framework in front of you. That is what the call is for.

Book a consultation · Read the approach

hosted agentsAnthropicGoogleOpenAIVertex AIAI integrationbuild vs buy