Methodology · 2026-05-25 · 8 min read
How to integrate AI into a business workflow without burning your first six months
A studio's operator-level breakdown of what AI integration work actually looks like, where teams get stuck, and the thin-slice path that ships.
Most AI integration projects do not fail at the AI part. They fail at the integration part. The model works. The demo is great. Then six months pass and nothing is in production, because the team underestimated the part of the project that has nothing to do with prompts or tokens.
This post is what we tell founders and operators when they ask, "We want to integrate AI into our business; where do we start?" It is the same frame we have used to ship four platforms into production, and the same frame we use when we sit down with a new client at the studio. Nothing in here is proprietary; the methodology is the marketing.
If you want the bigger frame, here is our page on AI integration for business. This post zooms into the methodology under it.
The 10 percent myth
The first thing to get clear on is that the AI itself is roughly ten percent of an AI integration project. The other ninety percent is unfashionable work: data orchestration, identity, multi-tenant isolation, the eval harness, observability, fallback behavior, the audit trail, billing, and the integrations that move the AI output back into the system of record. None of it is novel. All of it has to be right.
We say this because the standard onboarding conversation for a small business going down the AI path looks like this. The founder has played with a chat model. They have an idea. They want to ship it. They are convinced the hard part is the prompt or the model choice. By month three, they have a working prompt, a brittle script, no eval, no observability, no error handling, and a champion at the company who has personally become the failure mode of the whole system.
The work that takes a thin AI capability to a production-grade workflow is mostly the boring part. That is the work we orient our engagements around.
The methodology: Interview, Analyze, Execute
We run almost every engagement through the same three phases. The names are unsexy on purpose. There is no acronym. The point is to keep the team honest about which phase they are actually in.
Interview
Before we look at the stack, we sit with the people doing the work. Operations leads, customer service reps, dispatchers, treatment coordinators, store managers, the actual humans whose calendar will change if AI lands here.
Two questions matter most in the interview phase. The first is "what is the most repetitive failure mode in your week?" The second is "if a coworker took this off your plate, what would they have to know?" Those two questions surface the real constraint faster than any consulting deck.
The thing we are looking for is not "where could AI go?" The right question is "where is the work most clearly bottlenecked by something a model can credibly do today?" Those are different questions, and the second one is the one with a real shipping date.
Analyze
The analyze phase is where we look at the data, the systems, and the workflow honestly. Three questions drive this phase.
- What data exists, where does it live, and is it clean enough to use? In our experience, this is where most projects quietly die. There is no data, or it is locked inside a vendor system, or it is technically present but inconsistent enough that no model will perform reliably on it.
- What are the existing integrations, and what does the audit trail need to look like? AI that produces an answer with no provenance is unshippable in any regulated or money-touching workflow.
- What is the failure cost? A wrong answer that wastes a minute is a different product than a wrong answer that costs $5,000. The eval bar and the fallback behavior come from this.
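One way to make "the eval bar comes from the failure cost" concrete is a back-of-envelope comparison. You weigh the expected cost of applying model output unreviewed against the cost of always having a person check it. The sketch below is hypothetical. The accuracy figure would come from your own eval run, and both cost inputs are estimates you make per workflow.

```python
from dataclasses import dataclass


@dataclass
class WorkflowCosts:
    """Hypothetical per-output cost estimates for one workflow (same currency unit)."""
    cost_of_wrong_answer: float  # what one undetected error costs downstream
    cost_of_human_review: float  # what it costs a person to check one output


def expected_error_cost(accuracy: float, costs: WorkflowCosts) -> float:
    """Expected cost per output if model output is applied without review."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    return (1.0 - accuracy) * costs.cost_of_wrong_answer


def should_auto_apply(accuracy: float, costs: WorkflowCosts) -> bool:
    """Skip review only when the expected error cost is below the cost of a check."""
    return expected_error_cost(accuracy, costs) < costs.cost_of_human_review


def required_accuracy(costs: WorkflowCosts) -> float:
    """Minimum eval accuracy at which auto-applying beats always reviewing."""
    if costs.cost_of_wrong_answer <= 0:
        return 0.0
    return max(0.0, 1.0 - costs.cost_of_human_review / costs.cost_of_wrong_answer)
```

Suppose a review costs 2 units. A 1-unit error then clears the bar at any accuracy, and `required_accuracy` returns 0.0. A 5,000-unit error needs roughly 99.96 percent accuracy before skipping review pays off. These figures are illustrative, not benchmarks.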
The analyze phase is also where we decide which AI provider fits the problem. We are deliberately multi-provider. Anthropic Claude is excellent at long-context reasoning and complex orchestration. Google's Vertex AI Gemini is the right call for regulated workloads (we run Smile PreVue on Vertex under a BAA for that reason). OpenAI is often the cleanest fit for voice, image, and certain agentic patterns. The right answer is rarely "the lab I like personally."
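Writing the provider choice down as an explicit rule makes it reviewable instead of a matter of taste. The sketch below encodes this section's rules of thumb as a plain function. The `Workload` fields are hypothetical, and the function makes no calls to any provider API. Returning `None` stands for "no strong signal, so run a bake-off on your own eval set."

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Provider(Enum):
    ANTHROPIC_CLAUDE = "anthropic_claude"
    VERTEX_GEMINI = "vertex_gemini"
    OPENAI = "openai"


@dataclass
class Workload:
    """Hypothetical description of one AI workload, filled in during the analyze phase."""
    regulated: bool = False     # e.g. health data that needs a BAA
    modality: str = "text"      # "text", "voice", or "image"
    long_context: bool = False  # large documents or long multi-step orchestration


def pick_provider(w: Workload) -> Optional[Provider]:
    """Apply the rules of thumb in priority order: compliance constraints first."""
    if w.regulated:
        return Provider.VERTEX_GEMINI
    if w.modality in ("voice", "image"):
        return Provider.OPENAI
    if w.long_context:
        return Provider.ANTHROPIC_CLAUDE
    # No rule fired: decide with an eval on real data, not with personal preference.
    return None
```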
Execute
The execute phase is where the temptation to over-build is highest, and where most teams quietly burn the project. The discipline we hold is a thin slice in week one.
A thin slice is the smallest end-to-end flow that proves the integration is real, in production, with one specific user in mind. Not a sandbox. Not a notebook. The user opens the existing app or admin tool, presses the button, AI runs, the output lands in the system, and the audit trail records what happened. Even if the slice covers ten percent of the eventual scope, it has to be live.
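Reduced to its skeleton, a thin slice is one handler that does three things in the same request. It runs the model, writes the result into the existing system of record, and appends an audit entry. The sketch below uses in-memory stand-ins. `call_model`, `JobStore`, and the audit-entry fields are hypothetical placeholders for your own model client, database, and log.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class AuditEntry:
    entry_id: str
    user_id: str
    action: str
    model_input: str
    model_output: str
    timestamp: float


@dataclass
class JobStore:
    """In-memory stand-in for the existing system of record."""
    jobs: dict[str, dict] = field(default_factory=dict)

    def save(self, job_id: str, fields: dict) -> None:
        self.jobs.setdefault(job_id, {}).update(fields)


def run_thin_slice(
    user_id: str,
    job_id: str,
    document_text: str,
    call_model: Callable[[str], dict],
    store: JobStore,
    audit_log: list[AuditEntry],
) -> dict:
    """Button press -> model runs -> output lands in the system -> audit records it."""
    output = call_model(document_text)
    store.save(job_id, output)
    audit_log.append(
        AuditEntry(
            entry_id=str(uuid.uuid4()),
            user_id=user_id,
            action="ai_fill_job",
            model_input=document_text,
            model_output=repr(output),
            timestamp=time.time(),
        )
    )
    return output
```

The point of the shape is that nothing is optional. Even a slice covering ten percent of the scope writes to the real store and leaves an audit row.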
The reason we hold this line is that the work changes after the thin slice ships. The eval bar gets concrete. The edge cases become real instead of imagined. Stakeholders see something that works in their actual environment, which de-risks the rest of the build politically. Every project we have ever shipped started with a thin slice that looked too small.
A worked example: how AI rate-confirmation intake landed at Howdy Dispatch
Howdy Dispatch is the trucking dispatch platform we operate. It serves paying fleets on tiers from $149 to $745 per month. The first real AI feature we shipped on it was rate-confirmation parsing.
In the interview phase, we sat with dispatchers running 10-to-50-truck fleets. The repetitive failure mode was not "trucks are in the wrong place." It was "I retype the broker rate confirmation into the dispatch system 30 times a day." That was the constraint. The dispatcher's day was bottlenecked on a flat data-entry surface, not on judgment.
In the analyze phase, the question became where to run it. The rate confirmation is a PDF with a fairly stable structure but enough variance across brokers (header, footer, letterhead, dispatch contact section) that a templated extraction was not going to hold. The data does not have to be HIPAA-grade, but the audit trail does have to be clean, because the parsed numbers drive what eventually gets invoiced. We picked Vertex AI for the structured PDF extraction, with Google address validation downstream. That decision was about reliability and integration surface, not about lab loyalty.
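Because the parsed numbers drive invoicing, the model's raw output is worth checking against a fixed schema before it reaches a human. The sketch below assumes the extraction step returns JSON. The field names and rules are hypothetical, not Howdy Dispatch's actual schema. Output that fails to parse becomes an empty draft with error messages, a structured fallback, instead of an exception or a silently wrong record.

```python
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RateConfirmationDraft:
    """Hypothetical subset of fields a dispatcher would review before saving."""
    broker_name: Optional[str] = None
    load_number: Optional[str] = None
    rate_usd: Optional[float] = None
    pickup_address: Optional[str] = None
    delivery_address: Optional[str] = None
    errors: list[str] = field(default_factory=list)


REQUIRED_TEXT_FIELDS = ("broker_name", "load_number", "pickup_address", "delivery_address")


def parse_model_output(raw: str) -> RateConfirmationDraft:
    """Turn raw model text into a draft; never raise, always report what is wrong."""
    draft = RateConfirmationDraft()
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        draft.errors.append("model output was not valid JSON")
        return draft
    if not isinstance(data, dict):
        draft.errors.append("model output was not a JSON object")
        return draft

    for name in REQUIRED_TEXT_FIELDS:
        value = data.get(name)
        if isinstance(value, str) and value.strip():
            setattr(draft, name, value.strip())
        else:
            draft.errors.append(f"missing {name}")

    rate = data.get("rate_usd")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(rate, (int, float)) and not isinstance(rate, bool) and rate > 0:
        draft.rate_usd = float(rate)
    else:
        draft.errors.append("missing or non-positive rate_usd")
    return draft
```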
In the execute phase, the thin slice was the upload page, the parsed-fields review screen, and the save-into-existing-job flow. No bulk upload. No retroactive backfill. No fancy admin tooling. The dispatcher uploads a PDF, sees the parsed fields, fixes any that look wrong, saves the load. That was week one.
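The review screen is where the thin slice stays honest. A small merge step keeps the dispatcher's corrections authoritative and records which fields the model got wrong. A record like that could later feed both the audit trail and a hold-out eval set. The function and field names below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ReviewedLoad:
    fields: dict[str, object]  # final values saved into the existing job
    corrected: list[str]       # field names the dispatcher changed or filled in


def apply_review(parsed: dict[str, object], edits: dict[str, object]) -> ReviewedLoad:
    """Dispatcher edits win; track which parsed fields were wrong or missing."""
    final = dict(parsed)
    corrected = []
    for name, value in edits.items():
        if parsed.get(name) != value:
            corrected.append(name)
        final[name] = value
    return ReviewedLoad(fields=final, corrected=sorted(corrected))
```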
The unsexy ninety percent (the eval pass on a hold-out set of broker PDFs, the per-tenant audit log, the structured fallback when the model returns junk, the monitoring on cost per parse) is what took the rest of the build. None of it is glamorous. All of it is what makes it production-grade.
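The eval pass can start as something this small. Compare model output to hand-checked values on broker PDFs the prompt was never tuned on, then report exact-match accuracy per field. The sketch is hypothetical, and the bar value is yours to set from the failure cost. Real extraction evals usually need normalization (whitespace, currency formatting) before comparing, which is left out here.

```python
from dataclasses import dataclass


@dataclass
class EvalCase:
    """One hold-out example: what the model extracted vs. the hand-checked truth."""
    predicted: dict[str, object]
    expected: dict[str, object]


def field_accuracy(cases: list[EvalCase]) -> dict[str, float]:
    """Per-field exact-match accuracy across a hold-out set."""
    hits: dict[str, int] = {}
    totals: dict[str, int] = {}
    for case in cases:
        for name, truth in case.expected.items():
            totals[name] = totals.get(name, 0) + 1
            if case.predicted.get(name) == truth:
                hits[name] = hits.get(name, 0) + 1
    return {name: hits.get(name, 0) / totals[name] for name in totals}


def failing_fields(accuracy: dict[str, float], bar: float) -> list[str]:
    """Fields whose accuracy falls below the eval bar for this workflow."""
    return sorted(name for name, acc in accuracy.items() if acc < bar)
```

Reporting per field rather than per document matters here. A parser can get the broker name right every time and still miss the rate, and the rate is what gets invoiced.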
What goes wrong when teams skip the methodology
When a team skips the interview phase, they end up building something a vendor pitched them on. The AI works. Nobody uses it. The chosen workflow was not actually the bottleneck.
When a team skips the analyze phase, they ship a flow that breaks the first time real data hits it. The model performed well on five cherry-picked examples. It performs at 60 percent accuracy on the actual messy distribution, which is unshippable for anything that touches money or compliance.
When a team skips the thin-slice discipline of the execute phase, they spend six months building the perfect version of a feature they have never seen running on a real user. Scope balloons. The thin slice gets postponed until "after we figure out the schema." Nothing ever ships. This is the most common failure mode we see.
The methodology is not magic. It is a discipline that forces the team to talk to users before building, look at data before designing, and ship something small before scaling it.
What "AI in production" actually requires
If you are evaluating whether a team is ready to ship an AI workflow into production, the checklist is roughly this.
- Is there an eval harness that runs on a real, representative dataset, not a curated demo set?
- Is there an observability layer that tells you the cost, latency, and error rate per call?
- Is there a fallback when the model returns garbage, and does the user know the fallback fired?
- Is there an audit log of every model call, with the prompt, the response, and the user it ran for?
- Is the system multi-tenant if it needs to be, with strict tenant isolation in the data layer?
- Is the integration two-way? Does the AI output flow back into the system of record automatically, or does someone retype it?
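Several items on this checklist can live in one thin wrapper around every model call. These are per-call cost, latency, and errors, a fallback the caller can see, an audit row with prompt, response, and user, and tenant-scoped reads. The sketch below is a hypothetical in-memory version. The `model` callable is assumed to return the response text and its cost. A real system would persist the log and enforce tenant isolation in the database, not in a Python list.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class CallRecord:
    """One audit-log row: which tenant and user, what went in, what came out."""
    tenant_id: str
    user_id: str
    prompt: str
    response: str
    latency_s: float
    cost_usd: float
    error: Optional[str]
    fallback_used: bool


@dataclass
class AuditLog:
    records: list[CallRecord] = field(default_factory=list)

    def for_tenant(self, tenant_id: str) -> list[CallRecord]:
        """Reads are always scoped by tenant; there is no unscoped accessor."""
        return [r for r in self.records if r.tenant_id == tenant_id]


def observed_call(
    tenant_id: str,
    user_id: str,
    prompt: str,
    model: Callable[[str], tuple[str, float]],
    fallback: str,
    log: AuditLog,
) -> tuple[str, bool]:
    """Run the model, fall back on failure or empty output, and log everything."""
    start = time.perf_counter()
    response, cost, error, used_fallback = fallback, 0.0, None, True
    try:
        text, cost = model(prompt)
        if text.strip():
            response, used_fallback = text, False
        else:
            error = "empty response"
    except Exception as exc:  # any provider failure triggers the fallback
        error = f"{type(exc).__name__}: {exc}"
    log.records.append(CallRecord(
        tenant_id, user_id, prompt, response,
        time.perf_counter() - start, cost, error, used_fallback,
    ))
    # Return the flag so the UI can tell the user the fallback fired.
    return response, used_fallback


def error_rate(log: AuditLog, tenant_id: str) -> float:
    """Share of a tenant's calls that recorded an error."""
    rows = log.for_tenant(tenant_id)
    return sum(r.error is not None for r in rows) / len(rows) if rows else 0.0
```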
If the answer to any of these is "we will figure that out later," the project is not ready for production. It is ready for an internal demo, which is a different thing.
What we say no to
We are a small studio. We say no to a lot of work. Two patterns we will not take.
We do not take engagements where the team wants us to ship a chatbot on top of a codebase nobody has ever audited. The chatbot itself is fine, but the underlying mess will surface in week two, and the AI gets blamed for failures that have nothing to do with the AI.
We do not take engagements where the success metric is "use AI." The metric has to be operational. Cycle time. Acceptance rate. Loads per dispatcher per day. Tickets per CS rep per shift. If the success metric is the technology, the project will not survive its first review.
Where to start
If you are sitting on the other side of this and trying to figure out where AI integration belongs in your business, the honest answer is that we cannot tell you from a blog post. The interview phase has to happen first. That conversation is what we sell.
If you want to start that conversation, the way in is yikesdude.com/contact. If you want to read the longer version of how we run engagements, it lives at yikesdude.com/approach. Either path lands in the same place: a small first conversation, a real piece of work shipped, and a methodology you can run with or without us afterwards.