How to evaluate an AI vendor: a fractional operator's checklist
A practical checklist for evaluating an AI vendor: separating capability from pitch across technical fit, delivery evidence, data and integration, honesty about uncertainty, commercial terms, and IP and exit.
Choosing an AI vendor is where a lot of money gets wasted, because the pitch and the capability look identical from the buyer’s chair. AI vendors are unusually good at demos and unusually varied in what they can actually deliver under your real conditions. This is the checklist an operator runs. It is useful whether or not you bring one in, because the point is to buy capability, not a story.
Are you buying capability or buying a story?
Hold that question through every section below. A polished narrative, an impressive demo, and confident answers are table stakes, not evidence. What you’re testing for is whether this vendor can deliver your outcome under your constraints — which is a different and harder thing than whether they can present well.
1. Technical fit
- Can they engage with the specifics of your problem, or do they redirect to their generic capabilities?
- Do they understand where your problem is genuinely hard — your data quality, your reliability bar, your edge cases — or do they wave it away?
- Is their proposed approach appropriate, or are they applying a one-size template?
- Red flag: the demo works beautifully on clean, curated data and they get vague about your messy reality.
2. Delivery evidence
- Can they show comparable work they’ve actually delivered — not logos, but real projects with real outcomes?
- Will they let you speak to a reference doing something genuinely similar?
- Do their case studies survive follow-up questions, or dissolve under specifics?
- Red flag: lots of impressive-sounding experience that gets thinner the more precisely you ask.
3. Data and integration reality
- Have they seriously addressed how they’ll handle your data — its quality, quantity, access, and privacy constraints?
- Do they understand what integrating with your actual systems and workflows involves?
- Are they honest about what your data can and can’t support, or do they promise results your data probably can’t deliver?
- Red flag: confident performance claims with no serious engagement with your data situation.
4. Honesty about uncertainty and failure
- Will they tell you what might not work, and under what conditions?
- Do they distinguish what’s proven from what’s genuinely uncertain in your case?
- Or does everything sound guaranteed?
- Green flag: a vendor who names real risks and failure conditions is more trustworthy than one who claims none. Certainty is a warning sign in genuinely uncertain work.
5. Commercial terms and incentives
- Is the pricing structured so they’re incentivised toward your outcome, or just toward billing?
- Are milestones and deliverables concrete and tied to payment, or vague?
- What happens commercially if the work underdelivers?
- Red flag: terms that pay out fully regardless of whether the thing actually works.
6. IP and exit
- Who owns what’s built? (For anything you’ll build a product on, this must be unambiguous — and default terms may not favour you.)
- Can you leave without being trapped — is there lock-in through proprietary formats, dependencies, or inaccessible models?
- If the relationship ends, what do you keep and what can you continue?
- Red flag: ambiguity on ownership, or an architecture that quietly makes you dependent on them forever.
Run it as a comparison, not a verdict
Score vendors against these six areas side by side rather than judging one in isolation — the gaps become obvious in comparison in a way they never do alone. And weight delivery evidence and honesty about uncertainty most heavily; those two predict real outcomes better than any demo.
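The side-by-side comparison can be sketched as a small weighted scorecard. This is a minimal illustration, not a prescribed method: the 1-to-5 scale, the specific weights (double weight for delivery evidence and honesty about uncertainty), the area keys, and the two example vendors are all assumptions made for the example.

```python
# Hypothetical weights: the checklist only says delivery evidence and honesty
# about uncertainty should count most; the exact numbers are an assumption.
WEIGHTS = {
    "technical_fit": 1.0,
    "delivery_evidence": 2.0,
    "data_and_integration": 1.0,
    "honesty_about_uncertainty": 2.0,
    "commercial_terms": 1.0,
    "ip_and_exit": 1.0,
}


def weighted_score(scores: dict[str, float]) -> float:
    """Return the weighted average of one vendor's per-area scores (1 to 5)."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    total = sum(WEIGHTS[area] * scores[area] for area in WEIGHTS)
    return total / sum(WEIGHTS.values())


def compare(vendors: dict[str, dict[str, float]]) -> list[tuple[str, float]]:
    """Rank vendors side by side, highest weighted score first."""
    ranked = [(name, weighted_score(scores)) for name, scores in vendors.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Illustrative scores only; these are not real vendors.
vendors = {
    "Vendor A": {  # strong demo and terms, thin evidence, everything "guaranteed"
        "technical_fit": 5, "delivery_evidence": 2, "data_and_integration": 4,
        "honesty_about_uncertainty": 2, "commercial_terms": 5, "ip_and_exit": 4,
    },
    "Vendor B": {  # less polished, but references hold up and risks are named
        "technical_fit": 4, "delivery_evidence": 4, "data_and_integration": 3,
        "honesty_about_uncertainty": 5, "commercial_terms": 3, "ip_and_exit": 4,
    },
}

for name, score in compare(vendors):
    print(f"{name}: {score:.2f}")
```

The number is only a prompt for discussion. Its value is that it forces a score in every area for every vendor, so a strong demo cannot quietly stand in for missing delivery evidence.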
Why an operator does this well
Vendor evaluation is core operator work because it needs both independence and fluency: an operator can follow the technical substance well enough to test the claims, and has no stake in any particular vendor winning. That combination, technical enough to challenge the pitch and independent enough to be honest about it, is exactly what a buyer on the receiving end of a good sales process usually lacks. Whether you run this checklist yourself or have someone run it for you, the discipline is the same: make the vendor prove capability, not just perform it.