Autonomous AI Agents — SkillMix

Not one agent. A team of them. A swarm of specialised AI agents working together inside your stack — researchers, drafters, reviewers, dispatchers — each one doing one thing well, each one swappable as the frontier moves. Fixed price, fixed date, no-cure-no-pay.

The observation — most teams buy one big agent. Then the frontier moves.

The 2024 pattern was a single large model wired to your tools, told to do everything. It mostly worked. Then a better model shipped. Then another. The thing you paid for last summer looks like a toy now.

The old way — one agent, one model, one god-prompt

Breaks when the model changes.
Bad at any one thing in particular.
Full rebuild every 6 months.

The SkillMix way — a team of specialists, each one swappable

One agent per job, best model for each.
Swap one agent when something better ships.
System stays. Engine underneath keeps improving.

You don't buy a frontier agent. You buy the posture that keeps it on the frontier.

Anatomy of a swarm — seven roles in conversation

Not every swarm uses every role. Simple jobs run with three. Complex ones run with fifteen. But almost every swarm we build is composed from this shortlist of archetypes.

01 · Scout

Fetches. Searches the web, your knowledge base, the CRM, anywhere the answer might live. Doesn't judge what it finds; just returns it. Tool-heavy · parallel · cheap model.

02 · Archivist

Remembers. Maintains the swarm's shared memory: past decisions, client context, canonical facts. Asked before any new research runs. Long-context · retrieval.

03 · Drafter

Writes. Proposals, replies, summaries, reports. Uses the Scout's findings and the Archivist's memory to produce the first version. Frontier model · style-tuned.

04 · Critic

Reviews. Fact-checks every claim, stress-tests the logic, flags what doesn't hold up. The swarm's skeptic. Never ships anything unreviewed. Different model · lower temp.

05 · Router

Dispatches. Decides which role handles what, what runs in parallel, what escalates. The swarm's traffic controller. Small model · fast · deterministic.

06 · Executor

Acts. Writes to the CRM, sends the email, books the meeting, files the ticket. The only role that touches your real systems. Narrow and audited. Scoped permissions · audited.

07 · Escalator

Knows when to call a human. Watches for edge cases, confidence drops, policy boundaries. Hands off with full context attached. Rules + judgment · Slack / email.

08 · A role that doesn't exist yet

Every serious swarm we've built introduced at least one role we hadn't named before. Yours will too.

The promise — same swarm, different engines underneath

Here's what you're actually buying: the system stays stable. The models underneath get better.

Nov '25 — baseline. The swarm we shipped late last year. Solid for its time. Routing ran on a frontier model because small models weren't good enough yet. Archivist leaned on a vector DB. Same five core roles as today.
Jan '26 — model swap. Router → Haiku 4.5: 40× cheaper dispatch. Scout → Gemini 3 Flash: native web grounding with citations out of the box, three scaffolds deleted. System untouched.
Feb '26 — new role. Operator (computer-use, Anthropic) shipped on first pilot swarm — for legacy systems without APIs.
Mar '26 — retired. Three vector databases retired in favour of native long-context (1M tokens).
Apr '26 — model swap. Drafter → GPT-5.5 (better long-form voice consistency). Critic → Opus 4.7 (different provider on purpose). Same org chart. Newer engines.
Watching: open-weight, local-first swarms for clients with hard data-residency constraints.

What ships, what's coming, what we'll say no to

We only ship what works. Here's the honest current map.

Ships today

Document drafting with review loops
Research synthesis across 50+ sources
Structured data extraction from messy inputs
Inbox triage with judgment, not rules
Workflows with clear escalation points
Narrow desktop automation (computer use)

Close, watching

Multi-hour autonomous execution
Negotiation with real money on the line
On-device swarms for regulated data
Reliable voice agents for complex intake
Cross-session learning without retraining

We'll tell you no

Irreversible decisions without human sign-off
Legal or medical advice as the system of record
Anything where a 1% error rate is catastrophic
Replacing judgment your team is paid for
"Agentic" for its own sake. Just use a script.

How it goes — four phases, plus one that never ends

Phase 01 · Scope Sprint — Paid · 1–2 weeks · €1,500 – €2,500

We watch the work being done by the humans who do it now. Then we write the swarm's org chart: which roles it needs, which models power each one, where it escalates, what "done" looks like.

Role inventory · model selection (per role) · escalation policy · written acceptance criteria.
Worst case: you walk away with an org chart for a team you might never hire. Still useful.

Phase 02 · Prototype — No cure, no pay · ~2 weeks · ~€12,000

A working swarm, on real data, doing real work in front of you. Rough edges visible on purpose. If it doesn't clear the bar we agreed on, you don't pay.

Live swarm, end-to-end · your actual documents, your actual inbox · measurable outcome vs. human baseline · go / no-go decision at the end.
If the swarm can't keep up with the humans, we stop here.

Phase 03 · Harden — Fixed price · 4–8 weeks · €15,000 – €25,000

Prototype becomes production. Guardrails, audit logs, approval queues, budget caps, rollback. Weekly demos. Every role model-swappable from day one.

Full audit log, every message, every tool call · approval queues + kill switch per agent · budget caps and blast-radius limits · acceptance gate before go-live.

Phase 04 · Run & Upgrade — Frontier retainer · monthly

This is the phase the rest of the industry doesn't offer. We track the frontier, swap models when something meaningfully better ships, and re-run your acceptance tests before any change goes live. No rebuild fees.

Continuous frontier tracking · model swaps on a cadence, not a contract · acceptance tests re-run on every swap · monthly changelog scoped to your swarm.

Pricing summary

Phase	Pricing	No-cure-no-pay
Scope Sprint	€1,500 – €2,500	No
Prototype	~€12,000	Yes
Harden (build)	€15,000 – €25,000	Yes
Frontier retainer	monthly, scoped per swarm	—

Agent work costs more than pure automation because the swarm runs on frontier inference; the retainer covers continuous model tracking and swaps on your behalf.

Frequently asked

Aren't AI agents still too unreliable? A single large agent — often, yes. A swarm of specialised agents with a Critic role and human escalation — routinely no. The reliability story of 2023 is not the reliability story of 2026. We only ship what clears the acceptance gate we wrote with you.

What happens when a better model ships a week after you deliver? That's what the frontier retainer is for. We track model releases, swap the affected role, re-run your acceptance tests, and only promote the change if it clears them. No rebuild fees. If you don't want the retainer, you still get a fully model-swappable architecture — you just do the swaps yourself.

Who owns the IP and the code? You do. Code, prompts, role definitions, memory, audit logs. All yours, handed over, documented. You can cancel the retainer and keep running. We don't lock swarms to our infra.

Which models do you actually use? Whichever is best for the role right now. Today that usually means a mix: a frontier model for the Drafter, a different frontier model for the Critic (different provider on purpose, so failure modes don't rhyme), a small fast model for the Router, and open-weight or on-prem for anything touching regulated data. This mix will be different in six months.

How is this different from Automation? Automation handles work where the shape is known and the rules are stable. Agents handle work where every case is different and the answer requires judgment. If you can describe the steps, you want Automation. If you can only describe the outcome, you want a swarm.

What about safety and data? Every tool call is audited. Every agent has scoped permissions. Every action that touches money, customers, or irreversible state routes through an approval queue by default. EU-hosted standard. On-prem and local-swarm for regulated data. Kill switch per agent.