AI coding agents need to interact with external tools — GitHub, databases, cloud providers, build systems. Two approaches are emerging: shell out to CLI tools, or use the Model Context Protocol (MCP).
Both call the same APIs under the hood. The difference is how the agent invokes them. And that difference has real architectural implications.
In the CLI approach, the agent runs shell commands just like a developer would.
Advantages:
LLMs were trained on billions of CLI examples. They know git, curl, grep, jq, aws intimately. The model doesn't need to learn a new schema — it already speaks this language natively.
CLI tools chain naturally with Unix pipes. Something like `gh pr list | jq '.[] | .title' | grep "fix"` runs in a single LLM call. It's composable by design.
No schema loading means no wasted context window. The agent doesn't need to load tool descriptions before starting work.
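That single-call composition is easy to picture from inside an agent harness. A minimal Python sketch (the pipeline below uses plain `printf`/`grep`/`sort` so it runs anywhere; a real agent would chain `gh`, `jq`, and friends the same way):

```python
import subprocess

def run_pipeline(command: str) -> str:
    """Execute a full shell pipeline in one shot, as an agent harness might.

    The composition happens inside the shell itself: the agent makes a
    single tool call and reads back one result.
    """
    result = subprocess.run(
        command, shell=True, capture_output=True, text=True, check=True
    )
    return result.stdout

# Three tools chained with pipes, one invocation, no orchestration layer.
titles = run_pipeline("printf 'fix auth\\nadd cache\\nfix docs\\n' | grep fix | sort")
print(titles)
```

The key point is that the whole filter-and-sort happens in one call; an MCP agent would need a separate round-trip per step.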
Disadvantages:
Shell escaping is a nightmare. One misquoted variable and the command fails or worse — does something unintended. Output parsing is fragile. The agent has to parse human-readable text that wasn't designed for machine consumption.
Auth is shared. CLI agents inherit a single token or credential set. You can't revoke one user's access without rotating everyone's credentials.
Every command spawns a new process. There's no persistent connection, no session state. Stateful workflows require the agent to manually track context between calls.
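The quoting hazard is easy to demonstrate. A minimal sketch using Python's standard `shlex` module (the filename is a made-up example):

```python
import shlex

# A value with a space and a shell metacharacter, as an agent might
# receive from a tool result or a user prompt. (Made-up example.)
filename = "release notes; rm -rf tmp"

# Naive interpolation: the shell would split on the space and treat ';'
# as a command separator, running a second, unintended command.
naive = f"cat {filename}"

# Safer: quote before interpolating into a shell string...
quoted = f"cat {shlex.quote(filename)}"

# ...or better, skip the shell entirely and pass an argv list to exec.
argv = ["cat", filename]

print(naive)   # cat release notes; rm -rf tmp
print(quoted)  # cat 'release notes; rm -rf tmp'
```

Agent harnesses that build shell strings from model output have to get this right on every single call.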
The Model Context Protocol, by contrast, defines a structured interface between the agent and its tools.
Advantages:
Structured input and output. No shell escaping, no text parsing. The response is typed JSON that the agent can work with directly.
Per-user OAuth. MCP supports proper authentication flows where each user has their own token. Revoking one user's access doesn't affect others.
Persistent sessions. MCP keeps a running server with connection pooling. Stateful workflows are natural — the server maintains context between calls.
Enterprise governance built in. Structured audit logs, access revocation, monitoring — all part of the protocol, not bolted on.
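For concreteness, here is a sketch of what a structured call can look like on the wire. MCP frames tool calls as JSON-RPC 2.0 `tools/call` requests; the tool name and arguments below are hypothetical, not from any real server:

```python
import json
from itertools import count

_ids = count(1)

def tools_call(name: str, arguments: dict) -> str:
    """Frame an MCP tool invocation (MCP speaks JSON-RPC 2.0)."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })

# Hypothetical tool on a hypothetical GitHub MCP server.
req = tools_call("list_pull_requests", {"repo": "org/repo", "state": "open"})
print(req)
```

No escaping, no text parsing: the server validates the arguments against a schema and returns typed JSON.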
Disadvantages:
The full JSON schema (tool names, descriptions, parameter types) must be loaded into the context window before any work begins. For a server with 50 tools, that's thousands of tokens consumed before the agent does anything useful.
No native chaining. Each tool call is a separate round-trip. The agent must orchestrate calls sequentially — there's no pipe equivalent.
The model encounters MCP schemas for the first time at runtime. Unlike CLI patterns it was trained on, MCP requires in-context learning for every new server.
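The context cost is easy to estimate with the common ~4 characters-per-token heuristic. The per-tool schema size below is an assumption for illustration, not a measured figure:

```python
# Back-of-envelope schema cost for loading an MCP server's tool list.
CHARS_PER_TOKEN = 4      # rough heuristic for English/JSON text
avg_schema_chars = 600   # assumed: name + description + parameter schema
tools = 50

tokens = tools * avg_schema_chars // CHARS_PER_TOKEN
print(tokens)  # 7500 tokens consumed before the first useful call
```

At those assumed sizes, a 50-tool server eats several thousand tokens of context before the agent has done anything.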
| Dimension | CLI | MCP |
|-----------|-----|-----|
| Token cost | Low — no schema loading | High — full schema in context |
| Native knowledge | Trained on billions of examples | Custom JSON, learned at runtime |
| Composability | Unix pipes, single call | Separate orchestrated calls |
| Multi-user auth | Shared token, all-or-nothing | Per-user OAuth |
| Stateful sessions | New process per command | Persistent connection |
| Audit & governance | ~/.bash_history | Structured logs, access control |
| Error handling | Exit codes + stderr | Typed error responses |
| Setup complexity | Zero — tools already installed | Server deployment required |
CLI wins when:

- the task is quick and composable (a one-shot pipeline like `grep | sort | head`)
- the model's training fluency matters more than structured output
- it's a single developer's machine and the tools are already installed

MCP wins when:

- multiple users need their own credentials and audit trails
- workflows are stateful and span many calls
- governance, monitoring, and access control are requirements
In practice, the best agent setups use both. CLI for quick, composable tasks where the model's training data gives it fluency. MCP for structured, multi-user workflows where governance matters.
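That split can even be written down as a routing policy. A toy sketch; the task flags are hypothetical, not from any real agent framework:

```python
def pick_backend(task: dict) -> str:
    """Toy routing policy for the rule of thumb above:
    CLI for quick composable work, MCP when governance matters."""
    if task.get("multi_user") or task.get("needs_audit") or task.get("stateful"):
        return "mcp"
    return "cli"

# Quick local pipeline -> CLI; shared, audited workflow -> MCP.
print(pick_backend({"pipeline": "gh pr list | jq '.[].title'"}))  # cli
print(pick_backend({"multi_user": True, "needs_audit": True}))    # mcp
```

Real setups make this choice implicitly, by which tools they expose to the agent in each environment.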
Developer's local machine → CLI agents (fast, composable, zero setup)
Team's shared agent platform → MCP servers (auth, audit, governance)
Production automation → MCP with structured monitoring and access control
The mistake is treating this as a binary choice. CLI and MCP solve different problems at different layers of the stack.
CLI for fluency, MCP for governance. Use both.
To make the comparison concrete, here is the CLI approach in practice:

```bash
# Agent runs these commands directly
gh pr list --state open --json title,number
aws s3 ls s3://my-bucket/
kubectl get pods -n production -o json
docker ps --format '{{.Names}}: {{.Status}}'
```

```bash
# One-liner that chains 3 tools — no orchestration needed
gh api repos/org/repo/pulls --jq '.[].title' | sort | head -5
```

And the MCP side, a tool call and its typed response:

```json
{
  "tool": "github.list_pull_requests",
  "parameters": {
    "repo": "org/repo",
    "state": "open",
    "limit": 5
  }
}
```

```json
{
  "result": {
    "pull_requests": [
      {"number": 42, "title": "Fix auth bug", "author": "alice"},
      {"number": 43, "title": "Add caching", "author": "bob"}
    ]
  }
}
```

— blanho