April 13, 20266 min readmcp / llms / architecture

MCP vs function calling: when each one wins

Function calling and MCP solve overlapping problems with different tradeoffs. Here's the decision tree we use — and the costs that bite when you pick wrong.

Function calling (OpenAI's term; Anthropic calls it tool use, Google calls it function declarations) and Model Context Protocol (MCP) both extend an LLM with external capabilities. They overlap heavily — to the point where teams sometimes ship both for the same workflow without realising.

They aren't interchangeable. Here's how to decide.

The short version

Function calling is per-prompt: you describe tools in the API request, the model emits calls in the response, your code executes them. The tool definitions live in your application code.
MCP is per-client: a server exposes tools over a standard protocol (stdio or HTTP). The client (Claude Desktop, Claude Code, Cursor) connects to the server once. Any LLM session through that client gets the tools.

Function calling is for tools that belong to one application. MCP is for tools that should be available to many.

When function calling wins

You're shipping a feature inside your product. The tool calls are specific to your business logic — look_up_customer, refund_charge, update_subscription_tier. Each tool has tight coupling to your data model and your auth boundaries.

In this case: function calling. Tool definitions live in your codebase, get versioned with the rest of your code, integrate with your existing logging and error handling, and have access to your application context (database connections, request user, feature flags).

Wrapping these as an MCP server would mean a separate process, an extra hop, a new transport to debug — for no gain. The tools never leave the application.

When MCP wins

You want the same set of capabilities available across multiple AI clients. A developer should be able to invoke "GitHub: open PR" from Claude Desktop, Claude Code, Cursor, or a custom internal client without re-implementing the integration four times.

In this case: MCP. The protocol is the standardisation layer. Build the server once; clients consume it over stdio or HTTP.

This is why the big MCP servers — GitHub, Postgres, Slack, Notion — exist as MCP servers and not as function-calling SDKs. Their value compounds across every client that supports MCP.

When you need both

A common pattern in production: function calling for in-app tools, MCP for cross-system integrations.

The in-app tools (refund_charge, update_subscription_tier) sit in your application code as function-calling tools. The cross-system integrations (GitHub, Slack, Linear) come from MCP servers your application connects to at startup.

Concretely, in a Claude API call:

tools array in the request: your in-app function-calling tools
Plus tools your application proxies in from connected MCP servers (which your code is the MCP client for)

Anthropic's Claude API supports MCP clients natively now (as of late 2025), so you don't have to write the protocol layer yourself. Your application can register MCP servers and pass through their tools as if they were native.

The cost trap

MCP looks cheap. Each server is "just" another tool. But every connected MCP server's tool definitions get injected into every API call's prompt — and those definitions are tokens you pay for.

A typical MCP server defines 5-15 tools. Each tool has a name, description, input schema. Across a representative set of servers (GitHub, Postgres, Slack, Notion, Brave Search), you can easily reach 4-6K tokens of tool definitions per API call.

If you make 1000 calls per day with that setup, you're paying for 4-6M tokens of tool definitions before any real work happens. With Claude Sonnet 4.6 at $3/M input, that's $12-18/day just to advertise capabilities the model might not even use.

Mitigation: prompt caching. Tool definitions are the perfect cache candidate — they're stable across calls, they're large, they sit at the top of the prompt. Our prompt caching post walks through the math.

Without caching, the cost of running 5 MCP servers in production is meaningful. With caching, it drops to near-zero per call.

When MCP is wrong

Tools that need request-scoped context. MCP servers run in a separate process and don't have access to the calling user's session, request ID, or feature-flag state. If your tool needs those, it should live as a function-calling tool inside your application.
Tools with hot-path latency requirements. MCP adds an IPC hop. For tools called inside a user-facing latency budget, that's a real cost (5-20ms typical for stdio MCP, 30-100ms for HTTP MCP).
One-off prototypes. Don't reach for MCP when a 10-line function-calling tool would work. Standardisation pays off when you have 3+ clients consuming the same tool, not when you have one.

When function calling is wrong

Tools you want to expose to multiple AI clients without re-implementing. Build the MCP server once, install it in Claude Desktop, Cursor, and Claude Code from the same definition.
Capabilities that aren't tied to your application's data model. Web search, file system access, web scraping — these are generic enough that an MCP server is the right unit.

The decision

Is the tool tied to your application's data, auth, or request context? → Function calling.
Will the same tool be useful across multiple AI clients? → MCP.
Is it a generic capability (web, files, search)? → MCP, use an existing one if it exists.
Both apply? → Run both. Function calling for the in-app tools, MCP for the integrations.

Our MCPHub directory lists 14 production-ready MCP servers with copy-paste install configs for Claude Desktop, Claude Code, and Cursor. Each has its own detail page with the install snippet and common use cases.