Filed under: Cloud Engineering · AI Infrastructure · Local Lab
Picking Up Where We Left Off
In Part 1, we got Bifrost running locally on your Mac, wired up Ollama with qwen3.5, and verified the whole stack was humming. Requests flowing through the gateway, streaming working, tool calling confirmed.
Now we’re going deeper.
This post is about MCP — the Model Context Protocol. If Part 1 was about giving your AI a reliable phone line, Part 2 is about giving it hands. By the end you’ll have a local MCP server running on your machine that exposes real tools, connected through Bifrost, so qwen3.5 can actually do things on your behalf — check system info, run safe shell commands, do math — all without touching the cloud.
What Is MCP?
MCP stands for Model Context Protocol. It’s an open standard, originally released by Anthropic in late 2024 and now governed by the Linux Foundation, that defines how AI models discover and call external tools and data sources.
Before MCP, every AI framework had its own tool integration format. OpenAI’s function calling, LangChain’s tool abstraction, Anthropic’s tool use spec — they all solved the same problem differently. If you wanted your database lookup to work with multiple AI clients, you wrote multiple integrations.
MCP fixes that. Write one server, and any MCP-compatible client can use it — Claude Desktop, your own agent, Bifrost, whatever. It’s the USB-C moment for AI tooling.
There are two transport modes worth knowing:
- stdio — local, process-to-process communication via stdin/stdout. The gateway spawns your script as a subprocess. No ports, no networking, no auth needed. This is what we’re building today.
- HTTP/SSE — remote, for servers others connect to over a network. That’s a future post.
The Architecture
The flow looks like this:
Your app → Bifrost (port 8080) → spawns MCP server via stdio → tool runs locally → result returned
Bifrost acts as both the LLM gateway and the MCP client. It spawns your local MCP server process, hands it tool requests from qwen3.5, and passes results back. Your app never needs to know any of this is happening — it just sends a chat request and gets a response back that includes tool results.
Prerequisites
You need everything from Part 1 still running:
- Node.js 18+ (
node --version) - Ollama running with qwen3.5 (
ollama serve) - Bifrost running (
npx -y @maximhq/bifrost)
Plus one new package — the official MCP SDK:
mkdir ~/my-local-mcp && cd ~/my-local-mcp
npm init -y
npm install @modelcontextprotocol/sdk zod
Update package.json to add "type": "module" so we can use ES module syntax:
{
"name": "my-local-mcp",
"version": "1.0.0",
"type": "module",
"dependencies": {
"@modelcontextprotocol/sdk": "^1.11.0",
"zod": "^3.24.4"
}
}
Step 1: Build the MCP Server
Create server.js in your ~/my-local-mcp/ folder:
#!/usr/bin/env node
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";
import { execSync } from "child_process";
import os from "os";
const server = new McpServer({
name: "local-host-tools",
version: "1.0.0",
});
// Tool 1: system info
server.tool(
"get_system_info",
"Returns CPU, memory, and OS info from the local machine",
{},
async () => {
const info = {
platform: os.platform(),
arch: os.arch(),
hostname: os.hostname(),
totalMemoryGB: (os.totalmem() / 1e9).toFixed(2),
freeMemoryGB: (os.freemem() / 1e9).toFixed(2),
cpus: os.cpus().length,
uptime_hours: (os.uptime() / 3600).toFixed(1),
};
return {
content: [{ type: "text", text: JSON.stringify(info, null, 2) }],
};
}
);
// Tool 2: safe shell commands (allowlisted only)
server.tool(
"run_command",
"Runs a read-only shell command and returns output",
{ command: z.string().describe("The shell command to run") },
async ({ command }) => {
const allowed = ["df", "du", "ls", "pwd", "whoami", "date", "uptime", "ollama"];
const base = command.trim().split(" ")[0];
if (!allowed.includes(base)) {
return {
content: [{ type: "text", text: `Command '${base}' not in allowlist.` }],
isError: true,
};
}
try {
const output = execSync(command, { encoding: "utf8", timeout: 5000 });
return { content: [{ type: "text", text: output }] };
} catch (err) {
return { content: [{ type: "text", text: `Error: ${err.message}` }], isError: true };
}
}
);
// Tool 3: calculator
server.tool(
"calculator",
"Performs basic arithmetic",
{
operation: z.enum(["add", "subtract", "multiply", "divide"]),
a: z.number(),
b: z.number(),
},
async ({ operation, a, b }) => {
const results = {
add: a + b,
subtract: a - b,
multiply: a * b,
divide: b !== 0 ? a / b : "cannot divide by zero"
};
return { content: [{ type: "text", text: String(results[operation]) }] };
}
);
// ⚠️ Critical: NEVER use console.log() in a stdio MCP server
// It writes to stdout and corrupts the JSON-RPC stream
// Always use console.error() for any debug output
const transport = new StdioServerTransport();
await server.connect(transport);
console.error("MCP server running on stdio");
Three tools exposed:
get_system_info— reads OS, CPU, memory stats from your machinerun_command— runs safe, allowlisted shell commandscalculator— basic arithmetic
The one rule you cannot break with stdio servers: never use
console.log(). It writes to stdout, which is the same pipe the MCP protocol uses for JSON-RPC messages. One strayconsole.log()will corrupt the stream and silently break everything. Useconsole.error()for all debug output — it goes to stderr instead.
Step 2: Test the Server in Isolation
Before connecting it to Bifrost, verify it works on its own using the MCP Inspector:
cd ~/my-local-mcp
npx @modelcontextprotocol/inspector node server.js
This opens a browser UI at localhost:5173 where you can call each tool interactively and see the raw JSON responses. If get_system_info returns your machine’s stats and calculator returns correct math — your server is good.
Step 3: Register the MCP Server in Bifrost
Make sure Bifrost is running, then register your local server via the API:
curl -X POST http://localhost:8080/api/mcp/client \
-H "Content-Type: application/json" \
-d '{
"name": "local-host-tools",
"connection_type": "stdio",
"stdio_config": {
"command": ["node", "/Users/YOUR_USERNAME/my-local-mcp/server.js"],
"args": []
}
}'
Replace /Users/YOUR_USERNAME/my-local-mcp/server.js with the actual absolute path to your file. You can get it by running pwd inside the my-local-mcp folder.
Alternatively, add it through the Bifrost web UI at http://localhost:8080:
- Go to MCP Clients in the sidebar
- Click Add MCP Client
- Set connection type to stdio
- Set command to
nodeand args to the full path of yourserver.js
Verify Bifrost picked it up:
curl http://localhost:8080/api/mcp/clients
You should see local-host-tools in the response.
Step 4: Let qwen3.5 Use Your Tools
Now the fun part. Send a request that lets qwen3.5 reach for your local tools:
Ask about your machine
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "ollama/qwen3.5:latest",
"messages": [
{"role": "user", "content": "What are the system specs on this machine? Use the get_system_info tool."}
],
"mcp_servers": ["local-host-tools"]
}'
Run a shell command
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "ollama/qwen3.5:latest",
"messages": [
{"role": "user", "content": "How much disk space is available? Use the run_command tool with df -h."}
],
"mcp_servers": ["local-host-tools"]
}'
Do some math
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "ollama/qwen3.5:latest",
"messages": [
{"role": "user", "content": "What is 1,847 multiplied by 93? Use the calculator tool."}
],
"mcp_servers": ["local-host-tools"]
}'
qwen3.5 will reason about which tool to use, call it through Bifrost, get the result, and incorporate it into its response — all running completely locally, no internet required.
What Just Happened?
Let’s be clear about what’s actually going on here because it’s worth appreciating:
- Your app sent a plain chat request to Bifrost
- Bifrost forwarded it to qwen3.5 via Ollama with the available tool schemas attached
- qwen3.5 decided which tool to call and with what arguments
- Bifrost spawned your
server.jssubprocess and sent it the tool call over stdio - Your server ran the tool, returned the result
- Bifrost handed the result back to qwen3.5
- qwen3.5 incorporated the result into a natural language response
- Your app got back a normal chat response
Every step of that happened on your laptop. No cloud. No API keys. No data leaving your machine.
Extending Your MCP Server
This is where it gets interesting for real use cases. The pattern scales to anything you can write a function for. A few ideas that make sense for a home lab:
Check which Ollama models are loaded:
server.tool(
"list_ollama_models",
"Lists all locally available Ollama models",
{},
async () => {
const output = execSync("ollama list", { encoding: "utf8" });
return { content: [{ type: "text", text: output }] };
}
);
Read a local file:
server.tool(
"read_file",
"Reads the contents of a local text file",
{ path: z.string().describe("Absolute path to the file") },
async ({ path }) => {
const content = fs.readFileSync(path, "utf8");
return { content: [{ type: "text", text: content }] };
}
);
Hit a local API or service:
server.tool(
"check_bifrost_health",
"Checks if Bifrost gateway is healthy",
{},
async () => {
const res = await fetch("http://localhost:8080/health");
const text = await res.text();
return { content: [{ type: "text", text: text }] };
}
);
The sky’s the limit. Any tool you can wrap in a function, you can expose to your local AI.
What’s Next
We’ve now built a complete local AI stack from scratch:
- Bifrost as the gateway — routing, observability, fallbacks
- Ollama + qwen3.5 as the model — offline, free, surprisingly capable
- MCP as the tool layer — giving the model real access to your environment
The natural next steps from here are building HTTP-based MCP servers (so tools can be shared across machines or teams), exploring multi-agent patterns where one model orchestrates others, and looking at how this same architecture maps to production — because the patterns are identical, just at a different scale.
More field notes coming. In the meantime — go build something and break things.
Anthony Mineer is a Senior Manager of Cloud Engineering writing about cloud infrastructure, AI, and platform engineering at anthonymineer.me.