Filed under: Cloud Engineering · AI Infrastructure · Local Lab
Where Part 1 left off
In Part 1 we got Bifrost running locally, wired up Ollama with qwen3.5, and confirmed the stack end to end: requests through the gateway, streaming, and tool calling.
This post adds MCP, the Model Context Protocol. Part 1 gave the model a reliable connection. This part gives it tools. By the end you’ll have a local MCP server exposing real capabilities (system info, allowlisted shell commands, math) connected through Bifrost so qwen3.5 can run them. Still no cloud.
What MCP is
Model Context Protocol is an open standard, released by Anthropic in late 2024 and now under the Linux Foundation, that defines how models discover and call external tools and data sources.
Before it, every framework had its own tool format. OpenAI function calling, LangChain’s tool abstraction, Anthropic’s tool use spec all solved the same problem differently. Supporting multiple clients meant writing multiple integrations.
MCP collapses that to one. Write a server once and any MCP-compatible client can use it: Claude Desktop, your own agent, Bifrost, anything.
Two transport modes matter here:
- stdio. Local, process-to-process over stdin/stdout. The gateway spawns your script as a subprocess. No ports, no networking, no auth. This is what we’re building.
- HTTP/SSE. Remote, for servers others reach over a network. Another post.
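To make the stdio mode concrete: each JSON-RPC message travels as one line of JSON terminated by a newline. The sketch below is only an illustration of that framing, not the SDK's internals. encodeMessage and LineDecoder are hypothetical names made up for this example; in server.js the StdioServerTransport does this work for you.

```javascript
// Serialize one JSON-RPC message as a single line. JSON.stringify escapes any
// newlines inside strings, so the only raw "\n" is the delimiter.
function encodeMessage(message) {
  return JSON.stringify(message) + "\n";
}

// Accumulates raw chunks and returns each complete message as it arrives.
class LineDecoder {
  constructor() {
    this.buffer = "";
  }

  push(chunk) {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    this.buffer = lines.pop(); // keep the trailing partial line for later
    return lines
      .filter((line) => line.trim() !== "")
      .map((line) => JSON.parse(line));
  }
}

const request = { jsonrpc: "2.0", id: 1, method: "tools/list" };
const wire = encodeMessage(request);

// Simulate the pipe delivering the message in two arbitrary chunks.
const decoder = new LineDecoder();
const first = decoder.push(wire.slice(0, 10));
const second = decoder.push(wire.slice(10));
console.error(first.length, second.length, second[0].method);
```

The first push returns nothing because the line is incomplete; the second returns the parsed request. This is also why a stray write to stdout is fatal: the reader on the other end tries to parse it as a message.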
The architecture
Your app → Bifrost (port 8080) → spawns MCP server via stdio → tool runs locally → result returned
Bifrost is both the LLM gateway and the MCP client. It spawns your server process, hands it tool requests from qwen3.5, and passes results back. The app just sends a chat request and gets a response that includes the tool results.
Prerequisites
Everything from Part 1 still running:
- Node.js 18+ (node --version)
- Ollama with qwen3.5 (ollama serve)
- Bifrost (npx -y @maximhq/bifrost)
Plus the MCP SDK:
mkdir ~/my-local-mcp && cd ~/my-local-mcp
npm init -y
npm install @modelcontextprotocol/sdk zod
Add "type": "module" to package.json for ES module syntax:
{
"name": "my-local-mcp",
"version": "1.0.0",
"type": "module",
"dependencies": {
"@modelcontextprotocol/sdk": "^1.11.0",
"zod": "^3.24.4"
}
}
Step 1: Build the server
Create server.js in ~/my-local-mcp/:
#!/usr/bin/env node
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";
import { execSync } from "child_process";
import os from "os";
const server = new McpServer({
name: "local-host-tools",
version: "1.0.0",
});
// Tool 1: system info
server.tool(
"get_system_info",
"Returns CPU, memory, and OS info from the local machine",
{},
async () => {
const info = {
platform: os.platform(),
arch: os.arch(),
hostname: os.hostname(),
totalMemoryGB: (os.totalmem() / 1e9).toFixed(2),
freeMemoryGB: (os.freemem() / 1e9).toFixed(2),
cpus: os.cpus().length,
uptime_hours: (os.uptime() / 3600).toFixed(1),
};
return {
content: [{ type: "text", text: JSON.stringify(info, null, 2) }],
};
}
);
// Tool 2: safe shell commands (allowlisted only)
server.tool(
"run_command",
"Runs a read-only shell command and returns output",
{ command: z.string().describe("The shell command to run") },
async ({ command }) => {
const allowed = ["df", "du", "ls", "pwd", "whoami", "date", "uptime", "ollama"];
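const base = command.trim().split(/\s+/)[0];
// execSync runs the string through a shell, so reject shell metacharacters too.
// Otherwise "ls; rm -rf ~" would pass the allowlist check on "ls".
if (/[;&|`$<>(){}\\\n]/.test(command)) {
return {
content: [{ type: "text", text: "Shell metacharacters are not allowed." }],
isError: true,
};
}
if (!allowed.includes(base)) {
return {
content: [{ type: "text", text: `Command '${base}' not in allowlist.` }],
isError: true,
};
}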
try {
const output = execSync(command, { encoding: "utf8", timeout: 5000 });
return { content: [{ type: "text", text: output }] };
} catch (err) {
return { content: [{ type: "text", text: `Error: ${err.message}` }], isError: true };
}
}
);
// Tool 3: calculator
server.tool(
"calculator",
"Performs basic arithmetic",
{
operation: z.enum(["add", "subtract", "multiply", "divide"]),
a: z.number(),
b: z.number(),
},
async ({ operation, a, b }) => {
const results = {
add: a + b,
subtract: a - b,
multiply: a * b,
divide: b !== 0 ? a / b : "cannot divide by zero"
};
return { content: [{ type: "text", text: String(results[operation]) }] };
}
);
// Critical: never use console.log() in a stdio MCP server.
// It writes to stdout, the same pipe the JSON-RPC stream uses, and corrupts it.
// Use console.error() for any debug output.
const transport = new StdioServerTransport();
await server.connect(transport);
console.error("MCP server running on stdio");
Three tools: get_system_info reads OS, CPU, and memory stats; run_command runs allowlisted shell commands; calculator does arithmetic.
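A note on run_command: checking only the first word is a thin guard when the whole string is handed to a shell. One possible hardening, sketched here under the assumption that simple whitespace-separated arguments are enough, is to split the command yourself and run it without a shell. parseAllowedCommand is a hypothetical helper for illustration, not part of the SDK.

```javascript
// Same allowlist as run_command in server.js.
const ALLOWED = ["df", "du", "ls", "pwd", "whoami", "date", "uptime", "ollama"];

// Returns { file, args } for an allowlisted command, or null if rejected.
// Naive whitespace splitting: quoted arguments are not supported.
function parseAllowedCommand(command, allowed = ALLOWED) {
  const [file, ...args] = command.trim().split(/\s+/);
  if (!allowed.includes(file)) return null;
  return { file, args };
}

console.error(parseAllowedCommand("df -h"));
console.error(parseAllowedCommand("rm -rf /tmp/x"));
console.error(parseAllowedCommand("ls; whoami"));
```

The parsed result can then go to execFileSync(file, args, { encoding: "utf8", timeout: 5000 }) from child_process, which does not start a shell by default, so characters like ; or $( ) reach the program as literal arguments instead of being interpreted.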
The one rule you can’t break with stdio servers: never call console.log(). It writes to stdout, which is the same pipe MCP uses for JSON-RPC. One stray log call corrupts the stream and breaks everything silently. Send all debug output to stderr with console.error().
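If you want more than ad hoc console.error() calls, a tiny stderr-only logger keeps the habit in one place. This is a minimal sketch; formatLog and debug are hypothetical helper names, not SDK functions.

```javascript
// Formats one log line. Kept pure so it is easy to check.
function formatLog(level, message, now = new Date()) {
  return `[${now.toISOString()}] [${level}] ${message}\n`;
}

// Writes to stderr only, leaving stdout free for the JSON-RPC stream.
function debug(message) {
  process.stderr.write(formatLog("debug", message));
}

debug("MCP server starting");
```

Routing every log call through one function also makes it easy to confirm that nothing in the server touches stdout.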
Step 2: Test the server alone
Before touching Bifrost, verify the server works on its own with the MCP Inspector:
cd ~/my-local-mcp
npx @modelcontextprotocol/inspector node server.js
That opens a browser UI where you can call each tool and inspect the raw JSON. The Inspector prints its URL on startup: localhost:5173 on older releases, localhost:6274 on newer ones. If get_system_info returns your stats and calculator does the math, the server is good.
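The raw JSON you see in the Inspector follows the MCP message shapes. As a rough sketch, here is what a tools/call request and a text result look like as plain objects. buildToolCall and textFromResult are hypothetical helpers written for this illustration.

```javascript
let nextId = 1;

// Builds a JSON-RPC 2.0 request asking the server to run one tool.
function buildToolCall(name, args = {}) {
  return {
    jsonrpc: "2.0",
    id: nextId++,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// Pulls the text parts out of a tool result shaped like the ones server.js returns.
function textFromResult(result) {
  return result.content
    .filter((part) => part.type === "text")
    .map((part) => part.text)
    .join("\n");
}

const call = buildToolCall("calculator", { operation: "multiply", a: 1847, b: 93 });
console.error(JSON.stringify(call));

// What the calculator tool in server.js would return for that call.
const sampleResult = { content: [{ type: "text", text: String(1847 * 93) }] };
console.error(textFromResult(sampleResult));
```

Seeing the shapes side by side makes the Inspector output easier to read: the arguments object matches the zod schema you declared, and the result is always a content array.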
Step 3: Register the server in Bifrost
With Bifrost running, register the server via the API:
curl -X POST http://localhost:8080/api/mcp/client \
-H "Content-Type: application/json" \
-d '{
"name": "local-host-tools",
"connection_type": "stdio",
"stdio_config": {
"command": "node",
"args": ["/Users/YOUR_USERNAME/my-local-mcp/server.js"]
}
}'
Replace the path with your absolute path (run pwd in the my-local-mcp folder to get it).
Or use the web UI at http://localhost:8080:
- MCP Clients in the sidebar
- Add MCP Client
- Connection type stdio
- Command node, args set to the full path of server.js
Confirm Bifrost registered it:
curl http://localhost:8080/api/mcp/clients
You should see local-host-tools in the response.
Step 4: Let qwen3.5 use the tools
Send requests that let the model reach for your local tools.
Ask about the machine
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "ollama/qwen3.5:latest",
"messages": [
{"role": "user", "content": "What are the system specs on this machine? Use the get_system_info tool."}
],
"mcp_servers": ["local-host-tools"]
}'
Run a shell command
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "ollama/qwen3.5:latest",
"messages": [
{"role": "user", "content": "How much disk space is available? Use the run_command tool with df -h."}
],
"mcp_servers": ["local-host-tools"]
}'
Do some math
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "ollama/qwen3.5:latest",
"messages": [
{"role": "user", "content": "What is 1,847 multiplied by 93? Use the calculator tool."}
],
"mcp_servers": ["local-host-tools"]
}'
qwen3.5 reasons about which tool to use, calls it through Bifrost, gets the result, and folds it into the response. All local.
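The same requests can come from application code instead of curl. The sketch below mirrors the JSON body used above and assumes the gateway answers in the OpenAI-style chat format, with the text at choices[0].message.content. buildChatRequest and ask are hypothetical helper names, and RUN_DEMO is just a guard invented for this example so the script only touches the network when asked.

```javascript
const BIFROST_URL = "http://localhost:8080/v1/chat/completions";

// Builds the same JSON body the curl examples send.
function buildChatRequest(prompt, servers = ["local-host-tools"]) {
  return {
    model: "ollama/qwen3.5:latest",
    messages: [{ role: "user", content: prompt }],
    mcp_servers: servers,
  };
}

// Sends the request and returns the assistant text.
// Assumes an OpenAI-style response with choices[0].message.content.
async function ask(prompt) {
  const res = await fetch(BIFROST_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildChatRequest(prompt)),
  });
  if (!res.ok) throw new Error(`Bifrost returned HTTP ${res.status}`);
  const data = await res.json();
  return data.choices?.[0]?.message?.content ?? "";
}

// Only hit the network when explicitly asked, e.g. RUN_DEMO=1 node ask.js
if (process.env.RUN_DEMO === "1") {
  ask("What is 1,847 multiplied by 93? Use the calculator tool.")
    .then((text) => console.log(text))
    .catch((err) => console.error(err.message));
}
```

This is a client, not the MCP server, so console.log() is fine here. The global fetch used above ships with Node.js 18+.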
What happened on that last call
- Your app sent a plain chat request to Bifrost
- Bifrost forwarded it to qwen3.5 via Ollama with the tool schemas attached
- qwen3.5 picked a tool and its arguments
- Bifrost spawned server.js and sent it the tool call over stdio
- The server ran the tool and returned the result
- Bifrost handed the result back to qwen3.5
- qwen3.5 wrote a natural language response around it
- Your app got back a normal chat response
Every step ran on your laptop. Nothing left the machine.
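The sequence above is a loop: ask the model, run any tool it requests, feed the result back, repeat until it answers in plain text. Here is a toy simulation of that loop with no network and no real model. It is not Bifrost's implementation; fakeModel, tools, and runChat are all stand-ins invented for illustration.

```javascript
// Stand-in for server.js: tool name -> handler returning an MCP-style result.
const tools = {
  calculator: async ({ operation, a, b }) => {
    const ops = { add: a + b, subtract: a - b, multiply: a * b };
    return { content: [{ type: "text", text: String(ops[operation]) }] };
  },
};

// Stand-in for qwen3.5: asks for a tool first, then answers using its result.
async function fakeModel(messages) {
  const toolMessage = messages.find((m) => m.role === "tool");
  if (!toolMessage) {
    return {
      toolCall: {
        name: "calculator",
        arguments: { operation: "multiply", a: 1847, b: 93 },
      },
    };
  }
  return { content: `The answer is ${toolMessage.content}.` };
}

// Stand-in for the gateway: loop until the model stops requesting tools.
async function runChat(userPrompt, maxSteps = 5) {
  const messages = [{ role: "user", content: userPrompt }];
  for (let step = 0; step < maxSteps; step++) {
    const reply = await fakeModel(messages);
    if (!reply.toolCall) return reply.content;
    const handler = tools[reply.toolCall.name];
    if (!handler) throw new Error(`Unknown tool: ${reply.toolCall.name}`);
    const result = await handler(reply.toolCall.arguments);
    messages.push({ role: "tool", content: result.content[0].text });
  }
  throw new Error("Too many tool steps");
}

const answerPromise = runChat("What is 1,847 multiplied by 93?");
answerPromise.then((answer) => console.error(answer));
```

The maxSteps cap is a design choice worth copying into anything real: without it, a model that keeps requesting tools would loop forever.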
Extending the server
The pattern scales to anything you can write a function for. A few that earn their place in a home lab:
List locally available Ollama models:
server.tool(
"list_ollama_models",
"Lists all locally available Ollama models",
{},
async () => {
const output = execSync("ollama list", { encoding: "utf8" });
return { content: [{ type: "text", text: output }] };
}
);
Read a local file (add import fs from "fs"; to the top of server.js alongside the other imports):
server.tool(
"read_file",
"Reads the contents of a local text file",
{ path: z.string().describe("Absolute path to the file") },
async ({ path }) => {
const content = fs.readFileSync(path, "utf8");
return { content: [{ type: "text", text: content }] };
}
);
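As written, read_file will read any path the model asks for. If that is broader than you want, one option is to confine reads to a single directory. The sketch below assumes a hypothetical root of /tmp/mcp-readable and a hypothetical helper named resolveInsideRoot. It does not follow symlinks, so a stricter version would resolve them first with fs.realpathSync.

```javascript
import path from "node:path";

// Hypothetical root: only files under this directory may be read.
const ALLOWED_ROOT = path.resolve("/tmp/mcp-readable");

// Returns the resolved absolute path if it is inside root, otherwise null.
function resolveInsideRoot(requested, root = ALLOWED_ROOT) {
  const resolved = path.resolve(root, requested);
  const relative = path.relative(root, resolved);
  const escapes =
    relative === ".." ||
    relative.startsWith(".." + path.sep) ||
    path.isAbsolute(relative);
  return escapes ? null : resolved;
}

console.error(resolveInsideRoot("notes/todo.txt"));
console.error(resolveInsideRoot("../secret"));
console.error(resolveInsideRoot("/etc/passwd"));
```

Inside the tool handler you would call resolveInsideRoot(path) first and return an isError result when it comes back null, the same way run_command rejects commands outside its allowlist.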
Hit a local service:
server.tool(
"check_bifrost_health",
"Checks if Bifrost gateway is healthy",
{},
async () => {
const res = await fetch("http://localhost:8080/health");
const text = await res.text();
return { content: [{ type: "text", text: text }] };
}
);
Anything you can wrap in a function, you can expose to the model.
What’s next
The full local stack is built:
- Bifrost for routing, observability, and fallbacks
- Ollama + qwen3.5 as the model, offline and free
- MCP as the tool layer with real access to your environment
From here the natural directions are HTTP-based MCP servers so tools can be shared across machines, multi-agent patterns where one model orchestrates others, and mapping this same shape to production. The patterns hold at scale; the numbers just get bigger.
More notes coming.
Anthony Mineer · anthonymineer.me