The API in Front of the AI: Part 2

Filed under: Cloud Engineering · AI Infrastructure · Local Lab

Where Part 1 left off

In Part 1 we got Bifrost running locally, wired up Ollama with qwen3.5, and confirmed the stack end to end: requests through the gateway, streaming, tool calling. This post adds MCP, the Model Context Protocol. Part 1 gave the model a reliable connection. This part gives it tools. By the end you’ll have a local MCP server exposing real capabilities (system info, allowlisted shell commands, math) connected through Bifrost so qwen3.5 can run them. Still no cloud. ...

March 31, 2026 · 7 min · Anthony Mineer

The API in Front of the AI

Filed under: Cloud Engineering · AI Infrastructure · Local Lab

The problem

You wire an Ollama model into an app and it works. A couple months later you’ve got five apps, a few models, and no idea what’s calling what, how often, or what it’s doing to your GPU. That’s the gap an LLM gateway fills. This is Part 1 of a two-part series. Here we cover what a gateway does, why it’s worth running, and how to get Bifrost talking to Ollama qwen3.5 on your Mac. Fully local. ...

March 24, 2026 · 5 min · Anthony Mineer