Building LLM Agents: Function Calling, Structured Output, and MCP

Build LLM agents from scratch — function calling, the ReAct pattern, structured JSON output, and the Model Context Protocol (MCP). Three notebooks covering the complete agent stack.

A language model by itself is a frozen snapshot of its training data — it can’t check the weather, run code, or query a database. Function calling (tool use) changes that by letting the model emit structured requests to external tools, receive results, and reason about what to do next. Stack this in a loop and you get an agent: an LLM that plans, acts, observes, and iterates until the task is done.

We’ll build function calling from scratch, implement the ReAct agent pattern, add structured output for reliable JSON generation, and explore the Model Context Protocol (MCP) — the emerging standard for connecting LLMs to tools.

Function Calling: Giving LLMs Hands

The core idea is simple: you describe available tools as JSON schemas, the model decides which tool to call and with what arguments, your code executes the tool, and you feed the result back to the model. The model never actually runs any code — it just emits structured text that your application interprets as a function call.

# Define tools as JSON schemas
TOOLS = [
    {
        "name": "get_weather",
        "description": "Get current weather for a location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City name"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
            },
            "required": ["location"]
        }
    },
    {
        "name": "calculator",
        "description": "Evaluate a math expression",
        "parameters": {
            "type": "object",
            "properties": {
                "expression": {"type": "string", "description": "Math expression"}
            },
            "required": ["expression"]
        }
    }
]

The model sees the tool descriptions in its system prompt and learns to emit calls in a specific format. The notebook implements tool execution, response parsing, and the full request-call-observe loop using a local model — no API keys needed.
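On the application side, executing a parsed call is plain dictionary dispatch. Here is a minimal sketch, assuming calls arrive as `{"tool": ..., "arguments": ...}` JSON (the exact wire format varies by provider); `get_weather` is a stub, and `TOOL_REGISTRY` stands in for whatever tool table your application maintains:

```python
import json

def get_weather(location, unit="celsius"):
    # Stub for illustration; a real tool would call a weather API.
    return f"22 degrees {unit} and sunny in {location}"

def calculator(expression):
    # eval() is unsafe on untrusted input; restrict to arithmetic in production.
    return str(eval(expression, {"__builtins__": {}}))

TOOL_REGISTRY = {"get_weather": get_weather, "calculator": calculator}

def execute_tool(name, args, registry):
    """Dispatch a parsed tool call to its Python implementation."""
    if name not in registry:
        return f"Error: unknown tool '{name}'"
    try:
        return registry[name](**args)
    except TypeError as e:
        return f"Error: bad arguments for {name}: {e}"

# The model emits structured text; the application parses and dispatches it.
call = json.loads('{"tool": "calculator", "arguments": {"expression": "2 + 2"}}')
print(execute_tool(call["tool"], call["arguments"], TOOL_REGISTRY))  # → 4
```

Returning errors as strings (rather than raising) matters here: the error text goes back to the model as an observation, giving it a chance to correct the call.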

The ReAct Pattern: Reason + Act

ReAct (Yao et al., 2022) is the most widely used agent pattern. The model alternates between reasoning (“I need to find the current population of Tokyo”) and acting (“call search_web with query="Tokyo population 2024"”). After each action, it observes the result and decides whether to take another action or give a final answer.

def react_agent(question, tools, max_steps=5):
    """ReAct agent: Thought → Action → Observation loop."""
    messages = [{"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": question}]

    for step in range(max_steps):
        response = generate(messages)
        messages.append({"role": "assistant", "content": response})

        if "FINAL ANSWER:" in response:
            return extract_answer(response)

        tool_name, tool_args = parse_tool_call(response)
        if tool_name is None:  # No parsable Action — ask the model to retry
            messages.append({"role": "user",
                             "content": "Observation: no valid Action found."})
            continue

        result = execute_tool(tool_name, tool_args, tools)
        messages.append({"role": "user",
                         "content": f"Observation: {result}"})

    return "Agent reached max steps without a final answer."

The key insight is that the model’s reasoning trace (the “Thought” step) dramatically improves tool selection accuracy. Without it, models often call the wrong tool or pass incorrect arguments. With explicit reasoning, the model breaks down the problem before acting — and you get interpretable logs showing exactly why each decision was made.
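The helpers the loop above relies on (SYSTEM_PROMPT, parse_tool_call, extract_answer) are straightforward. Here is one plausible sketch — the Thought/Action format shown is an assumption for illustration, not necessarily the notebook's exact prompt:

```python
import json
import re

# One plausible ReAct prompt format; the notebook's exact wording may differ.
SYSTEM_PROMPT = """Answer by alternating Thought and Action steps.
Format:
Thought: <your reasoning>
Action: <tool_name>(<json arguments>)
When you know the answer, reply:
FINAL ANSWER: <answer>"""

def parse_tool_call(response):
    """Extract the tool name and JSON arguments from an Action line."""
    match = re.search(r'Action:\s*(\w+)\((.*)\)', response)
    if not match:
        return None, None
    name = match.group(1)
    raw_args = match.group(2).strip()
    args = json.loads(raw_args) if raw_args else {}
    return name, args

def extract_answer(response):
    """Return everything after the FINAL ANSWER marker."""
    return response.split("FINAL ANSWER:", 1)[1].strip()
```

Returning `(None, None)` on a parse failure lets the loop feed a corrective observation back to the model instead of crashing on malformed output.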

Structured Output: Reliable JSON from LLMs

Agents need the model to produce structured data — tool calls as JSON, not free-form text. But LLMs are trained to generate fluent prose, and even a single stray character breaks json.loads(). The structured output notebook covers four layers of defense: regex extraction with repair heuristics, schema validation, constrained decoding (where the model is forced to follow a grammar), and outlines/guidance libraries that guarantee valid output.

import json
import re

def extract_json(text):
    """Extract JSON from LLM output with repair heuristics."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass

    # Fall back to the first {...} span, then repair common LLM mistakes
    match = re.search(r'\{[^{}]*\}', text)
    if match:
        candidate = match.group()
        candidate = candidate.replace("'", '"')       # Single quotes
        candidate = re.sub(r',\s*}', '}', candidate)  # Trailing commas
        try:
            return json.loads(candidate)
        except json.JSONDecodeError:
            pass
    return None

For production systems, constrained decoding is the gold standard — it modifies the model’s sampling to only allow tokens that produce valid output. The notebook implements a finite-state machine approach and benchmarks it against regex-based extraction.
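To make the idea concrete, here is a toy character-level version of constrained decoding: a hand-written FSM for the pattern `{"n": <digit>}` and a fake model distribution. This is a sketch of the principle only — real systems such as the outlines library compile a JSON schema into an automaton over the tokenizer's full vocabulary, not individual characters:

```python
# State -> {allowed character: next state}. State 8 (empty dict) is accepting.
FSM = {
    0: {'{': 1},
    1: {'"': 2},
    2: {'n': 3},
    3: {'"': 4},
    4: {':': 5},
    5: {' ': 6},
    6: {c: 7 for c in "0123456789"},
    7: {'}': 8},
    8: {},
}

def constrained_sample(probs, state):
    """Mask out every character the FSM forbids, then decode greedily."""
    candidates = [(c, p) for c, p in probs.items() if c in FSM[state]]
    char = max(candidates, key=lambda kv: kv[1])[0]
    return char, FSM[state][char]

def fake_logits(state):
    """Stand-in for a model that prefers prose over JSON."""
    probs = {c: 0.01 for c in '{"n: }0123456789T'}
    probs['T'] = 0.9  # unconstrained, it would start "The answer is..."
    probs['7'] = 0.5  # among digits, it prefers 7
    return probs

state, out = 0, ""
while FSM[state]:                      # stop at the accepting state
    char, state = constrained_sample(fake_logits(state), state)
    out += char

print(out)  # → {"n": 7}
```

Note that the model's preferences still matter wherever the grammar allows a choice (it picked 7 among the digits) — constrained decoding only removes the invalid options.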

MCP: The Standard for Tool Integration

The Model Context Protocol (MCP) is an open standard that solves the N×M problem of connecting LLMs to tools. Without MCP, every LLM provider implements tool use differently, and every tool needs a custom integration for each provider. MCP defines a universal protocol: a Host (the LLM application) talks to Clients, which connect to Servers that expose tools, resources, and prompts over JSON-RPC 2.0.

The notebook implements an MCP server from scratch — a weather tool server that exposes tools/list and tools/call endpoints. It also covers MCP transports (stdio for local servers, SSE for remote), capability negotiation, and how production frameworks like LangChain, LlamaIndex, and CrewAI are adopting MCP as their tool integration layer.
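The core of such a server is a JSON-RPC dispatch table. Here is a simplified sketch — the field names follow the MCP tools/list and tools/call shapes, but this omits the initialize handshake, transports, and most of the error handling the spec requires:

```python
import json

def get_weather(location, unit="celsius"):
    return f"22 degrees {unit} in {location}"  # stub tool

MCP_TOOLS = [{
    "name": "get_weather",
    "description": "Get current weather for a location",
    "inputSchema": {
        "type": "object",
        "properties": {"location": {"type": "string"}},
        "required": ["location"],
    },
}]

def handle_request(raw):
    """Handle one JSON-RPC 2.0 request string; return the response dict."""
    req = json.loads(raw)
    if req["method"] == "tools/list":
        result = {"tools": MCP_TOOLS}
    elif req["method"] == "tools/call":
        text = get_weather(**req["params"]["arguments"])
        result = {"content": [{"type": "text", "text": text}]}
    else:
        return {"jsonrpc": "2.0", "id": req["id"],
                "error": {"code": -32601, "message": "Method not found"}}
    return {"jsonrpc": "2.0", "id": req["id"], "result": result}

resp = handle_request(json.dumps({
    "jsonrpc": "2.0", "id": 1, "method": "tools/call",
    "params": {"name": "get_weather", "arguments": {"location": "Tokyo"}},
}))
print(resp["result"]["content"][0]["text"])
```

Over the stdio transport, the same handler would simply read one JSON-RPC message per line from stdin and write responses to stdout.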

Agent Patterns Beyond ReAct

ReAct is just the starting point. The notebooks also cover: multi-step planning (decompose a complex task into subtasks before executing), reflection (the agent evaluates its own output and retries if needed), tool chaining (the output of one tool becomes the input to the next), and multi-agent systems (multiple specialized agents collaborating on a task). Each pattern builds on the same foundation: an LLM in a loop with access to tools and memory.
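As one example, the reflection pattern fits in a few lines. In this sketch, generate and critique stand in for LLM calls — both names and the critique-as-string convention are placeholders, not the notebooks' exact API:

```python
def reflect_agent(task, generate, critique, max_retries=3):
    """Generate an answer, ask the model to critique it, retry if rejected."""
    draft = generate(task)
    for _ in range(max_retries):
        verdict = critique(task, draft)  # e.g. "OK" or a list of problems
        if verdict == "OK":
            return draft
        # Feed the critique back so the next draft can address it
        draft = generate(f"{task}\nPrevious attempt: {draft}\nFix: {verdict}")
    return draft  # best effort after max_retries
```

The same skeleton generalizes: swap critique for a planner and you get multi-step planning; swap it for a second specialized model and you are most of the way to a multi-agent system.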

What to Do Next

The three notebooks cover function calling from scratch, the ReAct agent pattern, structured output techniques, constrained decoding, MCP server implementation, and modern agent frameworks. All run on a free Colab T4.

Function Calling notebook | Structured Output notebook | MCP & Agentic Protocols notebook

Next in this series: GPU Fundamentals for LLM Engineers — CUDA, VRAM management, and what actually matters for training and inference performance.

This post is part of TheAiSingularity’s LLM Engineering Course — 64 notebooks, 20 capstone projects, fully open source.
