Multi-call prompting (MCP) orchestrates Large Language Models (LLMs) through multiple sequential or parallel tool calls to perform complex tasks. A critical element of a successful MCP implementation is the precise formatting of tool outputs. Inspired by Anthropic’s best practices, this guide provides guidelines for generating structured, parseable tool-call outputs so that even smaller models reliably recognize when a tool response is complete.
Structured formats like JSON or XML greatly reduce the ambiguity in tool outputs, offering clear start and end boundaries to prevent LLM misinterpretation or conversational drift.
{"result": "value", "error": null}
<tool_output>...content...</tool_output>
Structured formats prevent LLMs from generating extraneous text, reducing parsing errors and clearly marking tool outputs.
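As a rough sketch, a small Python helper could produce either envelope before the result is handed back to the model; the format_tool_output functions below are hypothetical, not part of any particular framework:

```python
import json

def format_tool_output(result, error=None):
    """Wrap a tool result in a fixed JSON envelope with explicit result/error fields."""
    return json.dumps({"result": result, "error": error})

def format_tool_output_xml(content):
    """Alternatively, wrap the content in explicit XML-style boundary tags."""
    return f"<tool_output>{content}</tool_output>"

print(format_tool_output("value"))              # {"result": "value", "error": null}
print(format_tool_output_xml("...content..."))  # <tool_output>...content...</tool_output>
```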
Always position tool responses immediately following tool calls, clearly isolated from conversational context:
"Here are the results:" { ... }Correct format:
Tool output: { ... }
Tool output: {"temperature": 18, "unit": "celsius"}
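As an illustration, the sketch below shows this ordering in an Anthropic-style messages array; the block types, tool_use_id, and content shapes here are assumptions for the example, not a specific provider's schema:

```python
messages = [
    {"role": "user", "content": "What's the weather in London?"},
    {
        "role": "assistant",
        "content": [
            {"type": "tool_use", "id": "tool_123",
             "name": "get_weather", "input": {"location": "London"}},
        ],
    },
    {
        # The tool result comes back in the very next turn, with no
        # conversational framing around the structured payload.
        "role": "user",
        "content": [
            {"type": "tool_result", "tool_use_id": "tool_123",
             "content": '{"temperature": 18, "unit": "celsius"}'},
        ],
    },
]
```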
LLMs reliably mirror the examples they are given. Include well-formatted inline examples or partially filled response templates in your prompts:
Assistant response example: {"value": 42, "unit": "degrees"}
Assistant (prefill): {"sentiment":
This technique leverages the LLM's autocompletion abilities and significantly improves format adherence.
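A minimal sketch of prefilling, assuming a generic chat-style messages list; the sentiment task and field names are illustrative:

```python
messages = [
    {"role": "user", "content": 'Classify the sentiment of: "Great tool!" '
                                'Respond as JSON with "sentiment" and "confidence" fields.'},
    # Prefilling the start of the assistant turn constrains the model
    # to continue the JSON object rather than add conversational text.
    {"role": "assistant", "content": '{"sentiment":'},
]
```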
Use consistent field names across calls and include identifiers so that multiple concurrent or sequential tool calls can be matched to their results:
{"tool_use_id": "tool_123", "result": "15°C"}
Watch out for common formatting mistakes that cause parsing or model hallucination issues:
"The tool says: 42"Correct:
{"result": 42}
A clear and effective MCP example using structured formats:
User prompt: "What's the weather in London? Answer in JSON."
Assistant tool call: <tool_call name="get_weather">{"location": "London", "unit": "celsius"}</tool_call>
Tool execution and response:
Tool output: {"temperature": 18, "unit": "celsius"}
Assistant final answer: {"temperature": 18, "unit": "celsius", "location": "London"}
This example demonstrates immediate and isolated output presentation, precise schema adherence, and no extraneous content.
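For reference, here is a runnable Python sketch of the same flow; the <tool_call> tag format is taken from the example above, while get_weather and the dispatch table are stubs assumed for illustration:

```python
import json
import re

# Stand-in for a real weather lookup; the function and its return shape
# are assumptions for this sketch.
def get_weather(location: str, unit: str = "celsius") -> dict:
    return {"temperature": 18, "unit": unit}

tools = {"get_weather": get_weather}

assistant_turn = '<tool_call name="get_weather">{"location": "London", "unit": "celsius"}</tool_call>'

# 1. Extract the tool call from the assistant's turn.
match = re.search(r'<tool_call name="(\w+)">(.*?)</tool_call>', assistant_turn)
tool_name, args = match.group(1), json.loads(match.group(2))

# 2. Execute the tool and emit the structured output with no extra prose.
output = tools[tool_name](**args)
print("Tool output:", json.dumps(output))

# 3. The final answer follows the same schema, extended with the location.
final_answer = {**output, "location": args["location"]}
print(json.dumps(final_answer))
```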
By adopting structured output formats, clearly delineating responses, providing explicit examples, and maintaining consistency, you create a robust and standardized communication framework between LLMs and external tools. These best practices ensure predictable, parseable outputs and significantly enhance the reliability and clarity of multi-call prompting, even when employing smaller language models.