Multi-call prompting (MCP) orchestrates Large Language Models (LLMs) through multiple sequential or parallel tool calls to perform complex tasks. A critical element of a successful MCP implementation is the precise formatting of tool outputs. Inspired by Anthropic’s best practices, this guide provides guidelines for generating structured, parseable tool-call outputs so that even smaller models reliably recognize when a tool response is complete.
Structured formats like JSON or XML greatly reduce the ambiguity in tool outputs, offering clear start and end boundaries to prevent LLM misinterpretation or conversational drift.
{"result": "value", "error": null}
<tool_output>...content...</tool_output>
Structured formats prevent LLMs from generating extraneous text, reducing parsing errors and clearly marking tool outputs.
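As a rough sketch, a small Python helper could produce either envelope before the result is handed back to the model; the format_tool_output functions below are hypothetical, not part of any particular framework:

```python
import json

def format_tool_output(result, error=None):
    """Wrap a tool result in a fixed JSON envelope with explicit result/error fields."""
    return json.dumps({"result": result, "error": error})

def format_tool_output_xml(content):
    """Alternatively, wrap the content in explicit XML-style boundary tags."""
    return f"<tool_output>{content}</tool_output>"

print(format_tool_output("value"))              # {"result": "value", "error": null}
print(format_tool_output_xml("...content..."))  # <tool_output>...content...</tool_output>
```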
Always position tool responses immediately following tool calls, clearly isolated from conversational context:
"Here are the results:" { ... }Correct format:
Tool output: { ... }
Tool output: {"temperature": 18, "unit": "celsius"}
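As an illustration, the sketch below shows this ordering in an Anthropic-style messages array; the block types, tool_use_id, and content shapes here are assumptions for the example, not a specific provider's schema:

```python
messages = [
    {"role": "user", "content": "What's the weather in London?"},
    {
        "role": "assistant",
        "content": [
            {"type": "tool_use", "id": "tool_123",
             "name": "get_weather", "input": {"location": "London"}},
        ],
    },
    {
        # The tool result comes back in the very next turn, with no
        # conversational framing around the structured payload.
        "role": "user",
        "content": [
            {"type": "tool_result", "tool_use_id": "tool_123",
             "content": '{"temperature": 18, "unit": "celsius"}'},
        ],
    },
]
```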
LLMs reliably mirror the examples they are given. Include well-formatted inline examples or partially filled response templates in your prompts:
Assistant response example: {"value": 42, "unit": "degrees"}
Assistant (prefill): {"sentiment":
This technique leverages the LLM's autocompletion abilities and significantly improves format adherence.
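A minimal sketch of prefilling, assuming a generic chat-style messages list; the sentiment task and field names are illustrative:

```python
messages = [
    {"role": "user", "content": 'Classify the sentiment of: "Great tool!" '
                                'Respond as JSON with "sentiment" and "confidence" fields.'},
    # Prefilling the start of the assistant turn constrains the model
    # to continue the JSON object rather than add conversational text.
    {"role": "assistant", "content": '{"sentiment":'},
]
```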
Use consistent field names across calls and include identifiers so that multiple concurrent or sequential tool calls can be matched to their results:
{"tool_use_id": "tool_123", "result": "15°C"}
Watch out for common formatting mistakes that cause parsing or model hallucination issues:
"The tool says: 42"Correct:
{"result": 42}
A clear and effective MCP example using structured formats:
User prompt: "What's the weather in London? Answer in JSON."
Assistant tool call: <tool_call name="get_weather">{"location": "London", "unit": "celsius"}</tool_call>
Tool execution and response:
Tool output: {"temperature": 18, "unit": "celsius"}
Assistant final answer: {"temperature": 18, "unit": "celsius", "location": "London"}
This example demonstrates immediate and isolated output presentation, precise schema adherence, and no extraneous content.
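For reference, here is a runnable Python sketch of the same flow; the <tool_call> tag format is taken from the example above, while get_weather and the dispatch table are stubs assumed for illustration:

```python
import json
import re

# Stand-in for a real weather lookup; the function and its return shape
# are assumptions for this sketch.
def get_weather(location: str, unit: str = "celsius") -> dict:
    return {"temperature": 18, "unit": unit}

tools = {"get_weather": get_weather}

assistant_turn = '<tool_call name="get_weather">{"location": "London", "unit": "celsius"}</tool_call>'

# 1. Extract the tool call from the assistant's turn.
match = re.search(r'<tool_call name="(\w+)">(.*?)</tool_call>', assistant_turn)
tool_name, args = match.group(1), json.loads(match.group(2))

# 2. Execute the tool and emit the structured output with no extra prose.
output = tools[tool_name](**args)
print("Tool output:", json.dumps(output))

# 3. The final answer follows the same schema, extended with the location.
final_answer = {**output, "location": args["location"]}
print(json.dumps(final_answer))
```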
By adopting structured output formats, clearly delineating responses, providing explicit examples, and maintaining consistency, you create a robust and standardized communication framework between LLMs and external tools. These best practices ensure predictable, parseable outputs and significantly enhance the reliability and clarity of multi-call prompting, even when employing smaller language models.