How it works
Tool calling introduced us to the basic mechanics of defining a tool function and supplying it to the chat model, which can then use it to answer user prompts. In this article, we take a step back and outline how tool calling actually works under the hood; understanding this is important for getting the most out of it.
Models equipped with tools are also known as “agentic” models, because they can make decisions about when and how to use tools to gain context or perform actions. Agentic models are becoming the standard way to address LLMs’ inherent limitations, such as the inability to perform verifiable computations or to gain important context¹.
When making a chat request, the caller advertises one or more tools (defined by their function name, description, and a list of expected arguments), and the chat model can choose to respond with one or more “tool calls”. These tool calls are requests from the chat model to the caller to execute the function with the given arguments; the caller is expected to execute the functions and “return” the results by submitting another chat request with the conversation so far, plus the results. The chat model can then use those results in formulating its response, or it may decide to make additional tool calls.
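To make this concrete, here is a minimal sketch of the caller’s side of that loop. It uses the OpenAI Python SDK as one concrete example of the protocol (an assumption on our part; other providers expose the same flow under different field names), and `get_current_time` is a hypothetical tool invented for illustration:

```python
import json
from datetime import datetime
from zoneinfo import ZoneInfo

from openai import OpenAI

client = OpenAI()

# The actual tool implementation (hypothetical, for illustration).
# It runs in *our* process -- the model never executes it directly.
def get_current_time(tz: str) -> str:
    return datetime.now(ZoneInfo(tz)).isoformat()

available_tools = {"get_current_time": get_current_time}

# Advertise the tool: its name, description, and expected arguments.
tools = [{
    "type": "function",
    "function": {
        "name": "get_current_time",
        "description": "Get the current time in a given IANA timezone.",
        "parameters": {
            "type": "object",
            "properties": {
                "tz": {"type": "string", "description": "e.g. 'Europe/Paris'"},
            },
            "required": ["tz"],
        },
    },
}]

messages = [{"role": "user", "content": "What time is it in Paris right now?"}]
response = client.chat.completions.create(
    model="gpt-4o", messages=messages, tools=tools
)
message = response.choices[0].message

# Keep going as long as the model responds with tool-call requests.
while message.tool_calls:
    messages.append(message)  # the request becomes part of the conversation
    for call in message.tool_calls:
        # Execute the requested function ourselves, with the model's arguments...
        func = available_tools[call.function.name]
        result = func(**json.loads(call.function.arguments))
        # ...and "return" the result by adding it to the next chat request.
        messages.append(
            {"role": "tool", "tool_call_id": call.id, "content": result}
        )
    response = client.chat.completions.create(
        model="gpt-4o", messages=messages, tools=tools
    )
    message = response.choices[0].message

print(message.content)  # a final answer that incorporates the tool results
```

The loop structure is the important part: every tool call round-trips through the caller, and the model only ever sees the results the caller chooses to send back.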
Note that the chat model does not directly execute any external tools! It only makes requests for the caller to execute them. It’s easy to think that tool calling works like this: the user asks a question, the model quietly runs the tool itself, and an answer comes back in a single step.
But in fact it works like this: the model replies with a request to call the tool, the caller executes the tool and sends the result back in a follow-up request, and only then does the model compose its answer.
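Spelled out as a conversation transcript (again using OpenAI-style message roles as an illustrative assumption, with the same hypothetical `get_current_time` tool), the actual flow looks like this:

```python
# One complete tool-calling exchange, as the caller sees it. The model
# never runs anything itself; it only requests a call and later consumes
# the result the caller supplies.
conversation = [
    {"role": "user", "content": "What time is it in Paris right now?"},
    # The model's reply is a *request* to call the tool, not an answer:
    {"role": "assistant", "tool_calls": [{
        "id": "call_1", "type": "function",
        "function": {"name": "get_current_time",
                     "arguments": '{"tz": "Europe/Paris"}'},
    }]},
    # The caller executed the function locally and reports the result:
    {"role": "tool", "tool_call_id": "call_1",
     "content": "2025-05-01T17:32:08+02:00"},
    # Only now does the model answer, using the result it was handed:
    {"role": "assistant",
     "content": "It's currently 5:32 PM in Paris."},
]
```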
The value that the chat model brings is not in helping with execution, but in knowing when it makes sense to call a tool, what values to pass as arguments, and how to use the results in formulating its response.
Footnotes
1. Some go so far as to suggest tool calling is a superior approach to classic RAG (retrieval-augmented generation) for many use cases.