Function Calling
Function calling is a technique used with large language models (LLMs) and AI agents to enhance their ability to provide accurate and relevant responses, especially for complex tasks or questions that require specialized knowledge or external data.
We already encountered function calling in chapter 3 of this course, where we introduced agents that came with the ability to call predefined functions. In this chapter, we will go back to the basics of function calling using LLMs.
Code generation and function calling
The basic idea of function calling is to use an LLM to generate valid, executable code from the user input. That is, the user’s input is sent to the LLM together with a prompt urging it to return structured output in a specific format. This output can then be taken and executed. For this to work, the generated output must of course be valid code (in our case Python code). There are two approaches for that:
- Code generation: Here, we ask the LLM to generate a complete Python script from the user input. This approach has the advantage of being simple and straightforward, but it can be prone to errors if the LLM does not fully understand the task at hand or if it makes mistakes in its code generation. It can also pose a severe security issue because this approach hinges on running generated code on your machine.
- Function calling: Here, we ask the LLM to generate a function call from the user input. This approach has the advantage of being more robust and accurate than code generation, as it is easier for the LLM to generate a correct function call than a complete Python script. However, it requires that the functions that can be called are already defined and properly documented.
Here, we will focus on function calling. Still, the challenge is to get the LLM to generate valid output. There are two main strategies to facilitate that:
- using a large, generalized LLM (e.g. GPT-4) with good prompt engineering and
- using a smaller model fine-tuned to generate function calls.
Function definition
The first step in using function calling is to define the functions that the LLM can call. This is done by providing a JSON schema that describes the name of the function, its arguments and their types. The JSON schema should be provided to the LLM in the system prompt. Here is an example (note that this is not an executable implementation but just a description of the function for the LLM):
{
  "name": "get_current_weather",
  "description": "Get the current weather in a given location",
  "arguments": {
    "location": {"type": "string"},
    "unit": {"type": "string"}
  }
}
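If you want to actually execute such a call later on, you also need a real Python implementation alongside the description. Here is a minimal sketch: the schema is kept as a Python dict so it can be serialized into the system prompt, and the weather values are obviously made up for illustration.

```python
import random

# The JSON schema from above, kept as a Python dict.
WEATHER_SCHEMA = {
    "name": "get_current_weather",
    "description": "Get the current weather in a given location",
    "arguments": {
        "location": {"type": "string"},
        "unit": {"type": "string"},
    },
}

# The actual implementation the schema describes (dummy version).
def get_current_weather(location: str, unit: str = "celsius") -> str:
    return f"{random.randint(-10, 35)} degrees {unit} and sunny in {location}"
```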
Prompting
The second step is to provide a good prompt. The prompt should make clear to the LLM that it must generate only valid output and follow the JSON schema. Here is an example of a prompt that can be used for function calling:
You are a helpful assistant that generates function calls based on user input. Only use the functions you have been provided with.
{function definition as described above}
User: What's the weather like in Berlin?
Assistant: {
  "name": "get_current_weather",
  "arguments": {"location": "Berlin", "unit": "celsius"}
}
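Putting the pieces together, here is a minimal sketch of the full round trip, building on the `WEATHER_SCHEMA` dict from the sketch above. It assumes LM Studio’s OpenAI-compatible server is running on its default port (1234); the model name is a placeholder you may need to adapt.

```python
import json
from openai import OpenAI

# LM Studio serves an OpenAI-compatible API on port 1234 by default;
# the API key is required by the client but ignored by LM Studio.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

system_prompt = (
    "You are a helpful assistant that generates function calls based on "
    "user input. Only use the functions you have been provided with.\n\n"
    + json.dumps(WEATHER_SCHEMA, indent=2)  # the dict from the sketch above
)

response = client.chat.completions.create(
    model="local-model",  # placeholder; LM Studio uses whatever model is loaded
    temperature=0,        # deterministic output helps with strict formats
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "What's the weather like in Berlin?"},
    ],
)

# If all went well, the reply is a JSON object describing the call.
call = json.loads(response.choices[0].message.content)
print(call["name"], call["arguments"])
```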
Try it!
- Open a notebook and connect to a local LLM using LM Studio.
- Define the function `get_current_weather` as shown above.
- Write a prompt that asks the LLM to generate a function call based on user input.
- Test the prompt with an example input.
- Define other functions and try other inputs and see if the LLM generates valid output.
- Upload to Moodle.
Challenges, finetuned models and the influence of size
The main challenge here is to get the LLM to generate a valid answer. This is not always easy, as LLMs are usually not super safe coders 😃.
- They can hallucinate functions or arguments that do not exist.
- They can forget to call a function.
- They can forget to provide all required arguments.
- They can provide the wrong type for an argument.
- They can provide invalid values for an argument.
There are several strategies to mitigate these issues:
- Prompt engineering: A good prompt can help to guide the LLM towards generating valid output. This is especially true for larger models, as they have a better understanding of the world and can therefore generate more accurate responses.
- Finetuning: Finetuning a model on a specific task can improve its performance on that task. This is especially useful for smaller models, as they are less likely to hallucinate functions or arguments that do not exist.
- Size: Larger models are better at generating valid output than smaller models. However, larger models are also more expensive to run and require more computational resources.
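Whichever strategy you use, it is good practice to validate a generated call against the schema before executing anything. Here is a minimal hand-rolled sketch; in a real project a validation library such as jsonschema or Pydantic would be more robust.

```python
TYPE_MAP = {"string": str, "integer": int, "number": float, "boolean": bool}

def validate_call(call: dict, schema: dict) -> list[str]:
    """Return a list of schema violations (empty list = call looks valid)."""
    errors = []
    if call.get("name") != schema["name"]:
        errors.append(f"unknown function: {call.get('name')!r}")
    args = call.get("arguments", {})
    for name, spec in schema["arguments"].items():
        if name not in args:
            errors.append(f"missing argument: {name}")
        elif not isinstance(args[name], TYPE_MAP[spec["type"]]):
            errors.append(f"wrong type for argument: {name}")
    for name in args:
        if name not in schema["arguments"]:
            errors.append(f"hallucinated argument: {name}")
    return errors

# Usage with the schema and parsed call from the sketches above:
# if errors := validate_call(call, WEATHER_SCHEMA):
#     print("Invalid call:", errors)  # e.g. re-prompt the LLM with the errors
```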
Test it! (We can do it together if your hardware does not allow you to run the models.)
As above, but this time
- use a very small model (e.g. a small Llama model),
- use a model finetuned for the task (you could try this one), and
- use a larger model (a larger Llama in this case).
Agents recap
We introduced agents already back in chapter 3. To give a quick recap: an agent is a wrapper layer that takes the user input and pipes it to an LLM, together with a custom system prompt that allows the LLM to answer the user request better. The agent has several modules at its disposal: the memory, some tools and a planning module.
The memory module is what allows chat models to retain a memory of the past conversation with the user. This information is saved as plain text in the memory and given to the planning module (i.e. the LLM) along with the system prompt and the current user input.
The planning module then decides which tools to use, if any, to answer the user request. The output of the planning module is a response message containing one or several tool calls (or a final answer). The agent then executes the tool calls by first parsing the response, then executing the functions. Based on the tool outputs, a final answer is generated and sent back to the user.
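Conceptually, this loop can be sketched in a few lines of Python. Note that this is a simplified illustration, not the implementation of any particular framework; `llm` stands for any function that takes the message history and returns the model’s reply as a string.

```python
import json

# Tool registry: maps function names to implementations
# (get_current_weather from the earlier sketch).
TOOLS = {"get_current_weather": get_current_weather}

def run_agent(user_input: str, llm, max_iterations: int = 5) -> str:
    """Minimal agent loop: plan, execute tool calls, feed results back."""
    memory = [{"role": "user", "content": user_input}]  # plain-text memory
    for _ in range(max_iterations):
        reply = llm(memory)  # the planning module decides what to do
        try:
            call = json.loads(reply)  # structured reply => tool call
        except json.JSONDecodeError:
            return reply  # plain text => treat it as the final answer
        result = TOOLS[call["name"]](**call["arguments"])
        memory.append({"role": "tool", "content": str(result)})
    return "Sorry, I could not answer that within the iteration limit."
```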
ReAct agents
There are several types of agents. Here, we want to focus on the ReAct agent introduced by Yao et al. (2023). The ReAct agent is a type of agent that uses the ReAct framework to solve complex tasks by reasoning in multiple steps. It is based on the idea of “thought-action-observation” loops. The LLM is given a task and it generates a thought, which is then used to decide on an action. The action is executed and the observation is fed back into the LLM. This process is repeated until the LLM decides that it has enough information to answer the question or the maximum number of iterations is reached.
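To make this concrete, here is a hypothetical trace of such a loop, using the weather tool from above. The exact field names depend on the prompt template; this one follows the common Thought/Action/Observation convention:

```
Thought: The user wants the current weather, so I need to use a tool.
Action: get_current_weather
Action Input: {"location": "Berlin", "unit": "celsius"}
Observation: 21 degrees celsius and sunny in Berlin
Thought: I can answer without using any more tools.
Answer: It is currently 21 °C and sunny in Berlin.
```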
LlamaIndex
LlamaIndex is a framework that makes it easy to implement and use agents. In LlamaIndex, an agent consists of an agent runner and an agent worker. Think of the agent runner as the agent core in the architecture schematics and the agent worker as the planning module. The tools are functions implemented in Python that can be executed by the agent worker. Finally, the memory module consists of a simple text buffer logging the conversation history between the user and the agent and between the agent and the tools.
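Here is a minimal setup sketch, again assuming LM Studio’s OpenAI-compatible server on port 1234 and using the OpenAI-like LLM adapter; package names and defaults may differ depending on your llama-index version, and the model name is a placeholder.

```python
import random

from llama_index.core.agent import ReActAgent
from llama_index.core.tools import FunctionTool
from llama_index.llms.openai_like import OpenAILike

def get_current_weather(location: str, unit: str = "celsius") -> str:
    """Return (random) current weather for a given location."""
    return f"{random.randint(-10, 35)} degrees {unit} and sunny in {location}"

# The agent worker wraps this LLM as its planning module.
llm = OpenAILike(
    api_base="http://localhost:1234/v1",
    api_key="lm-studio",   # ignored by LM Studio
    model="local-model",   # placeholder for whatever model is loaded
    is_chat_model=True,
)

weather_tool = FunctionTool.from_defaults(fn=get_current_weather)
agent = ReActAgent.from_tools([weather_tool], llm=llm, verbose=True)

print(agent.get_prompts())  # inspect the ReAct system prompt
print(agent.chat("What's the weather like in Berlin?"))
```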
Let’s have a look!
- Open a notebook and connect it to a local LLM using LM Studio.
- Define a function that can be called by the agent to get the current weather in a given location. (Implement it this time; it doesn’t need to actually work, just return random weather.)
- Initialize a ReAct agent using LlamaIndex (you can use this tutorial as the starting point).
- Have a look at the prompt the agent gives to the LLM (you can find it using `agent.get_prompts()`).
- Discuss the prompt with the group! What does it do? How does it do it?
- Ask the agent to tell you the weather in a given location.
- Watch in LM Studio how the LLM called by the agent creates the thought process, function calls and the final response.
- Try to break the agent by asking stuff it cannot answer. Be creative. (On one occasion I just said “Hi” and it went into an infinite loop because it did not need a tool for that and there wasn’t a “none” tool 😜.)
- Upload to Moodle.
Further Readings
On the LlamaIndex website on the “examples” page you will find a lot of helpful material: examples, notebooks, recipes and more. I recommend having a look at them! For our case, check the “agents” section. For an even more in-depth dive, go to the “workflows” part.
- In this example you will find an agent implementation that returns a step-by-step breakdown of its thought process.
- To go even more low-level than that, see this example, which walks you through setting up a Workflow to construct a function calling agent from scratch.
- Here is a very nice paper about generating structured output.