Towards Agents: chatbots and function calling
Chatbots
So far, we covered connecting with a local LLM and getting a single response. We also discussed the difference between base models and instruct-tuned models. On thing though is still missing, especially if you want to have something resembling a meaningful conversation with the LLM.
Let’s talk!
- Open a notebook and connect to a local LLM using LM Studio.
- Tell the LLM something about you.
- In a second call, ask the LLM what you just told it. It should know, right? You just told it.
As you have seen, the LLM cannot remember what we just told it. In technical terms, we say the LLM is stateless. To emphasize this: this is not what most people expect. It clearly is not how conversations with people work, it is also not how conversations with popular chatbots (i.e. chatgpt or claude) usually go. So, what is missing, here?
We should have a conversation.
In your notebook, write a chat function or method that does the following:
- saves the conversation history.
- adds the user’ message to the history.
- sends the system message plus conversation to the LLM.
- adds the response to the history
Test the chat function as above.
Can you see any downsides of this approach? What about long conversations?
(Optional) Implement a limit to the conversation length. Make sure the system message is always read.
Upload your results to Moodle.
Congratulations! You just build yourself a chatbot.
Here are this section’s take home messages:
- A single LLM call is stateless—it has no memory of previous interactions. Each request is independent.
- To enable multi-turn conversations, we need to
- Store the conversation history (user and assistant messages).
- Send the full history with each new request. (As long as it fits within the context window.)
Function Calling
Motivation
Sometimes you want your AI assistant to answer more complex tasks then the LLM can handle out of the box. For example, you might want to ask questions regarding documents or datasets on your hard drive, or ask about recent events. This is a hard task for an LLM, because, if it was not trained on this information, i.e. the information wasn’t part of its training data, it won’t be able to answer these questions correctly.
Try it!
- Open a notebook and connect to a local LLM using LM Studio.
- Ask the LLM about the current weather in your location.
(I mean, sure, you could just look out the window, but we are developers here, we don’t have windows!)
To solve this problem, we need to give the LLM assistant some tools it can use to access this additional information. This is where function calling comes into play. Function calling allows the assistant to call a function with specific parameters to get the required information. The function then returns the requested data or performs the necessary task, and the LLM can continue generating its response based on that information.
This is what we call an LLM agent or agent system. We will look deeper into agents (and alternatives) soon. For now, we will just say that we have an agent system or that a system is agentic if the system has access to tools or functions and if the LLM decides which tools to use and when (or if).
A basic agentic workflow may look like this:
- The user inputs a question or task.
- The LLM determines if it needs additional information or assistance from external tools.
- If needed, the LLM calls a function with specific parameters to retrieve the required data or perform the necessary task. This means the LLM generates an answer containing either some executable code or a JSON object containing the name of the function and its parameters. These functions have to be defined in advance.
- The LLM response is then scanned for these elements (code or JSON object) and these are executed if possible.
- The response is then fed back into the LLM (usually as user input) for further processing.
- The LLM uses the returned information to continue generating its response.
We obviously need the conversation history from earlier for this to work. We also need to give the system access to the functions.
Note: As stated above, the functions need to be predefined. In theory, we could just give the LLM its task and let it generate code to be executed. This is, however not the best idea for reasons of security.
There are two main ways to implement function calling:
- Structured output: Here, the LLM is tasked to generate a tool call in the form of a JSON object containing the name of the function and its parameters.
- Code generation: Here, we ask the LLM to generate the function calls in the form of executable python code. Usually, we still want to restrict the LLM to use only predefined functions. Nevertheless, this can pose a severe security issue because this approach hinges on running generated code on your machine.
Today, we will not implement the full agentic workflow outlined above. We will get there eventually, though. Here, we want to focus on step 3, the function call. The aim is to take a user request and translate it into structured output, either JSON or code. In the following, we will explore how to do it.
Structured output
The traditional way1 of doing function calling is to generate a JSON object containing the name of the function and its parameters. Until recently, all major agent frameworks (more on agents next time) used this approach. Here, the LLM response is scanned for a JSON object. Function name and arguments are extracted and the function is executed, if possible. We already talked about generating structured output before. Here, we will use it to generate a function call.
1 In this context, traditional means: people have been using it for more than a year.
Function definition
The first step in using function calling is to define the functions that the LLM can call. This is done by providing a JSON schema that describes the name of the function, its arguments and their types. The JSON schema should be provided to the LLM in the system prompt. Here is an example: 2
2 Note, that this is not an executable implementation but just a description of the function for the LLM.
{
"name": "get_current_weather",
"description": "Get the current weather in a given location",
"arguments": {
"location": {"type": "string"},
"unit": {"type": "string"}
}
}
Function name and description should be as clear as possible to make it easier for the LLM to decide which function to use and how to properly use it. Argument names and types should be as precise as possible to avoid ambiguity in the function call.
Prompting
The second step is to provide a good prompt. The prompt should make it clear to the LLM to only generate valid output and that it should follow the JSON schema. Here is an example of a prompt that can be used for function calling:
You are a helpful assistant that generates function calls based on user input. Only use the functions you have been provided with.
{function definition as described above}
User: What's the weather like in Berlin?
Assistant: {
"name": "get_current_weather",
"arguments": {"location": "Berlin", "unit": "celsius"}
}
Another way of forcing the LLM to output structured format is to use pydantic classes as described last time.
Try it!
- Open a notebook and connect to a local LLM using LM Studio.
- Define the function
get_current_weatheras shown above. - Write a prompt that asks the LLM to generate a function call based on user input. Use prompt engineering as shown above or pydantic classes as shown last time.
- Test the prompt with an example input.
- Define other functions and try other inputs and see if the LLM generates valid output.
Challenges, fine-tuned models and the influence of size
The main challenge is here to get the LLM to generate a valid answer. This is not always easy, as LLMs are not usually super safe coders 😃.
- They can hallucinate functions or arguments that do not exist.
- They can forget to call a function.
- They can forget to provide all required arguments.
- They can provide the wrong type for an argument.
- They can provide invalid values for an argument.
There are several strategies to mitigate these issues:
- Prompt engineering: A good prompt can help to guide the LLM towards generating valid output. This is especially true for larger models, as they have a better understanding of the world and can therefore generate more accurate responses.
- Finetuning: Finetuning a model on a specific task can improve its performance on that task. This is especially useful for smaller models, as they are less likely to hallucinate functions or arguments that do not exist.
- Size: Larger models are better at generating valid output than smaller models. However, larger models are also more expensive to run and require more computational resources.
Test it! (we can do it together, if your hardware does not allow you to run the model.)
As above, but this time
- use a very small model (e.g a small qwen model)
- use a larger model pf the same model family (e.g. this one)
- use a model fine-tuned for the task (see the small tool symbol in LMStudio). You could try this one
- use a model of the same family not fine-tuned on this Task.
Code Generation
The exception mentioned above is the smolagents framework. Here, the default mode is code generation, but JSON mode is also supported. (We will talk more about agents and the smolagents framework soon.) When using this approach, the function definition and description will be given to the LLM as python code. Additionally, the LLM is expected to generate the function call also as valid python code. As with structured output, function name and description should be as clear as possible. Typing might also help.
Try it!
- In your notebook, define the weather function (and/or some other function of your choice) in python code.
- Write an appropriate prompt that makes it clear that you expect python code calling the defined function(s).
- Test your prompt with an example input.
As mentioned above (several times already), giving clear names and descriptions for functions, parameters, etc., will help the model generate more accurate code snippets. (PRO TIP: it will help your human coworkers as well in understanding your code.) Here, you have the opportunity to see the consequences in action in a save environment without angering fellow humans or yourself later on!
Try it!
- In your notebook, write a well written python function using clear names, description text and typing hints. You can use the one you wrote earlier, because of course you wrote clean code!
- Test the function with your prompt and example inputs.
- Now write a badly written python function, without clear names, descriptions or typing hints. Test it with your example inputs too. Are the results better or worse? Why do you think that’s happening?
- Upload your notebook to Moodle.
Further Readings
- Here is a very nice paper about generating structured output.
