Agent basics

What is an agent?

“An AI agent is a system that uses an LLM to decide the control flow of an application.” (“What Is an AI Agent?” 2024)

In the context of large language models, agents are LLM-based systems that can solve complex tasks. Imagine asking a question like:

“What were the key learnings from the Generative AI elective module in WiSe 24/25 at FH Kiel?”

Could you just ask an LLM that question and expect a correct answer?

It is in theory possible, that an LLM could answer that directly, but only if it was trained on this information, that is, if a text describing the module exists, is accessible from the web and was used in training the model. However, usually we can not expect the LLM to have this knowledge.

Let’s think for a moment how a human would answer that (one that did not attend the module). We would probably try to get a copy of the script, maybe we saved the script to our hard drive or other data storage. Maybe we could search the web for a description or text version of the module. Having obtained a copy of the script, we would probably read it. Then, we would try to distill the information hidden therein, to answer the question.

So, for our LLM to answer that question, it needs to be able to perform several tasks:

  • Searching the web for relevant documents
  • searching in a local file storage or other database
  • Reading and understanding a document
  • Summarizing the content of a document
  • Answering questions based on the summary of a document

This is where agents come into play. Agents are LLM-based systems that can solve complex tasks by performing several subtasks in sequence, using an LLM to decide which subtask to perform next. In our example, the agent would first search the web for relevant documents, then read and understand them, summarize them and finally answer the question based on the summary.

Agent framework

Architecture of the agent framework (LLM AgentsNextra, 2024)

To facilitate this, an agent system consists of several components:

  • Agent: the agent core acting as coordinator
  • Planning: Assists the agent in breaking down the complex task into subtasks
  • Tools: functions that the agent can use to perform a specific task
  • Memory: used to store information about previous interactions with the agent

We will describe each of them below.

Agent

This is a general-purpose LLM, that functions as the brain and main decision-making component of an agent. It determines which tools to use and how to combine their results to solve complex tasks. The agent core uses the output of the previous tool as input for the next tool. It also uses an LLM to decide when to stop using tools and return a final answer. The behavior of the agent and the tools, it has at its disposal, is defined by a prompt template.

Planning

Planning is the process of breaking down a complex task into subtasks and deciding which tools to use for each subtask. The planning module is usually also an LLM, it can be one fine-tuned to this specific task or receive a specialized prompt. It uses techniques like chain-of-thought (CoT) prompting to generate a plan of action (Wei et al., 2023). CoT prompting is a technique that encourages the model to explain its reasoning step by step, making it easier for us to understand and evaluate its answers. Other strategies include Tree-of-Thoughts (Long, 2023), (Yao, Yu, et al., 2023), (Hulbert, 2023) or ReAct (Yao, Zhao, et al., 2023). We will discuss these in more detail later.

Tools

Tools are functions that the agent can use to perform a specific task. They can be pre-defined or dynamically generated based on the user’s needs. Tools can be simple, such as a calculator, or complex, such as a web search engine. Tools can also be other agents, allowing for the creation of multi-agent systems. In our example, the tools would be a web search engine and a document reader. Other popular tools are a data store or a python interpreter.

Memory

Memory is used to store information about previous interactions with the agent. This allows the agent to remember past conversations and use this information in future interactions. Memory can be short-term, such as a conversation buffer, or long-term, such as a database. Memory can also be used to store the results of previous tool uses, allowing the agent to reuse them if necessary.

Chain-of-Thought prompting

Chain-of-Thought (CoT) prompting refers to the technique of giving the LLM hints in the user input on how to solve the problem step by step, similar to what we did above. In the original paper, this was used with few-shot prompting (giving the LLM examples in the prompt), see figure below. But it is also possible to use it with zero-shot prompting (i.e. without examples) by invoking the magical words “Let’s think step by step” (Kojima et al., 2023)1.

1 Note that these informations get old fast. Newer LLMs may have this functionality build in already

Chain-of-Thought prompting illustrated (Wei et al., 2023)

Tree of Thoughts

Tree of Thoughts (ToT) is a generalization on CoT prompting. The papers on ToT are somewhat complex, so we will not discuss them in detail here. In short, LLMs are used to generate thoughts, that serve as intermediate steps towards the solution. The difference to CoT is basically, that several thoughts are generated at each step, creating a tree-like structure. This tree is then searched using breadth-first search or depth-first search until a solution is found. A simplified example is given by (Hulbert, 2023):

Imagine three different experts are answering this question.
All experts will write down 1 step of their thinking,
then share it with the group.
Then all experts will go on to the next step, etc.
If any expert realises they're wrong at any point then they leave.
The question is...

ReAct

ReAct (short for Synergizing Reasoning and Acting) is a technique based on CoT, that updates its reasoning after each step of tool use. This allows the agent to react (pun intended) to unforeseen results during the step-by-step solution i.e. failed tool use. The agent can then follow a different chain of thoughts. This makes it very well suited to tool use. An illustration is given in the figure below.

Comparison of ReAct with other prompting techniques (Yao, Zhao, et al., 2023)

Examples of agent-frameworks (Llamaindex, LangChain & Haystack)

There are a lot of agent frameworks out there. In this module we will focus on three of them: LlamaIndex, LangChain and Haystack. They all have their own strengths and weaknesses, but they all share the same basic architecture as described above. We will describe each of them below.

  • Llamaindex: LlamaIndex is a data framework for your LLM applications. It provides a central interface to connect your LLMs and your data. It also provides a set of tools to help you build your own applications, such as a document reader, a web search engine, a data store, etc. - LangChain: LangChain is a framework for developing applications powered by language models. It provides a set of tools to help you build your own applications, such as a document reader, a web search engine, a data store, etc. It also provides a set of agents that can use these tools to solve complex tasks.
  • Haystack: Haystack is an open source NLP framework that enables you to build production-ready applications around LLMs and other models. It provides a set of tools to help you build your own applications, such as a document reader, a web search engine, a data store, etc. It also provides a set of agents that can use these tools to solve complex tasks.

📝 Task

Now it is your turn!

Each group is to use one of the following frameworks to build a small demo agent:

  1. Set up a local LLM (e.g. using Ollama or LM Studio) to be used by the agent.
  2. Choose a small task for your agent, e.g. answering questions about a specific topic, summarizing a document, etc. (use the one in the respective tutorial)
  3. Implement the agent using one of the frameworks listed above.
  4. Present your results and your experiences with the frameworks in a short presentation.
  5. Submit your code and report on moodle.

Further Readings

References

Hulbert, D. (2023). Using Tree-of-Thought Prompting to boost ChatGPT’s reasoning. https://github.com/dave1010/tree-of-thought-prompting.
Introduction to LLM Agents. (2023). In NVIDIA Technical Blog. https://developer.nvidia.com/blog/introduction-to-llm-agents/.
Kojima, T., Gu, S. S., Reid, M., Matsuo, Y., & Iwasawa, Y. (2023). Large Language Models are Zero-Shot Reasoners (arXiv:2205.11916). arXiv. https://doi.org/10.48550/arXiv.2205.11916
LLM AgentsNextra. (2024). https://www.promptingguide.ai/research/llm-agents.
Long, J. (2023). Large Language Model Guided Tree-of-Thought (arXiv:2305.08291). arXiv. https://arxiv.org/abs/2305.08291
Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., & Zhou, D. (2023). Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (arXiv:2201.11903). arXiv. https://doi.org/10.48550/arXiv.2201.11903
What is an AI agent? (2024). In LangChain Blog. https://blog.langchain.dev/what-is-an-agent/.
Yao, S., Yu, D., Zhao, J., Shafran, I., Griffiths, T. L., Cao, Y., & Narasimhan, K. (2023). Tree of Thoughts: Deliberate Problem Solving with Large Language Models (arXiv:2305.10601). arXiv. https://arxiv.org/abs/2305.10601
Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., & Cao, Y. (2023). ReAct: Synergizing Reasoning and Acting in Language Models (arXiv:2210.03629). arXiv. https://doi.org/10.48550/arXiv.2210.03629